
Unit 2- Rdbms and SQL

This document is a course outline for a Database Management System unit focusing on RDBMS and SQL. It covers topics such as relational query languages, SQL concepts, integrity constraints, data definition and manipulation statements, and normalization. The unit aims to equip students with the skills to create and manipulate relational database objects using SQL.

MASTER OF COMPUTER APPLICATIONS

O02CA504: Database Management System

SEMESTER 1

Unit: 2 – RDBMS and SQL

Unit 2
RDBMS and SQL
TABLE OF CONTENTS
1 Introduction
1.1 Objectives
2 Relational Query Languages
3 SQL Concepts
4 Integrity Constraints
4.1 Entity integrity
4.2 Domain integrity
4.3 Referential integrity
5 Data Definition Statements
5.1 Creating relations in SQL
5.2 Adding and deleting tuples
5.3 Destroying and altering relations
6 Data Manipulation Language
6.1 SELECT statement
6.2 Subquery
6.3 Querying multiple relations
6.4 Functions
6.5 GROUP BY
6.6 Updating the database
7 Views
8 Embedding SQL Statements
9 Transaction Processing
10 Normalisation and Database Design
10.1 First normal form
10.2 Second normal form
10.3 Third normal form
10.4 Boyce-Codd normal form
10.5 Fourth normal form
10.6 Fifth normal form
11 Denormalisation
12 Summary
13 Glossary
14 Terminal Questions
15 Answers
16 References


1. INTRODUCTION
In the previous unit, you studied the various types of database systems along with their
advantages and disadvantages.

The level of interaction with a database depends on how it is used: the more advanced the use,
the more interaction is required. Hence, each database system should provide several methods,
languages and software tools so that users can submit a request, have it processed and get its
output. This unit introduces some of the database query languages and tools.

Now that we are clear about the various types of DBMS, let us start this unit, where you will learn
about query languages and study SQL features and queries.

1.1. Objectives
After studying this unit, you should be able to:
create relational database objects using SQL
formulate tables and data residing in them
create and manipulate views
describe transaction processing
discuss the concept of embedded SQL and dynamic SQL


2. RELATIONAL QUERY LANGUAGES

Modern RDBMSs support several query languages for user interaction. The two most common query
languages available with RDBMSs are SQL (Structured Query Language) and QBE (Query by Example).

Others are the Information System Base Language (ISBL) from the Peterlee Relational Test
Vehicle (PRTV) system and QUEL (Query Language) from INGRES (Interactive Graphics and
Retrieval System). ISBL is based on relational algebra, QUEL and SQL are like tuple calculus,
and QBE is like domain calculus.

In this section, we will focus on QBE; in the forthcoming sections (Section 3 onwards), you will
study SQL in detail.

Query by Example (QBE): QBE is a relational database query language designed by M. M. Zloof. It
was developed in the mid-1970s at IBM Research, simultaneously with the development of SQL, and
it is the first graphical query language. QBE represents tables visually: the user enters commands
that define what is to be done, example elements that define how it is done, and conditions that
determine which records are admitted into the processing.

SELF-ASSESSMENT QUESTIONS – 1
1. QBE stands for _________.
2. SQL is supported by RDBMS. (True/False)


3. SQL

SQL (Structured Query Language) is a standard relational database language used for the
creation, deletion and modification of database tables.

(Note*: SQL keywords (SELECT, FROM, WHERE, etc.) are case-insensitive; we have written them in
capitals for emphasis. Whether table names, column names, etc., are case-sensitive depends on the
DBMS; in some systems, table names are case-insensitive on Windows but case-sensitive on UNIX.)

Features: SQL has a very rich set of features, which are given in Table 2.1 below:

Table 2.1: SQL Features

The Data Manipulation Language (DML): As the name says, this language is used for manipulating
the data stored in database objects. DML uses the SELECT, INSERT, DELETE and UPDATE commands
to retrieve and modify the data.

The Data Definition Language (DDL): This language is used to define the structure of the table.
With the CREATE, ALTER and DROP commands, a table can be created, its structure can be
modified, and it can be deleted.

Specification of Triggers and Complex Integrity Constraints: SQL provides triggers and complex
integrity constraints (ICs) to be applied to the database.

Triggers: Triggers are the actions that are run by the DBMS whenever some event related to the
database occurs.

Run-time (Dynamic) and Embedded SQL: With the run-time feature of SQL, users can build and
execute queries at run time. With embedded SQL, users can call SQL statements from within a
host language (such as C or COBOL).

Execution of Client-Server Applications and Accessing Remote Databases: This feature allows a
client program to establish a connection with the server database. It also allows the user to
access a remote database.

Transaction Management: SQL commands specify the actions to be taken in order to control the
execution of a transaction.

Security: SQL controls access to tables and views, thereby protecting the database.

Advanced Features: SQL provides many advanced features, such as recursive and decision-support
queries, object-oriented features, etc.

SELF-ASSESSMENT QUESTIONS – 2
3. SELECT, INSERT, DELETE and UPDATE commands are used by
_________ to modify the data.
4. SQL commands define the actions to be taken to control _________ .


4. INTEGRITY CONSTRAINTS

A DBMS maintains data integrity to keep wrong information out of the database.

Integrity constraints are conditions defined on the database schema. An integrity constraint
limits the data that can be stored in a database instance. A database instance that satisfies all
the integrity constraints defined on the database schema is known as a legal instance. A DBMS
enforces integrity constraints; therefore, it permits only legal instances to be stored in the
database.

The major relational constraints are domain constraints, key constraints and constraints on null
values, entity integrity, and referential integrity with foreign keys.

4.1. Entity integrity


Entity integrity means that every row in a table has a unique identifier, so that each row can be
told apart from all others. Entity integrity is enforced by placing a primary key (PK) constraint
on one or more columns. This ensures that all values inserted into the column(s) are unique. A PK
constraint rejects any attempt to enter duplicate values or null values in the column(s).

The primary key of a relational table uniquely identifies each record in the table. It can either be a
normal attribute that is guaranteed to be unique, or it can be generated by the DBMS (such as a
globally unique identifier in Microsoft SQL Server). Primary keys may consist of a single attribute
or multiple attributes in combination. An intelligent key is a PK made of genuine data. Only one
PK is assigned to a table. A composite PK contains more than one column. We use a composite PK
when no single column is unique on its own.

Hence, we can say that a table can contain only one PK, but a PK can contain more than one
column. If uniqueness must be enforced separately on more than one column, we place the PK
constraint on one column and a UNIQUE constraint or IDENTITY property on the other columns
that must not contain duplicate values.
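The behaviour of PK and UNIQUE constraints can be tried out with a small sketch. It is written in Python with the built-in sqlite3 module purely as a convenient test bed; this is an assumption of the example, not the DBMS used in this unit, and the EMAIL column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical table: EMPNO is the primary key, EMAIL must also be unique.
conn.execute("""
    CREATE TABLE emp (
        empno INTEGER PRIMARY KEY,
        ename TEXT NOT NULL,
        email TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO emp VALUES (101, 'Nandi', 'nandi@example.com')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk = try_insert((101, "Sujit", "sujit@example.com"))     # EMPNO repeated
duplicate_email = try_insert((102, "Sujit", "nandi@example.com"))  # EMAIL repeated
new_row = try_insert((103, "Sujit", "sujit@example.com"))          # all values unique
row_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```

Only the last insert succeeds, so the table ends up with two rows: the two rejected rows each repeat a value that is already stored in a PK or UNIQUE column.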


4.2. Domain integrity


In database language, a domain is the set of allowed values for a column (not to be confused with
other kinds of domain, for example, an Internet domain or a Windows NT domain).

Domain integrity is also called 'attribute' integrity; it covers, for example, the allowed size of
values, the right data type, null status, etc. Domain integrity can be implemented with DEFAULT
constraints, FOREIGN KEY constraints, CHECK constraints and data types. Data types restrict the
fields in different ways. A default is a value to be inserted into a column when none is supplied;
a rule defines the acceptable values that may be inserted into a column. Rules and defaults are
similar to constraints but are not part of the ANSI standard; their continued use is discouraged.
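As an illustration of domain integrity, the following sketch uses a DEFAULT and a CHECK constraint. It assumes Python's built-in sqlite3 module as the test bed; the default job 'CLERK' and the rule that salaries cannot be negative are hypothetical choices made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical domain rules: JOB defaults to 'CLERK', SAL may not be negative.
conn.execute("""
    CREATE TABLE emp (
        empno INTEGER PRIMARY KEY,
        job   TEXT DEFAULT 'CLERK',
        sal   REAL CHECK (sal >= 0)
    )
""")

# JOB is not supplied, so the DEFAULT value is stored.
conn.execute("INSERT INTO emp (empno, sal) VALUES (1, 1500)")
default_job = conn.execute("SELECT job FROM emp WHERE empno = 1").fetchone()[0]

# A value outside the allowed domain violates the CHECK constraint.
try:
    conn.execute("INSERT INTO emp (empno, sal) VALUES (2, -10)")
    negative_salary_accepted = True
except sqlite3.IntegrityError:
    negative_salary_accepted = False
```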

4.3. Referential integrity


Referential integrity is formed with the combination of Primary Key (PK) and Foreign Key (FK).

Primary key: As explained above, it is a field or group of fields that uniquely identifies a
record in a table. Hence, a particular record can be tracked without confusion.

Foreign Key: A foreign key is a column or even a group of columns in a table (also called 'child
table') that accepts its values from the primary key (PK) from another table (also called 'parent
table'). To preserve the referential integrity, the foreign key in the 'child' table can only take values
that are in the primary key of the 'parent' table. The main aim of referential integrity is to avoid
'orphans'. These orphans are records in a ‘child table’ that cannot be linked to a record in the
‘parent table’.

Implementing referential integrity means that the relationship between the tables is maintained
when records are inserted, deleted or updated. An example of a primary key and a foreign key is
represented in Figure 2.1.

In the 1st table, the first column (Account Number) is the PK, and in the same table, the branch
name is the FK. The FK of the 1st table is the PK of the 2nd table, which connects the two tables.
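The account and branch relationship described above can be sketched as follows. The example assumes Python's built-in sqlite3 module (where foreign key checking must be switched on explicitly); the column names and sample rows are hypothetical, since the contents of Figure 2.1 are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

# Parent table: branch_name is the PK.
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, city TEXT)")
# Child table: branch_name is an FK that refers to the parent's PK.
conn.execute("""
    CREATE TABLE account (
        account_number INTEGER PRIMARY KEY,
        branch_name    TEXT REFERENCES branch (branch_name),
        balance        REAL
    )
""")
conn.execute("INSERT INTO branch VALUES ('Downtown', 'Brooklyn')")
conn.execute("INSERT INTO account VALUES (101, 'Downtown', 500)")

# An account for a branch that does not exist would be an orphan.
try:
    conn.execute("INSERT INTO account VALUES (102, 'Uptown', 700)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Deleting a parent row that is still referenced would also create an orphan.
try:
    conn.execute("DELETE FROM branch WHERE branch_name = 'Downtown'")
    parent_deleted = True
except sqlite3.IntegrityError:
    parent_deleted = False
```

Both operations are rejected, which is exactly how referential integrity prevents orphan records in the child table.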


Figure 2.1: Instance of PK and FK

SELF-ASSESSMENT QUESTIONS – 3
5. _________ is formed with the combination of PK and FK.
6. Domain integrity is also called '_________' integrity.


5. DATA DEFINITION STATEMENTS

Data Definition Language (DDL) permits users to create or modify database objects. Specifically,
DDL statements perform tasks such as creating objects, altering or modifying objects, and
dropping or deleting objects.

5.1 Creating Relations in SQL


We define an SQL relation using the CREATE TABLE command, which creates the table structure.
The CREATE TABLE syntax is given below.
CREATE TABLE <tablename>
(
column1 datatype (size) [NULL | NOT NULL],
column2 datatype (size), …
);
For example, to create the table EMP, enter the following statement.
CREATE TABLE EMP
(
EMPNO NUMBER (4) NOT NULL,
ENAME VARCHAR2 (10),
JOB VARCHAR2 (9),
DOJ DATE,
SAL NUMBER (7,2),
COMM NUMBER (7,2),
DEPTNO NUMBER (2) NOT NULL
);

The above statement creates a table named "EMP". There are seven columns: EMPNO
(Employee Number), ENAME (Employee Name), JOB, DOJ (Date of Joining), SAL (Salary),
COMM (Commission), DEPTNO (Department Number).
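To check what a CREATE TABLE statement actually produces, the statement can be run and the resulting structure inspected. The sketch below assumes Python's built-in sqlite3 module, which accepts the Oracle-style type names used here but treats them more loosely than Oracle does; PRAGMA table_info is SQLite's own way of listing columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMP
    (
        EMPNO  NUMBER (4) NOT NULL,
        ENAME  VARCHAR2 (10),
        JOB    VARCHAR2 (9),
        DOJ    DATE,
        SAL    NUMBER (7,2),
        COMM   NUMBER (7,2),
        DEPTNO NUMBER (2) NOT NULL
    )
""")

# Each row of table_info is (cid, name, type, notnull, default_value, pk).
info = conn.execute("PRAGMA table_info(EMP)").fetchall()
columns = [row[1] for row in info]
not_null_columns = [row[1] for row in info if row[3]]
```

Here columns lists the seven column names in order, and not_null_columns contains only EMPNO and DEPTNO.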


5.2 Adding and deleting tuples


Adding a tuple/record/row: The INSERT command is used to insert a record in the table. The syntax
is:
INSERT INTO <tablename>
VALUES (value1, value2, …);
For example,
INSERT INTO EMP
VALUES (101, 'Nandi', 'President', '17-NOV-88', 5000, NULL, 10);

The above example shows the insertion of a record into the EMP table.

To insert values into only the EMPNO, DEPTNO and ENAME fields, enter the following statement.

INSERT INTO EMP (EMPNO, DEPTNO, ENAME)
VALUES (101, 29, 'Sujit');

Deleting a tuple: The DELETE command is used to delete rows from the table.

The syntax is:

DELETE FROM <tablename>
WHERE <condition>;
For example,
DELETE FROM EMP
WHERE SAL > 1000;

The above example deletes all the employees whose salaries are more than 1000. If we omit the
WHERE clause, all rows of the table are deleted. DELETE always removes whole rows; part of a
row cannot be deleted.
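The effect of INSERT and DELETE can be followed step by step in the sketch below. It assumes Python's built-in sqlite3 module and a simplified, hypothetical version of the EMP table with only four columns; the employee 'Asha' is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal REAL, deptno INTEGER)")

# Full-row insert, then an insert that names only some of the columns.
conn.execute("INSERT INTO emp VALUES (101, 'Nandi', 5000, 10)")
conn.execute("INSERT INTO emp (empno, deptno, ename) VALUES (102, 29, 'Sujit')")
conn.execute("INSERT INTO emp VALUES (103, 'Asha', 800, 10)")

# Columns left out of the column list are stored as null (None in Python).
unlisted_salary = conn.execute("SELECT sal FROM emp WHERE empno = 102").fetchone()[0]

# DELETE with a WHERE clause removes only the rows that satisfy the condition.
deleted = conn.execute("DELETE FROM emp WHERE sal > 1000").rowcount
remaining = [r[0] for r in conn.execute("SELECT empno FROM emp ORDER BY empno")]

# DELETE without a WHERE clause removes every row but keeps the table itself.
conn.execute("DELETE FROM emp")
rows_left = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```

Note that employee 102 survives the first DELETE: the salary is null, so the condition SAL > 1000 is not true for that row.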

5.3. Destroying and altering relations


The DROP TABLE command removes a relation, together with all the information in it, from the
database. The syntax is DROP TABLE <table name>.

Example: DROP TABLE details;

A DROP command can take one of two options: CASCADE and RESTRICT. When a database schema is
dropped, for example, they work as follows.

The CASCADE option deletes the complete database schema, including its tables, domains
and other elements.

The RESTRICT option deletes the database schema only if it does not contain any elements;
otherwise, the command is not executed.

ALTER TABLE command: The ALTER TABLE command adds attributes to an existing relation. All the
existing tuples are assigned null as the value of the new attribute. The syntax is

ALTER TABLE d ADD I D

where d is the existing relation, I is the added attribute, and D is the domain of the added attribute.

ALTER TABLE d DROP I

This statement drops an attribute from a relation, where d is the existing relation and I is the
attribute of the relation.

Example: ALTER TABLE details ADD Parents_Name VARCHAR (20);

The above example will add an attribute, Parents_Name, to the table details.
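The following sketch shows what happens to existing tuples when an attribute is added, and what remains after a table is dropped. It assumes Python's built-in sqlite3 module; the single NAME column and the sample row are hypothetical, and sqlite_master is SQLite's own catalogue table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE details (name TEXT)")
conn.execute("INSERT INTO details VALUES ('Simran')")

# Add a new attribute; the existing tuple gets null for it.
conn.execute("ALTER TABLE details ADD Parents_Name VARCHAR (20)")
row = conn.execute("SELECT name, Parents_Name FROM details").fetchone()

# Drop the relation; it disappears from the catalogue together with its data.
conn.execute("DROP TABLE details")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```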

SELF-ASSESSMENT QUESTIONS – 4
7. There are two types of DROP commands: CASCADE and RESTRICT. (True/False)
8. _________ command helps for the creation of SQL relations.


6. DATA MANIPULATION LANGUAGE


Data Manipulation Language (DML) contains commands which manipulate data in existing
database schema objects. These statements do not implicitly commit the current transaction. You
can find these commands in Table 2.2.

Table 2.2: Data Manipulation Language Commands

The SELECT statement is used to retrieve information from the database.

6.1. SELECT statement


Apart from information retrieval, this statement also gives query capability. It means that when the
select command runs, the information present in the table will be displayed on the screen.

Syntax: The three common elements of the SELECT command are SELECT, FROM and WHERE.
Together, they can retrieve information from one or more tables. The syntax is:

SELECT <column_list>
FROM <table_list>
WHERE <search_criteria>

where

<column_list> defines the list of attributes whose values are to be retrieved

<table_list> defines the list of relation names

<search_criteria> defines the conditional expression that identifies the tuples to be retrieved.


In SQL, basic logical comparison operators are used in the WHERE clause.

Comparison operators and their meanings are given in Table 2.3:

Table 2.3: Logical Comparison Operators and their Meaning

Operator    Meaning
=           equal to
>           greater than
>=          greater than or equal to
<           less than
<=          less than or equal to
<>          not equal to
!=          not equal to
!>          not greater than
!<          not less than
()          order of precedence

Let us first begin with a very basic SQL query.

Example: Assume a table whose name is EMPLOYEE

EMPNO ENAME DESIGNATION DEPTNO PAY INCENTIVES

1821 JOHN PRESIDENT 1 60000 8000

1858 AINA MANAGER 3 30000 6000

1875 KRIPSI MANAGER 1 20000 4000

1877 ARICA MANAGER 2 15000 1000

SELECT EMPNO, ENAME, DEPTNO
FROM EMPLOYEE
WHERE DEPTNO = 2;

This query will display three columns, i.e., EMPNO, ENAME and DEPTNO, of all rows of the
EMPLOYEE table whose DEPTNO is 2.

EMPNO ENAME DEPTNO

1877 ARICA 2


Example:

SELECT *
FROM EMPLOYEE
WHERE DESIGNATION = 'MANAGER';

This query will display all six columns, i.e., EMPNO, ENAME, DESIGNATION, DEPTNO, PAY and
INCENTIVES, of all rows of the EMPLOYEE table whose DESIGNATION is MANAGER.

EMPNO ENAME DESIGNATION DEPTNO PAY INCENTIVES

1858 AINA MANAGER 3 30000 6000

1875 KRIPSI MANAGER 1 20000 4000

1877 ARICA MANAGER 2 15000 1000

Note: An asterisk (*) is used to retrieve all columns from the table.
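The two SELECT queries above can be run against the sample EMPLOYEE data as follows. The sketch assumes Python's built-in sqlite3 module and integer column types; the ORDER BY clause is added only to make the order of the result rows predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EMPNO INTEGER, ENAME TEXT, DESIGNATION TEXT,
        DEPTNO INTEGER, PAY INTEGER, INCENTIVES INTEGER
    )
""")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1821, "JOHN", "PRESIDENT", 1, 60000, 8000),
        (1858, "AINA", "MANAGER", 3, 30000, 6000),
        (1875, "KRIPSI", "MANAGER", 1, 20000, 4000),
        (1877, "ARICA", "MANAGER", 2, 15000, 1000),
    ],
)

# Three named columns, restricted to department 2.
dept2 = conn.execute(
    "SELECT EMPNO, ENAME, DEPTNO FROM EMPLOYEE WHERE DEPTNO = 2"
).fetchall()

# All columns (*), restricted to managers.
managers = conn.execute(
    "SELECT * FROM EMPLOYEE WHERE DESIGNATION = 'MANAGER' ORDER BY EMPNO"
).fetchall()
```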

The complete syntax for the SELECT statement, in the form used by Microsoft SQL Server
(Transact-SQL), is as follows:

SELECT [ALL | DISTINCT] [TOP n [PERCENT] [WITH TIES]] select_list
[INTO new_table]
[FROM table_sources]
[WHERE search_condition]
[GROUP BY [ALL] group_by_expression [,...n]
[WITH {CUBE | ROLLUP}]]
[HAVING search_condition]
[ORDER BY {column_name [ASC | DESC]} [,...n]]
[COMPUTE {{AVG | COUNT | MAX | MIN | SUM} (expression)} [,...n]
[BY expression [,...n]]]
[FOR BROWSE] [OPTION (query_hint [,...n])]

6.2. Subquery
With the help of the WHERE and HAVING clauses, it is possible to embed one SQL statement in
another. In this situation, the embedded query is known as a subquery (or inner query), and the
entire select statement is known as a nested query.


The structure is:

SELECT "column_name1"
FROM "table_name1"
WHERE "column_name2" [Comparison Operator]
(SELECT "column_name3"
FROM "table_name2"
WHERE [Condition])

Example: Take the EMPLOYEE table mentioned above.

EMPNO ENAME DESIGNATION DEPTNO PAY INCENTIVES

1821 JOHN PRESIDENT 1 60000 8000

1858 AINA MANAGER 3 30000 6000

1875 KRIPSI MANAGER 1 20000 4000

1877 ARICA MANAGER 2 15000 1000

Display the employees whose DEPTNO is the same as that of employee 1821.
SELECT ENAME, DEPTNO
FROM EMPLOYEE
WHERE DEPTNO =
(SELECT DEPTNO
FROM EMPLOYEE
WHERE EMPNO = 1821);

In the example above, the inner query is executed first, and its result is then used by the
outer query.

Result:

ENAME DEPTNO

JOHN 1

KRIPSI 1
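The two-step evaluation of a subquery can be made visible by running the inner query on its own before running the nested query. The sketch assumes Python's built-in sqlite3 module and uses only the three EMPLOYEE columns that the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMPNO INTEGER, ENAME TEXT, DEPTNO INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1821, "JOHN", 1), (1858, "AINA", 3), (1875, "KRIPSI", 1), (1877, "ARICA", 2)],
)

# The inner query on its own: the department of employee 1821.
inner_result = conn.execute(
    "SELECT DEPTNO FROM EMPLOYEE WHERE EMPNO = 1821"
).fetchone()[0]

# The nested query: the outer query compares DEPTNO with the inner result.
nested_result = conn.execute("""
    SELECT ENAME, DEPTNO
    FROM EMPLOYEE
    WHERE DEPTNO = (SELECT DEPTNO FROM EMPLOYEE WHERE EMPNO = 1821)
    ORDER BY EMPNO
""").fetchall()
```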

6.3. Querying multiple relations


SQL has various set operators, for instance, IN, ANY, ALL, EXISTS, NOT EXISTS, UNION, MINUS and
INTERSECT. These operators are used for tasks such as testing the membership of a value in a
set of values, or the membership of a tuple in a set of tuples.


Example: Make a list of all employees working for a department located in New York.

EMP
EMPNO   ENAME    DESIGNATION   DEPTNO   PAY
1821    JOHN     President     1        60000
1858    AINA     Manager       3        30000
1875    KRIPSI   Manager       1        20000
1877    ARICA    Manager       2        15000

DEPT
DEPTNO   DEPTNAME     LOCATION
1        Accounting   New York
2        Research     Dallas
3        Sales        Chicago
4        Operations   Boston

SELECT * FROM EMP WHERE DEPTNO IN
(SELECT DEPTNO FROM DEPT WHERE LOCATION = 'New York');

Result:

EMPNO   ENAME    DESIGNATION   DEPTNO   PAY
1821    JOHN     President     1        60000
1875    KRIPSI   Manager       1        20000
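The same query over two relations can be tried out as follows. The sketch assumes Python's built-in sqlite3 module, names the location column LOCATION as in the table heading, and compares against 'New York' exactly as it is stored, because string comparison with = is case-sensitive here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, DESIGNATION TEXT, "
    "DEPTNO INTEGER, PAY INTEGER)"
)
conn.execute("CREATE TABLE DEPT (DEPTNO INTEGER, DEPTNAME TEXT, LOCATION TEXT)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?)",
    [
        (1821, "JOHN", "President", 1, 60000),
        (1858, "AINA", "Manager", 3, 30000),
        (1875, "KRIPSI", "Manager", 1, 20000),
        (1877, "ARICA", "Manager", 2, 15000),
    ],
)
conn.executemany(
    "INSERT INTO DEPT VALUES (?, ?, ?)",
    [
        (1, "Accounting", "New York"),
        (2, "Research", "Dallas"),
        (3, "Sales", "Chicago"),
        (4, "Operations", "Boston"),
    ],
)

# IN tests whether each employee's DEPTNO belongs to the set returned by the subquery.
new_york_staff = conn.execute("""
    SELECT * FROM EMP
    WHERE DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE LOCATION = 'New York')
    ORDER BY EMPNO
""").fetchall()
```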

6.4. Functions
A subprogram that returns a value is known as a function. SQL supports various aggregate
functions, shown below.

(a) COUNT: The COUNT function takes a column name and returns the number of non-null values
in that column. When the DISTINCT keyword is used, it returns only the count of distinct
values in the column. Without DISTINCT, it returns the count of all values, including
duplicates. COUNT (*) returns the number of all tuples in the table.

Example: Write a query to list the number of employees in the company from the EMPLOYEE
table.

SELECT COUNT (*)

FROM EMPLOYEE

(b) SUM: The SUM function is written with a column name and gives the sum of all values present
in that column.
(c) AVG: The AVG (average) function is written with a column name and returns the average of
the values in that column.


(d) MAX: MAX function or Maximum value function written with column name returns the
maximum value present in that column.
(e) MIN: MIN function or Minimum value function written with column name returns the minimum
value present in that column.

Examples of Queries Based on Aggregate Functions

Find the sum of salaries of all the employees and also the minimum, maximum and average salary.

Solution:
SELECT SUM (E.ESAL) AS SUM_SALARY,
MAX (E.ESAL) AS MAX_SALARY,
MIN (E.ESAL) AS MIN_SALARY,
AVG ([DISTINCT] E.ESAL) AS AVERAGE_SALARY
FROM EMPLOYEE E;

This query calculates the total, maximum, minimum and average salaries and also names the
result columns. The optional DISTINCT keyword, shown in square brackets, makes AVG ignore
duplicate salaries.
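The aggregate functions can be checked against the sample EMPLOYEE data used earlier in this section. The sketch assumes Python's built-in sqlite3 module and uses the PAY column of that sample table as the salary, in place of the ESAL column named in the solution above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMPNO INTEGER, DEPTNO INTEGER, PAY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1821, 1, 60000), (1858, 3, 30000), (1875, 1, 20000), (1877, 2, 15000)],
)

# One row of summary values is returned for the whole table.
count_all, sum_pay, max_pay, min_pay, avg_pay = conn.execute("""
    SELECT COUNT (*),
           SUM (E.PAY) AS SUM_SALARY,
           MAX (E.PAY) AS MAX_SALARY,
           MIN (E.PAY) AS MIN_SALARY,
           AVG (E.PAY) AS AVERAGE_SALARY
    FROM EMPLOYEE E
""").fetchone()

# COUNT with DISTINCT counts each department number only once.
distinct_depts = conn.execute(
    "SELECT COUNT (DISTINCT DEPTNO) FROM EMPLOYEE"
).fetchone()[0]
```

With the four sample rows, the total is 125000 and the average is 31250, while COUNT (DISTINCT DEPTNO) gives 3 because two employees share department 1.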

6.5. Group BY
The GROUP BY clause is used with the group functions to retrieve data grouped according to one
or more columns.

Example: Calculate the total salary spent on each department.

What would be the query?


SELECT DEPT, SUM (SALARY)
FROM EMPLOYEE
GROUP BY DEPT;

The output would be like:


dept salary
---------------- --------------
Electrical 25000
Electronics 55000
Aeronautics 35000
InfoTech 30000
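The grouping can be reproduced with a few sample rows. The sketch assumes Python's built-in sqlite3 module; the individual salaries are hypothetical and were chosen only so that the department totals match the output shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (NAME TEXT, DEPT TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        ("A", "Electrical", 25000),
        ("B", "Electronics", 30000),
        ("C", "Electronics", 25000),
        ("D", "Aeronautics", 35000),
        ("E", "InfoTech", 10000),
        ("F", "InfoTech", 20000),
    ],
)

# One result row per department; SUM is computed within each group.
totals = dict(
    conn.execute("SELECT DEPT, SUM (SALARY) FROM EMPLOYEE GROUP BY DEPT").fetchall()
)
```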


6.6. Updating the database


The UPDATE command is used to change selected values in a tuple without changing all the values
in the tuple. The syntax is

UPDATE table_name SET attribute = new_value WHERE condition;

Suppose we wish to change the house name of the student 'meenu' stored in the relation
ST_DATA. The following statement will serve the purpose.
UPDATE ST_DATA
SET ST_HNAME = 'pranavam'
WHERE ST_NAME = 'meenu';
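The sketch below runs the UPDATE statement and confirms that only the selected tuple changes. It assumes Python's built-in sqlite3 module; the second student and the original house names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ST_DATA (ST_NAME TEXT, ST_HNAME TEXT)")
conn.executemany(
    "INSERT INTO ST_DATA VALUES (?, ?)",
    [("meenu", "old house"), ("Simran", "other house")],
)

# Only the rows that satisfy the WHERE condition are changed.
changed = conn.execute(
    "UPDATE ST_DATA SET ST_HNAME = 'pranavam' WHERE ST_NAME = 'meenu'"
).rowcount
houses = dict(conn.execute("SELECT ST_NAME, ST_HNAME FROM ST_DATA").fetchall())
```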

Activity - 1
Generally, there are numerous ways to specify the same query in SQL. In your
opinion, what are the main advantages and disadvantages of this flexibility?

SELF-ASSESSMENT QUESTIONS – 5

9. With the help of WHERE and _________ commands it is possible to embed a SQL
statement into another.
10. It is not possible to query multiple relations in SQL. (True/ False)


7. VIEWS

A view is a subschema in which a logical table is generated from one or more base tables. A view
is like a window through which the user can see the information stored in tables. A view is
stored as a query, as it does not contain its own data. During query execution, its contents are
taken from the other tables. When the contents of those tables are modified, the view changes
dynamically.

The syntax to create a view is given below.


CREATE VIEW <view name>
AS <query>;

If the view's query is on a single table and has no GROUP BY clause and no DISTINCT clause, then
the user can UPDATE and DELETE rows in the view. If, in addition, the query has no columns
defined by expressions, then the user can also INSERT rows.

Example: In order to create a view of the EMP table named DEPT20 that shows the employees in
department 20 and their annual salary, use the following command.

CREATE VIEW DEPT20
AS SELECT ENAME, SAL * 12 AS ANNUAL_SALARY FROM EMP WHERE DEPTNO = 20;

Once the VIEW is created, it can be treated like any other table. Thus, the following is a valid
command.

SELECT * FROM DEPT20;
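That a view holds no data of its own can be seen by changing the base table and querying the view again. The sketch assumes Python's built-in sqlite3 module; the column alias ANNUAL_SALARY and the two sample employees are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT, SAL INTEGER, DEPTNO INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("SMITH", 800, 20), ("ALLEN", 1600, 30)],
)

# The view stores only the query, not a copy of the rows.
conn.execute("""
    CREATE VIEW DEPT20 AS
    SELECT ENAME, SAL * 12 AS ANNUAL_SALARY FROM EMP WHERE DEPTNO = 20
""")
before = conn.execute("SELECT * FROM DEPT20").fetchall()

# A change to the base table shows up the next time the view is queried.
conn.execute("UPDATE EMP SET SAL = 1000 WHERE ENAME = 'SMITH'")
after = conn.execute("SELECT * FROM DEPT20").fetchall()
```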

SELF-ASSESSMENT QUESTIONS – 6
11. A _________ is a subschema in which a logical table is generated from one or more
base tables.
12. During the query execution contents are taken from other tables. (True/False)


8. EMBEDDING SQL STATEMENTS


SQL statements can be embedded into various programming languages such as C, COBOL, Pascal,
Fortran, etc. The language in which the SQL queries are embedded is called the host language.
Therefore, C, Fortran, Pascal, etc., are host languages. The SQL structures that are embedded in
the host language are termed embedded SQL. With them, programmers can use the various SQL
commands to access and update any data stored in the database.

Embedded SQL statements greatly enhance the programmer's ability to query and modify the
database from within a program. The database system is responsible for all query execution. It
then returns the result (one tuple at a time) to the program.

Before the program is compiled, the embedded SQL statements are processed by a special
pre-processor. The pre-processor replaces them with host-language declarations and procedure
calls that allow the database to be accessed at run time. The resultant program is then sent for
compilation. So that the pre-processor can easily recognise the embedded SQL statements, they
are marked with the EXEC SQL statement. It has the following syntax:

EXEC SQL <embedded SQL statement > END-EXEC

The syntax given above is a generalised form. However, the syntax may differ somewhat
depending on the host language for which it is being used.

Declaring Variables and Exceptions: The SQL INCLUDE statement is placed in the host program to
mark the place where the pre-processor inserts the special variables (variables used for
communication between the program and the database). Host-language variables can also be used
inside embedded SQL statements. It is good practice to prefix the host variables with a colon to
differentiate them from other variables used in SQL. A declare cursor statement is used to write
an embedded SQL query within a host program. It does not run the query. Separate open and fetch
commands are used to run the embedded query and obtain its result.

Let us take an example of banking schema. Suppose you have a host-language variable termed
“amount”, and you want to determine the names and residing cities of all the bank customers who
currently have a balance of more than a particular amount in any of their accounts. The query for
finding this can be written as shown below:


EXEC SQL
declare c cursor for
select customer-name, customer-city
from deposit, customer
where deposit.customer-name = customer.customer-name and
deposit.balance > :amount
END-EXEC

The variable c used in the above statement is termed the 'cursor' for the query. The cursor
identifies the query in the open statement, which causes the query to be evaluated, and in the
fetch statement, which places the values of one tuple/row in host-language variables. The open
statement for our example is shown below.

EXEC SQL open c END-EXEC

When any error occurs in the execution of an SQL query, the error report is stored inside a special
variable. These special variables are called SQL communication-area (SQLCA) variables. The
declarations for the SQLCA variables are contained inside the SQL INCLUDE statement.

Fetch Statement

A sequence of fetch statements is used to make tuples of the result available to the program. One
host language variable is required for each attribute of the result relation. Therefore, in the banking
schema example, we require two separate variables, i.e. one for storing the customer name and
the other for storing the customer's city of residence. Let us assume we take a variable en for
storing the customer's name and cc for storing the customer's city. Then, a tuple of the result
relation can be obtained by using the following statement:

EXEC SQL fetch c into :en, :cc END-EXEC

After this, the programmer can manipulate the values of the two variables, en and cc, by using
the host language commands and features.

Close statement

The close statement is another embedded SQL statement, which is used to delete the temporary
relation that stores the query result. Given below is the use of a close statement in our example:


EXEC SQL close c END-EXEC

Embedded SQL statements for database modifications

The embedded SQL statements used for database modification, such as update, insert and delete,
return no result. Therefore, they are simpler to express. A database-modification statement in
embedded SQL has the following syntax:

EXEC SQL <any valid update, insert, or delete> END-EXEC

An SQL database-modification expression may also contain host-language variables that are
preceded by a colon. If an error occurs during statement execution, it is reported in the SQLCA.
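The declare, open, fetch and close steps have close counterparts in most host-language database interfaces. The sketch below is not embedded SQL with a pre-processor; it is an analogy that assumes Python as the host language and its built-in sqlite3 module as the interface. Column names use underscores instead of hyphens, and the sample customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deposit (customer_name TEXT, balance INTEGER)")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_city TEXT)")
conn.executemany("INSERT INTO deposit VALUES (?, ?)", [("Hayes", 1500), ("Jones", 400)])
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)", [("Hayes", "Harrison"), ("Jones", "Rye")]
)

amount = 1000  # host-language variable, referred to as :amount inside the query

c = conn.cursor()  # plays the role of the cursor c
# Corresponds to declare and open: the query is evaluated with the value of :amount.
c.execute(
    """
    SELECT customer.customer_name, customer_city
    FROM deposit, customer
    WHERE deposit.customer_name = customer.customer_name
      AND deposit.balance > :amount
    """,
    {"amount": amount},
)

# Corresponds to a sequence of fetch statements: one tuple at a time.
results = []
while True:
    row = c.fetchone()
    if row is None:  # no more tuples in the result
        break
    en, cc = row  # host variables for customer name and customer city
    results.append((en, cc))

c.close()  # corresponds to the close statement
```

As in embedded SQL, the host variable is marked with a colon inside the query, and the program receives the result one tuple at a time.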

SELF-ASSESSMENT QUESTIONS – 7
13. To recognise embedded SQL requests to the pre-processor, we use the _________
statement.
14. It is a good practice to append a colon before the host variables to differentiate them
from other variables used in SQL. (True/False)


9. TRANSACTION PROCESSING
Transaction processing provides the mechanism for defining logical units of database processing.
Transaction processing systems consist of immense databases and lakhs of users concurrently
executing database transactions. A transaction is a logical unit of data manipulation tasks in
which either all the component tasks are completed or none of them is executed, so that the
database stays consistent. When many transactions proceed concurrently in the database
environment, strict control must be applied to them; otherwise, the consistency of the database
cannot be ensured.

ACID Properties: Unwanted inconsistencies can easily occur in the database, particularly when
various transactions are executing simultaneously. The term ACID defines the properties that
transactions must have for the reliability of the database to be assured. The term ACID expands
as follows:

A Atomicity

C Consistency

I Isolation

D Durability

Atomicity: A transaction usually includes various database operations. This property of a
transaction makes sure that either every operation is executed successfully or none of them is
executed at all.

Consistency: This property requires that the database integrity rules must be obeyed properly.

Isolation: In the case of a multi-transaction environment, various transactions may be carried out
simultaneously on a single database. This property provides assurance that all transactions are
executed independently.

Durability: When a transaction is completed successfully, this property makes sure that the
changes performed in the database are saved in the physical database.


Transaction support in SQL

SQL offers concurrency control for the execution of a transaction via a Data Control Language,
which can also be called (SQL DCL). When a transaction begins, we use the statement BEGIN
TRANSACTION offered by SQL DCL, whereas when a transaction ends, we use the statement
END TRANSACTION.

There are two statements provided by SQL that make the process of concurrent transaction
control easy.

COMMIT: On the execution of this statement, every modification made by the related transaction
up to that point is made permanent.

ROLLBACK: On the execution of this statement, every change performed since the preceding
COMMIT statement is undone.
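
The effect of COMMIT and ROLLBACK can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed method: the account table, its columns and the transfer function are hypothetical names chosen for the example, and SQLite spells the opening statement BEGIN.

```python
import sqlite3

# Hypothetical 'account' table, used only for illustration.
# isolation_level=None puts the connection in autocommit mode,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (994, 1500), (340, 6000)")


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_id = ?", (amount, dst)
        )
        (left,) = conn.execute(
            "SELECT balance FROM account WHERE acc_id = ?", (src,)
        ).fetchone()
        if left < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")    # both updates become permanent
        return True
    except Exception:
        conn.execute("ROLLBACK")  # both updates are undone
        return False


def balances(conn):
    """Return the current balances as {acc_id: balance}."""
    return dict(conn.execute("SELECT acc_id, balance FROM account"))
```

A successful call changes both balances together, while a failed call leaves both untouched, which is the atomicity property described earlier.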

Certain undesirable conditions may arise when transactions execute concurrently. These
conditions are shown in Table 2.4 below:

Table 2.4: Conditions that may occur during Concurrent Transactions

Condition              Feature
Dirty read             This condition arises when a transaction reads data written by a
                       concurrent uncommitted transaction.
Non-repeatable read    This condition arises when a transaction reads data again and finds
                       that the data has been modified by the committed write operation of
                       some other transaction.
Phantom read           This condition arises when a transaction re-executes a query it had
                       previously executed and gets a set of rows different from what it
                       got earlier.

Depending upon the conditions given above, some levels of transaction isolation are defined by
SQL. These levels are discussed below:

Read uncommitted isolation: At this level, dirty reads, non-repeatable reads, and phantom reads
are all permitted.

Read committed isolation: At this level, a SELECT query sees only the data committed before
the query began. Dirty reads are therefore not possible, but non-repeatable and phantom reads
can still occur.


Repeatable read: This level does not allow dirty and non-repeatable reads. Only phantom reads
remain possible.

Serialisable isolation: Of all the levels of isolation, this level is the strictest. Here,
transactions behave as if they were executed sequentially, one after another, so none of the
three conditions above can occur. As serialisation failures can take place often at this level,
the application must be prepared to roll back and retry a transaction.
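
The four levels can be summarised as a small lookup table. The sketch below follows the standard SQL definitions of the levels; the names PERMITTED and weakest_level_preventing are hypothetical and exist only for this illustration.

```python
# Read conditions that each isolation level still permits,
# listed from the weakest level to the strictest.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(conditions):
    """Return the weakest level that permits none of the given conditions."""
    unwanted = set(conditions)
    for level, permitted in PERMITTED.items():
        if not (permitted & unwanted):
            return level
    return "SERIALIZABLE"
```

For example, an application that only needs to avoid dirty reads can run at READ COMMITTED, whereas one that must also avoid phantom reads needs SERIALIZABLE.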

SELF-ASSESSMENT QUESTIONS – 8
15. SQL offers _________ statements that ease the process of concurrent transaction
control.
16. In transaction processing, the integrity rules of a database are maintained by _________
property.

Activity - 1
Create a list of all Transaction Control commands in SQL and explain them with
their uses.


10. NORMALISATION AND DATABASE DESIGN


The Normalisation Process

Normalisation comprises sets of rules used to make sure that database relations are fully
normalised, by listing their functional dependencies and decomposing the relations into
smaller, efficient tables.

Normalisation primarily helps to:

eliminate data maintenance anomalies

minimise database redundancy

eliminate data inconsistency

The normalisation technique is established on the idea of normal forms. A table is said to be in a
specific normal form if it fulfils a particular set of constraints which are defined for that
normal form. These constraints are usually applicable to the attributes (columns) and the
relationships between them. There are various levels of normal forms (See Figure 3.1). Each
normal form addresses a specific issue that could result in minimising the database anomalies.

Database Normalisation uses functional dependencies present in a relation/table and the
candidate key in examining the tables. In the beginning, three normal forms were suggested: First
Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Later on, Fourth
Normal Form (4NF) and Fifth Normal Form (5NF) were also introduced.

Afterwards, E.F. Codd and R. Boyce presented a stronger definition of the Third Normal
Form, known as Boyce-Codd Normal Form (BCNF).

All the normal forms except 1NF are derived from the concept of functional dependencies among
the attributes of a relation.


Figure 3.1: Normalisation Process


When you initially enter the records into a database table, it is commonly in unnormalised form.
Therefore, you need to refine this table with the help of the various normal forms, which
are explained below:

10.1 First normal form


The first normal form, commonly termed 1NF, is the most basic normal form. In this normal form,
the condition is that there must not be any repeating groups in any column. In other words, all the
columns in the table must be composed of atomic values.

Note: Atomic: A column is said to be atomic if the values are indivisible units.

The table is said to possess atomic values if there is one and only one data item for any given
row and column intersection. Non-atomic values create repeating groups. A repeating group is just
the repetition of a data item or cluster of data items in records. For example, consider Table 3.4
below:


Table 3.4: Employee Table with Attribute Dependents


ID   Name      DeptNo  Sal    Mgr  Dependents
131  Ram       20      10000  134  Father, Mother, Sister
132  Kiran     20      7000   136  Wife, Son
133  Rajesh    20      5000   136  Wife
134  Padma     10      20000       Son, Daughter
135  Devi      30      3000   137  Father, Mother
136  Satish    20      6000        Father, Mother
137  V.V. Rao  30      10000       Wife, First Son, Second Son

In Table 3.4, you can see non-atomic values in the Dependents column. To convert this table into
1NF, each non-atomic value must be split into atomic values, one per row, as shown in Table 3.5.

Table 3.5: Change of Non-atomic Values into Atomic Values of Table 3.4

ID   Name      DeptNo  Sal    Mgr  Dependents
131  Ram       20      10000  134  Father
131  Ram       20      10000  134  Mother
131  Ram       20      10000  134  Sister
132  Kiran     20      7000   136  Wife
132  Kiran     20      7000   136  Son
133  Rajesh    20      5000   136  Wife
134  Padma     10      20000       Son
134  Padma     10      20000       Daughter
135  Devi      30      3000   137  Father
135  Devi      30      3000   137  Mother
136  Satish    20      6000        Father
136  Satish    20      6000        Mother
137  V.V. Rao  30      10000       Wife
137  V.V. Rao  30      10000       First Son
137  V.V. Rao  30      10000       Second Son

Observe in Table 3.5 that the Dependents column now contains atomic values. You will note that
for each dependent, the other employee details such as ID, Name, DeptNo, Sal and Mgr are
repeated, which results in the creation of a repeating group (data redundancy). By the definition
of the first normal form, the above relation Employee (Table 3.5) is in 1NF. However, it is best
practice to remove the groups which are being repeated in the table.
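
The change from Table 3.4 to Table 3.5 can be sketched in a few lines of Python. This is only an illustration on three of the rows; the function name to_1nf is hypothetical, and None stands for a missing Mgr value.

```python
# Three rows of Table 3.4: (ID, Name, DeptNo, Sal, Mgr, Dependents)
employee = [
    (131, "Ram", 20, 10000, 134, "Father, Mother, Sister"),
    (132, "Kiran", 20, 7000, 136, "Wife, Son"),
    (134, "Padma", 10, 20000, None, "Son, Daughter"),
]


def to_1nf(rows):
    """Emit one row per dependent so that every column value is atomic."""
    result = []
    for *details, dependents in rows:
        for dependent in dependents.split(","):
            result.append((*details, dependent.strip()))
    return result
```

Calling to_1nf(employee) yields seven rows, and the repeated employee details in them are the redundancy that the text goes on to remove.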


According to the first normalisation rule, the table should not contain any repeating groups of
column values. If any such type of repeating group exists, then they should be decomposed, and
the associated columns will form their own table. Also, the new resulting table must contain a link
with the original table (from where it was decomposed). Thus, to remove repeating groups from
the Employee relation, it can be decomposed into two relations, namely Emp and Emp_Depend,
as shown in Tables 3.6 and 3.7:

Table 3.6: Emp Relation

Table 3.7: Emp_Depend Relation
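
The decomposition into Emp and Emp_Depend can be sketched with Python's sqlite3 module. The column types and the foreign key clause are assumptions made for the illustration, and only three employees are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (
    ID INTEGER PRIMARY KEY, Name TEXT, DeptNo INTEGER, Sal INTEGER, Mgr INTEGER
);
CREATE TABLE Emp_Depend (
    ID INTEGER REFERENCES Emp(ID),   -- link back to the original table
    Dependents TEXT,
    PRIMARY KEY (ID, Dependents)     -- the combination is unique
);
INSERT INTO Emp VALUES
    (131, 'Ram', 20, 10000, 134),
    (132, 'Kiran', 20, 7000, 136),
    (133, 'Rajesh', 20, 5000, 136);
INSERT INTO Emp_Depend VALUES
    (131, 'Father'), (131, 'Mother'), (131, 'Sister'),
    (132, 'Wife'), (132, 'Son'), (133, 'Wife');
""")

# Joining on the common ID column reproduces the rows of the 1NF table.
rows = conn.execute("""
    SELECT e.ID, e.Name, e.DeptNo, e.Sal, e.Mgr, d.Dependents
    FROM Emp e JOIN Emp_Depend d ON d.ID = e.ID
    ORDER BY e.ID, d.Dependents
""").fetchall()
```

Each employee's details are now stored once in Emp, yet the join still returns one row per dependent.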

Here, in the above Table 3.7, the {ID, Dependents} combination will act as the unique key. The
attribute ‘ID’ is common to both tables (Table 3.6 and Table 3.7) and acts as a link with the
original table. The data redundancy in the columns ID, Name, DeptNo, Sal and Mgr is now
eliminated, and these tables are in 1NF.

Now, let us consider another example. Suppose we have a customer table, as shown in Table 3.8.

Table 3.8: Customer Table

Cust_id  Name  Address  Acc_id  Acc_type  Min_bal  Tran_id  Tran_type   Tran_mode  Amount  Balance
001      Ravi  Hyd      994     SB        1000     14300    B/F                    1000    1000
001      Ravi  Hyd      994     SB        1000     14301    Deposit     By cash    1000    2000
001      Ravi  Hyd      994     SB        1000     14302    Withdrawal  ATM        500     1500
110      Tim   Sec'bad  340     CA        500      14303    B/F                    3500    3500
110      Tim   Sec'bad  340     CA        500      14304    Deposit     Payroll    3500    7000
110      Tim   Sec'bad  340     CA        500      14305    Withdrawal  ATM        1000    6000
420      Kavi  Vizag    699     SB        1000     14306    B/F                    6000    6000
420      Kavi  Vizag    699     SB        1000     14307    Credit      By cash    2000    8000
420      Kavi  Vizag    699     SB        1000     14308    Withdrawal  ATM        6500    1500

You will notice that Table 3.8 contains a repeating group composed of Cust_id, Name and Address.
Therefore, to convert this table into the first normal form, we need to remove this repeating group.
This can be done by dividing this table into two tables: Customer and Customer_Tran. (See Tables
3.9 and 3.10)

(Note: The primary key columns of each table are indicated in highlights in Figures).

Table 3.9: Customer Table


Table 3.10: Customer_Tran Table

10.2 Second normal form


A table in 1NF is not necessarily free from redundancy. It may contain partial dependencies,
which the Second Normal Form resolves.

The Second Normal Form states that

The table must be in the First Normal Form.

All the non-key columns must be fully functionally dependent on the primary key.

Any attribute (column) is said to be partially dependent if its value can be determined by any one
or more attributes of the primary key, but not all.

Every normal form is based upon the previous normal form. Therefore, the first condition for the
second normal form is to have all its tables in the first normal form.

Full functional dependency means that, for a given composite primary key (a primary key which is
made of more than a single attribute), each attribute that is not part of the primary key
depends on the whole key, that is, on all of its attributes together and not on just some of them.


If attributes are only partially dependent on the primary key, then they must be removed
and placed in another table. The primary key of the newly formed table must consist of the
portion of the original key that they were dependent on.
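
A partial dependency can be spotted by testing whether a functional dependency holds on part of the key. The sketch below is a simple check over sample rows; the function name holds and the composite key (Acc_id, Tran_id) are assumptions made for the illustration, and a dependency that holds in sample data is only a hint, not proof, that it holds in the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few transaction rows; the key is assumed to be (Acc_id, Tran_id).
tran_rows = [
    {"Acc_id": 994, "Tran_id": 14300, "Acc_type": "SB", "Min_bal": 1000, "Amount": 1000},
    {"Acc_id": 994, "Tran_id": 14302, "Acc_type": "SB", "Min_bal": 1000, "Amount": 500},
    {"Acc_id": 340, "Tran_id": 14303, "Acc_type": "CA", "Min_bal": 500, "Amount": 3500},
]

# Acc_type and Min_bal depend on Acc_id alone: a partial dependency.
partial = holds(tran_rows, ["Acc_id"], ["Acc_type", "Min_bal"])
# Amount needs the whole key, so it is fully functionally dependent.
amount_on_part = holds(tran_rows, ["Acc_id"], ["Amount"])
amount_on_key = holds(tran_rows, ["Acc_id", "Tran_id"], ["Amount"])
```

Here partial is True, so Acc_type and Min_bal are candidates for moving into a separate table keyed by Acc_id.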

Again, consider the earlier example of the Customer relation. After converting it into 1NF, we have
two tables: Customer and Customer_Tran. Now, we need to convert it into 2NF (Second Normal
Form). To do so, the Customer_Tran table is further decomposed into three tables:
Customer_Accounts, Accounts and Transaction, as shown in Tables 3.11, 3.12 and 3.13.

Table 3.11: Customer_Accounts Table


Cust_id Acc_id Balance
001 994 1500
110 340 6000
420 699 1500

Table 3.12: Accounts Table

Acc_id Acc_type Min_bal
994 SB 1000
340 CA 500
699 SB 1000


Table 3.13: Transaction Table

Table 3.14: Customer Table


Cust_id Name Address
001 Ravi Hyd
110 Tim Sec'bad
420 Kavi Vizag

As the Acc_type and Min_bal attributes of the Customer_Tran table (Table 3.10) depend only on
Acc_id, they are not fully functionally dependent on the primary key. Therefore, a new Accounts
table is formed (Table 3.12).

Similarly, Balance depends only on Cust_id and Acc_id and is not fully functionally dependent on
the whole key, resulting in a new Customer_Accounts table (Table 3.11).

10.3 Third Normal form


Tables in the second normal form are not necessarily free from redundancy. They may still show
some redundancy due to transitive dependencies. Thus, the objective of the next higher normal
form, i.e. the third normal form, is to resolve transitive dependencies. A transitive dependency
arises when a non-key attribute is functionally dependent on some other non-key column, which
is, in turn, functionally dependent on the primary key.


The essential conditions for the Third Normal Form are:

The table must be in 2nd Normal Form

The table must not contain any transitive dependencies

Transitive Dependencies: Columns dependent on other columns that, in turn, are dependent on
the primary key are said to be transitively dependent.

In other words, a relation R is said to be in the third normal form (3NF) if and only if it is in 2NF
and every non-key attribute is non-transitively dependent on the primary key.

Therefore, the main objective of 3NF is to make the relation free from all transitive dependencies.
Let us understand how we can do this with the help of an example.

Example: Again, let us go back to our previous example. The Accounts table (Table 3.12) is in the
second normal form, but it has a transitive dependency: Acc_id determines Acc_type, and
Acc_type in turn determines Min_bal.

In order to remove this transitive dependency, the Accounts table can be decomposed into two
tables: Acc_Detail and Product, as shown in Tables 3.15 and 3.16:

Table 3.15: Acc_Detail Table

Table 3.16: Product Table

Acc_type Min_bal

SB 1000

CA 500
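
The decomposition can be sketched in Python using the rows of the Accounts table. The contents of Acc_Detail are assumed here to be the (Acc_id, Acc_type) pairs, and the variable names are illustrative only.

```python
# Rows of the Accounts table: (Acc_id, Acc_type, Min_bal)
accounts = [(994, "SB", 1000), (340, "CA", 500), (699, "SB", 1000)]

# Project onto the two smaller tables; sets drop the duplicate (SB, 1000).
acc_detail = sorted({(acc_id, acc_type) for acc_id, acc_type, _ in accounts})
product = sorted({(acc_type, min_bal) for _, acc_type, min_bal in accounts})

# Joining on Acc_type recovers the original Accounts rows.
min_bal_of = dict(product)
rejoined = [(acc_id, acc_type, min_bal_of[acc_type]) for acc_id, acc_type in acc_detail]
```

The minimum balance for each account type is now stored once in Product, so changing it no longer requires updating every account of that type.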

Tables after the Third Normal Form are given below (Table 3.17, 3.18 and 3.19)


Table 3.17: Customer Table

Cust_id Name Address

001 Ravi Hyd

110 Tim Sec'bad

420 Kavi Vizag


Table 3.18: Customer_Accounts Table

Cust_id Acc_id Balance

001 994 1500

110 340 6000

420 699 1500

Table 3.19: Transaction Table

Tran_id  Acc_id  Tran_type   Tran_mode  Amount
14300    994     B/F                    1000
14301    994     Deposit     By cash    1000
14302    994     Withdrawal  ATM        500
14303    340     B/F                    3500
14304    340     Deposit     Payroll    3500
14305    340     Withdrawal  ATM        1000
14306    699     B/F                    6000
14307    699     Credit      By cash    2000
14308    699     Withdrawal  ATM        6500

10.4 Boyce-Codd normal form


BCNF is the common name of the Boyce-Codd Normal Form. This normal form is stricter than 3NF.
Remember that every relation which is in BCNF is also in 3NF, but a relation which is in 3NF is
not necessarily in BCNF.


The essential condition for a relational schema R to be in BCNF is that

“Whenever any nontrivial FD X→A holds in R, then X must be a super key of R”.

Let us understand the concept of BCNF with the help of the relation schema TEACH.

The Teach schema is composed of the following attributes.

(Student varchar 5,

Course varchar 5,

Teacher varchar 5 ).

In this relation, there are two dependencies. One in which (Student+Course)→Teacher, and
second in which Teacher→Course.

In this example, it has been assumed that one teacher teaches only one course. (Student+Course)
is the primary key in this relation.

Figure 3.2: Teach Schema

In this example, we will determine whether this table Teach (Figure 3.2) is in BCNF or not. For
this, we first need to check the first condition, namely whether the relation is in 3NF. Here, the
relation Teach is in 3NF. Hence, the first condition is satisfied.

Now, let us check the second condition. According to the BCNF criteria, since the FD
Teacher→Course, which is of the form X→A, holds on this relation Teach, Teacher should be a
super key. But, here, Teacher is not a super key. Therefore, this condition is not fulfilled. So,
we can say that the relation is not in BCNF.

We have seen that the relation Teach is not in BCNF, but it is in 3NF. This arises because,
for a relation to be in 3NF, every nontrivial FD X→A must satisfy either of two conditions:
‘X should be a super key of R’ or ‘A should be a prime attribute’. In this relation, as Teacher is
not a super key, the first condition fails. But the second condition is satisfied, as Course is a
prime attribute of R. Therefore, the relation is in 3NF but not in BCNF.
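
The BCNF test on Teach can be sketched with an attribute-closure computation. The function names closure and bcnf_violations are hypothetical, and each FD is written as a pair of tuples (left side, right side).

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the nontrivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


TEACH = {"Student", "Course", "Teacher"}
FDS = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]

violations = bcnf_violations(TEACH, FDS)
```

The closure of {Teacher} is only {Teacher, Course}, so Teacher is not a super key and Teacher→Course is reported as the single violation.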

Comparison of BCNF with 3NF: To understand the differences between BCNF and 3NF, we
must again look carefully at the definitions of 3NF and BCNF.

According to the 3NF definition, “The condition for a relational schema R to be in 3NF is if whenever
a nontrivial functional dependency (FD) X→A holds in R, then either of these two conditions must
be fulfilled.”

1. X should be a super key of R.

2. A is a prime attribute.

But, according to the definition of BCNF, “the condition for a relational schema R to be in BCNF
is if whenever a nontrivial functional dependency (FD) X→A holds in R, then X should be a super
key of R”.

Thus, we see that BCNF is stricter than 3NF. One can easily obtain a 3NF relational design without
sacrificing lossless join or dependency preservation.

However, it is not easy to achieve a BCNF design, lossless join and dependency preservation
all together. In situations in which we cannot achieve all three objectives together, we will
have to choose 3NF, lossless join, and dependency preservation.

Multi-Valued Dependencies (MVDs): MVD arises in a situation where one attribute value is
possibly a ‘multi-valued fact’ about some other attribute within the same table. One special case
of MVD is FD, which you have studied earlier. Therefore, every FD is an MVD.


Now, let us understand what MVD is with the help of a few examples given below. Let us consider
the relational schema CSB with the following structure.

(Stud_name char 10,

Course char 10,

Text_book char 10)

An instance of Relation CSB is shown in Table 3.20.

Table 3.20: Instance of Relation CSB


Stud_name Course Text_book
Brown First_Yr_Optics Phy - 1
Brown First_Yr_Mech Phy - 1
Green First_Yr_Optics Phy - 1
Green First_Yr_Mech Phy - 1
Brown Org_Chem Chem - 1
Brown Inorg_Chem Chem - 1
Jones French_litter French - 1
Jones French_grmr French - 1

In this relation, the two attributes ‘Stud_name’ and ‘Text_book’ are independent multi-valued
facts about the attribute ‘Course’. Therefore, we can simply say that this relation contains a
multi-valued dependency. Here, Stud_name and Text_book are independent multi-valued facts about
the course because the student has no control over the textbooks which are used for a particular
course.

Let us take one more example of a relation schema Emp_Profile with the following three attributes:
(Emp_name char 15,
Equipment char 15,
Language char 15 ).

An instance of the relation Emp_Profile is shown in Table 3.21.


Table 3.21: Instance of Relation Emp_Profile


Equipment Emp_name Language
PC Smith French
PC Smith German
Workstation Smith German
Workstation Smith French
Workstation Jones French
Workstation Jones German
Workstation Jones German

In this relation, Equipment and Language are the two independent multi-valued facts about
Emp_name. Therefore, we can say that the relation also contains MVD, as shown in Figure
3.3 below.

Figure 3.3: Equipment and Language are Independent Multivalued Facts about Employee

Note that this relation Emp_Profile is in BCNF. Here, all the attributes are required for uniquely
identifying the records. Hence, Emp_name + Equipment + Language is the primary key. But, this
relation still contains a redundancy problem.

Therefore, we can further decompose it to a higher normal form, i.e. 4NF, to resolve the problem
of redundancy. In the next section, we will see the definition of 4NF and how MVDs are associated
with it.

10.5 Fourth normal form


The Fourth Normal Form is the next higher normal form after 3NF/BCNF. It is based on the concept
of multi-valued dependency. A multi-valued dependency arises in a condition where a relation
contains at least three columns, and one column has several rows whose values match the value
of a single row of one of the other columns (See Table 3.22).

A more formal definition of MVD states that: “A multi-valued dependency exists if, for each value
of an attribute A, there exists a finite set of values of attribute B that are associated with A and a
finite set of values of attribute C that are also associated with A. Attributes B and C are
independent of each other.”
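
This definition can be turned into a simple check: for every value of A, the (B, C) pairs that occur must be exactly all combinations of its B values and its C values. The sketch below applies it to the Emp_Profile rows of Table 3.21; the function name mvd_holds and the use of column positions are choices made for this illustration.

```python
from itertools import product

# Rows of Table 3.21: (Equipment, Emp_name, Language)
emp_profile = [
    ("PC", "Smith", "French"),
    ("PC", "Smith", "German"),
    ("Workstation", "Smith", "German"),
    ("Workstation", "Smith", "French"),
    ("Workstation", "Jones", "French"),
    ("Workstation", "Jones", "German"),
]


def mvd_holds(rows, a, b, c):
    """Check that columns b and c are independent multi-valued facts about column a."""
    groups = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        c_values = {p[1] for p in pairs}
        # Independence means every B value is paired with every C value.
        if pairs != set(product(b_values, c_values)):
            return False
    return True
```

With column 1 (Emp_name) as A, the check succeeds for the full table and fails as soon as one of Smith's four rows is removed.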

4NF - Addressing Multi-Valued Dependencies

Let us take the example of a relation Branch_Staff_Client (Table 3.22), which contains information
about the various clients for a bank branch, the various staff who address the client's needs and
the various requirements of each client.


Table 3.22: Branch_Staff_Client


BranchNumber StaffName ClientName ClientRequirement
B-41 Radha Surya A

B-41 Radha Ravi B

B-41 Smitha Surya B


B-41 Smitha Ravi A

The above relation contains MVD. In this relation, the client's name determines the staff names
that serve the client, and the client's name also determines the client's requirements. But
StaffName and ClientRequirement are not dependent on each other, i.e., they are both independent
facts about ClientName. Hence, there exists MVD.

Multi-valued dependencies in the Branch_Staff_Client relation can be symbolically represented
as:

ClientName →→ StaffName

ClientName →→ ClientRequirement

The necessary conditions for the Fourth Normal form are as follows:

1. The table should be in Boyce-Codd normal form.

2. There should be no multi-valued dependencies.

Thus, the 4NF's basic objective is to eliminate multi-valued dependencies from the relation. In
order to remove multi-valued dependencies from a table, we need to decompose the table and
shift the related columns into separate tables along with a copy of the determinant. This copy will
serve as a foreign key to the original table.
Table 3.23: Branch_Staff Table before Fourth Normal Form
BranchNumber StaffName ClientName

B-41 Radha Surya

B-41 Radha Ravi

B-41 Smitha Surya

B-41 Smitha Ravi

Table 3.24: Branch_staff Table after Fourth Normal Form

BranchNumber ClientName BranchNumber StaffName

B-41 Surya B-41 Radha

B-41 Ravi B-41 Smitha
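
That the two smaller tables of Table 3.24 lose no information can be checked by joining them back on BranchNumber. This is a minimal sketch using Python sets; the variable names are illustrative.

```python
# Rows of Table 3.23: (BranchNumber, StaffName, ClientName)
branch_staff = {
    ("B-41", "Radha", "Surya"),
    ("B-41", "Radha", "Ravi"),
    ("B-41", "Smitha", "Surya"),
    ("B-41", "Smitha", "Ravi"),
}

# The two projections shown in Table 3.24.
branch_client = {(branch, client) for branch, _, client in branch_staff}
branch_staff_only = {(branch, staff) for branch, staff, _ in branch_staff}

# Natural join of the projections on BranchNumber.
rejoined = {
    (branch, staff, client)
    for branch, staff in branch_staff_only
    for other_branch, client in branch_client
    if branch == other_branch
}
```

Four rows shrink to two rows in each projection, and the join returns exactly the original four rows.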

10.6 Fifth normal form


The fifth normal form is the highest normal form used in relational database designing. It is mostly
used when there is a large relational database.


The Fifth Normal Form was developed by an IBM researcher, Ronald Fagin. According to Fagin,
it must be possible to reconstruct the original table from the tables into which it has been
decomposed. 5NF allows decomposing a relation into three or more relations.

The fifth normal form is based on the concept of join dependency. Join dependency means that a
relation, after being broken down into three or more smaller relations, should be capable of being
combined all over again on similar keys to result in the creation of the original table. The join
dependency is the more general form of multi-valued dependency.

A relation R satisfies the join dependency (R1, R2, ..., Rn) if and only if R is equal to the
join of R1, R2, ..., Rn.

(Here, Ri are subsets of the set of attributes of R)

Any relation R is said to be in 5NF, or PJNF (project-join normal form), if for every join
dependency at least one of the following holds.

1. (R1, R2, ..., Rn) is a trivial join dependency (that is, one of the Ri is R).

2. Every Ri is a candidate key for relation R.

Definition of Fifth Normal Form: A relation is in fifth normal form (5NF) if and only if all
join dependencies in the table are implied by the candidate keys of the relation.

The table before the Fifth Normal Form


Table 3.25: Dept-Subject
Dept. Subject Student

Comp. Sc. CP1000 John Smith

Mathematics MA1000 John Smith

Comp. Sc. CP2000 Arun Kumar

Comp. Sc. CP3000 Reena Rani

Physics PH1000 Raymond Chew

Chemistry CH2000 Albert Garcia

Table after Fifth Normal Form

Tables 3.26, 3.27 and 3.28 are formed after converting Table 3.25 into the Fifth Normal Form.
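
The join dependency behind this decomposition can be checked in Python. The sketch assumes that Tables 3.26 to 3.28 are the three two-column projections named in their captions, and it reads the Physics subject code as PH1000.

```python
# Rows of Table 3.25: (Dept, Subject, Student)
original = {
    ("Comp. Sc.", "CP1000", "John Smith"),
    ("Mathematics", "MA1000", "John Smith"),
    ("Comp. Sc.", "CP2000", "Arun Kumar"),
    ("Comp. Sc.", "CP3000", "Reena Rani"),
    ("Physics", "PH1000", "Raymond Chew"),
    ("Chemistry", "CH2000", "Albert Garcia"),
}

# Three projections: Dept-Subject, Subject-Student and Dept-Student.
dept_subject = {(d, s) for d, s, _ in original}
subject_student = {(s, st) for _, s, st in original}
dept_student = {(d, st) for d, _, st in original}

# Join all three: match on Subject, then keep rows confirmed by Dept-Student.
rejoined = {
    (d, s, st)
    for d, s in dept_subject
    for s2, st in subject_student
    if s == s2 and (d, st) in dept_student
}
```

For this instance the join of the three projections equals the original table, so the decomposition loses no information.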


Table 3.26: Dept-Subject

Table 3.27: Subject-Student

Table 3.28: Dept-Student


SELF-ASSESSMENT QUESTIONS – 9

17. How does Normalisation help?


A. By eliminating various database anomalies
B. By minimising redundancy
C. By eliminating data inconsistency
D. All of the above
18. An attribute (column) is said to be _____ if its value can be determined by any one
or more attributes of the primary key, but not all.
19. A table which is in __________ normal form may contain redundancies due to transitive
dependencies.
21. The Fifth Normal form is usually useful when we have large relational data models.
(True/False)
22. The join dependency is a more generalised form of __________ dependency.
24. An FD is a special case of an MVD and every FD is an MVD. (True/False)
25. The fifth normal form is also called __________.

Normalisation and Database Design

Normalisation and database design are two closely integrated terms. In this section, we will study
the relationship between the two. A database design refers to the process of moving from real-life
business models to a database model which meets those requirements. Normalisation is one such
technique.

You have already studied normalisation in detail. Normalisation, as you have learnt earlier, is a
technique that is used for designing relations in which data redundancies are minimised.

By using the normalisation technique, we want to design a relational database that has the
following set of properties:

It holds all the data required for the purposes that the database is to serve.

It must have as little redundancy as possible.

It must hold multiple values for types of data that require them.

It must allow efficient updates of the data in the database.

It must avoid the risk of accidental data loss.

You have studied that there are mainly five normal forms. However, of these, three are most
commonly used in practice: the first, second, and third normal forms. When you convert an
ER (Entity-Relationship) model in the Third Normal Form (3NF) to a relational model:

Relations are referred to as tables.

Attributes are referred to as columns.

Relationships are referred to as data references (primary and foreign key references).

The Third Normal Form is considered the standard normal form from the viewpoint of the relational
database model. Normalised database tables are easy to maintain and also easily understood by
developers. However, a fully normalised database is not necessarily the best database design.
In most cases, it is suggested that the database be normalised up to the third normal form.
Beyond that, we are often required to denormalise our database relations (you will study
denormalisation in detail in the next section) so as to meet the optimum performance level.
In summary, an efficiently normalised database has the following advantages:

1. Simplified and easy data maintenance
2. Enhanced speed of data processing
3. Enhanced design quality.


SELF-ASSESSMENT QUESTIONS – 10

26. From a __________ point of view, it is standard to have tables that are in Third
Normal Form.
27. According to relational database rules, a completely normalised database
always has the best performance. (True/False).
28. Denormalisation is done to increase the performance of the database.
(True/False).


11. DENORMALISATION
Normalisation is implemented to preserve data integrity. Nevertheless, in a real-world project, you
need some level of data redundancy for reasons relating to performance or maintaining history.

During the normalisation process, you need to decompose database tables into smaller tables.
However, if you create more tables, the database needs to execute more joins while solving
queries, and joins can have an adverse effect on performance. Hence, denormalisation is done
to enhance the performance.

Denormalisation is the process of converting higher normal forms to lower normal forms with the
objective of getting faster access to the database.

Keep in mind that denormalisation is a common and essential element of the database design
process, but it must follow appropriate normalisation.

Techniques used for denormalisation: There are four main techniques used for
denormalisation. Below is a brief summary of the techniques:

Duplicate Data: The easiest technique is the method of adding duplicate data into the relational
table. Doing this will help to minimise the number of joins which are required to execute a given
query. It also minimises the CPU and I/O resources being utilised as well as boosts up the
performance.

Summary data: Summarising the data stored in the relational database table is another useful
technique used for denormalising the database. In this technique, the records are summarised in
some summary columns, thereby reducing the number of records stored in a table. This technique
enhances database performance as the database server now needs to process fewer records for
a given query execution.

Horizontal partitioning: Horizontal fragmentation is another denormalisation technique in which
the database table is split by rows. This reduces the number of records per table and hence
improves the performance.


Vertical fragmentation: Vertical fragmentation breaks tables/relations by columns. The method
creates two or more tables by assigning the original key to each of them and distributing the
non-key columns among the newly created tables, all of which share the same key.
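
The two partitioning techniques can be sketched on the Customer rows used earlier. The function names horizontal_split and vertical_split are hypothetical, and the split criteria are arbitrary choices made for the illustration.

```python
customers = [
    {"Cust_id": "001", "Name": "Ravi", "Address": "Hyd"},
    {"Cust_id": "110", "Name": "Tim", "Address": "Sec'bad"},
    {"Cust_id": "420", "Name": "Kavi", "Address": "Vizag"},
]


def horizontal_split(rows, predicate):
    """Split by rows: those matching the predicate and the rest."""
    matching = [r for r in rows if predicate(r)]
    others = [r for r in rows if not predicate(r)]
    return matching, others


def vertical_split(rows, key, columns):
    """Split by columns: both resulting tables keep the key column."""
    first = [{key: r[key], **{c: r[c] for c in columns}} for r in rows]
    second = [
        {k: v for k, v in r.items() if k == key or k not in columns} for r in rows
    ]
    return first, second


low_ids, high_ids = horizontal_split(customers, lambda r: r["Cust_id"] < "200")
names, addresses = vertical_split(customers, "Cust_id", ["Name"])
```

Every horizontal fragment has all the columns but fewer rows, whereas every vertical fragment has all the rows but fewer columns and repeats the key so that the fragments can be joined again.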

SELF-ASSESSMENT QUESTIONS – 11
29. Denormalisation is a technique to move from higher to lower normal forms of
database modelling in order to get faster access to the database. (True/False)
30. __________ splits tables by rows, thus reducing the number of records per table.


12. SUMMARY

Let us recapitulate the important concepts discussed in this unit:

• SQL and QBE are the main types of relational query languages.
• DBMS maintains the data integrity to avoid the wrong information in the database.
• A DBMS implements integrity constraints; therefore, it permits only legal instances to be
stored in the database.
• A PK that does not contain genuine data is known as a 'surrogate/alternate key'.
• A subquery is simply a query within another query.
• SQL supports various functions such as max, min, avg, count, etc.
• Transaction Control commands manage changes made by Data Manipulation Language
commands.
• Dynamic SQL permits the creation and submission of SQL queries dynamically at run time.


13. GLOSSARY

DDL - Data Definition Language

DML - Data Manipulation Language

Domain constraints - The set of all the values that an attribute can attain.

Dynamic SQL - Dynamic SQL allows a query to be compiled at run-time.

Embedded SQL - Embedded SQL allows SQL code [program] to be used in a host language, i.e.,
general programming languages, such as C, COBOL, PASCAL, and Fortran.

ISBL - Information Systems Base Language.

PRTV - Peterlee Relational Test Vehicle

QBE - Query-By-Example is a relational data manipulation language.

QUEL - Query Language

INGRES - (Interactive Graphics and Retrieval System)

View - A customised presentation of the data from one or more tables.


When each determinant in a relation is a candidate key, a relation is


Boyce-Codd said to be in Boyce-Codd.
normal form - Normal Form ( BCNF ).
(BCNF)

A database is said to be in if each of the values of all the attributes in


First normal form -
a relation are atomic in nature.

Functional dependency - Two attributes, A and B, in any relation, R, are said to possess a functional
dependency (FD) if, for each distinct value of A, there is only one value of attribute B associated
with it.

Normalisation - It is the process of obtaining good database design by decomposing the relations into
normal forms based on functional dependencies.

Second normal form - If every non-key attribute of a relation schema is fully functionally dependent
(FD) on the key, then the relation is said to be in 2NF.

Third normal form - A relation is said to be in 3NF if it is in 2NF and does not contain transitive
dependencies.

Stored procedures in a Database Management System (DBMS) - Are sets of precompiled SQL statements
that are stored in the database. These procedures can be called and executed by other programs or
scripts, providing a way to encapsulate and manage complex database operations. Here are some key
points about stored procedures:

Definition - A stored procedure is a named collection of one or more SQL statements and procedural
logic stored in the database.

Encapsulation - Stored procedures encapsulate a series of SQL statements, making it easier to manage
and execute complex database operations.

Reusability - Once created, stored procedures can be reused by different parts of an application or
by different applications altogether, promoting code reusability.

Security - Stored procedures can be used to control access to data. Users can be granted permission
to execute a stored procedure without giving them direct access to the underlying tables.

Performance - Stored procedures can improve performance by reducing the amount of data transferred
between the database server and the client. Additionally, they can be optimised and cached by the
database engine.

Modularity - Procedures help break down complex operations into modular components, making it easier
to manage and maintain the database code.

Transaction Control - Stored procedures can be used to define transactions, allowing for better
control over the sequence of SQL statements and ensuring data consistency.

Parameterised - Stored procedures can accept parameters, which allows for dynamic execution based on
varying inputs.

Vendor-Specific Syntax - The syntax for creating and executing stored procedures can vary between
different database management systems. For example, MySQL, SQL Server, and Oracle have different
syntaxes for creating stored procedures.

Relational Model - Uses a set of tables to represent data as well as the relationships between those
data. Each table comprises a number of columns, and every column has a unique name.

As a simple example in MySQL, a stored procedure named GetCustomerDetails can take a parameter
customer_id and select the details of the customer with that ID from the customers table.
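
A minimal sketch of such a procedure follows. It is written for MySQL and assumes a customers table whose key column is also called customer_id; the table layout and the sample ID 101 are assumptions made for illustration, and other systems such as SQL Server or Oracle would need different syntax.

```sql
-- Assumed table for illustration; only its name appears in the text above.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    city        VARCHAR(50)
);

-- The mysql client needs a temporary delimiter so that the semicolons
-- inside the procedure body do not end the CREATE PROCEDURE statement early.
DELIMITER //

CREATE PROCEDURE GetCustomerDetails(IN customer_id INT)
BEGIN
    -- The column is qualified with the alias c because, inside a MySQL
    -- routine, an unqualified customer_id refers to the parameter.
    SELECT c.*
    FROM customers AS c
    WHERE c.customer_id = customer_id;
END //

DELIMITER ;

-- Execute the procedure for the customer whose ID is 101.
CALL GetCustomerDetails(101);
```

Giving the parameter a distinct name, for example p_customer_id, would avoid the name clash altogether; the name customer_id is kept here only to match the description.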

14. TERMINAL QUESTIONS


1. Explain SQL and its features.
2. Explain with examples different SQL commands used for creating and deleting relations.
3. What are the three basic components of a select statement? Explain with an example.
4. What are the uses of Insert, Delete and Update commands?
5. What is the function of Create and Alter commands?
6. What do you understand by DDL? Make a list of commands used in DDL.
7. Write a short note on the ACID properties of the transaction model.
8. What are a primary key and a candidate key?
9. Write a short note on Dynamic SQL.
10. Explain the various types of database anomalies.
11. Define functional dependency. Give examples.
12. What is normalisation? Explain why normalisation is required in database design.
13. Explain 1NF with a suitable example.
14. Explain the second normal form with an example.
15. Explain transitive dependencies with examples. Show how these are significant in designing
databases.
16. What is the third normal form? Give an example.
17. Explain how BCNF and 3NF differ.
18. What are the fourth normal form and the fifth normal form? Explain with examples.
19. Write a short note on denormalisation.
20. Explain Stored procedures in a Database Management System.

15. ANSWERS

Self Assessment Questions


1. Query by example
2. False
3. DML
4. Transaction Execution
5. Referential Integrity
6. Attribute
7. True
8. Create
9. HAVING
10. False
11. View
12. True
13. EXEC SQL
14. True
15. Two
16. Consistency
17. Dynamic SQL
18. False
19. True
20. Schema
21. (b) Inconsistency
22. Redundancy
23. (d) All of the above
24. Partially dependent
25. Second
26. True
27. Multi-valued
28. True
29. Project-Join Normal Form (PJNF)
30. Relational model
31. False
32. True
33. True
34. Horizontal Fragmentation

Terminal Questions
Answer 1: SQL stands for Structured Query Language. Refer to Section 3 for more details.

Answer 2: Relations can be created by use of the Create command. Refer to Section 5 for more
details.

Answer 3: The three basic components of the select statement are SELECT, FROM and WHERE.
Refer to Section 6 for more details.

Answer 4: Insert, Delete and Update are DML commands, used to add, remove and modify tuples
respectively. Refer to Section 6 for more details.

Answer 5: Create and Alter commands are used to create and alter database objects. Refer to
Section 5 for more details.

Answer 6: DDL refers to data definition language. Refer to Section 5 for more details.

Answer 7: Every transaction must satisfy the ACID properties: atomicity, consistency, isolation and
durability. Refer to Section 9 for more details.

Answer 8: The primary key is used to uniquely identify a row. Refer to Section 4 for more details.

Answer 9: Dynamic SQL allows a query to be constructed (and executed) at run-time. Refer to
Section 10 for more details.

Answer 10: There are mainly three types of anomalies in a database: the first is redundancy, the
second is inconsistency, and the third is the update anomaly. Refer to Section 3.3 for more details.

Answer 11: Functional dependency is a type of constraint in which the value of one attribute
determines the value of another attribute. Refer to Section 3.2 for more details.

Answer 12: Normalisation is the process of designing a good database by decomposing its relations
into normal forms so as to eliminate the database anomalies. Refer to Section 3.4 for more details.

Answer 13: In 1NF, all attribute values of a relation are atomic in nature. Refer to Section 3.4 for
more details.

Answer 14: When all the non-key attributes of a relational schema are fully functionally dependent
on the primary key, then that relation is said to be in 2NF. Refer to Section 3.4 for more details.

Answer 15: A transitive dependency is a condition where a non-key attribute is functionally
dependent on another non-key attribute. Refer to Section 3.4 for more details.

Answer 16: A table is said to be in 3NF if it is in 2NF, and it does not contain any transitive
dependencies. Refer to Section 3.4 for more details.

Answer 17: Boyce-Codd normal form (BCNF) is a stricter form of 3NF in which every determinant is a
candidate key. Refer to Section 3.4 for more details.

Answer 18: A table in 4NF must satisfy two conditions: firstly, it must be in Boyce-Codd normal
form, and secondly, it must be free from any multi-valued dependencies. Refer to Section 3.4 for
more details.

Answer 19: Denormalisation is done to enhance the performance of a normalised database. Refer
to Section 3.6 for more details.

Answer 20: A stored procedure is a named collection of one or more SQL statements and
procedural logic stored in the database.
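
Answers 15 and 16 can be made concrete with a small sketch. The table and column names below are hypothetical and are not taken from the unit. In staff_flat, dept_name depends on dept_id, which in turn depends on the key staff_id, so dept_name is transitively dependent on the key. Moving the department attributes into their own relation removes that dependency, and a foreign key preserves the link between the two relations.

```sql
-- Not in 3NF: staff_id -> dept_id and dept_id -> dept_name,
-- so dept_name depends on the key only transitively.
CREATE TABLE staff_flat (
    staff_id   INT PRIMARY KEY,
    staff_name VARCHAR(50) NOT NULL,
    dept_id    INT,
    dept_name  VARCHAR(30)
);

-- 3NF decomposition: each non-key attribute now depends only on the key
-- of its own relation.
CREATE TABLE departments (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(30) NOT NULL
);

CREATE TABLE staff (
    staff_id   INT PRIMARY KEY,
    staff_name VARCHAR(50) NOT NULL,
    dept_id    INT,
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)   -- referential integrity
);

-- The columns of staff_flat can be reconstructed with a join.
SELECT s.staff_id, s.staff_name, s.dept_id, d.dept_name
FROM staff AS s
LEFT JOIN departments AS d ON d.dept_id = s.dept_id;
```

After the decomposition a department name is stored once, so renaming a department is a single-row update rather than a change to every staff row.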

