O02CA504: Database Management System
Semester 1
Unit 2 – RDBMS and SQL
TABLE OF CONTENTS

1 Introduction
1.1 Objectives
2 Relational Query Languages
3 SQL Concepts
4 Integrity Constraints
4.1 Entity integrity
4.2 Domain integrity
4.3 Referential integrity
5 Data Definition Statements
5.1 Creating relations in SQL
5.2 Adding and deleting tuples
5.3 Destroying and altering relations
6 Data Manipulation Statements
6.1 SELECT statement
6.2 Subquery
6.3 Querying multiple relations
6.4 Functions
6.5 GROUP BY
6.6 Updating the database
7 Views
8 Embedding SQL Statements
9 Transaction Processing
10 Normalisation and Database Design
10.1 First normal form
10.2 Second normal form
10.3 Third normal form
10.4 Boyce-Codd normal form
10.5 Fourth normal form
1. INTRODUCTION
In the previous unit, you studied the advantages and disadvantages of various database systems. How much interaction a database requires depends on its usage: the more intensively the database is used, the greater the level of interaction. Hence, every database system should provide methods, languages and software through which users can submit a request, have the request processed, and receive its output. This unit introduces some of the database query languages and tools.
Now that we are clear about various types of DBMS, let us start this unit, where you will learn
about query languages and will also study SQL features and queries.
1.1. Objectives
After studying this unit, you should be able to:
create relational database objects using SQL
formulate tables and data residing in them
create and manipulate views
describe transaction processing
discuss the concept of embedded SQL and dynamic SQL
2. RELATIONAL QUERY LANGUAGES
Modern RDBMSs support several query languages for user interaction. The two most common query languages available with an RDBMS are SQL (Structured Query Language) and QBE (Query by Example).
Others are the Information System Base Language (ISBL) from the Peterlee Relational Test Vehicle (PRTV) system and QUEL (Query Language) from INGRES (Interactive Graphics Retrieval System). ISBL is based on relational algebra, SQL resembles tuple calculus, and QBE resembles domain calculus.
In this section, we will focus on QBE and in the forthcoming sections (Section 3 onwards), you will
study SQL in detail.
Query by Example (QBE): QBE was developed in the mid-1970s at IBM Research, simultaneously with the development of SQL. M. M. Zloof designed QBE, a relational database query language and the first graphical one. QBE gives a visual representation of tables in which the user supplies commands defining what is to be done, example entries defining how it is done, and conditions under which records should be admitted into the processing.
SELF-ASSESSMENT QUESTIONS – 1
1. QBE stands for _________.
2. SQL is supported by RDBMS. (True/False)
3. SQL CONCEPTS
SQL (Structured Query Language) is a standard relational database language used for the creation, deletion and modification of database tables.
(Note: SQL keywords are case-insensitive (SELECT, FROM, WHERE, etc.); we have capitalised words where we want to put emphasis on them. Table names, column names, etc., are case-insensitive on Windows but case-sensitive on UNIX.)
Features: SQL has a very rich set of features, which are given in Table 2.1 below:
Table 2.1: Features of SQL
The Data Manipulation Language (DML): As the name says, this language is used for manipulating the data stored in database objects. DML uses the SELECT, INSERT, DELETE and UPDATE commands to modify the data.
The Data Definition Language (DDL): This language is used to define the structure of a table. With the CREATE, ALTER and DROP commands, the structure of a table can be created, modified or deleted.
Specification of Triggers and Complex Integrity Constraints: SQL provides triggers and complex integrity constraints (ICs) that can be applied to queries.
Run-time (Dynamic) and Embedded SQL: With the run-time feature of SQL, users can execute queries at run time. With embedded SQL, users can run SQL statements that are part of some other host language (such as C or COBOL).
Execution of Client-Server Applications and Access to Remote Databases: This feature allows a client program to establish a connection with a server database.
Advanced Features: SQL also provides many advanced features, such as recursive queries and decision-support operations.
SELF-ASSESSMENT QUESTIONS – 2
3. SELECT, INSERT, DELETE and UPDATE commands are used by
_________ to modify the data.
4. SQL commands define the actions to be taken to control _________ .
4. INTEGRITY CONSTRAINTS
A DBMS maintains data integrity to keep wrong information out of the database.
Integrity constraints are conditions defined on the database schema; an integrity constraint limits the data that can be stored in a database instance. When a database instance fulfils all the integrity constraints defined on the schema, it is known as a legal instance. Because a DBMS enforces integrity constraints, it permits only legal instances to be stored in the database.
The major relational constraints are domain constraints; key constraints and constraints on nulls; entity integrity; and referential integrity and foreign keys.
4.1. Entity Integrity
The primary key (PK) of a relational table uniquely identifies each record in the table. It can either be a normal attribute that is guaranteed to be unique, or it can be generated by the DBMS (such as a globally unique identifier in Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in combination. An intelligent key uses genuine data as the PK. Only one PK is assigned to a table. A composite PK contains more than one column; we use a composite PK when no single column is unique on its own.
Hence, we can say that a table can contain only one PK, but a PK can contain more than one column. If we have to enforce uniqueness on further columns, we apply a PK constraint on one column and a UNIQUE constraint or IDENTITY property on the other columns that must not contain duplicate values.
4.2. Domain Integrity
Domain integrity is also called 'attribute' integrity; it covers, for example, the allowed size of values, the right data type and null status. Domain integrity can be implemented with the DEFAULT constraint, FOREIGN KEY and CHECK constraints, and data types. Data types restrict fields in different ways. A default is a value to be inserted into a column; a rule defines the acceptable values that may be inserted into a column. Rules and defaults serve the same purpose as constraints, but since they do not conform to the ANSI standard, their continued use is discouraged.
4.3. Referential Integrity
Primary key: As explained above, it is a key that uniquely identifies a record through a field (or fields) of a table. Hence, a particular record can be tracked without confusion.
Foreign key: A foreign key is a column, or a group of columns, in a table (the 'child' table) that takes its values from the primary key (PK) of another table (the 'parent' table). To preserve referential integrity, the foreign key in the child table can only take values that exist in the primary key of the parent table. The main aim of referential integrity is to avoid 'orphans': records in a child table that cannot be linked to any record in the parent table.
Implementing referential integrity means that when records go through operations such as insertion, deletion and updation, the relationship between the tables is maintained. It is the PK-FK combination that provides referential integrity. An example of a primary key and a foreign key is represented in Figure 2.1.
In the first table, the first column (Account Number) is the PK, and in the same table the branch name is the FK; the two tables are connected because this FK refers to the PK of the second table.
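The constraint kinds described above can be written directly in SQL DDL. The sketch below is not from the unit: the table names, columns and the SQLite harness are illustrative assumptions chosen so the statements can actually run.

```python
# Entity integrity -> PRIMARY KEY, domain integrity -> type/DEFAULT/CHECK,
# referential integrity -> FOREIGN KEY (REFERENCES). SQLite for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
con.execute("CREATE TABLE BRANCH (BRANCH_NAME TEXT PRIMARY KEY)")
con.execute("""CREATE TABLE ACCOUNT (
    ACC_NO      INTEGER PRIMARY KEY,                          -- entity integrity
    BALANCE     REAL NOT NULL DEFAULT 0 CHECK (BALANCE >= 0), -- domain integrity
    BRANCH_NAME TEXT REFERENCES BRANCH(BRANCH_NAME)           -- referential integrity
)""")

con.execute("INSERT INTO BRANCH VALUES ('Downtown')")
con.execute("INSERT INTO ACCOUNT VALUES (101, 500.0, 'Downtown')")
try:
    # 'Nowhere' is not a BRANCH primary key, so this row would be an orphan.
    con.execute("INSERT INTO ACCOUNT VALUES (102, 100.0, 'Nowhere')")
except sqlite3.IntegrityError:
    print("orphan rejected")
```

The DBMS, not the application, rejects the orphan row, which is exactly the point of declaring the constraint in the schema.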
SELF-ASSESSMENT QUESTIONS – 3
5. _________ is formed with the combination of PK and FK.
6. Domain integrity is also called '_________' integrity.
5. DATA DEFINITION STATEMENTS
Data Definition Language (DDL) statements permit users to create or modify database objects. Specifically, they perform the tasks of creating objects, altering or modifying objects, dropping or deleting objects, and so on.
5.1. Creating Relations in SQL
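The CREATE TABLE program discussed next is not reproduced in the source. A minimal sketch consistent with the seven columns it names is given below; the data types are assumptions, and SQLite is used only so the statement can be run.

```python
# The unit's EMP table rebuilt so it can actually run; only the column
# names come from the text, the data types are assumed.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE EMP (
    EMPNO  INTEGER PRIMARY KEY,   -- employee number
    ENAME  TEXT,                  -- employee name
    JOB    TEXT,
    DOJ    TEXT,                  -- date of joining
    SAL    REAL,                  -- salary
    COMM   TEXT,                  -- communication number
    DEPTNO INTEGER)""")

cols = [row[1] for row in con.execute("PRAGMA table_info(EMP)")]
print(cols)
# → ['EMPNO', 'ENAME', 'JOB', 'DOJ', 'SAL', 'COMM', 'DEPTNO']
```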
In the above program, we have created a table named EMP with seven columns: EMPNO (Employee Number), ENAME (Employee Name), JOB, DOJ (Date of Joining), SAL (Salary), COMM (Communication Number) and DEPTNO (Department Number).
5.2. Adding and Deleting Tuples
Inserting tuples: The INSERT command adds a record to the EMP table. To insert values into only the EMPNO, DEPTNO and ENAME fields, those columns are named explicitly in the query.
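The INSERT queries themselves are not shown in the text; hedged sketches of both forms, with invented values, are:

```python
# Both insertion forms against the EMP table; all values are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, JOB TEXT, "
            "DOJ TEXT, SAL REAL, COMM TEXT, DEPTNO INTEGER)")

# Full-row insertion: one value per column, in table order.
con.execute("INSERT INTO EMP VALUES "
            "(1001, 'JOHN', 'CLERK', '2021-06-01', 900, '555-0101', 1)")
# Partial insertion: name the columns; unnamed columns become NULL.
con.execute("INSERT INTO EMP (EMPNO, DEPTNO, ENAME) VALUES (1002, 2, 'ARICA')")

print(list(con.execute("SELECT EMPNO, ENAME, SAL FROM EMP")))
# → [(1001, 'JOHN', 900.0), (1002, 'ARICA', None)]
```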
Deleting tuples: The DELETE command is used to delete rows from a table. For example, DELETE can remove all the employees whose salaries are more than 1000. If the WHERE clause is omitted, all rows of the table are deleted; a part of a row cannot be deleted.
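A runnable sketch of the deletion just described; the table contents and the SQLite harness are assumptions.

```python
# DELETE with and without a WHERE clause; rows are invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, SAL REAL)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                [(1, 'JOHN', 900), (2, 'KRIPSI', 1500), (3, 'TIM', 2000)])

con.execute("DELETE FROM EMP WHERE SAL > 1000")   # salaries above 1000 go
print(list(con.execute("SELECT ENAME FROM EMP")))             # → [('JOHN',)]

con.execute("DELETE FROM EMP")                    # no WHERE: every row goes
print(con.execute("SELECT COUNT(*) FROM EMP").fetchone()[0])  # → 0
```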
5.3. Destroying and Altering Relations
The DROP command destroys a relation or schema. With CASCADE, it deletes the complete database schema, including the tables, domains and other elements it contains. With RESTRICT, it deletes the database schema only if the schema does not contain any element; otherwise, the command is terminated.
Alter table command: The ALTER TABLE command adds attributes to an existing relation; null values are assigned for the new attribute in all existing tuples. The syntax is
ALTER TABLE d ADD I D;
where d is the existing relation, I is the added attribute, and D is the domain of the added attribute. The statement
ALTER TABLE d DROP I;
drops attribute I from relation d.
Example: ALTER TABLE details ADD Parents_Name VARCHAR (20);
The above example adds the attribute Parents_Name to the table details.
SELF-ASSESSMENT QUESTIONS – 4
7. There are two types of DROP commands: CASCADE and RESTRICT. (True/False)
8. _________ command helps in the creation of SQL relations.
6. DATA MANIPULATION STATEMENTS
6.1. SELECT Statement
Syntax: The three common elements of the SELECT command are SELECT, FROM and WHERE. These elements can retrieve information from more than one table. The syntax is:
SELECT <column_list>
FROM <table_list>
WHERE <search_criteria>
In the WHERE clause, the basic logical comparison operators of SQL are used.
Example:
SELECT EMPNO, ENAME, DEPTNO
FROM EMPLOYEE
WHERE DEPTNO = 2;
This query will display three columns, i.e., EMPNO, ENAME and DEPTNO, of all rows of the EMPLOYEE table whose DEPTNO is 2.
Result:
EMPNO ENAME DEPTNO
1877 ARICA 2
Example:
SELECT *
FROM EMPLOYEE
WHERE DESIGNATION = 'MANAGER';
This query will display all five columns, i.e., EMPNO, ENAME, DESIGNATION, DEPTNO and PAY, of all rows of the EMPLOYEE table whose DESIGNATION stores MANAGER.
Note: An asterisk (*) is used to retrieve all columns from the table.
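The two SELECT forms above can be exercised end to end. In this sketch the sample rows are invented; the column names follow the text's EMPLOYEE table, and SQLite is used for illustration.

```python
# Projection of named columns with WHERE, then SELECT * on the same table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMPNO INTEGER, ENAME TEXT, "
            "DESIGNATION TEXT, DEPTNO INTEGER, PAY REAL)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
                [(1821, 'JOHN', 'MANAGER', 1, 3000),
                 (1877, 'ARICA', 'CLERK', 2, 1200)])

# Three named columns, restricted to department 2.
print(list(con.execute(
    "SELECT EMPNO, ENAME, DEPTNO FROM EMPLOYEE WHERE DEPTNO = 2")))
# → [(1877, 'ARICA', 2)]

# The asterisk retrieves every column of the qualifying rows.
print(list(con.execute(
    "SELECT * FROM EMPLOYEE WHERE DESIGNATION = 'MANAGER'")))
# → [(1821, 'JOHN', 'MANAGER', 1, 3000.0)]
```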
6.2. Subquery
With the help of the WHERE and HAVING clauses, it is possible to embed one SQL statement in another. In this situation, the inner query is known as a subquery, and the entire SELECT statement is known as a nested query. The general form is:
SELECT "column_name"
FROM "table_name"
WHERE "column_name" [operator]
(SELECT "column_name1"
FROM "table_name1"
WHERE [Condition]);
Display the employees whose DEPTNO is the same as that of employee 1821
Select ENAME, DEPTNO
FROM EMP
Where DEPTNO =
(SELECT DEPTNO
FROM EMP
WHERE EMPNO = 1821);
In the example above, the inner query is executed first, and its result is then used by the outer query.
Result:
ENAME DEPTNO
JOHN 1
KRIPSI 1
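The nested query above can be run as follows. The sample rows are chosen to reproduce the result table shown (KRIPSI's employee number is invented), and the SQLite harness is an assumption.

```python
# The inner query yields employee 1821's department; the outer query
# then lists every employee of that department.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, DEPTNO INTEGER)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                [(1821, 'JOHN', 1), (1835, 'KRIPSI', 1), (1877, 'ARICA', 2)])

rows = list(con.execute("""
    SELECT ENAME, DEPTNO FROM EMP
    WHERE DEPTNO = (SELECT DEPTNO FROM EMP WHERE EMPNO = 1821)"""))
print(rows)   # → [('JOHN', 1), ('KRIPSI', 1)]
```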
6.3. Querying Multiple Relations
Example: Make a list of all employees working for a department located in NEW YORK. The query joins the EMPLOYEE table (EMPNO, ENAME, DESIGNATION, DEPTNO, PAY) with the DEPT table (DEPTNO, DEPTNAME, LOCATION) on the common DEPTNO column.
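The join query and its result are not reproduced in the source; a hedged sketch of one way to write it, with invented sample data, is:

```python
# Joining EMPLOYEE and DEPT on their common DEPTNO column and
# filtering on the department's location.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMPNO INTEGER, ENAME TEXT, "
            "DESIGNATION TEXT, DEPTNO INTEGER, PAY REAL)")
con.execute("CREATE TABLE DEPT (DEPTNO INTEGER, DEPTNAME TEXT, LOCATION TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
                [(1821, 'JOHN', 'MANAGER', 1, 3000),
                 (1877, 'ARICA', 'CLERK', 2, 1200)])
con.executemany("INSERT INTO DEPT VALUES (?, ?, ?)",
                [(1, 'SALES', 'NEW YORK'), (2, 'HR', 'BOSTON')])

rows = list(con.execute("""
    SELECT E.ENAME FROM EMPLOYEE E, DEPT D
    WHERE E.DEPTNO = D.DEPTNO AND D.LOCATION = 'NEW YORK'"""))
print(rows)   # → [('JOHN',)]
```

The WHERE clause carries both the join condition (E.DEPTNO = D.DEPTNO) and the restriction on location.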
6.4. Functions
A Subprogram that returns a value is known as a function. SQL supports various aggregate
functions shown below.
(a) COUNT: The COUNT function takes a column name and returns the count of tuples in that column. When the DISTINCT keyword is used, it returns only the count of unique (distinct) values of the column. If no column name and no DISTINCT keyword are used, it returns the count of all tuples, including duplicates; COUNT (*) counts all the tuples of the table.
Example: Write a query to list the number of employees in the company from an employee table.
SELECT COUNT (*)
FROM EMPLOYEE;
(b) SUM: The SUM function is written with a column name and gives the sum of all tuples present
in that column.
(c) AVG: AVG function or Average function is written with column name and returns the AVG
value of that column.
(d) MAX: MAX function or Maximum value function written with column name returns the
maximum value present in that column.
(e) MIN: MIN function or Minimum value function written with column name returns the minimum
value present in that column.
Find the sum of salaries of all the employees and also the minimum, maximum and average salary.
Solution:
SELECT SUM (E.ESAL) AS SUM_SALARY,
MAX (E.ESAL) AS MAX_SALARY,
MIN (E.ESAL) AS MIN_SALARY,
AVG ([DISTINCT] E.ESAL) AS AVERAGE_SALARY
FROM EMPLOYEE E;
This query calculates the total, minimum, maximum and average salaries and also renames the
column names.
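The aggregate query above can be exercised as follows; the sample salaries are invented, while the EMPLOYEE/ESAL names follow the text.

```python
# SUM, MAX, MIN and AVG in one query, plus COUNT(*) for the row count.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (ENAME TEXT, ESAL REAL)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [('JOHN', 1000), ('KRIPSI', 2000), ('ARICA', 3000)])

print(con.execute("""
    SELECT SUM(E.ESAL), MAX(E.ESAL), MIN(E.ESAL), AVG(E.ESAL)
    FROM EMPLOYEE E""").fetchone())
# → (6000.0, 3000.0, 1000.0, 2000.0)

print(con.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0])   # → 3
```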
6.5. GROUP BY
The GROUP BY clause is used with the group functions to retrieve data grouped according to one or more columns.
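A short sketch of GROUP BY in action; the table and its values are illustrative assumptions.

```python
# One output row per department, each carrying that group's aggregates.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (ENAME TEXT, DEPTNO INTEGER, SAL REAL)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                [('JOHN', 1, 1000), ('KRIPSI', 1, 2000), ('ARICA', 2, 1500)])

rows = list(con.execute(
    "SELECT DEPTNO, COUNT(*), SUM(SAL) FROM EMP GROUP BY DEPTNO"))
print(rows)   # → [(1, 2, 3000.0), (2, 1, 1500.0)]
```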
6.6. Updating the Database
Suppose we wish to change the house name of the student 'Simran' stored in the relation ST_DATA. The following statement will serve the purpose.
UPDATE ST_DATA
SET ST_HNAME = 'pranavam'
WHERE ST_NAME = 'Simran';
Activity - 1
Generally, there are numerous ways to specify the same query in SQL. In your
opinion, what are the main advantages and disadvantages of this flexibility?
SELF-ASSESSMENT QUESTIONS – 5
9. With the help of WHERE and _________ commands it is possible to embed a SQL
statement into another.
10. It is not possible to query multiple relations in SQL. (True/ False)
7. VIEWS
A view is a subschema in which logical tables are generated from one or more base tables. A view is like a window through which the user sees selected information stored in the tables. A view is stored as a query and does not contain data of its own: during query execution, its contents are taken from the underlying tables, so when the table contents are modified, the view changes dynamically.
For a view on a single table, if the defining query has no GROUP BY clause and no DISTINCT clause, the user can UPDATE and DELETE rows through the view; and if the query has no columns defined by expressions, the user can also INSERT rows.
Example: In order to create a view of the EMP table named DEPT20 and show the employees in
department 20 and their annual salary, use the following command.
Once the VIEW is created, it can be treated like any other table. Thus, the following is a valid
command.
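The CREATE VIEW command referred to above is not reproduced in the source. Below is a sketch under the assumption that annual salary is computed as SAL * 12; the sample rows and the SQLite harness are also assumptions.

```python
# A view stores only its defining query; querying it pulls live data
# from the base table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, "
            "SAL REAL, DEPTNO INTEGER)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?)",
                [(1, 'JOHN', 1000, 20), (2, 'ARICA', 1500, 30)])

con.execute("""CREATE VIEW DEPT20 AS
    SELECT ENAME, SAL * 12 AS ANNUAL_SAL
    FROM EMP WHERE DEPTNO = 20""")

# Once created, the view is queried like any other table.
print(list(con.execute("SELECT * FROM DEPT20")))   # → [('JOHN', 12000.0)]
```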
SELF-ASSESSMENT QUESTIONS – 6
11. A _________ is a subschema in which logical tables are generated from more than one
base table.
12. During the query execution contents are taken from other tables. (True/False)
8. EMBEDDING SQL STATEMENTS
The use of embedded SQL statements makes it easier to make amendments to the database and greatly enhances the programmer's ability to work with it. The database system is responsible for all query execution; it returns the result (one tuple at a time) to the program.
Before the program is compiled, the embedded SQL statements are processed by a special pre-processor, which replaces them with declarations and procedure calls of the host language so that the program can be processed at run time. After this, the resultant program is sent for compilation. So that the pre-processor can easily recognise embedded SQL statements, they are bracketed with EXEC SQL, using the following syntax:
EXEC SQL <embedded SQL statement> END-EXEC
The syntax given above is a generalised form; it may differ somewhat depending on the host language being used.
Declaring variables and exceptions: SQL INCLUDE can be used in the host program to mark the place where the pre-processor inserts the special variables used for communication between the database and the program. Host-language variables can also be used inside embedded SQL statements; it is good practice to prefix a colon to host variables to differentiate them from the variables used in SQL. A declare cursor statement is used for writing an embedded SQL query within a host program. It does not run the query; a separate fetch command is used to retrieve the result of the embedded query.
Let us take an example of banking schema. Suppose you have a host-language variable termed
“amount”, and you want to determine the names and residing cities of all the bank customers who
currently have a balance of more than a particular amount in any of their accounts. The query for
finding this can be written as shown below:
EXEC SQL
declare c cursor for
select customer-name, customer-city
from deposit, customer
where deposit.customer-name = customer.customer-name and
deposit.balance > :amount
END-EXEC
The variable c that is used in the above query statement is termed the ‘cursor’. This cursor is used
to identify a query in an open statement and also helps in query evaluation.
This cursor variable is also used in the fetch statement, which places the values of a tuple/row into host-language variables.
When any error occurs in the execution of an SQL query, the error report is stored inside a special
variable. These special variables are called SQL communication-area (SQLCA) variables. The
declarations for the SQLCA variables are contained inside the SQL INCLUDE statement.
Fetch Statement
A sequence of fetch statements is used to make the tuples of the result available to the program. One host-language variable is required for each attribute of the result relation. In the banking-schema example, therefore, we require two separate variables: one for storing the customer name and the other for storing the customer's city of residence. Let us take a variable cn for storing the customer's name and cc for storing the customer's city. A tuple of the result relation can then be obtained with the following statement:
EXEC SQL fetch c into :cn, :cc END-EXEC
After this, the programmer can manipulate the values of the two variables cn and cc using host-language commands and features.
Close Statement
The close statement is another embedded SQL statement; it deletes the temporary relation that holds the query result. In our example, it is written as:
EXEC SQL close c END-EXEC
The embedded SQL statements used for database modification, such as update, insert and delete, return no result and are therefore simple and easy to use. A database-modification statement in embedded SQL has the form:
EXEC SQL <any valid update, insert, or delete statement> END-EXEC
An SQL database-modification expression may also contain host-language variables preceded by a colon. In case of an error during statement execution, the SQLCA variables come into the picture.
SELF-ASSESSMENT QUESTIONS – 7
13. To recognise embedded SQL requests to the pre-processor, we use the _________
statement.
14. It is a good practice to append a colon before the host variables to differentiate them
from other variables used in SQL. (True/False)
9. TRANSACTION PROCESSING
Transaction processing provides the mechanism that defines the logical unit of database processing. Transaction processing systems consist of very large databases and hundreds of thousands of users concurrently executing database transactions. A transaction is a logical unit of data-manipulation tasks in which either all the component tasks are completed or none of them is executed, so that the database remains consistent. When many transactions proceed in the database environment, strict control must be applied to them, failing which the consistency of the database cannot be ensured.
ACID Properties: Unwanted inconsistencies can easily occur in the database, particularly when
various transactions are executing simultaneously. The term ACID defines those properties that
must be related to transactions in order for the reliability of the database to be assured. The term
ACID, when extended, can be read as the following:
A Atomicity
C Consistency
I Isolation
D Durability
Atomicity: This property requires that either all the operations of a transaction are performed or none of them is; a transaction is an all-or-nothing unit of work.
Consistency: This property requires that the database integrity rules must be obeyed properly.
Isolation: In the case of a multi-transaction environment, various transactions may be carried out
simultaneously on a single database. This property provides assurance that all transactions are
executed independently.
Durability: When a transaction is completed successfully, this property makes sure that the
changes performed in the database are saved in the physical database.
SQL offers concurrency control for the execution of transactions via its Data Control Language (SQL DCL). When a transaction begins, we use the BEGIN TRANSACTION statement offered by SQL DCL; when a transaction ends, we use the END TRANSACTION statement.
There are two statements provided by SQL that make the process of concurrent transaction
control easy.
COMMIT: On the execution of this statement, every modification made by the related transaction up to that point is made permanent.
ROLLBACK: On the execution of this statement, every change performed since the preceding COMMIT statement is discarded.
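A minimal sketch of COMMIT and ROLLBACK at work; the account table, its values and the SQLite harness are assumptions.

```python
# ROLLBACK discards the uncommitted update; COMMIT makes it permanent.
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None        # manage BEGIN/COMMIT/ROLLBACK explicitly
con.execute("CREATE TABLE ACCT (ID INTEGER PRIMARY KEY, BAL REAL)")
con.execute("INSERT INTO ACCT VALUES (1, 500.0)")

con.execute("BEGIN")
con.execute("UPDATE ACCT SET BAL = BAL - 100 WHERE ID = 1")
con.execute("ROLLBACK")
print(con.execute("SELECT BAL FROM ACCT").fetchone()[0])   # → 500.0

con.execute("BEGIN")
con.execute("UPDATE ACCT SET BAL = BAL - 100 WHERE ID = 1")
con.execute("COMMIT")
print(con.execute("SELECT BAL FROM ACCT").fetchone()[0])   # → 400.0
```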
There are some conditions under which transactions may occur. These conditions are shown in Table 2.4 below:
Table 2.4: Transaction Read Conditions
Dirty read: This condition arises when a transaction reads data written by a concurrent uncommitted transaction.
Non-repeatable read: This condition is caused by a transaction that reads data again and finds that the data has been modified by the committed write operation of some other transaction.
Phantom read: This condition arises when a transaction re-executes a query it had previously executed and gets rows different from what it got earlier.
Depending upon the conditions given above, some levels of transaction isolation are defined by
SQL. These levels are discussed below:
Read uncommitted isolation: Here, transactions are permitted to perform dirty, non-repeatable and phantom reads.
Read committed isolation: At this level, a SELECT query sees only the data committed before the query began.
Repeatable read: This level does not allow dirty and non-repeatable reads; only phantom reads may occur.
Serialisable isolation: Of all the levels of isolation, this level is considered the most rigid. Transactions are forced to behave as if they executed sequentially, so a transaction can start only after the completion of the existing transaction. As serialisation failures can occur often at this level, an application must be prepared to retry a withdrawn transaction.
SELF-ASSESSMENT QUESTIONS – 8
15. SQL offers _________ statements that ease the process of concurrent transaction
control.
16. In transaction processing, the integrity rules of a database are maintained by _________
property.
Activity - 2
Create a list of all Transaction Control commands in SQL and explain them with
their uses.
10. NORMALISATION AND DATABASE DESIGN
Normalisation comprises sets of rules used to make sure that database relations are fully normalised, by listing the functional dependencies and decomposing the relations into smaller, more efficient tables.
The normalisation technique is based on the idea of normal forms. A table is said to be in a specific normal form if it fulfils the particular set of constraints defined for that normal form. These constraints usually apply to the attributes (columns) and the relationships between them. There are various levels of normal forms (see Figure 3.1). Each normal form addresses a specific issue whose removal minimises database anomalies.
Later, E. F. Codd and R. Boyce presented a stronger definition of the third normal form, known as the Boyce-Codd Normal Form (BCNF).
All the normal forms except 1NF are based on the concept of functional dependencies among the attributes of a relation.
10.1. First Normal Form
Note: Atomic: A column is said to be atomic if its values are indivisible units.
The table is said to possess atomic values if there is one and only one data item for any given row
& column intersection. Non-atomic values create repeating groups. A repeating group is just the
repetition of a data item or a cluster of data items in records. For example, consider Table 3.4, given below:
In Table 3.4, you can see non-atomic values. Therefore, to modify the non-atomic values that the dependents column contains and convert this table into 1NF, we need atomic values, as shown in Table 3.5.
Table 3.5: Change of Non-atomic Values into Atomic Values of
Table 3.4
Observe in Table 3.5 that the dependents column now contains atomic values. You will note that
for each dependent, the other employee details such as ID, Name, Dept No, Sal and Mgr are
repeated, which results in the creation of a repeating group(data redundancy). According to the
first NF, the above relation employee (Table 3.5) is in 1NF. However, it is best practice to remove
the groups which are being repeated in the table.
According to the first normalisation rule, the table should not contain any repeating groups of
column values. If any such type of repeating group exists, then they should be decomposed, and
the associated columns will form their own table. Also, the new resulting table must contain a link
with the original table (from where it was decomposed). Thus, to remove repeating groups from
the Employee relation, it can be decomposed into two relations, namely Emp and Emp_Depend,
as shown in Tables 3.6 and 3.7:
Here, in Table 3.7, the {ID, Dependents} combination acts as the unique key. The column ID is common to both tables (Tables 3.6 and 3.7) and acts as the link with the original table. The data redundancy in the columns ID, Name, Dept No, Sal and Mgr is also eliminated, and these tables are now in 1NF. Now, let us consider another example. Suppose we have a customer table, as shown in Table 3.8.
Cust_id Name Address Acc_id Acc_type Min_bal Tran_id Tran_type Tran_mode Amount Balance
001 Ravi Hyd 994 SB 1000 14301 Deposit By cash 1000 2000
001 Ravi Hyd 994 SB 1000 14302 Withdrawal ATM 500 1500
110 Tim Sec'bad 340 CA 500 14304 Deposit Payroll 3500 7000
110 Tim Sec'bad 340 CA 500 14305 Withdrawal ATM 1000 6000
420 Kavi Vizag 699 SB 1000 14307 Credit By cash 2000 8000
420 Kavi Vizag 699 SB 1000 14308 Withdrawal ATM 6500 1500
You will notice that Table 3.8 contains a repeating group composed of Cust_id, Name and Address.
Therefore, to convert this table into the first normal form, we need to remove this repeating group.
This can be done by dividing this table into two tables: Customer and Customer Tran. (See Table
3.9 and 3.10)
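The 1NF decomposition just described can be sketched as two linked tables; the IDs and values below are simplified from Table 3.8, and the SQLite harness is an assumption.

```python
# After the 1NF split, customer facts live once in CUSTOMER, and each
# transaction points back to its owner through the CUST_ID link column.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, "
            "NAME TEXT, ADDRESS TEXT)")
con.execute("CREATE TABLE CUSTOMER_TRAN (CUST_ID INTEGER, TRAN_ID INTEGER, "
            "TRAN_TYPE TEXT, AMOUNT REAL)")
con.execute("INSERT INTO CUSTOMER VALUES (1, 'Ravi', 'Hyd')")
con.executemany("INSERT INTO CUSTOMER_TRAN VALUES (?, ?, ?, ?)",
                [(1, 14301, 'Deposit', 1000), (1, 14302, 'Withdrawal', 500)])

# 'Ravi'/'Hyd' are stored once, however many transactions exist.
rows = list(con.execute("""
    SELECT C.NAME, T.TRAN_ID FROM CUSTOMER C, CUSTOMER_TRAN T
    WHERE C.CUST_ID = T.CUST_ID ORDER BY T.TRAN_ID"""))
print(rows)   # → [('Ravi', 14301), ('Ravi', 14302)]
```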
(Note: The primary key columns of each table are indicated in highlights in Figures).
10.2. Second Normal Form
All the non-key columns must be fully functionally dependent on the primary key. An attribute (column) is said to be partially dependent if its value can be determined by one or more attributes of the primary key, but not all of them.
Every normal form is based upon the previous normal form. Therefore, the first condition for the
second normal form is to have all its tables in the first normal form.
Full functional dependency means that, for a given composite primary key (a primary key made of more than a single attribute), each column that is not part of the primary key must depend on every one of the key's attributes.
If attributes are only partially dependent on the primary key, they must be removed and placed in another table. The primary key of the newly formed table is the portion of the original key that the attributes were dependent on.
Again, consider the earlier example of Customer Relations. After converting it into 1NF, we have
two tables: Customer and Customer_Tran. Now, we need to convert it into 2NF (Second Normal
Form). To do so, the Customer Tran table is further decomposed into three tables: Customer
Account, Accounts and Transaction, as shown in Tables 3.11, 3.12 and 3.13.
Similarly, the Balance depends on Cust_id and Acc_id together rather than on either alone, resulting in a new Customer_Accounts table (Table 3.14).
10.3. Third Normal Form
A relation suffers from a transitive dependency when any non-key attribute is functionally dependent on some other non-key column, which is, in turn, functionally dependent on the primary key.
Transitive dependencies: Columns dependent on other columns that, in turn, are dependent on the primary key are said to be transitively dependent.
In other words, a relation R is said to be in the third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
Therefore, the main objective of 3NF is to make the relation free from all transitive dependencies.
Let us understand how we can do this with the help of an example.
Example: Again, let us go back to our previous example. The Accounts table (Table 3.12) is in the second normal form, but it has a transitive dependency: Min_bal depends on Acc_type, which in turn depends on the key. In order to remove this transitive dependency, the Accounts table can be decomposed into two tables, Acc_Detail and Product, as shown in Tables 3.15 and 3.16:
Acc_type Min_bal
SB 1000
CA 500
Tables after the Third Normal Form are given below (Table 3.17, 3.18 and 3.19)
Tran_id Acc_id Tran_type Tran_mode Amount
14300 994 B/F 1000
14301 994 Deposit By cash 1000
14302 994 Withdrawal ATM 500
14303 340 B/F 3500
14304 340 Deposit Payroll 3500
14305 340 Withdrawal ATM 1000
14306 699 B/F 6000
14307 699 Credit By cash 2000
14308 699 Withdrawal ATM 6500
10.4. Boyce-Codd Normal Form
This normal form is stricter than 3NF. Remember that every relation in BCNF is also in 3NF, but a relation in 3NF may or may not be in BCNF. A relational schema R is in BCNF if, whenever a nontrivial functional dependency X→A holds in R, X is a superkey of R.
Let us understand this with the help of an example. Consider the relation Teach with the structure:
Teach (Student varchar 5,
Course varchar 5,
Teacher varchar 5)
In this relation, there are two dependencies. One in which (Student+Course)→Teacher, and
second in which Teacher→Course.
In this example, it has been assumed that one teacher teaches only one course. (Student+Course)
is the primary key in this relation.
In this example, we will determine whether this table Teach (Figure 3.1) is in BCNF or not. For
this, we need to first check the first condition whether the relation is in 3NF. Here, the relation
Teach is in 3NF. Hence, the first condition is satisfied.
Now, let us check the second condition. According to the BCNF criteria, for the dependency Teacher→Course that holds in the relation Teach, Teacher should be a superkey. But here, Teacher is not a superkey. Therefore, this condition is not fulfilled, so we can say that the relation is not in BCNF.
We have seen that the relation Teach is not in BCNF, but it is in 3NF. This arises because, for a relation to be in 3NF, a nontrivial dependency X→A must satisfy either of two conditions: 'X should be a superkey' or 'A should be a prime attribute'. In this relation, as Teacher is not a superkey, the first condition fails; but even then, the second condition is satisfied, as Course is a prime attribute of R. Therefore, the relation is in 3NF but not in BCNF.
Comparison of BCNF with 3NF: To understand the differences between BCNF and 3NF, we must look carefully at the definitions of 3NF and BCNF.
According to the 3NF definition, "the condition for a relational schema R to be in 3NF is that whenever a nontrivial functional dependency (FD) X→A holds in R, either X is a superkey of R or A is a prime attribute of R."
But, according to the definition of BCNF, "the condition for a relational schema R to be in BCNF is that whenever a nontrivial functional dependency (FD) X→A holds in R, X should be a superkey of R."
Thus, we see that BCNF is stricter than 3NF. One can always obtain a 3NF relational design without sacrificing a lossless join or dependency preservation.
However, it is not always possible to achieve a BCNF design, a lossless join and dependency preservation all together. In such situations, in which we cannot achieve all three objectives, we choose 3NF with a lossless join and dependency preservation.
Multi-Valued Dependencies (MVDs): MVD arises in a situation where one attribute value is
possibly a ‘multi-valued fact’ about some other attribute within the same table. One special case
of MVD is FD, which you have studied earlier. Therefore, every FD is an MVD.
Now, let us understand what MVD is with the help of a few examples given below. Let us consider the relational schema CSB with the attributes Course, Stud_name and Text_book.
‘Stud_name’ and ‘Text_book’ are independent multi-valued facts about the attribute ‘course’.
Therefore, we can simply say that this relation contains multi-valued dependency. Here,
Stud_name and Text_book are independent multi-valued facts about the course because the
student has no control over the textbooks which are used for a particular course.
Let us take one more example of a relation schema Emp_Profile with the following three attributes:
(Emp_name char(15),
Equipment char(15),
Language char(15)).
In this relation, Equipment and language are the two independent multi-valued facts about
employee_name. Therefore, we can say that the relation also contains MVD, as shown in Figure
3.3 below.
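To see the redundancy concretely: because Equipment and Language are independent facts about the employee, an employee with two pieces of equipment and two languages needs every combination stored. A minimal sqlite3 sketch (the employee name and the equipment/language values are assumptions for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Emp_Profile (Emp_name TEXT, Equipment TEXT, Language TEXT)")

# Equipment and Language are independent multi-valued facts, so every
# combination must be stored: 2 pieces of equipment x 2 languages = 4 rows.
for equipment in ("Laptop", "Scanner"):
    for language in ("C", "Java"):
        cur.execute("INSERT INTO Emp_Profile VALUES (?, ?, ?)",
                    ("Asha", equipment, language))

count = cur.execute(
    "SELECT COUNT(*) FROM Emp_Profile WHERE Emp_name = 'Asha'").fetchone()[0]
print(count)  # 4 rows for one employee: the MVD redundancy
```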
The relation shown in Figure 3.3 clearly has a redundancy problem.
Therefore, we can further decompose it to a higher normal form, i.e. 4NF, to resolve the problem
of redundancy. In the next section, we will see the definition of 4NF and how MVDs are associated
with it.
Such a relation has three columns; for a single value in one column, there are several rows of associated values in each of the other two columns (see Table 3.22).
A more formal definition of MVD states that: “A multi-valued dependency exists if, for each value
of an attribute A, there exists a finite set of values of attribute B that are associated with A and a
finite set of values of attribute C that are also associated with A. Attributes B and C are
independent of each other.”
Let us take the example of a relation Branch_Staff_Client (Table 3.22), which contains information
about the various clients for a bank branch, the various staff who address the client's needs and
the various requirements of each client.
The first normal form, commonly termed 1NF, is the most basic normal form. In this normal form,
the condition is that there must not be any repeating groups in any column. In other words, all the
columns in the table must be composed of atomic values.
Note: Atomic: A column is said to be atomic if the values are indivisible units.
The table is said to possess atomic values if there is one and only one data item for any given row
& column intersection. Non-atomic values create repeating groups. A repeating group is just the
repetition of a data item or cluster of data items in records. For example, consider below given
Table 3.4:
The above relation contains MVD. In this relation, the Client's name determines the Staff names that serve the Client, and the Client's name also determines the Client's requirements. But Staff_name and Client_requirement are not dependent on each other:
ClientName →→ StaffName
ClientName →→ ClientRequirements
The necessary conditions for the Fourth Normal Form are as follows: the relation must be in Boyce-Codd normal form, and it must be free from any multi-valued dependencies.
Thus, the 4NF's basic objective is to eliminate multi-valued dependencies from the relation. In
order to remove multi-valued dependencies from a table, we need to decompose the table and
shift the related columns into separate tables along with a copy of the determinant. This copy will
serve as a foreign key to the original table.
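Applying this procedure to the MVDs above, the two independent multi-valued facts about a client move into separate tables, each carrying a copy of the determinant ClientName. A sketch in Python's sqlite3 with hypothetical sample values (client, staff and requirement names are assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 4NF decomposition: each independent multi-valued fact gets its own table,
# with the determinant (ClientName) copied into both as the linking key.
cur.execute("CREATE TABLE Client_Staff (ClientName TEXT, StaffName TEXT, "
            "PRIMARY KEY (ClientName, StaffName))")
cur.execute("CREATE TABLE Client_Requirement (ClientName TEXT, Requirement TEXT, "
            "PRIMARY KEY (ClientName, Requirement))")

cur.executemany("INSERT INTO Client_Staff VALUES (?, ?)",
                [("Acme", "Ann"), ("Acme", "Bob"), ("Acme", "Carl")])
cur.executemany("INSERT INTO Client_Requirement VALUES (?, ?)",
                [("Acme", "Loan"), ("Acme", "Deposit")])

# 3 staff and 2 requirements now take 3 + 2 = 5 rows, instead of the
# 3 x 2 = 6 combination rows a single undecomposed table would need.
total = (cur.execute("SELECT COUNT(*) FROM Client_Staff").fetchone()[0]
         + cur.execute("SELECT COUNT(*) FROM Client_Requirement").fetchone()[0])
print(total)  # 5
```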
Table 3.23: Branch_Staff Table before Fourth Normal Form
BranchNumber StaffName ClientName
The Fifth Normal form was developed by an IBM researcher, Ronald Fagin. According to Fagin’s
theorem, “The original table must be reconstructed from the tables into which it has been
decomposed.” 5NF allows decomposing a relation into three or more relations.
The fifth normal form is based on the concept of join dependency. Join dependency means that a relation, after being broken down into three or more smaller relations, should be capable of being combined again on the same keys to recreate the original table. Join dependency is the more general form of multi-valued dependency.
A relation R, decomposed into relations R1, R2, …, Rn, meets the condition of the fifth normal form if and only if R is equal to the join of R1, R2, …, Rn.
Any relation R is said to be in 5NF (or PJNF, project-join normal form) if, for all its join dependencies, at least one of the following holds.
Definition of Fifth Normal Form: A relation is in fifth normal form (5NF) if and only if every join dependency in the table is implied by the candidate keys of the relation.
Tables 3.26, 3.27 and 3.28 are formed after converting Table 3.25 into the Fifth Normal Form.
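The defining property, that re-joining the decomposed tables recreates the original, can be checked mechanically. The sketch below uses a hypothetical three-attribute relation ACP (Agent, Company, Product), not the book's Tables 3.25-3.28, to verify a lossless join with sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE ACP (Agent TEXT, Company TEXT, Product TEXT)")
original = [("a1", "c1", "p1"), ("a1", "c1", "p2"), ("a2", "c1", "p1")]
cur.executemany("INSERT INTO ACP VALUES (?, ?, ?)", original)

# Decompose ACP into its three two-column projections (the 5NF decomposition).
cur.execute("CREATE TABLE AC AS SELECT DISTINCT Agent, Company FROM ACP")
cur.execute("CREATE TABLE CP AS SELECT DISTINCT Company, Product FROM ACP")
cur.execute("CREATE TABLE AP AS SELECT DISTINCT Agent, Product FROM ACP")

# Re-join the three projections on their shared keys.
rejoined = cur.execute(
    "SELECT DISTINCT AC.Agent, AC.Company, CP.Product "
    "FROM AC "
    "JOIN CP ON AC.Company = CP.Company "
    "JOIN AP ON AP.Agent = AC.Agent AND AP.Product = CP.Product "
    "ORDER BY 1, 2, 3").fetchall()
print(rejoined == sorted(original))  # True: the join reconstructs the original
```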
SELF-ASSESSMENT QUESTIONS – 9
Normalisation and database design are two closely integrated terms. In this section, we will study the relationship between the two. Database design refers to the process of moving from real-life business models to a database model that meets their requirements. Normalisation is one such design technique.
You have already studied normalisation in detail. Normalisation, as you have learnt earlier, is a
technique that is used for designing relations in which data redundancies are minimised.
By using the normalisation technique, we want to design a relational database that has the following set of properties:
It holds all the data required for the purposes that the database is to serve.
It must hold multiple values for the types of data that require them.
You have studied that there are mainly five normal forms. However, of these, three forms are most commonly used in practice: the first, second, and third normal forms. When you convert an ER (Entity-Relationship) model in the Third Normal Form (3NF) to a relational model:
Relationships are represented as data references (primary and foreign key references).
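As a minimal illustration of such key references (table and column names are assumed, not taken from the text), a 1:N relationship becomes a foreign key column in the child table, and the DBMS rejects rows that violate the reference:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
cur = con.cursor()

# The 1:N Department-Employee relationship becomes a primary/foreign
# key reference in the relational model.
cur.execute("CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DName TEXT)")
cur.execute("CREATE TABLE Employee (EmpNo INTEGER PRIMARY KEY, EName TEXT, "
            "DeptNo INTEGER REFERENCES Department(DeptNo))")

cur.execute("INSERT INTO Department VALUES (10, 'Sales')")
cur.execute("INSERT INTO Employee VALUES (1, 'Ravi', 10)")  # valid reference

# A row pointing at a non-existent department violates the reference.
try:
    cur.execute("INSERT INTO Employee VALUES (2, 'Mina', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```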
The Third Normal Form is considered the standard normal form from the viewpoint of the relational database model. Normalised database tables are easy to maintain and are also easily understood by developers. However, a fully normalised database is not necessarily the best database design. In most cases, it is suggested that the database be optimised up to the third normal form. Therefore, we are often required to denormalise our database relations (you will study denormalisation in detail in the next section, 3.6) so as to meet the optimum performance level. Thus, we can say that an efficiently normalised database has the following advantages:
SELF-ASSESSMENT QUESTIONS – 10
26. From a __________ point of view, it is standard to have tables that are in Third
Normal Form.
27. According to relational database rules, a completely normalised database
always has the best performance. (True/False).
28. Denormalisation is done to increase the performance of the database.
(True/False).
11. DENORMALISATION
Normalisation is implemented to preserve data integrity. Nevertheless, in a real-world project, you
need some level of data redundancy for reasons relating to performance or maintaining history.
During the normalisation process, you need to decompose database tables into smaller tables. However, if you create more tables, the database needs to execute more joins while resolving queries. But remember, joins have a negative effect on performance. Hence, denormalisation is done to enhance performance.
Denormalisation is the process of converting higher normal forms to lower normal forms with the
objective of getting faster access to the database.
Keep in mind that denormalisation is a common and essential element of the database design
process, but it must follow appropriate normalisation.
Techniques used for denormalisation: There are four main techniques used for
denormalisation. Below is a brief summary of the techniques:
Duplicate Data: The easiest technique is the method of adding duplicate data into the relational
table. Doing this will help to minimise the number of joins which are required to execute a given
query. It also minimises the CPU and I/O resources being utilised as well as boosts up the
performance.
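As a hedged sketch of this technique (table and column names are assumed): duplicating the customer's name from a Customers table into an Orders table lets a report query avoid the join back to Customers entirely.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT)")
# Denormalised design: CustomerName is duplicated from Customers into Orders.
cur.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, "
            "CustomerId INTEGER, CustomerName TEXT, Amount REAL)")

cur.execute("INSERT INTO Customers VALUES (1, 'Asha')")
cur.execute("INSERT INTO Orders VALUES (100, 1, 'Asha', 250.0)")

# Thanks to the duplicated column, this query needs no join with Customers.
name = cur.execute(
    "SELECT CustomerName FROM Orders WHERE OrderId = 100").fetchone()[0]
print(name)  # Asha
```

The trade-off is that the duplicated value must be kept in sync whenever the customer's name changes.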
Summary data: Summarising the data stored in the relational database table is another useful
technique used for denormalising the database. In this technique, the records are summarised in
some summary columns, thereby reducing the number of records stored in a table. This technique
enhances database performance as the database server now needs to process fewer records for
a given query execution.
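A sketch of the summary-data technique, using sqlite3 with a hypothetical Orders table (names and amounts are assumptions): per-customer totals are precomputed once into a summary table, so later queries read one row instead of aggregating every order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Orders (CustomerId INTEGER, Amount REAL)")
cur.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(1, 100.0), (1, 250.0), (2, 75.0)])

# Summary-data denormalisation: precompute per-customer totals once, so a
# later query reads a single summary row instead of scanning every order.
cur.execute("CREATE TABLE OrderSummary AS "
            "SELECT CustomerId, COUNT(*) AS OrderCount, SUM(Amount) AS Total "
            "FROM Orders GROUP BY CustomerId")

total = cur.execute(
    "SELECT Total FROM OrderSummary WHERE CustomerId = 1").fetchone()[0]
print(total)  # 350.0
```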
SELF-ASSESSMENT QUESTIONS – 11
29. Denormalisation is a technique to move from higher to lower normal forms of database modelling in order to get faster access to the database. (True/False)
30. __________ splits tables by rows, thus reducing the number of records per table.
12. SUMMARY
• SQL and QBE are the main types of relational query languages.
• DBMS maintains the data integrity to avoid the wrong information in the database.
• A DBMS implements integrity constraints; therefore, it permits only legal instances to be
stored in the database.
• A PK is known as a 'surrogate/alternate key' when it does not contain genuine data.
• A subquery is simply a query within another Query.
• SQL supports various functions such as max, min, avg, count, etc.
• Transaction Control commands manage changes made by Data Manipulation Language
commands.
• Dynamic SQL permits the creation and submission of SQL queries dynamically, at run time.
13. GLOSSARY
Domain constraints - The set of all the values that an attribute can attain.
Third normal form - A relation is said to be in 3NF if it is in 2NF and does not contain transitive dependencies.
Stored procedures in a Database Management System (DBMS) - Sets of precompiled SQL statements that are stored in the database. These procedures can be called and executed by other programs or scripts, providing a way to encapsulate and manage complex database operations.
Vendor-Specific Syntax - The syntax for creating and executing stored procedures can vary between different database management systems. For example, MySQL, SQL Server, and Oracle have different syntaxes for creating stored procedures.
Relational Model - This makes use of a set of tables to show data as well as the relationships between those data. All tables comprise numerous columns, and every column has an exclusive name.
Here's a simple example of a stored procedure in MySQL:
DELIMITER //
CREATE PROCEDURE GetCustomerDetails(IN customer_id INT)
BEGIN
    SELECT * FROM customers WHERE id = customer_id;
END //
DELIMITER ;
In this example, the stored procedure GetCustomerDetails takes a parameter customer_id and selects the details of the customer with that ID from the customers table.
15. ANSWERS
Terminal Questions
Answer 1: SQL refers to Structured Query language. Refer to Section 3 for more details.
Answer 2: Relations can be created by use of the Create command. Refer to Section 5 for more
details.
Answer 3: The three basic components of the select statement are SELECT, FROM and WHERE.
Refer to Section 6 for more details.
Answer 4: These commands are the DML commands. Refer to Section 6 for more details.
Answer 5: Create and Alter commands are used to create and alter database objects. Refer to
Section 5 for more details.
Answer 6: DDL refers to data definition language. Refer to Section 5 for more details.
Answer 7: Every transaction must follow the ACID property. Refer to Section 9 for more details.
Answer 8: The primary key is used to uniquely identify a row. Refer to Section 4 for more details.
Answer 9: Dynamic SQL allows a query to be constructed (and executed) at run-time. Refer to
Section 10 for more details.
Answer 10: There are mainly three types of anomalies in a database: the first is redundancy, the
second is inconsistency, and the third is update. Refer to Section 3.3 for more details.
Answer 11: Functional dependency is a type of constraint in which an attribute is dependent upon
another attribute. Refer to Section 3.2 for more details.
Answer 12: Normalisation is the process of designing a good database by converting it into various normal forms, eliminating the database anomalies. Refer to Section 3.4 for more details.
Answer 13: In 1NF, all attribute values of a relation are atomic in nature. Refer to Section 3.4 for
more details.
Answer 14: When all the non-key attributes of a relational schema are fully functionally dependent
on the primary key, then that relation is said to be in 2NF. Refer to Section 3.4 for more details.
Answer 15: A transitive dependency is a condition where one attribute is functionally dependent
on another non-key attribute. Refer to Section 3.4 for more details.
Answer 16: A table is said to be in 3NF if it is in 2NF, and it does not contain any transitive
dependencies. Refer to Section 3.4 for more details.
Answer 17: Boyce-Codd Normal Form (BCNF) is a stricter case of 3NF where all the determinants are also candidate keys. Refer to Section 3.4 for more details.
Answer 18: The 4NF table necessarily has two conditions, i.e. firstly, it must be in Boyce-Codd
normal form, and secondly, it must be free from any multi-valued dependencies. Refer to Section
3.4 for more details.
Answer 19: Denormalisation is done to enhance the performance of a normalised database. Refer
to Section 3.6 for more details.
Answer 20: A stored procedure is a named collection of one or more SQL statements and
procedural logic stored in the database.
16. REFERENCES
• Peter Rob, Carlos Coronel, "Database Systems: Design, Implementation, and Management",
(7th Ed.), Thomson Learning
• Silberschatz, Korth, Sudarshan, "Database System Concepts", (4th Ed.), McGraw-Hill
• Elmasri, Navathe, "Fundamentals of Database Systems", (3rd Ed.), Pearson Education Asia
E-References
• http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm
• http://msdn.microsoft.com/en-us/library/windows/desktop/ms714570%28v=vs.85%29.aspx
• http://beginner-sql-tutorial.com/sql-commands.htm