Unit 4
Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in a relation. Functional
dependency says that if two tuples have the same values for attributes A1, A2,..., An, then those two
tuples must also have the same values for attributes B1, B2, ..., Bm.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally
determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.
42 abc 17
43 pqr 18
44 xyz 18
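The definition can be checked mechanically on sample data. The following is a minimal sketch in Python, not a DBMS feature: the function name fd_holds and the column names roll_no, name and age are assumptions made for illustration (the sample rows above carry no header).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True


# Hypothetical column names for the three sample rows.
rows = [
    {"roll_no": 42, "name": "abc", "age": 17},
    {"roll_no": 43, "name": "pqr", "age": 18},
    {"roll_no": 44, "name": "xyz", "age": 18},
]

print(fd_holds(rows, ["roll_no"], ["name"]))  # True
print(fd_holds(rows, ["age"], ["name"]))      # False: age 18 appears with two names
```

A check like this can only refute a dependency on the given rows; whether an FD really holds is a property of the schema, not of one sample.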
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
o Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new
tuple into a relation due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion
of data results in the unintended loss of some other important data.
o Update Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated.
First Normal Form is defined in the definition of relations (tables) itself. This rule
defines that all the attributes in a relation must have atomic domains. The values in an
atomic domain are indivisible units.
Each attribute must contain only a single value from its pre-defined domain.
Before we learn about the second normal form, we need to understand the following −
Prime attribute − An attribute, which is a part of a candidate key, is known as
a prime attribute.
Non-prime attribute − An attribute, which is not a part of any candidate key, is said to
be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on the prime key attributes. That is, if X → A holds, then there should not be any proper
subset Y of X, for which Y → A also holds true.
We see here in the Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must
be dependent upon both and not on either prime key attribute individually. But we
find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by
Proj_ID alone. This is called partial dependency, which is not allowed in
Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
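As a rough illustration of removing the partial dependencies, the sketch below projects a small Student_Project table onto narrower attribute sets. The rows, the helper name project, and the choice of three projections are assumptions for illustration; the decomposition in the picture may be shaped differently.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (first-seen order kept)."""
    return list(dict.fromkeys(tuple(r[a] for a in attrs) for r in rows))


# Hypothetical sample data for the Student_Project relation.
student_project = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Asha", "Proj_Name": "Compiler"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Asha", "Proj_Name": "Database"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Ravi", "Proj_Name": "Compiler"},
]

# Stu_Name depends only on Stu_ID, Proj_Name only on Proj_ID.
student = project(student_project, ["Stu_ID", "Stu_Name"])
project_tbl = project(student_project, ["Proj_ID", "Proj_Name"])
assignment = project(student_project, ["Stu_ID", "Proj_ID"])  # who works on what

print(student)      # [(1, 'Asha'), (2, 'Ravi')]
print(project_tbl)  # [('P1', 'Compiler'), ('P2', 'Database')]
print(assignment)   # [(1, 'P1'), (1, 'P2'), (2, 'P1')]
```

Each name is now stored once, so renaming a student or a project touches a single row.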
For a relation to be in Third Normal Form, it must be in Second Normal Form and, for every
non-trivial functional dependency X → A, at least one of the following must hold −
o X is a superkey.
o A is a prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and the only prime
key attribute. We find that City can be identified by Stu_ID as well as by Zip itself.
Zip is not a superkey, and City is not a prime attribute. Additionally, Stu_ID →
Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows
−
12 Codd's Rules
Not every database that has tables and constraints can be called a relational database system, and
supporting only the relational data model does not by itself make a system a Relational Database
System (RDBMS). So, some rules define what a database must satisfy to be a true RDBMS. These rules
were proposed in 1985 by Dr. Edgar F. Codd (E.F. Codd), who did extensive research on the relational
model of database systems. Codd presented 13 rules (numbered 0 to 12) to test a DBMS against his
relational model; a database that follows the rules is called a true relational database (RDBMS).
These 13 rules are popularly known as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form, so that the system can handle the database through its
relational capabilities.
Rule 1: The Information Rule
A database contains various information, and this information must be stored in the cells of tables, in
the form of rows and columns.
Rule 2: The Guaranteed Access Rule
Every single data item (atomic value) must be logically accessible from a relational database using
the combination of primary key value, table name, and column name.
Rule 3: The Systematic Treatment of Null Values
This rule defines the systematic treatment of Null values in database records. A null value can have
various meanings in the database, such as missing data, no value in a cell, inapplicable information, or
unknown data. The primary key should never be null.
Rule 4: Active/Dynamic Online Catalog based on the relational model
The description of the entire logical structure of the database must be stored online in what is
known as a database dictionary (catalog). Authorized users must be able to access it using the same
query language they use to access the database.
Rule 5: The Comprehensive Data Sublanguage Rule
The relational database may support various languages, but at least one language must have a
well-defined linear syntax, be expressible as character strings, and comprehensively support data
definition, view definition, data manipulation, integrity constraints, and transaction management
operations. If the database allows access to the data without such a language, it is considered
a violation of this rule.
Rule 6: The View Updating Rule
All views that are theoretically updatable must also be updatable in practice by the database system.
Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system should support high-level relational operations such as insert, update, and delete
on whole sets of rows, not only on a single row. It should also support union, intersection and minus
operations.
Rule 8: Physical Data Independence Rule
Applications that access the database must be independent of how the data is physically stored.
If the stored data is reorganized or the physical structure of the database is changed, it should
have no effect on external applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means that if any changes occur at the logical level (table
structures), they should not affect the user's view (application). For example, if a table is split
into two tables, or two tables are joined into a single table, these changes should not impact the
user's view or application.
Rule 10: Integrity Independence Rule
Integrity constraints must be defined and enforced by the database itself, for example when data is
inserted into a table's cells using the SQL query language. The database should not rely on any
external factor or application to maintain integrity. This also helps keep the database independent
of each front-end application.
Rule 11: Distribution Independence Rule
The distribution independence rule states that a database must work properly even if it is stored in
different locations and used by different end-users. When a user accesses the database through an
application, they should not be aware that the data is distributed; to them, the data should always
appear to be located at a single site. End users should be able to run their SQL queries in the same way
regardless of where the data is stored.
Rule 12: Non Subversion Rule
The non-subversion rule concerns systems that offer a low-level or separate language, other than SQL,
to access the database. Such a language must not be able to subvert or bypass the integrity rules
when it is used to access or transform data.
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and
HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is
a multi-valued dependency on STU_ID, which leads to unnecessary repetition of
data. To bring the table into 4NF, it is decomposed into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
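The effect of the decomposition can be seen by joining the two tables back together. The sketch below is a plain Python stand-in for a natural join on STU_ID (the function name natural_join is an assumption); it uses the rows of STUDENT_COURSE and STUDENT_HOBBY exactly as listed above.

```python
student_course = [(21, "Computer"), (21, "Math"), (34, "Chemistry"),
                  (74, "Biology"), (59, "Physics")]
student_hobby = [(21, "Dancing"), (21, "Singing"), (34, "Dancing"),
                 (74, "Cricket"), (59, "Hockey")]


def natural_join(courses, hobbies):
    """Pair every course of a student with every hobby of the same student."""
    return [(sid, course, hobby)
            for sid, course in courses
            for hid, hobby in hobbies
            if sid == hid]


joined = natural_join(student_course, student_hobby)
for row in joined:
    print(row)
print(len(joined))  # 7
```

Student 21 comes back with all four course/hobby combinations. That is what a multi-valued dependency on STU_ID means: courses and hobbies vary independently, so a single combined table would have to store every pairing, while the two small tables store each fact once.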
The above relation can be decomposed into the following three tables; therefore, it is not in
5NF −
EmployeeSkills
EmpName EmpSkills
David Java
John JavaScript
Jamie jQuery
Emma Java
EmployeeJob
EmpName EmpJob
David E145
John E146
Jamie E146
Emma E147
JobSkills
EmpSkills EmpJob
Java E145
JavaScript E146
jQuery E146
Java E147
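A 5NF decomposition is only valid if joining all the pieces gives back the relation they came from. As a hedged sketch, the Python below joins the three tables listed above; the variable names are assumptions, and since the original relation is not reproduced in these notes the code only computes the join rather than comparing it with anything.

```python
employee_skills = {("David", "Java"), ("John", "JavaScript"),
                   ("Jamie", "jQuery"), ("Emma", "Java")}
employee_job = {("David", "E145"), ("John", "E146"),
                ("Jamie", "E146"), ("Emma", "E147")}
job_skills = {("Java", "E145"), ("JavaScript", "E146"),
              ("jQuery", "E146"), ("Java", "E147")}

# Join EmployeeSkills with EmployeeJob on EmpName, then keep only the
# (EmpSkills, EmpJob) pairs that also appear in JobSkills.
rejoined = sorted(
    (name, skill, job)
    for name, skill in employee_skills
    for other_name, job in employee_job
    if name == other_name and (skill, job) in job_skills
)

for row in rejoined:
    print(row)
# ('David', 'Java', 'E145')
# ('Emma', 'Java', 'E147')
# ('Jamie', 'jQuery', 'E146')
# ('John', 'JavaScript', 'E146')
```

If the three-way join produced rows that were never in the original relation, the decomposition would be lossy and the join dependency would not hold.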
Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a database
table quickly.
Index structure:
o The first column of the index is the search key, which contains a copy of the primary key
or candidate key of the table. The values of the primary key are stored in sorted order so
that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers holding
the address of the disk block where the value of the particular key can be found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as
ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10
bytes long. The IDs start with 1, 2, 3... and so on, and we have to search for the employee with ID 543.
o In the case of a database with no index, the DBMS has to scan the disk blocks from the start until
it reaches 543. It will read the record after reading 543*10 = 5430 bytes.
o In the case of an index, the DBMS searches the index instead and will read the record
after reading 542*2 = 1084 bytes (taking each index entry as 2 bytes), which is far less than in
the previous case.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as
primary indexing. These primary keys are unique to each record and contain 1:1 relation
between the records.
o As primary keys are stored in sorted order, the performance of the searching operation is
quite efficient.
o The primary index can be classified into two types: Dense index and Sparse index.
Dense index
o The dense index contains an index record for every search key value in the data file. It
makes searching faster.
o In this, the number of records in the index table is same as the number of records in the
main table.
o It needs more space to store index record itself. The index records have the search key and
a pointer to the actual record on the disk.
Sparse index
o In a sparse index, an index record appears only for some of the items in the data file. Each
index record points to a block.
o In this, instead of pointing to each record in the main table, the index points to records
in the main table at intervals.
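The difference between the two kinds of primary index can be sketched in a few lines of Python. The block contents, the block size of three records and the function name sparse_lookup are all assumptions for illustration; a real DBMS stores disk addresses rather than list positions.

```python
from bisect import bisect_right

# Hypothetical data file: sorted search keys stored in blocks of 3 records.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry per search key -> (block number, slot in block).
dense = {key: (b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)}

# Sparse index: one entry per block, holding only that block's first key.
sparse_keys = [blk[0] for blk in blocks]  # [10, 40, 70]


def sparse_lookup(key):
    """Find the block via the sparse index, then scan inside that block."""
    b = bisect_right(sparse_keys, key) - 1  # last block whose first key <= key
    if b < 0:
        return None
    for s, k in enumerate(blocks[b]):
        if k == key:
            return (b, s)
    return None


print(len(dense), len(sparse_keys))  # 9 3
print(dense[50])                     # (1, 1)
print(sparse_lookup(50))             # (1, 1)
print(sparse_lookup(55))             # None
```

The dense index answers directly but needs nine entries; the sparse index needs three entries and pays for it with a short scan inside the chosen block.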
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created on
non-primary key columns which may not be unique for each record.
o In this case, to identify the record faster, we will group two or more columns to get the
unique value and create index out of them. This method is called a clustering index.
o The records which have similar characteristics are grouped, and indexes are created for
these groups.
Example: suppose a company contains several employees in each department. Suppose we use a
clustering index, where all employees which belong to the same Dept_ID are considered within a
single cluster, and index pointers point to the cluster as a whole. Here Dept_Id is a non-unique key.
Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping also grows. These
mappings are usually kept in primary memory so that address fetches are faster. The actual data
is then read from secondary memory using the address obtained from the mapping. If the mapping
size grows, then fetching the address itself becomes slower. In this case, the sparse index will not
be efficient. To overcome this problem, secondary indexing is introduced.
In secondary indexing, to reduce the size of mapping, another level of indexing is introduced. In
this method, the huge range for the columns is selected initially so that the mapping size of the first
level becomes small. Then each range is further divided into smaller ranges. The mapping of the
first level is stored in the primary memory, so that address fetch is faster. The mapping of the
second level and actual data are stored in the secondary memory (hard disk).
For example:
o If you want to find the record of roll 111 in the diagram, then it will search the first-level index
for the highest entry which is smaller than or equal to 111. It will get 100 at this level.
o Then, in the second-level index, it again finds the highest entry that is <= 111 and gets 110. Now,
using the address for 110, it goes to the data block and scans each record until it finds 111.
o This is how a search is performed in this method. Inserting, updating or deleting is also
done in the same manner.
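The lookup for roll 111 can be traced with a small Python sketch. Only the values 100, 110 and 111 come from the example; the other index entries, the ten-record data blocks and the names floor_entry and find are assumptions made for illustration.

```python
from bisect import bisect_right


def floor_entry(keys, target):
    """Return the highest key that is <= target, or None if there is none."""
    i = bisect_right(keys, target) - 1
    return keys[i] if i >= 0 else None


# Hypothetical two-level index: the first level would live in primary memory,
# the second level and the data blocks on disk.
first_level = [100, 200, 300]
second_level = {100: [100, 110, 120], 200: [200, 210, 220], 300: [300, 310, 320]}
data_blocks = {start: list(range(start, start + 10))
               for starts in second_level.values() for start in starts}


def find(roll):
    top = floor_entry(first_level, roll)                 # level 1: coarse range
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)   # level 2: finer range
    for record in data_blocks[block_start]:              # scan the data block
        if record == roll:
            return (top, block_start, record)
    return None


print(find(111))  # (100, 110, 111)
```

Each level narrows the range, so only a small first-level table has to stay in memory.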
Eg:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
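To try the statement without an Oracle installation, the sketch below runs it from Python against an in-memory SQLite database. SQLite is a stand-in here, and the table employee, its columns and the index name idx_employee_dept are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, dept_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, i % 5, f"emp{i}") for i in range(1, 1001)],
)

# Same shape as the statement above: an index over two columns.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_id, emp_id)")

# The catalog now lists the new index.
index_names = [row[0] for row in
               conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)  # ['idx_employee_dept']

# A query that filters on the indexed columns.
row = conn.execute(
    "SELECT name FROM employee WHERE dept_id = ? AND emp_id = ?", (3, 543)
).fetchone()
print(row)  # ('emp543',)
```

Whether the optimizer actually uses the index for a given query depends on the database engine and its statistics.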
PL/SQL is a block structured language that can have multiple blocks in it.
The PL/SQL language covers conditional statements, loops, arrays, strings, exceptions, collections,
records, triggers, functions, procedures, cursors etc.
SQL stands for Structured Query Language. It is used to perform operations on the
records stored in a database, such as inserting records, updating records, deleting
records, and creating, modifying and dropping tables, views etc.
What is PL/SQL
PL/SQL is a block structured language. The programs of PL/SQL are logical blocks that can contain any
number of nested sub-blocks. PL/SQL stands for "Procedural Language extension of SQL" and is used in
Oracle. PL/SQL is integrated with the Oracle database (since version 7). The functionality of PL/SQL
has usually been extended with each release of the Oracle database. Although PL/SQL is closely integrated
with the SQL language, it adds some programming constructs that are not available in SQL.
PL/SQL Functionalities
PL/SQL includes procedural language elements like conditions and loops. It allows declaration of
constants and variables, procedures and functions, types and variables of those types, and triggers. It
supports arrays and can handle exceptions (runtime errors). Since version 8 of the Oracle database,
it has included features associated with object orientation. You can create PL/SQL units like
procedures, functions, packages, types and triggers, etc. which are stored in the database for reuse by
applications.
With PL/SQL, you can use SQL statements to manipulate Oracle data and flow of control statements
to process the data.
The PL/SQL is known for its combination of data manipulating power of SQL with data processing
power of procedural languages. It inherits the robustness, security, and portability of the Oracle
Database.
PL/SQL is not case sensitive so you are free to use lower case letters or upper case letters except within
string and character literals. A line of PL/SQL text contains groups of characters known as lexical units.
It can be classified as follows:
o Delimiters
o Identifiers
o Literals
o Comments
Example of initializing variables
DECLARE
   a integer := 30;
   b integer := 40;
   c integer;
   f real;
BEGIN
   c := a + b;
   dbms_output.put_line('Value of c: ' || c);
   f := 100.0/3.0;
   dbms_output.put_line('Value of f: ' || f);
END;
Value of c: 70
Value of f: 33.333333333333333333
PL/SQL If
PL/SQL supports programming language features like conditional statements and
iterative statements. Its programming constructs are similar to those used in
programming languages like Java and C++.
The IF-THEN syntax is used when you want to execute statements only when the condition is TRUE.
IF condition
THEN
   {...statements to execute when condition is TRUE...}
END IF;
The IF-THEN-ELSE syntax is used when you want to execute one set of statements when the condition
is TRUE or a different set of statements when the condition is FALSE.
IF condition
THEN
   {...statements to execute when condition is TRUE...}
ELSE
   {...statements to execute when condition is FALSE...}
END IF;
The IF-THEN-ELSIF syntax is used when you want to execute one set of statements when condition1 is
TRUE or a different set of statements when condition2 is TRUE.
IF condition1
THEN
   {...statements to execute when condition1 is TRUE...}
ELSIF condition2
THEN
   {...statements to execute when condition2 is TRUE...}
END IF;
The IF-THEN-ELSIF-ELSE syntax is the most advanced form. It is used if you want to execute one set of
statements when condition1 is TRUE, a different set of statements when condition2 is TRUE, or a
different set of statements when both condition1 and condition2 are FALSE.
IF condition1
THEN
   {...statements to execute when condition1 is TRUE...}
ELSIF condition2
THEN
   {...statements to execute when condition2 is TRUE...}
ELSE
   {...statements to execute when both condition1 and condition2 are FALSE...}
END IF;
DECLARE
   a number(3) := 500;
BEGIN
   -- check the boolean condition using if statement
   IF( a < 20 ) THEN
      -- if condition is true then print the following
      dbms_output.put_line('a is less than 20 ' );
   ELSE
      dbms_output.put_line('a is not less than 20 ' );
   END IF;
   dbms_output.put_line('value of a is : ' || a);
END;
After the execution of the above code in SQL prompt, you will get the following result:
a is not less than 20
value of a is : 500
PL/SQL Loop
The PL/SQL loops are used to repeat the execution of one or more statements for specified
number of times. These are also known as iterative control statements.
LOOP
   Sequence of statements;
END LOOP;
The PL/SQL FOR loop is used when you want to execute a set of statements a predetermined number
of times. The loop iterates between the start and end integer values. The counter is always incremented
by 1, and once the counter passes the end integer value, the loop ends.
BEGIN
   FOR k IN 1..10 LOOP
      -- note that k was not declared
      DBMS_OUTPUT.PUT_LINE(k);
   END LOOP;
END;
After the execution of the above code, you will get the following result:
1
2
3
4
5
6
7
8
9
10
Note: You must follow these steps while using PL/SQL FOR Loop.
o You don't need to declare the counter variable explicitly because it is declared implicitly by
the loop itself.
o The counter variable is incremented by 1 and does not need to be incremented
explicitly.
o You can use EXIT WHEN statements and EXIT statements in FOR Loops but it is not done
often.
The PL/SQL WHILE loop is used when a set of statements has to be executed as long as a condition is
true. The condition is evaluated at the beginning of each iteration, and the loop continues until
the condition becomes false.
WHILE <condition>
LOOP
   statements;
END LOOP;
DECLARE
   i INTEGER := 1;
BEGIN
   WHILE i <= 10 LOOP
      DBMS_OUTPUT.PUT_LINE(i);
      i := i+1;
   END LOOP;
END;
After the execution of the above code, you will get the following result:
1
2
3
4
5
6
7
8
9
10
Note: You must follow these steps while using PL/SQL WHILE Loop.
o Initialize a variable before the loop body.
o Increment the variable in the loop.
o You can use EXIT WHEN statements and EXIT statements in While loop but it is not done
often.
PL/SQL Cursor
When an SQL statement is processed, Oracle creates a memory area known as the context area. A cursor
is a pointer to this context area, which contains all the information needed for processing the
statement. In PL/SQL, the context area is controlled through the cursor. A cursor holds information on a
select statement and the rows of data accessed by it.
A cursor is used by a program to fetch and process the rows returned by the SQL statement,
one at a time. There are two types of cursors:
o Implicit Cursors
o Explicit Cursors
Implicit cursors are created by default to process the statements when DML statements like INSERT,
UPDATE, DELETE etc. are executed.
Oracle provides some attributes, known as implicit cursor attributes, to check the status of DML
operations. Some of them are: %FOUND, %NOTFOUND, %ROWCOUNT and %ISOPEN.
For example: When you execute SQL statements like INSERT, UPDATE, DELETE, the cursor
attributes tell whether any rows are affected and how many have been affected. If you run a SELECT
INTO statement in a PL/SQL block, the implicit cursor attributes can be used to find out whether any row
has been returned by the SELECT statement. It will raise an error (the NO_DATA_FOUND exception) if no
data is selected.
The following table specifies the status of the cursor with each of its attributes.
Attribute     Description
%FOUND        Its return value is TRUE if DML statements like INSERT, DELETE and UPDATE affect
              one or more rows, or a SELECT INTO statement returned one or more rows.
              Otherwise it returns FALSE.
%NOTFOUND     Its return value is TRUE if DML statements like INSERT, DELETE and UPDATE
              affect no rows, or a SELECT INTO statement returns no rows. Otherwise it returns
              FALSE. It is just the opposite of %FOUND.
%ISOPEN       It always returns FALSE for implicit cursors, because the SQL cursor is
              automatically closed after executing its associated SQL statements.
%ROWCOUNT     It returns the number of rows affected by DML statements like INSERT, DELETE,
              and UPDATE or returned by a SELECT INTO statement.
Let's execute the following program to update the table and increase salary of each customer by 5000.
Here, SQL%ROWCOUNT attribute is used to determine the number of rows affected:
Create procedure:
DECLARE
   total_rows number(2);
BEGIN
   UPDATE customers
   SET salary = salary + 5000;
   IF sql%notfound THEN
      dbms_output.put_line('no customers updated');
   ELSIF sql%found THEN
      total_rows := sql%rowcount;
      dbms_output.put_line( total_rows || ' customers updated ');
   END IF;
END;
/
Output:
6 customers updated
PL/SQL procedure successfully completed.
Now, if you check the records in customer table, you will find that the rows are updated.
Explicit cursors are defined by programmers to gain more control over the context area. They are
defined in the declaration section of the PL/SQL block and are created on a SELECT statement which
returns more than one row.
Steps:
You must follow these steps while working with an explicit cursor.
1. Declare the cursor in the declaration section:
   CURSOR name IS
      SELECT statement;
2. Open the cursor:
   OPEN cursor_name;
3. Fetch rows from the cursor into variables:
   FETCH cursor_name INTO variable_list;
4. Close the cursor:
   CLOSE cursor_name;
Let's take an example to demonstrate the use of explicit cursor. In this example, we are using the already
created CUSTOMERS table.
Create procedure:
Execute the following program to retrieve the customer name and address.
DECLARE
   c_id customers.id%type;
   c_name customers.name%type;
   c_addr customers.address%type;
   CURSOR c_customers is
      SELECT id, name, address FROM customers;
BEGIN
   OPEN c_customers;
   LOOP
      FETCH c_customers into c_id, c_name, c_addr;
      EXIT WHEN c_customers%notfound;
      dbms_output.put_line(c_id || ' ' || c_name || ' ' || c_addr);
   END LOOP;
   CLOSE c_customers;
END;
/
Output:
1 Ramesh Allahabad
2 Suresh Kanpur
3 Mahesh Ghaziabad
4 Chandan Noida
5 Alex Paris
6 Sunita Delhi
PL/SQL procedure successfully completed.
Transaction
o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions performed by a single user to access or
modify the contents of the database.
Example: Suppose an employee of bank transfers Rs 800 from X's account to Y's
account. This small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores
it in a buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the
buffer.
Let's take the example of a debit transaction on an account, which consists of the following
operations. Assume X's value before the transaction is 4000.
1. R(X);
2. X = X - 500;
3. W(X);
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So the buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will
be 3500.
But it may be possible that, because of a failure of hardware, software or power, etc.,
the transaction fails before finishing all the operations in the set.
For example: If in the above transaction, the debit transaction fails after executing
operation 2, then X's value will remain 4000 in the database, which is not acceptable to
the bank.
Transaction property
The transaction has the four properties. These are used to maintain consistency in a
database, before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of the transaction take place or none of them do; if any
operation cannot complete, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either run to completion or is not executed at all.
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume a transaction T consisting of two parts, T1 and T2. Account A holds
Rs 600 and account B holds Rs 300. T transfers Rs 100 from account A to account B.
T1             T2
Read(A)        Read(B)
A := A - 100   B := B + 100
Write(A)       Write(B)
If the transaction T fails after the completion of transaction T1 but before completion
of transaction T2, then the amount will be deducted from A but not added to B. This
shows the inconsistent database state. In order to ensure correctness of database state,
the transaction must be executed in entirety.
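The all-or-nothing behaviour can be tried out with a short Python sketch using the standard sqlite3 module as a stand-in DBMS. The table name account, the function names and the fail_midway flag are assumptions for illustration; the balances are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
conn.commit()


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src (T1) and credit dst (T2) as one atomic transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure after T1, before T2")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()    # both updates become visible together
    except RuntimeError:
        conn.rollback()  # undo the debit so no partial transfer survives


transfer(conn, "A", "B", 100, fail_midway=True)
after_failure = balances(conn)
print(after_failure)  # {'A': 600, 'B': 300}

transfer(conn, "A", "B", 100)
after_success = balances(conn)
print(after_success)  # {'A': 500, 'B': 400}
```

In both outcomes the total stays at 900: the failed transfer leaves the accounts untouched, and the successful one moves the full amount.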
Consistency
o The integrity constraints are maintained so that the database is consistent before and
after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
o The consistent property of database states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to another
consistent state.
For example: The total amount must be the same before and after the transaction. In the
transfer above, the total is 600 + 300 = 900 before T and 500 + 400 = 900 after T.
Therefore, the database is consistent. In the case when T1 is completed but T2 fails,
inconsistency will occur.
Isolation
o It states that data being used during the execution of one transaction cannot
be used by a second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then that
data item can't be accessed by any other transaction T2 until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property indicates the permanence of the database's consistent
state. It states that the changes made by a committed transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a system
failure. When a transaction is completed, the database reaches a state known as
the consistent state. That consistent state cannot be lost, even in the event of a system
failure.
o The recovery subsystem of the DBMS is responsible for the durability property.
States of Transaction
In a database, the transaction can be in one of the following states -
Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the records
are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data
is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed
in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations
successfully. In this state, all the effects are now permanently saved on the database
system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction
is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state, the database
recovery system makes sure that the database is returned to its previous consistent state
by aborting or rolling back the transaction.
o If the transaction fails in the middle of its execution, all the operations it has executed
so far are rolled back, so that the database returns to the consistent state it was in before
the transaction started.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction
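The states above and the moves between them can be written down as a small table. The Python sketch below is illustrative only: the TRANSITIONS table follows the usual textbook state diagram, the function name run is an assumption, and a re-started transaction is treated as a new transaction that begins again in the active state.

```python
# Allowed moves between transaction states.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),  # terminal state
    "aborted": set(),    # terminal state: the transaction is re-started or killed
}


def run(path):
    """Follow a sequence of states from 'active'; reject illegal moves."""
    state = "active"
    for nxt in path:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition: {state} -> {nxt}")
        state = nxt
    return state


print(run(["partially committed", "committed"]))          # committed
print(run(["partially committed", "failed", "aborted"]))  # aborted
```

A path such as active to committed directly is rejected, because a transaction must pass through the partially committed state first.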