Unit - III RDBMS Notes
Feasibility Study
When designing a database, the purpose for which the database is being
designed must be clearly defined.
In other words, the objective of creating the database must be crystal clear.
Requirement Collection and Analysis
In requirement collection, one has to decide what data are to be stored, and to
some extent, how that data will be used.
The people who are going to use the database must be interviewed repeatedly.
Assumptions about the stated relationships between various parts of the data
must be questioned again and again.
Prototyping and Design
Design is the process of analyzing and organizing data into a form that
supports the business requirements and makes use of strategic technology.
The three phases in relational database design are conceptual design, logical
design, and physical design.
Insertion Anomaly
We cannot insert a department without inserting a member of staff that works in that
department.
Repeating groups are not allowed in a relational design, since all attributes have to be
atomic, i.e., there can only be one value per cell in a table.
If column A of a table uniquely determines column B of the same table, then this can be
represented as A->B (attribute B is functionally dependent on attribute A).
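The definition above can be checked mechanically: A->B holds when no two rows agree on A but differ on B. The following is a minimal sketch of such a check. The function name holds and the sample staff rows are hypothetical and used only for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are column names.
    """
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        a, b = row[lhs], row[rhs]
        if a in seen and seen[a] != b:
            return False  # same A value paired with two different B values
        seen[a] = b
    return True


# Hypothetical sample data, not taken from the notes.
staff = [
    {"staff_id": 1, "name": "Asha", "dept": "Sales"},
    {"staff_id": 2, "name": "Ravi", "dept": "Sales"},
    {"staff_id": 3, "name": "Asha", "dept": "Stores"},
]

print(holds(staff, "staff_id", "name"))  # True
print(holds(staff, "name", "dept"))      # False
```

Note that such a check only shows whether the dependency holds in the current rows. Whether it must always hold is a design decision based on the meaning of the data.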
A multivalued dependency occurs when there are two or more independent
multivalued attributes in a table.
For example: Consider a bike manufacturing company which produces two
colors (black and white) in each model every year.
bike_model manuf_year color
4. Normalization
Normalization allows us to minimize insert, update, and delete anomalies and helps
maintain data consistency in the database.
1. To avoid redundancy by storing each fact within the database only once
2. To put data into a form that can more accurately accommodate change
3. To avoid certain updating “anomalies”
4. To facilitate the enforcement of data constraints
5. To avoid unnecessary coding.
Extra programming in triggers and stored procedures can be required to handle
non-normalized data, and this in turn can impair performance significantly.
Here are the most commonly used normal forms:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
8812121212
102 Jon Kanpur 9900012222
9990000123
104 Lester Bangalore
8123450987
A table is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is
fully dependent on the primary key.
Example: Suppose a school wants to store the data of teachers and the subjects they
teach. Since a teacher can teach more than one subject, the table can have multiple
rows for the same teacher. They create a table that looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in
2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is
a proper subset of the candidate key.
This violates the rule for 2NF, as the rule says “no non-prime attribute is dependent on
a proper subset of any candidate key of the table”.
Teacher details table:
teacher_id teacher_age
111 38
222 38
333 40
Teacher subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
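The split into the two tables above can be reproduced by projecting the original rows, and joining the two projections on teacher_id gives back the original table. The following is a minimal sketch of that check using plain Python lists. The variable names are assumptions made for illustration.

```python
# Rows of the original (teacher_id, subject, teacher_age) table from the example.
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Project onto (teacher_id, teacher_age); the set removes repeated facts.
teacher_details = sorted({(tid, age) for tid, _, age in teacher})

# Project onto (teacher_id, subject).
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Natural join on teacher_id rebuilds the original rows.
age_of = dict(teacher_details)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_details)      # [(111, 38), (222, 38), (333, 40)]
print(rejoined == teacher)  # True
```

Each teacher's age is now stored once, so changing it means updating a single row instead of one row per subject.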
Example: Suppose a company wants to store the complete address of each employee.
They create a table named employee_details that looks like this:
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}, and so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are
not part of any candidate key.
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept_mapping table:
emp_id emp_dept
1001 stores
Isolation
In a DBMS, many transactions may be executed simultaneously.
These transactions should be isolated from each other: the execution of one should not
affect the execution of the others. To enforce this,
the DBMS has to maintain certain scheduling algorithms. One of the scheduling algorithms used is
Serial Scheduling.
Serial Scheduling
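In this scheduling method, transactions are executed one by one, each from start to finish.
A serial schedule has no interleaved execution: the operations of one transaction are never
mixed with those of another. Interleaving is the technique used instead by concurrent
(non-serial) schedules.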
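To see why the order of operations matters for isolation, the following minimal sketch runs the same two transactions once serially and once with their operations interleaved. The run function, the transaction names T1 and T2, and the amounts are all hypothetical and chosen only for illustration.

```python
def run(schedule, balance):
    """Execute a schedule of (transaction, operation, amount) steps on one data item."""
    local = {}  # each transaction's private copy of the value it read
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = balance
        elif op == "add":
            local[txn] += amount
        elif op == "write":
            balance = local[txn]
    return balance


# T1 adds 50 and T2 adds 30 to the same item.
serial = [
    ("T1", "read", 0), ("T1", "add", 50), ("T1", "write", 0),
    ("T2", "read", 0), ("T2", "add", 30), ("T2", "write", 0),
]
interleaved = [
    ("T1", "read", 0), ("T2", "read", 0),
    ("T1", "add", 50), ("T2", "add", 30),
    ("T1", "write", 0), ("T2", "write", 0),
]

print(run(serial, 100))       # 180
print(run(interleaved, 100))  # 130
```

In the interleaved run, T2 overwrites the value written by T1, so the update made by T1 is lost. The serial run gives the result that isolation is meant to guarantee.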
Types of Failures:
Failures are generally classified as transaction, system, and media failures. There are several
possible reasons for a transaction to fail in the middle of execution:
1. A computer failure (system crash): A hardware, software, or network error occurs in the computer
system during transaction execution. Hardware crashes are usually media failures – for example, main
memory failure.
2. A transaction or system error: Some operations in the transaction may cause it to fail, such as
integer overflow or division by zero. Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error.
Transaction States:
A transaction is an atomic unit of work that is either completed in its entirety or not done at all.
For recovery purposes, the system needs to keep track of when the transaction starts,
terminates, and commits or aborts.
Therefore, the recovery manager keeps track of the following operations:
1. Begin transaction: This marks the beginning of transaction execution.
2. Read or write: These specify read or write operations on the database items that are executed
as part of a transaction.
3. End transaction: This specifies that read and write transaction operations have ended and
marks the end of transaction execution.
4. Commit transaction: This signals a successful end of the transaction so that any changes
(updates) executed by the transaction can be safely committed to the database and will not be
undone.
5. Rollback (or abort): This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.
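The operations listed above can be tried out with the sqlite3 module from the Python standard library. The following is a minimal sketch, assuming a hypothetical account table and a hypothetical transfer function. It is one way to illustrate commit and rollback, not a description of how a recovery manager is implemented.

```python
import sqlite3

# isolation_level=None means we issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit of work."""
    conn.execute("BEGIN")  # begin transaction
    try:
        # write operations
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        # read operation
        (bal,) = conn.execute("SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")  # changes become permanent
        return True
    except ValueError:
        conn.execute("ROLLBACK")  # both updates are undone
        return False


def balances(conn):
    """Return the balances of all accounts, ordered by id."""
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]


ok1 = transfer(conn, 1, 2, 30)
print(ok1, balances(conn))  # True [70, 80]
ok2 = transfer(conn, 1, 2, 500)
print(ok2, balances(conn))  # False [70, 80]
```

The second transfer fails after both updates have been executed, and the rollback leaves the balances exactly as they were before it began.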
Types of locks:
Several types of locks are used in concurrency control such as binary locks and
shared/exclusive locks.
Binary Locks: A binary lock can have two states or values: locked and unlocked (or 1 and 0,
for simplicity). A distinct lock is associated with each database item X. If the value of the
lock on X is 1, item X cannot be accessed by a database operation that requests the item. If
the value of the lock on X is 0, the item can be accessed.
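The rule for binary locks can be sketched as a small lock table that keeps one 0/1 value per item. The class and method names below are hypothetical. The sketch is single-threaded and simply refuses a request for a locked item, whereas a real lock manager would place the requesting transaction on a waiting queue.

```python
class BinaryLockTable:
    """Keeps one binary lock (0 = unlocked, 1 = locked) per database item."""

    def __init__(self):
        self.lock = {}  # item name -> 0 or 1; missing items count as unlocked

    def lock_item(self, item):
        """Try to lock item; return True on success, False if it is already locked."""
        if self.lock.get(item, 0) == 1:
            return False  # the requester would have to wait
        self.lock[item] = 1
        return True

    def unlock_item(self, item):
        """Release the lock so that another operation can access the item."""
        self.lock[item] = 0


locks = BinaryLockTable()
first = locks.lock_item("X")   # lock is 0, so access is granted
second = locks.lock_item("X")  # lock is 1, so access is refused
locks.unlock_item("X")
third = locks.lock_item("X")   # lock is 0 again, so access is granted
print(first, second, third)    # True False True
```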
Deadlocks:
A deadlock is a condition in which two (or more) transactions in a set are waiting
simultaneously for locks held by some other transaction in the set.
Neither transaction can continue because each transaction in the set is on a waiting queue,
waiting for one of the other transactions in the set to release the lock on an item.
Thus, a deadlock is an impasse that may result when two or more transactions are each waiting
for locks to be released that are held by the other.
Transactions whose lock requests have been refused are queued until the lock can be granted.
A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other.
Thus in a deadlock, two transactions are mutually excluded from accessing the next record
required to complete their transactions.
Example: A deadlock between two transactions A and B arises in the following situation:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction A has acquired the lock on X and is waiting to acquire the lock on Y, while
Transaction B has acquired the lock on Y and is waiting to acquire the lock on X. Neither of
them can execute further.
Deadlock Detection and Prevention:
Deadlock detection:
This technique allows a deadlock to occur, then detects and resolves it. Here, the
database is periodically checked for deadlocks.
If a deadlock is detected, one of the transactions involved in the deadlock cycle is aborted.
Other transactions continue their execution.
An aborted transaction is rolled back and restarted.
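One common way to carry out the periodic check is to record which transaction is waiting for which in a wait-for graph and to look for a cycle in it. The notes do not name a specific method, so the following is a minimal sketch under that assumption. The function names find_cycle and abort are hypothetical, and choosing the last transaction in the cycle as the victim is an arbitrary choice made for illustration.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None.

    waits_for maps each transaction to the set of transactions whose locks it waits for.
    """
    visiting, done = [], set()

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            return visiting[visiting.index(txn):]  # the cycle itself
        visiting.append(txn)
        for other in sorted(waits_for.get(txn, ())):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in sorted(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


def abort(waits_for, victim):
    """Remove the victim from the graph, as if it were rolled back and its locks released."""
    return {t: w - {victim} for t, w in waits_for.items() if t != victim}


# A waits for B (lock on Y) and B waits for A (lock on X).
graph = {"A": {"B"}, "B": {"A"}}
cycle = find_cycle(graph)
print(cycle)  # ['A', 'B']

graph_after = abort(graph, cycle[-1])  # arbitrary victim: last transaction in the cycle
print(find_cycle(graph_after))  # None
```

After the victim is removed, the remaining transaction no longer waits for anyone and can continue, which matches the behaviour described above.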
6. Database Security
Security refers to the protection of data against unauthorized disclosure, alteration, or
destruction; integrity refers to the accuracy or validity of that data.
To put it a little glibly:
– Security means protecting the data against unauthorized users.
– Integrity means protecting the data against authorized users.
The database security system stores authorization rules and enforces them for each database
access.
The authorization rules define authorized users, allowable operations, and accessible parts of a
database.
When a group of users accesses the data in the database, privileges can be assigned to
groups rather than to individual users. Users are assigned to groups and given passwords.
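Authorization rules of this kind can be pictured as a set of (group, operation, object) entries that is consulted on every access. The following is a minimal sketch of that idea. All group names, user names, and the table name are hypothetical, and a real DBMS stores and enforces such rules internally rather than in application code.

```python
# Hypothetical rules: (group, operation, table). None of these names come from the notes.
RULES = {
    ("clerks", "SELECT", "employee"),
    ("managers", "SELECT", "employee"),
    ("managers", "UPDATE", "employee"),
}

# Users are assigned to groups; privileges are held by the group.
GROUPS = {"asha": {"clerks"}, "ravi": {"managers"}}


def is_allowed(user, operation, table):
    """Return True if any group the user belongs to holds the requested privilege."""
    return any((group, operation, table) in RULES for group in GROUPS.get(user, ()))


print(is_allowed("asha", "SELECT", "employee"))  # True
print(is_allowed("asha", "UPDATE", "employee"))  # False
print(is_allowed("ravi", "UPDATE", "employee"))  # True
```

A user who is not assigned to any group is refused every operation, since no rule applies to them.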
Terminology: DBMSs that support mandatory controls are sometimes called multilevel secure
systems. The term trusted system is also used with much the same meaning.