Unit 2 - Dbms-Complete
UNIT II
CHAPTER 4
RELATIONAL DATA MODEL AND
RELATIONAL DATABASE CONSTRAINTS
4.1.1 Domain: A domain D is a set of atomic values. Each value in the domain is indivisible. A
common method of specifying a domain is to specify a data type from which the data values
forming the domain are drawn.
Example:
Set of telephone numbers- Numbers
Set of student names- Character strings representing names of students
They are logical definitions of domains. A data type or format is specified for each domain.
4.1.2 Relation Schema: A relation schema R(A1, A2, …, An) is made up of a relation name R and a list
of attributes A1, A2, …, An. Each attribute Ai is the name of a role played by some domain D in
the relation schema R. D is called the domain of Ai and is denoted by dom(Ai). A relation
schema is used to describe a relation. The degree of a relation is the number of attributes n in its
relation schema. A relation (or relation state) r of the relation schema R(A1, A2, …, An), also
denoted by r(R), is a set of n-tuples r = {t1, t2, …, tm}. Each n-tuple t is an ordered list of n values
t = <v1, v2, …, vn>, where each value vi, 1 <= i <= n, is an element of dom(Ai) or a special null
value.
Domain Constraint: This constraint specifies that within each tuple the value of each
attribute A must be an atomic value from the domain dom(A). The data types for domain
include standard numeric data types for integer and real values, fixed length and variable
length strings.
Key Constraints: A relation is a set of tuples. By definition all the elements in a set are
distinct.
No two tuples can have the same combination of values for all attributes.
Let SK be a subset of the attributes of R with the property that no two tuples in any relation
state r of R have the same combination of values for these attributes.
That is, for any two distinct tuples t1 and t2 in r, $t_1[SK] \neq t_2[SK]$.
Such a set of attributes is called a super key of the relation schema R. A super key SK
specifies a uniqueness constraint: no two tuples in the state r of R can have the same value
for SK. Every relation has at least one default super key: the set of all its attributes.
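Candidate Key: A key is a minimal super key, i.e. a super key from which no attribute can be
removed without losing the uniqueness property. A relation schema can have more than one key;
each of these keys is called a candidate key.
Primary key: The candidate key whose values are chosen to identify tuples in a relation is called
the primary key.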
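Because a key is a property of the schema, a single relation state can only show that a set of attributes is not a super key. The minimal Python sketch below checks the uniqueness condition t1[SK] ≠ t2[SK] on one state and lists the minimal super keys that this state allows; the STUDENT data, its attributes and both function names are hypothetical.

```python
from itertools import combinations

# Hypothetical relation state STUDENT(Rollno, Name, Phone); each row is a tuple.
ATTRS = ("Rollno", "Name", "Phone")
STUDENT = [
    (1, "Priya", "9000000001"),
    (2, "Rama", "9000000002"),
    (3, "Priya", "9000000003"),
]


def is_superkey(attrs, rows, sk):
    """True if no two distinct tuples have the same values t[SK] for the attributes in sk."""
    positions = [attrs.index(a) for a in sk]
    seen = set()
    for row in rows:
        value = tuple(row[i] for i in positions)  # t[SK]
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(attrs, rows):
    """Minimal super keys: super keys that do not contain a smaller super key."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for sk in combinations(attrs, size):
            if is_superkey(attrs, rows, sk) and not any(set(k) <= set(sk) for k in keys):
                keys.append(sk)
    return keys


print(is_superkey(ATTRS, STUDENT, ("Name",)))  # False: two tuples share "Priya"
print(is_superkey(ATTRS, STUDENT, ATTRS))      # True: the default super key
print(candidate_keys(ATTRS, STUDENT))          # [('Rollno',), ('Phone',)]
```

Either candidate key could then be chosen as the primary key.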
Constraints on null: This is a constraint on attributes that specifies whether null values are
permitted or not. E.g. if every student tuple must have a valid, non-null value for the Name
attribute, then the Name attribute is constrained to be NOT NULL.
Entity Integrity Constraint: It states that no primary key value can be null. This is because
primary keys are used to identify the tuples in a relation.
4.3.1 Insert operation: It provides a list of attribute values for a new tuple that is to be inserted
into R. Insert operation can violate the following constraints:
Domain constraint can be violated if an attribute value does not match the specified
domain.
Key constraint can be violated if a key value in the new tuple t already exists in another
tuple in the relation r(R).
Entity integrity constraint can be violated if the primary key of the new tuple t is null.
Referential integrity constraint can be violated if the value of any foreign key in t refers
to a tuple that does not exist in the referenced relation.
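The four checks listed above can be sketched in Python as follows. This is a minimal illustration, not how a real DBMS enforces constraints: the DEPARTMENT and EMPLOYEE schemas, their data, the chosen data types and the function check_insert are all hypothetical.

```python
# Hypothetical schemas (not from the text):
#   DEPARTMENT(Dnumber, Dname) with primary key Dnumber
#   EMPLOYEE(Ssn, Name, Dno)   with primary key Ssn and foreign key Dno -> DEPARTMENT.Dnumber
DEPARTMENT = [{"Dnumber": 1, "Dname": "HR"}, {"Dnumber": 4, "Dname": "Sales"}]
EMPLOYEE = [{"Ssn": "111", "Name": "Rama", "Dno": 4}]


def check_insert(t):
    """Return the names of the constraints that inserting tuple t into EMPLOYEE would violate."""
    violations = []
    # Domain constraint: Ssn must be a string and Dno an integer (null is handled separately).
    if (t["Ssn"] is not None and not isinstance(t["Ssn"], str)) or (
        t["Dno"] is not None and not isinstance(t["Dno"], int)
    ):
        violations.append("domain")
    # Entity integrity: the primary key value may not be null.
    if t["Ssn"] is None:
        violations.append("entity integrity")
    # Key constraint: the primary key value must not already exist in r(EMPLOYEE).
    elif any(e["Ssn"] == t["Ssn"] for e in EMPLOYEE):
        violations.append("key")
    # Referential integrity: a non-null foreign key must refer to an existing DEPARTMENT tuple.
    if t["Dno"] is not None and not any(d["Dnumber"] == t["Dno"] for d in DEPARTMENT):
        violations.append("referential integrity")
    return violations


print(check_insert({"Ssn": "222", "Name": "Gita", "Dno": 1}))  # []
print(check_insert({"Ssn": "111", "Name": "Rita", "Dno": 4}))  # ['key']
print(check_insert({"Ssn": None, "Name": "Ravi", "Dno": 7}))   # ['entity integrity', 'referential integrity']
```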
4.3.3 Update Operation: This operation is used to modify (change) the values of one or more
attributes in one or more tuples of a relation R. A condition on the attributes of the
relation must be specified to select the tuple(s) to be modified. Updating an attribute that is
neither part of the primary key nor part of a foreign key usually causes no problems. Modifying a
primary key value is similar to deleting one tuple and inserting another in its place, and modifying
a foreign key value requires checking that the new value refers to an existing tuple in the
referenced relation. When updating, the DBMS checks that the new value is of the correct data type
and domain.
Assignment
Short Answers (2 marks)
1. Define primary key
2. Define candidate key
3. Define foreign key
4. Define domain of a relation
5. Define relation schema
6. Define tuple in a relation
7. Define attribute in a relation
CHAPTER 5
RELATIONAL ALGEBRA
The basic set of operations for the relational model is the relational algebra. These operations
enable a user to specify basic retrieval requests. The result of a retrieval is a new relation,
which may have been formed from one or more relations.
5.1.1 SELECT Operation: The select operation is used to select a subset of tuples from a
relation that satisfy a selection condition. It can be considered as a filter that keeps only those
tuples that satisfy a qualifying condition. The select operation can be visualized as a horizontal
partition of the relation into two sets of tuples- those that satisfy the condition are selected and
those that do not satisfy the condition are discarded. Select operation is denoted by
$\sigma_{\langle selection\ condition \rangle}(R)$
E.g. To select employee tuples whose department is 4, it can be specified as follows:
$\sigma_{DNO=4}(EMPLOYEE)$
The boolean expression specified in the selection condition is made up of a number of clauses of
the form
<attribute name> <comparison operator> <constant value>
OR
<attribute name> <comparison operator> <attribute name>
The comparison operators can be any of the elements in the set $\{=, <, \leq, >, \geq, \neq\}$.
Two or more selection conditions can be combined using boolean operators like AND, OR and
NOT.
Condition1 AND condition2 is true only if both condition1 and condition2 are true; otherwise it
is false.
Condition1 OR condition2 is true if either condition1 or condition2 (or both) is true; otherwise it is false.
NOT condition is true if condition is false, and false otherwise.
The select operation is commutative, i.e.
$\sigma_{condition1}(\sigma_{condition2}(R)) = \sigma_{condition2}(\sigma_{condition1}(R))$
A sequence of selects can be applied in any order. We can also combine a cascade of select
operations into a single select operation with an AND condition, i.e.
$\sigma_{condition1}(\sigma_{condition2}(\ldots\sigma_{conditionN}(R)\ldots)) = \sigma_{condition1\ AND\ condition2\ AND\ \ldots\ AND\ conditionN}(R)$
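The commutativity and cascade rules can be checked on a small example. In this minimal Python sketch a relation is a list of dicts and a selection condition is an ordinary Boolean function; the EMPLOYEE data and the names select, in_dept_4 and high_salary are hypothetical.

```python
# Hypothetical EMPLOYEE relation; each tuple is a dict from attribute name to value.
EMPLOYEE = [
    {"NAME": "Rama", "DNO": 4, "SAL": 30000},
    {"NAME": "Gita", "DNO": 5, "SAL": 40000},
    {"NAME": "Rita", "DNO": 4, "SAL": 50000},
]


def select(condition, relation):
    """sigma_condition(relation): keep only the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]


def in_dept_4(t):
    return t["DNO"] == 4


def high_salary(t):
    return t["SAL"] > 35000


# Commutativity: applying the two selects in either order gives the same result.
assert select(in_dept_4, select(high_salary, EMPLOYEE)) == select(high_salary, select(in_dept_4, EMPLOYEE))

# Cascade: nested selects are equivalent to one select with an AND condition.
assert select(in_dept_4, select(high_salary, EMPLOYEE)) == select(
    lambda t: in_dept_4(t) and high_salary(t), EMPLOYEE
)

print([t["NAME"] for t in select(in_dept_4, EMPLOYEE)])  # ['Rama', 'Rita']
```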
5.1.2 PROJECT Operation: Project operation selects certain columns from the table and
discards other columns. It can be visualized as a vertical partitioning of the relation into two
relations: one has the required columns and contains the result of the operation, and the other
contains the discarded columns. The project operation is denoted by
$\pi_{\langle attribute\ list \rangle}(R)$
Here, <attribute list> is the desired list of attributes that are to be projected from the relation R
For example, if the name and salary of each employee are to be listed, it can be written as follows:
$\pi_{NAME,SAL}(EMPLOYEE)$
The number of tuples in the relation resulting from a project operation is always less than or
equal to the number of tuples in R, because duplicate tuples are eliminated from the result.
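The duplicate elimination that keeps the result size at most |R| can be seen in this minimal Python sketch; the EMPLOYEE rows (two employees with the same name and salary) and the function project are hypothetical.

```python
# Hypothetical EMPLOYEE relation: two tuples agree on NAME and SAL.
EMPLOYEE = [
    {"NAME": "Rama", "SAL": 30000, "DNO": 4},
    {"NAME": "Gita", "SAL": 40000, "DNO": 5},
    {"NAME": "Rama", "SAL": 30000, "DNO": 5},
]


def project(attribute_list, relation):
    """pi_attribute_list(relation): keep the listed columns and drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attribute_list)
        if row not in result:  # a relation is a set, so duplicates are eliminated
            result.append(row)
    return result


names_salaries = project(["NAME", "SAL"], EMPLOYEE)
print(names_salaries)  # [('Rama', 30000), ('Gita', 40000)]
```

If the attribute list contains a key of R, no duplicates can arise and the result has exactly as many tuples as R.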
Union Compatibility:
The set operations union, intersection and set difference are binary operations, i.e. they are
applied to two relations. However, the relations must have the same type of tuples. Two relations
R(A1, A2, …, An) and S(B1, B2, …, Bn) are said to be union compatible if they have the same
degree n and if $dom(A_i) = dom(B_i)$ for $1 \leq i \leq n$,
i.e. the two relations have the same number of attributes and each corresponding pair of
attributes has the same domain. This condition is known as union compatibility.
Example: Two union Compatible relations
STUDENT
FNAME LNAME
Suresh Rao
Ramesh Krishna
Ravi Reddy
Vipul Kumar
Vinay Kumar
Sachin Kumar
INSTRUCTOR
FNAME LNAME
Sachin Kumar
Rohit Sharma
Ravi Reddy
Let R and S be two union compatible relations. The given set of binary operations can be defined
as follows:
5.2.1 UNION – The result of this operation is denoted by $R \cup S$. The result is a relation that
includes all tuples that are in R or in S or in both R and S. Duplicate tuples are eliminated.
For example, $STUDENT \cup INSTRUCTOR$ is:
FNAME LNAME
Suresh Rao
Ramesh Krishna
Ravi Reddy
Vipul Kumar
Vinay Kumar
Sachin Kumar
Rohit Sharma
5.2.2 INTERSECTION – The result of this operation is denoted by $R \cap S$. It is a relation
that includes all tuples that are present in both R and S. For example, STUDENT ∩ INSTRUCTOR
contains the two tuples (Ravi, Reddy) and (Sachin, Kumar).
5.2.3 SET DIFFERENCE – The result of this operation is denoted by R-S. It is a relation that
includes all tuples that are in R but not in S.
Examples:
a) STUDENT-INSTRUCTOR
FNAME LNAME
Suresh Rao
Ramesh Krishna
Vipul Kumar
Vinay Kumar
b) INSTRUCTOR-STUDENT
FNAME LNAME
Rohit Sharma
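Because relations are sets, Python's built-in set operators behave like the relational union, intersection and difference on union-compatible relations. This minimal sketch reuses the STUDENT and INSTRUCTOR tuples from the example; representing each tuple as an (FNAME, LNAME) pair is an assumption of the sketch.

```python
# STUDENT and INSTRUCTOR from the example, as sets of (FNAME, LNAME) tuples.
STUDENT = {
    ("Suresh", "Rao"), ("Ramesh", "Krishna"), ("Ravi", "Reddy"),
    ("Vipul", "Kumar"), ("Vinay", "Kumar"), ("Sachin", "Kumar"),
}
INSTRUCTOR = {("Sachin", "Kumar"), ("Rohit", "Sharma"), ("Ravi", "Reddy")}

union = STUDENT | INSTRUCTOR                 # STUDENT ∪ INSTRUCTOR
intersection = STUDENT & INSTRUCTOR          # STUDENT ∩ INSTRUCTOR
student_minus_instructor = STUDENT - INSTRUCTOR
instructor_minus_student = INSTRUCTOR - STUDENT

print(len(union))                        # 7 (duplicates are eliminated)
print(sorted(intersection))              # [('Ravi', 'Reddy'), ('Sachin', 'Kumar')]
print(sorted(instructor_minus_student))  # [('Rohit', 'Sharma')]
```

The two differences give different results, which shows that set difference is not commutative, while union and intersection are.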
a) Equi-Join – A join whose join condition involves only equality comparisons on the attributes
of the two tables is called an equi-join. The result of an equi-join always has one or more pairs
of attributes that have identical values in every tuple (here, DNO and DNUM).
EMP
EMPNO ENAME DNO
E001 RAMA D001
E002 GITA D002
E003 RITA D001
DEPT
DNUM DNAME
D001 HR
D002 SALES
$EMP \bowtie_{DNO=DNUM} DEPT$
EMPNO ENAME DNO DNUM DNAME
E001 RAMA D001 D001 HR
E002 GITA D002 D002 SALES
E003 RITA D001 D001 HR
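A join can be evaluated with a simple nested loop over both relations. This minimal Python sketch reproduces the result above using the EMP and DEPT tuples from the example; the function name equi_join and the use of column positions for the join attributes are assumptions of the sketch, and real DBMSs usually use faster join algorithms.

```python
# EMP(EMPNO, ENAME, DNO) and DEPT(DNUM, DNAME) from the example.
EMP = [("E001", "RAMA", "D001"), ("E002", "GITA", "D002"), ("E003", "RITA", "D001")]
DEPT = [("D001", "HR"), ("D002", "SALES")]


def equi_join(r, s, r_col, s_col):
    """Nested-loop equi-join: concatenate every pair of tuples whose join columns are equal."""
    return [tr + ts for tr in r for ts in s if tr[r_col] == ts[s_col]]


for row in equi_join(EMP, DEPT, 2, 0):  # join condition DNO = DNUM
    print(row)
# ('E001', 'RAMA', 'D001', 'D001', 'HR')
# ('E002', 'GITA', 'D002', 'D002', 'SALES')
# ('E003', 'RITA', 'D001', 'D001', 'HR')
```

Both DNO and DNUM survive in every result tuple, which is the redundancy that the natural join removes.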
b) Natural Join – Since one attribute of each such pair is superfluous, the natural join is used to
eliminate it: each join attribute appears only once in the result. The definition of natural join
requires that the two join attributes have the same name in both relations. If the two attributes
do not have the same name, a renaming operation is applied first.
EMP
EMPNO ENAME DNAME
E001 RAMA HR
E002 GITA SALES
E003 RITA HR
DEPT
DNAME DLOCATION
HR CHENNAI
SALES KANNUR
EMP⋈ DEPT
EMPNO ENAME DNAME DLOCATION
E001 RAMA HR CHENNAI
E002 GITA SALES KANNUR
E003 RITA HR CHENNAI
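A minimal Python sketch of the natural join, using the EMP and DEPT tuples from this example stored as dicts: the join attributes are found by comparing attribute names, and merging the two dicts keeps each shared attribute only once. The function name natural_join is hypothetical.

```python
# EMP and DEPT from the natural-join example; each tuple is a dict.
EMP = [
    {"EMPNO": "E001", "ENAME": "RAMA", "DNAME": "HR"},
    {"EMPNO": "E002", "ENAME": "GITA", "DNAME": "SALES"},
    {"EMPNO": "E003", "ENAME": "RITA", "DNAME": "HR"},
]
DEPT = [
    {"DNAME": "HR", "DLOCATION": "CHENNAI"},
    {"DNAME": "SALES", "DLOCATION": "KANNUR"},
]


def natural_join(r, s):
    """Join on every attribute name the two relations share; shared attributes appear once."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # here: {"DNAME"}
    return [{**tr, **ts} for tr in r for ts in s if all(tr[a] == ts[a] for a in common)]


for row in natural_join(EMP, DEPT):
    print(row["EMPNO"], row["ENAME"], row["DNAME"], row["DLOCATION"])
```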
c) Theta Join – A join whose join condition may use any of the comparison operators
{=, <, >, <=, >=, !=} on the attributes is called a theta join. An equi-join is the special case of
a theta join in which only = is used.
CAR
CarModel CarPrice
CarA 20,000
CarB 30,000
CarC 50,000
BOAT
BoatModel BoatPrice
Boat1 10,000
Boat2 40,000
Boat3 60,000
$CAR \bowtie_{CarPrice \geq BoatPrice} BOAT$
CarModel CarPrice BoatModel BoatPrice
CarA 20,000 Boat1 10,000
CarB 30,000 Boat1 10,000
CarC 50,000 Boat1 10,000
CarC 50,000 Boat2 40,000
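The theta join generalizes the nested-loop join by passing the comparison operator in as a parameter. This minimal Python sketch uses the CAR and BOAT tuples from the example (prices stored as integers); the function name theta_join is hypothetical, while operator.ge is the standard-library function for >=.

```python
import operator

# CAR(CarModel, CarPrice) and BOAT(BoatModel, BoatPrice) from the example.
CAR = [("CarA", 20000), ("CarB", 30000), ("CarC", 50000)]
BOAT = [("Boat1", 10000), ("Boat2", 40000), ("Boat3", 60000)]


def theta_join(r, s, r_col, op, s_col):
    """Concatenate each pair of tuples for which op(r_tuple[r_col], s_tuple[s_col]) is true."""
    return [tr + ts for tr in r for ts in s if op(tr[r_col], ts[s_col])]


result = theta_join(CAR, BOAT, 1, operator.ge, 1)  # join condition CarPrice >= BoatPrice
for row in result:
    print(row)
# ('CarA', 20000, 'Boat1', 10000)
# ('CarB', 30000, 'Boat1', 10000)
# ('CarC', 50000, 'Boat1', 10000)
# ('CarC', 50000, 'Boat2', 40000)
```

Passing operator.eq instead of operator.ge turns the same function into an equi-join.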
Left Outer Join (⟕): In R ⟕ S, this operation keeps all the tuples in the first (left) relation R.
If no matching tuple is found in S, the attributes of S in the join result are padded with null
values. In the examples below, the relations are joined on DeptName.
Employee
Name EmpId DeptName
Hari 3415 Finance
Samit 2241 Sales
Geetha 3401 Finance
Haritha 2202 Sales
Tom 1123 Executive
Dept
DeptName Manager
Sales Haritha
Production Charles
Employee ⟕ Dept
Name EmpId DeptName Manager
Hari 3415 Finance NULL
Samit 2241 Sales Haritha
Geetha 3401 Finance NULL
Haritha 2202 Sales Haritha
Tom 1123 Executive NULL
Right Outer Join (⟖): In R ⟖ S, this operation keeps all the tuples in the second (right) relation S.
If no matching tuple is found in R, the attributes of R in the join result are padded with null values.
Employee
Name EmpId DeptName
Hari 3415 Finance
Samit 2241 Sales
Geetha 3401 Finance
Haritha 2202 Sales
Tom 1123 Executive
Dept
DeptName Manager
Sales Haritha
Production Charles
Employee ⟖ Dept
Name EmpId DeptName Manager
Samit 2241 Sales Haritha
Haritha 2202 Sales Haritha
NULL NULL Production Charles
Full Outer Join (⟗): This operation keeps all the tuples in both the left and the right relations.
When no matching tuple is found in the other relation, the corresponding attributes are padded
with null values as needed.
Employee
Name EmpId DeptName
Hari 3415 Finance
Samit 2241 Sales
Geetha 3401 Finance
Haritha 2202 Sales
Tom 1123 Executive
Dept
DeptName Manager
Sales Haritha
Production Charles
Employee ⟗ Dept
Name EmpId DeptName Manager
Hari 3415 Finance NULL
Samit 2241 Sales Haritha
Geetha 3401 Finance NULL
Haritha 2202 Sales Haritha
Tom 1123 Executive NULL
NULL NULL Production Charles
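The three outer joins above can be sketched in Python with None standing in for NULL. This is a minimal illustration using the Employee and Dept tuples from the example; the function names are hypothetical, and the full outer join is built as the left outer join plus the Dept tuples that matched no employee.

```python
# Employee(Name, EmpId, DeptName) and Dept(DeptName, Manager); join attribute: DeptName.
EMPLOYEE = [
    ("Hari", 3415, "Finance"), ("Samit", 2241, "Sales"), ("Geetha", 3401, "Finance"),
    ("Haritha", 2202, "Sales"), ("Tom", 1123, "Executive"),
]
DEPT = [("Sales", "Haritha"), ("Production", "Charles")]


def left_outer_join(emp, dept):
    """Keep every Employee tuple; pad Manager with None (null) when no Dept tuple matches."""
    result = []
    for name, emp_id, dname in emp:
        managers = [mgr for d, mgr in dept if d == dname]
        for mgr in managers or [None]:
            result.append((name, emp_id, dname, mgr))
    return result


def right_outer_join(emp, dept):
    """Keep every Dept tuple; pad Name and EmpId with None when no Employee tuple matches."""
    result = []
    for dname, mgr in dept:
        matches = [(name, emp_id) for name, emp_id, d in emp if d == dname]
        for name, emp_id in matches or [(None, None)]:
            result.append((name, emp_id, dname, mgr))
    return result


def full_outer_join(emp, dept):
    """Left outer join plus the Dept tuples that matched no Employee tuple."""
    emp_depts = {dname for _, _, dname in emp}
    unmatched = [(None, None, dname, mgr) for dname, mgr in dept if dname not in emp_depts]
    return left_outer_join(emp, dept) + unmatched


for row in full_outer_join(EMPLOYEE, DEPT):
    print(row)
```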
5.3.2 Division Operator:
The division operation is denoted by ÷ and is useful for a special kind of query that
sometimes occurs in database applications. The division operator is applied to two relations
$R(Z) \div S(X)$, where $X \subseteq Z$. Let $Y = Z - X$; the result is a relation T(Y). For a tuple t
to appear in the result T of the division, the values in t must appear in R in combination with
every tuple in S.
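Example: $T = R \div S$
R
Empno Pno
101 1
102 1
103 1
104 1
101 2
103 2
102 3
103 3
104 3
101 4
102 4
103 4
S
Pno
1
4
T
Empno
101
102
103
Employee 104 does not appear in T because R pairs it with project 1 but not with project 4.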
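Division can also be expressed with the other operators: T1 = π_Y(R), T2 = π_Y((T1 × S) − R), T = T1 − T2, where T2 collects the Y values that are missing at least one combination with S. This minimal Python sketch applies that rewriting to the R and S of the example; the function name divide is hypothetical.

```python
from itertools import product

# R(Empno, Pno) and S(Pno) from the example above.
R = {(101, 1), (102, 1), (103, 1), (104, 1), (101, 2), (103, 2),
     (102, 3), (103, 3), (104, 3), (101, 4), (102, 4), (103, 4)}
S = {1, 4}


def divide(r, s):
    """R ÷ S using only projection, Cartesian product and set difference."""
    t1 = {empno for empno, _ in r}                            # pi_Empno(R)
    t2 = {empno for empno, _ in set(product(t1, s)) - r}      # pi_Empno((T1 x S) - R)
    return t1 - t2


print(sorted(divide(R, S)))  # [101, 102, 103]
```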
Example: To retrieve the department number, number of employees in the department and their
average salary, we can write it as
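$\rho_{R(DNO,\ NO\text{-}OF\text{-}EMP,\ AVG\text{-}SAL)}\big({}_{DNO}\,\mathcal{F}_{COUNT\ ENO,\ AVERAGE\ SAL}(EMPLOYEE)\big)$
Here ${}_{DNO}\mathcal{F}$ groups the EMPLOYEE tuples by DNO and applies COUNT and AVERAGE to each
group, and ρ renames the resulting relation and its attributes.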
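The grouping step behind this query can be sketched in Python: collect the tuples of each DNO group, then compute COUNT ENO and AVERAGE SAL per group. The EMPLOYEE(ENO, DNO, SAL) data and the function name are hypothetical, and null handling is ignored.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE(ENO, DNO, SAL) tuples.
EMPLOYEE = [("E1", 1, 30000), ("E2", 1, 40000), ("E3", 2, 50000)]


def count_and_average_by_dept(employees):
    """Group by DNO, then compute COUNT ENO and AVERAGE SAL for each group."""
    groups = defaultdict(list)
    for _eno, dno, sal in employees:
        groups[dno].append(sal)
    return [(dno, len(sals), sum(sals) / len(sals)) for dno, sals in sorted(groups.items())]


# Resulting relation R(DNO, NO-OF-EMP, AVG-SAL)
print(count_and_average_by_dept(EMPLOYEE))  # [(1, 2, 35000.0), (2, 1, 50000.0)]
```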
Assignment
Short Answers (2 marks)
1. What is the purpose of SELECT operation in relational algebra?
2. What is the purpose of PROJECT operation in relational algebra?
3. List the set theory operations in relational algebra.
4. What is union compatibility?
5. What do you mean by outer union operation?
CHAPTER 6
FUNCTIONAL DEPENDENCIES AND NORMALIZATION
6.1 Functional Dependency:
Given a relation R, a set of attributes X in R is said to functionally determine another set of
attributes Y in R, written X→Y, if and only if each value of X is associated with exactly one value
of Y. Here X is called the determinant and Y is called the dependent.
Example: Student table
Rollno Name Sub-code Subject Marks
121 Priya BCA01 DBMS 59
121 Priya BCA02 OOPS 70
122 Rama BCA01 DBMS 80
122 Rama BCA02 OOPS 65
123 Payal BCA03 NETWORKS 70
Here, Rollno does not uniquely identify rows in the table, therefore it cannot be the primary key.
Similarly, Sub-code does not uniquely identify rows in the table. But the combination of Rollno and
Sub-code uniquely identifies a row in the table. Hence (Rollno, Sub-code) together form the
primary key of the table. In this table, Rollno→Name, Sub-code→Subject and {Rollno, Sub-code}→Marks.
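Whether a dependency X→Y is violated in a given table can be checked mechanically: remember the Y value seen for each X value and fail on any conflict. Like a key, an FD is a property of the schema, so one table state can only refute it. This minimal Python sketch uses the Student table above; renaming Sub-code to Sub_code and the function name fd_holds are assumptions of the sketch.

```python
# The Student table from the example, as a list of dicts.
STUDENT = [
    {"Rollno": 121, "Name": "Priya", "Sub_code": "BCA01", "Subject": "DBMS", "Marks": 59},
    {"Rollno": 121, "Name": "Priya", "Sub_code": "BCA02", "Subject": "OOPS", "Marks": 70},
    {"Rollno": 122, "Name": "Rama", "Sub_code": "BCA01", "Subject": "DBMS", "Marks": 80},
    {"Rollno": 122, "Name": "Rama", "Sub_code": "BCA02", "Subject": "OOPS", "Marks": 65},
    {"Rollno": 123, "Name": "Payal", "Sub_code": "BCA03", "Subject": "NETWORKS", "Marks": 70},
]


def fd_holds(rows, x, y):
    """True if, in these rows, each value of the attributes x is associated with one value of y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in x)
        y_val = tuple(t[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


print(fd_holds(STUDENT, ["Rollno"], ["Name"]))               # True
print(fd_holds(STUDENT, ["Rollno"], ["Marks"]))              # False: 121 has marks 59 and 70
print(fd_holds(STUDENT, ["Rollno", "Sub_code"], ["Marks"]))  # True
```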
Normal Forms:
The different forms of normalization that can be applied to relations are as follows:
First Normal Form 1NF
Second Normal Form 2NF
Third Normal Form 3NF
Boyce-Codd Normal Form BCNF
First Normal Form (1NF)
A relation R is said to be in 1NF if every attribute of R takes only single atomic values. To
transform an unnormalized table into 1NF, we identify and remove the repeating groups within the
table.
Example
DEPT
Deptno Deptname DeptLoc
D001 Accounts Chennai
D002 R&D Delhi,Bangalore
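The DEPT relation above is not in 1NF because DeptLoc holds two values for D002. One common way to remove such a repeating group is to move the multivalued attribute into its own relation keyed by (Deptno, DeptLoc). The minimal Python sketch below does this; the relation name DEPT_LOCATIONS and the function to_1nf are hypothetical.

```python
# DEPT with a multivalued DeptLoc, as in the example (locations held in a list).
DEPT = [("D001", "Accounts", ["Chennai"]), ("D002", "R&D", ["Delhi", "Bangalore"])]


def to_1nf(dept):
    """Split DEPT into DEPT(Deptno, Deptname) and DEPT_LOCATIONS(Deptno, DeptLoc)."""
    dept_1nf = [(deptno, deptname) for deptno, deptname, _ in dept]
    dept_locations = [(deptno, loc) for deptno, _, locs in dept for loc in locs]
    return dept_1nf, dept_locations


dept_1nf, dept_locations = to_1nf(DEPT)
print(dept_1nf)        # [('D001', 'Accounts'), ('D002', 'R&D')]
print(dept_locations)  # [('D001', 'Chennai'), ('D002', 'Delhi'), ('D002', 'Bangalore')]
```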
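EMP-PROJ
ECODE P-NUMBER HOURS ENAME PROJ-NAME PLOCATION
The primary key of EMP-PROJ is {ECODE, P-NUMBER}. ENAME depends only on ECODE, and PROJ-NAME and
PLOCATION depend only on P-NUMBER; these partial dependencies are removed by decomposing EMP-PROJ
into the 2NF relations E1, E2 and E3:
E1
ECODE P-NUMBER HOURS
E2
ECODE ENAME
E3
P-NUMBER PROJ-NAME PLOCATION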
Transitive Dependency:
A functional dependency X→Z in a relation R is a transitive dependency if there is a set of
attributes Y in R that is neither a candidate key nor a subset of any key of R, such that X→Y and
Y→Z both hold.
Example:
ECODE ENAME DOB DEPTNO DNAME DMNGR
The above table has a transitive dependency. Here, ECODE→DEPTNO and DEPTNO→DMNGR; hence
ECODE→DMNGR is a transitive dependency through DEPTNO.
Example:
EMPDEPT
ECODE ENAME DOB ADDR DEPTNO DNAME DMNGR
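Decomposition into 3NF (DNAME and DMNGR depend on ECODE only transitively through DEPTNO, so they are moved into a separate DEPT relation):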
EMP
ECODE ENAME DOB ADDR DEPTNO
DEPT
DEPTNO DNAME DMNGR
this dependency if A is a prime attribute (a member of some candidate key) even when X is not a
candidate key, whereas in BCNF X must be a super key. Hence, BCNF is stronger than 3NF.
Example:
PRODUCT(PROD_NO,PNAME,PRICE)
Here, PROD_NO →PNAME,PRICE
In the above example, PROD_NO is the only candidate key, and it is the determinant of every
functional dependency in the relation. Hence the above relation is in BCNF.
Assignment
Short Answers (2 marks)
1. Define normalization
2. Define 1NF
3. Define functional dependency
4. Define transitive dependency
CHAPTER 7
DISK STORAGE
7.1 Introduction:
The collection of data that makes up a computerized database must be physically stored
on a computer storage medium. The DBMS can then retrieve, update and process data as and
when needed. The two main categories of computer storage media are
a. Primary Storage – This category includes storage media that can be operated on directly by the
CPU. It provides fast access to data but has limited storage capacity. Example: RAM, ROM
b. Secondary Storage – Data stored on these devices cannot be processed directly by the
CPU; it must first be copied into primary storage. These devices have larger
capacity, cost less and provide slower access to data. Example: magnetic disks, optical
disks and tapes
hard disk independently of and in parallel with CPU processing. Buffering is most useful when
processes can run concurrently in a parallel fashion, either because a separate disk I/O processor
is available or because multiple CPU processors exist.
Double Buffering: The CPU can start processing a block once its transfer to main memory is
completed. At the same time, the disk I/O processor can be reading and transferring the next block
into a different buffer. The same technique can be used to write a continuous stream of blocks from
memory to disk. It permits continuous reading or writing of data on consecutive disk blocks, thus
eliminating the seek time and rotational delay for all but the first block transfer.
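The benefit of overlapping I/O with processing can be estimated with a simplified timing model. This minimal Python sketch assumes every block takes a constant time t_io to transfer and t_cpu to process, and ignores the seek and rotational delays discussed above; the function names and the numbers in the usage lines are hypothetical.

```python
def single_buffer_time(n_blocks, t_io, t_cpu):
    """One buffer: each block is read and then processed, with no overlap."""
    return n_blocks * (t_io + t_cpu)


def double_buffer_time(n_blocks, t_io, t_cpu):
    """Two buffers: while the CPU processes block i, the I/O processor reads block i+1.

    After the first read, each step costs the slower of I/O and processing;
    the last block still has to be processed after its read completes.
    """
    if n_blocks == 0:
        return 0
    return t_io + (n_blocks - 1) * max(t_io, t_cpu) + t_cpu


print(single_buffer_time(10, 5, 3))  # 80
print(double_buffer_time(10, 5, 3))  # 53
```

When I/O is the slower step, as here, double buffering keeps the disk busy continuously and the CPU time is almost entirely hidden.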
A file may have variable-length records for the following reasons:
The file records are of the same record type, but one or more fields may be of varying size.
The file records are of the same record type, but one or more fields may have multiple values
for individual records. Such fields are called repeating fields.
The file records are of the same record type, but some fields may be optional.
The file may contain records of different record types and hence of varying size.
For variable-length fields, each record has a value for each field, but we do not know the
exact length of some fields. To determine the bytes within a particular record that represent each
field, we can use special characters (such as $, % or ?) that do not appear in any field value to
terminate variable-length fields. These characters are called separator characters.
For records with optional fields, the values in the record can be stored as <field name, field value>
pairs. Separators are used to separate the field name from the field value, to separate one field
from the next, and to separate the values of a repeating field. We can also assign a short field type
code, say an integer number, to each field and store each record as a sequence of
<field-type, field value> pairs. A repeating field needs one separator character to separate the
repeating values of the field and another separator to indicate the termination of the field.
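The separator scheme described above can be sketched in Python as an encoder and decoder for <field name, field value> pairs. The assignment of %, ? and $ to the three separator roles is a hypothetical choice, and so are the function names; an optional field is handled by simply leaving it out of the record.

```python
# Hypothetical separator choices: none of them may appear inside a field value.
NAME_SEP = "%"    # separates a field name from its value
REPEAT_SEP = "?"  # separates the values of a repeating field
FIELD_END = "$"   # terminates each field


def encode(record):
    """Store a record as <field name, field value> pairs delimited by separator characters."""
    parts = []
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for text in [name, *values]:
            if any(sep in text for sep in (NAME_SEP, REPEAT_SEP, FIELD_END)):
                raise ValueError(f"separator character found in {text!r}")
        parts.append(name + NAME_SEP + REPEAT_SEP.join(values) + FIELD_END)
    return "".join(parts)


def decode(data, repeating=()):
    """Parse an encoded record; fields named in `repeating` are returned as lists."""
    record = {}
    for field in data.split(FIELD_END)[:-1]:  # the piece after the final FIELD_END is empty
        name, raw = field.split(NAME_SEP, 1)
        if name in repeating:
            record[name] = raw.split(REPEAT_SEP) if raw else []
        else:
            record[name] = raw
    return record


emp = {"Name": "Smith", "Ssn": "123456789", "Phones": ["9000", "9001"]}
encoded = encode(emp)
print(encoded)  # Name%Smith$Ssn%123456789$Phones%9000?9001$
assert decode(encoded, repeating=("Phones",)) == emp
```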
by block. Deleting a record is also a slow process because we need to first locate the record to
be deleted. The program must first find its block, copy the block into a buffer, delete the record
from the buffer and finally rewrite the block back to the disk. This leaves unused space in the
block, and deleting many records in this way results in wasted storage space. Another technique is
to store an extra bit or byte, called the deletion marker, with each record. A record is deleted by
setting its deletion marker to a certain value; a different value of the marker indicates a valid
(not deleted) record. Both of the above methods require periodic reorganization of the file to
reclaim the unused space. This organization is also referred to as
sequential file organization.
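The deletion-marker technique and the periodic reorganization can be sketched with a single in-memory block. This minimal Python sketch is not a disk implementation: the Block class, its methods and the sample records are hypothetical, and a record's first field is assumed to be its key.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A disk block whose record slots each carry a deletion marker (True = deleted)."""
    slots: list = field(default_factory=list)  # entries are [deleted, record]

    def insert(self, record):
        self.slots.append([False, record])

    def delete(self, key):
        """Mark the first valid record whose first field equals key as deleted."""
        for slot in self.slots:
            deleted, record = slot
            if not deleted and record[0] == key:
                slot[0] = True  # the space is not reclaimed yet
                return True
        return False

    def valid_records(self):
        """Searches consider only records whose marker says 'not deleted'."""
        return [record for deleted, record in self.slots if not deleted]

    def reorganize(self):
        """Periodic reorganization: drop deleted slots to reclaim their space."""
        self.slots = [[False, record] for record in self.valid_records()]


block = Block()
for rec in [("E1", "Rama"), ("E2", "Gita"), ("E3", "Rita")]:
    block.insert(rec)
block.delete("E2")
print(len(block.slots), block.valid_records())  # 3 [('E1', 'Rama'), ('E3', 'Rita')]
block.reorganize()
print(len(block.slots))  # 2
```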
setting the pointer of the occupied hash address location to the address of that overflow
location.
Multiple hashing – The program applies a second hash function if the first results in a
collision.
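The two collision-resolution methods above can be compared on keys that all hash to the same address. This minimal Python sketch assumes a file of M = 7 hash addresses, h1(key) = key mod M and a hypothetical second function h2; the fallback to open addressing (linear probing) when both functions collide is a common choice assumed here, and all names are hypothetical.

```python
M = 7  # hypothetical number of hash address slots


def h1(key):
    return key % M


def h2(key):
    # Hypothetical second hash function, used only by multiple hashing.
    return (key // M) % M


def insert_chaining(slots, overflow, key):
    """Chaining: a colliding record goes to an unused overflow location, and the
    occupied hash address is made to point to that overflow location."""
    addr = h1(key)
    if slots[addr] is None:
        slots[addr] = [key, None]  # [record key, pointer into the overflow area]
        return
    overflow.append([key, slots[addr][1]])  # new overflow record points to the rest of the chain
    slots[addr][1] = len(overflow) - 1      # occupied slot now points to the new overflow location


def chain(slots, overflow, addr):
    """Follow the pointers starting at one hash address and return the keys found."""
    if slots[addr] is None:
        return []
    keys = [slots[addr][0]]
    nxt = slots[addr][1]
    while nxt is not None:
        key, nxt = overflow[nxt]
        keys.append(key)
    return keys


def insert_multiple_hashing(slots, key):
    """Multiple hashing: try h2 if h1 collides, then fall back to open addressing."""
    for addr in (h1(key), h2(key)):
        if slots[addr] is None:
            slots[addr] = key
            return addr
    for i in range(1, M):  # open addressing: linear probing from h2(key)
        addr = (h2(key) + i) % M
        if slots[addr] is None:
            slots[addr] = key
            return addr
    raise RuntimeError("hash file is full")


slots, overflow = [None] * M, []
for k in (10, 17, 24):  # all three keys hash to address 3 under h1
    insert_chaining(slots, overflow, k)
print(chain(slots, overflow, 3))  # [10, 24, 17]

table = [None] * M
print([insert_multiple_hashing(table, k) for k in (10, 17, 24)])  # [3, 2, 4]
```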
Assignment
Short Answers (2 marks)
1. Define mirroring
3. Define striping
3. Define seek time
4. Define block transfer time