Dinesh Verma - MCS-043 Advanced Database Management Systems (2021)
ADVANCED DATABASE MANAGEMENT SYSTEMS
MCS-043
For
Masters In Computer Applications [MCA]
Dinesh Verma
S. Roy
Useful For
IGNOU, KSOU (Karnataka), Bihar University (Muzaffarpur), Nalanda
University, Jamia Millia Islamia, Vardhman Mahaveer Open University
(Kota), Uttarakhand Open University, Kurukshetra University, Seva
Sadan’s College of Education (Maharashtra), Lalit Narayan Mithila
University, Andhra University, Pt. Sunderlal Sharma (Open) University
(Bilaspur), Annamalai University, Bangalore University, Bharathiar
University, Bharathidasan University, HP University, Centre for distance
and open learning, Kakatiya University (Andhra Pradesh), KOU
(Rajasthan), MPBOU (MP), MDU (Haryana), Punjab University,
Tamilnadu Open University, Sri Padmavati Mahila Visvavidyalayam
(Andhra Pradesh), Sri Venkateswara University (Andhra Pradesh),
UCSDE (Kerala), University of Jammu, YCMOU, Rajasthan University,
UPRTOU, Kalyani University, Banaras Hindu University (BHU) and all
other Indian Universities.
New Edition
Price: ₹129/-
ISBN: 978-93-81638-09-5
Dear Reader,
Welcome to the world of GullyBaba Publishing House (P) Ltd. Profound,
in-depth study and research can guarantee you the most reliable, latest and accurate
information on the subject. However, as the saying goes, nothing is perfect, but we
still believe that there is always scope for improvement. And we wish to be
nothing less but aim for the best.
You, the reader, can be our best guide in making this book more interesting and
user-friendly.
Your valuable suggestions are welcome!
Feedback about the book can be sent at [email protected].
Publisher.
TOPICS COVERED
Block-1 Database Design and Implementation
CAR and TRUCK can be generalized into the entity type VEHICLE. Therefore,
CAR and TRUCK can now be subclasses of the generalized superclass
VEHICLE.
Figure: Generalization/specialization. VEHICLE is the generalized superclass, with Car, Scooter and Truck as disjoint (d) specialized subclasses.
But if the real-world entities are not disjoint, their sets of entities may overlap; that
is, an entity may be a member of more than one subclass of the specialization.
This is represented by an (o) in the circle. For example, if we classify cars as
luxury cars and cars, the two subclasses will overlap.
In some cases, a single class has a similar relationship with more than one
class. For example, the subclass 'car' may be owned by two different types
of owners:
INDIVIDUAL or ORGANISATION. Both these types of owners are different
classes; such a situation can therefore be modeled with the help of a union (u).
An academic department has the attributes name DName, telephone DPhone,
and office number Office and is related to the faculty member who is its
chairperson CHAIRS and to the college to which it belongs CD. Each college
has attributes college name CName, office number COffice, and the name of
its dean Dean.
A course has attributes course number C#, course name Cname, and course
description CDesc. Several sections of each course are offered, with each
section having the attributes section number Sec# and the year and quarter in
which the section was offered (Year and Qtr). Section numbers uniquely
identify each section. The sections being offered during the current semester
are in a subclass CURRENT_SECTION of SECTION, with the defining
predicate Qtr = CurrentQtr and Year = CurrentYear. Each section is related to
the instructor who taught or is teaching it (TEACH, if that instructor is in the
database).
The category INSTRUCTOR_RESEARCHER is a subset of the union of
FACULTY and GRAD_STUDENT and includes all faculty, as well as graduate
students who are supported by teaching or research. Finally, the entity type
GRANT keeps track of research grants and contracts awarded to the university.
Each grant has attributes grant title Title, grant number No, the awarding
agency Agency, and the starting date StDate. A grant is related to one principal
investigator PI and to all the researchers it supports SUPPORT. Each instance
of support has as attributes the starting date of support Start, the ending date
of the support (if known) End, and the percentage of time being spent on the
project Time by the researcher being supported.
Figure: EER conceptual schema for the UNIVERSITY database. Entity types: FACULTY (FPhone, ...), STUDENT, GRAD_STUDENT (Degrees: College, Degree, Year), GRANT (Title, No, Agency, StDate), DEPARTMENT (DName, DPhone, Office), COLLEGE, COURSE, SECTION (Sec#, Year, Qtr) and its subclass CURRENT_SECTION (Qtr = CurrentQtr and Year = CurrentYear); the category INSTRUCTOR_RESEARCHER (union of FACULTY and GRAD_STUDENT). Relationships: ADVISOR, COMMITTEE, PI, SUPPORT (Start, End, Time), BELONGS, CHAIRS, MAJOR, MINOR, REGISTERED, TRANSCRIPT (Grade), TEACH, CS, CD and DC.
Q7. What are Multivalued Dependencies? When can we say that an
attribute is multidetermined?
Ans. An MVD is a constraint due to multi-valued attributes. A relation must
have at least 3 attributes, out of which two should be multi-valued.
Ans. EMP_PROJECTS
ENAME PNAME
Krishna X
Krishna Y
EMP_DEPENDENCIES
ENAME DNAME
Krishna Vijay
Krishna Manu
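As a minimal sketch, the two 4NF relations above could be declared in SQL as follows (the data types are assumptions; the all-key composite primary keys follow from the MVDs):

CREATE TABLE EMP_PROJECTS (
    ENAME VARCHAR(30),
    PNAME VARCHAR(30),
    PRIMARY KEY (ENAME, PNAME)   -- all-key relation: one MVD component
);
CREATE TABLE EMP_DEPENDENCIES (
    ENAME VARCHAR(30),
    DNAME VARCHAR(30),
    PRIMARY KEY (ENAME, DNAME)   -- all-key relation: the other MVD component
);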
Q9. Does the following relation satisfy MVDs with 4NF? Decompose the
following relation into 5NF.
SUPPLY
SNAME PARTNAME PROJNAME
Krishna Bolt X
Krishna Nut Y
Radha Bolt Y
Vasudev Nut Z
Radha Nail X
Radha Bolt X
Krishna Bolt Y
Ans. No. The relation can be decomposed into 5NF as follows:
R1
SNAME PARTNAME
Krishna Bolt
Krishna Nut
Radha Bolt
Vasudev Nut
Radha Nail
R2
SNAME PROJNAME
Krishna X
Krishna Y
Radha Y
Vasudev Z
Radha X
R3
PARTNAME PROJNAME
Bolt X
Nut Y
Bolt Y
Nut Z
Nail X
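As a hedged check in SQL (assuming tables R1, R2 and R3 are created with the columns shown above), joining all three projections reconstructs SUPPLY exactly, with no spurious tuples, which is what makes this a lossless 5NF decomposition:

SELECT R1.SNAME, R1.PARTNAME, R2.PROJNAME
FROM R1, R2, R3
WHERE R1.SNAME    = R2.SNAME
  AND R1.PARTNAME = R3.PARTNAME
  AND R2.PROJNAME = R3.PROJNAME;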
Q10. Define 5NF. Explain why the following relation does not satisfy 5NF
and decompose it to satisfy 5NF. What advantages are gained by this
decomposition? [Dec06(s), Q1(i)]
Parent Child Hobby
P1 C1 H1
P1 C1 H2
P1 C2 H3
P2 C3 H3
P2 C4 H3
P3 C4 H4
Ans. 5NF: A relation is said to be in fifth normal form (5NF) if every join
dependency in it is implied by its candidate keys.
Alternatively, for each and every non-trivial join dependency *(R1, R2, R3),
each decomposed relation Ri is a superkey of the main relation R.
5NF is also called Project-Join Normal Form (PJNF).
Projections:
D1:
Parent Child
P1 C1
P1 C2
P2 C3
P2 C4
P3 C4
D2:
Parent Hobby
P1 H1
P1 H2
P1 H3
P2 H3
P3 H4
D3:
Child Hobby
C1 H1
C1 H2
C2 H3
C3 H3
C4 H3
C4 H4
D1 ⋈ D2 : spurious tuple(s)
D1 ⋈ D3 : spurious tuple(s)
D2 ⋈ D3 : record missing: P2 C4 H3
But D1 ⋈ D2 ⋈ D3 = R
So *(D1, D2, D3) is a join dependency, i.e.
*((Parent, Child), (Child, Hobby), (Parent, Hobby)) is a join dependency.
The consequence of this join dependency is that none of Parent, Child or
Hobby alone is a key of the relation.
Decompose Relation to Satisfy 5NF:
R is decomposed into:
R1 (Parent, Child),
R2 (Parent, Hobby), and
R3 (Child, Hobby)
Advantage of 5NF Decomposition
Relations R1, R2 and R3 are in 5NF, as each satisfies only trivial join
dependencies. So there is no need to insert a value of Hobby when a new
child record is inserted.
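A small SQL sketch of this advantage (table names as above, data types assumed): recording a new child of P1 touches only R1, and no Hobby value is required.

INSERT INTO R1 (Parent, Child) VALUES ('P1', 'C5');   -- no Hobby value needed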
Let us show both the points above with the help of an example each.
Figure: An untyped TD T over attributes (A, B, C), built from hypothesis rows of primed and unprimed symbols with a conclusion row.
Let us now show the relationship of JD and MVD to the TD.
T (A  B  C)
   a  b  c′
   a  b′ c
   a  b  c    (conclusion row)
Figure: A TD T corresponding to the MVD A →→ B
However, please note that not every TD corresponds to a JD. This can be
ascertained from the fact that there can be an infinite number of different TDs
over a given relation schema, whereas there is only a finite set of JDs over the
same schema. Therefore, some of the TDs must not be equivalent to any JD.
Q17. List the various points which should be kept in mind while modeling
the time in a temporal database system.
Ans. In a temporal database system you need to model time keeping the following
points in mind:
• You need to define database as a sequence of time based data in chronological
order.
• You need to resolve events that happen at the same time.
• A reference point of time may be defined to find time relative to it.
• Sometimes a calendar is used.
Q19. Which of the information System Life Cycle stages are important
for database design?
Ans. The stages where a lot of useful input is generated for database design
are communication and modeling.
Chapter-2
DATABASE IMPLEMENTATION
(i) Single Schema Design : This process results in the development of a single
schema.
Figure: Teacher (ID, Name) and Programme (Code, Details), from the question's EER diagram.
'Type' can be regular or visiting faculty. A visiting faculty member can
teach only one programme. Make a suitable database application design
for this.
Ans. The modified EER diagram is:
Figure: Modified EER diagram. TEACHER (ID, Name) Teaches (m:n) PROGRAMME (Code, Details); TEACHER is specialized (d) into regular teachers (Increment-date) and VISITING teachers (Host-institute).
TEACHER (id, Name, Increment-date)
VISITING (id, Name, Host-institute)
TEACHES (id, code)
PROGRAMME (code, details)
Transaction/constraints: if an id in TEACHES also exists as an id in VISITING,
then the COUNT of that id in TEACHES should not exceed 1.
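One hedged way to enforce this constraint is a trigger on TEACHES, in the same event-condition-action style as the SQL triggers shown later in this book (the SIGNAL statement and the exact syntax vary across DBMSs):

CREATE TRIGGER visiting_limit BEFORE INSERT ON TEACHES
REFERENCING NEW ROW AS nrow
FOR EACH ROW
WHEN (nrow.id IN (SELECT id FROM VISITING)
      AND (SELECT COUNT(*) FROM TEACHES WHERE id = nrow.id) >= 1)
BEGIN ATOMIC
    -- reject: a visiting faculty member may teach only one programme
    SIGNAL SQLSTATE '45000';
END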
Q9. What is UML? Describe its main features, also what are major building
blocks of UML? Why is it widely used?
Ans. The Unified Modeling Language (UML) is used to express the constructs
and the relationships of complex systems. It was created in response to a
request for proposal (RFP) from the Object Management Group (OMG). Earlier,
in the 1990s, different methodologies, along with their own sets of notations,
were introduced in the market. The three prime methods were OMT
(Rumbaugh), Booch and OOSE (Jacobson). OMT was strong in analysis,
Booch was strong in design, and Jacobson was strong in behavioral analysis.
UML represents a unification of the Booch, OMT and OOSE notations, as well
as the key ideas of other methodologies. The major contributors to this
development are shown in the figure below.
UML is an attempt to standardize the artifacts of analysis and design consisting
of semantic models, syntactic notations and diagrams. The first draft (version
0.8) was introduced in October 1995. The next two versions, 0.9 in July 1996
and 0.91 in October 1996 were presented after taking input from Jacobson.
Version 1.0 was presented to Object Management Group in September 1997.
In November 1997, UML was adopted as standard modeling language by
OMG. The current version while writing this material is UML 2.0.
Figure: The input for UML development. Booch, Rumbaugh and Jacobson are the primary inputs, with contributions from Meyer, Harel, Brock, Odell, Mellor, Gamma, Embley and Fusion.
The major features of UML are:
• defined system structure for object modeling
• support for different model organization
• strong modeling for structure and behavior
• clear representation of concurrent activities
MCS-043 GPH Book 19
Figure: Model organization. A static model (the logical model: class structure and object structure) and a dynamic model.
The logical view of a system serves to describe the existence and meaning
of the key abstractions and the mechanisms that form the problem space, or
that define the system architecture.
The Physical model describes the concrete software and hardware components
of the system’s context or implementation. UML could be used in visualizing,
specifying, constructing and documenting object oriented systems. The major
building blocks of UML are structural, behavioral, grouping, and annotational
notations. Let us discuss these blocks, one by one.
(a) Structural Notations: These notations include the static elements of a model.
They are considered the nouns of the UML model, which could be conceptual
or physical. Their elements comprise class, interface, collaboration, use case,
active class, component, and node. They also include actors, signals, utilities,
processes, threads, applications, documents, files, libraries, pages, etc.
(b) Behavioral Notations: These notations include the dynamic elements of a
model. Their elements comprise interaction and state machine. They also include
classes, collaborations, and objects.
(c) Grouping Notations: These notations are the boxes into which a model
can be decomposed. Their elements comprise packages, frameworks,
and subsystems.
(d) Annotational Notations: These notations may be applied to describe,
illuminate, and remark about any element in the model. They are considered
the explanatory notes of a model.
Figure: UML notations. A class box shows Class Name, Attribute: Type = initial value, and Operation (arg list): return type; an <<Interface>> (e.g. Comparable) realized by a class; components; use cases and actors with relationships between them (<<user>>, <<uses>>, <<extends>>); package diagrams (Package Name with attributes, <<import>> between packages); relationship notations for inheritance, dependency, aggregation, containment, association, directed association and realization; deployment nodes; and an association between Class A and Class B with roles and 1-to-* multiplicity, e.g. a Message Queue aggregating Messages.
A state encompasses all the properties of the object along with the values of
each of these properties.
Figure: Class diagram for a classroom scheduling system. Student (Name, Batch) and Class (date, Check Class(), Check Batch()) associated 1-to-*, with Room (Location, capacity).
1. Use Case Diagram : It shows a set of use cases, actors and their
relationships according to the context or requirements of the system. It contains
use cases, actors, dependencies, generalizations, associations, relationships, roles,
constraints, packages and instances. Such diagrams make systems,
subsystems and classes approachable by presenting an outside view of how the
elements may be used in context.
The sequence diagram for sending the document is shown as:
Figure: Sequence diagram. load files(); on [ERROR] send ERROR(); on [HTML] send Document(); on [CGI] process(), which again either sends ERROR() or sends Document().
Collaboration Diagrams : These show the structural organization of the objects
that send and receive messages.
Activity Diagrams : These show the flow from one activity to another. An
activity is an ongoing non-atomic execution within a state machine. It results
in some action, which results in a change in the state of the system or the return
of a value. An activity diagram contains activity states, action states, transitions and objects.
Figure: UML activity diagram for e-mail encryption. Establish e-mail communication; send e-mail; encrypt the e-mail (private control encryption); wait 1 hour after sending; receive a response; on a proper reply, communication is established; end on [no reply] or (error). See the subsidiary activity diagram.
Figure: Statechart notation, showing states with their do-activities.
Q14. Which of the UML diagrams help in database schema design the
most and why?
Ans. The class diagram: it resembles the ER diagram and also shows the possible
transactions. Thus, it is useful for schema design as well as application design.
Figure: EER diagram for an inventory management system. SECTION (Section-Name: accounts & billing, administrators, maintainer) as a generalization/specialization; DEPARTMENT (D-Number, D-Name) HAS sections and manages INVENTORY (I-Number, I_code, I_Quantity); VENDOR (V_Name, Address) with Purchases and Request-for-Purchase relationships.
Here the attributes of tables are enclosed within curly braces and underlined
attributes represent the key to the relation.
The EMPLOYEE table is in 3NF because Code is the prime attribute of
EMPLOYEE and all other attributes are dependent on the Code attribute, i.e.
Code → Name, Address, Designation
The INVENTORY table is in 3NF because I-Code is the prime attribute and all
other attributes are dependent upon I-Code, i.e.
I-Code → I-Name, Quantity
The DEPARTMENT table is also in 3NF because D-Number is the prime
attribute of DEPARTMENT and D-Name is dependent on D-Number, i.e.
D-Number → D-Name
The VENDOR table is also in 3NF because V-Code is the prime attribute of the
VENDOR table and
V-Code → V-Name, Address
This implies all are in 3NF.
Q2. A construction company has many branches spread all over the
country. The company has two types of constructions to offer: Housing
and commercial. A housing company provides low income housing,
medium style housing and high-end housing schemes, while on the
commercial side it offers Multiplexes and shopping zones. The customers
of the company may be individuals or corporate clients. Draw the E-R
diagram for the company showing generalization or specialization
hierarchies and aggregations, if any. Also create 3NF tables of your design.
Ans.
Figure: ER diagram. COMPANY (C_Code) Has BRANCHES (B_Code, B_Address); specialization (Is_a) into housing types LIG, MIG and HIG, and commercial types Multiplex and Shopping.
If a table is in 3NF, we leave it as it is; otherwise we break the table into more
tables to bring it into 3NF.
Figure: EER diagram for the library system. USERS (Name, Membership#) specialized (Is-A) into Student (Course, S-Code) and Staff (E-Code); the university HAS a Library; users GET and Return READING-MATERIALS (Name, Editor, Number-of-Copies) with Date-of-Issue and Date-of-Return; READING-MATERIALS specialized (Is-A) into Books (Book_Name, Book_id) and Journals (J_Volume, J_id).
If a table is in 3NF, we leave it as it is; otherwise we break the table into more
tables to bring it into 3NF.
Figure: ACCOUNT entity with attributes Account_number, Holder_name, Open_Date, Type and Balance.
Each type of account has its own characteristics.
For example, a Saving Account provides a cheque facility and interest at a fixed
rate, and requires a minimum balance to be maintained as the limit for amount
withdrawal, while a Debit_Card Account allows withdrawal beyond the balance
but charges debit interest on the overdraft amount.
So, we create a separate entity for each such account.
Figure: Specialization of ACCOUNT (Open_Date, Account_number) via is-A, with Maximum_overdraft_limit for the overdraft account; BRANCH (Branch_Code, Street, Pin, Branch_Contact); CUSTOMER (Name, Address: Area, City, Pin).
• A customer can open many Accounts, but an account can be opened by one
customer only.
Figure: CUSTOMER Opens (1:m) ACCOUNT (Open_Date, Account_number, Open_Balance, Type); each account belongs to a BRANCH (Branch_Code, Branch_Address: Area, City, Pin); ACCOUNT is specialized via is-A.
Q5. A university has many Academic Units named schools. Each school
is headed by a Director of School. The school has teaching and non-
teaching staff. A school offers many courses. A course consists of many
subjects. A subject is taught to the students who have registered for that
subject in a class by a teacher. Draw the ER diagram for the university
specifying aggregation, generalisation or specialisation hierarchy, if any.
Create 3NF tables of your design. Make suitable assumptions, if any.
Ans.
The E-R diagram of the university is given below:
Figure: ER diagram of the university. SCHOOLS (S-Code, S_Address) Headed-By a director; staff (Code, Name, Designation, Salary) specialized (Is-A) into Teaching and Non-Teaching; a school offers courses (Course_code); subjects are taught in classes (Room_No, Time); students take Registration.
The staff table is in 3NF because Code is the key attribute of the Staff relation and all
other attributes are dependent on it, i.e.
Code → Name, Designation, Address
The school table is in 3NF because Sch-Name is the PK of the school relation and
all other attributes are dependent on it:
Sch-Name → Address
The student table is not in 3NF because S-Code is its PK and the attributes
S-Name, Address and C-Code are dependent upon S-Code, but C-Name is
dependent upon C-Code. This is a transitive dependency.
S-Code → S-Name, Address, C-Code
but
C-Code → C-Name
The course relation is in 3NF because C-Code is its PK and all other attributes
are dependent on it.
The schedule relation is in 3NF because all the attributes of this relation form a
composite primary key.
This implies all relations are in 3NF.
Chapter-4
ADVANCED SQL AND SYSTEM CATALOGUE
Q1. What are Assertions? What is the utility of assertions? How are
Assertions different from Views? Describe both Assertions and Views
with the help of example. [June-07, Q1(c)]
Ans. Assertions are constraints that are normally of a general nature. For example,
the age of a student in a hypothetical University should not be more than 20
years, or the minimum age of a teacher of that University should be 30 years.
Such general constraints can be implemented with the help of an assertion
statement. The syntax for creating an assertion is:
Syntax :
CREATE ASSERTION <assertion name>
CHECK (<condition>);
Thus, the assertion on age for the University as above can be implemented as:
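The statement itself can be sketched as follows (a hedged version, assuming relations STUDENT and TEACHER each with an age attribute):

CREATE ASSERTION age_constraint
CHECK (NOT EXISTS
       (SELECT *
        FROM STUDENT s, TEACHER t
        WHERE s.age > 20 OR t.age < 30));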
The assertion name helps in identifying the constraints specified by the assertion.
These names can be used to modify or delete an assertion later. But how are
these assertions enforced? The database management system is responsible
for enforcing the assertion on to the database such that the constraints stated
in assertion are not violated. Assertions are checked whenever a related relation
changes.
Assertion : The total medical claims made by a faculty member in the current
financial year should not exceed his/her medical allowance.
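A hedged sketch of this assertion, assuming relations FACULTY (id, medical_allowance) and CLAIM (faculty_id, fin_year, amount):

CREATE ASSERTION medical_claim_limit
CHECK (NOT EXISTS
       (SELECT f.id
        FROM FACULTY f, CLAIM c
        WHERE c.faculty_id = f.id
          AND c.fin_year = 'CURRENT'   -- encoding of the current financial year is assumed
        GROUP BY f.id, f.medical_allowance
        HAVING SUM(c.amount) > f.medical_allowance));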
View Definition
Views are tables whose contents are taken or derived from other tables. View
tables do not contain data. They just act as a window through which certain
(not all) contents of a table can be displayed or seen. Also, one can manipulate
the contents of the original table through such views.
For example, suppose we want that the clerk in your office should not have
access to the details of the salaries of other employees, but he needs such
information as EMP_NAME, DESIG and DEPT_NAME. Then we can create
a view for this clerk. This view will contain only the required information and
nothing more than that.
Similarly we can create a view for the Manager, containing the information
that he needs. Hence, depending upon the requirement of different users we
can create different views, all based upon one or more tables. Views are
created using the CREATE VIEW command. The syntax is as follows:
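CREATE VIEW <view name> AS
SELECT <list of columns>
FROM <table name(s)>
[WHERE <condition>];

For example, the clerk view described above could be created as follows (assuming an EMPLOYEE table that holds these columns):

CREATE VIEW clerk AS
SELECT EMP_NAME, DESIG, DEPT_NAME
FROM EMPLOYEE;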
Once created, a view can be treated just as you treat any other table. You can
retrieve the records in a view using the SELECT command, delete the records
using the DELETE command and update them using the UPDATE command.
Views can also be joined to other views and/or tables.
To view the records in the clerk view, the clerk user can use the following
SELECT statement:
SELECT * FROM clerk;
Uses of View
Views are used for the following purposes:
(a) Views restrict access to a database. If you want users to see some but not
all data of a table, you can create a view showing only that part of the table.
(b) Critical data in a table can be easily safe guarded using views.
(c) Views allow user to make simple queries to retrieve the required results.
For example, users can retrieve information from multiple tables without
knowing how to perform a join.
(d) Views allow the same data to be seen by different users in different ways.
Ans. Logical data independence indicates that the conceptual schema can be
changed without affecting the existing external views or application programs.
Logical data independence is achieved by providing the external level/user view
of the database.
The relations stored in the database are termed base relations. Corresponding
to these relations we can derive new relations (views). Views are important because of:
1. Security concerns.
2. The need to combine more than one base table.
In short a view in SQL terminology is a single table that is derived from other
tables. These other tables could be base tables or previously defined views.
A view does not necessarily exist in physical form, it is considered as a virtual
table.
The application program or users see the database as described by their external
views. The user is provided with an abstraction of the physical data and
manipulates this abstraction. For example, we have two relations,
employee-personal and employee-salary. Now, for an external user, we can create
a view containing attributes from both relations.
This view does not exist physically.
employee-personal (emp#, name, address)
employee-salary (emp#, post, salary)
we can define view:
employee-information (emp#, name, post, salary)
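In SQL this view could be defined along these lines (a sketch; emp# is written as emp_no because many systems do not allow '#' in identifiers):

CREATE VIEW employee_information AS
SELECT p.emp_no, p.name, s.post, s.salary
FROM employee_personal p, employee_salary s
WHERE p.emp_no = s.emp_no;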
Example :- Suppose we have three relations, two of which are:
EMPLOYEE (FName, Minit, LName, SSN, BDate, Address, Sex, Salary, Super-SSN, DNo)
WORKS-ON
The view implies logical data independence in the sense that if fields are added
to or deleted from the base relations, and those fields have not been put into the
view, then the view need not be altered. The base relations will be altered and
the views will remain the same. This provides high availability of data to the
users. Here the conceptual schema can be changed without changing the external
schemas.
Thus, we can say that a view implements logical data independence.
Q4. Consider a constraint – the value of the age field of the student of a
formal University should be between 17 years and 50 years. Would you
like to write an assertion for this statement?
Ans. The constraint is a simple Range constraint and can easily be implemented
with the help of declarative constraint statement. (CHECK CONSTRAINT
statement can be used here). Therefore, there is no need to write an assertion
for this constraint.
Ans. There are two strategies for implementing the views. These are :
- Query modification
- View materialisation
In the query modification strategy, any query that is made on the view is
modified to include the view-defining expression. For example, consider the
view SUBJECT-PERFORMANCE. A query on this view may be: the teacher
of the course MCS-013 wants to find the maximum and average marks in the
course. The query for this in SQL will be:
SELECT MAX (smarks), AVG (smarks)
FROM SUBJECT-PERFORMANCE
Since SUBJECT-PERFORMANCE is itself a view, the query will be modified
automatically as:
SELECT MAX (smarks), AVG (smarks)
FROM STUDENT s, MARKS m
WHERE s.enrolment-no = m.enrolment-no AND subjectcode = 'MCS-013';
However, this approach has a major disadvantage. For a large database system,
if complex queries have to be repeatedly executed on a view, the query
modification will have to be done each time, leading to inefficient utilisation of
resources such as time and space.
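The alternative strategy, view materialisation, physically stores the view's result so that repeated queries need not be modified and re-evaluated from scratch. A hedged sketch (materialised-view syntax is product-specific; Oracle-style syntax and underscored column names are assumed):

CREATE MATERIALIZED VIEW SUBJECT_PERFORMANCE_MV AS
SELECT s.enrolment_no, m.subjectcode, m.smarks
FROM STUDENT s, MARKS m
WHERE s.enrolment_no = m.enrolment_no;

-- Repeated aggregate queries now read the stored result directly:
SELECT MAX(smarks), AVG(smarks)
FROM SUBJECT_PERFORMANCE_MV
WHERE subjectcode = 'MCS-013';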
Q7. What are the weaknesses of SQL? Can a host language help SQL in
overcoming these weaknesses? Justify your answer with the help of
examples.
Ans. SQL is both the data definition and data manipulation language of a number
of relational database management systems. It is based on tuple calculus: it
resembles relational algebra in some places and tuple calculus in others.
Weaknesses of SQL
1. SQL does not fully support some of the basic features of the relational data
model: the concept of domains, entity and referential integrity, and hence the
concepts of primary keys and foreign keys.
2. SQL is redundant in the sense that the same query may be expressed in
more than one way.
It is true that a host language can help SQL in overcoming its weaknesses.
SQL only provides facilities to define and retrieve data interactively. To extend
the data manipulation operations of SQL (for example, to separately process each
tuple of a relation), it should be used with a traditional high-level language,
which is called the host language.
#include <stdio.h>
#include <string.h>
/* host variable declarations for embedded SQL (Pro*C style) */
VARCHAR username[30];
VARCHAR password[10];
VARCHAR v_fname[15];
VARCHAR v_minit[1];
VARCHAR v_lname[15];
VARCHAR v_address[30];
char v_ssn[9];
float f_salary;
main ()
{
strcpy(username.arr, "Scott");
username.len = strlen(username.arr);
strcpy(password.arr, "TIGER");
password.len = strlen(password.arr);
EXEC SQL WHENEVER SQLERROR DO sql_error();
EXEC SQL CONNECT :username IDENTIFIED BY :password;
EXEC SQL SELECT fname, minit, lname, address, salary
Q8. Why is the embedded SQL used? Describe an application that needs
embedded language. How are static embedded SQL different from
dynamic embedded SQL?
Ans. When an SQL statement is embedded in a source program, it is called
embedded SQL. The DBMS precompiler accepts this source code as input and
translates the embedded SQL statements so that the program can be compiled.
Basically, embedded SQL is based on the cursor concept: a cursor can range
over a query result.
When a job having complex calculations is to be done at different times, it is
very easy to code the entire logic in a programming language and get the data
through embedded SQL. The program can then be put in a library, from where
it can be called as and when required.
Dynamically embedded DML statements are not precompiled or optimized. We
can use statically embedded DML where we can ascertain the data types and
ranges of the HLL variables before execution; in this situation we will not get
data-type mismatch errors. Sometimes we cannot ascertain the data types and
ranges of variables produced by the HLL until run time, and yet we have to use
these variables in DML. In this situation we use dynamically embedded
statements. For example, let us have a VB environment. We know all the
variables at design time; for example, these are item_code and quantity.
Example 1:
Option Explicit
Dim item_code, quantity
item_code = Text1.Text
quantity = Val(Text2.Text)
rs.AddNew                        ' rs is an opened Recordset object
rs.Fields("item_code") = item_code
rs.Fields("quantity") = quantity
rs.Update
However, they are more powerful than embedded SQL as they allow run time
application logic. The basic advantage of using dynamic embedded SQL is
that we need not compile and test a new program for a new query.
Let us explain the use of dynamic SQL with the help of an example:
Example : Write a dynamic SQL interface that allows a student to get and
modify permissible details about him/her. The student may ask for subset of
information also. Assume that the student database has the following relations.
STUDENT (enrolno, name, dob)
RESULT (enrolno, coursecode, marks)
In the table above, a student has access rights for accessing information on
his/her enrolment number, but s/he cannot update the data. Assume that user
names are enrolment number.
/* declarations in SQL */
EXEC SQL BEGIN DECLARE SECTION;
char inputfields[50];
char tablename[10];
SQLJ cannot use dynamic SQL. It can only use simple embedded SQL. SQLJ
provides a standard form in which SQL statements can be embedded in a JAVA
program. SQLJ statements always begin with the #sql keyword. These embedded
SQL statements are of two categories: declarations and executable statements.
Example:
Let us write a JAVA function to print, from the student table, the details of the
students who took admission in 2008 and whose names are like 'Shyam'.
Assuming that the first two digits of the 9-digit enrolment number represent the
year, the required input conditions may be:
Q12. A University decided to enhance the marks for the students by
2 in the subject MCS-013 in the table: RESULT (enrolno,
coursecode, marks). Write a segment of an embedded SQL program to do
this processing.
Ans. /* The table is RESULT (enrolno, coursecode, marks). */
EXEC SQL BEGIN DECLARE SECTION;
char enrolno[10], coursecode[7];
int marks;
char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;
/* The connection needs to be established with SQL */
/* program segment for the required function */
printf("enter the course code for which 2 grace marks are to be added");
scanf("%s", coursecode);
EXEC SQL DECLARE GRACE CURSOR FOR
SELECT enrolno, coursecode, marks
FROM RESULT
WHERE coursecode = :coursecode
FOR UPDATE OF marks;
EXEC SQL OPEN GRACE;
EXEC SQL FETCH FROM GRACE
INTO :enrolno, :coursecode, :marks;
while (SQLCODE == 0) {
EXEC SQL
UPDATE RESULT
SET marks = marks + 2
WHERE CURRENT OF GRACE;
EXEC SQL FETCH FROM GRACE
INTO :enrolno, :coursecode, :marks;
}
EXEC SQL CLOSE GRACE;
Q18. Define catalogue and data dictionary. How both are similar or
different?
Ans. The system catalogue is a collection of tables and views that contain
important information about a database. It is the place where a relational database
management system stores schema metadata, such as information about tables
and columns, and internal bookkeeping information. A system catalogue is
available for each database. Information in the system catalogue defines the
structure of the database.
The system catalogue can also be used to store some statistical and descriptive
information about relations. Some such information can be:
- number of tuples in each relation,
- the different attribute values,
- storage and access methods used in relation.
All such information finds its use in query processing.
In relational DBMSs the catalogue is stored as relations.
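For example, in systems that expose the SQL-standard INFORMATION_SCHEMA views (an assumption: catalogue names differ across products), the catalogue itself can be queried with ordinary SQL:

-- List the columns and data types the catalogue records for a table.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'RESULT';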
The terms system catalogue and data dictionary have been used interchangeably
in most situations.
Data dictionaries also include data on the secondary keys, indexes and views.
The above could also be extended to the secondary key, index as well as view
information by defining the secondary key, indexes and views. Data dictionaries
do not contain any actual data from the database; they contain only book-keeping
information for managing it. Without a data dictionary, however, a database
management system cannot access data from the database.
Data Dictionary Benefits : The benefits of a fully utilised data dictionary are
substantial. A data dictionary has the potential to:
* facilitate data sharing by
- enabling database classes to automatically handle multi-user coordination,
buffer layouts, data validation, and performance optimisations,
- improving the ease of understanding of data definitions,
- ensuring that there is a single authoritative source of reference for all users;
* facilitate application integration by identifying data redundancies;
* reduce development lead times by
- simplifying documentation,
- automating programming activities;
* reduce maintenance effort by identifying the impact of change as it affects
users, database administrators and programmers;
* improve the quality of application software by enforcing standards in the
development process;
* ensure application system longevity by maintaining documentation beyond
project completion;
* allow data dictionary information created under one database system to be
used to generate the same database layout on any of the other database systems
BFC supports (Oracle, MS SQL Server, Access, DB2, Sybase, SQL Anywhere,
etc.).
Q21. List the various types of views through which access to the data
dictionary is allowed.
Ans. a) Views with the Prefix USER
b) Views with the Prefix ALL
c) Views with the Prefix DBA
Q23. What is the difference between active and passive Data Dictionary?
Ans. If a data dictionary system is used only by designers, users, and
administrators, not by the DBMS software, it is called a passive data dictionary
otherwise, it is called an active data dictionary or data directory. An active data
dictionary is automatically updated as changes occur in the database. A passive
data dictionary must be manually updated.
Q1. With the help of a block diagram describe the phases of Query
processing. How do we optimize a Query under consideration? Does Query
optimisation contribute to the measurement of Query cost? Support
your answer with suitable explanation. [June-07, Q1(d)]
Ans.
Figure: Query processing. A query in any high-level language passes through scanning, parsing and checking; the QUERY OPTIMISER (consulting statistics of the database) produces an execution plan for the DATABASE RUNTIME PROCESSOR, which returns the result of the query.
In the first step, scanning, parsing, and validating are done to translate the query
into its internal form. This is then further translated into relational algebra (an
intermediate query form). The parser checks syntax and verifies relations. The
query is then optimized into a query plan, which is compiled into code
that can be executed by the database runtime processor.
We can define query evaluation as the query-execution engine taking a query-
evaluation plan, executing that plan, and returning the answers to the query.
Query processing involves the study of the following concepts:
Query Optimisation: Amongst all equivalent plans, choose the one with the
lowest cost. Cost is estimated using statistical information from the database
catalogue, for example, the number of tuples in each relation, the size of tuples, etc.
Thus, in query optimisation we find an evaluation plan with the lowest cost.
The cost estimation is made on the basis of heuristic rules.
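Most systems also let you inspect the plan the optimiser chose. As a hedged illustration (EXPLAIN output is product-specific; a PostgreSQL-style system and the STUDENT/MARKS schema used later in this book are assumed):

EXPLAIN
SELECT s.name
FROM STUDENT s, MARKS m
WHERE s.enrolno = m.enrolno AND m.grade = 'A';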
Q3. Define the algorithm for Block Nested-Loop Join for the worst-case
scenario.
Ans. for each block Br of r {
    for each block Bs of s {
        for each tuple ti in Br {
            for each tuple si in Bs {
                test pair (ti, si) to see if they satisfy the join condition;
                if they do, add the joined tuple to the result;
            };
        };
    };
};
QUERY EXPRESSIONS :
Q8. For the following relations, write the SQL statement to find the
details of the student(s) who have scored grade A in MCS-031. Also
write its equivalent relationship algebraic query and form its query tree.
STUDENT (enrolno, name, age)
MARKS (enrolno, subjectcode, grade)
Ans. Relation given:
STUDENT (enrolno, name, age)
MARKS (enrolno, subjectcode, grade)
Query : To find detail of the student(s) who have scored grade A in MCS-
031.
SQL:
SELECT *
FROM Student
WHERE enrolno in
(SELECT enrolno
FROM MARKS
WHERE subjectcode = 'MCS-031' AND grade = 'A');
Relational Algebra :
R1 ← σ subjectcode = 'MCS-031' ∧ grade = 'A' (MARKS)
R2 ← π enrolno (R1)
R3 ← STUDENT ⋈ R2
Query tree:
π enrolno, name, age
  ⋈ (on enrolno)
    STUDENT
    π enrolno
      σ subjectcode = 'MCS-031' ∧ grade = 'A'
        MARKS
(ii) Find the names of all the employees who live in the same city where
the company for which they work is located.
π emp_name (σ EMPLOYEE.city = COMPANY.city (EMPLOYEE ⋈ WORKS ⋈ COMPANY))
{e[emp_name] | e ∈ EMPLOYEE ∧ ∃c, w (c ∈ COMPANY ∧ w ∈ WORKS ∧
e[emp_name] = w[emp_name] ∧ w[comp_name] = c[comp_name] ∧
e[city] = c[city])}
(iii) Find the names of those employees who earn more than every
employee of the XYZ Bank Corporation.
π emp_name (σ salary > ℱ MAX(salary) (σ comp_name = 'XYZ Bank Corporation' (WORKS)) (WORKS))
{e[emp_name] | e ∈ WORKS ∧ ∀w ((w ∈ WORKS ∧
w[comp_name] = 'XYZ Bank Corporation') → e[salary] > w[salary])}
(iv) Find the name, street and city of those employees who work for the
XYZ Bank Corporation and earn more than Rs. 2,50,000 per annum.
π emp_name, street, city (σ comp_name = 'XYZ Bank Corporation' ∧ salary > 250000/12 (EMPLOYEE ⋈ WORKS))
{e | e ∈ EMPLOYEE ∧ ∃w (w ∈ WORKS ∧ w[salary] >
250000/12 ∧ w[comp_name] = 'XYZ Bank Corporation'
∧ e[emp_name] = w[emp_name])}
(v) Find the names of managers who work in a bank located in Delhi.
π manager_name (σ city = 'Delhi' ((MANAGES ⋈ manager_name = emp_name WORKS) ⋈ COMPANY))
{e[manager_name] | e ∈ MANAGES ∧ ∃c, w (w ∈
WORKS ∧ c ∈ COMPANY ∧ e[manager_name] = w[emp_name] ∧
w[comp_name] = c[comp_name] ∧ c[city] = 'Delhi')}
Figure: Initial and optimized query graphs for π stud_id, S-name, grade over GRADE and SUBJECT, with the selection σ stud_id=100 pushed down in the optimized graph.
Figure: Initial and optimized query trees over R1, R2 and R3, with the selections σ teacher='XYZ' and σ b=6 pushed down below the joins in the optimized tree.
Figure: Initial and optimized query trees using projections π AB, π BC and π C and selections σ B=b and σ D=d over R1, R2 and R3.
The optimized query is:
(π AB (σ B=b (R1)) ⋈ π BC (R2)) − (π AB (R1) ⋈ π C (σ D=d (R3)))
Q14. What are the various transaction states ? Explain each state and
state transition with the help of example(s).
Ans. Active ⇒ It is the initial state. The transaction stays in this state while
it is executing.
Partially Committed ⇒ The transaction enters this state after its final statement
has been executed.
Committed ⇒ The transaction enters this state after successful completion,
when all its changes are permanently reflected in the database.
Failed ⇒ This state comes after the discovery that normal execution can
no longer proceed.
Aborted ⇒ This state comes after the transaction has been rolled back and the
database has been restored to its state prior to the start of the transaction.
Figure: Transaction state diagram. Active → Partially Committed → Committed; Active/Partially Committed → Failed → Aborted.
Example:
Consider the following transaction:
T
1. Sum:=0
2. Read(A)
3. Read(B)
4. Sum = A + B
5. Write(A)
6. Write(B)
When we issue the transaction T and it is ready for execution, it will be in
the Active state. While its statements (lines 1 to 6) are being executed it stays
Active; after the last statement it will be in the Partially Committed state. If any
operation fails at any point, it will enter the Failed state, followed by the Aborted
state. If the last operation is completed and successfully communicated to the
physical database, the transaction will be in the Committed state.
ACID Properties
Isolation : A transaction appears to be the only action that the system is carrying
out at a time. If there are two transactions that are performing the same function
and are running at the same time, transaction isolation will ensure that each
transaction thinks that it has exclusive use of the system.
Durability : A transaction is durable in that once it has been successfully
completed, all of the changes it has made to the system are permanent. There
are safeguards that will prevent the loss of information, even in the case of
system failure.
Uncontrolled concurrent execution, violating isolation, causes the lost update problem.
Suppose the two transactions T3 and T4 run concurrently and they happen to be
interleaved in the following way (assume the initial value of X is 10000).
T3              T4              Value of X
Read X                          10000
                Read X          10000
Sub 5000                        5000
                Add 3000        13000
Write X                         5000
                Write X         13000
After the execution of both transactions the value of X is 13000, while the
semantically correct value should be 8000. The problem occurred because the
update made by T3 was overwritten by T4. The root cause of the problem is
that both transactions read the value of X as 10000. Thus one of the two updates
has been lost, and we say that a lost update has occurred.
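A hedged illustration of how locking prevents the lost update (an ACCOUNT table and embedded-SQL-style host variables are assumed): if each transaction reads X under an update lock, T4's read blocks until T3 commits, so T4 sees 5000 rather than 10000.

-- Inside T3 (T4 is coded likewise, adding 3000):
SELECT balance INTO :x FROM ACCOUNT WHERE item = 'X' FOR UPDATE;
UPDATE ACCOUNT SET balance = :x - 5000 WHERE item = 'X';
COMMIT;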
Q16. Why we use lock based protocol? What are the pitfalls of lock
based protocol?
Ans. Lock based protocols are one of the major techniques to implement
concurrent transaction execution. These protocols try to ensure serialisability.
The pitfalls of such protocols can be:
• Overheads due to locking
• Locking may lead to deadlocks.
• We may either restrict concurrency – strict 2 PL or allow uncommitted data
to be seen by other transactions.
Q17. How do you define transaction processing methods? Also list the
Q19. How are insert and delete operations performed in a tree protocol?
Ans. The main problem with insert and delete operations is that they may result in
inconsistent analysis when concurrent transactions are going on, due to phantom
records. One solution to this problem may be locking the index while
performing such an analysis.
Figure (a) shows the schedule of transactions T1 and T2 where both are in a
deadlock state: T1 is waiting for X and T2 is waiting for Y. Neither T1 nor T2
nor any other transaction can access items X and Y, as both are locked.
Figure: Wait-for graph among transactions T1, T2, T3 and T4.
(ii) Once we have decided that a particular transaction must be rolled back, we
must determine how far this transaction should be rolled back. There are two
choices:
• Total rollback means abort the transaction & restart it.
• Partial rollback means rollback the transaction only as far as necessary to
break the deadlock.
The transaction must be capable of resuming execution after a partial rollback.
Deadlock avoidance ensures that a circular wait, in which transactions hold some
locks and wait for additional ones held by other transactions in the chain, never
occurs. The two-phase locking protocol ensures serializability but does not ensure
a deadlock-free situation. There are various approaches for avoiding deadlocks.
Some of them are as follows:
(i) One of the simplest methods for avoiding deadlock is to lock all data items at
the beginning of a transaction.
This has to be done in an atomic manner, otherwise there could again be a deadlock
situation. Disadvantages of this approach are:
(a) The degree of concurrency is reduced considerably.
(b) Locking all data items for the entire duration of a transaction makes these
data items inaccessible to other concurrent transactions.
(a) Wait-Die :-
• If the requesting transaction is older than the transaction that holds the lock
on the requested data item, the requesting transaction is allowed to wait.
• If the requesting transaction is younger than the transaction that holds the
lock on the requested data item, the requesting transaction is aborted or rolled
back.
2. Two-mode locking has two modes: exclusive and shared. Multiple-granularity
locking has five modes: S, X, IS, IX and SIX. In timestamping, each data item
has two stamps: a write timestamp and a read timestamp.
3. Two-mode locking causes less overhead than multiple-granularity locking,
which creates more overhead in the maintenance of locks; timestamping has
optimum overhead.
Q29. What is a log file? What does it contain? How can a log file be used
for recovery? Describe this with the help of an example that includes all
recovery operations (UNDO, REDO, etc)
Ans. Log ⇒ The log contains the redundant data which will be used in case of
recovery. Log information is kept on stable storage. In case of a system crash,
volatile information is lost: the modified data in buffers and the log information in
buffers will be lost. The transactions having a commit marker will be redone,
and the transactions with a missing commit marker will be undone. This means
that the longer the system operates without a crash, the longer it will take to
recover from the crash: the recovery operation after a system crash will have
to reprocess all transactions from the time of start-up of the system. These
problems can be avoided by using a means to propagate volatile information to
stable storage at regular intervals. This scheme is called checkpointing;
checkpointing is used to limit the volume of log information to be processed in case
of a system failure.
Example :
Using immediate update, and the transaction TRAN1 again, the process is:
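The TRAN1 listing is not reproduced here; as a hedged sketch, for a transaction T that, under immediate update, changes X from 1000 to 900 and Y from 500 to 600, the log would contain records such as:

<T, start>
<T, X, 1000, 900>    (old value, new value: written before X itself is modified)
<T, Y, 500, 600>
<T, commit>

If the crash occurs before <T, commit> reaches stable storage, recovery UNDOes T using the old values (X back to 1000, Y back to 500); if it occurs after, recovery REDOes T using the new values (900 and 600).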
Q30. Suppose the write ahead Log Scheme is being used. Give the REDO
and UNDO processes in the strategy of writing the partial update made
by a transaction to the database, as well as in the strategy of delaying all
writes to database till the commit. 5
Ans. In write-ahead log strategy the transaction updates the physical database
and the modified record replaces the old record in the database on nonvolatile
storage. The log information about the transaction modifications are written
before the corresponding put(x) operation, initiated by the transaction, is
performed. The write-ahead log strategy has the following requirements:
1. Before a transaction is allowed to modify the database, at least the undo
portion of the transaction log record is written to stable storage.
2. A transaction is committed only after both the undo and the redo portion of
the log are written to stable storage.
In the strategy of delaying all writes to the database till the commit:
The initiation of a transaction causes the start of the log of its activities; a start
transaction record, along with the identification of the transaction, is written out to
the log. During the execution of the transaction, any output is preceded by a log
output to indicate the modification being made to the database. This output to
the log consists of the record(s) being modified: the old values of the data items in
the case of an update, and the new values of the data items. The old values will be
used by the recovery system to undo the modifications made by a transaction
in case a system crash occurs before the completion of the transaction. When
a system crash occurs after a transaction commits, the new values will be
used by the recovery system to redo the changes made by the transaction and
thus ensure that the modifications made by a committed transaction are
correctly reflected in the database.
The transaction shown before consists of reading in the value of some data
item X and modifying it by a certain amount. The transaction then reads in the
value of another data item Y and modifies it by an equal but opposite amount.
The transaction may subtract, let us say, a quantity n from the inventory for
part Px and add this amount to quantities of that item shipped to customer Cy.
For consistency this transaction must be completed atomically. A system crash
occurring at any time before time t9 will require that the transaction be undone.
A system crash occurring after t9, when the commit transaction marker is
written to the log, requires that we redo the transaction to ensure that all of the
changes made by this transaction are propagated to the physical database.
According to the write-ahead log strategy, the redo portion of the log need not
be written to the log until the commit transaction is issued by the program
performing the transaction.
In the strategy of writing partial updates : If the crash occurs just during
or after t4, the recovery process, when it examines the log, finds that the commit
transaction marker is missing and hence will undo the partially completed
transaction. To do this it will use the old values of the modified fields and
restore the database to the consistent state that existed before the crash and
before the transaction was started. If the crash occurs after step t9 is completed,
the recovery system will find an end-of-transaction marker in the log.
However, since the log was written before the database, all modifications to
the database may not have been propagated to the database. Thus, to ensure
that all modifications made by the transaction are propagated to the database, the
recovery system will redo the committed transaction. To do this it uses the
new values of the appropriate fields of the records.
Q31. What are the steps one must take with its database management
system, in order to ensure disaster recovery? Define the process of
recovery in case of disaster.
Ans. Disaster refers to circumstances that result in the loss of the physical
database stored on the nonvolatile storage medium.
The data stored in stable storage consists of the archival copy of the database
and the archival log of the transactions on the database represented in the
archival copy.
The disaster recovery process requires a global redo. In this case the changes
made by every transaction in the archival log are redone using the archival
database as the initial version of the current database.
• As the order of redoing the operations must be the same as the original order,
the archival log must be chronologically ordered.
• Since the archival database should be consistent, it must be a copy of the
current database in a quiescent state.
The recovery operation consists of redoing the changes made by committed
transactions from the archive log on the archive database. A new consistent
Q32. Compare and contrast the features of log based recovery mechanism
versus check pointing based recovery. Suggest applications where you
will prefer log based recovery scheme over check pointing. Give an
example of check pointing based recovery scheme.
Ans. The log contains the redundant data which will be used in case of recovery.
Log information is kept on stable storage. In case of a system crash, volatile
information is lost: the modified data in buffers and the log information in buffers
will be lost. The transactions having a commit marker will be redone and the
transactions with a missing commit marker will be undone. This means that the
longer the system operates without a crash, the longer it will take to recover
from the crash: the recovery operation after a system crash will have to
reprocess all transactions from the time of start-up of the system. These
problems can be avoided by using a means to propagate volatile information to
stable storage at regular intervals. This scheme is called checkpointing;
checkpointing is used to limit the volume of log information to be processed in case
of a system failure.
We will prefer log-based recovery over checkpointing where there is low
traffic to the database and it is costly to checkpoint frequently, such as in an
EMI calculation application. The checkpointing scheme will be used where many
transactions are running per second and changing data, such as in an online
reservation system.
Recovery is a process of restoring database to the consistent state that existed
before the failure. It is an integral part of the database. The recovery scheme
must minimize the time for which the database is not usable after a crash.
The most widely used structure for recording database modifications is the
log. The log is a sequence of log records all the update activities in the database.
The log contains redundant data required to recover database to a consistent
state. The log is generally stored on stable storage.
Figure: Transactions T0, T1, T2, T3, T4, Ti, Ti+1, Ti+2 and Ti+3 plotted against time, from start time t0 through checkpoint time tc to crash time tx.
A checkpoint is taken at time tc, which indicates that at that time all log and
data buffers were propagated to storage.
The transactions T0, T1, T2 and T4 were committed and their modifications are
reflected in the database. With the checkpoint scheme, these transactions are
not required to be redone after the system crash.
The transactions Ti and Ti+3 are to be redone from the point of the checkpoint,
whereas the transaction Ti+2 has to be undone, as no commit record has been
written to the log for this transaction.
When multiple versions exist, the DBMS selects one of the versions of the data
item. The value read by a transaction must be consistent with some serial
execution of the transactions on a single version of the database. Thus, the
concurrency control problem is transformed into the selection of the correct
version from the multiple versions of a data item. With the multiversion technique,
write operations can occur concurrently.
Example : Assume X1, X2, …, Xn are the versions of a data item X created by
the write operations of transactions. With each Xi a read_TS (read timestamp)
and a write_TS (write timestamp) are associated.
read_TS(Xi) : The read timestamp of Xi is the largest of all the timestamps of
transactions that have successfully read version Xi.
write_TS(Xi) : The write timestamp of Xi is the timestamp of the transaction
that wrote the value of version Xi.
A new version of Xi is created only by a write operation.
Q36. What are Triggers, Cursors and Alerts? What is their requirement
in DBMS? How are Triggers different from stored procedures?
Ans. Trigger : A trigger is a statement that the system executes automatically
as a side effect of a modification to the database.
To design a trigger mechanism, we must meet two requirements
(a) Specify when a trigger is to be executed: That is broken up into an
event that causes the trigger to be checked and a condition that must be
satisfied for trigger execution to proceed.
(b) Specify the actions to be taken when the trigger executes : The above
model of triggers is referred to as the event condition action model for triggers.
The database stores triggers as if they were regular data, so that they are persistent
and accessible to all database operations. Once a trigger is stored in the
database, the database system takes on the responsibility of executing it whenever
the specified event occurs and the corresponding condition is satisfied.
Use: Triggers have many uses, such as implementing business rules, audit
logging and even carrying out actions outside the database.
Trigger as support for integrity constraints : e.g., in a banking system,
instead of allowing negative balances, the bank creates a loan in the amount of the
overdraft and sets the balance to 0.
To do this,
Condition for executing the trigger: an update to the account relation that results
in a negative balance value.
Actions to be taken are:
(i) Insert a new record/tuple/row s in the loan relation with
s[loan-number] = t[account-number]
s[branch-name] = t[branch-name]
s[amount] = -(t[balance])
(ii) Insert a new record/row/tuple in borrower relation with
u[customer-name] = t[customer-name]
u[loan-number] = t[loan-number]
(iii) Set t[Balance] to 0
Defined in SQL:
CREATE TRIGGER overdraft-trigger AFTER UPDATE OF balance ON
account
REFERENCING NEW ROW AS nrow
FOR EACH ROW
WHEN nrow.balance < 0
BEGIN ATOMIC
INSERT INTO loan values
(nrow.account-number, nrow.branch-name, -(nrow.balance));
INSERT INTO borrower
(select customer-name, account-number
from depositor
where nrow.account-number = depositor.account-number);
UPDATE ACCOUNT set balance = 0
where account.account-number = nrow.account-number;
END
Cursor: A cursor basically is a pointer to a query result and is used to read
attribute values of selected tuples into variables.
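A small sketch of cursor use, assuming Oracle PL/SQL and an EMPLOYEE (name, salary) table:

DECLARE
    CURSOR emp_cur IS
        SELECT name, salary FROM EMPLOYEE WHERE salary > 10000;
BEGIN
    FOR emp IN emp_cur LOOP
        -- the cursor makes each selected tuple's attribute values available
        DBMS_OUTPUT.PUT_LINE(emp.name || ' ' || emp.salary);
    END LOOP;
END;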
4. A trigger runs automatically when its event occurs, but a stored procedure
does not run automatically; the user has to run it explicitly.
5. Within a trigger a user can call a specific stored procedure, but within a stored
procedure a user cannot call a trigger.
6. A stored procedure can accept parameters, which is not the case with triggers.
7. A trigger can call a specific stored procedure in it, but the reverse is not
true.
DATABASE SECURITY
Q40. Write the syntax for ‘Revoke Statement’ that revokes with grant
option.
Ans. REVOKE GRANT OPTION FOR <permissions>
ON <table>
FROM <user/role>
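For example (table and user names assumed):

-- Take back only the right to pass SELECT on; the user keeps SELECT itself:
REVOKE GRANT OPTION FOR SELECT
ON EMPLOYEE
FROM clerk;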
Q44. What are the different ways of preventing open access to the
Internet?
Ans. • Trusted IP addresses.
• Server account disabling.
• Special tools, for example RealSecure by ISS.
Q47. List the different types of security threats. Explain three major
defence mechanisms required to be built into the database management
system.
Ans. Security threats can be categorised into two categories:
1. Accidental Security Threats – These include:
(i) Improper recovery procedures.
(ii) Concurrent usage anomalies.
(iii) System errors.
(iv) Improper authorisation.
(v) Hardware failures.
2. Malicious or Deliberate Security Threats – such as unauthorised reading,
modification or destruction of data.
Defence Mechanisms
1. Human factors – This is the outermost level, which encompasses the ethical,
legal and social environments.
2. Physical security – Physical security mechanisms include appropriate locks
and keys and entry logs to computing facilities and terminals.
3. Administrative controls – These determine what information will be
accessible to what class of users, and the type of access that will be allowed
to this class.
Q48. How can views be useful in enforcing security? What are the types
of updates that are allowed through views?
Ans. View Mechanism – The architecture of most database models is
divided into three levels: the external level, the conceptual level, and the internal
level. How we see the data at these levels is called the view at that particular level.
How all these views are mapped to each other is called the view mechanism.
Figure: The view mechanism.
• External level: views A, B, ..., Z (the user/application views), defined by the
user or application programmer in consultation with the DBA.
• Conceptual level: the conceptual view, defined by the DBA.
• Internal level: the internal view, defined by the DBA for optimisation.
External or User view – Only that portion of the database is supplied to the user
that is relevant to that particular user. The external view is defined by the DBA
by considering the requirements and authorities of the user. The external view is
generated by extracting objects from the conceptual view.
Internal View ⇒ Indicates how the data will be stored. It describes the data
structures and access methods.
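A minimal sketch of view-based security, assuming a hypothetical EMPLOYEE table whose Salary column must be hidden from clerical users:
CREATE VIEW emp_public AS
    SELECT Emp_id, Name, Dept_id    -- Salary deliberately excluded
    FROM EMPLOYEE;
GRANT SELECT ON emp_public TO clerk;    -- the clerk never sees EMPLOYEE itself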
Q49. Assuming that you are the data security administrator of a public
sector bank, what are the different security and privacy measures that
you will propose for its customer’s data?
Ans. The question presents the situation of the data security administrator of a
bank, who has to implement security and privacy measures on the bank's database.
This is important because a large number of customers are allowed to access the
database, and unauthorised access to one customer's data by another must be
prevented. The implementation of various security and privacy measures on the
database is necessary to protect the customers' data from malicious manipulation.
Keeping in view the need for security and privacy measures, we can use two
types of security mechanisms:
(a) Security Mechanisms involving Access protection, user accounts & database
audits.
(b) Security mechanisms involving granting & Revoking of privileges.
Q50. How can you recover from media failure on which your database
was stored? Describe the mechanism of such recovery.
Ans. Media failure is the biggest threat to your data. A media failure is a
physical problem that occurs when a computer unsuccessfully attempts to
read from or write to a file necessary to operate the database. Common types
of media problems include:
• A disk drive that holds one of the database files experiences a head crash.
• A datafile, online or archived redo log, or control file is accidentally deleted,
overwritten, or corrupted.
The technique you use to recover from media failure of a database file depends
heavily on the type of media failure that occurred. For example, the strategy
you use to recover from a corrupted datafile is different from the strategy for
recovering from the loss of the control file.
The basic steps for media recovery are:
• Determine which files to recover.
• Determine the type of media recovery required: complete or incomplete,
open database or closed database.
• Restore backups or copies of necessary files: datafiles, control files, and the
archived redo logs necessary to recover the datafiles.
Note :
If you do not have a backup, then you can still perform recovery if you have
the necessary redo logs and the control file contains the name of the damaged
file. If you cannot restore a file to its original location, then you must relocate
the restored file and rename the file in the control file.
• Apply redo records to recover the datafiles. (When using Recovery Manager,
apply redo records or incremental backups, or both.)
• Reopen the database. If you perform incomplete recovery or restore a backup
control file, then you must open the database with the RESETLOGS option.
The following techniques are used to recover from storage media failure.
(i) Database Backup – The whole database and the log are periodically
copied onto a cheap storage medium, like tape. In case of failure, the latest
backup copy is reloaded from the tape to the working disk.
(ii) Volatile Storage – For avoiding failure of volatile storage media, we
Q51. What are the several forms of authorising database users ? Can a
user pass his authorisation to another user ? How can you withdraw an
authorisation ?
Ans. Keeping in view the need for security and privacy measures, we can use
two types of security mechanisms:
(a) Security Mechanism involving access protection, user accounts and database
audits.
(b) Security mechanisms involving granting and revoking of privileges.
Informally, there are two levels of assigning privileges to the users of a database
system:
(i) The Account Level – At this level the DBA specifies privileges for a particular
account. The privileges can be Insert, Delete, etc.
(ii) The Relation (table) Level – Privileges at the relation level specify,
for each user, the individual relations on which each type of command can be
applied. The relation can be a base relation or a virtual relation (view). Some
privileges also refer to individual columns of a relation.
The granting and revoking privileges generally follows an authorization model
for discretionary privileges known as access matrix model where the rows of
a matrix M represent subjects (users, accounts, programs) and the columns
represents objects (relations, records, columns, views and operations). Each
position (i, j) in the matrix represents the type of privileges (read, write, update)
that subject i holds on object j.
Authorization Grant Tree ⇒ The nodes of an authorization grant tree are the
users. The tree includes an edge Ui → Uj if user Ui grants rights to user Uj;
the root of the graph is the database administrator (DBA), as shown in the
following figure.
Figure: Authorization grant tree. DBA → U1, U2, U3; U1 → U4; U2 → U5.
Q53. With ............... sensitive data values are not provided; the query
is rejected without response.
Ans. Suppression.
Q54. Write a sequence of queries that would disclose the name of the
person who scored the highest marks.
Ans.
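One possible sequence, assuming a hypothetical STUDENT(name, marks) relation:
SELECT MAX(marks) FROM STUDENT;    -- a statistical query reveals the highest marks
SELECT name FROM STUDENT
WHERE marks = 97;    -- the value returned (say 97) then isolates the individual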
Q59. .................... while providing SSL & S-HTTP security, uses Java
as a basic component of its security model.
Ans. Oracle
Q60. What is Audit trail? Give four benefits provided by Audit trail to
DBMS. [June-07, Q5(b)]
Ans. An audit trail tracks and reports activities around confidential data. Many
companies have not realised the potential amount of risk associated with
sensitive information within a database unless they run an internal audit which
details who has access to sensitive data and who has accessed it. Consider the
situation where a DBA, who has complete control of database information,
conducts a security breach with respect to business details and financial
information. This will cause tremendous loss to the company.
(1) In such a situation a database audit helps in locating the source of the problem.
The database audit process involves a review of log files to find and examine
all reads and writes to database items during a specific time period, to ascertain
mischief, if any. A banking database is one such database which contains very
• The security requirement of an individual data element may differ from that of
other attribute values of the same tuple, or from other values of the same attribute.
• Thus, every individual element is a micro-item for security.
• In addition, security against statistical queries may also need to be provided.
• A number of security levels need to be defined for such security.
There are many techniques to support multi-level security. Two important
methods are:
(i) Partitioning
In this approach the original database is divided into partitions. Each of the
partitions has a different level of security.
However, the major drawback of this approach is that the database loses the
advantages of the relational structure.
(ii) Encryption
Encryption of critical information may help in maintaining security as a user
who accidentally receives them cannot interpret the data. Such a scheme may
be used when a user password file is implemented in the database design.
However, the simple encryption algorithms are not secure, also since data is
available in the database it may also be obtained in response to a query.
There are many more schemes to handle such multi-level database security.
Q63. Design a database security scheme for a library issue and return
system.
(Hint: you may define security levels for database access and external
schema)
Ans. A library issue and return system issues books to valid members and
also takes returns of the books.
The system keeps information about members: their renewals, the number of
books issued, due dates of return, etc.
A security mechanism is needed for such a database system so that no invalid
member can take a book.
Besides, any member whose membership needs to be renewed should not be able
to take books. Issued books should be returned on due dates, and if they are kept
beyond the due date of return by the members, then a fine should be charged.
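A minimal sketch of such a scheme using SQL roles, with illustrative table and role names:
CREATE ROLE member;
CREATE ROLE librarian;
-- Members may only look up the catalogue and their own issues (through a view)
GRANT SELECT ON book_catalogue TO member;
GRANT SELECT ON my_issues_view TO member;
-- Librarians may record issues, returns and fines
GRANT SELECT, INSERT, UPDATE ON issue_register TO librarian;
GRANT SELECT, INSERT, UPDATE ON fine_register TO librarian;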
Figure: A distributed database. Four sites (Site 1 to Site 4), each with its own
local DB, connected through a communications network.
Q65. How does the architecture of DDBMS differ from that of a centralised
DBMS?
Ans. Reference architecture shows data distribution, which is an extension of
the ANSI/SPARC three level architecture for a centralised DBMS. The reference
architecture consists of the following :
• A set of global external schemas to describe the external view for the entire
DBMS,
• A global conceptual schema containing the definition of entities, relationships,
constraints and security and integrity information for the entire system,
• A fragmentation and allocation schema, and
• A set of schemas for each local DBMS as per the ANSI/SPARC architecture.
Q1. What are Object Oriented Design (OOD) techniques? Explain the
OOD process.
Ans. OOD techniques are useful for development of large, complex systems.
It has been observed that large projects developed using OOD techniques
resulted in a 30% reduction in development times and a 20% reduction in
development staff effort, as compared to similarly sized projects using traditional
software development techniques.
Although object oriented methods have roots that are strongly anchored back
in the 60s, structured and functional methods were the first to be used. This is
not very uncommon, since functional methods are inspired directly by computer
architecture (a proven domain, well known to computer scientists). The
separation of data and program as exists physically in the hardware was
translated into functional methods. This is the reason why computer scientists
got into the habit of thinking in terms of system functions. Later it was felt
that hardware should act as the servant of the software that is executed on it
rather than imposing architectural constraints on system development process.
Moving from a functional approach to an object oriented one requires a
translation of the functional model elements into object model elements, which
is far from being straightforward or natural. Indeed, there is no direct
relationship between the two sets, and it is, therefore, necessary to break the
model elements from one approach in order to create model element fragments
that can be used by the other. In the initial phases of OO design a mixed
approach was followed: computer scientists tended to use the functional approach in the
analysis phase and the OO approach in the design phase. The combination of a functional
approach for analysis and an object-oriented approach for design and
implementation does not need to exist today, as modern object-oriented methods
cover the full software lifecycle.
Figure: The OO software lifecycle: Requirements, Analysis, System Design,
Object Design, Implementation and Testing.
Q5. Represent an address using SQL that has a method for locating
pin-code information.
Ans. CREATE TYPE Addrtype AS (
  houseNo CHAR (8),
  street CHAR (10),
  colony CHAR (10),
  state CHAR (8),
  pincode CHAR (6)
)
METHOD pin ( ) RETURNS CHAR (6);

CREATE METHOD pin ( ) RETURNS CHAR (6)
FOR Addrtype
BEGIN
. . . . .
END
Q9. List the major considerations while converting ODL designs into
relational designs.
Ans. The major considerations while converting ODL designs into relational
designs are as follows :
(a) It is not essential to declare keys for a class in ODL, but in a relational design
new attributes have to be created in order to work as a key.
(b) Attributes in ODL can be declared as non-atomic whereas, in a relational
design, they have to be converted into atomic attributes.
(c) Methods can be part of a design in ODL, but they cannot be directly
converted into a relational schema, as they are not a property of the relational
model (although SQL does support them).
(d) Relationships are defined in inverse pairs in ODL but, in the case of relational
design, only one pair is defined.
For example, for the book class schema the relation is :
Book (ISBNNO, TITLE, CATEGORY, fauthor, sauthor, tauthor)
Thus, the ODL has been created with the features required to create an object
oriented database in OODBMS.
Q11. Create a class Staff using ODL that is also related to the Book class
which is given below
class Book
{
attribute string ISBNNO;
attribute string TITLE;
attribute enum CATEGORY
{text, reference, journal} BOOKTYPE;
attribute struct AUTHORS
{string fauthor, string sauthor, string tauthor}
AUTHORLIST;
};
Ans. class staff
{
attribute string STAFF_ID;
Q12. Find the list of books that have been issued to “Dinesh”.
Ans. SELECT DISTINCT b.TITLE
FROM BOOK b
WHERE b.issuedto.NAME = “Dinesh”
Example:
define class department:
type set (department);
operations add dept (d: department): boolean;
/* adds a department to the department set object */
Although SGML was a good format for document sharing, and HTML was a
good language for describing the document layout in a standardised way,
there was no standardised way of describing and sharing data that was stored
in the document. For example, an HTML page might have a body that contains
a listing of today's share prices. HTML can structure the data using tables,
colours, etc., but once they are rendered as HTML, they are no longer individual
pieces of data; to extract, say, the top ten shares, you may have to do a lot of
processing.
Thus, there was a need for a tag-based markup language standard that could
describe data more effectively than HTML, while still using the very popular
and standardised HTTP over the Internet. Therefore, in 1998 the World Wide
Web Consortium (W3C) came up with the first Extensible Markup Language
(XML) Recommendation. Now, XML (eXtensible Markup Language) has
emerged as the standard for structuring and exchanging data over the Web.
Please note : XML is case sensitive whereas HTML is not. So, you may see
that XML and databases have something in common.
Q18. What is XML? How does XML compare to SGML and HTML?
Ans. Most documents on the Web are currently stored and transmitted in HTML.
One strength of HTML is its simplicity. However, this may be one of its weaknesses,
given the growing needs of users who want HTML documents to be more
attractive and dynamic. XML is a restricted version of SGML, designed
especially for Web documents. SGML defines the structure of the document
(the DTD) and its text separately. By giving documents a separately defined structure,
and by giving web page designers the ability to define custom structures, SGML
provides an extremely powerful document management system but has
Q19. Why is XML case sensitive, whereas SGML and HTML are not?
Ans. XML is designed to work with applications that might not be case sensitive
and in which the case folding (the conversion to just one case) cannot be
predicted. Thus, to avoid making unsafe assumptions, XML takes the safest
route and opts for case sensitivity.
Q20. Why is it possible to define your own tags in XML but not in HTML?
Ans. In HTML, the tags tell the browser what to do to the text between them.
When a tag is encountered, the browser interprets it and displays the text in
the proper form. If the browser does not understand a tag, it ignores it. In
XML, the interpretation of tags is not the responsibility of the browser. It is
the programmer who defines the tags through DTD or Schema.
(optional; “yes” if the document does not refer to any external document or
entities, “no” otherwise).
Q25. What are two XML constructs that let you specify an XML
document’s syntax so that it can be checked for validity?
Ans. Document Type Definitions (DTDs) and XML schemas.
Q27. What is the difference between a well formed XML document and
a valid XML Document?
Ans. An XML document that conforms to the structural and notational rules of
XML is considered well-formed. A validating XML processor additionally checks
that a well-formed XML document conforms to a DTD, in which case the XML
document is considered valid. A well-formed XML document may not be a valid
document.
</employee>
Ans. In a well-formed XML document, there must be one root element that
contains all the others.
Q31. Describe the differences between DTD and the XML Schema?
Ans.
Q32. How can you use an XML schema to declare an element called
<name> that holds text content?
Ans. You can declare the element like this:
<xsd:element name = “name” type= “xsd:string”/>
placed on the path. XLink allows elements to be inserted into XML
documents to create and describe links between resources. It uses XML
syntax to create structures that can describe links similar to the simple unidirectional
hyperlinks of HTML, as well as more sophisticated links.
Q37. How can you store an XML document in any relational database?
Ans. 1. XML data can be stored as strings in a relational database.
2. Relations can represent XML data as tree.
3. XML data can be mapped to relations in the same way that E-R schemas are
mapped to relational schemas.
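A minimal sketch of option 1, assuming a hypothetical xml_documents table:
CREATE TABLE xml_documents (
    doc_id   INTEGER PRIMARY KEY,
    content  CLOB    -- the whole XML document stored as one string
);
INSERT INTO xml_documents VALUES
    (1, '<book><title>DBMS</title><price>129</price></book>');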
Figure defines the basic architecture of a data warehouse. The analytical reports
are not a part of the data warehouse but are one of the major business application
key data items uniquely. It also identifies the logical relationships between
them, ensuring organisation-wide consistency in terms of entities such as
Admission, Student, Examination and Programme.
The meta data directory component defines the repository of the information
stored in the data warehouse. The meta data can be used by the general users
as well as data administrators. It contains the following information.
The first step in data warehousing is to perform data extraction, transformation
and loading of data into the data warehouse. This is called ETL, that is, Extraction,
Transformation and Loading. ETL refers to the methods involved in accessing
and manipulating data available in various sources and loading it into a target
data warehouse. Initially, ETL was performed using SQL programs;
however, now there are tools available for ETL processes. Manual ETL
was complex as it required the creation of complex code for extracting data
from many sources. ETL tools are very powerful and offer many advantages
over manual ETL. ETL is a step-by-step process. As a first step, it maps
the data structure of a source system to the structure in the target data
warehousing system. In the second step, it cleans up the data using the process
of data transformation and finally, it loads the data into the target system.
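A minimal sketch of one such hand-written ETL step, with illustrative source and target tables:
-- Extract from the source table, transform, and load into the warehouse
INSERT INTO dw_sales (sale_date, product_code, amount)
SELECT s.order_date,
       UPPER(TRIM(s.prod_code)),    -- transform: standardise product codes
       COALESCE(s.amount, 0)        -- transform: replace NULL amounts
FROM src_orders s;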
(3) Dynamic sparse matrix handling : This is a feature that is much needed
as it contains huge amount of data.
(4) Client/server architecture : This feature helps a data warehouse to be
accessed in a controlled environment by multiple users.
(5) Accessibility and transparency, intuitive data manipulation and
consistent reporting performance: This is one of the major features of a
data warehouse. A data warehouse contains huge amounts of data; however,
that should not be the reason for bad performance or bad user interfaces.
Since the objectives of the data warehouse are clear, it has to support
easy-to-use interfaces, strong data manipulation, support for applying and
reporting various analyses, and user-friendly output.
Q43. What are the key concerns when building a data warehouse?
Ans. Following are the key concerns when building a data warehouse:
• How will the data be acquired?
• How will it be stored?
• The type of environment in which the data warehouse will be implemented?
Q50. What do you mean by Data mining? How does Database processing
differ from Data mining processing? [June-07, Q1(c)]
Ans. Data is growing at a phenomenal rate today and users expect more
sophisticated information from this data. There is a need for new techniques
and tools that can automatically generate useful information and knowledge
from large volumes of data. Data mining is one such technique of generating
hidden information from the data. Data mining can be defined as : “an automatic
process of extraction of non-trivial or implicit or previously unknown but
potentially useful information or patterns from data in large databases, data
warehouses or in flat files”.
Data mining is related to data warehouse in this respect that, a data warehouse
is well equipped for providing data as input for the data mining process. The
advantages of using the data of data warehouse for data mining are many
some of them are listed below:
• Data quality and consistency are essential for data mining to ensure the
accuracy of the predictive models. In data warehouses, before loading the
data, it is first extracted, cleaned and transformed. We will get good results
only if we have good quality data.
• A data warehouse consists of data from multiple sources. The data in a data
warehouse is integrated and subject-oriented; the data mining process can be
performed directly on this data.
• In data mining, it may be the case that, the required data may be aggregated
or summarised data. This is already there in the data warehouse.
• A data warehouse provides the capability of analysing data by using OLAP
operations. Thus, the results of a data mining study can be analysed for hitherto
uncovered patterns.
• Find all credit card applicants with the last name Veenu.
• Identify customers who have made purchases of more than Rs. 10,000/-in
the last month.
• Find all customers who have purchased shirt(s).
Hierarchical Clustering
In this method, the clusters are created in levels and depending upon the
threshold value at each level the clusters are again created.
An agglomerative approach begins with each tuple in a distinct cluster and
successively merges clusters together until a stopping criterion is satisfied.
This is the bottom up approach.
A divisive method begins with all tuples in a single cluster and performs splitting
until a stopping criterion is met. This is the top down approach.
Q58. List the example of problems which can be solved by data mining
and also discuss briefly the approaches to data mining problems.
Sample data mining problems, i.e., problems that can be solved through the
data mining process:
(1) Mr. Suraj Gupta manages a supermarket; at the cash counters, he adds
transactions into the database. Some of the questions that can come to
Mr. Gupta's mind are as follows:
(a) Can you help me visualise my sales?
(b) Can you profile my customers?
(c) Tell me something interesting about sales, such as at what time sales will be
maximum, etc.
He does not know statistics, and he does not want to hire statisticians.
The answer of some of the above questions may be answered by data mining.
(2) Mr. Avinash Arun is an astronomer and the sky survey has 3 terabytes
(10¹² bytes) of data covering 2 billion objects. Some of the questions that can
come to the mind of Mr. Arun are as follows:
(a) Can you help me recognize the objects?
(b) Most of the data is beyond my reach. Can you find new/unusual items in
my data?
(c) Can you help me with basic manipulation, so I can focus on the basic
science of astronomy?
He knows the data and statistics, but that is not enough. The answer to some
of the above questions may be answered once again, by data mining.
Please note : The use of data mining in both the questions given above lies in
finding certain patterns and information. Definitely the type of the data in both
the database as given above will be quite different.
Step 2: Scan the data set D and count each itemset in Ck; if the count is greater
than the minimum support, the itemset is frequent.
Q61. Apply the Apriori algorithm for generating large itemset on the
following dataset:
Assuming a minimum support of 50% for calculating the large itemsets: as
we have 4 transactions, at least 2 transactions should contain an itemset.
1. scan D → C1: a1:2, a2:3, a3:3, a4:1, a5:3
→ L1: a1:2, a2:3, a3:3, a5:3
→ C2: a1a2, a1a3, a1a5, a2a3, a2a5, a3a5
2. scan D → C2: a1a2:1, a1a3:2, a1a5:1, a2a3:2, a2a5:3, a3a5:2
→ L2: a1a3:2, a2a3:2, a2a5:3, a3a5:2
→ C3: a1a2a3, a1a2a5, a2a3a5
→ pruned C3: a2a3a5
3. scan D → L3: a2a3a5:2
Thus L = {L1, L2, L3}
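The counting scans above can also be expressed in SQL; a minimal sketch, assuming a hypothetical trans(tid, item) table and a minimum support of 2:
-- Scan 1: count candidate 1-itemsets and keep the frequent ones (L1)
SELECT item, COUNT(*) AS support
FROM trans
GROUP BY item
HAVING COUNT(*) >= 2;
-- Scan 2: build 2-itemsets by self-join and keep the frequent ones (L2)
SELECT t1.item AS item1, t2.item AS item2, COUNT(*) AS support
FROM trans t1 JOIN trans t2
     ON t1.tid = t2.tid AND t1.item < t2.item
GROUP BY t1.item, t2.item
HAVING COUNT(*) >= 2;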
Chapter-7
EMERGING TRENDS AND EXAMPLE
DBMS ARCHITECTURES
Q1. Explain the need and features for the following advanced database
application systems:
Ans. (i) Multimedia Database
A Multimedia Database Management System (MMDBMS) provides support
for multimedia datatypes required for digital images, audio, video, animation
and graphics along with textual data.
The main purpose of a traditional database management system is to store
frequently used data and to represent it in various layouts/forms, while it is
stored only once. In today's world of technology, the huge amounts of data in
different multimedia-related applications need consistency, concurrency,
integrity, security and high availability. For example, applications like
digital signature authentication, eye retina authentication, speech recognition,
text-to-speech conversion, speech-to-text conversion, animation in movies, etc.
require database operations such as processing, retrieval and searching on
multimedia types of data. Thus, multimedia and its applications have experienced
tremendous growth, which in turn has driven the growth of multimedia
databases.
Features:
- Provides the features of a traditional database.
- Provides a homogeneous framework for storing, processing, retrieving,
transmitting and presenting a wide variety of multimedia data types.
- Supports a large number of formats for a variety of multimedia data types.
- Handles the huge size of a multimedia database.
- Manages different types of input, output and storage devices.
- The size of multimedia data is larger than traditional text data, so it should
provide a variety of data compression and storage formats for various types of
data.
- It maintains information about the sampling rate, resolution, frame rate,
encoding scheme, etc. of various media data.
Q2. What are the reasons for the growth of multimedia data? What are
the contents of multimedia database?
Ans. Reasons for the growth of multimedia data
(a) Advanced technology in terms of devices that are digital in nature and that
support capture and display equipment.
(b) High speed data communication network and software support for
multimedia data transfer.
(c) Newer application requiring multimedia support.
Q10. Define the term Mobile Database? Also discuss about the future
scope of Mobile Databases.
Ans. A mobile database is a database that can be connected to by a mobile
computing device over a mobile network. The client and server have wireless
connections. A cache is maintained to hold frequent data and transactions so
that they are not lost due to connection failure.
The use of laptops, mobiles and PDAs is increasing and likely to increase in
the future with more and more applications residing in the mobile systems. It
is clear that a large percentage of mobile users will require the use of a database
of some sort. Many applications such as databases would require the ability to
download information from an information repository and operate on this
information even when “out of range” or disconnected.
Q16. Why do you need JDBC? What happens when a DBMS does not
have a JDBC driver?
Ans. JDBC is an API that allows Java programmers to access any database
through the set of its standard API. In case a JDBC driver is not available for
a DBMS then the ODBC-JDBC bridge can be used to access data.
Q19. List the different types of hardware and software required for
digital libraries?
Ans. Digital libraries require:
• Very large secondary storage
• Distributed array of powerful servers
• Large bandwidth and data transfer rate
• Reliable library software.
Figure: A data grid. Applications access a virtual database through the data
grid structure, which spans multiple underlying databases.
A data grid should address the following issues:
• It should allow data security and domain specific description of data. Domain
Thus, a data grid virtualizes data. Many DBMSs today, have the ability to
separate the roles of the database administrator and the data owner from the
application access. This needs to be extended to the grid across a heterogeneous
infrastructure.
PostgreSQL
System columns are normally invisible to the user; however, explicit queries
can report these entries. These columns, in general, contain meta-data, i.e.,
data about the data contained in the records of a table.
Thus, any record would have attribute values for the system-defined columns
as well as the user-defined columns of a table. The following table lists the
system columns.
• oid (object identifier): The unique object identifier of a record. It is
automatically added to all records. It is a 4-byte number and is never re-used
within the same table.
• tableoid (table object identifier): The oid of the table that contains a row.
The pg_class system table relates the name and oid of a table.
• xmin (transaction minimum): The transaction identifier of the inserting
transaction of a tuple.
• cmin (command minimum): The command identifier, starting at 0,
associated with the inserting transaction of a tuple.
• xmax (transaction maximum): The transaction identifier of a tuple's
deleting transaction. If a tuple has not been deleted then this is set to zero.
• cmax (command maximum): The command identifier associated with the
deleting transaction of a tuple. Like xmax, if a tuple has not been deleted then
this is set to zero.
• ctid (tuple identifier): This identifier describes the physical location of the
tuple within the database. A pair of numbers is represented by the ctid: the
block number, and the tuple index within that block.
Figure: System columns
If the database creator does not create a primary key explicitly, it would become
difficult to distinguish between two records with identical column values. To
avoid such a situation PostgreSQL appends to every record its own object
identifier number, or OID, which is unique to that table. Thus, no two records
in the same table will ever have the same OID, which also means that no two
records are identical in a table. The oid makes sure of this.
Internally, PostgreSQL stores data in operating system files. Each table has its
own file, and data records are stored in a sequence in the file. You can create
an index on the database. An index is stored as a separate file that is sorted on
one or more columns as desired by the user.
Indexes
Indexes allow fast retrieval of specific rows from a table. For a large table
using an index, finding a specific row takes fractions of a second while non-
indexed entries will require more time to process the same information.
PostgreSQL does not create indexes automatically. Indexes are user defined
for attributes or columns that are frequently used for retrieving information.
• B-Tree Indexes: These are the default index type. They are useful for
comparison and range queries.
• Hash Indexes: This index type uses linear hashing. Such indexes are not
preferred in comparison with B-tree indexes.
• R-Tree Indexes: Such indexes are created on built-in spatial data types such
as box and circle, for determining operations like overlap, etc.
• GiST Indexes: These indexes are created using Generalised Search Trees.
Such indexes are useful for full-text indexing, and are thus useful for retrieving
information.
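A minimal sketch of user-defined indexes in PostgreSQL, assuming a hypothetical members table:
CREATE TABLE members (
    id    SERIAL PRIMARY KEY,
    name  TEXT,
    city  TEXT
);
CREATE INDEX members_city_idx ON members (city);                -- default B-tree index
CREATE INDEX members_name_hash ON members USING HASH (name);    -- hash index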
The Oracle RDBMS stores data logically in the form of tablespaces and physically
in the form of data files. Tablespaces can contain various types of memory
segments; for example, Data Segments, Index Segments etc. Segments in
turn comprise one or more extents. Extents comprise groups of contiguous
data blocks. Data blocks form the basic units of data storage. At the physical
level, data-files comprise one or more data blocks, where the block size can
vary between data-files.
Oracle database management keeps track of its computer data storage with
the help of information stored in the SYSTEM tablespace. The SYSTEM
tablespace contains the data dictionary — and often (by default) indexes and
clusters. (A data dictionary consists of a special collection of tables that contains
information about all user-objects in the database). Since version 8i, the Oracle
RDBMS also supports "locally managed" tablespaces which can store space
management information in bitmaps in their own headers rather than in the
SYSTEM tablespace (as happens with the default "dictionary-managed"
tablespaces).
If the Oracle database administrator has instituted Oracle RAC (Real Application
Clusters), then multiple instances, usually on different servers, attach to a
The Oracle DBMS can store and execute stored procedures and functions
within itself. PL/SQL (Oracle Corporation's proprietary procedural extension
to SQL), or the object-oriented language Java can invoke such code objects
and/or provide the programming structures for writing them.
The SCOTT schema has seen less use as it uses so few of the features of a
modern release of Oracle. Most recent examples reference the default HR or
OE schemas.
Each Oracle instance allocates itself an SGA when it starts and de-allocates it
at shutdown time. The information in the SGA consists of the following
elements, each of which has a fixed size, established at instance startup:
* the database buffer cache: this stores the most recently-used data blocks.
These blocks can contain modified data not yet written to disk (sometimes
known as "dirty blocks"), unmodified blocks, or blocks written to disk since
modification (sometimes known as clean blocks). Because the buffer cache
keeps blocks based on a most-recently-used algorithm, the most active buffers
stay in memory to reduce I/O and to improve performance.
* the redo log buffer: this stores redo entries - a log of changes made to the
database. The instance writes redo log buffers to the redo log as quickly and
efficiently as possible. The redo log aids in instance recovery in the event of a
system failure.
* the shared pool: this area of the SGA stores shared-memory structures
such as shared SQL areas in the library cache and internal information in the
data dictionary. An insufficient amount of memory allocated to the shared pool
can cause performance degradation.
If multiple applications issue the same SQL statement, each application can
access the shared SQL area: this reduces the amount of memory needed and
reduces the processing-time used for parsing and execution planning.
Oracle stores information here about the logical and physical structure of the
database. The data dictionary contains information such as the following:
The Oracle instance frequently accesses the data dictionary in order to parse
SQL statements. The operation of Oracle depends on ready access to the data
dictionary: performance bottlenecks in the data dictionary affect all Oracle
users. Because of this, database administrators should make sure that the data
dictionary cache has sufficient capacity to cache this data. Without enough
memory for the data-dictionary cache, users see a severe performance
degradation. Allocating sufficient memory to the shared pool where the data
dictionary cache resides precludes these particular performance problems.
The size and content of the PGA depends on the Oracle-server options installed.
This area consists of the following components:
stack-space: the memory that holds the session's variables, arrays, and so on.
session-information: unless using the multithreaded server, the instance stores
its session-information in the PGA. (In a multithreaded server, the session-
information goes in the SGA.)
Q15. What are the three basic steps involved in SQL tuning?
Ans.
(a) Identifying high-load or top SQL statements,
(b) Verifying that the execution plans perform reasonably, and
(c) Implementing corrective actions to generate better execution plans.
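A minimal sketch of step (b) in Oracle, assuming a hypothetical employees table:
EXPLAIN PLAN FOR
    SELECT * FROM employees WHERE salary > 50000;
-- Inspect the execution plan chosen by the optimiser
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);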
Q18. Name the basic technologies used to move data from one ORACLE
database to another.
Ans.
a) Basic replication,
b) Advanced replication,
c) Transportable tablespaces,
d) Advanced queuing and streams, and
e) Extraction, transformation and loading (ETL), the process of extracting data
from source systems and transferring it to the data warehouse.
In this relation each product comes in the range of colours and sizes.
Ans. Multivalued dependency: A multivalued dependency is a full constraint
between two sets of attributes in a relation.
In contrast to the functional dependency, the multivalued dependency requires
that certain tuples be present in a relation. Therefore, a multivalued dependency
is also referred to as a tuple-generating dependency. The multivalued
dependency plays a role in the 4NF database normalization.
Formal definition
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued
dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of
tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r
such that
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]
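As a small worked example of the product/colour/size relation mentioned above: if PRODUCT(P_No, Colour, Size) satisfies P_No →→ Colour (and, symmetrically, P_No →→ Size), then whenever the tuples (P1, Red, Small) and (P1, Blue, Large) appear, the tuples (P1, Red, Large) and (P1, Blue, Small) must also appear; every colour of a product is combined with every size.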
Figure: Transactions T0 through Ti+3 shown on a timeline from t0, through the
checkpoint at tc, to the system crash at tx.
A checkpoint is taken at time tc, which indicates that at that time all log and
data buffers were propagated to storage.
The transactions T0, T1, T2 and T4 were committed and their modifications are
reflected in the database. With the checkpoint scheme, these transactions are
not required to be redone after the system crash.
The transactions Ti and Ti+3 are to be redone from the point of the checkpoint,
whereas the transaction Ti+2 has to be undone, as no commit record has been
written for this transaction.
(iii) Show the cost calculation for a simple hash-join operation. Make
the necessary assumptions, if any. 6
Ans. Cost calculation for a simple hash-join:
(i) Cost of partitioning r and s: all the blocks of r and s are read once and, after
partitioning, written back, so cost1 = 2 (blocks of r + blocks of s).
(ii) Cost of performing the hash-join using build and probe: this requires at least
one block transfer for reading the partitions,
cost2 = (blocks of r + blocks of s).
(iii) There are a few more blocks in main memory that may be used for
evaluation; they may be read or written back. We ignore this cost as it is very
small in comparison to cost1 and cost2.
Thus, the total cost = cost1 + cost2
= 3 (blocks of r + blocks of s)
The costs for steps (ii) and (iii) here will be the same as those given in steps (ii)
and (iii) above. If the smaller (build) relation fits entirely in main memory, n can
be set to 1 and the algorithm need not partition the relations but may still build
an in-memory index; in such cases the cost estimate goes down to
(blocks of r + blocks of s).
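For example, if r occupies 100 blocks and s occupies 400 blocks, the simple hash-join costs about 3 × (100 + 400) = 1500 block transfers, whereas if the smaller relation fits entirely in main memory the estimate drops to 100 + 400 = 500 block transfers.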
Figure: E-R diagram. EMPLOYEE (Emp_id, Name (Fname, Mname, Lname),
Address, Date_of_Birth, Salary, Gender) is related N : 1 through Works_in to
DEPARTMENT (Dept_id, Dept_name, No_of_employees, Location); Manages is a
1 : 1 relationship between EMPLOYEE and DEPARTMENT, and Has_dependents
is a 1 : N relationship from EMPLOYEE to DEPENDENT (Name, Gender,
Birthdate).
Data models like the E-R model still persist, but systems built using object
modeling techniques like UML are now preferred. Both UML and the ER model
focus on the object as an implicit way to analyse, conceive and observe user
requirements. Since UML is accepted by the Object Management Group (OMG)
as the standard for modeling object-oriented programs, it becomes one of the
automatic choices.
6. In HTML, links are unidirectional, i.e., a link connects only two resources
and has no semantics. In XML, with XLink, one can use bi-directional links,
link up more than two resources and specify a 'role' for the link by associating
semantics with it.
7. Compared with XML, HTML has a relatively loose syntax; for example, some
syntactical errors may be acceptable (i.e., the document may still be processed
even if a syntactical error occurs). The syntax for XML is strictly defined.
Integrity constraints are defined with a table and are stored as part of the
table’s definition in the data dictionary, so that all database applications adhere
to the same set of rules. When a rule changes, it only needs to be changed once
at the database level and not for each application.
Figure: Reference architecture of a DDBMS, showing the fragmentation schema
and allocation schema above the databases on disk.
III. Local schemas: The local schema at each site can consist of a conceptual
and an internal schema. However, mapping the fragmentation and allocation
schemas to local external objects may require a local mapping schema. It is this
schema (i.e., the local mapping schema) that hides the details of the various DBMSs, thus
(ii) How does the architecture of DDBMS differ from that of centralized
DBMS? 5
Ans. If the data of an organization is stored at a single location under the
strict control of the DBMS, then it is termed a centralized database system.
A centralized system has the advantage of controlling data under a central
authority, so the chances of integrity failure and inaccuracy are very low. In
such a system it is easier to enforce business rules and the system can also be
made very secure.
In a DDBMS, all the above problems are solved to an extent, as most of the
data access can now be local; however, it adds the overhead of maintaining
databases at several sites. The sites must coordinate if a global query is to be
handled. Thus, it will increase the overall complexity of the system.
(iii) Identify the functional dependencies which hold for the following
relation: 5
P_No    Factory    Warehouse
P1      F1         WH1
P1      F2         WH1
P2      F2         WH1
P3      F2         WH2
P4      F2         WH2
The above relation indicates where each product is made and where it is
stored. Each product may be produced at a number of different factories,
but may only be stored at one warehouse.
Ans. Since each product may be stored at only one warehouse, the functional
dependency P_No → Warehouse holds; consequently (P_No, Factory) →
Warehouse also holds.
(ii) Describe an application scenario (in brief) for which you would choose
an ORDBMS and explain why. 6
Ans. A relational database management system (RDBMS) that supports object
classes as data types is called object-relational database management. The
term “object relational” is also applied imprecisely by the database industry to
any number of products (especially SQL-based ones) that have some object-
oriented and pseudo-relational features. There is no formal definition for this
commercial usage of the term.
An object-relational database (ORD) or object-relational database management
system (ORDBMS) is a relational database management system that allows
developers to integrate the database with their own custom data types and
methods. The term object-relational database is sometimes used to describe
external software products running over traditional DBMSs to provide similar
features; these systems are more correctly referred to as object-relational
mapping systems.
Whereas RDBMS or SQL-DBMS products focused on the efficient
management of data drawn from a limited set of data types (defined by the
relevant language standards), an object-relational DBMS allows software
developers to integrate their own types and the methods that apply to them
into the DBMS. The goal of ORDBMS technology is to allow developers to
raise the level of abstraction at which they view the problem domain.
In an RDBMS, it would be fairly common to see SQL statements like this:
CREATE TABLE Customers (
Id CHAR(12) NOT NULL PRIMARY KEY,
Surname VARCHAR(32) NOT NULL,
FirstName VARCHAR(32) NOT NULL,
DOB DATE NOT NULL
);
SELECT InitCap(Surname) || ‘, ‘ || InitCap(FirstName)
FROM Customers
WHERE Month(DOB) = Month(getdate())
AND Day(DOB) = Day(getdate())
SELECT Formal(Id)
FROM Customers
Many of the ideas of early object-relational database efforts have largely been
added to SQL:1999. In fact, any product that adheres to the object-oriented
aspects of SQL:1999 could be described as an object-relational database
management product. For example, IBM’s DB2, Oracle database, and Microsoft
SQL Server, make claims to support this technology and do so with varying
degrees of success.
Application scenarios for which an ORDBMS is a better choice are:
(b) Show the details of all the members who have been involved in any
event organized by the society whose name is “XYZ”.
Ans. πM_id, M_name, M_phone, M_address (σS_name = 'XYZ' (SOCIETY ⋈ EVENT ⋈ MEMBER))
Examples of classification :
• Teachers classify students' marks data into a set of grades as A, B, C, D, or
F.
• Classification of the height of a set of persons into the classes tall, medium,
or short.
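The first example can be expressed directly as a SQL mapping; a minimal sketch, assuming a hypothetical student_marks table and illustrative grade boundaries:
SELECT name,
       CASE WHEN marks >= 80 THEN 'A'
            WHEN marks >= 65 THEN 'B'
            WHEN marks >= 50 THEN 'C'
            WHEN marks >= 35 THEN 'D'
            ELSE 'F'
       END AS grade
FROM student_marks;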
Clustering is a very useful exercise especially for identifying similar groups
from the given data. Such data can be about buying patterns, geographical
locations, web information and many more.
Examples of Clustering :
• To segment the customer database of a departmental store based on similar
buying patterns.
• To identify similar web usage patterns etc.
(b) What is a mechanism for deadlock detection ? Explain with the help
of a diagram. 5
Ans. Deadlock Detection
Deadlocks can be described precisely in terms of a directed graph called a
wait for graph. This graph consists of a pair G = (V, E), where V is a set of
vertices and E is a set of edges. The set of vertices consists of all the transaction
in the system. Each element in the set E of edges is an ordered pair Ti → Tj . If
Ti → Tj is in E, then there is a directed edge from transaction Ti to Tj, implying
that transaction Ti is waiting for transaction Tj to release a data item that it
needs.
When transaction Ti requests a data item currently being held by transaction Tj,
the edge Ti → Tj is inserted in the wait-for graph. This edge is removed
only when transaction Tj is no longer holding a data item needed by transaction
Ti.
A deadlock exists in the system if and only if the wait-for graph contains a
cycle. Each transaction involved in the cycle is said to be deadlocked. To
detect deadlocks, the systems needs to maintain the wait-for graph, and
periodically to invoke an algorithm that searches for a cycle in the graph.
To illustrate these concepts, consider the wait-for graph in figure (1), which
depicts a situation involving transactions T25, T26, T27 and T28; when an edge
completes a cycle, as in figure (2), the transactions on the cycle are deadlocked.
Figure: Wait-for graphs over T25 to T28: (1) without a cycle, (2) with a cycle.
(ii) Serializability
Ans. In databases and transaction processing, serializability is the property of
a schedule being serializable. It means equivalence (in its outcome, the resulting
database state, the values of the database’s data) to a serial schedule (serial
schedule: No overlap in two transactions’ execution time intervals; consecutive
transaction execution). It relates to the isolation property of a transaction, and
plays an essential role in concurrency control. Transactions are usually executed
concurrently since their serial executions are typically extremely inefficient
and thus impractical.
Serializability is the major criterion for the correctness of concurrent
transactions’ executions (i.e., transactions that have overlapping execution
time intervals, and possibly access same shared resources), and a major goal
for concurrency control. As such it is supported in all general purpose database
systems. The rationale behind it is the following: If each transaction is correct
by itself, then any serial execution of these transactions is correct. As a result,
any execution that is equivalent (in its outcome) to a serial execution, is correct.
Schedules that are not serializable are likely to generate erroneous outcomes.
Well known examples are with transactions that debit and credit accounts
with money. If the related schedules are not serializable, then the total sum of
money may not be preserved. Money could disappear, or be generated from
nowhere. These violations of possibly needed other invariant preservations
are caused by one transaction writing, and “stepping on” and erasing what has
(iii) ODBC
Ans. (iii) Open Database Connectivity (ODBC) is a standard software API
specification for using database management systems (DBMS). ODBC is
independent of programming language, database system and operating system.
ODBC was created by the SQL Access Group and first released in September,
1992. ODBC is based on the Call Level Interface (CLI) specifications from
SQL, X/Open (now part of The Open Group), and the ISO/IEC.
The ODBC API is a library of ODBC functions that let ODBC-enabled
applications connect to any database for which an ODBC driver is available,
execute SQL statements, and retrieve results.
The goal of ODBC is to make it possible to access any data from any application,
regardless of which database management system (DBMS) is handling the
data. ODBC achieves this by inserting a middle layer called a database driver
between an application and the DBMS. This layer translates the application’s
data queries into commands that the DBMS understands.
Figure: ODBC application architecture. The application calls the driver
manager, which routes requests through the appropriate ODBC driver (e.g., an
Access driver to an MS-ACCESS database, or a SQL driver to an MS-SQL
database).
(iv) OLAP
Ans. On Line Analytical Processing, or OLAP, is an approach to quickly
providing answers to analytical queries that are multidimensional in nature.
OLAP is part of the broader category business intelligence, which also includes
Extract transform load (ETL), relational reporting and data mining. The typical
applications of OLAP are in business reporting for sales, marketing,
management reporting, business process management (BPM), budgeting and
forecasting, financial reporting and similar areas. The term OLAP was created
as a slight modification of the traditional database term OLTP (On Line
Transaction Processing).
Databases configured for OLAP employ a multidimensional data model, allowing
for complex analytical and ad-hoc queries with a rapid execution time. Nigel
Pendse has suggested that an alternative and perhaps more descriptive term to
describe the concept of OLAP is Fast Analysis of Shared Multidimensional
Information (FASMI). Multidimensional databases borrow aspects of navigational
databases and hierarchical databases that are speedier than their relational kin.
The output of an OLAP query is typically displayed in a matrix (or pivot)
format. The dimensions form the row and column of the matrix; the measures,
the values.
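A minimal sketch of such a multidimensional query using the standard SQL CUBE operator, with illustrative table and column names:
-- Totals for every combination of region and product, plus all subtotals
SELECT region, product, SUM(amount) AS total_sales
FROM sales
GROUP BY CUBE (region, product);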
(a) EMP
Ename     Pname    Dname
Sanjay    X        Johny
Sanjay    Y        Anita
Sanjay    X        Anita
Sanjay    Y        Johny
(b) EMP_PROJECTS, EMP_DEPENDENTS; SUPPLY and R1, R2, R3
Figure: Fourth and fifth normal form. (a) The EMP relation with two MVDs:
Ename →→ Pname and Ename →→ Dname. (b) Its 4NF decomposition, and the
decomposition of a SUPPLY relation into the 5NF relations R1, R2 and R3.
(b) What are Assertions? What is the utility of assertions? How are
Assertions different from Views? Describe both Assertions and Views
with the help of example. 6
Ans. Refer to Chapter-4, Q.No.-1, Page No.-48
(c) What is UML? How does UML have an edge over other database
designing tools? With the help of an example, describe the designing of
database by using a UML class diagram. 6
Ans. UML is a standard modeling language used for modeling software systems
of varying complexities. Systems can range from enterprise information systems
to distributed web-based systems.
Since UML is accepted by the Object Management Group (OMG) as the
standard for modeling object-oriented programs, it becomes one of the automatic
choices. UML has quickly become one of the popularly used tools for modeling
business and software application needs. The UML became popular due to the
following reasons:
(1) It is very flexible. It allows for many different types of modeling. Some of
them include:
Figure: UML class diagram. A Student class (ID, Name, Phone;
registersprogramme( ), viewtimetable( )) with full-time and part-time
specialisations (the part-time class adding numberofhrs and feecalculation( )),
associated with a Programme class (Programmecode, Name, Fee;
addprogramme( ), viewtimetable( )).
Please note that from the diagram above, you can make a clear-cut table design.
One such possible design may be:
STUDENT (ID, Name, Phone, type (PT, FT))
PT (ID, numberofhrs)
PROGRAMME (Programmecode, Name, Fee)
Stuprog (ID, Programmecode)
You can also identify several functions/data processing functions/triggers
relating to functions of classes. For example, feecalculation( ) may need to be
implemented as a stored procedure while calculating the fee of part-time
students, whereas addprogramme( ) is a simple data entry function. Thus,
UML Class diagram with generalization/ specialisation is a very useful database
design tool.
(d) With the help of a block diagram describe the phases of Query
processing. How do we optimize a Query under consideration? Does Query
optimisation contribute to the measurement of Query cost? Support
your answer with suitable explanation. 8
<user> specifies the name of the user to whom the privileges have been granted.
To grant the privileges to all the users, the PUBLIC option can be used.
[WITH GRANT OPTION] is optional.
A user having privileges can pass on his privileges to another user by using the
GRANT command. The recipient user can then grant the same privileges to
some other user. For example, when the owner of a table grants a privilege on
the table to another user B, the privilege can be given with or without the grant
option. If [WITH GRANT OPTION] is given, this means that B can also grant the
privilege to other users; otherwise B cannot grant the privilege to other users.
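For example (table and user names illustrative):
GRANT SELECT ON account TO B WITH GRANT OPTION;    -- the owner lets B pass the privilege on
GRANT SELECT ON account TO C;    -- B, in turn, may now grant the same privilege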
Revoke Command
The Revoke command is used to cancel database privileges from user(s). The
syntax for revoke command is
If r and s are the relation state of R and S respectively at any specific time
then:
πX(r(R)) ⊆ πY(s(S))
The subset relationship does not necessarily have to be a proper subset. The
sets of attributes on which the inclusion dependency is specified viz. X of R
and Y of S above, must have the same number of attributes. In addition, the
domains for each corresponding pair of attributes in X and Y should be
compatible. Here
X = {A1, A2, ……, An} and
Y = {B1, B2, ….., Bn} and
Ai corresponds to Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): if R.X < S.Y and S.Y < T.Z then R.X < T.Z
Template Dependencies
The template dependencies are a more general and natural class of data
dependencies that generalises the concept of JDs. A template dependency is a
representation of the statement that a relation is invariant under a certain tableau
mapping. Therefore, it resembles a tableau. It consists of a number of hypothesis
rows that define certain variables, with a special row at the bottom, called the
conclusion row. A relation r satisfies a template dependency if, and only if, a
valuation (say ρ) that successfully maps the hypothesis rows to tuples in the
relation r also finds a map for the conclusion row to a tuple in r.
Definition : A template dependency (TD) on a relation scheme R is a pair T =
(T, w) where T={w1, w2, …..wk} is a set of hypothesis rows on R, and w
is a single conclusion row on R. A relation r(R) satisfies TD T if for every
valuation ρ of T such that ρ(T) ⊆ r, ρ can be extended to show that
ρ(w) ∈ r. A template dependency is trivial if every relation over R satisfies it.
As the name of this protocol suggests, a single site known as the central site
maintains the entire locking information. Therefore, there is only one lock
manager for the entire distributed DBMS that grants and releases locks.
This scheme can be modified by making the transaction manager responsible
for making all the lock requests rather than the sub transaction’s local transaction
manager. Thus, the centralised lock manager needs to talk to only the transaction
coordinator.
One of the major advantages of centralised 2PL is that this protocol can detect deadlocks very easily, as there is only one lock manager. However, this lock manager often becomes the bottleneck in the distributed database and may make the system less reliable. This is due to the fact that, in such a case,
for the system to function correctly it needs to depend on a central site meant
for lock management.
Distributed 2PL
The basic objective of this protocol is to overcome the problem of the central
site controlling the locking. It distributes the lock managers to every site, such that each lock manager is responsible for managing the locks on the data items of that site. In case there is no replication of data, this protocol is thus identical to primary copy locking.
Distributed 2PL is a Read–One–Write–All type of protocol. It means that for
reading a data item any copy of the replicated data may be read (as all values
must be consistent), however, for any update all the replicated copies must be
exclusively locked. This scheme manages the locks in a decentralised manner.
The disadvantage of this protocol is that deadlock handling is very complex,
as it needs to be determined by information from many distributed lock
managers. Another disadvantage of this protocol is the high communication
cost while updating the data, as this protocol would require locks on all the
replicated copies and updating all of them consistently.
(c) What do you mean by Data mining? How does Database processing
differ from Data mining processing? 5
Ans. Refer to Chapter-6, Q.No.-131
to access databases, regardless of the DBMS, through Java. There are many drivers for JDBC that support popular DBMSs. However, if no such driver exists for the DBMS that you have selected, then you can use a driver provided by Sun Microsystems to connect to any ODBC-compliant database. This is called the JDBC-to-ODBC Bridge. For such an application, you may need to create an ODBC data source for the database before you can access it from the Java application.
Now, you can connect to the database using the DriverManager class, which selects the appropriate driver for the database. In more complex applications, we may use different drivers to connect to multiple databases. We identify our database using a URL. A JDBC URL starts with "jdbc:", which indicates the use of the JDBC protocol.
A typical Oracle thin-driver URL (the host name myhost here is only illustrative; the general form is jdbc:oracle:thin:@<host>:<port>:<SID>) and the corresponding connection are:
jdbc:oracle:thin:@myhost:2000:student
Connection db_conn =
DriverManager.getConnection("jdbc:oracle:thin:@myhost:2000:student",
"username", "password");
Thus, you will now be connected. Now the next step is to execute a query.
Thus, the JDBC standard allows you to handle databases through JAVA as the
host language.
(iii) PJNF
Ans. PJNF is defined using the concept of join dependencies. Let us first define the concept of PJNF from the viewpoint of decomposition and then refine it later to a standard form.
Definition 3: Let R be a relation scheme having F as the set of FDs and JDs over R. R will be in project-join normal form (PJNF) if, for every JD *[R1, R2, ..., Rn] that is implied by F and applies to R, the following holds:
• the JD is trivial, or
• every Ri is a super key for R.
(b) What do you mean by tuning of SQL? What are the steps involved in
tuning of SQL? Discuss each step briefly. 6
Ans. An important facet of database system performance tuning is the tuning
of SQL statements. SQL tuning involves three basic steps:
• Identifying high load or top SQL statements that are responsible for a large
share of the application workload and system resources, by reviewing past
SQL execution history available in the system.
• Verifying that the execution plans produced by the query optimizer for these statements perform reasonably.
• Implementing corrective actions to generate better execution plans for poorly
performing SQL statements.
These three steps are repeated until the system performance reaches a
satisfactory level or no more statements can be tuned.
Let us consider Oracle to explain the tuning of SQL.
Step 1: Identify high-impact SQL: This is where you should start when tuning Oracle SQL. Tuning Oracle SQL is like fishing. You
must first fish in the Oracle library cache to extract SQL statements and rank
the statements by their amount of activity.
Step 2: Determine the execution plan for SQL: As each SQL statement is
identified, it will be “explained” to determine its existing execution plan. There
are a host of third-party tools on the market that show the execution plan for
SQL statements. The most common way of determining the execution plan
for a SQL statement is to use Oracle's explain plan utility. By using explain plan, the Oracle DBA can ask Oracle to parse the statement and display the execution plan without actually executing the SQL statement.
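For example, the following statement (the query on an emp table here is only illustrative) asks Oracle to store the plan for the query in the plan table under the statement identifier RUN1:
EXPLAIN PLAN
SET STATEMENT_ID = 'RUN1'
INTO plan_table
FOR
SELECT ename FROM emp WHERE deptno = 10;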
This syntax is piped into the SQL optimizer, which will analyze the query and
store the plan information in a row in the plan table identified by RUN1. Please
note that the query will not execute; it will only create the internal access
information in the plan table. The plan table contains the following fields:
• operation: The type of access being performed. Usually table access, table
merge, sort or index operation
• options: Modifiers to the operation, specifying a full table, a range table or a
join
• object_name: The name of the table being used by the query component
Step 3: Tune the SQL statement: For those SQL statements that possess a
non-optimal execution plan, the SQL will be tuned by one of the following
methods:
• Adding SQL "hints" to modify the execution plan
• Rewriting the SQL with global temporary tables
• Rewriting the SQL in PL/SQL. For certain queries this can result in more
than a 20x performance improvement. The SQL would be replaced with a call
to a PL/SQL package that contained a stored procedure to perform the query.
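For instance, a hint is embedded as a comment immediately after the SELECT keyword; the table emp and the index emp_dept_idx below are illustrative:
SELECT /*+ INDEX(e emp_dept_idx) */ e.ename, e.sal
FROM emp e
WHERE e.deptno = 10;
The hint asks the optimizer to use the named index instead of whatever access path it would otherwise choose.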
(c) What is SQLJ? What are the requirements of SQLJ? Briefly describe
the working of SQLJ. Can SQLJ use dynamic SQL? If yes, then how?
Otherwise specify what type of SQL it can use. 7
Ans. Refer to Chapter-4, Q.No.-11, Page No.-58
W-timestamp(Qk): the timestamp of the transaction that created (wrote) version Qk.
R-timestamp(Qk): the largest timestamp of any transaction that successfully read version Qk.
(1) If transaction Ti issues a read(Q), then the value returned is the content of
version Qk.
(2) If transaction Ti issues a write(Q) and TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; else a new version of Q is created.
The following multi-version technique ensures serialisability. Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
(1) If transaction Ti issues a read (Q), then the value returned is the content of
version Qk.
(2) If transaction Ti issues a write(Q) and TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; else a new version of Q is created.
Reads always succeed. A write by Ti is rejected if some other transaction Tj that, in the serialisation order defined by the timestamp values, should read Ti's write has already read a version created by a transaction older than Ti.
(b) What is Audit trail? Give four benefits provided by Audit trail to
DBMS. 4
can expect the procurement and maintenance costs for a DDBMS to be higher than those for a centralised DBMS. In addition to software, a distributed DBMS requires additional hardware to establish a network between sites.
• Greater potential for bugs: Since the sites of a distributed system operate
concurrently, it is more difficult to ensure the correctness of algorithms. The
art of constructing distributed algorithms is an active and important area of
research.
1. (a) Why is the functional dependency called so? Consider the following
functional dependency:
if you study → you will pass
Create instances where this functional dependency will hold/not hold.
12
(b) Create and explain an object oriented database for the following
UML diagram. Assume your attributes and functions. 8
[Figure: UML diagram with Fruit as the superclass of Mango and Banana, and with Pulp and Seed shown below them]
(d) Explain the Apriori algorithm for finding frequent itemsets using an
example. 10
4. (a) Explain the tasks in the KDD process with the help of a figure.
10
(b) Explain the architecture of Oracle 10g with the help of a figure. 10
F = { AB → C,
A → DE,
F → GH,
D → IJ }
1. (a) Why is the functional dependency called so? Consider the following
functional dependency:
if you study → you will pass
Create instances where this functional dependency will hold/not hold.
Ans. When a single constraint is established between two sets of attributes from the database it is called a functional dependency. A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of a universal relation "A" specifies a constraint on the possible tuples that can form a relation state of "A". The constraint is that, for any two tuples t1 and t2 in "A" that have t1[X] = t2[X], we must also have t1[Y] = t2[Y]. It means that if tuples t1 and t2 have the same values for the attributes in X then, for X → Y to hold, t1 and t2 must also have the same values for the attributes in Y.
The relation schema “A” determines the function dependency of Y on X
( X → Y ) when and only when:
(1) if two tuples in “A”, agree on their X value then
(2) they must agree on their Y value.
This semantic property of functional dependency explains how the attributes
in “A” are related to one another. A FD in “A” must be used to specify constraints
on its attributes that must hold at all times.
Instance where FD holds
Name Study Pass
A Yes Yes
B Yes Yes
C No No
D No Yes
Instance where FD does not hold
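One such instance (the values are assumed only for illustration) is:
Name Study Pass
A Yes Yes
B Yes No
Here the tuples for A and B agree on Study but differ on Pass, so the FD study → pass fails.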
(b) Create and explain an object oriented database for the following
UML diagram. Assume your attributes and functions.
[Figure: UML diagram with Fruit as the superclass of Mango and Banana, and with Pulp and Seed shown below them]
(ii) List all students taking at least one course that their advisor teaches.
πS# (σENROLL.S# = ADVISE.S# (ADVISE ⋈ TEACH ⋈ ENROLL))
(iii) List those professors who teach more than one section of the same
course.
πprof ( σcount(section) > 1 ( prof, c# g count(section) (TEACH) ) )
4. (a) Explain the tasks in the KDD process with the help of a figure.
Ans. The different tasks in KDD are as follows:
• Obtains information on application domain: It gathers knowledge from
the domain relevant to the user.
• Extracting the data set: It includes extracting the required data which will later be used for analysis.
• Data cleansing process: It involves basic operations such as the removal of noise, collecting the information necessary to model or account for noise, and deciding on strategies for handling missing data fields.
• Data reduction and projection: Using dimensionality reduction or
transformation methods it reduces the effective number of dimensions under
consideration.
• Selecting data mining task: In this stage we decide what the objective of
the KDD process is. Whether it is classification, clustering, association rules
etc.
• Selecting data mining method: In this stage, we decide the methods and
the parameter to be used for searching for desired patterns in the data.
[Figure: Tasks in the KDD process: create/select the target database (drawing on data warehousing and data organised by function), select sample data, supply missing values, normalise values, transform values, create derived attributes, find important attributes and value ranges, and transform to a different representation]
(b) Explain the architecture of Oracle 10g with the help of a figure.
Ans. A schematic diagram of Oracle database is given below:
[Figure: schematic of an Oracle instance showing the background processes RECO (recoverer), PMON (process monitor), SMON (system monitor), CKPT (checkpoint), ARC0 (archiver), DBW0 (database writer), LGWR (log writer) and D000 (dispatcher), together with user processes, control files, redo log files and datafiles]
Redo Log Files: These record all the changes made to the data in the database. A redo log is made up of redo entries (also called redo records).
Parameter Files: Parameter files contain a list of configuration parameters for that instance and database.
Backup Files: To restore a file is to replace it with a backup file. Typically,
you restore a file when a media failure or user error has damaged or deleted
the original file.
2. Logical Database Structures: The logical storage structures, including data blocks, extents and segments, enable Oracle's fine-grained control of disk space use.
Tablespaces: A database is divided into logical storage units called tablespaces,
which group related logical structures together. For example, tablespaces
commonly group together all application objects to simplify administrative
operations.
Each database is logically divided into one or more tablespaces. One or more
datafiles are explicitly created for each tablespace to physically store the data
of all logical structures in a tablespace.
Oracle Data Blocks: At the finest level of granularity, Oracle database data is
stored in data blocks. One data block corresponds to a specific number of
bytes of physical database space on disk. The standard block size is specified
by the DB_BLOCK_SIZE initialization parameter.
Extents: The next level of logical database space is an extent. An extent is a
specific number of contiguous data blocks, obtained in a single allocation,
used to store a specific type of information.
Segments: Next, is the segment or the level of logical database storage. A
segment is a set of extents allocated for a certain logical structure.
3. Oracle Data Dictionary: Each Oracle database has a data dictionary. An
Oracle data dictionary is a set of tables and views that are used as a read-only
reference on the database. For example, a data dictionary stores information
about the logical and physical structure of the database.
4. Oracle Instance: An Oracle database server consists of an Oracle database
and an Oracle instance. Every time a database is started, a system global area
(SGA) is allocated and Oracle background processes are started. The
combination of the background processes and memory buffers is known as
an Oracle instance.
Instance Memory Structures: Oracle creates and uses memory structures
to complete several jobs. For example, memory stores program code being
run and data shared among users. Two basic memory structures associated
with Oracle are: the system global area and the program global area. The
following subsections explain each in detail.
System Global Area: The System Global Area (SGA) is a shared memory
region that contains data and control information for one Oracle instance.
Oracle allocates the SGA when an instance starts and deallocates it when the
instance shuts down. Each instance has its own SGA.
Database Buffer Cache of the SGA: Database buffers store the most recently
used blocks of data. The set of database buffers in an instance is the database
buffer cache.
Redo Log Buffer of the SGA: The redo log buffer stores redo entries – a log
of changes made to the database. The redo entries stored in the redo log
buffers are written to an online redo log, which is used if database recovery is
necessary.
Shared Pool of the SGA: The shared pool contains shared memory constructs,
such as shared SQL areas. A shared SQL area is required to process every
unique SQL statement submitted to a database. A shared SQL area contains
information such as the parse tree and execution plan for the corresponding
statement.
Program Global Area: The Program Global Area (PGA) is a memory buffer
that contains data and control information for a server process. A PGA is
created by Oracle when a server process is started. The information in a PGA
depends on the configuration of Oracle.
5. Oracle Background Processes: An Oracle database uses memory structures
and processes to manage and access the database.
There are numerous background processes and each Oracle instance can use
several background processes.
Process Architecture: A process is a “thread of control” or a mechanism in
an operating system that can run a series of steps. Some operating systems
use the terms job or task. A process generally has its own private memory area
in which it runs. An Oracle database server has two general types of processes:
user processes and Oracle processes.
Oracle Processes: Oracle processes are invoked by other processes to perform
functions on behalf of the invoking process. Oracle creates server processes
to handle requests from connected user processes. A server process
communicates with the user process and interacts with Oracle to carry out
requests from the associated user process.
F = { AB → C,
A → DE,
F → GH,
D → IJ }
R1 (A B C) F1 = {AB → C}
R2 (A D E) F2 = {A → DE}
R3 (F G H) F3 = {F → GH}
R4 (D I J) F4 = {D → IJ}
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
Q1. (a) What are cursors, stored procedures and triggers? Explain with
an example of your choice.
Refer to Page-92-94, Q.No.-36
(b) The concept of “marriage” is identified as a class in first view, as a
relationship in second view and as an attribute in third view. How will
you solve this using View-Integration problem? Suggest one good
measure.
Ans. View 1 models MARRIAGE as a class of its own, with attributes such as Bride, Groom and Date of marriage. View 2 models marriage as a relationship between Person objects. View 3 models marriage as an attribute of Person. All the above three views can be integrated into one conceptual schema, in which marriage is a relationship between two Person entities that carries the attribute Date of marriage:
[Figure: integrated schema in which two Person entities are connected by a marriage relationship having the attribute Date of marriage]
(c) Differentiate between Logical and Physical database design. How can
UML Diagrams help in database design?
Ans.
Basis: Task
• Logical database design: Maps or transforms the conceptual schema (or an ER schema) from the high-level data model into a relational database schema.
• Physical database design: The specifications for the stored database in terms of physical storage structures, record placement and indexes are designed.
Basis: Choice of criteria
• Logical database design: The mapping can proceed in two stages: a system-independent but data-model-dependent mapping, followed by tailoring the schemas to a specific DBMS.
• Physical database design: The following criteria are often used to guide the choice of physical database design options: response time, space utilisation and transaction throughput.
Basis: Result
• Logical database design: DDL statements in the language of the chosen DBMS that specify the conceptual and external level schemas of the database system. But if the DDL statements include some physical design parameters, a complete DDL specification must wait until after the physical database design phase is completed.
• Physical database design: An initial determination of storage structures and the access paths for the database files. This corresponds to defining the internal schema in terms of a Data Storage Definition Language.
The database design is divided into several phases. The logical database design
and physical database design are two of them. This separation is generally
based on the concept of three-level architecture of DBMS, which provides the
data independence. Therefore, we can say that this separation leads to data independence, because the output of the logical database design is the conceptual and external level schemas of the database system, which are independent of the output of the physical database design, that is, the internal schema.
Now Refer : Page-183-184, Q.No.-1(c)
(d) Explain how Hash Join is applicable to Equi Join and Natural Joins.
Explain the Algorithm and Cost calculation for Simple Hash Join.
Ans. This is applicable to both the equi-joins and natural joins. A hash function
h is used to partition tuples of both relations, where h maps joining attribute
(enroll no in our example) values to {0, 1, …, n-1}.
The join attribute is hashed to the join-hash partitions. In the example of the figure below, we have used the mod 100 function for hashing, and n = 100.
[Figure: the STUDENT table (Enrolno, Name: 1001 Ajay, 1002 Aman, 1005 Rakesh, 1100 Raman, ...) and the MARKS table (Enrolno, Course, Marks: 1001 MCS-11 55; 1001 MCS-12 75; 1002 MCS-11 90; 1005 MCS-15 75; ...) are each hashed on Enrolno into matching partitions 0, 1, 2, ... using h(x) = x mod 100, so that, for example, 1100 falls in partition 0 and 1001 in partition 1 of both relations]
Recursive partitioning is required when the number of partitions n is greater than M – 1, i.e., the number of buckets is greater than the number of buffer pages. In such a case, the relation s can be recursively partitioned: instead of partitioning n ways, use M – 1 partitions for s and further partition the M – 1 partitions using a different hash function.
Cost calculation for Simple Hash-Join:
(i) Cost of partitioning r and s: all the blocks of r and s are read once and after
partitioning written back, so cost 1 = 2 (blocks of r + blocks of s).
(ii) Cost of performing the hash-join using build and probe will require at least
one block transfer for reading the partitions
Cost 2 = (blocks of r + blocks of s)
(iii) There are a few more blocks in the main memory that may be used for evaluation; they may be read or written back. We ignore this cost, as it will be very small in comparison to cost 1 and cost 2.
Thus, the total cost = cost 1 + cost 2
= 3 (blocks of r + blocks of s)
Cost of Hash-Join requiring recursive partitioning:
(i) The cost of partitioning in this case will increase with the number of recursive passes required, which may be calculated as:
Number of iterations required = ⌈logM–1 (blocks of s)⌉ – 1
Thus, cost 1 will be modified as:
cost 1 = 2 (blocks of r + blocks of s) × (⌈logM–1 (blocks of s)⌉ – 1)
The costs for steps (ii) and (iii) here will be the same as those given in steps (ii) and (iii) above.
Thus, total cost = 2 (blocks of r + blocks of s) × (⌈logM–1 (blocks of s)⌉ – 1) + (blocks of r + blocks of s).
Because s is in the inner term in this expression, it is advisable to choose the
smaller relation as the build relation. If the entire build input can be kept in the
main memory, n can be set to 1 and the algorithm need not partition the
relations but may still build an in-memory index, in such cases the cost estimate
goes down to (Number of blocks r + Number of blocks of s).
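For instance, assuming (purely for illustration) blocks of r = 1000, blocks of s = 500 and enough buffer pages that no recursive partitioning is needed, the total cost is 3 × (1000 + 500) = 4500 block transfers; if the entire build input s fits in main memory, the cost falls to 1000 + 500 = 1500 block transfers.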
(e) How is a database management system different from a data
warehouse? When we have sufficient tools and concepts to develop a
DBMS, then why do we still design warehouses?
Ans. The first and most important difference between a classical, general
purpose DBMS and a data warehouse–specific DBMS is how updates are
performed. A classical, general purpose DBMS must be able to accommodate
record-level, transaction-based updates as a normal part of operations. Because
these updates are a regular feature of the general purpose DBMS, this DBMS
must offer facilities for such items as:
Locking
COMMITs
Checkpoints
Log processing
Deadlock
Backout
Not only do these features become a normal part of the DBMS, they consume
a tremendous amount of overhead. Interestingly, the overhead is consumed
even when it isn’t being used. In other words, at least some update and locking
overhead that’s dependent on the DBMS is required by a general purpose
DBMS even when read-only processing is being executed. Depending on the
general purpose DBMS, the overhead required by an update can be minimized,
but it cannot be completely eliminated. For a data warehouse–specific DBMS,
there’s no need for any of the overhead of an update.
The second major difference between a general purpose DBMS and a data
warehouse–specific DBMS regards basic data management. For a general
purpose DBMS, data management at the block level includes space that’s
reserved for future block expansion at the moment of update or insertion.
Typically, this space is referred to as freespace. For a general-purpose DBMS,
freespace may be as high as 50%. For a data warehouse–specific DBMS,
freespace always equals 0% because there’s no need for expansion in the
physical block, once loaded; after all, an update is not done in the data warehouse
environment. Indeed, given the amount of data to be managed in a data
warehouse, it makes no sense to reserve vast amounts of space that may
never be used. Another relevant difference between the data warehouse and
the general purpose environment that’s reflected in the different types of DBMS
is indexing.
A general purpose DBMS environment is restricted to a finite number of indexes.
This restriction exists because as updates and insertions occur, the indexes
require their own space and their own data management. In a data warehouse
environment where there is no update and there is a need to optimize data
access, there’s a need (and an opportunity) for many indexes. Indeed, a much
more robust and sophisticated indexing structure can be employed for data
warehousing than for operational, update-oriented databases.
Beyond indexing, update and basic data management at the physical block
level are some other basic differences between the data management capabilities
and philosophies of a general purpose, transaction-processing DBMS and a
data warehouse–specific DBMS. Perhaps the most basic difference is the
ability to physically organize data in an optimal fashion for different kinds of
access. Typically a general purpose DBMS physically organizes data for optimal
transaction access and manipulation. Organizing in this fashion allows many users to access and update the same data concurrently.
[Figure: storage hierarchy in which the database DB contains files F1 and F2, which in turn contain records R1 and R2]
(g) What are the various problems that arise in distributed DBMS
environment that are not encountered in a centralized DBMS
environment?
Refer to Page-110, Q.No.-68
Q2. (a) Distinguish between the following:
(i) Embedded SQL and Dynamic SQL
Refer to Page-55-56, Q.No.-8
(ii) XML and HTML
Refer to Page-166-167, Q.No.-1(vii)
(iii) 2 PC and 3 PC
Ans. The major difference of 3PC from 2PC shows up in the case where, under 2PC, a crashed participant could recover to a Commit state while all the others were still in state Ready. In that case, the remaining operational processes could not reach a final decision
and would have to wait until the crashed process recovered. With 3PC, if any
operational process is in its Ready state, no crashed process will recover to a
state other than INIT, Abort, or PreCommit. For this reason, surviving
processes can always come to a final decision. Finally, if the processes that P
can reach are in state Precommit (and they form a majority), then it is safe to
commit the transaction. Again, it can be shown that in this case, all other
processes will either be in state Ready or at least, will recover to state Ready,
Precommit, or Commit when they had crashed.
(iv) Data warehousing and Data mining
Ans. There is a lot of confusion concerning the terms data mining and data
warehousing (also referred to as business intelligence in the marketplace today).
To my chagrin, many IT professionals use the two terms interchangeably,
with little hesitation or regard for the differences between the two types of
applications. While the goals of both are related, and often overlap; data mining
and data warehousing are dedicated to furnishing different types of analytics,
for different types of users and therefore merit their own space.
By definition, data mining is intended for users who are statistically inclined.
These analysts look for patterns hidden in data, which they are able to extract
using statistical models. Data miners engage in question formulation based
primarily on the “law of large numbers” to identify potentially useful relationships
between data elements, which can be profitable to companies.
For instance, car insurance companies will sift through terabytes of data to
link accident rates to demographic groups. They may start with a hypothesis
that single men in the 18-25 age group who drive red sports cars are more prone to drive recklessly (based on the number of tickets they get and the number of accidents they have) than older men who drive minivans. After sifting through their
data, they may find that this hypothesis is not necessarily true. On the contrary,
they may find that families who have multiple cars, among which one is a
sports car (color is irrelevant), have more accidents and tickets when they
have a teenager living in the house. Such cause-and-effect patterns help data
miners quote premiums fairly on insurance rates. And hopefully allow them to
reduce premiums where appropriate.
(b) List all the functional dependencies satisfied by the following relation:
A B C
a1 b1 c1
a1 b1 c2
a2 b1 c1
a2 b1 c3
Write an SQL-Code to identify whether a given functional dependency
holds.
Ans. FDs:
A → B
C → B
SQL code to check whether a given FD holds (the FD holds if the corresponding query returns no rows):
First FD: A → B
Select A From R Group By A Having Count (Distinct B) > 1;
Second FD: C → B
Select C From R Group By C Having Count (Distinct B) > 1;
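A small script to try this out (the table name R is as used above; the data is taken from the question):
Create Table R (A varchar(2), B varchar(2), C varchar(2));
Insert Into R Values ('a1', 'b1', 'c1');
Insert Into R Values ('a1', 'b1', 'c2');
Insert Into R Values ('a2', 'b1', 'c1');
Insert Into R Values ('a2', 'b1', 'c3');
-- A -> B holds, so this returns no rows
Select A From R Group By A Having Count (Distinct B) > 1;
-- C -> A does not hold: c1 occurs with both a1 and a2, so this returns c1
Select C From R Group By C Having Count (Distinct A) > 1;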
(c) What are Semantic Databases? List features of Semantic Databases.
Explain the process of searching the knowledge in these databases.
Ans. Semantic modeling provides a far richer set of data structuring capabilities for database applications. A semantic model contains many constructs that can represent structurally complex inter-relations among data in a somewhat more natural way.
Semantic modeling is one of the tools for representing knowledge especially in
Artificial Intelligence and object-oriented applications. Thus, it may be a good
idea to model some of the knowledge databases using semantic database system.
Some of the features of semantic modeling and semantic databases are:
· these models represent information using high-level modeling abstractions,
· these models reduce the semantic overloading of data type constructors,
· semantic models represent objects explicitly along with their attributes,
· semantic models are very strong in representing relationships among objects,
and
· they can also be modeled to represent IS-A relationships, derived schema and
also complex objects. Some of the applications that may be supported by such
database systems in addition to knowledge databases may be applications such
as bio-informatics, that require support for complex relationships, rich
constraints and large-scale data handling.
Q3. (a) What is Data Mining? How is Data Mining a part of Knowledge
Discovery process? What are the goals of Data Mining and Knowledge
Discovery?
Ans. Data mining is the process of automatic extraction of interesting (non
trivial, implicit, previously unknown and potentially useful) information or
patterns from the data in large databases. Data mining is only one of the many
steps involved in knowledge discovery in databases. The various steps in
KDD are data extraction, data cleaning and preprocessing, data transformation
and reduction, data mining and knowledge interpretation and representation.
The different data-mining goals are: Classification, Clustering and Association
Rule Mining.
(b) Create an object-oriented database using ODL for the following figure:
[Figure: class diagram with Book (ISB_No, TITLE, PRICE, PUBLISHER, AUTHORS), Student (ENROLMENT_No, NAME, MARKS, COURSE) and Supplier (SUPPLIER_ID, SUPPLIER_NAME, SUPPLIER_ADDRESS, SUPPLIER_CITY), connected by a many-to-many Supply relationship]
Ans. class Book
(key ISB_No)
{ attribute string ISB_No;
attribute string title;
attribute float price;
attribute string publisher;
attribute string authors;
relationship set <Student> students inverse Student::books;
relationship set <Supplier> suppliers inverse Supplier::books;
};
class Student
(key Enrolment_No)
{ attribute number Enrolment_No;
attribute string name;
attribute number marks;
attribute string course;
relationship set <Book> books inverse Book::students;
relationship set <Supplier> suppliers inverse Supplier::students;
};
class Supplier
(key supplier_ID)
{ attribute number supplier_ID;
attribute string supplier_name;
attribute string supplier_address;
attribute string supplier_city;
relationship set <Book> books inverse Book::suppliers;
relationship set <Student> students inverse Student::suppliers;
};
Open source doesn’t just mean access to the source code. The distribution
terms of open-source software must comply with the following criteria:
1) Free Redistribution : The license shall not restrict any party from selling
or giving away the software as a component of an aggregate software
distribution containing programs from several different sources. The license
shall not require a royalty or other fee for such sale.
2) Source Code : The program must include source code, and must allow
distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicised means of obtaining the source code.
(b) Explain the overview of ORACLE architecture. What are the main
ORACLE tools available for application development? Can mobile use
of data centric applications be performed by ORACLE?
Ans. Refer to Dec-2007, Q.No.-4(b)
ORACLE forms Developer
ORACLE Reports Developer
ORACLE Designer
ORACLE J Developer
ORACLE Discoverer Administrative Edition
ORACLE Portal.
(c) What is data dictionary? List some features of data dictionary. What
are the various approaches to implement a Distributed Database
Catalogue?
Refer to Page-64-67, Q.No.-18, 20 and 24
Q5. (a) What is shadow paging? Illustrate with an example. What are
the advantages and disadvantages of shadow paging?
Ans. Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions are executed serially. In this scheme, two page tables are maintained during the lifetime of a transaction: the current page table and the shadow page table. The shadow page table is stored in non-volatile storage, in such a way that the state of the database prior to transaction execution may be recovered (the shadow page table is never modified during execution). To start with, both the page tables are identical. Only the current page table is used for
data item accesses during execution of the transaction. Whenever any page is
about to be written for the first time a copy of this page is made on an unused
page, the current page table is then made to point to the copy and the update is
performed on the copy.
[Figure: a sample page table mapping entries 1 to n to pages on disk; and the shadow and current page tables for pages 1 to 8, which initially point to the same pages on disk, until a write causes the current page table alone to point to the newly copied pages]
To commit a transaction:
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
• keep a pointer to the shadow page table at a fixed (known) location on disk,
• to make the current page table the new shadow page table, simply update
the pointer to point at the current page table on disk.
Once the pointer to the shadow page table has been written, the transaction is committed. No recovery is needed after a crash: new transactions can start right away, using the shadow page table. Pages not pointed to from the current/shadow page table should be freed (garbage collected).
Advantages of shadow paging over log-based schemes:
• It has no overhead of writing log records,
• The recovery is trivial.
Disadvantages:
• Copying the entire page table is very expensive; the cost can be reduced by using a page table structured like a B+-tree (there is no need to copy the entire tree, only the paths in the tree that lead to updated leaf nodes).
• Commit overhead is high even with the above extension (Need to flush every
updated page and page table).
• Data gets fragmented (related pages get separated on disk).
• After every transaction is completed, the database pages containing old versions of the modified data need to be garbage collected/freed.
• Hard to extend algorithm to allow transactions to run concurrently (easier to
extend log based schemes).
(b) What are the various reasons for a transaction to fail in the middle
of execution?
Ans. A DBMS may encounter a failure. These failures may be of the following
types:
Transaction failure: An ongoing transaction may fail due to:
• Logical errors: Transaction cannot be completed due to some internal error
condition.
• System errors: The database system must terminate an active transaction
due to an error condition (e.g., deadlock).
System crash: A power failure or other hardware or software failure causes
the system to crash.
• Fail-stop assumption: Non-volatile storage contents are assumed to be
uncorrupted by system crash.
• Disk failure: A head crash or similar disk failure destroys all or part of the
disk storage capacity.
(d) What is Distributed DBMS? Explain its architecture. What are the
advantages of DDBMS over centralized databases?
Refer to Page-168-170, Q.No.-2(i)
Now Refer to Page-107-109, Q.No.-64
MCS – 043: ADVANCED DATABASE DESIGN
Dec, 2008
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
(b) What is a data warehouse? Describe the process of ETL for a data
warehouse.
Ans. A Data Warehouse can be considered to be a "corporate memory". It is a repository of processed and integrated information that can be used for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. Academically, it is a subject-oriented, integrated, time-variant and non-volatile collection of data used in support of decision-making.
ETL stands for extraction, transformation and loading. ETL refers to the methods involved in accessing and manipulating the data available in various sources and loading it into the target data warehouse. The following are some of the
transformations that may be used during ETL:
• Filter Transformation
• Joiner Transformation
• Aggregator Transformation
• Sorting Transformation
Find the names of employees using relational algebra who work on all
projects controlled by Department Number 5.
Ans.
(πEname, Pno (EMPLOYEE ⋈ DEPT ⋈ PROJECT)) ÷ (πPno (σDnum = 5 (PROJECT)))
(f) How will you enforce Referential Integrity Constraint in Oracle?
Explain with the help of one example.
Ans. The referential integrity constraint can be enforced in Oracle (i) at the
time of table creation, and (ii) after the table creation.
Example: (i) At the time of table creation
create table dept
( DNo number (4) primary key,
DName varchar2 (10));
create table emp
( ENo number (6) primary key,
Ename varchar2 (20),
DNo number (4) references dept (DNo));
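(ii) After table creation, the same constraint can be added with ALTER TABLE (the constraint name fk_emp_dept is illustrative):
alter table emp
add constraint fk_emp_dept foreign key (DNo) references dept (DNo);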
[Figure: TEACHER and STUDENT entities in an m:n relationship]
(h) What are the common database security failures? What are the
SQL commands for granting permission? Why are statistical databases
more prone to disclosure? Explain with the help of an example.
Ans. Common Database Security Failures: Database security is of
paramount importance for an organisation, but many organisations do not
take this fact into consideration, till an eventual problem occurs. The common
pitfalls that threaten database security are:
Weak User Account Settings: Many of the database user accounts do not
contain the user settings that may be found in operating system environments.
For example, the user accounts name and passwords, which are commonly
known, are not disabled or modified to prevent access.
Disclosure of individual information from a statistical database, thus, is not acceptable. The first step in this direction would be to reject any query that directly asks for sensitive information that is hidden. But how about a sequence of queries that are raised for the purpose of statistics? For example, we may be able to determine the average marks obtained in a class of 50 students, but if only 2 students have opted for a subject then the first student, who knows his/her own marks, can find the marks of the other student by issuing the average-marks query. Thus, statistical queries should be permitted only when some minimum number of records satisfies a condition. The overall objective is to make sure that security is not compromised.
(c) What do you mean by deadlock? How can we prevent them? Write
an algorithm that checks whether the concurrently executing
transactions are in deadlock.
Refer to Chapter-5, Q.No.-2
Q3. (a) What is data mining? How is it different from OLTP? What is
classification in context of data mining?
Refer to Chapter-6, Q.No.-50, Q.No.-53 and Q.No.-55
marts are also called dependent data marts and may be the subsets of larger
data warehouses.
A data mart is like a data warehouse and contains operational data that helps in
making strategic decisions in an organisation. The only difference between
the two is that data marts are created for a certain limited predefined application.
Even in a data mart, the data is huge and comes from several operational systems; therefore, it also needs a multidimensional data model. In fact, the star schema is also one of the popular schema choices for a data mart.
(c) What is XML? How is it different from HTML? What are the
advantages of XML? Create an XML schema for list of students and
their marks.
Refer to Dec-2006, Q.No.-1(viii) and Chapter-6, Q.No.-22
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<!DOCTYPE document [
<!ELEMENT document (student)*>
<!ELEMENT student (name, marks)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT marks (#PCDATA)>]>
<document>
<student>
<name> XYZ </name>
<marks> 45 </marks>
</student>
</document>
A relation R is in 5NF if for all join dependencies at least one of the following
holds:
(a) (R1, R2, ..., Rn) is a trivial join-dependency (that is, one of Ri is R)
(b) Every Ri is a candidate key for R.
(b) Develop a query plan for the following query and compute its cost:
σSALARY > 40000 (EMPLOYEE ⋈ DEPARTMENT)
Make suitable assumptions of your own about the relation schema as
well as the database statistics.
Ans. Assumptions:
Size of the Employee relation = 1000 tuples
Size of the Department relation = 400 tuples
Number of employees having salary > 40000 = 100
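One plausible plan, under these assumptions, pushes the selection below the join: σSALARY > 40000 (EMPLOYEE) is computed first, reducing EMPLOYEE to about 100 tuples, and only these are joined with the 400 DEPARTMENT tuples. A nested-loop join then performs about 100 × 400 = 40,000 tuple comparisons, against 1000 × 400 = 400,000 if the join were evaluated before the selection, that is, roughly a tenfold reduction in join cost.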
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
T1              T2              T3
Lock-X(X)       Lock-S(X)       Lock-S(X)
Read(X)         Read(X)         Read(X)
Y := X
X = X – 1000    display(X)      display(X)
Write(X)        Unlock(X)       Unlock(X)
Unlock(X)
[Figure: ER diagram with entities FACULTY (SSN, Gender, Department), STUDENT (Programme) and COURSE (Code, Name), and an M:N Teach relationship]
(c) Define Simple Hash-Join and explain the process and cost calculation
of Hash-Join with the help of example.
Make suitable assumptions of your own about the relation schema as
well as the database statistics.
Refer to June-2008, Q.No.-1(d)
(g) What is ETL? What are different transformations that are needed
during the ETL Process?
Refer to Dec-2008, Q.No.-1(b)
(h) Given the following semi structured data in XML create the DTD
(Document Type Declaration) for it:
<document>
<employee>
<Name> Ramesh Jain </Name>
<Address> H-1, 25, Delhi </Address>
<Address> B-1, New office, Delhi </Address>
</employee>
<employee>
<Name> Anuj </Name>
<Address> 25, Gurgaon, Haryana </Address>
</employee>
</document>
Ans. <?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (Name, Address*)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Address (#PCDATA)>]>
Q2. (a) What is a data mart and how is it different from a data warehouse?
Ans. The basic constructs used to design a data warehouse and a data mart are the same. However, a Data Warehouse is designed for the enterprise level, while Data Marts may be designed for a business division/department level. A data mart contains the required subject-specific data for local analysis only.
(c) State the main disadvantage of the basic timestamp method for concurrency control. How can you overcome this disadvantage?
Refer to Dec-2006, Q.No.-1(vi)
(d) How are audit trails done in a database? How are they related to database security?
Refer to Chapter-5, Q.No.-60
Q3. (a) What were the limitations of relational databases and why was there a need to extend them to object-oriented databases? Consider we want to represent the information of a book comprising ISBNNO, TITLE, CATEGORY (such as Text, Reference, Journal) and a list of AUTHORS. Create the database structure for:
(i) OODB using ODL (Object Definition Language)
(ii) an object-relational database of Book.
Ans. Limitations of Relational Databases: Relational database technology
was not able to handle complex application systems such as Computer Aided
Design (CAD), Computer Aided Manufacturing (CAM) and Computer
Integrated Manufacturing (CIM), Computer Aided Software Engineering
(CASE) etc. The limitation of relational databases is that they have been designed to represent entities and relationships in the form of two-dimensional tables. Any complex interrelationship, like multi-valued or composite attributes, may result in the decomposition of a table into several tables; similarly, complex interrelationships result in a number of tables being created. Thus, the main asset of relational databases, viz. their simplicity, is also one of their weaknesses in the case of complex applications.
allows a program to use standard SQL commands for accessing databases. Thus, we need not master the typical interface of any specific DBMS.
For implementing ODBC in a system the following components will be required:
• the applications themselves,
• a core ODBC library, and
• the database drivers for ODBC.
(c) What are various mechanism to deal with Dead lock? Explain any
one with the help of example.
Refer to Chapter-5, Q.No.-21
Q4. (a) Explain MVD (Multi-valued dependency) and Join Dependency with the help of an example each. Given the relation R{ABCDE} with FDs
{A → BCDE, B → ACDE, C → ABDE},
what are the join dependencies for R? Give the lossless-join decomposition for R.
Ans. Multivalued dependency: The multivalued dependency X →→ Y is said to hold for a relation R(X, Y, Z) if, for a given set of values (a set of values if X consists of more than one attribute) for the attributes of X, there is a set of (zero or more) associated values for the set of attributes Y, and the Y values depend only on the X values and have no dependence on the set of attributes Z.
Please note that whenever X →→ Y holds, so does X →→ Z since the role
of the attributes Y and Z is symmetrical.
The fifth normal form deals with join dependencies, which are a generalisation of the MVD. The aim of fifth normal form is to have relations that cannot be decomposed further. A relation in 5NF cannot be constructed from several smaller relations.
A relation R satisfies join dependency *(R1, R2, ..., Rn) if and only if R is equal
to the join of R1, R2, ..., Rn where Ri are subsets of the set of attributes of R.
A relation R is in 5NF if for all join dependencies at least one of the following
holds:
(a) (R1, R2, ..., Rn) is a trivial join-dependency (that is, one of Ri is R)
(b) Every Ri is a candidate key for R.
Join Dependency is B → C
R1 (ABDE)
R2 (BC)
R3 (CA)
Q5. Explain the following with the help of an example diagram, if any:
(a) Semantic databases
Refer to June-2008, Q.No.-2(c)
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
(b) Two-phase locking protocol uses waiting, while timestamping and optimistic methods use restarting, to avoid non-serialisable execution. Justify the statement.
Ans. The timestamp protocol manages concurrent execution in such a manner that the timestamps determine the serialisability order. Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj). There are two timestamp values for each data item Q:
W-timestamp (q) is the largest timestamp of any transaction that executed
write q successfully.
R-times (q) is the largest time-stamp of any transaction that executed read (q)
successfully. Serialisability order is determined by a timestamp given at the
time of validation to increase concurrency. Thus TS(Ti) is given the value of
validation (TJ).
(c) What is the difference between document type definition and XML
schema? Explain using an example.
Ans. Both DTDs and schemas are document definition languages. Schemas are written in XML, while DTDs use EBNF (Extended Backus-Naur Form) notation. Thus, schemas are extensible, as they are written in XML. They are also easy to read, write and define.
DTDs provide the capability for validation of the following:
Element nesting
Element occurrence constraints
Permitted attributes
Attribute types and default values.
However, DTDs do not provide control over the format and data types of element and attribute values; e.g., once an element or attribute has been declared to contain character data, no limits may be placed on the length, type, or format of that content.
The XML Schema standard includes:
Simple and complex data types
Type derivation and inheritance
Element occurrence constraints
Name-space-aware elements and attribute declaration.
XML Schema can use simple data types for parsed character data and attribute values, and can also enforce more specific rules on the contents of elements and attributes than DTDs can.
The fifth normal form is also known as project-join normal form, because it can be proven that if you decompose a scheme R into tables R1 to Rn, the decomposition will be a lossless-join decomposition if you restrict the legal relations on R to a join dependency on R called *(R1, R2, ..., Rn). That is to say, the set of relationships in the join dependency is independent of each other.
Example : Given a pizza-chain that models purchases in table Customer = {
order-number, customer-name, pizza-name, delivery-boy }. It is obvious that
you can derive the following relations:
customer-name depends on order-number
pizza-name depends on order-number
delivery-boy depends on order-number
Since the relationships are independent you can say there is a join dependency
as follows: *((order-number, customer-name), (order-number, pizza-name),
(order-number,delivery-boy)).
If each customer has his own delivery-boy however, you could have a join-
dependency like this: *((order-number, customer-name), (order-number,
delivery-boy), (customer-name, delivery-boy), (order-number,pizza-name)),
but *((order-number, customer-name, delivery-boy), (order-number,pizza-
name)) would be valid as well. This makes it obvious that just having a join
dependency is not enough to normalize a database scheme.
Let R be a relation schema and let R1, R2, ..., Rn be a decomposition of R. The relation r(R) satisfies the join dependency *(R1, R2, ..., Rn) if r = πR1(r) ⋈ πR2(r) ⋈ ... ⋈ πRn(r). A join dependency is trivial if one of the Ri is R itself.
[Figure: ER diagram showing PROGRAMME (P_code, Name, Fee, Minimum_eligibility, Date_of_Start) associated with INSTITUTE (Name, Address, Location, Type), and STUDENT (Enrol_no, Name, Gender, Date_of_Birth, Address, Phone_no) in an M:M 'Register for' relationship with PROGRAMME]
(h) What are triggers and what is their use? Explain with the help of an example.
Refer to Q.No.-36, Page No.-92
Q2. (a) Why is a query expressed in relational algebra preferred over a query expressed in SQL in query optimisation? Explain this by taking a suitable example.
Ans. A query expressed in relational algebra is preferred over a query expressed in SQL in query optimisation because:
1) It is more procedural, because algebraic expressions also give the order of
application of operations in the computation of the query; thus, it is a more
appropriate model for query optimization and system implementation.
2) Many existing approaches to query optimization and equivalence are (or
can easily be) expressed in relational algebra.
3) Relational algebra uses relations (or sets) as operands, while calculus is
based on tuple variables. For the optimization of queries, particularly with
distributed environments or special-purpose database machines, set-oriented
models are more appropriate than tuple-oriented models.
Example: Let us consider the relational schema S(S#, NAME), storing supplier number and name, and SP(P#, S#), storing part and supplier numbers for a supply, and a typical SQL query with nesting, taken from Date's book [3, sect. 7.2.14]. The query retrieves the names of suppliers who do not supply product 'P2':
SELECT NAME FROM S WHERE 'P2' <> ALL (SELECT P# FROM SP WHERE S# = S.S#)
Notice that the clause ALL is used to indicate that only those suppliers should be retrieved for which there is no supply tuple with P# equal to P2; if the ALL clause were omitted, then the query would retrieve suppliers having at least one supply tuple with P# different from P2. The naming transformation produces the following query, in which attribute names are extended with relation names:
SELECT S.NAME FROM S WHERE 'P2' <> ALL (SELECT SP.P# FROM SP WHERE SP.S# = S.S#)
This query can be parsed using EG; the preprocessing produces as result the equivalent query (without the ALL keyword):
(SELECT S.NAME FROM S) MINUS (SELECT S.NAME FROM S WHERE 'P2' = ANY (SELECT SP.P# FROM SP WHERE SP.S# = S.S#))
This query can be parsed using RG; thus, its meaning can be evaluated. We
anticipate the notation for some algebraic operations:
PJ[A] (projection over a set A of attributes)
SL[p] (selection of tuples satisfying predicate p)
CP (Cartesian product)
DF (difference)
JN[jp] (join with join predicate jp). We use a prefix notation for unary operations
(e.g., PJ and SL), and an infix notation for binary operations (e.g., CP, DF,
and JN). The meaning evaluation produces the following expression:
(PJ[S.NAME] S) DF
(PJ[S.NAME] SL[SP.P# = 'P2']
(PJ[SP.P#, S.NAME, S.S#] SL[SP.S# = S.S#] (S CP SP)))
Finally, the postprocessing step is applied to the above expression. The
postprocessing eliminates useless projections and transforms selections over
Cartesian products into joins, giving:
(PJ[S.NAME] S) DF
(PJ[S.NAME] ((SL[SP.P# = 'P2'] SP) JN[SP.S# = S.S#] S))
(b) Determine all 4NF violations for the relation schema R(X, Y, Z, W) with multivalued dependencies X →→ Y and X →→ Z. Decompose the relation into 4NF.
Ans. A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D of the form a →→ b, where a ⊆ R and b ⊆ R, at least one of the following holds:
• a →→ b is trivial (i.e., b ⊆ a or a ∪ b = R), or
• a is a superkey for schema R.
Since in this example there are no functional dependencies, the only key is the set of all four attributes X, Y, Z, W. Thus each of the non-trivial multivalued dependencies X →→ Y and X →→ Z violates 4NF. We must separate out the attributes of these dependencies, first decomposing into XY and XZW, and then further decomposing the latter into XZ and XW, because X →→ Z is still a 4NF violation for XZW. The final decomposition is XY, XZ and XW. Also, we know that a relation is in 4NF only if it is in BCNF and there are no non-trivial MVDs.
(c) The 3-phase commit protocol increases the system's availability and doesn't allow transactions to remain blocked until a failure is repaired. Justify the statement.
Ans. Refer to Ch–5, Q.No.-73, Page No.-111
The disadvantage with 2PC was that it can block participants in certain circumstances. For example, processes that encounter a timeout after voting COMMIT, but before receiving the global commit or abort message from the coordinator, keep waiting for the message while doing nothing; in other words, they are blocked. In contrast, 3PC does not block the participants on site failures, except for the case when all sites fail. Thus, the basic conditions that this protocol requires are:
No network partitioning
At least one available site
At most k failed sites (the protocol is called K-resilient)
The basic objective of 3PC is to remove the blocking period for the participants who have voted COMMIT and are thus waiting for the global abort/commit message from the coordinator. This objective is met by adding another phase, called pre-commit, between the voting phase and the global decision phase. If the coordinator receives all the commit votes from the participants, it issues a global pre-commit message to the participants. A participant, on receiving the global pre-commit message, knows that this transaction is going to commit definitely. The coordinator, on receiving the acknowledgement of the pre-commit message from all the participants, issues the global COMMIT. A global ABORT is still handled the same way as in 2PC.
Thus, commit protocols ensure that DDBMS are in consistent state after the
execution of a transaction.
[Figure: 3PC state transition diagrams for the coordinator and the participants: from START, Prepare messages lead to the Wait/Ready states; one Abort vote leads to Global Abort, while all commit votes lead to Pre-commit; once the acknowledgements of pre-commit are received, Global Commit is issued]
(2) Explicit cursors: These cursors are declared by the user in embedded SQL explicitly. Every cursor has to be Declared, Opened and Closed. For example, to declare a cursor:
CURSOR C1 IS
SELECT sname, enrolment, course FROM STUDENT;
To open the cursor: OPEN C1;
Q3. (a) Create an object oriented database using ODL for the following
scheme. Make suitable assumptions about the attributes.
[Figure: ER diagram in which BANK is connected to ACCOUNT through an ACCTS relationship, and ACCOUNT is connected to CUSTOMER through an M:N A-C relationship]
Answer the following query using OQL: list all the accounts of a customer whose name is 'Q' or 'XYZ'.
Ans. The ODL for the given scheme is written after taking suitable assumptions
into consideration.
OQL:
SELECT C.Acc_No FROM customer C WHERE C.Name = 'Q' OR C.Name = 'XYZ';
This query can also be written in terms of the relationship as:
SELECT C.Acc_No FROM customer C WHERE C.openby.Name = 'Q' OR
C.openby.Name = 'XYZ';
Figure: a join hierarchy with Table F at the top, above Tables D and E.
Tables A, D and F constitute one linear hierarchy; Tables C, E and F form
another linear join hierarchy.
In a linear join, each pair of tables represents a one-to-many relationship, in
which the lower table of the pair is the "one" side and the higher table of the
pair is the "many" side. Linear join hierarchies can rely on any of the underlying
join conditions: key join, natural join, or ON-clause join.
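A sketch of how such a hierarchy could be written with ON-clause joins,
assuming hypothetical key columns a_id and d_id that link each one-to-many pair:

SELECT *
FROM A
JOIN D ON D.a_id = A.a_id   -- A ("one" side) to D ("many" side)
JOIN F ON F.d_id = D.d_id;  -- D ("one" side) to F ("many" side)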
(c) Explain the “Deferred database modification” approach of log-based
recovery.
Ans. The deferred database modification scheme records all modifications in
the log but defers all the writes until after partial commit. Let us assume that
transactions execute serially.
A transaction starts by writing a <Ti start> record to the log. A write(X) operation
results in a log record <Ti, X, V> being written, where V is the new value of X.
The write is not performed on X at this time, but is deferred. When Ti partially
commits, <Ti commit> is written to the log. Finally, the log records are read
and used to actually execute the previously deferred writes. During recovery
after a crash, a transaction needs to be redone if both <Ti start> and <Ti commit>
are there in the log. Redoing a transaction Ti (redo Ti) sets the value of all data
items updated by the transaction to the new values. Crashes can occur while
the transaction is executing the original updates or while the recovery action is
being taken.
(b) What are views and what is their significance? How are views created
in SQL? Explain using one example.
Refer to Ch–4, (View Definition), Page No.-49, and 50
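As a brief illustration (a sketch using the STUDENT table assumed earlier; the
course value 'Science' is hypothetical), a view is created with CREATE VIEW
and queried like a table:

-- Define a view holding only science students
CREATE VIEW science_students AS
SELECT sname, enrolment
FROM STUDENT
WHERE course = 'Science';

-- The view can now be queried like an ordinary table
SELECT * FROM science_students;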
Q5. (a) Differentiate between star scheme and snowflake scheme using
the same example.
Ans. A multidimensional storage model contains two types of tables: the
dimension tables and the fact table. The dimension tables have tuples of
dimension attributes, whereas the fact table has one tuple for each recorded
fact. In order to relate a fact to a dimension, we may have to use pointers. Let
us demonstrate this with the help of an example.
Consider the University data warehouse where one of the data tables is the
student enrolment table. The three dimensions in such a case would be:
Year
Programme
Region
The star schema for such data is:
Figure: a star schema — the Enrolment fact table surrounded by dimension
tables, including the Region dimension table (Rcode, Rcname, Rcaddress,
Rcphone).
Each dimension table is a table for a single dimension only, and that is why
this schema is known as a star schema.
Snowflake schema: A snowflake schema has normalised but hierarchical
dimension tables. For example, in the star schema above, the value of the field
Rcphone in the Region dimension table is multivalued, so the Region dimension
table is not normalised. Thus, we can create a snowflake schema for such a
situation as follows:
Figure: a snowflake schema — the Enrolment fact table with Programme and
Year dimension tables, and the Region dimension table (Rcode, Rcname,
Rcaddress) normalised so that the multivalued Rcphone field is held in a
separate table keyed by Rcode.
Data warehouse storage can also utilise indexing to support high-performance
access. Dimensional data can be indexed in a star schema to tuples in the fact
table by using a join index.
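A minimal sketch of the normalisation step behind the snowflake schema,
assuming a hypothetical RegionPhone table for the multivalued Rcphone field:

CREATE TABLE Region (
  Rcode INT PRIMARY KEY,
  Rcname VARCHAR(50),
  Rcaddress VARCHAR(100)
);

-- The multivalued phone field moves to its own table, keyed by Rcode
CREATE TABLE RegionPhone (
  Rcode INT REFERENCES Region(Rcode),
  Rcphone VARCHAR(15),
  PRIMARY KEY (Rcode, Rcphone)
);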
(d) What are deadlocks? How are they detected? Explain with the help
of an example.
Refer to Ch–5, Q.No.-21, Page No.-79
MCS – 43: ADVANCED DATABASE DESIGN
June, 2010
Note : Question number 1 is compulsory. Answer any three questions from the
rest.
Q1. (a) The ABC Bank offers five types of Accounts : loan, checking,
savings, daily interest saving and money market. It operates a number
of branches within the country. A client of the bank can have any number
of accounts. Accounts can be self or a joint account.
(i) Draw an EER diagram for the ABC bank identifying various entities,
attributes and cardinality. Show meaningful relationship that exists
among the entities.
Ans. Entities in the given problem will be Account, Branch and Customer.
Attributes of the Account entity are Account-number, Type, Open-date and
Opening-balance; for the Branch entity, Branch-code, Branch-contact and
Branch-address; for the Customer entity, SSN, Name, Address, etc. In addition
to these there can be a Worker entity.
The bank operates a number of branches in a city and keeps track of branch
details. It maintains branch information like branch-code, branch-address,
branch-contact and branch-city. Branch-code is unique across all branches, so
Branch is a strong entity.
A branch can have many accounts, while an account is opened in one branch
only. Double lines indicate that each account entity must have been opened in
a branch.
BRANCH (Branch-code: primary key, Branch-contact, Branch-Address)
CUSTOMER (SSN: primary key, Name, Address)
Here, between BRANCH and ACCOUNT there will be a 1:N relationship,
because a branch can open many accounts. Branch-code is the primary key in
the Branch table, whereas Account-number is the primary key in the Account
table. In the Customer table, SSN will be the primary key, because a particular
SSN can apply to only one person.
In addition to these there will be some more tables like LOAN, SAVING,
D-I-SAVING, MONEY MARKET, CHECKING, etc.
LOAN (Loan-no: PK, Account-no: PK, Loan-amount)
Here both Loan-no and Account-no together form a composite primary key.
SAVING (Minimum-Balance, Interest-rate, Cheque-book range)
MONEY MARKET (Transaction-id, Amount, Period)
DAILY-INTEREST-SAVING
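A hedged SQL sketch of the core tables and their keys (data types and column
sizes are assumptions):

CREATE TABLE BRANCH (
  branch_code VARCHAR(10) PRIMARY KEY,
  branch_contact VARCHAR(15),
  branch_address VARCHAR(100)
);

CREATE TABLE CUSTOMER (
  ssn VARCHAR(12) PRIMARY KEY,
  name VARCHAR(50),
  address VARCHAR(100)
);

CREATE TABLE ACCOUNT (
  account_no VARCHAR(12) PRIMARY KEY,
  type VARCHAR(20),
  open_date DATE,
  -- 1:N — every account is opened in exactly one branch
  branch_code VARCHAR(10) NOT NULL REFERENCES BRANCH(branch_code)
);

CREATE TABLE LOAN (
  loan_no VARCHAR(12),
  account_no VARCHAR(12) REFERENCES ACCOUNT(account_no),
  loan_amount DECIMAL(12,2),
  PRIMARY KEY (loan_no, account_no)  -- composite primary key, as above
);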
Figure: a tree of data items A–G on which the tree-locking protocol operates.
Tree based protocol:
Transaction T1 Transaction T2 Transaction T3
Lock A Lock B Lock A
Lock B Lock D Lock C
Lock E Lock G Lock F
Unlock A Unlock B Unlock A
Unlock B Unlock D Unlock C
Unlock E Unlock G Unlock F
The tree protocol ensures conflict serialisability as well as freedom from deadlock
for these transactions. Please note that transaction T3 will get A only after it is
set free by transaction T1. Similarly, T2 will get B when it is set free by T1.
Unlocking may occur earlier in the tree-locking protocol than in the two-phase
locking protocol. Thus the tree protocol is a protocol with:
shorter waiting times and increased concurrency, and
freedom from deadlock, with no rollbacks required,
although aborting a transaction can still lead to cascading rollbacks.
However, in the tree-locking protocol a transaction may have to lock data items
that it does not access. Thus it has:
increased locking overhead and additional waiting time, and
a potential decrease in concurrency.
(ii) Timestamp-Based Protocol.
Ans. Timestamp-based protocols: Each transaction is issued a timestamp
when it enters the system. If an old transaction Ti has timestamp TS(Ti), a
new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
The protocol manages concurrent execution in such a manner that the
timestamps determine the serialisability order. In order to assure such behaviour,
the protocol needs to maintain two timestamp values for each data item Q:
W-timestamp(Q) is the largest timestamp of any transaction that executed
write(Q) successfully.
R-timestamp(Q) is the largest timestamp of any transaction that executed
read(Q) successfully.
The timestamp-ordering protocol executes any conflicting read and write
operations in timestamp order. It guarantees serialisability because every arc in
the precedence graph points from a transaction with a smaller timestamp to one
with a larger timestamp, so the graph cannot contain a cycle.
(c) With the help of a process diagram, explain the various tasks involved
in the Knowledge Discovery in Databases (KDD) process.
Refer to Q.No.-4(a), Page No.-207
(d) Explain the role of ODBC and JDBC with the help of an example.
Ans. Refer to Page No.-179
Refer to Q.No.-3(b)(i), Page No.-190
(e) Is the following XML document well formed? Justify your answer :
<?xml version = "1.0" standalone = "yes" ?>
<employee>
<name> Amit </name>
<position> Professor </position>
</employee>
<employee>
<name> Sumit </name>
<position> Reader </position>
</employee>
Ans. No, the given XML document is not well formed. Although each tag that
is opened is later closed, a well-formed XML document must have exactly one
root element that contains all the others; here the two <employee> elements
appear at the top level with no single root element enclosing them.
Q2. (a) What are multimedia databases (MMDBs)? List some of the
applications of MMDBs. Describe various contents of MMDBs. Also,
mention the challenges in designing of MMDBs.
Refer to Ch–7, Q.No.-1, 2, 3 and 4, Page No.-140
Q3. (a) With the help of a diagram, explain the reference architecture
of Distributed DBMS. How is this different from component Architecture
of DDBMS?
Ans. Refer to Dec-2006, Q.No.-2(i), Page No.-168 and;
Component Architecture of DDBMS: From the viewpoint of commercial
implementation, a DDBMS requires a client-server implementation with the
backbone of a network. Such an architecture has the following components,
which are not present in the reference architecture:
Local DBMS Component (Clients and Servers): At a local database site a
DDBMS communicates with a local standard DBMS. This site can therefore
have its own client-server architecture. A computer on such a site (also referred
to as a node) may be a client or a server depending on its role at that moment
in time.
Data Communications Component: Remote clients may not be connected
to a server directly; it is the data communications component that enables
direct or indirect communication among the various sites. For example, in
Figure 2 the transactions shown occurring on Site 1 have a direct connection
to Site 1, but an indirect communication to Site 2, which they use to access the
local sales data of that site. Thus, a network is necessary for a distributed
database system.
Figure 2: Site 1 and Site 2 each run a database server with a distribution schema
and are connected over a network through a database link (connected to a site,
identified by a name). The HQ site holds the Department table and the Sales
site holds the Employee table; a transaction at Site 1 (find total: read from all
sites employee@sales, add to total, commit) reads remote data before committing.
Thus, we have clients, servers and the network in the distributed databases.
Global System Catalogue: One of the ways of keeping information about
data distribution is through the system catalogue of the DBMSs. In a DDBMS
we need to keep a global system catalogue. This catalogue itself might be
distributed in implementation. In addition to basic information about the databases
it should keep information on data fragmentation and replication schema.
Database Links: A database in a distributed database system is distinct from
the other databases in the system. Each of these distributed databases has its
own global database name, and all these database components would also
have certain local names.
Q3. (b) Explain the following two ways to implement the object-oriented
concepts in DBMS :
(i) To extend the existing RDBMS to include object orientation.
Ans. The RDBMS technology has been enhanced over the period of the last
two decades. RDBMSs are based on the theory of relations and are thus
developed on the basis of a proven mathematical background; hence, they
can be proved to work correctly. Thus, it may be a good idea to include the
concepts of object orientation so that they are able to support object-oriented
technologies too. The first concepts that were added include complex types,
inheritance, and some newer types such as multisets and arrays. One of the
key concerns in object-relational databases is the storage of the tables that
would be needed to represent inherited tables, and the representation of the
newer types.
One way of representing inherited tables may be to store the inherited primary-key
attributes along with the locally defined attributes. In such a case, to construct
the complete details for the table, you need to take a join between the inherited
table and the base-class table.
The second possibility would be to allow the data to be stored in all the inherited
as well as base tables. However, such a case will result in data replication, and
you may find it difficult at the time of data insertion.
As far as arrays are concerned, since they have a fixed size, their implementation
is straightforward. The multiset case, however, would follow the principle of
normalisation: create a separate table which can be joined with the base table
as and when required.
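PostgreSQL (discussed elsewhere in this book) offers one concrete variant of
this idea through table inheritance; a minimal sketch with hypothetical tables:

CREATE TABLE vehicle (
  regn_no VARCHAR(12),
  model VARCHAR(30)
);

-- car inherits vehicle's columns and adds its own
CREATE TABLE car (
  seating_capacity INT
) INHERITS (vehicle);

-- A query on vehicle also scans the rows stored in car
SELECT regn_no, model FROM vehicle;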
(ii) To create a new DBMS that is exclusively devoted to OODBMS.
Ans. A database system consists of persistent data. To manipulate that data
one must either use data manipulation commands or a host language like C
using embedded commands. However, a persistent language would require a
seamless integration of language and persistent data.
Please note: the embedded-language approach requires a great many steps for
the transfer of data from the database to local variables and vice versa. The
question is: can we extend an object-oriented language such as C++ or Java to
handle persistent data? A persistent object-oriented language would need to
address some of the following issues:
Object persistence: A practical approach for declaring a persistent object
would be to design a construct that declares an object as persistent. The
difficulty with this approach is that persistence must be declared at the time of
object creation. An alternative may be to mark an object as persistent during
run time. An interesting approach here is that once an object has been marked
persistent, all the objects that are reachable from that object should also become
persistent automatically.
Object identity: All the objects created during the execution of an object-oriented
program are given a system-generated object identifier; however, these identifiers
become useless once the program terminates. With persistent objects it is
necessary that such objects have meaningful object identifiers. Persistent object
identifiers may be implemented using the concept of persistent pointers that
remain valid even after the end of a program.
Storage and access: The data of each persistent object needs to be stored.
One simple approach may be to store class member definitions and the
implementations of methods as the database schema. The data of each object,
however, needs to be stored individually along with the schema. A database of
such objects may require the collection of the persistent pointers for all the
objects of one database together. Another, more logical way may be to store
the objects as collection types such as sets. Some object-oriented database
technologies also define a special collection called the class extent that keeps
track of the objects of a defined schema.
Q4. (a) What is a (DW) Data Warehouse? Explain the basic components
of a DW.
Refer to Ch–6, Q.No.-39, Page No.-125
Q4. (b) Consider a Supply Data of an organization having three
dimensions as Supplier, Part and Project. Draw a star schema with supply
as the fact table. Make suitable assumption.
Ans. A multidimensional storage model contains two types of tables, i.e., the
dimension tables and the fact table. The dimension tables have tuples of
dimension attributes, whereas the fact table has one tuple for each recorded fact.
Here, in our example, the three dimensions would be Supplier, Part and Project,
and the fact table will be Supply:
Fact table: Supply (Voucher-no, Supplier-no, Part-no, Project-no, Qty, Amount)
Dimension table: Supplier (Supplier-no, Supplier-name, Supplier-add)
Dimension table: Part (Part-no, Part-name, Part-specification, Qty-in-Stock)
Dimension table: Project (Project-no, Project-name, Project-loc, Project-status)
Figure: the star schema with the Supply fact table at the centre.
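A sketch of this star schema in SQL DDL (types and sizes are assumptions):

CREATE TABLE Supplier (
  supplier_no INT PRIMARY KEY,
  supplier_name VARCHAR(50),
  supplier_add VARCHAR(100)
);

CREATE TABLE Part (
  part_no INT PRIMARY KEY,
  part_name VARCHAR(50),
  part_specification VARCHAR(100),
  qty_in_stock INT
);

CREATE TABLE Project (
  project_no INT PRIMARY KEY,
  project_name VARCHAR(50),
  project_loc VARCHAR(50),
  project_status VARCHAR(20)
);

-- Fact table: one row per recorded supply, pointing at each dimension
CREATE TABLE Supply (
  voucher_no INT PRIMARY KEY,
  supplier_no INT REFERENCES Supplier(supplier_no),
  part_no INT REFERENCES Part(part_no),
  project_no INT REFERENCES Project(project_no),
  qty INT,
  amount DECIMAL(12,2)
);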
(iv) Indexing
Refer to Page No.-150
Q5. (a) With reference to spatial databases and GIS, explain the following:
(i) Application of Geographic Databases
Ans. The applications of geographic databases can be categorised into three
broad categories. These are:
Cartographic applications: Used for the analysis of cartographic information
in a number of layers. Some of the basic applications in this category would be
analysing crop yields, irrigation facility planning, evaluation of land use, facility
and landscape management, traffic monitoring systems, etc. These applications
need to store data as per the required application.
3-D digital modelling applications: Such applications store information about
the digital representation of the land, such as elevations of parts of the earth's
surface at sample points. A surface model is then fitted using interpolation and
visualisation techniques; such models are very useful in earth-science-oriented
studies.
The third kind of application of such information systems is geographic-object
applications. Such applications are required to store additional information
about various regions or objects.
(ii) Requirements of GIS
Ans. The data in GIS needs to be represented in graphical form. Such data
would require either of the following formats:
Vector data: In such representations the data is represented using geometric
objects such as lines, squares, circles, etc. For example, we can represent a
road using a sequence of line segments.
Raster data: Here data is represented using an attribute value for each pixel
or voxel (a three-dimensional point). Raster data can be used to represent
three-dimensional elevation using a format termed the digital elevation format.
For object-related applications, a GIS may include a temporal structure that
records information about movement-related details such as traffic movement.
A GIS must also support the analysis of data. Some data analysis operations
are: (i) analysing soil erosion, (ii) measurement of gradients, and (iii) computing
shortest paths.
(iii) Operations on the data captured in GIS
Ans. Some operations are:
Interpolation, for locating elevations at intermediate points with reference to
sampled points.
Data enhancement operations, such as smoothing the data and interpreting the
terrain.
Proximity analysis, to determine the distances among the areas of interest.
Image enhancement, using image-processing algorithms for the raster data.
Analysis related to networks of a specific type, such as a road network.
Figure: EER specialisation of VEHICLE, with attributes such as Permit Type,
Colour, Fuel Type, Seating Capacity and Side Stand attached to the appropriate
subclasses.
(1) The attribute Regn_No, Engine_No, ChesisNo, Make, Model, colour and
No of wheels are common for all types of vehicles.
(2) The attribute side stand is applicable to two wheelers only whether they
have it or not.
(3) The attributes Permit Type, Fuel Type, Air Conditioned are common to all
four wheelers.
(4) Seating capacity varies in vans and buses only.
(c) Explain Hash – Join in the context of query processing with the help
of an example.
Ans. This physical operator supports the hash join algorithm. A hash join may
consume more run-time resources, but it is valuable when the joining columns
do not have useful indexes or when a relatively large number of rows satisfy
the join condition, compared to the product of the numbers of rows in the
joined tables.
For example, a hash join assigns each tuple of Employee and of Department to
a "bucket" by applying a hash function to its WorkDept (DeptNo) value. Within
each bucket, it looks for Employee/Department tuple pairs for which WorkDept
= DeptNo.
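The example corresponds to an equijoin of the following shape, for which an
optimizer may choose a hash join when WorkDept/DeptNo have no useful
indexes (the remaining column names are hypothetical):

SELECT E.emp_name, D.dept_name
FROM Employee E
JOIN Department D ON E.WorkDept = D.DeptNo;  -- join columns are hashed into buckets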
(d) Explain the role of checkpoints in log-based recovery with the help
of an example.
Refer to Ch–5, Q.No.-32, Page No.-89
Figure: locking mechanisms — locks may be implicit or explicit; explicit locks
include the shared lock, the shared update lock and the exclusive lock.
(iv) Indexes.
Refer to Ch–7, Page No.-150[Indexes]
Q2. (a) Explain the basic framework for mobile computing and discuss
the characteristics of mobile databases. Also mention the challenges of
mobile computing.
Ans. Mobile Computing is primarily a distributed architecture. This basic
architecture is shown below:
Figure: mobile units communicate over a wireless medium with a mobile support
station, which connects through a wired LAN to the fixed hosts.
(4) Changing location of the client: A wireless client is expected to move
from its present mobile support station to another mobile support station as
the device moves. Thus, in general, the topology of such networks keeps
changing, and the place from where the data is requested also changes. This
requires the implementation of dynamic routing protocols.
Some of the challenges for mobile computing are:
(i) Scalability: As the number of stations increases, latency increases, so the
time for servicing the clients also increases. The increase in latency creates
more problems for data consistency.
The solution: broadcast data from many mobile stations, making the most
recent information available to all and thus eliminating much of the latency.
(ii) Data mobility problem: Client locations keep changing in such networks,
so keeping track of the location of a client is important for the data server;
data should be made available to the client from the server that is the minimum
latency away from the client.
(b) Explain the role of UML diagrams in database design with the help of
examples.
Refer to June–2008, Q.No.- 1(c), Page No.-213
Q3. (a) State 3NF and BCNF. Explain their differences with the help of
examples.
Ans. Formal definition: A relation is in Boyce/Codd normal form (BCNF) if
and only if every determinant is a candidate key. [A determinant is any attribute
on which some other attribute is (fully) functionally dependent.]
Boyce-Codd normal form is stricter than 3NF: every relation in BCNF is also
in 3NF, but a relation in 3NF is not necessarily in BCNF. A relation schema is
in BCNF if, whenever a functional dependency X → A holds in the relation,
X is a superkey of the relation. The only difference between BCNF and 3NF
is that the weaker condition of 3NF, which also allows A to be a prime attribute
when X is not a superkey, is absent from BCNF.
Consider, as an example, the relation Professor:
Professor (Professor code, Dept., Head of Dept., Percent time)
Assuming that
Assuming that
1. The percentage of the time spent in each department is given.
2. A professor can work in more than one department
3. Each department has only one Head of Department
The relationship diagram for the above relation is given below. The two possible
composite keys are (Professor code, Dept.) or (Professor code, Head of Dept.).
Observe that Dept. as well as Head of Dept. are not non-key attributes; they
are part of a composite key.
Figure: dependency diagram — (Professor code, Department) determine
Head of Department and Percent time.

Professor code  Department   Head of Department  Percent time
P1              Mathematics  Krishnan            50
P2              Chemistry    Rao                 25
P2              Physics      Dinesh              75
P3              Mathematics  Krishnan            100
The relation given in above table is in 3NF. Observe, however, the names of
Dept. and Head of Dept. are duplicated. Further, if professor P2 resigns,
rows 3 and 4 are deleted. We lose the information that Rao is the Head of
Department in Chemistry.
The normalization of the relation is done by creating a new relation for Dept.
and Head of Dept. and deleting Head of Dept. from Professor relation. The
normalized relations are shown in the following table.
(a)
Professor Code  Department   Percent time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100
(b)
Department Head of Dept.
Physics Dinesh
Mathematics Krishnan
Chemistry Rao
The dependency diagrams for these new relations give the important clue to
this normalisation step, as the figure below makes clear.
Figure: dependency diagrams — (Professor code, Department) → Percent time,
and Department → Head of Dept.
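A sketch of the normalised design in SQL, with the keys that now enforce the
dependencies (types and sizes are assumptions):

CREATE TABLE Department (
  dept_name VARCHAR(30) PRIMARY KEY,
  head_of_dept VARCHAR(30)  -- each department has exactly one head
);

CREATE TABLE Professor (
  professor_code VARCHAR(10),
  dept_name VARCHAR(30) REFERENCES Department(dept_name),
  percent_time INT,
  PRIMARY KEY (professor_code, dept_name)
);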
(c) Define a trigger CHECK_KEY that does not allow duplicate values
in attribute EmpNo. in Employee table.
Ans. CREATE OR REPLACE TRIGGER check_key
BEFORE INSERT ON Employee
FOR EACH ROW
DECLARE
c INTEGER;
BEGIN
SELECT COUNT(EmpNo) INTO c FROM Employee WHERE EmpNo = :new.EmpNo;
IF c > 0 THEN
RAISE_APPLICATION_ERROR(-20001, 'Duplicate key value');
END IF;
END;
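Assuming the Employee table has columns EmpNo and EName (EName is
hypothetical), the trigger can be exercised as follows; note that, as a design
choice, a UNIQUE or PRIMARY KEY constraint is the more robust way to
enforce this rule:

INSERT INTO Employee (EmpNo, EName) VALUES (101, 'Asha');  -- succeeds
INSERT INTO Employee (EmpNo, EName) VALUES (101, 'Ravi');  -- raises ORA-20001: Duplicate key value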
(d) Explain the following in the context of data mining with the help of
an example.
(i) Classification.
Refer to Ch–6 Q.No.-55, Page No.-134
(ii) Clustering.
Refer to Ch–6, Q.No.-56, Page No.-134
Flow-control XML tags: choose, when, otherwise, forEach, if.
Transformation tags: transform, param.
The XML tags use XPath as a local expression language; XPath expressions
are always specified using attribute select. This means that only values specified
for select attributes are evaluated using the XPath expression language. All
other attributes are evaluated using the rules associated with the global
expression language.
of that type, above and beyond the basic syntactical constraints imposed by
XML itself. These constraints are generally expressed using some combination
of grammatical rules governing the order of elements, Boolean predicates that
the content must satisfy, data types governing the content of elements and
attributes, and more specialized rules such as uniqueness and referential integrity
constraints. There are languages developed specifically to express XML
schemas. The Document Type Definition (DTD) language, which is native to
the XML specification, is a schema language that is of relatively limited capability,
but that also has other uses in XML aside from the expression of schemas.
Two more expressive XML schema languages in widespread use are XML
Schema (with a capital S) and RELAX NG.
(iv) XQuery.
Refer to Ch–6, Q.No.-38, Page No.-125
Figure: concepts of the multidimensional model — measures, dimensions,
dimension hierarchies, attributes and levels.
suite of the GNOME desktop. The project aims to provide a free unified data
access architecture to the GNOME project for all Unix platforms. GNOME-DB
is useful for any application that accesses persistent data (not only databases),
since it now contains a fairly good data management API. GNOME-DB's
production corresponds to the Libgda library, which is mainly a database and
data abstraction layer; it also includes a GTK+-based UI extension and some
graphical tools.
(c) Domain Key Normal Form (DKNF).
Ans. We can also always define stricter forms that take into account additional
types of dependencies and constraints. The idea behind domain-key normal
form is to specify, (theoretically, at least) the “ultimate normal form” that
takes into account all possible dependencies and constraints. A relation is said
to be in DKNF if all constraints and dependencies that should hold on the
relation can be enforced simply by enforcing the domain constraints and the
key constraints specified on the relation.
For a relation in DKNF, it becomes very straightforward to enforce the
constraints by simply checking that each attribute value in a tuple is of the
appropriate domain and that every key constraint on the relation is enforced.
However, it seems unlikely that complex constraints can be included in a
DKNF relation; hence, its practical utility is limited.
(d) Log - Based Recovery algorithm.
Refer to Ch–5, Q.No.- 29, Page No.-85
MCS – 43: ADVANCED DATABASE DESIGN
June, 2011
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) Which MVDs (multivalued dependency) hold for the following
table:
P-No. Colour Size
P1 C1 S1
P1 C2 S1
P1 C1 S2
P1 C2 S2
P1 C1 S3
P1 C2 S3
P2 C3 S1
P2 C3 S3
Figure: ER diagram — EMPLOYEE moves to PROJECT (M:N); PROJECT
uses ITEM (M:N), with ITEM described by Desc, Cost, Date of Purchase and
Qty and supplied by MANUFACTURER (M.No, MName, Address); PROJECT
has CLIENT (C.No, CName, C.Add), with delivered and tested recorded on
the relationship.
Relational Schema
Employee (EmpNo, Ename, Designation)
Project (P.No, PName, PDesc)
Client (C.No, CName, CAddr)
Item (ItemNo, Name, Desc)
Manufacturer (M-No., MName, Address)
Moveto (EmpNo, P.No)
has (P.No, CNo, delivered, tested)
uses (P.No, ItemNo, M_No)
(e) How does embedded SQL differ from Dynamic SQL? With the help
of an example, describe the implementation of cursors and triggers.
Refer to Ch–4, Q.No.- 10, Page No.-57 & Ch–5, Q.No.-36, Page No.-92
(ii) Get details of those teachers who are conducting practical numbers
P1 to P4.
Ans. πTeacher# (TEACHER ⨝ (CONDUCTS ÷ πP# (σ P# ≥ 'P1' ∧ P# ≤ 'P4' (CONDUCTS))))
(b) What are the different types of index in PostgreSQL? Explain each
one of them.
Refer to Page No.-150[Indexes]
(c) What is the difference between document type definition and XML
schema? Explain with an example.
Refer to Dec–2009, Q.No.- 1(c), Page No.-244
Q4. (a) What do you understand by query optimization? What are query
trees? Explain with an example.
Ans. A query tree is an internal representation of an SQL statement, in which
the query is represented as a tree of relational operators applied to the base
relations.
Figure: a query tree applying the selection σ Pcode = Code above a Cartesian
product (×) with relation P.
processes as per the information needs at various end user levels, logical and
physical schema design, etc.
(2) Prototype: A data warehouse is a high cost project, thus, it may be a good
idea to deploy it partially for a select group of decision-makers and database
practitioners in the end user communities. This will help in developing a system
that will be easy to accept and will be mostly as per the user’s requirements.
(3) Deploy: Once the prototype is approved, the data warehouse can be put to
actual use. A deployed data warehouse comes with the following processes:
(4) Operation: Once deployed the data warehouse is to be used for day-to-
day operations. The operation in data warehouse includes extracting data,
putting it in database and output of information by DSS.
(5) Enhancement: These are needed with updating of technology, operating
processes, schema improvements, etc. to accommodate the change.
Q5. (a) What are views? How are they implemented? Can views be used
for data manipulation? Explain with the help of an example.
Refer to Ch–4, Q.No.-1-3, Page No.-48
The audit trail to be maintained for audit purposes has some similarity with the
data collected for recovery operations. Thus for each update operation, the
before and after image of the data objects undergoing modification are recorded.
All logons, read operations, and suspect or illegal operations are recorded.
This information can be used to analyze the practice of the users of the database,
detect any attempted violations, help correct errors in design or execution,
and improve the control procedure.
The control of the database starts with the design of the database and the
application programs that will be using the data. One of the first control principles
is that of separating the responsibilities. This can be practiced by assigning
separate teams for the design and implementation of the program and for its
validation and installation.
The integrity control mechanisms should be integrated in the database and the
data entry function should be validated by the application programs.
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) Identify the functional dependencies which hold for the following table:
Figure: ER diagram — INSTRUCTOR (Id, Name, Address, age, gender) takes
CLASS, instructs STUDENTS (M:N) and teaches SUBJECT; CLASS (Id,
Name, Time, Room) has STUDENTS (N:M); STUDENTS (code, Name, age,
gender) enrol for SUBJECT (code, Name, Credit) (N:M) and each student
gets a RESULT (Marks, %).
(c) How is the checkpointing information used in the recovery operation following
a system crash in a DBMS? Explain.
Ans. Checkpoint-recovery is a common technique for giving a program or
system fault-tolerant qualities, and it grew from the ideas used in systems that
employ transaction processing. It allows systems to recover after some fault
interrupts the system and causes the task to fail or be aborted in some way.
While many systems employ the technique to minimise lost processing time, it
can be used more broadly to tolerate and recover from faults in a critical
application or task.
The basic idea behind checkpoint-recovery is the saving and restoration of
system state. By saving the current state of the system periodically, or before
critical code sections, it provides the baseline information needed for the
restoration of lost state in the event of a system failure. While the cost of
checkpoint-recovery can be high, techniques like memory exclusion, together
with designing the system to have as small a critical state as possible, may
reduce the cost of checkpointing enough to make it useful even in cost-sensitive
embedded applications.
When a system is checkpointed, the state of the entire system is saved to non-
volatile storage. The checkpointing mechanism takes a snapshot of the system state
and stores the data on some non-volatile storage medium. Clearly, the cost of a
checkpoint will vary with the amount of state required to be saved and the bandwidth
available to the storage mechanism being used to save the state.
(d) Explain the concept of inheritance in object oriented database system, with the
help of an example.
Ans. Inheritance is a way to reuse code of existing objects, or to establish a subtype
from an existing object, or both, depending upon programming language support.
In classical inheritance where objects are defined by classes, classes can inherit
attributes and behavior from pre-existing classes called base classes, superclasses,
parent classes or ancestor classes. The resulting classes are known as derived
classes, subclasses or child classes.
Example:
class Vehicle { char* number_plate; char* model; date date_last_overhaul; date next_overhaul; }
class Truck : Vehicle { int license; float price; }
class Bus : Vehicle { int seats; }
(e) What are assertions? Explain with an example.
Refer to Ch-4, Q.No.-1, Page No.-48
(f) How can you protect your database from statistical query attacks? Explain.
Ans. One way to protect your database is not to construct SQL queries as
strings within application code. If SQL queries are constructed this way, the
application takes text directly from a web control on the web page and generates
a SQL statement from it. A skilled attacker can enter SQL fragments within the
web control that instruct the database to perform some task other than what
was originally expected.
(g) Explain Clustering in data mining.
Refer to Ch-6, Q.No.-56, Page No.-134
Q2. (a) Distinguish between the followings with appropriate examples.
(i) Centralized two phase locking and Distributed two phase locking.
Refer to June-2007, Q.No.-2(a)(iii), Page No.-188
(ii) XML and HTML
Refer to June-2007, Q.No.-2(a)(ii), Page No.-188
(b) Consider the following database: employee (emp_name, street, city), working
(emp_name, factory_name, salary), factory (factory_name, city), manager
(emp_name, manager_name). Write the relational algebra expressions for the
following queries.
(i) Find the names, streets and cities of all factory employees who work for factories
Fl and F5 and earn more than 25000.
Refer to Ch-5, Q.No.-11(iv), Page No.-73
(ii) Find all the factory employees who live in the same city as the factory where
they are working.
Refer to Ch-5, Q.No.-11(ii), Page No.-73
(c) With the help of an example explain insertion and deletion anomalies.
Ans. Insertion Anomaly
The insertion anomaly occurs when a new record is inserted in the relation. In this
anomaly, the user cannot insert a fact about an entity until he has an additional fact
about another entity.
Example: A table which stores records for a company’s sales people & the clients
for whom they are responsible. Because client is a required field - if a newly hired
sales representative must complete several weeks of training before being allowed
to call on clients, it is not possible to record him in the table during training. Or, if we
do add new hires, we must create “dummy” clients as placeholders.
Deletion Anomaly: The deletion anomaly occurs when a record is deleted from the
relation. In this anomaly, the deletion of facts about an entity automatically deleted
the fact of another entity.
Modification Anomaly: The modification anomaly occurs when the record is updated
in the relation. In this anomaly, the modification in the value of specific attribute
requires modification in all records in which that value occurs.
Example (deletion anomaly): Let's say 'Gully' takes on a temporary research
assignment that requires her to give up her existing clients for the next 6 months.
Because we cannot delete just the client value, we are faced with the choice of
deleting her record completely from the sales-rep table.
Q3. (a) Describe the reference architecture of a distributed DBMS with the help of
a block diagram.
Refer to Dec-2006, Q.No.-2(i), Page No.-168
(b) How does postgre SQL perform storage and indexing of tables? Briefly discuss
the type of indexes involved in postgre SQL.
Refer to Ch-7, Q.No.-27, Page No.-148
(c) What is semi structured data? Explain with an example.
Refer to Ch-6, Q.No.-17, Page No.-120
Example:
• name: Peter Wood
• email: [email protected], [email protected]
• name:
o first name: Mark
o last name: Levene
• email: [email protected]
Q4. (a) Define Hash join and explain the process and cost calculation of Hash join
with the help of an example.
Refer to June-2008, Q.No.-1(d), Page No.-213
(b) Describe two phase commit protocol in distributed databases.
Refer to Ch-5, Q.No.-72, Page No.-111
(c) List the features of semantic database.
Refer to June-2008, Q.No.-2(c), Page No.-219
Q5. (a) Discuss the 5th normal form and domain key normal form with a suitable
example in each.
Refer to Ch-1, Q.No.-10, Page No.-7 & Dec-2010, Q.No.-5(c), Page No.-276
(b) What do you mean by deadlock in DBMS? How can you detect a deadlock?
Suggest a technique that can be used to prevent it.
Refer to Ch-5, Q.No.-24, Page No.-83 & Q.No.-21 , Page No.-79
(c) What are challenges in designing multimedia databases? Discuss.
Refer to Ch-7, Q.No.-4, Page No.-143
MCS – 43: ADVANCED DATABASE DESIGN
June, 2012
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) Is the following XML document well formed? Justify your answer:
<?xml version = “1.0” standalone = “yes” ?>
<employee>
<name>Amita</name>
<department>English</department>
</employee>
<employee>
<name>Sumita</name>
<department>Hindi</department>
</employee>
Refer to June-2010, Q.No.-1(e), Page No.-260
(b) Determine all 4NF violations for the relation schema R(X, Y, Z, W) with
multivalued dependencies X →→ Y and X →→ Z. Decompose the relation into 4NF.
Ans. Fourth normal form (4NF) is a normal form used in database normalization.
Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after
Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd
normal forms are concerned with functional dependencies, 4NF is concerned with a
more general type of dependency known as a multivalued dependency. A Table is in
4NF if and only if, for every one of its non-trivial multivalued dependencies
X →→ Y, X is a superkey—that is, X is either a candidate key or a superset of one.
A multivalued dependency X →→ Y signifies that if we choose any x actually
occurring in the table (call this choice xc) and compile a list of all the (xc, y, z)
combinations that occur in the table, we will find that xc is associated with the
same y entries regardless of z.
(c) What are triggers? Explain the significance of triggers with the help of an
example in SQL.
Refer to Ch-5, Q.No.-35, Page No.-92
(d) What is Data Warehousing? Discuss various characteristics of Data
Warehousing.
Refer to Dec-2008, Q.No.-1(b), Page No.-227 & Ch-6, Q.No.-41, Page No.-129
(e) Create an object oriented database (using ODL) for the following class diagram,
Ans. Sample data for the given class diagram (assuming a Teacher class and a
Student class):
Teacher (Name, Type, ID): (Sita, English, T01), (Mukesh, Maths, T02)
Student (Code, Name, Detail): (C01, Hari, I year), (C05, Megha, III year)
(f) Explain with the help of an example, the log based recovery scheme, using
deferred database modification approach.
Refer to Ch-5, Q.No.-29, Page No.-85
(g) Explain the role of query optimizer in oracle.
Refer to Ch-5, Q.No.-1, Page No.-68
(h) Why do you need 3 Phase Commit (3PC) Protocol in case of distributed
Databases? Explain the 3PC protocol with the help of a diagram.
Refer to Ch-5, Q.No.-73, Page No.-111
Q2. (a) What is multilevel security? What are typical security levels?
Refer to Ch-5, Q.No.-61, Page No.-104 & Q.No.-63 , Page No.-106
(b) How do database queries differ from data mining queries? Explain the
K-means clustering algorithm in data mining with the help of an example.
Refer to Page No.-132 [Database Processing Vs. Data Mining Processing] & Ch-6,
Q.No.-56, Page No.-134
(c) Consider the following Query
Select student_id, student_name, subject, marks
From STUDENT, RESULT
Where STUDENT. student_id = RESULT.
Student_id
And RESULT. marks > 60
Create a Query evaluation plan for the above Query, assume suitable relations and
statistics.
Ans. Query evaluation plan:
π student_id, student_name, subject, marks
 ⨝ STUDENT.student_id = RESULT.student_id
  STUDENT
  σ marks > 60 (RESULT)
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) Given the following semi-structured data in XML, create the DTD
(Document Type Definition) for it:
<DOCUMENT>
<STUDENT>
<NAME> RAJESH </NAME>
<CLASS> X-A </CLASS>
<SCHOOL> K.V., R.K. PURAM, DELHI </SCHOOL>
</STUDENT>
<STUDENT>
<NAME> SANJAY </NAME>
<CLASS> XI-C </CLASS>
<SCHOOL> GURUKUL, GHAZIABAD </SCHOOL>
</STUDENT>
</DOCUMENT>
Ans. <?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (STUDENT)*>
<!ELEMENT STUDENT (NAME, CLASS, SCHOOL)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT CLASS (#PCDATA)>
<!ELEMENT SCHOOL (#PCDATA)>
]>
(b) How is BCNF different from 3NF? Explain with the help of an example.
Consider a relation R(A, B, C) with functional dependencies AB → C
and C → A. Decompose the relation ‘R’ into BCNF relations.
Refer to Dec-2010, Q.No.-3(a), Page No.-271 & June-2010,
Q.No-2(c), Page No.-261
(c) What are views and what is their significance? How views are managed
in SQL, explain using an example?
Refer to Ch-4, Page No.-49[View] & Q.No.-3, Page No.-50
(d) What is a data mart? How is it different from a data warehouse? How is
the management of data in a database different from the management of
data in a data warehouse? Explain using an example.
Ans. Refer to Dec-08, Q.No.- 3(b) Page No.-231, & Ch-6, Q.No.-49 ,
Page No.-131
As one of the oldest components associated with computers, the database
management system (DBMS), is a computer software program that is designed
as the means of managing all databases that are currently installed on a system
hard drive or network. Different types of database management systems exist,
with some of them designed for the oversight and proper control of databases
that are configured for specific purposes. Here are some examples of the various
incarnations of DBMS technology that are currently in use, and some of the
basic elements that are parts of DBMS software applications.
Data is the most valuable enterprise asset. A good integrated data management
strategy will enhance an organisation’s ability to develop valuable insights
that provides greater business value. This strategy is driven by the following
needs:
• Need Integrated Data: Disparate data sources lead to information silos
resulting in decision deficiencies
• Need Accurate Data: Lack of data standards lead to data quality issues and
therefore distorted insights
• Need Data Governance: Inadequate definitions, unclear ownership and lack
of standards can lead to inconsistencies in organisational data management.
(e) Consider the following three transactions:
T1: Read(x); x := x – 1000; Write(x)
T2: Read(x); display(x)
T3: Read(x); Y := x; display(x)
Insert shared and exclusive locks in T1, T2 and T3 such that the transactions,
when executed concurrently, do not encounter any concurrency-related
problem.
Refer to June-2009, Q.No.-1(a), Page No.-234
(f) What is Datagrid? Show typical structure of datagrid. What are the
application areas of datagrid?
Ans. Refer to Ch-7, Q.No.-20, Page No.-146
Applications:
A data grid includes most relational database capabilities, including schema
integration, format conversion of data, distributed query support, etc.
(g) What are Web-databases? How do you create them?
Ans. The term web database can be used in at least two different ways:
Definition 1: A web database may be defined as the organised listing of web
pages for a particular topic. Since the number of web pages may be large for a
topic, a web database would require strong indexing.
Definition 2: A web database is a database that can be accessed through the
web.
Now refer to Ch-7, Q.No.-12, Page No.-145
(h) Explain shadow paging recovery scheme with the help of diagram.
Refer to June-08, Q.No.-5(a), Page No.-223
Q2. (a) Illustrate using an example the Apriori algorithm for association
rule mining.
Refer to Ch-6, Q.No.-60 & 61, Page No.-138-139
(b) Discuss, how Oracle manages database security?
Refer to Ch-8, Q.No.-24, Page No.-159
(c) Consider the following SQL Query on the employee relation.
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > (SELECT MAX (SALARY) FROM EMPLOYEE
WHERE DNO = 5)
Derive an execution plan for the Query. Give a measure of Query Cost.
Make suitable assumptions.
Ans. Query execution plan: apply π LNAME, FNAME to the EMPLOYEE
tuples whose SALARY exceeds the maximum salary computed over
σ DNO = 5 (EMPLOYEE).
Figure: query execution plan
Q4. (a) What is “tuning a database”? What are the goals of tuning in
relational database system? Why is Query tuning required?
Ans. Database tuning describes a group of activities used to optimise and
homogenise the performance of a database. It usually overlaps with query
tuning, but refers to design of the database files, selection of the database
management system (DBMS), operating system and CPU the DBMS runs on.
The goal is to maximise use of system resources to perform work as efficiently
and rapidly as possible. Most systems are designed to manage work efficiently,
but it is possible to greatly improve performance by customising settings and
the configuration for the database and the DBMS being tuned.
SQL statements are used to retrieve data from the database. We can get the
same results by writing different SQL queries, but the choice of query matters
when performance is considered, so you need to tune SQL queries based on
the requirement. Here is a list of query patterns that are used regularly and how
they can be optimised for better performance (a short illustration follows the list).
SQL Tuning/SQL Optimisation Techniques:
(1) The SQL query becomes faster if you use the actual column names in the
SELECT statement instead of '*'.
(2) HAVING clause is used to filter the rows after all the rows are selected. It
is just like a filter. Do not use HAVING clause for any other purposes.
(3) Sometimes you may have more than one subqueries in your main query.
Try to minimise the number of subquery block in your query.
(4) Use operator EXISTS, IN and table joins appropriately in your query.
(a) Usually IN has the slowest performance.
(b) IN is efficient when most of the filter criteria is in the sub-query.
(c) EXISTS is efficient when most of the filter criteria is in the main query.
(5) Use EXISTS instead of DISTINCT when using joins which involves tables
having one-to-many relationship.
(6) Try to use UNION ALL in place of UNION.
(7) Be careful while using conditions in WHERE clause.
(8) Use DECODE to avoid scanning the same rows or joining the same table
repetitively. DECODE can also be used in place of a GROUP BY or ORDER BY
clause.
(9) To store large binary objects, first place them in the file system and add the
file path in the database.
(10) To write queries which provide efficient performance follow the general
SQL standard rules.
(a) Use single case for all SQL verbs
(b) Begin all SQL verbs on a new line
(c) Separate all words with a single space
(d) Right or left aligning verbs within the initial SQL verb
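A short illustration of points (4)–(6), using hypothetical department and
employee tables:

-- UNION ALL avoids the duplicate-eliminating sort that UNION performs
SELECT emp_name FROM current_employee
UNION ALL
SELECT emp_name FROM retired_employee;

-- EXISTS filter: most of the criteria sit in the main query
SELECT d.dept_no, d.dept_name
FROM department d
WHERE EXISTS (SELECT 1 FROM employee e WHERE e.dept_no = d.dept_no);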
(b) Explain the following two ways to implement the object oriented
concepts in DBMS:
(i) To extend the existing RDBMS to include object orientation
Ans. The RDBMS technology has been enhanced over the period of the last
two decades. RDBMSs are based on the theory of relations and are thus
developed on the basis of a proven mathematical background; hence, they can
be proved to work correctly. Thus, it may be a good idea to include the concepts
of object orientation so that they are able to support object-oriented technologies
too. The first concepts that were added include complex types, inheritance, and
some newer types such as multisets and arrays. One of the key concerns in
object-relational databases is the storage of the tables that would be needed to
represent inherited tables, and the representation of the newer types.
One of the ways of representing inherited tables may be to store the inherited
primary-key attributes along with the locally defined attributes.
The second possibility here would be, to allow the data to be stored in all the
inherited as well as base tables. However, such a case will result in data
replication.
(ii) To create a new DBMS that is exclusively devoted to Object Oriented
DBMS
Ans. The database system consists of persistent data. To manipulate that data
one must either use data manipulation commands or a host language like C
using embedded command. However, a persistent language would require a
seamless integration of language and persistent data.
A practical approach for declaring a persistent object would be to design a
construct that declares an object as persistent. The difficulty with this approach
is that it needs to declare object persistence at the time of creation. An alternative
to this approach may be to mark a persistent object during run time.
All the objects created during the execution of an object oriented program
would be given a system generated object identifier, however, these identifiers
become useless once the program terminates. With the persistent objects it is
necessary that such objects have meaningful object identifiers. Persistent object
identifiers may be implemented using the concept of persistent pointers that
remain valid even after the end of a program.
(c) Explain data fragmentation in DDBMS with the help of an example.
Ans. DATA FRAGMENTATION
Data fragmentation allows you to break a single object into two or more
segments or fragments. The object might be a user’s database, a system
database, or a table. Each fragment can be stored at any site over a computer
network.
Information about data fragmentation is stored in the distributed data catalogue
(DDC), from which it is accessed by the TP to process user requests.
There are three types of data fragmentation strategies:
• Horizontal fragmentation refers to the division of a relation into subsets
(fragments) of tuples (rows). Each fragment is stored at a different node, and
each fragment has unique rows. However, the unique rows all have the same
attributes (columns). In short, each fragment represents the equivalent of a
SELECT statement, with the WHERE clause on a single attribute.
• Vertical fragmentation refers to the division of a relation into attribute
(column) subsets. Each subset (fragment) is stored at a different node, and
each fragment has unique columns—with the exception of the key column,
which is common to all fragments.
• Mixed fragmentation refers to a combination of horizontal and vertical
strategies. In other words, a table may be divided into several horizontal subsets
(rows), each one having a subset of the attributes (columns).
To illustrate the fragmentation strategies, let's use the CUSTOMER table of
the XYZ Company, depicted in the figure below and followed by a short SQL
sketch.
Table name: CUSTOMER
CUS_NUM CUS_NAME CUS_ADDRESS CUS_STATE CUS_LIMIT CUS_BAL CUS_RATING CUS_DUE
10 Sinex, Inc. 12 Main St. TN 3500.00 2700.00 3 1245.00
11 Martin Crop. 321 Sunset Blvd. FL 6000.00 1200.00 1 0.00
12 Mynux Corp. 910 Eagle St. TN 4000.00 3500.00 3 3400.00
13 BTBC, Inc. Rue du Monde. FL 6000.00 5890.00 3 1090.00
14 Victory, Inc. 123 Maple St. FL 1200.00 550.00 1 0.00
15 NBCC Crop. 909 High Ave. GA 2000.00 350.00 2 50.00
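A sketch of the first two strategies on this table, using CREATE TABLE ... AS
SELECT syntax (supported by, e.g., Oracle and PostgreSQL); the fragment
names are hypothetical:

-- Horizontal fragment: only the Tennessee rows
CREATE TABLE CUSTOMER_TN AS
SELECT * FROM CUSTOMER WHERE CUS_STATE = 'TN';

-- Vertical fragment: the key plus the address columns only
CREATE TABLE CUSTOMER_ADDR AS
SELECT CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_STATE FROM CUSTOMER;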
Note: Question number one is compulsory. Answer any three questions from
the rest.
You may assume that projects are independent of equipments that may
be allocated to a person. Also assume that employee name, project name
and equipment are unique. Perform the following tasks for the description
given above:
(i) Identify the primary key of the relation. Justify your selection.
Ans. The primary key of the relation is {Employee Name, Project Name,
Equipment}, because no FD exists in the relation and therefore all the attributes
have to be taken into the primary key.
(ii) Identify the FDs and MVDs in the relation. Give reasons for your
selection.
Ans. No FD exists in the relation. The MVDs are:
Employee Name →→ Project Name, Employee Name →→ Equipment,
Project Name →→ Equipment, Project Name →→ Employee Name,
Equipment →→ Project Name, Equipment →→ Employee Name.
(iii) Normalise the relation upto 4th Normal Form (4NF). Show that your
decomposition is lossless.
Ans. The lossless 4NF decompositions are:
EP
Employee Name Project Name
Sanjay Inventory Control
Sanjay Secure Website
Harish Intranet
Harish Secure Website
PE
Project Name Equipment
Inventory Control Computer
Secure Website Mobile
Secure Website Computer
Inventory Control Mobile
Intranet Computer
EE
Employee Name Equipment
Sanjay Computer
Sanjay Mobile
Harish Computer
Figure: query tree — π name over Employee ⨝ Project.
(ii) Assume that hash join is to be used to join the two tables, show the
process of joining the two, tables using the hash join.
Ans. Figure: hash join — each table is partitioned on the join attribute (the
project number) into buckets P1–P4, and matching partitions are then joined.
Employee (EID, Name, Project): (1, Mohan, P1), (2, Shyam, P2), (3, Deep, P3),
(4, Deepak, P3), …; Project (Project, Name): (P1, DBMS), (P2, OS),
(P3, Networks), (P4, OOPS).
(c) Assume that a departmental store has many branches, it has many
products and the sales information is recorded for every year. For example,
Year Product Code Branch Code Sales amount
2009 A001 B001 5000
2009 A001 B002 6000
2010 A001 B002 3000
2010 A002 B002 2000
…
Identify the dimension tables and fact tables and create the star schema
for above.
Ans.
Fact table: Sales (Year, Product Code, Branch Code, Sales Amount)
Dimension table: Year (Year, Quarter)
Dimension table: Product (Product Code, PName)
Dimension table: Branch (Branch Code, BName, Location)
Figure: star schema
(d) Differentiate between the following:
(i) JDBC versus ODBC
Refer to Page No.-190 [JDBC] & Page No.-179 [ODBC]
(ii) B - Tree indexes versus R - Tree indexes used in postgre SQL
Ans. B-tree indexes: These are the default index type. They are useful for
comparison and range queries.
R-tree indexes: Such indexes are created on built-in spatial data types such as
box and circle, for determining operations like overlap.
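A minimal sketch of creating both kinds of index (table and column names are
hypothetical; recent PostgreSQL releases provide the R-tree behaviour through
the GiST access method):

CREATE INDEX emp_salary_idx ON employee (salary);          -- B-tree, the default
CREATE INDEX city_area_idx ON city USING GIST (boundary);  -- GiST/R-tree on a box column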
(e) What is a system catalogue? Write SQL command to show all the table
names in a database.
Ans. The system catalogue is a collection of tables and views that contain
important information about a database. It is the place where a relational
database management system stores schema metadata, such as information
about tables and columns, and internal bookkeeping information. A system
catalogue is available for each database. Information in the system catalogue
defines the structure of the database.
The terms system catalogue and data dictionary have been used interchangeably
in most situations.
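A hedged sketch of such commands (the exact catalogue views differ by DBMS):

-- Standard information schema (PostgreSQL, MySQL, SQL Server):
SELECT table_name FROM information_schema.tables;

-- Oracle keeps the same information in its own catalogue views:
SELECT table_name FROM user_tables;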
(f) Consider that the adjustment of salary of the faculty members is done
as follows, where Fac_Salary i represents the salary of ith faculty member:
Transaction 1: Fac_Salary i: = Fac_Salary i + 1025
Transaction 2: Fac_Salary i: = Fac_Salary i * 1.1
If the above two transactions are run concurrently, what type of problems
can occur. Justify.
Ans. If the given two transactions run concurrently, the lost update problem
may occur. For example, consider the following schedule, with fac_salary_i
initially 5000:

T1                          T2                       fac_salary_i
Read(fac_salary_i)                                   5000
                            Read(fac_salary_i)       5000
                            fac_salary_i *= 1.1
fac_salary_i += 1025
                            Write(fac_salary_i)      5500
Write(fac_salary_i)                                  6025

T2's update (5500) is overwritten by T1's write, which was based on the value
5000 it read earlier; the 10% adjustment is lost.
Figure: ER diagram — STUDENT (Enrollment No., Student id, Name, Date of
Birth, Qualification, Date of Admission) is enrolled in (M:1) PROGRAMME
(Prog-Code, PName, Mode, Duration, Fee).
Q3. (a) Explain the process of log based recovery using immediate database
modification scheme with the help of an example.
Ans. Log-Based Recovery
A log is maintained on a stable storage media. The log is a sequence of log
records, and maintains a record of update activities on the database. When
transaction Ti starts, it registers itself by writing a <Ti start> log record.
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1
is the value of X before the write (the undo value) and V2 is the value to be
written to X (the redo value).
• The log record notes that Ti has performed a write on data item X. X had
value V1 before the write, and will have value V2 after the write.
• When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable storage
(that is, they are not buffered).
Two approaches for recovery using logs are:
• Deferred database modification.
• Immediate database modification.
Immediate Database Modification: The immediate database modification
scheme allows updates on the stored database even by an uncommitted
transaction. These updates are made as the writes are issued; since undoing
may be needed, update log records must contain both the old value and the
new value. The update log record must be written before the database item
itself is written. (We assume the log record is output directly to stable
storage; this can be relaxed to postpone log output, provided that, before an
output(Y) operation is executed for a data block Y, all log records
corresponding to items in Y are flushed to stable storage.)
The recovery procedure in this scheme has two operations instead of one:
• undo (Ti) restores the value of all data items updated by Ti to their old values,
moving backwards from the last log record for Ti ,
• redo (Ti) sets the value of all data items updated by Ti to the new values,
moving forward from the first log record for Ti .
When recovering after failure:
• Transaction Ti needs to be undone if the log contains the record < Ti start>,
but does not contain the record < Ti commit>.
• Transaction Ti needs to be redone if the log contains both the record < Ti start>
and the record < Ti commit>.
Undo operations are performed first, then redo operations.
Example:
Consider the log as it appears at three instants of time.
(a)                       (b)                       (c)
<T1 start>                <T1 start>                <T1 start>
<T1, X, 10000, 9000>      <T1, X, 10000, 9000>      <T1, X, 10000, 9000>
<T1, Y, 8000, 9000>       <T1, Y, 8000, 9000>       <T1, Y, 8000, 9000>
                          <T1 commit>               <T1 commit>
                          <T2 start>                <T2 start>
                          <T2, Z, 20000, 19000>     <T2, Z, 20000, 19000>
                                                    <T2 commit>
If the system crashes at instant (a), T1 is undone (X is restored to 10000
and Y to 8000), since the log contains <T1 start> but not <T1 commit>. At
instant (b), T1 is redone and T2 is undone (Z restored to 20000). At instant
(c), both T1 and T2 are redone.
Note: Question number one is compulsory. Answer any three questions from
the rest.
(h) What are control files in Oracle? What does a control file contain?
What is the role of the control file in starting an instance of Oracle?
What is the purpose of the redo log file during this time?
Q2. (a) Draw an EER diagram for the following situation: A vehicle can
be owned by an Individual or an Organisation. In case the vehicle is
owned by an organisation, the database should store the name and
designation of the contact person in the organisation, the office address,
and the name of the organisation along with the name of its head. However,
for an individual owner, just the name, address and phone number of the
owner need to be recorded. You may assume that the basic information to
be stored about the vehicle is: vehicle registration number, type, date of
model, date of purchase.
(b) Convert the EER diagram created in part (a) to equivalent RDBMS
tables having proper keys and constraints.
(c) Define the term data mining. How is data mining different from
Knowledge Discovery in Databases (KDD)? Explain the steps of KDD
with the help of an example.
(d) Explain any three common database security failures. Explain with
the help of an example, how SQL can be used to enforce access control
in a database.
Note: Question number one is compulsory. Answer any three questions from
the rest.
The condition for a decomposition of R into R1, R2, ..., Rn (with projected
FD sets F1, F2, ..., Fn) to be dependency preserving is that
(F1 ∪ F2 ∪ ... ∪ Fn)+ = F+.
Example:
• R(A, B, C, D)
• FD1: A → B
• FD2: B → C
• FD3: C → D
• Decomposition: R1(A, B, C) and R2(C, D)
R1(A, B, C) preserves FD1 (A → B) and FD2 (B → C); R2(C, D) preserves
FD3 (C → D). Between them, the fragments have all three functional
dependencies. Therefore, the decomposition is dependency preserving.
Yes, the decomposition of R into (A, C, D), (B, C, D) and (E, F, D) is lossless
and dependency preserving.
(b) What is meant by a schedule in the context of concurrent transactions
in Database? Also explain serial and serialisable schedules with the
help of a suitable example.
Ans. A schedule is a sequence of interleaved operations from a set of
transactions that is executed as a unit. Depending upon how the transactions
are arranged within a schedule, a schedule can be of two types:
• Serial: The transactions are executed one after another, in a non-preemptive
manner.
• Concurrent: The transactions are executed in a preemptive, time shared
method.
In Serial schedule, there is no question of sharing a single data item among
many transactions, because not more than a single transaction is executing at
any point of time. However, a serial schedule is inefficient: transactions
suffer longer waiting and response times, and resource utilisation is low.
Let us consider two transactions T1 and T2, whose instruction sets are
given below. T1 is the same as we have seen earlier, while T2 is a new
transaction.
T1                        T2
Read(A)                   Read(A)
A := A - 100              Temp := A * 0.1
Write(A)                  Read(C)
Read(B)                   C := C + Temp
B := B + 100              Write(C)
Write(B)
T2 is a new transaction which deposits to account C 10% of the amount in
account A.
If we prepare a serial schedule, then either T1 will completely finish before T2
can begin, or T2 will completely finish before T1 can begin. However, if we
want to create a concurrent schedule, then some context switching needs to
take place, so that some portion of T1 is executed, then some portion of T2,
and so on. For example, say we have prepared the following concurrent
schedule.
Figure: A concurrent schedule in which the operations of T1 (Read(A), A := A - 100, Write(A), Read(B), B := B + 100, Write(B)) are interleaved with the operations of T2.
Figure: View Serializability - two schedules of T1 and T2 over data items Q and R, each containing writes of Q and reads of Q and R, in different orders in the two schedules.
(c) Define locking in concurrency control. Discuss the various types of
locking techniques.
Ans. Locking is a technique used to control concurrent access to data, and
it is one of the most widely used techniques to ensure serialisability. The
principle used in the locking technique is as follows: "A transaction must
hold a read or write lock on a data item before it can perform a read or
write operation on it."
The basic rules implemented in the locking technique are as follows:
• If a transaction holds a read lock on a data item, it can perform only a
read operation, not a write.
• If a transaction holds a read lock on a data item and another transaction
wants to use the item, the other transaction can acquire only a read lock,
not a write lock, on it.
• If a transaction holds a write lock on a data item, it can perform both
read and write operations.
• If a transaction holds a write lock on a data item and another transaction
wants to use the item, the other transaction can acquire neither a read lock
nor a write lock on it.
Concurrency-control protocols related to locking include:
• the two-phase locking protocol,
• the timestamp ordering protocol,
• the validation-based protocol.
The two-phase locking protocol:
The two-phase locking protocol is used to ensure serialisability in a database.
It operates in two phases:
• Growing phase: the transaction acquires the read or write locks it needs
on data items; no lock is released in this phase.
• Shrinking phase: the reverse of the growing phase; the transaction releases
its read and write locks but acquires no new lock.
Figure: Two-phase locking - the number of locks held grows until the lock point (the instant the transaction acquires its final lock) and shrinks afterwards.
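A hedged sketch of shared and exclusive locks in SQL (PostgreSQL-style row locking; the account table, column names and values are assumptions, not from the original answer):

    BEGIN;
    SELECT balance FROM account
    WHERE acc_no = 101 FOR SHARE;    -- shared (read) lock: others may also read-lock the row
    COMMIT;

    BEGIN;
    SELECT balance FROM account
    WHERE acc_no = 101 FOR UPDATE;   -- exclusive (write) lock: blocks other locks on the row
    UPDATE account SET balance = balance - 100 WHERE acc_no = 101;
    COMMIT;                          -- holding all locks until COMMIT follows the two-phase discipline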
Figure: Distributed database with sites MFG.ACME.COM, HQ.ACME.COM and SALES.ACME.COM, holding relations such as Sales, Loan and Payment.
Fragments of the Branch_staff_owner relation:
Branch_staff_owner (staff)        Branch_staff_owner (owner)
Branch_No   S_Name                Branch_No   O_Name
B003        Ann Beech             B003        Carol Farrel
B003        David Ford            B003        Tina Murphy
(b) How are data marts different from a data warehouse? Explain the
different types of data marts.
Ans. Data Warehouse:
• Holds multiple subject areas
• Holds very detailed information
• Works to integrate all data sources
• Does not necessarily use a dimensional model but feeds dimensional models.
Data Mart:
• Often holds only one subject area- for example, Finance, or Sales
• May hold more summarised data (although many hold full detail)
• Concentrates on integrating information from a given subject area or set of
source systems
• Is built focused on a dimensional model using a star schema.
There are two types of data marts:
1. Independent or stand-alone data mart and
2. Dependent data mart.
Stand-alone data mart
A Stand-alone data mart focuses exclusively on one subject area and it is not
designed in an enterprise context.
Dependent data mart
According to Bill Inmon, a dependent data mart is one whose data comes
from a data warehouse. Data in the warehouse is aggregated, restructured
and summarised as it passes into the dependent data mart.
(c) Explain Business Intelligence in the context of data warehousing.
Ans. A data warehouse is an integrated collection of data that can help the
process of making better business decisions. Several tools and methods are
available that exploit the data of a data warehouse to create information
and knowledge that support business decisions. Two such techniques are:
• Decision-support systems
• Online analytical processing
Decision-support systems- The DSS is a decision support system and NOT
a decision-making system. DSS is a specific class of computerised information
systems that support the decision-making activities of an organisation. A
properly designed DSS is an interactive software based system that helps
decision makers to compile useful information from raw data, documents,
personal knowledge, and/or business models to identify and solve problems
and make decisions.
Online analytical processing- It is an approach for performing analytical queries
and statistical analysis of multidimensional data. OLAP tools can be put in the
category of business intelligence tools along with data mining. Some of the
typical applications of OLAP may include reporting of sales projections, judging
the performance of a business, budgeting and forecasting etc.
OLAP tools require multidimensional data and distributed query-processing
capabilities. Thus, OLAP has data warehouse as its major source of information
and query processing.
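As a hedged illustration, an OLAP-style aggregation over a sales star schema like the one sketched earlier (table and column names assumed; ROLLUP is standard SQL, supported by Oracle and by PostgreSQL 9.5 and later):

    SELECT Year, Product_Code, SUM(Sales_Amount) AS Total_Sales
    FROM Sales_Fact
    GROUP BY ROLLUP (Year, Product_Code);
    -- yields per (year, product) subtotals, per-year totals and a grand total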
Q5. Explain the following with the help of examples or illustration.
(a) PostgreSQL
(b) Deadlock Recovery
(c) Semantic Query Optimisation
(d) Spatial and Multimedia Databases
Ans. (a) PostgreSQL is a time-proven, open-source relational database
management system that competes strongly with proprietary database
software. It is a densely featured system, often described as an open-source
counterpart of Oracle. PostgreSQL can compress and decompress its data
on the fly with a fast compression scheme to fit more data in an allotted
disk space. The advantage of compressed data, besides saving disk space,
is that reading data takes less I/O, resulting in faster data reads.
(b) Deadlock Recovery: A set of two or more processes are deadlocked if
they are blocked (i.e., in the waiting state) each holding a resource and waiting
to acquire a resource held by another process in the set. A process is
deadlocked if it is waiting for an event which is never going to happen.
Figure: Wait-for graph - transaction T1 holds a lock on item A and waits for item B, while T2 holds B and waits for A.
Deadlock depends on the dynamics of the execution. All of the following four
necessary conditions must hold simultaneously for deadlock to occur:
Mutual exclusion: only one process can use a resource at a time.
Hold and wait: a process holding at least one resource is waiting to acquire
additional resources which are currently held by other processes.
No preemption: a resource can only be released voluntarily by the process
holding it.
Circular wait: a cycle of process requests exists (i.e., P0 is waiting for a
resource held by P1, which is waiting for a resource held by P2, ..., which
is waiting for a resource held by Pn, which is waiting for a resource held
by P0). Circular wait implies the hold-and-wait condition; therefore, these
conditions are not completely independent.
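In practice a DBMS detects such cycles on its wait-for graph automatically. As a PostgreSQL-specific sketch (an illustration, not part of the original answer), lock requests currently blocked, i.e. the candidates in a wait cycle, can be inspected through the pg_locks system view:

    SELECT pid, locktype, relation::regclass AS locked_table, granted
    FROM pg_locks
    WHERE NOT granted;   -- sessions currently waiting on a lock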
(c) Semantic Query Optimisation: Semantic query optimisation is the process
of transforming a query issued by a user into a different query which, because
of the semantics of the application, is guaranteed to yield the correct answer
for all states of the database. While this process has been successfully applied
in centralised databases, its potential for distributed and heterogeneous systems
is enormous, as there is the potential to eliminate inter-site joins which are the
single biggest cost factor in query processing. Further justification for its use
is provided by the fact that users of heterogeneous databases typically issue
queries through high-level languages which may result in very inefficient
queries if mapped directly, without consideration of the semantics of the
system.
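A small hedged illustration (the employee table, constraint and query are assumptions): semantic optimisation can use a known integrity constraint to avoid evaluating a query at all.

    ALTER TABLE employee
      ADD CONSTRAINT sal_chk CHECK (salary <= 100000);  -- known semantic constraint

    SELECT name FROM employee WHERE salary > 200000;
    -- Since the constraint guarantees salary <= 100000, a semantic optimiser
    -- can return an empty result without scanning the table at any site.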
(d) Spatial and Multimedia Databases: A spatial database is a database that
is optimised to store and query data that represents objects defined in a
geometric space. Most spatial databases allow representing simple geometric
objects such as points, lines and polygons. Some spatial databases handle
more complex structures such as 3D objects, topological coverages, linear
networks, and TINs. While typical databases are designed to manage various
numeric and character types of data, additional functionality needs to be added
for databases to process spatial data types efficiently; such types are typically
called geometries or features. The Open Geospatial Consortium created
the Simple Features specification and sets standards for adding spatial
functionality to database systems.
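As a hedged sketch using PostgreSQL's built-in geometric types (the table and data are assumptions):

    CREATE TABLE landmark (name TEXT, location POINT);
    INSERT INTO landmark VALUES ('Museum', point '(3,4)'), ('Airport', point '(40,2)');
    SELECT name FROM landmark
    WHERE location <@ box '((0,0),(10,10))';  -- points contained in the query window: 'Museum'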
A multimedia database is a system able to store and retrieve objects made up
of text, images, sounds, animations, voice, video, etc. The wide range of
applications for MMDBs leads to a number of different problems with respect
to traditional database systems, which only consider textual and numerical
attributes.
These problems include:
• data modeling;
• support for different data types;
• efficient data storing;
• data compressing techniques;
• index structures for non-traditional data types;
• query optimisation;
• presentation of objects of different types.
MCS – 43: ADVANCED DATABASE DESIGN
December, 2014
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) How do UML diagrams help in designing the database? Discuss
with the help of an example.
(b) How does data granularity affect the performance of concurrency
control? Do you think that data granularity and database security are
interrelated? Justify your answer.
(c) Compare and contrast the distributed DBMS environment with the
centralised DBMS environment.
(d) What are semantic databases? List the features of semantic
databases. Explain the process of searching the knowledge in these
databases.
(e) What is shadow paging? Give two advantages and two disadvantages
of shadow paging.
(f) What is a data warehouse? Describe the process of ETL for a data
warehouse.
(g) What are data marts? Briefly discuss the method of creating the
data marts.
(h) Explain the role of Query Optimiser in Oracle.
Q2. (a) Explain the algorithm and cost calculation for Simple Hash Join.
(b) Differentiate between the following:
(i) Embedded SQL and Dynamic SQL
(ii) XML and HTML
(iii) 2 PC and 3 PC Protocol
(iv) Data Warehousing and Data Mining
Q3. (a) Give suitable example to discuss the Apriori algorithm for finding
frequent itemsets.
(b) Write a short note, with suitable example, for each of the following:
(i) Vendor-Specific Security
(ii) Multilevel Security
Q4. (a) What are cursors, stored procedures and triggers? Give SQL
syntax for each and discuss the utility aspect of each.
(b) Explain Join-Dependency with the help of an example. With which
normal form is it associated? Functional dependency and Multivalued
dependency are special types of join dependencies. Justify.
Q5. (a) What do you mean by Deadlock? How can we prevent Deadlock?
Write an algorithm that checks whether the concurrently executing
transactions are in deadlock or not.
(b) Compare and contrast Relational DBMS with Object-Relational DBMS
and Object-Oriented DBMS. Suggest one application for each of these
DBMS.
MCS – 43: ADVANCED DATABASE DESIGN
June, 2015
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q4. (a) How does PostgreSQL perform storage and indexing of tables?
Briefly discuss the types of indexes involved in PostgreSQL.
(b) What is SQLJ? What are the requirements of SQLJ? Briefly describe
the working of SQLJ. “Can SQLJ use dynamic SQL?” If yes, then
how? Otherwise, specify the type of SQL it can use.
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) What are Triggers? Explain the significance of triggers with the
help of an example.
(b) Explain the Log-Based recovery scheme, using deferred database
modification approach. Give a suitable example in your explanation.
(c) What are mobile databases? List the characteristics of mobile
databases. Discuss the challenges of implementing mobile databases.
(d) List the index implementation available in PostgreSQL. Explain
each index available in PostgreSQL.
(e) What is Data Dictionary? What is the significance of creating Data
Dictionary in DBMS? Explain the statistics stored in the Data Dictionary.
(f) What are Database Security features? Discuss with example how
Oracle manages database security.
(g) What are web databases? How do you create them? List the steps.
(h) What are views? What is the significance of views in DBMS? How
are views managed in SQL? Explain with an example.
Q2. (a) How do Database Queries differ from Data Mining Queries?
Explain K-Means Clustering algorithm for Data Mining, with suitable
example.
(b) Explain any two of the following:
(i) OLAP and its types
(ii) Spatial Databases
(iii) Dynamic SQL
(c) Explain Data fragmentation in distributed DBMS, with the help of
an example.
Q3. (a) Discuss the term ‘Association rule mining’. Write the Apriori
algorithm for association rule mining. Illustrate it using an example.
(b) Explain any two of the following with suitable examples:
(i) Semantic Databases
(ii) Temporal Databases
(iii) Embedded SQL
(c) Discuss 4NF with suitable example. Is 4NF decomposition dependency
preserving? Justify.
Q4. (a) In PostgreSQL, explain how B-Tree Indexes are different from
R-Tree Indexes.
(b) Discuss any three of the following with suitable examples:
(i) UML Class Diagram
(ii) Query Optimization
(iii) Assertion
(iv) Multiversioning
Q5. (a) What are checkpoints? How are checkpoints used in log-based
recovery? Explain with the help of an example and a diagram.
(b) What are control files in Oracle? What does a control file contain?
What is the role of control file in starting an instance of Oracle? Discuss
the role of redo log file in starting an instance of Oracle.
(c) Describe ID3 algorithm for classifying datasets, with suitable
example.
MCS – 43: ADVANCED DATABASE MANAGEMENT SYSTEMS
December, 2016
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
Q2. (a) What do you understand by the term ‘Database Tuning’? What
are the goals of tuning a database? What is the requirement of tuning
the database queries?
(b) What is multiversion concurrency control? How can multiversion
concurrency control be achieved by using time stamp ordering?
(c) How do you perform cost calculation for simple Hash-Join?
Q3. (a) What are mobile databases? Discuss the characteristics of mobile
databases. Give an application of mobile databases.
(b) Explain the following:
(i) PJNF
(ii) SQLJ
(iii) Audit Trail
(iv) Deductive Database
(c) What do you understand by ‘Transaction’ in DBMS? Discuss the
properties of concurrent transactions performed in DBMS.
Q4. (a) Differentiate between the following:
(i) XML and HTML
(ii) JDBC and ODBC
(iii) Clustering and Classification
(iv) Serial schedule and Serializable schedule
(b) Explain “log based” recovery using immediate database modification
scheme. Give suitable example for explanation.
(c) Discuss the concept of exclusive locks and shared locks. How are
these two locks used in transaction management? Discuss with an
example.
Q5. (a) Define the term Data mining. How is data mining different
from Knowledge Discovery in Databases (KDD)? Explain the steps of
KDD with the help of an example.
(b) How are object oriented databases different from relational
databases? Write an ODL syntax to create a class Employee that also
references the Project class.
(c) Explain the role of Query optimizer in Oracle.
(d) What are web databases? Discuss the steps required in creating a
web database.
MCS – 43: ADVANCED DATABASE MANAGEMENT SYSTEMS
June, 2017
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
Q2. (a) What are Multivalued dependencies? When can we say that
multivalued dependency is trivial? Discuss with a suitable example.
(b) List the various UML diagrams. How do UML class diagrams
contribute to Database Design?
(c) What is a system catalogue? Discuss the role of system catalogues
in Database Administration.
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
Q1. (a) How does Data-mining differ from the technique of Knowledge
Discovery in Database (KDD)? Can we use these two techniques
interchangeably or alternatively? Justify.
Refer to Ch-6, Q.No.-51(ii), Page No.-133 & Dec.-2009, Q.No.-1(d), Page
No.-244
(b) Explain the index implementations available in PostgreSQL, with
suitable examples.
Refer to Ch-7, Q.No.-27, Page No.-148
(c) Consider the following data:
Employee with code E1 works on two projects, P1 and P2, and can use
two tools T1 and T2. Likewise, E2 works on projects P3 and P4 and can
use tools T2 and T3. You may assume that projects and tools are
independent of each other, but depend only on employee.
Perform the following tasks for the data:
(i) Represent the data as above in a relation, namely,
Employee_Project_Tool (EC, PC, TC).
(ii) List all the MVDs in the relation created in part (i) above. Also
identify the primary key of the relation.
(iii) Normalise the relation created in part (i) into 4NF.
(iv) Join the relations created in part (iii) above and show that they will
produce the same data on joining as per the relation created in part (i).
Same as June-2013, Q.No.-1(a), Page No.-302
(d) What is Granularity in databases? How does granularity relate to
the security of databases? In a concurrent environment, how does
granularity affect the performance?
Ans. Granularity refers to the level of detail of a data item: an entity whose
data fields can be subdivided further is said to have high granularity.
For example, take an entity such as Person. A person's identity can be
further divided into:
• Name, Address, Gender, City, State, Country, etc.
This means that Person, as an entity, has high granularity, since there are
many subdivisions of the same entity. But if we choose Gender, this attribute
can take at most three values:
• Male, Female and Transgender.
This means that Gender, as a field/attribute, has low granularity.
Now, Refer to June-2008, Q.No.-1(f), Page No.-217
(e) What are Semantic databases? Give the features of semantic
databases. Discuss the process of searching the knowledge in these
databases.
Refer to June-2008, Q.No.-2(c), Page No.-219
(f) Illustrate the concept of Shadow Paging with suitable example. Give
the advantages and disadvantages of shadow paging.
Refer to June-2008, Q.No.-5(a), Page No.-223
(g) How does Clustering differ from Classification? Briefly discuss one
approach for both, i.e., clustering and classification.
Refer to Ch-6, Q.No.-54, 56, Page No.-134 & Dec-2006, Q.No.-4(a), Page
No.-174
Q2. (a) What are Alerts, Cursors, Stored Procedures and Triggers? Give
the utility of each. Explain each with suitable code of your choice.
Refer to Ch-5, Q.No.-36, Page No.92
(b) What is Simple Hash Join? Discuss the algorithm and cost calculation
process for simple hash join. Explain how Hash Join is applicable to
Equi Join and Natural Join.
Refer to June-2008, Q.No.-1(d), Page No.-213
Q4. (a) What are Views in SQL? What is the significance of views?
Give an example code in SQL to create a view.
Refer to Ch-4, Q.No.-1, Page No.-48
(b) (i) Create an XML document that stores information about students
of a class. The document should use appropriate tags/attributes to store
information like student_id (unique), name (consisting of first name
and last name), class (consisting of class no. and section) and books
issued to the student (minimum 0, maximum 2). Create at least two
such student records.
Refer to Gullybaba.com (Download section)
(ii) Create the DTD for verification of the above student data.
Refer to Gullybaba.com (Download section)
(c) Explain the features and challenges of multimedia databases.
Refer to Ch-7, Q.No.-1, 4, Page No.-140, 142
Q5. (a) What is Data Dictionary? Give the features and benefits of a
data dictionary. What are the disadvantages of a data dictionary?
Refer to Ch-4, Q.No.-18, 19, 20, Page No.-64, 65
(b) What is the utility of Multi-Version schemes in databases? Discuss
any one multi-version scheme, with suitable example.
Refer to June.-2007, Q.No.-5(a), Page No.196
(c) What is Association Rule Mining? Write Apriori Algorithm for finding
frequent itemset. Discuss it with suitable example.
Refer to Ch-6, Q.No.-59, 60, Page No.-137, 138
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) What are Cursors, Stored Procedures and Triggers? Explain
each with the help of an example code.
(b) Differentiate between Logical and Physical Database design.
(c) What is a View? Explain any two strategies for implementing the
Views.
(d) What is Hash join? How is Hash join between two relations computed?
Explain the algorithm and cost calculation for simple hash join.
(e) Differentiate between a database management system and a data
warehouse. What is the need for a data warehouse?
(f) What is granularity of data? How does granularity of data items
affect the performance of concurrency control?
(g) Differentiate between Centralised DBMS and Distributed DBMS.
(h) What is Timestamp Ordering? Explain timestamp based protocol
for serialisable schedule.
Q2. (a) What is Data Mining? How does Data Mining differ from OLTP?
Discuss Classification as a technique for data mining.
(b) What are Data Marts? Briefly discuss the significance of data marts.
(c) What is XML? How does XML differ from HTML? What are the
advantages of XML ? Create an XML schema for a list of students and
their marks.
Q4. (a) What is Shadow Paging? Illustrate with an example. Give the
advantages and disadvantages of shadow paging.
(b) Define Multi-valued Dependency. What is Trivial Multi-valued
Dependency? State the fourth normal form.
(c) Explain any one clustering technique for data mining.
(d) What is Query Optimisation? Briefly discuss the techniques of Query
Optimisation with suitable examples.
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) Explain the process of Query optimisation with suitable example.
(b) What is the difference between Document Type Definition (DTD)
and XML Schema? Explain using an example.
(c) Explain Data mining in the context of knowledge discovery in
databases.
(d) What is Join dependency? Explain it with the help of an example.
What is trivial join dependency?
(e) Consider a small institute in which students register for programmes
run by the institute. A programme can be a full or part time programme
or both. Every student necessarily registers in at least one programme
and at most three programmes. Assuming suitable attributes, design
an EER diagram for the same.
(f) Explain the reference architecture for distributed database
management system.
(g) What are triggers? Explain the utility of triggers in DBMS. Give
suitable SQL code for triggers.
(h) What is a System catalogue? What is the information stored in
catalogue of RDBMS?
Note: Question number 1 is compulsory. Answer any three questions from the
rest.
Q1. (a) What is a data warehouse? Describe the process of ETL for data
warehouse.
Refer to Dec-2008, Q.No.-1(b), Page No.-227
(d) What are data marts? What is the significance of creating data
marts?
Refer to Dec-2008, Q.No.-3(b), Page No.-231
(f) What is the effect of data granularity over database security ? Give
suitable example.
Refer to June-2008, Q.No.-1(f), Page No.-217
(d) Differentiate between ODBC and JDBC. What are the components
required for implementing ODBC in a system?
Refer to June-2009, Q.No.-3(b), Page No.-238 and Refer to Ch-7,
Q.No.-17, Page No.-145
Q4. (a) What is k-means clustering ? How does it differ from nearest
neighbour clustering ?
Ans. In the K-Means clustering, initially a set of clusters is randomly chosen.
Then iteratively, items are moved among sets of clusters until the desired set
is reached. A high degree of similarity among elements in a cluster is obtained
by using this algorithm. Given a cluster Ki = {ti1, ti2, …, tim}, the cluster
mean is:
mi = (1/m)(ti1 + … + tim)
where tij are the tuples in the cluster and m is the number of tuples in it.
The K-Means algorithm is as follows:
Input:
  D = {t1, t2, …, tn}   // set of elements
  A                     // adjacency matrix showing distances between elements
  k                     // number of desired clusters
Output:
  K                     // set of clusters
Algorithm:
  assign initial values for the means m1, m2, …, mk;
  repeat
    assign each item ti to the cluster with the closest mean;
    calculate the new mean for each cluster;
  until the convergence criterion is met;
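A small worked example (the data and initial means are assumptions): let D = {2, 4, 10, 12}, k = 2, with initial means m1 = 2 and m2 = 4.
• Pass 1: 2 → K1; 4, 10, 12 → K2 (each is closer to 4 than to 2). New means: m1 = 2, m2 = 26/3 ≈ 8.67.
• Pass 2: 2 and 4 → K1 (4 is closer to 2 than to 8.67); 10 and 12 → K2. New means: m1 = 3, m2 = 11.
• Pass 3: the assignment does not change, so the algorithm has converged with clusters {2, 4} and {10, 12}.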
Now, Refer to Dec-2006, Q.No.-4(a), Page No.-174
(b) What is Audit Trail ? Give benefits of audit trail in context of DBMS.
Refer to Ch-5, Q.No.-60, Page No.-103
Q5. (a) How does PostgreSQL perform storage and indexing? Discuss
the types of indexes involved in PostgreSQL with suitable examples.
Refer to Ch-7, Q.No.-27, Page No.-148
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q1. (a) What are triggers? What is the utility of triggers in DBMS?
Explain with the help of an example.
(b) Discuss data mining in the context of knowledge discovery in
databases.
(c) What are mobile databases? Give characteristics of mobile databases.
(d) What is shadow paging? Give advantages and disadvantages of shadow
paging.
(e) Differentiate between logical database design and physical database
design.
(f) How is hash join applicable to Equi Join and Natural Join?
(g) What is an ETL process? Describe the different transformations
performed during ETL process.
(h) Compare Assertions and Views. Discuss each with a suitable example.
Q3. (a) What is Join Dependency? How does join dependency relate to 5NF?
(b) What is a Datagrid? Give a block diagram to describe the structure
of datagrid. What is the utility of datagrid?
(c) What is a serializable schedule? How does a serializable schedule differ
from a serial schedule? What are the problems associated with each type
of schedule? Explain the timestamp-based protocols for serializable
schedules.
(b) What are the common database security failures? Give SQL
commands to grant permission for database access. Why are statistical
databases more prone to security breaches? Explain with the help of
an example.
(c) What is OLAP? Discuss the different implementations of OLAP.
(d) Discuss the different types of structural diagrams in UML.
Note: Question number one is compulsory. Answer any three questions from
the rest.
Q2. (a) Discuss the following with the help of a suitable example for
each:
(i) Triggers
(ii) Alerts
(iii) Cursors
(iv) Procedures
(v) Functions
(b) Compare and contrast Database Queries and Data Mining Queries.
Give example for each.
(c) Explain K-means algorithm with the help of an example.
(d) What is Data Dictionary in DBMS? Discuss its significance in DBMS.
Give an example of a data dictionary entry.
Q5. (a) What are Exclusive mode locks? How do Exclusive mode locks
differ from shared mode locks? How are these locks used in transaction
management? Give suitable example to support your discussion.
(b) What is multivalued dependency? How is 4NF related to multivalued
dependency? Is 4NF decomposition dependency preserving? Justify your
answer.
(c) Explain the role of the following files in Oracle:
(i) Control Files
(ii) Data Files