Kcs-501: Database Management System

The document outlines the units and topics covered in a course on database management systems (DBMS). Unit 1 introduces database concepts like the database architecture, data modeling using ER diagrams, and database languages. Unit 2 covers the relational data model and SQL. Unit 3 discusses database design and normalization. Unit 4 describes transaction processing and recovery. Unit 5 presents various concurrency control techniques for distributed databases.


KCS–501: DATABASE MANAGEMENT SYSTEM

UNIT-1: INTRODUCTION: Overview, Database System vs File System, Database System Concept and
Architecture, Data Model Schema and Instances, Data Independence and Database Language and
Interfaces, Data Definitions Language, DML, Overall Database Structure. Data Modeling Using the Entity
Relationship Model: ER Model Concepts, Notation for ER Diagram, Mapping Constraints, Keys, Concepts
of Super Key, Candidate Key, Primary Key, Generalization, Aggregation, Reduction of an ER Diagrams to
Tables, Extended ER Model, Relationship of Higher Degree.

UNIT-2: RELATIONAL DATA MODEL: Relational Data Model Concepts, Integrity Constraints, Entity
Integrity, Referential Integrity, Keys Constraints, Domain Constraints, Relational Algebra, Relational
Calculus, Tuple and Domain Calculus. Introduction to SQL: Characteristics of SQL, Advantages of SQL. SQL
Data Type and Literals. Types of SQL Commands. SQL Operators and Their Procedure. Tables, Views and
Indexes. Queries and Sub Queries. Aggregate Functions. Insert, Update and Delete Operations, Joins,
Unions, Intersection, Minus, Cursors, Triggers, Procedures in SQL/PL SQL.

UNIT-3: DATA BASE DESIGN & NORMALIZATION: Functional dependencies, normal forms, first,
second, and third normal forms, BCNF, inclusion dependence, lossless join decompositions, normalization
using FD, MVD, and JDs, alternative approaches to database design.

UNIT-4: TRANSACTION PROCESSING CONCEPT: Transaction System, Testing of Serializability,
Serializability of Schedules, Conflict & View Serializable Schedule, Recoverability, Recovery from
Transaction Failures, Log Based Recovery, Checkpoints, Deadlock Handling. Distributed Database:
Distributed Data Storage, Concurrency Control, Directory System.

UNIT-5: CONCURRENCY CONTROL TECHNIQUES: Concurrency Control, Locking Techniques for
Concurrency Control, Time Stamping Protocols for Concurrency Control, Validation Based Protocol,
Multiple Granularity, Multi Version Schemes, Recovery with Concurrent Transaction, Case Study of Oracle.

UNIT-1: INTRODUCTION
1. What are the functions of DBMS?
Ans. The functions of DBMS are:
i. The ability to update and retrieve data
ii. Support concurrent updates
iii. Recovery of data
iv. Security
v. Data integrity
2. List some applications of DBMS.
Ans. Applications of DBMS are:
i. Banking
ii. Airlines
iii. Credit card transactions
iv. Finance
v. Web based services
vi. Telecommunications
3. What are the advantages of the file processing system over DBMS?
Ans.
i. No problem of centralization
ii. Less expensive
iii. Less need of hardware
iv. Less complex in backup and recovery
4. What is data model? List the types of data model used.
Ans. Data model is a logical structure of the database. It is a collection of conceptual tools for describing
data, data relationships, data semantics and consistency constraints.
Types of data model used:
1. Hierarchical model
2. Network model
3. Relational model
4. Object-oriented model
5. Object-relational model
5. Explain the difference between physical and logical data independence with example.
Ans.
S. No. | Physical data independence | Logical data independence
1. | Physical data independence is the ability to modify the physical schema without causing the conceptual schema or application programs to be rewritten. | Logical data independence is the ability to modify the conceptual schema without having to change the external schemas or application programs.
2. | Examples of physical data independence are reorganizations of files, adding a new access path, etc. | Examples of logical data independence are addition/removal of entities.
6. Write advantages of database.
Ans. Advantages of database:
1. Controlling data redundancy
2. Sharing of data
3. Data consistency
4. Integration of data
5. Integrity constraints
6. Data security
7. Explain logical data independence.
Ans. The separation of the external views from the conceptual view, which enables the user to change the
conceptual view without affecting the external views or application programs, is called logical data
independence.
8. Define the term data redundancy and data consistency.
Ans. Data redundancy: The occurrence of values for data elements more than once within a file or
database is called data redundancy.
Data consistency: Data consistency states that only valid data will be written to the database.
9. What do you mean by DML and DDL?
OR Define DML.
Ans. Data Manipulation Language (DML): A DML is a language that enables users to access and
manipulate data as organized by the appropriate data model. Insert, update, delete, query are commonly
used DML commands.
Data Definition Language (DDL): DDL is a set of SQL commands used to create, modify and delete
database structures but not data. They are normally used by the DBA. Create, drop, alter and
truncate are commonly used DDL commands.
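The DDL/DML split above can be seen in a few lines of Python using the standard-library sqlite3 module (the student table and its columns are made-up examples):

```python
import sqlite3

# In-memory database; the "student" table is a hypothetical example.
con = sqlite3.connect(":memory:")

# DDL: CREATE defines the structure; no data is involved yet.
con.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: INSERT, UPDATE and DELETE manipulate the data inside that structure.
con.execute("INSERT INTO student VALUES (1, 'Asha')")
con.execute("UPDATE student SET name = 'Asha K' WHERE roll = 1")
name = con.execute("SELECT name FROM student WHERE roll = 1").fetchone()[0]
print(name)  # Asha K

# DDL again: DROP removes the structure itself, data and all.
con.execute("DROP TABLE student")
```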
10. Give example of a simple, composite attributes of an entity.
Ans. Simple attribute: A simple attribute is an attribute composed of a single component with an
independent existence.
Example: Roll number, salary etc.
Composite attribute: An attribute composed of multiple components each with an independent
existence is called a composite attribute.
Example: Name, which is composed of attributes like first name, middle name and last name.

[Fig. 1.11.1: composite attribute "Student name" made up of First name, Middle name and Last name.]

11. Write the difference between super key and candidate key.
Ans.
S. No. | Super key | Candidate key
1. | A super key is a set of one or more attributes that, taken collectively, allows us to uniquely identify an entity in the entity set. | A candidate key is a column, or set of columns, in the table that can uniquely identify any database record without referring to any other data.
2. | Super key is the broadest unique identifier. | Candidate key is a subset of a super key.

12. Give example for one to one and one to many relationships.
Ans.

[Fig. 1.12.1: (a) one-to-one and (b) one-to-many relationship diagrams.]
13. Describe the purpose of foreign key.
Ans. A foreign key is used to link tables together and create a relationship. It is a field in one table that is
linked to the primary key in another table.
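A minimal sketch of this linking, using Python's sqlite3 with hypothetical dept (parent) and emp (child) tables; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on

# Hypothetical parent and child tables: the child's dno column references
# the parent's primary key.
con.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT)")
con.execute("CREATE TABLE emp (eno INTEGER PRIMARY KEY, "
            "dno INTEGER REFERENCES dept(dno))")

con.execute("INSERT INTO dept VALUES (5, 'Research')")
con.execute("INSERT INTO emp VALUES (1, 5)")      # accepted: dept 5 exists

try:
    con.execute("INSERT INTO emp VALUES (2, 9)")  # rejected: no dept 9
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```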
14. Explain specialization.
Ans. Specialization is the abstraction process of introducing new characteristics into an existing class of
objects to create one or more new classes of objects. It involves taking a higher-level entity and, using
additional characteristics, generating lower-level entities.
15. What do you mean by aggregation?
Ans. Aggregation is an abstraction through which relationships are treated as higher level entities.
16. Define super key, candidate key, primary key and foreign key.
Ans. Super key: It is a set of one or more attributes that, taken collectively, allows us to identify uniquely
an entity in the entity set.
Candidate key: A candidate key is a column, or set of columns, in the table that can uniquely identify any
database record without referring to any other data.
Primary key: Primary key is a candidate key that is used for unique identification of entities within the
table.
Foreign key: A foreign key is derived from the primary key of the same or some other table. It is the
combination of one or more columns in a table (the child table) that references a primary key in another
table (the parent table).
17. What is strong and weak entity set?
Ans. Strong entity: A strong entity does not depend on any other entity in the schema. It always has a
primary key and is represented by a single rectangle. A relationship between two strong entities is
represented by a single diamond.
Weak entity: A weak entity depends on a strong entity for its existence. It does not have a primary key of
its own, only a partial (discriminator) key, and is represented by a double rectangle.
18. Explain the difference between a weak and a strong entity set with example.
Ans.
S. No. | Weak entity set | Strong entity set
1. | An entity set which does not possess sufficient attributes to form a primary key is called a weak entity set. | An entity set which has a primary key is called a strong entity set.
2. | It is represented by a double rectangle. | It is represented by a single rectangle.
3. | It contains a partial key (discriminator) represented by a dashed underline. | It contains a primary key represented by a solid underline.

19. Discuss three level of abstractions or schemas architecture of DBMS.


Ans. Different levels of data abstraction:
1. Physical level
2. Logical level
3. View level

LONG ANSWER TYPE QUESTIONS


1. What is database management system (DBMS)? What are the tasks performed by users in
DBMS?
Answer:
1. A database management system (DBMS) is software used to manage databases. For example, MySQL
and Oracle are commercial DBMSs used in different applications.
2. DBMS provides an interface to perform various operations like database creation, storing data, updating
data, creating a table in the database etc.
3. It provides protection and security to the database. In case of multiple users, it also maintains data
consistency.
DBMS allows users the following tasks:
1. Data definition: It is used for creation, modification, and removal of database objects that defines the
organization of data in the database.
2. Data updation: It is used for the insertion, modification, and deletion of the actual data in the
database.
3. Data retrieval: It is used to retrieve the data from the database which can be used by applications for
various purposes.
4. User administration: It is used for registering and monitoring users, maintaining data integrity,
enforcing data security, dealing with concurrency control, monitoring performance and recovering
information corrupted by unexpected failure.

2. What are the advantages and disadvantages of DBMS?


Answer:
Advantages of DBMS:
1. Controlled redundancy: It controls data redundancy because all the data is stored in a single
database rather than duplicated across separate files.
2. Data sharing: In DBMS, the authorized users of an organization can share the data among multiple
users.
3. Easy maintenance: It can be easily maintainable due to the centralized nature of the database system.
4. Reduce time: It reduces development time and maintenance need.
5. Backup: It provides backup and recovery subsystems which create automatic backup of data from
hardware and software failures and restores the data if required.
6. Multiple user interface: It provides different types of user interfaces like graphical user interface,
application program interface.
Disadvantages of DBMS:
1. Cost of hardware and software: It requires high speed of data processor and large memory size to
run DBMS software.
2. Size: It occupies a large space of disks and large memory to run efficiently.
3. Complexity: Database system creates additional complexity and requirements.
4. Higher impact of failure: A failure has a high impact on the database because, in most
organizations, all the data is stored in a single database; if the database is damaged due to electric failure or
database corruption, the data may be lost forever.

3. What do you understand by database users? Describe the different types of database
users.
Answer:
Database users are the people who use and benefit from the database. The different types of users,
depending on their needs and the way they access the database, are:
1. Application programmers:
a. They are the developers who interact with the database by means of DML queries.
b. These DML queries are written in the application programs like C, C++, JAVA, Pascal etc.
c. These queries are converted into object code to communicate with the database.
2. Sophisticated users:
a. They are database developers, who write SQL queries to select/ insert/delete/update data.
b. They directly interact with the database by means of query language like SQL.
c. These users can be scientists, engineers, analysts who thoroughly study SQL and DBMS to apply the
concepts in their requirement.
3. Specialized users:
a. These are also sophisticated users, but they write special database application programs.
b. They are the developers who develop the complex programs according to the requirement.
4. Standalone user:
a. These users will have standalone database for their personal use.
b. These kinds of database will have predefined database packages which will have menus and graphical
interfaces.
5. Naïve users:
a. These are the users who use the existing application to interact with the database.
b. For example, online library system, ticket booking systems, ATMs etc.

4. Who are data administrators? What are the functions of database administrator?
OR
Discuss the role of database administrator.
Answer:
Database administrators are the personnel who have control over the data and the programs used for
accessing the data.
Functions/role of database administrator (DBA):
1. Schema definition:
a. Original database schema is defined by DBA.
b. This is accomplished by writing a set of definitions, which are translated by the DDL compiler to a set of
tables that are permanently stored in the data dictionary.
2. Storage structure and access method definition:
a. The creation of appropriate storage structure and access method.
b. This is accomplished by writing a set of definitions, which are translated by the data storage and
definition language compiler.
3. Schema and physical organization and modification:
a. Modification of the database schema or the description of the physical storage organization.
b. These changes are accomplished by writing a set of definition to do modification to the appropriate
internal system tables.
4. Granting of authorization for data access: DBA grants different types of authorization for data
access to the various users of the database.
5. Integrity constraint specification: The DBA carries out data administration in the data dictionary,
such as defining constraints.

5. What is data abstraction? Explain different levels of abstraction.


Answer:
Data abstraction is the process of hiding irrelevant details from users, i.e., hiding the background details
of how data is stored and maintained.
Different levels of data abstraction:
1. Physical level:
i. Physical level is the lowest level of abstraction and describes how the data are actually stored.
ii. The physical level describes the complex low-level data structures in details.
2. Logical level:
i. Logical level is the next-higher level of abstraction and it describes what data are stored in the database,
and what relationship exists among those data.
ii. The logical level thus describes the entire database in terms of a small number of relatively simple
structures.
3. View level:
i. View level is the highest level of abstraction; it describes only part of the entire database.
ii. The view level of abstraction exists to simplify the users' interaction with the system.
iii. The system may provide many views for the same database.
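The view level can be sketched with Python's sqlite3: the base table plays the role of the logical schema, while a view exposes only part of it (table and column names here are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Logical level: the full schema, with every attribute.
con.execute("CREATE TABLE student (roll INTEGER, name TEXT, marks INTEGER)")
con.execute("INSERT INTO student VALUES (1, 'Ravi', 82), (2, 'Mina', 91)")

# View level: a user sees only part of the entire database.
con.execute("CREATE VIEW student_public AS SELECT roll, name FROM student")

print(con.execute("SELECT * FROM student_public ORDER BY roll").fetchall())
# [(1, 'Ravi'), (2, 'Mina')]
# The physical level (pages and B-trees inside the database file) stays
# hidden from both the view and the logical schema.
```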

6. Explain the differences between physical level, conceptual level and view level of data
abstraction.
Answer:
S.No. | Physical level | Conceptual/Logical level | View level
1. | This is the lowest level of data abstraction. | This is the middle level of data abstraction. | This is the highest level of data abstraction.
2. | It describes how data is actually stored in the database. | It describes what data is stored in the database. | It describes the user's interaction with the database system.
3. | It describes the complex low-level data structures in detail. | It describes the structure of the whole database and hides the details of the physical storage structure. | It describes only those parts of the database in which the users are interested and hides the rest from them.
4. | Users at this level (typically the DBA) deal with the complexity of the database. | Users at this level need not know the physical storage details. | Users at this level are not aware of the complexity of the database.

7. Explain the difference between database management system (DBMS) and file system.
Answer
S.No. | DBMS | File System
1. | In DBMS, the user is not required to write the procedures. | In this system, the user has to write the procedures for managing the files.
2. | DBMS gives an abstract view of data that hides the details. | The file system exposes the details of data representation and storage.
3. | DBMS provides a crash recovery mechanism, i.e., it protects the data from system failures. | The file system does not have a crash recovery mechanism; if the system crashes while entering some data, the contents of the file may be lost.
4. | DBMS provides a good protection mechanism. | It is very difficult to protect a file under the file system.
5. | DBMS can efficiently store and retrieve data. | The file system cannot efficiently store and retrieve data.
6. | DBMS takes care of concurrent access to data using some form of locking. | In the file system, concurrent access causes problems, e.g., one user reading a file while another is deleting or updating information in it.
8. Discuss the architecture of DBMS. What are the types of DBMS architecture?
Answer:
1. The DBMS design depends upon its architecture. The basic client/ server architecture is used to deal with
a large number of PCs, web servers, database servers and other components that are connected with
networks.
2. DBMS architecture depends upon how users are connected to the database to get their request done.
Types of DBMS architecture:
i. 1-Tier architecture:
1. In this architecture, the database is directly available to the user.
2. Any changes done are directly done on the database itself. It does not provide a handy tool for end users.
3. The 1-Tier architecture is used for development of the local application, where programmers can directly
communicate with the database for the quick response.
ii. 2-Tier architecture:
1. The 2-Tier architecture is same as basic client-server.
2. In the two-tier architecture, applications on the client end can directly communicate with the database at
the server side. For this interaction, API’s such as: ODBC, JDBC are used.
3. The user interfaces and application programs are run on the client-side.
4. The server side is responsible to provide the functionalities like query processing and transaction
management.
5. To communicate with the DBMS, client-side application establishes a connection with the server side.

[Figure: 2-tier architecture. User -> Application (client) -> Database system (server).]

iii. 3-Tier architecture:


1. The 3-Tier architecture contains another layer between the client and server. In this architecture, client
cannot directly communicate with the server.
2. The application on the client-end interacts with an application server which further communicates with
the database system.
3. End user has no idea about the existence of the database beyond the application server. The database also
has no idea about any other user beyond the application.
4. The 3-Tier architecture is used in case of large web application.

[Figure: 3-tier architecture. User -> Application client -> Application server -> Database server.]

9. What are data models? Briefly explain different types of data models.
Answer:
Data models:
1. Data models define how the logical structure of the database is modeled.
2. Data models are a collection of conceptual tools for describing data, data relationships, data semantics
and consistency constraints.
3. Data models define how data is connected to each other and how they are processed and stored inside the
system.
Types of data models:
1. Entity relationship model:
a. The entity relationship (ER) model consists of a collection of basic objects, called entities and of
relationships among these entities.
b. Entities are represented by means of their properties, called attributes.

[Figure: ER model. Two entities connected by a relationship; each entity has attributes.]

2. Relational model:
a. The relational model represents data and relationships among data by a collection of tables, each of
which has a number of columns with unique names.
b. Relational data model is used for data storage and processing.
c. This model is simple and it has all the properties and capabilities required to process data with storage
efficiency.
3. Hierarchical model:
a. In hierarchical model data elements are linked as an inverted tree structure (root at the top with
branches formed below).
b. Below the single root data element are subordinate elements each of which in turn has its own
subordinate elements and so on, the tree can grow to multiple levels.
c. Data element has parent child relationship as in a tree.
4. Network model:
a. This model is the extension of hierarchical data model.
b. In this model there exist a parent child relationship but a child data element can have more than one
parent element or no parent at all.
5. Object-oriented model:
a. Object-oriented models were introduced to overcome the shortcomings of conventional models like
relational, hierarchical and network model.
b. An object-oriented database is collection of objects whose behaviour, state, and relationships are defined
in accordance with object-oriented concepts (such as objects, class, etc.).

10. Describe data schema and instances.


Answer:
1. The description of a database is called the database schema; it is specified during database design and is
not expected to change frequently.
2. Most of the data models have certain convention for displaying schema as diagram which is called as
schema diagram.
3. A schema diagram displays only some aspects of a schema, such as the names of record types and data
items, and some types of constraints.
For example: Schema diagram for student info database
Student (Name, Student_number, Class, Branch)
Course (Course_name, Course_number, Department)
Instances:
1. The data in the database at a particular moment is called a database state or snapshot. It is also called the
current set of occurrences or instances in the database.
2. In a database state, each schema construct has its own current set of instances.
3. Many database states can be constructed to correspond to a particular database schema. Every time we
insert or delete a record or change the value of a data item in a record, we change one state of the database
into another state.
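The schema/state distinction can be demonstrated with Python's sqlite3 (the course table is a hypothetical example): the CREATE statement fixes the schema once, while each insert or delete moves the database from one state to another:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE course (cno INTEGER, cname TEXT)")  # the schema

def state():
    # The current set of rows is one database state (instance).
    return con.execute("SELECT * FROM course ORDER BY cno").fetchall()

s0 = state()                                          # [] -- the empty state
con.execute("INSERT INTO course VALUES (1, 'DBMS')")
s1 = state()                                          # [(1, 'DBMS')]
con.execute("DELETE FROM course WHERE cno = 1")
s2 = state()                                          # [] -- another state
print(s0, s1, s2)
# The schema never changed between the three states; only the instances did.
```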

11. Describe data independence with its types.


OR
Explain data independence with its types.
Answer:
Data independence: Data independence is defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level.
Types of data independence:
1. Physical data independence:
a. Physical data independence is the ability to modify internal schema without changing the conceptual
schema.
b. Modification at the physical level is occasionally necessary in order to improve performance.
c. It refers to the immunity of the conceptual schema to change in the internal schema.
d. Examples of physical data independence are reorganizations of files, adding a new access path or
modifying indexes, etc.
2. Logical data independence:
a. Logical data independence is the ability to modify the conceptual schema without having to change the
external schemas or application programs.
b. It refers to the immunity of the external model to changes in the conceptual model.
c. Examples of logical data independence are addition/removal of entities.
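Physical data independence can be illustrated with Python's sqlite3: adding an index is a physical-level change (a new access path), and the query and its results are untouched (table and index names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (eno INTEGER, name TEXT, dno INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, 'Asha', 5), (2, 'Ravi', 5), (3, 'Mina', 4)])

query = "SELECT name FROM emp WHERE dno = 5 ORDER BY eno"
before = con.execute(query).fetchall()

# Physical change: add an access path (an index).
# No query text or schema edits are needed.
con.execute("CREATE INDEX idx_emp_dno ON emp(dno)")

after = con.execute(query).fetchall()
print(before == after)  # True -- same results, only the access path changed
```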

12. Describe the classification of database language. Which type of language is SQL?
OR
Discuss the following terms (i) DDL Command (ii) DML command.
Answer:
Classification of database languages:
1. Data Definition Language (DDL):
a. DDL is set of SQL commands used to create, modify and delete database structures but not data.
b. They are used by the DBA to a limited extent, a database designer, or application developer.
c. Create, drop, alter, truncate are commonly used DDL command.
2. Data Manipulation Language (DML):
a. A DML is a language that enables users to access or manipulates data as organized by the appropriate
data model.
b. There are two types of DMLs:
i. Procedural DMLs: It requires a user to specify what data are needed and how to get those data.
ii. Declarative DMLs (Non-procedural DMLs): It requires a user to specify what data are needed
without specifying how to get those data.
c. Insert, update, delete, query are commonly used DML commands.
3. Data Control Language (DCL):
a. It is the component of SQL that controls access to data and to the database.
b. GRANT and REVOKE are commonly used DCL commands; COMMIT and ROLLBACK are transaction
control (TCL) commands.
4. Data Query Language (DQL):
a. It is the component of SQL statement that allows getting data from the database and imposing ordering
upon it.
b. It includes select statement.
5. View Definition Language (VDL):
a. VDL is used to specify user views and their mapping to the conceptual schema.
b. It defines the subset of records available to classes of users.
c. It creates virtual tables, and a view appears to users like the conceptual level.
d. It specifies user interfaces.
SQL is a declarative (non-procedural) language; in this classification it is most commonly treated as a
DML, though it also includes DDL, DCL and DQL commands.
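The procedural/declarative contrast can be sketched in Python with sqlite3: the SQL statement only says what rows are wanted, while the hand-written loop spells out how to find them (the emp table is a made-up example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [('Asha', 900), ('Ravi', 400), ('Mina', 700)])

# Declarative (non-procedural) DML: state WHAT is wanted, not HOW to scan.
declarative = con.execute(
    "SELECT name FROM emp WHERE salary > 500 ORDER BY name").fetchall()

# Procedural style: spell out HOW -- iterate the rows and filter by hand.
procedural = sorted(name for name, salary in con.execute("SELECT * FROM emp")
                    if salary > 500)

print(declarative)  # [('Asha',), ('Mina',)]
print(procedural)   # ['Asha', 'Mina']
```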

13. Explain all database languages in detail with example.


Answer:
Database languages: Refer Q. 1.12
Examples:
DDL:
CREATE, ALTER, DROP, TRUNCATE, COMMENT, GRANT, REVOKE statement
DML:
INSERT, UPDATE, DELETE statement
DCL:
GRANT and REVOKE statement
DQL:
SELECT statement
VDL:
1. create view emp5 as
select * from employee
where dno = 5;
Creates view for dept 5 employees.
2. create view empdept as
select fname, lname, dno, dname
from employee, department
where dno=dnumber;
Creates view using two tables.
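The two views above can be run end to end with Python's sqlite3; the employee/department rows below are invented sample data matching the book's column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (fname TEXT, lname TEXT, dno INTEGER)")
con.execute("CREATE TABLE department (dnumber INTEGER, dname TEXT)")
con.execute("INSERT INTO department VALUES (5, 'Research')")
con.execute("INSERT INTO employee VALUES ('John', 'Smith', 5)")

# View for department 5 employees.
con.execute("CREATE VIEW emp5 AS SELECT * FROM employee WHERE dno = 5")

# View built over two tables via a join.
con.execute("""CREATE VIEW empdept AS
               SELECT fname, lname, dno, dname
               FROM employee, department
               WHERE dno = dnumber""")

print(con.execute("SELECT * FROM empdept").fetchall())
# [('John', 'Smith', 5, 'Research')]
```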
14. Explain DBMS interfaces. What are the various DBMS interfaces?

Answer:

DBMS interfaces: A database management system (DBMS) interface is a user interface which allows for
the ability to input queries to a database without using the query language itself.

Various DBMS interfaces are:

1. Menu-based interfaces for web clients or browsing:


a. These interfaces present the user with lists of options (called menus) that lead the user through the
formulation of a request.
b. Pull-down menus are a very popular technique in Web-based user interfaces.

c. They are also often used in browsing interfaces, which allow a user to look through the contents of a
database in an exploratory and unstructured manner.

2. Forms-based interfaces:
a. A forms-based interface displays a form to each user.
b. Users can fill out all of the form entries to insert new data, or they can fill out only certain entries, in
which case the DBMS will retrieve matching data for the remaining entries.

3. Graphical user interfaces (GUI):


a. A GUI typically displays a schema to the user in diagrammatic form.
b. The user then can specify a query by manipulating the diagram. In many cases, GUIs utilize both menus
and forms.

4. Natural language interfaces:


a. A natural language interface has its own schema, which is similar to the database conceptual schema, as
well as a dictionary of important words.
b. The natural language interface refers to the words in its schema, as well as to the set of standard words in
its dictionary to interpret the request.
c. If the interpretation is successful, the interface generates a high level query corresponding to the natural
language request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user
to clarify the request.

5. Speech input and output:


a. The speech input is detected using a library of predefined words and used to set up the parameters that
are supplied to the queries.
b. For output, a similar conversion from text or numbers into speech takes place.

6. Interfaces for the DBA:


a. Most database systems contain privileged commands that can be used only by the DBA’s staff.
b. These include commands for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structures of a database.

15. Briefly describe the overall structure of DBMS.

OR

Draw the overall structure of DBMS and explain its components in brief.

Answer:
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system. The functional components of a database system can be broadly divided into two components:
1. Storage Manager (SM): A storage manager is a program module that provides the interface between
the low level data stored in the database and the application programs and queries submitted to the system.
The SM components include:
a. Authorization and integrity manager: It tests for the satisfaction of integrity constraints and
checks the authority of users to access data.

b. Transaction manager: It ensures that the database remains in a consistent state despite system
failures, and that concurrent transaction executions proceed without conflict.

c. File manager: It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.

d. Buffer manager: It is responsible for fetching data from disk storage into main memory and deciding
what data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
2. Query Processor (QP): The query processor (including the query optimizer) is responsible for taking
every statement sent to the database and figuring out how to get the requested data or perform the
requested operation. The QP components are:
a. DDL interpreter: It interprets DDL statements and records the definition in data dictionary.
b. DML compiler: It translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query evaluation engine understands.
c. Query optimization: It picks the lowest cost evaluation plan from among the alternatives.
d. Query evaluation engine: It executes low-level instructions generated by the DML compiler.
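A small way to peek at the query optimizer's choice is SQLite's EXPLAIN QUERY PLAN, here driven from Python (table and index names are invented; the exact plan text varies between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (eno INTEGER, dno INTEGER)")
con.execute("CREATE INDEX idx_dno ON emp(dno)")

# Ask the query processor which evaluation plan it picked.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dno = 5").fetchall()
for row in plan:
    print(row[-1])  # e.g. a SEARCH step that uses idx_dno for the equality test
```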

16. What is ER model? What are the elements of ER model?


OR
What are the notations of ER diagram?
Answer:
An entity relationship model (ER model) is a way of representing the entities and the relationships between
the entities in order to create a database.
Elements/notation of ER model/diagram:
1. Entity:
a. An entity is a real world object that can be easily identified.
b. An entity can be abstract.
c. An entity is an object that exists and is distinguishable from other objects.
2. Entity set:
a. Entity set is a collection of similar type of entities.
b. An entity set may contain entities with attribute sharing similar values.
3. Attribute:
a. An attribute gives the characteristics of the entity.
b. It is also called a data element, data field, field, data item, or elementary item.
4. Relationship:
a. A relationship is the association between entities or entity occurrence.
b. Relationship is represented by diamond with straight lines connecting the entities.

17. What do you understand by attributes and domain? Explain various types of attributes
used in conceptual data model.
Answer:
Attributes:
1. Attributes are properties which are used to represent the entities.
2. All attributes have values. For example, a student entity may have name, class, and age as attributes.
3. There exists a domain or range of values that can be assigned to attributes.
4. For example, a student’s name cannot be a numeric value. It has to be alphabetic.
A student’s age cannot be negative, etc.
Domain:
1. A domain is an attribute constraint which determines the type of data values that are permitted for that
attribute.
2. Attribute domains can be very large or very small.
Types of attributes used in conceptual data model:
1. Simple attribute: Simple attributes are atomic values, which cannot be divided further. For example, a
student’s phone number is an atomic value of 10 digits.
2. Composite attribute: Composite attributes are made of more than one simple attribute. For example,
a student's complete name may have first_name and last_name.
3. Derived attribute: Derived attributes are the attributes that do not exist in the physical database, but
their values are derived from other attributes present in the database. For example, average_salary in a
department should not be saved directly in the database, instead it can be derived.
4. Single-value attribute: Single-value attributes contain single value. For example,
Social_Security_Number.
5. Multi-value attribute: Multi-value attributes may contain more than one value. For example, a
person can have more than one phone number, email_address, etc.

18. What is purpose of the ER diagram? Construct an ER diagram for a University system
which should include information about students, departments, professors, courses, which
students are enrolled in which course, which professors are teaching which courses, student
grades, which course a department offers.
Answer:
Purpose of the ER diagram:
1. ER diagram is used to represent the overall logical structure of the database.
2. ER diagrams emphasize the schema of the database rather than its instances, because the schema of the
database changes rarely.
3. It is useful to communicate the logical structure of database to end users.
4. It serves as a documentation tool.
5. It helps the database designer in understanding the information to be contained in the database.
ER diagram:

19. Draw an ER diagram for a small marketing company database, assuming your own data
requirements.
Answer:
20. A university registrar’s office maintains data about the following entities (a) courses,
including number, title, credits, syllabus and prerequisites; (b) course offerings, including
course number, year, semester section number, instructor(s), timings and classroom; (c)
students, including student-id, name and program; and (d) instructors, including
identification number, name department and title. Further the enrollment of students in
courses and grades awarded to students in each course they are enrolled for must be
appropriately modeled. Construct an ER diagram for the registrar’s office. Document all
assumption that you make about the mapping constraints.
Answer:
In this ER diagram, the main entity sets are student, course, course offering and instructor. The entity set
course offering is a weak entity set dependent on course. The assumptions made are:
a. A class meets only at one particular place and time. This ER diagram cannot model a class meeting at
different places at different times.
b. There is no guarantee that the database does not have two classes meeting at the same place and time.

21. Describe mapping constraints with its types.


OR
Describe mapping constraints with its types.
Answer:
1. Mapping constraints act as rules that the contents of a database must follow.
2. Data in the database must follow the constraints.
Types of mapping constraints are:
1. Mapping cardinalities:
a. Mapping cardinalities (or cardinality ratios) specify the number of entities with which another entity can
be associated via a relationship set.
b. Mapping cardinalities are most useful in describing binary relationship sets, although they can contribute
to the description of relationship sets that involve more than two entity sets.
c. For binary relationship set R between entity sets A and B, the mapping cardinality must be one of the
following:
i. One to one: An entity in A is associated with at most one entity in B and an entity in B is associated with
at most one entity in A.
ii. One to many: An entity in A is associated with any number of entities in B. An entity in B, however,
can be associated with at most one entity in A.
iii. Many to one: An entity in A is associated with at most one entity in B, while an entity in B can
be associated with any number of entities in A.

iv. Many to many: An entity in A is associated with any number of entities in B, and an entity in B is
associated with any number of entities in A.
2. Participation constraints: It tells the participation of entity sets. There are two types of
participations:
i. Partial participation
ii. Total participation

22. Discuss the candidate key, primary key, super key, composite key and alternate key.
OR
Explain the primary key, super key, foreign key and candidate key with example.
OR
Define key. Explain various types of keys.
Answer:
Key:
1. A key is an attribute or set of attributes that is used to identify data in entity sets.
2. A key is defined for the unique identification of rows in a table.
Consider the following example of an Employee table:
Employee (EmployeeID, FullName, SSN, DeptID)
Various types of keys are:
1. Primary key:
a. A primary key uniquely identifies each record in a table and must never be the same for two records. Here
in the Employee table we can choose either the EmployeeID or the SSN column as the primary key.
b. The primary key is the candidate key that is chosen for unique identification of entities within the table.
c. A primary key cannot be null.
d. A table can have only one primary key.
2. Super key:
a. A super key for an entity is a set of one or more attributes whose combined values uniquely identify the
entity in the entity set.
b. For example: Here in employee table (EmployeeID, FullName) or (EmployeeID, FullName, DeptID) is a
super key.
3. Candidate key:
a. A candidate key is a column, or set of columns, in the table that can uniquely identify any database record
without referring to any other data.
b. Candidate keys are columns in a table that individually qualify for uniqueness across all rows. Here in the
Employee table, EmployeeID and SSN are candidate keys.
c. Minimal super keys are called candidate keys.
4. Composite key:
a. A composite key is a combination of two or more columns in a table that can be used to uniquely identify
each row in the table.
b. It is used when we cannot identify a record using single attributes.
c. A primary key that is made by the combination of more than one attribute is known as a composite key.
5. Alternate key:
a. The alternate key of any table are those candidate keys which are not currently selected as the primary
key.
b. Exactly one of those candidate keys is chosen as the primary key and the remainders, if any are then
called alternate keys.
c. The set of alternate keys is the set of all candidate keys minus the primary key.
d. Here in Employee table if EmployeeID is primary key then SSN would be the alternate key.
6. Foreign key:
a. A foreign key represents the relationship between tables and enforces the referential integrity rule.
b. A foreign key is derived from the primary key of the same or some other table.
c. A foreign key is a combination of one or more columns in a table (the child table) that references the
primary key of another table (the parent table).
d. A foreign key value can be left null.
For example: Consider another table:
Project (ProjectName, TimeDuration, EmployeeID)
a. Here, the ‘EmployeeID’ in the ‘Project’ table points to the ‘EmployeeID’ in ‘Employee’ table
b. The ‘EmployeeID’ in the ‘Employee’ table is the primary key.
c. The ‘EmployeeID’ in the ‘Project’ table is a foreign key.
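The key types above can be sketched as DDL on the Employee/Project tables from the answer. The sketch below uses Python's built-in sqlite3 module; sample values are invented, and note that SQLite enforces foreign keys only when the pragma is enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on
conn.execute("""CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,  -- primary key: unique, never null
    FullName   TEXT NOT NULL,
    SSN        TEXT UNIQUE,          -- the other candidate key; here the alternate key
    DeptID     INTEGER)""")
conn.execute("""CREATE TABLE Project (
    ProjectName  TEXT,
    TimeDuration INTEGER,
    EmployeeID   INTEGER REFERENCES Employee(EmployeeID)  -- foreign key
)""")

conn.execute("INSERT INTO Employee VALUES (1, 'Ajay', '123-45-6789', 10)")
conn.execute("INSERT INTO Project VALUES ('DB Migration', 6, 1)")   # valid reference
conn.execute("INSERT INTO Project VALUES ('Unassigned', 2, NULL)")  # a null FK is allowed
try:
    conn.execute("INSERT INTO Project VALUES ('Orphan', 3, 99)")    # 99 not in Employee
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # referential integrity enforced
```
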

23. What do you mean by a key to the relation? Explain the differences between super key,
candidate key and primary key.
Answer
Key: Refer Q.22
Difference between super key, candidate key and primary key:

S. No. 1:
- Super key: An attribute (or set of attributes) that is used to uniquely identify all attributes in a relation.
- Candidate key: A minimal super key.
- Primary key: A minimal set of attributes that uniquely identifies rows in a relation.

S. No. 2:
- Super key: All super keys cannot be candidate keys.
- Candidate key: All candidate keys are super keys, but not all are the primary key.
- Primary key: The primary key is a subset of the candidate keys and super keys.

S. No. 3:
- Super key: It can be null.
- Candidate key: It can be null.
- Primary key: It cannot be null.

S. No. 4:
- Super key: A relation can have any number of super keys.
- Candidate key: The number of candidate keys is at most the number of super keys.
- Primary key: A relation has only one primary key, chosen from its candidate keys.

S. No. 5:
- Super key: For example, in Fig. 1.23.1 the super keys are (Registration), (Vehicle_id), (Registration, Vehicle_id), (Registration, Vehicle_id, Make), etc.
- Candidate key: For example, in Fig. 1.23.1 the candidate keys are (Registration) and (Vehicle_id).
- Primary key: For example, in Fig. 1.23.1 the primary key is (Registration).
Fig. An entity CAR for defining keys.
24. Explain generalization, specialization and aggregation.
OR
Compare generalization, specialization and aggregation with suitable examples.
Answer:
Generalization:
a. Generalization is a process in which two or more lower-level entities combine to form a higher-level entity.
b. It is bottom-up approach.
c. Generalization is used to emphasize the similarities among lower-level entity sets and to hide the
differences.
For example:

Specialization:
a. Specialization is a process of breaking higher-level entity into lower level entity.
b. It is top-down approach.
c. It is opposite to generalization.
Aggregation:
a. Aggregation is an abstraction through which relationships are treated as higher level entities.
For example:
1. The relationship works_on (relating the entity sets employee, branch and job) acts as a higher-level entity
set.
2. We can then create a binary relationship ‘Manages’ between works_on and manager to represent who
manages what tasks.

Fig. 1.24.3. ER diagram with aggregation.


Comparison:
S. No. 1:
- Generalization: The common attributes of two or more lower-level entities combine to form a new higher-level entity.
- Specialization: An entity of a higher level is broken down into two or more entities of a lower level.
- Aggregation: An abstraction through which relationships are treated as higher-level entities.

S. No. 2:
- Generalization is a bottom-up approach.
- Specialization is a top-down approach.
- Aggregation allows us to indicate that a relationship set participates in another relationship set.

S. No. 3:
- Generalization helps in reducing the schema size.
- Specialization increases the size of the schema.
- Aggregation also increases the size of the schema.

S. No. 4:
- Generalization is applied to a group of entities.
- Specialization can be applied to a single entity.
- Aggregation is applied to a group of relationships.

25. Explain the reduction of ER schema to tables.


OR
How to reduce an ER model into table?
Answer:
1. In the ER model, a database is represented using different notations or diagrams, and these notations can
be reduced to a collection of tables.
2. In the database, every entity set or relationship set can be represented in tabular form.
Consider following ER diagram:

Basic rules for converting the ER diagrams into tables are:


1. Convert all the entities in the diagram to tables:
a. All the entities represented in the rectangular box in the ER diagram become independent tables in the
database.
b. In the ER diagram, Student, Course, Lecturer and Subjects forms individual tables.
2. All single-valued attribute becomes a column for the table:
a. All the attributes, whose value at any instance of time is unique, are considered as columns of that table.
b. In the Student entity, Student_Name and Student_ID form the column of Student table. Similarly,
Course_Name and Course_ID form the column of Course table and so on.
3. A key attribute of the entity is the primary key:
a. All the attributes represented in the oval shape and underlined in the ER diagram are considered as key
attribute which act as a primary key of table.
b. In the given ER diagram, Student_ID, Course_ID, Subject_ID, and Lecturer_ID are the key attributes of
the Student, Course, Subjects and Lecturer entities.
4. The multivalued attribute is represented by a separate table:
a. In the student table, a hobby is a multivalued attribute.
b. So it is not possible to represent multiple values in a single column of Student table. Hence we create a
table Stud_Hobby with column name Student_ID and Hobby. Using both the column, we create a
composite key.
5. Composite attributes are merged into same table as different columns:
a. In the given ER diagram, student address is a composite attribute. It contains City, Pin, Door_No, Street,
and State.
b. In the Student table, these attributes can merge as an individual column.
6. Derived attributes are not considered in the table:
a. In the Student table, Age is the derived attribute.
b. It can be calculated at any point of time by calculating the difference between current date and Date of
Birth (DoB).
Table structure for given ER diagram is :
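The rules above can be sketched as DDL. The table and column names below are taken from the answer; attributes not listed there (such as DoB) are assumed for illustration. The sketch uses Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rules 1-3, 5, 6: entity -> table, single-valued attributes -> columns,
# key attribute -> primary key, composite Address split into columns,
# derived attribute Age omitted (DoB stored instead, assumed here).
conn.execute("""CREATE TABLE Student (
    Student_ID   INTEGER PRIMARY KEY,
    Student_Name TEXT,
    DoB          TEXT,
    Door_No TEXT, Street TEXT, City TEXT, State TEXT, Pin TEXT)""")
# Rule 4: multivalued attribute Hobby -> separate table with a composite key.
conn.execute("""CREATE TABLE Stud_Hobby (
    Student_ID INTEGER REFERENCES Student(Student_ID),
    Hobby      TEXT,
    PRIMARY KEY (Student_ID, Hobby))""")

conn.execute("INSERT INTO Student (Student_ID, Student_Name) VALUES (1, 'Asha')")
conn.executemany("INSERT INTO Stud_Hobby VALUES (?, ?)",
                 [(1, 'chess'), (1, 'music')])  # multiple hobbies per student
hobbies = conn.execute(
    "SELECT Hobby FROM Stud_Hobby WHERE Student_ID = 1").fetchall()
print(sorted(hobbies))  # [('chess',), ('music',)]
```
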

26. Discuss extended ER (EER) model.


Answer:
1. The ER model that is supported with the additional semantic concepts is called the extended entity
relationship model or EER model.
2. The EER model includes all the concepts of the original ER model together with the following additional
concepts:
a. Specialization: Refer Q. 1.24
b. Generalization: Refer Q. 1.24
c. Aggregation: Refer Q. 1.24
3. The superclass/subclass entity types (or supertype/subtype entities) are one of the most important
modelling constructs included in the EER model.
4. This feature enables us to model a general entity and then subdivide it into several specialized entity
types (subclasses or subtypes).
5. EER diagrams are used to capture business rules such as constraints in the super type/subtype relations.
Thus, a superclass is an entity type that includes distinct subclasses that need to be represented in a data
model.
6. A subclass is an entity type that has a distinct role and is also a member of a super class.

Fig. 1.26.1. Basic notation of the superclass/subclass relationship.

27. What is Unified Modeling Language? Explain different types of UML.


Answer:
1. Unified Modeling Language (UML) is a standardized modeling language enabling developers to specify,
visualize, construct and document artifacts of a software system.
2. UML makes these artifacts scalable, secure and robust in execution.
3. UML is an important aspect involved in object-oriented software development.
4. It uses graphic notation to create visual models of software systems.
Types of UML:
1. Activity diagram:
a. It is generally used to describe the flow of different activities and actions.
b. These can be both sequential and in parallel.
c. They describe the objects used, consumed or produced by an activity and the relationship between the
different activities.
2. Use case diagram:
a. Case diagrams are used to analyze the system’s high-level requirements.
b. These requirements are expressed through different use cases.
3. Interaction overview diagram:
a. The interaction overview diagram is an activity diagram made of different interaction diagrams.
4. Timing diagram:
a. Timing UML diagrams are used to represent the relations of objects when the center of attention rests on
time.
b. Each individual participant is represented through a lifeline, which is essentially a line forming steps
as the individual participant transitions from one stage to another.
c. The main components of a timing UML diagram are:
i. Lifeline
ii. State timeline
iii. Duration constraint
iv. Time constraint
v. Destruction occurrence
5. Sequence UML diagram:
a. Sequence diagrams describe the sequence of messages and interactions that happen between actors and
objects.
b. Actors or objects can be active only when needed or when another object wants to communicate with
them.
c. All communication is represented in a chronological manner.
6. Class diagram:
a. Class diagrams contain classes, alongside with their attributes (also referred to as data fields) and their
behaviours (also referred to as member functions).
b. More specifically, each class has three fields: the class name at the top, the class attributes right below the
name, the class operations/ behaviours at the bottom.
c. The relation between different classes (represented by a connecting line), makes up a class diagram.

UNIT-2
Relational Data Model and Language
LONG ANSWER TYPE QUESTIONS
1. What is relational model? Explain with example.
Answer:
1. A relational model is a collection of conceptual tools for describing data, data relationships, data
semantics and consistency constraints.
2. It is the primary data model for commercial data processing applications.
3. The relational model uses collection of tables to represent both data and the relationships among those
data.
4. Each table has multiple columns and each column has a unique name.
For example:
1. The tables represent a simple relational database.
2. The Table(1) shows details of bank customers, Table (2) shows accounts and Table (3) shows which
accounts belong to which customer.
Table (1): Customer table
cust_id c_name c_city

C_101 Ajay Delhi


C_102 Amit Mumbai
C_103 Alok Kolkata
C_104 Akash Chennai

Table (2): Account table


acc_no balance
A-1 1000
A-2 2000
A-3 3000
A-4 4000

Table (3): Depositor table


Cust_id Acc_no
C_101 A-1
C_102 A-2
C_103 A-3
C_104 A-4

3. The Table (1), i.e., the customer table, shows that the customer identified by cust_id C_101 is named Ajay
and lives in Delhi.
4. The Table (2), i.e., the account table, shows that account A-1 has a balance of Rs. 1000.
5. The Table (3), i.e., depositor table, shows that account number (acc_no) A-1 belongs to the cust whose
cust_id is C_101 and account number (acc_no) A-2 belongs to the cust whose cust_id is C_102 and
likewise.
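The three tables above can be created and queried with standard SQL. The sketch below uses Python's built-in sqlite3 module with a subset of the example data, and joins the tables to recover which customer owns which balance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (cust_id TEXT PRIMARY KEY, c_name TEXT, c_city TEXT);
CREATE TABLE account   (acc_no  TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE depositor (cust_id TEXT REFERENCES customer,
                        acc_no  TEXT REFERENCES account);
INSERT INTO customer  VALUES ('C_101', 'Ajay', 'Delhi'), ('C_102', 'Amit', 'Mumbai');
INSERT INTO account   VALUES ('A-1', 1000), ('A-2', 2000);
INSERT INTO depositor VALUES ('C_101', 'A-1'), ('C_102', 'A-2');
""")
# The depositor table links customers to accounts, as in Table (3).
rows = conn.execute("""SELECT c.c_name, a.balance
                         FROM customer c
                         JOIN depositor d ON c.cust_id = d.cust_id
                         JOIN account   a ON d.acc_no  = a.acc_no""").fetchall()
print(sorted(rows))  # [('Ajay', 1000), ('Amit', 2000)]
```
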

2. Explain constraints and its types.


Answer:
1. A constraint is a rule that is used for optimization purposes.
2. Constraints enforce limits to the data or type of data that can be inserted/ updated/deleted from a table.
3. The whole purpose of constraints is to maintain the data integrity during an update/delete/insert into a
table.
Types of constraints:
1. NOT NULL:
i. NOT NULL constraint makes sure that a column does not hold NULL value.
ii. When we do not provide value for a particular column while inserting a record into a table, it takes NULL
value by default.
iii. By specifying NULL constraint, we make sure that a particular column cannot have NULL values.
2. UNIQUE:
i. UNIQUE constraint enforces a column or set of columns to have unique values.
ii. If a column has a unique constraint, it means that particular column cannot have duplicate values in a
table.
3. DEFAULT:
i. The DEFAULT constraint provides a default value to a column when there is no value provided while
inserting a record into a table.
4. CHECK:
i. This constraint is used for specifying range of values for a particular column of a table.
ii. When this constraint is being set on a column, it ensures that the specified column must have the value
falling in the specified range.
5. Key constraints:
i. Primary key:
a. Primary key uniquely identifies each record in a table.
b. It must have unique values and cannot contain null.
ii. Foreign key:
a. Foreign keys are the columns of a table that points to the primary key of another table.
b. They act as a cross-reference between tables.
6. Domain constraints:
i. Each table has certain set of columns and each column allows a same type of data, based on its data type.
ii. The column does not accept values of any other data type.
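The constraint types above can be demonstrated in one table definition. This is a sketch using Python's built-in sqlite3 module; the Student columns and sample values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student (
    SID   INTEGER PRIMARY KEY,                      -- key constraint
    Name  TEXT NOT NULL,                            -- NOT NULL constraint
    Email TEXT UNIQUE,                              -- UNIQUE constraint
    Sem   TEXT DEFAULT '1st',                       -- DEFAULT constraint
    Age   INTEGER CHECK (Age BETWEEN 16 AND 60)     -- CHECK constraint (range)
)""")

# Sem is omitted, so the DEFAULT value is stored.
conn.execute("INSERT INTO Student (SID, Name, Email, Age) VALUES (1, 'Ankit', 'a@x.in', 19)")
print(conn.execute("SELECT Sem FROM Student WHERE SID = 1").fetchone())  # ('1st',)

# Age = 200 falls outside the CHECK range, so the insert is rejected.
try:
    conn.execute("INSERT INTO Student (SID, Name, Age) VALUES (2, 'Srishti', 200)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```
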

3. Explain integrity constraints.


Answer:
1. Integrity constraints provide a way of ensuring that changes made to the database by authorized users do
not result in a loss of data consistency.
2. A form of integrity constraint with ER models is:
a. key declarations: certain attributes form a candidate key for the entity set.
b. form of a relationship: mapping cardinalities 1-1, 1-many and many-many.
3. An integrity constraint can be any arbitrary predicate applied to the database.
4. Integrity constraints are used to ensure accuracy and consistency of data in a relational database.

4. Explain the following constraints:


i. Entity integrity constraint
ii. Referential integrity constraint
iii. Domain constraint
Answer
i. Entity integrity constraint:
a. This rule states that no attribute of primary key will contain a null value.
b. If a relation has a null value in the primary key attribute, then uniqueness property of the primary key
cannot be maintained.
Example: In Table (1), SID is the primary key, and a primary key cannot be null.
Table (1)
SID Name Class (semester) Age
8001 Ankit 1st 19
8002 Srishti 2nd 18
8003 Somveer 4th 22
(null) Saurabh 6th 19
The last row violates the entity integrity constraint because its SID is null.

ii. Referential integrity constraint:


a. This rule states that if a foreign key in Table (2) refers to the primary key of Table (3), then every value of
the foreign key in Table (2) must be null or be available in Table (3).
Table (2) (DNO is the foreign key)
ENO NAME Age DNO
1 Ankit 19 10
2 Srishti 18 11
3 Somveer 22 14
4 Saurabh 19 (null)

Table (3)
DNO D.Location
10 Rohtak
11 Bhiwani
13 Hansi

Here Saurabh's null DNO is permitted, but Somveer's DNO (14) does not appear in Table (3), so that row violates referential integrity.
iii. Domain constraints:
a. Domain constraints specify that what set of values an attribute can take, value of each attribute X must be
an atomic value from the domain of X.
b. The data type associated with domains includes integer, character, string, date, time, currency etc. An
attribute value must be available in the corresponding domain.
Example:
SID Name Class (semester) Age
8001 Ankit 1st 19
8002 Srishti 1st 18
8003 Somveer 4th 22
8004 Saurabh 6th A
A is not allowed here because Age is an integer attribute.

5. What is relational algebra? Discuss its basic operations.


Answer:
1. The relational algebra is a procedural query language.
2. It consists of a set of operations that take one or two relations as input and produces a new relation as a
result.
3. The operations in the relational algebra are select, project, union, set difference, cartesian product and
rename.
Basic relational algebra operations are as follows:
1. Select operation:
a. The select operation selects tuples that satisfy a given predicate.
b. The select operation is denoted by sigma (σ).
c. The predicate appears as a subscript to σ.
d. The argument relation is given in parentheses after the σ.
2. Project operation:
a. The project operation is a unary operation that returns its argument relation with certain attributes left
out.
b. In project operation duplicate rows are eliminated.
c. Projection is denoted by pi (Π).
3. Set difference operation:
a. The set difference operation denoted by (−) allows us to find tuples that are in one relation but are not in
another.
b. The expression r − s produces a relation containing those tuples in r but not in s.
4. Cartesian product operation:
a. The cartesian product operation, denoted by a cross (×), allows us to combine information from any two
relations. The cartesian product of relations r1 and r2 is written as r1 × r2.
5. Rename operation:
a. The rename operator is denoted by rho (ρ).
b. Given a relational algebra expression E,
ρ x(E)
returns the result of expression E under the name x.
c. The rename operation can be used to rename a relation r to get the same relation under a new name.
d. The rename operation can be used to obtain a new relation with new names given to the original
attributes of original relation as
ρ x(A1, A2, ......, An)(E)
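The basic operators are easy to mimic over relations modeled as Python sets of tuples. The sketch below is illustrative only (the toy student relation and attribute positions are invented), but each function is a faithful miniature of the corresponding operator:

```python
# Minimal sketches of select, project, set difference and cartesian product
# over relations modeled as sets of tuples (attribute order fixed per relation).

def select(pred, r):                # sigma_pred(r): keep tuples satisfying the predicate
    return {t for t in r if pred(t)}

def project(idxs, r):               # pi: keep listed attribute positions; duplicates collapse
    return {tuple(t[i] for i in idxs) for t in r}

def difference(r, s):               # r - s: tuples in r but not in s
    return r - s

def product(r, s):                  # r x s: concatenate every pair of tuples
    return {t1 + t2 for t1 in r for t2 in s}

# Toy relation: student(id, name, major)
student = {(1, 'Ajay', 'CS'), (2, 'Amit', 'EE'), (3, 'Alok', 'CS')}
cs_names = project((1,), select(lambda t: t[2] == 'CS', student))
print(cs_names)  # names of CS students: {('Ajay',), ('Alok',)} in some order
```
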

6. Consider the following relations:


Student (ssn, name, address, major)
Course (code, title)
Registered (ssn, code)
Use relational algebra to answer the following:
a. List the codes of courses in which at least one student is registered (registered courses).
b. List the title of registered courses.
c. List the codes of courses for which no student is registered.
d. The titles of courses for which no student is registered.
e. Name of students and the titles of courses they registered to.
f. SSNs of students who are registered for both database systems and analysis of algorithms.
g. SSNs of students who are registered for both database systems and analysis of algorithms.
h. The name of students who are registered for both database systems and analysis of
algorithms.
i. List of courses in which all students are registered.
j. List of courses in which all ‘ECMP’ major students are registered.
Answer:
a. πcode (Registered)
b. πtitle (Course ⋈ Registered)
c. πcode (Course) – πcode (Registered)
d. πtitle ((πcode (Course) – πcode (Registered)) ⋈ Course)
e. πname, title (Student ⋈ Registered ⋈ Course)
f & g. πssn (Student ⋈ Registered ⋈ (σtitle = ‘Database Systems’ Course)) ∩
πssn (Student ⋈ Registered ⋈ (σtitle = ‘Analysis of Algorithms’ Course))
h. A = πssn (Student ⋈ Registered ⋈ (σtitle = ‘Database Systems’ Course)) ∩
πssn (Student ⋈ Registered ⋈ (σtitle = ‘Analysis of Algorithms’ Course)),
then πname (A ⋈ Student)
i. πcode, ssn (Registered) ÷ πssn (Student)
j. πcode, ssn (Registered) ÷ πssn (σmajor = ‘ECMP’ Student)

7. What are the additional operations in relational algebra?


Answer:
The additional operations of relational algebra are:
1. Set intersection operation:
a. Set intersection is denoted by ∩, and returns a relation that contains tuples that are in both of its
argument relations. The set intersection operation is written as:
r ∩ s = r – (r – s)
2. Natural join operation:
a. The natural join is a binary operation that allows us to combine certain selections and a cartesian product
into one operation. It is denoted by the join symbol ⋈.
b. The natural join operation forms a cartesian product of its two arguments, performs a selection forcing
equality on those attributes that appear in both relation schemas and finally removes duplicate attributes.
3. Division operation:
1. The division operation is denoted by the symbol ÷.
2. The relation r ÷ s is a relation on schema R – S. A tuple t is in r ÷ s if and only if both of two conditions
hold:
a. t is in ΠR–S (r).
b. For every tuple ts in s, there is a tuple tr in r satisfying both of the following:
i. tr[S] = ts[S]
ii. tr[R – S] = t
3. The division operation can be written in terms of fundamental operation as follows:
r ÷ s = ΠR–S (r) – ΠR–S ((ΠR–S (r) × s) – ΠR–S, S (r))
4. Assignment operation: The assignment operation, denoted by ←, works like assignment in a
programming language.
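Continuing the set-of-tuples model, the additional operators can be sketched as below. Attribute positions for the join and division are passed explicitly, and the sample relations are invented for illustration:

```python
# Sketches of intersection, natural join and division over sets of tuples.

def intersection(r, s):             # r ∩ s = r - (r - s)
    return r - (r - s)

def natural_join(r, s, ri, si):     # join where r[ri] == s[si]; drop the duplicate column
    return {t1 + tuple(v for j, v in enumerate(t2) if j != si)
            for t1 in r for t2 in s if t1[ri] == t2[si]}

def divide(r, s):                   # r(x, y) ÷ s(y): x values paired with EVERY y in s
    ys = {t[0] for t in s}
    xs = {t[0] for t in r}
    return {x for x in xs if ys <= {y for (x2, y) in r if x2 == x}}

registered = {('s1', 'c1'), ('s1', 'c2'), ('s2', 'c1')}
courses = {('c1',), ('c2',)}
print(divide(registered, courses))  # students registered in ALL courses: {'s1'}
```

Division is the natural operator for "registered for all courses" style queries, which is exactly how it is used in the worked answers above.
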

8. Give the following queries in the relational algebra using the relational schema:
Student (id, name)
Enrolled (id, code)
Subject (code, lecturer)
i. What are the names of students enrolled in cs3020?
ii. Which subjects is Hector taking?
iii. Who teaches cs1500?
iv. Who teaches cs1500 or cs3020?
v. Who teaches at least two different subjects?
vi. What are the names of students in cs1500 or cs307?
vii. What are the names of students in both cs 1500 and cs1200?
Answer:
i. πname (σcode = cs3020 (Student ⋈ Enrolled))
ii. πcode (σname = Hector (Student ⋈ Enrolled))
iii. πlecturer (σcode = cs1500 (Subject))
iv. πlecturer (σcode = cs1500 ∨ code = cs3020 (Subject))
v. For this query we have to compare the Subject relation with itself. To disambiguate, rename Subject as
R and S, then: πR.lecturer (σR.lecturer = S.lecturer ∧ R.code ≠ S.code (R × S))
vi. πname (σcode = cs1500 (Student ⋈ Enrolled)) ∪ πname (σcode = cs307 (Student ⋈ Enrolled))
vii. πname (σcode = cs1500 (Student ⋈ Enrolled)) ∩ πname (σcode = cs1200 (Student ⋈ Enrolled))

9. What is relational calculus? Describe its important characteristics. Explain tuple and
domain calculus.
OR
What is tuple relational calculus and domain relational calculus?
Answer:
1. Relational calculus is a non-procedural query language.
2. Relational calculus is a query system where queries are expressed as formulas consisting of a number of
variables and an expression involving these variables.
3. In a relational calculus, there is no description of how to evaluate a query.
Important characteristics of relational calculus:
1. The relational calculus is used to measure the selective power of relational languages.
2. Relational calculus is based on predicate calculus.
3. In relational calculus, user is not concerned with the procedure to obtain the results.
4. In relational calculus, output is available without knowing the method about its retrieval.
Tuple Relational Calculus (TRC):
1. The TRC is a non-procedural query language.
2. It describes the desired information without giving a specific procedure for obtaining that information.
3. A query in TRC is expressed as:
{t | P(t)}
That is, it is the set of all tuples t such that predicate P is true for t. The notation t[A] is used to denote the
value of tuple t on attribute A and t ∈ r is used to denote that tuple t is in relation r.
4. A tuple variable is said to be a free variable unless it is quantified by a ∃ or ∀.
5. Formulae are built using the atoms and the following rules:
a. An atom is a formula.
b. If P1 is a formula, then so are ¬ P1 and (P1).
c. If P2 and P1 are formulae, then so are P1∨ P2, P1∧ P2 and P1 ⇒ P2.
d. If P1(s) is a formula containing a free tuple variable s, and r is a relation, then ∃ s ∈ r (P1(s)) and ∀ s ∈ r
(P1(s)) are also formulae.
Domain Relational Calculus (DRC):
1. DRC uses domain variables that take on values from an attribute domain, rather than values for an entire
tuple.
2. An expression in the DRC is of the form:
{<x1, x2, ……., xn> | P(x1, x2, ………, xn)}
where x1, x2, ………, xn represent domain variables and P represents a formula composed of atoms.
3. An atom in DRC has one of the following forms:
a. <x1, x2, …….., xn> ∈ r, where r is a relation on n attributes and x1, x2, ……., xn are domain variables or
domain constants.
b. x θ y, where x and y are domain variable and θ is a comparison operator (< , ≤, =, ≠, >, ≥). The attributes
x and y must have the domain that can be compared.
c. x θ c, where x is a domain variable, θ is a comparison operator and c is a constant in the domain of the
attribute for which x is a domain variable.
4. Following are the rules to build up the formula:
a. An atom is a formula.
b. If P1 is a formula then so is ¬P1.
c. If P1 and P2 are formula, then so are P1∨ P2, P1∧ P2 and P1 ⇒ P2.
d. If P1(x) is a formula in x, where x is a domain variable, then ∃ x (P1(x)) and ∀ x (P1(x)) are also formulae.
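As an informal analogy (not part of the formal calculus), a TRC expression {t | P(t)} reads much like a comprehension: a variable ranges over tuples and a predicate filters them, with no procedure specified for evaluation order. The sample relation below is invented for illustration:

```python
# TRC-style: one variable t ranges over whole tuples.
student = [{'ssn': 1, 'name': 'Ajay', 'age': 19},
           {'ssn': 2, 'name': 'Amit', 'age': 17}]
adults = [t for t in student if t['age'] > 18]   # {t | t ∈ student ∧ t[age] > 18}

# DRC-style: one variable per attribute (domain variables s, n, a);
# s and a are existentially bound, only n is returned.
names = [n for (s, n, a) in ((t['ssn'], t['name'], t['age']) for t in student)
         if a > 18]                              # {<n> | ∃s ∃a (<s, n, a> ∈ student ∧ a > 18)}
print(adults, names)
```
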

10. Write short note on SQL. Explain various characteristics of SQL.


Answer:
1. SQL stands for Structured Query Language.
2. It is a non-procedural language that can be used for retrieval and management of data stored in
relational database.
3. It can be used for defining the structure of data, modifying data in the database and specifying the
security constraints.
4. The two major categories of SQL commands are:
a. Data Definition Language (DDL): DDL provides commands that can be used to create, modify and
delete database objects.
b. Data Manipulation Language (DML): DML provides commands that can be used to access and
manipulate the data, that is, to retrieve, insert, delete and update data in a database.
Characteristics of SQL:
1. SQL usage is extremely flexible.
2. It uses a free form syntax that gives the user the ability to structure SQL statements in a way best suited
to them.
3. Each SQL request is parsed by the RDBMS before execution, to check for proper syntax and to optimize
the request.
4. Unlike certain programming languages, there is no need to start SQL statements in a particular column
or be finished in a single line. The same SQL request can be written in a variety of ways.

11. What are the advantages and disadvantages of SQL?


Answer:
Advantages of SQL:
1. Faster query processing: Large amount of data is retrieved quickly and efficiently. Operations like
insertion, deletion, manipulation of data is done in almost no time.
2. No coding skills: For data retrieval, large number of lines of code is not required. All basic keywords
such as SELECT, INSERT INTO, UPDATE, etc are used and also the syntactical rules are not complex in
SQL, which makes it a user-friendly language.
3. Standardised language: Due to documentation it provides a uniform platform worldwide to all its
users.
4. Portable: It can be used in programs in PCs, server, laptops independent of any platform (Operating
System, etc). Also, it can be embedded with other applications as per need/requirement/use.
5. Interactive language: Easy to learn and understand, answers to complex queries can be received in
seconds.
Disadvantages of SQL:
1. Complex interface: SQL has a difficult interface that makes few users uncomfortable while dealing
with the database.
2. Cost: Some versions are costly and hence, programmers cannot access it.
3. Partial control: Due to hidden business rules, complete control is not given to the database.

12. What are the different datatypes used in SQL?


Answer:
SQL supports following datatypes:
1. char (n): A fixed length character string with user specified maximum length n.
2. varchar (n): A variable length character string with user specified maximum length n.
3. int: An integer which is a finite subset of the integers that is machine dependent.
4. small int: A small integer is a machine-dependent subset of the integer domain type.
5. numeric (p, d): A fixed point number with user defined precision. It consists of p digits and d of the p
digits are to the right of the decimal point.
6. real or double precision: Floating point and double precision floating point numbers with machine
dependent precision.
7. float (n): A floating point number with precision of at least n digits.
8. date: A calendar date containing a year (four digit), month (two digit) and day (two digit) of the month.
9. time: The time of the day in hours, minutes and seconds.

13. What are the types of literal used in SQL?


Answer:
The four kinds of literal values supported in SQL are:
1. Character string:
a. Character strings are written as a sequence of characters enclosed in single quotes.
b. The single quote character is represented within a character string by two single quotes. For example,
‘Computer Engg’, ‘Structured Query Language’
2. Bit string:
a. A bit string is written either as a sequence of 0s and 1s enclosed in single quotes and preceded by the
letter ‘B’ or as a sequence of hexadecimal digits enclosed in single quotes and preceded by the letter ‘X’.
b. For example, B’1011011’, B’1’, B’0’, X’A5’
3. Exact numeric:
a. These literals are written as a signed or unsigned decimal number possibly with a decimal point.
b. For example, 9, 90, 90.00, 0.9, + 99.99, – 99.99.
4. Approximate numeric:
a. Approximate numeric literals are written as exact numeric literals followed by the letter ‘E’, followed by a
signed or unsigned integer.
b. For example, 5E5, 55.5E5, +55E–5, –5.55E–9.

14. What are the different types of SQL commands?


Answer:
Different types of SQL commands are:
1. Insert:
a. This command is used to insert tuples in a table.
b. This command adds a single tuple at a time in a table. Syntax:
Insert into table_name (attribute1, ..., attributen) values (values_list);
2. Update:
a. This command is used to make changes in the values of attributes of the table.
b. It uses the SET and WHERE clauses.
Syntax:
Update table_name set attribute_name = new_value where condition;
3. Delete:
a. This command is used to remove tuples.
b. Tuples can be deleted from only one table at a time.
Syntax:
Delete from table_name where condition;
4. Select: This command is used to retrieve a subset of tuples from one or more tables.
Syntax:
Select attribute1, ..., attributen from table_name where condition;
5. Alter table:
a. This command is used to make changes in the structure of a table.
b. This command is used:
i. to add an attribute
ii. to drop an attribute
iii. to rename an attribute
iv. to add and drop a constraint
Syntax:
Alter table table_name add column_name datatype;
Alter table table_name drop column column_name;
Alter table table_name drop constraint constraint_name;
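The four data-manipulation commands above can be exercised end to end. The following sketch uses Python's built-in sqlite3 module; the student table and its rows are illustrative, not taken from the text:

```python
import sqlite3

# In-memory database; the "student" table and its rows are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, branch TEXT)")

# Insert: adds a single tuple at a time.
cur.execute("INSERT INTO student (roll_no, name, branch) VALUES (1, 'Abha', 'ECE')")
cur.execute("INSERT INTO student (roll_no, name, branch) VALUES (2, 'Amit', 'IT')")

# Update: changes attribute values of tuples matching the WHERE condition.
cur.execute("UPDATE student SET branch = 'CSE' WHERE roll_no = 2")

# Delete: removes tuples matching the WHERE condition.
cur.execute("DELETE FROM student WHERE roll_no = 1")

# Select: retrieves the remaining tuples.
rows = cur.execute("SELECT roll_no, name, branch FROM student").fetchall()
print(rows)  # [(2, 'Amit', 'CSE')]
```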

15. Write a short note on SQL DDL commands.


Answer:
a. SQL DDL is used to define the relations of a system. The general syntax of an SQL sentence is:
VERB (parameter1, parameter2, ......, parametern)
b. The relations are created using CREATE verb.
1. CREATE TABLE: This command is used to create a new relation and the corresponding syntax is:
CREATE TABLE relation_name
(field1 datatype (size), field2 datatype (size),..., fieldn datatype (size));
2. CREATE TABLE ... AS SELECT ...: This type of create command is used to create the structure of a
new table from the structure of existing table.
The generalized syntax of this form is :
CREATE TABLE relation_name1
(field1, field2, ...., fieldn)
AS SELECT field1, field2, ..., fieldn
FROM relation_name2;
c. Structure of relations are changed using ALTER verb.
1. ALTER TABLE ... ADD ...: This is used to add some extra columns into an existing table. The
generalized format is :
ALTER TABLE relation_name
ADD (new field1 datatype (size),
new field2 datatype (size), ......,
new fieldn datatype (size)) ;
2. ALTER TABLE … MODIFY …: This form is used to change the width as well as data type of existing
relations. The generalized syntax is :
ALTER TABLE relation_name
MODIFY (field1 new data type (size),
field2 new data type (size),
----------------
fieldn new data type (size)) ;
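The CREATE and ALTER forms above can be tried with SQLite through Python's sqlite3 module. This is a sketch with an illustrative book table; note that SQLite supports ALTER TABLE ... ADD (one column per statement) but has no MODIFY form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: define a new relation.
cur.execute("CREATE TABLE book (isbn TEXT, title TEXT, price NUMERIC)")
cur.execute("INSERT INTO book VALUES ('111', 'DBMS', 450)")

# CREATE TABLE ... AS SELECT: build a new table from an existing one.
cur.execute("CREATE TABLE costly_book AS "
            "SELECT isbn, title FROM book WHERE price > 400")

# ALTER TABLE ... ADD: add an extra column to the existing table.
cur.execute("ALTER TABLE book ADD COLUMN author TEXT")

cols = [c[1] for c in cur.execute("PRAGMA table_info(book)")]
print(cols)                                                 # ['isbn', 'title', 'price', 'author']
print(cur.execute("SELECT * FROM costly_book").fetchall())  # [('111', 'DBMS')]
```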

16. Draw an ER diagram of Hospital or Bank with showing the specialization, Aggregation,
Generalization. Also convert it in to relational schemas and SQL DDL.
Answer:
Relational schemas:
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city, customer-id)
account (account-number, balance)
loan (loan-number, amount)
employee (employee-id, employee-name, telephone-number, start-date, employment-length, dependent-name)
payment (payment-number, payment-amount, payment-date)
saving-account (interest-rate)
checking-account (overdraft-amount)
Fig. 2.16.1. ER diagram for a banking enterprise.
SQL DDL of ER diagram:
create table branch (branch-city varchar (40),
branch-name varchar (40) primary key,
assets number (20));
create table customer (customer-id number (5) primary key,
customer-name varchar (40),
customer-street varchar (20),
customer-city varchar (30));
create table loan (loan-number number (6) primary key,
amount number (10));
create table employee (employee-id number (5) primary key,
employee-name varchar (40),
telephone-number number (10),
start-date date,
employment-length number (4),
dependent-name varchar (10));
create table payment (payment-number number (6),
payment-amount number (10),
payment-date date);
create table account (account-number number (12) primary key,
balance number (10));
create table saving-account (interest-rate number (3));
create table checking-account (overdraft-amount number (15));

17. Describe the operators and its types in SQL.


Answer:
Operators and conditions are used to perform operations such as addition, subtraction or comparison on
the data items in an SQL statement.
Different types of SQL operators are:
1. Arithmetic operators: Arithmetic operators are used in SQL expressions to add, subtract, multiply,
divide and negate data values. The result of this expression is a number value.
Unary operators (U)
+, - Denotes a positive or negative expression
Binary operators (B)
* Multiplication
/ Division
+ Addition
- Subtraction

2. Comparison operators: These are used to compare one expression with another. The comparison
operators are given below:
Operator Definition
= Equality
!=, <> Inequality
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to

3. Logical operators: A logical operator is used to produce a single result from combining the two
separate conditions.

Operator Definition
AND Returns true if both component conditions are true; otherwise returns false.
OR Returns true if either component condition is true; otherwise returns false.
NOT Returns true if the condition is false; otherwise returns false.
4. Set operators: Set operators combine the results of two separate queries into a single result.
Operator Definition
UNION Returns all distinct rows from both queries
INTERSECT Returns common rows selected by both queries

MINUS Returns all distinct rows that are in the first query, but not in second one.

5. Operator precedence:
a. Precedence defines the order that the DBMS uses when evaluating the different operators in the same expression.
b. The DBMS evaluates operators with the highest precedence first, before evaluating operators of lower precedence. From highest to lowest:
Operator Definition
() Overrides the normal operator precedence
+, - Unary operators
*, / Multiplication and division
+, - Addition and subtraction
|| Character concatenation
NOT Reverses the result of an expression
AND True if both conditions are true
OR True if either condition is true
UNION Returns all data from both queries
INTERSECT Returns only rows that match both queries
MINUS Returns only rows that do not match both queries
c. Other symbols that appear in SQL statements include : (prefix for a host variable), , (variable separator), ( ) (surrounds subqueries), ‘ (surrounds a literal) and “ ” (surrounds a table or column alias or literal text).
18. What are the relational algebra operations supported in SQL? Write the SQL statement for each operation.
Answer:
Basic relational algebra operations: Refer Q. 2.5
SQL statement for relational algebra operations:
1. Select operation: Consider the loan relation,
loan (loan_number, branch_name, amount)
Find all the tuples in which the amount is more than Rs. 12000, then we write
σ amount > 12000 (loan)
2. Project operation: We write the query to list all the customer names and their cities as :
Π customer_name, customer_city (customer)
3. Set difference operation: We can find all customers of the bank who have an account but not a loan
by writing:
Π customer_name (depositor) – Πcustomer_name (borrower)
4. Cartesian product: We have the following two tables:

PERSONNEL
Id Name
101 Jai
103 Suraj
104 XX
105 BB
106 CC

SOFTWARE PACKAGES
S
J1
J2

We want to evaluate the × operation [PERSONNEL × SOFTWARE PACKAGES]:

Id Name S
101 Jai J1
101 Jai J2
103 Suraj J1
103 Suraj J2
104 XX J1
104 XX J2
105 BB J1
105 BB J2
106 CC J1
106 CC J2

5. Rename:
Consider the Book relation with attributes Title, Author, Year and Price. The rename operator is used on
Book relation as follows:
ρ Temp(Bname, Aname, Pyear, Bprice) (Book)
Here both the relation name and the attribute names are renamed.

19. Give the brief explanation of view.


Answer:
1. A view is a virtual relation, whose contents are derived from already existing relations and it does not
exist in physical form.
2. The contents of view are determined by executing a query based on any relation and it does not form the
part of database schema.
3. Each time a view is referred to, its contents are derived from the relations on which it is based.
4. A view can be used like any other relation that is, it can be queried, inserted into, deleted from and joined
with other relations or views.
5. Views can be based on more than one relation and such views are known as complex views.
6. A view in SQL terminology is a single table that is derived from other tables. These other tables can be
base tables or previously defined views.
Syntax for creating view:
CREATE VIEW view_name
AS SELECT * FROM table_name
WHERE Category IN (‘attribute1’, ‘attribute2’);
For example: Command to create a view consisting of attributes Book_title, Category, Price and P_ID of
the BOOK relation, Pname and State of the PUBLISHER relation can be specified as
CREATE VIEW BOOK_3
AS SELECT BOOK_title, Category, Price, BOOK.P_ID, Pname, State
FROM BOOK, PUBLISHER
WHERE BOOK.P_ID = PUBLISHER.P_ID;
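A simplified version of the view above can be run with SQLite via Python's sqlite3 module. The sketch uses a single illustrative BOOK table rather than the BOOK/PUBLISHER pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE book (title TEXT, category TEXT, price NUMERIC)")
cur.executemany("INSERT INTO book VALUES (?, ?, ?)",
                [("DBMS", "Textbook", 450), ("Novel X", "Fiction", 250)])

# The view stores no data of its own; its rows are derived from "book"
# each time the view is referenced.
cur.execute("CREATE VIEW textbook AS "
            "SELECT title, price FROM book WHERE category IN ('Textbook')")

# A view is queried like any other relation.
rows = cur.execute("SELECT * FROM textbook").fetchall()
print(rows)  # [('DBMS', 450)]
```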

20. Describe indexes in SQL.


Answer:
1. Indexes are special lookup tables that the database search engine can use to speed up data retrieval.
2. An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the
UPDATE and the INSERT statements.
3. Indexes can be created or dropped with no effect on the data.
4. Indexes are used to retrieve data from the database more quickly.
5. The users cannot see the indexes; they are just used to speed up searches/ queries.
6. Syntax:
CREATE INDEX index_name
ON table_name (column);
where index_name is the name given to the index, table_name is the name of the table on which the index is
created, and column is the column to which it applies.
7. Unique indexes are used for the maintenance of the integrity of the data present in the table as well as for
the fast performance; it does not allow multiple values to enter into the table.
Syntax for creating unique index is:
CREATE UNIQUE INDEX index_name
ON table_name (column);
8. To remove an index from the data dictionary by using the DROP INDEX command.
DROP INDEX index_name;
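The CREATE INDEX, CREATE UNIQUE INDEX and DROP INDEX statements can be demonstrated with SQLite; the student table below is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")

# Plain index: speeds up lookups on roll_no; invisible to queries.
cur.execute("CREATE INDEX idx_roll ON student (roll_no)")

# Unique index: additionally rejects duplicate values in the column.
cur.execute("CREATE UNIQUE INDEX idx_name ON student (name)")
cur.execute("INSERT INTO student VALUES (1, 'Abha')")
rejected = False
try:
    cur.execute("INSERT INTO student VALUES (2, 'Abha')")  # duplicate name
except sqlite3.IntegrityError:
    rejected = True

# DROP INDEX removes the index with no effect on the data.
cur.execute("DROP INDEX idx_roll")
count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, count)  # True 1
```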

21. Explain sub-query with example.


Answer:
1. A sub-query is a SQL query nested inside a larger query.
2. Sub-queries must be enclosed within parenthesis.
3. The sub-query can be used with the SELECT, INSERT, UPDATE, or DELETE statement along with the
operators like =, >, <, >=, <=, IN, ANY, ALL, BETWEEN.
4. A sub-query is usually added within the WHERE clause of another SQL SELECT statement.
5. A sub-query is also called an inner query while the statement containing a sub-query is also called an
outer query.
6. The inner query executes first before its parent query so that the result of an inner query can be passed to
the outer query.
Syntax of SQL sub-query:
A sub-query with the IN operator, SELECT column_names
FROM table_name1
WHERE column_name IN (SELECT column_name
FROM table_name2
WHERE condition);
Example:
We have the following two tables ‘student’ and ‘marks’ with common field ‘StudentID’.
Student
StudentID Name
V001 Abha
V002 Abhay
V003 Anand
V004 Amit

Marks
StudentID Total_Marks
V001 95
V002 80
V003 74
V004 81
Now considering table ‘Student’, we want to write a query to identify all students who get more marks than
the student whose StudentID is ‘V002’, but we do not know the marks of ‘V002’.
So, consider another table ‘Marks’ containing total marks of the student and apply query considering both
tables.
SQL code with sub-query:
SELECT a.StudentID, a.Name, b.Total_marks
FROM student a, marks b
WHERE a.StudentID = b.StudentID AND b.Total_marks >
(SELECT Total_marks
FROM marks
WHERE StudentID = ‘V002’);
Query result:
StudentID Name Total_Marks
V001 Abha 95
V004 Amit 81
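The sub-query above can be reproduced with SQLite through Python's sqlite3 module, using the same two tables (an ORDER BY is added so the result order is deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (student_id TEXT, name TEXT)")
cur.execute("CREATE TABLE marks (student_id TEXT, total_marks INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?)",
                [("V001", "Abha"), ("V002", "Abhay"),
                 ("V003", "Anand"), ("V004", "Amit")])
cur.executemany("INSERT INTO marks VALUES (?, ?)",
                [("V001", 95), ("V002", 80), ("V003", 74), ("V004", 81)])

# The inner query runs first and yields V002's marks (80); the outer
# query then keeps only students scoring above that value.
rows = cur.execute("""
    SELECT a.student_id, a.name, b.total_marks
    FROM student a, marks b
    WHERE a.student_id = b.student_id
      AND b.total_marks > (SELECT total_marks FROM marks
                           WHERE student_id = 'V002')
    ORDER BY a.student_id
""").fetchall()
print(rows)  # [('V001', 'Abha', 95), ('V004', 'Amit', 81)]
```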

22. Write full relation operation in SQL. Explain any one of them.
OR
Explain aggregate function in SQL.
Answer:
In SQL, there are many full relation operations like:
i. Eliminating duplicates
ii. Duplicates in union, intersection and difference
iii. Grouping
iv. Aggregate function
Aggregate function:
1. Aggregate functions are functions that take a collection of values as input and return a single value.
2. SQL offers five built-in aggregate functions:

a. Average: avg
Syntax: avg ([Distinct | All] n)
Purpose: Returns average value of n, ignoring null values.
Example:
SQL> select avg(unit_price) as “Average Price” from book;
Output:
Average Price
359.8
b. Minimum: min
Syntax: min ([Distinct | All] expr)
Purpose: Returns minimum value of expression.
Example:
SQL> select min(unit_price) as “Minimum Price” from book;
Output:
Minimum Price
250
c. Maximum: max
Syntax: max ([Distinct | All] expr)
Purpose: Returns maximum value of expression.
Example:
SQL> select max(unit_price) as “Maximum Price” from book;
Output:
Maximum Price
450
d. Sum: sum
Syntax: sum ([Distinct | All] n)
Purpose: Returns sum of values of n.
Example:
SQL> select sum(unit_price) as “Total” from book;
Output:
Total
1799
e. Count: count
Syntax: count ([Distinct | All] expr)
Purpose: Returns the number of rows where expr is not null.
Example:
SQL> select count(title) as “No. of Books” from book;
Output:
No. of Books
5
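The five aggregates can be checked together in one statement with SQLite. The book rows below are invented so that the totals match the sample outputs (average 359.8, minimum 250, maximum 450, sum 1799, count 5):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE book (title TEXT, unit_price INTEGER)")
# Five illustrative prices chosen so the totals match the sample outputs.
cur.executemany("INSERT INTO book VALUES (?, ?)",
                [("A", 450), ("B", 250), ("C", 300), ("D", 400), ("E", 399)])

# Each aggregate collapses the whole column into a single value.
row = cur.execute("SELECT AVG(unit_price), MIN(unit_price), MAX(unit_price), "
                  "SUM(unit_price), COUNT(title) FROM book").fetchone()
print(row)  # (359.8, 250, 450, 1799, 5)
```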

23. Explain how the GROUP BY clause in SQL works. What is the difference between
WHERE and HAVING clause?
Answer:
GROUP BY:
1. GROUP BY was added to SQL because aggregate functions (like SUM) return the aggregate of all column
values every time they are called, and without the GROUP BY function it is not possible to find the sum for
each individual group of column values.
2. The syntax for the GROUP BY function is:
SELECT columns, SUM (column) FROM table GROUP BY column
Example:
This “Sales” Table:
Company Amount
TCS 5500
IBM 4500
TCS 7100

And this SQL:


SELECT Company, SUM(Amount) FROM Sales
GROUP BY Company
Return following result:
Company Amount
TCS 12600
IBM 4500
Difference:
1. WHERE clause is used for filtering rows and applies to each and every row; HAVING clause is used to filter groups in SQL.
2. WHERE clause is used before the GROUP BY clause; HAVING clause is used after the GROUP BY clause.
3. WHERE clause can be used with SELECT, INSERT, UPDATE and DELETE; HAVING clause can only be used with a SELECT query (using it with INSERT, UPDATE or DELETE returns an error).
4. Aggregate functions cannot be used in the WHERE clause unless they appear in a sub-query contained in a HAVING clause; aggregate functions can be used freely in the HAVING clause.
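The Sales example and the WHERE/HAVING contrast can be run directly with SQLite (ORDER BY added to make the group order deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (company TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("TCS", 5500), ("IBM", 4500), ("TCS", 7100)])

# GROUP BY: one aggregate row per company.
groups = cur.execute("SELECT company, SUM(amount) FROM sales "
                     "GROUP BY company ORDER BY company").fetchall()
print(groups)  # [('IBM', 4500), ('TCS', 12600)]

# HAVING filters the groups after aggregation; WHERE cannot do this,
# because WHERE runs per-row before SUM() is computed.
big = cur.execute("SELECT company, SUM(amount) FROM sales "
                  "GROUP BY company HAVING SUM(amount) > 10000").fetchall()
print(big)  # [('TCS', 12600)]
```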
24. Explain how a database is modified in SQL.
OR
Explain database modification.
Answer:
Different operations that modify the contents of the database are:
1. Delete:
a. The delete operation is used to delete all or specific rows from database.
b. Delete command do not delete values of particular attributes.
c. A delete command operates only on relation or table.
Syntax:
delete from table_name
where condition;
2. Insert:
a. Insert command is used to insert data into a relation/table.
b. The attribute values for inserted tuples must be members of the attribute’s domain specified in the same
order as in the relation schema.
Syntax: Insert into table_name values (value1, value2, value3, ...... valueN);
3. Updates: Update command is used to update a value in a tuple.
Syntax: Update table_name set column_name = new_value where condition;

25. Discuss join and types with suitable example.


OR
Define join. Explain different types of join.
Answer:
A join clause is used to combine rows from two or more tables, based on a related column between them.
Various types of join operations are:
1. Inner join:
a. Inner join returns the matching rows from the tables that are being joined.
For example: Consider following two relations:
Employee (Emp_Name, City)
Employee_Salary (Emp_Name, Department, Salary)
These two relations are shown in Table (1) and (2).
Table. (1). The Employee relation.

Employee
Emp_Name City
Hari Pune
Om Mumbai
Suraj Nashik
Jai Solapur

Table. (2). The Employee_Salary relation.

Employee_Salary
Emp_Name Department Salary
Hari Computer 10000
Om IT 7000
Billu Computer 8000
Jai IT 5000

Select Employee.Emp_Name, Employee_Salary.Salary from Employee inner join Employee_Salary on


Employee.Emp_Name = Employee_Salary.Emp_Name;
Result: The result of preceding query with selected fields of Table (1) and Table (2)

Emp_Name Salary
Hari 10000
Om 7000
Jai 5000

2. Outer join:
a. An outer join is an extended form of the inner join.
b. It returns both matching and non-matching rows for the tables that are being joined.
c. Types of outer join are as follows:
i. Left outer join: The left outer join returns matching rows from the tables being joined and also non-
matching rows from the left table in the result and places null values in the attributes that comes from the
right table.
For example:
Select Employee.Emp_Name, Salary
from Employee left outer join Employee_Salary
on Employee.Emp_Name = Employee_Salary.Emp_Name;
Result: The result of preceding query with selected fields of Table (1) and Table (2)

Emp_Name Salary
Hari 10000
Om 7000
Jai 5000
Suraj null
ii. Right outer join: The right outer join operation returns matching rows from the tables being joined,
and also non matching rows from the right table in the result and places null values in the attributes that
comes from the left table.
For example:
Select Employee.Emp_Name, City, Salary from Employee right outer join
Employee_Salary on Employee.Emp_Name =
Employee_Salary. Emp_Name;
Result: The result of preceding query with selected fields of Table (1) and Table (2)

Emp_Name City Salary


Hari Pune 10000
Om Mumbai 7000
Jai Solapur 5000
Billu null 8000
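The inner and left outer joins above can be reproduced with SQLite via Python's sqlite3 module, using the Employee and Employee_Salary data from the tables (ORDER BY added for a deterministic row order). Older SQLite versions lack RIGHT OUTER JOIN; it can be emulated by swapping the two tables in a LEFT JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_name TEXT, city TEXT)")
cur.execute("CREATE TABLE employee_salary (emp_name TEXT, dept TEXT, salary INTEGER)")
cur.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Hari", "Pune"), ("Om", "Mumbai"),
                 ("Suraj", "Nashik"), ("Jai", "Solapur")])
cur.executemany("INSERT INTO employee_salary VALUES (?, ?, ?)",
                [("Hari", "Computer", 10000), ("Om", "IT", 7000),
                 ("Billu", "Computer", 8000), ("Jai", "IT", 5000)])

# Inner join: only the matching rows survive (Suraj and Billu drop out).
inner = cur.execute(
    "SELECT e.emp_name, s.salary FROM employee e "
    "INNER JOIN employee_salary s ON e.emp_name = s.emp_name "
    "ORDER BY e.emp_name").fetchall()
print(inner)  # [('Hari', 10000), ('Jai', 5000), ('Om', 7000)]

# Left outer join: non-matching rows of the left table stay, with NULL
# (None in Python) in the attributes coming from the right table.
left = cur.execute(
    "SELECT e.emp_name, s.salary FROM employee e "
    "LEFT OUTER JOIN employee_salary s ON e.emp_name = s.emp_name "
    "ORDER BY e.emp_name").fetchall()
print(left)  # [('Hari', 10000), ('Jai', 5000), ('Om', 7000), ('Suraj', None)]
```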

26. Write difference between cross join, natural join, left outer join and right outer join with
suitable example.
Answer:
Cross join:
1. Cross join produces a result set which is the product of number of rows in the first table multiplied by the
number of rows in the second table if no where clause is used along with cross join. This kind of result is
known as Cartesian product.
2. If where clause is used with cross join, it functions like an inner join.

Natural join:
1. Natural join joins two tables based on same attribute name and data types.
2. The resulting table will contain all the attributes of both the table but keep only one copy of each common
column.
3. In natural join, if there is no condition specifies then it returns the rows based on the common column.
For example: Consider the following two relations:
Student (Roll_No, Name)
Marks (Roll_No, Marks)
These two relations are shown in Table (1) and (2).
Table (1). The Student relation.
Student
Roll_No Name
1 A
2 B
3 C

Table (2). The Marks relation.


Marks
Roll_No Marks
2 70
3 50
4 85

Consider the query:


Select * from Student natural join Marks;
Result:
Roll_No Name Marks
2 B 70
3 C 50

Left outer join and right outer join: Refer Q.25

27. Describe the SQL set operations.


Answer:
The SQL set operations are:
1. Union operation: Union clause merges the output of two or more queries into a single set of rows and
column.

Fig: Output of union clause.


Output = Record only in query one + records only in query two + A single set of records which is common
in both queries.
2. Intersect operation: The intersect clause outputs only rows produced by both the queries intersected
i.e., the intersect operation returns common records from the output of both queries.

Fig: Output of intersect clause.


Output = A single set of records which are common in both queries.
3. The except operation: The except also called as Minus outputs rows that are in first table but not in
second table.
Fig: Output of except (Minus) clause.


Output = Records only in query one.
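All three set operations can be tried with SQLite, which spells the difference operation EXCEPT rather than MINUS. The depositor/borrower tables below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE depositor (name TEXT)")
cur.execute("CREATE TABLE borrower (name TEXT)")
cur.executemany("INSERT INTO depositor VALUES (?)", [("Hari",), ("Om",), ("Jai",)])
cur.executemany("INSERT INTO borrower VALUES (?)", [("Om",), ("Billu",)])

q = "SELECT name FROM depositor {} SELECT name FROM borrower ORDER BY name"
u = cur.execute(q.format("UNION")).fetchall()      # all distinct names
i = cur.execute(q.format("INTERSECT")).fetchall()  # names in both queries
e = cur.execute(q.format("EXCEPT")).fetchall()     # depositor minus borrower
print(u)  # [('Billu',), ('Hari',), ('Jai',), ('Om',)]
print(i)  # [('Om',)]
print(e)  # [('Hari',), ('Jai',)]
```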

28. Explain cursors, sequences and procedures used in SQL.


Answer: Cursors:
1. A cursor is a temporary work area created in the system memory when a SQL statement is executed.
2. A cursor contains information on a select statement and the rows of data accessed by it.
3. A cursor can hold more than one row, but can process only one row at a time.
4. The set of rows the cursor holds is called the active set.
5. There are two types of cursors:
a. Implicit cursors:
i. These are created by default when DML statements like, INSERT, UPDATE, and DELETE statements are
executed.
ii. They are also created when a SELECT statement that returns just one row is executed.
b. Explicit cursors:
i. They must be created when we are executing a SELECT statement that returns more than one row.
ii. When we fetch a row the current row position moves to next row.
Sequences:
Sequences are frequently used in databases because many applications require each row in a table to
contain a unique value and sequences provide an easy way to generate them.
Syntax:
CREATE SEQUENCE [schema]sequence_name
[ AS datatype]
[ START WITH value]
[ INCREMENT BY value]
[ MINVALUE value | NO MINVALUE]
[ MAXVALUE value | NO MAXVALUE]
[ CYCLE | NO CYCLE]
[ CACHE value | NO CACHE];
Procedures:
1. A procedure is a sub-program that performs a specific task.
2. A procedure has two parts:
i. Specification: The procedure specification begins with the keyword procedure and ends with the
procedure name or parameter list.
ii. Body: The procedure body begins with the keyword is and ends with the keyword end.
Syntax: To create a procedure,
create or replace procedure <proc name> [parameter list] is < local declaration >
begin
(executable statements)
[exception] (exception handlers)
end;
Syntax: To execute a procedure,
exec <proc_name> (parameters);
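The row-at-a-time behaviour of a cursor (hold an active set, advance the current row on each fetch) can be observed through the cursor object of Python's sqlite3 module. This is an analogy to an explicit SQL cursor, not server-side PL/SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE book (title TEXT)")
cur.executemany("INSERT INTO book VALUES (?)", [("A",), ("B",), ("C",)])

# The cursor holds the active set of the SELECT; each fetchone() call
# processes one row and advances the current row position, much like
# fetching from an explicit cursor.
cur.execute("SELECT title FROM book ORDER BY title")
titles = []
row = cur.fetchone()
while row is not None:
    titles.append(row[0])
    row = cur.fetchone()
print(titles)  # ['A', 'B', 'C']
```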

29. What is trigger? Explain different trigger with example.


OR
Describe the following terms trigger.
Answer:
Triggers:
1. A trigger is a procedure (code segment) that is executed automatically when some specific events occur in
a table/view of a database.
2. Triggers are mainly used for maintaining integrity in a database. Triggers are also used for enforcing
business rules, auditing changes in the database and replicating data.
Following are different types of triggers:
1. Data Manipulation Language (DML) triggers:
a. DML triggers are executed when a DML operation like INSERT, UPDATE OR DELETE is fired on a Table
or View.
b. DML triggers are of two types:
i. AFTER triggers:
1. AFTER triggers are executed after the DML statement completes but before it is committed to the
database.
2. AFTER triggers, if required, can roll back their actions and the source DML statement which invoked them.
ii. INSTEAD OF triggers:
1. INSTEAD OF triggers are the triggers which get executed automatically in place of triggering DML (i.e.,
INSERT, UPDATE and DELETE) action.
2. It means if we are inserting a record and we have a INSTEAD OF trigger for INSERT then instead of
INSERT whatever action is defined in the trigger that gets executed.
2. Data Definition Language (DDL) triggers:
a. DDL triggers are executed when DDL statements like CREATE, ALTER, DROP, GRANT, DENY,
REVOKE, and UPDATE STATISTICS are executed.
b. DDL triggers can be DATABASE scoped or SERVER scoped. The DDL triggers with server level scope
gets fired in response to a DDL statement with server scope like CREATE DATABASE, CREATE LOGIN,
GRANT_SERVER, ALTER DATABASE, ALTER LOGIN etc.
c. Whereas DATABASE scoped DDL triggers fire in response to DDL statement with DATABASE SCOPE
like CREATE TABLE, CREATE PROCEDURE, CREATE FUNCTION, ALTER TABLE, ALTER
PROCEDURE, ALTER FUNCTION etc.
3. LOGON triggers:
a. LOGON triggers get executed automatically in response to a LOGON event.
b. They get executed only after the successful authentication but before the user session is established.
c. If authentication fails the LOGON triggers will not be fired.
4. CLR triggers:
a. CLR triggers are based on the SQL CLR (Common Language Runtime).
b. We can write DML and DDL triggers by using the supported .NET CLR languages like C#, VB.NET etc.
c. CLR triggers are useful if heavy computation is required in the trigger or a reference to object outside
SQL is required.
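A DML trigger of the AFTER kind can be demonstrated with SQLite (which supports only DML triggers; the DDL, LOGON and CLR triggers above are SQL Server features). The account/audit_log schema is an illustrative auditing example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER)")
cur.execute("CREATE TABLE audit_log "
            "(acc_no INTEGER, old_balance INTEGER, new_balance INTEGER)")

# AFTER UPDATE trigger: fires automatically once per updated row and
# records the change, an auditing use of triggers.
cur.execute("""
    CREATE TRIGGER log_balance AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO audit_log VALUES (OLD.acc_no, OLD.balance, NEW.balance);
    END
""")

cur.execute("INSERT INTO account VALUES (1, 500)")
cur.execute("UPDATE account SET balance = 700 WHERE acc_no = 1")
log = cur.execute("SELECT * FROM audit_log").fetchall()
print(log)  # [(1, 500, 700)]
```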

30. Consider the following relational database:
employee (employee_name, street, city)
works (employee_name, company_name, salary)
company (company_name, city)
manages (employee_name, manager_name)
Give an expression in SQL to express each of the following queries:
i. Find the names and cities of residence of all employees who work for XYZ bank.
ii. Find the names, street address, and cities of residence of all employee who works for XYZ
Bank and earn more than Rs. 10,000 per annum.
iii. Find the names of all employees in this database who live in the same city as the company
for which they work.
Answer:
i. Select E.employee_name, city
from employee E, works W
where W.company_name = ‘XYZ Bank’ and
W.employee_name = E.employee_name
ii. Select * from employee
where employee_name in
(select employee_name from works
where company_name = ‘XYZ Bank’ and salary > 10000);
Alternatively:
Select E.employee_name, street, city
from employee as E, works as W
where E.employee_name = W.employee_name and
W.company_name = ‘XYZ Bank’ and W.salary > 10000;
iii. Select E.employee_name
from employee as E, works as W, company as C
where E.employee_name = W.employee_name and E.city = C.city and W.company_name = C.company_name;

31. Consider the following relation. The primary key is Rollno, ISBN, Student (Roll No,
Name, Branch), Book (ISBN, Title, Author, Publisher) Issue (Roll No, ISBN, date_of_issue).
Write the query in relational algebra and SQL of the following:
i. List the Roll Number and Name of All CSE Branch Students.
ii. Find the name of students who have issued a book of publication ‘BPB’.
iii. List the title and author of all books which are issued by a student name started with ‘a’.
iv. List the title of all books issued on or before 20/09/2012.
v. List the name of student who will read the book of author named ‘Sanjeev’.
Answer:
i. In relational algebra:
πRoll No, Name (σBranch = “CSE” (Student))
In SQL:
Select Roll No, Name from Students
where Branch = “CSE”;
ii. In relational algebra:
πName (σPublisher = “BPB” and Student.Roll No = P.Roll No (Student ⋈ (πRoll No, Publisher (σIssue.ISBN = Book.ISBN ρP (Book ⋈ Issue)))))
In SQL:
Select Student.name from Student inner join
(Select Book.Publisher, Issue.Roll No from Issue inner join Book on Issue.ISBN = Book.ISBN as P)
ON Student.Roll No = P. Roll No
where P.Publisher = “BPB”;
iii. In relational algebra:
πS.Title, S.Author (σS.Name like(‘a%’) (πT.Name, Book.Author, Book.Title (σBook.ISBN = T.ISBN ρS (Book ⋈ (πName, ISBN (σStudent.Roll No = Issue.Roll No ρT
(Student ⋈ Issue)))))));
In SQL:
Select S.title, S.Author from
(Select T.Name, Book.Author, Book.Title from Book inner join (Select Student.Name, Issue.ISBN from
Student inner join Issue
ON Student.Roll No = Issue.Roll No as T)
ON Book.ISBN = T.ISBN as S)
where S.Name like ‘a%’;
iv. In relational algebra:
πTitle (σdate_of_issue ≤ 20/09/2012 (Book ⋈ Issue))
In SQL:
Select Book.Title from Book inner join Issue ON Book.ISBN = Issue.ISBN
where date_of_issue <= ‘20/09/2012’;
v. In relational algebra:
πName (σAuthor = “Sanjeev” and Student.Roll No = Q.Roll No (Student ⋈ (πRoll No, Author (σIssue.ISBN = Book.ISBN ρQ (Book ⋈ Issue)))))
In SQL:
Select Student.Name from Student inner join
(Select Issue.Roll No, Book.Author from Issue inner join Book ON Issue.ISBN = Book.ISBN as Q)
ON Student.Roll No = Q.Roll No
where Q.Author = “Sanjeev”;

32. Suppose there are two relations R (A, B, C), S (D, E, F). Write TRC and SQL for the
following RAs:
i. ΠA, B (R)
ii. σB = 45 (R)
iii. ΠA, F (σC = D (R × S))
Answer:
i. ΠA, B (R):
TRC: {s.A, s.B | R(s)}
SQL: Select A, B from R;
ii. σB = 45 (R):
TRC: {s | R(s) ∧ s.B = 45}
SQL: Select * from R where B = 45;
iii. ΠA, F (σC = D (R × S)):
TRC: {t | ∃p ∈ R ∃q ∈ S (t[A] = p[A] ∧ t[F] = q[F] ∧ p[C] = q[D])}
SQL: Select A, F from R inner join S ON R.C = S.D;

33. Consider the following relational database. Give an expression in SQL for each of the
following queries. Underlined attributes are primary keys.
Employee (person_name, street, city)
Works (person_name, Company_name, salary)
Company (Company_name, city)
Manages (person_name, manager_name)
i. Find the names of all employees who work for ABC Bank.
ii. Find the names of all employees who live in the same city and on the same street as their
managers.
iii. Find the names, street addresses and cities of residence of all employees who work for ABC
Bank and earn more than 7,000 per annum.
iv. Find the names of all employees who earn more than every employee of XYZ.
v. Give all employees of corporation ABC a 7% salary raise.
vi. Delete all tuples in the Works relation for employees of ABC.
vii. Find the names of all employees in this database who live in the same city as the
company for which they work.
Answer
i. Select person_name from Works
Where company_name=‘ABC Bank’
ii. Select E1.person_name
From Employee as E1, Employee as E2, Manages as M
Where E1.person_name=M.person_name
and E2.person_name=M.manager_name
and E1.street=E2.street and E1.city=E2.city
iii. Select person_name, street, city from Employee
where person_name in
(select person_name from Works
where company_name='ABC Bank' and salary>7000)
Or, equivalently:
select E.person_name, E.street, E.city
from Employee as E, Works as W
where E.person_name=W.person_name
and W.company_name='ABC Bank' and W.salary>7000
iv. Select person_name from Works
where salary > all
(select salary from Works
where company_name='XYZ')
Or, equivalently:
select person_name from Works
where salary > (select max(salary) from Works
where company_name='XYZ')
v. Update Works
set salary=salary*1.07
where company_name=‘ABC Bank’
vi. Delete from Works
where company_name=‘ABC Bank’
vii. Select E.person_name
from Employee as E, Works as W, Company as C
where E.person_name=W.person_name and E.city=C.city
and W.company_name=C.company_name
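Query (iv) can be sanity-checked on a toy Works instance; SQLite lacks `> all`, so a `NOT EXISTS` form stands in for it (the names and salaries are invented for the demo):

```python
import sqlite3

# Small Works instance to compare the "> all" form of query (iv) with the
# max() form; table and company names follow the question text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Works (person_name, company_name, salary)")
con.executemany("INSERT INTO Works VALUES (?, ?, ?)", [
    ("Asha", "ABC Bank", 9000),
    ("Ravi", "XYZ", 6000),
    ("Meena", "XYZ", 6500),
    ("Vikram", "ABC Bank", 5000),
])

# salary > all XYZ salaries <=> no XYZ salary is >= this salary ...
q1 = con.execute("""
SELECT person_name FROM Works AS W
WHERE NOT EXISTS (SELECT 1 FROM Works
                  WHERE company_name = 'XYZ' AND salary >= W.salary)
""").fetchall()
# ... and the equivalent max() subquery from the answer.
q2 = con.execute("""
SELECT person_name FROM Works
WHERE salary > (SELECT max(salary) FROM Works WHERE company_name = 'XYZ')
""").fetchall()

print(q1)  # [('Asha',)]
```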
34. Explain embedded SQL and dynamic SQL in detail.
Answer:
Embedded SQL:
1. The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal,
PL/I, Fortran, C and COBOL.
2. A language in which SQL queries are embedded is referred to as a host language and the SQL structures
permitted in the host language constitute embedded SQL.
3. Programs written in the host language can use the embedded SQL syntax to access and update data
stored in a database.
4. In embedded SQL, all query processing is performed by the database system.
5. The result of the query is then made available to the program one tuple at a time.
6. Embedded SQL statements must be completely present at compile time and compiled by the embedded
SQL pre-processor.
7. To identify embedded SQL requests to the pre-processor, we use the
EXEC SQL statement as:
EXEC SQL <embedded SQL statement> END_EXEC
8. Variable of the host language can be used within embedded SQL statements, but they must be preceded
by a colon (:) to distinguish them from SQL variables.
Dynamic SQL:
1. The dynamic SQL component of SQL allows programs to construct and submit SQL queries at run time.
2. Using dynamic SQL, programs can create SQL queries as strings at run time and can either have them
executed immediately or have them prepared for subsequent use.
3. Preparing a dynamic SQL statement compiles it, and subsequent uses of the prepared statement use the
compiled version.
4. SQL defines standards for embedding dynamic SQL calls in a host language, such as C, as in the following
example,
char *sqlprog = "update account set balance = balance * 1.05
where account_number = ?";
EXEC SQL prepare dynprog from :sqlprog;
char account[10] = "A-101";
EXEC SQL execute dynprog using :account;
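The same prepare-then-execute pattern can be sketched in Python's sqlite3 module, whose `?` placeholder plays the role of the dynamic parameter (the account table and values are invented for the demo):

```python
import sqlite3

# The prepare/execute pattern of dynamic SQL, sketched with sqlite3
# placeholders; the account data below is made up for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
con.execute("INSERT INTO account VALUES ('A-101', 1000.0)")

# The query is built as a string at run time, with ? as the parameter
# marker (the analogue of "= ?" in the dynamic-SQL example above).
sqlprog = "UPDATE account SET balance = balance * 1.05 WHERE account_number = ?"
con.execute(sqlprog, ("A-101",))   # execute ... using :account

balance = con.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'").fetchone()[0]
print(balance)  # 1050.0
```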
35. Describe procedures in PL/SQL with its advantages and disadvantages.
Answer
1. PL/SQL is a block-structured language that enables developers to combine the power of SQL with
procedural statements.
2. A stored procedure in PL/SQL is a named group of SQL and procedural statements that is compiled once
and stored in the database catalogue.
3. A procedure can be thought of as a function or a method.
4. They can be invoked through triggers, other procedures, or applications written in Java, PHP, etc.
5. All the statements of a block are passed to Oracle engine all at once which increases processing speed and
decreases the traffic.
Advantages of procedures in PL/SQL:
1. They result in performance improvement of the application. If a procedure is being called frequently in an
application in a single connection, then the compiled version of the procedure is delivered.
2. They reduce the traffic between the database and the application, since the lengthy statements are
already fed into the database and need not be sent again and again via the application.
3. They add to code reusability, similar to how functions and methods work in other languages such as
C/C++ and Java.
Disadvantages of procedures in PL/SQL:
1. Stored procedures can cause a lot of memory usage. The database administrator should decide an upper
bound as to how many stored procedures are feasible for a particular application.
2. MySQL does not provide the functionality of debugging the stored procedures.
SHORT ANSWER TYPE QUESTIONS
1. Define the term degree and cardinality.
Ans. Degree: The number of attributes in a relation is known as degree.
Cardinality: The number of tuples in a relation is known as cardinality.
2. What do you mean by referential integrity ?
Ans. In the relational model, we often wish to ensure that a value that appears in one relation for a
given set of attributes also appears for a certain set of attributes in another relation. This
condition is called referential integrity.
3. Explain entity integrity constraints.
Ans. The entity integrity constraint states that primary keys cannot be null. There must be a proper value
in the primary key field. This is because the primary key value is used to identify individual rows in a table.
If there were null values for primary keys, it would mean that we could not identify those rows.
4. Define foreign key constraint.
Ans. A foreign key constraint allows certain attributes in one relation to refer to attributes in another
relation: every value of the foreign key in the referencing relation must either be null or appear as a key
value in the referenced relation.
5. With an example show how a referential integrity can be
implemented.
Ans. This rule states that if a foreign key in Table 1 refers to the primary key of Table 2, then every value of
the foreign key in Table 1 must be null or be available in Table 2.
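The rule can be demonstrated with SQLite, which enforces foreign keys once `PRAGMA foreign_keys` is enabled; Table1 and Table2 here are hypothetical stand-ins for the two tables above:

```python
import sqlite3

# Hypothetical two-table schema showing referential integrity being
# enforced: every Table1.dept_id must be null or present in Table2.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
con.executescript("""
CREATE TABLE Table2 (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE Table1 (emp_id INTEGER PRIMARY KEY, dept_id INTEGER
                     REFERENCES Table2(dept_id));
INSERT INTO Table2 VALUES (10, 'Sales');
""")

con.execute("INSERT INTO Table1 VALUES (1, 10)")    # value exists in Table2: ok
con.execute("INSERT INTO Table1 VALUES (2, NULL)")  # null foreign key: ok

try:
    con.execute("INSERT INTO Table1 VALUES (3, 99)")  # 99 not in Table2
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True: the third insert was rejected
```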
6. When do you get constraints violation? Also, define null value constraint.
Ans. Constraints get violated during update operations on the relation.
Null value constraint: While creating tables, if a row lacks a data value for a particular column, that
value is said to be null.
7. What is the role of join operations in relational algebra?
Ans. The join operation, denoted by ⋈, is used to join two relations to form a new relation on the basis of a
common attribute present in the two operand relations.
8. What are characteristics of SQL?
Ans. Characteristics of SQL:
1. SQL usage is extremely flexible.
2. It uses a free-form syntax.
9. Give merits and demerits of SQL database.
Ans. Merits of SQL database:
i. High speed
ii. Security
iii. Compatibility
iv. No coding required
Demerits of SQL database:
i. Some versions of SQL are costly.
ii. Difficulty in interfacing
iii. Partial control is given to database.
10. What is the purpose of view in SQL?
Ans. A view is a virtual relation whose contents are derived from already existing relations and which does not
exist in physical form. A view can be used like any other relation, that is, it can be queried, inserted into,
deleted from and joined with other relations or views, though with some limitations on update operations.
11. Which command is used for creating user-defined data types?
Ans. The user-defined data types can be created using the CREATE DOMAIN command.
12. What do you mean by query and subquery?
Ans. A query is a request to the database for obtaining some data. A subquery is an SQL query nested inside a
larger query. Subqueries must be enclosed within parentheses.
13. Write the purpose of trigger.
Ans. Purpose of trigger:
1. Automatically generate derived column values.
2. Prevent invalid transactions.
3. Enforce complex security authorizations.
4. Enforce referential integrity across nodes in a distributed database.
5. Enforce complex business rules.
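Purpose 1 (deriving a column value automatically) can be sketched with a minimal SQLite trigger; the item table is invented for the demo:

```python
import sqlite3

# A minimal trigger: after each insert into item, the derived column
# total is filled in as price * qty (schema invented for the demo).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item (name TEXT, price REAL, qty INTEGER, total REAL);
CREATE TRIGGER fill_total AFTER INSERT ON item
BEGIN
    UPDATE item SET total = NEW.price * NEW.qty WHERE rowid = NEW.rowid;
END;
""")
con.execute("INSERT INTO item (name, price, qty) VALUES ('pen', 5.0, 3)")

total = con.execute("SELECT total FROM item WHERE name = 'pen'").fetchone()[0]
print(total)  # 15.0: filled in by the trigger, not by the INSERT
```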
14. What do you mean by PL / SQL?
Ans. PL/SQL stands for Procedural Language/SQL. PL/SQL extends SQL by adding constructs found in
procedural languages, resulting in a structured language that is more powerful than SQL.
15. What is union compatibility?
Ans. Two relation instances are said to be union compatible if the following conditions hold:
i. They have the same number of the fields.
ii. Corresponding fields, taken in order from left to right, have the same domains.
16. What is Relational Algebra?
Ans. The relational algebra is a procedural query language. It consists of a set of operations that take one or
two relations as input and produce a new relation as a result.
17. Define constraint and its types in DBMS.
Ans. A constraint is a rule enforced on the data in a database to maintain its accuracy and integrity.
Types of constraints:
1. NOT NULL
2. UNIQUE
3. DEFAULT
4. CHECK
5. Key constraints
i. Primary key
ii. Foreign key
6. Domain constraints
UNIT-3
DATA BASE DESIGN & NORMALIZATION
1. Distinguish between functional dependency and multivalued dependency.
Ans. Functional dependency:
A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R
specifies a constraint on the possible tuples that can form a relation state r of R.
Multivalued dependency (MVD):
MVD occurs when two or more independent multivalued facts about the same attribute occur within the
same relation. MVD is denoted by X →→ Y specified on relation schema R, where X and Y are both subsets
of R.
2. When are two sets of functional dependencies said to be equivalent?
Ans. Two sets F1 and F2 of FDs are said to be equivalent, if F1+ = F2+, that is, every FD in F1 is implied by
F2 and every FD in F2 is implied by F1.
3. Define the following:
a. Full functional dependency
b. Partial dependency
Ans.
a. A dependency X → Y in a relational schema R is said to be a full functional dependency if there is no A,
where A is a proper subset of X, such that A → Y. It implies that removal of any attribute from X means that
the dependency does not hold any more.
b. A dependency X → Y in a relational schema R is said to be a partial dependency if there is an attribute A,
where A is a proper subset of X, such that A → Y. The attribute Y is said to be partially dependent
on the attribute X.
4. What is transitive dependency? Name the normal form which is based on the concept of
transitive dependency.
Ans. An attribute Y of a relational schema R is said to be transitively dependent on attribute X (X → Y), if
there is a set of attributes A that is neither a candidate key nor a subset of any key of R and both
X → A and A → Y hold. The normal form that is based on transitive dependency is 3NF.
5. What is normalization?
Ans. Normalization is the process of organizing a database to reduce redundancy and improve data
integrity.
6. Define 2NF.
Ans. A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is
fully dependent on the primary key. A relation R is in 2NF if every non-prime attribute of R is fully
functionally dependent on each relation key.
7. Why is BCNF considered simpler as well as stronger than 3NF?
Ans. BCNF is simpler than 3NF in that it makes explicit reference to neither the first and second normal
forms nor to the concept of transitive dependence.
In addition, it is stronger than 3NF as every relation that is in BCNF is also in 3NF, but the converse is not
necessarily true.
8. Define lossless join decomposition.
Ans. Let R be a relational schema and let F be a set of functional dependencies on R. Let R1 and R2 form a
decomposition of R. This decomposition is a lossless join decomposition of R if at least one of
the following functional dependencies is in F+:
i. R1 ∩ R2 → R1
ii. R1 ∩ R2 → R2
9. What do you understand by the closure of a set of attributes?
Ans. The closure of a set of attributes X with respect to a set of FDs F, denoted by X+, is the set of all
attributes that are functionally determined by X under F.
10. What are the uses of the closure algorithm?
Ans. Besides computing the attribute closure itself, the closure algorithm has other uses, as follows:
i. To determine whether a particular FD, say X → Y, is in the closure F+ of F without computing F+.
This can be done by simply computing X+ using the closure algorithm and then checking whether Y ⊆ X+.
ii. To test whether a set of attributes A is a super key of R. This can be done by computing A+ and checking
whether A+ contains all the attributes of R.
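Both uses rest on the attribute-closure computation, which can be sketched as follows (the relation R(A, B, C) and its FDs are chosen purely for illustration):

```python
# A sketch of the attribute-closure algorithm and its two uses named
# above: testing membership of an FD in F+ and testing for a super key.
def closure(attrs, fds):
    """fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if the whole left side is in the result, add the right side
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example: R(A, B, C) with F = {A -> B, B -> C}.
F = [({'A'}, {'B'}), ({'B'}, {'C'})]

print(closure({'A'}, F))                      # {'A', 'B', 'C'}
print({'C'} <= closure({'A'}, F))             # True: A -> C is in F+
print(closure({'B'}, F) == {'A', 'B', 'C'})   # False: B is not a super key
```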
11. Describe the dependency preservation property.
Ans. It is a property that is desired during decomposition, that is, no FD of the original relation is lost. The
dependency preservation property ensures that each FD represented by the original relation is enforced by
examining a single relation resulting from the decomposition, or can be inferred from the FDs of some
decomposed relation.
12. What are the various anomalies associated with RDBMS?
OR
What are the different types of anomalies associated with database?
Ans. In RDBMS, certain update anomalies can arise, which are as follows:
i. Insertion anomaly: It leads to a situation in which certain information cannot be inserted in a relation
unless some other information is stored.
ii. Deletion anomaly: It leads to a situation in which deletion of data representing certain information
results in losing data representing some other information that is associated with it.
iii. Modification anomaly: It leads to a situation in which changing repeated data at one place results in
inconsistency unless the same data are also changed at every other place where they occur.
13. Explain normalization. What is normal form?
Ans. Normalization: Refer Q.5
Normal form: Normal forms are based on the functional dependencies among the attributes of a relation.
These forms are simply stages of database design, with each stage applying more strict rules to the types of
information which can be stored in a table.
14. Why do we normalize database?
Ans. We normalize database:
1. To avoid redundancy
2. To avoid update/delete anomalies
15. Are normal forms alone sufficient as a condition for a good schema design? Explain.
Ans. No, normal forms alone are not sufficient as a condition for a good schema design. There are two
additional properties, namely lossless join property and dependency preservation property that must hold
on decomposition to qualify it as a good design.
LONG ANSWER TYPE QUESTIONS
1. What is functional dependency? Explain its role in database design. Describe the inference
rules for functional dependencies.
Answer:
Functional dependency:
1. A functional dependency is a constraint between two sets of attributes from the database.
2. A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of
R specifies a constraint on the possible tuples that can form a relation state r of R.
3. The constraint is that any two tuples t1 and t2 in r which have
t1[X] = t2[X]
must also have
t1[Y] = t2[Y].
4. This means that the values of the Y component of a tuple in r depend on, or are determined by, the values
of the X component; alternatively, the values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component.
Role of functional dependency:
1. Functional dependency allows the database designer to express facts about the enterprise that the
designer is modeling with the enterprise databases.
2. It allows the designers to express constraints, which cannot be expressed with super keys.
Inference rules for functional dependencies:
1. Reflexivity rule: If α is a set of attributes and β ⊆ α, then α → β holds.
2. Augmentation rule: If α → β holds and γ is a set of attributes, then γα → γβ holds.
3. Transitivity rule: If α → β holds and β → γ holds, then α → γ holds.
4. Complementation rule: If α →→ β holds, then α →→ {R – (α ∪ β)} holds.
5. Multivalued augmentation rule: If α →→ β holds and γ ⊆ R and δ ⊆ γ, then γα →→ δβ holds.
6. Multivalued transitivity rule: If α →→ β holds and β →→ γ holds, then α →→ (γ – β) holds.
7. Replication rule: If α → β holds, then α →→ β holds.
8. Coalescence rule: If α →→ β holds, γ ⊆ β, and there is a δ such that δ ⊆ R, δ ∩ β = ∅ and δ → γ,
then α → γ holds.
9. Union rule: If α → β holds and α → γ holds, then α → βγ holds.
10. Decomposition rule: If α → βγ holds, then α → β holds and α → γ holds.
11. Pseudotransitivity rule: If α → β holds and γβ → δ holds, then αγ → δ holds.
2. What is functional dependency? Explain trivial and non-trivial functional dependency.
Define canonical cover. Compute canonical cover for the following:
R = (A, B, C), F = {A → BC, B → C, A → B, AB → C}
Answer:
Functional dependency: Refer Q.1
Trivial functional dependency: The dependency of an attribute on a set of attributes is known as a trivial
functional dependency if the set of attributes includes that attribute.
A → B is a trivial functional dependency if B is a subset of A.
Non-trivial functional dependency: If a functional dependency X → Y holds where Y is not a
subset of X, then this dependency is called a non-trivial functional dependency.
For example:
Let a relation R (A, B, C).
The following functional dependencies are non-trivial:
A → B (B is not a subset of A)
A → C (C is not a subset of A)
The following dependency is trivial:
{A, B} → B [B is a subset of {A, B}]
Canonical cover: A canonical cover of a set of functional dependencies F is a simplified set of functional
dependencies that has the same closure as the original set F.
Numerical:
There are two functional dependencies with the same left-hand side:
A → BC
A → B
These two can be combined (union rule) to get
A → BC
Now, the revised set F becomes:
F = {
A → BC
B → C
AB → C
}
AB → C is redundant, because even after removing it from the set F we get the same closures. This is
because B → C is already a part of F.
Now, the revised set F becomes:
F = {
A → BC
B → C
}
C is an extraneous attribute in A → BC, since A → C is logically implied by A → B and B → C (by transitivity).
F = {
A → B
B → C
}
After this step, F does not change any more.
Hence, the required canonical cover is
F = {A → B, B → C}
3. Explain full functional dependency and partial functional dependency.
Answer:
Full functional dependency:
1. Given a relation scheme R and a functional dependency X → Y, Y is fully functionally dependent on X if
there is no Z, where Z is a proper subset of X, such that Z → Y.
2. The dependency X → Y is then left-reduced, there being no extraneous attributes in the L.H.S. of the
dependency.
For example: In the relational schema R (ABCDEH) with the FDs
F = {A → BC, CD → E, E → C, CD → AH, ABH → BD, DH → BC},
the dependency A → BC is left-reduced and BC is fully functionally dependent on A.
However, the functional dependency ABH → BD is not left-reduced, the attribute B being
extraneous in this dependency.
Partial functional dependency:
1. Given a relation schema R with the functional dependencies F defined on the attributes of R and K as a
candidate key, if X is a proper subset of K and X → A, then A is said to be partially dependent on K.
For example:
[Fig. 3.3.1: A STUDENT relation with candidate key (Name, Course) and attributes Phone no., Course-deptt., Roll no. and Grade.]
i. In Fig. 3.3.1, [Name + Course] is a candidate key, so Name and Course are prime attributes; Grade is fully
functionally dependent on the candidate key, and Phone no., Course-deptt. and Roll no. are partially
dependent on the candidate key.
ii. Given R (A, B, C, D) and F = {AB → C, B → D}, the key of this relation is AB and D is partially
dependent on the key.
4. Define partial functional dependency. Consider the following two sets of functional
dependencies F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}. Check
whether or not they are equivalent.
Answer:
Partial functional dependency: Refer Q.3
Numerical:
From F,
E → AD
E → A (by decomposition rule)
E → D
Also given that
E → H
So, E → AH (by union rule),
which is an FD of set G.
Again, A → C and AC → D
imply A → D (by pseudotransitivity rule), so
A → CD (by union rule),
which is an FD of set G.
Conversely, every FD of F follows from G: A → CD gives A → C (decomposition) and hence AC → D
(augmentation of A → D), while E → AH gives E → H and, together with A → D, gives E → AD.
Hence, F and G are equivalent.
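The same check can be done mechanically: F and G are equivalent iff each FD of one set is implied by the other, which the attribute-closure test decides (a sketch):

```python
# Checking F and G for equivalence mechanically: every FD of one set must
# follow from the other, which the attribute closure decides.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, g):
    """True if every FD of g is implied by f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

F = [({'A'}, {'C'}), ({'A', 'C'}, {'D'}), ({'E'}, {'A', 'D'}), ({'E'}, {'H'})]
G = [({'A'}, {'C', 'D'}), ({'E'}, {'A', 'H'})]

print(covers(F, G) and covers(G, F))  # True: F and G are equivalent
```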
5. Write the algorithm to find minimal cover F for set of functional dependencies E.
Answer:
Algorithm:
1. Set F := E.
2. Replace each functional dependency X → {A1, A2, ..., An} in F by the n functional dependencies
X → A1, X → A2, ..., X → An.
3. For each functional dependency X → A in F, for each attribute B that is an element of X, if
{ {F – {X → A}} ∪ {(X – {B}) → A} } is equivalent to F, then replace X → A with (X – {B}) → A in F.
4. For each remaining functional dependency X → A in F, if {F – {X → A}} is equivalent to F, then remove
X → A from F.
6. Define minimal cover. Suppose a relation R (A, B, C) has FD set F = {A → B, B → C, A → C,
AB → B, AB → C, AC → B}. Convert this FD set into minimal cover.
Answer:
Minimal cover: A minimal cover of a set of FDs F is a minimal set of functional dependencies Fmin that
is equivalent to F.
Numerical:
Given: R (A, B, C)
Step 1: Keep only one attribute on the right-hand side:
F = {A → B
B → C
A → C
AB → B
AB → C
AC → B}
Step 2: Remove extraneous attributes on the left-hand side. In AB → B, AB → C and AC → B the second
attribute of each determinant is extraneous, so these dependencies reduce to A → B, A → C and A → B,
which are already present. We are left with
A → B
B → C
A → C
Step 3: A → C is redundant, since it follows from A → B and B → C by transitivity, so remove it.
Hence, the minimal cover is
A → B
B → C
7. Define normal forms. List the definitions of first, second and third normal forms. Explain
BCNF with a suitable example.
OR
Explain 1NF, 2NF, 3NF and BCNF with suitable example.
Answer:
1. Normal forms are simply stages of database design, with each stage applying more strict rules to the types
of information which can be stored in a table.
2. Normal form is a method to normalize the relations in database.
3. Normal forms are based on the functional dependencies among the attributes of a relation.
Different normal forms are:
1. First Normal Form (1NF):
a. A relation R is in 1NF if all domains are simple, i.e., all attribute values are atomic.
For example: The relation LIVED-IN given in Table 3.7.1 is not in 1NF because the domain values of the
attribute ADDRESS are not atomic.
Table 3.7.1. LIVED-IN
Name  | ADDRESS (City, Year-moved-in, Year-left)
Ashok | Kolkata, 2007, 2010
      | Delhi, 2011, 2015
Ajay  | Mumbai, 2000, 2004
      | Chennai, 2005, 2009
The relation is not in 1NF and can be normalized by replacing the non-simple domain with simple domains.
The normalized form of LIVED-IN is given in Table 3.7.2.
Table 3.7.2. LIVED-IN
Name  | City    | Year-moved-in | Year-left
Ashok | Kolkata | 2007          | 2010
Ashok | Delhi   | 2011          | 2015
Ajay  | Mumbai  | 2000          | 2004
Ajay  | Chennai | 2005          | 2009
2. Second Normal Form (2NF):
a. A relation R is in 2NF if and only if it is in 1NF and every non-key attribute is fully dependent on the
primary key.
b. A relation R is in 2NF if every non-prime attribute of R is fully functionally dependent on each relation
key.
For example: The relation flight (flight#, Type_of_aircraft, date, source, destination) with the functional
dependencies given below is not in 2NF:
flight# → Type_of_aircraft
flight#, date → source, destination
Here (flight#, date) is the key, but Type_of_aircraft depends only on flight#.
To convert the relation flight (flight#, Type_of_aircraft, date, source, destination) into 2NF, break it
into two relations:
flight1 (flight#, Type_of_aircraft)
flight2 (flight#, date, source, destination)
3. Third Normal Form (3NF):
a. A relation R is in 3NF if and only if, for all time, each tuple of R consists of a primary key value that
identifies some entity in the database, together with mutually independent attribute values that describe it.
b. A relation schema R is in 3NF with respect to a set F of functional dependencies if, for all functional
dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following
holds:
i. α → β is a trivial functional dependency.
ii. α is a super key for R.
iii. Each attribute A in β – α is contained in a candidate key for R.
For example: Let us consider a relation R(B, E, F, G, H) with candidate key BE and functional
dependencies B → F and F → GH. The relation R has a transitive dependency, as B → F and F → GH give
B → GH. So R is not in 3NF. To convert relation R to 3NF, break the relation R into two relations R1(B, E,
F) and R2(F, G, H).
4. Boyce-Codd Normal Form (BCNF):
a. A relation R is in BCNF if and only if every determinant is a candidate key.
b. A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional
dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following
holds:
i. α → β is a trivial functional dependency (i.e., β ⊆ α).
ii. α is a super key for schema R.
c. A database design is in BCNF if each member of the set of relation schemas that constitute the design is
in BCNF.
For example: Consider a relation R(A, B, C, D, E) with AC as primary key and functional dependencies
A → B and C → DE.
To convert relation R into BCNF, break the relation into three relations: R1(A, B), R2(C, D, E) and R3(A, C).
8. Consider the universal relational schema R (A, B, C, D, E, F, G, H, I, J) and the set of
functional dependencies F = {AB → C, A → DE, B → F, F → GH, D → IJ}. Determine
the keys for R. Decompose R into 2nd normal form.
Answer
(AB)+ = ABC (using AB → C)
= ABCDE (using A → DE)
= ABCDEF (using B → F)
= ABCDEFGH (using F → GH)
= ABCDEFGHIJ (using D → IJ)
So, AB is the key of R.
In the given relation, R has the composite primary key [A, B]. The non-prime attributes are [C, D, E, F, G, H, I,
J].
Here the FDs A → DE and B → F have determinants that are only part of the primary key, so the table
does not satisfy 2NF.
To bring this table to 2NF, we break it into three relations:
R1(A, B, C), R2(A, D, E, I, J) and R3(B, F, G, H).
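The key computation above can be verified mechanically with a small attribute-closure sketch: (AB)+ reaches all ten attributes of R, while A alone does not.

```python
# Verifying the key computation: (AB)+ covers all of R, but A+ does not
# (a standard attribute-closure helper, written out for this example).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set('ABCDEFGHIJ')
F = [({'A', 'B'}, {'C'}), ({'A'}, {'D', 'E'}), ({'B'}, {'F'}),
     ({'F'}, {'G', 'H'}), ({'D'}, {'I', 'J'})]

print(closure({'A', 'B'}, F) == R)  # True: AB is a key of R
print(sorted(closure({'A'}, F)))    # ['A', 'D', 'E', 'I', 'J']: A alone is not
```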
9. Write the difference between BCNF and 3NF.
Answer:
S.No. | BCNF                                               | 3NF
1.    | For any FD A → B of a relation R, A should be a    | There should be no transitive dependency, i.e., no
      | super key of the relation.                         | non-prime attribute should be transitively
      |                                                    | dependent on the candidate key.
2.    | It is comparatively stronger than 3NF.             | It is less strong than BCNF.
3.    | The functional dependencies are already in 1NF,    | The functional dependencies are already in 1NF
      | 2NF and 3NF.                                       | and 2NF.
4.    | Redundancy is low.                                 | Redundancy is high.
5.    | All functional dependencies may or may not be      | All functional dependencies are preserved.
      | preserved.                                         |
6.    | It is difficult to achieve.                        | It is comparatively easier to achieve.
7.    | Lossless decomposition is hard to achieve in BCNF. | Lossless decomposition can be achieved by 3NF.
10. Prove that BCNF is stricter than 3NF.
OR
Prove that BCNF is stronger than 3NF.
Answer:
1. A relation R is in 3NF iff every dependency X → A satisfied by R meets at least one of the following
conditions:
a. X → A is trivial (i.e., A is a subset of X),
b. X is a super key for R, or
c. A is a key attribute of R. BCNF does not permit the third of these options.
2. BCNF identifies some of the anomalies that are not addressed by 3NF.
3. A relation in BCNF is also in 3NF, but the converse is not true.
Hence, BCNF is stricter / stronger than 3NF.
11. Write the difference between 3NF and BCNF. Find the normal form of relation R (A, B, C,
D, E) having FD set F = {A → B, BC → E, ED → A}.
Answer:
Difference: Refer Q.9
Numerical:
Given: R (A, B, C, D, E) and
F = {A → B, BC → E, ED → A}
(ACD)+ = ACDB (using A → B)
= ABCDE (using BC → E)
So, ACD is a key of R. Similarly, (BCD)+ = ABCDE and (CDE)+ = ABCDE, so BCD and CDE are also
candidate keys, and hence every attribute of R is prime.
Since every FD in F has only prime attributes on its right-hand side, R is in 3NF. However, R is not in
BCNF because, for example, in A → B the determinant A is not a super key. Hence the highest normal
form of R is 3NF.
12. Explain inclusion dependencies.
Answer:
1. An inclusion dependency R.X < S.Y between two sets of attributes, X of relation schema R and Y of
relation schema S, specifies the constraint that, at any specific time when r is a relation state of R and s a
relation state of S, we must have
πX(r(R)) ⊆ πY(s(S))
2. The set of attributes on which the inclusion dependency is specified X of R and Y of S must have the same
number of attributes. Also domains for each pair of corresponding attributes should be compatible.
3. Inclusion dependencies are defined in order to formalize two types of inter relational constraints:
a. The foreign key (or referential integrity) constraint cannot be specified as a functional or multivalued
dependency because it relates attributes across relations.
b. The constraint between two relations that represent a class/subclass relationship also has no formal
definition in terms of the functional, multivalued, and join dependencies.
4. For example, if X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn}, one possible correspondence is to have
dom(Ai) compatible with dom(Bi) for 1 ≤ i ≤ n. In this case, we say that Ai corresponds to Bi.
13. Describe lossless decomposition.
OR
Define functional dependency. What do you mean by lossless decomposition? Explain with
suitable example how functional dependencies can be used to show that decompositions are
lossless.
Answer:
Functional dependency: Refer Q.1
Lossless decomposition: A decomposition {R1, R2, ... Rn} of a relation R is called a lossless
decomposition for R if the natural join of R1, R2, ..., Rn produces exactly the relation R.
Following are the conditions to show that a decomposition is lossless using the FD set:
1. The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must
be either in R1 or in R2:
Att(R1) ∪ Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty:
Att(R1) ∩ Att(R2) ≠ ∅
3. The common attributes must form a key of at least one relation (R1 or R2):
Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
For example:
Consider a relation R (A, B, C, D) with FD set {A → BC, A → D} decomposed into R1(A, B, C) and
R2(A, D), which is a lossless join decomposition as:
1. The first condition holds:
Att(R1) ∪ Att(R2) = (A, B, C) ∪ (A, D) = (A, B, C, D) = Att(R).
2. The second condition holds:
Att(R1) ∩ Att(R2) = (A, B, C) ∩ (A, D) = (A) ≠ ∅
3. The third condition holds:
Att(R1) ∩ Att(R2) = A is a key of R1(A, B, C) because A → BC.
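The three conditions can be checked mechanically for this example; the sketch below inlines a standard attribute-closure helper:

```python
# Checking the three lossless-join conditions for R1(A, B, C), R2(A, D)
# with F = {A -> BC, A -> D} (attribute-closure helper inlined).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = {'A', 'B', 'C', 'D'}
R1, R2 = {'A', 'B', 'C'}, {'A', 'D'}
F = [({'A'}, {'B', 'C'}), ({'A'}, {'D'})]

common = R1 & R2
cond1 = (R1 | R2) == R          # union of attributes equals R
cond2 = len(common) > 0         # non-empty intersection
cond3 = R1 <= closure(common, F) or R2 <= closure(common, F)  # key of one side
print(cond1 and cond2 and cond3)  # True: the decomposition is lossless
```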
14. Consider the relation r(X, Y, Z, W, Q), the set F = {X → Z, Y → Z, Z → W, WQ → Z, ZQ → X} and
the decomposition of r into relations R1(X, W), R2(X, Y), R3(Y, Q), R4(Z, W, Q) and R5(X,
Q). Check whether the decomposition is lossy or lossless.
Answer:
The pairwise intersection test applies only to decompositions into two relations, so for this five-way
decomposition we use the chase (tableau) test: build one row per Ri, with 'a' symbols in the attributes
of Ri and distinct 'b' symbols elsewhere, and repeatedly equate symbols using the FDs.
1. X → Z equates the Z-symbols of the rows for R1, R2 and R5; Y → Z then brings the row for R3 to the
same Z-symbol.
2. Z → W gives these four rows the W-symbol aW of the R1 row; the R4 row already has aW, so every
row now has aW.
3. WQ → Z: the rows for R3, R4 and R5 now agree on W and Q, so their Z-symbols are equated with aZ of
the R4 row, which propagates aZ to every row.
4. ZQ → X: the rows for R3, R4 and R5 agree on Z and Q, so they all receive aX from the R5 row.
The row for R3 now contains aX, aY, aZ, aW, aQ, i.e., it is an all-'a' row. Hence the decomposition is
lossless.
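The chase can also be run mechanically; the sketch below builds one tableau row per Ri and applies the FDs until no symbol changes, then looks for an all-'a' row:

```python
# Chase test for losslessness of the five-way decomposition: one tableau
# row per Ri; FDs equate symbols until a fixpoint is reached.
ATTRS = ['X', 'Y', 'Z', 'W', 'Q']
DECOMP = [{'X', 'W'}, {'X', 'Y'}, {'Y', 'Q'}, {'Z', 'W', 'Q'}, {'X', 'Q'}]
FDS = [({'X'}, {'Z'}), ({'Y'}, {'Z'}), ({'Z'}, {'W'}),
       ({'W', 'Q'}, {'Z'}), ({'Z', 'Q'}, {'X'})]

# 'a' symbols for attributes in Ri, distinct 'b' symbols elsewhere.
tableau = [{a: ('a', a) if a in ri else ('b', i, a) for a in ATTRS}
           for i, ri in enumerate(DECOMP)]

changed = True
while changed:
    changed = False
    for lhs, rhs in FDS:
        for r1 in tableau:
            for r2 in tableau:
                if r1 is not r2 and all(r1[a] == r2[a] for a in lhs):
                    for a in rhs:
                        if r1[a] != r2[a]:
                            # equate the two symbols, preferring the 'a' one
                            keep, drop = sorted([r1[a], r2[a]])
                            for row in tableau:
                                for c in ATTRS:
                                    if row[c] == drop:
                                        row[c] = keep
                            changed = True

lossless = any(all(row[a] == ('a', a) for a in ATTRS) for row in tableau)
print(lossless)  # True
```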
15. What is normalization? Explain.
OR
Write a short note on normalization with advantages.
Answer:
1. Normalization is the process of reducing data redundancy in a relational database.
2. Normalization is a refinement process that the database designer undertakes. After identifying the data
objects of the proposed database, their relationships define the tables required and columns within each
table.
3. The fundamental principle of normalization is, “The same data should not be stored in multiple places.”
No information is lost in the process; however, the number of tables generally increases as the rules are
applied.
Types of normalization: Refer Q. 7
Advantages:
1. It helps to remove the redundancy from the relation.
2. It helps in easy manipulation of data.
3. It helps to provide more information to the user.
4. It eliminates modification anomalies.
16. What is MVD and join dependency? Describe.
OR
Write a short note on MVD or JD.
OR
Describe the multivalued dependency.
Answer:
Multivalued Dependency (MVD):
1. MVD occurs when two or more independent multivalued facts about the same attribute occur within the
same relation.
2. MVD is denoted by X →→ Y, specified on relation schema R, where X and Y are both subsets of R.
3. X →→ Y specifies the following constraint on any relation state r of R: if two tuples t1 and t2 exist in
r such that t1(X) = t2(X), then two tuples t3 and t4 should also exist in r with the following properties,
where we use Z to denote (R – (X ∪ Y)):
t3(X) = t4(X) = t1(X) = t2(X)
t3(Y) = t1(Y) and t3(Z) = t2(Z)
t4(Y) = t2(Y) and t4(Z) = t1(Z)
4. An MVD X →→ Y in R is called a trivial MVD if
a. Y is a subset of X, or
b. X ∪ Y = R
An MVD that satisfies neither (a) nor (b) is called a non-trivial MVD.
For example:
Relation with MVDs Faculty →→ Subject and Faculty →→ Committee:
Faculty Subject Committee
John DBMS Placement
John Networking Placement
John MIS Placement
John DBMS Scholarship
John Networking Scholarship
John MIS Scholarship
Join Dependency (JD):
1. A Join Dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation scheme R, specifies a
constraint on the states r of R.
2. The constraint states that every legal state r of R should have a lossless join decomposition into R1,
R2, ..., Rn. That is, for every such r, we have
⋈ (πR1(r), πR2(r), ..., πRn(r)) = r
3. A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation
schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
4. Such a dependency is called trivial because it has the lossless join property for any relation state r of R
and hence does not specify any constraint on R.

17. Explain the fourth and fifth normal forms with suitable examples.
Answer:
Fourth Normal Form (4NF):
1. A table is in 4NF if it is in BCNF and contains no non-trivial multivalued dependencies, other than those
whose determinant is a superkey.
2. Formally, a relation schema R is in 4NF with respect to a set of dependencies F (that includes FDs and
multivalued dependencies) if, for every non-trivial multivalued dependency X →→ Y in F+, X is a superkey for R.
For example: A faculty member teaches multiple courses and serves on several committees. The relation
FACULTY (FACULTY, COURSE, COMMITTEE) is in BCNF, since all three attributes concatenated together
constitute its key. The rule for decomposition is to split the offending table into two, with the
multi-determinant attribute or attributes as part of the key of both. In this case, to put the relation in 4NF,
two separate relations are formed as follows:
FACULTY_COURSE (FACULTY, COURSE)
FACULTY_COMMITTEE (FACULTY, COMMITTEE)

FACULTY_COURSE             FACULTY_COMMITTEE
Faculty    Course          Faculty    Committee
John       DBMS            John       Placement
John       Networking      John       Scholarship
John       MIS

Fifth Normal Form (5NF):
1. A relation is in 5NF if it is in 4NF and cannot be further decomposed without loss of information.
2. In 5NF, we use the concept of join dependency which is a generalized form of multivalued dependency.
3. A relation schema R is in 5NF, or Project-Join Normal Form (PJNF), with respect to a set F of functional,
multivalued and join dependencies if, for every non-trivial join dependency JD (R1, R2, ..., Rn) in F+ (that is,
implied by F), every Ri is a superkey of R.
For example:
Company Product Supplier
Godrej Soap Mr. X
Godrej Shampoo Mr. X
Godrej Shampoo Mr. Y
Godrej Shampoo Mr. Z
H.Lever Soap Mr. X
H.Lever Soap Mr. Y
H.Lever Shampoo Mr. Y

The table is in 4NF as it contains no non-trivial multivalued dependency.
If we decompose the table into two parts, joining them back produces spurious tuples. Suppose the table is
decomposed into two parts as:

Company_Product            Company_Supplier
Company     Product        Company     Supplier
Godrej      Soap           Godrej      Mr. X
Godrej      Shampoo        Godrej      Mr. Y
H. Lever    Soap           Godrej      Mr. Z
H. Lever    Shampoo        H. Lever    Mr. X
                           H. Lever    Mr. Y

The redundancy has been eliminated, but the join of the two tables now contains tuples (for example,
Godrej–Soap–Mr. Y) that were not in the original, so information is lost. Now suppose the original table is
decomposed into three parts, Company_Product, Company_Supplier and Product_Supplier, which is as
follows:
Product_Supplier
PRODUCT SUPPLIER
Soap Mr. X
Soap Mr. Y
Shampoo Mr. X
Shampoo Mr. Y
Shampoo Mr. Z

So, it is clear that if a table is in 4NF and cannot be further decomposed without loss, it is said to be in 5NF.
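The lossy two-way versus lossless three-way decomposition can be demonstrated with plain Python sets. This is a sketch with hypothetical sample data chosen so that the join dependency *(Company_Product, Company_Supplier, Product_Supplier) genuinely holds: the two-way join yields a spurious tuple, while the three-way join returns exactly the original relation.

```python
rows = {("Godrej", "Soap", "Mr. Y"), ("Godrej", "Soap", "Mr. X"),
        ("Godrej", "Shampoo", "Mr. X"), ("H.Lever", "Soap", "Mr. X")}

# the three binary projections of the Company-Product-Supplier relation
cp = {(c, p) for c, p, s in rows}        # Company_Product
cs = {(c, s) for c, p, s in rows}        # Company_Supplier
ps = {(p, s) for c, p, s in rows}        # Product_Supplier

# two-way natural join on Company: a spurious tuple appears
two_way = {(c, p, s) for (c, p) in cp for (c2, s) in cs if c == c2}
print(two_way - rows)        # {('Godrej', 'Shampoo', 'Mr. Y')}

# three-way join: additionally match against Product_Supplier
three_way = {t for t in two_way if (t[1], t[2]) in ps}
print(three_way == rows)     # True: the original relation is reconstructed
```

The spurious tuple (Godrej, Shampoo, Mr. Y) survives the two-way join but is filtered out by the third projection, which is exactly what the join dependency guarantees.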

18. What is meant by the attribute preservation condition on decomposition? Given relation
R (A, B, C, D, E) with the functional dependencies F = {AB → CD, A → E, C → D} and the
decomposition of R into R1(A, B, C), R2(B, C, D), R3(C, D, E), check whether the decomposition is
lossy or lossless.
Answer:
Attribute preservation condition on decomposition:
1. The relational database design algorithms start from a single universal relation schema R = {A1, A2, ...,
An} that includes all the attributes of the database.
2. We implicitly make the universal relation assumption, which states that every attribute name is unique.
3. Using the functional dependencies, the algorithms decompose the universal relation schema R into a set
of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema; D is called a
decomposition of R.
4. Each attribute in R must appear in at least one relation schema Ri in the decomposition so that no
attributes are lost; formally, we have
R1 ∪ R2 ∪ ... ∪ Rm = R
This is called the attribute preservation condition of decomposition.
Numerical:
The initial matrix has one row per relation schema, with an "a" symbol in every column whose attribute
belongs to that schema:

        A      B      C      D      E
R1      a1     a2     a3     b14    b15
R2      b21    a2     a3     a4     b25
R3      b31    b32    a3     a4     a5

Applying C → D: all three rows agree on C (= a3), so their D symbols are equated and R1's b14 becomes a4.
Neither AB → CD nor A → E can ever be applied, since no two rows agree on A. At this fixed point no row
consists entirely of "a" symbols, so the decomposition is lossy.
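The tableau (chase) construction above can be automated. The sketch below builds the initial matrix and applies the FDs until nothing changes; a decomposition is lossless exactly when some row ends up with "a" symbols in every column. The string encodings of attributes and FDs are illustrative. Run on this numerical, no row becomes all-"a", confirming that the decomposition is lossy.

```python
def lossless_join(attrs, fds, decomposition):
    """Chase test for a lossless-join decomposition.
    attrs: iterable of attribute names; fds: list of (lhs, rhs) pairs;
    decomposition: list of attribute sets, one per relation schema."""
    # row i holds symbol ('a', A) if A is in R_i, else a distinct ('b', i, A)
    rows = [{A: ("a", A) if A in Ri else ("b", i, A) for A in attrs}
            for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if all(r1[A] == r2[A] for A in lhs):
                        for A in rhs:
                            if r1[A] != r2[A]:
                                # equate the two symbols, preferring the 'a'
                                new = r1[A] if r1[A][0] == "a" else r2[A]
                                old = (r1[A], r2[A])
                                for r in rows:
                                    if r[A] in old:
                                        r[A] = new
                                changed = True
    return any(all(r[A][0] == "a" for A in attrs) for r in rows)

fds = [("AB", "CD"), ("A", "E"), ("C", "D")]
print(lossless_join("ABCDE", fds, ["ABC", "BCD", "CDE"]))  # False: lossy
```

As a sanity check, R(A, B, C) with A → B decomposed into (A, B) and (A, C) is reported lossless, as expected.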

19. What are the alternate approaches to database design?
OR
Describe dangling tuples.
Answer:
An alternate approach to database design is dangling tuples:
1. Tuples that “disappear” in computing a join are known as dangling tuples.
a. Let r1(R1), r2(R2), ..., rn(Rn) be a set of relations.
b. A tuple t of relation ri is a dangling tuple if t is not in the relation:
ΠRi(r1 ⋈ r2 ⋈ ... ⋈ rn)
2. The relation r1 ⋈ r2 ⋈ ... ⋈ rn is called a universal relation, since it involves all the attributes in
the “universe” defined by R1 ∪ R2 ∪ ... ∪ Rn.
3. If dangling tuples are allowed in the database, instead of decomposing a universal relation, we may prefer
to synthesize a collection of normal form schemas from a given set of attributes.

UNIT-4:
TRANSACTION PROCESSING CONCEPT
SHORT ANSWER TYPE QUESTIONS
1. Define transaction.
Ans. A collection of operations that form a single logical unit of work is called a transaction. The operations
that make up a transaction typically consist of requests to access existing data, modify existing data, add
new data or any combination of these requests.
2. Define the term ACID properties.
Ans. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions
intended to guarantee validity even in the event of errors, power failures, etc.
3. State the properties of transaction.
Ans. ACID properties of transaction:
i. Atomicity
ii. Consistency
iii. Isolation
iv. Durability
4. Explain I in ACID property.
Ans. I in ACID property stands for isolation i.e., each transaction is unaware of other transaction executing
concurrently in the system.
5. What is serializability? How it is tested?
Ans. Serializability is the classical concurrency scheme which ensures that a schedule for executing
concurrent transaction serially in same order. Serializability is tested by constructing precedence graph.
6. Define schedule.
Ans. A schedule is a list of operations (actions) ordered by time, performed by a set of transactions that are
executed together in the system.
7. What do you mean by serial schedule?
Ans. Serial schedule is a schedule in which transactions in the schedule are defined to execute one after the
other.
8. Define replication in distributed database.
Ans. Replication is a technique used in distributed databases to store multiple copies of a data table at
different sites.
9. Define data atomicity.
Ans. Data atomicity is one of the transaction properties; it specifies that either all operations of the
transaction are reflected properly in the database or none are.
10. Define cascading rollback and blind writes.
Ans. Cascading rollback is a situation in which failure of single transaction leads to a series of transaction
rollbacks. Blind writes are those write operations which are performed without performing the read
operation.
11. Define precedence graph.
Ans. A precedence graph is a directed graph G = (N, E) where N = {T1, T2, .... , Tn} is a set of nodes and E =
{e1, e2 .... en} is a set of directed edges.
12. Give types of failures.
Ans. Types of failures:
i. Transaction failure
ii. System crash
iii. Disk failure
13. Give the idea behind shadow paging technique.
Ans. The key idea behind shadow paging technique is to maintain following two page tables during the life
of transaction:
i. Current page table
ii. Shadow page table
14. Give merits and demerits of shadow paging.
Ans. Merits of shadow paging:
i. The overhead of log record output is eliminated.
ii. Recovery from crashes is significantly faster.
Demerits:
i. Commit overhead
ii. Data fragmentation
iii. Garbage collection
15. What is multimedia database?
Ans. Multimedia database provides features that allow users to store and query different types of
multimedia information, which includes images (pictures or drawings), video clips (movies, news reels,
home video), audio clips (songs, phone messages, speeches) and documents (books, articles).
16. Why is it desirable to have concurrent execution of multiple transactions?
Ans. It is desirable to have concurrent execution of multiple transaction:
i. To increase system throughput.
ii. To reduce average response time.
17. What do you mean by conflict serializable schedule?
Ans. A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping
non-conflicting operations.
18. Define concurrency control.
Ans. Concurrency Control (CC) is a process to ensure that data is updated correctly and appropriately
when multiple transactions are concurrently executed in DBMS.

LONG ANSWER TYPE QUESTIONS
1. Write a short note on transaction.
Answer:
1. A transaction is a logical unit of database processing that includes one or more database access
operations; these include insertion, deletion, modification or retrieval operations.
2. The database operations that form a transaction can be embedded within an application program.
3. By specifying explicit begin transaction and end transaction we can specify the transaction boundaries.
4. If the database operations in a transaction do not update the database but only retrieve data, the
transaction is called a read-only transaction.

2. Explain ACID properties of transaction.
OR
What do you mean by transaction? Explain transaction property with detail and suitable
example.
OR
What do you understand by ACID properties of transaction? Explain in details.
OR
Define transaction and explain its properties with suitable example.
Answer:
Transaction: Refer Q. 1
To ensure integrity of data, the database system maintains some properties of transaction. These properties
are known as ACID properties. Let us consider an example for the set of operations:
1. Deduct the amount Rs.500 from A’s account.
2. Add amount Rs. 500 to B’s account.
ACID properties are as follows:
1. Atomicity: It implies that either all of the operations of the transaction should execute or none of them
should occur.
Example: All operations in this set must be done. If the system fails to add the amount in B’s account after
deducting from A’s account, revert the operation on A’s account.
2. Consistency: The state of database before the execution of transaction and after the execution of
transaction should be same.
Example: Let us consider the initial value of accounts A and B are Rs.1000 and Rs.1500. Now, account A
transfer Rs. 500 to account B.
Before transaction: A + B = 1000 + 1500 = 2500
After transaction: A + B = 500 + 2000 = 2500
Since, total amount before transaction and after transaction are same. So, this transaction preserves
consistency.
3. Isolation: A transaction must not affect other transactions that are running parallel to it.
Example: Let us consider another account C. If there is any ongoing transaction between C and A, it
should not make any effect on the transaction between A and B. Both the transactions should be isolated.
4. Durability: Once a transaction is completed successfully. The changes made by transaction persist in
database.
Example: Suppose the system crashes after completion of all the operations. When the system restarts, it
should preserve the stable state: the balances of accounts A and B should be the same before and after the
restart.
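The transfer scenario above can be sketched with Python's built-in SQLite driver, whose transactions provide exactly these guarantees. This is an illustration, not a DBMS internal: the CRASH marker is an artificial failure injected to exercise the rollback path, and the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 1000), ("B", 1500)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates or neither."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # simulate a crash between the two writes to exercise rollback
        if dst == "CRASH":
            raise RuntimeError("system failure")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()   # atomicity: undo the partial debit

transfer(conn, "A", "B", 500)       # succeeds: A = 500, B = 2000
transfer(conn, "A", "CRASH", 100)   # fails mid-way: balances unchanged
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
print(total)  # 2500 -> consistency preserved
```

The failed transfer leaves no partial debit behind, and the total of 2500 is the same before and after, mirroring the consistency example in the text.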

3. Write and describe ACID properties of transaction. How does the recovery manager
ensure atomicity of transactions? How does it ensure durability?
Answer:
ACID properties: Refer Q. 2
Ensuring the atomicity:
1. To ensure atomicity, database system keeps track of the old values of any data on which a transaction
performs a write.
2. If the transaction does not complete its execution, the database system restores the old values.
3. Atomicity is handled by transaction management component.
Ensuring the durability:
1. Ensuring durability is the responsibility of a component called the recovery management component.
2. The durability property guarantees that, once a transaction completes successfully, all the updates that it
carried out on the database persist, even if there is a system failure after the transaction completes
execution.

4. List the ACID properties. Explain the usefulness of each property.
Answer:
ACID properties of transaction: Refer Q. 2
Usefulness of ACID properties:
Atomicity: Atomicity is useful to ensure that if for any reason an error occurs and the transaction is
unable to complete all of its steps, then the system is returned to the state it was in before the transaction
was started.
Consistency: The consistency property is useful to ensure that a transaction takes the database from one
consistent state to another consistent state.
Isolation: Isolation property is useful to ensure that a transaction should appear isolated from other
transactions, even though many transactions are executing concurrently.
Durability: Durability is useful to ensure that the changes applied to the database by a committed
transaction must persist in the database.
5. Explain transaction state in brief.
OR
What is transaction? Draw a state diagram of a transaction showing its states. Explain ACID
properties of a transaction with suitable examples.
OR
Draw a transaction state diagram and describe the states that a transaction goes through
during execution.
Answer:
Transaction: Refer Q. 1
State diagram of transaction:
1. Active: The transaction is said to be in the active state till the final statement is executed.

Fig. 1.
2. Partially committed: A transaction enters the partially committed state when its final statement has been
executed. It may still have to be aborted, since its updates may still reside in main memory, where a
power failure would wipe them out.
3. Failed: A transaction enters the failed state after the system determines that the transaction can no longer
proceed with its normal execution.
4. Aborted: A transaction enters this state after it has been rolled back and the database has been restored
to its state prior to the start of the transaction.
5. Committed: A transaction enters this state after successful completion.
ACID properties with example: Refer Q. 2

6. How can you implement atomicity in transactions?
Answer:
Implementation of atomicity in transaction can be done in two ways:
1. Completeness:
a. All of the operations encapsulated within a database transaction represent an atomic unit of work.
b. According to atomicity either all of transaction will run to completion (Commit) or none of them.
c. There will not be any partial transaction in left over state from incomplete execution of one or more
operations in a transaction.
d. If the user decides to cancel everything (Rollback), all of the changes made by the transaction will be
undone and the state would be as if the transaction never began by using undo operation.
e. For every change made by operations in the database, it logs undo data to be used to roll back the effects
of operations.
2. Mutual exclusion/locking:
a. Only one transaction will be allowed to progress by taking an exclusive lock on the particular data item.
b. The lock will not be released until the transaction ends (either through rollback, commit or abort).
c. Any other concurrent transaction interested in updating the same row will have to wait.
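The mutual-exclusion idea above can be illustrated with a thread lock in Python (a sketch of exclusive locking, not of a real DBMS lock manager): a thousand concurrent withdrawals serialize on the lock, so no update is lost.

```python
import threading

balance = 1000
lock = threading.Lock()   # exclusive lock on the shared data item

def withdraw(amount):
    global balance
    with lock:               # only one transaction may hold the lock;
        current = balance    # others block here until it is released
        balance = current - amount

threads = [threading.Thread(target=withdraw, args=(1,)) for _ in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 0: every withdrawal took effect, no lost updates
```

Without the lock, two threads could read the same `current` value and one withdrawal would be lost, which is exactly the anomaly exclusive locking prevents.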

7. What is serializability? Why is serializability required? Write a short note on serializability
of schedule.
Answer:
Serializability: Serializability is a property of a transaction schedule which ensures that the data items
remain in a consistent state. It is the classical concurrency scheme.
Serializability is required:
1. To control concurrent execution of transaction.
2. To ensure that the database state remains consistent.
Serializability of schedule:
1. In DBMS, the basic assumption is that each transaction preserves database consistency.
2. Thus, the serial execution of a set of transaction preserves database consistency.
3. A concurrent schedule is serializable if it is equivalent to a serial schedule.
8. Discuss conflict serializability with example.
Answer:
1. Consider a schedule S in which there are two consecutive instructions Ii and Ij of transactions Ti and Tj
respectively (i ≠ j).
2. If Ii and Ij refer to different data items, then swap Ii and Ij without affecting the results of any instruction
in the schedule.
3. However, if Ii and Ij refer to the same data item Q, then the order of the two steps matter.
4. Following are four possible cases:
Ii Ij Swapping possible
Read (Q) Read (Q) Yes
Read (Q) Write (Q) No
Write (Q) Read (Q) No
Write (Q) Write (Q) No

5. Ii and Ij conflict if there are operations by different transactions on the same data item, and at least one of
these instructions is a write operation.
For example:
Schedule S
T1 T2
read (A)
write (A)
read (A)
write(A)
read (B)
write (B)
read (B)
write (B)

i. The write (A) instruction of T1 conflicts with read (A) instruction of T2. However, the write (A) instruction
of T2 does not conflict with the read (B) instruction of T1 as they access different data items.
Schedule S'
T1 T2
read (A)
write (B)
read (A)
read (B)
write(A)
write (B)
read (B)
write (B)

ii. Since the write (A) instruction of T2 in Schedule S’ does not conflict with the read (B) instruction of T1,
we can swap these instructions to generate an equivalent schedule.
iii. Both schedules will produce the same final system state.
6. If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting instructions,
we say that S and S’ are conflict equivalent.
7. The concept of conflict equivalence leads to the concept of conflict serializability and the schedule S is
conflict serializable.

9. Explain view serializability with example.
Answer:
1. The schedules S and S’ are said to be view equivalent if the following three conditions are met:
a. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti in
schedule S’, must also read the initial value of Q.
b. For each data item Q if transaction Ti executes read (Q) in schedule S and if that value produced by a
write (Q) operation executed by transaction Tj, then the read (Q) operation of transaction Ti, in schedule S’,
must also read the value of Q that was produced by the same write (Q) operation of transaction Tj.
c. For each data item Q, the transaction (if any) that performs the final write (Q) operation in schedule S
must perform the final write (Q) operation in schedule S’.
2. Conditions (a) and (b) ensure that each transaction reads the same values in both schedules and
therefore, performs the same computation. Condition (c), coupled with condition (a) and condition (b)
ensure that both schedules result in the same final system state.
3. The concept of view equivalence leads to the concept of view serializability.
4. We say that schedule S is view serializable, if it is view equivalent to serial schedule.
5. Every conflict serializable schedule is also view serializable but there are view serializable schedules that
are not conflict serializable.
Example:
Schedule S1 (serial)                  Schedule S2
T1          T2                        T1          T2
read (A)                              read (A)
write (A)                             write (A)
read (B)                                          read (A)
write (B)                                         write (A)
            read (A)                  read (B)
            write (A)                 write (B)
            read (B)                              read (B)
            write (B)                             write (B)
Schedule S1 and S2 are view equivalent as:
1. T1 reads initial value of data item A in S1 and S2.
2. T2 reads value of data item A written by T1 in S1 and S2.
3. T2 writes final value of data item A in S1 and S2.

10. What is schedule? Define the concept of recoverable, cascade less and strict schedules.
Answer:
Schedule: A schedule is a set of transaction with the order of execution of instruction in the transaction.
Recoverable schedule:
A recoverable schedule is one in which for each pair of transaction Ti and Tj if Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
For example: In schedule S below, T2 reads the value of A written by T1. Let T2 commit immediately after
executing read (A), i.e., T2 commits before T1 does. Now let T1 fail before it commits; we must abort T2 to
ensure transaction atomicity. But as T2 has already committed, it cannot be aborted. In this situation, it is
impossible to recover correctly from the failure of T1, so the schedule is not recoverable.
Schedule S
T1          T2
read (A)
write (A)
            read (A)
read (B)

Cascade less schedule:
1. A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
2. Even if a schedule is recoverable, to recover correctly from the failure of a transaction Ti, we may have to
rollback several transactions. Such situations occur if transactions have read data written by Ti.
Strict schedule:
1. A schedule is called strict if every value written by a transaction T is not read or changed by another
transaction until T either aborts or commits.
2. A strict schedule avoids cascading rollbacks and is recoverable.

11. What is precedence graph? How can it be used to test the conflict serializability of a
schedule?
Answer:
Precedence graph:
1. A precedence graph is a directed graph G = (N, E) that consists of a set of nodes N = {T1, T2, ..., Tn} and
a set of directed edges E = {e1, e2, ..., em}.
2. There is one node in the graph for each transaction Ti in the schedule.
3. Each edge ei in the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei
and Tk is the ending node of ei.
4. Such an edge is created if one of the operations in Tj appears in the schedule before some conflicting
operation in Tk.
Algorithm for testing conflict serializability of schedule S:
a. For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
b. For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti
→ Tj) in the precedence graph.
c. For each case in S where Tj executes a write_item(X) after Ti executes read_item(X), create an edge (Ti
→ Tj) in the precedence graph.
d. For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge
(Ti → Tj) in the precedence graph.
e. The schedule S is serializable if and only if the precedence graph has no cycles.
5. The precedence graph is constructed as described in given algorithm.
6. If there is a cycle in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle, S is
serializable.
7. In the precedence graph, and edge from Ti to Tj means that transaction Ti must come before transaction
Tj in any serial schedule that is equivalent to S, because two conflicting operations appear in the schedule in
that order.
8. If there is no cycle in the precedence graph, we can create an equivalent serial schedule S’ that is
equivalent to S, by ordering the transactions that participate in S as follows: Whenever an edge exists in the
precedence graph from Ti to Tj, Ti must appear before Tj in the equivalent serial schedule S’.
Example:
T1 T2 T3
read (Y);
read (Z);
read (X);
write (X);
write (Y);
write (Z);
read (Z);
read (Y);
write (Y);
read (Y);
write (Y);
read (X);
write (X);

Fig. 4.11.1. Equivalent serial schedules T3 → T1 → T2.
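The algorithm in steps (a)–(e) can be implemented directly. The sketch below encodes a schedule as (transaction, operation, item) triples in time order, adds an edge for every conflicting pair, and tests for cycles; the sample schedule is hypothetical but consistent with the serial order T3 → T1 → T2 of Fig. 4.11.1.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """schedule: list of (txn, op, item) triples, op in {'r', 'w'}, time order."""
    edges = defaultdict(set)
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            # conflicting pair: different txns, same item, at least one write
            if ti != tj and x == y and "w" in (op1, op2):
                edges[ti].add(tj)          # ti must precede tj
    return edges

def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2           # depth-first cycle detection
    color = defaultdict(int)
    def visit(n):
        color[n] = GREY
        for m in edges[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and visit(n) for n in list(edges))

s = [("T3", "r", "Y"), ("T3", "r", "Z"), ("T1", "r", "X"), ("T1", "w", "X"),
     ("T3", "w", "Y"), ("T3", "w", "Z"), ("T2", "r", "Z"), ("T2", "r", "Y"),
     ("T2", "w", "Y"), ("T2", "r", "X"), ("T2", "w", "X")]
print(has_cycle(precedence_graph(s)))  # False -> serializable (T3 -> T1 -> T2)
```

Any topological order of the resulting graph (here T3 before T2, and T1 before T2) yields an equivalent serial schedule, exactly as step (8) describes.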

12. Test the serializability of the following schedule:
i. r1(x); r3(x); w1(x); r2(x); w3(x)
ii. r3(x); r2(x); w3(x); r1(x); w1(x)
Answer:
i. The serialization graph is:
Fig. 1.
There is a cycle between T1 and T3 (T1 → T3 because r1(x) precedes w3(x), and T3 → T1 because r3(x)
precedes w1(x)). Hence the schedule is not serializable.
ii. The serialization graph is:
Fig. 2.
There is no cycle, so the schedule is serializable.
The equivalent serial schedule is T2 → T3 → T1, i.e.:
r2(x); r3(x); w3(x); r1(x); w1(x)

13. Discuss cascade less schedule and cascading rollback. Why is cascade less of schedule
desirable?
Answer:
Cascade less schedule: Refer Q.10
Cascading rollback: Cascading rollback is a phenomenon in which the failure of a single transaction leads
to a series of transaction rollbacks.
For example:
Schedule S
T1 T2 T3
read (A)
read (B)
write (A)
read (A)
write (A)
read (A)

In the example, transaction T1 writes a value of A that is read by transaction T2. Transaction T2 writes a
value of A that is read by transaction T3. Suppose that at this point T1 fails. T1 must be rolled back. Since T2
is dependent on T1, T2 must be rolled back, since T3 is dependent on T2, T3 must be rolled back.
Need for cascade less schedules:
Cascade less schedules are desirable because the failure of a transaction does not lead to the aborting of any
other transaction. This comes at the cost of less concurrency.

14. Discuss the rules to be followed while preparing a serializable schedule. Why should we
prefer serializable schedules instead of serial schedules?
Answer:
The set of rules which must be followed for preparing serializable schedule are:
1. Take any concurrent schedule.
2. Draw the precedence graph for concurrent schedule.
3. If there is a cycle in precedence graph then schedule is not serializable.
4. If there is no cycle the schedule is serializable.
5. Prepare serializable schedule using precedence graph.
We prefer serializable schedule instead of serial schedule because:
1. The problem with serial schedule is that it limits concurrency or interleaving of operations.
2. In a serial schedule, if a transaction waits for an I/O operation to complete, we cannot switch the CPU
processor to another transaction, thus wasting valuable CPU processing time.
3. If some transaction T is quite long, the other transactions must wait for T to complete all its operations
before committing.

15. What are schedules? What are differences between conflict serializability and view
serializability? Explain with suitable example what are cascade less and recoverable
schedules?
Answer:
Schedule: Refer Q. 10
Difference between conflict and view serializability:
S. No.  Conflict serializability                       View serializability
1.      Easy to achieve                                Difficult to achieve
2.      Cheaper to test                                Expensive to test
3.      Every conflict serializable schedule           Not every view serializable schedule
        is view serializable                           is conflict serializable
4.      Used in most concurrency control schemes       Not used in concurrency control schemes

Cascade less schedule: Refer Q. 13


Recoverable schedule: Refer Q. 10

16. What is schedule? What are its types? Explain view serializable and cascade less schedule
with suitable example of each.
Answer:
Schedule: Refer Q.10
Types of schedules are:
1. Recoverable schedule
2. Cascade less schedule
3. Strict schedule
View serializable: Refer Q. 9
Cascade less schedule: Refer Q. 10

17. Which of the following schedules are conflicts serializable? For each serializable
schedule find the equivalent schedule.
S1: r1(x); r3(x); w3(x); w1(x); r2(x)
S2: r3(x); r2(x); w3(x); r1(x); w1(x)
S3: r1(x); r2(x); r3(y); w1(x); r2(z); r2(y); w2(y)
Answer:
S1: The conflicts are r1(x) before w3(x) (edge T1 → T3) and r3(x) before w1(x) (edge T3 → T1). The
precedence graph contains a cycle, so S1 is not conflict serializable.
S2: The conflicts give the edges T2 → T3 (r2(x) before w3(x)), T3 → T1 (r3(x) and w3(x) before w1(x))
and T2 → T1 (r2(x) before w1(x)). The precedence graph is acyclic, so S2 is conflict serializable; the
equivalent serial schedule is T2 → T3 → T1.
S3: The only conflicts are r2(x) before w1(x) (edge T2 → T1) and r3(y) before w2(y) (edge T3 → T2). The
precedence graph is acyclic, so S3 is conflict serializable; the equivalent serial schedule is
T3 → T2 → T1.

18. Explain log-based recovery.
OR
What is log? How is it maintained? Discuss the features of deferred database modification
and immediate database modification in brief.
Answer:
1. The log / system log is a sequence of log records, recording all the update activities in the database.
2. Various types of log records are denoted as:
a. <Ti start>: Transaction Ti has started.
b. <Ti, Xj, V1, V2>: Transaction Ti has performed a write on data item Xj. Xj had value V1 before the write,
and will have value V2 after the write.
c. <Ti commit>: Transaction Ti has committed.
d. <Ti abort>: Transaction Ti has aborted.
3. Whenever a transaction performs a write, it is essential that the log record for that write be created
before the database is modified.
Log based recovery: Log based recovery is a method to ensure atomicity using log when failure occurs.
In log-based recovery, following two techniques are used to ensure atomicity and to maintain log:
1. Deferred database modification:
i. The deferred database modification technique ensures transaction atomicity by recording all database
modifications in the log, but deferring the execution of all write operations of a transaction until the
transaction partially commits.
ii When a transaction partially commits, the information on the log associated with the transaction is used
in executing the deferred writes.
Features of deferred database modification:
1. The database is physically updated only when a transaction commits, by replaying its log records.
2. Only the new value of a data item needs to be recorded on the log.
3. It does not need extra I/O operations before commit time.
4. It needs a large buffer space to hold the deferred writes until commit.
5. Locks are held till the commit point.
2. Immediate database modification:
i. The immediate database modification technique allows database modifications to be output to the
database while the transaction is still in the active state.
ii. Since data modifications are written by active transactions, undo information must be kept.
Features of immediate database modification:
1. The database is updated immediately after every write operation.
2. Both the old and the new value of a data item are recorded on the log.
3. It needs extra I/O operations to flush out log and buffer blocks.
4. It can manage with less memory space.
5. Locks are released after modification.
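The contrast can be made concrete with a toy Python sketch of deferred modification (an in-memory illustration, not a real recovery manager): writes are buffered as log records, the database is touched only at commit, and an abort therefore needs no undo at all.

```python
class DeferredTxn:
    """Deferred-modification sketch: writes go to the log only; the database
    is modified at commit time by replaying the log."""
    def __init__(self, db):
        self.db, self.log = db, []
    def write(self, item, value):
        self.log.append((item, value))   # record <T, X, new_value>
    def commit(self):
        for item, value in self.log:     # replay the deferred writes
            self.db[item] = value
    def abort(self):
        self.log.clear()                 # nothing was written, nothing to undo

db = {"A": 1000, "B": 1500}
t1 = DeferredTxn(db)
t1.write("A", 500)
t1.write("B", 2000)
assert db == {"A": 1000, "B": 1500}      # database untouched before commit
t1.commit()
print(db)  # {'A': 500, 'B': 2000}
```

Under immediate modification, by contrast, `write` would change `db` at once and the log would also have to record the old values so that `abort` could restore them.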

19. Describe shadow paging recovery technique.
Answer:
1. Shadow paging is a technique in which multiple copies (known as shadow copies) of the data item to be
modified are maintained on the disk.
2. Shadow paging considers the database to be made up of fixed-size logical units of storage called pages.
3. These pages are mapped into physical blocks of storage with the help of page table (or directory).
4. The physical blocks are of the same size as that of the logical blocks.
5. A page table with n entries is constructed in which the ith entry in the page table points to the ith database
page on the disk as shown in Fig. 4.19.1.
6. The main idea behind this technique is to maintain two page tables.
a. The entries of the current page table point to the most recent database pages on the disk. When a
transaction starts, the current page table is copied into a shadow page table (or shadow directory).
b. The shadow page table is then saved on the disk and the current page table is used by the transaction.
The shadow page table is never modified during the execution of the transaction.

Fig. 1. Shadow paging.
When shadow paging does not require log:
It does not require the use of log in an environment where only one transaction is active at a time.

20. What do you mean by checkpointing? Explain important types of checkpointing methods.
Answer:
Checkpointing:
1. It is a process of saving a snapshot of the application’s state, so that it can restart from that point in case
of failure.
2. Checkpoint is a point of time at which a record is written onto the database from the buffers.
3. Checkpointing shortens the recovery process.
Types of checkpointing techniques:
1. Consistent checkpointing:
a. Consistent checkpointing creates a consistent image of the database at checkpoint.
b. During recovery, only those transactions which take place after last checkpoint are undone or redone.
c. The transactions that take place before the last consistent checkpoint are already committed and need not
be processed again.
d. The actions taken for checkpointing are:
i. All changes in main-memory buffers are written onto the disk.
ii. A “checkpoint” record is written in the transaction log.
iii. The transaction log is written to the disk.
2. Fuzzy checkpointing:
a. In fuzzy checkpointing, at the time of checkpoint, all the active transactions are written in the log.
b. In case of failure, the recovery manager processes only those transactions that were active during
checkpoint and later.
c. The transactions that have been committed before checkpoint are written to the disk and hence need not
be redone.

21. What is log file? Write the steps for log-based recovery of a system with suitable example.
Answer:
Log file: A log file is a file that records all the update activities that occur in the database.
Steps for log-based recovery:
1. The log file is kept on a stable storage media.
2. When a transaction enters the system and starts execution, it writes a log record
<Tn, start>
3. When the transaction modifies an item X, it writes a log record
<Tn, X, V1, V2>
This reads as: Tn has changed the value of X from V1 to V2.
4. When the transaction finishes, it logs
<Tn, commit>
For example:
<T0, start>
<T0, A, 0, 10>
<T0, commit>
<T1, start>
<T1, B, 0, 10>
<T2, start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint (T1, T2)>
<T3, start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3, commit>
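The recovery decision on the example log above can be sketched as a small Python routine. The tuple-based record format and the `recover` helper are assumptions made for illustration:

```python
# A hedged sketch of log-based recovery with a checkpoint: transactions active
# at or after the last checkpoint are redone if committed, undone otherwise.

def recover(log):
    start, active = 0, set()
    # Find the most recent checkpoint; it lists the transactions active then.
    for i, rec in enumerate(log):
        if rec[0] == "checkpoint":
            start, active = i, set(rec[1])
    committed = set()
    for rec in log[start:]:
        if rec[0] == "start":
            active.add(rec[1])
        elif rec[0] == "commit":
            committed.add(rec[1])
    redo = [t for t in sorted(active) if t in committed]
    undo = [t for t in sorted(active) if t not in committed]
    return redo, undo

log = [("start", "T0"), ("update", "T0", "A", 0, 10), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "B", 0, 10),
       ("start", "T2"), ("update", "T2", "C", 0, 10),
       ("update", "T2", "C", 10, 20),
       ("checkpoint", ["T1", "T2"]),
       ("start", "T3"), ("update", "T3", "A", 10, 20),
       ("update", "T3", "D", 0, 10), ("commit", "T3")]

redo, undo = recover(log)
# T0 committed before the checkpoint and is skipped entirely;
# T3 committed after it -> redo; T1 and T2 never committed -> undo.
assert redo == ["T3"] and undo == ["T1", "T2"]
```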

22. Describe the important types of recovery techniques. Explain their advantages and
disadvantages.
Answer:
There are many different database recovery techniques to recover a database:
1. Deferred update recovery: Refer Q.18
Advantages:
a. Recovery is easy.
b. Cascading rollback does not occur because no other transaction sees the work of another until it is
committed.
Disadvantages:
a. Concurrency is limited.
2. Immediate update recovery: Refer Q. 18
Advantages:
a. It allows higher concurrency because transactions write continuously to the database rather than waiting
until the commit point.
Disadvantages:
a. It leads to cascading rollbacks.
b. It is time consuming and may be problematic.

23. What is a deadlock? Describe methods to handle a deadlock.


OR
What is deadlock? How it can be detected and avoided?
Answer:
Deadlock:
1. A deadlock is a situation in which two or more transactions wait for locks held by the others to be
released.
2. Every transaction is waiting for another transaction to finish its operations, so none of them can proceed.
Methods to handle a deadlock:
1. Deadlock prevention protocol: This protocol ensures that the system will not go into deadlock state.
There are different methods that can be used for deadlock prevention:
a. Pre-declaration method: This method requires that each transaction locks all its data items before it
starts execution.
b. Partial ordering method: In this method, system imposes a partial ordering of all data items and
requires that a transaction can lock a data item only in the order specified by partial order.
c. Timestamp method: In this method, the data items are locked using the timestamps of transactions.
2. Deadlock detection:
a. When a transaction waits indefinitely to obtain a lock, system should detect whether the transaction is
involved in a deadlock or not.
b. The wait-for-graph is one of the methods for detecting a deadlock situation.
c. In this method, a graph is drawn based on the transactions and the locks they hold or request on resources.
d. If the graph created has a closed loop or a cycle, then there is a deadlock.
Fig. 1. Wait-for-graph: T1 and T2 each wait for a lock (R1, R2) held by the other, forming a cycle.
3. Recovery from deadlock:
a. Selection of a victim: In this we determine which transaction (or transactions) to roll back to break
the deadlock. We should rollback those transactions that will incur the minimum cost.
b. Rollback: The simplest solution is a ‘‘total rollback’’. Abort the transaction and then restart it.
c. Starvation: In a system where the selection of transactions for rollback is based on cost, it may
happen that the same transactions are always picked as victims, so they starve.
4. Deadlock avoidance: Deadlock can be avoided by following methods:
a. Serial access: If only one transaction can access the database at a time, then we can avoid deadlock.
b. Auto commit transaction: It includes that each transaction can only lock one resource immediately
as it uses it, then finishes its transaction and releases its lock before requesting any other resource.
c. Ordered updates: If transactions always request resources in the same order (for example,
numerically ascending by the index value of the row being locked), then the system does not enter a
deadlock state.
d. By rolling back conflicting transactions.
e. By allocating the locks where needed.
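The wait-for-graph detection described in item 2 amounts to a cycle search. A depth-first sketch in Python follows; the dictionary representation of the graph and the function name are illustrative assumptions:

```python
# Deadlock detection: a cycle in the wait-for graph means a deadlock.
# `wait_for` maps each transaction to the transactions it is waiting for.

def has_deadlock(wait_for):
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wait_for.get(t, ()):
            if u in on_stack or (u not in visited and dfs(u)):
                return True        # back edge found -> cycle -> deadlock
        on_stack.discard(t)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

# T1 waits for T2 and T2 waits for T1 (as in Fig. 1): a closed loop, so deadlock.
assert has_deadlock({"T1": ["T2"], "T2": ["T1"]})
assert not has_deadlock({"T1": ["T2"], "T2": []})
```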

24. Discuss about the deadlock prevention schemes.


OR
Discuss about deadlock prevention schemes.
Answer:
Deadlock prevention schemes:
1. Wait-die scheme:
i. In this scheme, if a transaction requests a lock on a resource (data item) that is already held with a
conflicting lock by some other transaction, one of two possibilities may occur:
a. If TS(Ti) < TS(Tj), i.e., Ti, which is requesting a conflicting lock, is older than Tj, Ti is allowed to wait until
the data item is available.
b. If TS(Ti) > TS(Tj), i.e., Ti is younger than Tj, so Ti dies. Ti is restarted later with random delay but with
same timestamp.
ii. This scheme allows the older transaction to wait but kills the younger one.
2. Wound-wait scheme:
i. In this scheme, if a transaction requests a lock on a resource (data item) that is already held with a
conflicting lock by some other transaction, one of two possibilities may occur:
a. If TS(Ti) < TS(Tj), i.e., Ti, which is requesting a conflicting lock, is older than Tj, Ti forces Tj to be rolled
back, that is Ti wounds Tj. Tj is restarted later with random delay but with same timestamp.
b. If TS(Ti) > TS(Tj), i.e., Ti is younger than Tj, Ti is forced to wait until the resource (i.e., data item) is
available.
ii. This scheme allows the younger transaction to wait, but when an older transaction requests an item held
by a younger one, the older transaction forces the younger one to abort and release the item. In both schemes,
the transaction that entered the system later is the one aborted.
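Both schemes reduce to a comparison of transaction timestamps (a smaller timestamp means an older transaction). A sketch with illustrative function names:

```python
# Wait-die and wound-wait decisions, given the timestamps of the transaction
# requesting a conflicting lock and of the transaction holding it.

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is rolled back and
    # restarted later with the same timestamp).
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the younger holder;
    # younger requester waits for the item.
    return "wound holder" if ts_requester < ts_holder else "wait"

assert wait_die(1, 5) == "wait"             # Ti older than Tj: allowed to wait
assert wait_die(5, 1) == "die"              # Ti younger: Ti dies
assert wound_wait(1, 5) == "wound holder"   # Ti older: Tj is rolled back
assert wound_wait(5, 1) == "wait"           # Ti younger: Ti waits
```

Restarting an aborted transaction with its original timestamp is what prevents starvation: it eventually becomes the oldest transaction in the system.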

25. What is deadlock? What are necessary conditions for it? How it can be detected and
recovered?
Answer:
Deadlock: Refer Q. 23
Necessary condition for deadlock: A deadlock situation can arise if the following four conditions hold
simultaneously in a system:
1. Mutual exclusion: At least one resource must be held in a non-sharable mode; that is, only one process
at a time can use the resource. If another process requests that resource, the requesting process must be
delayed until the resource has been released.
2. Hold and wait: A process must be holding at least one resource and waiting to acquire additional
resources that are currently being held by other processes.
3. No pre-emption: Resources cannot be pre-empted; i.e., a resource can be released only by the process
holding it, after that process has completed its task.
4. Circular wait: A set {P0, P1, ..., Pn} of waiting processes must exist such that P0 is waiting for a resource
held by P1, P1 is waiting for a resource held by P2, ..., Pn–1 is waiting for a resource held by Pn, and Pn is
waiting for a resource held by P0.
Deadlock detection and recovery: Refer Q. 23

26. What are distributed databases? What are the advantages and disadvantages of
distributed databases?
OR
Explain the advantages of distributed DBMS.
Answer:
Distributed database:

Fig. 1. Distributed database.


1. A distributed database system consists of collection of sites, connected together through a
communication network.
2. Each site is a database system site in its own right, and the sites have agreed to work together so that a
user at any site can access data anywhere in the network as if the data were all stored at the user's local site.
3. Each site has its own local database.
4. A distributed database is fragmented into smaller data sets.
5. DDBMS can handle both local and global transactions.
Advantages of DDBMS:
1. DDBMS allows each site to store and maintain its own database, causing immediate and efficient access
to data.
2. It allows access to the data stored at remote sites. At the same time users can retain the control to its own
site to access the local data.
3. If one site is not working due to any reason (for example, communication link goes down) the system will
not be down because other sites of the network can possibly continue functioning.
4. New sites can be added to the system at any time with little or no effort.
5. If a user needs to access data from multiple sites, the query can be subdivided into sub-queries that
execute in parallel.
Disadvantages of DDBMS:
1. Complex software is required for a distributed database environment.
2. The various sites must exchange messages and perform additional computation to ensure proper
coordination among the sites.
3. A by-product of the increased complexity and need for coordination is the additional exposure to
improper updating and other problems of data integrity.
4. If the data are not distributed properly according to their usage, or if queries are not formulated
correctly, response to requests for data can be extremely slow.

27. What are atomic commit protocols?


Answer:
1. Atomic commit protocols are the key element in supporting global atomicity of distributed transactions.
2. Two-phase commit protocol (2PC) is the standard atomic commit protocol.
3. 2PC is important to guarantee correctness properties in the complex distributed world whilst at the same
time it reduces parallelism due to high disk and message overhead and locking during windows of
vulnerability.
4. An atomic commitment problem requires processes to agree on a common outcome which can be either
commit or abort.
5. An atomic commit protocol must guarantee the following atomic commitment properties:
a. AC1: All processes that reach an outcome reach the same one.
b. AC2: A process cannot reverse its outcome after it has reached one.
c. AC3: The commit outcome can only be reached if all participants voted Yes.
d. AC4: If there are no failures and all participants voted Yes, then the outcome will be commit.
e. AC5: Consider any execution containing only failures that the protocol is designed to tolerate. At any
point in the execution, if all existing failures are repaired and no new failures occur for sufficiently long,
then all processes will eventually reach an outcome.

28. Explain replication and its types in distributed system.


Answer:
1. Replication is a technique of replicating data over a system.
2. Replication is a key to the effectiveness of distributed systems in that, it provides enhanced performance,
high availability and high fault tolerance.
3. The replication is the maintenance of copies of data at multiple computers.
4. Replication is a technique for enhancing a service.
5. When data are replicated, the replication transparency is required i.e., clients should not normally have
to be aware that multiple copies of data exist.
Types of replications:
i. Active replication:
1. In active replication each client request is processed by all the servers.
2. This requires that the process hosted by the servers is deterministic, i.e., given the same initial state and a
request sequence, all processes will produce the same response sequence and end up in the same final state.
Fig. 1. Active replication: the client's request goes to every server; each server's process applies it to its own copy of the state.
3. In order to make all the servers receive the same sequence of operations, an atomic broadcast protocol
must be used.
4. An atomic broadcast protocol guarantees that either all the servers receive a message or none, plus that
they all receive messages in the same order.
ii. Passive replication:
1. In passive replication there is only one server (called primary) that processes client requests.
2. After processing a request, the primary server updates the state on the other (backup) servers and sends
back the response to the client.
3. If the primary server fails, one of the backup servers takes its place.
4. Passive replication may be used even for non-deterministic processes.
Fig. 2. Passive replication: the client's request goes only to the primary server, which updates its own state, propagates the update to the backup servers, and replies to the client.
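The primary-backup steps above can be sketched minimally in Python. The class names and the dictionary state are illustrative assumptions; a real system would add failure detection and view changes:

```python
# A minimal passive (primary-backup) replication sketch.

class Server:
    def __init__(self):
        self.state = {}

class PrimaryBackup:
    def __init__(self, n_backups=2):
        self.primary = Server()
        self.backups = [Server() for _ in range(n_backups)]

    def request(self, key, value):
        self.primary.state[key] = value   # 1. only the primary processes requests
        for b in self.backups:            # 2. primary pushes the updated state
            b.state[key] = value
        return "ok"                       # 3. primary replies to the client

    def fail_over(self):
        # 4. on primary failure, a backup takes its place with identical state
        self.primary = self.backups.pop(0)

group = PrimaryBackup()
group.request("x", 1)
group.fail_over()
assert group.primary.state == {"x": 1}    # no acknowledged update is lost
```

Because only the primary executes requests, the processes need not be deterministic, matching point 4 above.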
29. Explain data fragmentation with types.
Answer:
Fragmentation:
1. It is the decomposition of a relation into fragments.
2. It permits to divide a single query into a set of multiple sub-queries that can execute parallel on
fragments.
3. Fragmentation is done according to the data selection patterns of applications running on the database.
Fragmentation techniques/types are as follows:
1. Vertical fragmentation:
a. It divides a relation into fragments which contain a subset of attributes of a relation along with the
primary key attribute of the relation.
Fig. 1. Vertical fragmentation: the attributes (Name, Reg. No., Course, Dept) are split column-wise into Fragment 1, Fragment 2 and Fragment 3.


b. The purpose of vertical fragmentation is to partition a relation into a set of smaller relations to enable
user applications to run on only one fragment.
2. Horizontal fragmentation:
a. It divides a relation into fragments along its tuples. Each fragment is a subset of tuples of a relation.
b. It identifies some specific rows based on some criteria and marks it as a fragment.
Fig. 2. Horizontal fragmentation: the relation (Name, Reg. No., Course, Dept) is split row-wise into Fragment 1 to Fragment 4.


c. Various horizontal fragmentation techniques are:
i. Primary horizontal fragmentation: This type of fragmentation is done where the tables in a
database are neither joined nor have dependencies. So, no relationship exists among the tables.
ii. Derived horizontal fragmentation: Derived horizontal fragmentation partitions a relation according to
the fragmentation of a parent relation to which it is linked by a foreign key. It ensures that the fragments
which are joined together are put on the same site.
3. Hybrid/mixed fragmentation:
a. The mixed/hybrid fragmentation is combination of horizontal and vertical fragmentations.
b. This is the most complex type, because the relation is fragmented both horizontally and vertically.
c. The original relation is obtained back by join or union operations.

Fig. 3. Hybrid/mixed fragmentation.
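The two basic techniques can be sketched on a toy relation represented as a list of dicts. The sample data, helper names, and column choices are assumptions made for illustration:

```python
# Vertical and horizontal fragmentation of a small Students relation.

students = [
    {"Name": "Asha",  "RegNo": 1, "Course": "DBMS", "Dept": "CS"},
    {"Name": "Ravi",  "RegNo": 2, "Course": "OS",   "Dept": "CS"},
    {"Name": "Meena", "RegNo": 3, "Course": "DBMS", "Dept": "EC"},
]

def vertical(relation, attrs, key="RegNo"):
    # Keep the primary key in every fragment so the original relation
    # can be reconstructed later with a join.
    return [{k: row[k] for k in [key] + attrs} for row in relation]

def horizontal(relation, predicate):
    # Each fragment is the subset of tuples satisfying some criterion;
    # the original relation is rebuilt by a union of the fragments.
    return [row for row in relation if predicate(row)]

frag1 = vertical(students, ["Name"])                      # one vertical fragment
frag2 = horizontal(students, lambda r: r["Dept"] == "CS") # one horizontal fragment
assert frag1[0] == {"RegNo": 1, "Name": "Asha"}
assert len(frag2) == 2
```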

30. What are distributed databases? List the advantages and disadvantages of data replication
and data fragmentation. Explain, with a suitable example, the differences between
replication and fragmentation transparency.
OR
Explain the types of distributed data storage.
OR
What are distributed databases? List the advantages and disadvantages of data replication and data
fragmentation.
Answer:
Distributed database: Refer Q.26
Advantages of data replication:
i. Availability: If one of the sites containing relation r fails, then the relation r can be found in another
site. Thus, the system can continue to process queries involving ‘r’, despite the failure of one site.
ii. Increased parallelism: Number of transactions can read relation r in parallel. The more replicas of ‘r’
there are, the greater parallelism is achieved.
Disadvantages of data replication:
i. Increased overhead on update: The system must ensure that all replicas of a relation r are
consistent; otherwise, erroneous computation may result. Thus, whenever r is updated, the update must be
propagated to all sites containing replicas. The result is increased overhead.
Advantages of data fragmentation:
i. Parallelized execution of queries by different sites is possible.
ii. Data management is easy as fragments are smaller compare to the complete database.
iii. Increased availability of data to the users/queries that are local to the site in which the data stored.
iv. As the data is available close to the place where it is most frequently used, the efficiency of the system in
terms of query processing, transaction processing is increased.
v. Data that are not required by local applications are not stored locally. It leads to reduced data transfer
between sites, and increased security.
Disadvantages of data fragmentation:
i. The performance of global application that requires data from several fragments located at different sites
may be slower.
ii. Integrity control may be more difficult if data and functional dependencies are fragmented and located at
different sites.
Differences in replication and fragmentation transparency:
S. No. | Replication transparency | Fragmentation transparency
1. | It involves placing copies of each table, or each of its fragments, on more than one site in the system. | It involves decomposition of a table into many tables in the system.
2. | The user does not know how many replicas of the relation are present in the system. | The user does not know how the relation is divided/fragmented in the system.
3. | For example, if relation r is replicated, a copy of r is stored at two or more sites; in the extreme case, a copy is stored at every site, which is called full replication. | For example, if relation r is fragmented, r is divided into a number of fragments. These fragments contain sufficient information to allow reconstruction of the original relation r.

There are two types of distributed data storage:


1. Data fragmentation: Refer Q.29
2. Data replication: Refer Q.28
31. Discuss the types of distributed database.
Answer:
Distributed databases are classified as:
1. Homogeneous distributed database:
a. In this, all sites have identical database management system software.
b. All sites are aware of one another, and agree to co-operate in processing user’s requests.
2. Heterogeneous distributed database:
a. In this, different sites may use different schemas, and different database management system software.
b. The sites may not be aware of one another, and they may provide only limited facilities for co-operation
in transaction processing.

32. What is concurrency control? Why is it needed in a database system?


OR
Explain concurrency control. Why is it needed in a database system?
Answer:
1. Concurrency Control (CC) is a process to ensure that data is updated correctly and appropriately when
multiple transactions are concurrently executed in DBMS.
2. It is a mechanism for correctness when two or more database transactions that access the same data or
dataset are executed concurrently with time overlap.
3. In general, concurrency control is an essential part of transaction management.
Concurrency control is needed:
1. To ensure consistency in the database.
2. To prevent following problem:
a. Lost update:
i. A second transaction writes a second value of a data item on top of a first value written by a first
concurrent transaction, and the first value is lost to other transactions running concurrently which need, by
their precedence, to read the first value.
ii. The transactions that have read the wrong value end with incorrect results.
b. Dirty read:
i. Transactions read a value written by a transaction that has been later aborted.
ii. This value disappears from the database upon abort, and should not have been read by any transaction
(“dirty read”).
iii. The reading transactions end with incorrect results.

33. Explain concurrency control mechanism performed in distributed databases?


Answer:
Following are concurrency control mechanism in distributed database:
a. Two-phase commit protocol: Two-phase commit protocol is designed to allow any participant to
abort its part of transaction. Due to the requirement for atomicity, if one part of a transaction is aborted
then the whole transaction must also be aborted.
The two phases used in this protocol are:
Phase 1 (voting phase):
1. The co-ordinator sends a canCommit? request to each of the participants in the transaction.
2. When a participant receives a canCommit? request, it replies with its vote (Yes or No) to the co-ordinator.
Before voting Yes, it prepares to commit by saving objects in permanent storage. If the vote is No, the
participant aborts immediately.
Phase 2 (completion according to outcome of vote):
1. The co-ordinator collects the votes (including its own).
a. If there are no failures and all the votes are Yes, the co-ordinator decides to commit the transaction and
sends a doCommit request to each of the participants.
b. Otherwise the co-ordinator decides to abort the transaction and sends doAbort requests to all
participants that voted Yes.
2. Participants that voted Yes wait for a doCommit or doAbort request from the co-ordinator. When
a participant receives one of these messages it acts accordingly and, in the case of commit, makes a
haveCommitted call as confirmation to the co-ordinator.

Fig. 4.23.2. Communication in two-phase commit protocol: (1) the co-ordinator, prepared to commit, sends canCommit? and waits for votes; (2) the participant, now prepared to commit (uncertain), votes Yes; (3) the co-ordinator commits and sends doCommit; (4) the participant commits and confirms with haveCommitted.
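The two phases can be sketched from the co-ordinator's side. The `Participant` class and its method names are illustrative stand-ins for the protocol messages, not a real implementation:

```python
# A sketch of two-phase commit: phase 1 collects votes, phase 2 completes
# according to the outcome of the vote.

class Participant:
    def __init__(self, vote_yes=True):
        self.vote_yes = vote_yes
        self.outcome = None

    def vote(self):                      # phase 1: reply to canCommit?
        if not self.vote_yes:
            self.outcome = "aborted"     # a No voter aborts immediately
        return self.vote_yes

    def do_commit(self):
        self.outcome = "committed"

    def do_abort(self):
        self.outcome = "aborted"

def two_phase_commit(participants):
    # Phase 1 (voting): collect every participant's vote.
    votes = [p.vote() for p in participants]
    # Phase 2 (completion): commit only if all voted Yes, otherwise abort.
    if all(votes):
        for p in participants:
            p.do_commit()
        return "commit"
    for p in participants:
        if p.outcome is None:            # doAbort goes only to Yes voters
            p.do_abort()
    return "abort"

assert two_phase_commit([Participant(), Participant()]) == "commit"
assert two_phase_commit([Participant(), Participant(vote_yes=False)]) == "abort"
```

A single No vote is enough to abort the whole transaction, which is the atomicity requirement stated above.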

b. Moss concurrency control protocol:


1. Moss concurrency control protocol for nested transactions is based on the concept of upward inheritance
of locks.
2. A transaction can acquire a lock on object O in some mode M.
3. Doing that, it holds the lock in mode M until its termination.
4. Besides holding a lock, a transaction can retain a lock in mode M.
5. When a sub-transaction commits, its parent transaction inherits its locks and then retains them. If a
transaction holds a lock, it has the right to access the locked object (in the corresponding mode).
6. However, the same is not true for retained locks.
7. A retained lock is only a place holder and indicates that transactions outside the hierarchy of the retainer
cannot acquire the lock, but that descendants potentially can.
8. As soon as a transaction becomes a retainer of a lock, it remains a retainer for the lock until it terminates.

34. Explain directory system in detail.


Answer:
1. A directory is a listing of information about some class of objects such as persons.
2. Directories can be used to find information about a specific object, or in the reverse direction to find
objects that meet a certain requirement.
3. In the networked world, the directories are present over a computer network, rather than in a physical
(paper) form.
4. A directory system is implemented as one or more servers, which service multiple clients.
5. Clients use the application programmer interface defined by the directory system to communicate with
the directory servers.
Directory access protocols:
1. A directory access protocol is a protocol that allows programs to access directory information.
2. Directory access protocols also define a data model and access control.
3. For instance, web browsers can store personal bookmarks and other browser settings in a directory
system. A user can thus access the same settings from multiple locations, such as at home and at work,
without having to share a file system.

UNIT-5
CONCURRENCY CONTROL TECHNIQUES
SHORT ANSWER TYPE QUESTIONS
1. Why is concurrency control needed?
Ans. Concurrency control is needed so that the data can be updated correctly when multiple transactions
are executed concurrently.
2. Write down the main categories of concurrency control.
Ans. Categories of concurrency control are:
i. Optimistic ii. Pessimistic
iii. Semi-optimistic
3. What do you mean by optimistic concurrency control?
Ans. In optimistic concurrency control, transactions execute without locking and are checked for conflicts
only when they commit. It is useful where we do not expect conflicts; if a conflict does occur, the committing
transaction is rolled back and can be restarted.

4. Define locks.
Ans. A lock is a variable associated with each data item that indicates whether read or write operation is
applied.
5. Define the modes of lock.
Ans. Data items can be locked in two modes:
1. Exclusive (X) mode: If a transaction Ti has obtained an exclusive mode lock on item Q, then Ti can
read as well as write Q data item.
2. Shared (S) mode: If a transaction Ti has obtained a shared mode lock on item Q, then Ti can only read
the data item Q but Ti cannot write the data item Q.
6. Give merits and demerits of two-phase locking.
Ans. Merits of two phase locking:
i. It maintains database consistency.
ii. It increases concurrency over static locking as locks are held for shorter period.
Demerits of two-phase locking:
i. Deadlock
ii. Cascade aborts / rollback
7. Define lock compatibility.
Ans. Lock compatibility determines whether locks can be acquired on a data item by multiple transactions
at the same time.
8. Define upgrade and downgrade in locking protocol.
Ans. Upgrade: Upgrade is the lock conversion from shared to exclusive mode. It takes place only in
growing phase.
Downgrade: Downgrade is the lock conversion from exclusive to shared mode. It can take place only in
shrinking phase.
9. Define the term intention lock.
Ans. Intention lock is a type of lock mode used in multiple granularity locking in which a transaction
intends to explicitly lock a lower level of the tree. To provide a higher degree of concurrency, intention
mode is associated with shared mode and exclusive mode.
10. What are the pitfalls of lock based protocol?
Ans. Pitfalls of lock based protocols are:
i. Deadlock can occur.
ii. Starvation is also possible if concurrency control manager is badly designed.
11. Define exclusive lock.
Ans. An exclusive lock is a lock that allows only one transaction at a time to read and write a data item; no other transaction can access the item while the lock is held.
12. Define timestamp.
Ans. A timestamp is a unique identifier created by the DBMS to identify a transaction. This timestamp is
used in timestamp based concurrency control techniques.
13. Define multi version scheme.
Ans. Multi version concurrency control is a scheme in which each write(Q) operation creates a new version
of Q. When a transaction issues a read(Q) operation, the concurrency control manager selects one of the
version of Q to be read that ensures serializability.
14. Define Thomas’ write rule.
Ans. Thomas’ write rule is the modification to the basic timestamp ordering, in which the rules for write
operations are slightly different from those of basic timestamp ordering. It does not enforce conflict
serializability.

LONG ANSWER TYPE QUESTIONS


1. Describe concurrency control.
Answer:
1. Concurrency Control (CC) is a process to ensure that data is updated correctly and appropriately when
multiple transactions are concurrently executed in DBMS.
2. It is a mechanism for correctness when two or more database transactions that access the same data or
dataset are executed concurrently with time overlap.
3. In general, concurrency control is an essential part of transaction management.
Concurrency control is needed:
1. To ensure consistency in the database.
2. To prevent following problem:
a. Lost update:
i. A second transaction writes a second value of a data item on top of a first value written by a first
concurrent transaction, and the first value is lost to other transactions running concurrently which need, by
their precedence, to read the first value.
ii. The transactions that have read the wrong value end with incorrect results.
b. Dirty read:
i. Transactions read a value written by a transaction that has been later aborted.
ii. This value disappears from the database upon abort, and should not have been read by any transaction
(“dirty read”).
iii. The reading transactions end with incorrect results.

2. What is lock? Explain different types of locks.


OR
Describe lock based locking techniques.
Answer:
Lock: A lock is a variable associated with each data item that indicates whether read or write operation is
applied.
Different types of locks are:
1. Binary lock:
i. A binary lock can have two states or values: locked and unlocked (or 1 and 0).
ii. A distinct lock is associated with each database item X.
iii. If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the
item.
iv. If the value of the lock on X is 0, the item can be accessed when requested.
v. We refer to the current value (or state) of the lock associated with item X as lock(X).
vi. Two operations, lock_item and unlock_item, are used with binary locking.
If the simple binary locking scheme is used, every transaction must obey the following rules:
i. A transaction T must issue the operation lock_item(X) before any read_item(X) or write_item(X)
operations are performed in T.
ii. A transaction T must issue the operation unlock_item(X) after all read_item(X) and write_item(X)
operations are completed in T.
iii. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
iv. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on item X.
2. Shared/Exclusive locks:
i. In this scheme there are three locking operations: read_lock(X), write_lock(X), and unlock(X).
ii. A lock associated with an item X, lock(X), has three possible states: read-locked, write-locked and
unlocked.
iii. A read-locked item is also called share-locked because other transactions are allowed to read the item,
whereas a write-locked item is called exclusive-locked because a single transaction exclusively holds the
lock on the item.
If the shared/exclusive locking scheme is used, every transaction must obey the following
rules:
i. A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X)
operation is performed in T.
ii. A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed
in T.
iii. A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations
are completed in T.
iv. A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock or a write
(exclusive) lock on item X.
v. A transaction T will not issue a write_lock(X) operation if it already holds a read (shared) lock or write
(exclusive) lock on item X.
vi. A transaction T will not issue an unlock(X) operation unless it already holds a read (shared) lock or a
write (exclusive) lock on item X.
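The shared/exclusive rules above can be sketched as a small lock object. The class is an illustrative assumption; to stay short it refuses incompatible requests rather than queuing them:

```python
# A shared/exclusive lock on one data item: many readers may hold it in
# shared mode, but a writer must hold it alone.

class SXLock:
    def __init__(self):
        self.holders = set()      # transactions holding a read (shared) lock
        self.writer = None        # transaction holding the write (exclusive) lock

    def read_lock(self, t):
        if self.writer not in (None, t):
            return False          # write-locked by another transaction
        self.holders.add(t)
        return True

    def write_lock(self, t):
        if self.writer not in (None, t) or self.holders - {t}:
            return False          # any other reader or writer blocks it
        self.writer = t
        return True

    def unlock(self, t):
        self.holders.discard(t)
        if self.writer == t:
            self.writer = None

lk = SXLock()
assert lk.read_lock("T1") and lk.read_lock("T2")   # share-locked: both may read
assert not lk.write_lock("T1")                     # T2 still holds a read lock
lk.unlock("T2")
assert lk.write_lock("T1")                         # upgrade once T1 is alone
```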

3. What do you understand by lock compatibility? Explain with example.


Answer:
1. Lock compatibility determines whether locks can be acquired on a data item by multiple transactions at
the same time.
2. Suppose a transaction Ti requests a lock of mode m1 on a data item Q on which another transaction Tj
currently holds a lock of mode m2.
3. If mode m2 is compatible with mode m1, the request is immediately granted, otherwise rejected.
4. The lock compatibility can be represented by a matrix called the compatibility matrix.
5. The entry “Yes” indicates that the request can be granted and “No” indicates that it cannot be granted.
Held mode \ Requested mode | Shared | Exclusive
Shared | Yes | No
Exclusive | No | No

Fig. 1. Compatibility matrix.
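The compatibility matrix can be written directly as a lookup table. The `can_grant` helper is an illustrative assumption, not a standard API:

```python
# COMPAT[held][requested] says whether a request in mode `requested` is
# compatible with an existing lock held in mode `held` (Fig. 1 as a table).

COMPAT = {
    "S": {"S": True,  "X": False},
    "X": {"S": False, "X": False},
}

def can_grant(held_modes, requested):
    # Grant only if the requested mode is compatible with every held lock.
    return all(COMPAT[m][requested] for m in held_modes)

assert can_grant(["S", "S"], "S")      # many shared locks may coexist
assert not can_grant(["S"], "X")       # exclusive conflicts with shared
assert not can_grant(["X"], "S")       # and shared conflicts with exclusive
```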


4. How is locking implemented? How are requests to lock and unlock a data item handled?
Answer:
Implementation of locking:
1. The locking or unlocking of data items is implemented by a subsystem of the database system known as
the lock manager.
2. It receives lock requests from transactions and replies to them with a lock-grant message or a rollback
message (in case of deadlock).
3. In response to an unlock request, the lock manager only replies with an acknowledgement. In addition, it
may result in lock grant messages to other waiting transactions.
The lock manager handles the requests by the transaction to lock and unlock a data item in
the following way:
1. Lock request:
a. When the first request to lock a data item arrives, the lock manager creates a new linked list to record the
lock request for that data item.
b. It immediately grants the lock request of the transaction.
c. If the linked list for the data item already exists, it includes the request at the end of the linked list.
d. The lock request will be granted only if the lock request is compatible with all the existing locks and no
other transaction is waiting for acquiring lock on this data item otherwise, the transaction has to wait.
2. Unlock request:
a. When an unlock request for the data items arrives, the lock manager deletes the record corresponding to
that transaction from the linked list for the data item.
b. It then checks whether other waiting requests on that data item can be granted.
c. If the request can be granted, it is granted by the lock manager, and the next record, if any, is processed.
d. If a transaction aborts, the lock manager deletes all waiting lock requests by the transaction.
e. In addition, the lock manager releases all locks acquired by the transaction and updates the records in the
lock table.
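The lock and unlock handling described above can be sketched as a small lock manager with one FIFO request queue per data item (a simplified illustration, not a full implementation; the class and method names are assumptions):

```python
from collections import defaultdict

# Minimal lock manager sketch: one queue of [txn, mode, granted] records per item.
class LockManager:
    def __init__(self):
        self.table = defaultdict(list)   # data item -> queue of requests

    def lock(self, txn, item, mode):
        """Append the request; grant it only if it is compatible with every
        granted lock and no earlier request is still waiting (FIFO)."""
        queue = self.table[item]
        compatible = all(mode == "S" and held == "S"
                         for _, held, granted in queue if granted)
        nobody_waiting = all(granted for _, _, granted in queue)
        request = [txn, mode, compatible and nobody_waiting]
        queue.append(request)
        return request[2]

    def unlock(self, txn, item):
        """Delete the transaction's record, then try to grant waiting requests."""
        self.table[item] = [r for r in self.table[item] if r[0] != txn]
        for r in self.table[item]:
            if not r[2]:
                granted = [g for g in self.table[item] if g[2]]
                if all(r[1] == "S" and g[1] == "S" for g in granted):
                    r[2] = True          # next waiter is now compatible
                else:
                    break                # FIFO: stop at the first blocked request

lm = LockManager()
print(lm.lock("T1", "Q", "X"))  # True:  first request is granted immediately
print(lm.lock("T2", "Q", "S"))  # False: incompatible with T1's exclusive lock
lm.unlock("T1", "Q")            # T2's waiting request is now granted
```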

5. Describe how a typical lock manager is implemented. Why must lock and unlock be
atomic operations? What is the difference between a lock and a latch? What are convoys and
how should a lock manager handle them?
Answer:
Implementation of lock manager:
1. A typical lock manager is implemented with a hash table, also called lock table, with the data object
identifier as the key.
2. A lock table entry contains the following information:
a. The number of transactions currently holding a lock on the object.
b. The nature of the lock.
c. A pointer to a queue of lock requests.
Reason for lock and unlock being atomic operations: Lock and unlock must be atomic operations
because otherwise it would be possible for two transactions to obtain an exclusive lock on the same object,
destroying the principle of 2PL.
Difference between lock and latch:
S. No.   Lock                                              Latch
1.       Protects database data items (logical).           Protects internal structures such as buffer pages (physical).
2.       Held for a long duration, often until commit.     Held for a short duration, only while the operation runs.
3.       Governed by the 2PL protocol.                     Not subject to 2PL; released as soon as the operation completes.

Convoy:
1. Convoy is a queue of waiting transactions.
2. It occurs when a transaction holding a heavily used lock is suspended by the operating system, and every
other transaction that needs this lock is queued.
The lock manager can mitigate convoys by granting a released lock to all compatible queued transactions at once, rather than waking waiters one at a time.

6. Write short notes on lock-based protocols.


Answer:
1. Lock based protocol indicates when a transaction may lock and unlock the data items, during the
concurrent execution. It restricts the number of possible schedules.
2. It ensures that the data items are accessed in a mutually exclusive manner; for this we use different
lock modes.
3. There are two modes in which a data item may be locked:
i. Shared mode lock: If a transaction Ti has obtained a shared mode lock on item Q then Ti can read but
cannot write Q. It is denoted by S.
ii. Exclusive lock: If a transaction Ti has obtained an exclusive mode lock on item Q then Ti can read and
also write Q. It is denoted by X.

7. Explain two-phase locking technique for concurrency control.


OR
What is two-phase locking (2PL)? Describe with the help of example.
OR
Explain two phase locking protocol with suitable example.
Answer:
1. Two-phase locking is a procedure in which a transaction is said to follow the two-phase locking protocol if
all locking operations precede the first unlock operation in the transaction.
2. In 2PL, each transaction lock and unlock the data item in two phases:
a. Growing phase: In the growing phase, the transaction acquires locks on the desired data items.
b. Shrinking phase: In the shrinking phase, the transaction releases the locks it has acquired on the data items.
3. According to 2PL, the transaction cannot acquire a new lock, after it has unlocked any of its existing
locked items.
4. Given below, the two transactions T1 and T2 that do not follow the two-phase locking protocol.
T1 T2
Read-lock (Y); Read-lock (X);
Read-item (Y); Read-item (X);
Unlock (Y); Unlock (X);
Write-lock (X); Write-lock (Y);
Read-item (X); Read-item (Y);
X: = X + 1; Y : = Y + 1;
Write-item (X); Write-item (Y);
Unlock (X); Unlock (Y);
5. This is because the write-lock (X) operation follows the unlock (Y) operation in T1, and similarly the
write-lock (Y) operation follows the unlock (X) operation in T2.
6. If we enforce two-phase locking, the transactions can be rewritten as:
T1 T2
Read-lock (Y); Read-lock (X);
Read-item (Y); Read-item (X);
Write-lock (X); Write-lock (Y);
Unlock (Y); Unlock (X);
Read-item (X); Read-item (Y);
X : = X + 1; Y : = Y + 1;
Write-item (X); Write-item (Y);
Unlock (X); Unlock (Y);
7. It can be proved that, if every transaction in a schedule follows the two-phase locking protocol, the
schedule is guaranteed to be serializable, obviating the need to test for serializability of schedules any more.
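The 2PL condition — no lock operation after the first unlock — is easy to check mechanically. A minimal sketch (the operation encoding below is an assumption, not from the text):

```python
# Check whether a transaction's operation sequence obeys two-phase locking:
# once any unlock appears, no further lock may appear.
def follows_2pl(operations):
    unlocked = False
    for op in operations:
        if op.startswith("unlock"):
            unlocked = True              # shrinking phase has begun
        elif op.startswith("lock") and unlocked:
            return False                 # a lock after an unlock violates 2PL
    return True

# T1 from the text before and after rewriting for 2PL:
t1_bad  = ["lock-S(Y)", "read(Y)", "unlock(Y)", "lock-X(X)", "write(X)", "unlock(X)"]
t1_good = ["lock-S(Y)", "read(Y)", "lock-X(X)", "unlock(Y)", "write(X)", "unlock(X)"]
print(follows_2pl(t1_bad))   # False
print(follows_2pl(t1_good))  # True
```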

8. Discuss strict 2PL.


Answer:
1. Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase
locking protocol.
2. This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken by
a transaction be held until that transaction commits.
3. This requirement ensures that any data written by an uncommitted transaction are locked in exclusive
mode until the transaction commits, preventing any other transaction from reading the data.
4. Strict two-phase is the most widely used locking protocol in concurrency control. This protocol has two
rules:
a. If a transaction T wants to read (modify) an object, it first requests
a shared (exclusive) lock on the object.
b. All locks held by a transaction are released when the transaction is completed.
5. If strict two-phase locking is used for concurrency control, locks held by a transaction T that is being
rolled back may be released only after the rollback has been completed.
6. Once transaction T (that is being rolled back) has updated a data item, no other transaction could have
updated the same data item, because of the concurrency control requirements.
7. Therefore, restoring the old value of the data item will not erase the effects of any other transaction.

9. Write the salient features of graph based locking protocol with suitable example.
Answer:
Salient features of graph based locking protocol are:
1. The graph based locking protocol ensures conflict serializability.
2. Free from deadlock.
3. Unlocking may occur earlier in the graph based locking protocol than in the two phase locking protocol.
4. Shorter waiting time, and increase in concurrency.
5. No rollbacks are required.
6. Data items may be unlocked at any time.
7. Only exclusive locks are considered.
8. The first lock by T1 may be on any data item. Subsequently, a data Q can be locked by T1 only if the
parent of Q is currently locked by T1.
9. A data item that has been locked and unlocked by T1 cannot subsequently be relocked by T1.
For example:
We have three transactions in this schedule; only the locking and unlocking of data items is shown.
T1 T2 T3
Lock-X(A)
Lock-X(D)
Lock-X(H)
Unlock-X(D)
Lock-X(E)
Lock-X(D)
Unlock-X(B)
Unlock-X(E)
Lock-X(B)
Lock-X(E)
Unlock-X(H)

The schedule is conflict serializable.


Serializability for locks can be written as T2 → T1 → T3.
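The tree-protocol rules (first lock anywhere, subsequent locks only with the parent held, no relocking) can be checked per transaction with a short function. A sketch, assuming a hypothetical parent map over the items used above:

```python
# Hypothetical tree of data items: each entry maps a node to its parent.
parent = {"B": "A", "C": "A", "D": "B", "E": "B", "H": "D"}

def tree_protocol_ok(ops):
    """Verify one transaction's (action, item) sequence against the tree protocol."""
    held, ever_locked, first = set(), set(), True
    for action, item in ops:
        if action == "lock":
            if item in ever_locked:
                return False              # rule: no relocking after unlock
            if not first and parent.get(item) not in held:
                return False              # rule: parent must currently be locked
            held.add(item)
            ever_locked.add(item)
            first = False
        else:                             # unlock — allowed at any time
            held.discard(item)
    return True

print(tree_protocol_ok([("lock", "A"), ("lock", "B"), ("lock", "D"), ("unlock", "A")]))  # True
print(tree_protocol_ok([("lock", "B"), ("lock", "D"), ("unlock", "B"), ("lock", "E")]))  # False
```

The second sequence fails because E's parent B has already been unlocked when E is requested.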
10. Describe two-phase locking technique for concurrency control. Explain. How does it
guarantee serializability?
Answer:
Two-phase locking technique: Refer Q. 7
How two-phase locking guarantees serializability:
1. Two-phase locking protocol restricts the unwanted read/write by applying exclusive lock.
2. Moreover, when there is an exclusive lock on an item it will only be released in shrinking phase.
3. Due to this restriction, there is no chance of getting any inconsistent state. Because any inconsistency
may only be created by write operation. In this way the two-phase locking protocol ensures serializability.

11. Describe major problems associated with concurrent processing with examples. What is
the role of locks in avoiding these problems?
OR
Describe the problem faced when concurrent transactions are executing in uncontrolled
manner. Give an example and explain.
Answer:
Concurrent transaction : Concurrent transaction means multiple transactions are active at the same
time. Following problems can arise if many transactions try to access a common database simultaneously:
1. The lost update problem :
a. A second transaction writes a second value of a data item on top of a first value written by a first
concurrent transaction, and the first value is lost to other transactions running concurrently which need, by
their precedence, to read the first value.
b. The transactions that have read the wrong value end with incorrect results.
Example:
Time    Transaction T1        Transaction T2
  |     Read X → A2
  |     Read Y → A1
  |     A2 + A1 → A2
  |                           Read X → A2
  |                           A2 + 1 → A2
  |                           Write A2 → X
  |     Write A1 → Y
  ↓     Write A2 → X
In the example, the update performed by the transaction T2 is lost (overwritten) by transaction T1.
2. The dirty read problem:
a. Transactions read a value written by a transaction that has been later aborted.
b. This value disappears from the database upon abort, and should not have been read by any transaction
(“dirty read”).
c. The reading transactions end with incorrect results.
Example:
Time    Transaction T1        Transaction T2
  |     Read Y → A1
  |     Read X → A2
  |     A2 + A1 → A2
  |     Write A2 → X
  |                           Read X → A2
  |                           Commit
  ↓     Fails

In the example, transaction T1 fails and changes the value of X back to its old value, but T2 is committed
and reads the temporary incorrect value of X.
3. The incorrect summary problem:
a. While one transaction takes a summary over the values of all the instances of a repeated data item, a
second transaction updates some instances of that data item.
b. The resulting summary does not reflect a correct result for any (usually needed for correctness)
precedence order between the two transactions (if one is executed before the other).
Example:
Time    Transaction T1        Transaction T2
  |     Read Y → A2           Read X → A2
  |     Read X → A1           A2 + 1 → A2
  |     A2 + A1 → A2          Write A2 → X
  ↓     Write A2 → X          Roll back
This is also an example of unrepeatable read: if T1 were to read the value of X after T2 had updated X, the
result of T1 would be different.
Role of locks:
1. Locks force transactions to access a data item in a mutually exclusive manner, so conflicting reads and writes cannot interleave.
2. A transaction must hold the lock on a data item for the duration of its operation and release it only when the operation is complete.
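The lost update problem can be reproduced, and prevented, with an explicit lock. Below is a small Python sketch using threads; the shared `balance` counter and the function names are illustrative, not from the text:

```python
import threading

balance = 0                  # shared data item
lock = threading.Lock()

def deposit_unsafe(n):
    """Read-then-write without a lock: a concurrent update can be lost
    between the read and the write."""
    global balance
    for _ in range(n):
        tmp = balance        # read X
        balance = tmp + 1    # write X — may overwrite another transaction's write

def deposit_safe(n):
    """Holding the lock makes the read-modify-write mutually exclusive."""
    global balance
    for _ in range(n):
        with lock:
            balance += 1

threads = [threading.Thread(target=deposit_safe, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)   # 200000: with locking, no update is lost
```

Running two `deposit_unsafe` threads instead may produce a total below 200000, which is exactly the lost update problem.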

12. Explain timestamp-based protocol and timestamp ordering protocol.


OR
Discuss the timestamp-based protocol to maintain serializability in concurrent execution.
Also explain its advantages and disadvantages.
Answer:
Timestamp based protocols:
Timestamp based protocol ensures serializability. It selects an ordering among transactions in advance
using timestamps.
Timestamps:
1. With each transaction in the system, a unique fixed timestamp is associated. It is denoted by TS(Ti).
2. This timestamp is assigned by the database system before the transaction Ti starts execution.
3. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then
TS(Ti) < TS(Tj).
4. The timestamps of the transactions determine the serializability order. Thus, if TS(Tj) > TS(Ti ) , then the
system must ensure that in produced schedule, transaction Ti appears before transaction Tj .
5. To implement this scheme, two timestamps are associated with each data item Q.
a. W-timestamp (Q): It denotes the largest timestamp of any transaction that executed write(Q)
successfully.
b. R-timestamp (Q): It denotes the largest timestamp of any transaction that executed read(Q)
successfully. These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.
The timestamp ordering protocol:
The timestamp ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti ) < W-timestamp(Q), then Ti needs a value of Q that was already overwritten. Hence, read
operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the
maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the
system assumed that the value would never be produced. Hence, the system rejects write operation and
rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system
rejects this write operation and rolls back Ti.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti). If a transaction
Ti is rolled back by the concurrency control scheme, the system assigns it a new timestamp and restarts it.
Advantages of timestamp ordering protocol:
1. The timestamp ordering protocol ensures conflict serializability. This is because conflicting operation are
processed in timestamp order.
2. The protocol ensures freedom from deadlock, since no transaction ever waits.
Disadvantages of timestamp ordering protocol:
1. There is a possibility of starvation of long transaction if a sequence of conflicting short transaction causes
repeated restarting of the long transaction.
2. The protocol can generate schedules that are not recoverable.
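The read and write checks above can be sketched directly in code (an illustrative simplification; the item is a plain dict holding `R` = R-timestamp and `W` = W-timestamp, and the return strings are assumptions):

```python
REJECT, OK = "rollback", "granted"

def read(ts, item):
    """Timestamp-ordering check for read(Q)."""
    if ts < item["W"]:
        return REJECT                     # needed value was already overwritten
    item["R"] = max(item["R"], ts)
    return OK

def write(ts, item):
    """Timestamp-ordering check for write(Q)."""
    if ts < item["R"] or ts < item["W"]:
        return REJECT                     # a younger transaction already read/wrote Q
    item["W"] = ts
    return OK

Q = {"R": 0, "W": 0}
print(read(5, Q))    # granted; R-timestamp(Q) becomes 5
print(write(3, Q))   # rollback: TS = 3 < R-timestamp(Q) = 5
print(write(7, Q))   # granted; W-timestamp(Q) becomes 7
```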

13. Write short note on the following:


i. Thomas’ write rule
ii. Strict timestamp ordering protocol
Answer:
i. Thomas’ write rule: Thomas’ write rule is a modified version of
timestamp ordering protocol. Suppose that transaction Ti issues
write(Q) :
1. If TS (Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously needed, and it had
been assumed that the value would never be produced. Hence, the system rejects the write operation and
rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write
operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamps (Q) to TS (Ti).
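The three cases of Thomas' write rule can be sketched as a single function (a minimal illustration; the dict-based timestamp bookkeeping is an assumption, as in the timestamp-ordering sketch):

```python
def write_thomas(ts, item):
    """Thomas' write rule for write(Q): like timestamp ordering, except an
    obsolete write is silently ignored instead of rolling the transaction back."""
    if ts < item["R"]:
        return "rollback"    # the value was already needed by a younger reader
    if ts < item["W"]:
        return "ignored"     # obsolete write: skip it, no rollback
    item["W"] = ts
    return "granted"

Q = {"R": 2, "W": 6}
print(write_thomas(1, Q))   # rollback (TS < R-timestamp)
print(write_thomas(4, Q))   # ignored  (TS < W-timestamp)
print(write_thomas(8, Q))   # granted  (W-timestamp becomes 8)
```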
ii. Strict timestamp ordering protocol:
1. Strict timestamp ordering ensures that the schedules are both strict and serializable.
2. In this variation, a transaction T issues a read_item (X) or write_item (X) such that TS (T) > W-
timestamp (X) has its read or write operation delayed until the transaction T1 that wrote the value of X
(hence TS (T1) = W-timestamp (X)) has committed or aborted.
3. To implement this algorithm, it is necessary to simulate the locking of an item X that has been written by
transaction T until T1 is either committed or aborted.
4. This algorithm does not cause deadlock, since T waits for T1 only if TS (T) > TS (T1).

14. Explain validation protocol in concurrency control.


Answer:
Validation protocol in concurrency control consists of following three phase:
1. Read phase:
a. During this phase, the system executes transaction Ti.
b. It reads the values of the various data items and stores them in variables local to Ti.
c. It performs all write operations on temporary local variables, without updates of the actual database.
2. Validation phase: Transaction Ti performs a validation test to determine whether it can copy to the
database, the temporary local variables that hold the results of write operations without causing a violation
of serializability.
3. Write phase: If transaction Ti succeeds in validation phase, then the system applies the actual updates
to the database, otherwise, the system rolls back Ti. All three phases of concurrently executing transactions
can be interleaved. To perform the validation test, we should know when the various phases of transactions
Ti took place. We shall, therefore, associate three different timestamps with transaction Ti :
1. Start (Ti), the time when Ti started its execution.
2. Validation (Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish (Ti), the time when Ti finished its write phase.
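Using these three timestamps, the validation test for a transaction Tj against an earlier transaction Ti can be sketched as follows (a simplified illustration; the dict fields and example read/write sets are assumptions):

```python
def validate(ti, tj):
    """Validation test of Tj against Ti, where TS(Ti) < TS(Tj)."""
    if ti["finish"] < tj["start"]:
        return True                        # Ti finished before Tj began: serial order
    if ti["finish"] < tj["validation"]:
        # Overlap is allowed only if Ti wrote nothing that Tj read.
        return not (ti["writes"] & tj["reads"])
    return False                           # otherwise Tj fails validation

ti = {"start": 1, "validation": 2, "finish": 4, "writes": {"X"}, "reads": {"X"}}
tj = {"start": 3, "validation": 5, "finish": 6, "writes": {"Y"}, "reads": {"Y"}}
print(validate(ti, tj))  # True: Ti's write set {X} does not intersect Tj's read set {Y}
```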

15. Explain the phantom phenomenon. Devise a timestamp-based protocol that avoids the
phantom phenomenon.
OR
Explain the phantom phenomena. Discuss a timestamp protocol that avoids the phantom
phenomena.
Answer:
Phantom phenomenon:
1. A deadlock that is detected but is not really a deadlock is called a phantom deadlock.
2. In distributed deadlock detection, information about wait-for relationship between transactions is
transmitted from one server to another.
3. If there is a deadlock, the necessary information will eventually be collected in one place and a cycle will
be detected.
4. As this procedure will take some time, there is a chance that one of the transactions that hold a lock will
meanwhile have released it; in this case the deadlock will no longer exist.
For example:
1. Consider the case of global deadlock detector that receives local wait-for graph from servers X and Y as
shown in fig. 1 and fig.2.

Fig. 1. Local wait-for graph. Fig. 2. Local wait-for graph. Fig.3. Global wait-for graph.

2. Suppose that transaction U releases an object at server X and requests the one held by V at server Y.
3. Suppose also that the global detector receives server Y’s local graph before server X’s.
4. In this case, it would detect a cycle T → U → V → T, although the edge T → U no longer exists.
5. A phantom deadlock could be detected if a waiting transaction in a deadlock cycle aborts during the
deadlock detection procedure. For example, if there is a cycle T → U → V → T and U aborts after the
information concerning U has been collected, then the cycle has been broken already and there is no
deadlock.
Timestamp based protocol that avoids phantom phenomenon:
1. The B+-tree index-based approach can be adapted to timestamping by treating index buckets as data
items with timestamps associated with them, and requiring that all read accesses use an index.
2. Suppose a transaction Ti wants to access all tuples with a particular range of search-key values, using a
B+-tree index on that search-key.
3. Ti will need to read all the buckets in that index which have key values in that range.
4. Ti will need to write one of the buckets in that index when any deletion or insertion operation on the
tuple is done.
5. Thus the logical conflict is converted to a conflict on an index bucket, and the phantom phenomenon is
avoided.

16. What do you mean by multiple granularities? How it is implemented in transaction


system?
Answer:
Multiple granularity:
1. Multiple granularity can be defined as hierarchically breaking up the database into blocks which can be
locked.
2. It maintains the track of what to lock and how to lock.
3. It makes easy to decide either to lock a data item or to unlock a data item.
Implementation:
1. Multiple granularity is implemented in transaction system by defining multiple levels of granularity by
allowing data items to be of various sizes and defining a hierarchy of data granularity where the small
granularities are nested within larger ones.
2. In the tree, a non leaf node represents the data associated with its descendants.
3. Each node is an independent data item.
4. The highest level represents the entire database.
5. Each node in the tree can be locked individually using shared or exclusive mode locks.
6. If a node is locked in an intention mode, explicit locking is being done at lower level of the tree (that is, at
a finer granularity).
7. Intention locks are put on all the ancestors of a node before that node is locked explicitly.
8. While traversing the tree, the transaction locks the various nodes in an intention mode. This hierarchy
can be represented graphically as a tree.

Fig. 1.
9. When a transaction locks a node, it also has implicitly locked all the descendants of that node in the same
mode.

17. What is multiple granularity protocol of concurrency control?


Answer:
1. Multiple granularity protocol is a protocol in which we lock the data items in top-down order and unlock
them in bottom-up order.
2. In multiple granularity locking protocol, each transaction Ti can lock a node Q in any locking mode by
following certain rules, which ensures serializability. These rules are as follows:
i. Ti must follow the compatibility matrix as shown in Fig. 1 to lock a node Q. This matrix contain following
additional locks:
a. Intention-Shared (IS): Explicit locking at a lower level of the tree, but only with shared locks.
b. Intention-Exclusive (IX): Explicit locking at a lower level with exclusive or shared locks.
c. Shared and Intention-Exclusive (SIX): The sub-tree rooted at that node is locked explicitly in
shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
ii. It first locks the root of the tree and then locks the other nodes.
iii. It can lock a node Q in S or IS mode only if it currently has the parent of Q locked in either IX or IS
mode.
iv. It can lock node Q in X, SIX or IX mode only if it currently has the parent of Q locked in either IX or SIX
mode.
v. It can lock a node if it has not previously unlocked any node.
vi. It can unlock a node Q only if it currently has none of the children of Q locked.
Requested mode    X     SIX    IX     S      IS
X                 NO    NO     NO     NO     NO
SIX               NO    NO     NO     NO     YES
IX                NO    NO     YES    NO     YES
S                 NO    NO     NO     YES    YES
IS                NO    YES    YES    YES    YES

Fig. 1. Compatibility matrix for different mode in multiple granularity protocol.
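The compatibility matrix of Fig. 1 and the parent-locking rules (iii and iv above) can be encoded compactly (a sketch; the helper names are assumptions):

```python
# Compatibility matrix for multiple granularity locking: (held, requested) -> granted?
COMPAT = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "SIX"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "SIX"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "SIX"): False, ("S",  "X"): False,
    ("SIX","IS"): True,  ("SIX","IX"): False, ("SIX","S"): False, ("SIX","SIX"): False, ("SIX","X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "SIX"): False, ("X",  "X"): False,
}

def parent_rule_ok(requested, parent_mode):
    """Rules iii and iv: S/IS on a node need the parent in IS or IX;
    X/SIX/IX need the parent in IX or SIX."""
    if requested in ("S", "IS"):
        return parent_mode in ("IS", "IX")
    return parent_mode in ("IX", "SIX")

print(COMPAT[("IX", "IS")])        # True: intention modes are compatible
print(parent_rule_ok("X", "IX"))   # True: a parent held in IX allows X on the child
```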

18. What is granularity locking? How does granularity of data item affect the performance of
concurrency control? What factors affect the selection of granularity size of data item?
Answer:
Granularity locking:
1. Granularity locking is a concept of locking the data item on the basis of size of data item.
2. It is based on the hierarchy of data where small granularities are nested within larger one. The lock may
be granted at any level from bottom to top.
Effect of granularity of data item over the performance of concurrency control:
1. The larger the data item size, the lower the degree of concurrency permitted. For example, if the data
item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that
contains B. If another transaction wants to lock a record C that resides in the same block X, it is forced to
wait.
2. If the data item size is small then the number of items in the database increases. Because every item is
associated with a lock, the system will have a larger number of active locks to be handled by the lock
manager.
3. More lock and unlock operations will be performed which cause higher overhead.
Factors affecting the selection of granularity size of data items:
1. It depends on the types of transaction involved.
2. If a typical transaction accesses a small number of records, it is advantageous to have the data item
granularity be one record.
3. If a transaction typically accesses many records in the same file, it may be better to have block or file
granularity so that the transaction will consider all those records as one (or a few) data items.

19. What do you mean by multi granularity? How the concurrency is maintained in this case.
Write the concurrent transaction for the following graph.
T1 wants to access item C in read mode
T2 wants to access item D in exclusive mode
T3 wants to read all the children of item B
T4 wants to access all items in read mode
Answer:
Multi granularity: Refer Q.16
Concurrency in multi granularity protocol: Refer Q. 17
Numerical:
1. Transaction T1 reads the item C. T1 needs to lock the items A and B in IS mode (and in
that order), and finally to lock the item C in S mode.
2. Transaction T2 modifies the item D. T2 needs to lock the items A and B (and in that order) in
IX mode, and at last to lock the item D in X mode.
3. Transaction T3 reads all the children of B. T3 needs to lock the item A in IS mode, and then lock
the item B in S mode.
4. Transaction T4 reads all the items. This can be done by locking the item A in S mode.

20. What is multi version concurrency control? Explain multi version timestamping
protocol.
Answer:
Multi version concurrency control:
1. Multi version concurrency control is a scheme in which each write(Q) operation creates a new version of
Q.
2. When a transaction issues a read(Q) operation, the concurrency-control manager selects one of the
version of Q to be read.
3. The concurrency control scheme must ensure that the version to be read is selected in manner that
ensures serializability.
Multi version timestamping protocol:
1. The most common transaction-ordering technique used by multi version schemes is timestamping.
2. With each transaction Ti in the system, we associate a unique static timestamp, denoted by TS(Ti).
3. This timestamp is assigned before the transaction starts execution.
4. Concurrency can be increased if we allow multiple versions to be stored, so that the transaction can
access the version that is consistent for them.
5. With this protocol, each data item Q is associated with a sequence of versions < Q1, Q2, . . . , Qm >.
6. Each version Qk contains three data fields:
a. Content is the value of version Qk.
b. W-timestamp (Qk) is the timestamp of the transaction that created version Qk.
c. R-timestamp (Qk) is the largest timestamp of any transaction that successfully read version Qk.
7. The scheme operates as follows:
Suppose that transaction Ti issues a read(Q) operation. Let Qk denote the version of Q whose write
timestamp is the largest write timestamp less than or equal to TS(Ti).
a. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
b. When transaction Ti issues write(Q):
i. If TS(Ti) < R-timestamp (Qk), then the system rolls back transaction Ti.
ii. If TS(Ti) = W-timestamp (Qk), the system overwrites the contents of Qk; otherwise it creates a new
version of Q.
iii. This rule forces a transaction to abort if it is “too late” in doing a write.
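The version-selection and update rules above can be sketched for a single item Q (an illustrative simplification; the list-of-dicts version store and function names are assumptions):

```python
# Versions of item Q, each with a value, W-timestamp and R-timestamp.
versions = [{"value": 10, "W": 0, "R": 0}]

def latest_version(ts):
    """The version Qk with the largest W-timestamp <= TS(Ti)."""
    return max((v for v in versions if v["W"] <= ts), key=lambda v: v["W"])

def read(ts):
    v = latest_version(ts)
    v["R"] = max(v["R"], ts)      # record the largest successful reader
    return v["value"]

def write(ts, value):
    v = latest_version(ts)
    if ts < v["R"]:
        return "rollback"         # a younger transaction already read this version
    if ts == v["W"]:
        v["value"] = value        # overwrite the version Ti itself created
    else:
        versions.append({"value": value, "W": ts, "R": ts})
    return "ok"

print(read(5))        # 10 — reads the version with W-timestamp 0
print(write(3, 99))   # rollback: that version's R-timestamp is already 5
print(write(7, 42))   # ok — creates a new version with W-timestamp 7
```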

21. Discuss multi version two-phase locking.


Answer:
1. The multi version two-phase locking attempts to combine the advantages of multi version concurrency
control with the advantages of two-phase locking.
2. In addition to read and write lock modes, multi version two-phase locking provides another lock mode,
i.e., certify.
3. In order to determine whether these lock modes are compatible with each other or not, consider Fig. 1
Requested mode    Shared    Exclusive    Certify
Shared            YES       YES          NO
Exclusive         YES       NO           NO
Certify           NO        NO           NO

Fig. 1. Compatibility matrix for multi version two-phase locking.

4. The term “YES” indicates that if a transaction Ti holds a lock on data item Q, then the requested lock can
be granted to another transaction Tj on the same data item Q.
5. The term “NO” indicates that requested mode is not compatible with the mode of lock held. So, the
requested transaction must wait until the lock is released.
6. In multi version two-phase locking, other transactions are allowed to read a data item while a transaction
still holds an exclusive lock on the data item.
7. This is done by maintaining two versions for each data item i.e., certified version and uncertified version.
8. In this situation, Tj is allowed to read the certified version of Q while Ti is writing the value of uncertified
version of Q. However, if transaction Ti is ready to commit, it must acquire a certify lock on Q.

22. What are the problems that can arise during concurrent execution of two or more
transactions? Discuss methods to prevent or avoid these problems.
Answer:
Problems that can arise during concurrent execution of two or more transaction: Refer Q.11
Methods to avoid these problems:
1. Lock based protocol:
a. It requires that all data items must be accessed in a mutually exclusive manner.
b. In this protocol, concurrency is controlled by locking the data items.
c. A lock guarantees exclusive use of a data item to current transaction.
d. Locks are used as a means of synchronizing the access by concurrent transaction to the database items.
2. Timestamp based protocol: Refer Q.12
3. Multi version scheme:
a. Multi version timestamping protocol: Refer Q.20
b. Multi version two-phase locking: Refer Q. 21
23. Explain the recovery with concurrent transactions.
Answer:
Recovery from concurrent transaction can be done in the following four ways:
1. Interaction with concurrency control:
a. In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used.
b. So to rollback a failed transaction, we must undo the updates performed by the transaction.
2. Transaction rollback:
a. In this scheme we rollback a failed transaction by using the log.
b. The system scans the log backward, for every log record found in the log the system restores the data
item.
3. Checkpoints:
a. In this scheme we used checkpoints to reduce the number of log records that the system must scan when
it recovers from a crash.
b. In a concurrent transaction processing system, we require that the checkpoint log record be of the form
<checkpoint L>, where ‘L’ is a list of transactions active at the time of the checkpoint.
4. Restart recovery:
a. When the system recovers from a crash, it constructs two lists.
b. The undo-list consists of transactions to be undone, and the redo-list consists of transaction to be redone.
c. The system constructs the two lists as follows: Initially, they are both empty. The system scans the log
backward, examining each record, until it finds the first <checkpoint> record.

24. Describe Oracle. How data is stored in Oracle RDBMS?


Answer:
1. The Oracle database (commonly referred to as Oracle RDBMS or simply Oracle) consists of a relational
database management system (RDBMS).
2. Oracle is a multi-user database management system. It is a software package specializing in managing a
single, shared set of information among many concurrent users.
3. Oracle is one of many database servers that can be plugged into a client/server equation.
4. Oracle works to efficiently manage its resource, a database of information, among the multiple clients
requesting and sending data in the network.
Storage:
1. The Oracle RDBMS stores data logically in the form of table spaces and physically in the form of data
files.
2. Table spaces can contain various types of memory segments, such as data segments, index segments, etc.

25. Write the name of disk files used is Oracle. Explain database schema.
Answer:
Disk files consists two files which are as follows:
1. Data files:
a. At the physical level, data files comprise one or more data blocks, where the block size can vary between
data files.
b. Data files can occupy pre-allocated space in the file system of a computer server, utilize raw disk directly,
or exist within ASM logical volumes.
2. Control files: One or more (possibly multiplexed) control files store overall system information and
status.
Database schema:
1. Oracle database conventions refer to defined groups of object ownership as schemas.
2. Most Oracle database installation has a default schema called SCOTT.
3. After the installation process has set up the sample tables, the user can log into the database with the
username scott and the password tiger.
4. The SCOTT schema has seen less use as it uses few of the features of the more recent releases of Oracle.
5. Most recent examples supplied by Oracle Corporation reference the default HR or OE schemas.

26. Define in terms of Oracle:


i. Tablespace
ii. Package
iii. Schema
Answer
i. Tablespace:
1. A tablespace is a logical portion of an Oracle database used to allocate storage for table and index data.
2. Each tablespace corresponds to one or more physical database files.
3. Every Oracle database has a tablespace called SYSTEM and may have additional tablespaces.
4. A tablespace is used to group related logical structures together.
ii. Package:
1. Packages are a method of encapsulating and storing related procedures, functions, and other package
constructs together as a unit in the database.
2. It also offers increased functionality and database performance.
3. Calling a public procedure or function that is part of a package is no different than calling a standalone
procedure or function, except that we must include the program’s package name as a prefix to the program
name.
iii. Schema:
1. A schema is a collection of table definitions or related objects owned by one person or user.
2. SCOTT is schema in the Oracle database.
3. Schema objects are the logical structures that directly refer to the database’s data.
4. Schema objects include such structures as tables, views, sequences, stored procedures, synonyms,
indexes, clusters and database links.

27. Explain SQL Plus, SQL * Net and SQL * LOADER.
Answer:
SQL Plus:
1. SQL Plus is the front-end tools for Oracle.
2. The SQL Plus window looks much like a DOS window with a white background similar Notepad.
3. This tool allows us to type in our statements, etc., and see the results.
SQL * Net:
1. This is Oracle’s own middleware product which runs on both the client and server to hide the complexity
of the network.
2. SQL * Net’s multiprotocol interchange allows client/server connections to span multiple communication
protocols without the need for bridges and routers, etc., SQL * Net will work with any configuration design.
SQL * LOADER:
1. A utility used to load data from external files into Oracle tables.
2. It can load data from an ASCII fixed-format or delimited file into an Oracle table.

28. What do you mean by locking techniques of concurrency control? Discuss the various
locking techniques and recovery with concurrent transaction also in detail.
Answer:
Locking techniques:
1. The locking technique is used to control concurrency execution of transactions which is based on the
concept of locking data items.
2. The purpose of locking technique is to obtain maximum concurrency and minimum delay in processing
transactions.
3. A lock is a variable associate with a data item in the database and describes the status of that data item
with respect to possible operations that can be applied to the item; there is one lock for each data item in
the database.
Following are the locking techniques with concurrent transaction:
1. Lock based locking technique: Refer Q. 2
2. Two-phase locking technique: Refer Q. 7
Following are the recovery techniques with concurrent transaction:
1. Log based recovery: Refer Q. 18, Unit-4.
2. Checkpoint: Refer Q.20, Unit-4.
