KCS-501: Database Management System
UNIT-1: INTRODUCTION: Overview, Database System vs File System, Database System Concept and
Architecture, Data Model Schema and Instances, Data Independence and Database Language and
Interfaces, Data Definitions Language, DML, Overall Database Structure. Data Modeling Using the Entity
Relationship Model: ER Model Concepts, Notation for ER Diagram, Mapping Constraints, Keys, Concepts
of Super Key, Candidate Key, Primary Key, Generalization, Aggregation, Reduction of an ER Diagrams to
Tables, Extended ER Model, Relationship of Higher Degree.
UNIT-2: RELATIONAL DATA MODEL: Relational Data Model Concepts, Integrity Constraints, Entity
Integrity, Referential Integrity, Keys Constraints, Domain Constraints, Relational Algebra, Relational
Calculus, Tuple and Domain Calculus. Introduction on SQL: Characteristics of SQL, Advantage of SQL. SQl
Data Type and Literals. Types of SQL Commands. SQL Operators and Their Procedure. Tables, Views and
Indexes. Queries and Sub Queries. Aggregate Functions. Insert, Update and Delete Operations, Joins,
Unions, Intersection, Minus, Cursors, Triggers, Procedures in SQL/PL SQL.
UNIT-3: DATA BASE DESIGN & NORMALIZATION: Functional dependencies, normal forms, first,
second, third normal forms, BCNF, inclusion dependence, lossless join decompositions, normalization
using FD, MVD, and JDs, alternative approaches to database design.
11. Write the difference between super key and candidate key.
Ans.
S. No. | Super key | Candidate key
1. | A super key is a set of one or more attributes taken collectively that allows us to uniquely identify an entity in the entity set. | A candidate key is a column, or set of columns, in the table that can uniquely identify any database record without referring to any other data.
2. | A super key is the broadest unique identifier. | A candidate key is a minimal subset of a super key.
12. Give example for one to one and one to many relationships.
Ans. One to one: each person has one passport and each passport belongs to exactly one person. One to many: one department has many employees, but each employee belongs to only one department.
3. What do you understand by database users? Describe the different types of database
users.
Answer:
Database users are the ones who use and take the benefits of the database. Depending on their needs and the way they access the database, the different types of users are:
1. Application programmers:
a. They are the developers who interact with the database by means of DML queries.
b. These DML queries are written in the application programs like C, C++, JAVA, Pascal etc.
c. These queries are converted into object code to communicate with the database.
2. Sophisticated users:
a. They are database developers, who write SQL queries to select/ insert/delete/update data.
b. They directly interact with the database by means of query language like SQL.
c. These users can be scientists, engineers, analysts who thoroughly study SQL and DBMS to apply the
concepts in their requirement.
3. Specialized users:
a. These are also sophisticated users, but they write special database application programs.
b. They are the developers who develop the complex programs according to the requirement.
4. Standalone user:
a. These users maintain a standalone database for their personal use.
b. Such databases come with predefined database packages that provide menus and graphical interfaces.
5. Naive users:
a. These are the users who use the existing application to interact with the database.
b. For example, online library system, ticket booking systems, ATMs etc.
4. Who are data administrators? What are the functions of database administrator?
OR
Discuss the role of database administrator.
Answer:
Database administrators are the personnel who have control over the data and the programs used for accessing the data.
Functions/role of database administrator (DBA):
1. Schema definition:
a. Original database schema is defined by DBA.
b. This is accomplished by writing a set of definitions, which are translated by the DDL compiler to a set of tables that are stored permanently in the data dictionary.
2. Storage structure and access method definition:
a. The DBA creates appropriate storage structures and access methods.
b. This is accomplished by writing a set of definitions, which are translated by the data storage and definition language compiler.
3. Schema and physical organization modification:
a. The DBA carries out modification of the database schema or of the description of the physical storage organization.
b. These changes are accomplished by writing a set of definitions that modify the appropriate internal system tables.
4. Granting of authorization for data access: DBA grants different types of authorization for data
access to the various users of the database.
5. Integrity constraint specification: The DBA carries out data administration in the data dictionary, such as defining integrity constraints.
6. Explain the differences between physical level, conceptual level and view level of data
abstraction.
Answer:
S.No. | Physical level | Conceptual/Logical level | View level
1. | This is the lowest level of data abstraction. | This is the middle level of data abstraction. | This is the highest level of data abstraction.
2. | It describes how data is actually stored in the database. | It describes what data is stored in the database. | It describes the user's interaction with the database system.
3. | It describes the complex low-level data structures in detail. | It describes the structure of the whole database and hides the details of the physical storage structure. | It describes only those parts of the database in which the users are interested and hides the rest of the information from the users.
4. | Users working at this level (DBAs, developers) are aware of the complexity of the database. | Users at this level need not know the details of physical storage. | Users at this level are not aware of the complexity of the database.
7. Explain the difference between database management system (DBMS) and file system.
Answer
S.No. | DBMS | File System
1. | In DBMS, the user is not required to write procedures for managing the data. | In this system, the user has to write the procedures for managing the files.
2. | DBMS gives an abstract view of data that hides the details. | The file system exposes the details of data representation and storage.
3. | DBMS provides a crash recovery mechanism, i.e., DBMS protects the data from system failure. | The file system does not have a crash recovery mechanism; if the system crashes while some data is being entered, the contents of the file may be lost.
4. | DBMS provides a good protection mechanism. | It is very difficult to protect a file under the file system.
5. | DBMS can efficiently store and retrieve the data. | A file system cannot efficiently store and retrieve the data.
6. | DBMS takes care of concurrent access of data using some form of locking. | In the file system, concurrent access creates many problems, such as one user reading a file while another is deleting or updating information in it.
8. Discuss the architecture of DBMS. What are the types of DBMS architecture?
Answer:
1. The DBMS design depends upon its architecture. The basic client/ server architecture is used to deal with
a large number of PCs, web servers, database servers and other components that are connected with
networks.
2. DBMS architecture depends upon how users are connected to the database to get their request done.
Types of DBMS architecture:
i. 1-Tier architecture:
1. In this architecture, the database is directly available to the user.
2. Any changes are made directly on the database itself. It does not provide a handy tool for end users.
3. The 1-Tier architecture is used for development of the local application, where programmers can directly
communicate with the database for the quick response.
ii. 2-Tier architecture:
1. The 2-Tier architecture is the same as the basic client-server architecture.
2. In the two-tier architecture, applications on the client end can directly communicate with the database at the server side. For this interaction, APIs such as ODBC and JDBC are used.
3. The user interfaces and application programs are run on the client-side.
4. The server side is responsible to provide the functionalities like query processing and transaction
management.
5. To communicate with the DBMS, client-side application establishes a connection with the server side.
(Fig.: In the 2-tier architecture the user's application client communicates directly with the database server; in the 3-tier architecture an application server sits between the application client and the database server.)
9. What are data models? Briefly explain different types of data models.
Answer:
Data models:
1. Data models define how the logical structure of the database is modeled.
2. Data models are a collection of conceptual tools for describing data, data relationships, data semantics
and consistency constraints.
3. Data models define how data is connected to each other and how they are processed and stored inside the
system.
Types of data models:
1. Entity relationship model:
a. The entity relationship (ER) model consists of a collection of basic objects, called entities and of
relationships among these entities.
b. Entities are represented by means of their properties, called attributes.
2. Relational model:
a. The relational model represents data and relationships among data by a collection of tables, each of
which has a number of columns with unique names.
b. Relational data model is used for data storage and processing.
c. This model is simple and it has all the properties and capabilities required to process data with storage
efficiency.
3. Hierarchical model:
a. In hierarchical model data elements are linked as an inverted tree structure (root at the top with
branches formed below).
b. Below the single root data element are subordinate elements each of which in turn has its own
subordinate elements and so on, the tree can grow to multiple levels.
c. Data element has parent child relationship as in a tree.
4. Network model:
a. This model is the extension of hierarchical data model.
b. In this model there exists a parent-child relationship, but a child data element can have more than one parent element or no parent at all.
5. Object-oriented model:
a. Object-oriented models were introduced to overcome the shortcomings of conventional models like
relational, hierarchical and network model.
b. An object-oriented database is collection of objects whose behaviour, state, and relationships are defined
in accordance with object-oriented concepts (such as objects, class, etc.).
12. Describe the classification of database language. Which type of language is SQL?
OR
Discuss the following terms (i) DDL Command (ii) DML command.
Answer:
Classification of database languages:
1. Data Definition Language (DDL):
a. DDL is set of SQL commands used to create, modify and delete database structures but not data.
b. They are used by the DBA, database designers, and, to a limited extent, application developers.
c. Create, drop, alter and truncate are commonly used DDL commands.
2. Data Manipulation Language (DML):
a. A DML is a language that enables users to access or manipulates data as organized by the appropriate
data model.
b. There are two types of DMLs:
i. Procedural DMLs: It requires a user to specify what data are needed and how to get those data.
ii. Declarative DMLs (Non-procedural DMLs): It requires a user to specify what data are needed
without specifying how to get those data.
c. Insert, update, delete, query are commonly used DML commands.
3. Data Control Language (DCL):
a. It is the component of SQL statement that control access to data and to the database.
b. Commit, rollback command are used in DCL.
4. Data Query Language (DQL):
a. It is the component of SQL statement that allows getting data from the database and imposing ordering
upon it.
b. It includes select statement.
5. View Definition Language (VDL):
1. VDL is used to specify user views and their mapping to conceptual schema.
2. It defines the subset of records available to classes of users.
3. It creates virtual tables and the view appears to users like conceptual level.
4. It specifies user interfaces.
SQL contains both DDL and DML constructs; its query part works as a declarative (non-procedural) DML.
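For illustration, a minimal sketch of each class of command (the table, column and user names are assumed):
-- DDL: define or change a structure
CREATE TABLE student (roll_no INT PRIMARY KEY, name VARCHAR(40));
ALTER TABLE student ADD branch VARCHAR(10);
-- DML: manipulate the data stored in the structure
INSERT INTO student (roll_no, name, branch) VALUES (1, 'Ajay', 'CSE');
UPDATE student SET branch = 'IT' WHERE roll_no = 1;
DELETE FROM student WHERE roll_no = 1;
-- DQL: retrieve data
SELECT roll_no, name FROM student WHERE branch = 'CSE';
-- DCL: control access and transactions
GRANT SELECT ON student TO clerk;
COMMIT;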
Answer:
DBMS interfaces: A database management system (DBMS) interface is a user interface which allows for
the ability to input queries to a database without using the query language itself.
1. Menu-based interfaces: These present the user with lists of options (menus) that lead the user through the formulation of a request. They are also often used in browsing interfaces, which allow a user to look through the contents of a database in an exploratory and unstructured manner.
2. Forms-based interfaces:
a. A forms-based interface displays a form to each user.
b. Users can fill out all of the form entries to insert new data, or they can fill out only certain entries, in
which the DBMS will retrieve matching data for the remaining entries.
OR
Draw the overall structure of DBMS and explain its components in brief.
Answer:
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system. The functional components of a database system can be broadly divided into two components:
1. Storage Manager (SM): A storage manager is a program module that provides the interface between
the low level data stored in the database and the application programs and queries submitted to the system.
The SM components include:
a. Authorization and integrity manager: It tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
b. Transaction manager: It ensures that the database remains in a consistent state despite system failures and that concurrent transaction executions proceed without conflicts.
c. File manager: It manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
d. Buffer manager: It is responsible for fetching data from disk storage into main memory and deciding
what data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
2. Query Processor (QP): The query processor is responsible for taking every statement sent to the database system and figuring out how to get the requested data or perform the requested operation. The QP components are:
a. DDL interpreter: It interprets DDL statements and records the definition in data dictionary.
b. DML compiler: It translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query evaluation engine understands.
c. Query optimizer: It picks the lowest cost evaluation plan from among the alternatives.
d. Query evaluation engine: It executes low-level instructions generated by the DML compiler.
17. What do you understand by attributes and domain? Explain various types of attributes
used in conceptual data model.
Answer:
Attributes:
1. Attributes are properties which are used to represent the entities.
2. All attributes have values. For example, a student entity may have name, class, and age as attributes.
3. There exists a domain or range of values that can be assigned to attributes.
4. For example, a student’s name cannot be a numeric value. It has to be alphabetic.
A student’s age cannot be negative, etc.
Domain:
1. A domain is an attribute constraint which determines the type of data values that are permitted for that
attribute.
2. Attribute domains can be very large or very small.
Types of attributes used in conceptual data model:
1. Simple attribute: Simple attributes are atomic values, which cannot be divided further. For example, a
student’s phone number is an atomic value of 10 digits.
2. Composite attribute: Composite attributes are made of more than one simple attribute. For example,
a student's complete name may have first_name and last_name.
3. Derived attribute: Derived attributes are the attributes that do not exist in the physical database, but
their values are derived from other attributes present in the database. For example, average_salary in a
department should not be saved directly in the database, instead it can be derived.
4. Single-value attribute: Single-value attributes contain single value. For example,
Social_Security_Number.
5. Multi-valued attribute: Multi-valued attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc.
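As an illustration (table and column names assumed), attribute domains can be enforced when a table is created:
CREATE TABLE student (
  roll_no INT PRIMARY KEY,        -- single-valued, atomic attribute
  first_name VARCHAR(30),         -- parts of the composite attribute name
  last_name VARCHAR(30),
  age INT CHECK (age >= 0),       -- domain constraint: age cannot be negative
  phone CHAR(10)                  -- simple attribute restricted to a 10-character value
);
-- A derived attribute such as average_salary is not stored; it is computed when needed, e.g.:
-- SELECT AVG(salary) FROM employee;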
18. What is purpose of the ER diagram? Construct an ER diagram for a University system
which should include information about students, departments, professors, courses, which
students are enrolled in which course, which professors are teaching which courses, student
grades, which course a department offers.
Answer:
Purpose of the ER diagram:
1. ER diagram is used to represent the overall logical structure of the database.
2. ER diagrams emphasize the schema of the database and not the instances, because the schema of the database changes rarely.
3. It is useful to communicate the logical structure of database to end users.
4. It serves as a documentation tool.
5. It helps the database designer in understanding the information to be contained in the database.
ER diagram:
19. Draw an ER diagram for a small marketing company database, assuming your own data
requirements.
Answer:
20. A university registrar’s office maintains data about the following entities (a) courses,
including number, title, credits, syllabus and prerequisites; (b) course offerings, including
course number, year, semester section number, instructor(s), timings and classroom; (c)
students, including student-id, name and program; and (d) instructors, including
identification number, name department and title. Further the enrollment of students in
courses and grades awarded to students in each course they are enrolled for must be
appropriately modeled. Construct an ER diagram for the registrar’s office. Document all
assumption that you make about the mapping constraints.
Answer:
In this ER diagram, the main entity sets are student, course, course offering and instructor. The entity set
course offering is a weak entity set dependent on course. The assumptions made are:
a. A class meets only at one particular place and time. This ER diagram cannot model a class meeting at
different places at different times.
b. There is no guarantee that the database does not have two classes meeting at the same place and time.
iv. Many to many: An entity in A is associated with any number of entities in B, and an entity in B is
associated with any number of entities in A.
2. Participation constraints: It tells the participation of entity sets. There are two types of
participations:
i. Partial participation
ii. Total participation
22. Discuss the candidate key, primary key, super key, composite key and alternate key.
OR
Explain the primary key, super key, foreign key and candidate key with example.
OR
Define key. Explain various types of keys.
Answer:
Key:
1. A key is an attribute or a set of attributes that is used to identify data in entity sets.
2. A key is defined for the unique identification of rows in a table.
Consider the following example of an Employee table:
Employee (EmployeeID, FullName, SSN, DeptID)
Various types of keys are:
1. Primary key:
a. A primary key uniquely identifies each record in a table and must never be the same for two records. Here, in the Employee table, we can choose either the EmployeeID or the SSN column as the primary key.
b. The primary key is a candidate key that is chosen for the unique identification of entities within the table.
c. A primary key cannot be null.
d. A table can have only one primary key.
2. Super key:
a. A super key for an entity is a set of one or more attributes whose combined values uniquely identify the entity in the entity set.
b. For example: Here in employee table (EmployeeID, FullName) or (EmployeeID, FullName, DeptID) is a
super key.
3. Candidate key:
a. A candidate key is a column, or set of columns, in the table that can uniquely identify any database record without referring to any other data.
b. Candidate keys are columns (or sets of columns) in a table that qualify for uniqueness of all the rows. Here, in the Employee table, EmployeeID and SSN are candidate keys.
c. Minimal super keys are called candidate keys.
4. Composite key:
a. A composite key is a combination of two or more columns in a table that can be used to uniquely identify
each row in the table.
b. It is used when we cannot identify a record using single attributes.
c. A primary key that is made by the combination of more than one attribute is known as a composite key.
5. Alternate key:
a. The alternate keys of a table are those candidate keys which are not currently selected as the primary key.
b. Exactly one of the candidate keys is chosen as the primary key and the remainder, if any, are called alternate keys.
c. The set of alternate keys is the set of all candidate keys minus the primary key.
d. Here, in the Employee table, if EmployeeID is the primary key then SSN would be the alternate key.
6. Foreign key:
a. A foreign key represents the relationship between tables and ensures the referential integrity rule.
b. A foreign key is derived from the primary key of the same or some other table.
c. A foreign key is a combination of one or more columns in a table (the child table) that references the primary key of another table (the parent table).
d. A foreign key value can be left null.
For example: Consider another table:
Project (ProjectName, TimeDuration, EmployeeID)
a. Here, the ‘EmployeeID’ in the ‘Project’ table points to the ‘EmployeeID’ in ‘Employee’ table
b. The ‘EmployeeID’ in the ‘Employee’ table is the primary key.
c. The ‘EmployeeID’ in the ‘Project’ table is a foreign key.
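A minimal DDL sketch for the Employee and Project tables used in this example (data types are assumed):
CREATE TABLE Employee (
  EmployeeID INT PRIMARY KEY,   -- primary key: the candidate key chosen for identification
  FullName VARCHAR(40),
  SSN CHAR(9) UNIQUE,           -- alternate key: a candidate key not chosen as primary key
  DeptID INT
);
CREATE TABLE Project (
  ProjectName VARCHAR(40),
  TimeDuration INT,
  EmployeeID INT,
  FOREIGN KEY (EmployeeID) REFERENCES Employee (EmployeeID)  -- enforces referential integrity
);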
23. What do you mean by a key to the relation? Explain the differences between super key,
candidate key and primary key.
Answer
Key: Refer Q.22
Difference between super key, candidate key and primary key:
S.No. | Super key | Candidate key | Primary key
1. | All super keys cannot be candidate keys. | All candidate keys are super keys, but not all of them are primary keys. | The primary key is a subset of the candidate keys and super keys.
2. | A relation can have any number of super keys. | The number of candidate keys is less than the number of super keys. | The number of primary keys (exactly one per relation) is less than the number of candidate keys.
3. | For example, in Fig. 1.23.1, the super keys are (Registration), (Vehicle_id), (Registration, Vehicle_id), (Registration, Vehicle_id, Make), etc. | For example, in Fig. 1.23.1, the candidate keys are (Registration) and (Vehicle_id). | For example, in Fig. 1.23.1, the primary key is (Registration).
Fig. 1.23.1. An entity CAR for defining keys.
24. Explain generalization, specialization and aggregation.
OR
Compare generalization, specialization and aggregation with suitable examples.
Answer:
Generalization:
a. Generalization is a process in which two or more lower-level entities are combined to form a higher-level entity.
b. It is a bottom-up approach.
c. Generalization is used to emphasize the similarities among lower-level entity sets and to hide the differences.
For example:
Specialization:
a. Specialization is the process of breaking a higher-level entity into lower-level entities.
b. It is a top-down approach.
c. It is opposite to generalization.
Aggregation:
a. Aggregation is an abstraction through which relationships are treated as higher level entities.
For example:
1. The relationship works_on (relating the entity sets employee, branch and job) acts as a higher-level entity set.
2. We can then create a binary relationship ‘manages’ between works_on and manager to represent who manages what tasks.
Generalization | Specialization | Aggregation
It helps in reducing the schema size. | It increases the size of the schema. | It also increases the size of the schema.
UNIT-2
Relational Data Model and Language
LONG ANSWER TYPE QUESTIONS
1. What is relational model? Explain with example.
Answer:
1. A relational model is a collection of conceptual tools for describing data, data relationships, data
semantics and consistency constraints.
2. It is the primary data model for commercial data processing applications.
3. The relational model uses collection of tables to represent both data and the relationships among those
data.
4. Each table has multiple columns and each column has a unique name.
For example:
1. The tables represent a simple relational database.
2. The Table(1) shows details of bank customers, Table (2) shows accounts and Table (3) shows which
accounts belong to which customer.
Table (1): Customer table
cust_id c_name c_city
3. Table (1), i.e., the customer table, shows that the customer identified by cust_id C_101 is named Ajay and lives in Delhi.
4. Table (2), i.e., the accounts table, shows that account A-1 has a balance of Rs. 1000.
5. Table (3), i.e., the depositor table, shows that account number (acc_no) A-1 belongs to the customer whose cust_id is C_101, account number (acc_no) A-2 belongs to the customer whose cust_id is C_102, and so on.
8. Give the following queries in the relational algebra using the relational schema:
Student (id, name)
Enrolled (id, code)
Subject (code, lecturer)
i. What are the names of students enrolled in cs3020?
ii. Which subjects is Hector taking?
iii. Who teaches cs1500?
iv. Who teaches cs1500 or cs3020?
v. Who teaches at least two different subjects?
vi. What are the names of students in cs1500 or cs307?
vii. What are the names of students in both cs 1500 and cs1200?
Answer:
i. πname (σcode = cs3020 (Student ⋈ Enrolled))
ii. πcode (σname = Hector (Student ⋈ Enrolled))
iii. πlecturer (σcode = cs1500 (Subject))
iv. πlecturer (σcode = cs1500 ∨ code = cs3020 (Subject))
v. For this query we have to relate Subject to itself. To disambiguate, we take two copies of the Subject relation and call them R and S:
πR.lecturer (σR.lecturer = S.lecturer ∧ R.code <> S.code (R × S))
vi. πname (σcode = cs1500 (Student ⋈ Enrolled)) ∪ πname (σcode = cs307 (Student ⋈ Enrolled))
vii. πname (σcode = cs1500 (Student ⋈ Enrolled)) ∩ πname (σcode = cs1200 (Student ⋈ Enrolled))
9. What is relational calculus? Describe its important characteristics. Explain tuple and
domain calculus. OR
What is tuple relational calculus and domain relational calculus?
Answer:
1. Relational calculus is a non-procedural query language.
2. Relational calculus is a query system where queries are expressed as formulas consisting of a number of
variables and an expression involving these variables.
3. In a relational calculus, there is no description of how to evaluate a query.
Important characteristics of relational calculus:
1. The relational calculus is used to measure the selective power of relational languages.
2. Relational calculus is based on predicate calculus.
3. In relational calculus, user is not concerned with the procedure to obtain the results.
4. In relational calculus, output is available without knowing the method about its retrieval.
Tuple Relational Calculus (TRC):
1. The TRC is a non-procedural query language.
2. It describes the desired information without giving a specific procedure for obtaining that information.
3. A query in TRC is expressed as:
{t | P(t)}
That is, it is the set of all tuples t such that predicate P is true for t. The notation t[A] is used to denote the
value of tuple t on attribute A and t ∈ r is used to denote that tuple t is in relation r.
4. A tuple variable is said to be a free variable unless it is quantified by a ∃ or ∀.
5. Formulae are built using the atoms and the following rules:
a. An atom is a formula.
b. If P1 is a formula, then so are ¬ P1 and (P1).
c. If P2 and P1 are formulae, then so are P1∨ P2, P1∧ P2 and P1 ⇒ P2.
d. If P1(s) is a formula containing a free tuple variable s, and r is a relation, then ∃ s ∈ r (P1(s)) and ∀ s ∈ r
(P1(s)) are also formulae.
Domain Relational Calculus (DRC):
1. DRC uses domain variables that take on values from an attribute domain, rather than values for an entire
tuple.
2. An expression in the DRC is of the form:
{<x1, x2, ……., xn> | P(x1, x2, ………, xn)}
where x1, x2, ..., xn represent domain variables and P represents a formula composed of atoms.
3. An atom in DRC has one of the following forms:
a. <x1, x2, ..., xn> ∈ r, where r is a relation on n attributes and x1, x2, ..., xn are domain variables or domain constants.
b. x θ y, where x and y are domain variable and θ is a comparison operator (< , ≤, =, ≠, >, ≥). The attributes
x and y must have the domain that can be compared.
c. x θ c, where x is a domain variable, θ is a comparison operator and c is a constant in the domain of the
attribute for which x is a domain variable.
4. Following are the rules to build up the formula:
a. An atom is a formula.
b. If P1 is a formula then so is ¬P1.
c. If P1 and P2 are formula, then so are P1∨ P2, P1∧ P2 and P1 ⇒ P2.
d. If P1(x) is a formula in x, where x is a domain variable, then ∃ x (P1(x)) and ∀ x (P1(x)) are also formulae.
16. Draw an ER diagram of Hospital or Bank with showing the specialization, Aggregation,
Generalization. Also convert it in to relational schemas and SQL DDL.
Answer:
Relational schemas:
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city, customer-id)
account (account-number, balance)
loan (loan-number, amount)
employee (employee-id, employee-name, telephone-number, start-date, employment-length, dependent-name)
payment (payment-number, payment-amount, payment-date)
saving-account (interest-rate)
checking-account (overdraft-amount)
Fig. 2.16.1. ER diagram for a banking enterprise.
SQL DDL of ER diagram:
create table branch (branch-city varchar (40),
branch-name varchar (40) primary key,
assets number (20));
create table customer (customer-id number (5) primary key,
customer-name varchar (40),
customer-street varchar (20),
customer-city varchar (30));
create table loan (loan-number number (6) primary key,
amount number (10));
create table employee (employee-id number (5) primary key,
employee-name varchar (40),
telephone-number number (10),
start-date date,
employment-length number (4),
dependent-name varchar (10));
create table payment (payment-number number (6),
payment-amount number (10),
payment-date date);
create table account (account-number number (12) primary key,
balance number (10));
create table saving-account (interest-rate number (3));
create table checking-account (overdraft-amount number (15));
2. Comparison operators: These are used to compare one expression with another. The comparison operators are given below:
Operator Definition
= Equality
!=, <> Inequality
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
3. Logical operators: A logical operator is used to produce a single result from combining two separate conditions.
Operator Definition
AND Returns true if both component conditions are true; otherwise returns false.
OR Returns true if either component condition is true; otherwise returns false.
NOT Returns true if the condition is false; otherwise returns false.
4. Set operators: Set operators combine the results of two separate queries into a single result.
Operator Definition
UNION Returns all distinct rows from both queries
INTERSECT Returns common rows selected by both queries
MINUS Returns all distinct rows that are in the first query, but not in the second one
5. Operator precedence:
a. Precedence defines the order that the DBMS uses when evaluating the different operators in the same expression.
b. The DBMS evaluates operators with the highest precedence first before evaluating the operators of lower precedence.
Operator Definition
: Prefix for host variable
, Variable separator
( ) Surrounds subqueries
' Surrounds a literal
" " Surrounds a table or column alias or literal text
( ) Overrides the normal operator precedence
+, - Unary operators
*, / Multiplication and division
+, - Addition and subtraction
|| Character concatenation
NOT Reverses the result of an expression
AND True if both conditions are true
OR True if either condition is true
UNION Returns all data from both queries
INTERSECT Returns only rows that match both queries
MINUS Returns only rows that do not match both queries
Que 2.18. What are the relational algebra operations supported in SQL? Write the SQL
statement for each operation.
Answer:
Basic relational algebra operations: Refer Q. 2.5
SQL statement for relational algebra operations:
1. Select operation: Consider the loan relation,
loan (loan_number, branch_name, amount)
Find all the tuples in which the amount is more than Rs. 12,000; we write:
σ amount > 12000 (loan)
2. Project operation: We write the query to list all the customer names and their cities as :
Π customer_name, customer_city (customer)
3. Set difference operation: We can find all customers of the bank who have an account but not a loan
by writing:
Π customer_name (depositor) – Πcustomer_name (borrower)
4. Cartesian product: We have the following two tables:
5. Rename:
Consider the Book relation with attributes Title, Author, Year and Price. The rename operator is used on
Book relation as follows:
ρ Temp(Bname, Aname, Pyear, Bprice) (Book)
Here both the relation name and the attribute names are renamed.
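The corresponding SQL statements, sketched for the same example relations (the two tables R and S used for the Cartesian product are assumed):
-- Select: tuples of loan with amount > 12000
SELECT * FROM loan WHERE amount > 12000;
-- Project: customer names and cities, with duplicates removed
SELECT DISTINCT customer_name, customer_city FROM customer;
-- Set difference: customers with an account but no loan
SELECT customer_name FROM depositor
EXCEPT                         -- Oracle uses MINUS instead of EXCEPT
SELECT customer_name FROM borrower;
-- Cartesian product
SELECT * FROM R CROSS JOIN S;
-- Rename: the Book relation and its attributes renamed for the query
SELECT Title AS Bname, Author AS Aname, Year AS Pyear, Price AS Bprice FROM Book Temp;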
22. Write full relation operation in SQL. Explain any one of them.
OR
Explain aggregate function in SQL.
Answer:
In SQL, there are many full relation operations like:
i. Eliminating duplicates
ii. Duplicating in union, intersection and difference
iii. Grouping
iv. Aggregate function
Aggregate function:
1. Aggregate functions are functions that take a collection of values as input and return a single value.
2. SQL offers five built-in aggregate functions: average (AVG), minimum (MIN), maximum (MAX), total (SUM) and count (COUNT).
23. Explain how the GROUP BY clause in SQL works. What is the difference between
WHERE and HAVING clause?
Answer:
GROUP BY:
1. GROUP BY was added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called, and without the GROUP BY clause it is not possible to find the aggregate for each individual group of column values.
2. The syntax of the GROUP BY clause is:
SELECT columns, SUM (column) FROM table GROUP BY column;
Example: Consider the following “Sales” table:
Company Amount
TCS 5500
IBM 4500
TCS 7100
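The following sketch uses the “Sales” table above with assumed threshold values. It also illustrates the difference asked in the question: WHERE filters individual rows before the grouping is done, while HAVING filters whole groups after the aggregate has been computed.
SELECT Company, SUM (Amount) AS Total
FROM Sales
WHERE Amount > 5000          -- row filter applied before grouping (keeps 5500 and 7100)
GROUP BY Company
HAVING SUM (Amount) > 10000; -- group filter applied after aggregation
For the data shown, only the TCS group (5500 + 7100 = 12600) satisfies both conditions.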
Table (1): Employee
Emp_Name City
Hari Pune
Om Mumbai
Suraj Nashik
Jai Solapur
Table (2): Employee_Salary
Emp_Name Department Salary
Hari Computer 10000
Om IT 7000
Billu Computer 8000
Jai IT 5000
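1. Inner join: An inner join returns only those rows that have matching values in both tables. A minimal sketch on Table (1) and Table (2), assuming the join is made on Emp_Name (consistent with the result shown below):
Select Employee.Emp_Name, Salary
from Employee inner join Employee_Salary
on Employee.Emp_Name = Employee_Salary.Emp_Name;
Result: The result of the preceding query with selected fields of Table (1) and Table (2):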
Emp_Name Salary
Hari 10000
Om 7000
Jai 5000
2. Outer join:
a. An outer join is an extended form of the inner join.
b. It returns both matching and non-matching rows for the tables that are being joined.
c. Types of outer join are as follows:
i. Left outer join: The left outer join returns matching rows from the tables being joined and also non-matching rows from the left table, placing null values in the attributes that come from the right table.
For example:
Select Employee.Emp_Name, Salary
from Employee left outer join Employee_Salary
on Employee.Emp_Name = Employee_Salary.Emp_Name;
Result: The result of preceding query with selected fields of Table (1) and Table (2)
Emp_Name Salary
Hari 10000
Om 7000
Jai 5000
Suraj null
ii. Right outer join: The right outer join returns matching rows from the tables being joined, and also non-matching rows from the right table, placing null values in the attributes that come from the left table.
For example:
Select Employee.Emp_Name, City, Salary from Employee right outer join
Employee_Salary on Employee.Emp_Name =
Employee_Salary. Emp_Name;
Result: The result of preceding query with selected fields of Table (1) and Table (2)
26. Write difference between cross join, natural join, left outer join and right outer join with
suitable example.
Answer:
Cross join:
1. Cross join produces a result set which is the product of the number of rows in the first table and the number of rows in the second table, if no WHERE clause is used along with the cross join. This kind of result is known as a Cartesian product.
2. If a WHERE clause is used with a cross join, it functions like an inner join.
For Example:
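A minimal sketch, reusing the Employee and Employee_Salary tables of the previous question (the join column Emp_Name is assumed):
Select * from Employee cross join Employee_Salary;
-- 4 rows x 4 rows = 16 rows (Cartesian product)
Select * from Employee cross join Employee_Salary
where Employee.Emp_Name = Employee_Salary.Emp_Name;
-- with the where clause it behaves like an inner join (3 matching rows)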
Natural join:
1. Natural join joins two tables based on same attribute name and data types.
2. The resulting table will contain all the attributes of both the table but keep only one copy of each common
column.
3. In natural join, if there is no condition specifies then it returns the rows based on the common column.
For example: Consider the following two relations:
Student (Roll_No, Name)
Marks (Roll_No, Marks)
These two relations are shown in Table (1) and (2).
Table (1). The Student relation.
Student
Roll_No Name
1 A
2 B
3 C
31. Consider the following relation. The primary key is Rollno, ISBN, Student (Roll No,
Name, Branch), Book (ISBN, Title, Author, Publisher) Issue (Roll No, ISBN, date_of_issue).
Write the query in relational algebra and SQL of the following:
i. List the Roll Number and Name of All CSE Branch Students.
ii. Find the name of students who have issued a book of publication ‘BPB’.
iii. List the title and author of all books which are issued by a student whose name starts with ‘a’.
iv. List the title of all books issued on or before 20/09/2012.
v. List the name of student who will read the book of author named ‘Sanjeev’.
Answer:
i. In relational algebra:
πRoll No, Name (σBranch = “CSE” (Student))
In SQL:
Select Roll No, Name from Student
where Branch = “CSE”;
ii. In relational algebra:
πName (σPublisher = “BPB” ∧ Student.Roll No = P.Roll No (Student × ρP (πRoll No, Publisher (σIssue.ISBN = Book.ISBN (Book × Issue)))))
In SQL:
Select Student.Name from Student inner join
(Select Book.Publisher, Issue.Roll No from Issue inner join Book on Issue.ISBN = Book.ISBN) as P
ON Student.Roll No = P.Roll No
where P.Publisher = “BPB”;
iii. In relational algebra:
πS.Title, S.Author (σS.Name like ‘a%’ (ρS (πT.Name, Book.Author, Book.Title (σBook.ISBN = T.ISBN (Book × ρT (πName, ISBN (σStudent.Roll No = Issue.Roll No (Student × Issue))))))))
In SQL:
Select S.Title, S.Author from
(Select T.Name, Book.Author, Book.Title from Book inner join
(Select Student.Name, Issue.ISBN from Student inner join Issue
ON Student.Roll No = Issue.Roll No) as T
ON Book.ISBN = T.ISBN) as S
where S.Name like ‘a%’;
iv. In relational algebra:
πTitle (σdate_of_issue ≤ 20/09/2012 (Book ⋈ Issue))
In SQL:
Select Book.Title from Book inner join Issue ON Book.ISBN = Issue.ISBN
where Issue.date_of_issue <= ‘20/09/2012’;
v. In relational algebra:
πName (σAuthor = “Sanjeev” ∧ Student.Roll No = Q.Roll No (Student × ρQ (πRoll No, Author (σIssue.ISBN = Book.ISBN (Book × Issue)))))
In SQL:
Select Student.Name from Student inner join
(Select Issue.Roll No, Book.Author from Issue inner join Book ON Issue.ISBN = Book.ISBN) as Q
ON Student.Roll No = Q.Roll No
where Q.Author = “Sanjeev”;
32. Suppose there are two relations R (A, B, C), S (D, E, F). Write TRC and SQL for the following RAs:
i. ΠA, B (R)
ii. σB = 45 (R)
iii. ΠA, F (σC = D (R × S))
Answer:
i. ΠA, B (R):
TRC: {s.A, s.B | R(s)}
SQL: Select A, B from R;
ii. σB = 45 (R):
TRC: {s | R(s) ∧ s.B = 45}
SQL: Select * from R where B = 45;
iii. ΠA, F (σC = D (R × S)):
TRC: {t | ∃p ∈ R ∃q ∈ S (t[A] = p[A] ∧ t[F] = q[F] ∧ p[C] = q[D])}
SQL: Select A, F from R inner join S ON R.C = S.D;
Que 2.33. Consider the following relational database. Give an expression in SQL for each of the following queries. Underlined attributes are primary keys.
Employee (person_name, street, city)
Works (person_name, Company_name, salary)
Company (Company_name, city)
Manages (person_name, manager_name)
i. Find the names of all employees who work for ABC Bank.
ii. Find the names of all employees who live in the same city and on the same street as their managers.
iii. Find the names, street addresses and cities of residence of all employees who work for ABC Bank and earn more than 7,000 per annum.
iv. Find the names of all employees who earn more than every employee of XYZ.
v. Give all employees of corporation ABC a 7% salary raise.
vi. Delete all tuples in the Works relation for employees of ABC.
vii. Find the names of all employees in this database who live in the same city as the company for which they work.
Answer
i. Select person_name from Works
Where company_name=‘ABC Bank’
ii. Select E1.person_name
From Employee as E1, Employee as E2, Manages as M
Where E1.person_name=M.person_name
and E2.person_name=M.manager_name
and E1.street=E2.street and E1.city=E2.city
iii. Select * from Employee
where person_name in
(select person_name from Works
where company_name=‘ABC Bank’ and salary>7000)
or equivalently:
select E.person_name, street, city
from Employee as E, Works as W
where E.person_name = W.person_name
and W.company_name=‘ABC Bank’ and W.salary>7000
iv. Select person_name from Works
where salary > all
(select salary from Works
where company_name=‘XYZ’)
or equivalently:
select person_name from Works
where salary>(select max(salary) from Works
where company_name=‘XYZ’)
v. Update Works
set salary=salary*1.07
where company_name=‘ABC Bank’
vi. Delete from Works
where company_name=‘ABC Bank’
vii. Select E.person_name
from Employee as E, Works as W, Company as C
where E.person_name=W.person_name and E.city=C.city
and W.company_name=C.company_name
34. Explain embedded SQL and dynamic SQL in detail.
Answer:
Embedded SQL:
1. The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal,
PL/I, Fortran, C and COBOL.
2. A language in which SQL queries are embedded is referred to as a host language and the SQL structures
permitted in the host language constitute embedded SQL.
3. Programs written in the host language can use the embedded SQL syntax to access and update data
stored in a database.
4. In embedded SQL, all query processing is performed by the database system.
5. The result of the query is then made available to the program one tuple at a time.
6. Embedded SQL statements must be completely present at compile time and compiled by the embedded
SQL pre-processor.
7. To identify embedded SQL requests to the pre-processor, we use the EXEC SQL statement:
EXEC SQL <embedded SQL statement> END_EXEC
8. Variable of the host language can be used within embedded SQL statements, but they must be preceded
by a colon (:) to distinguish them from SQL variables.
Dynamic SQL:
1. The dynamic SQL component of SQL allows programs to construct and submit SQL queries at run time.
2. Using dynamic SQL, programs can create SQL queries as strings at run time and can either have them
executed immediately or have them prepared for subsequent use.
3. Preparing a dynamic SQL statement compiles it, and subsequent uses of the prepared statement use the
compiled version.
4. SQL defines standards for embedding dynamic SQL calls in a host language, such as C, as in the following
example,
char * sqlprog = “update account set balance = balance * 1.05
where account_number =?”;
EXEC SQL prepare dynprog from :sqlprog;
char account [10] = “A-101”;
EXEC SQL execute dynprog using :account;
6. When do you get constraints violation? Also, define null value constraint.
Ans. Constraints may get violated during insert, delete or update operations on a relation.
Null value constraint: While creating tables, if a row lacks a data value for a particular column, that value is said to be null. A null value (NOT NULL) constraint on a column specifies whether null values are permitted in that column.
7. What is the role of join operations in relational algebra?
Ans. The join operation, denoted by ⋈, is used to join two relations to form a new relation on the basis of a common attribute present in the two operand relations.
8. What are characteristics of SQL?
Ans. Characteristics of SQL:
1. SQL usage is extremely flexible.
2. It uses a free form syntax
9. Give merits and demerits of SQL database.
Ans. Merits of SQL database:
i. High speed
ii. Security
iii. Compatibility
iv. No coding required
Demerits of SQL database:
i. Some versions of SQL are costly.
ii. Difficulty in interfacing
iii. Partial control is given to database.
10. What is the purpose of view in SQL?
Ans. A view is a virtual relation whose contents are derived from already existing relations; it does not exist in physical form. A view can be used like any other relation, that is, it can be queried, inserted into, deleted from and joined with other relations or views, though with some limitations on update operations.
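For example (reusing the Employee_Salary table from the join examples; the view name is assumed):
CREATE VIEW it_employees AS
SELECT Emp_Name, Salary FROM Employee_Salary WHERE Department = 'IT';
SELECT * FROM it_employees;   -- the view is queried like any other relation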
11. Which command is used for creating user-defined data types?
Ans. The user-defined data types can be created using the CREATE DOMAIN command.
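A minimal sketch of the command (part of the SQL standard and supported, for example, by PostgreSQL; the names are assumed):
CREATE DOMAIN marks_type AS INT CHECK (VALUE BETWEEN 0 AND 100);
CREATE TABLE result (roll_no INT, marks marks_type);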
12. What do you mean by query and subquery?
Ans. A query is a request to the database for obtaining some data. A subquery is an SQL query nested inside a larger query. Subqueries must be enclosed within parentheses.
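For example (using the Works relation of Que. 2.33 as an assumed context), the inner query in parentheses is the subquery:
SELECT person_name FROM Works
WHERE salary > (SELECT AVG(salary) FROM Works);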
13. Write the purpose of trigger.
Ans. Purpose of trigger:
1. Automatically generate derived column values.
2. Prevent invalid transactions.
3. Enforce complex security authorizations.
4. Enforce referential integrity across nodes in a distributed database.
5. Enforce complex business rules.
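A minimal Oracle-style PL/SQL sketch (the emp table and its columns are assumed) showing a trigger that fills a derived column and prevents an invalid transaction:
CREATE OR REPLACE TRIGGER trg_check_salary
BEFORE INSERT OR UPDATE ON emp
FOR EACH ROW
BEGIN
  IF :NEW.salary < 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');  -- reject invalid transaction
  END IF;
  :NEW.annual_salary := :NEW.salary * 12;  -- automatically generate a derived column value
END;
/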
14. What do you mean by PL / SQL?
Ans. PL/SQL stands for Procedural Language/SQL. PL/SQL extends SQL by adding constructs found in procedural languages, resulting in a structural language that is more powerful than SQL.
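A minimal sketch of an anonymous PL/SQL block showing the procedural constructs (declarations, a loop and a condition) added on top of SQL:
DECLARE
  v_total NUMBER := 0;
BEGIN
  FOR i IN 1..5 LOOP
    v_total := v_total + i;     -- procedural loop, not available in plain SQL
  END LOOP;
  IF v_total > 10 THEN
    DBMS_OUTPUT.PUT_LINE('Total = ' || v_total);
  END IF;
END;
/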
15. What is union compatibility?
Ans. Two relation instances are said to be union compatible if the following conditions hold:
i. They have the same number of the fields.
ii. Corresponding fields, taken in order from left to right, have the same domains.
16. What is Relational Algebra?
Ans. The relational algebra is a procedural query language. It consists of a set of operations that take one or
two relations as input and produces a new relation as a result.
17. Define constraint and its types in DBMS.
Ans. A constraint is a rule enforced on the data in a table to restrict the values that can be stored and to maintain the accuracy and integrity of the data.
Types of constraints:
1. NOT NULL
2. UNIQUE
3. DEFAULT
4. CHECK
5. Key constraints
i. Primary key
ii. Foreign key
6. Domain constraints
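A sketch showing the listed constraint types in one table definition (table and column names are assumed, and a department table is assumed to exist):
CREATE TABLE student (
  roll_no INT PRIMARY KEY,                       -- key constraint: primary key
  email VARCHAR(50) UNIQUE,                      -- UNIQUE
  name VARCHAR(40) NOT NULL,                     -- NOT NULL
  branch VARCHAR(10) DEFAULT 'CSE',              -- DEFAULT
  age INT CHECK (age >= 17),                     -- CHECK (domain constraint)
  dept_id INT REFERENCES department (dept_id)    -- key constraint: foreign key
);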
UNIT-3
DATA BASE DESIGN & NORMALIZATION
1. Distinguish between functional dependency and multivalued dependency.
Ans. Functional dependency:
A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
Multivalued dependency (MVD):
MVD occurs when two or more independent multivalued facts about the same attribute occur within the
same relation. MVD is denoted by X →→ Y specified on relation schema R, where X and Y are both subsets
of R.
2. When are two sets of functional dependencies said to be equivalent?
Ans. Two sets F1 and F2 of FDs are said to be equivalent, if F1+ = F2+, that is, every FD in F1 is implied by
F2 and every FD in F2 is implied by F1.
3. Define the following:
a. Full functional dependency
b. Partial dependency
Ans.
a. A dependency X → Y in a relation schema R is said to be a full functional dependency if there is no A, where A is a proper subset of X, such that A → Y. It implies that removal of any attribute from X means the dependency no longer holds.
b. A dependency X → Y in a relation schema R is said to be a partial dependency if there is some attribute set A, where A is a proper subset of X, such that A → Y. The attribute set Y is then said to be partially dependent on the attribute set X.
4. What is transitive dependency? Name the normal form which is based on the concept of
transitive dependency.
Ans. An attribute Y of a relational schema R is said to be transitively dependent on attribute X (X → Y), if
there is a set of attributes A that is neither a candidate key nor a subset of any key of R and both
X → A and A → Y hold. The normal form that is based on transitive dependency is 3NF.
5. What is normalization?
Ans. Normalization is the process of organizing a database to reduce redundancy and improve data
integrity.
6. Define 2NF.
Ans. A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is
fully dependent on the primary key. A relation R is in 2NF if every non-prime attribute of R is fully
functionally dependent on each relation key.
7. Why is BCNF considered simpler as well as stronger than 3NF?
Ans. BCNF is the simpler form of 3NF as it makes explicit reference to neither the first and second normal
forms nor to the concept of transitive dependence.
In addition, it is stronger than 3NF as every relation that is in BCNF is also in 3NF but the vice versa is not
necessarily true.
8. Define lossless join decomposition.
Ans. Let R be a relation schema and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This decomposition is a lossless join decomposition of R if at least one of the following functional dependencies is in F+:
i. R1 ∩ R2 → R1
ii. R1 ∩ R2 → R2
9. What do you understand by the closure of a set of attributes?
Ans. The closure of a set of attributes X under a set of FDs F, denoted X+, is the set of all attributes that are functionally determined by X. Equivalently, it corresponds to the subset of F+ consisting of all FDs that have X as their determinant.
10. What are the uses of the closure algorithm?
Ans. Besides computing the subset of closure, the closure algorithm has other uses that are as follows:
i. To determine whether a particular FD, say X → Y, is in the closure F+ of F, without computing F+ itself. This can be done by simply computing X+ using the closure algorithm and then checking whether Y ⊆ X+.
ii. To test whether a set of attributes A is a super key of R. This can be done by computing A+ and checking whether A+ contains all the attributes of R.
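For illustration, consider an assumed example R (A, B, C) with F = {A → B, B → C}:
i. To check whether A → C is in F+, compute A+ = {A, B, C}; since C ∈ A+, the FD A → C holds.
ii. To test whether A is a super key, note that A+ contains all attributes of R, so A is a super key, whereas B+ = {B, C} does not, so B is not.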
11. Describe the dependency preservation property.
Ans. It is a property that is desired during decomposition, namely that no FD of the original relation is lost. The dependency preservation property ensures that each FD represented by the original relation can be enforced by examining a single relation resulting from the decomposition, or can be inferred from the FDs of some decomposed relations.
i. In Fig. 3.3.1, [Name + Course] is a candidate key, so Name and Course are prime attributes; Grade is fully functionally dependent on the candidate key, while Phone no., Course-deptt. and Roll no. are partially functionally dependent on the candidate key.
ii. Given R (A, B, C, D) and F = {AB → C, B → D}, the key of this relation is AB and D is partially dependent on the key.
4. Define partial functional dependency. Consider the following two sets of functional dependencies F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}. Check whether or not they are equivalent.
Answer:
Partial functional dependency: Refer Q.3.
Numerical:
From F,
E → AD
E → A and E → D (by decomposition rule)
Also given that
E → H
So, E → AH (by union rule),
which is an FD of set G.
Again, A → C and AC → D
imply A → D (by pseudotransitivity rule), and then
A → CD (by union rule),
which is the other FD of set G. Thus every FD of G follows from F.
Conversely, from G, A → CD gives A → C and A → D (decomposition), hence AC → D (augmentation); E → AH gives E → A and E → H (decomposition), and with A → D it gives E → D (transitivity), hence E → AD (union). Thus every FD of F follows from G.
Hence, F and G are equivalent.
5. Write the algorithm to find the minimal cover F for a set of functional dependencies E.
Answer:
Algorithm:
1. Set F := E.
2. Replace each functional dependency X → {A1, A2, ..., An} in F by the n functional dependencies X → A1, X → A2, ..., X → An.
3. For each functional dependency X → A in F, for each attribute B that is an element of X, if {F – {X → A}} ∪ {(X – {B}) → A} is equivalent to F, then replace X → A with (X – {B}) → A in F.
4. For each remaining functional dependency X → A in F, if {F – {X → A}} is equivalent to F, then remove X → A from F.
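For illustration, applying the algorithm to an assumed set E = {A → BC, B → C, A → B, AB → C}:
Step 2: split the right-hand sides: {A → B, A → C, B → C, AB → C}.
Step 3: in AB → C the attribute B is extraneous, because C ∈ A+; replace AB → C with A → C, giving {A → B, A → C, B → C}.
Step 4: A → C is redundant, since it is implied by A → B and B → C; remove it.
The minimal cover is F = {A → B, B → C}.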
7. Define normal forms. List the definitions of first, second and third normal forms. Explain
BCNF with a suitable example.
OR
Explain 1NF, 2NF, 3NF and BCNF with suitable example.
Answer:
1. Normal forms are simply stages of database design, with each stage applying more strict rules to the types
of information which can be stored in a table.
2. Normal form is a method to normalize the relations in database.
3. Normal forms are based on the functional dependencies among the attributes of a relation.
Different normal forms are:
1. First Normal Form (1NF):
a. A relation R is in 1NF if all its domains are simple, i.e., all elements are atomic.
For example: The relation LIVED-IN given in Table (1) is not in 1NF because the domain values of the attribute ADDRESS are not atomic.
Table (1). LIVED-IN
Name | Address (City, Year-moved-in, Year-left)
Ashok | (Kolkata, 2007, 2010), (Delhi, 2011, 2015)
Ajay | (Mumbai, 2000, 2004), (Chennai, 2005, 2009)
The relation is not in 1NF and can be normalized by replacing the non-simple domain with simple domains. The normalized form of LIVED-IN is given in Table 3.7.2.
Table 3.7.2. LIVED-IN
Name City Year-moved-in Year-left
Ashok Kolkata 2007 2010
Ashok Delhi 2011 2015
Ajay Mumbai 2000 2004
Ajay Chennai 2005 2009
11. Write the difference between 3NF and BCNF. Find the normal form of relation R (A, B, C, D, E) having FD set F = {A → B, BC → E, ED → A}.
Answer:
Difference: Refer Q.9.
Numerical:
Given: R (A, B, C, D, E) and F = {A → B, BC → E, ED → A}.
(ACD)+ = ACD → ACDB (using A → B) → ABCDE (using BC → E).
So, ACD is a candidate key of R. Similarly, (BCD)+ = ABCDE and (CDE)+ = ABCDE, so BCD and CDE are also candidate keys.
Since every attribute of R is prime, the right-hand side of every FD in F is a prime attribute; hence R is in 3NF. However, in A → B the determinant A is not a super key, so R is not in BCNF. The highest normal form of R is therefore 3NF.
12. Explain inclusion dependencies.
Answer:
1. An inclusion dependency R.X < S.Y between two sets of attributes, X of relation schema R and Y of relation schema S, specifies the constraint that, at any specific time when r is a relation state of R and s a relation state of S, we must have
πX(r(R)) ⊆ πY(s(S))
2. The sets of attributes on which the inclusion dependency is specified, X of R and Y of S, must have the same number of attributes. In addition, the domains for each pair of corresponding attributes should be compatible.
3. Inclusion dependencies are defined in order to formalize two types of inter relational constraints:
a. The foreign key (or referential integrity) constraint cannot be specified as a functional or multivalued
dependency because it relates attributes across relations.
b. The constraint between two relations that represent a class/subclass relationship also has no formal
definition in terms of the functional, multivalued, and join dependencies.
4. For example, if X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn}, one possible correspondence is to have dom(Ai) compatible with dom(Bi) for 1 ≤ i ≤ n. In this case, we say that Ai corresponds to Bi.
14. Consider the relation r(X, Y, Z, W, Q), the set F = {X → Z, Y → Z, Z → W, WQ → Z, ZQ → X} and the decomposition of r into relations R1(X, W), R2(X, Y), R3(Y, Q), R4(Z, W, Q) and R5(X, Q). Check whether the decomposition is lossy or lossless.
Answer:
1. Attribute preservation: R1 ∪ R2 ∪ R3 ∪ R4 ∪ R5 = (X, W) ∪ (X, Y) ∪ (Y, Q) ∪ (Z, W, Q) ∪ (X, Q) = (X, Y, Z, W, Q) = R, so no attribute is lost.
2. Lossless-join test (chase/matrix method). Build one row per Ri with 'a' symbols in the columns of its own attributes:
      X | Y | Z | W | Q
R1 | a1 | b12 | b13 | a4 | b15
R2 | a1 | a2 | b23 | b24 | b25
R3 | b31 | a2 | b33 | b34 | a5
R4 | b41 | b42 | a3 | a4 | a5
R5 | a1 | b52 | b53 | b54 | a5
Applying X → Z equates the Z-values of R1, R2 and R5; Y → Z then makes R3 agree with them on Z; Z → W then sets W = a4 in R2, R3 and R5; WQ → Z then sets Z = a3 in R3 and R5 (they now agree with R4 on W and Q); finally ZQ → X sets X = a1 in R3 and R4.
The row for R3 becomes (a1, a2, a3, a4, a5), i.e., all 'a' symbols, so the decomposition is lossless.
17. Explain the fourth and fifth normal with suitable example.
Answer:
Fourth Normal Form (4NF):
1. A table is in 4NF if it is in BCNF and contains no non-trivial multivalued dependencies (other than those implied by its keys).
2. Formally, a relation schema R is in 4NF with respect to a set of dependencies F (that includes FDs and multivalued dependencies) if, for every non-trivial multivalued dependency X →→ Y in F+, X is a super key for R.
For example: A faculty member has multiple courses to teach and serves on several committees, giving a relation with the attributes FACULTY, COURSE and COMMITTEE. This relation is in BCNF, since all three attributes concatenated together constitute its key, but COURSE and COMMITTEE are independent multivalued facts about FACULTY. The rule for decomposition is to decompose the offending table into two, with the multi-determinant attribute or attributes as part of the key of both. In this case, to put the relation in 4NF, two separate relations are formed as follows:
FACULTY_COURSE (FACULTY, COURSE)
FACULTY_COMMITTEE (FACULTY, COMMITTEE)
FACULTY_COURSE
Faculty | Course
John | Subject
John | Networking
John | MIS
FACULTY_COMMITTEE
Faculty | Committee
John | Placement
John | Scholarship
Fifth Normal Form (5NF):
1. A relation is in 5NF (also called project-join normal form) if it is in 4NF and every join dependency in it is implied by its candidate keys, i.e., it cannot be non-loss decomposed any further.
2. For example, consider a relation recording which company supplies which product through which supplier. Decomposing it into only two projections gives:
Company_Product
Company | Product
Godrej | Soap
Godrej | Shampoo
H. Lever | Soap
H. Lever | Shampoo
Company_Supplier
Company | Supplier
Godrej | Mr. X
Godrej | Mr. Z
H. Lever | Mr. X
H. Lever | Mr. Y
The redundancy has been eliminated, but we have lost information: joining these two tables back can produce spurious tuples. Now suppose that the original table is decomposed into three parts, Company_Product, Company_Supplier and Product_Supplier, which is as follows:
Product_Supplier
PRODUCT SUPPLIER
Soap Mr. X
Soap Mr. Y
Shampoo Mr. X
Shampoo Mr. Y
Shampoo Mr. Z
So, it is clear that if a table is in 4NF and cannot be non-loss decomposed any further, it is said to be in 5NF.
18. What is meant by the attribute preservation condition on decomposition? Given relation R (A, B, C, D, E) with the functional dependencies F = {AB → CD, A → E, C → D} and the decomposition of R into R1(A, B, C), R2(B, C, D), R3(C, D, E), check whether the decomposition is lossy or lossless.
Answer:
Attribute preservation condition on decomposition:
1. The relational database design algorithms start from a single universal relation schema R = {A1, A2, ...,
An} that includes all the attributes of the database.
2. We implicitly make the universal relation assumption, which states that every attribute name is unique.
3. Using the functional dependencies, the algorithms decompose the universal relation schema R into a set
of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema; D is called a
decomposition of R.
4. Each attribute in R must appear in at least one relation schema Ri in the decomposition so that no attributes are lost; formally, we have R1 ∪ R2 ∪ ... ∪ Rm = R. This is called the attribute preservation condition of decomposition.
Numerical: R1 = (A, B, C), R2 = (B, C, D), R3 = (C, D, E).
Construct the matrix for the chase test:
      A | B | C | D | E
R1 | a1 | a2 | a3 | b14 | b15
R2 | b21 | a2 | a3 | a4 | b25
R3 | b31 | b32 | a3 | a4 | a5
Applying C → D (all three rows agree on C) changes the D-entry of R1 to a4. AB → CD and A → E produce no further changes, because no two rows agree on A. Since no row ever consists entirely of 'a' symbols, the decomposition is lossy.
UNIT-4:
TRANSACTION PROCESSING CONCEPT
SHORT ANSWER TYPE QUESTIONS
1. Define transaction.
Ans. A collection of operations that form a single logical unit of work is called a transaction. The operations
that make up a transaction typically consist of requests to access existing data, modify existing data, add
new data or any combination of these requests.
2. Define the term ACID properties.
Ans. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions
intended to guarantee validity even in the event of errors, power failures, etc.
3. State the properties of transaction.
Ans. ACID properties of transaction:
i. Atomicity
ii. Consistency
iii. Isolation
iv. Durability
4. Explain I in ACID property.
Ans. I in ACID property stands for isolation i.e., each transaction is unaware of other transaction executing
concurrently in the system.
5. What is serializability? How it is tested?
Ans. Serializability is the classical correctness criterion for concurrency control: it ensures that a schedule for executing
concurrent transactions is equivalent to one that executes the transactions serially in some order. Serializability is tested by constructing a precedence graph.
6. Define schedule.
Ans. A schedule is a list of operations (actions) ordered by time, performed by a set of transactions that are
executed together in the system.
7. What do you mean by serial schedule?
Ans. Serial schedule is a schedule in which transactions in the schedule are defined to execute one after the
other.
8. Define replication in distributed database.
Ans. Replication is a technique used in distributed databases to store multiple copies of a data table at
different sites.
9. Define data atomicity.
Ans. Data atomicity is the transaction property which specifies that either all operations of the
transaction are reflected properly in the database or none of them are.
10. Define cascading rollback and blind writes.
Ans. Cascading rollback is a situation in which failure of single transaction leads to a series of transaction
rollbacks. Blind writes are those write operations which are performed without performing the read
operation.
11. Define precedence graph.
Ans. A precedence graph is a directed graph G = (N, E) where N = {T1, T2, .... , Tn} is a set of nodes and E =
{e1, e2 .... en} is a set of directed edges.
12. Give types of failures.
Ans. Types of failures:
i. Transaction failure
ii. System crash
iii. Disk failure
13. Give the idea behind shadow paging technique.
Ans. The key idea behind shadow paging technique is to maintain following two page tables during the life
of transaction:
i. Current page table
ii. Shadow page table
14. Give merits and demerits of shadow paging.
Ans. Merits of shadow paging:
i. The overhead of log record output is eliminated.
ii. Recovery from crashes is significantly faster.
Demerits:
i. Commit overhead
ii. Data fragmentation
iii. Garbage collection
15. What is multimedia database?
Ans. Multimedia database provides features that allow users to store and query different types of
multimedia information, which includes images (pictures or drawings), video clips (movies, news reels,
home video), audio clips (songs, phone messages, speeches) and documents (books, articles).
16. Why is it desirable to have concurrent execution of multiple transactions?
Ans. It is desirable to have concurrent execution of multiple transactions:
i. To increase system throughput.
ii. To reduce average response time.
17. What do you mean by conflict serializable schedule?
Ans. A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping
non-conflicting operation.
18. Define concurrency control.
Ans. Concurrency Control (CC) is a process to ensure that data is updated correctly and appropriately
when multiple transactions are concurrently executed in DBMS.
3. Write and describe ACID properties of transaction. How does the recovery manager
ensure atomicity of transactions? How does it ensure durability?
Answer:
ACID properties: Refer Q. 2
Ensuring the atomicity:
1. To ensure atomicity, database system keeps track of the old values of any data on which a transaction
performs a write.
2. If the transaction does not complete its execution, the database system restores the old values.
3. Atomicity is handled by transaction management component.
Ensuring the durability:
1. Ensuring durability is the responsibility of a component called the recovery management component.
2. The durability property guarantees that, once a transaction completes successfully, all the updates that it
carried out on the database persist, even if there is a system failure after the transaction completes
execution.
Fig. 1. State diagram of a transaction.
1. Active: The initial state; the transaction stays in this state while it is executing.
2. Partially committed: A transaction enters the partially committed state when its final statement has been
executed. It may still have to be aborted, since its updates may still reside only in main memory, where a
power failure can cause them to be lost.
3. Failed: A transaction enters the failed state after the system determines that the transaction can no longer
proceed with its normal execution.
4. Aborted: A transaction enters this state after it has been rolled back and the database has been restored
to its state prior to the start of the transaction.
5. Committed: A transaction enters this state after successful completion.
ACID properties with example: Refer Q. 2
5. Ii and Ij conflict if they are operations by different transactions on the same data item, and at least one of
these instructions is a write operation.
For example:
Schedule S
T1: read (A)
T1: write (A)
T2: read (A)
T2: write (A)
T1: read (B)
T1: write (B)
T2: read (B)
T2: write (B)
i. The write (A) instruction of T1 conflicts with read (A) instruction of T2. However, the write (A) instruction
of T2 does not conflict with the read (B) instruction of T1 as they access different data items.
Schedule S'
T1: read (A)
T1: write (A)
T2: read (A)
T1: read (B)
T2: write (A)
T1: write (B)
T2: read (B)
T2: write (B)
ii. Since the write (A) instruction of T2 does not conflict with the read (B) instruction of T1, the two
instructions can be swapped to generate the equivalent schedule S'.
iii. Both schedules produce the same final system state.
6. If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting instructions,
we say that S and S’ are conflict equivalent.
7. The concept of conflict equivalence leads to the concept of conflict serializability and the schedule S is
conflict serializable.
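The notion of conflicting instructions used above can be expressed as a small predicate. The following Python sketch is my own convention for representing operations; it checks the two cases discussed for Schedule S:

# Minimal sketch: test whether two schedule operations conflict.
# Operation format (transaction id, action, data item) is my own convention.

def conflicts(op1, op2):
    """Two operations conflict if they belong to different transactions,
    access the same data item, and at least one of them is a write."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and ("write" in (a1, a2))

# write(A) of T1 vs read(A) of T2 -> conflict; write(A) of T2 vs read(B) of T1 -> no conflict
print(conflicts(("T1", "write", "A"), ("T2", "read", "A")))   # True
print(conflicts(("T2", "write", "A"), ("T1", "read", "B")))   # False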
10. What is schedule? Define the concept of recoverable, cascade less and strict schedules.
Answer:
Schedule: A schedule is a sequence of instructions from a set of transactions executed together, preserving the order in which the instructions appear within each individual transaction.
Recoverable schedule:
A recoverable schedule is one in which for each pair of transaction Ti and Tj if Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
For example: In schedule S, let T2 commit immediately after executing read (A), i.e., T2 commits before
T1 does. Now let T1 fail before it commits. We must abort T2 to ensure transaction atomicity, but as T2 has
already committed, it cannot be aborted. In this situation, it is impossible to recover correctly from the
failure of T1.
Schedule S
T1: read (A)
T1: write (A)
T2: read (A)
T1: read (B)
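The recoverability condition can be tested mechanically. A minimal Python sketch, using my own operation format, records the reads-from pairs and the commit positions and then checks the commit order:

# Minimal sketch: check whether a schedule is recoverable.
# A schedule is recoverable if, whenever Tj reads an item last written by Ti,
# Ti commits before Tj commits. Operation format is my own convention.

def is_recoverable(schedule):
    last_writer = {}   # data item -> transaction that last wrote it
    reads_from = []    # (reader, writer) pairs
    commit_pos = {}    # transaction -> position of its commit
    for pos, (t, action, item) in enumerate(schedule):
        if action == "write":
            last_writer[item] = t
        elif action == "read" and last_writer.get(item) not in (None, t):
            reads_from.append((t, last_writer[item]))
        elif action == "commit":
            commit_pos[t] = pos
    # Every reader that commits must commit after the writer it read from.
    for reader, writer in reads_from:
        if reader in commit_pos:
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

# Schedule S above, with T2 committing before T1: not recoverable.
s = [("T1", "read", "A"), ("T1", "write", "A"),
     ("T2", "read", "A"), ("T2", "commit", None),
     ("T1", "read", "B"), ("T1", "commit", None)]
print(is_recoverable(s))   # False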
11. What is precedence graph? How can it be used to test the conflict serializability of a
schedule?
Answer:
Precedence graph:
1. A precedence graph is a directed graph G = (N, E) that consists of set of nodes N = {T1, T2, ...., Tn} and
set of directed edges E = [e1, e2...., em].
2. There is one node in the graph for each transaction Ti in the schedule.
3. Each edge ei in the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei
and Tk is the ending node of ei.
4. Such an edge is created if one of the operations in Tj appears in the schedule before some conflicting
operation in Tk.
Algorithm for testing conflict serializability of schedule S:
a. For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
b. For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti
→ Tj) in the precedence graph.
c. For each case in S where Tj executes a write_item(X) after Ti executes read_item(X), create an edge (Ti
→ Tj) in the precedence graph.
d. For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge
(Ti → Tj) in the precedence graph.
e. The schedule S is serializable if and only if the precedence graph has no cycles.
5. The precedence graph is constructed as described in given algorithm.
6. If there is a cycle in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle, S is
serializable.
7. In the precedence graph, an edge from Ti to Tj means that transaction Ti must come before transaction
Tj in any serial schedule that is equivalent to S, because two conflicting operations appear in the schedule in
that order.
8. If there is no cycle in the precedence graph, we can create an equivalent serial schedule S’ that is
equivalent to S, by ordering the transactions that participate in S as follows: Whenever an edge exists in the
precedence graph from Ti to Tj, Ti must appear before Tj in the equivalent serial schedule S’.
Example:
T1 T2 T3
read (Y);
read (Z);
read (X);
write (X);
write (Y);
write (Z);
read (Z);
read (Y);
write (Y);
read (Y);
write (Y);
read (X);
write (X);
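The construction and cycle test described in the algorithm above can be sketched in Python; the operation format and helper names are my own. If the graph is acyclic, a topological sort yields the equivalent serial order mentioned in point 8:

# Minimal sketch of the precedence-graph test for conflict serializability.
# Operation format (transaction, action, item) is my own convention.

from collections import defaultdict

def precedence_graph(schedule):
    """Add an edge Ti -> Tj whenever an operation of Ti conflicts with a
    later operation of Tj (same item, different transactions, at least one write)."""
    edges = defaultdict(set)
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "write" in (ai, aj):
                edges[ti].add(tj)
    return edges

def serial_order(schedule):
    """Return an equivalent serial order if the graph is acyclic, else None."""
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    indeg = {t: 0 for t in nodes}
    for t in edges:
        for u in edges[t]:
            indeg[u] += 1
    order, ready = [], [t for t in nodes if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order if len(order) == len(nodes) else None   # None => cycle => not serializable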
13. Discuss cascade less schedule and cascading rollback. Why is cascade less of schedule
desirable?
Answer:
Cascade less schedule: Refer Q.10
Cascading rollback: Cascading rollback is a phenomenon in which a single failure leads to a series of
transaction rollback.
For example:
Schedule S
T1: read (A)
T1: read (B)
T1: write (A)
T2: read (A)
T2: write (A)
T3: read (A)
In the example, transaction T1 writes a value of A that is read by transaction T2. Transaction T2 writes a
value of A that is read by transaction T3. Suppose that at this point T1 fails. T1 must be rolled back. Since T2
is dependent on T1, T2 must be rolled back, since T3 is dependent on T2, T3 must be rolled back.
Need for cascade less schedules:
Cascade less schedules are desirable because the failure of a transaction does not lead to the aborting of any
other transaction. This comes at the cost of less concurrency.
14. Discuss the rules to be followed while preparing a serializable schedule. Why should we
prefer serializable schedules instead of serial schedules?
Answer:
The set of rules which must be followed for preparing serializable schedule are:
1. Take any concurrent schedule.
2. Draw the precedence graph for concurrent schedule.
3. If there is a cycle in precedence graph then schedule is not serializable.
4. If there is no cycle the schedule is serializable.
5. Prepare serializable schedule using precedence graph.
We prefer serializable schedule instead of serial schedule because:
1. The problem with serial schedule is that it limits concurrency or interleaving of operations.
2. In a serial schedule, if a transaction waits for an I/O operation to complete, we cannot switch the CPU
processor to another transaction, thus wasting valuable CPU processing time.
3. If some transaction T is quite long, the other transactions must wait for T to complete all its operations
before committing.
15. What are schedules? What are differences between conflict serializability and view
serializability? Explain with suitable example what are cascade less and recoverable
schedules?
Answer:
Schedule: Refer Q. 10
Difference between conflict and view serializability:
S. No.  Conflict serializability                              View serializability
1.      Easy to achieve                                       Difficult to achieve
2.      Cheaper to test                                       Expensive to test
3.      Every conflict serializable schedule is also          Not every view serializable schedule is
        view serializable                                     conflict serializable
4.      Used in most concurrency control schemes              Not used in concurrency control schemes
16. What is schedule? What are its types? Explain view serializable and cascade less schedule
with suitable example of each.
Answer:
Schedule: Refer Q.10
Types of schedules are:
1. Recoverable schedule
2. Cascade less schedule
3. Strict schedule
View serializable: Refer Q. 9
Cascade less schedule: Refer Q. 10
17. Which of the following schedules are conflict serializable? For each serializable schedule, find an
equivalent serial schedule.
S1: r1(x); r3(x); w3(x); w1(x); r2(x)
S2: r3(x); r2(x); w3(x); r1(x); w1(x)
S3: r1(x); r2(x); r3(y); w1(x); r2(z); r2(y); w2(y)
Answer:
For S1: r1(x) precedes w3(x), giving the edge T1 → T3, and r3(x) precedes w1(x), giving the edge T3 → T1.
The precedence graph contains a cycle, so S1 is not conflict serializable.
For S2: the conflicting operations give the edges T2 → T3 (r2(x) before w3(x)), T3 → T1 (r3(x) and w3(x)
before w1(x)) and T2 → T1 (r2(x) before w1(x)). The precedence graph contains no cycle, so S2 is conflict
serializable; an equivalent serial schedule is T2 → T3 → T1.
For S3: the conflicting operations give the edges T2 → T1 (r2(x) before w1(x)) and T3 → T2 (r3(y) before
w2(y)). The precedence graph contains no cycle, so S3 is conflict serializable; an equivalent serial schedule
is T3 → T2 → T1.
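As a check, the precedence-graph sketch given with Q. 11 (the serial_order helper, my own convention) can be applied to S1, S2 and S3 directly:

S1 = [("T1", "read", "x"), ("T3", "read", "x"), ("T3", "write", "x"),
      ("T1", "write", "x"), ("T2", "read", "x")]
S2 = [("T3", "read", "x"), ("T2", "read", "x"), ("T3", "write", "x"),
      ("T1", "read", "x"), ("T1", "write", "x")]
S3 = [("T1", "read", "x"), ("T2", "read", "x"), ("T3", "read", "y"),
      ("T1", "write", "x"), ("T2", "read", "z"), ("T2", "read", "y"),
      ("T2", "write", "y")]

for name, s in [("S1", S1), ("S2", S2), ("S3", S3)]:
    print(name, serial_order(s))
# Expected: S1 None (cycle), S2 ['T2', 'T3', 'T1'], S3 ['T3', 'T2', 'T1']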
21. What is log file? Write the steps for log-based recovery of a system with suitable example.
Answer:
Log file: A log file is a file that records all the update activities that occur in the database.
Steps for log-based recovery:
1. The log file is kept on a stable storage media.
2. When a transaction enters the system and starts execution, it writes a log record
<Tn, start>
3. When the transaction modifies an item X, it writes a log record
<Tn, X, V1, V2>
which reads as: Tn has changed the value of X from V1 to V2.
4. When the transaction finishes, it logs
<Tn, commit>
For example:
<T0, start>
<T0, A, 0, 10>
<T0, commit>
<T1, start>
<T1, B, 0, 10>
<T2, start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint (T1, T2)>
<T3, start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3, commit>
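A highly simplified redo/undo pass over the example log above can be sketched as follows; the record encoding and helper name are my own, and the checkpoint is ignored here (a real recovery manager would use it to limit how far back the log is scanned):

# Minimal sketch of log-based recovery (redo committed work, undo the rest).
# Log records mirror the <Tn, start> / <Tn, X, old, new> / <Tn, commit> format above.

log = [
    ("T0", "start"), ("T0", "A", 0, 10), ("T0", "commit"),
    ("T1", "start"), ("T1", "B", 0, 10),
    ("T2", "start"), ("T2", "C", 0, 10), ("T2", "C", 10, 20),
    ("checkpoint", ["T1", "T2"]),
    ("T3", "start"), ("T3", "A", 10, 20), ("T3", "D", 0, 10), ("T3", "commit"),
]

def recover(log, db):
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    # Redo phase: replay updates of committed transactions in log order.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, item, old, new = rec
            db[item] = new
    # Undo phase: roll back updates of uncommitted transactions, newest first.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] not in committed:
            _, item, old, new = rec
            db[item] = old
    return db

print(recover(log, {}))   # T0 and T3 are redone; T1 and T2 are undone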
22. Describe the important types of recovery techniques. Explain their advantages and
disadvantages.
Answer:
There are many different database recovery techniques to recover a database:
1. Deferred update recovery: Refer Q.18
Advantages:
a. Recovery is easy.
b. Cascading rollback does not occur because no other transaction sees the work of another until it is
committed.
Disadvantages:
a. Concurrency is limited.
2. Immediate update recovery: Refer Q. 18
Advantages:
a. It allows higher concurrency because transactions write continuously to the database rather than waiting
until the commit point.
Disadvantages:
a. It leads to cascading rollbacks.
b. It is time consuming and may be problematic.
Fig. 1. Wait-for graph: an edge from T1 to T2 labelled wait-for-lock(R2), i.e., T1 waits for a lock on R2 held by T2.
3. Recovery from deadlock:
a. Selection of a victim: In this we determine which transaction (or transactions) to roll back to break
the deadlock. We should rollback those transactions that will incur the minimum cost.
b. Rollback: The simplest solution is a ‘‘total rollback’’. Abort the transaction and then restart it.
c. Starvation: In a system where the selection of transactions for rollback is based on the cost factor, it may
happen that the same transactions are always picked as victims, so they never complete; this is starvation.
4. Deadlock avoidance: Deadlock can be avoided by following methods:
a. Serial access: If only one transaction can access the database at a time, then we can avoid deadlock.
b. Auto commit transactions: Each transaction locks only one resource at a time, immediately as it uses it,
then finishes its work and releases the lock before requesting any other resource.
c. Ordered updates: If transactions always request resources in the same order (for example,
numerically ascending by the index value of the row being locked), then the system does not enter a deadlock
state.
d. By rolling back conflicting transactions.
e. By allocating the locks where needed.
25. What is deadlock? What are necessary conditions for it? How it can be detected and
recovered?
Answer:
Deadlock: Refer Q. 23
Necessary condition for deadlock: A deadlock situation can arise if the following four conditions hold
simultaneously in a system:
1. Mutual exclusion: At least one resource must be held in a non-sharable mode; that is, only one process
at a time can use the resource. If another process requests that resource, the requesting process must be
delayed until the resource has been released.
2. Hold and wait: A process must be holding at least one resource and waiting to acquire additional
resources that are currently being held by other processes.
3. No pre-emption: Resources cannot be pre-empted; i.e., a resource can be released only by the process
holding it, after that process has completed its task.
4. Circular wait: A set {P0, P1, ..., Pn} of waiting processes must exist such that P0 is waiting for a resource
held by P1, P1 is waiting for a resource held by P2, ..., Pn–1 is waiting for a resource held by Pn, and Pn is
waiting for a resource held by P0.
Deadlock detection and recovery: Refer Q. 23
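Deadlock detection with a wait-for graph amounts to a cycle test. A minimal Python sketch, with my own graph representation, is:

# Minimal sketch of deadlock detection on a wait-for graph.
# An edge Ti -> Tj means Ti is waiting for a lock currently held by Tj.

def has_deadlock(wait_for):
    """Detect a cycle in the wait-for graph with a depth-first search."""
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for u in wait_for.get(t, ()):
            if u in visiting or (u not in done and dfs(u)):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wait_for if t not in done)

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a cycle, hence deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))   # True
print(has_deadlock({"T1": ["T2"], "T2": ["T3"]}))                 # False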
26. What is distributed databases? What are the advantages and disadvantages of
distributed databases?
OR
Explain the advantages of distributed DBMS.
Answer
Distributed database:
30. What are distributed database? List advantages and disadvantages of data replication
and data fragmentation. Explain with a suitable example, what are the differences in
replication and fragmentation transparency?
OR
Explain the types of distributed data storage.
OR
What are distributed database? List advantage and disadvantage of data replication and data
fragmentation.
Answer:
Distributed database: Refer Q.26
Advantages of data replication:
i. Availability: If one of the sites containing relation r fails, then the relation r can be found in another
site. Thus, the system can continue to process queries involving ‘r’, despite the failure of one site.
ii. Increased parallelism: A number of transactions can read relation r in parallel. The more replicas of ‘r’
there are, the greater the parallelism that can be achieved.
Disadvantages of data replication:
i. Increased overhead on update: The system must ensure that all replicas of a relation r are
consistent; otherwise, erroneous computation may result. Thus, whenever r is updated, the update must be
propagated to all sites containing replicas. The result is increased overhead.
Advantages of data fragmentation:
i. Parallelized execution of queries by different sites is possible.
ii. Data management is easy as fragments are smaller compared to the complete database.
iii. Increased availability of data to the users/queries that are local to the site at which the data is stored.
iv. As the data is available close to the place where it is most frequently used, the efficiency of the system in
terms of query processing, transaction processing is increased.
v. Data that are not required by local applications are not stored locally. It leads to reduced data transfer
between sites, and increased security.
Disadvantages of data fragmentation:
i. The performance of global application that requires data from several fragments located at different sites
may be slower.
ii. Integrity control may be more difficult if data and functional dependencies are fragmented and located at
different sites.
Differences in replication and fragmentation transparency:
1. Replication transparency: It involves placing copies of each table, or of its fragments, on more than one
   site in the system.
   Fragmentation transparency: It involves decomposition of a table into many tables in the system.
2. Replication transparency: The user does not know how many replicas of the relation are present in the
   system.
   Fragmentation transparency: The user does not know how the relation is divided/fragmented in the system.
3. Replication transparency: For example, if relation r is replicated, a copy of relation r is stored at two or
   more sites; in the extreme case, a copy is stored at every site in the system, which is called full replication.
   Fragmentation transparency: For example, if relation ‘r’ is fragmented, ‘r’ is divided into a number of
   fragments. These fragments contain sufficient information to allow reconstruction of the original relation ‘r’.
Two-phase commit (coordinator / participant message exchange):
Step 1: The coordinator sends canCommit? to the participants and enters the prepared-to-commit state (waiting for votes).
Step 2: Each participant replies Yes and enters the prepared-to-commit (uncertain) state.
Step 3: On receiving all Yes votes, the coordinator sends doCommit and enters the committed state.
Step 4: The participant commits and replies haveCommitted; the coordinator is then done.
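The exchange above can be sketched as a toy coordinator routine; the function and the vote representation are my own simplification of the protocol (no timeouts or failure handling):

# Minimal sketch of the two-phase commit decision logic shown above.

def two_phase_commit(participants):
    """participants: dict name -> vote function returning True (Yes) or False (No)."""
    # Phase 1: coordinator sends canCommit? and collects votes.
    votes = {name: vote() for name, vote in participants.items()}
    decision = "doCommit" if all(votes.values()) else "doAbort"  # any No forces a global abort
    # Phase 2: coordinator sends the decision; participants acknowledge.
    acks = {name: "haveCommitted" if decision == "doCommit" else "haveAborted"
            for name in participants}
    return decision, acks

print(two_phase_commit({"P1": lambda: True, "P2": lambda: True}))
print(two_phase_commit({"P1": lambda: True, "P2": lambda: False}))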
UNIT-5
CONCURRENCY CONTROL TECHNIQUES
SHORT ANSWER TYPE QUESTIONS
1. Why is concurrency control needed?
Ans. Concurrency control is needed so that the data can be updated correctly when multiple transactions
are executed concurrently.
2. Write down the main categories of concurrency control.
Ans. Categories of concurrency control are:
i. Optimistic ii. Pessimistic
iii. Semi-optimistic
3. What do you mean by optimistic concurrency control?
Ans. Optimistic concurrency control means that transactions execute without locking and are checked for
conflicts only when they commit. It is useful where we do not expect conflicts; if a conflict does occur, the
committing transaction is rolled back and can be restarted.
4. Define locks.
Ans. A lock is a variable associated with each data item that indicates the status of the item with respect to
the read and write operations that can be applied to it.
5. Define the modes of lock.
Ans. Data items can be locked in two modes:
1. Exclusive (X) mode: If a transaction Ti has obtained an exclusive mode lock on item Q, then Ti can
read as well as write Q data item.
2. Shared (S) mode: If a transaction Ti has obtained a shared mode lock on item Q, then Ti can only read
the data item Q but Ti cannot write the data item Q.
6. Give merits and demerits of two-phase locking.
Ans. Merits of two phase locking:
i. It maintains database consistency.
ii. It increases concurrency over static locking as locks are held for shorter period.
Demerits of two-phase locking:
i. Deadlock
ii. Cascade aborts / rollback
7. Define lock compatibility.
Ans. Lock compatibility determines whether locks can be acquired on a data item by multiple transactions
at the same time.
8. Define upgrade and downgrade in locking protocol.
Ans. Upgrade: Upgrade is the lock conversion from shared to exclusive mode. It takes place only in
growing phase.
Downgrade: Downgrade is the lock conversion from exclusive to shared mode. It can take place only in
shrinking phase.
9. Define the term intention lock.
Ans. Intention lock is a type of lock mode used in multiple granularity locking in which a transaction
intends to explicitly lock a lower level of the tree. To provide a higher degree of concurrency, intention
mode is associated with shared mode and exclusive mode.
10. What are the pitfalls of lock based protocol?
Ans. Pitfalls of lock based protocols are:
i. Deadlock can occur.
ii. Starvation is also possible if concurrency control manager is badly designed.
11. Define exclusive lock.
Ans. An exclusive lock is a lock that allows only one transaction at a time to read and write a data item; no other transaction can access the item until the lock is released.
12. Define timestamp.
Ans. A timestamp is a unique identifier created by the DBMS to identify a transaction. This timestamp is
used in timestamp based concurrency control techniques.
13. Define multi version scheme.
Ans. Multi version concurrency control is a scheme in which each write(Q) operation creates a new version
of Q. When a transaction issues a read(Q) operation, the concurrency control manager selects one of the
version of Q to be read that ensures serializability.
14. Define Thomas’ write rule.
Ans. Thomas’ write rule is the modification to the basic timestamp ordering, in which the rules for write
operations are slightly different from those of basic timestamp ordering. It does not enforce conflict
serializability.
5. Describe how a typical lock manager is implemented. Why must lock and unlock be
atomic operations? What is the difference between a lock and a latch? What are convoys and
how should a lock manager handle them?
Answer:
Implementation of lock manager:
1. A typical lock manager is implemented with a hash table, also called lock table, with the data object
identifier as the key.
2. A lock table entry contains the following information:
a. The number of transactions currently holding a lock on the object.
b. The nature of the lock.
c. A pointer to a queue of lock requests.
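A minimal Python sketch of such a lock table, with my own names and a simplified shared/exclusive compatibility rule, is given below:

# Minimal sketch of a lock table keyed by data-object identifier.
from collections import defaultdict, deque

class LockTable:
    def __init__(self):
        # object id -> current holders, lock mode and a FIFO queue of waiting requests
        self.entries = defaultdict(lambda: {"holders": set(), "mode": None, "queue": deque()})

    def lock(self, obj, txn, mode):
        """Grant the lock if compatible (shared with shared); otherwise enqueue the request."""
        e = self.entries[obj]
        if not e["holders"] or (mode == "S" and e["mode"] == "S"):
            e["holders"].add(txn)
            e["mode"] = mode
            return True            # granted
        e["queue"].append((txn, mode))
        return False               # transaction must wait

    def unlock(self, obj, txn):
        e = self.entries[obj]
        e["holders"].discard(txn)
        if not e["holders"]:
            e["mode"] = None
            if e["queue"]:
                t, m = e["queue"].popleft()   # wake the next waiting request
                e["holders"], e["mode"] = {t}, m

lt = LockTable()
print(lt.lock("Q", "T1", "X"))   # True: granted
print(lt.lock("Q", "T2", "S"))   # False: must wait until T1 unlocks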
Reason for lock and unlock being atomic operations: Lock and unlock must be atomic operations
because, if they were not, two transactions could simultaneously obtain an exclusive lock on the same object,
thereby destroying the principles of 2PL.
Difference between lock and latch:
S. No.  Lock                                                  Latch
1.      Protects database objects (data items) on behalf      Protects internal, in-memory structures (such as
        of a transaction.                                     the lock table or a buffer page) during access.
2.      Held for a long duration (often until commit).        Held for a short duration (a single operation).
3.      Acquired and released according to the locking        Acquired and released around individual
        protocol of the transaction.                          operations; not subject to the two-phase rule.
Convoy:
1. Convoy is a queue of waiting transactions.
2. It occurs when a transaction holding a heavily used lock is suspended by the operating system, and every
other transaction that needs this lock is queued.
The lock manager handles convoys by keeping the hold time of heavily used locks short and by granting waiting requests in order, so that the queue of waiting transactions drains quickly.
9. Write the salient features of graph based locking protocol with suitable example.
Answer:
Salient features of graph based locking protocol are:
1. The graph based locking protocol ensures conflict serializability.
2. Free from deadlock.
3. Unlocking may occur earlier in the graph based locking protocol than in the two phase locking protocol.
4. Shorter waiting time, and increase in concurrency.
5. No rollbacks are required.
6. Data items may be unlocked at any time.
7. Only exclusive locks are considered.
8. The first lock by T1 may be on any data item. Subsequently, a data Q can be locked by T1 only if the
parent of Q is currently locked by T1.
9. A data item that has been locked and unlocked by T1 cannot subsequently be relocked by T1.
For example:
This schedule has three transactions; here we consider only how the data items are locked and unlocked.
T1 T2 T3
Lock-X(A)
Lock-X(D)
Lock-X(H)
Unlock-X(D)
Lock-X(E)
Lock-X(D)
Unlock-X(B)
Unlock-X(E)
Lock-X(B)
Lock-X(E)
Unlock-X(H)
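The rules in points 8 and 9 can be checked mechanically. The following Python sketch, with an illustrative hierarchy and my own operation format, rejects a schedule that violates the tree protocol:

# Minimal sketch of the tree (graph-based) locking rules: a transaction may lock a
# data item Q only if it currently holds a lock on Q's parent (the first lock is
# exempt), and an item it has unlocked cannot be relocked.

def tree_protocol_ok(schedule, parent):
    """schedule: list of (txn, 'lock'/'unlock', item); parent: item -> parent item."""
    held = {}       # txn -> set of items currently locked
    released = {}   # txn -> set of items already unlocked once
    for txn, action, item in schedule:
        held.setdefault(txn, set())
        released.setdefault(txn, set())
        if action == "lock":
            if item in released[txn]:
                return False                       # rule: no relocking
            if held[txn] and parent.get(item) not in held[txn]:
                return False                       # rule: parent must be locked
            held[txn].add(item)
        else:
            held[txn].discard(item)
            released[txn].add(item)
    return True

tree = {"B": "A", "C": "A", "D": "B", "E": "B"}    # illustrative hierarchy rooted at A
s = [("T1", "lock", "B"), ("T1", "lock", "D"), ("T1", "unlock", "B"), ("T1", "lock", "E")]
print(tree_protocol_ok(s, tree))   # False: E's parent B is no longer locked by T1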
11. Describe major problems associated with concurrent processing with examples. What is
the role of locks in avoiding these problems?
OR
Describe the problem faced when concurrent transactions are executing in uncontrolled
manner. Give an example and explain.
Answer:
Concurrent transactions: Concurrent transactions are multiple transactions that are active at the same
time. The following problems can arise if many transactions try to access a common database simultaneously:
1. The lost update problem :
a. A second transaction writes a second value of a data item on top of a first value written by a first
concurrent transaction, and the first value is lost to other transactions running concurrently which need, by
their precedence, to read the first value.
b. The transactions that have read the wrong value end with incorrect results.
Example:
T1: Read X → A2
T1: Read Y → A1
T1: A2 + A1 → A2
T2: Read X → A2
T2: A2 + 1 → A2
T2: Write A2 → X
T1: Write A1 → Y
T1: Write A2 → X
In the example, the update performed by the transaction T2 is lost (overwritten) by transaction T1.
2. The dirty read problem:
a. Transactions read a value written by a transaction that has been later aborted.
b. This value disappears from the database upon abort, and should not have been read by any transaction
(“dirty read”).
c. The reading transactions end with incorrect results.
Example:
T1: Read Y → A1
T1: Read X → A2
T1: A2 + A1 → A2
T1: Write A2 → X
T2: Read X → A2
T2: Commit
T1: Fails
In the example, transaction T1 fails and the value of X is changed back to its old value, but T2 has already
read the temporary incorrect value of X and committed.
3. The incorrect summary problem:
a. While one transaction takes a summary over the values of all the instances of a repeated data item, a
second transaction updates some instances of that data item.
b. The resulting summary does not reflect a correct result for any (usually needed for correctness)
precedence order between the two transactions (if one is executed before the other).
Example:
T1: Read Y → A2; Read X → A1; A2 + A1 → A2; Write A2 → X
T2: Read X → A2; A2 + 1 → A2; Write A2 → X; Roll back
This is also an example of unrepeatable read: if T1 were to read the value of X again after T2 had updated X,
the result obtained by T1 would be different.
Role of locks:
1. A lock forces conflicting transactions to access a data item one at a time and in the correct order, so that
updates are not lost and uncommitted values are not read.
2. Any data item that has been locked must be unlocked at the end of the operation so that waiting
transactions can proceed.
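The role of a lock in preventing the lost update problem can be illustrated with a toy Python example (thread counts and names are my own; the lock makes the read-modify-write of the shared value a single unit):

# Minimal sketch: a lock serialising two concurrent increments so no update is lost.
import threading

balance = 0
lock = threading.Lock()

def add_one_thousand_times():
    global balance
    for _ in range(1000):
        with lock:                 # exclusive access: read and write as one unit
            balance += 1           # read X, X + 1, write X

threads = [threading.Thread(target=add_one_thousand_times) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(balance)                     # always 2000; without the lock, updates could be lost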
15. Explain the phantom phenomenon. Devise a timestamp-based protocol that avoids the
phantom phenomenon.
OR
Explain the phantom phenomena. Discuss a timestamp protocol that avoids the phantom
phenomena.
Answer:
Phantom phenomenon:
1. A deadlock that is detected but is not really a deadlock is called a phantom deadlock.
2. In distributed deadlock detection, information about wait-for relationship between transactions is
transmitted from one server to another.
3. If there is a deadlock, the necessary information will eventually be collected in one place and a cycle will
be detected.
4. As this procedure will take some time, there is a chance that one of the transactions that hold a lock will
meanwhile have released it; in this case the deadlock will no longer exist.
For example:
1. Consider the case of global deadlock detector that receives local wait-for graph from servers X and Y as
shown in fig. 1 and fig.2.
Fig. 1. Local wait-for graph. Fig. 2. Local wait-for graph. Fig.3. Global wait-for graph.
2. Suppose that transaction U releases an object at server X and requests the one held by V at server Y.
3. Suppose also that the global detector receives server Y’s local graph before server X’s.
4. In this case, it would detect a cycle T → U → V → T, although the edge T → U no longer exists.
5. A phantom deadlock could be detected if a waiting transaction in a deadlock cycle aborts during the
deadlock detection procedure. For example, if there is a cycle T → U → V → T and U aborts after the
information concerning U has been collected, then the cycle has been broken already and there is no
deadlock.
Timestamp based protocol that avoids phantom phenomenon:
1. The B + tree index based approach can be adapted to timestamping by treating index buckets as data
items with timestamps associated with them, and requiring that all read accesses use an index.
2. Suppose a transaction Ti wants to access all tuples with a particular range of search-key values, using a B
+ tree index on that search-key.
3. Ti will need to read all the buckets in that index which have key values in that range.
4. Ti will need to write one of the buckets in that index when any deletion or insertion operation on the
tuple is done.
5. Thus the logical conflict is converted to a conflict on an index bucket, and the phantom phenomenon is
avoided.
Fig. 1. Multiple granularity tree (database – files – records).
9. When a transaction locks a node, it also has implicitly locked all the descendants of that node in the same
mode.
18. What is granularity locking? How does granularity of data item affect the performance of
concurrency control? What factors affect the selection of granularity size of data item?
Answer:
Granularity locking:
1. Granularity locking is a concept of locking the data item on the basis of size of data item.
2. It is based on the hierarchy of data where small granularities are nested within larger one. The lock may
be granted at any level from bottom to top.
Effect of granularity of data item over the performance of concurrency control:
1. The larger the data item size is, the lower the degree of concurrency permitted. For example, if the data
item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that
contains B. If another transaction wants to lock a record C which resides in the same block X, it is forced to
wait.
2. If the data item size is small then the number of items in the database increases. Because every item is
associated with a lock, the system will have a larger number of active locks to be handled by the lock
manager.
3. More lock and unlock operations will be performed which cause higher overhead.
Factors affecting the selection of granularity size of data items:
1. It depends on the types of transaction involved.
2. If a typical transaction accesses a small number of records, it is advantageous to have the data item
granularity be one record.
3. If a transaction typically accesses many records in the same file, it may be better to have block or file
granularity so that the transaction will consider all those records as one (or a few) data items.
19. What do you mean by multi granularity? How the concurrency is maintained in this case.
Write the concurrent transaction for the following graph.
T1 wants to access item C in read mode
T2 wants to access item D in exclusive mode
T3 wants to read all the children of item B
T4 wants to access all items in read mode
Answer:
Multi granularity: Refer Q.16
Concurrency in multi granularity protocol: Refer Q. 1
Numerical:
1. Transaction T1 reads the item C in B. Then, T1 needs to lock the items A and B in IS mode (and in
that order), and finally to lock the item C in S mode.
2. Transaction T2 modifies the item D in B. Then, T2 needs to lock the items A and B (and in that order) in
IX mode, and at last to lock the item D in X mode.
3. Transaction T3 reads all the records in B. Then, T3 needs to lock the item A in IS mode, and at last to
lock the item B in S mode.
4. Transaction T4 reads all the items. This can be done after locking the item A in S mode.
20. What is multi version concurrency control? Explain multi version timestamping
protocol.
Answer:
Multi version concurrency control:
1. Multi version concurrency control is a scheme in which each write(Q) operation creates a new version of
Q.
2. When a transaction issues a read(Q) operation, the concurrency-control manager selects one of the
version of Q to be read.
3. The concurrency control scheme must ensure that the version to be read is selected in manner that
ensures serializability.
Multi version timestamping protocol:
1. The most common transaction-ordering technique used by multi version schemes is timestamping.
2. With each transaction Ti in the system, we associate a unique static timestamp, denoted by TS(Ti).
3. This timestamp is assigned before the transaction starts execution.
4. Concurrency can be increased if we allow multiple versions to be stored, so that the transaction can
access the version that is consistent for them.
5. With this protocol, each data item Q is associated with a sequence of versions < Q1, Q2, . . . , Qm >.
6. Each version Qk contains three data fields:
a. Content is the value of version Qk.
b. W-timestamp (Qk) is the timestamp of the transaction that created version Qk.
c. R-timestamp (Qk) is the largest timestamp of any transaction that successfully read version Qk.
7. The scheme operates as follows:
Suppose that transaction Ti issues a read(Q) operation. Let Qk denote the version of Q whose write
timestamp is the largest write timestamp less than or equal to TS(Ti).
a. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
b. When transaction Ti issues write(Q):
i. If TS(Ti) < R-timestamp (Qk), then the system rolls back transaction Ti.
ii. If TS(Ti) = W-timestamp (Qk), the system overwrites the contents of Qk; otherwise it creates a new
version of Q.
iii. This rule forces a transaction to abort if it is “too late” in doing a write.
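The read and write rules above can be sketched as a small class; the version representation and method names are my own simplification:

# Minimal sketch of the multiversion timestamp-ordering rules described above.

class MVTO:
    def __init__(self, initial):
        # each data item keeps a list of versions: (value, w_ts, r_ts)
        self.versions = {q: [(v, 0, 0)] for q, v in initial.items()}

    def _latest(self, q, ts):
        """Version of q with the largest write timestamp <= ts."""
        return max((v for v in self.versions[q] if v[1] <= ts), key=lambda v: v[1])

    def read(self, ts, q):
        value, w_ts, r_ts = qk = self._latest(q, ts)
        # record the largest timestamp that has read this version
        self.versions[q][self.versions[q].index(qk)] = (value, w_ts, max(r_ts, ts))
        return value

    def write(self, ts, q, value):
        _, w_ts, r_ts = qk = self._latest(q, ts)
        if ts < r_ts:
            return "rollback"                      # too late: a younger txn already read Qk
        if ts == w_ts:
            self.versions[q][self.versions[q].index(qk)] = (value, w_ts, r_ts)
        else:
            self.versions[q].append((value, ts, ts))  # new version with W-ts = R-ts = TS(Ti)
        return "ok"

db = MVTO({"Q": 100})
db.read(5, "Q")            # transaction with TS 5 reads the version with W-timestamp 0
print(db.write(3, "Q", 1)) # 'rollback': TS 3 < R-timestamp 5 of that version
print(db.write(7, "Q", 2)) # 'ok': creates a new version with W-timestamp 7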
4. The term “YES” indicates that if a transaction Ti holds a lock on data item Q, then the requested lock can
also be granted to another requesting transaction Tj on the same data item Q.
5. The term “NO” indicates that the requested mode is not compatible with the mode of the lock held, so the
requesting transaction must wait until the lock is released.
6. In multi version two-phase locking, other transactions are allowed to read a data item while a transaction
still holds an exclusive lock on the data item.
7. This is done by maintaining two versions for each data item i.e., certified version and uncertified version.
8. In this situation, Tj is allowed to read the certified version of Q while Ti is writing the value of uncertified
version of Q. However, if transaction Ti is ready to commit, it must acquire a certify lock on Q.
22. What are the problems that can arise during concurrent execution of two or more
transactions? Discuss methods to prevent or avoid these problems.
Answer:
Problems that can arise during concurrent execution of two or more transaction: Refer Q.11
Methods to avoid these problems:
1. Lock based protocol:
a. It requires that all data items must be accessed in a mutually exclusive manner.
b. In this protocol, concurrency is controlled by locking the data items.
c. A lock guarantees exclusive use of a data item to current transaction.
d. Locks are used as a means of synchronizing the access by concurrent transaction to the database items.
2. Timestamp based protocol: Refer Q.12
3. Multi version scheme:
a. Multi version timestamping protocol: Refer Q.20
b. Multi version two-phase locking: Refer Q. 21
23. Explain the recovery with concurrent transactions.
Answer:
Recovery from concurrent transaction can be done in the following four ways:
1. Interaction with concurrency control:
a. In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used.
b. So to rollback a failed transaction, we must undo the updates performed by the transaction.
2. Transaction rollback:
a. In this scheme we rollback a failed transaction by using the log.
b. The system scans the log backward; for every log record of the failed transaction found in the log, the
system restores the data item to its old value.
3. Checkpoints:
a. In this scheme we used checkpoints to reduce the number of log records that the system must scan when
it recovers from a crash.
b. In a concurrent transaction processing system, we require that the checkpoint log record be of the form
<checkpoint L>, where ‘L’ is a list of transactions active at the time of the checkpoint.
4. Restart recovery:
a. When the system recovers from a crash, it constructs two lists.
b. The undo-list consists of transactions to be undone, and the redo-list consists of transaction to be redone.
c. The system constructs the two lists as follows: Initially, they are both empty. The system scans the log
backward, examining each record, until it finds the first <checkpoint> record.
25. Write the name of disk files used is Oracle. Explain database schema.
Answer:
Disk files consist of two types of files, which are as follows:
1. Data files:
a. At the physical level, data files comprise one or more data blocks, where the block size can vary between
data files.
b. Data files can occupy pre-allocated space in the file system of a computer server, utilize raw disk directly,
or exist within ASM logical volumes.
2. Control files: One control file (or multiple multiplexed copies) stores overall system information and
statuses.
Database schema:
1. Oracle database conventions refer to defined groups of object ownership as schemas.
2. Most Oracle database installations have a default schema called SCOTT.
3. After the installation process has set up the sample tables, the user can log into the database with the
username scott and the password tiger.
4. The SCOTT schema has seen less use as it uses few of the features of the more recent releases of Oracle.
5. Most recent examples supplied by Oracle Corporation reference the default HR or OE schemas.
27. Explain SQL Plus, SQL * Net and SQL & LOADER.
Answer:
SQL Plus:
1. SQL Plus is the front-end tool for Oracle.
2. The SQL Plus window looks much like a DOS window, with a white background similar to Notepad.
3. This tool allows us to type in our statements and see the results.
SQL * Net:
1. This is Oracle’s own middleware product which runs on both the client and server to hide the complexity
of the network.
2. SQL * Net’s multiprotocol interchange allows client/server connections to span multiple communication
protocols without the need for bridges and routers, etc., SQL * Net will work with any configuration design.
SQL * LOADER:
1. A utility used to load data from external files into Oracle tables.
2. It can load data from an ASCII fixed-format or delimited file into an Oracle table.
28. What do you mean by locking techniques of concurrency control? Discuss the various
locking techniques and recovery with concurrent transaction also in detail.
Answer:
Locking techniques:
1. The locking technique is used to control concurrency execution of transactions which is based on the
concept of locking data items.
2. The purpose of locking technique is to obtain maximum concurrency and minimum delay in processing
transactions.
3. A lock is a variable associated with a data item in the database that describes the status of the data item
with respect to the possible operations that can be applied to it; there is one lock for each data item in
the database.
Following are the locking techniques with concurrent transaction:
1. Lock based locking technique: Refer Q. 2
2. Two-phase locking technique: Refer Q. 7
Following are the recovery techniques with concurrent transaction:
1. Log based recovery: Refer Q. 18, Unit-4.
2. Checkpoint: Refer Q.20, Unit-4.