
UNIT-I

PART-A

Q1. Difference between ‘physical and logical schema’?

Ans: A physical schema represents the actual connection to the data source or data target. A logical schema represents the logical name associated with that source or target. One logical schema can be associated with multiple physical schemas, and a context determines which physical schema is used for a given logical schema.

Q2. Point out the importance of object base data model?

Ans: In object-based data models, the entities are based on real-world objects and on how the data appears in real life. There is not as much concern over what the data is as over how it is visualized and connected.
Some examples of object-based data models are
• Entity Relationship Data Model
• Object Oriented Data Model
• Semantic Data Model
• Functional Data Model
Out of these models, Entity Relationship Data Model and Object-Oriented Data Model are the most popular.

Q3. List out five applications of D.B.M.S.?

Ans:
• Banking
• Student information in educational institution
• Social media sites
• Telecommunication
• E-ticket system

Q4. Discuss about relational data model?

Ans: A relational data model involves the use of data tables that collect groups of elements into relations. These
models work based on the idea that each table setup will include a primary key or identifier. Other tables use that
identifier to provide "relational" data links and results.
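A minimal sketch of this idea in SQL (the table and column names here are illustrative assumptions, not taken from the question):

-- Each department row is identified by a primary key.
create table department (
    dept_id int primary key,
    dept_name varchar(50)
);

-- Employee rows reference a department through its identifier,
-- which is what provides the "relational" link between the tables.
create table employee (
    emp_id int primary key,
    name varchar(50),
    dept_id int references department(dept_id)
);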

Q5. What is data dictionary?

Ans: A data dictionary contains metadata about the database. The data dictionary is very important, as it contains information such as what is in the database, who is allowed to access it, where the database is physically stored, etc. The users of the database normally don't interact with the data dictionary; it is handled only by the database administrators.
The data dictionary in general contains information about the following −

• Names of all the database tables and their schemas.


• Details about all the tables in the database, such as their owners, their security constraints, when
they were created etc.
• Physical information about the tables such as where they are stored and how.
• Table constraints such as primary key attributes, foreign key information etc.
• Information about the database views that are visible.

Q6. List the purpose of database management system?

Ans: The purpose of a DBMS is to transform the following −


• Data into information.
• Information into knowledge.
• Knowledge into action.

Q7. What is an embedded database?

Ans: An embedded database system, sometimes also called an in-process database system, is one that is delivered as a set of libraries that are linked with the application code, such that the database system functionality exists within the application itself. The term “in-process database system” more accurately describes this architecture because the database system operates in the same address space as the application itself. This is in contrast to a client–server database, where the server and the client each exist as separate processes with their own address spaces.

Q8. What is an aggregate function?

Ans: An aggregate function in SQL performs a calculation on multiple values and returns a single value.

Q9. Which aggregate functions are allowed in SQL?

Ans: SQL provides many aggregate functions that include avg, count, sum, min, max, etc. An aggregate function
ignores NULL values when it performs the calculation, except for the count function.
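As an illustration (an employee table with a salary column is assumed here):

-- avg, sum, min and max skip NULL salaries;
-- count(*) counts every row, while count(salary) counts only non-NULL values.
select count(*) as total_rows,
       count(salary) as non_null_salaries,
       avg(salary) as average_salary,
       max(salary) as highest_salary
from employee;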

Q10. Definition of schema and instance?

Ans: Schema refers to the overall description of any given database. Instance basically refers to a collection of data
and information that the database stores at any moment.

Q11. What is data model?

Ans: A data model refers to the logical inter-relationships and data flow between different data elements involved in
the information world. It also documents the way data is stored and retrieved.

Q12. Generalize your view on the semi-structured data model?


Ans: Semi-structured data refers to data that doesn’t adhere to the tabular structure of the data models associated with relational databases or other types of data tables. It includes tags or other markers to segregate semantic elements and enforce hierarchies of fields and records within the data.

Q13. Difference between dynamic SQL and static SQL?

Ans: Static SQL consists of SQL statements in an application that do not change at runtime and can therefore be hard-coded into the application.

Dynamic SQL consists of SQL statements that are constructed at runtime.
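A sketch of the difference in Oracle PL/SQL style (the table, column and variable names are assumptions):

-- Static SQL: the statement text is fixed when the program is compiled.
select name into v_name from employee where empno = 101;

-- Dynamic SQL: the statement text is assembled and executed at runtime.
execute immediate 'select name from employee where ' || v_condition into v_name;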

Q14. Difference between hierarchical data model and network model?

Ans: In the hierarchical data model, the relationship between tables and data is defined in a parent-child structure, with data arranged in the form of a tree. This model supports one-to-one and one-to-many relationships. The network model, on the other hand, arranges data in a graph structure.

Q15. Define database management system?

Ans: Database Management Systems (DBMS) are software systems used to store, retrieve, and run queries on data.
A DBMS serves as an interface between an end-user and a database, allowing users to create, read, update, and
delete data in the database.

Q16. Disadvantage of file processing system?

Ans: a) Data redundancy and inconsistency − a file processing system leads to many copies of the same data being kept. This is data redundancy.

b) Difficulty in accessing data − in a file processing system, to access data in different ways we need to have different programs.
c) Data isolation − files are stored in different locations.
d) Integrity problems.
e) Atomicity problems.

Q17. Compare between DDL, DML, DCL?

Ans:

• DDL is Data Definition Language: CREATE, DROP, ALTER, RENAME, TRUNCATE.
• DML is Data Manipulation Language: SELECT, INSERT, UPDATE, DELETE.
• DCL is Data Control Language: GRANT, REVOKE.
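One statement from each category, as a quick illustration (the table and user names are placeholders):

create table dept (dept_id int);      -- DDL: defines a schema object
insert into dept values (10);         -- DML: manipulates the stored data
grant select on dept to some_user;    -- DCL: controls access rights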

Q18. Discuss about relational algebra?

Ans: Relational algebra is a procedural query language, which takes a relation as input and generates a relation as output. Relational algebra mainly provides the theoretical foundation for relational databases and SQL.

PART-B

Q1. With the help of a block diagram, describe the basic architecture of DBMS?

Ans: The Database Management System (DBMS) architecture shows how data in the database is viewed by the
users. It is not concerned about how the data are handled and processed by the DBMS.
It helps in implementation, design, and maintenance of a database to store and organize information for companies.
The concept of DBMS depends upon its architecture. The architecture can be designed as centralized, decentralized,
or hierarchical.
The architecture of DBMS can be defined at three levels as follows −
• External level.
• Conceptual level.
• Internal level.

Q2. Disadvantages of a file management system?

Ans:
• Data redundancy
• Limited user access
• Lack of storage access

Q3. List out the operation of relational algebra?

Ans:

Operators in Relational Algebra


• Projection (π): Projection is used to project the required column data from a relation.
• Selection (σ): Selection is used to select the required tuples of a relation.
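For example, the two operators can be combined to list the names of employees older than 30 (the relation and attribute names here are assumed):

π name (σ age > 30 (Employee))

The selection first keeps only the qualifying tuples, and the projection then keeps only the name column; this corresponds to select name from employee where age > 30 in SQL.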

Q4. Describe about view in SQL?

Ans: Views in SQL are kind of virtual tables. A view also has rows and columns as they are in a real table in the
database. We can create a view by selecting fields from one or more tables present in the database. A View can
either have all the rows of a table or specific rows based on certain condition.
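A minimal sketch, assuming an employee table with a dept column:

-- The view stores no data itself; it is a saved query over the base table.
create view sales_employees as
select empno, name from employee where dept = 'sales';

-- The view can then be queried like an ordinary table.
select name from sales_employees;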

Q5. Which aggregate functions are allowed in SQL?

Ans: SQL provides many aggregate functions that include avg, count, sum, min, max, etc. An aggregate function
ignores NULL values when it performs the calculation, except for the count function.

Q6. Describe about function in SQL?


Ans: A function accepts input parameters, performs actions, and then returns the result. Note that functions always return either a single value or a table.
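A small sketch in MySQL style (syntax varies across dialects; the function name and the employee table are assumptions):

-- A scalar function returning a single value: the head count of a department.
create function dept_headcount(p_dept varchar(20))
returns int
reads sql data
begin
  return (select count(*) from employee where dept = p_dept);
end;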

Q7. Follow this table

• Employee (empno, name,office,age)


• book (isbn, title, author, publisher)
• loan (empno, isbn, date)

I. Find the names of employee who borrowed book authored by ‘xyz’

Ans: select employee.name from employee

inner join loan on employee.empno = loan.empno
inner join book on loan.isbn = book.isbn
where book.author = 'xyz';

Q8. What is mapping cardinality in dbms?

Ans: In database management, cardinality plays an important role. Here cardinality represents the number of
times an entity of an entity set participates in a relationship set. Or we can say that the cardinality of a
relationship is the number of tuples (rows) in a relationship. Types of cardinality in between tables are:
• one-to-one
• one-to-many
• many-to-one
• many-to-many

Q9. What is ddl dml dcl in SQL?

Ans: DDL (Data Definition Language) defines and modifies schema objects, using statements such as CREATE, ALTER, DROP, RENAME and TRUNCATE. DML (Data Manipulation Language) works on the stored data, using statements such as SELECT, INSERT, UPDATE and DELETE. DCL (Data Control Language) controls access rights, using GRANT and REVOKE.
Q10. Explain about SQL fundamental?

Ans: SQL is Structured Query Language, a computer language for storing, manipulating and retrieving data stored in a relational database.
SQL is the standard language for relational database systems. All the relational database management systems (RDBMS) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their standard database language.
They also use different dialects, such as −

• MS SQL Server using T-SQL,


• Oracle using PL/SQL,
• MS Access version of SQL is called JET SQL (native format) etc.

Q13. Consider the following table?

Employee (empno, name, dept, salary)

1. Find all the employee names whose first letter of the name is “L”?
2. Find the maximum salary in each group (department)?
3. Find the second maximum salary?

Ans:
1. select name from employee where name like 'L%';

2. select name from employee where salary in (select max(salary) from employee group by dept);

3. select max(salary) from employee where salary < (select max(salary) from employee);
PART-C

Q1. Consider the following table

Ans: create table student_file (student_number varchar(10), name char(10), address varchar(15), telephone int);

create table course_file (course_number varchar(10), description char(10), hours int, professor_number varchar(10));

create table professor_file (professor_number varchar(10), p_name char(10), office varchar(15));

create table regis_file (student_number varchar(10), course_number varchar(10), a_date date);

• who teaches a specific course and what is his name?


select professor_file.p_name , course_file.description from professor_file
inner join course_file on professor_file.professor_number=course_file.professor_number ;

• for a specific student number which course the student enrolled and what is his name?
select student_file.name, course_file.description from student_file
inner join regis_file on student_file.student_number = regis_file.student_number
inner join course_file on regis_file.course_number = course_file.course_number
where student_file.student_number = 'S1';

• who are the students for a specific course


select student_file.name from student_file
inner join regis_file on student_file.student_number=regis_file.student_number
inner join course_file on regis_file.course_number=course_file.course_number where
course_file.description ='c';

• who are the course teacher for a specific student


select professor_file.p_name from professor_file
inner join course_file on professor_file.professor_number = course_file.professor_number
inner join regis_file on course_file.course_number=regis_file.course_number
inner join student_file on regis_file.student_number =student_file.student_number
where student_file.name ='rafi';

• who registered in a specific course


select student_file.name from student_file
inner join regis_file on student_file.student_number=regis_file.student_number
inner join course_file on regis_file.course_number=course_file.course_number where
course_file.description ='c';

Q2.
Answer:
UNIT-II

PART-A

Q1. Express an entity relationship model with an example?

Ans: Entity relationship (ER) models are based on the real-world entities and their relationships. It is easy for the
developers to understand the system by simply looking at the ER diagram. ER models are normally represented by
ER-diagrams.

Q2. Discuss about 2NF?

Ans: Second Normal Form (2NF) is based on the concept of full functional dependency. Second Normal Form
applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes.
A relation with a single-attribute primary key is automatically in at least 2NF. A relation that is not in 2NF may suffer from update anomalies. To be in second normal form, a relation must be in first normal form and must not contain any partial dependency. A relation is in 2NF if it has no partial dependency, i.e., no
non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any
candidate key of the table.
Consider following functional dependencies in relation R (A, B, C, D )
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of
AB doesn’t determine any non-prime attribute.
Q3. What is composite attribute?

Ans: The attributes which can be divided into sub-parts are called composite attributes.
• Name, address is called composite attributes

Q4. Why bcnf is stricter than 3nf?

Ans: 1. BCNF is a normal form used in database normalization.

2. 3NF is the third normal form used in database normalization.

3. BCNF is stricter than 3NF: every relation in BCNF is also in 3NF, but not every relation in 3NF is in BCNF.

4. In BCNF, every determinant of a non-trivial functional dependency must be a candidate key (super key); 3NF imposes no such requirement when the dependent attribute is a prime attribute.

5. Hence BCNF is stricter than 3NF.


Q5. Difference between single value and multi valued attribute?

Ans: A single-valued attribute can hold only one value for a given entity, for example age or date of birth. A multi-valued attribute can hold more than one value for the same entity, for example phone numbers or email addresses.

Q6. Define weak entity?

Ans:

An entity type should have a key attribute which uniquely identifies each entity in the entity set, but there exist some entity types for which a key attribute can't be defined. These are called weak entity types.
Q7. Entity set and relationship set?
Ans: An entity is a table in DBMS, and it represents a real-world object. Entities are connected to each other using
relationships. Thus, the difference between entity and relationship in DBMS is that the entity is a real-world object
while the relationship is an association between the entities.
Q8. what is query evaluation plan in dbms?

Ans: A query evaluation plan is a program for an abstract machine inside the DBMS, produced by the query optimizer. It is sometimes also termed an access plan because it records how the DBMS decides to access the rows. In most systems, query evaluation plans closely resemble relational algebra expressions.

Q9. classify the participation constraint in dbms?

Ans: Participation constraints specify whether every entity of an entity set must take part in a relationship set. They are classified into two types: total participation, where every entity in the entity set participates in at least one relationship instance, and partial participation, where some entities may not participate at all.

Q10. develop a database to describe 3nf?

Ans: A table is in a third normal form when the following conditions are met −

• It is in second normal form.


• All nonprimary fields are dependent on the primary key.
The dependency of these non-primary fields is between the data. For example, in the following table – the street
name, city and the state are unbreakably bound to their zip code.

CREATE TABLE CUSTOMERS(


CUST_ID INT NOT NULL,
CUST_NAME VARCHAR (20) NOT NULL,
DOB DATE,
STREET VARCHAR(200),
CITY VARCHAR(100),
STATE VARCHAR(100),
ZIP VARCHAR(12),
EMAIL_ID VARCHAR(256),
PRIMARY KEY (CUST_ID)
);

The dependency between the zip code and the address is called a transitive dependency. To comply with the third normal form, all you need to do is move the Street, City and State fields into their own table, which you can call the Zip Code table −
CREATE TABLE ADDRESS(
ZIP VARCHAR(12),
STREET VARCHAR(200),
CITY VARCHAR(100),
STATE VARCHAR(100),
PRIMARY KEY (ZIP)
);

The next step is to alter the CUSTOMERS table as shown below −

CREATE TABLE CUSTOMERS(


CUST_ID INT NOT NULL,
CUST_NAME VARCHAR (20) NOT NULL,
DOB DATE,
ZIP VARCHAR(12),
EMAIL_ID VARCHAR(256),
PRIMARY KEY (CUST_ID)
);
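To keep the link between the two tables explicit, a foreign key can be declared on the retained ZIP column (a sketch; the constraint name is arbitrary):

ALTER TABLE CUSTOMERS
ADD CONSTRAINT FK_CUSTOMERS_ZIP
FOREIGN KEY (ZIP) REFERENCES ADDRESS (ZIP);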

The advantages of removing transitive dependencies are mainly two-fold. First, the amount of data duplication is
reduced and therefore your database becomes smaller.
The second advantage is data integrity. When duplicated data changes, there is a big risk of updating only some of
the data, especially if it is spread out in many different places in the database.
For example, if the address and the zip code data were stored in three or four different tables, then any changes in
the zip codes would need to ripple out to every record in those three or four tables.

Q10. write about functional dependencies?

Ans:
Functional dependency in DBMS, as the name suggests, is a relationship between attributes of a table that depend on each other. Introduced by E. F. Codd, it helps in preventing data redundancy and in identifying bad designs.
To understand the concept thoroughly, let us consider P as a relation with attributes A and B. A functional dependency is represented by -> (an arrow sign).
Then the following will represent the functional dependency between attributes with an arrow sign −

A -> B

The above suggests that A functionally determines B: each value of A is associated with exactly one value of B.


Example

The following is an example that would make it easier to understand functional dependency −
We have a <Department> table with two attributes − DeptId and DeptName.

DeptId = Department IDDeptName = Department Name

The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute. This is because if you want to know
the department name, then at first you need to have the DeptId.

DeptId DeptName

001 Finance

002 Marketing

003 HR

Therefore, the above functional dependency between DeptId and DeptName can be written as DeptName being functionally dependent on DeptId −

DeptId -> DeptName

Types of Functional Dependency

Functional Dependency has three forms −

• Trivial Functional Dependency

• Non-Trivial Functional Dependency

• Completely Non-Trivial Functional Dependency


Let us begin with Trivial Functional Dependency −

Trivial Functional Dependency

It occurs when B is a subset of A in −

A ->B

Example
We are considering the same <Department> table with two attributes to understand the concept of trivial dependency.
The following is a trivial functional dependency, since DeptId is a subset of { DeptId, DeptName } −

{ DeptId, DeptName } -> DeptId

Non-Trivial Functional Dependency

It occurs when B is not a subset of A in −

A ->B
Example
DeptId -> DeptName

The above is a non-trivial functional dependency, since DeptName is not a subset of DeptId.

Completely Non-Trivial Functional Dependency

It occurs when A intersection B is null in −

A ->B

Armstrong’s Axioms Property of Functional Dependency

Armstrong’s Axioms property was developed by William Armstrong in 1974 to reason about functional dependencies.
The property suggests rules that hold true if the following are satisfied:

• Transitivity: if A->B and B->C, then A->C, i.e. a transitive relation.

• Reflexivity: A->B holds if B is a subset of A.

• Augmentation: if A->B, then AC->BC.

PART-B

Q3. E-R diagram for banking system?

Ans:
Q2. write about functional dependencies?

Ans: See Unit-II Part-A Q10 above; the same discussion of functional dependency, its three forms (trivial, non-trivial and completely non-trivial), and Armstrong’s axioms applies here.

Q4. write short note on data model and its type?

Ans:
A data model gives us an idea of how the final system will look after its complete implementation. It defines the data elements and the relationships between the data elements. Data models are used to show how data is stored, connected, accessed and updated in the database management system. Here, we use a set of symbols and text to represent the information so that members of the organisation can communicate and understand it. Though there are many data models in use nowadays, the Relational model is the most widely used. Apart from the Relational model, there are many other types of data models, which we will study in detail in this section. Some of the data models in DBMS are:

1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
6. Object-Relational Data Model
7. Flat Data Model
8. Semi-Structured Data Model
9. Associative Data Model
10. Context Data Model

Hierarchical Model
The hierarchical model was the first DBMS model. This model organises the data in a hierarchical tree structure. The hierarchy starts from the root, which holds the root data, and expands in the form of a tree by adding child nodes to parent nodes. This model easily represents some real-world relationships, like food recipes, the sitemap of a website, etc.
Q5. what is normalization in database?

Ans: Database normalization is a database schema design technique by which an existing schema is modified to minimize redundancy and dependency of data. Normalization splits a large table into smaller tables and defines relationships between them to increase the clarity of the organized data.
Q6. types of normalization in database?

Ans:

Normal Form − Description

1NF − A relation is in 1NF if it contains only atomic values.

2NF − A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF − A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF − A stronger definition of 3NF, known as Boyce-Codd normal form.

4NF − A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

5NF − A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.
7.(i)

• A bank has many entities.


• Each customer has multiple accounts.
• Multiple customers belong to a single branch.
• Single customer can borrow multiple loans.
• A branch has multiple employees.
Identify the attributes for the given entities
• Customer − the relevant attributes are customerName, CustomerID, address.
• Account − The relevant attributes are AccountNo, balance.
• Branch − The relevant attributes are branchID, branchName, address.
• Loan − The relevant attributes are loanNo, paymentMode, dateOfLoan, and amount.
• Employee − The relevant attributes are empID, empName, dateOfJoin, experience, qualification.
Identify the Key attributes
• CustomerID is the key attribute for a customer.
• AccountNo is the key attribute for Account entities.
• BranchID is the key attribute for branch entities.
• LoanNo is the key attribute for a loan entity.
• EmpID is the key attribute for an Employee entity.
Identify the relationship between entity sets
• One customer can hold multiple accounts, and one account can be shared by multiple customers. Hence, the relationship is many to many.
• Many customers belong to one branch, but each customer belongs to a single branch. Hence, the relationship between customer and branch is many to one.
• One customer can borrow multiple loans, while each loan is borrowed by a single customer. Hence, the relationship between customer and loan is one to many.
• One branch has many employees, while each employee works in a single branch. Hence, the relationship between branch and employee is one to many.

7.ii) Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute in
a Database Management System (DBMS). Functional Dependency helps to maintain the quality of data in the
database. It plays a vital role to find the difference between good and bad database design.
A functional dependency is denoted by an arrow “→”. The functional dependency of X on Y is represented by X →
Y. Let’s understand Functional Dependency in DBMS with example.

Employee number Employee Name Salary City


1 Dana 50000 San Francisco
2 Francis 38000 London
3 Andrew 25000 Tokyo

Example:

In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary, etc. By
this, we can say that the city, Employee Name, and salary are functionally depended on Employee number.

Trivial Functional Dependency:

A trivial dependency is one in which the dependent attributes are already included in the determinant.

So, X -> Y is a trivial functional dependency if Y is a subset of X. Let’s understand with a trivial functional dependency example.

For example:

Emp_id Emp_name
AS555 Hasan
AS811 Abir
AS999 Kabir
Consider this table of with two columns Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a subset of {Emp_id,Emp_name}.
8) F: R(A, B, C, D, E), F = {AC->E, B->D, E->A}

AC+ = ACE, B+ = BD, E+ = EA

A candidate key is a single key or a group of keys that uniquely identifies rows in a table. Here B and C never appear on the right-hand side of any dependency, so both must be part of every candidate key. BC+ = BCD, which does not cover all attributes, but ABC+ = ABCDE and BCE+ = ABCDE. So the candidate keys of R are ABC and BCE.
9.i)

What is a Database Anomaly?

A database anomaly is normally a flaw in a database which occurs because of poor planning and storing everything in a flat database. Generally this is removed by the process of normalization, which is performed by splitting/joining of tables.

There are three types of database anomalies:

a) Insertion anomaly: An insertion anomaly occurs when we are not able to insert a certain attribute in the database without the presence of another attribute. For example, suppose a professor is hired but not immediately assigned any course group or department; he/she may not get a place in such a flat database if null entries are not allowed. Removing such problems requires splitting of the database, which is done by normalization.

b) Update anomaly: This occurs in case of data redundancy and partial update. In other words, a correct update of the database needs other actions such as addition, deletion or both. For example, in the above table the department assigned to Anushka is an error because it needs to be updated at two different places to maintain consistency.

c) Deletion anomaly: A deletion anomaly occurs when some data is lost because of the deletion of some other data. For example, if Section B is to be deleted then Sonam’s details are unnecessarily deleted too. So normalization is generally done before deleting any record from a flat database.

BCNF:

o BCNF is the advanced version of 3NF. It is stricter than 3NF.

o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.

o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
9.(ii) Application of the general definitions of 2NF and 3NF may identify additional redundancy caused by dependencies that violate one or more candidate keys. However, despite these additional constraints, dependencies can still exist that will cause redundancy to be present in 3NF relations. This weakness in 3NF resulted in the presentation of a stronger normal form called Boyce–Codd Normal Form (Codd, 1974). Although 3NF is an adequate normal form for relational databases, it may not remove 100% of the redundancy, because of an X → Y functional dependency where X is not a candidate key of the given relation. That’s why we use Boyce-Codd Normal Form (BCNF) to solve this.

Difference between 3NF and BCNF:

3NF: In 3NF there should be no transitive dependency, that is, no non-prime attribute should be transitively dependent on the candidate key.
BCNF: In BCNF, for any relation A->B, A should be a super key of the relation.

3NF: It is less strong than BCNF.
BCNF: It is comparatively stronger than 3NF.

3NF: In 3NF the functional dependencies are already in 1NF and 2NF.
BCNF: In BCNF the functional dependencies are already in 1NF, 2NF and 3NF.

3NF: The redundancy is high in 3NF.
BCNF: The redundancy is comparatively low in BCNF.

3NF: In 3NF there is preservation of all functional dependencies.
BCNF: In BCNF there may or may not be preservation of all functional dependencies.

3NF: It is comparatively easier to achieve.
BCNF: It is difficult to achieve.

3NF: Lossless decomposition can be achieved by 3NF.
BCNF: Lossless decomposition is hard to achieve in BCNF.

10.(i) Lossless join decomposition is a decomposition of a relation R into relations R1, R2 such that if we
perform a natural join of relation R1 and R2, it will return the original relation R. This is effective in removing
redundancy from databases while preserving the original data.
In lossless decomposition, we select the common attribute, and the criterion for selecting a common attribute is that the common attribute must be a candidate key or super key in either relation R1, R2, or both. Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional dependencies is in F+ (the closure of the functional dependencies):
R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2
10.(ii)
Consider a relation student (rollno, game, feestructure)
Rollno Game Feestructure

1 Basketball 500

2 Basketball 500

3 Basketball 500

4 Cricket 600

5 Cricket 600

6 Cricket 600

7 Tennis 400

F = {rollno -> game, rollno -> feestructure, game -> feestructure}


Rollno+ = {rollno, game, feestructure}
=> rollno is the primary key
The above student table is in 1NF because there are no multivalue attributes.
Student table is also in 2NF because all non-key attributes are fully functional dependent on the primary key
(rollno). But the table is not in 3NF because there is transitive dependency i.e. game-> feestructure.
Feestructure has transitive/indirect dependency on rollno via game.
So, now to convert the above student table into 3NF first we need to decompose the table as follows −

Decomposition for 3NF

To overcome these anomalies, the student table should be divided into smaller tables.
If X->Y is transitive dependency then divide R into R1(X+) and R2(R-Y+).
Game -> feestructure is a transitive dependency [since game is not a candidate key and feestructure is a non-prime attribute].
R1 = game+ = (game, feestructure)
R2 = (student − feestructure) = (rollno, game)
So divide the student table into R1 (game, feestructure) and R2 (rollno, game).
R1

Game Feestructure

Basketball 500

Cricket 600

Tennis 400

R2

Rollno Game

1 Basketball

2 Basketball

3 Basketball

4 Cricket

5 Cricket

6 Cricket

7 Tennis
11.1

11.2.(i): Data modeling is a process that you use to define the data structure of a database.
Data modeling types:
1.Conceptual data models: Conceptual data models are the foundation of every data model that's created. They help
you understand which entities exist in your business and how they relate to each other. Conceptual models don't
include the details regarding the specific attributes attached to an entity.
2. Logical data models: The logical data model focuses on how data is stored in an organization's systems. It describes how data moves between its source (for example, a person or another system) and its destination (for example, a database). It uses entities, attributes, relationships, cardinality, and constraints to describe the entity set for each table in a relational database.
3. Physical data models: Physical data modeling is the process of defining the structure of a database schema to store information. The physical model is typically created by a database administrator or system analyst. It is used to create tables, indexes, and views, which are implemented through the use of Structured Query Language (SQL) statements.

II) A mapping constraint is a data constraint that expresses the number of entities to which another entity can be related via a relationship set. For a binary relationship set R on entity sets A and B, there are four possible mapping cardinalities. These are as follows:

1. One to one (1:1)

2. One to many (1:M)

3. Many to one (M:1)

4. Many to many (M:M)

One-to-one

In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is associated
with at most one entity in E1.

One-to-many

In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2 is
associated with at most one entity in E1.
Many-to-one

In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is associated with any number of entities in E1.

Many-to-many

In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2 is
associated with any number of entities in E1.

12. Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute in a
Database Management System (DBMS). Functional Dependency helps to maintain the quality of data in the
database. It plays a vital role to find the difference between good and bad database design.
A functional dependency is denoted by an arrow “→”. A functional dependency X → Y means that Y is functionally dependent on X.
If a table can be recreated by joining multiple tables, each of which has a subset of the attributes of the table, then the table is in join dependency. It is a generalization of multivalued dependency.
Join dependency can be related to 5NF: a relation is in 5NF only if it is already in 4NF and cannot be decomposed further.
13.(i) Non loss join decomposition :Non loss Decomposition/Lossless join decomposition is a decomposition of
a relation R into relations R1, R2 such that if we perform a natural join of relation R1 and R2, it will return the
original relation R. This is effective in removing redundancy from databases while preserving the original data.
In Lossless Decomposition, we select the common attribute and the criteria for selecting a common attribute is
that the common attribute must be a candidate key or super key in either relation R1, R2, or both.Decomposition
of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional
dependencies are in F+ (Closure of functional dependencies)
R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2
Decomposition is lossless if it is feasible to reconstruct relation R from decomposed tables using Joins. This is the
preferred choice. The information will not lose from the relation when decomposed. The join would result in the
same original relation.
Lossy decomposition: The decompositions R1, R2, … Rn of a relation schema R are said to be lossy if their natural join results in the addition of extraneous tuples relative to the original relation R.
Let R be a relation and R1, R2, R3 … Rn be its decomposition; the decomposition is lossy if −
R ⊂ R1 ⨝ R2 ⨝ R3 .... ⨝ Rn
In a lossy decomposition, when a relation is decomposed into two or more relational schemas, the loss of information is unavoidable when the original relation is retrieved.
14.(i) If a table can be recreated by joining multiple tables, each of which has a subset of the attributes of the table, then the table is in join dependency. It is a generalization of multivalued dependency.
Join dependency can be related to 5NF: a relation is in 5NF only if it is already in 4NF and cannot be decomposed further.

Example

<Employee>

EmpName EmpSkills EmpJob (Assigned Work)

Tom Networking EJ001

Harry Web Development EJ002

Katie Programming EJ002

The above table can be decomposed into the following three tables; therefore it is not in 5NF:

<EmployeeSkills>

EmpName EmpSkills

Tom Networking

Harry Web Development

Katie Programming

<EmployeeJob>

EmpName EmpJob

Tom EJ001
Harry EJ002

Katie EJ002

<JobSkills>

EmpSkills EmpJob

Networking EJ001

Web Development EJ002

Programming EJ002

Our Join Dependency −

{(EmpName, EmpSkills ), ( EmpName, EmpJob), (EmpSkills, EmpJob)}

The original <Employee> relation has a join dependency, so it is not in 5NF: the natural join of the above three relations is equal to the original relation <Employee>.

14.(ii) 5NF:

o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.

o 5NF is also known as project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2


Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2. In this case, the combination of all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking it, so we leave Lecturer and Subject as NULL. But all three columns together act as a primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John
Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen

PART-C

Q1.Bank model ER diagram


UNIT THREE - TRANSACTIONS
PART-A

1. Define Transaction.

Ans: Transactions refer to a set of operations that are used for performing a set of logical work.

2. Give the reasons for allowing concurrency.

Ans: The reason for allowing concurrency is that if transactions run serially, a short transaction might have to wait for a preceding long transaction to complete, which can lead to unpredictable delays in running a transaction. Concurrent execution reduces these unpredictable delays.

3. Analyse on average response time.

Ans: Average response time refers to the amount of time the application server takes to return the results of a request to the user. The response time is affected by factors such as network bandwidth, number of users, number and type of requests submitted, and average think time.

4. Illustrate the situation to roll back a transaction.

Ans: You can roll back a transaction to erase all data modifications made since the start of the transaction or since a savepoint. Rolling back also frees resources held by the transaction.

ROLLBACK;
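A sketch of a rollback to a savepoint in PostgreSQL style (the account table is an assumption):

BEGIN;
UPDATE account SET balance = balance - 100 WHERE accno = 1;
SAVEPOINT before_credit;
UPDATE account SET balance = balance + 100 WHERE accno = 2;
ROLLBACK TO SAVEPOINT before_credit;  -- undoes only the second update
COMMIT;                               -- the first update is kept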

5. Summarise the properties of Transaction.

Ans: In the context of transaction processing, the acronym ACID refers to the four key properties of a transaction:
atomicity, consistency, isolation, and durability. All changes to data are performed as if they are a single
operation.
6. What are the different modes of locks?

Ans: Lock mode is used to prevent others from reading or changing the locked resource. It can be categorised into the six types listed below:

● Exclusive Lock (X)


● Shared Lock (S)
● Update Lock (U)
● Intent Lock (I)
● Schema Lock (Sch)
● Bulk Update Lock (BU)

7. Assess the Serializability .How is it tested?

Ans: Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent
transactions is equivalent to one that executes the transactions serially in some order.

Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti -> Tj for which one of the following three conditions holds:

1. Create an edge Ti → Tj if Ti executes write (Q) before Tj executes read (Q).


2. Create an edge Ti → Tj if Ti executes read (Q) before Tj executes write (Q).
3. Create an edge Ti → Tj if Ti executes write (Q) before Tj executes write (Q).
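As a small worked illustration (the schedule here is assumed): in a schedule S where T1 executes read (Q) before T2 executes write (Q), and T2 executes write (R) before T1 executes read (R), we add the edges T1 → T2 and T2 → T1. The resulting cycle means S is not conflict serializable. Conversely, if the precedence graph has no cycle, the schedule is conflict serializable, and an equivalent serial order can be obtained by topologically sorting the graph.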

8. Recommend the need of shadow paging.

Ans: The process of recovering data in a database management system is known as Shadow Paging. It is a recovery
technique where the transactions are performed in the main memory. After completion of transactions, they get
updated in the database.

9. Demonstrate recoverable schedule with suitable example.


Ans: If a transaction performs a dirty read from an uncommitted transaction, and its commit operation is delayed until the uncommitted transaction either commits or rolls back, such a schedule is called a recoverable schedule.

Types of recoverable schedules

There are three types of recoverable schedules which are explained below with relevant examples −

● Cascading schedules
● Cascadeless Schedules
● Strict Schedules.

10. Define deadlock.

Ans: Deadlock is a state of a database system having two or more transactions, when each transaction is waiting for
a data item that is being locked by some other transaction.

11. List of types of serializability.

Ans: Two types of Serializability

1 Conflict Serializability

2 View Serializability

12. List the phases of two phase locking protocol.

Ans:

1 Growing phase − all the locks are acquired in this phase, and no lock is released. Once the transaction has obtained all the locks it needs, the second phase (the shrinking phase) starts.
2 Shrinking phase − no new locks are issued in this phase; the transaction only releases the locks it already holds.

13. Give the states of Transaction.

Ans: Transaction states are as follows-

1. Active state
2. Partially committed state
3. Committed state
4. Failed state
5. Aborted state
6. Terminated state

14. Define the upgrade and the downgrade.

Ans:

Upgrade: the mechanism for converting a shared lock to an exclusive lock is called an upgrade.

Downgrade: the mechanism for converting an exclusive lock to a shared lock is called a downgrade.

Part B

1. I) Describe the ACID Properties of Transaction.


Answer:
ACID Stands for
A = Atomicity
C = Consistency
I = Isolation
D = Durability
Atomicity: Atomicity means that multiple operations can be grouped into a single logical entity, that is,
other threads of control accessing the database will either see all of the changes or none of the changes.

Consistency: Database consistency is defined by a set of values that all data points within the database
system must align to in order to be properly read and accepted.
Isolation: Isolation defined at database level as a property that defines how or when the changes made by
one operation become visible to others.

Durability: Durability is the property that makes sure that transactions are permanently stored and do not
disappear or are erased by accident, even during a database crash.

ii) What benefit does rigorous two phase locking provide? Show how does it compare with other forms of
two phase locking?
Answer:
Rigorous two-phase locking has the advantages of strict 2PL. In addition, it has the property that for two conflicting transactions, their commit order is their serializability order. In some systems users might expect this behavior.

2. I) What is concurrency control? Explain the two phase locking protocol with an example?
Answer:
Concurrency Control in DBMS is a procedure of managing simultaneous transactions ensuring their
atomicity, isolation, consistency, and serializability.

Every transaction will lock and unlock the data item in two different phases.
• Growing phase − all the locks are acquired in this phase, and no lock is released. Once the transaction has obtained all the locks it needs, the second phase (the shrinking phase) starts.
• Shrinking phase − no new locks are issued in this phase; the transaction only releases the locks it already holds.

Example:
Let T1 and T2 be two transactions, with T1 = A + B and T2 = B + A. Suppose T1 holds Lock-X(A) and T2 holds Lock-X(B).

T1 requests Lock-X(B): it cannot be granted, since B is locked by T2.

T2 requests Lock-X(A): it cannot be granted, since A is locked by T1.
In this situation T1 waits for B and T2 waits for A, and the waiting never ends. Neither transaction can proceed unless one of them voluntarily releases its lock. This situation is called a deadlock.
ii) Discuss about confilict serializability and view serializability.
Answer:
Conflict Serializability:
Conflict serializability is a type of serializability in which conflicting operations on the same data items are executed in an order that preserves database consistency. Each transaction is assigned a unique number, and the operations within each transaction are executed in order based on that number.

Key conditions for conflict serializability.

• Both operations belong to different transactions


• Both operations are on the same data item
• At least one of the two operations is a write operation

View serializability:

View serializability is a type of serializability in which each transaction produces results that are
equivalent to some well-defined sequential execution of all transactions in the system. Unlike conflict
serializability, which focuses on preventing inconsistencies within the database, view serializability in
DBMS focuses on providing users with consistent views of the database.

3. Write a short note


ii) Transaction concept.
Answer:
A database transaction is a logical unit of processing in a DBMS which entails one or more database access operations. All database access operations that occur between the begin-transaction and end-transaction statements are considered a single logical transaction in DBMS.

• A transaction is a program unit whose execution may or may not change the contents of a database.
• The transaction concept in DBMS is executed as a single unit.
• If the database operations do not update the database but only retrieve data, this type of transaction is called
a read-only transaction.
• A successful transaction can change the database from one CONSISTENT STATE to another

iii) Deadlock concept.

Answer:
A deadlock is an unwanted situation in which two or more transactions are waiting indefinitely for one another to
give up locks. Deadlock is said to be one of the most feared complications in DBMS as it brings the whole
system to a Halt.
Example – let us understand the concept of Deadlock with an example :
Suppose, Transaction T1 holds a lock on some rows in the Students table and needs to update some rows in the
Grades table. Simultaneously, Transaction T2 holds locks on those very rows (Which T1 needs to update) in the
Grades table but needs to update the rows in the Student table held by Transaction T1.

4. i) What is deadlock? How does it occur?


Answer:
A deadlock is an unwanted situation in which two or more transactions are waiting indefinitely for one
another to give up locks.
Deadlock in a (DBMS) is an undesired situation in which two or more transactions have to wait
indefinitely for each other in order to get terminated, but none of the transactions is willing to give
up the allocated CPU and memory resources that the other one needs. Deadlock brings the whole
system to a halt as no task ever gets finished and is in waiting state forever.

5. i) What is concurrency control? How is it implemented in DBMS?


Answer:
Concurrency Control in DBMS is a procedure of managing simultaneous transactions ensuring their
atomicity, isolation, consistency, and serializability.

Different concurrency control protocols offer different trade-offs between the amount of concurrency they allow and the amount of overhead that they impose. Following are the concurrency control techniques in DBMS:

• Lock-Based Protocols
• Two Phase Locking Protocol
• Timestamp-Based Protocols
• Validation-Based Protocols

ii) Generalize with a suitable example.

Answer:
Going up in this structure is called generalization, where entities are clubbed together to represent a more
generalized view. For example, a particular student named Mira can be generalized along with all the students.

Example:

6. I) What is two phase locking? Explain it with suitable example?


Answer:
Two-Phase Locking (2PL) is a concurrency control method which divides the execution phase of a
transaction into three parts.

Example:
Let T1 and T2 be two transactions, with T1 = A + B and T2 = B + A. Suppose T1 holds Lock-X(A) and T2 holds Lock-X(B).

T1 requests Lock-X(B): it cannot be granted, since B is locked by T2.

T2 requests Lock-X(A): it cannot be granted, since A is locked by T1.
In this situation T1 waits for B and T2 waits for A, and the waiting never ends. Neither transaction can proceed unless one of them voluntarily releases its lock. This situation is called a deadlock.
ii) Assess on how it guarantees serializability?

Answer:

The gold standard in isolation is “serializability”. A system that guarantees serializability is able to process transactions concurrently, but guarantees that the final result is equivalent to what would have happened if each transaction were processed individually, one after the other (as if there were no concurrency).

7. What is Concurrency ? Explain it in terms of locking mechanism and two phase commit protocol?

Answer:

Data concurrency is the ability to allow multiple users to affect multiple transactions within a database.

Lock-based protocols: Binary locks − a lock on a data item can be in two states; it is either locked or unlocked. Shared/exclusive locks − this type of locking mechanism differentiates the locks based on their use: if a lock is acquired on a data item to perform a write operation, it is an exclusive lock.

This protocol can be divided into two phases,


1. In Growing Phase, a transaction obtains locks, but may not release any lock.
2. In Shrinking Phase, a transaction may release locks, but may not obtain any lock.

Types of Two-Phase Locking Protocol

Following are the types of two-phase locking protocol:

1. Strict Two-Phase Locking Protocol


2. Rigorous Two-Phase Locking Protocol
3. Conservative Two-Phase Locking Protocol

➢ Strict Two-Phase Locking Protocol


• Strict Two-Phase Locking Protocol avoids cascaded rollbacks.
• This protocol not only requires two-phase locking but also requires that all exclusive locks be held until the transaction commits or aborts.
• It is not deadlock free.
➢ Rigorous Two-Phase Locking

• Rigorous Two-Phase Locking Protocol avoids cascading rollbacks.


• This protocol requires that all the shared and exclusive locks be held until the transaction commits.
➢ Conservative Two-Phase Locking Protocol
• Conservative Two-Phase Locking Protocol is also called Static Two-Phase Locking Protocol.
• This protocol is almost free from deadlocks, as all required data items are listed in advance.
• It requires locking of all data items to be accessed before the transaction starts.

8. Explain two phase commit and Three-Phase Commit Protocols?


Answer:
2PC (Two Phase Commit) is a distributed commit protocol typically used to handle distributed transaction
— transactions with SQL statements changing/updating data across multiple databases (on different
physical machines) or multiple replicas (on different physical machines) of the same database.
Three-Phase Commit (3PC) Protocol is an extension of the Two-Phase Commit (2PC) Protocol that avoids the blocking problem under certain assumptions. In particular, it is assumed that no network partition occurs, and that no more than k sites fail, where k is a predetermined number. Under these assumptions, the protocol avoids blocking by introducing an extra third phase where multiple sites are involved in the decision to commit.
9. Illustrate two phase locking protocol with an example?

Answer:

A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking can be done in two
phases.

o The two-phase locking protocol divides the execution phase of the transaction into three parts.

o In the first part, when the execution of the transaction starts, it seeks permission for the lock it requires.

o In the second part, the transaction acquires all the locks. The third phase is started as soon as the transaction
releases its first lock.

o In the third phase, the transaction cannot demand any new locks. It only releases the acquired locks.

The two phases of the locking protocol are:

i) Growing phase
ii) Shrinking phase

Example:
The following shows how locking and unlocking work with 2PL.

Transaction T1:

o Growing phase: from step 1-3

o Shrinking phase: from step 5-7

o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6

o Shrinking phase: from step 8-9

o Lock point: at 6

10. Differentiate strict two phase locking protocol and rigorous two phase locking protocol?
Answer:
In strict two-phase locking, all exclusive locks taken by a transaction are held until the transaction commits or aborts, while shared locks may be released earlier. In rigorous two-phase locking, all locks, both shared and exclusive, are held until the transaction commits, so conflicting transactions are serialized in their commit order. Rigorous 2PL is therefore stricter than strict 2PL.

11. When a transaction is said to be deadlocked?


Answer:
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give
up locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever gets
finished and is in waiting state forever.

12. Describe the deadlock prevention schemes?

Answer:

Deadlock can be prevented by eliminating any of the four necessary conditions.

*Mutual Exclusion
*Hold and Wait
*No preemption
*Circular wait
Mutual Exclusion:
It is not possible to violate mutual exclusion, because some resources, such as the tape drive and printer, are inherently non-shareable.
Hold and Wait:
Allocate all required resources to the process before the start of its execution; this eliminates the hold-and-wait condition, but it leads to low device utilization. The process must make a new request for resources after releasing the current set of resources.
No Preemption:
Preempt resources from the process when resources required by other high priority processes.
Circular wait:
Each resource is assigned a numerical number, and a process can request resources only in increasing order of numbering.
For example, if process P1 has been allocated resource R5, then a later request by P1 for R4 or R3 (numbered lower than R5) will not be granted; only requests for resources numbered higher than R5 will be granted.

13. Define the term Recoverable schedule and Cascade less schedules?
Answer:
1. Recoverable schedule: A schedule is recoverable if each transaction commits only after every transaction from which it has read data has committed.
2. Cascadeless schedule: When no transaction reads a data item written by a still-uncommitted transaction, the corresponding schedule is called a cascadeless schedule.

Part: C
Question (1)

Consider the following extension to the tree-locking protocol, which allows both shared and exclusive locks:

• A transaction can be either a read-only transaction, in which case it can request only shared locks, or an update
transaction, in which case it can request only exclusive locks.

• Each transaction must follow the rules of the tree protocol. Read-only transactions may lock any data item first,
whereas update transactions must lock the root first.

Answer:

Every transaction must satisfy the ACID properties. Concurrency control techniques are used to ensure that the
isolation (non-interference) property of concurrently executing transactions is maintained.
Why should we allow interleaved execution of transactions at all, when it can lead to problems such as
irrecoverable schedules and inconsistency? Why not simply run serial schedules and avoid the complications?
The answer is performance: restricting the system to serial schedules reduces efficiency too much to be acceptable.
Hence a database provides mechanisms that ensure the schedules it produces are either conflict or view
serializable and recoverable (and preferably cascadeless). Testing a schedule for serializability after it has
executed is obviously too late, so we need concurrency control protocols that ensure serializability up front.

Concurrency control protocols allow concurrent schedules, but ensure that the schedules are conflict/view
serializable, and are recoverable and possibly even cascadeless.
These protocols do not examine the precedence graph as it is being created; instead, a protocol imposes a
discipline that avoids non-serializable schedules.
Different concurrency control protocols offer different trade-offs between the amount of concurrency they
allow and the amount of overhead they impose.
The extension described above fits this framework: read-only transactions acquire only shared locks and may
lock any data item first, while update transactions acquire only exclusive locks and must lock the root first.
Since every transaction still obeys the rules of the tree protocol, the extended protocol can be shown to ensure
serializability and freedom from deadlock.

Question: (2)

Consider the following two transactions:

T1: read(A);
    read(B);
    if A = 0 then B := B + 1;
    write(B).

T2: read(B);
    read(A);
    if B = 0 then A := A + 1;
    write(A).

Add lock and unlock instructions to transactions T1 and T2, so that they observe the two-phase locking protocol.
Can the execution of these transactions result in a deadlock?

Answer:

a. Lock and unlock instructions:

T1: lock-S(A)
    read(A)
    lock-X(B)
    read(B)
    if A = 0 then B := B + 1
    write(B)
    unlock(A)
    unlock(B)

T2: lock-S(B)
    read(B)
    lock-X(A)
    read(A)
    if B = 0 then A := A + 1
    write(A)
    unlock(B)
    unlock(A)

b. Execution of these transactions can result in deadlock. For example, consider the following partial schedule:
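
For instance, the following interleaving of T1 and T2 leads to deadlock:

T1: lock-S(A)
T1: read(A)
T2: lock-S(B)
T2: read(B)
T2: lock-X(A)   -- T2 must wait for T1 to release its shared lock on A
T1: lock-X(B)   -- T1 must wait for T2 to release its shared lock on B

Now T1 and T2 wait for each other indefinitely: a deadlock.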

Question: 3-(I)

Narrate the actions that are considered for deadlock detection and the recovery from deadlock.

Answer:

In this approach, the OS does not apply any mechanism to avoid or prevent deadlocks; the system assumes
that a deadlock will definitely occur at some point. In order to get rid of deadlocks, the OS periodically checks the
system for any deadlock, and if it finds one, it recovers the system using some recovery technique.

The main task of the OS is detecting the deadlocks, which it can do with the help of the resource-allocation
graph.

For single-instance resource types, if a cycle forms in the graph then there is definitely a deadlock.
For multiple-instance resource types, on the other hand, detecting a cycle is not enough; we have to apply
a safety algorithm to the system by converting the resource-allocation graph into an allocation matrix and
a request matrix.

In order to recover the system from deadlock, the OS either preempts resources or terminates (rolls back) processes.
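
For single-instance resources, deadlock detection reduces to finding a cycle in the transactions' wait-for graph. A minimal sketch in Python (the graph representation and the transaction names are illustrative assumptions):

# Wait-for graph: an edge T -> U means transaction T waits for a lock held by U.
wait_for = {
    "T1": ["T2"],
    "T2": ["T3"],
    "T3": ["T1"],   # T1 -> T2 -> T3 -> T1 is a cycle, i.e. a deadlock
}

def has_cycle(graph):
    # Depth-first search; reaching a node already on the current path is a cycle.
    visited, on_path = set(), set()

    def dfs(node):
        visited.add(node)
        on_path.add(node)
        for succ in graph.get(node, []):
            if succ in on_path or (succ not in visited and dfs(succ)):
                return True
        on_path.discard(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in visited)

print(has_cycle(wait_for))   # True -> recover by aborting one transaction in the cycle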
Question: 3-(II).

Assess and discuss the properties of a Transaction that ensure integrity of data in the database system

Answer:

ACID is an acronym for atomicity, consistency, isolation, and durability. Each of these four qualities
contributes to the ability of a transaction to ensure data integrity: together they govern how integrity is
maintained and how locking behaves. Relational/SQL systems support ACID, whereas that is not always the
case for NoSQL database systems.

Atomicity means that a transaction must exhibit an "all or nothing" behavior. Either all of the instructions within the
transaction happen successfully, or none of them happen. Atomicity preserves the "completeness" of the business
process.

Consistency refers to the state of the data both before and after the transaction is executed. A transaction maintains
the consistency of the state of the data. In other words, after running a transaction, all data in the database is
"correct."

Isolation means that concurrently executing transactions do not interfere with one another: each transaction sees
the data as if it were running alone. Durability means that once a transaction commits, its changes are permanent
and survive subsequent system failures.

UNIT-IV IMPLEMENTATION TECHNIQUES

PART A

1.Point out the ordered indices with example.

The indices are usually sorted to make searching faster. Indices which are sorted are known as ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. If the
IDs start with 1, 2, 3, ... and we have to search for the employee with ID 543.

2.Write about B + tree index file.

A B+ tree index is a multilevel index. It is a rooted tree satisfying the following properties:

➢ All paths from the root to a leaf are equally long.
➢ A node that is neither a root nor a leaf has between ⌈n/2⌉ and n children.
➢ A leaf node has between ⌈(n-1)/2⌉ and n-1 values.

3.Illustrate hash indexing.

(Figure: hash indexing) A hash function maps each search key (Key1, Key2, Key3, ...) to a bucket number
(0, 1, 2, 3) in the index, and each bucket stores the corresponding value (Value_1, Value_2, Value_3, Value_4).
A lookup therefore only computes the hash function and reads the indicated bucket.
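
A minimal sketch of this idea in Python, using four buckets with chaining (the keys and values are illustrative assumptions):

NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]   # one chain of entries per bucket

def put(key, value):
    # The hash function selects the bucket; the entry is appended to its chain.
    buckets[hash(key) % NUM_BUCKETS].append((key, value))

def get(key):
    # Only one bucket has to be scanned, not the whole index.
    for k, v in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:
            return v
    return None

put("Key1", "Value_1")
put("Key2", "Value_2")
print(get("Key2"))   # Value_2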

4. Define seek time.

Seek time is the time taken for a hard disk controller to locate a specific piece of stored data. Other delays include
transfer time or data rate and rotational delay.

5. Assess the factors to be considered for the evaluation of indexing and hashing techniques.

➢ Access types: The types of access that are supported efficiently.

➢ Access time: The time it takes to find a particular data item, or set of items, using the technique in
question.

➢ Insertion time: The time it takes to insert a new data item.

➢ Deletion time: The time it takes to delete a data item.


➢ Space overhead: The additional space occupied by an index structure.

6.Define mirroring.

Database mirroring is a relational database management system (RDBMS) technique for maintaining consistent data despite
high-availability needs, by creating redundant copies of a dataset. A database mirror is a complete backup of the
database that can be used if the primary database fails.

7.Discuss about Dense Index.

In a dense index, an index record is created for every search-key value in the database. This makes
searching faster but needs more space to store the index records. In this method, an index record contains the
search-key value and a pointer to the actual record on the disk.

8.What is an Index?

Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required
when a query is processed. It is a data structure technique which is used to quickly locate and access the data in a
database.

Indexes are created using a few database columns.


• The first column is the Search key that contains a copy of the primary key or candidate key of the table.
These values are stored in sorted order so that the corresponding data can be accessed quickly.

• The second column is the Data Reference or Pointer which contains a set of pointers holding the address
of the disk block where that particular key value can be found.

9. Difference between B Tree and B+ Tree.

B Tree:
➢ All internal and leaf nodes have data pointers.
➢ Since all keys are not available at the leaf level, a search often takes more time.
➢ No duplicates of keys are maintained in the tree.
➢ Insertion takes more time, and the result is sometimes unpredictable.
➢ Deletion of an internal node is very complex, and the tree may have to undergo many transformations.

B+ Tree:
➢ Only leaf nodes have data pointers.
➢ All keys are present at the leaf nodes, hence search is faster and more accurate.
➢ Duplicates of keys are maintained, and all keys are present at the leaf level.
➢ Insertions are easier, and the results are always the same.
➢ Deletion of any node is easy, because all records are found at the leaf level.

10.Distinguish between fixed length record and variable length records?

Fixed-length records:

1. All the records in the file are of the same size.
2. Leads to memory wastage.
3. Access to the records is easier and faster.
4. The exact location of a record can be determined: the location of the i-th record is n*(i-1), where n is the size of
every record.

Variable-length records:

1. Different records in the file have different sizes.
2. Memory efficient.
3. Access to the records is slower.

11. Show the advantages and disadvantages of RAID level 3.

RAID (Redundant Array of Independent Disks) level 3 uses byte-level striping with a dedicated parity disk.

Advantages:

• Fault-tolerant: the data on a single failed drive can be reconstructed from the remaining drives and the parity disk.
• High transfer rates for large sequential read and write operations.
• Good utilization of storage capacity, since only one disk in the array holds parity.

Disadvantages:

• The dedicated parity disk is a bottleneck, especially for small random writes.
• Every I/O involves all disks in the array, so the array cannot service multiple small requests in parallel;
it is not an ideal choice for transaction-oriented or mission-critical systems.

12. What are ordered indices? Give an example.

The indices are usually sorted to make searching faster. Indices which are sorted are known as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. If the
IDs start with 1, 2, 3, ... and we have to search for the employee with ID 543:

➢ In the case of a database with no index, we have to search the disk blocks from the start until we reach 543.
The DBMS will read the record after reading 543*10 = 5430 bytes.

➢ In the case of an index, each index entry is 2 bytes, so the DBMS will read the record after reading
542*2 = 1084 bytes, which is far less than in the previous case.

13.Prepare the need for Query Optimization.

Query Optimization: A single query can be executed through different algorithms or re-written in different
forms and structures. Hence, the question of query optimization comes into the picture – Which of these forms or
pathways is the most optimal? The query optimizer attempts to determine the most efficient way to execute a
given query by considering the possible query plans.

Importance: The goal of query optimization is to reduce the system resources required to fulfill a query, and
ultimately provide the user with the correct result set faster.
14.Define Primary index and Secondary index.

Primary index: The primary index contains the key fields of the table and a pointer to the non-key fields of the table.
The primary index is created automatically when the table is created in the database.

Secondary index: Additional indexes could be created considering the most frequently accessed dimensions of the
table. In contrast, a secondary index is an index that is not a primary index and may have duplicates.

15.When is it preferable to use a dense index rather than a sparse index?

It is preferable to use a dense index instead of a sparse index when the file is not sorted on the indexed field
(such as when the index is a secondary index), or when the index file is small compared to the size of memory.

16.Analyze query processor.

The query processor of a database system has the function of determining how to answer the requests for
information from a user in the most optimal manner. The idea is that a query can be answered by a database system
in a variety of ways. The most straightforward is the brute-force approach.

17.Examine about query evaluation plan.

When a query is submitted to the DB, it is parsed and translated to relational algebra. It is verified for validity and
correctness. Once it passes this stage, different ways of evaluating the query are generated, checked for various
factors, and an execution plan is produced. The plan may be based on the cost of the query or on equivalence rules.
Once the cost-based and rule-based execution plans are generated, the optimizer has to decide which plan to
select for evaluation. This is the most important step in processing a query.

18.Differentiate Static Hashing and Dynamic Hashing.

Static Hashing:
• A hashing technique that allows users to perform lookups on a finalized dictionary set.
• The resultant data-bucket address is always the same for a given key.
• Does not adapt as the database grows or shrinks, and is therefore less efficient.

Dynamic Hashing:
• A hashing technique in which the data buckets are added and removed dynamically and on demand.
• The data buckets change depending on the records.
• More efficient than static hashing.

19.Point out the mechanisms to avoid collision during hashing.

If the set of objects we intend to store within our hash table is larger than the size of our hash table we are bound to
have two or more different objects having the same hash value; a hash collision. Even if the size of the hash table is
large enough to accommodate all the objects finding a hash function which generates a unique hash for each object
in the hash table is a difficult task. Collisions are bound to occur (unless we find a perfect hash function, which in
most of the cases is hard to find) but can be significantly reduced with the help of various collision resolution
techniques.

Following are the collision resolution techniques used:

➢ Open Hashing (Separate chaining)


➢ Closed Hashing (Open Addressing)

• Linear Probing

• Quadratic probing

• Double hashing
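
A minimal sketch in Python contrasting the two families, using a 7-slot table (the table size and the inserted keys are illustrative assumptions):

TABLE_SIZE = 7

# Open hashing (separate chaining): each slot holds a chain of colliding keys.
chained = [[] for _ in range(TABLE_SIZE)]

def chain_insert(key):
    chained[key % TABLE_SIZE].append(key)

# Closed hashing (open addressing) with linear probing:
# on a collision, try the next slot until an empty one is found.
probed = [None] * TABLE_SIZE

def probe_insert(key):
    i = key % TABLE_SIZE
    while probed[i] is not None:
        i = (i + 1) % TABLE_SIZE    # linear probe step
    probed[i] = key

for k in (10, 17, 24):              # all three keys hash to slot 3 (k mod 7 = 3)
    chain_insert(k)
    probe_insert(k)

print(chained[3])    # [10, 17, 24] -- one chain holds all three keys
print(probed)        # 10 lands in slot 3, 17 in slot 4, 24 in slot 5

Quadratic probing and double hashing differ from linear probing only in how the next slot to try is computed.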

20.Develop the procedure to reduce the occurrence of bucket overflows in a hash file organization.

The condition of bucket overflow is known as collision. This is a fatal state for any static hash function, and
overflow chaining can be used to handle it. Overflow chaining: when a bucket is full, a new bucket is allocated for the
same hash result and is linked after the previous one. The occurrence of overflow is thus contained, because whenever
a bucket overflows it is automatically linked to a chain of further buckets, and so on.

PART B

1(i). Define B+ tree in detail.

B+ Tree is a self-balancing data structure for executing accurate and fast search, insert and delete
operations on data. We can easily retrieve complete or partial data, because traversing the linked tree
structure is efficient. The B+ tree structure grows and shrinks with an increase/decrease in the number of
stored records.

Here are the essential rules for a B+ Tree:

➢ Leaves are used to store data records.
➢ Keys are stored in the internal nodes of the tree.
➢ If a target key value is less than the key in an internal node, then the pointer just to its left is followed.
➢ If a target key value is greater than or equal to the key in an internal node, then the pointer just to its right is
followed.
➢ The root has a minimum of two children.

(ii) How do you represent a leaf node of a B+ tree of order p?

➢ Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>, where q <= p, each Pri is a
data pointer (it points to the record whose search-field value is Ki, or to a block containing such records), and
Pnext points to the next leaf node, so the leaves form a linked list for ordered access.
➢ Within each leaf node, K1 < K2 < ... < Kq-1.
➢ Each leaf node has at least ⌈p/2⌉ values, and all leaf nodes are at the same level.

For comparison, each internal node is of the form <P1, K1, P2, K2, ..., Pc-1, Kc-1, Pc>, where c <= p, each Pi is a
tree pointer and each Ki is a key value. Within the node, K1 < K2 < ... < Kc-1, and for each search-field value X
in the subtree pointed at by Pi, Ki-1 < X <= Ki for 1 < i < c, and Ki-1 < X for i = c. An internal node with c
pointers has c-1 key values; the root has at least two tree pointers, while every other internal node has at least
⌈p/2⌉ tree pointers.

2(i). Describe the ordered indices with example.

An ordered index is an index that is sorted and stored on its search-key value. For example, if you consider
the index at the back of any book, you will see that it is ordered on the topics; this makes it easy to reach the
content related to a topic by searching in alphabetical order.

Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. If the
IDs start with 1, 2, 3, ... and we have to search for the employee with ID 543:

➢ In the case of a database with no index, we have to search the disk blocks from the start until we reach 543.
The DBMS will read the record after reading 543*10 = 5430 bytes.

➢ In the case of an index, each index entry is 2 bytes, so the DBMS will read the record after reading
542*2 = 1084 bytes, which is far less than in the previous case.

(ii)Describe the different methods of implementing variable length records.

Variable-length records arise in a database in several ways:

➢ Storage of multiple record types in a file.

➢ Record types that allow variable field sizes.
➢ Record types that allow repeating fields.

3. Examine the RAID system. How does it improve performance and reliability? Discuss level 3 and
level 5 of RAID.

4.Demonstrate the structure of B+ tree and give the algorithm for search in the B+ tree with example.

Structure:

➢ All leaves are at the same level.

➢ The root has at least two children.
➢ Each node except the root can have a maximum of m children and a minimum of ⌈m/2⌉ children.
➢ Each node can contain a maximum of m-1 keys and a minimum of ⌈m/2⌉-1 keys.

Algorithm:

1. Start from the root node and compare k with the keys at the root node: k1, k2, ..., km-1.
2. If k < k1, follow the leftmost child pointer.
3. Else if k1 <= k < k2, then k lies between k1 and k2, so follow the child pointer between k1 and k2.
4. Otherwise, compare k with k3, k4, ... in the same way and follow the corresponding child pointer.
5. Repeat the above steps until a leaf node is reached.
6. If k exists in the leaf node, return true; else return false.
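
A minimal sketch of this search in Python, assuming a node is represented by its sorted key list and its child pointers (the node layout is an illustrative assumption):

from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted list of keys in this node
        self.children = children or []   # empty list for leaf nodes

def bplus_search(root, k):
    node = root
    # Descend: at each internal node, keys >= the separator go right,
    # matching steps 2-4 of the algorithm above.
    while node.children:
        node = node.children[bisect_right(node.keys, k)]
    return k in node.keys

# Order-3 example: the root's separator key 7 divides the two leaves.
root = Node([7], [Node([3, 5]), Node([7, 9])])
print(bplus_search(root, 5))    # True
print(bplus_search(root, 6))    # False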

5. Give a detailed description of query processing and optimization. Explain the cost estimation of query
optimization.

Query Processing is the activity performed in extracting data from the database. In query processing, it takes various
steps for fetching the data from the database. The steps involved are:

➢ Parsing and translation

➢ Optimization

➢ Evaluation

Cost estimation in query optimization

• The cost of query evaluation can vary for different types of queries. Although the system is responsible
for constructing the evaluation plan, the user need not write the query efficiently.

• Usually, a database system generates an efficient query evaluation plan which minimizes the cost. This
task performed by the database system is known as query optimization.
• For optimizing a query, the query optimizer should have an estimated cost analysis of each operation,
because the overall operation cost depends on the memory allocated to the various operations, the execution
costs, and so on.

6. Describe the different types of file organization. Explain each of them using a sketch, with their advantages
and disadvantages.

File organization refers to the logical relationships among the various records that constitute the file, particularly
with respect to the means of identification and access to any specific record. In simple terms, storing the files in a
certain order is called file organization. File structure refers to the format of the label and data blocks and of any
logical control record.
Various methods have been introduced to organize files, and each has advantages and disadvantages on the basis
of access or selection. It is therefore up to the programmer to decide the best-suited file organization method
according to the requirements.
Some types of File Organizations are :

➢ Sequential File Organization


➢ Heap File Organization
➢ Hash File Organization
➢ B+ Tree File Organization
➢ Clustered File Organization

7.Explain about static and dynamic hashing with an example.

Static Hashing: In static hashing, when a search-key value is provided, the hash function always computes the same
address. For example, if a mod-4 hash function is used, it generates only 4 bucket addresses (0 to 3), and the output
address is always the same for a given key. The number of buckets provided remains unchanged at all times.
Dynamic Hashing: The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and
removed dynamically and on demand. Dynamic hashing is also known as extended hashing.

8(I). Show the various levels of RAID system.

The standard levels of RAID are RAID 0 through RAID 6. Levels can also be combined into nested (hybrid)
RAID levels, for example:

➢ RAID 01 (striping and mirroring; also known as a "mirror of stripes")

➢ RAID 03 (byte-level striping and dedicated parity)
➢ RAID 10 (disk mirroring and straight block-level striping)
➢ RAID 50 (distributed parity and straight block-level striping)
➢ RAID 60 (dual parity and straight block-level striping)

(II) Why is data dictionary storage important?

The data dictionary is very important as it contains information such as what is in the database, who is allowed to
access it, where is the database physically stored etc. The users of the database normally don't interact with the data
dictionary, it is only handled by the database administrators.

9(I). With simple algorithms, describe the computation of the nested loop join and the block nested loop join.

Nested loop join algorithm:

A nested loop join is a join algorithm that uses a pair of nested for loops. To compute the theta join of two
relations r and s, we use the nested loop join algorithm. The computation is:

r ⋈θ s

where r is known as the outer relation and s as the inner relation of the join, because the for loop over r encloses
the for loop over s.
Algorithm:

for each tuple tr in r do begin
    for each tuple ts in s do begin
        test the pair (tr, ts) to see if it satisfies the join condition θ
        if the test is satisfied
            add tr · ts to the result
    end inner loop
end outer loop

➢ In the algorithm, tr and ts are the tuples of relations r and s, respectively. The notation tr · ts denotes the tuple
constructed by concatenating the attribute values of tuples tr and ts.
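
The block nested-loop join is a variant in which every block of the inner relation is paired with every block of the outer relation, so the inner relation is read once per outer block instead of once per outer tuple, saving block transfers. A minimal sketch in Python (the list-of-blocks representation and the sample data are illustrative assumptions):

def block_nested_loop_join(r_blocks, s_blocks, condition):
    # r_blocks / s_blocks: lists of blocks, each block a list of tuples.
    result = []
    for br in r_blocks:              # for each block of the outer relation
        for bs in s_blocks:          # each inner block is read once per OUTER BLOCK
            for tr in br:
                for ts in bs:
                    if condition(tr, ts):
                        result.append(tr + ts)   # concatenate the two tuples
    return result

# Join on equality of the first attribute:
r = [[(1, "a"), (2, "b")]]           # one block of r
s = [[(1, "x")], [(3, "y")]]         # two blocks of s
print(block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0]))
# [(1, 'a', 1, 'x')]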

(II) Sketch and summarize the basic steps in query processing.

Basic steps in query processing:

Parsing and translation

• Translate the query into its internal form, which is then translated into relational algebra.
• The parser checks syntax and verifies relations.

Optimization

• SQL is a very high-level language:

o The users specify what to search for, not how the search is actually done.
o The algorithms are chosen automatically by the DBMS.
• For a given SQL query there may be many possible execution plans.
• Amongst all equivalent plans, choose the one with the lowest cost.
• Cost is estimated using statistical information from the database catalog.

Evaluation

• The query evaluation engine takes a query evaluation plan, executes that plan and returns the answer to that
query.

(Figure: query-processing pipeline) query → parser and translator → relational-algebra expression →
optimizer → execution plan → evaluation engine → query output.
10.Analyse about the index schemas used in databases.

An index is a schema object that contains an entry for each value that appears in the indexed column(s) of the table
or cluster and provides direct, fast access to rows. The users cannot see the indexes, they are just used to speed up
searches/queries. An index is defined by a field expression that you specify when you create the index.

11(I). Analyze the B+ tree file organization in detail.

The B+ tree file organization is a very advanced method of an indexed sequential access mechanism. Records
are stored in a tree-like structure. It employs a similar key-index idea, in which the primary key is used to sort the
records: an index value is generated for each primary key and mapped to the record.
Unlike a binary search tree (BST), the B+ tree can contain more than two children. All records are stored solely at
the leaf nodes in this method. The intermediate nodes only point toward the leaf nodes; they contain no records.

(II) Identify a B+ tree to insert the following key elements (order of the tree is 3): 5, 3, 4, 9, 7, 15, 14, 21, 22, 23.

12. Examine the algorithm of SELECT and JOIN operations.

Join:

for each tuple tr in r do
    for each tuple ts in s do
        compare (tr, ts); if they satisfy the join condition,
            add them to the result of the join
    end
end
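
For the SELECT operation, the two basic algorithms are linear search, which scans every record, and binary search, which applies when the file is sorted on the selection attribute. A minimal sketch in Python (the record layout is an illustrative assumption):

from bisect import bisect_left

def select_linear(records, predicate):
    # Linear search: test the condition on every record.
    return [r for r in records if predicate(r)]

def select_binary(sorted_records, key, value):
    # Binary search: usable when the file is sorted on `key`.
    keys = [r[key] for r in sorted_records]
    i = bisect_left(keys, value)
    out = []
    while i < len(sorted_records) and sorted_records[i][key] == value:
        out.append(sorted_records[i])
        i += 1
    return out

emps = [{"id": 1, "dept": "A"}, {"id": 2, "dept": "B"}, {"id": 3, "dept": "B"}]
print(select_linear(emps, lambda r: r["dept"] == "B"))   # records with id 2 and 3
print(select_binary(emps, "id", 2))                      # the record with id 2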

13. Summarize in detail about Heuristic optimization algorithm.

A heuristic algorithm is one that is designed to solve a problem in a faster and more efficient fashion than traditional
methods by sacrificing optimality, accuracy, precision, or completeness for speed. Heuristic algorithms are often
used to solve NP-complete problems, a class of decision problems for which there is no known way to find a
solution quickly and accurately, although solutions can be verified when given. Heuristics can produce a solution
individually, or be used to provide a good baseline and then be supplemented with optimization algorithms.
Heuristic algorithms are most often employed when approximate solutions are sufficient and exact solutions are
necessarily computationally expensive. In query optimization, typical heuristic rules are to perform selections and
projections as early as possible, since they reduce the size of intermediate results before expensive joins.

14(I). Explain in detail the optimization of disk block access.

Data is transferred between disk and main memory in units called blocks. A block is a contiguous sequence of bytes
from a single track of one platter. Block sizes range from 512 bytes to several thousand bytes. Several optimization
techniques exist besides the buffering of blocks in main memory:

➢ Scheduling: If several blocks from a cylinder need to be transferred, we may save time by requesting them
in the order in which they pass under the heads. A commonly used disk-arm scheduling algorithm is
the elevator algorithm.
➢ File organization. Organize blocks on disk in a way that corresponds closely to the manner that we expect
data to be accessed. For example, store related information on the same track, or physically close tracks, or
adjacent cylinders in order to minimize seek time. IBM mainframe OS's provide programmers fine control
on placement of files but increase programmer's burden.
➢ Nonvolatile write buffers. Use nonvolatile RAM to speed up disk writes drastically (first write to
nonvolatile RAM buffer and inform OS that writes completed).
➢ Log disk. Another approach to reducing write latency is to use a log disk, a disk devoted to writing a
sequential log.

(II) Generalization about mirrored RAID levels.

The various ways in which data is grouped across drives is called the RAID level. Each RAID level is denoted by a
number following the word RAID. The most common levels are RAID 0, RAID 1, RAID 5 and RAID 6. The RAID
level is generally determined by the requirements of the applications running on the server. RAID 0 is the fastest,
RAID 1 is the most reliable and RAID 5 is considered a good combination of both.

PART C

1.Create B tree and B+ tree to insert the following key values (the order of the tree is 3)
32,11,15,13,7,22,15,44,67,4.

2. The following key values are organized in an extendable hashing technique: 1, 3, 5, 8, 9, 12, 17, 28. Show the
extendable hash structure for this file if the hash function is h(x) = x mod 8 and buckets can hold three records.
Assess how the extendable hash structure changes as the result of each of the following steps:

Insert 2

Insert 24

Delete 5

Delete 12

3(I). Explain how reliability can be improved through redundancy.

Redundancy is used in system design with the purpose of increasing reliability. It is found in many engineering
systems such as bridges, distribution systems, process plants and communication systems. Redundancy may be
defined as the provision of more than one means of performing a required function. One of the key drivers of quality
is the performance of the product over a period of time, and that performance is determined by reliability and
redundancy: reliability increases efficiency, while redundancy increases the system's capability to keep operating
when a component fails.
(II) How records are represented and organized in a file. Explain it with suitable example.

File Organization refers to the logical relationships among various records that constitute the file, particularly
with respect to the means of identification and access to any specific record. In simple terms, Storing the files in
certain order is called file Organization. File Structure refers to the format of the label and data blocks and of any
logical control record.

Some types of File Organizations are :

➢ Sequential File Organization


➢ Heap File Organization
➢ Hash File Organization
➢ B+ Tree File Organization
➢ Clustered File Organization

4(I). Explain the architecture of a distributed database system.

In these systems, each peer acts both as a client and a server for imparting database services. The peers share their
resource with other peers and co-ordinate their activities. This architecture generally has four levels of schemas.

➢ Global Conceptual Schema: Depicts the global logical view of data.


➢ Local Conceptual Schema: Depicts logical data organization at each site.
➢ Local Internal Schema: Depicts physical data organization at each site.
➢ External Schema: Depicts user view of data.

(II) Generalize the concept of RAID.

Redundant Array of Independent Disks (RAID) combines multiple small, inexpensive disk drives into an array of
disk drives which yields performance exceeding that of a Single Large Expensive Drive (SLED). RAID is also
called Redundant Array of Inexpensive Disks. The schemes are named RAID 0, RAID 1, RAID 2, ..., RAID 6. A
RAID array contains a set of physical disk drives, and in this technology the operating system views these separate
disks as a single logical disk.
Unit 5

Part – A
Question-01:

Discriminate the meaning of homogeneous and heterogeneous in DDBMS.

Answer:

A DDBMS may be classified as homogeneous or heterogeneous.

In a homogeneous distributed system, all sites run the same DBMS software (in Oracle's terms, each database is an Oracle database).

In a heterogeneous distributed system, at least one site runs a different DBMS (at least one of the databases is a non-Oracle database).

Question-02:

Give the definition of distributed database system.

Answer:

A distributed database is a database that consists of two or more files located in different sites either on the same
network or on entirely different networks. Portions of the database are stored in multiple physical locations and
processing is distributed among multiple database nodes.

Question-03:

What are the types of distributed database?

Answer:

Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database
environments, each with further sub-divisions, as follows.

Types of Homogeneous Distributed Databases

• Autonomous − Each database is independent and functions on its own. The databases are integrated by a
controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes, and a central or master
DBMS coordinates data updates across the sites.

Types of Heterogeneous Distributed Databases

• Federated − The heterogeneous database systems are independent in nature and integrated together
so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.

Question-04:

Define fragmentation in distributed database.

Answer:
Fragmentation is a database server feature that allows you to control where data is stored at the table level.
Fragmentation enables you to define groups of rows or index keys within a table according to some algorithm or
scheme.

Question-05:

Define information retrieval system. Show how it differs from database system.

Answer:

An IR system is a software system that provides access to books, journals and other documents, and stores and
manages those documents.

Difference between an information retrieval system and a database system:
• A database system manages structured data with a well-defined schema, whereas an IR system manages
unstructured (mostly textual) documents.
• Database queries (e.g., in SQL) have exact, well-defined answers, whereas IR queries are keyword-based and
return documents ranked by their estimated relevance to the query.
• Database systems emphasize transactions, concurrency control and recovery, whereas IR systems emphasize
indexing, ranking, and retrieval-effectiveness measures such as precision and recall.

Question-06:

List the advantages of OODB.

Answer:

Advantages of OODB

• Complex data and a wider variety of data types compared with standard relational (e.g., MySQL) data types.
• Easy to save and retrieve data quickly.
• Seamless integration with object-oriented programming languages.
• Easier to model the advanced real-world problems.
• Extensible with custom data types.
Question-07:
Write about object database system.

Answer:
An object-oriented database (OOD) is a database system that can work with complex data objects — that is, objects
that mirror those used in object-oriented programming languages. In object-oriented programming, everything is an
object, and many objects are quite complex, having different properties and methods.

Question-08:
Examine the object-oriented data model with your own example.

Answer:

An object is an abstraction of a real-world entity, or we can say it is an instance of a class. Objects encapsulate data
and code into a single unit, which provides data abstraction by hiding the implementation details from the user. For
example: instances of student, doctor and engineer classes are objects.
Question-09:

Give the types in object relational feature in oracle.

Answer:

• Defining types
• Dropping types
• Constructing objects
• Methods
• Queries involving types
• Declaring types for relation
• References
• Nested tables
• Nested tables of references
• Converting relations to object-relations

Question-10:

Compare sequential access devices versus random access devices with example.

Answer:

Sequential access devices, such as magnetic tape, must be read in order from the beginning: to reach a given
record, all the data before it must be passed over, so access time depends on the record's position. Random
(direct) access devices, such as magnetic hard disks, can position directly to any block, so any record can be
accessed in roughly the same time regardless of its location.
Question-11:

Show the advantages of distributed database system.

Answer:

The separation of the various system components, especially the separation of application servers from database
servers, yields tremendous benefits in terms of cost, management, and performance.

• Tunability
• Platform autonomy
• Fault tolerance
• Scalability
• Location transparency
• Site autonomy
• Enhanced security

Question-12:

Demonstrate ODL.

Answer:

ODL is used to define persistent classes, whose objects are stored permanently in the database.

Question-13:

What are the needs of object-oriented database?

Answer:

Object databases are commonly used in applications that require high performance, calculations, and faster results.
Some of the common applications that use object databases are real-time systems, architectural and engineering
3D modelling, telecommunications, scientific products, molecular science, and astronomy.

Question-14:

Compare ODL and OQL.

Answer:

ODL = Object Definition Language; it plays a role like the CREATE TABLE part of SQL, defining the schema.

OQL = Object Query Language; it tries to imitate SQL within an OO framework.

Question-15:What is relevance ranking?

Answer: Relevance ranking is a core problem of Information Retrieval which plays a fundamental role in various
real-world applications, such as search engines. Given a query and a set of candidate text documents, relevance
ranking algorithms determine how relevant each text document is for the given query.
PART B
QNO.1:
[I] Information Retrieval can be defined as a software program that deals with the organization, storage, retrieval,
and evaluation of information from document repositories, particularly textual information. Information retrieval is
the activity of obtaining material of an unstructured nature (usually text) that satisfies an information need, from
within large collections stored on computers. For example, information retrieval takes place when a user enters a
query into the system. It is not only librarians and professional searchers who engage in information retrieval;
nowadays hundreds of millions of people engage in IR every day when they use web search engines. Information
retrieval is believed to be the dominant form of information access. The IR system assists users in finding the
information they require, but it does not explicitly return the answers to their questions; it indicates the existence
and location of documents that might contain the required information. Information retrieval also supports users in
browsing or filtering document collections, or in processing a set of retrieved documents. The system searches over
billions of documents stored on millions of computers. As one application, a spam filter provides manual or
automatic means for classifying mail so that it can be placed directly into particular folders. An IR system has the
ability to represent, store, organize, and access information items. A set of keywords is required to search; keywords
are what people are searching for in search engines, and they summarize the description of the information.

[II] Information Retrieval


• A transaction processing system (TPS) is an information processing system for business transactions
involving the collection, modification and retrieval of all transaction data. Characteristics of a TPS
include performance, reliability and consistency. TPS is also known as transaction processing or
real-time processing.
A transaction processing system is often contrasted with a batch processing system, where many requests are all
executed at one time. The former requires the interaction of a user, whereas batch processing does not require user
involvement. In batch processing the results of each transaction are not immediately available, and there is a delay
while the many requests are organized, stored and eventually executed. In transaction processing there is no delay,
and the results of each transaction are immediately available. During the delay time of batch processing, errors can
occur. Although errors can occur in transaction processing too, they are infrequent and tolerated, and do not warrant
shutting down the entire system.

QUE NO .2:
[I] Discuss on distributed transaction
A distributed transaction is a set of operations on data that is performed across two or more data repositories
(especially databases). It is typically coordinated across separate nodes connected by a network, but may also span
multiple databases on a single server. There are two possible outcomes: 1) all operations successfully complete, or
2) none of the operations are performed at all due to a failure somewhere in the system. In the latter case, if some
work was completed prior to the failure, that work will be reversed to ensure no net work was done. This type of
operation is in compliance with the ACID (atomicity-consistency-isolation-durability) principles of databases that
ensure data integrity. ACID is most commonly associated with transactions on a single database server, but
distributed transactions extend that guarantee across multiple databases.
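
The all-or-nothing outcome is usually achieved with the two-phase commit protocol discussed earlier. A minimal sketch of the coordinator's logic in Python (the Participant class and its method names are illustrative assumptions):

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
    def prepare(self):
        # Phase 1: vote yes/no after force-writing a "ready" log record.
        return self.can_commit
    def commit(self):
        print(self.name, "committed")
    def rollback(self):
        print(self.name, "rolled back")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    decision = all(votes)                         # commit only if unanimous
    for p in participants:                        # phase 2: broadcast the decision
        p.commit() if decision else p.rollback()
    return decision

two_phase_commit([Participant("node1"), Participant("node2", can_commit=False)])
# node2 votes no, so both nodes roll back: no net work is done.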

QUE 3: Differentiate Document Type Definition and XML schema with suitable example.
• With a DTD, element content models are global: if a city element in the same document also needs to
have a child element 'name', the DTD requires that this 'name' element have child elements 'first' and
'last' as well, even though city.name does not require them. In contrast, XML Schema allows you to
declare child element types locally; you could declare the 'name' child elements for person and city
separately, giving them their proper content models in those contexts.
• The other major difference is support for namespaces. Since DTDs are part of the original XML
specification (and are inherited from SGML), they are not namespace-aware at all, because XML
namespaces were specified later. DTDs can be used in combination with namespaces, but it requires
some contortions, such as being forced to define the prefixes in the DTD and to use only those prefixes,
instead of being able to use arbitrary prefixes.
• Other differences are mostly superficial: datatype support could in principle be added to DTDs, and the
remaining distinctions are a matter of syntax, with XML Schema's syntax being considerably more
verbose than that of DTDs or RELAX NG schemas.
QUE 4:
[I] Point out the usage of OQL, the ODMG's query language
• Query languages, often known as DQLs or Data Query Languages, are computer languages used
to make various queries in information systems and databases; the Structured Query Language (SQL)
is a well-known example. DQL statements are used to query the data contained in schema objects.
OQL can also be used to create new databases or insert data into existing databases (for example, to
configure the operation of Network Manager components) by amending the component schema files.

[ii] Analyze the methods to store XML documents


Answer:
Approaches to store the XML Document
1. Using a DBMS to store the document as text.
2. Using a DBMS to store the document as data elements.
3. Designing a specialized system for storing native XML document.
4. Creating or publishing customized XML documents from preexisting relational database.

QUENO 5:
[I] Discuss the approaches to store relations in distributed database
There are 2 ways in which data can be stored on different sites. These are:
1. Replication –
In this approach, the entire relation is stored redundantly at two or more sites. If the entire database is available at
all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of the data.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., divided into smaller parts) and each fragment is
stored at the different sites where it is required. It must be ensured that the fragments can be
used to reconstruct the original relation (i.e., there is no loss of data).
Fragmentation is advantageous as it doesn't create copies of data, so consistency is not a problem. (A small
worked example follows the two fragmentation types below.)

Fragmentation of relations can be done in two ways:

• Horizontal fragmentation – Splitting by rows –


The relation is fragmented into groups of tuples so that each tuple is assigned to at least one fragment.
• Vertical fragmentation – Splitting by columns –
The schema of the relation is divided into smaller schemas. Each fragment must contain a common
candidate key so as to ensure a lossless join.
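
For example (an illustrative relation, not taken from the text above): given EMPLOYEE(Eno, Name, Dept, City), horizontal fragmentation by site could be

    EMP_CHENNAI = σ City='Chennai' (EMPLOYEE)
    EMP_DELHI   = σ City='Delhi' (EMPLOYEE)

with EMPLOYEE reconstructed as EMP_CHENNAI ∪ EMP_DELHI, while vertical fragmentation could be

    EMP_ID   = π Eno, Name (EMPLOYEE)
    EMP_WORK = π Eno, Dept, City (EMPLOYEE)

with EMPLOYEE reconstructed by the natural join EMP_ID ⋈ EMP_WORK; keeping the candidate key Eno in both fragments is what makes the join lossless.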
[II] Discuss how the effectiveness of retrieval is measured
The effectiveness of information retrieval systems is essentially measured by comparing performance,
functionality and systematic approach on a common set of queries and documents. Significance tests are
used for functional evaluation, performance evaluation (precision and recall), collection evaluation and
interface evaluation.

QUENO 6
Describe object database concepts
An object database is managed by an object-oriented database management system (OODBMS), which
combines object-oriented programming principles with database management principles.
• Objects are the basic building block and an instance of a class, where the type is either built-in or
user-defined.
• Classes provide a schema or blueprint for objects, defining the behavior.
• Methods determine the behavior of a class.
• Pointers help access elements of an object database and establish relations between objects.
Object-oriented databases closely relate to object-oriented programming concepts. The four main ideas of object-
oriented programming are:
• Polymorphism
• Inheritance
• Encapsulation
• Abstraction

Polymorphism
Polymorphism is the capability of an object to take multiple forms. This ability allows the same program code to
work with different data types. For example, both a car and a bike are able to brake, but the mechanism is
different; the defined action 'brake' is polymorphic, because the result changes depending on which vehicle
performs it.
Inheritance
Inheritance creates a hierarchical relationship between related classes while making parts of the code reusable.
A newly defined type inherits all the existing class fields and methods, and can further extend them. The existing
class is the parent class, while the child class extends the parent.
For example, a parent class called Vehicle will have the child classes Car and Bike. Both child classes inherit
information from the parent class and extend it with new information depending on the vehicle type.
Encapsulation
Encapsulation is the ability to group data and mechanisms into a single object to provide access protection.
Through this process, pieces of information and details of how an object works are hidden, resulting in data and
function security. Classes interact with each other through methods without the need to know how particular
methods work.

Abstraction
Abstraction is the procedure of representing only the essential data features for the needed functionality. The
process selects vital information while unnecessary information stays hidden. Abstraction helps reduce the
complexity of the modeled data and allows reusability. For example, there are different ways for a computer to
connect to the network. A web browser needs an internet connection, but the connection type is irrelevant: an
established connection to the internet represents an abstraction, whereas the various types of connections represent
different implementations of the abstraction.
7-i: An object-relational database is a database management system like a relational database, but with an object-
oriented database model: objects, classes and inheritance are directly supported in database schemas as well as
within the query language. A popular DBMS with object-relational features on the market today is Oracle.

7-ii: XML is a markup language based on Standard Generalized Markup Language (SGML) used for defining
markup languages. XML's primary function is to create formats for data that is used to encode information for
documentation, database records, transactions, and many other types of data.

8-i: Most IR systems also allow the use of Boolean and other operators to build a complex query. The query
language with these operators enriches the expressiveness of a user’s information need. The Information Retrieval
(IR) system finds the relevant documents from a large data set according to the user query.

8-ii: Information retrieval (IR) may be defined as a software program that deals with the organization, storage,
retrieval and evaluation of information from document repositories particularly textual information. The system
assists users in finding the information they require but it does not explicitly return the answers of the questions. It
informs the existence and location of documents that might consist of the required information. The documents that
satisfy user’s requirement are called relevant documents. A perfect IR system will retrieve only relevant documents.

The process of information retrieval (IR) can be summarized as follows: a user who needs information
formulates a request in the form of a query in natural language, and the IR system responds by retrieving the
relevant output, in the form of documents, about the required information.

Types of Information Retrieval (IR) Models

An information retrieval (IR) model can be classified into the following three models −

Classical IR Model

It is the simplest and easiest IR model to implement. It is based on mathematical knowledge that is easily
recognized and understood. Boolean, Vector and Probabilistic are the three classical IR models.

Non-Classical IR Model

It is completely opposite to the classical IR model. Such IR models are based on principles other than similarity,
probability and Boolean operations. The information logic model, the situation theory model and interaction models
are examples of non-classical IR models.

Alternative IR Model

It is an enhancement of the classical IR model that makes use of specific techniques from other fields. The cluster
model, the fuzzy model and latent semantic indexing (LSI) are examples of alternative IR models.

Design features of Information retrieval (IR) systems

Let us now learn about the design features of IR systems −

Inverted Index
The primary data structure of most IR systems is the inverted index. We can define an inverted
index as a data structure that lists, for every word, all the documents that contain it and the frequency of its
occurrences in each document. It makes it easy to search for 'hits' of a query word.
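
A minimal sketch in Python of building such an index (the two-document collection is an illustrative assumption):

from collections import defaultdict

docs = {
    1: "information retrieval finds relevant documents",
    2: "an inverted index lists documents for every word",
}

# word -> {document id -> occurrence count}
inverted = defaultdict(lambda: defaultdict(int))
for doc_id, text in docs.items():
    for word in text.lower().split():
        inverted[word][doc_id] += 1

print(dict(inverted["documents"]))   # {1: 1, 2: 1} -- the posting list for 'documents'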

Stop Word Elimination

Stop words are high-frequency words that are deemed unlikely to be useful for searching; they have little
semantic weight. All such words are kept in a list called a stop list. For example, articles such as "a", "an", "the"
and prepositions such as "in", "of", "for", "at" are stop words. The size of the inverted index can be
significantly reduced by a stop list: as per Zipf's law, a stop list covering a few dozen words reduces the size of the
inverted index by almost half. On the other hand, eliminating a stop word can sometimes remove a term that is
useful for searching. For example, if we eliminate the word "A" from "Vitamin A", the remaining term has no
significance.

9: Crawling refers to following the links on a page to new pages, and continuing to find and follow links on new
pages to other new pages.

A web crawler is a software program that follows all the links on a page, leading to new pages, and continues that
process until it has no more new links or pages to crawl.

Web crawlers are known by different names: robots, spiders, search engine bots, or just "bots" for short. They are
called robots because they have an assigned job to do: travel from link to link and capture each page's information.
They are software programs, not physical robots. Google's web crawler is named Googlebot.

10-i: A distributed database is a database that consists of two or more files located in different sites either on the
same network or on entirely different networks. Portions of the database are stored in multiple physical locations
and processing is distributed among multiple database nodes.

10-ii: A distributed database is basically a database that is not limited to one system, it is spread over different sites,
i.e, on multiple computers or over a network of computers. A distributed database system is located on various sites
that don’t share physical components. This may be required when a particular database needs to be accessed by
various users globally. It needs to be managed such that for the users it looks like one single database.

11: It’s a figure that expresses the statistical importance of any given word to the document collection as a whole.

In plain language, the more often a word appears in a document collection, the more important it is, and the heavier
that term is weighted.

12: Information Retrieval (IR) can be defined as a software program that deals with the organization, storage,
retrieval, and evaluation of information from document repositories, particularly textual information. Information
Retrieval is the activity of obtaining material that can usually be documented on an unstructured nature i.e. usually
text which satisfies an information need from within large collections which is stored on computers. For example,
Information Retrieval can be when a user enters a query into the system.

13: Object Query Language (OQL) is a version of the Structured Query Language (SQL) that has been designed for
use in Network Manager. The components create and interact with their databases using OQL.
14: XML Hierarchical (Tree) Data Model

We now introduce the data model used in XML. The basic object in XML is the XML document. Two main
structuring concepts are used to construct an XML document: elements and attributes. It is important to note that
the term attribute in XML is not used in the same manner as is customary in database terminology, but rather as it is
used in document description languages such as HTML and SGML. Attributes in XML provide additional
information that describes elements, as we will see. There are additional concepts in XML, such as entities,
identifiers, and references, but first we concentrate on describing elements and attributes to show the essence of the
XML model.
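
A small example document (illustrative, not taken from the text) showing elements nested inside one another and an attribute adding descriptive information to an element:

<company>
  <employee id="E101">   <!-- 'id' is an attribute describing the employee element -->
    <name>Asha</name>    <!-- 'name' and 'department' are elements holding data -->
    <department>Sales</department>
  </employee>
</company>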
