DBMS CSE Digital Notes 2020-2021 - March 9th 2021
Prepared by
Mrs. Ch.Anitha
Asst. Professor
Course Outcomes:
Demonstrate the basic elements of a relational database management system.
Ability to identify the data models for relevant problems.
Ability to design entity relationship model and convert entity relationship diagrams into
RDBMS and formulate SQL queries on the data.
Apply normalization for the development of application software.
Course PO PO PO PO PO PO PO PO PO PO PO PO
Outcome 1 2 3 4 5 6 7 8 9 10 11 12
CO1 H H H H H H M
CO2 H H H H H M
CO3 H H H H H H M
CO4 H H H H H M
H – High
M – Medium
L - Low
MALLA REDDY ENGINEERING COLLEGE FOR WOMEN
(1805PC06) DATABASE MANAGEMENT SYSTEMS
B.Tech. II Year II Sem    L T P C
                          3 0 0 3
Course Objectives:
To understand the basic concepts and the applications of database systems.
To master the basics of SQL and construct queries using SQL.
To understand the relational database design principles.
To become familiar with the basic issues of transaction processing and concurrency control.
To become familiar with database storage structures and access techniques.
Course Outcomes:
Demonstrate the basic elements of a relational database management system and Ability to
identify the data models for relevant problems.
Ability to design entity relationship model and convert entity relationship diagrams into
RDBMS and formulate SQL queries on the data.
Apply normalization for the development of application software.
UNIT – II: Relational Model: Introduction to the Relational Model, Integrity Constraints over
Relations, Enforcing Integrity constraints, Querying relational data, Logical data base Design: ER
to Relational, Introduction to Views, Destroying /Altering Tables and Views.
Relational Algebra and Calculus: Preliminaries, Relational Algebra, Relational calculus – Tuple
relational Calculus, Domain relational calculus.
UNIT – III: SQL: Queries, Constraints, Triggers: Form of Basic SQL Query, UNION,
INTERSECT, and EXCEPT, Nested Queries, Aggregate Operators, NULL values, Natural
JOINS, Complex Integrity Constraints in SQL, Triggers and Active Databases.
UNIT – V: Storage and Indexing: Overview of Storage and Indexing: Data on External Storage,
File Organization and Indexing, Index Data Structures, Comparison of File Organizations. Tree-
Structured Indexing: Intuition for tree Indexes, Indexed Sequential Access Method (ISAM), B+
Trees: A Dynamic Index Structure, Search, Insert, Delete.
TEXT BOOKS:
1. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, McGraw Hill
Education (India) Private Limited, 3rd Edition. (Part of UNIT-I, UNIT-II, UNIT-III, UNIT-V)
2. Database System Concepts, A. Silberschatz, Henry F. Korth, S. Sudarshan, McGraw Hill
Education (India) Private Limited, 6th edition. (Part of UNIT-I, UNIT-IV)
REFERENCE BOOKS:
1. Database Systems, 6th edition, R Elmasri, Shamkant B.Navathe, Pearson Education.
2. Database System Concepts, Peter Rob & Carlos Coronel, Cengage Learning.
3. Introduction to Database Management, M. L. Gillenson and others, Wiley Student Edition.
4. Database Development and Management, Lee Chao, Auerbach Publications, Taylor &
Francis Group.
5. Introduction to Database Systems, C. J. Date, Pearson Education.
We begin with the entity relationship (E-R) model, which provides a high-level view of design issues.
To design a database we need to follow a proper approach, and that approach is called a data model.
We will see how to use the E-R model to design a database.
What is a Database?
To find out what database is, we have to start from data, which is the basic building
block of any DBMS.
Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc).
Record: Collection of related data items, e.g. in the above example the three data items had no
meaning. But if we organize them in the following way, then they collectively represent
meaningful information.
Roll Name Age
1 ABC 19
The columns of this relation are called Fields, Attributes or Domains. The rows are called
Tuples or Records.
Database: Collection of related relations.
Database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. This is a collection of related data with an implicit meaning and
hence is a database.
DBMS is software which is used to manage the collection of interrelated data.
1.1 Database Management System (DBMS) and Its Applications:
A Database management system is a computerized record-keeping system. It is a repository or a
container for collection of computerized data files. The overall purpose of DBMS is to allow the
users to define, store, retrieve and update the information contained in the database on demand.
Information can be anything that is of significance to an individual or organization.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to be
shared among several users, the system must avoid possible anomalous results.
Advantages of DBMS:
Because information is so important in most organizations, computer scientists have developed a
large body of concepts and techniques for managing data. These concepts and techniques form the
focus of these notes.
Instance: The collection of information stored in the database at a particular moment is called an
instance of the data base.
Schema: The database schema is the skeleton structure of the database and represents the logical view of the
entire database. It describes how the data is organized and how the relations among the data are associated.
Data independence:
The ability to modify a schema definition at one level without affecting the schema definition at the
next higher level is called data independence. It comes in two forms:
physical data independence and logical data independence.
Data models:
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, and data semantics.
There are three broad types of data models.
Data Definition Language (DDL) statements are used to define the database structure or
schema. It is a type of language that allows the DBA or user to describe and name the entities,
attributes, and relationships that are required for the application, along with any associated
integrity and security constraints. Typical DDL commands are CREATE, ALTER, DROP, TRUNCATE and RENAME.
Data Manipulation Language (DML) offers a set of operations to support the fundamental
manipulation of the data held in the database. DML statements are used to manage data within
schema objects. Typical DML commands are SELECT, INSERT, UPDATE and DELETE.
Data Control Language (DCL) deals with privileges, which are of two kinds:
System – creating a session, creating a table, etc. are all types of system privilege.
Object – any command or query that works on tables comes under object privilege.
DCL is used to define two commands:
Grant – gives a user access privileges to the database.
Revoke – takes back permissions from a user.
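For illustration, a minimal GRANT/REVOKE sketch (the table student_details and the user ravi are hypothetical names, not from these notes):
-- give ravi permission to read and insert rows in student_details
GRANT SELECT, INSERT ON student_details TO ravi;
-- later, take the INSERT permission back
REVOKE INSERT ON student_details FROM ravi;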
A relational database is a set of formally described tables from which data can be accessed or
reassembled in many different ways without having to reorganize the database tables. The
standard user and application programming interface (API) of a relational database is
the Structured Query Language (SQL). SQL statements are used both for interactive queries for
information from a relational database and for gathering data for reports.
When creating a relational database, you can define the domain of possible values in a data
column and further constraints that may apply to that data value. For example, a domain of
possible customers could allow up to 10 possible customer names but be constrained in one table
to allowing only three of these customer names to be specifiable.
Database design is the process of producing a detailed data model of a database. This data
model contains all the needed logical and physical design choices and physical storage
parameters needed to generate a design in a data definition language, which can then be used to
create a database.
The architecture of a database system is greatly influenced by the underlying computer system
on which the database system runs. Database systems can be centralized, or client-server, where
one server machine executes work on behalf of multiple client machines. Database systems can
also be designed to exploit parallel computer architectures. Distributed databases span multiple
geographically separated machines.
At a very high level, a database system can be viewed as shown in the diagram below. Let us look at
its parts in detail.
Applications: - These can be considered as user-friendly front ends (for example, a web page) where the user
enters requests. The user simply enters the details that he or she needs and presses buttons to get the
data.
Information Retrieval - the ability to query a computer system to return relevant results.
The most widely used example is the Google web search engine.
Data Mining - the ability to retrieve information from one or more data sources in order to
combine it, cluster it, visualize it and discover patterns in the data.
Big Data - the ability to manipulate huge volumes of data (that far exceed the capacity of a
single machine) in order to perform data mining techniques on that data.
Text/data mining currently involves analyzing a large collection of often unrelated digital items
in a systematic way and to discover previously unknown facts, which might take the form of
relationships or patterns that are buried deep in an extensive collection. These relationships
would be extremely difficult, if not impossible, to discover using traditional manual-based
search and browse techniques. Both text and data mining build on the corpus of past
publications and build not so much on the shoulders of giants as on the breadth of past
published knowledge and accumulated mass wisdom.
Database users are the ones who really use and take the benefits of the database. There will be
different types of users depending on their needs and their way of accessing the database.
Database Users:
Application Programmers - They are the developers who interact with the database by means
of DML queries. These DML queries are written in the application programs like C, C++,
JAVA, Pascal etc. These queries are converted into object code to communicate with the
database. For example, writing a C program to generate the report of employees who are
working in a particular department will involve a query to fetch the data from the database. It will
include an embedded SQL query in the C program.
Sophisticated Users - They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application or programs to request the
database. They directly interact with the database by means of query language like SQL. These
users will be scientists, engineers, analysts who thoroughly study SQL and DBMS to apply the
concepts in their requirement. In short, we can say this category includes designers and
developers of DBMS and SQL.
Specialized Users - These are also sophisticated users, but they write special database
application programs. They are the developers who develop the complex programs to the
requirement.
Stand-alone Users - These users will have a stand-alone database for their personal use. These
kinds of databases come as ready-made database packages with menus and graphical
interfaces.
Naive Users - these are the users who use an existing application to interact with the database.
For example, online library systems, ticket booking systems, ATMs, etc., which have an existing
application that users use to interact with the database to fulfill their requests.
Database Administrators: The life cycle of a database starts from designing and implementing it to
administering it. A database for any kind of requirement needs to be designed carefully, so the
DBA has many responsibilities. A well-performing database is in the hands of the DBA.
Installing and upgrading the DBMS servers: - The DBA is responsible for installing a new
DBMS server for new projects. He is also responsible for upgrading these servers as new
versions come into the market or as requirements change. If there is any failure while upgrading
the existing servers, he should be able to revert the changes back to the older version, thus
keeping the DBMS working. He is also responsible for applying service packs/hot
fixes/patches to the DBMS servers.
Design and implementation: - Designing the database and implementing is also DBA’s
responsibility. He should be able to decide proper memory management, file organizations,
error handling, log maintenance etc for the database.
Performance tuning: - Since the database is huge and will have lots of tables, data, constraints
and indices, there will be variations in performance from time to time. Also, because of
some design issues or data growth, the database may not work as expected. It is the
responsibility of the DBA to tune the database performance. He is responsible for making sure all
the queries and programs run in a fraction of a second.
Migrate database servers: - Sometimes, users using Oracle would like to shift to SQL Server
or Netezza. It is the responsibility of the DBA to make sure that the migration happens without any
failure, and that there is no data loss.
Backup and Recovery: - Proper backup and recovery programs need to be developed by the
DBA and maintained by him. This is one of the main responsibilities of the DBA.
Data/objects should be backed up regularly so that if there is any crash, they can be recovered
without much effort and without data loss.
Documentation: - The DBA should properly document all his activities so that if he quits or
a new DBA comes in, the newcomer can understand the database without much effort. He
should basically document all his installation, backup, recovery and security methods, and keep
various reports about database performance.
A Database Management System allows a person to organize, store, and retrieve data from a
computer. It is a way of communicating with a computer’s “stored memory.” In the very early
years of computers, “punch cards” were used for input, output, and data storage. Punch cards
offered a fast way to enter data, and to retrieve it. Herman Hollerith is given credit for adapting
the punch cards used for weaving looms to act as the memory for a mechanical tabulating
machine, in 1890. Much later, databases came along.
Databases (or DBs) have played a very important part in the recent evolution of computers. The
first computer programs were developed in the early 1950s, and focused almost completely on
coding languages and algorithms. At the time, computers were basically giant calculators and
data (names, phone numbers) was considered the leftovers of processing information.
Computers were just starting to become commercially available, and when business people
started using them for real-world purposes, this leftover data suddenly became important.
The CODASYL approach was a very complicated system and required substantial training. It
depended on a “manual” navigation technique using a linked data set, which formed a large
network. Searching for records could be accomplished by one of three navigation techniques.
NoSQL databases, by contrast, typically offer:
Higher scalability
A distributed computing system
Lower costs
A flexible schema
Can process unstructured and semi-structured data
Has no complex relationship
Unfortunately, NoSQL does come with some problems. Some NoSQL databases can be quite
resource intensive, demanding high RAM and CPU allocations. It can also be difficult to find
tech support if your open source NoSQL system goes down.
The database design process can be divided into six steps. The ER model is most relevant to the
first three steps.
Requirements Analysis:
The very first step in designing a database application is to understand what data is to be stored
in the database, what applications must be built on top of it, and what operations are most
frequent and subject to performance requirements. In other words, we must find out what the
users want from the database.
Logical Database Design: We must choose a DBMS to implement our database design, and convert the conceptual
database design into a database schema in the data model of the chosen DBMS.
Physical Database Design: In this step we must consider typical expected workloads that our database must support and
further refine the database design to ensure that it meets desired performance criteria. This step
may simply involve building indexes on some tables and clustering some tables, or it may
involve a substantial redesign of parts of the database schema obtained from the earlier design
steps.
Security Design:
In this step, we identify different user groups and different roles played by various users (e.g.,
the development team for a product, the customer support representatives, the product manager).
For each role and user group, we must identify the parts of the database that they must be able
to access and the parts of the database that they should not be allowed to access, and take steps
to ensure that they can access only the parts they need.
1.12 Entities:
The entity relationship (E-R) data model is based on a perception of a real world that consists of
a set of basic objects called entities, and of relationships among these objects.
Rectangles- which represent entity sets
Ellipse-which represent attributes
Diamonds-which represent relationship sets
Lines-which link attributes to entity sets and entity sets to relationship sets
Double ellipses-which represent multivalued attributes
Double lines- which indicate total participation of an entity in a relationship set
The appropriate mapping cardinality for a particular relationship set is obviously dependent on
the real-world situation that is being modeled by the relationship set. The overall logical
structure of a database can be expressed graphically by an E-R diagram.
Entity: An entity is a real-world object or concept which is distinguishable from other objects.
It may be something tangible, such as a particular student or building. It may also be somewhat
more conceptual, such as CS A-341, or an email address.
Attributes: These are used to describe a particular entity (e.g. name, SS#, height).
Entity set: a collection of similar entities (i.e., those which are distinguished using the same set
of attributes). As an example, I may be an entity, whereas Faculty might be an entity set to
which I belong. Note that entity sets need not be disjoint. I may also be a member of Staff or
of Softball Players.
Key: a minimal set of attributes for an entity set, such that each entity in the set can be uniquely
identified. In some cases, there may be a single attribute (such as SS#) which serves as a key,
but in some models you might need multiple attributes as a key ("Bob from Accounting").
There may be several possible candidate keys. We will generally designate one such key as
the primary key.
ER diagrams:
It is often helpful to visualize an ER model via a diagram. There are many variant conventions
for such diagrams; we will adapt the one used in the text.
Diagram conventions
ER Model
Entity relationship model defines the conceptual view of database. It works around real world
entity and association among them. At view level, ER model is considered well for designing
databases.
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
Simple attribute
Simple attributes are atomic values, which cannot be divided further. For example, student's
phone-number is an atomic value of 10 digits.
Composite attribute
Composite attributes are made of more than one simple attribute. For example, a student's
complete name may have first_name and last_name.
Derived attribute
Derived attributes are attributes which do not exist physically in the database, but whose values
are derived from other attributes present in the database. For example, average_salary in a
department need not be saved in the database; instead it can be derived. As another example, age can
be derived from date_of_birth.
Single-valued attribute
Single-valued attributes contain only a single value. For example: Social_Security_Number.
Multi-valued attribute
A multi-valued attribute may contain more than one value. For example, a person can have more
than one phone number, email address, etc.
o Super Key: Set of attributes (one or more) that collectively identifies an entity in an
entity set.
o Candidate Key: A minimal super key is called a candidate key, that is, a super key for
which no proper subset is a super key. An entity set may have more than one candidate
key.
o Primary Key: This is one of the candidate keys, chosen by the database designer to
uniquely identify the entity set.
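As a small sketch of these key notions in SQL (the table and column names here are illustrative, not from these notes):
CREATE TABLE Students (
    student_id INTEGER PRIMARY KEY,   -- the candidate key chosen as primary key
    email      VARCHAR(50) UNIQUE,    -- another candidate key (an alternate key)
    name       VARCHAR(50),
    age        INTEGER
);
Both student_id and email are candidate keys; {student_id, name} is a super key but not a candidate key, because student_id alone already identifies a row.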
The association among entities is called a relationship. For example, the employee entity has the relationship
works_at with department. Another example is a student who enrolls in some course. Here,
Works_at and Enrolls are called relationships.
Relationship Set
Relationship of similar type is called relationship set. Like entities, a relationship too can have
attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
o Binary = degree 2
o Ternary = degree 3
o n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set which can be associated to the
number of entities of other set via relationship set.
o One-to-one: one entity from entity set A can be associated with at most one entity of
entity set B and vice versa.
o One-to-many: One entity from entity set A can be associated with more than one
entity of entity set B, but an entity from entity set B can be associated with at most
one entity of A.
o Many-to-one: More than one entity from entity set A can be associated with at most
one entity of entity set B, but one entity from entity set B can be associated with more
than one entity from entity set A.
o Many-to-many: one entity from A can be associated with more than one entity from B
and vice versa.
Ternary Relationship Set
A relationship set need not be an association of precisely two entities;
it can involve three or more when applicable. Here is another example from the text, in which a
store has multiple locations.
If you took a 'snapshot' of the relationship set at some instant in time, we would call this
an instance.
If both entity sets of a relationship set have key constraints, we would call this a "one-to-one"
relationship set. In general, note that key constraints can apply to relationships between more
than two entities, as in the following example.
Participation Constraints
Recall that a key constraint requires that each entity of a set be required to participate in at most
one relationship. Dual to this, we may ask whether each entity of a set be required to participate
in at least one relationship.
If this is required, we call this a total participation constraint; otherwise the participation
is partial. In our ER diagrams, we will represent a total participation constraint by using
a thick line.
Weak Entities
There are times you might wish to define an entity set even though its attributes do not formally
contain a key (recall the definition for a key).
Together, this assures us that we can uniquely identify each entity from the weak set by
considering the primary key of its identifying owner together with a partial key from the weak
entity.
In our ER diagrams, we will represent a weak entity set by outlining the entity and the
identifying relationship set with dark lines. The required key constraint and total participation
are diagrammed with our existing conventions. We underline the partial key with a dotted line.
Class Hierarchies
Dually, we can ask whether every entity in a superclass be required to lie in (at least) one
subclass. By default we will not assume not, but we can specify a covering constraint if
desired. (e.g. "Motorboats AND Cards COVER Motor_Vehicles")
Aggregation
Thus far, we have defined relationships to be associations between two or more entities.
However, it sometimes seems desirable to define a new relationship which associates some
entity with some other existing relationship. To do this, we will introduce a new feature to our
model called aggregation. We identify an existing relationship set by enclosing it in a larger
dashed box, and then we allow it to participate in another relationship set.
A motivating example follows:
It is most important to recognize that there is more than one way to model a given situation.
Our next goal is to start to compare the pros and cons of common choices.
Consider the scenario in which we want to add address information to the Employees entity set. We
might choose to add a single attribute address to the entity set. Alternatively, we could
introduce a new entity set, Addresses and then a relationship associating employees with
addresses. What are the pros and cons?
Adding a new entity set makes the model more complex. It should only be done when there is a need for
the complexity. For example, if some employees have multiple addresses to be associated, then
the more complex model is needed. Also, representing addresses as a separate entity would
allow a further breakdown, for example by zip code or city.
What if we wanted to modify the Works_In relationship to have both a start and end date, rather
than just a start date. We could add one new attribute for the end date; alternatively, we could
create a new entity set Duration which represents intervals, and then the Works_In relationship
can be made ternary (associating an employee, a department and an interval). What are the pros
and cons?
Consider a situation in which a manager controls several departments. Let's presume that a
company budgets a certain amount (budget) for each department. Yet it also wants managers to
have access to some discretionary budget (dbudget). There are two corporate models. A
discretionary budget may be created for each individual department; alternatively, there may be
a discretionary budget for each manager, to be used as she desires.
Which scenario is represented by the following ER diagram? If you want the alternate
interpretation, how would you adjust the model?
Suppose we did not need the until or since attributes. In this case, we could model the identical setting
using the following ternary relationship:
Let's compare these two models. What if we wanted to add an additional constraint to
each, that each sponsorship (of a project by a department) be monitored by at most one
employee? Can you add this constraint to either of the above models?
Domain Constraints: A relation schema specifies the domain of each field or column in the relation instance.
These domain constraints in the schema specify an important condition that we want each
instance of the relation to satisfy: the values that appear in a column must be drawn from the
domain associated with that column. Thus, the domain of a field is essentially the type of that
field, in programming language terms, and restricts the values that can appear in the field.
Key Constraints
A Key Constraint is a statement that a certain minimal subset of the fields of a relation is a
unique identifier for a tuple.
Super Key: An attribute, or set of attributes, that uniquely identifies a tuple within a
relation. However, a super key may contain additional attributes that are not necessary
for a unique identification.
Example: The customer_id of the relation customer is sufficient to distinguish one tuple
from another. Thus, customer_id is a super key. Similarly, the combination
of customer_id and customer_name is a super key for the relation customer. Here
the customer_name is not a super key, because several people may have the same
name. We are often interested in super keys for which no proper subset is a super key.
Such minimal super keys are called candidate keys.
Candidate Key: A super key such that no proper subset is a super key within the
relation. There are two parts of the candidate key definition:
o Two distinct tuples in a legal instance cannot have identical values in all the
fields of a key.
o No subset of the set of fields in a candidate key is a unique identifier for a
tuple. A relation may have several candidate keys.
Primary Key: The candidate key that is selected to identify tuples uniquely within the
relation. Out of all the available candidate keys, a database designer chooses
a primary key. The candidate keys that are not selected as the primary key are called
alternate keys.
Example: For the student relation, we can choose student_id as the primary key.
Foreign Key: Foreign keys represent the relationships between tables. A foreign key is a
column (or a group of columns) whose values are derived from the primary key of some
other table. The table in which the foreign key is defined is called a Foreign table or Details
table. The table that defines the primary key and is referenced by the foreign key is
called the Primary table or Master table.
General Constraints
Domain, primary key, and foreign key constraints are considered to be a fundamental part of
the relational data model. Sometimes, however, it is necessary to specify more general
constraints.
Example: we may require that student ages be within a certain range of values. Given such an
IC, the DBMS rejects inserts and updates that violate the constraint.
Current database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and checked
whenever that table is modified. In contrast, assertions involve several tables and are checked
whenever any of these tables is modified.
Example: a table constraint which ensures that the salary of an employee is always above 1000:
CREATE TABLE employee (eid integer, ename varchar2(20), salary real,
CHECK (salary > 1000));
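By contrast, an assertion constrains several tables at once. A sketch in SQL-92 syntax (note that many DBMSs do not actually implement CREATE ASSERTION, and the department/employee tables and the did column here are hypothetical):
CREATE ASSERTION small_departments
CHECK (NOT EXISTS (SELECT * FROM department d
                   WHERE (SELECT COUNT(*) FROM employee e
                          WHERE e.did = d.did) > 100));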
The referential integrity constraint states that if a relation refers to a key attribute of a different or
the same relation, that key element must exist.
To ensure that only bonafide students can enroll in courses, any value that appears in the sid
field of an instance of the Enrolled relation should also appear in the sid field of some tuple in
the Students relation
The sid field of Enrolled is called a foreign key and refers to Students. The foreign key in the
referencing relation (Enrolled, in our example) must match the primary key of the referenced
relation (Students), i.e., it must have the same number of columns and compatible data types,
although the column names can be different.
Specifying Foreign Key Constraints in SQL
CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(10), PRIMARY
KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students )
Data integrity refers to the correctness and completeness of data within a database. To enforce
data integrity, you can constrain or restrict the data values that users can insert, delete, or update
in the database. For example, the integrity of data in the pubs2 and pubs3 databases requires
that a book title in the titles table must have a publisher in the publishers table. You cannot
insert books that do not have a valid publisher into titles, because it violates the data integrity
of pubs2 or pubs3.
Transact-SQL provides several mechanisms for integrity enforcement in a database such as
rules, defaults, indexes, and triggers. These mechanisms allow you to maintain these types of
data integrity:
Requirement – requires that a table column must contain a valid value in every row; it
cannot allow null values. The create table statement allows you to restrict null values for a
column.
Check or validity – limits or restricts the data values inserted into a table column. You
can use triggers or rules to enforce this type of integrity.
Consider the instance S1 of Students shown in Figure 3.1. The following insertion violates the
primary key constraint because there is already a tuple with the sid 53688, and it will be
rejected by the DBMS:
INSERT INTO Students (sid,name,login,age,gpa)VALUES (53688,‘Mike’,‘mike@ee’, 17, 3.4)
The following insertion violates the constraint that the primary key cannot contain null:
INSERT INTO Students (sid,name, login,age, gpa)VALUES (null, ‘Mike’, ‘mike@ee’, 17, 3.4)
The symbol * means that we retain all fields of selected tuples in the result. The condition S.age
< 18 in the WHERE clause specifies that we want to select only tuples in which the age field has
a value less than 18.
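The query being described would look like the following (a sketch in the textbook's style, over the Students instance):
SELECT *
FROM Students S
WHERE S.age < 18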
It is highly recommended that every table should start with its primary key attribute
conventionally named as TablenameID.
The initial relational schema is expressed in the following format writing the table names with
the attributes list inside a parentheses as shown below for
Persons( personid , name, lastname, email )
Persons and Phones are Tables. name, lastname, are Table Columns (Attributes).
If you have a multi-valued attribute, take the attribute and turn it into a new entity or table of its
own. Then make a 1:N relationship between the new entity and the existing one. In
simple words: 1. Create a table for the attribute. 2. Add the primary (id) column of the parent
entity as a foreign key within the new table, as shown below:
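A sketch of this conversion, continuing the Persons example (the Phones table name and its columns are illustrative):
Persons( personid , name, lastname, email )
Phones( phoneid , personid, phone_number )
Here personid in Phones is a foreign key referencing Persons, which captures the 1:N relationship between a person and his or her phone numbers.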
To keep it simple, and for better performance at data retrieval, I would personally
recommend using attributes to represent such a relationship. For instance, let us consider the case
where the Person has, or optionally has, one wife. You can place the primary key of the wife
within the table of the Persons, which we call in this case a foreign key, as shown below.
It should convert to :
Persons( personid , name, lastname, email )
House ( houseid , num , address, personid)
5. N:N Relationships
We normally use tables to express such type of relationship. This is the same for N − ary
relationship of ER diagrams. For instance, The Person can live or work in many countries.
Also, a country can have many people. To express this relationship within a relational schema
we use a separate table as shown below:
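A sketch of such a junction table (the table and column names are illustrative):
Persons( personid , name, lastname, email )
Countries( countryid , country_name )
Lives_in( personid , countryid )
Each row of Lives_in pairs one person with one country, so together the rows express the N:N relationship.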
To destroy tables, use the DROP TABLE command. For example, DROP TABLE Students
RESTRICT destroys the Students table unless some view or integrity constraint refers to
Students; if so, the command fails. If the keyword RESTRICT is replaced by CASCADE,
Students is dropped and any referencing views or integrity constraints are (recursively) dropped
as well; one of these two keywords must always be specified. A view can be dropped using the
DROP VIEW command, which is just like DROP TABLE.
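For instance, a brief sketch (the view name and definition are illustrative):
CREATE VIEW YoungStudents (sid, name)
    AS SELECT S.sid, S.name FROM Students S WHERE S.age < 18;
DROP VIEW YoungStudents;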
ALTER TABLE modifies the structure of an existing table. To add a column called maiden-name to
Students, for example, we would use the following command:
ALTER TABLE Students ADD COLUMN maiden-name CHAR(10)
The definition of Students is modified to add this column, and all existing rows are padded with
null values in this column. ALTER TABLE can also be used to delete columns and to add or
drop integrity constraints on a table.
Overview: The Relational Model defines two root languages for accessing a relational database -
- Relational Algebra and Relational Calculus. Relational Algebra is a low-level, operator-
oriented language. Creating a query in Relational Algebra involves combining relational
operators using algebraic notation. Relational Calculus is a high-level, declarative language.
Creating a query in Relational Calculus involves describing what results are desired.
Relational algebra is one of the two formal query languages associated with the relational
model. Queries in algebra are composed using a collection of operators. A fundamental property
is that every operator in the algebra accepts (one or two) relation instances as arguments and
returns a relation instance as the result. This property makes it easy to compose operators to form
a complex query: a relational algebra expression is recursively defined to be a relation, a
unary algebra operator applied to a single expression, or a binary algebra operator applied to two
expressions. We describe the basic operators of the algebra (selection, projection, union, cross-
product, and difference).
Relational algebra includes operators to select rows from a relation (σ)and to project columns
(π).
These operations allow us to manipulate data in a single relation. Consider the instance of the
Sailors relation shown in Figure 4.2, denoted as S2. We can retrieve rows corresponding to
expert sailors by using the σ (selection) operator. The expression σrating>8(S2) evaluates to the relation shown in
Figure 4.4. The subscript rating>8 specifies the selection criterion to be applied while retrieving
tuples.
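In the same spirit, a projection example (using the same instance S2): the expression πsname,rating(S2) returns a relation that contains only the sname and rating fields of S2, with duplicates eliminated.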
Set Operations: The following standard operations on sets are also available in relational
algebra: union (∪), intersection (∩), set-difference (−), and cross-product (×).
Union: R ∪ S returns a relation instance containing all tuples that occur in either relation
instance R or relation instance S (or both). R and S must be union-compatible, and the schema of
the result is defined to be identical to the schema of R.
Intersection: R ∩ S returns a relation instance containing all tuples that occur in both R and S.
The relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
Set-difference: R − S returns a relation instance containing all tuples that occur in R but not in
S. The relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
Cross-product: R × S returns a relation instance whose schema contains all the fields of R
(in the same order as they appear in R) followed by all the fields of S (in the same order as they
appear in S). The result of R × S contains one tuple ⟨r, s⟩ (the concatenation of tuples r and s) for
each pair of tuples r ∈ R, s ∈ S. The cross-product operation is sometimes called Cartesian
product.
Joins: The join operation is one of the most useful operations in relational algebra and is the
most commonlyused way to combine information from two or more relations. Although a join
can be defined as a cross-product followed by selections and projections, joins arise much more
frequently in practice than plain cross-products.
Condition Joins
The most general version of the join operation accepts a join condition c and a pair of relation
instances as arguments, and returns a relation instance. The join condition is identical to a
selection condition in form.
The operation is defined as follows:
Notation − σp(r)
Where σ stands for the selection predicate and r stands for the relation. p is a propositional logic formula
which may use connectives like and, or, and not. These terms may use relational operators like
=, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
For example −
Notation − r ∪ s
r ∪ s = { t | t ∈ r or t ∈ s }
Where r and s are either database relations or relation result set (temporary relation).
Notation − r − s
Notation − r × s
r × s = { q t | q ∈ r and t ∈ s }
Notation − ρ x (E)
Set intersection
Assignment
Natural join
The variant of the calculus that we present in detail is called the tuple relational calculus
(TRC). Variables in TRC take on tuples as values. In another variant, called the domain
relational calculus (DRC), the variables range over field values.
Syntax of TRC Queries: Let Rel be a relation name, R and S be tuple variables, a an attribute of
R, and b an attribute of S. Let op denote an operator in the set {<, >, =, ≤, ≥, ≠}. An atomic
formula is one of the following:
R ∈ Rel
R.a op S.b
R.a op constant, or constant op R.a
A formula is recursively defined to be one of the following, where p and q are themselves
formulas, and p(R) denotes a formula in which the variable R appears:
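(The recursive cases are built from atomic formulas using the connectives ∧, ∨ and ¬ and the quantifiers ∃R(p(R)) and ∀R(p(R)).) As a small example in this notation, using the Sailors relation from earlier, the sailors with a rating above 7 can be retrieved with the TRC query
{S | S ∈ Sailors ∧ S.rating > 7}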
2.10 Domain Relational Calculus: A domain variable is a variable that ranges over the values
in the domain of some attribute (e.g., the variable can be assigned an integer if it appears in an
attribute whose domain is the set of integers). A DRC query has the form {⟨x1, x2, ..., xn⟩ |
p(x1, x2, ..., xn)}, where each xi is either a domain variable or a constant and p(x1, x2, ..., xn) denotes
a DRC formula whose only free variables are the variables among the xi, 1 ≤ i ≤ n. The result
of this query is the set of all tuples ⟨x1, x2, ..., xn⟩ for which the formula evaluates to true.
A DRC formula is defined in a manner that is very similar to the definition of a TRC formula.
The main difference is that the variables are now domain variables. Let op denote an operator in
the set {<, >, =, ≤, ≥, ≠} and let X and Y be domain variables. An atomic formula is one of the following:
X op Y
X op constant, or constant op X
A formula is recursively defined to be one of the following, where p and q are themselves
formulas, and p(X) denotes a formula in which the variable X appears:
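A small example in DRC notation (assuming the Sailors schema with fields sid, sname, rating, age): the sailors with a rating above 7 are given by
{⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7}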
Notation − {T | Condition}
Output − Returns tuples with 'name' from Author who has written article on 'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
Notation − { a1, a2, ..., an | P(a1, a2, ..., an) }
Where a1, a2, ..., an are attributes and P stands for formulae built using inner attributes.
For example −
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators.
The expressive power of Tuple Relational Calculus and Domain Relational Calculus is equivalent
to that of Relational Algebra.
UNIT-III
SQL, Schema Refinement and Normal Forms
3.1The Form of a Basic SQL Query:
SQL is the language used to query all databases. It's simple to learn and appears to do very little
but is the heart of a successful database application. Understanding SQL and using it efficiently
is highly imperative in designing an efficient database application. The better your understanding
of SQL, the more versatile you'll be in getting information out of databases. A SQL SELECT
statement can be broken down into numerous elements, each beginning with a keyword.
Although it is not necessary, common convention is to write these keywords in all capital letters.
In this article, we will focus on the most fundamental and common elements of a SELECT
statement, namely
SELECT
FROM
WHERE
ORDER BY
If we want only specific columns (as is usually the case), we can/should explicitly specify them
in a comma-separated list, as in
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
Explicitly specifying the desired fields also allows us to control the order in which the fields are
returned, so that if we wanted the last name to appear before the first name, we could write
SELECT EmployeeID, LastName, FirstName, HireDate, City FROM Employees
The WHERE Clause
The next thing we want to do is to start limiting, or filtering, the data we fetch from the database.
By adding a WHERE clause to the SELECT statement, we add one (or more) conditions that
must be met by the selected data. This will limit the number of rows that answer the query and
are fetched. In many cases, this is where most of the "action" of a query takes place.
Examples
We can continue with our previous query, and limit it to only those employees living in London:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London'
If you wanted to get the opposite, the employees who do not live in London, you would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City <> 'London'
It is not necessary to test for equality; you can also use the standard equality/inequality operators
that you would expect. For example, to get a list of employees who were hired on or after a given
date, you would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'
Of course, we can write more complex conditions. The obvious way to do this is by having
multiple conditions in the WHERE clause. If we want to know which employees were hired
between two given dates, we could write
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE (HireDate >= '1-june-1992') AND (HireDate <= '15-december-1993')
Note that SQL also has a special BETWEEN operator that checks to see if a value is between
two values (including equality on both ends). This allows us to rewrite the previous query as
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE HireDate BETWEEN '1-june-1992' AND '15-december-1993'
We could also use the NOT operator, to fetch those rows that are not between the specified dates:
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE HireDate NOT BETWEEN '1-june-1992' AND '15-december-1993'
Let us finish this section on the WHERE clause by looking at two additional, slightly more
sophisticated, comparison operators.
What if we want to check if a column value is equal to more than one value? If it is only 2
values, then it is easy enough to test for each of those values, combining them with the OR
operator and writing something like
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London' OR City = 'Seattle'
However, if there are three, four, or more values that we want to compare against, the above
approach quickly becomes messy. In such cases, we can use the IN operator to test against a set
of values. If we wanted to see if the City was either Seattle, Tacoma, or Redmond, we would
write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City IN ('Seattle', 'Tacoma', 'Redmond')
As with the BETWEEN operator, here too we can reverse the results obtained and query for
those rows where City is not in the specified list:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City NOT IN ('Seattle', 'Tacoma', 'Redmond')
Finally, the LIKE operator allows us to perform basic pattern-matching using wildcard
characters. For Microsoft SQL Server, the wildcard characters are defined as follows:
Wildcard Description
% matches any string of zero or more characters.
_ (underscore) matches any single character.
[] matches any single character within the specified range (e.g. [a-f])
or set (e.g. [abcdef]).
[^] matches any single character not within the specified range (e.g.
[^a-f]) or set (e.g. [^abcdef]).
Here too, we can opt to use the NOT operator: to find all of the employees whose first name
does not start with 'M' or 'A', we would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE (FirstName NOT LIKE 'M%') AND (FirstName NOT LIKE 'A%')
If we want the sort order for a column to be descending, we can include the DESC keyword after
the column name.
The ORDER BY clause is not limited to a single column. You can include a comma-delimited
list of columns to sort by—the rows will all be sorted by the first column specified and then by
the next column specified. If we add the Country field to the SELECT clause and want to sort
by Country and City, we would write:
SELECT EmployeeID, FirstName, LastName, HireDate, Country, City
FROM Employees
ORDER BY Country, City DESC
Note that to make it interesting, we have specified the sort order for the City column to be
descending (from highest to lowest value). The sort order for the Country column is still
ascending. We could be more explicit about this by writing
SELECT EmployeeID, FirstName, LastName, HireDate, Country, City
FROM Employees
ORDER BY Country ASC, City DESC
It is important to note that a column does not need to be included in the list of selected (returned)
columns in order to be used in the ORDER BY clause. If we don't need to see/use the Country
values, but are only interested in them as the primary sorting field we could write the query as
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
ORDER BY Country ASC, City DESC
SQL provides three set-manipulation constructs that extend the basic query form. Since the
answer to a query is a multiset of rows, it is natural to consider the use of operations such as
union, intersection, and difference. SQL supports these operations under the names UNION,
INTERSECT, and EXCEPT.
Union:
Eg: Find the names of sailors who have reserved a red or a green boat.
SELECT S.sname FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND
R.bid = B.bid AND B.color = ‘red’
union
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2 WHERE S2.sid = R2.sid AND
R2.bid = B2.bid AND B2.color = ‘green’
This query says that we want the union of the set of sailors who have reserved red boats and
the set of sailors who have reserved green boats.
Intersect:
Eg:Find the names of sailors who have reserved both a red and a green boat.
SELECT S.sname FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND
R.bid = B.bid AND B.color = ‘red’
intersect
SELECT S2.sname FROM Sailors S2, Boats B2, Reserves R2 WHERE S2.sid = R2.sid AND
R2.bid = B2.bid AND B2.color = ‘green’
Except:
Eg:Find the sids of all sailors who have reserved red boats but not green boats.
SELECT S.sid FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid =
B.bid AND B.color = ‘red’
Except
SELECT S2.sid FROM Sailors S2, Reserves R2, Boats B2 WHERE S2.sid = R2.sid AND
R2.bid = B2.bid AND B2.color = ‘green’
SQL also provides other set operations: IN (to check if an element is in a given set), op ANY, op
ALL (to compare a value with the elements in a given set, using comparison operator op), and
EXISTS (to check if a set is empty). IN and EXISTS can be prefixed by NOT, with the obvious
modification to their meaning.
We cover UNION, INTERSECT, and EXCEPT in this section, and the other operations later in this unit.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
A subquery can have only one column in the SELECT clause, unless multiple columns
are in the main query for the subquery to compare its selected columns.
An ORDER BY cannot be used in a subquery, although the main query can use an
ORDER BY. The GROUP BY can be used to perform the same function as the ORDER
BY in a subquery.Subqueries that return more than one row can only be used with
multiple value operators, such as the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB,
ARRAY, CLOB, or NCLOB.
The sub query can refer to variables from the surrounding query, which will act as constants
during any one evaluation of the sub query.
This simple example is like an inner join on col2, but it produces at most one output row for
each tab1 row, even if there are multiple matching tab2 rows:
SELECT col1
FROM tab1
WHERE EXISTS (SELECT 1
FROM tab2
WHERE col2 = tab1.col2);
SELECT name
FROM stud
WHERE EXISTS (SELECT 1
              FROM assign
              WHERE assign.stud = stud.id);
The right-hand side of this form of IN is a parenthesized sub query, which must return exactly
one column. The left-hand expression is evaluated and compared to each row of the sub query
result. The result of IN is TRUE if any equal sub query row is found.
ALL
The right-hand side of this form of ALL is a parenthesized sub query, which must return exactly
one column. The left-hand expression is evaluated and compared to each row of the sub query
result using the given operator, which must yield a Boolean result. The result of ALL is TRUE if
all rows yield TRUE (including the special case where the sub query returns no rows). NOT IN
is equivalent to <> ALL.
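As a brief sketch of ANY and ALL in use (reusing the tab1/tab2 tables from the EXISTS example above; treating col2 as comparable with > is an assumption):
SELECT col1
FROM tab1
WHERE col2 > ANY (SELECT col2 FROM tab2);   -- true if tab1.col2 exceeds at least one tab2.col2 value
SELECT col1
FROM tab1
WHERE col2 > ALL (SELECT col2 FROM tab2);   -- true only if tab1.col2 exceeds every tab2.col2 value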
Row-wise comparison
The left-hand side is a list of scalar expressions. The right-hand side can be either a list of scalar
expressions of the same length, or a parenthesized sub query, which must return exactly as many
columns as there are expressions on the left-hand side. Furthermore, the sub query cannot return
more than one row. (If it returns zero rows, the result is taken to be NULL.) The left-hand side is
evaluated and compared row-wise to the single sub query result row, or to the right-hand
expression list. Presently, only = and <> operators are allowed in row-wise comparisons. The
result is TRUE if the two rows are equal or unequal, respectively.
A nested query is a query that has another query embedded within it; the embedded query is
called a subquery.
SQL provides other set operations: IN (to check if an element is in a given set),NOT IN(to
check if an element is not in a given set).
Eg: 1. Find the names of sailors who have reserved boat 103.
SELECT S.sname FROM Sailors S
WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
The nested subquery computes the (multi)set of sids for sailors who have reserved boat 103,
and the top-level query retrieves the names of sailors whose sid is in this set. The IN operator
allows us to test whether a value is in a given set of elements; an SQL query is used to generate
the set to be tested.
2. Find the names of sailors who have not reserved a red boat.
SELECT S.sname FROM Sailors S
WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R
                    WHERE R.bid IN (SELECT B.bid FROM Boats B WHERE B.color = 'red'))
In the nested queries that we have seen, the inner subquery has been completely independent of
the outer query. In general the inner subquery could depend on the row that is currently being
examined in the outer query .
Eg: Find the names of sailors who have reserved boat number 103.
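A correlated version of this query, sketched in the textbook's style with EXISTS:
SELECT S.sname FROM Sailors S
WHERE EXISTS (SELECT * FROM Reserves R
              WHERE R.bid = 103 AND R.sid = S.sid)
Here the inner subquery refers to S, the row currently being examined in the outer query.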
The EXISTS operator is another set comparison operator, such as IN. It allows us to test
whether a set is nonempty.
Set-Comparison Operators
SQL also supports op ANY and op ALL, where op is one of the arithmetic comparison
operators {<, <=, =, <>, >=,>}.
Eg:1. Find sailors whose rating is better than some sailor called Horatio.
If there are several sailors called Horatio, this query finds all sailors whose rating is better
than that of some sailor called Horatio.
2.Find the sailors with the highest rating.
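Sketches of these two queries in the textbook's style (assuming the usual Sailors schema):
SELECT S.sid FROM Sailors S
WHERE S.rating > ANY (SELECT S2.rating FROM Sailors S2 WHERE S2.sname = 'Horatio')
SELECT S.sid FROM Sailors S
WHERE S.rating >= ALL (SELECT S2.rating FROM Sailors S2)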
Comparison Operators:Comparison operators are used to compare the column data with
specific values in a condition.Comparison Operators are also used along with the SELECT
statement to filter data based on specific conditions.
SQL Comparison Keywords:There are other comparison keywords available in sql which are
used to enhance the search capabilities of a sql query. They are "IN", "BETWEEN...AND", "IS
NULL", "LIKE".
Comparison Operators Description
LIKE column value is similar to specified character(s).
IN column value is equal to any one of a specified set of values.
BETWEEN...AND column value is between two values, including the end values
specified in the range.
IS NULL column value does not exist.
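An example using the LIKE operator with the student_details table (the table used in the examples that follow):
SELECT first_name, last_name
FROM student_details
WHERE first_name LIKE 'S%';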
The above select statement searches for all the rows where the first letter of the column
first_name is 'S' and rest of the letters in the name can be any character.
There is another wildcard character you can use with LIKE operator. It is the underscore
character, ' _ ' . In a search string, the underscore signifies a single character.
To display all the names with 'a' second character,
SELECT first_name, last_name
FROM student_details
WHERE first_name LIKE '_a%';
NOTE: Each underscore acts as a placeholder for only one character, so you can use more than
one underscore. Eg: '__i%' - this has two underscores towards the left; 'S__j%' - this has two
underscores between the characters 'S' and 'j'.
To find the names of the students between age 10 to 15 years, the query would be like,
SELECT first_name, last_name, age
FROM student_details
WHERE age BETWEEN 10 AND 15;
SQL IN Operator: The IN operator is used when you want to compare a column with more than
one value. It is similar to an OR condition.
If you want to find the names of students who are studying either Maths or Science, the query
would be like,
SELECT first_name, last_name, subject
FROM student_details
WHERE subject IN ('Maths', 'Science');
You can include more subjects in the list like ('maths','science','history')
If you want to find the names of students who do not participate in any games, the query would
be as given below
SELECT first_name, last_name
FROM student_details
WHERE games IS NULL
There would be no output, as every student in the table student_details participates in a game;
otherwise, the names of the students who do not participate in any games would be
displayed.
Example
In the following example, aggregate functions are applied to the employee_count column of the branch
table, with region_nbr as the level of grouping. Here are the contents of the table:
Table: BRANCH
branch_nbr branch_name region_nbr employee_count
108 New York 100 10
110 Boston 100 6
212 Chicago 200 5
404 San Diego 400 6
415 San Jose 400 3
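A sketch of an aggregate query over this table, grouping by region_nbr; the particular aggregate functions shown are only illustrative:
SELECT region_nbr,
       COUNT(branch_nbr) AS branch_count,
       SUM(employee_count) AS total_employees,
       MAX(employee_count) AS max_employees
FROM branch
GROUP BY region_nbr;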
Syntax:
The basic syntax of NULL while creating a table:
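A minimal sketch of such a statement, with an illustrative table and columns:
CREATE TABLE customers (
    id      INT         NOT NULL,
    name    VARCHAR(20) NOT NULL,
    address VARCHAR(50),            -- no NOT NULL constraint, so this column may be NULL
    PRIMARY KEY (id)
);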
The following table describes how the logical "OR" operator selects a row.
Column1 Satisfied?    Column2 Satisfied?    Row Selected
YES                   YES                   YES
YES                   NO                    YES
NO                    YES                   YES
NO                    NO                    NO
Example: To find the names of the students between 10 and 15 years of age, the query would be
like:
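One possible way to write this, as a sketch using the student_details table (equivalent to the BETWEEN form shown earlier):
SELECT first_name, last_name, age
FROM student_details
WHERE age >= 10 AND age <= 15;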
The following table describes how the logical "AND" operator selects a row; it is analogous to
the table above, except that a row is selected only when both conditions are satisfied.
Example: If you want to find out the names of the students who do not play football, the query
would be like:
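A sketch with NOT, assuming student_details has a games column as in the IS NULL example above:
SELECT first_name, last_name, games
FROM student_details
WHERE NOT games = 'Football';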
OUTER JOINS
All joins mentioned above, that is, Theta Join, Equi Join, and Natural Join, are called inner joins.
An inner join includes only tuples with matching attributes; the rest are discarded in the
resulting relation. There exist methods, called outer joins, by which all tuples of either relation
are included in the resulting relation.
Left
A B
100 Database
101 Mechanics
102 Electronics
Right
A B
100 Alex
102 Maya
104 Mira
A B C D
100 Database 100 Alex
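A sketch of a left outer join over the two tables above (renamed LeftRel and RightRel here, since LEFT and RIGHT are SQL keywords); tuples of the left relation with no match appear with NULLs in the result:
SELECT L.A, L.B, R.A, R.B
FROM LeftRel L LEFT OUTER JOIN RightRel R ON L.A = R.A;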
Employee ID    Employee Name    Age    Gender    Location    Salary
1001           Henry            54     Male      New York    100000
1002           Tina             36     Female    Moscow      80000
1003           John             24     Male      London      40000
1006           Sophie           29     Female    London      60000
Default values are also subject to integrity constraint checking (defaults are included as part of
an INSERT statement before the statement is parsed.)
If the results of an INSERT or UPDATE statement violate an integrity constraint, the statement
will be rolled back.
Integrity constraints are stored as part of the table definition (in the data dictionary).
If multiple applications access the same table, they will all adhere to the same rule.
NOT NULL
UNIQUE
CHECK constraints for complex integrity rules
PRIMARY KEY
FOREIGN KEY integrity constraints, with referential integrity actions: ON UPDATE, ON DELETE, ON DELETE CASCADE, ON DELETE SET NULL
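A minimal sketch of a table definition combining these constraint types; the Employee and Department tables and their columns are illustrative (Department is assumed to exist already):
CREATE TABLE Employee (
    emp_id  INT         NOT NULL,
    email   VARCHAR(50) UNIQUE,
    age     INT         CHECK (age >= 18),
    dept_id INT,
    PRIMARY KEY (emp_id),
    FOREIGN KEY (dept_id) REFERENCES Department(dept_id) ON DELETE CASCADE
);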
Constraint States
The current status of an integrity constraint can be changed to any of the following 4 options
using the CREATE TABLE or ALTER TABLE statement.
Eg: The trigger called init_count initializes a counter variable before every execution of an
INSERT statement that adds tuples to the Students relation. The trigger called incr_count
increments the counter for each inserted tuple that satisfies the condition age < 18.
CREATE TRIGGER init_count BEFORE INSERT ON Students /* Event */
DECLARE
    count INTEGER;
BEGIN /* Action */
    count := 0;
END

CREATE TRIGGER incr_count AFTER INSERT ON Students /* Event */
WHEN (new.age < 18) /* Condition; 'new' denotes the inserted tuple */
FOR EACH ROW
BEGIN /* Action */
    count := count + 1;
END
Overview: Constructing the tables alone does not make an efficient database design; solving the
redundant-data problem does. For this we use functional dependencies and normal forms, which
are discussed in this chapter.
We now present an overview of the problems that schema refinement is intended to address and
a refinement approach based on decompositions. Redundant storage of information is the root
cause of these problems. Although decomposition can eliminate redundancy, it can lead to
problems of its own and should be used with caution.
Problems Caused by Redundancy
Storing the same information redundantly, that is, in more than one place within a database, can
lead to several problems:
Redundant storage: Some information is stored repeatedly.
Use of Decompositions
Redundancy arises when a relational schema forces an association between attributes that is not
natural. Functional dependencies can be used to identify such situations and to suggest
refinements to the schema. The essential idea is that many problems arising from redundancy can
be addressed by replacing a relation with a collection of ‘smaller’ relations. Each of the smaller
relations contains a subset of the attributes of the original relation. We refer to this process as
decomposition of the larger relation into the smaller relations.
Problems Related to Decomposition: Decomposing a relation schema can create more
problems than it solves. Two important questions must be asked repeatedly:
Do we need to decompose a relation?
What problems (if any) does a given decomposition cause?
To help with the first question, several normal forms have been proposed for relations. If a
relation schema is in one of these normal forms, we know that certain kinds of problems cannot
arise. Considering the normal form of a given relation schema can help us to decide whether or
not to decompose it further. If we decide that a relation schema must be decomposed further, we
must choose a particular decomposition.
With respect to the second question, two properties of decompositions are of particular interest.
The lossless-join property enables us to recover any instance of the decomposed relation from
corresponding instances of the smaller relations. The dependency preservation property enables
us to enforce any constraint on the original relation by simply enforcing some constraints on
each of the smaller relations. That is, we need not perform joins of the smaller relations to check
whether a constraint on the original relation is violated.
Functional Dependency Set: Functional Dependency set or FD set of a relation is the set of all
FDs present in the relation. For Example, FD set for relation STUDENT shown in table 1 is:
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY,
STUD_NO -> STUD_AGE, STUD_STATE->STUD_COUNTRY }
Attribute Closure: Attribute closure of an attribute set can be defined as set of attributes which
can be functionally determined from it.
How to find attribute closure of an attribute set?
To find attribute closure of an attribute set:
Add elements of attribute set to the result set.
Recursively add elements to the result set which can be functionally determined from the
elements of the result set.
Using FD set of table 1, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
How to find Candidate Keys and Super Keys using Attribute Closure?
If attribute closure of an attribute set contains all attributes of relation, the attribute set
will be super key of the relation.
If no subset of this attribute set can functionally determine all attributes of the relation,
the set will be candidate key as well. For Example, using FD set of table 1,
(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_NO, STUD_NAME) will be a super key but not a candidate key, because the closure of its
subset, (STUD_NO)+, already equals the set of all attributes of the relation. So, STUD_NO will be a candidate key.
3.10 Normalization:
In general, database normalization involves splitting tables whose columns hold different
types of data (and perhaps even unrelated data) into multiple tables, each with fewer columns that
describe the attributes of a single concept, physical object, or being.
The goal of normalization is to prevent the problems (called modification anomalies) that plague
a poorly designed relation (table).
Suppose, for example, that you have a table with resort guest ID numbers, activities the guests
have signed up to do, and the cost of each activity, all together in the following
GUEST-ACTIVITY-COST table:
Each row in the table represents a guest who has signed up for the named activity and paid the
specified cost. Assuming that the cost depends only on the activity (that is, a specific activity
costs the same for all guests), if you delete the row for GUEST-ID 2587, you lose not only the
fact that guest 2587 signed up for scuba diving, but also the fact that scuba diving costs $250.00
per outing. This is called a deletion anomaly: when you delete a row, you lose more information
than you intended to remove.
In the current example, a single deletion resulted in the loss of information on two entities: what
activity a guest signed up to do and how much a particular activity costs.
Now, suppose the resort adds a new activity such as horseback riding. You cannot enter the
activity name (horseback riding) or cost ($190.00) into the table until a guest decides to sign up
for it. The unnecessary restriction of having to wait until someone signs up for an activity before
you can record its name and cost is called an insertion anomaly.
In the current example, each insertion adds facts about two entities. Therefore, you cannot
INSERT a fact about one entity until you have an additional fact about the other entity.
Conversely, each deletion removes facts about two entities. Thus, you cannot DELETE the
information about one entity while leaving the information about the other in the table.
You can eliminate modification anomalies through normalization – that is, splitting the single
table with rows that have attributes about two entities into two tables, each of which has rows
with attributes that describe a single entity.
You will be able to remove the aromatherapy appointment for guest 1269 without losing the
fact that an aromatherapy session costs $75.00. Similarly, you can now add the fact that
horseback riding costs $ 190.00 per day to the ACTIVITY – COST table without having to wait
for a guest to sign up for the activity.
During the development of relational database systems in the 1970s, relational theorists kept
discovering new modification anomalies. Someone would find an anomaly, classify it, and then
figure out a way to prevent it by adding additional design criteria to the definition of a "well-formed"
relation. These design criteria are known as normal forms. Not surprisingly, E. F. Codd (of
the 12-rule database definition fame) defined the first, second, and third normal forms (1NF,
2NF, and 3NF).
After Codd postulated 3NF, relational theorists formulated Boyce-Codd normal form (BCNF)
and then fourth normal form (4NF) and fifth normal form (5NF).
First Normal Form:
Normalization is a process by which database designers attempt to eliminate modification
anomalies such as the:
Deletion anomaly:
The inability to remove a single fact from a table without removing other (unrelated) facts you
want to keep.
Insertion anomaly:
The inability to insert one fact without inserting another (and sometimes unrelated) fact.
Update anomaly:
Changing a fact in one column creates a false fact in another set of columns. Modification
anomalies are a result of functional dependencies among the columns in a row (or tuple, to use
the precise relational database term).
A functional dependency means that if you know the value in one column or set of columns, you
can always determine the value of another. To put the table in first normal form (1NF), you could
break up the student number list in the STUDENTS column of each row such that each row had
only one of the student IDs in the STUDENTS column. Doing so would change the table's
structure and rows. The value given by the combination (CLASS, SECTION, STUDENT) is
the composite key for the table, because it makes each row unique and all columns atomic. Now
that the table in the current example is in 1NF, each column has a single, scalar value.
Unfortunately, the table still exhibits modification anomalies:
Deletion anomaly:
If professor SMITH goes to another school and you remove his rows from the table, you also
lose the fact that STUDENTS 1005, 2110 and 3115 are enrolled in a history class.
Insertion anomaly:
If the school wants to add an English class (E100), it cannot do so until a student signs up for the
course (remember, no part of a primary key can have a NULL value).
Update anomaly:
If STUDENT 4587 decides to sign up for the SECTION 1, CS100 CLASS instead of his math
class, updating the CLASS and SECTION columns in the row for STUDENT 4587 to reflect the
change will cause the table to show TEACHER RAWLINS as being in both the MATH and the
COMP-SCI departments.
Thus, 'flattening' a table's columns to put it into first normal form (1NF) does not solve any of
the modification anomalies. All it does is guarantee that the table satisfies the requirements for a
table defined as "relational" and that there are no multi-valued dependencies between the
columns in each row.
When a table is in second normal form, it must be in first normal form (no multi-valued
dependencies) and have no partial key dependencies.
A partial key dependency is a situation in which the value in part of a key can be used to
determine the value of another attribute (column). Thus, a table is in 2NF when the value in all
nonkey columns depends on the entire key. Or, said another way, you cannot determine the value
of any of the columns by using only part of the key. With (CLASS, SECTION, STUDENT) as its
primary key, and given that the university has two rules about taking classes (no student can sign
up for more than one section of the same class, and a student can have only one major), the table,
while in 1NF, is not in 2NF.
Given the value of (STUDENT, CLASS), you can determine the value of SECTION, since
no student can sign up for two sections of the same class. Similarly, since students can sign up
for only one major, knowing STUDENT determines the value of MAJOR. In both instances, the
value of a third column can be deduced (or is determined) by the value in a portion of the key
(CLASS, SECTION, STUDENT) that makes each row unique.
To put the table in the current example in 2NF requires that it be split into three tables,
described by:
Courses (Class, Section, Teacher, Department)
PRIMARY KEY (Class, Section)
Enrollment (Student, Class, Section)
PRIMARY KEY (Student, class)
Students (student, major)
PRIMARY KEY (Student)
Unfortunately, putting a table in 2NF does not eliminate modification anomalies.
Suppose, for example, that professor Jones leaves the university. Removing his row from the
COURSES table would eliminate the entire ENGINEERING department, since he is currently
the only professor in the department.
Similarly, if the university wants to add a music department, it cannot do so until it hires a
professor to teach in the department.
Understanding Third Normal Form :
To be a third normal form (3NF) a table must satisfy the requirements for INF (no multi valued
dependencies) and 2NF ( all nonkey attributes must depend on the entire key). In addition, a
table in 3NF has no transitive dependencies between nonkey columns.
Given a table with columns (A, B, C), a transitive dependency is one in which A determines B, and
B determines C, and therefore A determines C; or, expressed using relational theory notation:
If A → B and B → C, then A → C.
When a table is in 3NF, the value in every nonkey column of the table can be determined by
using the entire key and only the entire key. Therefore, given a table in 3NF with columns
(A, B, C), if A is the PRIMARY KEY, you could not use the value of B (a nonkey column) to
determine the value of C (another nonkey column). As such, A determines B (A → B), and A
determines C (A → C). However, knowing the value of column B does not tell you the value in
column C; that is, it is not the case that B → C.
Suppose, for example, that you have a COURSES table with columns and PRIMARY KEY
described by
Courses (Class, section, teacher, department , department head)
PRIMARY KEY (Class, Section)
That contains the data:
Class    Section    Teacher    Department    Department Head
H100     1          Smith      History       Smith
H100     2          Riley      History       Smith
Given that a TEACHER can be assigned to only one DEPARTMENT and that a
DEPARTMENT can have only one department head, the table has multiple transitive
dependencies.
For example, the value of TEACHER is dependent on the PRIMARY KEY (CLASS,
SECTION), since a particular SECTION of a particular CLASS can have only one teacher; that is,
A → B. Moreover, since a TEACHER can be in only one DEPARTMENT, the value in
DEPARTMENT is dependent on the value in TEACHER; that is, B → C. However, since the
PRIMARY KEY (CLASS, SECTION) determines the value of TEACHER, it also determines
the value of DEPARTMENT; that is, A → C. Thus, the table exhibits the transitive dependency in
which A → B and B → C, and therefore A → C.
The problem with a transitive dependency is that it makes the table subject to the deletion
anomaly. When Smith retires and we remove his row from the table, we lose not only the fact
that Smith taught SECTION 1 of H100 but also the fact that SECTION 1 of H100 was a class
that belonged to the HISTORY department.
To put a table with transitive dependencies between nonkey columns into 3NF requires that the
table be split into multiple tables. To do so for the table in the current example, we would need to
split it into tables described by:
Courses (Class, Section, Teacher)
PRIMARY KEY (class, section)
Teachers (Teacher, department)
PRIMARY KEY (teacher)
Departments (Department, Department head)
PRIMARY KEY (department )
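A sketch of this 3NF decomposition as SQL table definitions; the column types and lengths are assumed for illustration:
CREATE TABLE Departments (
    department      VARCHAR(30) PRIMARY KEY,
    department_head VARCHAR(30)
);

CREATE TABLE Teachers (
    teacher    VARCHAR(30) PRIMARY KEY,
    department VARCHAR(30) REFERENCES Departments(department)
);

CREATE TABLE Courses (
    class   VARCHAR(10),
    section INT,
    teacher VARCHAR(30) REFERENCES Teachers(teacher),
    PRIMARY KEY (class, section)
);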
After Normalization
SID    Sname    CID    Cname    Fee
S1     A        C1     C        5k
S2     A        C1     C        5k
S1     A        C2     C        10k
S3     B        C2     C        10k
S3     B        C2     JAVA     15k
Primary Key (SID, CID)
Here all the data is stored in a single table, which causes redundancy of data (anomalies), as SID
and Sname are repeated for the same CID.
3.12 OTHER KINDS OF DEPENDENCIES:
Finish-to-Start Dependencies: The most common type of dependency is the finish-to-start
relationship (FS). This relationship means that the first task, the predecessor, must be finished
before the next task, the successor, can start. On the Gantt chart it is usually represented as
follows:
Start-to-Start Dependencies
The next type of dependency is the start-to-start relationship (SS). This relationship means that
the successor task cannot start until the predecessor task starts. On the Gantt chart, it is usually
represented as follows:
Finish-to-Finish Dependencies
The third type of dependency is the finish-to-finish relationship (FF). This relationship means
that the successor task cannot finish until the predecessor task finishes. On the Gantt chart, it is
usually represented as follows:
Start-to-Finish Dependencies
The start-to-finish relationship (SF) is the least common task relationship and means that the
successor cannot finish until the predecessor starts. On the Gantt chart, it is usually represented
as follows:
Of course tasks sometimes overlap – this is termed lead (or lead time). Tasks can also be delayed
(for example, to wait while concrete dries) which is called lag (or lag time).
UNIT-IV
Overview:
In this unit we introduce two topics. The first is concurrency control: stored data is accessed by
many users, and if two or more users try to access the same data at the same time, it may cause
data inconsistency; concurrency control methods were invented to solve this. The second is
recovery, which is used to preserve the data without loss in the event of power failure, software
failure, or hardware failure.
4.1 Transactions
Collections of operations that form a single logical unit of work are called Transactions. A
database system must ensure proper execution of transactions despite failures – either the entire
transaction executes, or none of it does.
4.2 Transaction Concept:
A transaction is a unit of program execution that accesses and possibly updates various data
items. Usually, a transaction is initiated by a user program written in a high level data
manipulation language or programming language ( for example SQL, COBOL, C, C++ or
JAVA), where it is delimited by statements ( or function calls) of the form Begin transaction and
end transaction. The transaction consists of all operations executed between the begin transaction
and end transaction.
To ensure integrity of the data, we require that the database system maintain the following
properties of the transaction.
Atomicity: Either all operations of the transaction are reflected properly in the database, or none
are.
Consistency: Execution of a transaction in isolation ( that is, with no other transaction executing
concurrently) preserves the consistency of the database.
Isolation: Even though multiple transactions may execute concurrently, the system guarantees
that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution
before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of
other transactions executing concurrently in the system.
Durability: After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
Transaction state:
In the absence of failures, all transactions complete successfully. However, a transaction may not
always complete its execution successfully; such a transaction is termed aborted. If we are to
ensure the atomicity property, an aborted transaction must have no effect on the state of the database.
Thus, any changes that the aborted transaction made to the database must be undone. Once the
changes caused by an aborted transaction have been undone, we say that the transaction has been
rolled back. It is part of the responsibility of the recovery scheme to manage transaction aborts.
Once a transaction has committed, we cannot undo its effects by aborting it. The only way to undo
the effects of a committed transaction is to execute a compensating transaction. For instance, if a
transaction added $20 to an account, the compensating transaction would subtract $20 from the
account. However, it is not always possible to create such a compensating transaction. Therefore,
the responsibility of writing and executing a compensating transaction is left to the user and is
not handled by the database system. A transaction must be in one of the following states:
Active:
The initial state; the transaction stays in this state while it is executing.
Partially committed:
After the final statement has been executed.
Failed:
After the discovery that normal execution can no longer proceed.
Aborted:
After the transaction has been rolled back and the database has been restored to its state prior to
the start of the transaction.
Committed:
After successful completion.
We say that a transaction has committed only if it has entered the committed state. Similarly, we
say that a transaction has aborted only if it has entered the aborted state. A transaction is said to
have terminated if it has either committed or aborted.
A transaction starts in the active state. When it finishes its final statement, it enters the
partially committed state. At this point, the transaction has completed its execution, but it is still
possible that it may have to be aborted, since the actual output may still be temporarily residing
in main memory, and thus a hardware failure may preclude its successful completion.
The database system then writes out enough information to disk that, even in the event of
a failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.
A transaction enters the failed state after the system determines that the transaction can no
longer proceed with its normal execution (for example, because of hardware or logical errors).
Such a transaction must be rolled back. Then, it enters the aborted state. At this point, the system
has two options.
It can restart the transaction, but only if the transaction was aborted as a result of some hardware
or software error that was not created through the internal logic of the transaction. A restarted
transaction is considered to be a new transaction.
It can kill the transaction. It usually does so because of some internal logical error that can be
corrected only by rewriting the application program, or because the input was bad, or because the
desired data were not found in the database.
We must be cautious when dealing with observable external writes, such as writes to a terminal
or printer. Once such a write has occurred, it cannot be erased, since it may have been seen
external to the database system. Most systems allow such writes to take place only after the
transaction has entered the committed state.
These properties are often called the ACID properties; the acronym is derived from the first letter
of each of the four properties.
Volatile Memory
These are the primary memory devices in the system and are placed along with the CPU. These
memories can store only a small amount of data, but they are very fast, e.g., main memory, cache
memory, etc. These memories cannot endure system crashes; data in them will be lost
on failure.
Non-Volatile memory
These are secondary memories and are huge in size, but slow in processing, e.g., flash memory,
hard disks, magnetic tapes, etc. These memories are designed to withstand system crashes.
Stable Memory
This is said to be a third form of memory structure, but it is essentially non-volatile memory in
which copies of the same data are stored at different places. This is because, in case of any crash
and data loss, the data can be recovered from the other copies. This even helps if one of the
non-volatile memories is lost due to fire or flood: the data can be recovered from another network
location. But there can be failures while taking the backup of the DB onto different stable storage
devices; the transfer may partially copy the data to the remote devices, or fail to store the data in
stable memory altogether. Hence extra caution has to be taken while copying data from one stable
memory to another. There are different methods of copying the data. One of them is to copy the
data in two phases: copy the data blocks to the first storage device, and if that succeeds, copy
them to the second storage device. The copy is complete only when the second copy finishes
successfully. But the second copy may fail partway through copying the blocks; in such a case,
each data block in the first and second copies would need to be compared for inconsistencies.
Verifying every block would be a very costly task, as there may be a huge number of data blocks.
A better way to identify the failed block is to identify the block that was in progress during the
failure, take only this block, compare the data, and correct the mismatches.
Failure Classification
When a transaction is being executed in the system, it may fail to execute due to various reasons.
The failure can be because of a system program, a bug in a program, a user action, or a system
crash. These failures can be broadly classified into three categories.
Transaction Failure: This type of failure affects only a few tables or processes. It is the
condition in which a transaction can no longer continue its execution. The failure can be caused
by the user or by the executing program/transaction. The user may cancel the transaction while it
is executing, by pressing a cancel button or aborting it using DB commands. The transaction may
also fail because of constraints on the tables (violation of constraints). It can even fail if there is
concurrent processing of multiple transactions and there is a lack of resources for all of them, or
a deadlock situation. All of these cause the transaction to stop processing in the middle of its
execution. When a transaction fails or stops in the middle, it may have partially changed the DB,
and it needs to be rolled back to the previous consistent state. In the ATM withdrawal example, if
the user cancels his transaction after step (i), the system should be able to stop further processing
of the transaction; if he cancels it after step (ii), the system should be strong enough to update his
balance in his account. The system may also cancel the transaction due to insufficient balance. In
short, the failure can be because of errors in the code (logical errors) or because of system errors
like deadlock or unavailability of system resources to execute the transactions.
System Crash: This can be because of hardware or software failure, or because of external
factors like power failure; that is, the failure of the system because of a bug in the software or the
failure of the system processor. This crash mainly affects the data in primary memory. If it
affects only the primary memory, the actual data will not really be affected, and recovery from
this failure is easy, because primary memories are temporary storage that would not yet have
updated the actual database; the system therefore remains in the consistent state it was in before
the transaction. But when secondary memory crashes, there may be a loss of data, and serious
actions are needed to recover the lost data, because secondary memory contains the actual DB
data. Recovering it from a crash is a little tedious and requires more effort. The DB recovery
system provides strong mechanisms to recover the system from a crash and maintain the
atomicity of the transactions. In most cases, data in secondary memory is not affected by this
kind of crash, because the database has many integrity checkpoints to prevent data loss from
secondary memory.
Disk Failure: These are issues with hard disks, like the formation of bad sectors, a disk head crash,
unavailability of the disk, etc. Data can even be lost because of fire, flood, theft, etc. This mainly
affects the secondary memory, where the actual data lies. In these cases, we need alternative
ways of storing the DB: we can create backups of the DB on a regular basis and store them
separately from the memory where the DB is stored, or maintain multiple copies of the DB at
different network locations to recover from failure.
To gain a better understanding of ACID properties and the need for them, consider a simplified
banking system consisting of several accounts and a set of transactions that access and update
those accounts.
Read (X) which transfers the data item X from the database to a local buffer belonging to the
transaction that executed the read operation
Write (X), which transfers the data item X from the local buffer of the transaction that executed
the write back to the database.
In a real database system, the write operation does not necessarily result in the immediate update
of the data on the disk; the write operation may be temporarily stored in memory and executed
on the disk later.
For now, however, we shall assume that the write operation updates the database immediately.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be
defined as
Ti: read(A);
    A := A - 50;
    write(A);
    read(B);
    B := B + 50;
    write(B).
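For illustration only, the same transfer written as an SQL transaction, assuming an account(acct_no, balance) table; the statement that starts a transaction varies across systems:
START TRANSACTION;  -- some systems use BEGIN or BEGIN TRANSACTION
UPDATE account SET balance = balance - 50 WHERE acct_no = 'A';
UPDATE account SET balance = balance + 50 WHERE acct_no = 'B';
COMMIT;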
Consistency:
Execution of a transaction in isolation ( that is, with no other transaction executing concurrently)
preserves the consistency of the database.
The consistency requirement here is that the sum of A and B be unchanged by the execution of
the transaction. Without the consistency requirement, money could be created or destroyed by
the transaction. It can be verified easily that, if the database is consistent before an execution of
the transaction, the database remains consistent after the execution of the transaction.
Ensuring consistency for an individual transaction is the responsibility of the application
programmer who codes the transaction. This task may be facilitated by automatic testing of
integrity constraints.
Atomicity:
Suppose that, just before the execution of transaction Ti, the values of accounts A and B are
$1000 and $2000, respectively.
Now suppose that, during the execution of transaction Ti, a failure occurs that prevents Ti from
completing its execution successfully.
Examples of such failures include power failures, hardware failures, and software errors
Further, suppose that the failure happened after the write (A) operation but before the write (B)
operation. In this case, the values of amounts A and B reflected in the database are $950 and
$2000. The system destroyed $50 as a result of this failure.
In particular, we note that the sum A + B is no longer preserved. Thus, because of the failure, the
state of the system no longer reflects a real state of the world that the database is supposed to
capture. We term such a state an inconsistent state. We must ensure that such inconsistencies are
not visible in a database system.
Note, however, that the system must at some point be in an inconsistent state. Even if transaction
Ti is executed to completion, there exists a point at which the value of account A is $950 and
the value of account B is $2000, which is clearly an inconsistent state.
This state, however is eventually replaced by the consistent state where the value of account A is
$ 950, and the value of account B is $ 2050.
Thus, if the transaction never started or was guaranteed to complete, such an inconsistent state
would not be visible except during the execution of the transaction.
If the atomicity property is present, all actions of the transaction are reflected in the database or
none are.
Ensuring atomicity is the responsibility of the database system itself; specifically, it is handled by
a component called the transaction management component.
4.6 Serializability:
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with those
of some other transaction.
Serial Schedule − It is a schedule in which transactions are aligned in such a way that
one transaction is executed first. When the first transaction completes its cycle, then the
next transaction is executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in a serial manner.
To resolve this problem, we allow parallel execution of a transaction schedule, if its transactions
are either serializable or have some equivalence relation among them.
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they are said to be result equivalent.
They may yield the same result for some value and different results for another set of values.
That's why this equivalence is not generally considered significant.
View Equivalence
Two schedules are said to be view equivalent if the transactions in both the schedules perform
similar actions in a similar manner.
For example −
If T reads the initial data in S1, then it also reads the initial data in S2.
If T reads the value written by J in S1, then it also reads the value written by J in S2.
If T performs the final write on the data value in S1, then it also performs the final write
on the data value in S2.
Conflict Equivalence
Two operations are said to be conflicting if they belong to different transactions, access the same
data item, and at least one of them is a write operation. Two schedules are conflict equivalent if
their conflicting operations appear in the same order in both schedules.
Concurrency Control:
4.7. Lock-Based protocols:
A DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and
that no actions of committed transactions are lost while undoing aborted transactions. A
DBMS typically uses a locking protocol to achieve this. A locking protocol is a set of rules to
be followed by each transaction, in order to ensure that even though actions of several
transactions might be interleaved, the net effect is identical to executing all transactions in
some serial order.
Strict Two-Phase Locking (Strict 2PL):
The most widely used locking protocol, called Strict Two-Phase Locking, or Strict 2PL,
has two rules. The first rule is:
1. If a transaction T wants to read (respectively, modify) an object, it first requests a shared
(respectively, exclusive) lock on the object.
Of course, a transaction that has an exclusive lock can also read the object; an additional shared
lock is not required. A transaction that requests a lock is suspended until the DBMS is able to
grant it the requested lock. The DBMS keeps track of the locks it has granted and ensures that if
a transaction holds an exclusive lock on an object, no other transaction holds a shared or exclusive
lock on the same object.
2. All locks held by a transaction are released when the transaction is completed.
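A small illustration of the effect of Strict 2PL in a typical SQL system; the table, rows, and exact locking behavior are assumptions, since details vary by DBMS:
-- Session 1
START TRANSACTION;
UPDATE Sailors SET rating = 9 WHERE sid = 22;   -- acquires an exclusive lock on this row
-- Session 2 (running concurrently)
START TRANSACTION;
UPDATE Sailors SET rating = 7 WHERE sid = 22;   -- blocks, waiting for the conflicting lock
-- Session 1
COMMIT;                                         -- locks released only now; Session 2 can proceed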
If a transaction accesses several pages of a file, it should lock the entire file, and if it accesses
just a few pages, it should lock just those pages. Similarly, if a transaction accesses several
records on a page, it should lock the entire page, and if it accesses just a few records, it should
lock just those records.
The question to be addressed is how a lock manager can efficiently ensure that a page,
for example, is not locked by a transaction while another transaction holds a conflicting lock on
the file containing the page.
The recovery manager of a DBMS is responsible for ensuring two important properties of
transactions: atomicity and durability. It ensures atomicity by undoing the actions of transactions
that do not commit and durability by making sure that all actions of committed transactions
survive system crashes, (e.g., a core dump caused by a bus error) and media failures (e.g., a
disk is corrupted).
The Log: The log, sometimes called the trail or journal, is a history of actions executed by the
DBMS. Physically, the log is a file of records stored in stable storage, which is assumed to
survive crashes; this durability can be achieved by maintaining two or more copies of the log on
different disks, so that the chance of all copies of the log being simultaneously lost is negligibly
small.
The most recent portion of the log, called the log tail, is kept in main memory and is
periodically forced to stable storage. This way, log records and data records are written to disk at
the same granularity.
Every log record is given a unique id called the log sequence number (LSN). As with
any record id, we can fetch a log record with one disk access given the LSN. Further, LSNs
should be assigned in monotonically increasing order; this property is required for the ARIES
recovery algorithm. If the log is a sequential file, in principle growing indefinitely, the LSN can
simply be the address of the first byte of the log record.
The transaction is considered to have committed at the instant that its commit log record is
written to stable storage
Abort: When a transaction is aborted, an abort type log record containing the transaction id is
appended to the log, and Undo is initiated for this transaction.
End: As noted above, when a transaction is aborted or committed, some additional actions must
be taken beyond writing the abort or commit log record. After all these additional steps are
completed, an end type log record containing the transaction id is appended to the log.
Undoing an update: When a transaction is rolled back (because the transaction is aborted, or
during recovery from a crash), its updates are undone. When the action described by an update
log record is undone, a compensation log record, or CLR, is written.
Dirty page table: This table contains one entry for each dirty page in the buffer pool, that is,
each page with changes that are not yet reflected on disk. The entry contains a field recLSN,
which is the LSN of the first log record that caused the page to become dirty. Note that this LSN
identifies the earliest log record that might have to be redone for this page during restart from a
crash.
Checkpoint
A checkpoint is like a snapshot of the DBMS state, and by taking checkpoints periodically, as
we will see, the DBMS can reduce the amount of work to be done during restart in the event of a
subsequent crash.
We assume that each transaction Ti executes in two or three different phases in its lifetime,
depending on whether it is a read-only or an update transaction. The phases are, in order,
1. Read phase. During this phase, the system executes transaction Ti. It reads the values of the
various data items and stores them in variables local to Ti. It performs all write operations on
temporary local variables, without updates of the actual database.
2. Validation phase. Transaction Ti performs a validation test to determine whether it can copy
to the database the temporary local variables that hold the results of write operations without
causing a violation of serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the system applies the
actual updates to the database. Otherwise, the system rolls back Ti.
Each transaction must go through the three phases in the order shown. However, all three phases
of concurrently executing transactions can be interleaved.
To perform the validation test, we need to know when the various phases of transaction Ti took
place. We shall, therefore, associate three different timestamps with transaction Ti:
1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti ), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.
We determine the serializability order by the timestamp-ordering technique, using the value of
the timestamp Validation(Ti). Thus, the value TS(Ti) = Validation(Ti) and, if TS(Tj ) < TS(Tk ),
then any produced schedule must be equivalent to a serial schedule in which
transaction Tj appears before transaction Tk . The reason we have chosen Validation(Ti), rather
than Start(Ti), as the timestamp of transaction Ti is that we can expect faster response time
provided that conflict rates among transactions are indeed low.
The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj ),
one of the following two conditions must hold:
1. Finish(Ti) < Start(Tj ). Since Ti completes its execution before Tj started, the serializability
order is indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj ,
and Ti completes its write phase before Tj starts its validation phase
(Start(Tj ) < Finish(Ti) < Validation(Tj )). This condition ensures that
the writes of Ti and Tj do not overlap. Since the writes of Ti do not affect the read of Tj , and
since Tj cannot affect the read of Ti, the serializability order is indeed maintained.
As an illustration, consider again transactions T14 and T15. Suppose that TS(T14) < TS(T15).
Then, the validation phase succeeds in the schedule 5 in Figure 16.15. Note that the writes to the
actual variables are performed only after the validation phase of T15. Thus, T14 reads the old
values of B and A, and this schedule is serializable.
The validation scheme automatically guards against cascading rollbacks, since the actual writes
take place only after the transaction issuing the write has committed.
However, there is a possibility of starvation of long transactions, due to a sequence of conflicting
short transactions that cause repeated restarts of the long transaction.
To avoid starvation, conflicting transactions must be temporarily blocked, to enable the long
transaction to finish.
This validation scheme is called the optimistic concurrency control scheme since transactions
execute optimistically, assuming they will be able to finish execution and validate at the end. In
contrast, locking and timestamp ordering are pessimistic in that they force a wait or a rollback
whenever a conflict is detected, even though there is a chance that the schedule may be conflict
serializable.
Recovery System:
Crash Recovery:
A DBMS is a highly complex system with hundreds of transactions being executed every second.
The durability and robustness of a DBMS depends on its complex architecture and its underlying
hardware and system software. If it fails or crashes amid transactions, it is expected that the
system would follow some sort of algorithm or techniques to recover lost data.
Non-volatile storage − These memories are made to survive system crashes. They are
huge in data storage capacity, but slower in accessibility. Examples may include hard-
disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
When a system crashes, it may have several transactions being executed and various files opened
for them to modify the data items. Transactions are made of various operations, which are atomic
in nature. But according to ACID properties of DBMS, atomicity of transactions as a whole must
be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions, which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be rolled
back.
No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining
the atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
Maintaining shadow paging, where the changes are done on a volatile memory, and later,
the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a transaction.
It is important that the logs are written prior to the actual modification and stored on a stable
storage media, which is failsafe.
Log-based recovery works as follows −
The log file is kept on a stable storage media.
When a transaction enters the system and starts execution, it writes a log record about it:
<Tn, Start>
When the transaction modifies an item X, it writes a log record of the form <Tn, X, V1, V2>,
where V1 is the value of X before the write and V2 is the value after the write.
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log record, it
puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the
transactions in the redo-list are redone, and their logs are saved again.
4.13 Recovery Algorithm:
Introduction to ARIES
ARIES is a recovery algorithm that is designed to work with a steal, no-force approach. When
the recovery manager is invoked after a crash, restart proceeds in three phases:
1. Analysis: Identifies dirty pages in the buffer pool and active transactions at the time of the
crash.
2. Redo: Repeats all actions, starting from an appropriate point in the log, and restores the
database state to what it was at the time of the crash.
3. Undo: Undoes the actions of transactions that did not commit, so that the database reflects only
the actions of committed transactions.
There are three main principles behind the ARIES recovery algorithm:
Write-ahead logging: Any change to a database object is first recorded in the log; the record in
the log must be written to stable storage before the change to the database object is written to
disk.
Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of
the DBMS before the crash and brings the system back to the exact state that it was in at the time
of the crash. Then, it undoes the actions of transactions that were still active at the time of the
crash.
Logging changes during Undo: Changes made to the database while undoing a transaction are
logged in order to ensure that such an action is not repeated in the event of repeated restarts.
Buffer Management:
A DBMS must manage a huge amount of data, and in the course of processing, the space required
for the blocks of data will often be greater than the memory space available. For this reason there
is a need to manage a memory area in which to load and unload the blocks. The buffer manager is
primarily responsible for managing the operations involved in saving and loading blocks. The
operations that the buffer manager provides are these:
FIX: This command tells the buffer manager to load a block from disk and return a pointer to the
memory where it is loaded. If the block was already in memory, the buffer manager needs only to
return the pointer; otherwise it must load the block from disk and bring it into memory. If the
buffer memory is full, two situations are possible:
There is the possibility of releasing a portion of memory that is occupied by transactions already
completed. In this case, before freeing the area, its contents are written to disk if any block in the
area has been changed.
There is the possibility that the remaining memory is occupied by transactions still ongoing. In
this case, the buffer manager can work in two ways: in the first mode (STEAL), it frees buffer
memory occupied by a transaction that is still active, possibly saving its changes to disk; in the
second mode (NO STEAL), the transaction requesting the block is made to wait until memory
becomes free.
SET DIRTY: Invoking this command marks a block of memory as modified.
Before introducing the last two commands, note that the DBMS can operate in two modes:
FORCE and NO FORCE. When working in FORCE mode, the write to disk is synchronous with
the commit of a transaction. When working in NO FORCE mode, the write is carried out from
time to time in an asynchronous manner. Typically, commercial databases operate in NO FORCE
mode because this allows an increase in performance: a block may undergo multiple changes in
memory before being saved, and the saves can be made when the system load is low.
FORCE: This command causes the buffer manager to perform the write synchronously with the
completion (commit) of the transaction.
FLUSH: This command causes the buffer manager to perform the save when running in NO
FORCE mode.
The system uses the most recent dump in restoring the database to a previous consistent state.
Once this restoration has been accomplished, the system uses the log to bring the database system
to the most recent consistent state.
More precisely, no transaction may be active during the dump procedure, and a procedure similar
to checkpointing must take place:
1. Output all log records currently residing in main memory onto stable storage.
2. Output all buffer blocks onto the disk.
3. Copy the contents of the database to stable storage.
4. Output a log record <dump> onto the stable storage.
Steps 1, 2, and 4 correspond to the three steps used for checkpoints in Section 17.4.3.
To recover from the loss of nonvolatile storage, the system restores the database to disk by using
the most recent dump. Then, it consults the log and redoes all the transactions that have
committed since the most recent dump occurred. Notice that no undo operations need to be
executed.
A dump of the database contents is also referred to as an archival dump, since we can archive
the dumps and use them later to examine old states of the database.
Dumps of a database and checkpointing of buffers are similar.
The simple dump procedure described here is costly for the following two reasons.
First, the entire database must be copied to stable storage, resulting in considerable data
transfer. Second, since transaction processing is halted during the dump procedure, CPU cycles
are wasted. Fuzzy dump schemes have been developed, which allow transactions to be active
while the dump is in progress. They are similar to fuzzy checkpointing schemes; see the
bibliographical notes for more details.
A remote, online, or managed backup service, sometimes marketed as cloud backup or backup-
as-a-service, is a service that provides users with a system for the backup, storage, and recovery
of computer files. Online backup providers are companies that provide this type of service to end
users (or clients). Such backup services are considered a form of cloud computing.
Online backup systems are typically built around a client software program that runs on a
schedule. Some systems run once a day, usually at night while computers aren't in use. Other
newer cloud backup services run continuously to capture changes to user systems nearly in real-
time. The online backup system typically collects, compresses, encrypts, and transfers the data to
the remote backup service provider's servers or off-site hardware.
There are many products on the market – all offering different feature sets, service levels, and
types of encryption. Providers of this type of service frequently target specific market segments.
High-end LAN-based backup systems may offer services such as Active Directory, client remote
control, or open file backups. Consumer online backup companies frequently have beta software
offerings and/or free-trial backup services with fewer live support options.
UNIT-V
The lowest layer of the software deals with management of space on disk, where the
data is to be stored. Higher layers allocate, deallocate, read, and write pages through
(routines provided by) this layer, called the disk space manager.
On top of the disk space manager, we have the buffer manager, which partitions the
available main memory into a collection of pages, called frames. The purpose of the buffer
manager is to bring pages in from disk to main memory as needed in response to read
requests from transactions.
The next layer includes a variety of software for supporting the concept of a file, which,
in a DBMS, is a collection of pages or a collection of records. This layer typically
supports a heap file, or file of unordered pages, as well as indexes. In addition to
keeping track of the pages in a file, this layer organizes the information within a page.
The code that implements relational operators sits on top of the file and access methods
layer. These operators serve as the building blocks for evaluating queries posed against
the data.
When a user issues a query, the query is presented to a query optimizer, which uses
information about how the data is stored to produce an efficient execution plan for
evaluating the query. An execution plan is usually represented as a tree of relational
operators (with annotations that contain additional detailed information about which
access methods to use).
Data in a DBMS is stored on storage devices such as disks and tapes; the disk space
manager is responsible for keeping track of available disk space. The file manager, which
provides the abstraction of a file of records to higher levels of DBMS code, issues requests
to the disk space manager to obtain and relinquish space on disk.
When a record is needed for processing, it must be fetched from disk to main memory.
The page on which the record resides is determined by the file manager.
Sometimes, the file manager uses auxiliary data structures to quickly identify the page
that contains a desired record. After identifying the required page, the file manager
issues a request for the page to a layer of DBMS code called the buffer manager.
Primary Storage − The memory storage that is directly accessible to the CPU
comes under this category. CPU's internal memory (registers), fast memory
(cache), and main memory (RAM) are directly accessible to the CPU, as they are
all placed on the motherboard or CPU chipset. This storage is typically very
small, ultra-fast, and volatile. Primary storage requires continuous power supply
in order to maintain its state. In case of a power failure, all its data is lost.
Secondary Storage − Secondary storage devices are used to store data for future
use or as backup. Secondary storage includes memory devices that are not a part
of the CPU chipset or motherboard, for example, magnetic disks, optical disks
(DVD, CD, etc.), hard disks, flash drives, and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since
such storage devices are external to the computer system, they are the slowest in
speed. These storage devices are mostly used to back up an entire
system. Optical disks and magnetic tapes are widely used as tertiary storage.
Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to
its main memory as well as its inbuilt registers. Main memory access is obviously slower
than the CPU's own speed. To minimize this speed mismatch, cache memory is
introduced. Cache memory provides the fastest access time, and it contains data that is
most frequently accessed by the CPU.
The memory with the fastest access is the costliest one. Larger storage devices offer
slow speed and they are less expensive, however they can store huge volumes of data as
compared to CPU registers or cache memory.
Magnetic Disks
Hard disk drives are the most common secondary storage devices in present computer
systems. These are called magnetic disks because they use the concept of magnetization
to store information. Hard disks consist of metal platters coated with magnetizable material, mounted on a spindle. A read/write head moves between the platters and is used to magnetize or de-magnetize the spot under it. A magnetized spot is interpreted as 0 (zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk platter has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
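As a quick illustration of this geometry (all numbers below are hypothetical, not taken from any particular drive), the raw capacity of a disk can be estimated by multiplying the number of recording surfaces, tracks per surface, sectors per track, and bytes per sector:

# Hypothetical disk geometry -- illustrative values only.
SECTOR_BYTES = 512            # typical sector size mentioned above
sectors_per_track = 400
tracks_per_surface = 10_000
recording_surfaces = 8        # e.g. 4 platters, both sides used

capacity_bytes = (recording_surfaces * tracks_per_surface
                  * sectors_per_track * SECTOR_BYTES)
print(f"Raw capacity: {capacity_bytes / 2**30:.1f} GiB")   # about 15.3 GiB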
Redundant Array of Independent Disks
RAID or Redundant Array of Independent Disks, is a technology to connect multiple
secondary storage devices and use them as a single storage media.
RAID consists of an array of disks in which multiple disks are connected together to
achieve different goals. RAID levels define the use of disk arrays.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into
blocks and the blocks are distributed among disks. Each disk receives a block of data to
write/read in parallel. It enhances the speed and performance of the storage device.
There is no parity and backup in Level 0.
RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a
copy of data to all the disks in the array. RAID level 1 is also called mirroring and
provides 100% redundancy in case of a failure.
RAID 2
RAID 2 records an Error Correction Code (ECC), based on Hamming codes, for its data, which is striped across different disks. Like level 0, each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Due to its complex structure and high cost, RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on a dedicated parity disk. This technique makes it possible to recover from single-disk failures.
RAID 4
In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping,
whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three
disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each block stripe are distributed among all the disks rather than being stored on a dedicated parity disk.
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional
fault tolerance. This level requires at least four disk drives to implement RAID.
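The parity used by RAID levels 3 to 6 is essentially a bitwise XOR over the data blocks of a stripe: if any single disk fails, XOR-ing the parity with the surviving blocks regenerates the lost block. A minimal sketch of the idea (the block contents and sizes are made up for illustration):

from functools import reduce

def parity(blocks):
    """Bitwise XOR of equal-sized blocks (one block per disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three data disks (hypothetical 4-byte blocks).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
p = parity(stripe)

# Simulate losing disk 1: XOR of the parity block and the surviving blocks
# reconstructs the missing block.
recovered = parity([p, stripe[0], stripe[2]])
assert recovered == stripe[1]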
File Organization
File Organization defines how file records are mapped onto disk blocks. Common ways of organizing file records include heap (unordered), sequential (sorted), hash, and clustered file organization. The operations performed on file records fall into two broad categories:
Update Operations
Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval operations, on the other hand, do not alter the data but retrieve them after optional conditional filtering. In both types of operations, selection plays a significant role. Other than creation and deletion of a file, there are several other operations that can be performed on files (a small sketch using an ordinary file API follows this list):
Open − A file can be opened in one of the two modes, read mode or write
mode. In read mode, the operating system does not allow anyone to alter data. In
other words, data is read only. Files opened in read mode can be shared among
several entities. Write mode allows data modification. Files opened in write
mode can be read but cannot be shared.
Locate − Every file has a file pointer, which tells the current position where the
data is to be read or written. This pointer can be adjusted accordingly. Using find
(seek) operation, it can be moved forward or backward.
Read − By default, when files are opened in read mode, the file pointer points to
the beginning of the file. There are options where the user can tell the operating
system where to locate the file pointer at the time of opening a file. The very
next data to the file pointer is read.
Write − The user can choose to open a file in write mode, which enables them to edit its contents; the edit can be a deletion, insertion, or modification. The file pointer can be positioned at the time of opening or can be changed dynamically if the operating system allows it.
Close − This is the most important operation from the operating system’s point of view. When a request to close a file is generated, the operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handles associated with the file.
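A minimal sketch of these operations using an ordinary operating-system file API (Python's built-in file calls; the file name and record contents are hypothetical):

# Write mode: create the file and append fixed-length "records".
with open("students.dat", "wb") as f:                 # open for writing
    for rec in [b"0001,Asha  ", b"0002,Ravi  ", b"0003,Meena "]:
        f.write(rec + b"\n")                          # 11 bytes + newline = 12 bytes per record

# Read mode: the file pointer starts at the beginning of the file.
with open("students.dat", "rb") as f:                 # read-only; can be shared
    print(f.read(12))                                 # read the first record
    f.seek(2 * 12)                                    # locate (seek) to the third record
    print(f.read(12))                                 # read it
# leaving each 'with' block closes the file and releases its handle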
The organization of data inside a file plays a major role here. The process of locating the file pointer at a desired record inside a file varies based on whether the records are arranged sequentially or clustered. We know that data is stored in the form of records, and every record has a key field, which helps it to be identified uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in
database systems is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following
types −
Primary Index − Primary index is defined on an ordered data file. The data file
is ordered on a key field. The key field is generally the primary key of the
relation.
Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with
duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data
file is ordered on a non-key field.
Ordered Indexing is of two types −
Dense Index
Sparse Index
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store the index records themselves. Index records contain the search key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first follow the index to reach the approximate location of the data. If the record we are looking for is not where the index entry points, the system performs a sequential search from that point until the desired data is found.
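A minimal sketch of a sparse index over a sorted file (the page layout, keys, and record values are invented for illustration): the index holds one entry per page, a binary search on the index picks the page, and a short sequential search inside the page finds the record.

from bisect import bisect_right

# Data file as a list of sorted "pages"; each page holds a few (key, record) pairs.
pages = [
    [(5, "rec5"), (10, "rec10"), (15, "rec15")],
    [(20, "rec20"), (25, "rec25"), (30, "rec30")],
    [(35, "rec35"), (40, "rec40")],
]

# Sparse index: one (first key on page, page number) entry per page --
# far fewer entries than a dense index, which would index every key.
sparse_index = [(page[0][0], i) for i, page in enumerate(pages)]

def lookup(key):
    """Follow the sparse index to the right page, then scan that page."""
    first_keys = [k for k, _ in sparse_index]
    page_no = bisect_right(first_keys, key) - 1    # last index entry with key <= search key
    if page_no < 0:
        return None                                # key smaller than everything in the file
    for k, rec in pages[page_no]:                  # sequential search inside the page
        if k == key:
            return rec
    return None

print(lookup(25))   # -> rec25
print(lookup(26))   # -> None (no such record)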
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored
on the disk along with the actual database files. As the size of the database grows, so
does the size of the indices. It is highly desirable to keep the index records in main memory so as to speed up search operations. If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses. A multi-level index breaks the index down into several smaller indices, making the outermost level so small that it can be stored in a single disk block, which can easily be accommodated anywhere in main memory.
The costs of some simple operations for three basic file organizations are described next. In the cost formulas, B denotes the number of data pages, R the number of records per page, D the average time to read or write a disk page, and C the average time to process a record.
Scan:
Fetch all records in the file. The pages in the file must be fetched from disk into the buffer pool. There is also a CPU overhead per record for locating the record on the page (in the buffer pool).
Search with Equality Selection:
Fetch all records that satisfy an equality selection, for example, "find the Students record for the student with sid 23". Pages that contain qualifying records must be fetched from disk, and qualifying records must be located within the retrieved pages.
Search with Range Selection:
Fetch all records that satisfy a range selection, for example, "find all Students records with name alphabetically after 'Smith'".
Insert:
Insert a given record into the file. We must identify the page in the file into which the new record must be inserted, fetch that page from disk, modify it to include the new record, and then write back the modified page. Depending on the file organization, we may have to fetch, modify, and write back other pages as well.
Delete:
Delete a record that is specified using its record identifier (rid). We must identify the page that contains the record, fetch it from disk, modify it, and write it back. Depending on the file organization, we may have to fetch, modify, and write back other pages as well.
Heap files:
Scan:
The cost is B(D+RC) because we must retrieve each of the B pages, taking time D per page, and for each page, process R records, taking time C per record.
Search with Equality Selection:
For each retrieved data page, we must check all records on the page to see whether any of them is the desired record; on average, half the file is scanned, so the cost is 0.5B(D+RC). If no record satisfies the selection, we must scan the entire file to verify this.
Delete:
First find the record, remove it from the page, and write the modified page back. For simplicity, we assume that no attempt is made to compact the file to reclaim the free space created by deletions. The cost is the cost of searching plus C+D. If the record to be deleted is specified using its record id (rid), the page id can easily be obtained from the rid, so we can read in the page directly; the cost of searching is then just D.
Sorted files:
Files sorted on a sequence of fields are known as sorted files.
The various operations on sorted files are:
Scan:
The cost is B(D+RC) because all pages must be examined; the order in which records are retrieved corresponds to the sort order.
Search with Equality Selection:
Here we assume that the equality selection is specified on the field by which the file is sorted; if not, the cost is identical to that for a heap file. We can locate the first page containing the desired record or records, should any qualifying records exist, with a binary search in log2 B steps. Each step requires a disk I/O and two comparisons. Once the page is known, the first qualifying record can again be located by a binary search within the page at a cost of Clog2 R. The total cost is Dlog2 B + Clog2 R, a significant improvement over searching heap files.
Insert:
To insert a record while preserving the sort order, we must first find the correct position in the file, add the record, and then fetch and rewrite all subsequent pages. On average, we can assume that the inserted record belongs in the middle of the file; thus we read the latter half of the file and then write it back after adding the new record. The cost is therefore the cost of searching to find the position of the new record plus 2 * (0.5B(D+RC)), that is, search cost plus B(D+RC).
Delete:
First search for the record, remove the record from the page, and write the modified page back. We must also read and write all subsequent pages, because all records that follow the deleted record must be moved up to compact the free space. The cost is the search cost plus B(D+RC). Given the record identifier (rid) of the record to delete, we can fetch the page containing the record directly.
Hashed files:
A hashed file has an associated search key, which is a combination of one or more fields of the file. It enables us to locate records with a given search key value quickly; for example, to "find the Students record for Joe", if the file is hashed on the name field, we can retrieve the record quickly.
This organization is called a static hashed file; its main drawback is that long chains of overflow pages can develop, which can affect performance because all pages in a bucket have to be searched.
The various operations on hashed files are:
Scan:
In a hashed file, pages are kept at about 80% occupancy (in order to leave some space for future insertions and to minimize overflow pages as the file expands). This is achieved by adding a new page to a bucket when each existing page is 80% full, when records are initially organized into a hashed file structure. Thus the number of pages, and therefore the cost of scanning all the data pages, is about 1.25 times the cost of scanning an unordered file, that is, 1.25B(D+RC).
Search with Equality Selection:
The hash function associated with a hashed file maps a record to a bucket based on the values in all the search key fields; if the value for any one of these fields is not specified, we cannot tell which bucket the record belongs to. Thus, if the selection is not an equality condition on all the search key fields, we have to scan the entire file. If the selection does specify all the search key fields, we can hash directly to the correct bucket and fetch its page(s).
Search with Range Selection:
The hash structure offers no help at all; even if the range selection is on the search key, the entire file must be scanned. The cost is 1.25B(D+RC).
Insert:
The appropriate page must be located, modified, and then written back. The cost is thus the cost of search plus C+D.
Delete:
We must search for the record, remove it from the page, and write the modified page back. The cost is again the cost of search plus C+D (for writing the modified page).
The following summary compares the costs of these operations for the three file organizations ("search" denotes the cost of the corresponding equality search):
File Type: Scan / Equality Search / Range Search / Insert / Delete
Heap: B(D+RC) / 0.5B(D+RC) / B(D+RC) / 2D+C / search + C + D
Sorted: B(D+RC) / Dlog2 B + Clog2 R / Dlog2 B + Clog2 R + matching pages / search + B(D+RC) / search + B(D+RC)
Hashed: 1.25B(D+RC) / ~ D + 0.5RC / 1.25B(D+RC) / search + C + D / search + C + D
A heap file has good storage efficiency and supports fast scan, insertion, and deletion of records. However, it is slow for searches.
A sorted file also offers good storage efficiency, but insertion and deletion of records are slow. It is quick for searches, and in particular, it is the best structure for range selections.
A hashed file does not utilize space quite as well as a sorted file, but insertions and deletions are fast, and equality selections are very fast. However, the structure offers no support for range selections, and full file scans are a little slower; the lower space utilization means that files contain more pages.
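To make the cost formulas above concrete, the short sketch below plugs hypothetical values of B, R, D and C into the expressions stated in this section (the parameter values are invented for illustration, not measurements):

from math import log2

# Hypothetical parameters: B data pages, R records per page,
# D seconds per page I/O, C seconds of CPU time per record.
B, R, D, C = 100_000, 100, 0.015, 1e-7

costs = {
    "heap":   {"scan": B * (D + R * C),
               "equality search": 0.5 * B * (D + R * C)},
    "sorted": {"scan": B * (D + R * C),
               "equality search": D * log2(B) + C * log2(R)},
    "hashed": {"scan": 1.25 * B * (D + R * C),
               "range search": 1.25 * B * (D + R * C)},
}

for organization, operations in costs.items():
    for operation, seconds in operations.items():
        print(f"{organization:6s} {operation:16s} {seconds:12.4f} s")

With these numbers the sorted file answers an equality search in a fraction of a second, while the heap file needs hundreds of seconds, which is exactly the improvement argued for above.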
The potentially large size of the index file motivates the ISAM idea: build an auxiliary index on the index file, and so on recursively, until the final auxiliary index fits on one page. This repeated construction of a one-level index leads to a tree structure in which the data entries of the ISAM index are in the leaf pages of the tree, together with additional overflow pages that are chained to some leaf page. In addition, some systems carefully organize the layout of pages so that page boundaries correspond closely to the physical characteristics of the underlying storage device. The ISAM structure is completely static and facilitates such low-level optimizations.
5.6 B+ Tree:
A B+ tree is a balanced search tree (each node may have many children, so it is not a binary tree) that follows a multi-level index format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is therefore balanced. Additionally, the leaf nodes are connected by a linked list; therefore, a B+ tree can support sequential access as well as random access.
Structure of B+ Tree
Every leaf node is at the same distance from the root node. A B+ tree is of order n, where n is fixed for a given B+ tree.
Internal nodes −
Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
At most, an internal node can contain n pointers.
Leaf nodes −
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, a leaf node can contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node and forms
a linked list.
B+ Tree Insertion
B+ trees are filled from bottom and each entry is done at the leaf node.
If a leaf node overflows −
o Split node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o First i entries are stored in one node.
o Rest of the entries (i+1 onwards) are moved to a new node.
o ith key is duplicated at the parent of the leaf.
If a non-leaf node overflows −
o Split the node into two parts.
o Partition the node at i = ⌈(m+1)/2⌉.
o Entries up to i are kept in one node.
o The rest of the entries are moved to a new node.
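A minimal sketch (not a full B+ tree implementation; the node layout and names are assumptions made for this illustration) showing how an overflowing leaf is split according to the rules above: partition at i = ⌊(m+1)/2⌋, keep the first i entries, move the rest to a new leaf, and copy the ith key up to the parent.

from bisect import bisect_left

ORDER = 4   # m: the maximum number of keys a leaf may hold in this sketch

class Leaf:
    def __init__(self):
        self.keys = []     # sorted search-key values
        self.rids = []     # record pointers, parallel to keys
        self.next = None   # block pointer to the next leaf (the leaf-level linked list)

def insert_into_leaf(leaf, key, rid):
    """Insert (key, rid); on overflow, split the leaf and return
    (separator_key, new_leaf) to be posted to the parent, else None."""
    pos = bisect_left(leaf.keys, key)
    leaf.keys.insert(pos, key)
    leaf.rids.insert(pos, rid)
    if len(leaf.keys) <= ORDER:
        return None                                 # no overflow, nothing to post upward
    i = len(leaf.keys) // 2                         # partition point i = floor((m+1)/2)
    new_leaf = Leaf()
    new_leaf.keys, leaf.keys = leaf.keys[i:], leaf.keys[:i]   # first i entries stay
    new_leaf.rids, leaf.rids = leaf.rids[i:], leaf.rids[:i]   # the rest move to the new leaf
    new_leaf.next, leaf.next = leaf.next, new_leaf            # maintain the leaf chain
    return leaf.keys[-1], new_leaf                  # the ith key is copied up to the parent

leaf = Leaf()
for k in (10, 20, 30, 40):
    insert_into_leaf(leaf, k, ("page", k))
sep, right = insert_into_leaf(leaf, 25, ("page", 25))   # fifth key overflows the leaf
print(sep, leaf.keys, right.keys)                       # 20 [10, 20] [25, 30, 40]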
AssignmentQuestions
UNIT – I
UNIT – II
UNIT – III
UNIT – IV
1. Organize a locking protocol? Describe the Strict Two Phase Locking Protocol?
What can you say about the schedules allowed by this protocol?
2. Classify short notes on : a) Multiple granularity b) Serializability c) Complete
schedule d) Serial Schedule.
UNIT – V
1. Illustrate extendable hashing techniques for indexing data records. Consider your
class students data records and roll number as index attribute and show the hash
directory.
2. Is disk cylinder a logical concept? Justify your answer.
3. Formulate the performance implications of disk structure? Explain briefly about
redundant arrays of independent disks.
4. Measure the indexing? Explain what are the differences between trees based
index and Hash based index.
5. Justify extendable hashing? How it is different from linear hashing?
Tutorial Problems
Tutorial-1
4. Elaborate the Trigger? Explain how to implement Triggers in SQL with example.
5. Discuss the following operators in SQL with examples
i) Some ii) Not In iii) In iv) Except
Tutorial -3
1. Consider a relation R with five attributes ABCDE. You are given the following
dependencies: A->B, BC->E and ED->A
i) List all keys for R
ii) Is R in 3NF? If not, explain why not.
Tutorial -4
1. Discuss about log? What is log tail? Explain the concept of checkpoint log
record.
2. Elaborate to test serializability of a schedule? Explain with an example.
3. Construct the concurrency control using time stamp ordering protocol.
4. Demonstrate ACID properties of transactions.
5. Differentiate transaction rollback and restart recovery.
Tutorial -5
Important Questions
Unit-1
List the six design goals for relational database and explain why they are desirable.
What is the composite Attribute? How to model it in the ER diagram? Explain with an
example.
Compare candidate key , primary key and super key.
Unit-2
Write the following queries in Tuple Relational Calculus for the following schema.
(a) Find the names of sailors who have reserved at least one boat
(b) Find the names of sailors who have reserved at least two boats
(c) Find the names of sailors who have reserved all boats.
The key fields are underlined. The Catalog relation lists the prices charged for parts by suppliers. Write the following queries in SQL.
Explain in detail the following:
i. Join operation
ii. Nested-loop join
iii. Block nested-loop join.
Write the SQL expressions for the following relational database:
Sailor Schema (sailor id, Boat id, sailorname, rating, age)
Reserves (Sailor id, Boat id, Day)
Boat Schema (boat id, Boatname, color)
i. Find the age of the youngest sailor for each rating level.
ii. Find the age of the youngest sailor who is eligible to vote for each rating level with at least two such sailors.
iii. Find the number of reservations for each red boat.
iv. Find the average age of sailors for each rating level that has at least 2 sailors.
What is outer join? Explain different types of joins.
What is a trigger and what are its 3 parts? Explain in detail.
What is a view? Explain views in SQL.
Unit-3
Unit-5
Explain the following
a. Cluster indexes
b. Primary and secondary indexes
c. Clustering file organization
Unit-I
Relational calculus is a
(A) Conceptual view. (B) Internal view. (C) External view. (D) Physical view.
Unit-II
An entity set that does not have sufficient attributes to form a primary key is a
(A) strong entity set. (B) weak entity set. (C) simple entity set. (D) primary entity set.
Q.6 The database environment has all of the following components except:
(A) users. (B) separate files. (C) database. (D) database administrator.
Unit-III
A report generator is used to
Conceptual design
(a) is a documentation technique.
(b) needs data volume and processing frequencies to determine the size of the database.
(c) involves modelling independent of the DBMS.
(d) is designing the relational model.
A subschema expresses
(A) the logical view. (B) the physical view. (C) the external view. (D) all of the above.
Unit-IV
An advantage of the database management approach is
(A) Primary key (B) Secondary Key (C) Foreign Key (D) None of these
Unit-V
1. The file organization that provides very fast access to any arbitrary record of a file is
2. DBMS helps achieve
(C) Neither (A) nor (B) (D) both (A) and (B)
Q.4 Which of the following operations is used if we are interested in only certain columns of a table?
Unit-1
11. What is an unsafe query? Give an example and explain why it is important to
disallow such queries?
13. List the six design goals for relational database and explain why they are
desirable.
Unit-2
1. A company database needs to store data about employees, departments and children
of employees. Draw an ER diagram that captures the above data.
5. What is the composite Attribute? How to model it in the ER diagram? Explain with an
example.
Write the following queries in Tuple Relational Calculus for the following schema.
Find the names of sailors who have reserved at least one boat
Find the names of sailors who have reserved at least two boats
The key fields are underlined. The Catalog relation lists the prices charged for parts by suppliers.
13. Write the following queries in SQL.
Find the pnames of parts supplied by raghu supplier and no one else.
i. Find the age of the youngest sailor for each rating level?
15. Find the age of the youngest sailor who is eligible to vote for each rating level with at least two such sailors.
16. Find the No. of reservations for each red boat.
17. Find the average age of sailors for each rating level that has at least 2 sailors.
18. What is outer join? Explain different types of joins?
19. What is a trigger and what are its 3 parts. Explain in detail.
Unit-3
Unit-4
10. What are the merits and demerits of using fuzzy dumps for media recovery?
11. What information do the dirty page table and transaction table contain?
Unit-5
a. Cluster indexes
b. Primary and secondary indexes
c. Clustering file organization
Sample MidPaper
II B.Tech II Sem CSE Database Management Systems I Mid Question Paper
PART-A
1. a) List the responsibilities of DBA?
b) Write brief notes on views?
c) List the primitive operations in relational algebra?
d) What is meant by nested queries?
e) What is Trigger and Active database?
PART-B
4) What are integrity constraints? How these constraints are expressed in SQL?
(or)
5) Explain the operations of relational algebra? What are aggregative operations
and logical operators in SQL?
6) Describe about DDL & DML commands with syntaxes and examples?
(or)
7) What is normalization? Explain 1NF, 2NF and 3NF Normal forms with
examples?
PART-A (25 Marks)
1.a) Discuss about DDL. [2]
b) Write brief notes on altering tables and views. [3]
c) Describe about outer join. [2]
d) What is meant by nested queries? [3]
e) What is second normal form? [2]
f) Describe the inclusion dependencies. [3]
g) What is meant by buffer management? [2]
h) What is meant by remote backup system? [3]
i) Discuss about primary indexes. [2]
j) What is meant by linear hashing? [3]
PART-B (50 Marks)
2. Explain the relational database architecture. [10]
OR
3. State and explain various features of E-R Models. [10]
REFERENCES
3. Database Systems: Design, Implementation, and Management, Peter Rob & Carlos Coronel, 7th Edition.
Websites:-
http://en.wikipedia.org/wiki/Database_management_system
https://www.tutorialspoint.com/dbms
http://helpingnotes.com/notes/msc_notes/dbms_notes/
http://www.geeksforgeeks.org
Journals:-