Database Management Systems Notes
ANUPAMA JAIN
Objectives:
To understand the basic concepts and the applications of database systems
To master the basics of SQL and construct queries using SQL
To understand the relational database design principles
To become familiar with the basic issues of transaction processing and concurrency control
To become familiar with database storage structures and access techniques
UNIT I:
Database System Applications, Purpose of Database Systems, View of Data – Data
Abstraction – Instances and Schemas – Data Models – the ER Model – Relational Model –
Other Models – Database Languages – DDL – DML – Database Access for Application
Programs – Database Users and Administrator – Transaction Management – Database
Architecture – Storage Manager – the Query Processor.
Introduction to the Relational Model – Structure – Database Schema, Keys – Schema
Diagrams.
Database design and ER diagrams – ER Model - Entities, Attributes and Entity sets –
Relationships and Relationship sets – ER Design Issues – Concept Design – Conceptual
Design with relevant Examples. Relational Query Languages, Relational Operations.
UNIT II:
Relational Algebra – Selection and Projection – Set Operations – Renaming – Joins – Division –
Examples of Algebra overviews – Relational calculus – Tuple Relational Calculus (TRC) –
Domain relational calculus (DRC).
Overview of the SQL Query Language – Basic Structure of SQL Queries, Set Operations,
Aggregate Functions – GROUP BY – HAVING, Nested Subqueries, Views, Triggers,
Procedures.
UNIT III:
Normalization – Introduction, Nonloss decomposition and functional dependencies, First,
Second, and third normal forms – dependency preservation, Boyce/Codd normal form.
Higher Normal Forms - Introduction, Multi-valued dependencies and Fourth normal form,
Join dependencies and Fifth normal form
UNIT IV:
Transaction Concept – Transaction State – Implementation of Atomicity and Durability –
Concurrent Executions – Serializability – Recoverability – Implementation of Isolation –
Testing for Serializability – Lock-Based Protocols – Timestamp-Based Protocols – Validation-Based
Protocols – Multiple Granularity.
TEXT BOOKS:
1. Database System Concepts, Silberschatz, Korth, Sudarshan, McGraw-Hill, Sixth Edition (all units
except Unit III).
2. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, Tata
McGraw-Hill, Third Edition.
REFERENCE BOOKS:
1. Fundamentals of Database Systems, Elmasri, Navathe, Pearson Education.
2. An Introduction to Database Systems, C.J. Date, A. Kannan, S. Swamynathan,
Pearson, Eighth Edition (for Unit III).
Outcomes:
Demonstrate the basic elements of a relational database management system
Ability to identify the data models for relevant problems
Ability to design entity-relationship diagrams, convert them into RDBMS relations, and
formulate SQL queries on the respective data
Apply normalization for the development of application software
Contents
Unit I
1. Introduction to Database Management System
2. View of Data
4. Entity-Relationship Model
5. Database Schema
Unit II
1. Preliminaries
2. Relational Algebra
3. Relational Calculus
4. The Form of a Basic SQL Query
5. Introduction to Views
6. Triggers
Unit III
4. Decompositions
5. Dependency-Preserving Decomposition into 3NF
6. Other Kinds of Dependencies
Unit IV
1. Transaction Concept
2. Concurrent Execution
3. Transaction Characteristics
4. Recoverable Schedules
5. Recovery System
6. Timestamp-Based Protocols
7. Multiple Granularity
Unit V
5. Recovery with Concurrent Transactions
6. DBMS File Structure
What is a Database?
To understand what a database is, we start from data, the basic building block of any DBMS.
Data: Facts, figures, statistics, etc. that have no particular meaning on their own (e.g., 1, ABC, 19).
Record: A collection of related data items. In the example above, the three data items have no
meaning by themselves. But if we organize them in the following way, they collectively represent
meaningful information.
Roll Name Age
1 ABC 19
2 DEF 22
3 XYZ 28
The columns of this relation are called fields or attributes; the set of values an attribute may take is
its domain. The rows are called tuples or records.
Database: Collection of related relations. Consider the following collection of tables:
T1
Roll Name Age
1 ABC 19
2 DEF 22
3 XYZ 28
T2
1 KOL
2 DEL
3 MUM
DATABASEMANGAEMENTSYSTEM Page 1
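T3
1 I
2 II
3 I
T4
I H1
II H2
These tables are related through common values: the roll numbers 1, 2 and 3 link T1, T2 and T3,
and the values I and II link T3 and T4.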
A database in a DBMS may be viewed by many different people with different responsibilities.
For example, within a company there are different departments, as well as customers, who each need
to see different kinds of data. Each employee in the company will have a different level of access to the
database, with their own customized front-end application.
In a relational database, data is organized in rows and columns. The rows are called tuples or
records, and the data items within one row may belong to different data types. The columns are called
attributes (or fields). All the data items within a single attribute are of the same data type, drawn from
that attribute's domain.
Database systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the manipulation of
information. In addition, the database system must ensure the safety of the information stored, despite
system crashes or attempts at unauthorized access. If data are to be shared among several users, the
system must avoid possible anomalous results.
Databases touch all aspects of our lives. Some of the major areas of application are as follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources
Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in factories,
inventories of items in warehouses and stores, and orders for items.
Online retailers: For sales data noted above plus online order tracking, generation of recommendation
lists, and maintenance of online product evaluations.
A typical file-processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it needs different application programs to extract records
from, and add records to, the appropriate files. Before database management systems (DBMSs) were
introduced, organizations usually stored information in such systems. Keeping organizational
information in a file-processing system has a number of major disadvantages:
Data redundancy and inconsistency. Since different programmers create the files and application
programs over a long period, the various files are likely to have different structures and the programs
may be written in several programming languages. Moreover, the same information may be duplicated
in several places (files). For example, if a student has a double major (say, music and mathematics) the
address and telephone number of that student may appear in a file that consists of student records of
students in the Music department and in a file that consists of student records of students in the
Mathematics department. This redundancy leads to higher storage and access cost. In addition, it may
lead to data inconsistency; that is, the various copies of the same data may no longer agree. For
example, a changed student address may be reflected in the Music department records but not
elsewhere in the system.
Difficulty in accessing data. Suppose that one of the university clerks needs to find out the names of
all students who live within a particular postal-code area. The clerk asks the data-processing
department to generate such a list. Because the designers of the original system did not anticipate this
request, there is no application program on hand to meet it. There is, however, an application program
to generate the list of all students. The clerk must therefore either extract the needed students from that
list manually or ask a programmer to write a new application program.
Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. Suppose the university maintains an account for each department, and records the balance
amount in each account. Suppose also that the university requires that the account balance of a
department may never fall below zero. Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are added, it is
difficult to change the programs to enforce them. The problem is compounded when constraints
involve several data items from different files.
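Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed
prior to the failure. For example, consider a program that transfers $500 from the account balance of
department A to that of department B. If the system fails after the $500 has been removed from A's
balance but before it has been credited to B's, the database is left in an inconsistent state; either both
the debit and the credit must occur, or neither.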
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to
ensure atomicity in a conventional file-processing system.
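Here is a minimal sketch of an atomic funds transfer. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table account, the departments and the amounts are illustrative. The `with conn:` block commits both updates together or rolls both back. The fail_midway flag is a hypothetical switch that simulates a crash between the debit and the credit.

```python
import sqlite3

# In-memory database; department names and balances are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (dept TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 10000), ("B", 2000)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction; fail_midway simulates a crash."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance - ? WHERE dept = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE dept = ?", (amount, dst))
    except RuntimeError:
        pass  # the debit has already been rolled back by the with-block

def balances(conn):
    return dict(conn.execute("SELECT dept, balance FROM account"))

transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # {'A': 10000, 'B': 2000}: neither update survived
transfer(conn, "A", "B", 500)
after_success = balances(conn)   # {'A': 9500, 'B': 2500}: both updates applied
```

The failed transfer is rolled back as a unit. Department A therefore never loses $500 that department B did not receive.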
Concurrent-access anomalies. For the sake of overall performance of the system and faster response,
many systems allow multiple users to update the data simultaneously. Indeed, today, the largest
Internet retailers may have millions of accesses per day to their data by shoppers. In such an
environment, interaction of concurrent updates is possible and may result in inconsistent data.
Consider department A, with an account balance of $10,000. If two department clerks debit the
account balance (by say $500 and $100, respectively) of department A at almost exactly the same
time, the result of the concurrent executions may leave the budget in an incorrect (or inconsistent)
state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce
that value by the amount being withdrawn, and write the result back. If the two programs run
concurrently, they may both read the value $10,000, and write back $9500 and $9900, respectively.
Depending on which one writes the value last, the account balance of department A may contain either $9500 or
$9900, rather than the correct value of $9400. To guard against this possibility, the system must
maintain some form of supervision.
But supervision is difficult to provide because data may be accessed by many different application
programs that have not been coordinated previously.
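The lost-update scenario above can be replayed with a small, deterministic simulation. This illustrates the anomaly only, not how a DBMS schedules work. Each clerk's withdrawal is a hypothetical Python generator that pauses after reading and again after computing. A schedule lists which clerk takes the next step.

```python
def debit(account, amount):
    """One clerk's withdrawal, split into the three steps described in the text."""
    balance = account["balance"]   # step 1: read the old balance
    yield
    balance -= amount              # step 2: reduce it by the amount withdrawn
    yield
    account["balance"] = balance   # step 3: write the result back

def run(schedule):
    """Execute the clerks' steps in the given order and return the final balance."""
    account = {"balance": 10000}
    clerks = {"c1": debit(account, 500), "c2": debit(account, 100)}
    for clerk in schedule:
        next(clerks[clerk], None)  # advance that clerk by one step
    return account["balance"]

serial = run(["c1", "c1", "c1", "c2", "c2", "c2"])       # 9400
lost_update = run(["c1", "c2", "c1", "c2", "c1", "c2"])  # 9900: the $500 debit is lost
```

Running the steps serially gives $9400. In the interleaving where both clerks read $10,000 before either writes, the final balance is $9900, and the $500 debit is lost. Concurrency-control protocols exist to rule out such schedules.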
Security problems. Not every user of the database system should be able to access all the data. For
example, in a university, payroll personnel need to see only that part of the database that has financial
information. They do not need access to information about academic records. But, since application
programs are added to the file-processing system in an ad hoc manner, enforcing such security
constraints is difficult.
These difficulties, among others, prompted the development of database systems. In what follows, we
shall see the concepts and algorithms that enable database systems to solve the problems with file-
processing systems.
Advantages of DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e., storing the same data
multiple times). In a database system, a centralized database under the centralized control of the DBA
avoids unnecessary duplication of data. This also eliminates the extra time spent processing large
volumes of duplicated data, and it saves storage space.
Improved Data Sharing: DBMS allows a user to share the data in any number of application programs.
Data Integrity: Integrity means that the data in the database is accurate. Centralized control of the
data allows the administrator to define integrity constraints on the data in the database. For
example, in a customer database we can enforce an integrity constraint that accepts customers only
from the cities of Noida and Meerut.
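Here is a declarative version of the Noida/Meerut rule, sketched with Python's sqlite3 module. The table and column names and the sample customers are hypothetical. The CHECK constraint lives in the schema, so every application that inserts into customer is held to it without extra code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table; the CHECK clause is the integrity constraint.
conn.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT NOT NULL CHECK (city IN ('Noida', 'Meerut'))
    )
""")
conn.execute("INSERT INTO customer (name, city) VALUES ('Asha', 'Noida')")  # accepted

try:
    conn.execute("INSERT INTO customer (name, city) VALUES ('Ravi', 'Delhi')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS, not the application, refused the row

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```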
Security: Having complete authority over the operational data enables the DBA to ensure that the
only means of access to the database is through proper channels. The DBA can define authorization
checks to be carried out whenever access to sensitive data is attempted.
CA-201 DBMS - SIRT ANUPAMA
Efficient Data Access: In a database system, the data is managed by the DBMS and all access to the
data goes through the DBMS, which is key to effective data processing.
Enforcement of Standards: With centralized control of data, the DBA can establish and enforce data
standards, which may include naming conventions, data quality standards, etc.
Data Independence: In a database system, the database management system provides the interface
between the application programs and the data. When changes are made to the data representation, the
metadata maintained by the DBMS is changed, but the DBMS continues to provide the data to
application programs in the previously used way. The DBMS handles the transformation of data
wherever necessary.
Reduced Application Development and Maintenance Time: The DBMS supports many important
functions that are common to the many applications accessing data stored in it, which facilitates
quick development of applications.
Disadvantages of DBMS
1) It is complex. Since it supports many functions to give users the best service, the underlying
software has become complex. Designers and developers need thorough knowledge of the software
to get the most out of it.
2) Because of its complexity and functionality, it uses a large amount of memory and needs large
memory to run efficiently.
3) A DBMS works as a centralized system, i.e., users from many locations access the same
database. Hence any failure of the DBMS will impact all the users.
4) A DBMS is general-purpose software, i.e., it is written to work for many kinds of systems rather
than a specific one. Hence some applications may run slower than they would with special-purpose software.
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to access
and modify these data. A major purpose of a database system is to provide users with an abstract view
of the data. That is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers
to use complex data structures to represent data in the database. Since many database-system users
are not computer trained, developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system:
• Physical level (or Internal View / Schema): The lowest level of abstraction describes how the data
are actually stored. The physical level describes complex low-level data structures in detail.
• Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes what
data are stored in the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is referred
to as physical data independence.
• View level (or External View / Schema): The highest level of abstraction describes only part of the
entire database. Even though the logical level uses simpler structures, complexity remains because of
the variety of information stored in a large database. Many users of the database system do not need
all this information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many views
for the same database.
For example, we may describe a record as follows:
type instructor = record
    ID : char (5);
    name : char (20);
    dept_name : char (20);
    salary : numeric (8,2);
end;
This code defines a new record type called instructor with four fields. Each field has a name
and a type associated with it. A university organization may have several such record types,
including department and student.
At the physical level, an instructor, department, or student record can be described as a block of
consecutive storage locations.
At the logical level, each such record is described by a type definition, as in the previous code
segment, and the interrelationship of these record types is defined as well.
Finally, at the view level, computer users see a set of application programs that hide details of the
data types. At the view level, several views of the database are defined, and a database user sees
some or all of these views.
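To relate the three levels to SQL, here is a minimal sketch using Python's sqlite3 module (an assumption; any relational DBMS would do). At the logical level, the instructor record type becomes a table. At the view level, a hypothetical view instructor_public gives one group of users a window that hides salaries. SQLite chooses the physical layout itself, which is exactly the detail hidden from users. The sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the instructor record type as a table.
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8, 2)
    )
""")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("12121", "Wu", "Finance", 90000)])

# View level: a hypothetical view for users who must not see salaries.
conn.execute("CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor")

cursor = conn.execute("SELECT * FROM instructor_public ORDER BY ID")
cols = [d[0] for d in cursor.description]   # column names visible through the view
rows = cursor.fetchall()
```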
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database. The overall design
of the database is called the database schema. Schemas are changed infrequently, if at all. The
concept of database schemas and instances can be understood by analogy to a program written in a
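programming language. A database schema corresponds to the variable declarations (along with
associated type definitions) in a program.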
Each variable has a particular value at a given instant. The values of the variables in a program at a
point in time correspond to an instance of a database schema. Database systems have several
schemas, partitioned according to the levels of abstraction. The physical schema describes the
database design at the physical level, while the logical schema describes the database design at the
logical level. A database may also have several schemas at the view level, sometimes called
subschemas, which describe different views of the database. Of these, the logical schema is by far
the most important, in terms of its effect on application programs, since programmers construct
applications by using the logical schema. Application programs are said to exhibit physical data
independence if they do not depend on the physical schema, and thus need not be rewritten if the
physical schema changes.
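Physical data independence can be observed directly. The sketch below (Python with sqlite3, illustrative data) runs the same query before and after adding an index. The index changes only how the data can be accessed. Its name, idx_instructor_dept, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("45565", "Katz", "Comp. Sci.", 75000),
                  ("12121", "Wu", "Finance", 90000)])

QUERY = "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name"

before = conn.execute(QUERY, ("Comp. Sci.",)).fetchall()
# Change the physical schema only: add an access path on dept_name.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
after = conn.execute(QUERY, ("Comp. Sci.",)).fetchall()
# The application's query text and its result are unchanged.
```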
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
• Relational Model. The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a unique
name. Tables are also known as relations. The relational model is an example of a record-based
model.
Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic
objects, called entities, and relationships among these objects.
Suppose that each department has offices in several locations and we want to record the locations at
which each employee works. The ER diagram for this variant of Works_In, which we call Works_In2,
uses a single ternary relationship set among employees, departments, and locations.
Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an object-
oriented data model that can be seen as extending the E-R model with notions of encapsulation,
methods (functions), and object identity.
Semi-structured Data Model. The semi-structured data model permits the specification of data
where individual data items of the same type may have different sets of attributes. This is in contrast
to the data models mentioned earlier, where every data item of a particular type must have the same
set of attributes. The Extensible Markup Language (XML) is widely used to represent semi-
structured data.
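Here is a small illustration of semi-structured data. The XML fragment below has hypothetical content and is parsed with Python's standard xml.etree.ElementTree. It holds two items of the same type, instructor, with different sets of attributes. Code that reads such data must allow for missing elements, here by supplying a default.

```python
import xml.etree.ElementTree as ET

# Two instructor items of the same type with different sets of attributes.
doc = """
<instructors>
  <instructor><ID>10101</ID><name>Srinivasan</name><dept_name>Comp. Sci.</dept_name></instructor>
  <instructor><ID>12121</ID><name>Wu</name><phone>555-0101</phone><phone>555-0102</phone></instructor>
</instructors>
"""
root = ET.fromstring(doc.strip())
items = root.findall("instructor")

# The attribute (child element) names actually present in each item.
attribute_sets = [sorted({child.tag for child in item}) for item in items]
# dept_name is missing from the second item, so a default is supplied.
depts = [item.findtext("dept_name", default="unknown") for item in items]
```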
Historically, the network data model and the hierarchical data model preceded the relational data
model.
These models were tied closely to the underlying implementation, and complicated the task of modeling
data.
As a result, they are used little now, except in old database code that is still in service in some places.
Database Languages
A database system provides a data-definition language to specify the database
schema and a data-manipulation language to express database queries and updates. In practice,
the data-definition and data-manipulation languages are not two separate languages; instead they
simply form parts of a single database language, such as the widely used SQL language.
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
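The four kinds of access listed above map onto SQL's SELECT, INSERT, DELETE and UPDATE statements. Here is a minimal sketch with Python's sqlite3 module. It reuses the Roll/Name/Age sample data from the beginning of these notes; the table name student is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Insertion of new information
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "ABC", 19), (2, "DEF", 22), (3, "XYZ", 28)])
# Modification of stored information
conn.execute("UPDATE student SET age = age + 1 WHERE roll = ?", (2,))
# Deletion of information
conn.execute("DELETE FROM student WHERE roll = ?", (3,))
# Retrieval of stored information
rows = conn.execute("SELECT roll, name, age FROM student ORDER BY roll").fetchall()
```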
Besides the database schema, the DDL specifies constraints on the data and on access to it, such as:
• Domain Constraints. A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Declaring an attribute to be of a particular
domain acts as a constraint on the values that it can take. Domain constraints are the most elementary
form of integrity constraint. They are tested easily by the system whenever a new data item is entered
into the database.
CA-201 DBMS – SAGE DR. ANUPAMA JAIN
• Authorization. We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows insertion of new data, but not modification
of existing data; update authorization, which allows modification, but not deletion, of data; and
delete authorization, which allows deletion of data. We may assign the user all, none, or a
combination of these types of authorization.
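SQL systems usually express these authorizations with GRANT and REVOKE statements. SQLite has no such statements, so the following is a plain-Python sketch of the idea only. A hypothetical grants table maps each user and relation to a set of privileges, and every request is checked against it. The user and relation names echo the earlier payroll example and are illustrative.

```python
from enum import Enum

class Priv(Enum):
    """The four kinds of authorization described in the text."""
    READ = "read"
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"

# Hypothetical grants: user -> relation -> set of privileges held.
grants = {
    "payroll_clerk": {"salary": {Priv.READ, Priv.UPDATE}},
    "registrar": {"grades": {Priv.READ, Priv.INSERT}},
}

def is_authorized(user, relation, priv):
    """A request is allowed only if that exact privilege was granted on that relation."""
    return priv in grants.get(user, {}).get(relation, set())
```

For example, the payroll clerk may read and update salaries but may not delete them, and has no access to academic records at all.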
The DDL, just like any other programming language, gets as input some instructions (statements) and
generates some output. The output of the DDL is placed in the data dictionary, which contains
metadata—that is, data about data.
Data Dictionary
We can define a data dictionary as a DBMS component that stores the definition of data
characteristics and relationships. You may recall that such “data about data” were labeled metadata.
The DBMS data dictionary provides the DBMS with its self-describing characteristic. In effect, the
data dictionary resembles an X-ray of the company’s entire data set, and is a crucial element in the
data administration function.
For example, the data dictionary typically stores descriptions of all:
• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores
the names, data types, display formats, internal storage formats, and validation rules. The data
dictionary tells where an element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of the
table creator, the date of creation, access authorizations, the number of columns, and so on.
• Indexes defined for each database table. For each index, the DBMS stores at least the index name,
the attributes used, the location, specific index characteristics, and the creation date.
• Databases defined: who created each database, the date of creation, where the database is located,
who the DBA is, and so on.
• End users and the administrators of the database.
• Programs that access the database, including screen formats, report formats, application formats,
SQL queries, and so on.
• Access authorizations for all users of all databases.
• Relationships among data elements: which elements are involved, whether the relationships are
mandatory or optional, the connectivity and cardinality, and so on.
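Real DBMSs expose their data dictionary as ordinary tables that can be queried. In SQLite, for instance, the catalog table sqlite_master lists every table and index, and `PRAGMA table_info` returns column-level metadata. Here is a short sketch in Python; the department table and the index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget REAL)")
conn.execute("CREATE INDEX idx_dept_building ON department (building)")

# Schema-level metadata: one catalog row per table or index. SQLite's internal
# auto-indexes, whose names start with 'sqlite_', are filtered out.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()

# Column-level metadata: PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2], row[5]) for row in conn.execute("PRAGMA table_info(department)")]
```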
Database Administrators and Database Users
A primary goal of a database system is to retrieve information from and store new information in the
database.
Database Users and User Interfaces
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system. The functional components of a database system can be broadly divided into the storage
manager and the query processor components.
Query Processor:
The query processor components include:
· DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
· DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give
the same result. The DML compiler also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
· Query evaluation engine, which executes low-level instructions generated by the DML compiler.
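SQLite's EXPLAIN QUERY PLAN statement shows which evaluation plan its optimizer picked. This makes the effect of a new access path on plan choice directly visible. Here is a sketch in Python; the table, the query and the index name are illustrative. The exact wording of the plan text varies between SQLite versions, so only key words are relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)")

QUERY = "SELECT name FROM instructor WHERE dept_name = 'Physics'"

def plan(sql):
    # Each result row ends with a 'detail' column describing one step of the chosen plan.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_without_index = plan(QUERY)   # e.g. a full SCAN of instructor
conn.execute("CREATE INDEX idx_dept ON instructor (dept_name)")
plan_with_index = plan(QUERY)      # e.g. a SEARCH ... USING INDEX idx_dept
```

The query text never changes. Only the plan chosen by the optimizer does.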
Storage Manager:
A storage manager is a program module that provides the interface between the low-level data stored in
the database and the application programs and queries submitted to the system. The storage manager is
responsible for the interaction with the file manager.
The storage manager components include:
Transaction Manager:
What is ER Modeling?
A graphical technique for understanding and organizing the data independent of the actual
database implementation.
We need to be familiar with the following terms to go further.
Entity
Anything that has an independent existence and about which we collect data. It is also known as entity
type.
In ER modeling, notation for entity is given below.
Entity instance
Entity instance is a particular member of the entity type.
Example for entity instance: a particular employee.
Regular Entity
An entity which has its own key attribute is a regular entity.
Example for regular entity: Employee.
Weak entity
An entity which depends on another entity for its existence and doesn't have any key attribute of its own is
a weak entity.
Attributes
Properties/characteristics which describe entities are called attributes.
In ER modeling, notation for attribute is given below.
Domain of Attributes
The set of possible values that an attribute can take is called the domain of the attribute. For
example, the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this
set can be termed as the domain of the attribute day.
Key attribute
The attribute (or combination of attributes) which is unique for every entity instance is called key
attribute.
E.g., the employee_id of an employee, the pan_card_number of a person, etc. If the key attribute
consists of two or more attributes in combination, it is called a composite key.
In ER modeling, notation for key attribute is given below.
Simple attribute
If an attribute cannot be divided into simpler components, it is a simple
attribute. Example for simple attribute: employee_id of an employee.
Composite attribute
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute: Name of the employee, which can be split into First_name,
Middle_name, and Last_name.
Single-valued Attributes
If an attribute can take only a single value for each entity instance, it is a single-valued attribute.
Example for single-valued attribute: age of a student. It can take only one value for a particular student.
Stored Attribute
An attribute which needs to be stored permanently is a stored attribute.
Example for stored attribute: name of a student.
Derived Attribute
An attribute which can be calculated or derived based on other attributes is a derived attribute.
Example for derived attribute: age of employee which can be calculated from date of birth and current
date.
In ER modeling, notation for derived attribute is given below.
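Simple, composite, stored and derived attributes can be mirrored in an ordinary program type. In this Python sketch, the class and field names are hypothetical. Name is kept as separate component fields, and date_of_birth is stored. Age is derived on demand from date_of_birth and a supplied current date, rather than being stored.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Employee:
    employee_id: int      # key attribute (simple)
    first_name: str       # components of the composite attribute Name
    middle_name: str
    last_name: str
    date_of_birth: date   # stored attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month,
                                                    self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

emp = Employee(101, "Asha", "", "Verma", date(2000, 6, 15))
```

Computing age on demand avoids having to update a stored age every year.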
Relationships
Associations between entities are called relationships.
Example: An employee works for an organization. Here "works for" is a relationship between the
entities employee and organization.
In ER modeling, notation for relationship is given below.
However, in ER modeling, to connect a weak entity with others, you should use a weak
relationship notation as given below.
One employee is assigned only one parking space, and one parking space is assigned to
only one employee. Hence it is a 1:1 relationship and its cardinality is One-to-One (1:1).
One organization can have many employees, but one employee works in only one organization.
Hence it is a 1:N relationship and its cardinality is One-to-Many (1:N).
In ER modeling, this can be mentioned using notations as given below.
One employee works in only one organization, but one organization can have many employees.
Hence, seen from the employee side, it is an M:1 relationship and its cardinality is Many-to-One (M:1).
One student can enroll in many courses, and one course can be taken by many students. Hence
it is an M:N relationship and its cardinality is Many-to-Many (M:N).
In ER modeling, this can be mentioned using notations as given below
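When an ER design is turned into relations, an M:N relationship such as enrollment usually becomes a table of its own. That table holds the keys of both participating entities. Here is a minimal sqlite3 sketch; the table names, IDs and course codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
    -- One row per (student, course) pair: the M:N relationship itself.
    CREATE TABLE enrolls (
        student_id INTEGER REFERENCES student (student_id),
        course_id  TEXT    REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'ABC'), (2, 'DEF');
    INSERT INTO course  VALUES ('CS101', 'Databases'), ('MA101', 'Calculus');
    INSERT INTO enrolls VALUES (1, 'CS101'), (1, 'MA101'), (2, 'CS101');
""")

# Many courses for one student ...
courses_of_1 = [r[0] for r in conn.execute(
    "SELECT course_id FROM enrolls WHERE student_id = 1 ORDER BY course_id")]
# ... and many students for one course.
students_in_cs101 = [r[0] for r in conn.execute(
    "SELECT s.name FROM student s JOIN enrolls e ON s.student_id = e.student_id "
    "WHERE e.course_id = 'CS101' ORDER BY s.name")]
```

The composite primary key (student_id, course_id) allows a student to appear with many courses and a course with many students. It also forbids recording the same enrollment twice.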
Relationship Participation
1. Total
In total participation, every entity instance will be connected through the relationship to another
instance of the other participating entity types
2. Partial
In partial participation, some entity instances may not be connected through the relationship to any
instance of the other participating entity type.
Example for relationship participation
Consider the relationship: Employee is head of the department.
Not every employee is the head of a department; only one employee heads each department. In
other words, only a few instances of the employee entity participate in the above relationship. So the
employee entity's participation in this relationship is partial.
Relational Model
The relational model is today the primary data model for commercial data processing applications. It
attained its primary position because of its simplicity, which eases the job of the programmer,
compared to earlier data models such as the network model or the hierarchical model.
Structure of Relational Databases:
A relational database consists of a collection of tables, each of which is assigned a unique name. For
example, consider the instructor table of Figure 1.5, which stores information about instructors. The
table has four column headers: ID, name, dept name, and salary. Each row of this table records
information about an instructor, consisting of the instructor’s ID, name, dept name, and salary.
Database Schema
When we talk about a database, we must differentiate between the database schema, which is the
logical design of the database, and the database instance, which is a snapshot of the data in the
database at a given instant in time. The concept of a relation corresponds to the programming-
language notion of a variable, while the concept of a relation schema corresponds to the
programming-language notion of type definition.
Keys
A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely a
tuple in the relation. For example, the ID attribute of the relation instructor is sufficient to distinguish
one instructor tuple from another. Thus, ID is a superkey. The name attribute of instructor, on the
other hand, is not a superkey, because several instructors might have the same name.
A superkey may contain extraneous attributes. For example, the combination of ID and name is a
superkey for the relation instructor. If K is a superkey, then so is any superset of K. We are often
interested in superkeys for which no proper subset is a superkey. Such minimal superkeys are called
candidate keys.
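The definitions of superkey and candidate key translate into a brute-force check over attribute subsets. The Python sketch below tests uniqueness on one relation instance; the function names and sample rows are hypothetical. Note the limitation: an instance can show that a set of attributes is not a superkey. Whether it is one, however, is a property of the schema that the designer must decide.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs (on this instance only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys: superkeys that have no proper subset which is a superkey."""
    keys = []
    for size in range(1, len(attributes) + 1):  # smallest attribute sets first
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical instance of the instructor relation.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci."},
    {"ID": "45565", "name": "Katz", "dept_name": "Comp. Sci."},
    {"ID": "83821", "name": "Katz", "dept_name": "Finance"},
]

keys = candidate_keys(instructor, ["ID", "name", "dept_name"])
```

On this sample, (name, dept_name) happens to be unique and is reported as a candidate key. This shows why keys are chosen from the meaning of the data rather than inferred from one snapshot.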
It is customary to list the primary key attributes of a relation schema before the other attributes; for
example, the dept name attribute of department is listed first, since it is the primary key. Primary key
attributes are also underlined. A relation, say r1, may include among its attributes the primary key of
another relation, say r2. This attribute is called a foreign key from r1, referencing r2.
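Foreign keys can be declared so that the DBMS rejects references to tuples that do not exist. Here is a sketch using Python's sqlite3 module; the table contents are illustrative. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget REAL);
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT REFERENCES department (dept_name),  -- foreign key to department
        salary    REAL
    );
    INSERT INTO department VALUES ('Comp. Sci.', 'Taylor', 100000);
""")
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

try:
    conn.execute("INSERT INTO instructor VALUES ('99999', 'Nobody', 'Astrology', 50000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True  # 'Astrology' is not a dept_name in department
```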
Schema Diagrams
A database schema, along with primary key and foreign key dependencies, can be depicted by
schema diagrams. Figure 1.12 shows the schema diagram for our university organization.
Referential integrity constraints other than foreign key constraints are not shown explicitly in schema
diagrams. We will study a different diagrammatic representation called the entity-relationship diagram.