DBMS Lecture Notes
TOTAL: 45 PERIODS
COURSE OUTCOMES: At the end of the course, the student will be able to,
CO1: Understand the major objectives of database technology
CO2: Understand the relational model for databases
CO3: Design issues of Database
CO4: Identify the problems in Transaction
CO5:Analyze the issues involved in Implementation
CO-PO MAPPING
CO1 1 2 2 1 2
CO2 2 2 2
CO3 1 2 1 1 1
CO4 2 1 2 2
CO5 1 2 2 2 1
1 - low, 2 - medium, 3 - high, '-' no correlation
TEXT BOOKS:
1. Abraham Silberschatz, Henry F. Korth, S. Sudarshan, "Database System Concepts", Sixth Edition, Tata McGraw Hill, 2011 (Unit I and Unit V).
2. C.J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database Systems", Eighth Edition, Pearson Education, 2006 (Unit II, III and IV).
3. Raghu Ramakrishnan, Johannes Gehrke, "Database Management Systems", Third Edition, Tata McGraw Hill.
REFERENCE BOOKS:
1. Ramez Elmasri, Shamkant B. Navathe, "Fundamentals of Database Systems", Fourth Edition, Pearson/Addison-Wesley, 2007.
2. Raghu Ramakrishnan, "Database Management Systems", Third Edition, McGraw Hill, 2003.
3. Ramez Elmasri, Shamkant B. Navathe, "Fundamentals of Database Systems", Pearson Education.
WEB RESOURCES:
1. https://www.inmotionhosting.com/blog/what-is-a-database-management-system/
2. https://www.techtarget.com/searchdatamanagement/definition/database-management-system
INDEX
SNO  NAME OF THE TOPIC
1    UNIT I
2    UNIT II
3    UNIT III
4    UNIT IV
5    UNIT V
Unit 1
Introduction
Introduction to DBMS
DBMS stands for Database Management System.
DBMS = Database + Management System.
A database is a collection of data, and a management system is a set of programs to store and retrieve that data.
A DBMS is a collection of inter-related data and a set of programs to store and access that data in an easy and effective manner.
DBMS:
DBMS is software that is used to manage data. Some popular DBMS products are MySQL, IBM Db2, and Oracle.
DBMS provides an interface to the user so that operations on the database can be performed using the interface.
DBMS secures the data; that is the main advantage of DBMS over a file system. DBMS protects the data from unauthorized access as well as from corrupt data insertions. It allows multiple users to access data simultaneously while maintaining data consistency and data integrity.
DBMS allows the following operations to the authorized users of the database:
Data Modification: DBMS allows users to insert, update and delete data in the tables. These tables contain rows and columns, where a row represents a record of data while a column represents an attribute of the records. You can also bulk-update several records with a single statement.
Data Retrieval: DBMS allows users to fetch data from the database. Searching and retrieval of data is fast in a DBMS. The size of the database does not significantly impact this operation; in a file system, on the other hand, the size of the data can hugely impact the efficiency of a search.
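As a brief illustration of these operations in SQL (a minimal sketch; the student table and its columns are hypothetical, chosen only for this example):
-- Hypothetical table used only for illustration
CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name VARCHAR(30),
    address VARCHAR(50)
);
-- Data modification: insert and update
INSERT INTO student (roll_no, name, address) VALUES (1, 'RAM', 'DELHI');
UPDATE student SET address = 'NOIDA' WHERE roll_no = 1;
-- Data retrieval
SELECT name, address FROM student WHERE roll_no = 1;
-- Data modification: delete
DELETE FROM student WHERE roll_no = 1;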
Need of DBMS
Database systems are basically developed to handle large amounts of data. When dealing with huge amounts of data, there are two things that require optimization: storage of data and retrieval of data.
Fast Retrieval of data: Along with storing the data in an optimized and
systematic manner, it is also important that we retrieve the data quickly when needed.
Database systems ensure that the data is retrieved as quickly as possible.
DBMS applications
Banking System: For storing customer information, tracking day-to-day credit and debit transactions, generating bank statements, etc. All this work is done with the help of database management systems. A banking system also needs security of data, as the data is sensitive; this is efficiently taken care of by the DBMS.
Sales: To store customer information, production information and invoice
details. Using DBMS, you can track, manage and generate historical data to
analyse the sales data.
Airlines: To travel through airlines, we make early reservations; this reservation information, along with the flight schedule, is stored in a database. This is where real-time update of data is necessary: a flight seat reserved for one passenger should not be allocated to another passenger. This is easily handled by DBMS systems, as the data updates happen in real time and are fast.
Education sector: Database systems are frequently used in schools and colleges
to store and retrieve the data regarding student details, staff details, course
details, exam details, payroll data, attendance details, fees details etc. There is a
large amount of inter-related data that needs to be stored and retrieved in an
efficient manner.
Online shopping: You must be aware of online shopping websites such as Amazon, Flipkart etc. These sites store product information, your addresses and preferences, and credit details, and provide you a relevant list of products based on your query. All of this involves a database management system. Along with managing the vast catalogue of items, there is a need to secure users' private information such as bank and card details. All of this is taken care of by database management systems.
Disadvantages of File Processing Systems
Data redundancy:
o Data redundancy refers to the duplication of data.
o Redundant data needs more storage.
o Data redundancy often leads to higher storage costs and poor access time.
Data inconsistency:
o Data redundancy leads to data inconsistency. Let's take the same example as above: a student is enrolled in two courses and we have the student's address stored twice. Now let's say the student requests to change his address; if the address is changed in one place and not in all the records, then this leads to data inconsistency.
Data Isolation:
o Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate
data is difficult.
Dependency on application programs:
o Changing files would lead to change in application programs.
Atomicity issues:
o Atomicity of a transaction refers to “All or nothing”, which means either
all the operations in a transaction executes or none.
o It is difficult to achieve atomicity in file processing systems.
Data Security:
o Data should be secured from unauthorized access; for example, a student in a college should not be able to see the payroll details of the teachers. Such security constraints are difficult to apply in file processing systems.
Advantages of DBMS
No redundant data:
o Redundancy is removed by data normalization. No data duplication saves storage and improves access time.
Data Consistency and Integrity:
o As we discussed earlier, the root cause of data inconsistency is data redundancy. Since data normalization takes care of data redundancy, data inconsistency is also taken care of as part of it.
Data Security:
o It is easier to apply access constraints in database systems so that only
authorized user is able to access the data.
o Each user has a different set of access thus data is secured from the
issues such as identity theft, data leaks and misuse of data.
Privacy:
o Limited access means privacy of data. DBMS can grant and revoke
access to the database on user level that ensures who is accessing which
data. It also helps user to manage the constraints on database, this
ensures which type of data can be entered into the table.
Easy access to data:
o Database systems manage data in such a way that the data is easily accessible with fast response times. Even if the database size is huge, the DBMS can still provide fast access and updating of data.
Easy recovery:
o Since database systems keep backups of data, it is easier to do a full recovery of data in case of a failure. This is very useful for almost all organizations, as the data maintained over time should not be lost during a system crash or failure.
Flexible:
o Database systems are more flexible than file processing systems. DBMS systems are scalable:
o The database size can be increased or decreased based on the amount of storage required.
o It also allows the addition of new tables as well as the removal of existing tables without disturbing the consistency of data.
Disadvantages of DBMS
View of Data in DBMS
Abstraction is one of the main features of database systems.
Hiding irrelevant details from users and providing an abstract view of the data helps in easy and efficient user-database interaction. The top level of the three-level architecture is the "view level". The view level provides the "view of data" to the users and hides irrelevant details such as data relationships, database schema, constraints, security etc. from the user.
To fully understand the view of data, you must have a basic knowledge of data
abstraction and instance & schema.
Three levels of abstraction
Physical level: This is the lowest level of data abstraction. It describes how the data is actually stored in the database. You can get the complex data structure details at this level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.
A schema defines the attributes of the tables in a database; the schema of the employee table, for instance, lists its attributes (an instance of this table is shown in the next subsection). Schema is of three types: physical schema, logical schema and view schema.
Schema represents the logical view of the database. It helps you understand
what data needs to go where.
Schema can be represented by a diagram as shown below.
Schema helps the database users to understand the relationship between
data. This helps in efficiently performing operations on database such as insert,
update, delete, search etc.
In the following diagram, we have a schema that shows the relationship between three
tables: Course, Student and Section. The diagram only shows the design of the
database, it doesn’t show the data present in those tables. Schema is only a structural
view(design) of a database as shown in the diagram below.
The design of a database at physical level is called physical schema, how the data
stored in blocks of storage is described at this level.
Design of database at logical level is called logical schema, programmers and
database administrators work at this level, at this level data can be described as certain
types of data records gets stored in data structures, however the internal details such as
implementation of data structure is hidden at this level (available at physical level).
Design of database at view level is called view schema. This generally
describes end user interaction with database systems.
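As a concrete illustration, a logical schema can be written down as a table definition. The following is a minimal sketch using the employee attributes from the instance shown below; the column types are assumptions for the example.
-- A sketch of the employee schema; column types are assumed
CREATE TABLE employee (
    EMP_ID INT PRIMARY KEY,
    EMP_NAME VARCHAR(40),
    EMP_ADDRESS VARCHAR(100),
    EMP_CONTACT VARCHAR(10)
);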
DBMS Instance
Definition of instance: The data stored in database at a particular moment of
time is called instance of database. Database schema defines the attributes in tables
that belong to a particular database. The value of these attributes at a moment of time
is called the instance of that database.
For example, we have seen the schema of table “employee” above. Let’s see
the table with the data now. At this moment the table contains two rows (records).
This is the current instance of the table "employee", because this is the data that is stored in this table at this particular moment of time.
EMP_NAME   EMP_ID  EMP_ADDRESS  EMP_CONTACT
Chaitanya  101     Noida        95********
Ajeet      102     Delhi        99********
Let’s take another example: Let’s say we have a single table student in the database,
today the table has 100 records, so today the instance of the database has 100 records.
We are going to add another 100 records in this table by tomorrow so the instance of
database tomorrow will have 200 records in table. In short, at a particular moment the
data stored in database is called the instance, this changes over time as and when we
add, delete or update data in the database.
DBMS languages
Database languages are used to read, update and store data in a database. There
are several such languages that can be used for this purpose; one of them is SQL
(Structured Query Language).
Commands such as CREATE, ALTER and DROP define or update the database schema, which is why they come under the Data Definition Language (DDL). The following commands operate on the data itself and form the Data Manipulation Language (DML):
To read records from table(s) – SELECT
To insert record(s) into a table – INSERT
To update the data in table(s) – UPDATE
To delete all the records from a table – DELETE
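For contrast, a brief sketch of the schema-defining (DDL) side, using a hypothetical student table:
-- DDL: defining and changing the schema (illustrative only)
CREATE TABLE student (roll_no INT, name VARCHAR(30));
ALTER TABLE student ADD address VARCHAR(50);
DROP TABLE student;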
2. Two tier architecture
In two-tier architecture, the database system is present on the server machine and the DBMS application is present on the client machine; these two machines are connected to each other through a reliable network as shown in the above diagram.
Whenever the client machine makes a request to access the database present on the server using a query language like SQL, the server performs the request on the database and returns the result back to the client. Application connection interfaces such as JDBC and ODBC are used for the interaction between server and client.
3. Three tier architecture
In three-tier architecture, another layer is present between the client machine and
server machine. In this architecture, the client application doesn’t communicate
directly with the database systems present at the server machine, rather the client
application communicates with server application and the server application internally
communicates with the database system present at the server.
Object based logical Models – These models describe data at the conceptual and view levels.
1. E-R Model
2. Object oriented Model
Record based logical Models – Like Object based model, they also describe data at
the conceptual and view levels. These models specify logical structure of database
with records, fields and attributes.
1. Relational Model
2. Hierarchical Model
3. Network Model – Network Model is same as hierarchical model except that it
has graph-like structure rather than a tree-based structure. Unlike hierarchical
model, this model allows each record to have more than one parent record.
Physical Data Models – These models describe data at the lowest level of abstraction.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The relationship between Student and College is many-to-one, as a college can have many students but a student cannot study in multiple colleges at the same time. The Student entity has attributes such as Stu_Id, Stu_Name and Stu_Addr, and the College entity has attributes such as Col_ID and Col_Name.
Here are the geometric shapes and their meaning in an E-R Diagram. We will
discuss these terms in detail in the next section(Components of a ER Diagram) of this
guide so don’t worry too much about these terms now, just go through them once.
Components of a ER Diagram
1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an ER diagram.
For example: In the following ER diagram we have two entities Student and College
and these two entities have many to one relationship as many students study in a
single college. We will read more about relationships later, for now focus on entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on a relationship with another entity is called a weak entity. A weak entity is represented by a double rectangle. For example, a bank account cannot be uniquely identified without knowing the bank to which the account belongs, so a bank account is a weak entity.
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, a student roll number can uniquely identify a student from a set of students. A key attribute is represented by an oval like other attributes; however, the text of a key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For example, a student's address is a composite attribute, as an address is composed of other attributes such as street, city and pin code.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented by a double oval in an ER diagram. For example, a person can have more than one phone number, so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is represented by a dashed oval in an ER diagram. For example, a person's age is a derived attribute, as it changes over time and can be derived from another attribute (date of birth).
3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
1. One to One Relationship
When a single instance of an entity is associated with a single instance of another entity, it is called a one-to-one relationship. For example, a person has only one passport and a passport is issued to only one person.
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of another entity, it is called a one-to-many relationship. For example, a customer can place many orders, but an order cannot be placed by many customers.
3. Many to One Relationship
When more than one instance of an entity is associated with a single instance of another entity, it is called a many-to-one relationship. For example, many students can study in a single college, but a student cannot study in many colleges at the same time.
4. Many to Many Relationship
When more than one instance of an entity is associated with more than one instance of another entity, it is called a many-to-many relationship. For example, a student can be assigned to many projects and a project can be assigned to many students.
Total participation is represented using a double line between the entity set and relationship set.
ER Diagram for Library
Hospital Management System
Unit II
RELATIONAL MODEL
Relational Model
Relational Model (RM) represents the database as a collection of relations. A
relation is nothing but a table of values. Every row in the table represents a collection of
related data values. These rows in the table denote a real-world entity or relationship.
The table name and column names are helpful to interpret the meaning of values in
each row. The data are represented as a set of relations. In the relational model, data are
stored as tables. However, the physical storage of the data is independent of the way the data
are logically organized.
1. Attribute: Each column in a Table. Attributes are the properties which define a
relation. e.g., Student_Rollno, NAME,etc.
2. Tables – In the relational model, relations are saved in table format. A table is stored along with its entities. A table has two properties: rows and columns. Rows represent records and columns represent attributes.
3. Tuple – It is nothing but a single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation with its
attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: Total number of rows present in the Table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance – Relation instance is a finite set of tuples in the RDBMS system.
Relation instances never have duplicate tuples.
9. Relation key – Each row has one, two or more attributes that identify the row uniquely; this is called the relation key.
10. Attribute domain – Every attribute has some pre-defined value and scope which is
known as attribute domain
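To make these terms concrete, a small sketch (the table and its values are made up for illustration):
-- STUDENT has degree 3 (three attributes) and, after the two
-- INSERTs below, cardinality 2 (two tuples)
CREATE TABLE STUDENT (Student_Rollno INT, NAME VARCHAR(30), CITY VARCHAR(30));
INSERT INTO STUDENT VALUES (1, 'abc', 'Delhi');
INSERT INTO STUDENT VALUES (2, 'pqr', 'Pune');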
Structure of Relational Database
There are many types of integrity constraints in a DBMS. Constraints on a relational database management system are mostly divided into three main categories:
1. Domain Constraints
2. Key Constraints
3. Referential Integrity Constraints
Domain Constraints
Domain constraints are violated if an attribute value does not appear in the corresponding domain or is not of the appropriate data type.
Domain constraints specify that, within each tuple, the value of each attribute must be an atomic value from the domain of that attribute. Domains are specified as data types, which include the standard data types: integers, real numbers, characters, Booleans, variable-length strings, etc.
Example:
The example shown demonstrates creating a domain constraint such that CustomerName is
not NULL
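A minimal sketch of such a constraint in standard SQL (the column list and types are assumptions for illustration):
CREATE TABLE Customer (
    CustomerID INT,
    CustomerName VARCHAR(40) NOT NULL,  -- domain constraint: value required
    Status VARCHAR(10)
);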
Key Constraints
An attribute that can uniquely identify a tuple in a relation is called the key of the table. The
value of the attribute for different tuples in the relation has to be unique.
Example:
In the given table, CustomerID is a key attribute of the Customer table: a single CustomerID identifies exactly one customer, so CustomerID = 1 belongs only to CustomerName = "Google".
CustomerID  CustomerName  Status
1           Google        Active
2           Amazon        Active
3           Apple         Inactive
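A minimal sketch of the key and referential integrity constraints in SQL (the CustomerOrder table is hypothetical, added only to show the foreign key):
CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,      -- key constraint: unique and not null
    CustomerName VARCHAR(40) NOT NULL,
    Status VARCHAR(10)
);
CREATE TABLE CustomerOrder (
    OrderID INT PRIMARY KEY,
    CustomerID INT REFERENCES Customer(CustomerID)  -- referential integrity
);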
Best practices for a relational model:
Each relation should be depicted clearly in the table
Rows should contain data about instances of an entity
Columns must contain data about attributes of the entity
Cells of the table should hold a single value
Each column should be given a unique name
No two rows can be identical
The values of an attribute should be from the same domain
Advantages of the Relational Model
Simplicity: A relational data model in DBMS is simpler than the hierarchical and network models.
Structural Independence: The relational database is only concerned with data and
not with a structure. This can improve the performance of the model.
Easy to use: The Relational model in DBMS is easy as tables consisting of rows and
columns are quite natural and simple to understand
Query capability: It makes possible for a high-level query language like SQL to
avoid complex database navigation.
Data independence: The Structure of Relational database can be changed without
having to change any application.
Scalable: Regarding the number of records (rows) and the number of fields, a relational database can be enlarged to enhance its usability.
Disadvantages of the Relational Model
Few relational databases have limits on field lengths which can't be exceeded.
Relational databases can sometimes become complex as the amount of data grows,
and the relations between pieces of data become more complicated.
Complex relational database systems may lead to isolated databases where the
information cannot be shared from one system to another.
Relational database systems are expected to be equipped with a query language that can assist
its users to query the database instances. There are two kinds of query languages − relational
algebra and relational calculus.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input
and yields instances of relations as output. It uses operators to perform queries. An operator
can be either unary or binary. They accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation and intermediate results are
also considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
We will discuss all these operations in the following sections.
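As a brief sketch of the two unary operations named above (only their use appears below): Select (σ) picks the tuples that satisfy a predicate, and Project (∏) picks columns, eliminating duplicate rows automatically.
σ subject = "database" (Books) – selects the tuples of Books whose subject is "database".
∏ subject, author (Books) – projects Books onto the subject and author columns.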
Union (∪) – For a union operation r ∪ s to be valid:
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or
both.
The result of set difference operation is tuples, which are present in one relation but are not in
the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
The results of relational algebra are also relations, but without any name. The rename operation allows us to rename the output relation. The 'rename' operation is denoted by the lowercase Greek letter rho (ρ).
Notation − ρ x (E)
Where the result of expression E is saved with the name x.
Additional operations are −
Set intersection
Assignment
Natural join
Relational Calculus
Relational calculus exists in two forms −
Tuple Relational Calculus (TRC)
Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfy a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns tuples with 'name' from Author who has written article on 'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output − The above query will yield the same result as the previous one.
Domain Relational Calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as
done in TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
{< article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is
database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators.
The expression power of Tuple Relation Calculus and Domain Relation Calculus is
equivalent to Relational Algebra.
SQL NULL Values
In SQL there may be some records in a table that do not have values or data for every field, because at the time of data entry the information was not available. So SQL supports a special value known as NULL, which is used to represent the values of attributes that may be unknown or may not apply to a tuple. SQL places a NULL value in the field in the absence of a user-defined value. For example, the Apartment_number attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences.
Importance of NULL value:
It is important to understand that a NULL value is different from a zero value.
A NULL value is used to represent a missing value, but that it usually has one of three
different interpretations:
The value unknown (value exists but is not known)
Value not available (exists but is purposely withheld)
Attribute not applicable (undefined for this tuple)
It is often not possible to determine which of the meanings is intended. Hence, SQL
does not distinguish between the different meanings of NULL.
Setting a NULL value is appropriate when the actual value is unknown, or when a value
would not be meaningful.
A NULL value is not equivalent to a value of ZERO if the data type is a number and is
not equivalent to spaces if the data type is character.
A NULL value can be inserted into columns of any data type.
A NULL value evaluates to NULL in any expression.
If any column has a NULL value, then UNIQUE, FOREIGN KEY and CHECK constraints ignore that value.
In general, each NULL value is considered to be different from every other NULL in the
database. When a NULL is involved in a comparison operation, the result is considered to
be UNKNOWN. Hence, SQL uses a three-valued logic with values True, False,
and Unknown. It is, therefore, necessary to define the results of three-valued logical
expressions when the logical connectives AND, OR, and NOT are used.
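For reference, the standard three-valued truth tables can be sketched as follows (T = True, F = False, U = Unknown):
AND  T  F  U        OR   T  F  U        NOT
T    T  F  U        T    T  T  T        T → F
F    F  F  F        F    T  F  U        F → T
U    U  F  U        U    T  U  U        U → U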
Suppose we want to find the Fname and Lname of the employees having no Super_ssn; then the query will be:
Query
SELECT Fname, Lname FROM Employee WHERE Super_ssn IS NULL;
Output:
Whenever one of the modification operations (insert, update, delete) is applied, the integrity constraints specified on the relational database schema must never be violated.
Insert Operation
The insert operation gives values of the attribute for a new tuple which should be inserted
into a relation.
Update Operation
You can see that in the below-given relation table CustomerName= ‘Apple’ is updated from
Inactive to Active.
Delete Operation
To specify deletion, a condition on the attributes of the relation selects the tuple to be deleted.
The Delete operation could violate referential integrity if the tuple which is deleted is
referenced by foreign keys from other tuples in the same database.
Select Operation
Structured Query Language (SQL)
Structured Query Language is a standard Database language which is used to create,
maintain and retrieve the relational database. Following are some interesting facts about
SQL.
SQL is case insensitive. However, it is a recommended practice to use keywords (like SELECT, UPDATE, CREATE, etc.) in capital letters and user-defined things (like table names, column names, etc.) in small letters.
We can write comments in SQL using "--" (double hyphen) at the beginning of any line.
SQL is the programming language for relational databases (explained below) like MySQL, Oracle, Sybase, SQL Server, PostgreSQL, etc. Non-relational databases (also called NoSQL databases) like MongoDB, DynamoDB, etc. do not use SQL.
Although there is an ISO standard for SQL, most implementations vary slightly in syntax. So we may encounter queries that work in SQL Server but do not work in MySQL.
What is Relational Database?
Relational database means the data is stored as well as retrieved in the form of relations
(tables). Table 1 shows the relational database with only one relation
called STUDENT which stores ROLL_NO, NAME, ADDRESS, PHONE and AGE of
students.
STUDENT
TABLE 1
These are some important terminologies that are used in terms of relation.
Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME etc.
Tuple: Each row in the relation is known as tuple. The above relation contains 4 tuples, one
of which is shown as:
1 RAM DELHI 9455123451 18
Degree: The number of attributes in the relation is known as degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.
Column: A column represents the set of values for a particular attribute. The column ROLL_NO below is extracted from the relation STUDENT.
ROLL_NO
1
2
3
4
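The general form of an SQL query can be sketched as follows (clauses in [ ] are optional; the statement numbering is an assumption, taken to follow the clause order):
SELECT [DISTINCT] attribute_list FROM relation_list    -- statement 1 (compulsory)
[WHERE condition]                                      -- statement 2
[GROUP BY attribute_list [HAVING condition]]           -- statement 3
[ORDER BY attribute_list [DESC]];                      -- statement 4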
Part of the query represented by statement 1 is compulsory if you want to retrieve from a
relational database. The statements written inside [] are optional. We will look at the
possible query combination on relation shown in Table 1.
Case 1: If we want to retrieve attributes ROLL_NO and NAME of all students, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT;
ROLL_NO NAME
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
CASE 2: If we want to retrieve ROLL_NO and NAME of the students whose ROLL_NO is greater than 2, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT WHERE ROLL_NO>2;
ROLL_NO NAME
3 SUJIT
4 SURESH
CASE 3: If we want to retrieve all attributes of students, we can write * in place of writing
all attributes as:
SELECT * FROM STUDENT WHERE ROLL_NO>2;
CASE 4: If we want to represent the relation in ascending order by AGE, we can use
ORDER BY clause as:
SELECT * FROM STUDENT ORDER BY AGE;
Note: ORDER BY AGE is equivalent to ORDER BY AGE ASC. If we want to retrieve the
results in descending order of AGE, we can use ORDER BY AGE DESC.
CASE 5: If we want to retrieve distinct values of an attribute or group of attribute,
DISTINCT is used as in:
SELECT DISTINCT ADDRESS FROM STUDENT;
ADDRESS
DELHI
GURGAON
ROHTAK
If DISTINCT is not used, DELHI will be repeated twice in the result set. Before understanding GROUP BY and HAVING, we need to understand aggregation functions in SQL.
AGGREGATION FUNCTIONS: Aggregation functions are used to perform mathematical operations on the data values of a relation. Some of the common aggregation functions used in SQL are:
COUNT: The COUNT function is used to count the number of rows in a relation. e.g.:
SELECT COUNT (PHONE) FROM STUDENT;
COUNT(PHONE)
4
SUM: SUM function is used to add the values of an attribute in a relation. e.g;
SELECT SUM (AGE) FROM STUDENT;
SUM(AGE)
74
In the same way, MIN, MAX and AVG can be used. As we have seen above, all
aggregation functions return only 1 row.
AVERAGE: It gives the average value of the tuples. It is also defined as sum divided by count.
Syntax: AVG(attributename)
OR
Syntax: SUM(attributename)/COUNT(attributename)
The above-mentioned syntax also retrieves the average value of the tuples.
MAXIMUM: It extracts the maximum value among the set of tuples.
Syntax: MAX(attributename)
MINIMUM: It extracts the minimum value among the set of tuples.
Syntax: MIN(attributename)
GROUP BY: GROUP BY is used to group the tuples of a relation based on an attribute or group of attributes. It is always combined with an aggregation function which is computed on each group. e.g.:
Consider the query SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY ADDRESS; here SUM(AGE) will be computed, not for the entire table, but for each address separately, i.e., the sum of AGE for address DELHI (18+18=36), and similarly for the other addresses. The output is:
ADDRESS SUM(AGE)
DELHI 36
GURGAON 18
ROHTAK 20
If we try to execute the query given below, it will result in an error: although we have computed SUM(AGE) for each address, there is more than one ROLL_NO for each address we have grouped, so ROLL_NO can't be displayed in the result set. Whenever we use GROUP BY, every column after SELECT must either appear in the GROUP BY clause or be wrapped in an aggregate function for the result to make sense.
SELECT ROLL_NO, ADDRESS, SUM(AGE) FROM STUDENT
GROUP BY (ADDRESS);
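HAVING, mentioned earlier, filters groups after aggregation; a brief sketch on the same STUDENT relation:
SELECT ADDRESS, SUM(AGE) FROM STUDENT
GROUP BY ADDRESS
HAVING SUM(AGE) > 20;
-- Keeps only the groups whose total AGE exceeds 20; here only DELHI (36) qualifies.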
Advanced SQL
Accessing SQL From a Programming Language
■ API (application-program interface) for a program to interact with a database server
■ Application makes calls to
● Connect with the database server
● Send SQL commands to the database server
● Fetch tuples of result one-by-one into program variables
■ Various tools:
● ODBC (Open Database Connectivity) works with C, C++, C#, and Visual
Basic. Other APIs such as ADO.NET sit on top of ODBC
● JDBC (Java Database Connectivity) works with Java
● Embedded SQL
Integrity Constraints
The set of rules used to maintain the quality of information is known as integrity constraints.
Integrity constraints ensure that data insertion, updating and other processes are performed in such a way that data integrity is not affected.
Integrity constraints can be understood as a guard against unintentional damage to the database.
For any stored data if we want to preserve the consistency and correctness, a relational
DBMS typically imposes one or more data integrity constraints. These constraints restrict the
data values which can be inserted into the database or created by a database update.
There are different types of data integrity constraints that are commonly found in relational
databases, including the following −
Required data − Some columns in a database must contain a valid data value in each row; they are not allowed to contain NULL values. In the sample database, every order has an associated customer who placed the order. The DBMS can be asked to prevent NULL values in this column.
Validity checking − Every column in a database has a domain, a set of data values that are legal for that column. The DBMS can be asked to prevent other data values in these columns.
Entity integrity − The primary key of a table contains a unique value in each row that is different from the values in all other rows. Duplicate values are illegal, because they would not allow the database to differentiate one entity from another. The DBMS can be asked to enforce this unique-values constraint.
Referential integrity − A foreign key in a relational database links each row in the
child table containing the foreign key to the row of the parent table containing the
matching primary key value. The DBMS can be asked to enforce this foreign
key/primary key constraint.
Other data relationships − The real-world situation which is modeled by a database
often has additional constraints which govern the legal data values that may appear in
the database. The DBMS is allowed to check modifications to the tables to make sure
that their values are constrained in this way.
Business rules − Updates to a database that are constrained by business rules
governing the real-world transactions which are represented by the updates.
Consistency − Many real-world transactions cause multiple updates to a database. The DBMS can be asked to enforce this type of consistency rule or to support applications that implement such rules.
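Several of these categories map directly onto SQL constraints; a minimal sketch with an assumed orders table:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,                  -- entity integrity
    quantity INT CHECK (quantity > 0),         -- validity checking
    order_status VARCHAR(10)
        CHECK (order_status IN ('OPEN', 'SHIPPED', 'CLOSED'))  -- a simple business rule
);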
Domain Constraint
The Definition of an applicable set of values is known as domain constraint.
Strings, character, time, integer, currency, date etc. Are examples of the data type of domain
constraints.
Example
Emp_ID  Emp_Name  Salary
11      Manish    30000
12      Vikram    20000
13      Sudhir    10000
        Rajeev    40000
Null is not allowed in Emp_ID as it is a Primary key and cannot have a NULL value.
Key Constraints
An entity within its entity set is identified uniquely by a key of that entity set. There can be a number of keys in an entity set, but only one of them will be the primary key. In a relational table, a primary key must be unique and must not have a NULL value.
Example
100 Naren 4 27
101 Lalit 6 28
102 Shivanshu 3 22
103 Navdeep 5 29
102 Karthik 7 25
All row IDs must be unique; hence the duplicate ID 102 is not allowed.
Database authorization
Authorization is the process where the database manager gets information about the
authenticated user. Part of that information is determining which database operations the user
can perform and which data objects a user can access.
A role is a database object that groups one or more privileges. Roles can be assigned to users
or groups or other roles by using the GRANT statement. Users that are members of roles
have the privileges that are defined for the role with which to access data.
The forms of authorization, such as administrative authority, privileges, and row and column access control (RCAC), are discussed in Authorization of Big SQL objects. In addition, ownership of objects brings with it a degree of authorization on the objects created.
System-level authorization
SYSADM (system administrator) authority
The SYSADM (system administrator) authority provides control over all the resources
created and maintained by the database manager. The system administrator possesses all the
authorities of SYSCTRL, SYSMAINT, and SYSMON authority. The user who has
SYSADM authority is responsible both for controlling the database manager, and for
ensuring the safety and integrity of the data.
SYSCTRL authority
The SYSCTRL authority provides control over operations that affect system resources. For
example, a user with SYSCTRL authority can create, update, start, stop, or drop a database.
This user can also start or stop an instance, but cannot access table data. Users with
SYSCTRL authority also have SYSMON authority.
SYSMAINT authority
The SYSMAINT authority provides the authority required to perform maintenance
operations on all databases that are associated with an instance. A user with SYSMAINT
authority can update the database configuration, backup a database or table space, restore
an existing database, and monitor a database. Like SYSCTRL, SYSMAINT does not
provide access to table data. Users with SYSMAINT authority also have SYSMON
authority.
SYSMON (system monitor) authority
The SYSMON (system monitor) authority provides the authority required to use the
database system monitor.
Database-level authorization
DBADM (database administrator)
The DBADM authority level provides administrative authority over a single database. This
database administrator possesses the privileges required to create objects and issue database
commands. The DBADM authority can be granted only by a user with SECADM authority.
The DBADM authority cannot be granted to PUBLIC.
SECADM (security administrator)
The SECADM authority level provides administrative authority for security over a single
database. The security administrator authority possesses the ability to manage database
security objects (database roles, audit policies, trusted contexts, security label components,
and security labels) and grant and revoke all database privileges and authorities. A user
with SECADM authority can transfer the ownership of objects that they do not own. They
can also use the AUDIT statement to associate an audit policy with a particular database or
database object at the server. The SECADM authority has no inherent privilege to access
data stored in tables. It can only be granted by a user with SECADM authority. The
SECADM authority cannot be granted to PUBLIC.
SQLADM (SQL administrator)
The SQLADM authority level provides administrative authority to monitor and tune SQL
statements within a single database. It can be granted by a user with ACCESSCTRL or
SECADM authority.
WLMADM (workload management administrator)
The WLMADM authority provides administrative authority to manage workload
management objects, such as service classes, work action sets, work class sets, and
workloads. It can be granted by a user with ACCESSCTRL or SECADM
authority.
EXPLAIN (explain authority)
The EXPLAIN authority level provides administrative authority to explain query plans
without gaining access to data. It can only be granted by a user with ACCESSCTRL or
SECADM authority.
ACCESSCTRL (access control authority)
ACCESSCTRL authority can only be granted by a user with SECADM authority. The
ACCESSCTRL authority cannot be granted to PUBLIC. The ACCESSCTRL authority
level provides administrative authority to issue the following GRANT (and REVOKE)
statements:
LOAD authority
SELECT, INSERT, UPDATE, DELETE privilege on tables, views, nicknames, and materialized query tables
EXECUTE privilege on packages
EXECUTE privilege on modules
EXECUTE privilege on routines, except on the audit routines
USAGE privilege on all sequences
For more information about granting and revoking privileges, see Granting and revoking access.
Privileges
CONTROL privilege
If you possess the CONTROL privilege on an object, you can access that database object, and grant and revoke privileges to or from other users on that object. The CONTROL privilege only applies to tables, views, nicknames, indexes, and packages.
If a different user requires the CONTROL privilege to that object, a user with SECADM or
ACCESSCTRL authority can grant the CONTROL privilege to that object. The CONTROL
privilege cannot be revoked from the object owner, however, the object owner can be
changed by using the TRANSFER OWNERSHIP statement.
Individual privileges
Individual privileges can be granted to allow a user to carry out specific tasks on specific
objects. Users with the administrative authorities ACCESSCTRL or SECADM, or with the
CONTROL privilege, can grant and revoke privileges to and from users.
Revoking privileges
The REVOKE statement is used to revoke previously granted privileges. The revoking of a
privilege from an authorization name revokes the privilege granted by all authorization
names.
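As a brief sketch of granting and revoking an individual privilege (the user name alice and table employee are hypothetical; the syntax follows the Db2 GRANT/REVOKE form):
GRANT SELECT, UPDATE ON TABLE employee TO USER alice;
REVOKE UPDATE ON TABLE employee FROM USER alice;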
Authorization ID privileges: SETSESSIONUSER
Authorization ID privileges involve actions on authorization IDs. There is currently only one such privilege: the SETSESSIONUSER privilege.
Schema privileges
Schema privileges are in the object privilege category.
Table and view privileges
Table and view privileges involve actions on tables or views in a database.
Package privileges
A package is a database object that contains the information needed by the database
manager to access data in the most efficient way for a particular application program.
Package privileges enable a user to create and manipulate packages.
Sequence privileges
The creator of a sequence automatically receives the USAGE and ALTER privileges on the
sequence. The USAGE privilege is needed to use NEXT VALUE and PREVIOUS VALUE
expressions for the sequence.
Routine privileges
Execute privileges involve actions on all types of routines such as functions, procedures,
and methods within a database. Once having EXECUTE privilege, a user can then invoke
that routine, create a function that is sourced from that routine (applies to functions only),
and reference the routine in any DDL statement such as CREATE VIEW or CREATE
TRIGGER.
Usage privilege on workloads
To enable use of a workload, a user who holds ACCESSCTRL, SECADM, or WLMADM
authority can grant USAGE privilege on that workload to a user, a group, or a role using
the GRANT USAGE ON WORKLOAD statement.
Embedded SQL applications connect to databases and execute embedded SQL statements.
The embedded SQL statements are contained in a package that must be bound to the target
database server.
You can develop embedded SQL applications for the Db2® database in the following host
programming languages: C, C++, and COBOL.
Building embedded SQL applications involves two prerequisite steps before application
compilation and linking.
The PREP (PRECOMPILE) command is used to invoke the Db2 precompiler, which
reads your source code, parses and converts the embedded SQL statements
to Db2 run-time services API calls, and finally writes the output to a new modified
source file. The precompiler produces access plans for the SQL statements, which are
stored together as a package within the database.
Once you have precompiled and bound your embedded SQL application, it is ready to be
compiled and linked using the host language-specific development tools.
To aid in the development of embedded SQL applications, you can refer to the embedded SQL template in C. Examples of working embedded SQL sample applications can also be found in the %DB2PATH%\SQLLIB\samples directory.
Note: %DB2PATH% refers to the Db2 installation directory
Static and dynamic SQL
Supported development software for embedded SQL applications
Before you begin writing embedded SQL applications, you must determine if your
development software is supported. The operating system that you are developing for
determines which compilers, interpreters, and development software you must use.
Setting up the embedded SQL development environment
Before you can start building embedded SQL applications, install the supported
compiler for the host language you will be using to develop your applications and set
up the embedded SQL environment.
Designing embedded SQL applications
When designing embedded SQL applications you must use static or dynamic executed
SQL statements.
Programming embedded SQL applications
Programming embedded SQL applications involves the same steps required to
assemble an application in your host programming language.
Building embedded SQL applications
After you have created the source code for your embedded SQL application, you must
follow additional steps to build the application. You should consider building 64-bit
executable files when developing new embedded SQL database applications. Along
with compiling and linking your program, you must precompile and bind it.
Deploying and running embedded SQL applications
Embedded SQL applications are portable and can be placed in remote database
components. You can compile the application in one location and run the package on
a different component.
Compatibility features for migration
The Db2 database manager provides features that facilitate the migration of embedded
SQL C applications from other database systems.
Dynamic SQL
Dynamic SQL enables you to write programs that reference SQL statements whose full text is
not known until runtime. Before discussing dynamic SQL in detail, a clear definition of static
SQL may provide a good starting point for understanding dynamic SQL. Static SQL
statements do not change from execution to execution. The full text of static SQL statements
are known at compilation, which provides the following benefits:
Successful compilation verifies that the SQL statements reference valid database
objects.
Successful compilation verifies that the necessary privileges are in place to access the
database objects.
Performance of static SQL is generally better than dynamic SQL.
Because of these advantages, you should use dynamic SQL only if you cannot use static SQL
to accomplish your goals, or if using static SQL is cumbersome compared to dynamic SQL.
However, static SQL has limitations that can be overcome with dynamic SQL. You may not
always know the full text of the SQL statements that must be executed in a PL/SQL
procedure. Your program may accept user input that defines the SQL statements to execute,
or your program may need to complete some processing work to determine the correct course
of action. In such cases, you should use dynamic SQL.
For example, consider a reporting application that performs standard queries on tables in a
data warehouse environment where the exact table name is unknown until runtime. To
accommodate the large amount of data in the data warehouse efficiently, you create a new
table every quarter to store the invoice information for the quarter. These tables all have
exactly the same definition and are named according to the starting month and year of the
quarter, for
example INV_01_1997, INV_04_1997, INV_07_1997, INV_10_1997, INV_01_1998, etc. In
such a case, you can use dynamic SQL in your reporting application to specify the table name
at runtime.
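A minimal PL/SQL sketch of this pattern, assuming the Oracle context of this passage (the procedure name, the amount column, and the parameters are made up for illustration):
-- The table name is assembled at runtime, so static SQL cannot be used here
CREATE OR REPLACE PROCEDURE report_invoices(p_month IN VARCHAR2, p_year IN VARCHAR2) AS
    v_total NUMBER;
BEGIN
    EXECUTE IMMEDIATE
        'SELECT SUM(amount) FROM INV_' || p_month || '_' || p_year
        INTO v_total;
    DBMS_OUTPUT.PUT_LINE('Quarter total: ' || v_total);
END;
/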
With static SQL, all of the data definition information, such as table definitions, referenced
by the SQL statements in your program must be known at compilation. If the data definition
changes, you must change and recompile the program. Dynamic SQL programs can handle
changes in data definition information, because the SQL statements can change "on the fly"
at runtime. Therefore, dynamic SQL is much more flexible than static SQL. Dynamic SQL
enables you to write application code that is reusable because the code defines a process that
is independent of the specific SQL statements used.
In addition, dynamic SQL lets you execute SQL statements that are not supported in static
SQL programs, such as data definition language (DDL) statements. Support for these
statements allows you to accomplish more with your PL/SQL programs.
Table-1: Customer
Customer name  Street  City
Saurabh        A7      Patiala
Mehak          B6      Jalandhar
Sumiti         D9      Ludhiana
Ria            A5      Patiala
Table-2: Branch
Branch name  City
ABC          Patiala
DEF          Ludhiana
GHI          Jalandhar
Table-3: Account
Table-4: Loan
Table-5: Borrower
Customer name  Loan number
Saurabh        L33
Mehak          L49
Ria            L98
Table-6: Depositor
Customer name  Account number
Saurabh        1111
Mehak          1113
Sumiti         1114
Queries-1: Find the loan number, branch, amount of loans of greater than or equal to 10000
amount.
{t| t ∈ loan ∧ t[amount]>=10000}
Resulting relation:
Loan number
L33
L35
Loan number
L98
Queries-3: Find the names of all customers who have a loan and an account at the bank.
{t | ∃ s ∈ borrower( t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor( t[customer-name] = u[customer-name])}
Resulting relation:
Customer name
Saurabh
Mehak
Queries-4: Find the names of all customers having a loan at the “ABC” branch.
{t | ∃ s ∈ borrower(t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan(u[branch-name] = “ABC” ∧ u[loan-number] = s[loan-number]))}
Resulting relation:
Customer name
Saurabh
Table-1: Customer
Customer name  Street  City
Table-2: Loan
Loan number  Branch name  Amount
L01          Main         200
L03          Main         150
L10          Sub          90
L08          Main         60
Table-3: Borrower
Customer name Loan number
Ritu L01
Debomit L08
Soumya L03
Query-1: Find the loan number, branch, amount of loans of greater than or equal to 100
amount.
{≺l, b, a≻ | ≺l, b, a≻ ∈ loan ∧ (a ≥ 100)}
Resulting relation:
Loan number  Branch name  Amount
L01          Main         200
L03          Main         150
Query-2: Find the loan number for each loan of an amount greater or equal to 150.
{≺l≻ | ∃ b, a (≺l, b, a≻ ∈ loan ∧ (a ≥ 150))}
Resulting relation:
Loan number
L01
L03
Query-3: Find the names of all customers having a loan at the “Main” branch and find the
loan amount .
{≺c, a≻ | ∃ l (≺c, l≻ ∈ borrower ∧ ∃ b (≺l, b, a≻ ∈ loan ∧ (b = “Main”)))}
Resulting relation:
Customer name  Amount
Ritu           200
Debomit        60
Soumya         150
The normal queries we fire on a database should be correct and well-defined, which means they should follow a proper syntax. If the syntax or the query is wrong, we get an error, and because of that our application or calculation stops. To overcome this problem, QBE was introduced. QBE stands for Query By Example, and it was developed in 1970 by Moshe Zloof at IBM.
It is a graphical query language where we get a user interface and then we fill in some required fields to get our desired result.
In SQL we get an error if the query is not correct, but in the case of QBE, if the query is wrong, either we get a wrong answer or the query will not execute; we never get an error.
Note: In QBE we don't write complete queries like in SQL or other database languages. It comes with some blanks, and we just need to fill in those blanks to get our required result.
Example
Consider an example where a table 'SAC' is present in the database with Name, Phone_Number and Branch fields, and we want to get the name of the SAC representative who belongs to the MCA branch. In SQL we would write:
SELECT NAME
FROM SAC
WHERE BRANCH = 'MCA';
And we will certainly get the correct result. In the case of QBE, there is a field present on a form; we just need to fill it with "MCA" and then click on the SEARCH button to get our required result.
Points about QBE:
SQL Trigger
Trigger: A trigger is a stored procedure in a database which is automatically invoked whenever a special event occurs in the database. For example, a trigger can be invoked when a row is inserted into a specified table or when certain table columns are updated.
Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of syntax:
1. create trigger [trigger_name]: Creates or replaces an existing trigger with the
trigger_name.
2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. on [table_name]: This specifies the name of the table associated with the trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for
each row being affected.
6. [trigger_body]: This provides the operation to be performed when the trigger is fired.
BEFORE triggers run the trigger action before the triggering statement is run.
AFTER triggers run the trigger action after the triggering statement is run.
Example:
Given a Student Report database in which students' marks assessments are recorded, create a trigger so that the total and the percentage of the specified marks are automatically inserted whenever a record is inserted.
Here, as the trigger should run before the record is inserted, the BEFORE tag is used.
Suppose the database Schema –
mysql> desc Student;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| tid   | int(4)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(30) | YES  |     | NULL    |                |
| subj1 | int(2)      | YES  |     | NULL    |                |
| subj2 | int(2)      | YES  |     | NULL    |                |
| subj3 | int(2)      | YES  |     | NULL    |                |
| total | int(3)      | YES  |     | NULL    |                |
| per   | int(3)      | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
SQL trigger for the problem statement (in a MySQL row-level trigger, the row being inserted is referenced through NEW, not through the table name):
create trigger stud_marks
before INSERT
on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3, NEW.per = NEW.total * 60 / 100;
The above SQL statement creates a trigger in the student database: whenever subject marks are entered, before inserting the data into the database, the trigger computes the two derived values and inserts them along with the entered values, i.e.,
mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)
mysql> select * from Student;
+-----+-------+-------+-------+-------+-------+------+
| tid | name  | subj1 | subj2 | subj3 | total | per  |
+-----+-------+-------+-------+-------+-------+------+
| 100 | ABCDE |    20 |    20 |    20 |    60 |   36 |
+-----+-------+-------+-------+-------+-------+------+
1 row in set (0.00 sec)
Unit 3
Database Design
Functional Dependency
A functional dependency X → Y holds in a relation if, whenever two tuples have the same value of attribute X, they also have the same value of attribute Y. Consider the relation below:
roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2
From the above table we can conclude some valid functional dependencies:
roll_no → {name, dept_name, dept_building} — Here, roll_no can determine the values of the fields name, dept_name and dept_building; hence it is a valid functional dependency.
roll_no → dept_name — Since roll_no can determine the whole set {name, dept_name, dept_building}, it can determine its subset dept_name also.
dept_name → dept_building — dept_name can identify the dept_building accurately, since departments with different dept_names will also have different dept_buildings.
More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name — Students with the same name can have different dept_names; hence this is not a valid functional dependency.
dept_building → dept_name — There can be multiple departments in the same building. For example, in the above table, departments ME and EC are both in building B2; hence dept_building → dept_name is an invalid functional dependency.
More invalid functional dependencies: name → roll_no, {name, dept_name} →
roll_no, dept_building → roll_no, etc.
1. Trivial functional dependency – A → B is a trivial functional dependency if B is a subset of A. For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional dependencies is in F+ (the closure of the set of functional dependencies):
R1 ∩ R2 → R1
or
R1 ∩ R2 → R2
Normal Forms in DBMS
Normalization is the process of minimizing redundancy from a relation or set of
relations. Redundancy in relation may cause insertion, deletion, and update anomalies. So,
it helps to minimize the redundancy in relations. Normal forms are used to eliminate or
reduce redundancy in database tables.
First Normal Form (1NF) – A relation is in 1NF if every attribute contains only atomic (indivisible) values, i.e., there are no multi-valued attributes. Example –
ID  Name  Courses
1   A     c1, c2
2   E     c3
3   M     c2, c3
In the above table, Courses is a multi-valued attribute, so the relation is not in 1NF.
The table below is in 1NF, as there is no multi-valued attribute:
ID Name Course
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
Second Normal Form (2NF) – To be in second normal form, a relation must be in first normal form and must not contain any partial dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute which is not part of any candidate key) is dependent on any proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.
Example 1 – Consider table-3 as following below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
(Note that many courses have the same course fee.)
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO};
but COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. A non-prime attribute depending on a proper subset of the candidate key is a partial dependency, so this relation is not in 2NF.
To convert the above relation to 2NF,
we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO  COURSE_NO
1        C1
2        C2
1        C4
4        C3
4        C1
2        C5

Table 2
COURSE_NO  COURSE_FEE
C1         1000
C2         1500
C3         1000
C4         2000
C5         2000
NOTE: 2NF reduces the redundant data being stored. For instance, if there are 100 students taking course C1, we don't need to store its fee of 1000 for all 100 records; instead, we store it once in the second table as the course fee for C1.
Example 2 – Consider following functional dependencies in relation R (A, B , C, D )
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency,
i.e., any proper subset of AB doesn’t determine any non-prime attribute.
Third Normal Form (3NF) – A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on any candidate key, i.e., for every FD X -> Y, either X is a super key or Y is a prime attribute.
Transitive dependency – If A -> B and B -> C are two FDs, then A -> C is called a transitive dependency.
Example 1 – In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY hold. So STUD_COUNTRY is transitively dependent on STUD_NO. This violates the third normal form. To convert the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) to third normal form, we decompose it as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
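A possible SQL sketch of this 3NF decomposition (the column types are assumed for illustration):

CREATE TABLE STUDENT (
    STUD_NO    INT PRIMARY KEY,
    STUD_NAME  VARCHAR(30),
    STUD_PHONE VARCHAR(15),
    STUD_STATE VARCHAR(30),
    STUD_AGE   INT
);

CREATE TABLE STATE_COUNTRY (
    STATE   VARCHAR(30) PRIMARY KEY,
    COUNTRY VARCHAR(30)
);

-- STUD_COUNTRY is no longer stored once per student; it is recovered by a join:
SELECT s.STUD_NO, s.STUD_NAME, c.COUNTRY
FROM STUDENT s
JOIN STATE_COUNTRY c ON s.STUD_STATE = c.STATE;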
Example 2 – Consider relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys of the above relation are {A, E, CD, BC}. Since every attribute appears in some candidate key, all attributes are prime, and the relation is therefore in 3NF.
A relation R is in BCNF if R is in third normal form and, for every FD, the LHS is a super key. Equivalently, a relation is in BCNF iff in every non-trivial functional dependency X -> Y, X is a super key.
Example 1 – Find the highest normal form of a relation R(A,B,C,D,E) with FD
set as {BC->D, AC->BE, B->E}
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but no proper subset of AC can determine all attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the relation, so {AC} is the only candidate key.
Step 2. Prime attributes are those attributes that are part of candidate key {A, C}
in this example and others will be non-prime {B, D, E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attribute.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC
is not a proper subset of candidate key AC) and AC->BE is in 2nd normal form
(AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset
of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a prime attribute); to satisfy 3rd normal form, either the LHS of an FD should be a super key or the RHS should be a prime attribute. So the highest normal form of relation R is 2nd normal form.
Key Points –
BCNF is free from redundancy.
If a relation is in BCNF, then 3NF is also satisfied.
If all attributes of relation are prime attribute, then the relation is always in 3NF.
A relation in a Relational Database is always and at least in 1NF form.
Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
If a Relation has only singleton candidate keys( i.e. every candidate key consists of
only 1 attribute), then the Relation is always in 2NF( because no Partial functional
dependency possible).
Sometimes going for BCNF form may not preserve functional dependency. In that
case go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
There are many more Normal forms that exist after BCNF, like 4NF and more. But
in real world database systems it’s generally not required to go beyond BCNF.
Exercise 1: Find the highest normal form in R (A, B, C, D, E) under following functional
dependencies.
ABC --> D
CD --> AE
Important points for solving the above type of question:
1) It is always a good idea to start checking from BCNF, then 3NF, and so on.
2) If a functional dependency satisfies a normal form, there is no need to check it for lower normal forms. For example, ABC -> D is in BCNF (note that ABC is a super key), so there is no need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super
key so this dependency is not in BCNF. So, R is not in BCNF.
3NF: ABC -> D we don’t need to check for this dependency as it already
satisfied BCNF. Let us consider CD -> AE. Since E is not a prime attribute, so the
relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper
subset of a candidate key and it determines E, which is non-prime attribute. So,
given relation is also not in 2 NF. So, the highest normal form is 1 NF.
X ->-> Y (read as "X multidetermines Y") relates one value of X to many values of Y. A non-trivial MVD occurs when X ->-> Y and X ->-> Z hold, where Y and Z are independent of each other. Non-trivial MVDs produce redundancy.
We use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies.
2. To specify constraints on the set of legal relations; we concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.
Multivalued Dependencies
Even a relation in Boyce-Codd Normal Form may still contain multivalued dependencies; the fourth normal form removes them.
Explanation − A multivalued dependency exists when, for a single value of one attribute, multiple independent values of other attributes occur in the relation.
Let us consider an example. Consider the following table −
id   department   shift
1    coding       day
2    Hr           day
2    Network      night
In the above table, id 2 has two departments (Hr and Network) and two shift timings (day and night). Since department and shift are independent of each other, selecting the details of id 2 requires every combination to be represented:
id   department   shift
2    Hr           day
2    Network      night
2    Hr           night
2    Network      day
This means there exist multivalued dependencies; there is no direct relationship between department and shift.
This can be rectified by removing the multivalued dependency, splitting the data into two tables as below −
Table 1
id   department
1    coding
2    Hr
2    Network

Table 2
id   shift
1    day
2    day
2    night
The fourth normal form is applied to remove such multivalued dependencies from a data table.
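As a sketch, the two tables above could be declared as follows (the table names and column types are illustrative assumptions):

CREATE TABLE emp_department (
    id         INT,
    department VARCHAR(20)
);

CREATE TABLE emp_shift (
    id    INT,
    shift VARCHAR(10)
);

-- Each independent fact (id-department, id-shift) is stored separately,
-- so no spurious (department, shift) combinations are implied.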
If two or more independent relation are kept in a single relation or we can say multivalue
dependency occurs when the presence of one or more rows in a table implies the presence
of one or more other rows in that same table. Put another way, two attributes (or columns)
in a table are independent of one another, but both depend on a third attribute.
For a dependency A -> B, if for a single value of A multiple values of B exist, then the table may have a multi-valued dependency. The table should have at least three attributes, and for A ->-> B to be a multivalued dependency, B and the remaining attributes (say C) should be independent of each other. For example,
Person->-> mobile,
Person ->-> food_likes
This is read as “person multidetermines mobile” and “person multidetermines food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a
functional dependency X -> Y, every x determines exactly one y, never more than one.
Fourth normal form (4NF) is a level of database normalization where there are no non-
trivial multivalued dependencies other than a candidate key. It builds on the first three
normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states
that, in addition to a database meeting the requirements of BCNF, it must not contain more
than one multivalued dependency.
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in Boyce-Codd Normal Form (BCNF).
2. The table should not have any multi-valued dependency.
Example – consider a student relation and a course relation that are independent of each other:
SID  SNAME
S1   A
S2   B

CID  CNAME
C1   C
C2   D

Storing them in a single relation forces every combination to appear:
SID  SNAME  CID  CNAME
S1   A      C1   C
S1   A      C2   D
S2   B      C1   C
S2   B      C2   D
Example – The following relations illustrate multivalued dependencies (MVDs):
Table – R1
Company Product
C1 pendrive
C1 mic
C2 speaker
Company->->Product
Table – R2
Agent Company
Aman C1
Aman C2
Agent Company
Mohan C1
Agent->->Company
Table – R3
Agent Product
Aman pendrive
Aman mic
Aman speaker
Mohan speaker
Agent->->Product
Table – R1 ⋈ R2 ⋈ R3
Company  Product   Agent
C1       pendrive  Aman
C1       mic       Aman
C2       speaker   Aman
C1       speaker   Aman
Properties – A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF (or the highest normal form below 5NF).
2. R cannot be further non-loss decomposed, i.e., it has no join dependency beyond its candidate keys.
Example – Consider the above schema, with a case as “if a company makes a product and
an agent is an agent for that company, then he always sells that product for the company”.
Under these circumstances, the ACP table is shown as:
Table – ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into three relations, R1, R2 and R3. The natural join of all three relations is shown below:
Table – R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table – R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table – R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the natural join of R1 and R3 over 'Company', followed by the natural join of R13 and R2 over 'Agent' and 'Product', is the table ACP.
Hence, in this example, all the redundancies are eliminated and the decomposition of ACP is a lossless-join decomposition. The relation is therefore in 5NF, as it does not violate the property of lossless join.
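That reconstruction can also be written as a SQL query; the sketch below assumes R1(Agent, Company), R2(Agent, Product) and R3(Company, Product) exist as tables with those columns:

SELECT r13.Agent, r13.Company, r13.Product
FROM (
    -- first join R1 and R3 over Company
    SELECT r1.Agent, r1.Company, r3.Product
    FROM R1 r1
    JOIN R3 r3 ON r1.Company = r3.Company
) AS r13
-- then join the intermediate result with R2 over Agent and Product
JOIN R2 r2
  ON r2.Agent = r13.Agent AND r2.Product = r13.Product;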
Unit 4
TRANSACTIONS
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account. This small transaction consists of several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
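In SQL, the whole transfer would typically be wrapped in a single transaction, so that either both updates happen or neither does. The sketch below assumes a hypothetical account(holder, balance) table:

-- All-or-nothing transfer of Rs 800 from X to Y
START TRANSACTION;
UPDATE account SET balance = balance - 800 WHERE holder = 'X';
UPDATE account SET balance = balance + 800 WHERE holder = 'Y';
COMMIT;   -- on any error, issue ROLLBACK instead, leaving both balances untouched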
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take an example to debit transaction from an account which consists of following
operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain
3500.
o The third operation will write the buffer's value to the database. So X's final value will
be 3500.
But because of a hardware, software, or power failure, the transaction may fail before it finishes all the operations in the set. For example, if in the above transaction the debit transaction fails after executing operation 2, X's value will remain 4000 in the database, which is not acceptable by the bank.
TRANSACTION RECOVERY
UNDO and REDO are lists of transactions, initialized from the log as follows:
UNDO = all transactions running at the last checkpoint
REDO = empty
For each entry in the log, starting at the last checkpoint:
  If a BEGIN TRANSACTION entry is found for T, add T to UNDO.
  If a COMMIT entry is found for T, move T from UNDO to REDO.
ACID PROPERTIES
A transaction is a very small unit of a program, and it may contain several low-level tasks.
A transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as ACID properties − in order to ensure accuracy,
completeness, and data integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in the database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after its execution/abortion/failure.
Consistency − The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database. If
the database was in a consistent state before the execution of a transaction, it must
remain consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even
if the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data.
If a transaction commits but the system fails before the data could be written on to the
disk, then that data will be updated once the system springs back into action.
Isolation − In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the transactions
will be carried out and executed as if it is the only transaction in the system. No
transaction will affect the existence of any other transaction.
SYSTEM RECOVERY
Any transaction that was running at the time of failure needs to be undone and restarted, and any transaction that committed since the last checkpoint needs to be redone. With reference to the standard checkpoint diagram of five transaction types:
• Transactions of type T1 (completed before the checkpoint) need no recovery.
• Transactions of type T3 or T5 (still active at the failure) need to be undone and restarted.
• Transactions of type T2 or T4 (committed after the checkpoint) need to be redone.
Media Failures
• System failures are not too severe: only information since the last checkpoint is affected, and this can be recovered from the transaction log.
• Media failures (disk crashes etc.) are more serious: the data stored on disk is damaged, and the transaction log itself may be damaged.
Recovery from Media Failure
• Restore the database from the last backup.
• Use the transaction log to redo any changes made since the last backup.
• If the transaction log is damaged, step 2 cannot be performed.
• Store the log on a separate physical device from the database; the risk of losing both is then reduced.
MEDIA RECOVERY
If you restore the archived redo log files and data files, then you must perform media
recovery before you can open the database. Any database transactions in the archived redo
log files not reflected in the data files are applied to the data files, bringing them to a
transaction-consistent state before the database is opened.
Media recovery requires a control file, data files (typically restored from backup), and
online and archived redo log files containing changes since the time the data files were
backed up. Media recovery is most often used to recover from media failure, such as the loss
of a file or disk, or a user error, such as the deletion of the contents of a table.
RMAN enables you to perform both a complete and a point-in-time recovery of your
database. However, this documentation focuses on complete recovery.
TWO-PHASE COMMIT
The two-phase commit protocol is a distributed commit protocol. In a local database system, the transaction manager alone decides whether to commit a transaction. In a distributed system, however, the transaction manager must gather the commit decision from the servers at all the sites participating in the transaction. When each server completes its processing at its site, the transaction there reaches a partially committed state and waits until all the other participating transactions reach that state. Once all of them are partially committed, the coordinating transaction manager commits the transaction; all the sites must commit, otherwise the transaction is aborted everywhere.
SAVE POINTS
A savepoint is a point within a transaction that can be "rolled back to" without affecting any work done in the transaction before the savepoint was created.
SAVEPOINT command
SAVEPOINT command is used to temporarily save a transaction so that you can rollback to
that point whenever required.
SAVEPOINT savepoint_name;
In short, using this command we can name the different states of our data in any table and then roll back to that state using the ROLLBACK command whenever required.
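A minimal sketch of the command in use (the account table and the savepoint name after_debit are hypothetical):

START TRANSACTION;
UPDATE account SET balance = balance - 500 WHERE id = 1;
SAVEPOINT after_debit;              -- name the current state of the transaction
UPDATE account SET balance = balance + 500 WHERE id = 2;
ROLLBACK TO SAVEPOINT after_debit;  -- undoes only the second update
COMMIT;                             -- the debit of account 1 becomes permanent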
Savepoint names must be distinct within a given transaction. If you create a second savepoint
with the same identifier as an earlier savepoint, then the earlier savepoint is erased. After a
savepoint has been created, you can either continue processing, commit your work, roll back
the entire transaction, or roll back to the savepoint.
Example
Creating savepoints: to update the salary for Banda and Greene in the sample table hr.employees, check that the total department salary does not exceed 314,000, and then re-enter the salary for Greene:
UPDATE employees SET salary = 7000 WHERE last_name = 'Banda';
SAVEPOINT banda_sal;
UPDATE employees SET salary = 12000 WHERE last_name = 'Greene';
SAVEPOINT greene_sal;
SELECT SUM(salary) FROM employees;
ROLLBACK TO SAVEPOINT banda_sal;
UPDATE employees SET salary = 11000 WHERE last_name = 'Greene';
COMMIT;
Recovery Facilities
The checkpoint facility enables in-progress updates to the database to be made permanent and secure. The recovery manager allows the database system to restore the database to a reliable and steady state after any failure occurs.
In addition to the advanced facilities noted above, SQL is rich in the type of ease-of-use capabilities that are necessary to support relational databases from the simple to the complex.
Table Facility: First and foremost, SQL provides a table facility that enables a prompted, intuitive interface for defining databases, populating databases with rows, and manipulating databases.
Table Editor: SQL also provides a table editor that makes it easy to access, insert, update and delete rows in table data that is structured in row-and-column format.
Query Facility: With the query facility, SQL permits you to interactively define queries and have the results displayed in a variety of report formats, including tabular, matrix and free format. Readers with a System i5 background will notice that SQL brings with it its own naming scheme, significantly different from corresponding native objects.
CONCURRENCY
Database concurrency is the ability of a database to allow multiple users to affect
multiple transactions. This is one of the main properties that separates a database from other
forms of data storage, like spreadsheets.
The ability to offer concurrency is unique to databases. Spreadsheets or other flat file
means of storage are often compared to databases, but they differ in this one important
regard.
Spreadsheets cannot offer several users the ability to view and work on the different
data in the same file, because once the first user opens the file it is locked to other users.
Other users can read the file, but may not edit data.
LOCKING PROTOCOLS
In lock-based protocols in DBMS, there are two modes for locking and unlocking data items: Shared Lock (lock-S) and Exclusive Lock (lock-X). Let's go through the two types of locks in detail:
Shared Lock
Shared Locks, which are often denoted as lock-S(), are defined as locks that provide
Read-Only access to the information associated with them. Whenever a shared lock is
used on a database, it can be read by several users, but these users who are reading the
information or the data items will not have the permission to edit it or make any
changes to the data items.
To put it another way, shared locks do not provide write access. Because numerous users can read the data items simultaneously, multiple shared locks can be placed on a data item at the same time, provided no other (exclusive) lock is connected with it.
A shared lock, also known as a read lock, is solely used to read data objects. Read
integrity is supported via shared locks.
Shared locks can also be used to prevent records from being updated.
S-lock is requested via the Lock-S instruction.
Exclusive Lock
Exclusive Lock allows the data item to be both read and written. An exclusive lock can be held by only one transaction at a time on a given data item. To obtain an X-lock, the user needs to make use of the lock-X instruction. After finishing the 'write' step, the transaction can unlock the data item.
By imposing an X lock on a transaction that needs to update a person's account
balance, for example, you can allow it to proceed. As a result of the exclusive lock,
the second transaction is unable to read or write.
The other name for an exclusive lock is write lock.
At any given time, the exclusive locks can only be owned by one transaction.
Example of an exclusive lock: Consider the instance where the value of a data item X is equal to 50 and a transaction requires a deduction of 20 from the data item X. We can make this possible by putting an X lock on the transaction. As a result of the exclusive lock, no other transaction can read or write X until it is released.
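In MySQL/InnoDB, for example, the two lock modes can be requested explicitly; the account table below is a hypothetical example:

-- Shared (read) lock: other sessions may also read-lock the row, but cannot modify it
START TRANSACTION;
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;
COMMIT;

-- Exclusive (write) lock: until COMMIT, no other session can lock or modify the row
START TRANSACTION;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
UPDATE account SET balance = balance - 20 WHERE id = 1;
COMMIT;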
There are basically four lock-based protocols in DBMS, namely the Simplistic Lock Protocol, the Pre-claiming Lock Protocol, the Two-Phase Locking Protocol, and the Strict Two-Phase Locking Protocol. The Two-Phase Locking (2-PL) Protocol divides the execution of a transaction into two phases:
1. Growing Phase: In this phase, we can acquire new locks on data items but none of
these locks can be released.
2. Shrinking Phase: In this phase, the existing locks can be released but no new locks
can be obtained.
Two-phase locking guarantees conflict-serializable schedules, but, just like the two sides of a coin, it has a few cons too: it limits concurrency in a schedule, raises transaction-processing costs, and may have unintended consequences, one bad result being an increased likelihood of deadlocks.
The Strict Two-Phase Locking Protocol additionally ensures that if one transaction modifies data, no other transaction can read it until the first transaction commits. The majority of database systems use the strict two-phase locking protocol.
Starvation
When a transaction must wait an unlimited period for a lock, it is referred to as starvation. Starvation can be caused, for example, by an unfair waiting scheme for locked items or by repeatedly choosing the same transaction as a deadlock victim. To prevent starvation, random process selection for resource or processor allocation should be avoided, and the resource-allocation priority scheme should include ideas like aging, in which a process's priority rises the longer it waits.
Deadlock – A deadlock occurs when two or more processes in a circular chain are each waiting for another to release a resource, or when more than two processes are waiting for a resource in a circular fashion.
Two-Phase Locking –
Below is a summary of a skeleton schedule that shows how locking and unlocking work with 2-PL (the individual lock and unlock steps are numbered 1–9). Note:
Transaction T1:
The growing Phase is from steps 1-3.
The shrinking Phase is from steps 5-7.
Lock Point at 3
Transaction T2:
The growing Phase is from steps 2-6.
The shrinking Phase is from steps 8-9.
Lock Point at 6
DEADLOCK
Deadlock Avoidance –
When a database is stuck in a deadlock, it is always better to avoid the deadlock than to restart or abort the database. The deadlock avoidance method is suitable for smaller databases, whereas the deadlock prevention method is suitable for larger databases.
One method of avoiding deadlock is using application-consistent logic. In the above-given
example, Transactions that access Students and Grades should always access the tables in
the same order. In this way, in the scenario described above, Transaction T1 simply waits
for transaction T2 to release the lock on Grades before it begins. When transaction T2
releases the lock, Transaction T1 can proceed freely.
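A sketch of this discipline in SQL, assuming hypothetical Students and Grades tables keyed by stud_id: if every session locks Students first and Grades second, a circular wait cannot arise.

-- Every session acquires its locks in the same order:
START TRANSACTION;
SELECT * FROM Students WHERE stud_id = 1 FOR UPDATE;  -- lock Students first
SELECT * FROM Grades   WHERE stud_id = 1 FOR UPDATE;  -- then lock Grades
-- ... perform the updates ...
COMMIT;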
Another method for avoiding deadlock is to apply both a row-level locking mechanism and the READ COMMITTED isolation level. However, this does not guarantee the complete removal of deadlocks.
Deadlock Detection –
When a transaction waits indefinitely to obtain a lock, The database management system
should detect whether the transaction is involved in a deadlock or not.
Wait-for-graph is one of the methods for detecting the deadlock situation. This method is
suitable for smaller databases. In this method, a graph is drawn based on the transaction and
their lock on the resource. If the graph created has a closed-loop or a cycle, then there is a
deadlock.
For the scenario described above, the wait-for graph would contain a cycle between the two transactions, indicating the deadlock.
Deadlock prevention –
For a large database, the deadlock prevention method is suitable. A deadlock can be
prevented if the resources are allocated in such a way that deadlock never occurs. The
DBMS analyzes the operations whether they can create a deadlock situation or not, If they
do, that transaction is never allowed to be executed.
Deadlock prevention mechanism proposes two schemes :
Wait-Die Scheme –
In this scheme, If a transaction requests a resource that is locked by another transaction,
then the DBMS simply checks the timestamp of both transactions and allows the older
transaction to wait until the resource is available for execution.
Suppose, there are two transactions T1 and T2, and Let the timestamp of any transaction
T be TS (T). Now, If there is a lock on T2 by some other transaction and T1 is
requesting for resources held by T2, then DBMS performs the following actions:
Checks if TS(T1) < TS(T2) – if T1 is the older transaction and T2 has held some resource, T1 is allowed to wait until the resource is available for execution. That is, if a younger transaction has locked some resource and an older transaction is waiting for it, the older transaction is allowed to wait until it is available. If, instead, the older transaction T1 holds some resource and the younger transaction T2 is waiting for it, then T2 is killed and restarted later, with a very small random delay but with its same timestamp: when the older transaction holds a resource and a younger transaction waits for it, the younger transaction is killed and restarted.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme –
This is the reverse approach: if an older transaction requests a resource held by a younger transaction, the younger transaction is killed (wounded) and restarted later with the same timestamp; if a younger transaction requests a resource held by an older transaction, the younger transaction is allowed to wait.
SERIALIZABILITY
A schedule is serializable if it is equivalent to some serial schedule. A concurrent schedule must produce the same result as if the transactions executed serially, one after another. Serializability concerns the sequence in which actions such as read, write, abort and commit are performed.
Serial schedule − Transaction T1 executes completely, and only then does T2 begin.
Example
T1: READ1(A); WRITE1(A); READ1(B); C1
T2: READ2(B); WRITE2(B); READ2(B); C2
(T2 starts only after T1 commits.)
Non-serial schedule − the operations of transactions T1 and T2 are interleaved (overlapped).
Example
T1: READ1(A); WRITE1(A)
T2: READ2(B); WRITE2(B)
T1: READ1(B); WRITE1(B); READ1(B)
Types of serializability
Conflict serializability
It orders any conflicting operations in the same way as some serial execution. A pair of
operations is said to conflict if they operate on the same data item and one of them is a write
operation.
That means:
Readi(X) Readj(X) – non-conflicting (read–read)
Readi(X) Writej(X) – conflicting (read–write)
Writei(X) Readj(X) – conflicting (write–read)
Writei(X) Writej(X) – conflicting (write–write)
ISOLATION LEVELS
In the context of transactions, the term ACID denotes important properties that a transaction must follow: Atomicity, Consistency, Isolation and Durability, collectively called the ACID properties.
Properties of transaction
Isolation
Isolation determines the visibility of a transaction's changes to other transactions. A lower level allows every user to access the same data, and therefore involves a higher risk to data privacy and the security of the system. A higher isolation level reduces the kinds of concurrency permitted over the data, but requires more resources and is slower than lower isolation levels. Isolation protocols help safeguard data from unwanted transactions; they maintain data integrity by defining how and when the changes made by one operation become visible to others.
Levels of isolation
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- transaction A
-- transaction B
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- transaction C
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- transaction D
-- transaction E
-- session ends
Determine the isolation level under which each of the transactions A–E executes.
Solution
Step 1 − In the above program, the first session starts and ends without doing any
transaction.
Step 2 − The second session begins at session level with isolation level "Repeatable Read". Transactions A and B get executed with these settings.
Step 3 − Once again a new transaction begins with isolation level "Read
uncommitted". This setting is used only for "Transaction C" since "Set transaction"
alone is mentioned. If the "SET transaction" is used without global or session
keywords, then these particular settings will work only for a single transaction.
Step 4 − Once again "Set Transaction" with isolation level Read committed works
only for Transaction D. (Refer step 3 for reason)
Step 5 − "Transaction E" gets continued at the "Repeatable Read" since the
transaction started at step 2 has not ended still. Transaction isolation level set at Step 3
and Step 4 vanishes once a single transaction is executed. So, automatically
"Transaction E" will refer to the prior transaction settings.
Transaction Properties
A database management system (DBMS) is considered a relational database management
system (RDBMS) if it follows the transactional properties, ACID.
A: Atomicity
C: Consistency
I: Isolation
D: Durability
The SQL Server takes care of the Atomicity, Consistency, and Durability of the system, and
the user has to care about the Isolation property of the transaction. The meaning of each of
these properties is described below, as it applies to a transaction.
Atomicity
Transaction work should be atomic, which means all the work is one unit. If the user performs a transaction, either the transaction should complete, performing all the requested operations, or it should fail and do nothing. Atomicity deals with the transaction process, and an RDBMS transaction does not leave work incomplete.
Consistency
After the transaction is completed, the database should not be left in an inconsistent state,
which means the data on which transaction is applied must be logically correct, according to
the rules of the system.
Isolation
If two transactions are applied to the same database, the transactions should be isolated from each other, and each user should see a consistent result. A transaction should see the data only before or after a concurrent transaction completes: if one transaction process is in progress, the other transaction process should wait until the first transaction is completed.
For instance, if A performs a transaction process on data d1, and before the transaction
process gets completed, B also performs another transaction process on the same
data d1. Here, the isolation property will isolate the transaction process of A and B, and the
transaction process of B will only start after the transaction process of A gets completed.
Durability
Even if the system fails, a committed transaction should be persistent. If the system fails in the middle of a transaction process, the incomplete transaction should be rolled back without corrupting the data, while committed changes survive the failure.
Concurrency is a situation that arises in a database due to the transaction process. Concurrency occurs when two or more users try to access the same data or information. Concurrency in a DBMS is considered a problem because simultaneous access to data by two different users can lead to inconsistent results or invalid behaviour. The main concurrency problems are:
Dirty Reads
Lost Updates
Non-repeatable Reads
Phantom Reads
Dirty Read
This problem occurs when one process reads data that another process has changed but not yet committed. For instance, if one process has changed data but not committed it, another process is able to read that same data. This leaves the reader in an inconsistent state.
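A sketch of a dirty read using two concurrent MySQL sessions (the account table is hypothetical):

-- Session 1:
START TRANSACTION;
UPDATE account SET balance = balance - 500 WHERE id = 1;  -- changed, not yet committed

-- Session 2:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;  -- reads the uncommitted (dirty) value
COMMIT;

-- Session 1:
ROLLBACK;  -- session 2 has now acted on a value that never officially existed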
Lost Updates
This problem occurs when two processes try to manipulate the same data simultaneously. It can lead to data loss, or the second process might overwrite the first process's change.
Non-repeatable Reads
This problem occurs when one process is reading the data, and another process is writing the
data. In non-repeatable reads, the first process reading the value might get two different
values, as the changed data is read a second time because the second process changes the
data.
Phantom Reads
If the same query, executed twice, returns different sets of rows, it is a phantom read problem. For instance, suppose user A runs a query to read some data while, at the same time, user B inserts new rows: user A sees only the old rows on the first attempt, but gets a different set of rows when re-running the same statement.
1. Pessimistic model - In the pessimistic model of managing concurrent data access, the
readers can block writers, and the writers can block readers.
2. Optimistic model - In the optimistic model of managing concurrent data access, the
readers cannot block writers, and the writers cannot block readers, but the writer can
block another writer.
Note that readers are users performing SELECT operations, and writers are users performing INSERT, ALTER, UPDATE and DELETE operations.
Isolation Level
When we connect to a SQL server database, the application can submit queries to the
database with one of five different isolation levels. These levels are:
Read Uncommitted
Read Committed
Repeatable Read
Serializable
Snapshot
Out of these five isolation levels, Read Uncommitted, Read Committed, Repeatable Read,
and Serializable come under the pessimistic concurrency model. Snapshot comes under the
optimistic concurrency model. These levels are ordered in terms of the separation of work by
two different processes, from minimal separation to maximal.
Let's look at each of these isolation levels and how they affect concurrency of operations.
Read Uncommitted
This is the first level of isolation, and it comes under the pessimistic model of concurrency. In Read Uncommitted, one transaction is allowed to read data that has been changed, but not yet committed, by another process. Read Uncommitted allows the dirty read problem.
Read Committed
This is the second level of isolation and also falls under the pessimistic model of concurrency. In the Read Committed isolation level, we are only allowed to read data that has been committed, which means this level eliminates the dirty read problem. In this level, if you are reading data, concurrent transactions that want to delete or write that data are blocked until your read completes.
Repeatable Read
The Repeatable Read isolation level is similar to the Read Committed level and eliminates
the Non-Repeatable Read problem. In this level, the transaction has to wait till another
transaction's update or read query is complete. But if there is an insert transaction, it does not
wait for anyone. This can lead to the Phantom Read problem.
Serializable
This is the highest level of isolation in the pessimistic model. By implementing this level of
isolation, we can prevent the Phantom Read problem. In this level of isolation, we can ask
any transaction to wait until the current transaction completes.
Snapshot
Snapshot follows the optimistic model of concurrency. This level of isolation takes a snapshot of the current data and uses it as a copy for the different transactions. Each transaction has its own copy of the data, so if a user tries to perform an operation such as an update or insert, the data is re-verified against the current version before the operation is applied.
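In SQL Server's T-SQL, for instance, a level is chosen per session; the database and table names below are hypothetical, and snapshot isolation must first be enabled at the database level:

-- Enable snapshot isolation once at the database level:
ALTER DATABASE SampleDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then choose the level per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT balance FROM account WHERE id = 1;  -- reads the row version as of transaction start
COMMIT;

The other four levels are set the same way, e.g. SET TRANSACTION ISOLATION LEVEL READ COMMITTED;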
UNIT 5
Implementation Techniques
As discussed above, the data in database management system (DBMS) is stored on physical
storage devices such as main memory and secondary (external) storage. Thus, it is important
that the physical database (or storage) is properly designed to increase data processing
efficiency and minimise the time required by users to interact with the information system.
When required, a record is fetched from the disk to main memory for further processing. File
manager is the software that manages the allocation of storage locations and data structure.
Cache
The fastest and most costly form of storage. Volatile; managed by the computer system hardware.
Main memory
Fast access (tens to hundreds of nanoseconds; 1 nanosecond = 10^-9 seconds). Generally too small (or too expensive) to store the entire database. Capacities of up to a few gigabytes are widely used currently; capacities have gone up and per-byte costs have decreased steadily and rapidly. Volatile — the contents of main memory are usually lost if a power failure or system crash occurs.
Flash memory
Data survives power failure. Data can be written at a location only once, but the location can be erased and written to again; only a limited number of write/erase cycles are supported, and erasing has to be done to an entire bank of memory.
MAGNETIC DISK
A magnetic disk is a storage device that uses a magnetization process to write, rewrite
and access data. It is covered with a magnetic coating and stores data in the form of tracks,
spots and sectors. Hard disks, zip disks and floppy disks are common examples of magnetic
disks.
A magnetic disk primarily consists of a rotating magnetic surface (called platter) and
a mechanical arm that moves over it. Together, they form a “comb”. The mechanical arm is
used to read from and write to the disk. The data on a magnetic disk is read and written using
a magnetization process.
The platter keeps spinning at high speed while the head of the arm moves across its
surface. Since the whole device is hermetically sealed, the head floats on a thin film of air.
When a small current is applied to the head, tiny spots on the disk surface are magnetized and
data is stored. Vice-versa, a small current could be applied to those tiny spots on the platter
when the head needs to read the data.
Data is organized on the disk in the form of tracks and sectors, where tracks are the
circular divisions of the disk. Tracks are further divided into sectors that contain blocks of
data. All read and write operations on the magnetic disk are performed on the sectors. The
floating heads require very precise control to read/write data due to the proximity of the
tracks.
Early devices lacked the precision of modern ones and allowed for just a certain
number of tracks to be placed in each disk. Greater precision of the heads allowed for a much
greater number of tracks to be closely packed together in subsequent devices. Together with
the invention of RAID (redundant array of inexpensive disks), a technology that combines
multiple disk drives, the storage capacity of later devices increased year after year.
The first magnetic hard drive, built by IBM in 1956, was a large machine consisting of 50 21-inch (53-cm) disks. Despite its size, it could store just 5 megabytes of data. Since then, magnetic disks have increased their storage capacities manyfold, while their size has decreased comparably.
The size of modern hard disks is just about 3.5 inches (approx. 9 cm), with their capacity easily reaching one or more terabytes. A similar fate befell floppy disks, which shrank from the original 8 inches of the late 1960s to the much smaller 3.5 inches of the early 1990s. However, floppy disks eventually became obsolete after the introduction of CD-ROMs in the late 1990s and have now all but completely disappeared.
RAID
RAID works by placing data on multiple disks and allowing input/output (I/O)
operations to overlap in a balanced way, improving performance. Because using multiple
disks increases the mean time between failures, storing data redundantly also increases fault
tolerance.
RAID arrays appear to the operating system (OS) as a single logical drive.
RAID employs the techniques of disk mirroring or disk striping. Mirroring copies identical data onto more than one drive; striping partitions the data and spreads it over multiple disk drives.
Each drive's storage space is divided into units ranging from a sector of 512 bytes up to
several megabytes. The stripes of all the disks are interleaved and addressed in order. Disk
mirroring and disk striping can also be combined in a RAID array.
In a single-user system where large records are stored, the stripes are typically set up to be
small (512 bytes, for example) so that a single record spans all the disks and can be accessed
quickly by reading all the disks at the same time.
In a multiuser system, better performance requires a stripe wide enough to hold the typical or
maximum size record, enabling overlapped disk I/O across drives.
RAID controller
A RAID controller is a device used to manage hard disk drives in a storage array. It
can be used as a level of abstraction between the OS and the physical disks, presenting
groups of disks as logical units. Using a RAID controller can improve performance and help
protect data in case of a crash.
A RAID controller may be hardware- or software-based. In a hardware-based
RAID product, a physical controller manages the entire array. The controller can also be
designed to support drive formats such as Serial Advanced Technology Attachment and
Small Computer System Interface. A physical RAID controller can also be built into a
server's motherboard.
With software-based RAID, the controller uses the resources of the hardware system,
such as the central processor and memory. While it performs the same functions as a
hardware-based RAID controller, software-based RAID controllers may not enable as much
of a performance boost and can affect the performance of other applications on the server.
If a software-based RAID implementation is not compatible with a system's boot-up
process and hardware-based RAID controllers are too costly, firmware, or driver-based
RAID, is a potential option.
Firmware-based RAID controller chips are located on the motherboard, and all
operations are performed by the central processing unit (CPU), similar to software-based
RAID. However, with firmware, the RAID system is only implemented at the beginning of
the boot process. Once the OS has loaded, the controller driver takes over RAID
functionality. A firmware RAID controller is not as pricey as a hardware option, but it puts
more strain on the computer's CPU. Firmware-based RAID is also called hardware-assisted
software RAID, hybrid model RAID and fake RAID.
RAID levels
RAID devices use different versions, called levels. The original paper that coined the
term and developed the RAID setup concept defined six levels of RAID -- 0 through 5. This
numbered system enabled those in IT to differentiate RAID versions. The number of levels
has since expanded and has been broken into three categories: standard, nested and
nonstandard RAID levels.
Standard RAID levels
RAID 0.
This configuration has striping but no redundancy of data. It offers the best
performance, but it does not provide fault tolerance.
RAID 1. Also known as disk mirroring, this configuration consists of at least two drives that
duplicate the storage of data. There is no striping. Read performance is improved, since either
disk can be read at the same time. Write performance is the same as for single disk storage.
RAID 2. This configuration uses striping across disks, with some disks storing error checking
and correcting (ECC) information. RAID 2 also uses a dedicated Hamming code parity, a
linear form of ECC. RAID 2 has no advantage over RAID 3 and is no longer used.
RAID 3. This technique uses striping and dedicates one drive to storing parity information. The embedded ECC information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Because an I/O operation addresses all the drives at the same time, RAID 3 cannot overlap I/O. For this reason, RAID 3 is best for single-user systems with long-record applications.
RAID 4. This level uses large stripes, which means a user can read records from any single
drive. Overlapped I/O can then be used for read operations. Because all write operations are
required to update the parity drive, no I/O overlapping is possible.
RAID 5. This level is based on parity block-level striping. The parity information is striped
across each drive, enabling the array to function, even if one drive were to fail. The array's
architecture enables read and write operations to span multiple drives. This results in
performance better than that of a single drive, but not as high as a RAID 0 array. RAID 5
requires at least three disks, but it is often recommended to use at least five disks for
performance reasons.
RAID 5 arrays are generally considered to be a poor choice for use on write-intensive
systems because of the performance impact associated with writing parity data. When a disk
fails, it can take a long time to rebuild a RAID 5 array.
RAID 6. This technique is similar to RAID 5, but it includes a second parity scheme
distributed across the drives in the array. The use of additional parity enables the array to
continue functioning, even if two disks fail simultaneously. However, this extra protection
comes at a cost. RAID 6 arrays often have slower write performance than RAID 5 arrays.
TERTIARY STORAGE
Tertiary storage is the lowest level of the storage hierarchy, comprising removable media such as magnetic tapes and optical disks. It offers the lowest cost per byte and is typically used for archival data and backups, at the price of much higher access times than secondary storage.
FILE ORGANIZATION
A database consists of a huge amount of data. The data is grouped within tables in an RDBMS, and each table has related records. A user sees the data as stored in tables, but in actuality this huge amount of data is stored in physical memory in the form of files.
File – A file is named collection of related information that is recorded on secondary
storage such as magnetic disks, magnetic tapes and optical disks.
File Organization refers to the logical relationships among various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record. In simple terms, Storing the files in certain order is called file Organization. File
Structure refers to the format of the label and data blocks and of any logical control
record.
Some types of file organization are:
• Sequential File Organization
• Heap File Organization
• Hash File Organization
• B+ Tree File Organization
• Clustered File Organization
We will discuss each of these file organizations below, along with the differences and the advantages/disadvantages of each method.
The easiest method of file organization is the sequential method. In this method the records are stored one after another in a sequential manner. There are two ways to implement this method:
Pile File Method – This method is quite simple: we store the records in a sequence, i.e., one after another, in the order in which they are inserted into the tables.
Let R1, R3, R5 and R4 be four records in the sequence, in the order in which they were inserted (a record here is simply a row of a table). Suppose a new record R2 has to be inserted in the sequence; it is simply placed at the end of the file.
Sorted File Method – In this method, as the name itself suggests, whenever a new record has to be inserted it is always inserted in a sorted (ascending or descending) manner. Sorting of records may be based on the primary key or on any other key.
Let us assume that there is a pre-existing sorted sequence of four records R1, R3, R7 and R8. Suppose a new record R2 has to be inserted; it is inserted at the end of the file and the sequence is then re-sorted.
Pros and Cons of Sequential File Organization –
Pros –
• Fast and efficient method for huge amounts of data.
• Simple design.
• Files can easily be stored on magnetic tape, i.e., a cheaper storage mechanism.
Cons –
• Time wastage: we cannot jump directly to a required record but must move through the file sequentially, which takes time.
• The sorted file method is inefficient, as sorting records costs time and space.
Heap File Organization works with data blocks. In this method records are inserted
at the end of the file, into the data blocks. No Sorting or Ordering is required in this
method. If a data block is full, the new record is stored in some other block, Here the other
data block need not be the very next data block, but it can be any block in the memory. It is
the responsibility of DBMS to store and manage the new records.
Suppose we have five records R1, R5, R6, R4 and R3 in the heap, and a new record R2 has to be inserted. Since the last data block, data block 3, is full, R2 will be inserted into any data block selected by the DBMS, let's say data block 1.
If we want to search, delete or update data in heap file organization, we have to traverse the data from the beginning of the file until we get the requested record. Thus, if the database is very huge, searching, deleting or updating a record takes a lot of time.
Pros and Cons of Heap File Organization –
Pros –
• Fetching and retrieving records is faster than in sequential organization, but only for small databases.
• When a huge amount of data needs to be loaded into the database at one time, this method of file organization is best suited.
INDEXING
Indexes on a data file are classified into two types: dense index and sparse index.
Dense index
• The dense index contains an index record for every search-key value in the data file, which makes searching faster.
• The number of records in the index table is the same as the number of records in the main table, so more space is needed to store the index records themselves.
• Each index record holds the search-key value and a pointer to the actual record on the disk.
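For instance, a dense secondary index can be created with a single statement; the employees table and Dept_Id column here are illustrative assumptions:

CREATE INDEX idx_emp_dept ON employees (Dept_Id);

-- The optimizer can now locate the rows of one department through the index
SELECT * FROM employees WHERE Dept_Id = 10;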
Clustering Index
• A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary-key columns, which may not be unique for each record.
• In this case, to identify the records faster, we group two or more columns to obtain a unique value and create an index out of them. This method is called a clustering index.
• Records which have similar characteristics are grouped together, and indexes are created for these groups.
Example: suppose a company has several employees in each department. If we use a clustering index, all employees belonging to the same Dept_ID are considered to be within a single cluster, and the index pointers point to the cluster as a whole. Here Dept_ID is a non-unique key.
The previous scheme is a little confusing because one disk block is shared by records belonging to different clusters. Using a separate disk block for each cluster is considered the better technique.
Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping also grows. These mappings are usually kept in primary memory so that address fetches are fast; secondary memory is then searched for the actual data based on the address obtained from the mapping. If the mapping size grows, fetching the address itself becomes slower, and the sparse index is no longer efficient. To overcome this problem, secondary indexing introduces another level of indexing to reduce the size of the mapping. Initially, a large range is selected for the columns so that the first-level mapping stays small; each range is then further divided into smaller ranges. The first-level mapping is stored in primary memory so that address fetches are fast, while the second-level mapping and the actual data are stored in secondary memory (hard disk).
Hashing
Hashing is an effective technique to calculate the direct location of a data record on the disk
without using index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data
record.
Hash Organization
Bucket − A hash file stores data in bucket format. Bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can store one
or more records.
Hash Function − A hash function, h, is a mapping function that maps all the set of
search-keys K to the address where actual records are placed. It is a function from
search keys to bucket addresses.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it generates only 4 possible values; the output address is always the same for a given key. The number of buckets provided remains unchanged at all times.
Operation
Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will be
stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket-overflow is known as collision. This is a fatal state for any static
hash function. In this case, overflow chaining can be used.
Overflow Chaining − When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.
Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets
are added and removed dynamically and on-demand. Dynamic hashing is also known
as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a
few are used initially.
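A deliberately simplified Python sketch of the idea (assumed bucket size of 2; real extendible hashing splits only the overflowing bucket and keeps a directory, whereas this toy version rehashes everything when any bucket overflows):

    BUCKET_SIZE = 2

    class DynamicHash:
        def __init__(self):
            self.depth = 1                          # number of hash bits in use
            self.buckets = [[] for _ in range(2 ** self.depth)]

        def insert(self, key):
            b = self.buckets[key % (2 ** self.depth)]
            b.append(key)
            if len(b) > BUCKET_SIZE:                # overflow: use one more bit
                self._grow()

        def _grow(self):
            self.depth += 1
            old = [k for b in self.buckets for k in b]
            self.buckets = [[] for _ in range(2 ** self.depth)]
            for k in old:
                self.buckets[k % (2 ** self.depth)].append(k)

    dh = DynamicHash()
    for k in [1, 3, 5, 7]:
        dh.insert(k)
    print(dh.depth, dh.buckets)    # 2 [[], [1, 5], [], [3, 7]]

The hash function itself is unchanged; only the number of its bits in use grows with the data, which is the essence of dynamic hashing.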
Types of Ordered Indices
Dense Indexing
In a dense index, for every search-key in the file, an index entry is present.
In a dense-clustering index, the index record contains the search-key value and a
pointer to the first data record with that search-key value.
The remaining number of records with similar search-key value would be stored in a
sequence after the first record.
Sparse Indexing
In a sparse index, an index entry appears for only some of the search-key values.
These indices can be used only if the relation is stored in sorted order of the search-
key value.
To locate a record, one has to find the index entry with the largest search-key value
that is less than or equal to the search-key value for which we are looking. We start
looking from the record pointed to by that index entry, and then follow the pointers in
the file until we find the desired record.
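The lookup rule can be shown with a few lines of Python (hypothetical blocks, one sparse index entry per block):

    import bisect

    blocks = [[10, 20], [30, 40], [50, 60]]   # file sorted on the search key
    index = [b[0] for b in blocks]            # sparse index: first key per block

    def find(key):
        i = bisect.bisect_right(index, key) - 1   # largest index entry <= key
        return i >= 0 and key in blocks[i]        # then scan within that block

    print(find(40))   # True
    print(find(25))   # False: block [10, 20] is scanned and 25 is absent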
Multi-Level Indexing
When a data file has a very large number of records, multi-level indexing comes into use. As the size of the database grows, the size of the index also grows; since the index is stored on disk with the actual database files, a single-level index can become too large to search efficiently. A multi-level index therefore keeps a small outer index in main memory that points into the larger inner index on disk.
Example-2: Construct a B+ Tree for the following search key values, where n = 4.
{10, 30, 40, 50, 60, 70, 90}
Now, let us insert and delete some elements in this tree.
Insert 25, 75
When we insert an element, we add it in the leaf node immediately to the right of the largest value lower than the inserted element (splitting the node if it overflows).
Delete 70
When an element is deleted, it is replaced, wherever it appears as a separator key, by the element to its right.
A B+ tree of order n satisfies the following properties:
1. All paths from the root to a leaf are equally long.
2. A node that is neither the root nor a leaf has between ⌈n/2⌉ and n children.
3. A leaf node has between ⌈(n-1)/2⌉ and n-1 values.
The structures of the leaf and non-leaf nodes of such a tree are:
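The bracketed bounds above mean "rounded up" (ceiling). A small illustrative check of the occupancy rules in Python (the function names are ours, not standard):

    import math

    def internal_ok(num_children, n):     # non-root, non-leaf node
        return math.ceil(n / 2) <= num_children <= n

    def leaf_ok(num_values, n):           # leaf node
        return math.ceil((n - 1) / 2) <= num_values <= n - 1

    print(internal_ok(2, 4))   # True: ceil(4/2) = 2 children is the minimum
    print(leaf_ok(1, 4))       # False: a leaf needs at least ceil(3/2) = 2 values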
Example-1: Construct a B-Tree for the following search key values, where n = 3.
{10, 20, 30, 40, 50}
Let us take another example and insert and delete elements from the tree.
Example-2: Construct a B-Tree for the following search key values, where n = 3 (n is the number of pointers).
{10, 20, 30, 40, 50, 60, 70, 80, 90}
b) Insert 65 into the above tree.
Static Hashing
Bucket Overflow:
Bucket overflow can occur in two ways:
1. Insufficient buckets.
2. Skew in distribution of records. Some buckets are given more records than others, so a bucket
can overflow even though the other buckets still have space. This situation is called ‘bucket
skew’.
Overflow Chaining :
The overflows of a given bucket are chained together in a linked list. This is called
‘Closed Hashing’.
In ‘Open Hashing’, the set of buckets is fixed and there are no overflow chains. Here, if a bucket is full, the system inserts the record in some other bucket in the initial set of buckets.
A hash index organizes the search keys, with their associated pointers, into a hash file structure. A hash function is applied to a search key to identify a bucket, and the key and its associated pointers are stored in that bucket.
Example of Static Hashing
Example-10: Hash file organization of a DEPT file using DName as the key, where there are eight departments. A good hash function satisfies two properties:
1. The distribution is uniform: the hash function assigns each bucket the same number of search-key values from the set of all possible values.
2. The distribution is random: in the average case, each bucket will have nearly the same number of values assigned to it, regardless of the actual distribution of search-key values.
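For illustration, a toy hash function for this example might sum the character codes of DName modulo eight (an assumption for demonstration; real systems use much stronger hash functions to approach a uniform, random distribution):

    def h(dname, num_buckets=8):
        # Map a department name to one of eight buckets.
        return sum(ord(c) for c in dname) % num_buckets

    for name in ["Physics", "Music", "History"]:
        print(name, "-> bucket", h(name))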
DYNAMIC HASHING
The ‘Dynamic Hashing’ technique allows the hash function to be modified
dynamically to accommodate the growth or shrinkage of the database. The ‘dynamic
hashing’ technique we use is called ‘Extendible Hashing’.
This technique is used to find the address of the required record when its key value is given.
The binary representation of the key's hash value is considered in order to map the key value to the address of the record.
Query Processing is the activity performed in extracting data from the database. Query processing takes several steps to fetch the data from the database. The steps involved are: parsing and translation, optimization, and evaluation.
Query processing thus includes certain activities for data retrieval. The user query is initially written in a high-level database language such as SQL; it is then translated into expressions that can be used at the physical level of the file system, after which the actual evaluation of the query, together with a variety of query-optimizing transformations, takes place. Before processing a query, the system needs to translate it from its human-readable form into an internal representation. SQL, the Structured Query Language, is the most suitable choice for humans, but it is not suitable as the internal representation of the query; relational algebra is well suited for the internal representation. The translation step works like a parser of the query: when a user executes a query, the parser checks the syntax of the query, verifies the names of the relations in the database, the tuples and the required attribute values, and generates the internal form of the query as a tree, known as the 'parse tree'. The parse tree is then translated into a relational-algebra expression, with all uses of views in the query replaced by their definitions.
The working of query processing is summarized in the diagram below:
Suppose a user executes a query. As we have learned, there are various methods of extracting data from the database. Suppose that in SQL the user wants to fetch the records of the employees whose salary is greater than or equal to 10000. For doing this, a query of the following form is used:
SELECT * FROM emp WHERE salary >= 10000;
Thus, to make the system understand the user query, it needs to be translated into relational algebra, for example as:
σ salary >= 10000 (emp)
After translating the given query, we can execute each relational-algebra operation using different algorithms. In this way, query processing begins its working.
Evaluation
For this, in addition to the relational-algebra translation, it is required to annotate the translated relational-algebra expression with the instructions that specify how each operation is to be evaluated. Thus, after translating the user query, the system executes a query evaluation plan.
o In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization
o The cost of query evaluation can vary for different types of queries. Since the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently.
o Usually, a database system generates an efficient query evaluation plan, one which minimizes its cost. This task, performed by the database system, is known as Query Optimization.
o For optimizing a query, the query optimizer should have an estimated cost for each operation, because the overall cost depends on the memory allocated to the operations, the execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces the
output of the query.
Here r is the number of records in the file, d is the number of distinct values of the attribute, sl is the selectivity of an equality condition on the attribute (the fraction of records that satisfy it), and s = sl * r is the number of records selected. For a key attribute, d = r, sl = 1/r and s = 1. For a nonkey attribute, by making the assumption that the d distinct values are uniformly distributed among the records, we estimate sl = (1/d) and so s = (r/d).
Information such as the number of index levels is easy to maintain because it does not
change very often. However, other information may change frequently; for example, the
number of records r in a file changes every time a record is inserted or deleted. The query
optimizer will need reasonably close but not necessarily completely up-to-the-minute values
of these parameters for use in estimating the cost of various execution strategies.
For a nonkey attribute with d distinct values, it is often the case that the records are
not uniformly distributed among these values. For example, suppose that a company has 5
departments numbered 1 through 5, and 200 employees who are distributed among the
departments as follows: (1, 5), (2, 25), (3, 70), (4, 40), (5, 60). In such cases, the optimizer
can store a histogram that reflects the distribution of employee records over different
departments in a table with the two attributes (Dno, Selectivity), which would contain the
following values for our example: (1, 0.025), (2, 0.125), (3, 0.35), (4, 0.2), (5, 0.3). The
selectivity values stored in the histogram can also be estimates if the employee table changes
frequently.
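The numbers above can be reproduced directly; the following Python lines contrast the uniform estimate with the histogram (same example data):

    r = 200
    dept_counts = {1: 5, 2: 25, 3: 70, 4: 40, 5: 60}

    d = len(dept_counts)
    print(1 / d)     # 0.2 -- uniform assumption: the same sl for every department

    histogram = {dno: cnt / r for dno, cnt in dept_counts.items()}
    print(histogram) # {1: 0.025, 2: 0.125, 3: 0.35, 4: 0.2, 5: 0.3} -- captures skew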
In the next two sections we examine how some of these parameters are used in cost
functions for a cost-based query optimizer.
Select Operation
Procedural language
Information is retrieved from the database by specifying the sequence of operations to be
performed.
For Example − Relational algebra.
Structured Query Language (SQL) is based on relational algebra.
Relational algebra consists of a set of operations that take one or two relations as an input
and produces a new relation as output.
Types of Relational Algebra operations
The different types of relational algebra operations are as follows −
Select operation
Project operation
Rename operation
Union operation
Intersection operation
Difference operation
Cartesian product operation
Join operation
Division operation
Select, project and rename come under unary operations (they operate on one table).
Select operation
It displays the records that satisfy a condition. It is denoted by sigma (σ) and is a horizontal
subset of the original relation.
Syntax
Its syntax is as follows −
σcondition(table name)
Example
Consider the student table given below −
RegNo Branch Section
1 CSE A
2 ECE B
3 CIVIL B
4 IT A
Now, to display all the records of student table, we will use the following command −
σ(student)
In addition to this, when we have to display all the records of CSE branch in student table,
we will use the following command −
σbranch=cse(student)
Hence, the result will be as follows −
RegNo Branch Section
1 CSE A
To display all the records in student tables whose regno>2, we will use the below mentioned
command −
σRegNo>2(student)
The output will be as follows −
RegNo Branch Section
3 CIVIL B
4 IT A
To display the record of ECE branch section B students, use the given command −
σbranch=ECE ^ section=B(student)
To display the records of section B CSE and IT branch, use the following command −
σSection=B ^ Branch=cse ∨ branch=IT(student)
Consider the EMPLOYEE TABLE as another example to know about selection operations.
Retrieve information about those employees whose salary is greater than 20,000
If one condition is specified then, we can use the following command −
σ salary > 20,000 (emp)
If more than one condition is specified in the query, the conditions are combined into a single statement using the logical operators (AND: ^, OR: ∨, NOT: ¬) together with the relational operators (=, >, <, >=, <=).
Example − To retrieve the information of those employees whose salary > 20,000, whose location is HOD and whose Deptno is 20, we can use the following command −
σ salary > 20,000 ^ LOC=HOD ^ Deptno=20 (emp)
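Since select is just a horizontal filter, it can be mimicked in a few lines of Python over a toy relation (the data below is made up for illustration):

    emp = [
        {"name": "A", "salary": 25000, "loc": "HOD", "deptno": 20},
        {"name": "B", "salary": 18000, "loc": "HOD", "deptno": 20},
        {"name": "C", "salary": 30000, "loc": "LAB", "deptno": 10},
    ]

    def select(relation, predicate):
        # sigma: keep the tuples that satisfy the condition (horizontal subset)
        return [t for t in relation if predicate(t)]

    result = select(emp, lambda t: t["salary"] > 20000
                                   and t["loc"] == "HOD" and t["deptno"] == 20)
    print(result)    # only employee A satisfies all three conditions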
Sorting
Sorting is the technique of arranging the records in ascending or descending order of one or more columns. It is useful because some queries ask us to return sorted records, and operations like joins are more efficient on sorted records. By default, all records are sorted on the primary-key column; in addition, we can specify sorting on other columns as required. Two types of sorting methods are mainly used.
Quick Sort
If the table is small and can be accommodated in main memory, quick sort can be used. As the name suggests, it is a simple and fast method of sorting. In this method a pivot element is chosen among the values of the column; values less than the pivot are moved to its left, values greater than the pivot are moved to its right, and the two parts are then sorted in the same way. It needs very little additional space (on the order of log n for the recursion stack). It takes only n log(n) time in the best and average cases, and n² time in the worst case. This method is not stable, as it can alter the relative order of two equal records while sorting.
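A short Python sketch of the pivot idea (this list-based version copies data for clarity; in-place implementations partition within the array and use only the recursion stack as extra space):

    def quicksort(rows):
        if len(rows) <= 1:
            return rows
        pivot = rows[0]
        left = [r for r in rows[1:] if r <= pivot]    # values <= pivot go left
        right = [r for r in rows[1:] if r > pivot]    # values > pivot go right
        return quicksort(left) + [pivot] + quicksort(right)

    print(quicksort([40, 10, 30, 20]))   # [10, 20, 30, 40]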
Merge Sort
For larger tables which cannot be accommodated in main memory, this type of sorting (external merge sort) is used. In this setting it performs better than quick sort. Let us see how this sort is done.
Suppose each block can hold two records and memory can hold up to 2 blocks; that is, memory cannot hold all the records of a large table, only up to 4 records at a time. Suppose the initial table has 12 records, with two records in each block. When merge sort is applied, the records are first read in memory-sized groups of two blocks (4 records) each, giving 3 sorted runs. The runs are merged in pass 1 and the resulting runs are merged again in pass 2, which gives the final sorted result. This is how merge sort works.
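The same idea in Python, using sorted runs and a k-way merge (the 12 values are made up; heapq.merge streams the runs rather than loading them all, which is what makes the external version work):

    import heapq

    records = [7, 3, 9, 1, 8, 2, 6, 5, 12, 4, 11, 10]   # 12 records, as above
    RUN_SIZE = 4                                        # "memory" holds 4 records

    # Pass 0: sort each memory-sized chunk, producing sorted runs.
    runs = [sorted(records[i:i + RUN_SIZE])
            for i in range(0, len(records), RUN_SIZE)]

    # Merge passes: combine the runs into the final sorted output.
    print(list(heapq.merge(*runs)))    # [1, 2, 3, ..., 12]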
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Types of Join operations:
1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
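The natural join above can be reproduced with a simple hash join in Python (building a lookup table on the common attribute, then probing it):

    employee = [(101, "Stephan"), (102, "Jack"), (103, "Harry")]
    salary = [(101, 50000), (102, 30000), (103, 25000)]

    # Build: index EMPLOYEE on the join attribute EMP_CODE.
    by_code = {code: name for code, name in employee}

    # Probe: pair each SALARY tuple with the matching EMPLOYEE tuple.
    joined = [(by_code[code], pay) for code, pay in salary if code in by_code]
    print(joined)   # [('Stephan', 50000), ('Jack', 30000), ('Harry', 25000)]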
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing information: tuples that would be lost in an ordinary join are kept and padded with nulls. Its variants are the left outer join, the right outer join and the full outer join.
Example:
EMPLOYEE
FACT_WORKERS
Online databases
Placing collections of complex information on the Internet involves storing them in a database which can then be accessed online by users. The term database can easily be deceiving, because in reality the system that makes this database visible on the Internet is far more complex. Any database that provides information to users of Internet services must be stored on a server that is visible on the Internet and must use a scripting technology. The information in the database is extracted according to the specific needs of the user and then formatted so that it can be properly displayed. For example, when someone types the word "Romania" into the Google search engine, the system will search its database for items containing the word "Romania" and then format the results so that they can be displayed by a browser such as Internet Explorer. A general view of the server architecture is given in the following figure (Fig. 1: Architecture of a Web server with scripting support). As the figure shows, the system architecture is structured in several levels.
When the user wishes to access information located on the server, he will use an Internet browser to connect to it. The server is accessed via a URL. The main components of the server architecture are: the Web server, the server-side script interpreter, the drivers for access to the database, the database itself, and the collections of files. The Web server is a complex application responsible for the communication with external Web browsers. Basically, the Web server listens on the HTTP port (by default 80) of the machine on which it is installed. When a request arrives on this port, the Web server interprets it to see what information has been requested. The information requested from the server consists of files that reside on its hard disk, and the Web server's job is to wrap these files so that they can be sent on. The required files can be divided into two categories:
• Files that contain static information. These are sent on to browsers without any kind of change. Static files are usually images, HTML files, documents offered for download, movies, Flash animations, etc.
• Scripts. These are small programs that are run by an interpreter, with only the result of their execution being sent to the Web server.
The main role of these scripts is to dynamically generate HTML documents. This technique of dynamically generating HTML documents is what makes it possible to access databases over the Internet. The role of the server-side script interpreter has been described above. Where a script needs records from a database, it interacts with the database through a driver: it runs an SQL query against the database, and the execution of this query returns a cursor. From this cursor the script generates the HTML code that, once it reaches a browser, causes the desired data to be displayed. The drivers for access to the database mediate the interaction between the script interpreter and the database itself. They are very specialised software tools that are usually visible neither to the programmer nor to any user.
The drivers are important because a flawed choice of driver can significantly affect system performance. The main DBMSs used in Web applications are MySQL, SQL Server and Oracle. The collections of files are static information which is sent to users on demand.
It is important to note that ASP scripts are designed to produce HTML pages that are sent to Web browsers for display. The major benefit of ASP scripts is that they permit the production of dynamic HTML code according to concrete needs. For example, you can easily fetch the records of a table from a database data source and wrap them in HTML so that they can be displayed in a browser. Although ASP was conceived as a general Web technology, the overwhelming majority of ASP scripts are related to working with on-line databases. In order to write ASP scripts you must have the following:
• a computer on which a Web server is set up (for example, Internet Information Server or Personal Web Server); any Windows system can easily be configured to support ASP scripts;
• a text editor; you can use Notepad or specialized editors such as FrontPage or Macromedia Dreamweaver;
• a DBMS for creating and updating the database used by the scripts;
• a Web browser, to see the result of the script's execution.
Considering that ASP scripts are usually written to work with databases, a database is also needed in order to run the scripts. It must be on the same computer as the script, preferably in the same directory.
Database access
ActiveX Data Objects (ADO) is a technology that allows accessing databases from Web pages. Basically, ADO can be used for writing compact scripts for connecting to OLE DB-compatible data sources from Web pages; ADO can also be used with tabular databases, spreadsheets, sequential data files and e-mail directories. OLE DB is a system-level programmatic interface that provides a standard set of COM components for managing databases. The COM components are accessed through an object model, so VBScript and JScript scripts can use ADO to access the databases of Web applications. ADO is also used for opening ODBC-compatible (Open DataBase Connectivity) databases.
For creating an application with access to a database, ADO requires an identification of the data source. This is done with a connection string consisting of arguments separated by ";", for example the name of the data-source provider and the location of the data source. ADO uses the connection string to identify the OLE DB provider. The provider is a component that represents the data source and supplies the application with information about the format of the data. For compatibility, the OLE DB provider for ODBC supports the connection-string syntax. A connection string that refers to a database on a remote computer can contain security information (user name, password). To control access to data sources, Windows accounts are created for the computers that will access the data sources, with the appropriate NTFS permissions on the files. For establishing and handling the connections between the application and OLE DB-compatible data sources or ODBC-compatible databases, ADO provides the Connection object. It has properties and methods that allow opening and closing connections to databases, as well as formulating queries that update the data.
To establish a connection to a database, you will create an instance of the Connection object and open it with the connection string; note that the string must not contain any spaces before or after the equal sign (=).
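For example, a classic ASP (VBScript) script along these lines creates and opens a connection; the provider name and file path here are placeholders of our own, not values from these notes:

    ' Create the ADO Connection object and open it with a connection string
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\site.mdb"
    ' ... run queries through conn here ...
    conn.Close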
In the previous example, the Open method of the Connection object takes the connection string as its argument.
Security is enforced by the security subsystem of the DBMS, which checks whether all attempts to access data satisfy the security constraints (authorities) stored in the system catalog. Each authority in a discretionary scheme has a name, a set of privileges (RETRIEVE, INSERT, etc.), the relation to which the authority applies, and a set of users. These authorities can be used to provide value-dependent, value-independent, statistical-summary and context-dependent access controls. An audit trail can be used to record attempted security violations.
Web technologies: HTML, ASP, PHP
HTML is a form of markup text oriented toward the presentation of documents, using specialized rendering software called an HTML user agent, the best example of which is the Web browser. HTML provides the means by which the contents of a document can be annotated with various types of metadata and rendering indications. The rendering indications can range from minor text decorations, such as specifying that a certain word should be emphasized or that an image should be inserted, up to sophisticated scripts, image maps and forms. The metadata may include information about the title and author of the document, structural information about how the document is divided into segments, paragraphs, lists, headings, etc., and, crucially, information that enables the document to be linked to other documents to form hyperlinks (the web).
HTML is a text format designed to be read and edited using a simple text editor. However, writing and modifying pages in this way requires solid knowledge of HTML and is time-consuming. Graphical (WYSIWYG) editors such as Macromedia Dreamweaver, Adobe GoLive or Microsoft FrontPage allow web pages to be treated like Word documents. HTML can also be generated directly using server-side technologies such as PHP, JSP or ASP. Many applications, such as content management systems, wikis and web forums, generate HTML pages.
HTML is also used in e-mail. Most e-mail applications use a built-in HTML editor for composing e-mails and a rendering engine for displaying them. The use of HTML in e-mail is a controversial topic, and many mailing lists intentionally block it.
Active Server Pages (ASP), also known as Classic ASP, was Microsoft's first server-side programming language for generating dynamic Web pages. It was originally released as an add-on for IIS in the Windows NT 4.0 Option Pack, after which it was included as a free component of Windows Server, starting with Windows 2000 Server. It has since been superseded by ASP.NET. ASP.NET is a Microsoft technology for creating Web applications and Web services; it is the successor of ASP (Active Server Pages) and benefits from the power of the .NET development platform and the set of tools offered by the Visual Studio .NET development environment.
Some of the advantages of ASP.NET are:
• ASP.NET has a broad set of XML-based components, thus providing an object-oriented programming (OOP) model.
• ASP.NET runs compiled code, which increases the performance of the web application. The source code can be separated into two files, one for the executable code and another for the content of the page (the HTML code and the text of the page).
• .NET is compatible with over 20 different languages, the most used being C# and Visual Basic.
PHP is a programming language.
The name PHP comes from English and is a recursive acronym: PHP: Hypertext Preprocessor. Originally used to produce dynamic Web pages, PHP is widely used in the development of web pages and web applications. It is mainly used embedded in HTML code, but starting with version 4.3.0 it can also be used from the command line (CLI), allowing the creation of standalone applications. It is one of the most important open-source, server-side web programming languages, with versions available for most web servers and for all operating systems. According to statistics, it is installed on over 20 million websites and 1 million Web servers.
Conclusions
A database, sometimes called a "data bank", is a way of storing information and data on external media (storage devices), with the possibility of rapid retrieval. Typically a database is stored in one or more files. Databases are handled by database management systems.
Choosing a DBMS
Most discussions of database systems focus on whether the DBMS is robust, efficient, scalable, etc., and on how it integrates with different technologies. In our case, however, we do not really care about most of those things. Instead, we are going to look at the price of getting started, the tools, the user interface and the availability of help, which are especially helpful for the beginner.
It usually depends on the following factors:
1. The choice of the right database depends on what the project (website) is about.
1. MySQL
MySQL is used in almost all open-source web projects that require a database at the back-end. MySQL is part of the popular LAMP stack, along with Linux, Apache and PHP. MySQL was acquired by Oracle; since we do not know what Oracle will do with the MySQL database, the open-source community has created a number of forks of MySQL, including Drizzle and MariaDB.
Following are a few key features:
· MyISAM storage makes use of b-tree disk tables with index compression for high
performance.
2. PostgreSQL
PostgreSQL is available for Linux, Windows and Mac OS. It has full support for views, joins, triggers and procedures. Among its other features:
· Asynchronous replication
· Highly scalable
3. Oracle
Oracle is a fine database for any mission-critical commercial application. Among the features Oracle offers:
· Real Application Clusters
4. SQLite
SQLite does not work on the usual client-server model with a standalone server process; the database engine is embedded in the application itself.
5. MS SQL Server
This is Microsoft's flagship database product. If you work in a company that makes heavy use of Microsoft products, you might end up working on MS SQL Server.
At its core, a DBMS is software for storing and retrieving data as requested by other software applications, which may run either on the same computer or on another computer across a network.