
DATABASE MANAGEMENT SYSTEMS

UNIT I: Introduction
Introduction
Information storage and retrieval has become very important in our day-to-day life. The old
era of manual systems is no longer in use in most places. For example, database systems are used
to book your airline tickets or to deposit your money in the bank; the database system makes
most of these operations automated. A very good example is the billing system used for items
purchased in a supermarket, which is done with the help of a database application package.
Inventory systems used in a drug store or in a manufacturing industry are further examples of
databases, and we can add many similar examples to this list. Apart from these traditional
database systems, more sophisticated database systems are used on the Internet, where a large
amount of information is stored and retrieved with efficient search engines. For instance,
www.google.com is a famous web site that enables users to search for their favorite information
on the net. A database can store anything from text data to very complex data like audio, video, etc.
Database Management Systems (DBMS)
A database is a collection of related data stored in a standard format, designed to be shared by
multiple users. A database is defined as “A collection of interrelated data items that can be
processed by one or more application programs”.
A database can also be defined as “A collection of persistent data that is used by the
application systems of some given enterprise”. An enterprise can be a single individual (with a
small personal database), or a complete corporation or similar large body (with a large shared
database), or anything in between.
Example: A Bank, a Hospital, a University, a Manufacturing company

Data
Data is the raw material from which useful information is derived. The word data is the plural of
Datum. Data is commonly used in both singular and plural forms. It is defined as raw facts or
observations. It takes variety of forms, including numeric data, text and voice and images. Data is
a collection of facts, which is unorganized but can be made organized into useful information.
The term Data and Information come across in our daily life and are often interchanged.
Example: Weights, prices, costs, number of items sold etc.
Information
Information is data that has been processed in such a way as to increase the knowledge of the person
who uses it. The terms data and information are closely related: data are the raw material resources
that are processed into finished information products.
In practice, a database today may contain either data or information.
Data Processing
The process of converting the facts into meaningful information is known as data processing. Data
processing is also known as information processing.
Metadata
Metadata are data that describe the properties or characteristics of other data. Data become useful
only when placed in some context, and the primary mechanism for providing context for data is
metadata. Some of these properties include data definitions, data structures, and rules or constraints.
Metadata describes the properties of data but does not include that data.
Database System Applications
Databases are widely used. Here are some representative applications:
1. Banking: For customer information, accounts, loans, and banking transactions.
2. Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner - terminals situated around the world accessed
the central database system through phone lines and other data networks.
3. Universities: For student information, course registrations, and grades.
4. Credit card transactions: For purchases on credit cards and generation of monthly statements.
5. Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.
6. Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds.
7. Sales: For customer, product, and purchase information.
8. Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses / stores, and orders for items.
9. Human resources: For information about employees, salaries, payroll taxes and benefits, and
for generation of paychecks.
File Systems Versus A DBMS (Characteristics)
In earlier days, the databases were created directly on top of file systems. File system has
many disadvantages.

1. Not enough primary memory to process large data sets. If data is maintained on other storage
devices like disks and tapes, bringing the relevant data into main memory increases the cost of
processing. Accessing large data sets is also a problem because of the limits of 32-bit or 64-bit
addressing mechanisms.
2. Programs must be written to process each user request against the data stored in files; these
programs are complex because of the large volume of data to be searched.
3. Inconsistent data, and complexity in providing concurrent access.
4. Not sufficiently flexible to enforce security policies in which different users
have permission to access different subsets of the data.
A DBMS is a piece of software that is designed to make the preceding tasks easier. By storing
data in a DBMS, rather than as a collection of operating system files, we can use the DBMS's
features to manage the data in a robust and efficient manner.
Advantages of DBMS
One of the main advantages of using a database management system is that the organization
can exert, via the DBA, centralized management and control over the data. The database administrator
is the focus of this centralized control.
The following are the major advantages of using a Database Management System (DBMS):
Data independence: Application programs should be as independent as possible from details of data
representation and storage. The DBMS can provide an abstract view of the data to insulate
application code from such details.
Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve
data efficiently. This feature is especially important if the data is stored on external storage devices.
Data integrity and security: The DBMS can enforce integrity constraints on the data. The DBMS
can enforce access controls that govern what data is visible to different classes of users.
Data administration: When several users share the data, centralizing the administration of data can
offer significant improvements. It can be used for organizing the data representation to minimize
redundancy and for fine-tuning the storage of the data to make retrieval efficient.
Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a
manner that users can think of the data as being accessed by only one user at a time. Further, the
DBMS protects users from the effects of system failures.
Reduced application development time: Clearly, the DBMS supports many important functions
that are common to many applications accessing data stored in the DBMS.
Disadvantages of DBMS
The main disadvantage of a DBMS is overhead cost. The processing overhead
introduced by the DBMS to implement security, integrity, and sharing of the data causes a
degradation of response and throughput times. An additional cost is that of migration from a
traditionally separate application environment to an integrated one.
Even though centralization reduces duplication, the lack of duplication requires that the
database be adequately backed up so that, in case of failure, the data can be recovered.
Backup and recovery operations are more complex in a DBMS environment, and even more so
in a concurrent multi-user database system. A database system requires a certain amount of
controlled redundancy and duplication to enable access to related data items.
Data Models
A data model is a collection of high-level data description constructs that hide many low-
level storage details. A DBMS allows a user to define the data to be stored in terms of a data
model. Most database management systems today are based on the relational data model.
A schema is a description of a particular collection of data, using the given data model. The
relational model of data is the most widely used model today.
Main concept: the relation, basically a table with rows and columns. Every relation has a schema,
which describes the columns, or fields.
Relational DBMS products include IBM's DB2, Informix, Oracle, Sybase, Microsoft's Access,
FoxBase, Paradox, Tandem, and Teradata.
Categories of data models

 Conceptual (high-level, semantic) data models: Provide concepts that are close to the way
many users perceive data (also called entity-based or object-based data models).
 Physical (low-level, internal) data models: Provide concepts that describe details of how data
is stored in the computer.
 Implementation (representational) data models: Provide concepts that fall between the above
two.

1. Hierarchical models:
Advantages:
 Hierarchical model is simple to construct and operate on.
 Corresponds to a number of natural hierarchical organized domains – e.g., assemblies
in manufacturing, personal organization in companies.
 Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT
WITHIN PARENT, etc.
Disadvantages:
 Navigational and procedural nature of processing.
 Database is visualized as a linear arrangement of records.
 Little scope for “query optimization”.
 Supports only one-to-many relationships.
2. Network model:
Advantages:

 Network model is able to model complex relationships and represents the semantics of
add/delete operations on the relationships.
 Can handle most situations for modeling using record types and relationship types.
 Language is navigational; uses constructs like FIND, FIND member, FIND owner,
FIND NEXT within set, GET, etc. Programmers can do optimal navigation through the
database.
Disadvantages:

 Navigational and procedural nature of processing.
 Database contains a complex array of pointers that are expensive and difficult to update
when inserting and deleting.
 Little scope for automated “query optimization”.
3. Relational model:
 A relation is basically a table with rows and columns.
 Every relation has a schema, which describes the columns, or fields.
 Student information in a university database may be stored in a relation with the
following schema:
 Students(sid: string, name: string, login: string, age: integer, gpa: real)
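As a rough illustration, this schema could be declared in SQL as follows. This is a minimal sketch:
the mapping of the schema's types to SQL types, and the choice of sid as the primary key, are
assumptions made for illustration.

-- A possible SQL declaration of the Students relation
CREATE TABLE Students (
    sid   CHAR(20),
    name  CHAR(30),
    login CHAR(20),
    age   INTEGER,
    gpa   REAL,
    PRIMARY KEY (sid)   -- assumed key for illustration
);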

Levels of Abstraction in a DBMS (Three-tier schema architecture)

The data in a DBMS is described at three levels of abstraction. The database description consists of
a schema at each of these three levels: external, conceptual, and physical. External schemas describe
how different users or groups see the data, the conceptual schema defines the logical structure of the
whole database, and the physical schema describes the files and indexes used to store it.
Conceptual schema:

 The conceptual schema (also called the logical schema) describes the stored data in terms of the
data model of the DBMS.
 In a relational DBMS, the conceptual schema describes all relations that are stored in
the database.
 In our sample university database, these relations contain information about entities, such as
students and faculty, and about relationships, such as students’ enrollment in courses.
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Faculty(fid: string, fname: string, salary: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)

The choice of relations, and the choice of fields for each relation, is not always obvious,
and the process of arriving at a good conceptual schema is called conceptual database design.
Physical Schema:

 The physical schema specifies storage details.

 It summarizes how the relations described in the conceptual schema are actually stored
on secondary storage devices such as disks and tapes.
 Decides what file organizations to use to store the relations and create auxiliary data
structures, called indexes, to speed up data retrieval operations.
 A sample physical schema for the university database is to store all relations as unsorted
files of records, and to:
o Create indexes on the first column of the Students, Faculty, and Courses relations, the
salary column of Faculty, and the capacity column of Rooms (see the SQL sketch below).
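In SQL, such physical design decisions are typically expressed with CREATE INDEX statements.
A hedged sketch of the sample physical schema above; the index names are made up for illustration:

-- Hypothetical indexes realizing the sample physical schema
CREATE INDEX idx_students_sid   ON Students (sid);
CREATE INDEX idx_faculty_fid    ON Faculty (fid);
CREATE INDEX idx_courses_cid    ON Courses (cid);
CREATE INDEX idx_faculty_salary ON Faculty (salary);
CREATE INDEX idx_rooms_capacity ON Rooms (capacity);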

External Schema:

 This schema allows data access to be customized at the level of individual users or
groups of users.
 A database has exactly one conceptual schema and one physical schema, but it may have
several external schemas.
 An external schema is a collection of one or more views and relations from the conceptual
schema.
 A view is conceptually a relation, but the records in a view are not stored in the DBMS.
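For example, an external schema might expose per-course enrollment counts through a view,
without storing them. A minimal sketch in SQL, assuming the Enrolled relation from the conceptual
schema above (the view name is illustrative):

-- A view whose records are computed, not stored
CREATE VIEW Courseinfo (cid, enrollment) AS
    SELECT cid, COUNT(*)
    FROM Enrolled
    GROUP BY cid;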
Data Independence
Application programs are insulated from changes in the way the data is structured and stored. Data
independence is achieved through use of the three levels of data abstraction.
Logical data independence: Users can be shielded from changes in the logical structure of the data,
or changes in the choice of relations to be stored. This is the independence to change the conceptual
schema without having to change the external schemas and their application programs.
Physical data independence: The conceptual schema insulates users from changes in physical
storage details. This is the independence to change the internal schema without having to change the
conceptual schema.
Architecture of a DBMS
The functional components of a database system can be broadly divided into query processor
components and storage manager components. The query processor includes:
1. DML Compiler: It translates DML statements in a query language into low-level instructions that
the query evaluation engine understands.
2. Embedded DML Pre-compiler: It converts DML statements embedded in an application program
to normal procedure calls in the host language. The pre-compiler must interact with the DML
compiler to generate the appropriate code.
3. DDL Interpreter: It interprets DDL statements and records them in a set of tables containing
metadata.
4. Transaction Manager: Ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
5. File Manager: Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
6. Buffer Manager: Is responsible for fetching data from disk storage into main memory
and deciding what data to cache in memory.
Also some data structures are required as part of the physical system implementation:
1. Data Files: The data files store the database by itself.
2. Data Dictionary: It stores metadata about the structure of the database, as it is used heavily.
3. Indices: It provides fast access to data items that hold particular values.
4. Statistical Data: It stores statistical information about the data in the database. This information
is used by the query processor to select efficient ways to execute a query.

People Who Deal With Databases


Quite a variety of people are associated with the creation and use of databases. Obviously,
there are database implementors, who build DBMS software, and end users who wish to store and
use data in a DBMS.
Database implementors work for vendors such as IBM or Oracle. End users come from a
diverse and increasing number of fields.
In addition to end users and implementors, two other classes of people are associated with a
DBMS: application programmers and database administrators (DBAs).
Database application programmers develop packages that facilitate data access for end users, who
are usually not computer professionals, using the host or data languages and software tools that
DBMS vendors provide.
The task of designing and maintaining the database is entrusted to a professional called the
database administrator (DBA).
The DBA is responsible for many critical tasks:

Design of the conceptual and physical schemas: The DBA is responsible for interacting with
the users of the system to understand what data is to be stored in the DBMS and how it is likely
to be used. Based on this knowledge, the DBA must design the conceptual schema (decide what
relations to store) and the physical schema (decide how to store them).
Security and authorization: The DBA is responsible for ensuring that unauthorized data
access is not permitted. In general, not everyone should be able to access all the data. In a
relational DBMS, users can be granted permission to access only certain views and relations.
Data availability and recovery from failures: The DBA must take steps to ensure that if the
system fails, users can continue to access as much of the uncorrupted data as possible.
Database tuning: The needs of users are likely to evolve with time. The DBA is responsible
for modifying the database, in particular the conceptual and physical schemas, to ensure adequate
performance as user requirements change.

Database Environment
A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is hence a general-purpose software system that facilitates the
processes of defining, constructing, manipulating, and sharing databases among various users and
applications.
Defining a database involves specifying the data types, structures, and constraints for the data to be
stored in the database.
Constructing the database is the process of storing the data itself on some storage medium that is
controlled by the DBMS.
Manipulating a database includes functions such as querying the database to retrieve specific data,
updating the database to reflect changes in the miniworld, and generating reports from the data.
Sharing a database allows multiple users and programs to access the database concurrently. Other
important functions provided by the DBMS include protecting the database and maintaining
it over a long period of time.
Protection includes both system protection against hardware or software malfunction (or
crashes), and security protection against unauthorized or malicious access. A typical large
database may have a life cycle of many years, so the DBMS must be able to maintain the
database system by allowing the system to evolve as requirements change over time. We can
call the database and DBMS software together a database system.

Database Architecture
Database architecture uses programming languages to design a particular type of software for
businesses or organizations. Database architecture focuses on the design, development,
implementation and maintenance of computer programs that store and organize information for
businesses, agencies and institutions.

The architecture of a DBMS can be seen as either single tier or multi-tier. The tiers are
classified as follows:
1-tier architecture, 2-tier architecture, 3-tier architecture, ..., n-tier architecture

1-tier architecture:
One-tier architecture involves putting all of the required components for a software application or
technology on a single server or platform.

2-tier architecture:
The two-tier architecture is based on the client-server model. Communication takes place directly
between client and server; there is no intermediary between them.

3-tier architecture:
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture for designing
a DBMS.

Centralized DBMS Architecture
Architectures for DBMSs have followed trends similar to those for general computer system
architectures. Earlier architectures used mainframe computers to provide the main processing for all
functions of the system, including user application programs and user interface programs, as well as
all the DBMS functionality.

As prices of hardware declined, most users replaced their terminals with personal computers
(PCs) and workstations. At first, database systems used these computers in the same way as
they had used display terminals, so that the DBMS itself was still a centralized DBMS in which
all the DBMS functionality, application program execution, and user interface processing were
carried out on one machine.

Gradually, DBMS systems started to exploit the available processing power at the user side, which
led to client/server DBMS architectures.

Client/Server Architecture:
The client/server architecture was developed to deal with computing environments in which a
large number of PCs, workstations, file servers, printers, database servers, Web servers, and
other equipment are connected via a network. The idea is to define specialized servers with
specific functionalities.
The resources provided by specialized servers can be accessed by many client machines. The
client machines provide the user with the appropriate interfaces to utilize these servers, as well as
with local processing power to run local applications. This concept can be carried over to software,
with specialized software, such as a DBMS or a CAD (computer-aided design) package, being
stored on specific server machines and made accessible to multiple clients.

The concept of client/server architecture assumes an underlying framework that consists of
many PCs and workstations as well as a smaller number of mainframe machines, connected via
local area networks and other types of computer networks. A client in this framework is typically
a user machine that provides user interface capabilities and local processing. When a client
requires access to additional functionality-such as database access-that does not exist at that
machine, it connects to a server that provides the needed functionality.
A server is a machine that can provide services to the client machines, such as file access,
printing, archiving, or database access. In the general case, some machines install only client
software, others only server software, and still others may include both client and server
software. However, it is more common that client and server software usually run on separate
machines.
In client/server architecture, the user interface programs and application programs can run on
the client side. When DBMS access is required, the program establishes a connection to the DBMS
(which is on the server side); once the connection is created, the client program can communicate
with the DBMS. A standard called Open Database Connectivity (ODBC) provides an application
programming interface (API), which allows client-side programs to call the DBMS, as long as
both client and server machines have the necessary software installed. Most DBMS vendors provide
ODBC drivers for their systems.

Entity Relationship Model
Introduction
The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships and is widely used to develop an initial database
design.
The ER model is important primarily for its role in database design. It provides useful
concepts that allow us to move from an informal description of what users want from their
database to a more detailed and precise, description that can be implemented in a DBMS.
Although an ER design is eventually implemented as a physical database, the ER model is basically
useful in the design and communication of the logical database model.
Overview of Database Design
Our primary focus is the design of the database. The database design process can be divided
into six steps:

Requirements Analysis
The very first step in designing a database application is to understand what data is to be
stored in the database, what applications must be built on the database, and what operations must
be performed on the database. In other words, we must find out what the users want from the
database. This process involves discussions with user groups, a study of the current operating
environment and how it is expected to change, an analysis of any available documentation on
existing applications, and so on.

Conceptual Database Design


The information gathered in the requirement analysis step is used to develop a high-level
description of the data to be stored in the database, along with the conditions known to hold this
data. The goal is to create a description of the data that matches how both users and developers
think of the data (and the people and processes to be represented in the data). This facilitates
discussion among all the people involved in the design process, i.e., developers as well as
users who have no technical background. In simple words, the conceptual database design phase
is where the ER model is drawn.

Logical Database Design


We must implement our database design and convert the conceptual database design into a
database schema (a description of data) in the data model (a collection of high-level data description
constructs that hide many low-level storage details) of the DBMS. We will consider only relational
DBMSs, and therefore, the task in the logical design step is to convert the conceptual database
design in the form of E-R Schema (Entity-Relationship Schema) into a relational database schema.

Schema Refinement
The fourth step in database design is to analyze the collection of relations (tables) in our
relational database schema to identify potential problems, and to refine it.

Physical Database Design

This step may simply involve building indexes on some tables and clustering some tables, or
it may involve a redesign of parts of the database schema obtained from the earlier design steps.

Application and Security Design


Any software project that involves a DBMS must consider the applications and processes that
will access the database, and identify the entities and the data that each of them requires.
Entities, Attributes and Entity Sets
Entity: An entity is an object in the real world that is distinguishable from other objects.
Entity set: An entity set is a collection of similar entities. The Employees entity set with attributes
ssn, name, and lot is shown in the following figure.

Attribute: An attribute describes a property associated with entities. Attribute will have a name and
a value for each entity.
Domain: A domain defines a set of permitted values for an attribute
Entity Relationship Model: An ERM is a theoretical and conceptual way of showing data
relationships in software development. It is a database modeling technique that generates an abstract
diagram or visual representation of a system's data that can be helpful in designing a relational
database. ER model allows us to describe the data involved in a real-world enterprise in terms of
objects and their relationships and is widely used to develop an initial database design.
Representation of Entities and Attributes
ENTITIES: Entities are represented by using rectangular boxes. These are named with the entity
name that they represent.

ATTRIBUTES: Attributes are the properties of entities. Attributes are represented by means of
ellipses. Every ellipse represents one attribute and is directly connected to its entity.
Types of attributes:
 Simple attribute − Simple attributes are atomic values, which cannot be divided further. For
example, a student's roll number is an atomic value.
 Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.
 Derived attribute − Derived attributes are attributes that do not exist in the physical database,
but whose values are derived from other attributes present in the database. For example,
average_salary in a department should not be saved directly in the database; instead, it can be
derived. As another example, age can be derived from date_of_birth.

 Single-value attribute − Single-value attributes contain a single value. For example:
Social_Security_Number.
 Multi-value attribute − Multi-value attributes may contain more than one value. For example,
a person can have more than one phone number, email_address, etc.

Relationship and Relationship set


Relationships are represented by diamond-shaped box. Name of the relationship is written inside the
diamond-box. All the entities (rectangles) participating in a relationship, are connected to it by a line.
Types of relationships:
Degree of Relationship: The number of participating entity sets in a relationship defines
the degree of the relationship. Based on degree, relationships are categorized as:
 Unary = degree 1
 Binary = degree 2
 Ternary = degree 3
 n-ary = degree n
Unary Relationship: A relationship involving one entity set; it is a relationship among two entities of the
same entity set. Example: A professor (in-charge) reports to another professor (Head of the Dept).

Binary Relationship: A relationship among 2 entity sets. Example: A professor teaches a course
and a course is taught by a professor.

Ternary Relationship: A relationship among 3 entity sets. Example: A professor teaches a course
in so and so semester.

n-ary Relationship: A relationship among n entity sets.

Cardinality:
Defines the number of entities in one entity set that can be associated with entities of another
set via the relationship set. Cardinality ratios are categorized into four kinds:

1. One-to-One relationship: When only one instance of entities is associated with the
relationship, then the relationship is one-to-one relationship. Each entity in A is associated
with at most one entity in B and each entity in B is associated with at most one entity in A.

2. One-to-many relationship: When more than one instance of an entity is associated with a
relationship, then the relationship is one-to-many relationship. Each entity in A is associated

with zero or more entities in B and each entity in B is associated with at most one entity in A.

3. Many-to-one relationship: When more than one instance of entity is associated with the
relationship, then the relationship is many-to-one relationship. Each entity in A is associated
with at most one entity in B and each entity in B is associated with 0 (or) more entities in A.

4. Many-to-Many relationship: If more than one instance of an entity on the left and more than
one instance of an entity on the right can be associated with the relationship, then it depicts a
many-to-many relationship. Each entity in A is associated with 0 (or) more entities in B, and
each entity in B is associated with 0 (or) more entities in A (see the relational sketch below).
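In the relational model, a many-to-many relationship is usually realized as a separate linking table
that holds the keys of both participating entity sets. A minimal sketch using the Enrolled
relationship between Students and Courses from the sample schemas above (column types are
assumptions):

-- Each (student, course) pair appears at most once
CREATE TABLE Enrolled (
    sid   CHAR(20),
    cid   CHAR(20),
    grade CHAR(2),
    PRIMARY KEY (sid, cid)
);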

Relationship Set:
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.
Participation Constraints:
 Total Participation − If each entity in the entity set is involved in the relationship, then the
participation of the entity set is said to be total. Total participation is represented by double lines.
 Partial Participation − If not all entities of the entity set are involved in the relationship,
then the participation is said to be partial. Partial participation is represented by single
lines.
Example:

Additional Features of the ER Model
Key Constraints

Consider a relationship set called Manages between the Employees and Departments entity
sets such that each department has at most one manager, although a single employee is allowed to
manage more than one department. The restriction that each department has at most one manager is
an example of a key constraint, and it implies that each Departments entity appears in at most one
Manages relationship in any allowable instance of Manages. This restriction is indicated in the ER
diagram of the figure below by an arrow from Departments to Manages. Intuitively, the arrow
states that given a Departments entity, we can uniquely determine the Manages relationship in which
it appears.
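When this design is later mapped to relations, the key constraint can be captured by making the
Departments key the primary key of the table for Manages. A hedged sketch; the column names
(ssn for Employees, did for Departments, and a since attribute) are assumptions borrowed from
common textbook examples:

-- did is the key: each department appears in at most one Manages tuple
CREATE TABLE Manages (
    ssn   CHAR(11),
    did   INTEGER,
    since DATE,
    PRIMARY KEY (did)
);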

Key Constraints for Ternary Relationships
If an entity set E has a key constraint in a relationship set R, each entity in an instance of E
appears in at most one relationship in (a corresponding instance of) R. To indicate a key constraint
on entity set E in relationship set R, we draw an arrow from E to R.
The figure below shows a ternary relationship with key constraints: each employee works in at
most one department, and at a single location.

Weak Entities
Strong Entity set: If each entity in the entity set is distinguishable or it has a key then such an entity
set is known as strong entity set.

Weak Entity set: If each entity in the entity set is not distinguishable, or it does not have a key,
then such an entity set is known as a weak entity set.

eno is a key, so it is represented by a solid underline. dname is a partial key: it cannot by itself
distinguish the tuples in the Dependent entity set, so dname is represented by a dashed underline.
A weak entity set is always in total participation with its identifying relationship. If the entity set is
weak, the relationship is also known as a weak relationship, since the dependent entities are no longer
needed once the owner leaves.
Ex: A policy's dependent details are not needed once the owner (employee) of that policy leaves
the company, is fired, or dies. The detailed ER diagram is as follows.

The cardinality of the owner entity set with respect to the weak relationship is 1:m. A weak entity
is uniquely identified by its partial key together with the key of the owner entity set. All the tuples
of the weak entity set are associated with tuples of the owner entity set.

Dependents is an example of a weak entity set. A weak entity can be identified uniquely only by
considering some of its attributes in conjunction with the primary key of another entity, which is
called the identifying owner.
The following restrictions must hold:
 The owner entity set and the weak entity set must participate in a one-to-many relationship set
(one owner entity is associated with one or more weak entities, but each weak entity has a
single owner). This relationship set is called the identifying relationship set of the weak entity
set.
 The weak entity set must have total participation in the identifying relationship set (see the
sketch below).
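In the relational translation, a weak entity set and its identifying relationship are commonly folded
into a single table whose key combines the partial key with the owner's key, and deleting an owner
also deletes its dependents. A minimal sketch using the eno/dname example above (column types,
and the owning Employee table, are assumptions):

-- Owner entity set
CREATE TABLE Employee (eno CHAR(11) PRIMARY KEY, ename CHAR(30));
-- Dependent is identified by (dname, eno of the owning employee)
CREATE TABLE Dependent (
    dname CHAR(20),   -- partial key of the weak entity set
    eno   CHAR(11),   -- key of the owner entity set
    PRIMARY KEY (dname, eno),
    FOREIGN KEY (eno) REFERENCES Employee (eno)
        ON DELETE CASCADE   -- dependents go when the owner goes
);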
E-R Diagrams Implementation
Now we are in a position to draw the ER diagram for the Company database introduced at the
beginning of this unit. Readers are strongly advised to follow the steps shown here when designing
an ER diagram for any chosen problem.

Step 1: Identify the Strong and Weak Entity Sets


After careful analysis of the problem we come to a conclusion that there are four possible
entity sets as shown below:
1. Employees Strong Entity Set
2. Departments Strong Entity Set
3. Projects Strong Entity Set
4. Dependents Weak Entity Set
Step 2: Identify the Relevant Attributes
The next step is to get all the attributes that are most applicable to each entity set. Do this by
considering each entity set in turn, along with the types of its attributes. The next job is to pick the
primary key for strong entity sets and the partial key for weak entity sets.
Example: Following are the attributes (primary keys marked with an asterisk, since underlining
is lost in plain text):
1. Employees: SSN*, Name, Addr, DateOfBirth, Sex, Salary
2. Departments: DNo*, DName, DLocation
3. Projects: PNo*, PName, PLocation
4. Dependents (weak): DepName, DateOfBirth, Sex, Relationship
SSN, DNo, and PNo are the primary keys, and DepName is the partial key of Dependents.
Also, DLocation may be treated as a multivalued attribute.

Step 3: Identify the Relationship Sets


In this step we need to find all the meaningful relationship sets among possible entity sets.
This step is very tricky, as redundant relationships may lead to complicated design and in turn a
bad implementation.
Example: Let us show below what the possible relationship sets are:
1. Employees and Departments WorksFor
2. Employees and Departments Manages
3. Departments and Projects Controls
4.Projects and Employees WorksOn
5. Dependents and Employees Has
6. Employees and Employees Supervises
Some problems have recursive relationship sets and some do not. Our Company database has one
such relationship set, called Supervises. You can complete this step by adding possible descriptive
attributes to the relationship sets (Manages has StartDate and WorksOn has Hours).

Step 4: Identify the Cardinality Ratio and Participation Constraints

This step is relatively simple: apply the business rules and your common sense. The structural
constraints for our example are as follows:
1. WorksFor: N:1, total on either side
2. Manages: 1:1, total on Employees and partial on Departments side
3. Controls: 1:N, total on either side
4. WorksOn: M:N, total on either side
5. Has: 1:M, total on Dependents and partial on Employees

Step 5: Identify the IS-A and Has-A Relationship Sets


The last step is to look for “is-a” and “has-a” relationship sets in the given problem. As far as
the Company database is concerned, there are no generalization or aggregation relationships.
The complete ER diagram, combining all five steps, is shown in the figure.

Class Hierarchies
Classifying the entities in an entity set into subclass entities is known as a class hierarchy. For
example, we might classify the Employees entity set into the subclass entity sets Hourly-Emps and
Contract-Emps to distinguish the basis on which employees are paid. The class hierarchy is
illustrated as follows.

This class hierarchy illustrates the inheritance concept, where the subclass ISA (read as: “is a”)
superclass, indicating the “is a” relationship. Therefore, the attributes defined for an Hourly-Emps
entity are the attributes of Hourly-Emps plus the attributes of Employees (because a subclass
inherits its superclass's properties). Likewise, the attributes defined for a Contract-Emps entity are
the attributes of Contract-Emps plus the attributes of Employees. A relational sketch of this mapping
follows.
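One common way to map such a hierarchy to relations is a table per subclass that references the
superclass key. A hedged sketch; ssn as the Employees key and hourly_wages as a subclass-specific
attribute are assumptions for illustration:

-- Superclass table holds the common attributes
CREATE TABLE Employees (
    ssn  CHAR(11) PRIMARY KEY,
    name CHAR(30)
);
-- Subclass table inherits the key and adds its own attributes
CREATE TABLE Hourly_Emps (
    ssn          CHAR(11) PRIMARY KEY REFERENCES Employees (ssn),
    hourly_wages REAL
);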

Class Hierarchy based on Sub-super Set

1. Specialization: Specialization is the process of identifying subsets (subclasses) of an entity set


(superclass) that share some special distinguishable characteristic. Here, the superclass (Employee)
is defined first, then the subclasses (Hourly-Emps, Contract-Emps, etc.) are defined next.
In short, Employees is specialized into subclasses.
2. Generalization: Generalization is the process of identifying (defining) some generalized
(common) characteristics of a collection of (two or more) entity sets and creating a new entity set
that contains (possesses) these common characteristics. Here, the subclasses (Hourly-Emps,
Contract-Emps, etc.) are defined first, and then the superclass (Employees) is defined next.
In short, Hourly-Emps and Contract-Emps are generalized by Employees.

Class Hierarchy based on Constraints


1. Overlap constraints: Overlap constraints determine whether two subclasses are allowed to
contain the same entity.

Example: Can Akbar be both an Hourly-Emps entity and a Contract-Emps entity? The answer
is no.
Another example: can Akbar be both a Contract-Emps entity and a Senior-Emps entity? The
answer is yes. This is a specialization hierarchy property, and we denote it by writing
“Contract-Emps OVERLAPS Senior-Emps”.

2. Covering Constraints: Covering constraints determine whether the entities in the subclasses
collectively include all entities in the superclass.
Example: Should every Employee be an Hourly-Emps or a Contract-Emps entity? The answer is
no; an employee can be a Daily-Emps.
Another example: should every Motor-vehicle (superclass) be a Bike (subclass) or a Car (subclass)?
The answer is yes. Thus, a property of generalization hierarchies is that every instance of the
superclass is an instance of some subclass.
We denote this by writing “Bikes and Cars COVER Motor-vehicles”.
Aggregation
Aggregation allows us to indicate that a relationship set (identified through a dashed box)
participates in another relationship set. That is, a relationship set can take part in an association with
an entity set. Sometimes we have to model a relationship between a collection of entities and a
relationship.
Example: Suppose that we have an entity set called Projects and that each Projects entity is sponsored
by one or more departments. The Sponsors relationship set captures this information, but a
department that sponsors a project might assign employees to monitor the sponsorship. Therefore,
Monitors should be a relationship set that associates a Sponsors relationship (rather than a Projects
or Departments entity) with an Employees entity; in other words, we have to define a relationship
whose participants include another relationship.
Use of Aggregation
We use aggregation when we need to express a relationship among relationships. Here there are
really two distinct relationships, Sponsors and Monitors, each with its own attributes.
Conceptual Database Design With The ER Model (ER Design Issues)
The following are the ER design issues:
1. Use of entity sets versus attributes
2. Use of entity sets versus relationship sets
3. Binary versus ternary relationship sets
4. Aggregation versus ternary relationships

1. Use of Entity Sets versus Attributes
Consider the relationship set (called Works_In2) shown in the figure below.
Intuitively, it records the interval during which an employee works for a department. Now
suppose that it is possible for an employee to work in a given department over more than one
period.
This possibility is ruled out by the ER diagram’s semantics. The problem is that we want
to record several values for the descriptive attributes for each instance of the Works_In2
relationship. (This situation is analogous to wanting to record several addresses for each
employee.) We can address this problem by introducing an entity set called, say, Duration, with
attributes from and to, as shown in the figure.

2. Entity Sets versus Relationship Sets:

Consider a variant of the Manages relationship set in which each department manager is given a
discretionary budget (dbudget), as shown in the figure below, in which we have also renamed the
relationship set to Manages2.

There is at most one employee managing a department, but a given employee could manage several
departments; we store the starting date and discretionary budget for each manager-department pair.
This approach is natural if we assume that a manager receives a separate discretionary budget for
each department that he or she manages.
But what if the discretionary budget is a sum that covers all departments managed by that
employee? In this case each Manages2 relationship that involves a given employee will have
the same value in the dbudget field. In general such redundancy could be significant and could
cause a variety of problems. Another problem with this design is that it is misleading.

We can address these problems by associating dbudget with the appointment of the
employee as manager of a group of departments. In this approach, we model the appointment as
an entity set, say Mgr_Appts, and use a ternary relationship, say Manages3, to relate a manager,
an appointment, and a department. The details of an appointment (such as the discretionary
budget) are not repeated for each department included in the appointment, although there is still
one Manages3 relationship instance per such department. Further, note that each department
has at most one manager, as before, because of the key constraint. This approach is illustrated
in the figure below.

3. Binary versus Ternary Relationships

Consider the ER diagram shown in the figure below. It models a situation in which an employee
can own several policies, each policy can be owned by several employees, and each dependent
can be covered by several policies.
Suppose that we have the following additional requirements:

 A policy cannot be owned jointly by two or more employees.
 Every policy must be owned by some employee.
 Dependents is a weak entity set, and each dependent entity is uniquely identified by taking
pname in conjunction with the policyid of a policy entity (which, intuitively, covers the given
dependent).
The first requirement suggests that we impose a key constraint on Policies with respect to
Covers, but this constraint has the unintended side effect that a policy can cover only one dependent.
The second requirement suggests that we impose a total participation constraint on Policies. This
solution is acceptable if each policy covers at least one dependent. The third requirement forces us
to introduce an identifying relationship that is binary (in our version of ER diagrams, although there
are versions in which this is not the case).

Even ignoring the third point above, the best way to model this situation is to use two binary
relationships, as shown in the figure below.

4. Aggregation versus Ternary Relationships

The choice between using aggregation or a ternary relationship is mainly determined by the
existence of a relationship that relates a relationship set to an entity set (or to a second relationship
set). The choice may also be guided by certain integrity constraints that we want to express.

Consider the constraint that each sponsorship (of a project by a department) be monitored by at most
one employee. We cannot express this constraint in terms of the Sponsors2 ternary relationship set,
but we can express it by drawing an arrow from the aggregated relationship Sponsors to the
relationship Monitors. Thus, the presence of such a constraint serves as another reason for using
aggregation rather than a ternary relationship set.

UNIT-II

RELATIONAL MODEL
Introduction
The relational model was proposed by E. F. Codd to model data in the form of relations or tables.
After designing the conceptual model of a database using an ER diagram, we need to convert the
conceptual model into the relational model, which can be implemented using any RDBMS
(Relational Database Management System) such as SQL Server, MySQL, etc.
The relational model is very simple and elegant: a database is a collection of one or more relations,
where each relation is a table with rows and columns. This simple tabular representation enables even
new users to understand the contents of a database, and it permits the use of simple, high-level
languages to query the data.
Relational Model
The relational model represents how data is stored in relational databases. A relational database
stores data in the form of relations (tables).
Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE and
AGE as shown in table.
ROLL_NO  NAME    ADDRESS    PHONE       AGE
1        Nishma  Hyderabad  9455123451  28
2        Sai     Guntur     9652431843  27
3        Swetha  Nellore    9156253131  26
4        Raji    Ongole     9215635311  25
Attribute: Attributes are the properties that define a relation. Ex: ROLL_NO, NAME

Tuple: Each row in a relation is known as a tuple.
Ex: 1  Nishma  Hyderabad  9455123451  28
Degree: The number of attributes in the relation is known as degree.
Ex: The degree of the given STUDENT table is 5.
Column: A column represents the set of values for a particular attribute. For example, the column
ROLL_NO extracted from the relation STUDENT:
ROLL_NO
1
2
3
4

Null values: A value that is not known or not available is called a NULL value. It is
represented by a blank space.
Cardinality: The number of tuples present in the relation is called its cardinality.
Ex: The Cardinality of the STUDENT table is 4.

Concept Of Domain
The domain of an attribute is the set of all allowable values for that attribute.
Ex: Gender (Male, Female, Others).
Relation
 A relation is defined as a set of tuples and attributes.
 A relation consists of Relation schema and relation instance.
 Relation schema: A relation schema represents the name of the relation with its attributes.
Ex: STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is Relation schema
forSTUDENT.
 Relation instance: The set of tuples of a relation at a particular instance of a time is called
Relation Instance.
An instance of the 'Employee' relation:

Emp_code  Emp_Name  Dept_Name
01234     John      HR
12567     Smith     Sales
21678     Sai       Production
12456     Jay       Design

Importance of Null values:


 SQL supports a special value known as NULL which is used to represent the values of
attributes that may be unknown or not apply to a tuple.
 For example, the apartment_number attribute of an address applies only to addresses in
apartment buildings and not to other types of residences.
 It is important to understand that a NULL value is different from Zero value.
 A NULL value is used to represent a missing value, but it usually has one of the following
interpretations:
 Value unknown (Value exists but it is unknown)
 Value not available (exists but it is purposely withheld)
 Attribute not applicable (undefined for this tuple)

 It is often not possible to determine which of the meanings is intended.
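Because NULL is not an ordinary value, SQL tests for it with IS NULL rather than with the =
operator. A small hedged example against the EMPLOYEE table defined later in this unit:

-- Comparisons like address = NULL never match; use IS NULL
SELECT * FROM EMPLOYEE WHERE address IS NULL;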

Constraints

 While modeling the design of the relational database, we can impose rules (conditions),
such as what values are allowed to be inserted in a relation.
 Constraints are the rules enforced on the data columns of a table. They are used to limit the
type of data that can go into a table.
 This ensures the accuracy and reliability of the data in the database. Constraints can be
applied either at the column level or at the table level.
Domain Constraints In DBMS
 In a DBMS, a table is viewed as a combination of rows and columns.
 For example, if you have a column called month and you want only (Jan, Feb, March, ...)
to be the values allowed for that column, this set of allowed values is referred to as the
domain of the column.
Definition: A domain constraint ensures that the data value entered for a particular column matches
the data type defined for that column, along with any constraints applied to it
(NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT).

Domain constraint = data type check for the column + constraints.


Example: If we want to create a table STUDENT whose stu_id field must have a value of at least
100, we can create a domain and a table like this:
 CREATE DOMAIN id_value INT CONSTRAINT id_test CHECK (value >= 100);
 CREATE TABLE STUDENT (stu_id id_value PRIMARY KEY, stu_name VARCHAR(30), stu_age INT);

Key constraints in DBMS:


 Constraints are nothing but the rules that are to be followed while entering data into columns
of the database table.
 Constraints ensure that the data entered by the user into columns must be within the criteria
specified by the condition.
 We have 6 types of key constraints in DBMS:
1. Not Null
2. Unique
3. Default
4. Check
5. Primary key
6. Foreign key

1. Not Null:
 Null represents a record where data may be missing or where data for that record may be
optional.
 Once not null is applied to a particular column, you cannot enter null values to that column.
 A not null constraint cannot be applied at table level.

Example:

CREATE TABLE EMPLOYEE (id INT NOT NULL, name VARCHAR(25) NOT NULL, age INT NOT NULL,
address CHAR(25), salary DECIMAL(18,2), PRIMARY KEY (id));

 In the above example we have applied NOT NULL on three columns: id, name, and age.
Whenever a record is entered using an INSERT statement, all three columns must contain a
value other than NULL.
 The two other columns, address and salary, do not have NOT NULL applied, which means
those fields may be left empty.

2. Unique:

Sometimes we need to maintain only unique data in a column of a database table; this is
possible using a UNIQUE constraint.

 The UNIQUE constraint ensures that all values in a column are unique.

Example:

CREATE TABLE PERSONS (id INT UNIQUE, last_name VARCHAR(25) NOT NULL,
first_name VARCHAR(25), age INT);

 In the above example, since we have used a unique constraint on the ID column, we cannot
enter a value that is already present; simply put, no two ID values can be the same.
3. Default:
Default in SQL is used to add default data to the columns.

 When a column is given a DEFAULT value, rows that omit the column will use that
value, i.e., while entering the data we need not enter that value each time.
 But the default column value can be customized, i.e., it can be overridden when inserting
data for that row, based on the requirement.

(Figure: row with default value “abc”)


Example:

CREATE TABLE EMPLOYEE (id INT NOT NULL, last_name VARCHAR(25) NOT NULL,
first_name VARCHAR(25), age INT, city VARCHAR(25) DEFAULT 'Hyderabad');

 As a result, whenever you insert a new row, you need not enter a value for the default
column; entering a value for a default column is optional.

4. Check:
 The check constraint ensures that the data entered by the user for a column is within the range
of values or set of possible values specified.

Example: CREATE TABLE STUDENT (id INT, name VARCHAR(25), age INT, CHECK (age >= 18));
 Since we have used the check constraint (age >= 18), the value entered by the user for the
age column while inserting data must be greater than or equal to 18.
5. Primary Key:
 A primary key is a constraint which uniquely identifies each row in a database table; it is
defined by designating one or more columns of the table as the primary key.

Creating a primary key:


 A particular column is made as a primary key column by using the primary key keyword
followed by the column name.
Example:
CREATE TABLE EMP (id INT, name VARCHAR(20), age INT, course VARCHAR(10),
PRIMARY KEY (id));
 Here, since we have used the primary key on the ID column, the ID column must contain
unique values, i.e., one ID cannot be used for another student.

6. Foreign Key:
 The foreign key constraint is a column or list of columns which points to the primary
key column of another table.
 The main purpose of the foreign key is that only those values that match the primary key
column of the other table are allowed in the present table.

From the above two tables, COURSE_ID is a primary key of one table and also behaves as a
foreign key, as it is the same column in both STUDENT_DETAILS and STUDENT_MARKS.
Example:

(Reference Table)

CREATE TABLE CUSTOMER1 (id INT, name VARCHAR(25), course VARCHAR(10), PRIMARY KEY (id));

(Child table)

CREATE TABLE CUSTOMER2 (id INT REFERENCES CUSTOMER1 (id), marks INT);

Integrity Constraints in DBMS:


There are two types of integrity constraints
1. Entity Integrity Constraints
2. Referential Integrity Constraints

Entity Integrity constraints:

These constraints are used to ensure the uniqueness of each record or row in the data table.
The entity integrity constraint says that no primary key can take a NULL value,
since using the primary key we identify each tuple uniquely in a relation.

Example:

Explanation:
 In the above relation, EID is made the primary key, and the primary key can't take
NULL values, but in the 3rd tuple the primary key is NULL, so it is violating the
entity integrity constraint.

Referential Integrity constraints:

The referential integrity constraint is specified between two relations or tables and is used to
maintain the consistency among the tuples of the two relations.
This constraint is enforced through a foreign key: when an attribute in the foreign key of
relation R1 has the same domain as the primary key of relation R2, then the foreign key
of R1 is said to reference or refer to the primary key of relation R2.
The values of the foreign key in a tuple of relation R1 can either take the values of the
primary key for some tuple in relation R2, or can take NULL values, but cannot take any
other value.

Explanation:

 In the above, DNO of the first relation is the foreign key and DNO in the second relation is
the primary key.
 DNO=22 appears in the foreign key of the first relation but is not defined in the primary key
of the second relation, so the referential integrity constraint is violated here.
Basic SQL (introduction)

 SQL stands for Structured Query Language. It is used for storing and managing data in a
relational database management system.
 It is the standard language for relational database systems. It enables a user to create, read,
update and delete relational databases and tables.
 All RDBMSs like MySQL, Oracle, MS Access and SQL Server use SQL as their
standard database language.
 SQL allows users to query the database in a number of ways, using statements like common
English.
Rules: SQL follows following rules

 SQL is not case sensitive. Generally, keywords are represented in UPPERCASE.
 Using SQL statements, you can perform most of the actions in a database.
 SQL statements are not tied to text lines: a single SQL statement can be placed
on one or multiple text lines.
SQL Process:
When an SQL command is executed for any RDBMS, the system figures out the best
way to carry out the request and the SQL engine determines how to interpret the task.
In the process, various components are included. These components can be the optimization
engine, query engine, query dispatcher, etc.
All non-SQL queries are handled by the classic query engine, but the SQL query engine won't
handle logical files.

Characteristics of SQL:
 SQL is easy to learn.
 SQL is used to access data from relational database management system.
 SQL is used to describe the data.
 SQL is used to create and drop the database and table.
 SQL allows users to set permissions on tables, procedures and views.

Simple database Schema:


 A database schema is a structure that represents the logical storage of the data in the database.
 It represents the organization of data and provides information about the relationships between
the tables in a given database.
 A database schema is the logical representation of a database, which shows how the data is
stored logically in the entire database.
 It contains a list of attributes and instructions that inform the database engine how the
data is organized and how the elements are related to each other.
 A database schema contains schema objects that may include tables, fields, packages,
views, relationships, primary keys and foreign keys.
 In practice, the data is physically stored in files that may be in unstructured form, but to retrieve
it and use it, we need to keep it in a structured manner. To do this, a database schema is used.
It provides knowledge about how the data is organized in a database and how it is associated
with other data.
 A database schema object includes the following:
 Consistent formatting for all data entries.
 Database objects and unique keys for all data entries.
 Tables with multiple columns, where each column has a name and a datatype.
 The given diagram is an example of a database schema it contains three tables, their
data types. This also represents the relationships between the tables and primary keysas
well as foreign keys.

SQL Commands:
SQL commands are categorized into three types.

1. Data Definition Language (DDL): used to create (define) a table.

2. Data Manipulation Language (DML): used to update, store and retrieve data from tables.

3. Data Control Language (DCL): used to control access to the databases created using DDL
and DML, as the sketch below shows.
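
DDL and DML statements are illustrated throughout this material; as a brief sketch of DCL, the
GRANT and REVOKE statements control access (the user name scott here is only illustrative):

GRANT SELECT, INSERT ON employee TO scott;
REVOKE INSERT ON employee FROM scott;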

SQL DATATYPES :
SQL data types are used to define the values that a column can contain.

Every column is required to have a name and a data type in the database table.

DATA TYPES OF SQL:

SQL data types fall into five categories: binary, numeric, exact numeric, string, and date & time.

1. BINARY DATATYPES:
There are three types of binary data types which are given below
DATA TYPE    DESCRIPTION
binary       It has a maximum length of 8000 bytes. It contains fixed-length binary data.
varbinary    It has a maximum length of 8000 bytes. It contains variable-length binary data.
image        It has a maximum length of 2,147,483,647 bytes. It contains variable-length binary data.

2. NUMERIC DATATYPE:

DATA TYPE   FROM         TO          DESCRIPTION
float       -1.79E+308   1.79E+308   It is used to specify a floating-point value, e.g. 6.2, 2.9.
real        -3.40E+38    3.40E+38    It specifies a single-precision floating-point number.

3. EXACT NUMERIC DATA TYPE:

DATA TYPE   DESCRIPTION
int         It is used to specify an integer value.
smallint    It is used to specify a small integer value.
bit         It has the number of bits to store.
decimal     It specifies a numeric value that can have a decimal number.
numeric     It is used to specify a numeric value.

4. DATE AND TIME DATATYPES:

DATA TYPE   DESCRIPTION
date        It is used to store the year, month, and day values.
time        It is used to store the hour, minute, and second values.
timestamp   It stores the year, month, day, hour, minute, and second values.

5. STRING DATATYPE:

DATA TYPE   DESCRIPTION
char        It has a maximum length of 8000 characters. It contains fixed-length non-Unicode characters.
varchar     It has a maximum length of 8000 characters. It contains variable-length non-Unicode characters.
text        It has a maximum length of 2,147,483,647 characters. It contains variable-length non-Unicode characters.

TABLE DEFINITIONS: (CREATE, ALTER)

SQL TABLE: SQL table is a collection of data which is organized in terms of rows and
columns.
In DBMS, the table is known as a relation and a row as a tuple.
Let's see an example of the EMPLOYEE table:

EMP_ID   EMP_NAME   CITY         PHONE_NO
1        Kristen    Washington   7289201223
2        Anna       Franklin     9378282882
3        Jackson    California   9264783838
4        Daniel     Hawaii       9638482678

In the above table, "EMPLOYEE" is the table name; "EMP_ID",
"EMP_NAME", "CITY" and "PHONE_NO" are the column names.
The combination of data of multiple columns forms a row.
E.g.: 1, "Kristen", "Washington" and "7289201223" are the data of one row.
OPERATIONS ON TABLE:
1. Create table
2. Alter table
3. Drop table
1. Create table: The SQL create table statement is used to create a table in the database. To define
the table, you should specify the name of the table and also define its columns and each column's
data type.

SYNTAX:

Create table table_name ("column1" "datatype",
"column2" "datatype", "column3" "datatype",
….
"columnN" "datatype");

EXAMPLE:

SQL > create table employee (emp_id int, emp_name varchar (25), phone_no int,address char
(30));

 If you create the table successfully, you can verify it by looking at the message from the
SQL server; alternatively, you can use the DESC command as follows:
SQL > DESC employee;

FIELD      TYPE           NULL   DEFAULT   EXTRA
emp_id     int (11)       No     NULL
emp_name   varchar (25)   No     NULL
phone_no   int (11)       No     NULL
address    char (30)      Yes    NULL
2. ALTER TABLE:
The alter table command adds, deletes or modifies columns in a table.
 The alter table command also adds and deletes various constraints in a table.
 The following SQL adds an "EMAIL" column to the "EMPLOYEE" table:

SYNTAX:

ALTER table table_name add column1 datatype;

EXAMPLE:

ALTER table employee add email varchar (255);

SQL > DESC employee;

FIELD      TYPE            NULL   DEFAULT   EXTRA
emp_id     int (11)        No     NULL
emp_name   varchar (25)    No     NULL
phone_no   int (11)        No     NULL
address    char (30)       Yes    NULL
email      varchar (255)   Yes    NULL
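
Besides adding columns, the alter table command can also modify or delete an existing column. A
minimal sketch (MODIFY is Oracle/MySQL syntax; SQL Server uses ALTER COLUMN instead):

ALTER table employee MODIFY email varchar (100);
ALTER table employee DROP COLUMN email;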

3. DROP TABLE:
 The drop table command deletes a table in the data base
 The following example SQL deletes the table “EMPLOYEE”

SYNTAX :

DROP table table_name;

EXAMPLE:

DROP table employee;

 Dropping a table results in loss of all information stored in the table.


Different DML Operations (insert, delete, update):
 DML-Data Manipulation Language.
 Data Manipulation Commands are used to manipulate data to the database.
 Some of the data manipulation commands are
1. Insert
2. Update
3. Delete

1. Insert:
The SQL insert statement is a SQL query. It is used to insert a single record or multiple records into a table.

Syntax:

Insert into table_name values (value1, value2, value3);

Let's take an example of a table into which we insert 3 records:

▪ insert into student values ('alekhya', 501, 'hyderabad');
▪ insert into student values ('deepti', 502, 'guntur');
▪ insert into student values ('ramya', 503, 'nellore');

The resulting table is as follows:

NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Ramya 503 Nellore

2. Update:
 The SQL update command is used to modify data that is already in the database.
 The SQL update statement is used to change the data of records held by tables. Which rows
are to be updated is decided by a condition; to specify the condition, we use the WHERE clause.
 The update statement can be written in the following form:

Syntax:
Update table_name set column_name=expression where condition;

Example:
Let's take an example: here we are going to update an entry in the table.

Update student set name='rasi' where id=503;

After update the table is as follows:

NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Rasi 503 Nellore

3. Delete:
 The SQL delete statement is used to delete rows from a table.
 Generally, delete statement removes one or more records from a table.

Syntax:

delete from table_name [where condition];

Example:

Let us take the table named "student":

Delete from student where id=501;

Resulting table after the query:

NAME ID CITY
Deepti 502 Guntur
Rasi 503 Nellore

UNIT-III
SQL

Basic SQL querying (select and project) using where clause:


 The following are the various SQL clauses: the GROUP BY clause, the HAVING clause
and the ORDER BY clause.

1. Group by:
 SQL group by statement is used to arrange identical data into groups.
 The group by statement is used with the SQL select statement.
 The group by statement follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.

Syntax:
Select column from table_name where condition group by column order by column;

Sample table: product


PRODUCT   COMPANY   QTY   RATE   COST
Item 1    Com 1     2     10     20
Item 2    Com 2     3     25     75
Item 3    Com 1     2     30     60
Item 4    Com 3     5     10     50
Item 5    Com 2     2     20     40

Example:
 Select company, count (*) from product group by company;

Output:

COMPANY   COUNT(*)
Com 1     2
Com 2     2
Com 3     1
2. Having clause:
 Having clause is used to specify a search condition for a group or an aggregate.

Having clause is used with the group by clause; if you are not using a group by clause, you
can use the having clause like a where clause.

Syntax:
Select column1, column2 from table_name
where conditions
group by column1, column2
having conditions
order by column1, column2;

Example:
 select company, count (*) from product
group by company having count (*) >= 2;

Output:

COMPANY   COUNT(*)
Com 1     2
Com 2     2
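
The clauses can also be combined. As a sketch on the product table above, the following query
filters rows, groups them, filters the groups, and finally sorts the result:

Select company, sum (cost) from product
where qty >= 2
group by company
having sum (cost) > 50
order by company;

This returns Com 1 with a total cost of 80 and Com 2 with 115 (Com 3's total of 50 is filtered out
by the having clause).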

3. Order by clause:
The order by clause sorts the result set in ascending or descending order.

Syntax:

Select column1, column2, from table_name

Where condition

Order by column1, column2 … ASC | DESC;

Sample table:

Take a student table

Example:

Select * from student order by name;

Output:

NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Rasi 503 Nellore

SQL Where clause:


 A where clause in SQL is a data manipulation language statement.
 Where clauses are not mandatory clauses of SQL DML statements but it can be used to
limit the number of rows affected by a SQL DML statement or returned by query.
 Actually, it follows the records.it returns only those queries which the specific conditions.

Syntax:
Select column1, column2, ..., columnN from table_name where [condition];

 The where clause uses the following conditional operators:

=     Equal to
>     Greater than
<     Less than
>=    Greater than or equal to
<=    Less than or equal to
<>    Not equal to
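
For example, using the student table shown earlier, a condition built with one of these operators
restricts the rows returned:

Select * from student where id >= 502;

Output:
NAME     ID    CITY
Deepti   502   Guntur
Rasi     503   Nellore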

Arithmetic and logical operations:

SQL operators:
 SQL statements generally contain some reserved words or characters that are used to perform
operations such as arithmetic and logical operations. These reserved words are known as
operators.
SQL arithmetic operator:
 We can use various arithmetic operators on the data stored in tables.
 Arithmetic operators are:

+ Addition
- Subtraction
/ Division
* Multiplication
% modulus

1. Addition (+):
It is used to perform addition operation on data items.

Sample table:
EMP_ID EMP_NAME SALARY
1 Alex 25000
2 John 55000
3 Daniel 52000
4 Sam 12312

 select emp_id, emp_name, salary, salary+100 as "salary+100" from addition;
Output:
EMP_ID   EMP_NAME   SALARY   SALARY+100
1        Alex       25000    25100
2        John       55000    55100
3        Daniel     52000    52100
4        Sam        12312    12412
 Here we have added 100 to each employee's salary.

2. Subtraction (-):
 It is used to perform subtraction on the data items.

Example:
Select emp_id, emp_name, salary, salary-100 as "salary-100" from subtraction;
EMP_ID   EMP_NAME   SALARY   SALARY-100
1        Alex       25000    24900
2        John       55000    54900
3        Daniel     52000    51900
4        Sam        90000    89900
Here we have subtracted 100 from each employee's salary.

3. Division (/):
 The division operator divides one number by another (x is divided by y); an integer value
is returned.

Example:
 Select emp_id, emp_name, salary, salary/100 as "salary/100" from division;

EMP_ID   EMP_NAME   SALARY   SALARY/100
1        Alex       25000    250
2        John       55000    550
3        Daniel     52000    520
4        Sam        90000    900
4. Multiplication (*):
 It is used to perform multiplication of data items.
 Select emp_id, emp_name, salary, salary*100 as "salary*100" from multiplication;

EMP_ID   EMP_NAME   SALARY   SALARY*100
1        Alex       25000    2,500,000
2        John       55000    5,500,000
3        Daniel     52000    5,200,000
4        Sam        90000    9,000,000
 Here we have multiplied each employee's salary by 100.

5. Modulus (%):
 It is used to get the remainder when one number is divided by another.
 Select emp_id, emp_name, salary, salary%25000 as "salary%25000" from modulus;

Output:
EMP_ID   EMP_NAME   SALARY   SALARY%25000
1        Alex       25000    0
2        John       55000    5000
3        Daniel     52000    2000
4        Sam        90000    15000
 Here we have applied the modulus operation to each employee's salary.

Logical operations:
Logical operations allow you to test for the truth of a condition.

The following table illustrates the SQL logical operators.

OPERATOR   MEANING
ALL        Returns true if all comparisons are true
AND        Returns true if both expressions are true
ANY        Returns true if any one of the comparisons is true
BETWEEN    Returns true if the operand is within a range
IN         Returns true if the operand is equal to one of the values in a list
EXISTS     Returns true if the subquery contains any rows
1. AND:
The AND operator allows you to construct multiple conditions in the WHERE clause of an
SQL statement such as select.

 The following example finds all employees whose salaries are greater than 5000 and less
than 7000.
 Select first_name, last_name, salary from employees where salary>5000 AND
salary<7000 order by salary;

Output:
FIRST_NAME LAST_NAME SALARY
John Wesley 6000
Eden Daniel 6000
Luis Popp 6900
Shanta Suji 6500
2. ALL:
The ALL operator compares a value to all values in another value set.

 The following example finds all employees whose salaries are greater than or equal to all
salaries of employees in department 8.
EX:
select first_name, last_name, salary from employees where salary >= ALL (select salary from
employees where department_id = 8) order by salary DESC;
Output:
FIRST_NAME   LAST_NAME   SALARY
Steven       King        24000
John         Russel      17000
Neena        Kochhar     14000

3. ANY:
The ANY operator compares a value to any value in a set according to the condition.
The following example finds all employees whose salaries are greater than the average
salary of at least one department.
EX:
select first_name, last_name, salary from employees where salary > ANY (select avg (salary) from
employees group by department_id) order by first_name, last_name;

Output:
FIRST_NAME   LAST_NAME   SALARY
Alexander    Hunold      9000.00
Charles      Johnson     6200.00
David        Austin      4800.00
Eden         Flip        9000.00
4. BETWEEN:
 The between operator searches for values that are within a set of values.
 For example, the following statement finds all employees whose salaries are between 9000
and 12000.
EX: select first_name, last_name, salary from employees where salary between 9000 AND
12000 order by salary;

Output:
FIRST_NAME   LAST_NAME   SALARY
Alexander    Hunold      9000.00
Den          Richards    10000.00
Nancy        Prince      12000.00

5. IN:
 The IN operator compares a value to a list of specified values. The IN operator returns true
if the compared value matches at least one value in the list.
 The following statement finds all employees who work in department_id 8 or 9.
EX:
select first_name, last_name, department_id from employees where department_id IN (8,9)
order by department_id;

Output:
FIRST_NAME   LAST_NAME     DEPARTMENT_ID
John         Russel        8
Jack         Livingstone   8
Steven       King          9
Neena        Kochhar       9

6. EXISTS:
 The EXISTS operator tests whether a subquery returns any rows.
 For example, the following statement finds all employees who have dependents.
 select first_name, last_name from employees e where EXISTS (select 1 from dependent d
where d.employee_id = e.employee_id);

FIRST_NAME   LAST_NAME
Steven       King
Neena        Kochhar
Alexander    Hunold

SQL FUNCTIONS (Date & Time, Numeric, Aggregate, String conversions):

DATE & TIME FUNCTIONS:

Some important date and time functions are below:

Sysdate: It generates the system date.
Ex: Select sysdate from dual;

Output: 05-DEC-2021.

ADD_MONTHS: This function returns a date after adding the given date with the specified number of months.
EX:
Select ADD_MONTHS ('28-FEB-17', 1) from dual;

Output: 31-MAR-17.

Select add_months(sysdate,3) from dual;

Output: 05-MAR-22.

CURRENT_DATE: This function displays the current date.

Ex: Select CURRENT_DATE from dual;

Output: 05-DEC-2021.

NEXT_DAY: This function takes a date and a weekday name and returns the date of the next such
day after the given date.

EX: Select NEXT_DAY (SYSDATE, 'MONDAY') from dual;

Output: 06-DEC-21.

LAST_DAY: This function returns the last day of the month containing the given date.

EX: Select LAST_DAY (sysdate) from dual;


Output: 31-DEC-21.

MONTHS_BETWEEN: It is used to find the number of months between two given dates.

EX: Select MONTHS_BETWEEN ('16-APRIL-2021', '16-AUGUST-2021') from dual;

Output: -4.

ROUND: It gives the nearest or rounded-off value for the argument passed, i.e., it returns a date
rounded to a specific unit of measure.

EX: Select ROUND ('26-NOV-21', 'YYYY') from dual;

Output: 01-JAN-22.

TRUNC: This function returns the date with the time portion of the date truncated to the unit
specified.

EX: Select TRUNC (sysdate, 'MM') from dual;

Output: 01-DEC-21.

TO_DATE: This function converts a date which is in a character string to a date value.

EX: Select TO_DATE ('01 jan 2017', 'DD MON YYYY') from dual;

Output: 01-JAN-17.

TO_CHAR: This function converts a DATE or an INTERVAL value to a character string in a
specified format.

EX: Select TO_CHAR (sysdate, 'DD MM YYYY') from dual;

Output: 05 12 2021.

LEAST: This function displays the oldest date present in the argument list.

EX: Select LEAST ('01-march-2021', '16-feb-2021', '28-dec-2021') from dual;

Output: 01-MAR-21.

GREATEST: This function displays the latest date present in the argument list.

EX: Select GREATEST ('01-march-2021', '16-feb-2021', '28-dec-2021') from dual;

Output: 28-DEC-21.

Aggregate Functions:
Aggregate functions take a collection of values as input and return a single value.
1. Count ()  2. Sum ()  3. Avg ()  4. Max ()  5. Min ()

1. Count (): This function returns number of rows returned by a query.

Syntax: Select count (column_name) from table_name where condition;

Example: Select count (distinct manager_id) from employees;

2. Sum (): It will add/ sum all the column values in the query.

Syntax: Select sum (column_name) from table_name where condition;

Example: Select sum (salary) from employees;

3. Avg (): Avg function used to calculate average values of the set of rows.

Syntax: Select avg (column_name) from table_name where condition;

Example: Select avg(salary) from employees;

4. Max (): This function is used to find maximum value from the set of values.

Syntax: Select max (column_name) from table_name where condition;

Example: Select max (salary) from employees;

5. Min (): This function is used to find minimum value from the set of values.

Syntax: Select min (column_name) from table_name where condition;

Example: Select min (salary) from employees;
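
Several aggregate functions can be combined in a single query; a small sketch on the employees
table:

Select count (*), min (salary), max (salary), avg (salary), sum (salary) from employees;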

SQL NUMERIC FUNCTIONS:

Numeric functions are used to perform operations on numbers and return numbers. Following are

some of the Numeric functions

1. ABS (): It returns the absolute value of a number. EX: select ABS (-243.5) from dual;

OUTPUT: 243.5

2. ACOS (): It returns the arc cosine of a number. EX: select ACOS (0.25) from dual;
OUTPUT: 1.318116071652818

3. ASIN (): It returns the arc sine of a number. EX: select ASIN (0.25) from dual;
OUTPUT: 0.252680255142079

4. CEIL (): It returns the smallest integer value that is greater than or equal to a number. EX:
select CEIL (25.77) from dual;
OUTPUT: 26

5. FLOOR (): It returns the largest integer value that is less than or equal to a number. EX: select
FLOOR (25.75) from dual;
OUTPUT: 25
6. TRUNCATE (): It returns the number truncated to 2 places right of the decimal point.
(This function is not available in SQL Server; Oracle uses TRUNC instead.)
EX: select TRUNCATE (7.53635, 2) from dual;
OUTPUT: 7.53
7. MOD (): It returns the remainder when two numbers are divided. EX: select MOD (55,2)
from dual;
OUTPUT: 1.
8. ROUND (): This function rounds the given value to given number of digits of precision. EX:
select ROUND (14.5262,2) from dual;
OUTPUT: 14.53.
9. POWER (): This function gives the value of m raised to the power of n. EX: select
POWER (4,9) from dual;
OUTPUT: 262144.
10. SQRT (): This function gives the square root of the given value n.EX: Select SQRT
(576) from dual;
OUTPUT: 24.
11. LEAST (): This function returns the least integer from a given set of integers. EX: select LEAST
(1,9,2,4,6,8,22) from dual;

OUTPUT: 1.

12. GREATEST (): This function returns the greatest integer from a given set of integers. EX: select
GREATEST (1,9,2,4,6,8,22) from dual;

OUTPUT: 22

STRING CONVERSION FUNCTIONS OF SQL:

String Functions are used to perform an operation on input string and return the output string.
Following are the string functions

1. CONCAT (): This function is used to join two words (or) strings; in Oracle the || operator is used.

EX: select 'database' || ' ' || 'management system' from dual;

OUTPUT: 'database management system'

2. INSTR (): This function is used to find the position of the first occurrence of a character. EX:
select INSTR ('database system', 'a') from dual;
OUTPUT: 2 (the position of the first occurrence of 'a')

3. LOWER (): This function is used to convert the given string into lowercase. EX: select

lower ('DATABASE') from dual;

OUTPUT: database
4. UPPER (): This function is used to convert a lowercase string into uppercase. EX: select

upper ('database') from dual;

OUTPUT: DATABASE

5. LPAD (): This function pads the given string on the left with the given symbol until it reaches

the given size. EX: select lpad ('system', 8, '0') from dual;

OUTPUT: 00system

6. RPAD (): This function pads the given string on the right with the given symbol until it
reaches the given size.

EX: select rpad ('system', 8, '0') from dual;

OUTPUT: system00

7. LTRIM (): This function removes the given set of characters from the left of the original string.

EX: select ltrim ('database', 'data') from dual;

OUTPUT: base

8. RTRIM (): This function removes the given set of characters from the right of the original string.

EX: select rtrim ('database', 'base') from dual;

OUTPUT: dat.

9. INITCAP (): This function returns the string with the first letter of each word in uppercase.

EX: Select INITCAP ('data base management system') from dual;

OUTPUT: Data Base Management System.

10. LENGTH (): This function returns the length of the given

string. EX: select LENGTH ('SQ LANGUAGE') from dual;

OUTPUT: 11.
11. SUBSTR (): This function returns a portion of a string beginning at the given character position.
EX:
select SUBSTR ('MY WORLD IS AMAZING', 13, 2) from dual;

OUTPUT: AM.

12. TRANSLATE (): This function returns a string after replacing a set of characters with another set.
EX: select TRANSLATE ('Delhi is the capital of India', 'i', 'a') from dual;
OUTPUT: Delha as the capatal of Indaa

INTRODUCTION TO CREATING TABLES WITH RELATIONSHIP

Creating tables using CREATE command:


This command is used to create a database and its objects such as tables, views, procedures,
triggers etc. It defines each column of the table uniquely. Each column has a minimum of three
attributes: a column name, a data type and a size.

Syntax: CREATE TABLE <table_name>


(
column _name 1 DATATYPE 1 (SIZE),
column _name 2 DATATYPE 2 (SIZE),
:
column _name n DATATYPE N (SIZE) );

CREATING TABLES WITH RELATIONSHIP

When we want to create tables with relationships, we need to use referential integrity constraints.
The referential integrity constraint enforces relationships between tables.

- It designates a column or combination of columns as a foreign key.

- The foreign key establishes a relationship with a specified primary or unique key in another
table, called the referenced key.
- When referential integrity is enforced, it prevents:
1) Adding records to a related table if there is no associated record in the primary table.
2) Changing values in a primary table that result in orphaned records in a related table.
3) Deleting records from a primary table if there are matching related records.
Note: The table containing the foreign key is called the child table and the table containing the
referenced key is called the parent table.

SYNTAX: CREATE TABLE
<tablename>(col_name1 datatype[size], col_name2 datatype[size],
: col_name n datatype[size],
FOREIGN KEY(column_name) REFERENCES <parent_table_name>(column_name));

EX: SQL> CREATE TABLE marks(sid VARCHAR2(4), marks NUMBER(3), PRIMARY
KEY(sid), FOREIGN KEY(sid) REFERENCES student1(sid));
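
Note that the referenced (parent) table must already exist, with a primary key or unique key on the
referenced column. A minimal sketch of such a parent table for the example above:

SQL> CREATE TABLE student1(sid VARCHAR2(4) PRIMARY KEY, name VARCHAR2(10));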

IMPLEMENTATION OF KEY AND INTEGRITY CONSTRAINTS

Data constraints: All businesses of the world run on business data being gathered, stored and
analyzed. Business managers determine a set of business rules that must be applied to their data
prior to it being stored in the database/table, to ensure its integrity.
For instance, no employee in the sales department can have a salary of less than Rs.1000/-. Such
rules have to be enforced on the data stored; if not, inconsistent data is maintained in the database.

Note: Constraints are used to impose business rules on databases.

They allow only valid data to be entered.

Various types of Integrity Constraints:

Integrity constraints are the rules of real life which are to be imposed on the data. If the data does
not satisfy the constraints then it is considered inconsistent. These rules are to be enforced on
data because of their presence in real life. Every DBMS software must enforce integrity
constraints, otherwise inconsistent data is generated.

You can use constraints to do the following:

 To prevent invalid data entry into tables.


 To enforce rules on the data in a table whenever a row is inserted, updated, or deleted
from that table. The constraint must be satisfied for the operation to succeed.
 To prevent the deletion of a record from a table if there are dependencies.

Constraints are categorized as follows:

1. Domain integrity constraints - A domain means a set of values assigned to a column, i.e.,
a set of permitted values. Domain constraints are handled by:
- Defining a proper data type
- Specifying a NOT NULL constraint
- Specifying a CHECK constraint
- Specifying a DEFAULT constraint

Not null – indicates that a column cannot store a NULL value.

Check – ensures that the value in the column meets a specific condition.
Default – prevents a null value in a column when no value is provided by the user: it
assigns the default or globally assigned value.

2. Entity integrity constraints – are of two types:

Unique constraint – It defines an entity or column as UNIQUE for a particular table,
and ensures that each row of that column must have a unique value.
Primary key constraint – This avoids duplicate and null values. It is a combination of
NOT NULL and UNIQUE.
3. Referential integrity constraints
Foreign key – indicates the relationship between child and parent tables.
This constraint is always attached to a column, not a table.
We can add constraints in two ways.

Column level :-

The constraint is declared immediately after declaring the column.

It is defined with each column.
Use column level to declare a constraint for a single column.

A composite key cannot be defined at column level.
Table level :-
The constraint is declared after declaring all columns.
Use table level to declare a constraint for a combination of columns (i.e., a composite key).
NOT NULL cannot be defined at table level.
Another option is the Alter level:

the constraint is declared with the ALTER command.

When we use this, make sure that the existing data in the table does not violate the constraint.

To add these constraints, we can use a constraint with a label or without a label. TWO BASIC TYPES -

1. Constraints WITH NAME
2. Constraints WITHOUT NAME.
i) Declaring a constraint at TABLE level (constraints with label)
Syntax :- CREATE TABLE <table name>
(
col_name1 DATATYPE(SIZE) ,
…..,
col_nameN DATATYPE(SIZE) ,
CONSTRAINT <cons_label> NAME_OF_THE_CONSTRAINT
[column_list] );
ii) Declaring a constraint at Column level (constraints with label)
Syntax :- col_name DATATYPE(SIZE) CONSTRAINT <cons_label>
NAME_OF_THE_CONSTRAINT

iii) Declaring a constraint at TABLE level (constraints without label)
Syntax :- CREATE TABLE <table name>(
col_name1 DATATYPE(SIZE) ,
…..,
col_nameN DATATYPE(SIZE) ,
NAME_OF_THE_CONSTRAINT [column_list] );

iv) Declaring a constraint at Column level (constraints without label)
Syntax :- col_name DATATYPE(SIZE) NAME_OF_THE_CONSTRAINT
v) Adding a constraint to a table at Alter level (constraints with label)
A constraint can be added to a table at any time after the table was created, by using the ALTER
TABLE statement with the ADD clause.

Syntax:
ALTER TABLE <table_name> ADD CONSTRAINT cons_label NAME_OF_
THE_CONSTRAINT (column);

vi) Declaring a constraint at Alter level (constraints without label)
Syntax:
ALTER TABLE <table_name> ADD NAME_OF_THE_CONSTRAINT (column);
Note: The CONSTRAINT clause is not required when constraints are declared without a label.

NOT NULL:
 It ensures that a table column cannot be left empty.
 A column declared with NOT NULL is a mandatory column, i.e., data must be entered.
 The NOT NULL constraint can only be applied at column level.
 It allows DUPLICATE data.
 It is used to avoid null values in columns.

NOTE: It is applicable at COLUMN LEVEL only.


SYNTAX: column_name DATATYPE[SIZE] NOT NULL

EX 1 : CREATE TABLE table_notnull(
sid NUMBER(4) NOT NULL, // COLUMN LEVEL
sname VARCHAR2(10));
SQL> SELECT * FROM table_notnull;

SID   SNAME
501   GITA
502   RAJU
503
504

Here, the SID column does not allow any null values but does allow duplicate values, while
SNAME allows nulls.

CHECK :
 Used to impose a conditional rule on a table column.
 It defines a condition that each row must satisfy.
 Check constraint validates data based on a condition .

 Value entered in the column should not violate the condition.
 Check constraint allows null values.
 Check constraint can be declared at table level or column level.
 There is no limit to the number of CHECK constraints that can be defined on
a column.
Limitations :-

 Conditions should not reference pseudo columns like ROWNUM, SYSDATE etc.
 Conditions should not access columns of another table.

//CONSTRAINT @ COLUMN LEVEL

SYNTAX: column_name DATATYPE (SIZE) CHECK (condition) // without label

Here, we are creating a table with two columns, sid and sname.

Ex: CREATE TABLE check_column(

sid VARCHAR2(4) CHECK (sid LIKE 'C%' AND LENGTH(sid)=4), // without label
sname VARCHAR2(10));
Here, sid should start with 'C' and the length of sid must be exactly 4 characters.

SQL> SELECT * FROM check_column;

SID    SNAME
C501   MANI
C502   DHANA
C503   RAVI
C504   RAJA


// with label
Syntax: column_name DATATYPE(SIZE) CONSTRAINT constraint_label CHECK(condition)

CREATE TABLE check_column (

sid VARCHAR2(4) CONSTRAINT ck CHECK (sid LIKE 'C%' AND LENGTH(sid)=4),
sname VARCHAR2(10) );

//CHECK @ TABLE LEVEL

CREATE TABLE check_table (

sid VARCHAR2(4) , sname VARCHAR2(10),
CHECK (sid LIKE 'C%' AND LENGTH(sid)=4),
CHECK (sname LIKE '%A'));

Here, sid should start with 'C' and the length of sid must be exactly 4 characters, and sname
should end with the letter 'A'.

SQL> SELECT * FROM check_table;

SID    SNAME
C401   ABHILA
C401   ANITHA
C403   NANDHITHA
C522   LOHITHA

// with label

CONSTRAINT ck1 CHECK (sid LIKE 'C%' AND LENGTH(sid)=4),
CONSTRAINT ck2 CHECK (sname LIKE '%A'));

@ ALTER LEVEL
Here, we add a check constraint to a new table with columns.

SQL> CREATE TABLE check_alter(sid VARCHAR2(4),
sname VARCHAR2(10));

//CHECK @ ALTER LEVEL: // CONSTRAINT WITHOUT NAME

SQL> ALTER TABLE <table name> ADD CHECK (condition);
SQL> ALTER TABLE check_alter ADD CHECK (sid LIKE 'C%');

// CONSTRAINT WITH NAME
SYNTAX: ALTER TABLE <table_name> ADD CONSTRAINT cons_name CHECK(cond);

SQL> ALTER TABLE check_alter ADD CONSTRAINT ck CHECK (sid LIKE 'C%');

ANOTHER EXAMPLE FOR A TABLE LEVEL CONSTRAINT

Here, we create a table with THREE columns.

ADD CHECK CONSTRAINT @ TABLE LEVEL (AT THE END OF TABLE DEFINITION)
MARKS IN BETWEEN 0 AND 100.

SQL> CREATE TABLE marks2 ( sid VARCHAR2(4),
sec VARCHAR2(2),
marks NUMBER(3),
CHECK(marks>0 AND marks<=100) );

DROP @ CHECK CONSTRAINT

SYNTAX: ALTER TABLE <table_name> DROP CONSTRAINT cons_name;

SQL> ALTER TABLE check_table DROP CONSTRAINT ck;

DEFAULT

-If values are not provided for a table column, the default will be considered.
-This prevents NULL values from entering the column if a row is inserted without a value for
that column.
-The default value can be a literal, an expression, or a SQL function.
-The default expression must match the data type of the column.
- The DEFAULT constraint is used to provide a default value for a column.

-The default value will be added to all new records IF no other value is specified.

Column_name datatype (size) DEFAULT <value/expression/function> Ex: MIDDLENAME

VARCHAR(10) DEFAULT 'UNAVAILABLE'

CONTACTNO NUMBER(10) DEFAULT 9999999999

This defines what value the column should use when no value has been supplied explicitly
while inserting a record into the table.

CREATE TABLE tab_default( sid NUMBER(10),

contactno NUMBER(10) DEFAULT 9999999999);
Add data to the table:

Insert into tab_default (sid, contactno) values (501, 9493949312);
Insert into tab_default (sid) values (502);
Insert into tab_default (sid) values (503);
Insert into tab_default (sid, contactno) values (504, 9393949412);

Select * from tab_default;

SID   CONTACTNO
----------  ------------------
501   9493949312
502   9999999999
503   9999999999
504   9393949412

UNIQUE
 Columns declared with the UNIQUE constraint do not accept duplicate values.
 One table can have a number of unique keys.
 A unique key can be defined on more than one column, i.e., a composite unique key.
 A composite UNIQUE key is always defined at the table level only.
 By default UNIQUE columns accept null values unless declared with a NOT
NULL constraint.
 Oracle automatically creates a UNIQUE index on a column declared with a
UNIQUE constraint.
 UNIQUE constraint can be declared at column level and table level.

UNIQUE @ COLUMN LEVEL

SYNTAX: column_name DATA_TYPE(SIZE) UNIQUE

CREATE TABLE table_unique(
sid NUMBER(4) UNIQUE,
sname VARCHAR2(10));

//UNIQUE @ TABLE LEVEL

SYNTAX: UNIQUE(column_list);
CREATE TABLE table_unique2(
sid NUMBER(4),
sname VARCHAR2(10) ,
UNIQUE(sid,sname));

SQL> SELECT * FROM table_unique2;

SID   SNAME
401   RAMU
402   SITA
402   GITHA   // these two 402 records are distinct combinations, not the same
403   GITHA
404   RAMU

Unique @ ALTER level:

Alter table table_unique ADD UNIQUE (sid); // without label
Alter table table_unique ADD CONSTRAINT uq UNIQUE(sid); // with label

DROP UNIQUE @ TABLE LEVEL


SQL> ALTER TABLE table_unique2 DROP UNIQUE(sid,sname);

Now, we have removed the unique constraint, so this table may contain duplicate data.
//UNIQUE @ ALTER LEVEL (here, the table contains duplicates, so adding the constraint does not work)
//delete the data from table_unique2
SQL> DELETE FROM table_unique2;

PRIMARY KEY constraint :-

A PRIMARY KEY is one of the candidate keys, which uniquely identifies a record in a table.
-used to define the key column of a table.
-it is provided with an automatic index.
-A primary key constraint combines NOT NULL and UNIQUE behavior in one declaration.
Characteristics of PRIMARY KEY :-

There should be at most one primary key or composite primary key per table.
The PK column does not accept null values.
The PK column does not accept duplicate values.
RAW, LONG RAW, VARRAY, NESTED TABLE and BFILE columns cannot be declared as PK.
If the PK is composite then uniqueness is determined by the combination of
columns. A composite primary key cannot have more than 32 columns.
It is recommended that the PK column should be short and numeric.
Oracle automatically creates a unique index on the PK column.
EX:

// PRIMARY KEY @ COLUMN LEVEL

SYNTAX : column_name DATA_TYPE(SIZE) PRIMARY KEY

SQL> CREATE TABLE student1 (

sid VARCHAR2(4) PRIMARY KEY
CHECK (sid LIKE 'V%' AND LENGTH(sid)=4 ) ,
name VARCHAR2(10));

SQL> DESC student1;

Name   Null?      Type
SID    NOT NULL   VARCHAR2(4)
NAME              VARCHAR2(10)

CASE 2: ADD PRIMARY KEY @ ALTER LEVEL

SQL> CREATE TABLE student2( sid VARCHAR2(4),


name VARCHAR2(10));

SYNTAX: ALTER TABLE <tablename> ADD PRIMARY KEY (col_name);

SQL> ALTER TABLE student2 ADD PRIMARY KEY(sid);

Table altered.

SQL> DESC student2;

Name   Null?      Type
-----------------------------------------------------------------------------
SID    NOT NULL   VARCHAR2(4)
NAME              VARCHAR2(10)

CASE 3 : ADD PRIMARY KEY @ TABLE LEVEL
Here, we can create simple and composite primary keys.

SYNTAX: CREATE TABLE <tablename>( col_name1 datatype[size],
col_name2 datatype[size],
:
col_namen datatype[size],
PRIMARY KEY (col_name));
//SIMPLE PRIMARY KEY @ TABLE LEVEL
SQL> CREATE TABLE
student3( sid VARCHAR2(4), name
VARCHAR2(10), marks NUMBER(3),
PRIMARY KEY(sid) );

SQL> DESC student3;

Name    Null?      Type
SID     NOT NULL   VARCHAR2(4)
NAME               VARCHAR2(10)
MARKS              NUMBER(3)

// COMPOSITE PRIMARY KEY @ TABLE LEVEL
SYNTAX:
CREATE TABLE <tablename>( col_name1 datatype[size],
col_name2 datatype[size],
:
col_namen datatype[size],
PRIMARY KEY (col_name1, col_name2, …, col_name n));

SQL> CREATE TABLE
student4( sid VARCHAR2(4), name VARCHAR2(10),
marks NUMBER(3),
PRIMARY KEY(sid,name) ); // WITHOUT LABEL

CONSTRAINT pk PRIMARY KEY(sid,name) // WITH LABEL

SQL> DESC student4;

Name    Null?      Type
------------------------------------------------------------------------------
SID     NOT NULL   VARCHAR2(4)
NAME    NOT NULL   VARCHAR2(10)
MARKS              NUMBER(3)

FOREIGN KEY Constraint:-

 Foreign key is used to establish relationships between tables.

 A foreign key is a column in one table that refers to primary key/unique
columns of another or the same table.
 Values of the foreign key should match values of the primary key/unique key,
or the foreign key can be null.
 A foreign key column allows null values unless it is declared with NOT
NULL.
 A foreign key column allows duplicates unless it is declared with UNIQUE.
 By default Oracle establishes a 1:M relationship between two tables.
 To establish a 1:1 relationship between two tables, declare the foreign key
with a unique constraint (see the sketch after this list).
 Foreign key can be declared at column level or table level.
 A composite foreign key must refer to a composite primary key or composite
unique key.
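
As a hypothetical sketch of the 1:1 case, an idcard table whose sid column both references the
parent and is unique:

SQL> CREATE TABLE idcard( cardno NUMBER(4) PRIMARY KEY,
sid VARCHAR2(4) UNIQUE REFERENCES student1(sid));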

EX: TABLES: STUDENT1 & MARKS1

CHILD TABLE 'MARKS1':

ADDING PRIMARY AND CHECK CONSTRAINTS @ CREATE LEVEL

SQL> CREATE TABLE marks1(
sid VARCHAR2(4) PRIMARY KEY CHECK( sid LIKE 'V%' AND LENGTH(sid)=4),
marks NUMBER(3) );

// ADDING PRIMARY AND FOREIGN KEY @ TABLE/ CREATE LEVEL

SYNTAX: CREATE TABLE
<tablename>(
col_name1 datatype[size] ,
col_name2 datatype[size] ,
:
col_name n datatype[size],

FOREIGN KEY(column_name) REFERENCES <parent_table_name>(column_name));

EX: SQL> CREATE TABLE marks3(
sid VARCHAR2(4),
marks NUMBER(3), PRIMARY KEY(sid),
FOREIGN KEY(sid) REFERENCES student1(sid));

SQL> DESC marks3;

Name    Null?      Type
------------------------------------------------------------------------------
SID     NOT NULL   VARCHAR2(4)
MARKS              NUMBER(3)

Query : ADD CHECK CONSTRAINT @ ALTER LEVEL (ON AN EXISTING TABLE)

MARKS IN BETWEEN 0 AND 100.

SQL> ALTER TABLE marks3 ADD CHECK ( marks>0 AND marks<=100 );

ADD CHECK CONSTRAINT @ TABLE LEVEL (AT THE END OF TABLE DEFINITION)
MARKS IN BETWEEN 0 AND 100.

SQL> CREATE TABLE marks3 ( sid VARCHAR2(4),
sec VARCHAR2(2),
marks NUMBER(3),
CHECK(marks>0 AND marks<=100) );

//ADDING FOREIGN KEY @ ALTER LEVEL

SQL> ALTER TABLE marks1 ADD FOREIGN KEY (sid) REFERENCES student1(sid);


SQL> desc marks1;
Name    Null?      Type
-----------------------------------------------------------------------------
SID     NOT NULL   VARCHAR2(4)
MARKS              NUMBER(3)

Note :-
A PRIMARY KEY cannot be dropped if it is referenced by any FOREIGN KEY constraint.
If a PRIMARY KEY is dropped with the CASCADE option, then along with the PRIMARY
KEY the referencing FOREIGN KEY is also dropped.
A PRIMARY KEY column cannot be dropped if it is referenced by some FOREIGN KEY.
A PRIMARY KEY table cannot be dropped if it is referenced by some FOREIGN KEY.
A PRIMARY KEY table cannot be truncated if it is referenced by some FOREIGN KEY.
Note: Once the primary key and foreign key relationship has been created, you cannot
remove any parent record if dependent child records exist.

USING ON DELETE CASCADE

By using this clause you can remove a parent record even if child records exist, because whenever
you remove a parent record Oracle automatically removes all its dependent records from the child
table, if this clause is present while creating the foreign key constraint.
Ex: Consider two tables, dept (parent) and emp (child).

TABLE LEVEL

SQL> create table emp(empno number(2), ename varchar(10), deptno number(2),
primary key(empno), foreign key(deptno) references dept(deptno) on delete cascade); //
without label

SQL> create table emp(empno number(2), ename varchar(10), deptno
number(2), constraint pk primary key(empno), constraint fk foreign key(deptno) references
dept(deptno) on delete cascade); // with label
ALTER LEVEL
SQL> alter table emp add foreign key(deptno) references dept(deptno) on delete cascade;
SQL> alter table emp add constraint fk foreign key(deptno) references dept(deptno)
on delete cascade;
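
With the cascade option in place, deleting a parent row silently removes its children. A sketch,
assuming deptno 10 exists in both tables:

SQL> DELETE FROM dept WHERE deptno=10;

All rows in emp with deptno=10 are removed automatically along with the parent row.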
Enabling/Disabling a Constraint:
If constraints are present, then for each DML operation the constraints are checked by executing
certain code internally. This may slow down the DML operation marginally. For massive DML
operations, such as transferring data from one table to another, the presence of constraints makes
the operation noticeably slower. To improve the speed in such cases, the following steps are
adopted:

1. Disable the constraint
2. Perform the DML operation
3. Enable the constraint

Disabling Constraint:- Syntax :-
ALTER TABLE <tabname> DISABLE CONSTRAINT <constraint_name>;
Example :- SQL> ALTER TABLE student1 DISABLE CONSTRAINT ck;
SQL> ALTER TABLE marks1 DISABLE PRIMARY KEY CASCADE;

NOTE:-
If a constraint is disabled with CASCADE then the PK is disabled along with the FK.

Enabling Constraint :- Syntax :-

ALTER TABLE <tabname> ENABLE CONSTRAINT <name>
Example :-
SQL> ALTER TABLE student1 ENABLE CONSTRAINT ck;

SET OPERATIONS IN SQL


SQL supports a few set operations to be performed on table data. These are used to get meaningful
results from data, under different special conditions. The SET operators combine the results of two
or more component queries into one result. Queries containing SET operators are called
compound queries.

The number of columns and the data types of the columns being selected must be identical in all the
SELECT statements used in the query. The names of the columns need not be identical.

All SET operators have equal precedence. If a SQL statement contains multiple SET operators, the
Oracle server evaluates them from left (top) to right (bottom) if no parentheses explicitly specify
another order.
Introduction
SQL set operators allow combining results from two or more SELECT statements. At first sight this
looks similar to SQL joins, although there is a big difference: SQL joins tend to combine columns,
i.e., with each additionally joined table it is possible to select more and more columns; SQL set
operators on the other hand combine rows from different queries, with strong preconditions:
 both queries must retrieve the same number of columns, and
 the data types of corresponding columns in each involved SELECT must be
compatible (either the same, or with the possibility to implicitly convert to the data
types of the first SELECT statement).

Set operator types

According to the SQL Standard there are the following set operator types:

 UNION --- returns all rows selected by either query. It returns all rows from
multiple tables and eliminates any duplicate rows.
 UNION ALL --- returns all rows from multiple tables including duplicates.
 INTERSECT --- returns all rows common to multiple queries.
 MINUS --- returns rows from the first query that are not present in the second query.

Note: Whenever these operators are used, the select statements must have

- an equal number of columns, and

- columns of similar data types.

Syntax :-

SELECT statement 1
UNION / UNION ALL / INTERSECT / MINUS
SELECT statement 2 ;
Rules :-

1. The number of columns returned by the first query must be equal to the number of columns
returned by the second query.
2. The corresponding columns' data types must be the same.

1. UNION

 The UNION operator combines data returned by two SELECT statements.

 It eliminates duplicates.
 It sorts the result.
 This will combine the records of multiple tables having the same structure.

Example :-

1 SQL>SELECT job FROM emp WHERE deptno=10
UNION
SELECT job FROM emp WHERE deptno=20 ;

2 SQL>SELECT job,sal FROM emp WHERE deptno=10
UNION
SELECT job,sal FROM emp WHERE deptno=20 ORDER BY sal ;

NOTE:- The ORDER BY clause must be used with the last query.

3 SQL> select * from student1 union select * from student2;

2. UNION ALL

This will combine the records of multiple tables having the same structure, but including
duplicates. It is similar to UNION but it includes duplicates.

Example :-

SQL>SELECT job FROM emp WHERE deptno=10
UNION ALL
SELECT job FROM emp WHERE deptno=20 ;

SQL> select * from student1 union all select * from student2;

3. INTERSECT

This will give the common records of multiple tables having the same structure.

INTERSECT operator returns common values from the result of two SELECT statements.

Example:-
Display common jobs belonging to the 10th and 20th departments:

EX 1: SQL>SELECT job FROM emp WHERE deptno=10

INTERSECT
SELECT job FROM emp WHERE deptno=20;

EX2: SQL> select * from student1 intersect select * from student2;

4. MINUS

This will give the records of a table whose records are not in other tables having the same structure.
The MINUS operator returns values present in the result of the first SELECT statement and not
present in the result of the second SELECT statement.

Example:-
Display jobs in the 10th dept and not in the 20th dept:

EX1: SQL>SELECT job FROM emp WHERE deptno=10
MINUS
SELECT job FROM emp WHERE deptno=20;

Ex2: SQL> select * from student1 minus select * from student2;

UNION vs JOIN :-

UNION                                       JOIN
Union combines data                         Join relates data
Union is performed on similar structures    Join can also be performed on dissimilar structures

SQL JOINS

A SQL JOIN is an operation used to retrieve data from multiple tables. It is performed whenever
two or more tables are joined in a SQL statement. So, the SQL join clause is used to combine records
from two or more tables in a database. A JOIN is a means for combining fields from two tables by
using values common to each. Several operators can be used to join tables,

such as =, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; all of these can be used to join tables.
However, the most common operator is the equals symbol.

SQL Join Types:
There are different types of joins available in SQL:
INNER JOIN: Returns rows when there is a match in both tables.
OUTER JOIN: Returns all rows whether or not there is a match in the tables.
- LEFT JOIN/LEFT OUTER JOIN: Returns all rows from the left table,
even if there are no matches in the right table.
- RIGHT JOIN/RIGHT OUTER JOIN: Returns all rows from the right table, even if there
are no matches in the left table.

- FULL JOIN/FULL OUTER JOIN: Returns rows when there is a match in one of the tables.

SELF JOIN: It is used to join a table to itself as if the table were two tables, temporarily
renaming at least one table in the SQL statement.
CARTESIAN JOIN or CROSS JOIN: It returns the Cartesian product of the sets of
records from the two or more joined tables.
Based on operators, the join can be classified as
- Inner join or Equi Join
- Non-Equi Join
 NATURAL JOIN: It is performed only when the common column names are the same. In this,
there is no need to specify the join condition explicitly; ORACLE automatically performs the
join operation on the columns with the same name.
1. SQL INNER JOIN (simple join)
It is the most common type of SQL join. SQL INNER JOINS return all rows from multiple
tables where the join condition is met.
Syntax
SELECT columns FROM table1 INNER JOIN table2 ON table1.column = table2.column;
Visual Illustration
In this visual diagram, the SQL INNER JOIN returns the shaded area:

The SQL INNER JOIN would return the records where table1 and table2 intersect. Let's
look at some data to explain how INNER JOINS work with an example.
We have a table called SUPPLIERS with two fields (supplier_id and supplier_name). It
contains the following data:

supplier_id   supplier_name
10000         ibm
10001         hewlett packard
10002         microsoft
10003         nvidia
We have another table called ORDERS with three fields (order_id, supplier_id, and
order_date).
It contains the following data:

order_id   supplier_id   order_date
500125     10000         2003/05/12
500126     10001         2003/05/13
500127     10004         2003/05/14

Example of INNER JOIN:


Q: List the supplier id, name and order date of suppliers who have orders.
SELECT s.supplier_id, s.supplier_name, od.order_date FROM suppliers s INNER JOIN
orders od ON s.supplier_id = od.supplier_id;
This SQL INNER JOIN example would return all rows from the suppliers and orders tables
where there is a matching supplier_id value in both the suppliers and orders tables. Our
result set would look like this:
supplier_id   supplier_name     order_date
10000         ibm               2003/05/12
10001         hewlett packard   2003/05/13

The rows for Microsoft and NVIDIA from the suppliers table would be omitted, since the
supplier_ids 10002 and 10003 do not exist in both tables.
The row for 500127 (order_id) from the orders table would be omitted, since the
supplier_id 10004 does not exist in the suppliers table.
2. OUTER JOIN:
An inner/equi join returns only matching records from both tables, not the unmatched records.
An outer join retrieves all rows from one (or both) tables even when the join condition is not met.
Types of outer join:
1. LEFT JOIN/LEFT OUTER JOIN

2. RIGHT JOIN/RIGHT OUTER JOIN

3. FULL JOIN/FULL OUTER JOIN


2.1. LEFT OUTER JOIN
This type of join returns all rows from the LEFT-hand table specified in the ON condition
and only those rows from the other table where the joined fields are equal (the join condition
is met).
Syntax
SELECT columns FROM table1 LEFT [OUTER] JOIN table2 ON table1.column =
table2.column;
Visual Illustration
In this visual diagram, the SQL LEFT OUTER JOIN returns the shaded area:

The SQL LEFT OUTER JOIN would return all records from table1 and only those
records from table2 that intersect with table1.
Example
SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM
suppliers LEFT OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;
This LEFT OUTER JOIN example would return all rows from the suppliers table and only
those rows from the orders table where the joined fields are equal.

supplier_id   supplier_name     order_date
-----------   ----------------  ------------
10000         ibm               2003/05/12
10001         hewlett packard   2003/05/13
10002         microsoft         <null>
10003         nvidia            <null>

The rows for Microsoft and NVIDIA would be included because a LEFT OUTER JOIN was
used. However, you will notice that the order_date field for those records contains a
<null> value.
SQL RIGHT OUTER JOIN
This type of join returns all rows from the RIGHT-hand table specified in the ON
condition and only those rows from the other table where the joined fields are equal
(the join condition is met).
Syntax
SELECT columns FROM table1 RIGHT [OUTER] JOIN table2 ON table1.column =
table2.column;
In some databases, the RIGHT OUTER JOIN keywords are replaced with RIGHT JOIN.
Visual Illustration
In this visual diagram, the SQL RIGHT OUTER JOIN returns the shaded area:

The SQL RIGHT OUTER JOIN would return all records from table2 and only those
records from table1 that intersect with table2.

Example
SELECT orders.order_id, orders.order_date, suppliers.supplier_name FROM suppliers

RIGHT OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;


This RIGHT OUTER JOIN example would return all rows from the orders table and only
those rows from the suppliers table where the joined fields are equal.
If a supplier_id value in the orders table does not exist in the suppliers table, all fields
in the suppliers table will display as <null> in the result set.
order_id   order_date   supplier_name
--------   ----------   ----------------
500125     2013/05/12   ibm
500126     2013/05/13   hewlett packard
500127     2013/05/14   <null>

The row for 500127 (order_id) would be included because a RIGHT OUTER JOIN was used.
However, you will notice that the supplier_name field for that record contains a
<null> value.
SQL FULL OUTER JOIN
This type of join returns all rows from the LEFT-hand table and RIGHT-hand table with
nulls in place where the join condition is not met.
Syntax
SELECT columns FROM table1 FULL [OUTER] JOIN table2 ON table1.column =
table2.column;
In some databases, the FULL OUTER JOIN keywords are replaced with FULL JOIN.
Visual Illustration
In this visual diagram, the SQL FULL OUTER JOIN returns the shaded area:

The SQL FULL OUTER JOIN would return all records from both table1 and table2.
Example
Here is an example of a SQL FULL OUTER JOIN:

Query : Find supplier id, supplier name and order date of suppliers who have ordered.

SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM


suppliers FULL OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;

This FULL OUTER JOIN example would return all rows from the suppliers table and all
rows from the orders table, and whenever the join condition is not met, <null>s would be
extended to those fields in the result set.
If a supplier_id value in the suppliers table does not exist in the orders table, all fields in the
orders table will display as <null> in the result set. If a supplier_id value in the orders table
does not exist in the suppliers table, all fields in the suppliers table will display as
<null> in the result set.

supplier_id   supplier_name     order_date
-----------   ----------------  ------------
10000         ibm               2013/05/12
10001         hewlett packard   2013/05/13
10002         microsoft         <null>
10003         nvidia            <null>
<null>        <null>            2013/05/14
The rows for Microsoft and NVIDIA would be included because a FULL OUTER JOIN was
used. However, you will notice that the order_date field for those records contains a
<null> value.
The row for supplier_id 10004 would also be included because a FULL OUTER JOIN was
used. However, you will notice that the supplier_id and supplier_name fields for that record
contain a <null> value.

Equi join :
When the join condition is based on the EQUALITY (=) operator, the join is said to be an equi
join. It is also called an inner join.
Syntax
Select col1, col2, … From <table 1>, <table 2> Where <join condition with '='>.
Ex. Query : Find the supplier id, supplier name and order date of suppliers who have ordered.
select s.supplier_id, s.supplier_name, o.order_date from suppliers s, orders o where s.supplier_id
= o.supplier_id;

supplier_id    name               order_date
-----------    ---------------    ----------
10000          ibm                2003/05/12
10001          hewlett packard    2003/05/13
Non Equi Join :-
When the join condition is based on an operator other than the equality operator, the join is said to be a Non-Equi join.
Syntax:-
Select col1, col2, ... From <table1>, <table2>
Where <join condition> [AND <join cond> AND <cond> ...]
In a NON-EQUI JOIN the join condition is not based on the = operator; it is based on other operators, usually BETWEEN or > or <.
Query : Find the supplier id, supplier name and order date for orders between 500125 and 500127.
sql> select s.supplier_id, s.supplier_name, o.order_date from suppliers s, orders o where o.order_id between 500125 and 500127;

SUPPLIER_ID    SUPPLIER_NAME      ORDER_DATE
-----------    ---------------    ----------
10000          ibm                12-may-03
10000          ibm                13-may-03
10000          ibm                14-may-03
10001          hewlett packard    12-may-03
10001          hewlett packard    13-may-03
10001          hewlett packard    14-may-03
10002          microsoft          12-may-03
10002          microsoft          13-may-03
10002          microsoft          14-may-03
10003          nvidia             12-may-03
10003          nvidia             13-may-03
10003          nvidia             14-may-03
Query : Find the supplier id, supplier name and order date for orders above 500126.
sql> select s.supplier_id, s.supplier_name, o.order_date from suppliers s, orders o where o.order_id > 500126;

SUPPLIER_ID    SUPPLIER_NAME      ORDER_DATE
-----------    ---------------    ----------
10000          ibm                14-may-03
10001          hewlett packard    14-may-03
10002          microsoft          14-may-03
10003          nvidia             14-may-03
Self Join :-
Joining a table to itself is called a Self Join.
 Self Join is performed when tables have self-referential integrity.
 To perform a Self Join the same table must be listed twice with different aliases.
 Self Join is an Equi Join within the table.
It is used to join a table to itself as if the table were two tables, temporarily renaming at least one table in the SQL statement.
Syntax :
(Here T1 and T2 refer to the same table)
SELECT <collist> From Table1 T1, Table1 T2 Where T1.Column1 = T2.Column2;
Example:
select s1.supplier_id, s1.supplier_name, s2.supplier_id from suppliers s1, suppliers s2 where s1.supplier_id = s2.supplier_id;
supplier_id    supplier_name      supplier_id
-----------    ---------------    -----------
10000          ibm                10000
10001          hewlett packard    10001
10002          microsoft          10002
10003          nvidia             10003
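The example above joins each supplier row to itself on equal ids, which is mainly illustrative. A more typical self join, sketched here on the assumption that the emp table has a mgr column holding the manager's empno (as in Oracle's sample schema), lists each employee alongside the manager's name:

SQL> SELECT e.ename AS employee, m.ename AS manager
     FROM emp e, emp m
     WHERE e.mgr = m.empno;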

CROSS JOIN:
It returns the Cartesian product of the sets of records from the two or more joined tables. In
Cartesian product, each element of one set is combined with every element of another set to form
the resultant elements of Cartesian product.
Syntax: SELECT * FROM <tablename1> CROSS JOIN <tablename2>

 CROSS JOIN returns the cross product of two tables.
 Each record of one table is joined to each and every record of another table.
 If table1 contains 10 records and table2 contains 5 records, then a CROSS JOIN between table1 and table2 returns 50 records.
 ORACLE performs a CROSS JOIN when we submit a query without a join condition.
Example: sql> SELECT * FROM suppliers CROSS JOIN orders;

supplier_id    supplier_name    order_id    supplier_id    order_date
-----------    -------------    --------    -----------    ----------
(each of the 4 supplier rows is combined with each of the 3 order rows, giving 12 rows)
NATURAL JOIN:
 NATURAL JOIN is available in the ANSI SQL/92 standard.
 NATURAL JOIN is similar to EQUI JOIN.
 NATURAL JOIN is performed only when the common columns have the same name.
 In NATURAL JOIN there is no need to specify the join condition explicitly; ORACLE automatically performs the join operation on the column with the same name.
Syntax: SELECT <column list> FROM table1 NATURAL JOIN table2;
Example: (Sailors table)
SELECT sid, sname, sid FROM sailors NATURAL JOIN reserves; -- both tables have the same column name (sid)

VIEWS
A view in SQL is a logical subset of data from one or more tables. A view is used to restrict data access. Data abstraction is usually required after a table is created and populated with data. Data held by some tables might require restricted access to prevent all users from accessing all columns of a table, for data security reasons. Such a security issue can be solved by creating several tables with appropriate columns and assigning specific users to each such table, as required. This answers data security requirements very well but gives rise to a great deal of redundant data being resident in tables in the database. To reduce redundant data to the minimum possible, Oracle provides virtual tables, which are Views.
View Definition :-
A View is a virtual table based on the result returned by a SELECT query.
The most basic purpose of a view is restricting access to specific column/rows from a table
thus allowing different users to see only certain rows or columns of a table.
Composition Of View:-
A view is composed of rows and columns, very similar to a table. The fields in a view are fields from one or more database tables in the database.
SQL functions, WHERE clauses and JOIN statements can be applied to a view in the same manner as they are applied to a table.
View storage:-
Oracle does not store the view data. It recreates the data, using the view's SELECT statement, every time a user queries the view. A view is stored only as a definition in Oracle's system catalog.
When a reference is made to a view, its definition is scanned, the base table is opened and the view is created on top of the base table. This, therefore, means that a view never holds data until a specific call to the view is made. This reduces redundant data on the HDD to a very large extent.
Advantages Of View:-
Security:- Each user can be given permission to access only a set of views that contain specific data.
Query simplicity:- A view can draw data from several different tables and present it as a single table, turning multiple-table queries into single-table queries against the view.
Data Integrity:- If data is accessed and entered through a view, the DBMS can automatically check
the data to ensure that it meets specified integrity constraints.
Disadvantage of View:-
Performance:- Views only create the appearance of a table, but the RDBMS must still translate queries against the view into queries against the underlying source tables. If the view is defined on a complex multiple-table query, then even a simple query against the view becomes a complicated join and takes a long time to execute.
Types of Views :-
 Simple Views
 Complex Views

Simple Views :-
A view based on a single table is called a simple view.
Syntax:-
CREATE VIEW <ViewName> AS
SELECT <ColumnName1>, <ColumnName2>, ... FROM <TableName>
[WHERE <cond>] [WITH CHECK OPTION] [WITH READ ONLY]
Example :-
SQL>CREATE VIEW emp_v AS
    SELECT empno, ename, sal FROM emp;
Views can also be used for manipulating the data that is available in the base tables, i.e. the user can perform Insert, Update and Delete operations through a view.
Views on which data manipulation can be done are called Updateable Views.
If an Insert, Update or Delete SQL statement is fired on a view, modifications to data in the view are passed to the underlying base table.
For a view to be updatable, it should meet the following criteria:
Views must be defined from a single table.
If the user wants to INSERT records with the help of a view, then the PRIMARY KEY column(s) and all the NOT NULL columns must be included in the view.
Inserting a record through a view :-
SQL>INSERT INTO emp_v VALUES(1,'A',5000);
(emp_v contains the three columns empno, ename and sal, so three values are supplied.)
Updating a record through a view :-
A view can be updated under certain conditions:
 The SELECT clause may not contain the keyword DISTINCT.
 The SELECT clause may not contain summary functions.
 The SELECT clause may not contain set functions.
 The SELECT clause may not contain set operators.
 The SELECT clause may not contain an ORDER BY clause.
 The FROM clause may not contain multiple tables.
 The WHERE clause may not contain subqueries.
 The query may not contain GROUP BY or HAVING.
 Calculated columns may not be updated.
 All NOT NULL columns from the base table must be included in the view in order for
the INSERT query to function.
So if a view satisfies all the above-mentioned rules then you can update a view.

EX: SQL>UPDATE emp_v SET sal=2000 WHERE empno=1;

Deleting a record through a view :-
SQL>DELETE FROM emp_v WHERE empno=1;

With Check Option :-

If a VIEW is created with WITH CHECK OPTION, then any DML operation through that view which violates the WHERE condition returns an error.

Example :-
SQL>CREATE VIEW V2 AS
    SELECT empno, ename, sal, deptno FROM emp WHERE deptno=10
    WITH CHECK OPTION;
Then insert a record into the emp table through view V2:
SQL>INSERT INTO V2 VALUES(2323,'RAJU',4000,20);
The above INSERT returns an error because the DML operation violates the WHERE clause (deptno 20 does not satisfy deptno=10).
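The WITH READ ONLY option shown in the simple-view syntax can be sketched in the same way; the view name emp_ro below is illustrative. Any DML attempted through such a view returns an error:

SQL>CREATE VIEW emp_ro AS
    SELECT empno, ename, sal FROM emp
    WITH READ ONLY;
SQL>UPDATE emp_ro SET sal=2000 WHERE empno=1;
The above UPDATE returns an error because emp_ro was created WITH READ ONLY.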
Complex Views :-
A view is said to be a complex view if it is based on more than one table, or if its query contains:
 AGGREGATE functions
 DISTINCT clause
 GROUP BY clause
 HAVING clause
 Sub-queries
 Constants, strings or value expressions
 UNION, INTERSECT, MINUS operators
Example 1 :-
SQL>CREATE VIEW V3 AS
    SELECT E.empno, E.ename, E.sal, D.dname, D.loc
    FROM emp E JOIN dept D USING(deptno);
NON-UPDATABLE VIEWS:
We cannot perform insert, update or delete operations on the base table through complex views. Complex views are not updatable views.
Example 2 :-
SQL>CREATE VIEW V2 AS
    SELECT deptno, SUM(sal) AS sumsal
    FROM emp GROUP BY deptno;
Destroying a View:-
The DROP VIEW command is used to destroy a view from the database.
Syntax:- DROP VIEW <viewName>
Example :- SQL>DROP VIEW emp_v;
DIFFERENCES BETWEEN SIMPLE AND COMPLEX VIEWS:

SIMPLE                              COMPLEX
Created from one table              Created from one or more tables
Does not contain functions          Contains functions
Does not contain groups of data     Contains groups of data

MATERIALIZED VIEW: @ DATAWAREHOUSE SYSTEMS

A materialized view in Oracle is a database object that contains the results of a query. Materialized views are local copies of data located remotely, or are used to create summary tables based on aggregations of a table's data. Materialized views which store data based on remote tables are also known as snapshots.

A materialized view can query tables, views, and other materialized views. Collectively these are called master tables (a replication term) or detail tables (a data warehouse term).

For replication purposes, materialized views allow you to maintain copies of remote data on your
local node. These copies are read-only. If you want to update the local copies, you have to use the
Advanced Replication feature. You can select data from a materialized view as you would from a
table or view.

For data warehousing purposes, the materialized views commonly created are aggregate
views, single-table aggregate views, and join views.

In replication environments, the materialized views commonly created are primary key, rowid, and subquery materialized views.

SYNTAX:

CREATE MATERIALIZED VIEW view_name
BUILD [IMMEDIATE | DEFERRED]
REFRESH [FAST | COMPLETE | FORCE]
ON [COMMIT | DEMAND]
[[ENABLE | DISABLE] QUERY REWRITE]
[ON PREBUILT TABLE]
AS
SELECT column_list FROM table_name;
The BUILD clause options are shown below.
 IMMEDIATE : The materialized view is populated immediately.
 DEFERRED : The materialized view is populated on the first requested refresh.

The following refresh types are available.
 FAST : A fast refresh is attempted. If materialized view logs are not present against the source tables in advance, the creation fails.
 COMPLETE : The table segment supporting the materialized view is truncated and repopulated completely using the associated query.
 FORCE : A fast refresh is attempted. If one is not possible, a complete refresh is performed.
A refresh can be triggered in one of two ways.
 ON COMMIT : The refresh is triggered by a committed data change in one of the dependent tables.
 ON DEMAND : The refresh is initiated by a manual request or a scheduled task.
The QUERY REWRITE clause tells the optimizer whether the materialized view should be considered for query rewrite operations. An example of the query rewrite functionality is shown below.
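A minimal sketch of that functionality, assuming the emp table and an illustrative summary view named mv_dept_sal:

SQL> CREATE MATERIALIZED VIEW mv_dept_sal
     BUILD IMMEDIATE
     REFRESH COMPLETE ON DEMAND
     ENABLE QUERY REWRITE
     AS SELECT deptno, SUM(sal) AS sumsal FROM emp GROUP BY deptno;

With query rewrite enabled, the optimizer may answer a query such as
SQL> SELECT deptno, SUM(sal) FROM emp GROUP BY deptno;
from mv_dept_sal instead of scanning the emp table.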
The ON PREBUILT TABLE clause tells the database to use an existing table segment, which
must have the same name as the materialized view and support the same column structure as the
query.

Example:
The following statement creates a rowid materialized view on table emp located on a remote database:
SQL> CREATE MATERIALIZED VIEW mv_emp_rowid
     REFRESH WITH ROWID
     AS SELECT * FROM emp@remote_db;

Materialized view created.

ORDERING

USING "ORDER BY" clause:
This is used to order the column data (ascending or descending).
Syntax1: (simple form)
select * from <table_name> order by <col> desc;

Note: By default Oracle will use ascending order. If you want output in descending order you have to use the DESC keyword after the column.
Ex: SQL> select * from student order by no;
    SQL> select * from student order by no desc;

The order of rows returned in a query result is undefined. The ORDER BY clause can be used to sort the rows. If you use the ORDER BY clause, it must be the last clause of the SQL statement. You can specify an expression, an alias, or a column position in the ORDER BY clause.

Syntax2 : (complex form)

SELECT expr FROM table
[WHERE condition(s)]
[ORDER BY {column, expr} [ASC|DESC]];

In the syntax,
ORDER BY : specifies the order in which the retrieved rows are displayed.
ASC : orders the rows in ascending order (the default order).
DESC : orders the rows in descending order.
Ordering of Data :-
 Numeric values are displayed with the lowest values first, for example 1–999.
 Date values are displayed with the earliest value first, for example 01-JAN-92 before 01-JAN-95.
 Character values are displayed in alphabetical order, for example A first and Z last.
 Null values are displayed last for ascending sequences and first for descending sequences.
Examples :-
Arrange employee records in ascending order of their sal?
SQL>SELECT * FROM emp ORDER BY sal;

Arrange employee records in descending order of their sal?
SQL>SELECT * FROM emp ORDER BY sal DESC;

Display employee records working for the 10th dept and arrange the result in ascending order of their sal?
SQL>SELECT * FROM emp WHERE deptno=10 ORDER BY sal;

Arrange employee records in ascending order of their deptno, and within each dept arrange records in descending order of their sal?
SQL>SELECT * FROM emp ORDER BY deptno, sal DESC;

In the ORDER BY clause we can use the column name or the column position, for example
SQL>SELECT * FROM emp ORDER BY 5 DESC;
In the above example records are sorted based on the fifth column in the emp table.

Arrange employee records in descending order of their comm. If comm is null then arrange those records last?
SQL>SELECT * FROM emp ORDER BY comm DESC NULLS LAST;
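As noted above, an expression or an alias can also be used in the ORDER BY clause; a small sketch on the emp table:

SQL>SELECT ename, sal*12 AS annsal FROM emp ORDER BY annsal DESC;
Here the rows are sorted on the computed annual-salary alias annsal.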

GROUP BY AND HAVING CLAUSE

GROUP BY clause
Using GROUP BY, we can create groups of related information. Columns used in the SELECT list must appear in the GROUP BY clause unless they are inside aggregate functions; otherwise it is not a valid GROUP BY expression.

SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
GROUP BY grouping-list
HAVING group-qualification

 The select-list in the SELECT clause can contain:
1. A list of column names
2. A list of terms having the form aggop(...) (aggregate operators)
Every column that appears in (1) must also appear in the grouping-list.
 The expression appearing in the group-qualification in the HAVING clause must have a single value per group.
Ex: SQL> select deptno, sum(sal) from emp group by deptno;

DEPTNO    SUM(SAL)
------    --------
10        8750
20        10875
30        9400
SQL> select deptno, job, sum(sal) from emp group by deptno, job;

Find the age of the youngest sailor for each rating level:
SQL> Select s.rating, MIN(s.age) from sailors s GROUP BY s.rating;

Find the age of the youngest sailor who is eligible to vote for each rating level with at least two such sailors?
SQL> select s.rating, MIN(s.age) as minage from sailors s where s.age>=18
     GROUP BY s.rating HAVING COUNT(*) > 1;

For each red boat, find the number of reservations for this boat?
SQL> Select b.bid, COUNT(*) AS reservationcount from boats b, reserves r where r.bid=b.bid and b.color='red'
     GROUP BY b.bid;

Find the average age of sailors for each rating level that has at least two sailors?
SQL> Select s.rating, AVG(s.age) AS avgage from sailors s
     GROUP BY s.rating
     HAVING COUNT(*) > 1;

AGGREGATION
It is a group operation which works on all records of a table. Group functions process a group of rows and return one value for that group.
 These functions are also called AGGREGATE functions or GROUP functions.
 Aggregate functions - max(), min(), sum(), avg(), count(), count(*).
Group functions are applied to all the rows but produce a single output.
a) SUM
This will give the sum of the values of the specified column.
Syntax: sum(column)
Ex: SQL> select sum(sal) from emp;
b) AVG
This will give the average of the values of the specified column.
Syntax: avg(column)
Ex: SQL> select avg(sal) from emp;
c) MAX
This will give the maximum of the values of the specified column.
Syntax: max(column)
Ex: SQL> select max(sal) from emp;
d) MIN
This will give the minimum of the values of the specified column.
Syntax: min(column)
Ex: SQL> select min(sal) from emp;
e) COUNT
This will give the count of the values of the specified column.
Syntax: count(column)
Ex: SQL> select count(sal), count(*) from emp;

SUB QUERIES
A subquery is a SQL query nested inside a larger query.
A subquery may occur in :
- A SELECT clause
- A FROM clause
- A WHERE clause
The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery.
A subquery is usually added within the WHERE clause of another SQL SELECT statement.

You can use the comparison operators, such as >, <, or =. The comparison operator can
also be a multiple-row operator, such as IN, ANY, or ALL.
A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
The inner query executes first before its parent query so that the results of inner query
can be passed to the outer query.
You can use a subquery in a SELECT, INSERT, DELETE, or UPDATE statement to perform
the following tasks :
Compare an expression to the result of the query.
Determine if an expression is included in the results of the query.
Check whether the query selects any rows.
Syntax :
SELECT <column-list> FROM <table> WHERE <column> OPERATOR (SELECT <column> FROM <table> [WHERE <condition>]);
The subquery (inner query) executes once before the main query (outer query) executes. The main query (outer query) uses the subquery result.

SQL Subqueries Example :

In this section, you will learn the requirements of using subqueries. We have the following two tables 'student' and 'marks' with the common field 'SID'.

SQL> select * from student1;
SID     NAME
----    -----
v001    abhi
v002    abhay
v003    arjun
v004    anand

SQL> select * from marks;
SID     TOTALMARKS
----    ----------
v001    95
v002    80
v003    74
v004    81
Now we want to write a query to identify all students who get better marks than the student whose StudentID is 'V002', but we do not know the marks of 'V002'.
- To solve this problem, we require two queries.
One query returns the marks (stored in Totalmarks field) of 'V002' and a second query identifies
the students who get better marks than the result of the first query.
SQL> select * from marks where sid='v002';
SID     TOTALMARKS
----    ----------
v002    80
The result of the query is 80.
- Using the result of this query, here we have written another query to identify the students who
get better marks than 80.
Second query :
SQL> select s.sid,s.name,m.totalmarks from student1 s, marks m where s.sid=m.sid and
m.totalmarks>80;
SID NAME TOTALMARKS
---- ---------- ----------
v001 abhi 95
v004 anand 81
Above two queries identified students who get better marks than the student whose StudentID is 'V002' (Abhi).

You can combine the above two queries by placing one query inside the other. The subquery (also called the 'inner query') is the query inside the parentheses. See the following code and query result:

SQL> select s.sid, s.name, m.totalmarks from student1 s, marks m where s.sid=m.sid and m.totalmarks > (select totalmarks from marks where sid='v002');
SID NAME TOTALMARKS
---- ---------- ----------
v001 Abhi 95
v004 Anand 81

Subqueries: Guidelines
There are some guidelines to consider when using subqueries :
-A subquery must be enclosed in parentheses.
-A subquery must be placed on the right side of the comparison operator.
-Subqueries cannot manipulate their results internally, therefore an ORDER BY clause cannot be added into a subquery. You can use an ORDER BY clause in the main SELECT statement (outer query), where it will be the last clause.
-Use single-row operators with single-row subqueries.
-If a subquery (inner query) returns a null value to the outer query, the outer query will
not return any rows when using certain comparison operators in a WHERE clause.
Type of Subqueries
Single row subquery : Returns zero or one row.
Multiple row subquery : Returns one or more rows.
Multiple column subquery : Returns one or more columns.
Correlated subqueries : Reference one or more columns in the outer SQL statement. The
subquery is known as a correlated subquery because the subquery is related to the outerSQL
statement.
Nested subqueries : Subqueries are placed within another subqueries.
1) SINGLE ROW SUB QUERIES:- Returns zero or one row.
If inner query returns only one row then it is called single row subquery.

Syntax :-
SELECT <collist> FROM <tabname>
WHERE colname OPERATOR (SELECT statement);
The operator can be:  <  >  <=  >=  =  <>
Examples:- (on Emp Table)
Q: Display employee records whose job equals the job of SMITH?
SQL>SELECT * FROM emp
    WHERE job = (SELECT job FROM emp WHERE ename='SMITH');
Q: Display the employee name earning the maximum salary?
SQL>SELECT ename FROM emp
    WHERE sal = (SELECT MAX(sal) FROM emp);
Example 2: (on the SAILORS-BOATS-RESERVES database)

SQL> SELECT * FROM SAILORS;
SID    SNAME    RATING    AGE
----   ------   -------   ----

SQL> SELECT * FROM BOATS;
BID    BNAME        COLOR
----   ----------   ------
101    INTERLAKE    BLUE
102    INTERLAKE    RED
103    CLIPPER      GREEN
104    MARINE       RED

SQL> SELECT * FROM RESERVES;
SID    BID    DAY
----   ----   ----
Q: Find the sailor’s ID whose name is equal to ‘DUSTIN’
SQL> SELECT SID FROM SAILORS WHERE SID = (SELECT SID FROM
SAILORS WHERE SNAME='DUSTIN');
SID
----------
22

Q: Find sailors records whose name equals 'DUSTIN'?
SQL> SELECT * FROM SAILORS WHERE SID = (SELECT SID FROM SAILORS WHERE SNAME='DUSTIN');

SID    SNAME     RATING    AGE
----   -------   -------   ----
22     DUSTIN    7         45
Q: Find the rating of the sailor whose name is 'DUSTIN'.
SQL> SELECT RATING FROM SAILORS WHERE SID = (SELECT SID FROM SAILORS WHERE SNAME='DUSTIN');

RATING
------
7
Q: Find the sailors records whose sid is greater than that of 'DUSTIN'?
SQL> SELECT * FROM SAILORS WHERE SID > (SELECT SID FROM SAILORS WHERE SNAME='DUSTIN');
Q: Find the records of sailors having the maximum rating.
SQL> SELECT * FROM SAILORS WHERE RATING = (SELECT MAX(RATING) FROM SAILORS);

SID    SNAME    RATING    AGE
----   ------   -------   ----
58     RUSTY    10        35
71     ZOBRA    10        16

Q: Find the records of sailors whose rating is the same as that of 'DUSTIN'.
SQL> SELECT * FROM SAILORS WHERE RATING = (SELECT RATING FROM SAILORS WHERE SNAME='DUSTIN');

SID    SNAME      RATING    AGE
----   --------   -------   ----
22     DUSTIN     7         45
64     HORATIO    7         35

MULTI ROW SUB QUERIES:
If the inner query returns more than one row then it is called a multi row subquery.
Syntax :-
SQL>SELECT <collist> FROM <tabname>
    WHERE colname OPERATOR (SELECT statement);
Here, the OPERATOR must be IN, NOT IN, ANY or ALL.


IN operator :-

To test for values in a specified list of values, use IN operator. The IN operator can be used with
any data type. If characters or dates are used in the list, they must be enclosed in single quotation
marks (’’).

Syntax:-
IN (V1, V2, V3, ...);

Note :-
IN ( ... ) is actually translated by the Oracle server to a set of 'OR' conditions: a = value1 OR a = value2 OR a = value3. So using IN ( ... ) has no performance benefits; it is used for logical simplicity.

Example :-

Q:Display employee records working as CLERK OR MANAGER ?

SQL>SELECT * FROM emp WHERE job IN (‘CLERK’,’MANAGER’) ;
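Per the note above, the same query can be written with the equivalent OR conditions that Oracle translates IN into:

SQL>SELECT * FROM emp WHERE job='CLERK' OR job='MANAGER';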


Q: Find the name of sailors who have reserved boat 103.
SQL> SELECT S.SNAME FROM SAILORS S WHERE S.SID IN (SELECT R.SID FROM RESERVES R WHERE R.BID=103);

SNAME
------
DUSTIN
LUBBER
HORATIO

Q: Find the name of sailors who have reserved a red boat.
SQL> SELECT S.SNAME FROM SAILORS S WHERE S.SID IN (SELECT R.SID FROM RESERVES R WHERE R.BID IN (SELECT B.BID FROM BOATS B WHERE B.COLOR='RED'));

SNAME
------
DUSTIN
LUBBER
HORATIO
Q:Find the names of sailors who have not reserved a red boat.

SELECT S.SNAME FROM SAILORS S WHERE S.SID NOT IN (SELECT R.SID FROM
RESERVES R WHERE R.BID IN (SELECT B.BID FROM BOATS B WHERE B.COLOR
= 'RED'));
SNAME
------
BRUTUS
CANDY
RUSTY
ZOBRA
HORATIO
ART
BOB
Using the EXISTS Operator :-
The EXISTS operator returns TRUE or FALSE. If the inner query returns at least one record then EXISTS returns TRUE, otherwise it returns FALSE. ORACLE recommends the EXISTS and NOT EXISTS operators instead of IN and NOT IN.
Q: Find the name of sailors who have reserved boat 103.
SQL> SELECT S.SNAME FROM SAILORS S WHERE EXISTS (SELECT * FROM RESERVES R WHERE R.BID=103 AND R.SID = S.SID);

SNAME
------
DUSTIN
LUBBER
HORATIO

Q: Find the name of sailors who have not reserved boat 103.
SQL> SELECT S.SNAME FROM SAILORS S WHERE NOT EXISTS (SELECT * FROM RESERVES R WHERE R.BID=103 AND R.SID = S.SID);

SNAME
------
BRUTUS
CANDY
RUSTY
HORATIO
ZOBRA
ART
BOB

ANY operator:-
Compares a value to each value in a list or returned by a query. Must be preceded by =, !=, >, <, <=, >=. Evaluates to FALSE if the query returns no rows.
Select employees whose salary is greater than any salesman's salary?
SQL>SELECT ename FROM emp
    WHERE sal > ANY (SELECT sal FROM emp WHERE job = 'SALESMAN');
Q:Find sailors whose rating is better than some sailor called Horatio?

SQL> SELECT S.SID FROM SAILORS S WHERE S.RATING > ANY ( SELECT
S2.RATING FROM SAILORS S2 WHERE S2.SNAME=’HORATIO’) ;
SID
58
71
74
31
32
ALL operator :-
Compares a value to every value in a list or returned by a query. Must be preceded by =, !=, >, <, <=, >=. Evaluates to TRUE if the query returns no rows.

Example:-
Select employees whose salary is greater than every salesman's salary?
SQL>SELECT ename FROM emp
    WHERE sal > ALL (SELECT sal FROM emp WHERE job = 'SALESMAN');
Q: Find sailors whose rating is better than every sailor called Horatio?
SQL> SELECT S.SID FROM SAILORS S WHERE S.RATING > ALL (SELECT S2.RATING FROM SAILORS S2 WHERE S2.SNAME='HORATIO');
SID
58
71
Multi Column Subqueries:-
If the inner query returns more than one column value then it is called a MULTI COLUMN subquery.
Example :-
Display employee names earning maximum salaries in their dept?
SQL>SELECT ename FROM emp WHERE (deptno,sal) IN (SELECT deptno, MAX(sal) FROM emp GROUP BY deptno);

SQL> SELECT SNAME FROM SAILORS WHERE (RATING,AGE) IN (SELECT RATING, MAX(AGE) FROM SAILORS GROUP BY RATING);
SNAME
------
DUSTIN
BRUTUS
LUBBER
RUSTY
HORATIO
BOB

SQL> SELECT SID, SNAME FROM SAILORS WHERE (RATING,AGE) IN (SELECT RATING, MAX(AGE) FROM SAILORS GROUP BY RATING);
SID    SNAME

Nested Queries:-
A subquery embedded in another subquery is called a NESTED QUERY. Queries can be nested up to 255 levels.
Example :-
Display the employee name earning the second maximum salary?
SQL>SELECT ename FROM emp
    WHERE sal = (SELECT MAX(sal) FROM emp
                 WHERE sal < (SELECT MAX(sal) FROM emp));

Q:Find the names of sailors who have not reserved a red boat.

SELECT S.SNAME FROM SAILORS S WHERE S.SID NOT IN (SELECT R.SID FROM
RESERVES R WHERE R.BID IN (SELECT B.BID FROM BOATS B WHERE B.COLOR
= 'RED'));
SNAME
------
BRUTUS
CANDY
RUSTY
ZOBRA
HORATIO
ART
BOB

CORRELATED SUB QUERIES:
In a correlated subquery the parent query is executed first, and based on the output of the outer query the inner query executes. If the parent query returns N rows, the inner query is executed N times.
If a subquery references one or more columns of the parent query it is called a CORRELATED subquery, because it is related to the outer query. This subquery executes once for each and every row of the main query.

Example1 :-
Display employee names earning more than the avg(sal) of their dept?
SQL>SELECT ename FROM emp x
    WHERE sal > (SELECT AVG(sal) FROM emp
                 WHERE deptno = x.deptno);

Example2: Find sailors whose rating is more than the avg(rating) of sailors with their sid.
SQL> SELECT S.SNAME FROM SAILORS S WHERE RATING > (SELECT AVG(RATING) FROM SAILORS WHERE SID=S.SID);

no rows selected.

Since sid is unique, the inner average is just the sailor's own rating, so no row can be strictly greater and the query returns no rows.

SUB QUERIES WITH SET OPERATORS:
Q1) Find the names of sailors who have reserved a red or a green boat?
SQL> Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and (b.color='red' or b.color='green');

Or

SQL> Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and b.color='red'
UNION
Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and b.color='green';

SNAME
------
Dustin
Lubber
Horatio

Q2) Find the names of sailors who have reserved a red and a green boat?

SQL> Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and b.color='red'
INTERSECT
Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and b.color='green';

SNAME
------
Dustin
Lubber
Horatio

Q3) Find the names of sailors who have reserved a red boat but not a green boat?

SQL> Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and b.color='red'
MINUS
Select s.sname from sailors s, reserves r, boats b where s.sid=r.sid and r.bid=b.bid and b.color='green';

NO ROWS SELECTED

Q4) Find all sids of sailors who have a rating of 10 or reserved boat 104?

SQL> select s.sid from sailors s where s.rating=10
UNION
select r.sid from reserves r where r.bid=104;

SID
----
22
31
58
71

UNIT-IV

Schema Refinement(Normalization)

Purpose of Normalization:
Normalization is the process of reducing data redundancy in a table and improving data integrity. Data normalization is a technique used in databases to organize data efficiently. It ensures that your data remains clean, consistent, and error-free by breaking it into smaller tables and linking them through relationships. This process reduces redundancy, improves data integrity, and optimizes database performance. Why is it needed? Without normalization in SQL there will be many problems, such as:
 Insert Anomaly: This happens when we cannot insert data into the table without the presence of other, unrelated data.
 Update Anomaly: Data inconsistency caused by data redundancy when updating.
 Delete Anomaly: Occurs when some data is lost due to the deletion of other data.
So normalization is a way of organizing data in a database. Normalization involves organizing the columns and tables in the database to ensure that their dependencies are correctly implemented using database constraints.
Normalization is used to minimize the duplication of various relationships in the database and to troubleshoot anomalies such as inserts, deletes, and updates in a table. It helps to split a large table into several small normalized tables, using relationships to reduce redundancy. Normalization, also known as database normalization or data normalization, is an important part of relational database design because it helps to improve the speed, accuracy, and efficiency of the database.
Need For Normalization
 It eliminates redundant data.
 It reduces chances of data error.
 The normalization is important because it allows database to take up less disk space.
 It also help in increasing the performance.
 It improves the data integrity and consistency.
Advantages
 By using normalization redundancy of database or data duplication can be resolved.
 We can minimize null values by using normalization.
 Results in a more compact database (due to less data redundancy/zero).
 Minimize/avoid data modification problems.
 It simplifies queries.
 The database structure is clearer and easier to understand.
 The database can be expanded without affecting existing data.
 Finding, sorting, and indexing can be faster because the table is small and more rows can be accommodated on the
data page

Normalization (or schema refinement) is a critical process in database design that aims to eliminate redundancy and
improve the integrity of data. Its primary purpose is to organize the data in a way that reduces duplication and ensures
consistency, making databases more efficient, reliable, and scalable. Here are the key purposes of normalization and schema
refinement:

1. Eliminate Data Redundancy

 Problem: Without normalization, data might be repeated in multiple places within a database, leading to
inconsistencies and storage inefficiencies.
 Solution: Normalization breaks down tables into smaller, related ones, ensuring that each piece of information
is stored in only one place, thereby reducing duplication.

2. Avoid Update Anomalies

 Problem: When redundant data exists, updating a piece of information may require multiple updates across
different tables or records, which can lead to errors if one update is missed.
 Solution: Normalization ensures that changes only need to be made in one place. This reduces the chances
of inconsistencies arising from partial updates.

3. Improve Data Integrity

 Problem: With redundant data, it's possible to have inconsistencies where data might contradict itself across
different parts of the database.
 Solution: By organizing data into related, smaller tables, normalization enforces integrity constraints (such as
referential integrity) that maintain the accuracy and consistency of the database.

4. Minimize Insert, Delete, and Update Anomalies

 Problem: When data is stored in an unnormalized form, certain operations (like inserting, deleting, or updating
records) might lead to anomalies (e.g., having a null value where it shouldn't be, or accidentally deleting
important information).
 Solution: Normalization helps design the schema in a way that these operations are less likely to result in
inconsistencies or errors.

5. Efficient Storage Utilization

 Problem: Storing redundant data in a single table can waste disk space.
 Solution: Through normalization, storage is optimized because each piece of information is stored once, and
related data is grouped appropriately.

6. Enhance Query Efficiency

 Problem: Unnormalized databases with redundant data may require more complex queries to fetch or update
information, as relationships between data elements are not as clearly defined.
 Solution: In a normalized schema, relationships between tables are clear, and querying the database is more
efficient, as it allows for better indexing and streamlined data retrieval.

7. Promote Flexibility and Scalability

 Problem: A poorly structured schema may need significant rework as the database evolves or grows,
especially if new data relationships need to be introduced.
 Solution: Normalization encourages flexibility. It provides a clear structure that makes it easier to add new
attributes or relationships without major changes to the existing schema.

Levels of Normalization

Normalization typically occurs in stages, known as normal forms (NF), each addressing different issues:

 1NF (First Normal Form): Ensures that the table has no repeating groups and all attributes are atomic.
 2NF (Second Normal Form): Addresses partial dependencies, ensuring that non-key attributes depend on the
entire primary key.
 3NF (Third Normal Form): Ensures that no transitive dependencies exist between non-key attributes and the
primary key.
 Higher normal forms like BCNF, 4NF, and 5NF address more specific cases of dependency.

Functional dependency

A functional dependency occurs when one attribute uniquely determines another attribute within a relation. It is a
constraint that describes how attributes in a table relate to each other. If attribute A functionally determines attribute B we
write this as A → B.
Functional dependencies are used to mathematically express relations among database entities and are very important to
understanding advanced concepts in Relational Database Systems.
Example:

roll_no name dept_name dept_building

42 abc CO A4

43 pqr IT A3

44 xyz CO A4

45 xyz IT A3

46 mno EC B2

47 jkl ME B2

From the above table we can conclude some valid functional dependencies:
 roll_no → {name, dept_name, dept_building} : here roll_no can determine the values of the fields name, dept_name and dept_building, hence a valid functional dependency.
 roll_no → dept_name : since roll_no can determine the whole set {name, dept_name, dept_building}, it can determine its subset dept_name also.
 dept_name → dept_building : dept_name can identify dept_building accurately, since departments with different dept_name will also have a different dept_building.
 More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
 name → dept_name : students with the same name can have different dept_name, hence this is not a valid functional dependency.
 dept_building → dept_name : there can be multiple departments in the same building. For example, in the above table departments ME and EC are in the same building B2, hence dept_building → dept_name is an invalid functional dependency.
 More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.

Types of Functional Dependencies in DBMS


1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

1. Trivial Functional Dependency


In Trivial Functional Dependency, a dependent is always a subset of the determinant.
i.e. If X → Y and Y is the subset of X, then it is called trivial functional dependency
Example:
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of determinant
set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial functional dependency.

2. Non-trivial Functional Dependency


In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. i.e. If X → Y
and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example:
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of
determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a
subset of {roll_no, name}
3. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on each other. i.e. If a → {b,
c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.

For example,
roll_no name age


42 abc 17

43 pqr 18

44 xyz 18

45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name & age are not
dependent on each other(i.e. name → age or age → name doesn’t exist !)
4. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant.
i.e. If a → b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional dependency.
For example,
enrol_no name dept building_no

42 abc CO 4

43 pqr EC 2

44 xyz IT 1

45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity, enrol_no →
building_no is a valid functional dependency. This is an indirect functional dependency, hence called Transitive
functional dependency.
5. Fully Functional Dependency
In a fully functional dependency, an attribute or a set of attributes uniquely determines another attribute or set of attributes, and the dependent does not depend on any proper subset of the determinant. If a relation R has attributes X, Y, Z with the dependencies X → Y and X → Z, and neither Y nor Z depends on a proper subset of X, those dependencies are fully functional.
6. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on a part of the composite key rather than on the whole key. If a relation R has attributes X, Y, Z where {X, Y} is the composite key and Z is a non-key attribute, then X → Z is a partial functional dependency in an RDBMS.

Advantages of Functional Dependencies


1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize redundancy and increase data
integrity. Functional dependencies play an important part in data normalization. With the help of functional dependencies we are able to identify the primary key and candidate keys in a table, which in turn helps in normalization.
2. Query Optimization
With the help of functional dependencies we are able to decide the connectivity between the tables and the necessary
attributes need to be projected to retrieve the required data from the tables. This helps in query optimization and improves
performance.
3. Consistency of Data
Functional dependencies ensures the consistency of the data by removing any redundancies or inconsistencies that may
exist in the data. Functional dependency ensures that the changes made in one attribute does not affect inconsistency in
another set of attributes thus it maintains the consistency of the data in database.
4. Data Quality Improvement
Functional dependencies ensure that the data in the database to be accurate, complete and updated. This helps to improve
the overall quality of the data, as well as it eliminates errors and inaccuracies that might occur during data analysis and
decision making, thus functional dependency helps in improving the quality of data in database.

NORMALIZATION

Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomaly, update anomaly and deletion anomaly. Let's discuss anomalies first, then we will discuss normal forms with examples.

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized. These are – Insertion, update
and deletion anomaly. Let’s take an example to understand this.

Example:
Suppose a manufacturing company stores the employee details in a table named employee that has four attributes:
emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for storing employee’s
address and emp_dept for storing the department details in which the employee works. At some point of time the table
looks like this:

emp_id emp_name emp_address emp_dept


101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly:
In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we
want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If
somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be
having two different addresses, which is not correct and would lead to inconsistent data.
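As a small sketch against the employee table above, a single logical change to Rick's address must touch two rows:

SQL> UPDATE employee SET emp_address = 'Mumbai' WHERE emp_id = 101;
-- updates 2 rows; if only one of them were changed, Rick would end up
-- with two different addresses in the database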

Insert anomaly:
Suppose a new employee joins the company, who is under training and currently not assigned to any department then
we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.

Delete anomaly:
Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having
emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department.

Normalization

Here are the most commonly used normal forms:

 First normal form(1NF)


 Second normal form(2NF)
 Third normal form(3NF)
 Boyce & Codd normal form (BCNF)

First normal form (1NF)

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only
atomic values.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a table that looks
like this:

emp_id    emp_name    emp_address    emp_mobile
101       Herschel    New Delhi      8912312390
102       Jon         Kanpur         8812121212, 9900012222
103       Ron         Chennai        7778881212
104       Lester      Bangalore      9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers, so the company stored them in the same field as you can see in the table above.

This table is not in 1NF as the rule says "each attribute of a table must have atomic (single) values"; the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we should have the data like this:

emp_id emp_name emp_address emp_mobile


101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987

Second normal form (2NF)

A table is said to be in 2NF if both the following conditions hold:

 Table is in 1NF (First normal form)


 No non-prime attribute is dependent on the proper subset of any candidate key of table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks
like this: Since a teacher can teach more than one subjects, the table can have multiple rows for a same teacher.

teacher_id subject teacher_age


111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, as the rule says "no non-prime attribute is dependent on the proper subset of any candidate key of the table".
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40

teacher_subject table:

teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
Now the tables comply with Second normal form (2NF).
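A sketch of how this 2NF decomposition might be declared in SQL (the column types and constraint layout here are illustrative, not taken from the text):

SQL> CREATE TABLE teacher_details (
       teacher_id  NUMBER PRIMARY KEY,
       teacher_age NUMBER);

SQL> CREATE TABLE teacher_subject (
       teacher_id  NUMBER REFERENCES teacher_details(teacher_id),
       subject     VARCHAR2(30),
       PRIMARY KEY (teacher_id, subject));
-- teacher_age now depends on the whole key of teacher_details, and the
-- composite key of teacher_subject carries no partial dependency.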

Third Normal form (3NF)

A table design is said to be in 3NF if both the following conditions hold:

 Table must be in 2NF


 Transitive functional dependency of non-prime attribute on any super key should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional
dependency X-> Y at least one of the following conditions hold:

 X is a super key of table


 Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee, they create a table named
employee_details that looks like this:

emp_id emp_name emp_zip emp_state emp_city emp_district


1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on


Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate keys.

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF we have to break the table into two tables to remove the transitive dependency:

employee table:

emp_id emp_name emp_zip


1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999

employee_zip table:

emp_zip emp_state emp_city emp_district


282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
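As with 2NF, this 3NF decomposition can be sketched in SQL (types are illustrative); emp_zip becomes a foreign key from employee to employee_zip, breaking the transitive chain:

SQL> CREATE TABLE employee_zip (
       emp_zip      VARCHAR2(10) PRIMARY KEY,
       emp_state    VARCHAR2(20),
       emp_city     VARCHAR2(20),
       emp_district VARCHAR2(20));

SQL> CREATE TABLE employee (
       emp_id   NUMBER PRIMARY KEY,
       emp_name VARCHAR2(30),
       emp_zip  VARCHAR2(10) REFERENCES employee_zip(emp_zip));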

Boyce Codd normal form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X → Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one department. They store the data like
this:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp


1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF as neither emp_id nor emp_dept alone are keys.

To make the table comply with BCNF we can break the table in three tables like this:

emp_nationality table:

emp_id emp_nationality
1001 Austrian
1002 American

emp_dept table:

emp_dept dept_type dept_no_of_emp


Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600

emp_dept_mapping table:

emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department

Functional dependencies:

emp_id -> emp_nationality


emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:

For first table: emp_id


For second table: emp_dept
For third table: {emp_id, emp_dept}
Decomposition
In Database Management Systems (DBMS), decomposition is the process of breaking down a relation (table) into
smaller relations to achieve a desired level of normalization, while preserving the data's integrity and minimizing redundancy.
The goal is to eliminate anomalies such as update, insert, and delete anomalies that can arise in unnormalized databases.
Decomposition is essential for ensuring that a database is in a higher normal form (e.g., 2NF, 3NF, BCNF), which helps
to:
 Minimize redundancy in data storage
 Improve data integrity by avoiding inconsistency
 Ensure consistency in data updates and queries

Key Types of Decomposition

1. Lossless Join Decomposition


2. Dependency-Preserving Decomposition

1. Lossless Join Decomposition

A lossless join decomposition ensures that when a relation is decomposed into smaller relations, the original relation
can be reconstructed by performing a natural join (or an equivalent join) on the decomposed relations. In other
words, no information is lost during the decomposition process.

Characteristics:

 Reconstructibility: You can join the decomposed relations back to get the original relation.
 No Data Loss: The decomposition is "lossless" because it preserves all the original tuples from the relation.

Conditions for Lossless Join:

To ensure that a decomposition is lossless, one of the following conditions must be met:

 If the intersection of the decomposed relations (the common attributes) forms a candidate key of at least one
of the decomposed relations, the decomposition is lossless.
o Example: If you decompose a relation R into R1(A, B) and R2(B, C) and B is a candidate key in either
R1 or R2, the decomposition is lossless.

Example:

Given the relation R(A, B, C) with the functional dependency A → B, you can decompose it into two smaller relations:

 R1(A, B) and R2(A, C)

Here, the common attribute is A. Since A is a key attribute in R1, the decomposition is lossless.

R1(A, B) R2(A, C)
A1, B1 A1, C1
A2, B2 A2, C2

By performing a natural join on R1 and R2 using the common attribute A, we can reconstruct the original relation
R(A, B, C).
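A small SQL sketch of this reconstruction (the table and column definitions are illustrative):

SQL> CREATE TABLE R1 (A VARCHAR2(5), B VARCHAR2(5));
SQL> CREATE TABLE R2 (A VARCHAR2(5), C VARCHAR2(5));

-- Reconstruct R(A, B, C) by joining on the common attribute A
SQL> SELECT R1.A, R1.B, R2.C FROM R1 JOIN R2 ON R1.A = R2.A;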

2. Dependency-Preserving Decomposition

A dependency-preserving decomposition ensures that all the original functional dependencies (FDs) can still be
enforced in the decomposed relations. This is important for maintaining data integrity.

Characteristics:

 Preservation of Dependencies: The decomposed relations must still enforce the original
functional dependencies.
 Efficiency: Dependency-preserving decompositions help to avoid the overhead of recomputing dependencies
after joining decomposed relations.

Example:

Consider a relation R(A, B, C) with functional dependencies:

 A→B
 B→C

If we decompose R into:

 R1(A, B) (preserving A → B)
 R2(B, C) (preserving B → C)

The decomposition preserves all functional dependencies, so this is a dependency-preserving decomposition.

Decomposition and Normal Forms

Decomposition is an essential tool in achieving higher normal forms (such as 2NF, 3NF, and BCNF). The
normalization process involves decomposing a relation to remove redundancies and dependencies that violate the
conditions of the higher normal forms.

For example:

 1NF: No repeating groups or arrays (this is achieved by decomposition if necessary).


 2NF: Eliminate partial dependencies (decomposing a relation that has partial dependencies).
 3NF: Eliminate transitive dependencies (further decomposition to eliminate transitive dependencies).
 BCNF: Ensure that all determinants are candidate keys (decomposition may be required if there are violations).

Steps in Decomposition

1. Identify Functional Dependencies:


First, identify all the functional dependencies that apply to the relation.
2. Check for Violations of Normal Forms:
If the relation violates a higher normal form (e.g., partial, transitive dependencies, or BCNF violations), decompose it
into smaller relations that satisfy the normal form.
3. Decompose the Relation:
Based on the violations, decompose the relation into smaller relations. Ensure that the decomposition satisfies the
normal form while aiming for both lossless join and dependency-preserving properties.
4. Check for Lossless Join and Dependency Preservation:
After decomposing, verify that the decomposition is lossless (can be joined back to the original relation) and
dependency-preserving (the original functional dependencies are preserved).
5. Optimize for Performance (Optional):
In some cases, decompositions that result in multiple relations might lead to inefficient queries due to the need
for many joins. In practice, sometimes a denormalization step is used, where certain aspects of normalization are
reversed for performance reasons (at the cost of data redundancy).

Example of Decomposition Process

Let’s take a simple relation R(A, B, C) with the following functional dependencies:

 A→B
 B→C
 A → C (transitive dependency)

Step 1: Identify Violations of Normal Forms

 The relation is in 2NF, since the key A is a single attribute and no partial dependency on part of a composite key is possible.
 The relation is not in 3NF because B → C creates a transitive dependency (the non-key attribute C depends on another non-key attribute B, which depends on the key A).

Step 2: Decompose to 3NF

We can decompose the relation into two smaller relations that satisfy 3NF:

 R1(A, B) (because A → B)
 R2(B, C) (because B → C)
Step 3: Verify Lossless Join and Dependency Preservation

 Lossless Join: We can join R1(A, B) and R2(B, C) on B to get the original relation R(A, B, C).
 Dependency Preservation: The dependencies A → B and B → C are preserved in R1 and R2.

Properties of Decomposition

1. Lossless Join:
The decomposition must allow the original relation to be reconstructed through a natural join of the
decomposed relations without losing any information.
2. Dependency Preservation:
The decomposition must ensure that all functional dependencies in the original relation can be enforced in the
decomposed relations. Ideally, the original functional dependencies should be expressible in the decomposed
relations without requiring joins.
Surrogate Key
A surrogate key, often serving as the primary key, is generated automatically by the database when a new record is inserted into a table, and it can be declared as the primary key of that table. It is a sequential number with no meaning outside of the database; it may be made available to the user and the application, or it may act as an object that is present in the database but is not visible to the user or application.
We can say that, in case we do not have a natural primary key in a table, we need to artificially create one in order to uniquely identify a row in the table; this key is called the surrogate key or synthetic primary key of the table. However, the surrogate key is not always the primary key. Suppose we have multiple objects in a database that are connected to the surrogate key; then we will have a many-to-one association between the primary keys and the surrogate key, and the surrogate key cannot be used as the primary key.
Features of the Surrogate Key
 It is automatically generated by the system.
 It holds an anonymous integer.
 It contains a unique value for all records of the table.
 The value can never be modified by the user or application.
 The surrogate key is called the factless key as it is added just for our ease of identification of unique values
and contains no relevant fact(or information) that is useful for the table.

Example:
Suppose we have two tables from two different schools, each having the same columns registration_no, name, and percentage, and
each table having its own natural primary key, that is, registration_no.
Table of school A:
registration_no name percentage

210101 Harry 90

210102 Maxwell 65

210103 Lee 87

210104 Chris 76

Table of school B:
registration_no name percentage

CS107 Taylor 49

CS108 Simon 86

CS109 Sam 96

CS110 Andy 58

Now, suppose we want to merge the details of both the schools in a single table. Resulting table
will be:
surr_no registration_no name percentage

1 210101 Harry 90

2 210102 Maxwell 65

3 210103 Lee 87

4 210104 Chris 76

5 CS107 Taylor 49

6 CS108 Simon 86

7 CS109 Sam 96

8 CS110 Andy 58

As we can observe in the above table, registration_no cannot be the primary key of the merged table: its values
happen to be unique here, but they follow two different formats and there is no guarantee that registration numbers
from different schools will not collide. In this case, we have to artificially create a primary key for the table. We
can do this by adding a column surr_no that contains anonymous integers and has no direct relation with the other
columns. This additional surr_no column is the surrogate key of the table.
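
As an illustration, the following is a minimal sketch using Python's built-in sqlite3 module; the table and column names follow the example above. In SQLite, a column declared INTEGER PRIMARY KEY is assigned the next integer automatically on insert (other systems use their own mechanisms, such as identity columns or sequences):

```python
# A minimal sketch of generating the surr_no surrogate key automatically.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        surr_no         INTEGER PRIMARY KEY,  -- surrogate key, system-generated
        registration_no TEXT NOT NULL,        -- natural id, format differs per school
        name            TEXT NOT NULL,
        percentage      INTEGER
    )
""")

# Merge rows from both schools; surr_no is filled in automatically.
rows = [("210101", "Harry", 90), ("210102", "Maxwell", 65),
        ("CS107", "Taylor", 49), ("CS108", "Simon", 86)]
conn.executemany(
    "INSERT INTO student (registration_no, name, percentage) VALUES (?, ?, ?)",
    rows)

for row in conn.execute("SELECT surr_no, registration_no, name FROM student"):
    print(row)   # (1, '210101', 'Harry'), (2, '210102', 'Maxwell'), ...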

Use of Surrogate Key


There are several reasons to use surrogate keys in database tables:
1. Uniqueness: Data integrity is improved by the guaranteed uniqueness of surrogate keys.
2. Stability: Since surrogate keys do not depend on any business rules or data value, they have a lower chance
of changing over time.
3. Efficiency: Compared to natural keys, surrogate keys are frequently smaller and process more quickly.
4. Flexibility: In the event that the natural key changes, rows can still be uniquely identified using surrogate keys.
Advantages of the Surrogate Key
 As there is no direct information related with the table, so the changes are only based on the requirements of the
application.
 Performance is enhanced as the value of the key is relatively smaller.
 The key value is guaranteed to contain unique information.
 As it holds smaller constant values, this makes integration of the table easy.
 Enables us to run fast queries (as compared to the natural primary key)
Disadvantages of the Surrogate Key
 The surrogate key value can never be used as a search key.
 As the key value has no relation to the data of the table, so third normal form is violated.
 The extra column for surrogate key will require extra disk space.
 We will need extra IO when we have to insert or update data of the table.
Examples of Surrogate Key
 System date & time stamp
 Random alphanumeric string

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form (BCNF) is an advanced, slightly stricter version of the third normal form. It is a
tougher criterion that helps eliminate redundancy and anomalies from the database; as a further refinement of 3NF,
it simplifies database design.
When is a relation in BCNF: A relation is in BCNF if, for every functional dependency that holds in it, the
left-hand side is a superkey; this also resolves the potential problems that 3NF can leave behind when candidate
keys overlap.
BCNF is essential for good database schema design in systems where consistency and efficiency are important,
particularly when there are many candidate keys.
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a superkey for every functional dependency (FD) X−>Y in a given relation.

To determine the highest normal form of a given relation R with functional dependencies, the first step is to check
whether the BCNF condition holds. If R is found to be in BCNF, it can be safely deduced that the relation is also in 3NF,
2NF, and 1NF as the hierarchy shows. The 1NF has the least restrictive constraint – it only requires a relation R to have
atomic values in each tuple. The 2NF has a slightly more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less restrictive than the BCNF. In this
manner, the restriction increases as we traverse down the hierarchy.

Examples
Here, we are going to discuss some basic examples which let you understand the properties of BCNF. We will discuss
multiple examples here.
Example 1
Let us consider the student database, in which data of the student are mentioned.
Stu_ID Stu_Branch Stu_Course Branch_Number Stu_Course_No

101 Computer Science & Engineering DBMS B_001 201

101 Computer Science & Engineering Computer Networks B_001 202

102 Electronics & Communication Engineering VLSI Technology B_003 401

102 Electronics & Communication Engineering Mobile Communication B_003 402

Functional Dependency of the above is as mentioned:


Stu_ID −> Stu_Branch
Stu_Course −> {Branch_Number, Stu_Course_No}
Candidate Keys of the above table are: {Stu_ID, Stu_Course}
Why this Table is Not in BCNF
The table presented above is not in BCNF because neither Stu_ID nor Stu_Course alone is a superkey. The rules
mentioned above tell us that for a table to be in BCNF, every functional dependency X −> Y must have X as a
superkey; here this property fails, which is why this table is not in BCNF.
How to Satisfy BCNF
For satisfying this table in BCNF, we have to decompose it into further tables. Here is the full procedure through which
we transform this table into BCNF. Let us first divide this main table into two tables Stu_Branch and Stu_Course
Table.
Stu_Branch Table
Stu_ID Stu_Branch

101 Computer Science & Engineering

102 Electronics & Communication Engineering

Candidate Key for this table: Stu_ID.
Stu_Course Table
Stu_Course Branch_Number Stu_Course_No

DBMS B_001 201

Computer Networks B_001 202

VLSI Technology B_003 401

Mobile Communication B_003 402

Candidate Key for this table: Stu_Course.
Stu_ID to Stu_Course_No Table
Stu_ID Stu_Course_No

101 201

101 202

102 401

102 402

Candidate Key for this table: {Stu_ID, Stu_Course_No}.


After decomposing into further tables, now it is in BCNF, as it is passing the condition of Super Key, that in functional
dependency X−>Y, X is a Super Key.
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
 Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all attributes of the
relation, So AC will be the candidate key. A or C can’t be derived from any other attribute of the relation, so
there will be only 1 candidate key {AC}.
 Step-2: Prime attributes are those attributes that are part of candidate key {A, C} in this example and others
will be non-prime {B, D, E} in this example.
 Step-3: The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or
composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of candidate key AC),
AC->BE is in 2nd normal form (AC is the candidate key), and B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a superkey nor D is a prime attribute) and in
B->E (neither B is a superkey nor E is a prime attribute); to satisfy 3rd normal form, either the LHS of an FD should be a
superkey or the RHS should be a prime attribute. So the highest normal form of the relation is the 2nd Normal Form.
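
The attribute closure used in Step-1 can be computed mechanically. Below is a minimal sketch in Python (the FD representation is our own, chosen only for this example): it repeatedly applies every FD whose left-hand side is already inside the closure until nothing new is added.

```python
# A minimal sketch of attribute-closure computation for the FD set
# { BC->D, AC->BE, B->E } from the example above.

def closure(attrs, fds):
    """Return the closure of a set of attributes under a set of FDs.
    fds is a list of (lhs, rhs) pairs, each a set of attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs        # apply the FD: add its right-hand side
                changed = True
    return result

fds = [({"B", "C"}, {"D"}),          # BC -> D
       ({"A", "C"}, {"B", "E"}),     # AC -> BE
       ({"B"}, {"E"})]               # B -> E

print(sorted(closure({"A", "C"}, fds)))  # ['A','B','C','D','E'] -> AC is a key
print(sorted(closure({"B", "C"}, fds)))  # ['B','C','D','E']     -> BC is not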

Multivalued Dependency (MVD)
In Database Management Systems (DBMS), multivalued dependency (MVD) deals with complex attribute
relationships in which an attribute may have many independent values while yet depending on another attribute or group
of attributes. It improves database structure and consistency and is essential for data integrity and database normalization.
MVD or multivalued dependency means that for a single value of attribute ‘a’, multiple values of attribute ‘b’ exist. We
write it as,
a --> --> b
It is read as: a multidetermines b (b is multivalued dependent on a). Suppose a person named Geeks is working on 2 projects
Microsoft and Oracle and has 2 hobbies namely Reading and Music. This
can be expressed in a tabular format in the following way.

Example
name project hobby

Geeks MS Reading

Geeks Oracle Music

Geeks MS Music

Geeks Oracle Reading

Project and Hobby are multivalued attributes as they have more than one value for a single person, i.e., Geeks.

When one attribute in a database depends on another attribute and has many independent values, it is said to
have multivalued dependency (MVD). It supports maintaining data accuracy and managing intricate data interactions.

Multi Valued Dependency (MVD)


We can say that multivalued dependency exists if the following conditions are met.
Conditions for MVD
An attribute a is said to multidetermine another attribute b if, in every legal relation r(R), for all pairs of tuples t1 and t2 in r
such that
t1[a] = t2[a]
there exist tuples t3 and t4 in r such that:
t1[a] = t2[a] = t3[a] = t4[a]
t1[b] = t3[b]; t2[b] = t4[b]
t1[rest] = t4[rest]; t2[rest] = t3[rest], where rest = R − (a ∪ b)
Then multivalued (MVD) dependency exists. To check the MVD in given table, we apply the conditions stated above
and we check it with the values in the given table.

Example

Condition-1 for MVD


t1[a] = t2[a] = t3[a] = t4[a]
Finding from table,
t1[a] = t2[a] = t3[a] = t4[a] = Geeks
So, condition 1 is Satisfied.
Condition-2 for MVD
t1[b] = t3[b]
And
t2[b] = t4[b]
Finding from table,
t1[b] = t3[b] = MS
And
t2[b] = t4[b] = Oracle
So, condition 2 is Satisfied.
Condition-3 for MVD
For every remaining attribute c ∈ R − (a ∪ b), where R is the set of attributes in the relational table,
t1[c] = t4[c]
and
t2[c] = t3[c]
Finding from the table (c = hobby),
t1[hobby] = t4[hobby] = Reading
and
t2[hobby] = t3[hobby] = Music
So, condition 3 is satisfied. All conditions are satisfied; therefore,
a --> --> b
According to the table we have,
name --> --> project
and, for
a --> --> c,
we get,
name --> --> hobby
Hence, we know that MVD exists in the above table, and it can be stated by:
name --> --> project and name --> --> hobby
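The conditions above can also be checked mechanically. Here is a minimal sketch in Python (the table values are those of the Geeks example; holds_mvd is a hypothetical helper name, not part of any DBMS):

```python
# A minimal sketch of an MVD check: for every pair of tuples agreeing
# on a, the tuple taking b from one and the rest from the other must
# also be present in the relation.

def holds_mvd(rows, a, b):
    """Check whether a ->-> b holds in rows (a list of dicts)."""
    attrs = set(rows[0])
    rest = attrs - {a, b}
    tuples = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if t1[a] != t2[a]:
                continue
            # Required tuple t3: a and b from t1, the rest from t2.
            t3 = {a: t1[a], b: t1[b], **{c: t2[c] for c in rest}}
            if tuple(sorted(t3.items())) not in tuples:
                return False
    return True

table = [
    {"name": "Geeks", "project": "MS",     "hobby": "Reading"},
    {"name": "Geeks", "project": "Oracle", "hobby": "Music"},
    {"name": "Geeks", "project": "MS",     "hobby": "Music"},
    {"name": "Geeks", "project": "Oracle", "hobby": "Reading"},
]
print(holds_mvd(table, "name", "project"))  # True
print(holds_mvd(table, "name", "hobby"))    # True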

Fourth Normal Form (4NF)
The Fourth Normal Form (4NF) is a level of database normalization where there are no non-trivial multivalued
dependencies other than those in which the determinant is a candidate key. It builds on the first three normal forms
(1NF, 2NF, and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in addition to meeting the requirements
of BCNF, a relation must not contain any non-trivial multivalued dependency whose left-hand side is not a candidate key.
Properties
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any Multi-valued Dependency.
A table with a multivalued dependency violates the normalization standard of the Fourth Normal Form (4NF)
because it creates unnecessary redundancies and can contribute to inconsistent data. To bring this up to 4NF, it is
necessary to break this information into two tables.
Example:
Consider the database of a class that has two relations: R1 contains student ID (SID) and student name
(SNAME), and R2 contains course ID (CID) and course name (CNAME).
Table R1
SID SNAME

S1 A

S2 B

Table R2
CID CNAME

C1 C

C2 D

When their cross product is taken, the resulting relation contains multivalued dependencies.


Table R1 X R2
SID SNAME CID CNAME

S1 A C1 C

S1 A C2 D

S2 B C1 C

S2 B C2 D

Multivalued dependencies (MVD) are:
SID->->CID; SID->->CNAME; SNAME->->CNAME
Join Dependency
Join dependency is a further generalization of multivalued dependency. If the join of R1 and R2 over C is equal to
relation R, then we can say that a join dependency (JD) exists, where R1 and R2 are the decompositions R1(A, B, C) and
R2(C, D) of a given relation R(A, B, C, D). Alternatively, R1 and R2 are a lossless decomposition of R. A JD ⋈ {R1,
R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join decomposition of R. The notation
*(R1, R2, R3, …) is used to indicate that the relations R1, R2, R3, and so on form a JD of R. Let R be a relation schema
and R1, R2, R3, …, Rn be a decomposition of R; r(R) is said to satisfy the join dependency *(R1, R2, …, Rn) if and
only if
r(R) = πR1(r) ⋈ πR2(r) ⋈ … ⋈ πRn(r)

Example:
Table R1
Company Product

C1 Pendrive

C1 mic

C2 speaker

C1 speaker

Company->->Product
Table R2
Agent Company

Aman C1

Aman C2

Mohan C1

Agent->->Company
Table R3
Agent Product

Aman Pendrive

Aman Mic

Aman speaker

Mohan speaker

Agent->->Product
Table R1⋈R2⋈R3
Company Product Agent

C1 Pendrive Aman

C1 mic Aman

C2 speaker Aman

C1 speaker Aman

Fifth Normal Form(5NF)


A relation R is in Fifth Normal Form if and only if every join dependency in R is implied by the candidate
keys of R. A relation decomposed into two or more relations must have the lossless join property, which ensures that no
spurious or extra tuples are generated when the relations are reunited through a natural join.
Properties
A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF.
2. It cannot be further non-loss decomposed (it contains no join dependency not implied by the candidate keys).
Example – Consider the above schema, with a case as “if a company makes a product and an agent is an agent
for that company, then he always sells that product for the company”. Under these circumstances, the ACP table is shown
as:
Table ACP
Agent Company Product

A1 PQR Nut

A1 PQR Bolt

A1 XYZ Nut

A1 XYZ Bolt

A2 PQR Nut

The relation ACP is again decomposed into 3 relations. Now, the natural Join of all three relations will be shown as:
Table R1
Agent Company

A1 PQR

A1 XYZ

A2 PQR

Table R2
Agent Product

A1 Nut

A1 Bolt

A2 Nut

Table R3
Company Product

PQR Nut

PQR Bolt

XYZ Nut

XYZ Bolt

The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of that result (R1 ⋈ R3) and R2
over ‘Agent’ and ‘Product’, will be Table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a lossless join decomposition.
Therefore, the relation is in 5NF as it does not violate the property of lossless join.
Conclusion
 Multivalued dependencies are removed by 4NF, and join dependencies are removed by 5NF.
 The greatest degrees of database normalization, 4NF and 5NF, might not be required for every application.
 Normalizing to 4NF and 5NF might result in more complicated database structures and slower query speed, but it
can also increase data accuracy, dependability, and simplicity.

UNIT-V

Transaction Concept

A transaction in DBMS is a set of logically related operations executed as a single unit. These operations modify
data while maintaining integrity and consistency, and transactions are performed in such a way that concurrent actions
from different users do not corrupt the database. Transfer of money from one account to another in a
bank management system is the best example of a transaction.
A transaction goes through several states during its lifetime. These states describe the current status of
the transaction and how its processing will proceed; they govern the rules which decide the fate of the transaction,
whether it will be committed or aborted. Transaction processing also uses a transaction log.
Transaction States in DBMS
A transaction log is a file maintained by the recovery management component to record all the activities of the
transaction. After the commit is done, the transaction log file is removed.

In DBMS, a transaction passes through various states such as active, partially committed, failed, and aborted.
Understanding these transaction states is crucial for database management and for ensuring the consistency of data.
These are different types of Transaction States :
1. Active State – When the instructions of the transaction are running then the transaction is in active state. If all the
‘read and write’ operations are performed without any error then it goes to the “partially committed state”; if any
instruction fails, it goes to the “failed state”.
2. Partially Committed – After completion of all the read and write operation the changes are made in main memory or
local buffer. If the changes are made permanent on the DataBase then the state will change to “committed state” and in case
of failure it will go to the “failed state”.

3. Failed State – When any instruction of the transaction fails, it goes to the “failed state” or if
failure occurs in making a permanent change of data on Database.
4. Aborted State – After having any type of failure the transaction goes from “failed state” to “aborted
state” and since in previous states, the changes are only made to local buffer or main memory and hence
these changes are deleted or rolled-back.
5. Committed State – It is the state when the changes are made permanent on the Data Base and the
transaction is complete and therefore terminated in the “terminated state”.
6. Terminated State – If there isn’t any roll-back or the transaction comes from the “committed state”,
then the system is consistent and ready for new transaction and the old transaction is terminated.

ACID Properties
ACID properties in DBMS are necessary for maintaining data consistency, integrity, and reliability
while performing transactions in the database. Let’s explore them.
A transaction is a single logical unit of work that accesses and possibly modifies the contents of a
database. Transactions access data using read-and-write operations. To maintain consistency in a database,
before and after the transaction, certain properties are followed. These are called ACID properties.

Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all. There is no
midway i.e. transactions do not occur partially. Each transaction is considered as one unit and either runs
to completion or is not executed at all. It involves the following two operations.

— Abort : If a transaction aborts, changes made to the database are not visible.
— Commit : If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2 : Transfer of 100 from account X to
account Y .
T1 (debit): read(X); X := X - 100; write(X)
T2 (credit): read(Y); Y := Y + 100; write(Y)
If the transaction fails after completion of T1 but before completion of T2 .( say, after write(X)
but before write(Y) ), then the amount has been deducted from X but not added to Y . This results in an
inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure
the correctness of the database state.
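The following is a minimal sketch in plain Python (hypothetical account names and amounts) of this all-or-nothing behaviour: if the credit step fails after the debit step, the whole transfer is rolled back.

```python
# A minimal sketch of atomicity: either both writes happen or neither does.

accounts = {"X": 500, "Y": 200}

def transfer(src, dst, amount, fail_midway=False):
    snapshot = dict(accounts)          # saved state for rollback
    try:
        accounts[src] -= amount        # T1: debit X
        if fail_midway:
            raise RuntimeError("crash between write(X) and write(Y)")
        accounts[dst] += amount        # T2: credit Y
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # rollback: restore the old state
        raise

try:
    transfer("X", "Y", 100, fail_midway=True)
except RuntimeError:
    pass
print(accounts)  # {'X': 500, 'Y': 200} -- unchanged, not {'X': 400, 'Y': 200}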
Consistency:
This means that integrity constraints must be maintained so that the database is consistent before
and after the transaction. It refers to the correctness of a database. Referring to the example above,

The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent . Inconsistency occurs in case T1 completes but T2 fails. As a result,
T is incomplete.

Isolation:
This property ensures that multiple transactions can occur concurrently without leading to the inconsistency
of the database state. Transactions occur independently without interference. Changes occurring in a
particular transaction will not be visible to any other transaction until that particular change in that
transaction is written to memory or has been committed. This property ensures that the execution of
transactions concurrently will result in a state that is equivalent to a state achieved if these were
executed serially in some order. Let X = 500, Y = 500.
Consider two transactions T and T”: T transfers 50 from X to Y, and T” computes the sum X + Y.

Suppose T has been executed till Read(Y) and then T” starts. As a result, interleaving of operations takes
place, due to which T” reads the updated value of X but the old value of Y, and the sum computed by
T”: (X + Y = 450 + 500 = 950)
is thus not consistent with the sum at the end of the transaction:
T: (X + Y = 450 + 550 = 1000)
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take
place in isolation and changes should be visible only after they have been made to the main memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and modifications to
the database are stored in and written to disk and they persist even if a system failure occurs. These updates
now become permanent and are stored in non-volatile memory. The effects of the transaction, thus, are
never lost.
Some important points:
Property Responsibility for maintaining properties

Atomicity Transaction Manager

Consistency Application programmer

Isolation Concurrency Control Manager

Durability Recovery Manager

The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a
database in a way such that each transaction is a group of operations that acts as a single unit, produces
consistent results, acts in isolation from other operations, and updates that it makes are durably stored.
ACID properties are the four key characteristics that define the reliability and consistency of a transaction
in a Database Management System (DBMS). The acronym ACID stands for Atomicity, Consistency,
Isolation, and Durability. Here is a brief description of each of these properties:
1. Atomicity: Atomicity ensures that a transaction is treated as a single, indivisible unit of work.
Either all the operations within the transaction are completed successfully, or none of them are. If
any part of the transaction fails, the entire transaction is rolled back to its original state, ensuring
data consistency and integrity.

2. Consistency: Consistency ensures that a transaction takes the database from one consistent state to
another consistent state. The database is in a consistent state both before and after the transaction is
executed. Constraints, such as unique keys and foreign keys, must be maintained to ensure data
consistency.
3. Isolation: Isolation ensures that multiple transactions can execute concurrently without interfering
with each other. Each transaction must be isolated from other transactions until it is completed.
This isolation prevents dirty reads, non-repeatable reads, and phantom reads.
4. Durability: Durability ensures that once a transaction is committed, its changes are permanent
and will survive any subsequent system failures. The transaction’s changes are saved to the
database permanently, and even if the system crashes, the changes remain intact and can be
recovered.
Overall, ACID properties provide a framework for ensuring data consistency, integrity, and reliability in
DBMS. They ensure that transactions are executed in a reliable and consistent manner, even in the presence
of system failures, network issues, or other problems. These properties make DBMS a reliable and efficient
tool for managing data in modern organizations.
Advantages of ACID Properties in DBMS
1. Data Consistency: ACID properties ensure that the data remains consistent and accurate after
any transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by ensuring that any
changes to the database are permanent and cannot be lost.
3. Concurrency Control: ACID properties help to manage multiple transactions occurring
concurrently by preventing interference between them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the system can recover
the data up to the point of failure or crash.
Disadvantages of ACID Properties in DBMS
1. Performance: The ACID properties can cause a performance overhead in the system, as they
require additional processing to ensure data consistency and integrity.
2. Scalability: The ACID properties may cause scalability issues in large distributed systems where
multiple transactions occur concurrently.
3. Complexity: Implementing the ACID properties can increase the complexity of the system and
require significant expertise and resources.
Overall, the advantages of ACID properties in DBMS outweigh the disadvantages. They provide a
reliable and consistent approach to data management, ensuring data integrity, accuracy, and
reliability. However, in some cases, the overhead of implementing ACID properties can cause
performance and scalability issues. Therefore, it’s important to balance the benefits of ACID
properties against the specific needs and requirements of the system.
Concurrency Control
Concurrency control is a very important concept of DBMS which ensures the simultaneous
execution or manipulation of data by several processes or user without resulting in data inconsistency.
Concurrency Control deals with interleaved execution of more than one transaction.
What is Transaction?

A transaction is a collection of operations that performs a single logical function in a database application.
Each transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate
any database consistency constraints. That is, if the database was consistent when a transaction started, the
database must be consistent when the transaction successfully terminates. However, during the execution of a
transaction, it may be necessary temporarily to allow inconsistency, since either the debit of A or the credit
of B must be done before the other. This temporary inconsistency, although necessary, may lead to difficulty
if a failure occurs.
It is the programmer’s responsibility to define properly the various transactions, so that each
preserves the consistency of the database. For example, the transaction to transfer funds from the account
of department A to the account of department B could be defined to be composed of two separate
programs: one that debits account A, and another that credits account B. The execution of these two
programs one after the other will indeed preserve consistency. However, each program by itself does not
transform the database from a consistent state to a new consistent state. Thus, those programs are not
transactions.
The concept of a transaction has been applied broadly in database systems and applications.
While the initial use of transactions was in financial applications, the concept is now used in real-time
applications in telecommunication, as well as in the management of long-duration activities such as
product design or administrative workflows.
A set of logically related operations is known as a transaction. The main operations of a transaction are:
 Read(A): Read operations Read(A) or R(A) reads the value of A from the database and stores
it in a buffer in the main memory.
 Write (A): Write operation Write(A) or W(A) writes the value back to the database from the
buffer.
Let us take a debit transaction from an account that consists of the following operations:
1. R(A);
2. A = A - 1000;
3. W(A);
Assume A’s value before starting the transaction is 5000.
 The first operation reads the value of A from the database and stores it in a buffer.
 The second operation will decrease its value by 1000, so the buffer will contain 4000.
 The third operation will write the value from the buffer to the database, so A’s final value will be
4000.
But it may also be possible that the transaction may fail after executing some of its operations. The failure
can be because of hardware, software or power, etc. For example, if the debit transaction discussed above
fails after executing operation 2, the value of A will remain 5000 in the database which is not acceptable
by the bank. To avoid this, Database has two important operations:
 Commit: After all instructions of a transaction are successfully executed, the changes made
by a transaction are made permanent in the database.
 Rollback: If a transaction is not able to execute all operations successfully, all the changes made
by a transaction are undone.
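As a concrete illustration of commit and rollback, here is a minimal sketch using Python's built-in sqlite3 module (the account table and values are hypothetical):

```python
# A minimal sketch of the commit/rollback operations for the debit
# transaction described above: R(A), A = A - 1000, W(A).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 1000 WHERE id = 'A'")
    conn.commit()                      # make the change permanent
except sqlite3.Error:
    conn.rollback()                    # undo all changes on failure

print(conn.execute("SELECT balance FROM account WHERE id = 'A'").fetchone())
# (4000,)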
Properties of a Transaction
Atomicity: As a transaction is a set of logically related operations, either all of them should be
executed or none. A debit transaction discussed above should either execute all three

operations or none. If the debit transaction fails after executing operations 1 and 2 then its new value of
4000 will not be updated in the database which leads to inconsistency.
Consistency: If operations of debit and credit transactions on the same account are executed concurrently,
it may leave the database in an inconsistent state.
 For Example, with T1 (debit of Rs. 1000 from A) and T2 (credit of 500 to A) executing
concurrently, the database reaches an inconsistent state.
 Let us assume the Account balance of A is Rs. 5000. T1 reads A(5000) and stores the value in its
local buffer space. Then T2 reads A(5000) and also stores the value in its local buffer space.
 T1 performs A=A-1000 (5000-1000=4000) and 4000 is stored in T1 buffer space. Then T2 performs
A=A+500 (5000+500=5500) and 5500 is stored in the T2 buffer space. T1 writes the value from its
buffer back to the database.
 A’s value is updated to 4000 in the database and then T2 writes the value from its buffer back to
the database. A’s value is updated to 5500 which shows that the effect of the debit transaction is
lost and the database has become inconsistent.
 To maintain consistency of the database, we need concurrency control protocols which will be
discussed in the next article. The operations of T1 and T2 with their buffers and database have
been shown in Table 1.
T1              T1’s buffer space   T2              T2’s buffer space   Database

                                                                        A=5000

R(A);           A=5000                                                  A=5000

                A=5000              R(A);           A=5000              A=5000

A=A-1000;       A=4000                              A=5000              A=5000

                A=4000              A=A+500;        A=5500              A=5000

W(A);           A=4000                              A=5500              A=4000

                                    W(A);           A=5500              A=5500

Isolation: The result of a transaction should not be visible to others before the transaction is committed. For
example, let us assume that A’s balance is Rs. 5000 and T1 debits Rs. 1000 from A. A’s new balance will
be 4000. If T2 credits Rs. 500 to A’s new balance, A will become 4500, and after this T1 fails. Then we
have to roll back T2 as well because it is using the value produced by T1. So transaction results are not
made visible to other transactions before it commits.
Durable: Once the database has committed a transaction, the changes made by the transaction should be
permanent. e.g.; If a person has credited $500000 to his account, the bank can’t say that the update has
been lost. To avoid this problem, multiple copies of the database are stored at different locations.
What is a Schedule?
A schedule is a series of operations from one or more transactions. A schedule can be of two types:
Serial Schedule: When one transaction completely executes before starting another transaction, the
schedule is called a serial schedule. A serial schedule is always consistent. e.g.; If a schedule S has debit
transaction T1 and credit transaction T2, possible serial schedules are T1 followed by T2 (T1->T2) or T2
followed by T1 ((T2->T1). A serial schedule has low throughput and less resource utilization.
Concurrent Schedule: When operations of a transaction are interleaved with operations of other
transactions of a schedule, the schedule is called a Concurrent schedule. e.g.; the Schedule of debit and
credit transactions shown in Table 1 is concurrent. But concurrency can lead to inconsistency in the
database. The above example of a concurrent schedule is also inconsistent.
Difference between Serial Schedule and Serializable Schedule
Serial Schedule Serializable Schedule

In a serial schedule, transactions are          In a serializable schedule, transactions are
executed one after another.                     executed concurrently.

Serial schedules are less efficient.            Serializable schedules are more efficient.

In a serial schedule, only one transaction      In a serializable schedule, multiple transactions
executes at a time.                             can execute at a time.

A serial schedule takes more time for           In a serializable schedule, execution is faster.
execution.

Concurrency Control in DBMS
 Executing a single transaction at a time will increase the waiting time of the other transactions
which may result in delay in the overall execution. Hence for increasing the overall throughput
and efficiency of the system, several transactions are executed.
 Concurrency control is a very important concept of DBMS which ensures the simultaneous
execution or manipulation of data by several processes or user without resulting in data
inconsistency.
 Concurrency control provides a procedure that is able to control concurrent execution of the
operations in the database.
 The fundamental goal of database concurrency control is to ensure that concurrent execution of
transactions does not result in a loss of database consistency. The concept of serializability can be
used to achieve this goal, since all serializable schedules preserve consistency of the database.
However, not all schedules that preserve consistency of the database are serializable.
 In general it is not possible to perform an automatic analysis of low-level operations by
transactions and check their effect on database consistency constraints. However, there are simpler
techniques. One is to use the database consistency constraints as the basis for a split of the
database into subdatabases on which concurrency can be managed separately.
 Another is to treat some operations besides read and write as fundamental low-level
operations and to extend concurrency control to deal with them.
Concurrency Control Problems
There are several problems that arise when numerous transactions are executed simultaneously in a
random manner. The database transaction consist of two major operations “Read” and “Write”. It is very
important to manage these operations in the concurrent execution of the transactions in order to maintain
the consistency of the data.
Dirty Read Problem(Write-Read conflict)
The dirty read problem occurs when one transaction updates an item, but due to some unexpected event that
transaction fails, and before the transaction performs rollback, some other transaction reads the updated
value. This creates an inconsistency in the database. The dirty read problem comes under the scenario of a Write-
Read conflict between the transactions in the database.
1. The dirty read problem can be illustrated with the below scenario between two
transactions T1 and T2.
2. Transaction T1 modifies a database record without committing the changes.
3. T2 reads the uncommitted data changed by T1.
4. T1 performs rollback.
5. T2 has already read the uncommitted data of T1 which is no longer valid, thus creating
inconsistency in the database.
Lost Update Problem
Lost update problem occurs when two or more transactions modify the same data, resulting in the update
being overwritten or lost by another transaction. The lost update problem can be illustrated with the below
scenario between two transactions T1 and T2.
1. T1 reads the value of an item from the database.
2. T2 starts and reads the same database item.
3. T1 updates the value of that data and performs a commit.
4. T2 updates the same data item based on its initial read and performs commit.

5. This results in the modification of T1 gets lost by the T2’s write which causes a lost update problem
in the database.
Concurrency Control Protocols
Concurrency control protocols are the set of rules which are maintained in order to solve the concurrency
control problems in the database. It ensures that the concurrent transactions can execute properly while
maintaining the database consistency. The concurrent execution of a transaction is provided with atomicity,
consistency, isolation, durability, and serializability via the concurrency control protocols.
 Locked based concurrency control protocol
 Timestamp based concurrency control protocol
Locked based Protocol
In locked based protocol, each transaction needs to acquire locks before they start accessing or modifying
the data items. There are two types of locks used in databases.
 Shared Lock : Shared lock is also known as read lock which allows multiple transactions to read
the data simultaneously. The transaction which is holding a shared lock can only read the data
item but it can not modify the data item.
 Exclusive Lock : Exclusive lock is also known as the write lock. Exclusive lock allows a
transaction to update a data item. Only one transaction can hold the exclusive lock on a data item
at a time. While a transaction is holding an exclusive lock on a data item, no other transaction is
allowed to acquire a shared/exclusive lock on the same data item.
There are two kind of lock based protocol mostly used in database:
 Two Phase Locking Protocol : Two phase locking is a widely used technique which ensures strict
ordering of lock acquisition and release. Two phase locking protocol works in two phases.
o Growing Phase : In this phase, the transaction starts acquiring locks before
performing any modification on the data items. Once a transaction acquires a lock,
that lock cannot be released until the transaction stops acquiring new locks.
o Shrinking Phase : In this phase, the transaction releases all the acquired locks once it
performs all the modifications on the data item. Once the transaction starts releasing
the locks, it can not acquire any locks further.
 Strict Two Phase Locking Protocol : It is almost similar to the two phase locking protocol; the
only difference is that in two phase locking the transaction can release its locks before it commits,
but in strict two phase locking the transactions are allowed to release their locks only
when they commit. A minimal sketch of the basic two-phase discipline follows.
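Below is a minimal sketch in Python of the two-phase discipline itself (the class and method names are hypothetical); it only enforces the growing/shrinking rule and is not a full lock manager:

```python
# A minimal sketch of two-phase locking: locks may only be acquired in
# the growing phase; once the first lock is released, the transaction
# enters the shrinking phase and may not acquire any more.
import threading

class TwoPhaseTransaction:
    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared dict: item -> threading.Lock
        self.held = []
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.lock_table[item].acquire()   # growing phase
        self.held.append(item)

    def release_all(self):
        self.shrinking = True             # shrinking phase begins
        for item in reversed(self.held):
            self.lock_table[item].release()
        self.held.clear()

locks = {"A": threading.Lock(), "B": threading.Lock()}
t1 = TwoPhaseTransaction(locks)
t1.lock("A")
t1.lock("B")        # growing phase: allowed
t1.release_all()    # shrinking phase
# t1.lock("A")      # would raise: no new locks after releasing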
Timestamp based Protocol
 In this protocol each transaction has a timestamp attached to it. Timestamp is nothing but the
time in which a transaction enters into the system.
 The conflicting pairs of operations can be resolved by the timestamp ordering protocol through the
utilization of the timestamp values of the transactions. Therefore, guaranteeing that the transactions
take place in the correct order.
Advantages of Concurrency
In general, concurrency means, that more than one transaction can work on a system. The advantages
of a concurrent system are:
 Waiting Time: The time a process spends in the ready state before it gets the system resources to
execute is called waiting time. Concurrency leads to less waiting time.
 Response Time: The time taken to get the first response from the CPU is called
response time. Concurrency leads to less response time.
 Resource Utilization: The extent to which the system’s resources are kept busy is called
resource utilization. Multiple transactions can run in parallel in a system, so concurrency leads to
more resource utilization.
 Efficiency: The amount of output produced in comparison to the given input is called
efficiency. Concurrency leads to more efficiency.
Disadvantages of Concurrency
 Overhead: Implementing concurrency control requires additional overhead, such as acquiring and
releasing locks on database objects. This overhead can lead to slower performance and increased
resource consumption, particularly in systems with high levels of concurrency.
 Deadlocks: Deadlocks can occur when two or more transactions are waiting for each other to
release resources, causing a circular dependency that can prevent any of the transactions from
completing. Deadlocks can be difficult to detect and resolve, and can result in reduced throughput
and increased latency.
 Reduced concurrency: Concurrency control can limit the number of users or applications that can
access the database simultaneously. This can lead to reduced concurrency and slower performance
in systems with high levels of concurrency.
 Complexity: Implementing concurrency control can be complex, particularly in distributed systems
or in systems with complex transactional logic. This complexity can lead to increased development
and maintenance costs.
 Inconsistency: In some cases, concurrency control can lead to inconsistencies in the database.
For example, a transaction that is rolled back may leave the database in an inconsistent state, or
a long-running transaction may cause other transactions to wait for extended periods, leading to
data staleness and reduced accuracy.

Serializability
Serializability is a concept that affects the DBMS deeply: it is the criterion by which we judge whether the
concurrent execution of transactions is correct. A properly designed schedule keeps the database consistent
while still providing high performance to our applications. We explain the concept below with some examples.
What is a serializable schedule, and what is it used for?
If a non-serial schedule can be transformed into its corresponding serial schedule, it is said to be
serializable. Simply said, a non-serial schedule is referred to as a serializable schedule if it yields the
same results as a serial schedule.
Non-serial Schedule
A schedule where the transactions are overlapping or switching places. As they are used to carry out
actual database operations, multiple transactions are running at once. It’s possible that these transactions
are focusing on the same data set. Therefore, it is crucial that non-serial schedules can be serialized in order
for our database to be consistent both before and after the transactions are executed.

Example:
Transaction-1 Transaction-2

R(a)

W(a)

R(b)

W(b)

R(b)

R(a)

W(b)

W(a)

We can observe that Transaction-2 begins its execution before Transaction-1 is finished, and they are both
working on the same data, i.e., “a” and “b”, interchangeably. Where “R”-Read, “W”-Write
Serializability testing
We can utilize the Serialization Graph, or Precedence Graph, to examine a schedule’s serializability. A
serialization graph is a directed graph whose nodes are the transactions of the schedule.

Precedence Graph

It can be described as a graph G(V, E) with vertices V = “V1, V2, V3, …, Vn” and directed edges E =
“E1, E2, E3, …, En”. An edge is drawn for each pair of conflicting operations (READ or WRITE)
performed by different transactions on the same data item: an edge Ti -> Tj means transaction Ti performs
a conflicting read or write before transaction Tj does.
Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.

Types of Serializability – Conflict & View

1. Conflict serializability
Conflict serializability is a subset of serializability that focuses on operations that conflict with one another:
a schedule is conflict serializable if it can be transformed into a serial schedule by swapping only
non-conflicting operations, so the relative order of every conflicting pair is preserved. Two operations
conflict when all of the following restrictions hold:
1. The two operations belong to different transactions.
2. Both operations access the identical data item.
3. At least one of the two operations is a write operation.
Example
Three transactions t1, t2, and t3 are active on a schedule “S” at once. Let’s create its precedence
graph.
Transaction – 1 (t1) Transaction – 2 (t2) Transaction – 3 (t3)

R(a)

R(b)

R(b)

W(b)

W(a)

W(a)

Transaction – 1 (t1) Transaction – 2 (t2) Transaction – 3 (t3)

R(a)

W(a)

It is a conflict serializable schedule because the precedence graph (a DAG) has no cycles.
We can also determine an equivalent serial order of the transactions from the graph.

DAG of transactions

As there is no incoming edge on transaction t1, t1 will be executed first. t3 will run second
because it only depends on t1. Due to its dependence on both t1 and t3, t2 will finally be executed.
Therefore, the serial schedule’s equivalent order is: t1 –> t3 –> t2. A sketch of this graph-based test follows.
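
Here is a minimal sketch in Python of the precedence-graph test (the helper names and the sample schedule are hypothetical; the schedule is merely chosen to be consistent with the order t1 –> t3 –> t2 derived above):

```python
# A minimal sketch of serializability testing: add an edge Ti -> Tj for
# every conflicting pair (different transactions, same item, at least
# one write, Ti's operation first), then check the graph for cycles.

def precedence_graph(schedule):
    """schedule: time-ordered list of (txn, op, item), op in {'R', 'W'}."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, ()))
    return any(visit(u, frozenset()) for u in graph)

# A hypothetical schedule over items a and b, consistent with t1 -> t3 -> t2.
s = [("t1", "R", "a"), ("t1", "W", "a"), ("t3", "R", "a"),
     ("t3", "W", "b"), ("t2", "R", "b"), ("t2", "W", "b")]
edges = precedence_graph(s)
print(sorted(edges))     # [('t1', 't3'), ('t3', 't2')]
print(has_cycle(edges))  # False -> conflict serializable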
Note: A schedule is unquestionably consistent if it is conflict serializable. A schedule that is not conflict
serializable, on the other hand, might or might not be serializable. We employ the idea of View
Serializability to further examine its serial behavior.
2. View Serializability
View serializability is a weaker form of serializability in which each transaction must produce the same
results as it would under some properly sequential (serial) execution. View serializability, in contrast to
conflict serializability, is concerned only with avoiding database inconsistency, so it admits some schedules
that conflict serializability rejects.
To further understand view serializability in DBMS, consider two schedules S1 and S2 built from the same
transactions (say T1 and T2). The two schedules must satisfy three conditions in order to be considered
equivalent. These three conditions are listed below.
1. The first prerequisite is that the same set of transactions appears in both schedules. The
schedules are not equal to one another if one schedule commits a transaction that does not match a
transaction of the other schedule.
2. The second requirement is that the schedules must not differ in their write operations. We say
that two schedules are not similar if schedule S1 has two write operations on a data item whereas
schedule S2 has only one. The number of write operations must be the same in both schedules;
however, there is no issue if the number of read operations differs.
3. The last requirement is that the two schedules must not conflict in the execution order of the
writes on any single data item. Assume, for instance, that data item A is written last by transaction T1
in schedule S1 but by transaction T2 in schedule S2; the schedules are then not equal. The schedules
are equivalent only if the same transaction performs the final write on each data item in both.
What is view equivalency?
Schedules (S1 and S2) must satisfy these two requirements in order to be viewed as equivalent:
1. The same piece of data must be read for the first time. For instance, if transaction t1 is reading
“A” from the database in schedule S1, then t1 must also read A in schedule S2.
2. The same piece of data must be used for the final write. As an illustration, if transaction t1 updated
A last in S1, it should also conduct final write in S2.
3. The intermediate reads must follow suit. As an illustration, if in S1 t1 is reading A, and t2 updates
A, then in S2 t1 should read A, and t2 should update A.
View Serializability refers to the process of determining whether a schedule’s views are equivalent.
Example
We have a schedule “S” with two concurrently running transactions, “t1” and “t2.”
Schedule – S:
Transaction-1 (t1) Transaction-2 (t2)

R(a)

W(a)

R(a)

W(a)

R(b)

W(b)

R(b)

W(b)

By switching between both transactions’ mid-read-write operations, let’s create its view equivalent
schedule (S’).
Schedule – S’:

Transaction-1 (t1) Transaction-2 (t2)

R(a)

W(a)

R(b)

W(b)

R(a)

W(a)

R(b)

W(b)

It is a view serializable schedule since a view equivalent serial schedule is possible.

Note: A conflict serializable schedule is always view serializable, but vice versa is not always true.

Advantages of Serializability
1. Execution is predictable: In a serializable execution, concurrently running transactions behave as
if they were running one at a time, so the DBMS doesn’t include any surprises: no data loss or
corruption occurs, and all variables are updated as intended.
2. DBMS executes each thread independently, making it much simpler to understand and
troubleshoot each database thread. This can greatly simplify the debugging process. The
concurrent process is therefore not a concern for us.
3. Lower Costs: The cost of the hardware required for the efficient operation of the database can be
decreased with the aid of the serializable property. It may also lower the price of developing the
software.
4. Increased Performance: Since serializable executions provide developers the opportunity to
optimize their code for performance, they occasionally outperform non-serializable equivalents.

Recoverability
Recoverability is a property of database systems that ensures that, in the event of a failure or
error, the system can recover the database to a consistent state. Recoverability guarantees that all
committed transactions are durable and that their effects are permanently stored in the database, while the
effects of uncommitted transactions are undone to maintain data consistency.

The recoverability property is enforced through the use of transaction logs, which record all
changes made to the database during transaction processing. When a failure occurs, the system uses the
log to recover the database to a consistent state, which involves either undoing the effects of
uncommitted transactions or redoing the effects of committed transactions.
There are several levels of recoverability that can be supported by a database system:
No-undo logging: This level of recoverability only guarantees that committed transactions are durable, but
does not provide the ability to undo the effects of uncommitted transactions.
Undo logging: This level of recoverability provides the ability to undo the effects of uncommitted
transactions but may result in the loss of updates made by committed transactions that occur after the
failed transaction.
Redo logging: This level of recoverability provides the ability to redo the effects of committed
transactions, ensuring that all committed updates are durable and can be recovered in the event of failure.
Undo-redo logging: This level of recoverability provides both undo and redo capabilities, ensuring that
the system can recover to a consistent state regardless of whether a transaction has been committed or
not.
In addition to these levels of recoverability, database systems may also use techniques such as
checkpointing and shadow paging to improve recovery performance and reduce the overhead associated
with logging.
Overall, recoverability is a crucial property of database systems, as it ensures that data is
consistent and durable even in the event of failures or errors. It is important for database administrators
to understand the level of recoverability provided by their system and to configure it appropriately to
meet their application’s requirements.
Recoverable Schedules:
 Schedules in which transactions commit only after all transactions whose changes they read commit
are called recoverable schedules. In other words, if some transaction Tj is reading value updated or
written by some other transaction Ti, then the commit of Tj must occur after the commit of Ti.
Example 1:
S1: R1(x), W1(x), R2(x), R1(y), R2(y),
W2(x), W1(y), C1, C2;
The given schedule follows the order Ti -> Tj => C1 -> C2. Transaction T2 reads the value of x written by
T1 (W1(x) appears before R2(x)), and transaction T1 commits before T2 commits; that is, the transaction
whose update was read commits first. Hence the given schedule is recoverable.
Example 2: Consider the following schedule involving two transactions T1 and T2.
T1 T2

R(A)

W(A)

T1 T2

W(A)

R(A)

commit

commit

This is a recoverable schedule since T1 commits before T2, that makes the value read by T2
correct.
Irrecoverable Schedule: Consider a schedule with two transactions in which T1 reads and writes A,
and that value is read and written by T2; T2 then commits (schedule: T1: R(A); T1: W(A); T2: R(A);
T2: W(A); T2: commit). But later on, T1 fails, so we have to roll back T1. Since T2 has read the value
written by T1, it should also be rolled back, but we have already committed it. So this is an irrecoverable
schedule. When Tj is reading the value updated by Ti and Tj is committed before the commit of Ti,
the schedule will be irrecoverable.
Recoverable with Cascading Rollback: Consider a schedule with two transactions in which T1 reads and
writes A and that value is read and written by T2, neither having committed yet (schedule: T1: R(A);
T1: W(A); T2: R(A); T2: W(A)). Later on, T1 fails, so we have to roll back T1. Since T2 has read the value
written by T1, it should also be rolled back; as it has not committed, we can roll back T2 as well. So it is
recoverable with cascading rollback. Therefore, if Tj is reading a value updated by Ti and the commit of Tj
is delayed till the commit of Ti, the schedule is called recoverable with cascading rollback.

Cascadeless Recoverable Rollback: Consider a schedule with two transactions in which T1 reads and
writes A and commits, and only then is that value read by T2 (schedule: T1: R(A); T1: W(A); T1: commit;
T2: R(A)). If T1 fails before its commit, no other transaction has read its value, so there is no need to roll
back any other transaction. So this is a cascadeless recoverable schedule. Thus, if Tj reads a value updated
by Ti only after Ti is committed, the schedule will be cascadeless recoverable. A sketch of a recoverability
check follows.
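Below is a minimal sketch in Python of such a check (the function name and schedule encoding are our own): it finds every reads-from pair and requires the writing transaction to commit before the reading one.

```python
# A minimal sketch of a recoverability check: whenever Tj reads an item
# last written by a different transaction Ti, Ti must commit before Tj.

def is_recoverable(schedule):
    """schedule: time-ordered list of (txn, op, item);
    op in {'R', 'W', 'C'}, with item None for commits."""
    commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == "C"}
    last_writer = {}
    for t, op, item in schedule:
        if op == "W":
            last_writer[item] = t
        elif op == "R" and last_writer.get(item, t) != t:
            writer = last_writer[item]
            # t reads from writer: writer must commit before t commits.
            if commit_pos.get(writer, float("inf")) > commit_pos.get(t, float("inf")):
                return False
    return True

# Recoverable: T2 reads A written by T1, and T1 commits first.
s_ok  = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
         ("T1","C",None), ("T2","C",None)]
# Irrecoverable: T2 reads T1's write but commits before T1 does.
s_bad = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
         ("T2","C",None), ("T1","C",None)]
print(is_recoverable(s_ok), is_recoverable(s_bad))  # True False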

Implementation of Isolation
The levels of transaction isolation in DBMS determine how concurrently running transactions
behave and, therefore, how data consistency is traded off against performance. There are four basic
levels (Read Uncommitted, Read Committed, Repeatable Read, and Serializable) that provide different
degrees of data protection, from fast access with possible inconsistency to strict accuracy at the cost of
performance. Choosing the right one depends on whether the need is speed or data integrity.
What is the Transaction Isolation Level?
In a database management system, transaction isolation levels define the degree to which the
operations in one transaction are isolated from the operations of other concurrent transactions. In other
words, it defines how and when the changes made by one transaction are visible to others to assure data
consistency and integrity.

As we know, to maintain consistency in a database, it follows ACID properties. Among these four
properties (Atomicity, Consistency, Isolation, and Durability) Isolation determines how transaction
integrity is visible to other users and systems. It means that a transaction should take place in a system in
such a way that it is the only transaction that is accessing the resources in a database system.
Isolation levels define the degree to which a transaction must be isolated from the data modifications made
by any other transaction in the database system. A transaction isolation level is defined by the following
phenomena:
 Dirty Read – A Dirty read is a situation when a transaction reads data that has not yet been
committed. For example, Let’s say transaction 1 updates a row and leaves it uncommitted,
meanwhile, Transaction 2 reads the updated row. If transaction 1 rolls back the change, transaction
2 will have read data that is considered never to have existed.
 Non Repeatable read – A non-repeatable read occurs when a transaction reads the same row twice
and gets a different value each time. For example, suppose transaction T1 reads data. Due to
concurrency, another transaction T2 updates the same data and commits. Now if transaction T1
rereads the same data, it will retrieve a different value.
 Phantom Read – Phantom Read occurs when two same queries are executed, but the rows
retrieved by the two, are different. For example, suppose transaction T1 retrieves a set of rows that
satisfy some search criteria. Now, Transaction T2 generates some new rows that match the search
criteria for Transaction T1. If transaction T1 re-executes the statement that reads the rows, it gets a
different set of rows this time.
Based on these phenomena, The SQL standard defines four isolation levels:
1. Read Uncommitted – Read Uncommitted is the lowest isolation level. In this level, one
transaction may read not yet committed changes made by other transactions, thereby allowing
dirty reads. At this level, transactions are not isolated from each other.
2. Read Committed – This isolation level guarantees that any data read is committed at the
moment it is read. Thus it does not allow dirty read. The transaction holds a read or write lock on
the current row, and thus prevents other transactions from reading, updating, or deleting it.
3. Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on rows it updates or deletes. Since other transactions cannot read, update, or delete these rows, it avoids non-repeatable reads (though phantom reads are still possible).
4. Serializable – This is the highest isolation level. A serializable execution of concurrently executing transactions is guaranteed to produce the same result as some serial (one-after-another) execution of those transactions.
The table below summarizes the relationship between isolation levels and the read phenomena (a lower level permits more phenomena):

Isolation Level      Dirty Read     Non-Repeatable Read   Phantom Read
Read Uncommitted     possible       possible              possible
Read Committed       not possible   possible              possible
Repeatable Read      not possible   not possible          possible
Serializable         not possible   not possible          not possible
Note that anomaly serializability is not the same as true serializability: freedom from all three phenomena is necessary, but not sufficient, for a schedule to be serializable. Transaction isolation levels are used in
database management systems (DBMS) to control the level of interaction between concurrent transactions.
The four standard isolation levels are:
1. Read Uncommitted: This is the lowest level of isolation where a transaction can see
uncommitted changes made by other transactions. This can result in dirty reads, non-
repeatable reads, and phantom reads.
2. Read Committed: In this isolation level, a transaction can only see changes made by other
committed transactions. This eliminates dirty reads but can still result in non-repeatable reads and
phantom reads.
3. Repeatable Read: This isolation level guarantees that a transaction will see the same data
throughout its duration, even if other transactions commit changes to the data. However, phantom
reads are still possible.
4. Serializable: This is the highest isolation level where a transaction is executed as if it were the
only transaction in the system. All transactions must be executed sequentially, which ensures that
there are no dirty reads, non-repeatable reads, or phantom reads.
The choice of isolation level depends on the specific requirements of the application. Higher isolation levels
offer stronger data consistency but can also result in longer lock times and increased contention, leading to
decreased concurrency and performance. Lower isolation levels provide more concurrency but can result
in data inconsistencies.
In addition to the standard isolation levels, some DBMS may also support additional custom isolation levels
or features such as snapshot isolation and multi-version concurrency control (MVCC) that provide
alternative solutions to the problems addressed by the standard isolation levels.
Advantages of Transaction Isolation Levels
 Improved concurrency: Transaction isolation levels can improve concurrency by allowing
multiple transactions to run concurrently without interfering with each other.
 Control over data consistency: Isolation levels provide control over the level of data
consistency required by a particular application.
 Reduced data anomalies: The use of isolation levels can reduce data anomalies such as dirty
reads, non-repeatable reads, and phantom reads.
 Flexibility: The use of different isolation levels provides flexibility in designing
applications that require different levels of data consistency.
Disadvantages of Transaction Isolation Levels
 Increased overhead: The use of isolation levels can increase overhead because the database
management system must perform additional checks and acquire more locks.
 Decreased concurrency: Some isolation levels, such as Serializable, can decrease concurrency
by requiring transactions to acquire more locks, which can lead to blocking.
 Limited support: Not all database management systems support all isolation levels, which can
limit the portability of applications across different systems.
 Complexity: The use of different isolation levels can add complexity to the design of
database applications, making them more difficult to implement and maintain.

Lock Based Concurrency Control Protocol in DBMS
In a database management system (DBMS), lock-based concurrency control is used to control the access of multiple transactions to the same data item. This protocol helps to maintain data consistency and integrity across multiple users.
In this protocol, transactions acquire locks on data items to control their access and prevent conflicts between concurrent transactions. This section looks at lock-based protocols in detail.
Lock Based Protocols
A lock is a variable associated with a data item that describes the status of the data item to possible
operations that can be applied to it. They synchronize the access by concurrent transactions to the database
items. It is required in this protocol that all the data items must be accessed in a mutually exclusive manner.
Let me introduce you to two common locks that are used and some terminology followed in this protocol.
Types of Lock
1. Shared Lock (S): Shared Lock is also known as Read-only lock. As the name suggests it can be
shared between transactions because while holding this lock the transaction does not have the
permission to update data on the data item. S-lock is requested using lock-S instruction.
2. Exclusive Lock (X): The data item can be both read and written. This lock is exclusive and cannot be held simultaneously with any other lock on the same data item. An X-lock is requested using the lock-X instruction.
Lock Compatibility Matrix
A transaction may be granted a lock on an item if the requested lock is compatible with locks already held
on the item by other transactions. Any number of transactions can hold shared locks on an item, but if any
transaction holds an exclusive(X) on the item no other transaction may hold any lock on the item. If a lock
cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other
transactions have been released. Then the lock is granted.

Lock Compatibility Matrix
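The matrix reduces to a simple grant test. Below is a minimal Python sketch of how a lock manager might consult it; the function can_grant and the lock-mode strings are assumptions for illustration, not any specific DBMS's API.

# Compatibility matrix: S is compatible only with S; X with nothing.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_locks):
    """Grant `requested` ('S' or 'X') only if it is compatible with
    every lock mode currently held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

print(can_grant("S", ["S", "S"]))  # True  - shared locks coexist
print(can_grant("X", ["S"]))       # False - writer must wait for readers
print(can_grant("S", ["X"]))       # False - readers wait for the writer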

Types of Lock-Based Protocols


1. Simplistic Lock Protocol
It is the simplest method for locking data during a transaction. Simple lock-based protocols enable all
transactions to obtain a lock on the data before inserting, deleting, or updating it. It will unlock the data
item once the transaction is completed.

10
6
2. Pre-Claiming Lock Protocol
Pre-claiming Lock Protocols assess transactions to determine which data elements require locks. Before
executing the transaction, it asks the DBMS for a lock on all of the data elements. If all locks are given,
this protocol allows the transaction to start. When the transaction is finished, it releases all its locks. If all of the locks are not granted, the transaction is rolled back and waits until all of the locks are granted.
3. Two-phase locking (2PL)
A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking can be done
in two phases
 Growing Phase: New locks on data items may be acquired but none can be released.
 Shrinking Phase: Existing locks may be released but no new locks can be acquired. For more detail, refer to the published article Two-phase locking (2PL).
4. Strict Two-Phase Locking Protocol
Strict Two-Phase Locking requires that, in addition to the 2-PL rules, all exclusive (X) locks held by the transaction are not released until after the transaction commits. For more details, refer to the published article Strict Two-Phase Locking Protocol.
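As a rough illustration of the two phases, the following Python sketch enforces the 2-PL rule that no lock may be acquired once the first lock has been released. The class and method names are invented for this example.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False       # False = still in the growing phase

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: acquire after first release")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True        # first release ends the growing phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("A"); t.acquire("B")       # growing phase
t.release("A")                       # shrinking phase begins
try:
    t.acquire("C")                   # not allowed under 2-PL
except RuntimeError as e:
    print(e)                         # 2PL violated: acquire after first release

Under strict 2-PL, release of exclusive locks would additionally be deferred until the transaction commits.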
Upgrade / Downgrade Locks
A transaction that holds a lock on an item A is allowed, under certain conditions, to change the lock from one mode to another.
Upgrade: S(A) can be upgraded to X(A) if Ti is the only transaction holding the S-lock on element A.
Downgrade: X(A) may be downgraded to S(A) when we no longer need to write data item A. As we were already holding the X-lock on A, no conditions need to be checked.
So, by now we are introduced to the types of locks and how to apply them. But simply applying locks is not enough: if locks alone could avoid all our problems, life would be simple! If you have studied process synchronization in operating systems, you will be familiar with two persistent problems: starvation and deadlock. We discuss them shortly; the point is that locks must follow a set of protocols to avoid such undesirable situations. Two-Phase Locking (2-PL) uses the concept of locks to guarantee serializability; with simple locking alone, we may not always produce serializable results, and we may run into deadlock, as shown next.
Problem With Simple Locking
Consider the partial schedule:

S.No    T1              T2
1       lock-X(B)
2       read(B)
3       B := B - 50
4       write(B)
5                       lock-S(A)
6                       read(A)
7                       lock-S(B)
8       lock-X(A)
9       ......          ......

1. Deadlock
Consider the above execution phase. T1 holds an exclusive lock on B, and T2 holds a shared lock on A. At statement 7, T2 requests a lock on B, while at statement 8, T1 requests a lock on A. As you may notice, this creates a deadlock, as neither transaction can proceed with its execution.

Deadlock

2. Starvation
Starvation is also possible if concurrency control manager is badly designed. For example: A transaction
may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an
S-lock on the same item. This may be avoided if the concurrency control manager is properly designed.

Timestamp based Concurrency Control


Timestamp-based concurrency control is a method used in database systems to ensure that
transactions are executed safely and consistently without conflicts, even when multiple transactions are
being processed simultaneously. This approach relies on timestamps to manage and coordinate the
execution order of transactions. Refer to the timestamp of a transaction T as TS(T).
What is Timestamp Ordering Protocol?
The main idea for this protocol is to order the transactions based on their Timestamps. A schedule in which
the transactions participate is then serializable and the only equivalent serial schedule permitted has the
transactions in the order of their Timestamp Values. Stating simply, the schedule is equivalent to the
particular Serial Order corresponding to the order of the Transaction timestamps. An algorithm must ensure
that, for each item accessed by Conflicting Operations in the schedule, the order in which the item is
accessed does not violate the ordering. To ensure this, use two Timestamp Values relating to each
database item X.
 W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.
 R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.
Basic Timestamp Ordering
Every transaction is issued a timestamp based on when it enters the system. Suppose, if an old transaction
Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
The protocol manages concurrent execution such that the timestamps determine the serializability order.
The timestamp ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order. Whenever some Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO
algorithm compares the timestamp of T with R_TS(X) & W_TS(X) to ensure that the Timestamp order is
not violated. This describes the Basic TO protocol in the following two cases.
 Whenever a transaction T issues a W_item(X) operation, check the following conditions:
o If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
o Execute the W_item(X) operation of T and set W_TS(X) to TS(T).
 Whenever a transaction T issues a R_item(X) operation, check the following conditions:
o If W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
o If W_TS(X) <= TS(T), execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X). (A small sketch of these checks follows below.)
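Here is a minimal sketch of the two checks in Python. The dictionaries R_TS and W_TS stand in for the per-item read and write timestamps; restart of aborted transactions and all other bookkeeping are omitted.

R_TS, W_TS = {}, {}                       # per-item read / write timestamps

def write_item(ts, x):
    # Reject if a younger transaction already read or wrote x.
    if R_TS.get(x, 0) > ts or W_TS.get(x, 0) > ts:
        return False                      # abort and roll back T
    W_TS[x] = ts
    return True

def read_item(ts, x):
    if W_TS.get(x, 0) > ts:               # x was overwritten by a younger txn
        return False                      # abort and roll back T
    R_TS[x] = max(R_TS.get(x, 0), ts)
    return True

print(read_item(5, "X"))    # True : sets R_TS(X) = 5
print(write_item(3, "X"))   # False: an older write arrives too late (3 < R_TS 5)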
Whenever the Basic TO algorithm detects two conflicting operations that occur in an incorrect order, it
rejects the latter of the two operations by aborting the Transaction that issued it. Schedules produced by
Basic TO are guaranteed to be conflict serializable. As noted above, using timestamps ensures that the schedule will be deadlock-free. One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose we have transactions T1 and T2, and T2 has used a value written by T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back.
So the problem of cascading aborts still prevails. Let us summarize the advantages and disadvantages of the Basic TO protocol:
 Timestamp Ordering protocol ensures serializability since the precedence graph will be of the
form:

Precedence Graph for TS ordering


 Timestamp protocol ensures freedom from deadlock as no transaction ever waits.
 But the schedule may not be cascade free, and may not even be recoverable.
Strict Timestamp Ordering
A variation of Basic TO is called Strict TO ensures that the schedules are both Strict and Conflict
Serializable. In this variation, a Transaction T that issues a R_item(X) or W_item(X) such that TS(T) >
W_TS(X) has its read or write operation delayed until the Transaction T‘ that wrote the values of X
has committed or aborted.
Advantages of Timestamp Ordering Protocol
 High Concurrency: Timestamp-based concurrency control allows for a high degree of
concurrency by ensuring that transactions do not interfere with each other.
 Efficient: The technique is efficient and scalable, as it does not require locking and can handle
a large number of transactions.
 No Deadlocks: Since there are no locks involved, there is no possibility of
deadlocks occurring.
 Improved Performance: By allowing transactions to execute concurrently, the overall
performance of the database system can be improved.
Disadvantages of Timestamp Ordering Protocol
 Limited Granularity: The granularity of timestamp-based concurrency control is limited to the
precision of the timestamp. This can lead to situations where transactions are unnecessarily
blocked, even if they do not conflict with each other.
 Timestamp Ordering: In order to ensure that transactions are executed in the correct order, the
timestamps need to be carefully managed. If not managed properly, it can lead to inconsistencies in
the database.
 Timestamp Synchronization: Timestamp-based concurrency control requires that all
transactions have synchronized clocks. If the clocks are not synchronized, it can lead to
incorrect ordering of transactions.

 Timestamp Allocation: Allocating unique timestamps for each transaction can be
challenging, especially in distributed systems where transactions may be initiated at
different locations.

Deadlock
In database management systems (DBMS), a deadlock occurs when two or more transactions are unable to proceed because each transaction is waiting for the other to release locks on resources. This situation creates a cycle of dependencies in which no transaction can continue, bringing the system to a standstill. Deadlocks can severely impact the performance and reliability of a DBMS, making it crucial to understand and manage them effectively.
A deadlock is a condition in a multi-user database environment in which transactions are unable to complete because each is waiting for resources held by other transactions. This results in a cycle of dependencies where no transaction can proceed.
Basically, deadlocks occur when two or more transactions wait indefinitely for resources held by each other, so mastering how to detect and resolve deadlocks is vital for database efficiency.
Characteristics of Deadlock
 Mutual Exclusion: Only one transaction can hold a particular resource at a time.
 Hold and Wait: The Transactions holding resources may request additional resources held by
others.
 No Preemption: The Resources cannot be forcibly taken from the transaction holding them.
 Circular Wait: A cycle of transactions exists where each transaction is waiting for the
resource held by the next transaction in the cycle.
In a database management system (DBMS), a deadlock occurs when two or more transactions are
waiting for each other to release resources, such as locks on database objects, that they need to complete
their operations. As a result, none of the transactions can proceed, leading to a situation where they are
stuck or “deadlocked.”
Deadlocks can happen in multi-user environments when two or more transactions are running
concurrently and try to access the same data in a different order. When this happens, one transaction may
hold a lock on a resource that another transaction needs, while the second transaction may hold a lock on a
resource that the first transaction needs. Both transactions are then blocked, waiting for the other to release
the resource they need.
DBMSs often use various techniques to detect and resolve deadlocks automatically. These techniques
include timeout mechanisms, where a transaction is forced to release its locks after a certain period of
time, and deadlock detection algorithms, which periodically scan the transaction log for deadlock cycles
and then choose a transaction to abort to resolve the deadlock.
It is also possible to prevent deadlocks by careful design of transactions, such as always acquiring locks
in the same order or releasing locks as soon as possible. Proper design of the database schema and
application can also help to minimize the likelihood of deadlocks.
In a database, a deadlock is an unwanted situation in which two or more transactions are waiting
indefinitely for one another to give up locks. Deadlock is said to be one of the most feared complications in
DBMS as it brings the whole system to a Halt.
Example: let us understand the concept of deadlock. Suppose transaction T1 holds a lock on some rows in the Students table and needs to update some rows in the Grades table.

Simultaneously, Transaction T2 holds locks on those very rows (Which T1 needs to update) in the
Grades table but needs to update the rows in the Student table held by Transaction T1.
Now, the main problem arises. Transaction T1 will wait for transaction T2 to give up the lock, and
similarly, transaction T2 will wait for transaction T1 to give up the lock. As a consequence, All activity
comes to a halt and remains at a standstill forever unless the DBMS detects the deadlock and aborts one
of the transactions.

Deadlock in DBMS

Deadlock Avoidance
When a database is stuck in a deadlock, it is always better to avoid the deadlock rather than restarting or aborting the database. The deadlock avoidance method is suitable for smaller databases, whereas the deadlock prevention method is suitable for larger databases.
One method of avoiding deadlock is using application-consistent logic. In the above-given example,
Transactions that access Students and Grades should always access the tables in the same order. In this
way, in the scenario described above, Transaction T1 simply waits for transaction T2 to release the lock
on Grades before it begins. When transaction T2 releases the lock, Transaction T1 can proceed freely.
Another method for avoiding deadlock is to apply both the row-level locking mechanism and the READ COMMITTED isolation level. However, this does not guarantee the complete removal of deadlocks.
Deadlock Detection
When a transaction waits indefinitely to obtain a lock, The database management system should
detect whether the transaction is involved in a deadlock or not.
Wait-for-graph is one of the methods for detecting the deadlock situation. This method is suitable for
smaller databases. In this method, a graph is drawn based on the transaction and its lock on the resource.
If the graph created has a closed loop or a cycle, then there is a
deadlock. For the above-mentioned scenario, the wait-for graph is drawn below: T1 waits for T2 and T2 waits for T1, forming the cycle T1 → T2 → T1.
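Deadlock detection on the wait-for graph thus reduces to cycle detection. A minimal Python sketch, assuming the graph is kept as adjacency lists of waiting transactions:

def has_cycle(graph):
    """Depth-first search; a back edge to a node on the current path = cycle."""
    visited, on_path = set(), set()
    def dfs(node):
        visited.add(node); on_path.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_path or (nxt not in visited and dfs(nxt)):
                return True
        on_path.discard(node)
        return False
    return any(dfs(n) for n in graph if n not in visited)

wait_for = {"T1": ["T2"], "T2": ["T1"]}   # T1 -> T2 -> T1
print(has_cycle(wait_for))                # True -> deadlock detected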
Deadlock Prevention
For a large database, the deadlock prevention method is suitable. A deadlock can be prevented if the resources are allocated in such a way that a deadlock never occurs. The DBMS analyzes the operations to determine whether they can create a deadlock situation; if they can, that transaction is never allowed to be executed.
Deadlock prevention mechanism proposes two schemes:
 Wait-Die Scheme: In this scheme, if a transaction requests a resource that is locked by another transaction, the DBMS checks the timestamps of both transactions and allows only the older transaction to wait until the resource is available.
Suppose there are two transactions T1 and T2, and let the timestamp of any transaction T be TS(T). If T2 holds a lock on some resource and T1 requests that resource, the DBMS performs the following check:
If TS(T1) < TS(T2), i.e., T1 is the older transaction, T1 is allowed to wait until the resource is available. That is, if a younger transaction has locked a resource and an older transaction is waiting for it, the older transaction is allowed to wait.
If T1 is the younger transaction and the resource is held by the older T2, then T1 is killed and restarted later with a very small delay, but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
 Wound-Wait Scheme: In this scheme, if an older transaction requests a resource held by a younger transaction, the older transaction wounds the younger one: the younger transaction is killed (rolled back), releasing the resource, and is restarted later with a small delay but with the same timestamp. If a younger transaction requests a resource held by an older one, the younger transaction is made to wait until the older one releases it.
The differences between the Wait-Die and Wound-Wait prevention schemes are listed below (a small decision sketch follows):

Wait-Die:
 It is based on a non-preemptive technique.
 Older transactions must wait for the younger one to release its data items.
 The number of aborts and rollbacks is higher.

Wound-Wait:
 It is based on a preemptive technique.
 Older transactions never wait for younger transactions.
 The number of aborts and rollbacks is lower.
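Both schemes reduce to a timestamp comparison at the moment of a lock conflict. A minimal Python sketch of the two decisions (smaller timestamp = older transaction; the function names are invented for illustration):

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is rolled back).
    return "wait" if ts_requester < ts_holder else "die (rollback requester)"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger requester waits.
    return "wound (rollback holder)" if ts_requester < ts_holder else "wait"

print(wait_die(1, 2))     # wait                     : old txn asks young holder
print(wait_die(2, 1))     # die (rollback requester) : young txn asks old holder
print(wound_wait(1, 2))   # wound (rollback holder)
print(wound_wait(2, 1))   # wait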

Impact of Deadlocks
1. Delayed Transactions: Deadlocks can cause transactions to be delayed, as the resources they
need are being held by other transactions. This can lead to slower response times and longer wait
times for users.
2. Lost Transactions: In some cases, deadlocks can cause transactions to be lost or aborted, which
can result in data inconsistencies or other issues.
3. Reduced Concurrency: Deadlocks can reduce the level of concurrency in the system, as
transactions are blocked waiting for resources to become available. This can lead to slower
transaction processing and reduced overall throughput.
4. Increased Resource Usage: Deadlocks can result in increased resource usage, as
transactions that are blocked waiting for resources to become available continue to
consume system resources. This can lead to performance degradation and increased
resource contention.
5. Reduced User Satisfaction: Deadlocks can lead to a perception of poor system performance and
can reduce user satisfaction with the application. This can have a negative impact on user adoption
and retention.
Features of Deadlock in a DBMS
1. Mutual Exclusion: Each resource can be held by only one transaction at a time, and other
transactions must wait for it to be released.
2. Hold and Wait: Transactions can request resources while holding on to resources already
allocated to them.
3. No Preemption: Resources cannot be taken away from a transaction forcibly, and the
transaction must release them voluntarily.
4. Circular Wait: Transactions are waiting for resources in a circular chain, where each
transaction is waiting for a resource held by the next transaction in the chain.
5. Indefinite Blocking: Transactions are blocked indefinitely, waiting for resources to
become available, and no transaction can proceed.
6. System Stagnation: Deadlock leads to system stagnation, where no transaction can
proceed, and the system is unable to make any progress.
7. Inconsistent Data: Deadlock can lead to inconsistent data if transactions are unable to
complete and leave the database in an intermediate state.
8. Difficult to Detect and Resolve: Deadlock can be difficult to detect and resolve, as it may involve
multiple transactions, resources, and dependencies.
Disadvantages
1. System downtime: Deadlock can cause system downtime, which can result in loss of
productivity and revenue for businesses that rely on the DBMS.
2. Resource waste: When transactions are waiting for resources, these resources are not being
used, leading to wasted resources and decreased system efficiency.
3. Reduced concurrency: Deadlock can lead to a decrease in system concurrency, which can result
in slower transaction processing and reduced throughput.
4. Complex resolution: Resolving deadlock can be a complex and time-consuming process,
requiring system administrators to intervene and manually resolve the deadlock.
5. Increased system overhead: The mechanisms used to detect and resolve deadlock, such as
timeouts and rollbacks, can increase system overhead, leading to decreased performance.

Failure Classification
Failure in terms of a database can be defined as its inability to execute the specified transaction or
loss of data from the database. A DBMS is vulnerable to several kinds of failures and each of these failures
needs to be managed differently. There are many reasons that can cause database failures such as network
failure, system crash, natural disasters, carelessness, sabotage(corrupting the data intentionally), software
errors, etc.


A failure in DBMS can be classified as:

Failure Classification in DBMS

Transaction Failure:

If a transaction is not able to execute, or if it reaches a point from which it becomes incapable of executing further, it is termed a transaction failure.
Reasons for a transaction failure in DBMS:
1. Logical error: A logical error occurs if a transaction is unable to execute because of some mistakes
in the code or due to the presence of some internal faults.
2. System error: The database system itself terminates an active transaction due to some system issue or because the database management system is unable to proceed with the transaction. For example, the system ends an active transaction if it reaches a deadlock condition or if resources are unavailable.
System Crash:
A system crash usually occurs when there is some sort of hardware or software breakdown. Some other
problems which are external to the system and cause the system to abruptly stop or eventually crash include
failure of the transaction, operating system errors, power cuts, main memory crash, etc.
These types of failures are often termed soft failures and are responsible for data losses in volatile memory. It is assumed that a system crash does not have any effect on the data stored in non-volatile storage; this is known as the fail-stop assumption.
Data-transfer Failure:

When a disk failure occurs in the middle of a data-transfer operation, resulting in loss of content from disk storage, such failures are categorized as data-transfer failures. Other reasons for disk failures include disk head crash, disk unreachability, formation of bad sectors, read-write errors on the disk, etc.
In order to quickly recover from a disk failure that occurs in the middle of a data-transfer operation, a backup copy of the data stored on other tapes or disks can be used. Thus, it is good practice to back up your data frequently.

Indexing

Indexing is a data structure technique to efficiently retrieve records from the database files based on
some attributes on which the indexing has been done. Indexing in database systems is similar to what we
see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
 Primary Index − Primary index is defined on an ordered data file. The data file is ordered
on a key field. The key field is generally the primary key of the relation.
 Secondary Index − Secondary index may be generated from a field which is a candidate key
and has a unique value in every record, or a non-key with duplicate values.
 Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered on
a non-key field.
Ordered Indexing is of two types −
 Dense Index
 Sparse Index
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store the index records themselves. Each index record contains a search key value and a pointer to the actual record on the disk.

Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by the index record and reach the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts a sequential search until the desired data is found.
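A minimal Python sketch of a sparse-index lookup, assuming one index entry per block holding that block's first key (the data layout here is invented for illustration):

import bisect

blocks = [[3, 7, 12], [15, 20, 28], [33, 41, 47]]   # sorted data file "blocks"
sparse_index = [3, 15, 33]                           # first key of each block

def lookup(key):
    i = bisect.bisect_right(sparse_index, key) - 1   # last index entry <= key
    if i < 0:
        return None                                  # key below smallest entry
    return key if key in blocks[i] else None         # sequential scan in block

print(lookup(20))   # 20   -> found inside block 1
print(lookup(21))   # None -> not present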


Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on the disk
along with the actual database files. As the size of the database grows, so does the size of the indices.
There is an immense need to keep the index records in the main memory so as to speed up the search
operations. If single-level index is used, then a large size index cannot be kept in memory which leads
to multiple disk accesses.

Multi-level Index helps in breaking down the index into several smaller indices in order to make the
outermost level so small that it can be saved in a single disk block, which can easily be
accommodated anywhere in the main memory.
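The following Python sketch illustrates a two-level lookup of this kind, with a small in-memory outer index directing the search to one inner index block; the block contents are invented for illustration.

import bisect

# Each inner index entry is (first key of a data block, data block id).
inner_blocks = [[(10, "blk0"), (20, "blk1")], [(30, "blk2"), (40, "blk3")]]
outer_index = [10, 30]                 # first key of each inner index block

def find_data_block(key):
    i = max(bisect.bisect_right(outer_index, key) - 1, 0)   # pick inner block
    inner = inner_blocks[i]
    keys = [k for k, _ in inner]
    j = max(bisect.bisect_right(keys, key) - 1, 0)          # pick data block
    return inner[j][1]

print(find_data_block(35))             # 'blk2' -> search continues in that block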
Introduction of B+ Tree
B+ Tree is a variation of the B-tree data structure. In a B+ tree, data pointers are stored only at the leaf nodes of the tree, and the structure of a leaf node differs from the structure of the internal nodes. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record (or to the block that contains this record). The leaf nodes of the B+ tree are linked together to provide ordered access on the search field to the records. Internal nodes of a B+ tree are used to guide the search. Some search field values from the leaf nodes are repeated in the internal nodes of the B+ tree.
Features of B+ Trees
 Balanced: B+ Trees are self-balancing, which means that as data is added or removed from
the tree, it automatically adjusts itself to maintain a balanced structure. This ensures that the
search time remains relatively constant, regardless of the size of the tree.
 Multi-level: B+ Trees are multi-level data structures, with a root node at the top and one or
more levels of internal nodes below it. The leaf nodes at the bottom level contain the actual
data.
 Ordered: B+ Trees maintain the order of the keys in the tree, which makes it easy to
perform range queries and other operations that require sorted data.
 Fan-out: B+ Trees have a high fan-out, which means that each node can have many child
nodes. This reduces the height of the tree and increases the efficiency of searching and
indexing operations.
 Cache-friendly: B+ Trees are designed to be cache-friendly, which means that they can
take advantage of the caching mechanisms in modern computer architectures to improve
performance.
 Disk-oriented: B+ Trees are often used for disk-based storage systems because they are
efficient at storing and retrieving data from disk.
Why Use B+ Tree?
 B+ Trees are the best choice for storage systems with sluggish data access because they
minimize I/O operations while facilitating efficient disc access.
 B+ Trees are a good choice for database systems and applications needing quick data
retrieval because of their balanced structure, which guarantees predictable performance for a
variety of activities and facilitates effective range-based queries.
Difference Between B+ Tree and B Tree
Some differences between B+ Tree and B Tree are stated below.

Structure: a B+ Tree has separate leaf nodes for data storage and internal nodes for indexing; in a B Tree, nodes store both keys and data values.
Leaf Nodes: in a B+ Tree, leaf nodes form a linked list for efficient range-based queries; in a B Tree, leaf nodes do not form a linked list.
Order: a B+ Tree has a higher order (more keys per node); a B Tree has a lower order (fewer keys per node).
Key Duplication: a B+ Tree typically allows key duplication in leaf nodes; a B Tree usually does not allow key duplication.
Disk Access: a B+ Tree gives better disk access due to sequential reads along the linked-list structure; a B Tree incurs more disk I/O due to non-sequential reads in internal nodes.
Applications: a B+ Tree suits database systems and file systems where range queries are common; a B Tree suits in-memory data structures, databases, and general-purpose use.
Performance: a B+ Tree gives better performance for range queries and bulk data retrieval; a B Tree gives balanced performance for search, insert, and delete operations.
Memory Usage: a B+ Tree requires more memory for internal nodes; a B Tree requires less memory, as keys and values are stored in the same node.

Implementation of B+ Tree

In order to implement dynamic multilevel indexing, B-trees and B+ trees are generally employed. The drawback of the B-tree used for indexing, however, is that it stores the data pointer (a pointer to the disk file block containing the key value), corresponding to a particular key value, along with that key value in the node of the B-tree. This technique greatly reduces the number of entries that can be packed into a node, thereby increasing the number of levels in the B-tree and hence the search time for a record.
The B+ tree eliminates this drawback by storing data pointers only at the leaf nodes of the tree. Thus, the structure of the leaf nodes of a B+ tree is quite different from the structure of the internal nodes. Since data pointers are present only at the leaf nodes, the leaf nodes must necessarily store all the key values along with their corresponding data pointers to the disk file blocks in order to access them. Moreover, the leaf nodes are linked to provide ordered access to the records. The leaf nodes therefore form the first level of the index, with the internal nodes forming the other levels of a multilevel index. Some of the key values of the leaf nodes also appear in the internal nodes, simply to act as a medium to control the searching of a record.
From the above discussion, it is apparent that a B+ tree, unlike a B-tree, has two orders, ‘a’ and ‘b’, one for the internal nodes and the other for the external (or leaf) nodes.

Structure of B+ Trees

B+ Trees contain two types of nodes:


 Internal Nodes: Internal nodes are all nodes other than the root and the leaf nodes; each internal node contains at least ceil(n/2) tree pointers.
 Leaf Nodes: Leaf nodes are the bottom-level nodes, which hold the search key values together with data pointers to the actual records.
The Structure of the Internal Nodes of a B+ Tree of Order ‘a’ is as Follows
 Each internal node is of the form: <P1, K1, P2, K2, ….., Pc-1, Kc-1, Pc> where c <= a, each Pi is a tree pointer (i.e., it points to another node of the tree), and each Ki is a key value (see diagram I for reference).

 Every internal node has: K1 < K2 < …. < Kc-1

 For each search field value ‘X’ in the sub-tree pointed at by Pi, the following condition holds: Ki-1 < X <= Ki for 1 < i < c, and Ki-1 < X for i = c (see diagram I for reference).
 Each internal node has at most ‘a’ tree pointers.
 The root node has at least two tree pointers, while the other internal nodes have at least ceil(a/2) tree pointers each.
 If an internal node has ‘c’ pointers, c <= a, then it has ‘c – 1’ key values.

Structure of Internal Node
The Structure of the Leaf Nodes of a B+ Tree of Order ‘b’ is as Follows
 Each leaf node is of the form: <<K1, D1>, <K2, D2>, ….., <Kc-1, Dc-1>, Pnext> where c <= b, each Di is a data pointer (i.e., it points to an actual record in the disk whose key value is Ki, or to a disk file block containing that record), each Ki is a key value, and Pnext points to the next leaf node in the B+ tree (see diagram II for reference).
 Every leaf node has: K1 < K2 < …. < Kc-1, c <= b
 Each leaf node has at least ceil(b/2) values.
 All leaf nodes are at the same level.

Structure of Leaf Node
Diagram II: Using the Pnext pointer, it is possible to traverse all the leaf nodes, just like a linked list, thereby achieving ordered access to the records stored on the disk.

Tree Pointer
Searching a Record in B+ Trees

Searching in B+ Tree
Suppose we have to find 58 in the B+ tree. We start at the root node and move down toward the leaf node that might contain 58. In the image given above, 58 lies between 50 and 70, so we follow the corresponding pointer and reach the third leaf node, where we find 58. If the key is not present in that leaf node, we return a ‘record not found’ message. A small sketch of this descent follows.
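A minimal Python sketch of the descent, using a tiny hand-built tree whose root separators are 50 and 70 as in the example above (in this simplified one-level tree, 58 lands in the middle leaf):

import bisect

leaf1, leaf2, leaf3 = [10, 20, 45], [50, 58, 65], [70, 80, 95]
root = {"keys": [50, 70], "children": [leaf1, leaf2, leaf3]}

def bplus_search(node, key):
    while isinstance(node, dict):                     # descend internal nodes
        i = bisect.bisect_right(node["keys"], key)    # 58 -> between 50 and 70
        node = node["children"][i]
    return key if key in node else None               # scan within the leaf

print(bplus_search(root, 58))   # 58   -> found in the middle leaf
print(bplus_search(root, 59))   # None -> 'record not found'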
Insertion in B+ Trees
Insertion in B+ Trees is done via the following steps.
 Every element in the tree has to be inserted into a leaf node. Therefore, it is necessary to go
to a proper leaf node.
 Insert the key into the leaf node in increasing order if there is no overflow. For more, refer to Insertion in B+ Trees.
Deletion in B+Trees
Deletion in B+ Trees is not just deletion; it is a combined process of searching, deletion, and balancing. In the last step of the deletion process, it is mandatory to rebalance the B+ tree; otherwise, the B+ tree properties are violated.
For more, refer to Deletion in B+ Trees.
Advantages of B+ Trees
 A B+ tree with ‘l’ levels can store more entries in its internal nodes compared to a B-tree having the same ‘l’ levels. This significantly improves the search time for any given key. Fewer levels and the presence of Pnext pointers make the B+ tree very quick and efficient at accessing records from disk.
 Data stored in a B+ tree can be accessed both sequentially and directly.
 Every search takes the same number of disk accesses, since all leaf nodes are at the same level.
 B+ trees store some search keys redundantly (keys in the internal nodes reappear in the leaves), which keeps all data pointers at the leaf level and makes both direct and ordered access fast.
Disadvantages of B+ Trees
 Extra space is consumed because search key values that appear in the internal nodes are stored again in the leaf nodes. (By contrast, the major drawback of the plain B-tree is the difficulty of traversing its keys sequentially; the B+ tree retains the rapid random access of the B-tree while also allowing rapid sequential access.)

Application of B+ Trees
 Multilevel Indexing
 Faster operations on the tree (insertion, deletion, search)
 Database indexing
Hashing in DBMS
Hashing in DBMS is a technique to quickly locate a data record in a database irrespective of the size of the database. For larger databases containing thousands or millions of records, the indexing data structure technique becomes very inefficient, because searching for a specific record through the index consumes more time. This does not align with the goals of a DBMS, where performance matters and data retrieval time should be minimized. So, to counter this problem, the hashing technique is used. In this section, we will learn about various hashing techniques.
What is Hashing?
The hashing technique utilizes an auxiliary hash table to store the data records using a hash function. There are three key components in hashing:
 Hash Table: A hash table is an array or data structure whose size is determined by the total volume of data records present in the database. Each memory location in a hash table is called a ‘bucket‘ or hash index; it stores a data record’s exact location and can be accessed through a hash function.
 Bucket: A bucket is a memory location (index) in the hash table that stores the data record. These buckets generally store a disk block, which in turn stores multiple records. It is also known as the hash index.
 Hash Function: A hash function is a mathematical equation or algorithm that takes a data record’s primary key as input and computes the hash index as output.
Hash Function
A hash function is a mathematical algorithm that computes the index or the location where the
current data record is to be stored in the hash table so that it can be accessed efficiently later. This
hash function is the most crucial component that determines the speed of fetching data.
Working of Hash Function
The hash function generates a hash index through the primary key of the data record.
Now, there are 2 possibilities:

1. The hash index generated isn’t already occupied by any other value. So, the address of the data
record will be stored here.

2. The hash index generated is already occupied by some other value. This is called collision so to
counter this, a collision resolution technique will be applied.

Now, whenever we query a specific record, the hash function is applied and returns the data record comparatively faster than indexing, because we can directly reach the exact location of the data record through the hash function rather than searching through indices one by one.
Example:

Hashing
Types of Hashing in DBMS
There are two primary hashing techniques in DBMS.
1. Static Hashing
In static hashing, the hash function always generates the same bucket address for a given key. For example, suppose we have a data record with employee_id = 106 and the hash function is mod-5, i.e., H(x) = x % 5, where x = id. Then the operation will take place like this:
H(106) = 106 % 5 = 1.
This indicates that the data record should be placed or searched in the 1st bucket (or 1st hash index) in the hash table.
Example:

Static Hashing Technique
The primary key is used as the input to the hash function and the hash function generates the output
as the hash index (bucket’s address) which contains the address of the actual data record on the disk
block.
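A minimal Python sketch of static hashing with five fixed buckets and the modulo hash function from the example (each bucket is a simple list, which also previews the chaining-style overflow handling discussed below):

NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]     # fixed number of buckets

def h(key):
    return key % NUM_BUCKETS                   # hash function H(x) = x mod 5

def insert(key, record):
    buckets[h(key)].append((key, record))      # place record in its bucket

def search(key):
    for k, rec in buckets[h(key)]:             # only one bucket is examined
        if k == key:
            return rec
    return None

insert(106, "employee #106")                   # h(106) = 1 -> bucket 1
print(search(106))                             # 'employee #106'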
Static Hashing has the following Properties
 Data Buckets: The number of buckets in memory remains constant. The size of the hash table is decided initially; it may also implement chaining to handle some collision issues, though this is only a slight optimization and may not prove worthwhile if the database size keeps fluctuating.
 Hash Function: It uses a simple hash function to map data records to their appropriate buckets, generally a modulo hash function.
 Efficient for known data size: It is very efficient when the data size and its distribution in the database are known in advance.
 It is inefficient and inaccurate when the data size varies dynamically, because space is limited and the hash function always generates the same value for every specific input. When the data size fluctuates often, collisions keep happening, resulting in problems such as bucket skew and insufficient buckets.
To resolve this problem of bucket overflow, techniques such as chaining and open addressing are used. Here is a brief overview of both:

1. Chaining
Chaining is a mechanism in which the hash table is implemented using an array of node-type buckets, where each bucket can hold a chain (linked list) of data records. So, even if the hash function generates the same value for several data records, they can still be stored in a bucket by adding a new node to the chain.
However, this gives rise to the problem of bucket skew: if the hash function keeps generating the same value again and again, hashing becomes inefficient, as the remaining buckets stay unoccupied or store minimal data.

2. Open Addressing/Closed Hashing

Open addressing, also called closed hashing, aims to solve the problem of collisions by looking for the next empty slot available that can store the data. It uses techniques like linear probing, quadratic probing, double hashing, etc. A minimal sketch of linear probing follows.
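A minimal Python sketch of linear probing; the table size and keys are invented so that all three keys collide at the same slot:

SIZE = 7
table = [None] * SIZE

def insert(key):
    i = key % SIZE
    while table[i] is not None:        # collision: probe the next slot
        i = (i + 1) % SIZE             # wrap around at the end of the table
    table[i] = key

for k in (10, 17, 24):                 # all three hash to slot 3 (k % 7 = 3)
    insert(k)
print(table)                           # [None, None, None, 10, 17, 24, None]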

2. Dynamic Hashing
Dynamic hashing, also known as extendible hashing, is used to handle databases whose data sets change frequently. This method offers a way to add and remove data buckets on demand, dynamically. As the number of data records varies, the buckets also grow and shrink in size periodically whenever a change is made.
Properties of Dynamic Hashing
 The buckets vary in size dynamically and periodically as changes are made, offering more flexibility in making any change.
 Dynamic hashing aids in improving overall performance by minimizing or completely preventing collisions.
 It has the following major components: Data bucket, Flexible hash function, and
directories
 A flexible hash function means that it will generate more dynamic values and will keep changing periodically, adapting to the requirements of the database.
 Directories are containers that store the pointer to buckets. If bucket overflow or bucket
skew-like problems happen to occur, then bucket splitting is done to maintain efficient
retrieval time of data records. Each directory will have a directory id.
 Global Depth: It is defined as the number of bits in each directory id. The more the
number of records, the more bits are there.
Working of Dynamic Hashing
Example: if the global depth is k = 2, keys are mapped to hash indices using the k bits starting from the LSB. That leaves us with the following 4 possibilities: 00, 01, 10, 11.

Dynamic Hashing – mapping


As we can see in the above image, the k bits from the LSB of each key form the directory ID, and each directory entry points to the bucket holding the keys whose low-order bits match that ID in binary. A minimal sketch of this mapping follows.
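A minimal Python sketch of this LSB mapping (the helper name directory_id is invented for illustration):

def directory_id(key, global_depth=2):
    """Return the directory ID: the global_depth least-significant bits
    of the key, formatted as a binary string."""
    mask = (1 << global_depth) - 1            # e.g. 0b11 for depth 2
    return format(key & mask, f"0{global_depth}b")

for key in (12, 13, 14, 15):
    print(key, "->", directory_id(key))       # 12->00, 13->01, 14->10, 15->11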
