MCA 201 DBMS
A Database Management System (DBMS) is defined as the software system that allows users to define, create,
maintain and control access to the database. DBMS makes it possible for end users to create, read, update and
delete data in database. It is a layer between programs and data.
A database management system (DBMS) is essentially a group of linked data and a collection of computer
applications and tools that retrieve, analyze, and alter data.
A database management system is a software tool used to create and manage one or more databases, offering
an easy way to create a database, update tables, retrieve information, and enhance data. A DBMS is where data
is accessed, modified and locked to prevent conflicts.
A database management system (DBMS) is a software program that allows users to create, maintain, and
interact with a database. A database is a data collection organized in a specific format, making it easy to access,
manage, and manipulate. DBMSs are the intermediaries between users and databases, handling all
communication and data processing.
DBMS is a collection of programs that enables users to create and maintain a database.
Advantages of DBMS
• Controlling Redundancy
• Restricting unauthorized access
• Providing persistent storage:- A DBMS can store program objects persistently, so that such an object can later be directly retrieved by another program.
• Atomicity in data
• Permitting inferencing and actions
• Representing complex relationships among data:- DBMS has the capability to represent a variety of
complex relationships among data, as well as to retrieve and update related data efficiently. A good
example of that is the use of foreign keys!
• Provides multiple user interfaces
• Enforcing integrity constraints
• Data integration
• Providing backup and recovery
• No concurrent access anomalies
• No data inconsistency
• Data Searching
• Data Security
• Data Concurrency
• Low Maintenance Cost
• View of data in DBMS describes the abstraction of data at three levels, i.e. the physical level, the logical level, and the view level.
• View of data in DBMS describes how the data is visualized at each level of data abstraction. Data abstraction allows developers to keep complex data structures away from the users. The developers achieve this by hiding the complex data structures through levels of abstraction.
• One more feature that should be kept in mind is data independence: changing the data schema at one level of the database must not require modifying the data schema at the next level.
DATA INDEPENDENCE
➢ Data independence can be explained using
the three-schema architecture.
o Logical data independence refers to the characteristic of being able to change the conceptual schema without
having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
o Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the Conceptual structure of
the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal levels.
Schema
The overall design of the database is called schema or description of database.
• The schema of a database can be modified only by DDL statements; it does not change when operations like insertion, updating, and deletion are performed.
• The database schema explains the integrity constraints of the database, the domains of all attributes, and the foreign and primary keys of all the relations.
o The schema is a complete description of a database, including the names and descriptions of
all areas, records, elements, and sets. The major purpose of the schema is to provide
definitions from which to generate subschemas.
Types of Schema
Schema is of three types, which are as follows −
• View Schema − The design of a database at a view level is called view schema. This schema
generally shows the user interaction with the database system.
• Logical Schema − The design of a database at the logical level is called a logical schema. A
database administrator (DBA) and the programmers work at this level. This level
describes all the entities, attributes and their relationship with the integrity constraints.
• Physical Schema − The design of a database at the physical level is called a physical schema.
This schema describes how the data is stored in the secondary storage devices. There is only one logical schema and one physical schema per database, but there can be more than one view schema.
A schema is also called the intension of the database.
Sub schema
It is a subset of the schema and inherits the same properties that a schema has. It gives the users a window through which they can view only the part of the database that they want.
For example − for a Student table in a database, the programmer can access all fields of the table, but an end user may be able to access only two or three fields of it. A subschema describes such a partial view of the database.
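As a small illustration, a subschema can be realized in SQL as a view that exposes only some columns; the Student table and column names here are hypothetical, not from the source notes:
CREATE VIEW Student_Public AS
SELECT roll_no, name
FROM Student;
A user who queries Student_Public sees only these two fields, even though the underlying Student table may contain many more.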
A data model helps design the database at the conceptual, physical and logical levels.
Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
It provides a clear picture of the base data and can be used by database developers to create a physical
database.
Though the initial creation of data model is labor and time consuming, in the long run, it makes your IT
infrastructure upgrade and maintenance cheaper and faster.
➢ The main goal of designing a data model is to make certain that data objects offered by the functional
team are represented accurately.
➢ The data model should be detailed enough to be used for building the physical database.
➢ The information in the data model can be used for defining the relationship between tables, primary
and foreign keys, and stored procedures.
➢ A data model helps the business to communicate within and across organizations.
➢ A data model helps to document data mappings in the ETL process.
➢ Help to recognize correct sources of data to populate the model
Disadvantages of a data model:
➢ To develop a data model, one should know the characteristics of how the data is physically stored.
➢ This is a navigational system that makes application development and management complex; thus, it requires detailed knowledge of the underlying data.
➢ Even a small change made in the structure requires modification in the entire application.
➢ There is no set data manipulation language in DBMS.
There are mainly three different types of data models: conceptual data models, logical data models, and physical
data models, and each one has a specific purpose. The data models are used to represent the data and how it is
stored in the database and to set the relationship between data items.
1. Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically
created by Business stakeholders and Data Architects. The purpose is to organize, scope and define
business concepts and rules.
2. Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This
model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and data structures.
➢ Describes data needs for a single project but could integrate with other logical data models
based on the scope of the project.
➢ Designed and developed
independently from the DBMS.
➢ Data attributes will have datatypes
with exact precisions and length.
➢ Normalization processes to the model
is applied typically till 3NF.
3. Physical Data Model: This Data Model describes HOW the system will be implemented using a specific DBMS.
➢ The physical data model describes data needs for a single project or application, though it may be integrated with other physical data models based on project scope.
➢ Data Model contains relationships between tables that which addresses cardinality and
nullability of the relationships.
➢ Developed for a specific version of a DBMS, location, data storage or technology to be used in
the project.
➢ Columns should have exact datatypes, lengths assigned and default values.
➢ Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.
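As a rough sketch of what a physical data model translates into, the following DDL (table and column names are illustrative, not taken from the notes) fixes exact datatypes, lengths, default values, keys and an index:
CREATE TABLE Department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);
CREATE TABLE Employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100) NOT NULL,
    salary   DECIMAL(10,2) DEFAULT 0,
    dept_id  INT REFERENCES Department(dept_id)   -- foreign key to Department
);
CREATE INDEX idx_employee_dept ON Employee(dept_id);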
DATABASE LANGUAGES
In our daily lives, we make use of certain languages to communicate and share our thoughts with other
individuals. It's an essential part of our lives as it helps others understand what we want to convey to them.
Similarly, in the data world, we need some special kind of programming languages to make the DBMS software
understand our needs and manage the data stored in the databases accordingly. These programming languages
are known as database languages or query languages.
Database languages are used to perform a variety of critical tasks that help a database management system
function correctly. These tasks can be certain operations such as read, update, insert, search, or delete the data
stored in the database.
Database Language is a special type of programming language used to define and manipulate a database. Based
on their application, database languages are classified into four different types: DDL, DML, DCL, and TCL.
DDL is used for specifying the database schema. It is used for creating tables, schema, indexes, constraints etc. in
database.
All of these commands either define or update the database schema; that is why they come under Data Definition Language.
DML is used for accessing and manipulating data in a database. The following operations on a database come under DML:
In practice, the data definition language, data manipulation language and data control language are not separate languages; rather, they are parts of a single database language such as SQL.
The changes in the database that we make using DML commands are either committed or rolled back using TCL.
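A minimal sketch of how these language types appear in SQL (the Student table and user name are hypothetical):
-- DDL: defines the schema
CREATE TABLE Student (roll_no INT PRIMARY KEY, name VARCHAR(50));
-- DML: manipulates the data
INSERT INTO Student VALUES (1, 'Asha');
UPDATE Student SET name = 'Usha' WHERE roll_no = 1;
-- DCL: controls access
GRANT SELECT ON Student TO clerk_user;
-- TCL: commits or undoes the DML changes
COMMIT;    -- or ROLLBACK;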
TRANSACTION MANAGEMENT
A transaction is a logical unit of work performed on a database. They are logically ordered units of work
completed by the end-user or an application.
A transaction is made up of one or more database modifications, such as creating, updating, or deleting a record from a table.
A transaction can be seen as a set of operations that are used to perform some logical unit of work. A transaction is used to make changes to data in a database, which can be done by inserting new data, altering the existing data, or deleting existing data.
The lifetime of a transaction passes through multiple states; these states update the system about the current status of the transaction and also tell the user how to plan further processing.
Transaction states
There are various database transaction states as follows.
1. Active state - this is the state in which a transaction execution process begins. Operations such as read
or write are performed on the database.
1. Atomicity - A transaction cannot be subdivided and can only be executed as a whole; it is treated as an atomic unit. Either all the operations are carried out or none are performed.
2. Consistency - After any transaction is carried out in a database it should remain consistent. No
transaction should affect the data residing in the database adversely.
3. Isolation - When several transactions need to be conducted in a database at the same time, each
transaction is treated as if it were a single transaction. As a result, the completion of a single
transaction should have no bearing on the completion of additional transactions.
4. Durability - Durability means that all changes made must be permanent, such that once the transaction is committed, the effects of the transaction cannot be reversed. In case of system failure or unexpected shutdown, if the changes made by a completed transaction have not yet been written to the disk, then during restart the changes should be remembered and restored.
1. Database Administrator (DBA) : Database Administrator (DBA) is a person/team who defines the
schema and also controls the 3 levels of database. The DBA will then create a new account id and
password for the user if he/she needs to access the database. DBA is also responsible for providing security to the database and allows only authorized users to access/modify the database. DBA is
responsible for the problems such as security breaches and poor system response time.
• DBA also monitors the recovery and backup and provide technical support.
• The DBA has a DBA account in the DBMS which is called a system or superuser account.
• DBA is the one having privileges to perform DCL (Data Control Language) operations such as
GRANT and REVOKE, to allow/restrict a particular user from accessing the database.
2. Naive / Parametric End Users : Parametric end users are unsophisticated users who don't have any DBMS knowledge but frequently use database applications in their daily life to get the desired results. For example, Railway ticket booking users are naive users. Clerks in any bank are naive users because they don't have any DBMS knowledge but they still use the database and perform their given task.
3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They check whether
all the requirements of end users are satisfied.
4. Sophisticated Users : Sophisticated users can be engineers, scientists, or business analysts who are familiar
with the database. They can develop their own database applications according to their requirement.
They don’t write the program code but they interact the database by writing SQL queries directly
through the query processor.
5. Database Designers : Database designers are the users who design the structure of the database, which
includes tables, indexes, views, triggers, stored procedures and constraints which are usually enforced
before the database is created or populated with data. He/she controls what data must be stored and how the data items are to be related. It is the responsibility of database designers to understand the
requirements of different user groups and then create a design which satisfies the need of all the user
groups.
7. Casual Users / Temporary Users : Casual users are the users who occasionally use/access the database, but each time they access the database they require new information; for example, middle or higher level managers.
DATA DICTIONARY
A data dictionary in Database Management System (DBMS) can be defined as a component that stores the
collection of names, definitions, and attributes for data elements that are being used in a database. The Data
Dictionary stores metadata, i.e., data about the database.
Data Dictionary is made up of two words, data which means the collected information through multiple sources,
and dictionary meaning the place where all this information is made available.
A data dictionary is a crucial part of a relational database as it provides additional information about the
relationships between multiple tables in a database. The data dictionary in DBMS helps the user to arrange data
in a neat and well-organized way, thus preventing data redundancy.
Data Dictionary in DBMS provides additional information about relationships between multiple database tables,
helps to organize data, and prevents data redundancy in DBMS.
A data dictionary is a set of files that contain a database's metadata. Thus, it is also known as a metadata repository. Storing the relational schemas and other metadata about the relations in a structure is known as a Data Dictionary or System Catalog.
A data dictionary is like the A-Z dictionary of the relational database system holding all information of each
relation in the database.
There are mainly two types of data dictionary in a database management system: an active data dictionary, which is maintained automatically by the DBMS itself, and a passive data dictionary, which is maintained manually, separately from the database.
THREE LEVEL ARCHITECTURE
A database system provides three levels of data abstraction:
• External level.
• Conceptual level.
• Internal level.
The main objective of the three-level architecture is to separate each user's view of the data from the way the database is physically represented. The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all other users.
• ER Diagram: ER diagrams are the diagrams that are sketched out to design the database. They are
created based on three basic concepts: entities, attributes, and relationships between them. In ER
diagram we define the entities, their related attributes, and the relationships between them. This helps
in illustrating the logical structure of the databases.
• Database Design: The Entity-Relationship model helps the database designers to build the database in
a very simple and conceptual manner.
• Graphical Representation helps in Better Understanding: ER diagrams are very easy and simple to
understand and so the developers can easily use them to communicate with stakeholders.
• Easy to build: The ER model is very easy to build.
• The extended E-R features: Some of the additional features of ER model are specialization, upper and
lower-level entity sets, attribute inheritance, aggregation, and generalization.
• Integration of ER model: This model can be integrated into a common dominant relational model and is
widely used by database designers for communicating their ideas.
• Simplicity and various applications of ER model: It provides a preview of how all your tables should
connect, and what fields are going to be on each table, which can be used as a blueprint for
implementing data in specific software applications.
DESIGN ISSUES
Entity-Relationship Design Issues
The notions of an entity set and a relationship set are not precise, and it is possible to define a set of entities
and the relationships among them in a number of different ways.
Basic issues in the design of an E-R database schema include whether to use an attribute or an entity set to represent an object, whether to use an entity set or a relationship set to represent a real-world concept, and whether to use a binary or an n-ary relationship set. For example, a ternary course-registration relationship can be replaced by two binary relationship sets, one to relate course-registration records to students and one to relate course-registration records to sections.
ER Design Methodologies
The guidelines that should be followed while designing an ER diagram are discussed below:
• Relationship sets
• Dependencies
• Discriminators
• Design diagram
MAPPING CONSTRAINT
Cardinality means how the entities are related to each other, or what the relationship structure between entities in a relationship set is.
In a Database Management System, Cardinality represents a number that denotes how many times an entity is
participating with another entity in a relationship set. The Cardinality of DBMS is a very important attribute in
representing the structure of a Database. In a table, the number of rows or tuples represents the Cardinality.
Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the mapping of one entity set to another
entity set in a relationship set. We generally take the example of a binary relationship set where two entities are
mapped to each other.
Cardinality is very important in the Database of various businesses. For example, if we want to track the
purchase history of each customer then we can use the one-to-many cardinality to find the data of a specific
customer. The Cardinality model can be used in Databases by Database Managers for a variety of purposes, but
corporations often use it to evaluate customer or inventory data.
1. One to one
2. Many to one
3. One to many
4. Many to many
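For instance, under one-to-many cardinality between customers and purchases, the purchase history of one customer can be retrieved with a join; the Customer and Purchase tables and their columns below are hypothetical:
SELECT c.name, p.item, p.amount
FROM Customer c
JOIN Purchase p ON p.customer_id = c.customer_id
WHERE c.customer_id = 101;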
KEYS
KEYS in DBMS is an attribute or set of attributes which helps you to identify a row(tuple) in a relation(table).
They allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by a
combination of one or more columns in that table.
1. Super Key
2. Primary Key
3. Candidate Key
4. Alternate Key
5. Foreign Key
6. Compound Key
7. Composite Key
8. Surrogate Key
• Super Key – A super key is a group of single or multiple keys which identifies rows in a table.
• Primary Key – is a column or group of columns in a table that uniquely identify every row in that table.
• Candidate Key – is a set of attributes that uniquely identify tuples in a table. A candidate key is a super key with no redundant attributes.
• Alternate Key – is a column or group of columns in a table that uniquely identify every row in that table; any candidate key that is not chosen as the primary key is an alternate key.
• Foreign Key – is a column that creates a relationship between two tables. The purpose of Foreign keys is
to maintain data integrity and allow navigation between two different instances of an entity.
• Compound Key – has two or more attributes that allow you to uniquely recognize a specific record. It is
possible that each column may not be unique by itself within the database.
• Composite Key – is a combination of two or more columns that uniquely identify rows in a table. The
combination of columns guarantees uniqueness, though individual uniqueness is not guaranteed.
• Surrogate Key – An artificial key which aims to uniquely identify each record is called a surrogate key. These kinds of keys are unique; they are created when you don't have any natural primary key.
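The following sketch (hypothetical tables and columns) shows several of these key types side by side:
CREATE TABLE Student (
    student_id INT PRIMARY KEY,               -- surrogate key used as the primary key
    roll_no    VARCHAR(10) UNIQUE NOT NULL,   -- candidate key not chosen as primary = alternate key
    email      VARCHAR(100) UNIQUE,           -- another candidate/alternate key
    name       VARCHAR(50)
);
CREATE TABLE Enrollment (
    student_id INT REFERENCES Student(student_id),   -- foreign key
    course_id  INT,
    PRIMARY KEY (student_id, course_id)               -- composite key
);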
ER DIAGRAM
• ER diagram was proposed by Peter Chen in 1971 as a visual tool to represent the ER model.
• ER diagrams are created based on three basic concepts: entities, attributes, and relationships between
them.
• Any object that physically exists and is logically constructed in the real world is called an entity.
• Strong entities are those entity types that have a key attribute, whereas weak entity type doesn’t have
a key attribute and so we cannot uniquely identify them by their attributes alone.
• Attributes are the characteristics or properties which define the entity type.
• Relationship is nothing but an association among two or more entities.
• A simple attribute is an attribute that cannot be decomposed further. A composite attribute is
an attribute that can be decomposed into simpler attributes.
• A multivalued attribute is an attribute that can have multiple values, and a derived attribute is
an attribute that can be derived from other attributes of an entity type.
• When only one instance of an entity is associated with the relationship to one instance of
another entity, then it is known as one to one relationship.
• If only one instance of the entity on the left side of the relationship is linked to multiple instances of the entity on the right side, then it is known as a one-to-many relationship.
• If multiple instances of the entity on the left side of the relationship are linked to only one instance of the entity on the right side, then it is known as a many-to-one relationship.
• If multiple instances of the entity on the left are linked by the relationship to multiple instances of the entity on the right, then it is known as a many-to-many relationship.
Component of ER Diagram
1. Entities:- An entity is anything in the real world, such as an object, class, person, or place. Objects that
physically exist and are logically constructed in the real world are called entities. Each entity consists of
several characteristics or attributes that describe that entity. For example, if a person is an entity,
its attributes or characteristics are age, name, height, weight, occupation, address, hobbies, and so on.
2. Attributes:- Attributes are the characteristics or properties which define the entity type. In ER diagram,
the attribute is represented by an oval.
For example, here id, Name, Age, and Mobile_No are the attributes that define the entity type Student.
1. Simple attribute: Attributes that cannot be further decomposed into sub-attributes are called simple attributes. A simple attribute holds an atomic value and is represented by an oval shape in ER diagrams; if it is also the key attribute, its name is underlined.
For example, the roll number of a student, or the student's contact number are examples of simple
attributes.
2. Composite attribute: An attribute that is composed of many other attributes and can be decomposed
into simple attributes is known as a composite attribute in DBMS. The composite attribute is
represented by an ellipse.
For example, a student's address can be divided into city, state, country, and pin code or a full name can
be divided into first name, middle name, and last name.
3. Multivalued attribute: Multivalued attributes in DBMS are attributes that can have more than one
value. The double oval is used to represent a multivalued attribute.
For example, the mobile_number of a student is a multivalued attribute as one student can have more
than one mobile number.
4. Derived attribute: Derived attributes in DBMS are the ones that can be derived from other attributes of
an entity type. The derived attributes are represented by a dashed oval symbol in the ER diagram.
For example, the age attribute can be derived from the date of birth (DOB) attribute. So, it's a derived
attribute.
3. Relationships:- The concept of relationship in DBMS is used to describe the relationship between
different entities. This is denoted by the diamond or a rhombus symbol. For example, the
teacher entity type is related to the student entity type and their relation is represented by the
diamond shape.
1. One-to-One Relationships: When only one instance of an entity is associated with the relationship to
one instance of another entity, then it is known as one to one relationship. For example, let us assume
that a male can marry one female and a female can marry one male. Therefore the relation is one-to-
one.
2. One-to-Many Relationships: If only one instance of the entity on the left side of the relationship is linked to multiple instances of the entity on the right side, then this is a one-to-many relationship. For example, a Scientist can invent many inventions, but each invention is made by only that specific scientist.
3. Many-to-One Relationships: If multiple instances of the entity on the left side of the relationship are linked to only one instance of the entity on the right side, then this is a many-to-one relationship. For example, a Student enrolls for only one course, but a course can have many students.
4. Many to Many Relationships: If multiple instances of the entity on the left are linked by the relationship to multiple instances of the entity on the right, this is a many-to-many relationship. For example, one employee can be assigned many projects, and one project can be assigned to many employees.
2. Weak Entity – Weak entity type doesn’t have a key attribute and so we cannot uniquely identify them
by their attributes alone. Therefore, a foreign key must be used in combination with its attributes to
create a primary key. They are called weak entity types because they can't be identified on their own; a weak entity relies on another strong entity for its unique identity. A weak entity is represented by a double-
outlined rectangle in ER diagrams.
For example, the address can't be used to uniquely identify students as there can be many students from the
same locality. So, for this, we need an attribute of Strong Entity Type i.e ‘student’ to uniquely
identify entities of Address Entity Type.
The relationship between a weak entity type and a strong entity type is shown with a double-outlined diamond
instead of a single-outlined diamond. This representation can be seen in the image given below.
Each entity in the set of strong entities can be uniquely identified because it has a primary key, whereas an entity in a set of weak entities cannot be uniquely identified on its own, because it has no primary key and may contain redundant entities.
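A weak entity type is usually mapped to a table whose primary key combines the owner's key with the weak entity's partial key (discriminator); the tables below are only a sketch of the Student/Address example, with hypothetical columns:
CREATE TABLE Student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50)
);
CREATE TABLE Address (
    roll_no  INT REFERENCES Student(roll_no),  -- key of the owning strong entity
    addr_no  INT,                              -- partial key of the weak entity
    city     VARCHAR(50),
    pin_code VARCHAR(10),
    PRIMARY KEY (roll_no, addr_no)
);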
Generalization
o Generalization is a bottom-up approach in which two or more lower-level entities are combined to form a single higher-level entity on the basis of their common features.
o For example, the Faculty and Student entities can be generalized to create a higher-level entity Person.
Specialization
o Specialization is a top-down approach, and it is the opposite of Generalization. In specialization, one higher-level entity can be broken down into two or more lower-level entities.
o Specialization is used to identify the subset of an entity set that shares some distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and relationship sets are then added.
o For example: In an Employee management system, the EMPLOYEE entity can be specialized as TESTER or DEVELOPER based on what role they play in the company.
AGGREGATION
In aggregation, the relationship between two entities is treated as a single entity. In aggregation, a relationship together with its corresponding entities is aggregated into a higher-level entity.
For example: the Center entity and the Course entity, together with the relationship "offers" between them, act as a single entity that is in a relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, he will never enquire about the Course alone or just about the Center; instead, he will enquire about both.
INHERITANCE
Inheritance is an important feature of Generalization
and Specialization. It allows lower-level entities to
inherit the attributes of higher-level entities.
DESIGN OF ER SCHEMA
ER Models in Database Design
They are widely used to design relational databases. The entities in the ER schema become tables, attributes become columns, and relationships are converted into the database schema. Since they can be used to visualize database tables and their relationships, ER models are also commonly used for database troubleshooting.
REDUCTION OF ER
The notations in an ER diagram can be used to represent the database, and these notations can be reduced to a collection of tables.
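As a hedged illustration of this reduction (entity and attribute names are hypothetical), each entity set becomes a table, its attributes become columns, and a one-to-many relationship is reduced to a foreign key on the "many" side:
CREATE TABLE Course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(50)
);
CREATE TABLE Section (
    section_id INT PRIMARY KEY,
    course_id  INT REFERENCES Course(course_id)   -- relationship reduced to a foreign key
);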
RELATIONS
• A one-to-one relationship means when a single record in the first table is related to only one record in
the other table.
• A one-to-many relationship is defined as when a single record in the first table is related to one or more
records in the other table, but a single record in the other table is related to only one record in the first
table.
• A many-to-many relationship can be defined as when a single record in the first table is related to one or more records in the second table and a single record in the second table is related to one or more records in the first table.
• A well-defined relationship adds more integrity to the table structure and makes the DBMS more
efficient.
KIND OF RELATIONS
• one-to-one
• one-to-many, and
• many-to-many
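A many-to-many relationship is usually implemented with a third, linking (junction) table; this is only a sketch with hypothetical names:
CREATE TABLE Employee (emp_id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE Project  (proj_id INT PRIMARY KEY, title VARCHAR(50));
CREATE TABLE Assignment (                            -- junction table
    emp_id  INT REFERENCES Employee(emp_id),
    proj_id INT REFERENCES Project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);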
RELATIONAL DATABASE
A relational database is a collection of information that organizes data in predefined relationships where data is
stored in one or more tables (or "relations") of columns and rows, making it easy to see and understand how
different data structures relate to each other.
A relational database (RDB) is a way of structuring information in tables, rows, and columns.
An RDB has the ability to establish links, or relationships, between information by joining tables, which makes it easy to understand and gain insights about the relationship between various data points.
A relational database is a type of database that stores and provides access to data points that are related to one
another.
Relational databases are based on the relational model, an intuitive, straightforward way of representing data in
tables.
In a relational database, each row in the table is a record with a unique ID called the key. The columns of the
table hold attributes of the data, and each record usually has a value for each attribute, making it easy to
establish the relationships among data points.
Candidate Key
A candidate key is a super key that contains no redundant attributes; in other words, it is a minimal super key.
The role of a candidate key is to identify a table row uniquely. Also, the value of a candidate key cannot be Null. A candidate key is described as having "no redundant attributes" and as being a "minimal representation of a tuple".
Primary Key
A Primary Key is the minimal set of attributes of a table that has the task to uniquely identify the rows, or we
can say the tuples of the given particular table.
A primary key of a relation is one of the possible candidate keys which the database designer chooses as the primary one. It may be selected for convenience, performance and many other reasons.
There are certain keys in DBMS that are used for different purposes, from which the most commonly known is
the Primary Key.
o A primary key may be composed of a single attribute known as single primary key or more than one
attribute known as composite key.
o The data values for the primary key attribute should not be null.
o Attributes which are part of a primary key are known as Prime attributes.
o If the primary key is made of more than one attribute then those attributes are irreducible.
o We use the convention that the attributes that form the primary key of a relation are underlined.
o Columns that are defined as LONG or LONG RAW cannot be part of a primary key.
FOREIGN KEY
In the relational databases, a foreign key is a field or a column that is used to establish a link between two
tables.
In simple words, you can say that a foreign key in one table is used to point to the primary key in another table.
A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the
data in two tables to control the data that can be stored in the foreign key table.
RELATIONAL ALGEBRA
Relational algebra uses operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation, and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
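For orientation, each fundamental operation has a rough SQL counterpart; the Student and related tables below are hypothetical, and EXCEPT (called MINUS in Oracle) is used for set difference:
-- Select (σ): choose rows satisfying a condition
SELECT * FROM Student WHERE age > 20;
-- Project (π): choose columns (duplicates removed)
SELECT DISTINCT name FROM Student;
-- Union
SELECT roll_no FROM CS_Student UNION SELECT roll_no FROM Math_Student;
-- Set difference
SELECT roll_no FROM CS_Student EXCEPT SELECT roll_no FROM Math_Student;
-- Cartesian product
SELECT * FROM Student, Course;
-- Rename: give a relation or column a new name
SELECT s.name AS student_name FROM Student s;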
CODD’S RULE
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came up with twelve
rules of his own, which according to him, a database must obey in order to be regarded as a true relational
database.
Rule 0: The Foundation Rule
These rules can be applied to any database system that manages stored data using only its relational capabilities. This is a foundation rule, which acts as a base for all the other rules. The database must be in relational form so that the system can handle the database through its relational capabilities.
Rule 1: Information Rule
A database contains various information, and this information must be stored in the cells of tables in the form of rows and columns.
Rule 2: Guaranteed Access Rule
Every single piece of data (atomic value) must be logically accessible from the relational database using a combination of table name, primary key value, and column name.
Rule 3: Systematic Treatment of Null Values
This rule defines the systematic treatment of Null values in database records. A null value has various meanings in the database, such as missing data, no value in a cell, inappropriate information, or unknown data; a primary key value should never be null.
Rule 4: Active Online Catalog Rule
The entire logical structure (description) of the database must be stored online in a catalog known as the data dictionary. Authorized users can access this catalog using the same query language that they use to access the database itself.
Rule 5: Comprehensive Data Sub-language Rule
The relational database may support various languages, but there must be at least one language with a well-defined, linear syntax and character-string representation that comprehensively supports data definition, view definition, data manipulation, integrity constraints, and transaction management operations. If the database allows access to the data without any such language, it is considered a violation of this rule.
Rule 6: View Updating Rule
All views that are theoretically updatable must also be practically updatable by the database system.
Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system should follow high-level relational operations such as insert, update, and delete in each level
or a single row. It also supports union, intersection and minus operation in the database system.
Rule 8: Physical Data Independence Rule
All stored data must be physically independent of the applications that access it. The data should not depend on other data or on an application. If data is updated or the physical structure of the database is changed, it should not have any effect on the external applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means that if any changes occur at the logical level (table structures), they should not affect the user's view (application). For example, if a table is split into two tables, or two tables are joined to create a single table, these changes should not impact the user's view or application.
Rule 10: Integrity Independence Rule
A database must maintain integrity independence when inserting data into tables' cells using the SQL query language. Entered values should not have to rely on any external factor or application to maintain integrity. This also helps in making the database independent of each front-end application.
Rule 11: Distribution Independence Rule
The distribution independence rule states that a database must work properly even if its data is stored in different locations and used by different end users. When a user accesses the database through an application, the user should not be aware that the data is distributed; it should appear as if all the data is located at a single site. End users should be able to run their SQL queries in the same way regardless of where the data actually resides.
Rule 12: Non-Subversion Rule
The non-subversion rule states that the RDBMS uses the SQL language to store and manipulate the data in the database. If a system provides a low-level or separate language other than SQL to access the database, that language must not be able to subvert or bypass the integrity rules when transforming the data.
SET OPERATIONS
The SQL Set operation is used to combine the two or more SQL SELECT statements.
1. Union :-
Union
o The SQL Union operation is used to combine the result of two or more SQL SELECT queries.
o In the union operation, the number of columns and their datatypes must be the same in both the tables on which the UNION operation is being applied.
o The union operation eliminates the duplicate rows from its resultset.
Syntax
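A generic form (table and column names are placeholders):
SELECT column_list FROM table1
UNION
SELECT column_list FROM table2;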
2. Union All:-
The Union All operation is similar to the Union operation, but it returns the result set without removing duplicates and without sorting the data.
Syntax:
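A generic form (placeholders as before):
SELECT column_list FROM table1
UNION ALL
SELECT column_list FROM table2;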
3. Intersect:-
o It is used to combine two SELECT statements. The Intersect operation returns the common rows from
both the SELECT statements.
o In the Intersect operation, the number of columns and their datatypes must be the same.
o It has no duplicates and it arranges the data in ascending order by default.
Syntax
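A generic form (placeholders as before):
SELECT column_list FROM table1
INTERSECT
SELECT column_list FROM table2;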
4. Minus:-
o It combines the result of two SELECT statements. Minus operator is used to display the rows which are
present in the first query but absent in the second query.
o It has no duplicates and data arranged in ascending order by default.
Syntax:
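A generic form (placeholders as before; MINUS is Oracle's keyword, standard SQL uses EXCEPT):
SELECT column_list FROM table1
MINUS
SELECT column_list FROM table2;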
AGGREGATE FUNCTIONS
o SQL aggregation function is used to perform the calculations on multiple rows of a single column of a
table. It returns a single value.
An aggregate function performs a calculation on a set of values, and returns a single value. Except for COUNT(*),
aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the
SELECT statement.
Aggregate functions are a vital component of database management systems. They allow us to perform
calculations on large data sets quickly and efficiently.
• Count():-
o COUNT function is used to Count the number of rows in a database table. It can work on both numeric
and non-numeric data types.
o The COUNT(*) form of the function returns the count of all the rows in a specified table; COUNT(*) considers duplicate and Null values.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
• Sum():-
The SUM function is used to calculate the sum of the values of the selected column. It works on numeric fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
• Avg():-
The AVG function is used to calculate the average value of the numeric type. AVG function returns the average
of all non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
• Min():-
MIN function is used to find the minimum value of a certain column. This function determines the smallest value
of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )
• Max():-
MAX function is used to find the maximum value of a certain column. This function determines the largest value
of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
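Putting the aggregate functions together on a hypothetical Result(roll_no, course_id, marks) table:
SELECT COUNT(*), SUM(marks), AVG(marks), MIN(marks), MAX(marks)
FROM Result;
-- grouped form, one row per course
SELECT course_id, AVG(marks)
FROM Result
GROUP BY course_id;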
NULL VALUES
A field with a NULL value is a field with no value.
If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to
this field. Then, the field will be saved with a NULL value.
NULL in SQL represents a column field in the table with no value. NULL is different from a zero value and from
"none".
SUBQUERIES
Important Rule:
o A subquery can be placed in a number of SQL clauses like WHERE clause, FROM clause, HAVING clause.
o You can use Subquery with SELECT, UPDATE, INSERT, DELETE statements along with the operators like =,
<, >, >=, <=, IN, BETWEEN, etc.
o A subquery is a query within another query. The outer query is known as the main query, and the inner
query is known as a subquery.
o In the Subquery, ORDER BY command cannot be used. But GROUP BY command can be used to perform
the same function as ORDER BY command.
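A small sketch of a subquery in a WHERE clause (Student and Result are hypothetical tables):
SELECT name
FROM Student
WHERE roll_no IN (SELECT roll_no FROM Result WHERE marks > 80);
Here the inner SELECT is the subquery and the outer SELECT is the main query.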
DERIVED RELATIONS
A derived relation is a relation instance resulting from the evaluation of a relational algebra expression over a
database instance.
DDL IN SQL
The Data Definition Language is made up of SQL commands that can be used to design the database structure. It
simply handles database schema descriptions and is used to construct and modify the structure of database
objects in the database.
• CREATE Command: The database or its objects are created with this command (like table, index,
function, views, store procedure, and triggers). There are two types of CREATE statements in SQL, one
is for the creation of a database and the other for a table.
• DROP Command: The DROP command can be used to delete a whole database or simply a table that
means entire data will also be deleted. The DROP statement deletes existing objects such as databases,
tables, indexes, and views.
• ALTER Command: In an existing table, this command is used to add, delete/drop, or edit columns. It
can also be used to create and remove constraints from a table that already exists.
• TRUNCATE Command: It is used to mark the table's extents for deallocation (empty for reuse). This
procedure removes all data from a table quickly, usually circumventing a number of integrity checking
processes. It was included in the SQL:2008 standard for the first time. It is somewhat equivalent to the
delete command.
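Hedged examples of these commands on a hypothetical Student table:
CREATE TABLE Student (roll_no INT PRIMARY KEY, name VARCHAR(50));
ALTER TABLE Student ADD email VARCHAR(100);     -- add a column
ALTER TABLE Student DROP COLUMN email;          -- remove a column
TRUNCATE TABLE Student;                         -- remove all rows, keep the table
DROP TABLE Student;                             -- remove the table itself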
DOMAIN RULES
A domain is a unique set of values that can be assigned to an attribute in a database. For example, a domain of
strings can accept only string values.
1. NOT NULL :
The Not Null constraint prevents a column from accepting null values. This implies that you can't create
a new record or change an existing one without first putting a value in the field.
Example :
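A minimal sketch (the Student table and its columns are illustrative, not from the notes):
CREATE TABLE Student (
    roll_no INT NOT NULL,
    name    VARCHAR(50) NOT NULL
);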
2. Check :
It restricts the value of a column across ranges. It can also be understood as it's like a condition or filter
checking before saving data into a column since it defines a condition that each row must satisfy.
Example :
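A minimal sketch (the Result table is illustrative):
CREATE TABLE Result (
    roll_no INT,
    marks   INT CHECK (marks BETWEEN 0 AND 100)
);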
ATTRIBUTE RULES
▪ Attribute rules enhance the editing experience and improve data integrity for
geodatabase datasets. They are user-defined rules that can be used to automatically
populate attributes, restrict invalid edits during edit operations, and perform quality
assurance checks on existing features.
▪ Attribute rules are complementary to existing rules used in the geodatabase, such as
domains and subtypes.
▪ When you create an attribute rule, you must specify the rule type to use. The
attribute rule type chosen depends on the task and at what point in the editing
process the rule needs to be evaluated.
▪ Attribute rules are viewed, created, and managed in their own tabular-style view
called the Attribute Rules view.
▪ The Attribute Rules view can be accessed using the context menu of the dataset
directly from the Catalog or Contents pane.
▪ It can also be accessed by clicking the Attribute Rules button in the Data Design group
on the Data tab for a feature layer or Standalone Table tab for a table when an active
layer in the map view is selected or when using the Fields or Subtypes view.
A trigger is a database object that is associated with the table, it will be activated when a defined action is
executed for the table. The trigger can be executed when we run the following statements:
1. INSERT
2. UPDATE
3. DELETE
Syntax –
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
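A small illustration using MySQL-style syntax (the Student and Student_Log tables are hypothetical):
CREATE TRIGGER student_insert_log
AFTER INSERT ON Student
FOR EACH ROW
INSERT INTO Student_Log (roll_no, action) VALUES (NEW.roll_no, 'INSERT');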
Assertions are different from check constraints in the way that check constraints are rules that relate to one single
row only. Assertions, on the other hand, can involve any number of other tables, or any number of other rows in
the same table. Assertions also check a condition, which must return a Boolean value.
The main differences between assertions and triggers are:
1. Assertions are used when we know that the given condition is always true; triggers can be used even when a particular condition may or may not be true.
2. Assertions are not linked to a specific table or event; they perform the task specified or defined by the user. Triggers help in maintaining the integrity constraints in the database tables, especially when the primary key and foreign key constraints are not defined.
3. Assertions do not maintain any track of changes made in a table; triggers maintain track of all changes that occur in a table.
4. Assertions have a small syntax compared to triggers; triggers have a large syntax to indicate each and every specific of the created trigger.
5. Modern databases do not use assertions; triggers are very well used in modern databases.
6. Granularity: an assertion applies to the entire database; a trigger applies to a specific table or view.
7. Syntax: assertions use SQL statements; triggers use procedural code (e.g. PL/SQL, T-SQL).
8. Error handling: a violated assertion causes the transaction to be rolled back; a trigger can ignore errors or handle them explicitly.
9. Debugging: assertions are easy to debug as SQL statements; triggers are more difficult to debug because of their procedural code.
10. Examples: assertions: CHECK constraints, FOREIGN KEY constraints; triggers: AFTER INSERT triggers, INSTEAD OF triggers.
▪ There are four types of data integrity in SQL. Domain integrity, Entity integrity, Referential integrity,
and User-defined integrity. All these ensure that data integrity is maintained in any table in SQL.
▪ Integrity constraints are a set of rules. It is used to maintain the quality of information.
▪ Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
▪ Thus, integrity constraint is used to guard against accidental damage to the database.
In SQL, we basically have four types of data integrity. Let’s go through all of them in detail:
Domain integrity:- The authenticity of inputs for a particular column is called domain integrity in SQL.
Entity integrity:- Entity integrity requires that every row in the table should have distinct records. Thus, there
must be no duplicate rows.
Referential integrity:- Relationships are fundamental to referential integrity in SQL. Whenever two or more tables are linked, we must guarantee that the foreign key value always matches a value of the primary key in the main table. A situation in which the foreign key's value has no corresponding primary key value in the main table is
incorrect. As a result, the record would be considered an orphaned record in SQL.
User-defined integrity:- This type of integrity allows the user to implement business rules to any database which
are not covered by the other 3 types of data integrity.
UNIT III
FUNCTIONAL DEPENDENCIES AND NORMALIZATION: BASIC DEFINITIONS
Functional dependencies are relationships between attributes in a database. They describe how one attribute is
dependent on another attribute.
Functional dependencies can be used to design a database in a way that eliminates redundancy and ensures
data integrity.
• In Trivial functional dependency, a dependent is always a subset of the determinant. In other words, a
functional dependency is called trivial if the attributes on the right side are the subset of the attributes
on the left side of the functional dependency.
• X → Y is called a trivial functional dependency if Y is a subset of X. For example, in a Student relation, {roll_no, name} → name is a trivial dependency.
Need:-
• Working with the set containing extraneous functional dependencies increases the computation time.
• Therefore, the given set is reduced by eliminating the useless functional dependencies.
• This reduces the computation time and working with the irreducible set becomes easier.
INTRODUCTION TO NORMALIZATION
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
o Normalization is the process of organizing the data and the attributes of a database. It is performed to
reduce the data redundancy in a database and to ensure that data is stored logically.
o Normalization is a database design technique that reduces data redundancy and eliminates undesirable
characteristics like Insertion, Update and Deletion Anomalies.
o Normalization rules divides larger tables into smaller tables and links them using relationships.
o The purpose of Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is
stored logically.
o The inventor of the relational model Edgar Codd proposed the theory of normalization of data with the
introduction of the First Normal Form, and he continued to extend theory with Second and Third
Normal Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
o Normalization in DBMS is a process which helps produce database systems that are cost-effective and
have better security models.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
FD DIAGRAM
In a functional dependency diagram (FDD), functional dependency is represented by rectangles representing
attributes and a heavy arrow showing dependency. The figure shows a functional dependency diagram for the simplest functional dependency, that is, FD: Y -> X.
In functional dependency diagram, each FD is displayed as a horizontal line.
The left-hand side attributes of the FD, i.e. Determinants, are connected by Vertical lines to line representing
the FD.
The right-hand side attributes are connected by arrows pointing towards the attributes.
o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
o In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation must be in third normal
form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
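A sketch of removing a transitive dependency (hypothetical relation Student(roll_no, dept_id, dept_name), where roll_no → dept_id and dept_id → dept_name):
-- dept_name depends on roll_no only transitively, so it is moved to its own table
CREATE TABLE Department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50)
);
CREATE TABLE Student (
    roll_no INT PRIMARY KEY,
    dept_id INT REFERENCES Department(dept_id)
);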
DEPENDENCY PRESERVATION
A form of decomposition known as dependency-preserving decomposition preserves the dependencies between the attributes. This implies that the original table's functional dependencies will still hold in the decomposed tables. A dependency-preserving decomposition is not necessarily a lossless-join decomposition, though.
o In the dependency preservation, at least one decomposed table must satisfy every dependency.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part
of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A->BC
is a part of relation R1(ABC).
BCNF
Boyce Codd normal form (BCNF)
o BCNF is the advance version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Boyce and Codd Normal Form is a higher version of the Third Normal Form. This form deals with certain types of anomalies that are not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, the following conditions must be satisfied:
1. The relation should be in 3NF.
2. For every functional dependency X → Y, X should be a super key of the table.
• If a relation is in 4NF and does not contain any join dependencies, it is in 5NF.
• To avoid redundancy, 5NF is satisfied when all tables are divided into as many tables as possible.
• A relation is said to have join dependency if it can be recreated by joining multiple sub relations and
each of these sub relations has a subset of the attributes of the original relation.
UNIT IV
TRANSACTION
o A transaction can be defined as a group of tasks. A single task is the minimum processing unit which
cannot be divided further.
o The transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions. It is performed by a single user to perform operations for
accessing the contents of the database.
o A transaction usually means that the data in the database has changed. One of the major uses of DBMS
is to protect the user’s data from system failures. It is done by ensuring that all the data is restored to a
consistent state when the computer is restarted after a crash.
o The transaction is any one execution of the user program in a DBMS. One of the important properties
of the transaction is that it contains a finite number of steps. Executing the same program multiple
times will generate multiple transactions.
Operations in Transaction-
The main operations in a transaction are-
1. Read Operation
2. Write Operation
1. Read Operation-
• Read operation reads the data from the database and then stores it in the buffer in main memory.
• For example- Read(A) instruction will read the value of A from the database and will store it in the
buffer in main memory.
2. Write Operation-
• Write operation writes the updated data value back to the database from the buffer.
• For example- Write(A) will write the updated value of A from the buffer to the database.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to
B's account. This very simple and small transaction involves several low-level tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
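The same transfer expressed as an SQL transaction (MySQL-style syntax; the Account table and its columns are hypothetical):
START TRANSACTION;
UPDATE Account SET balance = balance - 500 WHERE account_no = 'A';
UPDATE Account SET balance = balance + 500 WHERE account_no = 'B';
COMMIT;   -- or ROLLBACK; if anything went wrong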
The principles of concurrency in operating systems are designed to ensure that multiple processes or threads
can execute efficiently and effectively, without interfering with each other or causing deadlock.
ACID PROPERTIES
There are properties that all transactions should follow and possess. The four basic properties are, in combination, termed the ACID properties. ACID properties are used for maintaining the integrity of the database during transaction processing. ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability. The ACID properties of a transaction were put forward by Haerder and Reuter in 1983.
Atomicity
All changes to data are performed as if they are a single operation. That is, all the changes are performed, or
none of them are.
The term atomicity defines that the data remains atomic. It means if any operation is performed on the data,
either it should be performed or executed completely or should not be executed at all. It further means that the
operation should not break in between or execute partially.
For example, in an application that transfers funds from one account to another, the atomicity property ensures
that, if a debit is made successfully from one account, the corresponding credit is made to the other account.
Consistency
Data is in a consistent state when a transaction starts and when it ends.
The word consistency means that the value should remain preserved always. In DBMS, the integrity of the data
should be maintained, which means if a change in the database is made, it should remain preserved always. In
the case of transactions, the integrity of the data is very essential so that the database remains consistent before
and after the transaction. The data should always be correct.
For example, in an application that transfers funds from one account to another, the consistency property
ensures that the total value of funds in both the accounts is the same at the start and end of each transaction.
Isolation
The intermediate state of a transaction is invisible to other transactions. As a result, transactions that run
concurrently appear to be serialized.
The term 'isolation' means separation. In DBMS, isolation is the property that concurrently executing transactions do not affect one another; each transaction behaves as if no other transaction were running at the same time.
For example, in an application that transfers funds from one account to another, the isolation property ensures
that another transaction sees the transferred funds in one account or the other, but not in both, nor in neither.
Durability
After a transaction successfully completes, changes to data persist and are not undone, even in the event of a
system failure.
Durability ensures the permanency of something. In DBMS, the term durability ensures that the data after the
successful execution of the operation becomes permanent in the database.
For example, in an application that transfers funds from one account to another, the durability property ensures
that the changes made to each account will not be reversed.
TRANSACTION STATES
A transaction goes through many different states throughout its life cycle.
1. Active state-
• This is the first state in the life cycle of a transaction.
• A transaction is called in an active state as long as its instructions are getting executed.
• All the changes made by the transaction now are stored in the buffer in main memory.
2. Partially committed state-
• After the last instruction of the transaction has been executed, it enters into a partially committed state.
• After entering this state, the transaction is considered to be partially committed.
• It is not considered fully committed because all the changes made by the transaction are still stored in
the buffer in main memory.
3. Committed state-
• After all the changes made by the transaction have been successfully stored into the database, it enters
into a committed state.
• Now, the transaction is considered to be fully committed.
Note-
• After a transaction has entered the committed state, it is not possible to roll back the transaction.
• In other words, it is not possible to undo the changes that have been made by the transaction.
• This is because the system has been updated to a new consistent state.
• The only way to undo the changes is by carrying out another transaction called as compensating
transaction that performs the reverse operations.
4. Failed state-
• When a transaction is getting executed in the active state or partially committed state and some failure
occurs due to which it becomes impossible to continue the execution, it enters into a failed state.
5. Aborted state-
• After the transaction has failed and entered into a failed state, all the changes made by it have to be
undone.
• To undo the changes made by the transaction, it becomes necessary to roll back the transaction.
• After the transaction has rolled back completely, it enters into an aborted state.
6. Terminated state-
• This is the last state in the life cycle of a transaction.
• After entering the committed state or aborted state, the transaction finally enters into a terminated
state where its life cycle finally comes to an end.
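The life cycle above can be summarized as a small state machine. The sketch below is only illustrative; the transition table is an assumption that mirrors the description above and is not taken from any specific DBMS:

from enum import Enum, auto

class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    TERMINATED = auto()

# Legal moves between states, mirroring the list above.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.COMMITTED: {TxState.TERMINATED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: {TxState.TERMINATED},
    TxState.TERMINATED: set(),
}

def move(current, nxt):
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt

state = TxState.ACTIVE
state = move(state, TxState.PARTIALLY_COMMITTED)
state = move(state, TxState.COMMITTED)
state = move(state, TxState.TERMINATED)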
Atomicity:
One of the key characteristics of transactions in database management systems (DBMS) is atomicity, which guarantees that every operation within a transaction is handled as a single, indivisible unit of work.
Durability:
One of the key characteristics of transactions in database management systems (DBMS) is durability, which
guarantees that changes made by a transaction once it has been committed are permanently kept in the
database and will not be lost even in the case of a system failure or catastrophe.
Implementation of Atomicity:
A number of strategies are used to establish atomicity in DBMS to guarantee that either all operations inside a
transaction are correctly done or none of them are executed at all.
o Undo Log: An undo log is a mechanism used to keep track of the changes made by a transaction before
it is committed to the database. If a transaction fails, the undo log is used to undo the changes made by
the transaction, effectively rolling back the transaction. By doing this, the database is guaranteed to
remain in a consistent condition.
o Redo Log: A redo log is a mechanism used to keep track of the changes made by a transaction after it is
committed to the database. If a system failure occurs after a transaction is committed but before its
changes are written to disk, the redo log can be used to redo the changes and ensure that the database
is consistent.
o Two-Phase Commit: Two-phase commit is a protocol used to ensure that all nodes in a distributed
system commit or abort a transaction together. This ensures that the transaction is executed atomically
across all nodes and that the database remains consistent across the entire system.
o Locking: Locking is a mechanism used to prevent multiple transactions from accessing the same data
concurrently. By ensuring that only one transaction can edit a specific piece of data at once, locking
helps to avoid conflicts and maintain the consistency of the database.
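A minimal sketch of the undo-log idea described above; the in-memory dictionary standing in for the database and the log format are illustrative assumptions:

# Toy undo log: record the old value before each write so a failed
# transaction can be rolled back.
db = {"A": 1000, "B": 200}
undo_log = []

def write(item, new_value):
    undo_log.append((item, db[item]))   # remember the old value first
    db[item] = new_value                # then apply the change

def rollback():
    while undo_log:                     # undo in reverse order
        item, old_value = undo_log.pop()
        db[item] = old_value

write("A", 500)
write("B", 700)
rollback()                              # simulate a failure before commit
assert db == {"A": 1000, "B": 200}      # original state restored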
The implementation of durability in DBMS involves several techniques to ensure that committed changes are
durable and can be recovered in the event of failure.
o Write-Ahead Logging: Write-ahead logging is a mechanism used to ensure that changes made by a
transaction are recorded in the redo log before they are written to the database. This makes sure that
the changes are permanent and that they can be restored from the redo log in the event of a system
failure.
o Checkpointing: Checkpointing is a technique used to periodically write the database state to disk to
ensure that changes made by committed transactions are permanently stored. Checkpointing aids in
minimizing the amount of work required for database recovery.
o Redundant storage: Redundant storage is a technique used to store multiple copies of the database or
its parts, such as the redo log, on separate disks or systems. This ensures that even in the event of a
disk or system failure, the data can be recovered from the redundant storage.
o RAID: In order to increase performance and reliability, a technology called RAID (Redundant Array of
Inexpensive Disks) is used to integrate several drives into a single logical unit. RAID can be used to
implement redundancy and ensure that data is durable even in the event of a disk failure.
Here are Some Common Techniques used by DBMS to Implement Atomicity and Durability:
o Transactions: Transactions are used to group related operations that need to be executed atomically.
They are either committed, in which case all their changes become permanent, or rolled back, in which
case none of their changes are made permanent.
o Logging: Logging is a technique that involves recording all changes made to the database in a separate
file called a log. The log is used to recover the database in case of a failure. Write-ahead logging is a
common technique that guarantees that data is written to the log before it is written to the database.
o Shadow Paging: Shadow paging is a technique in which modifications are made to a copy of the affected database pages while the original (shadow) pages are left untouched. The shadow copy provides a consistent view of the database in case of failure, and the new pages replace the shadow pages only after the transaction has been committed.
o Backup and Recovery: In order to guarantee that the database can be recovered to a consistent state in
the event of a failure, backup and recovery procedures are used. This involves making regular backups
of the database and keeping track of changes made to the database since the last backup.
CONCURRENT EXECUTIONS
The execution of a concurrent program consists of multiple processes active at the same time. This process is
called a concurrent execution.
If the computer has multiple processors then instructions from a number of processes, equal to the number of
physical processors, can be executed at the same time. This is sometimes referred to as parallel
or real concurrent execution.
In computer science, serializability is a property of a system describing how different processes operate on shared data. A system is serializable if its result is the same as if the operations were executed in some sequential order, meaning there is no overlap in execution. In a DBMS, this can be accomplished by locking data so that no other process can access it while it is being read or written.
Serializability guarantees that the final result is equivalent to some sequential execution, while allowing improved performance because operations that do not conflict with each other may execute concurrently.
Serializability is the property of a schedule whereby each transaction appears to execute atomically and independently, even though they actually execute concurrently. In other words, when several transactions are executed concurrently, the net effect should be as if they had been executed one after another in some serial order.
Types of Serializability
In a database management system (DBMS), serializability requires that transactions appear to happen in a
particular order, even if they execute concurrently. Transactions that are not serializable may produce incorrect
results.
1. Conflict Serializability
Conflict serializability is a type of serializability in which conflicting operations on the same data items are ordered in a way that preserves database consistency. Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write; a schedule is conflict serializable if it can be rearranged into a serial schedule by swapping only non-conflicting operations. For example, consider a database with two tables, Customers and Orders, where a customer can have multiple orders but each order is associated with one customer; if two transactions touch the same order row and at least one of them writes it, those operations conflict and must be ordered as in some serial schedule.
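In practice, conflict serializability is usually tested with a precedence graph: an edge Ti → Tj is added whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when this graph has no cycle. The sketch below is illustrative; the encoding of the schedule as (transaction, operation, item) tuples is an assumption:

# Schedule: list of (transaction_id, operation, data_item) in execution order.
schedule = [(1, "R", "X"), (2, "W", "X"), (1, "W", "X"), (2, "R", "Y")]

def conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    # Detect a cycle in the precedence graph with depth-first search.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def has_cycle(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return not any(has_cycle(n) for n in graph if n not in done)

print(conflict_serializable(schedule))  # False: T1 -> T2 and T2 -> T1 form a cycle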
2. View Serializability
View serializability is a type of serializability in which each transaction produces results that are equivalent to
some well-defined sequential execution of all transactions in the system.
Concurrency Control in DBMS is a procedure of managing simultaneous transactions ensuring their atomicity,
isolation, consistency and serializability.
Concurrency control comes under the topic of transactions in a database management system (DBMS). It is a procedure in DBMS that manages simultaneous processes so that they execute without conflicting with each other; such conflicts occur in multi-user systems.
Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency. If many transactions try to access the same data, then inconsistency can arise. Concurrency control is required to maintain data consistency.
Advantages
The advantages of concurrency control are as follows −
Executing a single transaction at a time increases the waiting time of the other transactions, which may result in a delay in the overall execution. Hence, to increase the overall throughput and efficiency of the system, several transactions are executed concurrently.
Concurrency control is a very important concept of DBMS which ensures the simultaneous execution or manipulation of data by several processes or users without resulting in data inconsistency.
Concurrency control provides a procedure that is able to control concurrent execution of the operations in the
database.
• There are 4 Coffman conditions out of which if one or more are true, then there might occur a
deadlock in the system.
• Deadlock handling and its avoidance are methods to deal with the situation, while the wait-die and wound-wait schemes are two prominent ways of preventing a deadlock.
1. Centralized Approach: This is the simplest and easiest way of deadlock detection as in this only a single
resource is responsible for detecting the deadlock. But it also has its own disadvantages, such as
excessive load on a single node and having only a single point of failure that makes the system less
reliable.
2. Distributed Approach: Unlike the centralized approach, multiple nodes are responsible for detecting deadlock. Because multiple nodes share the work, the load is properly balanced and there is no single point of failure, which further increases the speed of deadlock detection.
3. Hierarchical Approach: This Approach integrates both centralized and distributed approaches for
deadlock detection. In this, a single node is made to handle a particular selected set of nodes
responsible for detecting deadlock.
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up
locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever gets finished and is in
waiting state forever.
Deadlock Avoidance
o It is better to avoid a deadlock than to abort and restart transactions after the database has become stuck in a deadlock state, since aborting and restarting wastes time and resources.
o Deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like
"wait for graph" is used for detecting the deadlock situation but this method is suitable only for the
smaller database. For the larger database, deadlock prevention method can be used.
Deadlock Prevention
o Deadlock prevention method is suitable for a large database. If the resources are allocated in such a
way that deadlock never occurs, then the deadlock can be prevented.
o The database management system analyzes the operations of the transaction to determine whether they can create a deadlock situation. If they can, then the DBMS never allows that transaction to be executed.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
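A small illustrative sketch of detecting a cycle in such a wait-for graph; the graph contents are made-up example data:

# Wait-for graph: transaction -> set of transactions it is waiting for.
wait_for = {
    "T1": {"T2"},
    "T2": {"T3"},
    "T3": {"T1"},   # T3 waits for T1, closing a cycle => deadlock
}

def has_deadlock(graph):
    visiting, done = set(), set()
    def dfs(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(node) for node in graph if node not in done)

print(has_deadlock(wait_for))  # True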
FAILURE CLASSIFICATION
A failure is always related to a required function, which is often specified together with a performance requirement. A failure occurs when the function can no longer be performed or no longer meets its required performance. Failures in a DBMS are classified as follows:
1. Transaction failure
The transaction failure occurs when a transaction fails to execute or when it reaches a point from where it can't go any further. If a transaction or process is damaged part-way through, this is called a transaction failure.
1. Logical errors: If a transaction cannot complete due to some code error or an internal error condition, then a logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction because the database system is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.
2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure which destroys all or part of the disk storage.
HEAP
The non-keyed storage structure with sequential data entry and access. There is also a compressed heap
structure (cheap) with trailing blanks removed.
HASH
A keyed storage structure with algorithmically chosen addresses based on key data values. There is also a
compressed hash structure (chash) with trailing blanks removed.
ISAM
A keyed storage structure in which data is sorted by values in key columns for fast access. The index is static and needs to be reorganized as the table grows. There is also a compressed ISAM structure (cISAM) with trailing blanks removed.
BTREE
A keyed storage structure in which data is sorted by values in key columns, but the index is dynamic and grows
as the table grows. There is also a compressed B-tree structure (cB-tree) with trailing blanks removed.
STABLE STORAGE IMPLEMENTATION
To implement such storage, we need to replicate the needed information on multiple storage devices (usually
disks) with independent failure modes. We need to coordinate the writing of updates in a way that guarantees
that a failure during an update will not leave all the copies in a damaged state and that, when we are recovering
from a failure, we can force all copies to a consistent and correct value, even if another failure occurs during the
recovery. In this section, we discuss how to meet these needs. A disk write results in one of three outcomes:
1. Successful completion:- The transferred data was written correctly on the disk.
2. Partial failure:- A failure occurred in the midst of the transfer, so only some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted.
3. Total failure:- The failure occurred before the disk write started, so the previous data values on the disk remain intact.
Whenever a failure occurs during the writing of a block, the system needs to detect it and invoke a recovery procedure to restore the block to a consistent state.
DATA ACCESS
Data access is the ability to retrieve, modify, copy, or move data from IT systems in any location, whether the
data is in motion or at rest.
Data access works with complementary technologies, including data virtualization and master data
management, to put your data to work on premise, in the cloud and everywhere in between.
Users who have data access can store, retrieve, move or manipulate stored data, which can be stored on a wide
range of hard drives and external devices.
There are two ways to access stored data: random access and sequential access. The sequential method requires
information to be moved within the disk using a seek operation until the data is located. Each segment of data
has to be read one after another until the requested data is found. Reading data randomly allows users to store
or retrieve data anywhere on the disk, and the data is accessed in constant time.
Oftentimes when using random access, the data is split into multiple parts or pieces and located anywhere
randomly on a disk. Sequential files are usually faster to load and retrieve because they require fewer seek
operations.
The goal of data access is to provide individuals and organizations with the ability to access or retrieve data
stored within a repository so users can retrieve, move, or manipulate it across a wide range of use cases.
Data access can involve a range of technologies, tools, and processes. For example, it may involve using a
database management system to store and retrieve data, implementing data security measures to protect
against unauthorized access, and using data analytics tools to visualize, process, and unlock insights.
Overall, data access plays a crucial role in modern organizations, as it enables businesses to reach siloed data
sources to make informed decisions.
Log-Based Recovery
o The log is a sequence of records. The log of each transaction is maintained in some stable storage so that, if any failure occurs, the database can be recovered from it.
o If any operation is performed on the database, then it will be recorded in the log.
o But the process of storing the logs should be done before the actual transaction is applied in the
database.
When the system is crashed, then the system consults the log to find which transactions need to be undone and
which need to be redone.
1. If the log contains both the records <Ti, Start> and <Ti, Commit>, then the transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then the transaction Ti needs to be undone.
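A hedged sketch of this redo/undo decision; the representation of the log as a list of (transaction, action) tuples is an assumption:

# Each log record is (transaction_id, action), e.g. ("T1", "Start").
log = [
    ("T1", "Start"), ("T1", "Commit"),
    ("T2", "Start"),                    # T2 never committed or aborted
]

def classify(log):
    started, finished = set(), set()
    for tid, action in log:
        if action == "Start":
            started.add(tid)
        elif action in ("Commit", "Abort"):
            finished.add(tid)
    redo = {tid for tid, action in log if action == "Commit"}   # committed: redo
    undo = started - finished                                    # incomplete: undo
    return redo, undo

redo, undo = classify(log)
print("redo:", redo)   # {'T1'}
print("undo:", undo)   # {'T2'}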
Atomicity property of DBMS states that either all the operations of transactions must be performed or none. The
modifications done by an aborted transaction should not be visible to database and the modifications done by
committed transaction should be visible. To achieve our goal of atomicity, user must first output to stable
storage information describing the modifications, without modifying the database itself. This information can
help us ensure that all modifications performed by committed transactions are reflected in the database. This
information can also help us ensure that no modifications made by an aborted transaction persist in the
database. Log is a sequence of records, which maintains the records of actions performed by a transaction. It is
important that the logs are written prior to the actual modification and stored on a stable storage media, which
is failsafe.
o The deferred modification technique occurs if the transaction does not modify the database until it has
committed.
o In this method, all the logs are created and stored in the stable storage, and the database is updated
when a transaction commits.
o The Immediate modification technique occurs if database modification occurs while the transaction is
still active.
o In this technique, the database is modified immediately after every operation. It follows an actual
database modification.
CHECKPOINTS
o The checkpoint is a type of mechanism where all the previous logs are removed from the system and
permanently stored in the storage disk.
o The checkpoint is like a bookmark. During the execution of a transaction, such checkpoints are marked, and as the transaction executes, log records are created for its steps.
o When the checkpoint is reached, all the updates recorded in the log up to that point are written to the database, and the log records up to the checkpoint are removed from the log file. The log file is then filled with the steps of the transaction until the next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in the consistent state, and all
transactions were committed.
DISTRIBUTED DATABASES
A distributed database system stores data across multiple sites and typically provides:
o Location independence
o Distributed query processing
o Distributed transaction management
o Hardware independence
o Operating system and network independence
o Transaction transparency
o DBMS independence
Types:
1. Homogeneous Database:- A homogeneous database stores data uniformly across all locations. All
sites utilize the same operating system, database management system, and data structures. They are
therefore simple to handle.
2. Heterogeneous Database:- With a heterogeneous distributed database, different sites may employ different software and schemas, which may cause issues for queries and transactions. Moreover, one site may not even be aware of the existence of the other sites. Different machines may use different operating systems and database applications, and they may even employ different database data models. Translations are therefore necessary for communication between sites.
1. Replication –
In this approach, the entire relation is stored redundantly at two or more sites. If the entire database is available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of data.
This is advantageous as it increases the availability of data at different sites. Also, now query requests can be
processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated: any change made at one site needs to be recorded at every site where that relation is stored, or else it may lead to inconsistency, which is a lot of overhead. Also, concurrency control becomes much more complex, as concurrent access now needs to be checked over a number of sites.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the fragments
is stored in different sites where they’re required. It must be made sure that the fragments are such that they
can be used to reconstruct the original relation (i.e, there isn’t any loss of data).
Fragmentation is advantageous as it doesn't create copies of data, so consistency is not a problem.
DATA REPLICATION
Data replication is the process of making multiple copies of data and storing them at different locations for
backup purposes, fault tolerance and to improve their overall accessibility across a network.
Although data replication can be demanding in terms of cost, computational, and storage requirements,
businesses widely use this database management technique to achieve one or more of the following goals:
1. Improve the availability of data
Data Replication is the process of storing data in more than one site or node. It is useful in improving the
availability of data. It is simply copying data from a database from one server to another server so that all the
users can share the same data without any inconsistency.
Advantages of replication:
1. Improved performance, as data can be read from a local copy of the data instead of a remote one.
2. Increased data availability, as copies of the data can be used in case of a failure of the primary database.
3. Improved scalability, as the load on the primary database can be reduced by reading data from the replicas.
Disadvantages of replication:
1. Increased risk of data inconsistencies, as data can be updated simultaneously on different replicas.
2. Increased storage and network usage, as multiple copies of the data need to be stored and transmitted.
Data replication is widely used in various types of systems, such as online transaction processing systems, data warehousing systems, and distributed systems.
Horizontal Fragmentation
As the name suggests, here the data/records are fragmented horizontally, i.e., a horizontal subset of the table data is created and the subsets are stored in different databases in the DDB.
For example, consider the employees working at different locations of the organization, such as India, the USA, and the UK. The number of employees across all these locations is huge. When the details of any one employee are required, the whole table would need to be accessed to get the information, and the employee table may be stored at any location in the world. But the idea of a DDB is to place data in the nearest database so that it can be accessed quickly. Hence, the entire employee table is divided horizontally based on location.
Vertical Fragmentation
This is the vertical subset of a relation; that is, a relation/table is fragmented by considering its columns. The vertical fragmentation of the EMPLOYEE table may divide it into different tables, each with one or more columns from EMPLOYEE.
Mixed (Hybrid) Fragmentation
This is the combination of horizontal and vertical fragmentation. It applies horizontal fragmentation to obtain subsets of rows to be distributed over the DDB, and vertical fragmentation to obtain subsets of the table's columns. Mixed fragmentation can be done in any order; there is no fixed sequence, and it is based solely on user requirements, but it must still satisfy the fragmentation conditions.
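The sketch below illustrates horizontal and vertical fragmentation of an EMPLOYEE table in plain Python; the column names, rows, and locations are illustrative assumptions:

# EMPLOYEE rows: (emp_id, name, location, salary)
employee = [
    (1, "Asha",  "India", 50000),
    (2, "Bob",   "USA",   70000),
    (3, "Chris", "UK",    65000),
]

# Horizontal fragmentation: subsets of rows, e.g. one fragment per location.
emp_india = [row for row in employee if row[2] == "India"]
emp_usa   = [row for row in employee if row[2] == "USA"]

# Vertical fragmentation: subsets of columns; the key (emp_id) is kept in
# every fragment so the original relation can be reconstructed by a join.
emp_personal = [(eid, name) for eid, name, loc, sal in employee]
emp_payroll  = [(eid, sal) for eid, name, loc, sal in employee]

# Reconstruction of the vertical fragments (a join on emp_id).
rebuilt = [(eid, name, next(sal for e, sal in emp_payroll if e == eid))
           for eid, name in emp_personal]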
UNIT V
EMERGING FIELDS IN DBMS: OBJECT ORIENTED DATABASES-BASIC IDEA AND THE MODEL
Object-oriented databases (OODBs) are an emerging field in DBMS (Database Management Systems) that aim to
provide a data model and storage structure specifically designed for object-oriented programming paradigms.
OODBs extend the traditional relational database model to support the storage and retrieval of complex,
structured objects directly within the database.
The basic idea behind object-oriented databases is to bridge the gap between programming languages and
databases by integrating the concepts of object-oriented programming into the data management system. This
allows developers to work with persistent objects in a more natural and seamless manner.
Object-oriented databases provide a more natural and efficient way to manage complex and structured data that
aligns well with object-oriented programming paradigms. They are particularly useful in domains where complex
data structures and relationships are prevalent, such as computer-aided design (CAD), multimedia applications,
scientific research, and data-intensive software systems.
It's worth noting that while object-oriented databases have their advantages, they are not as widely adopted as
traditional relational databases. Relational databases, such as SQL-based systems, still dominate the mainstream
due to their maturity, standardization, and extensive tooling ecosystem. However, object-oriented concepts and
features are often integrated into relational databases through extensions or object-relational mapping
frameworks, providing a compromise between the two paradigms.
1. Object Structure: The object structure refers to the way in which data is organized and stored within
the OODB. In an OODB, data is represented as objects, which are instances of predefined classes or
types. Each object has a unique identity and encapsulates both data attributes and the methods or
functions that operate on those attributes. The object structure defines the composition and
arrangement of these objects within the database.
• Complex Data Representation: Objects allow for the representation of complex data structures, such as
nested objects or object hierarchies. This enables the modeling of real-world entities and their
relationships in a more natural and intuitive manner.
• Encapsulation: Objects in an OODB encapsulate both data and behavior, following the principles of
object-oriented programming. Encapsulation ensures that an object's internal data and implementation
details are hidden and can only be accessed through defined methods or functions.
2. Object Class: In an OODB, an object class is a blueprint or template for creating objects. It defines the
structure, behavior, and properties that objects of that class will possess. Object classes in OODBs are
similar to classes in object-oriented programming languages.
Object classes provide a structured and organized approach to representing data in an OODB. They define the
characteristics and behaviors of objects, facilitating data modeling, code reuse, and maintaining data integrity.
Overall, the combination of object structure and object classes in OODBs allows for the effective representation,
manipulation, and organization of complex data within a database, aligning well with the principles of object-
oriented programming.
INHERITANCE
Inheritance in DBMS (Database Management Systems) refers to a mechanism that allows the creation of new
database objects based on existing objects, inheriting their attributes, relationships, and behaviors. It is a
concept borrowed from object-oriented programming and is commonly used in object-relational databases.
1. Code Reusability: Inheritance enables the reuse of existing database objects, reducing redundancy and
promoting code reusability. Common attributes, relationships, and behaviors defined in the superclass
need not be redefined in each subclass.
2. Data Consistency: Inheritance helps maintain data consistency by ensuring that common attributes and
relationships are inherited across related objects. Changes made to the superclass propagate to the
subclasses, promoting data integrity and reducing data duplication.
3. Simplified Database Design: Inheritance allows for a more modular and organized database design. By
capturing similarities and differences between related objects in a hierarchical structure, the design
becomes more maintainable and scalable.
It's important to note that inheritance in DBMS is typically found in object-relational databases or object-
oriented extensions of relational databases. Traditional relational databases may not provide explicit support for
inheritance, although relationships between tables can be established to simulate some aspects of inheritance.
In summary, inheritance in DBMS provides a means to create hierarchical relationships between objects,
facilitating code reuse, data consistency, and modular database design. It is a powerful mechanism that brings
the benefits of object-oriented programming to database systems.
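The following small Python sketch illustrates the object-class and inheritance ideas described above; the class names and attributes are assumptions for illustration, not the API of any particular object-oriented database:

class Person:
    """Superclass: common attributes and behaviour."""
    def __init__(self, person_id, name):
        self.person_id = person_id   # object identity
        self.name = name             # encapsulated attribute

    def describe(self):
        return f"{self.person_id}: {self.name}"

class Employee(Person):
    """Subclass: inherits Person's attributes and methods, adds its own."""
    def __init__(self, person_id, name, salary):
        super().__init__(person_id, name)
        self.salary = salary

    def describe(self):              # behaviour can be specialised
        return super().describe() + f" (salary {self.salary})"

e = Employee(101, "Asha", 50000)
print(e.describe())   # the subclass reuses and extends the superclass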
DATA WAREHOUSE
A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than transaction
processing. It includes historical data derived from transaction data from single and multiple sources.
A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on providing support for
decision-makers for data modeling and analysis.
A Data Warehouse is a group of data specific to the entire organization, not only to a particular group of users. It is not used for daily operations and transaction processing but is used for making decisions.
A Data Warehouse can be viewed as a data system with the following attributes:
o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of information.
o Its usage is read-intensive.
o It contains a few large tables.
DATA MINING
Data mining is the process of sorting through large data sets to identify patterns and relationships that can help
solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict
future trends and make more-informed business decisions.
Data mining is one of the most useful techniques that help entrepreneurs, researchers, and individuals to extract
valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Database (KDD).
The knowledge discovery process includes Data cleaning, Data integration, Data selection, Data transformation,
Data mining, Pattern evaluation, and Knowledge presentation.
Data Mining is a process used by organizations to extract specific data from huge databases to solve business
problems. It primarily turns raw data into useful information.
DATABASE ON WWW
Databases on the WWW provide a means for storing, managing, and accessing data over the internet. They play
a crucial role in powering various web applications, e-commerce platforms, content management systems, and
other online services.
A database on the World Wide Web (WWW) in database management systems (DBMS) refers to a collection of
structured data that is accessible and stored on the internet. It allows users to access, retrieve, and manipulate
data using web browsers or other web-based applications.
MULTIMEDIA DATA FORMATS
1. Image Formats:
• JPEG (Joint Photographic Experts Group): A widely used format for compressed images,
suitable for photographs and complex images.
• PNG (Portable Network Graphics): A lossless format for images with support for transparency
and high-quality graphics.
• GIF (Graphics Interchange Format): A format commonly used for animated images and simple
graphics.
• BMP (Bitmap): A basic format that stores uncompressed image data pixel by pixel, resulting in
large file sizes.
2. Audio Formats:
• MP3 (MPEG-1 Audio Layer 3): A compressed audio format that achieves high audio quality
while reducing file size.
• WAV (Waveform Audio File Format): A standard format for uncompressed audio files, often
used for high-fidelity recordings.
• AAC (Advanced Audio Coding): A format that offers better sound quality and smaller file sizes
compared to MP3.
• FLAC (Free Lossless Audio Codec): A lossless compression format that preserves audio quality
while reducing file size.
3. Video Formats:
• MP4 (MPEG-4 Part 14): A popular video format that supports audio, video, and subtitles in a
single file. It provides efficient compression while maintaining good quality.
• AVI (Audio Video Interleave): A container format that can store audio and video data in various
codecs, commonly used on Windows systems.
• MKV (Matroska Video): A flexible container format that can hold multiple audio, video, and
subtitle streams. It supports high-quality video and audio compression.
• MOV (QuickTime File Format): A multimedia container format developed by Apple that can
store video, audio, and other media types.
4. Other Formats:
• PDF (Portable Document Format): A format primarily used for documents, but can also store
multimedia elements such as images, audio, and video.
• SVG (Scalable Vector Graphics): A format for vector graphics that can be scaled without losing
quality. It is often used for icons, logos, and illustrations.
These are just a few examples of multimedia data formats commonly used in DBMS. Depending on the specific
requirements and applications, there are many other formats available, each with its own characteristics and
features. DBMS systems often provide support for multiple formats, allowing users to store and retrieve
multimedia data in a variety of ways.
PHYSICAL STORAGE MEDIA
1. Magnetic Disks: Magnetic disks, also known as hard disk drives (HDDs), are the most prevalent storage
media in DBMS. They consist of rotating platters coated with a magnetic material, and data is stored in
binary form on these platters. Magnetic disks provide high capacity and are suitable for storing large
amounts of data. They offer relatively lower cost per unit of storage, but their access times and data
transfer rates are slower compared to other storage media like solid-state drives (SSDs).
2. Solid-State Drives (SSDs): Solid-state drives use flash memory to store data. Unlike magnetic disks, they
have no moving parts, resulting in faster access times and higher data transfer rates. SSDs offer
improved random I/O performance, making them suitable for applications that require fast data
retrieval. However, SSDs are generally more expensive per unit of storage compared to magnetic disks.
They are often used as a cache or for storing frequently accessed data in DBMS environments.
3. Optical Disks: Optical disks, such as CDs (Compact Discs), DVDs (Digital Versatile Discs), and Blu-ray
discs, are another form of physical storage media. They use optical technology to store data in a non-
volatile manner. Optical disks provide high data durability and are primarily used for long-term archival
purposes rather than frequent data access. They offer slower access times compared to magnetic disks
and SSDs.
4. Magnetic Tapes: Magnetic tapes are sequential access storage media that use a magnetic recording
method to store data. They consist of a long strip of tape coated with a magnetic material. Magnetic
tapes offer high storage capacity but have slower access times compared to disk-based storage media.
They are typically used for backups, archives, and large-scale data storage where infrequent access is
required.
It's important to note that the choice of physical storage media depends on factors such as the specific DBMS
requirements, performance needs, budget constraints, and the nature of the data being stored. In many cases, a
combination of storage media is employed, with faster media like SSDs used for frequently accessed data and
magnetic disks or tapes used for less frequently accessed or archival data.
The following techniques can be used to optimize disk performance in a DBMS:
1. Disk Layout and Partitioning: Proper disk layout and partitioning can significantly impact performance.
Partitioning techniques such as striping or RAID (Redundant Array of Independent Disks) can distribute
data across multiple disks, improving parallelism and reducing disk contention.
2. Disk Scheduling Algorithms: The choice of disk scheduling algorithms can affect the order in which disk
requests are serviced. Algorithms like SCAN, C-SCAN, LOOK, or C-LOOK can minimize seek times and
improve overall disk performance.
3. Buffering and Caching: Using a disk buffer or cache can reduce the frequency of disk reads and writes.
Frequently accessed data can be cached in memory, reducing disk I/O operations and improving
response times.
4. File Organization: Choosing the appropriate file organization can impact disk performance. Techniques
such as indexing, clustering, or hashing can optimize data retrieval and minimize disk seeks.
5. Compression and Encoding: Data compression and encoding techniques can reduce the amount of data
stored on disk, resulting in reduced disk I/O operations and improved performance. However,
compression techniques may introduce overhead during data retrieval and updates.
6. I/O Parallelism: Exploiting parallelism in disk I/O operations can improve performance. Techniques such
as asynchronous I/O, parallel I/O, or multi-threading can overlap disk operations and reduce idle time.
7. Disk Defragmentation: Regularly defragmenting the disk can optimize performance by rearranging data
to minimize seek times. Defragmentation reduces the fragmentation of data blocks and improves
sequential access patterns.
8. RAID Configurations: Redundant Array of Independent Disks (RAID) configurations can provide fault
tolerance and improved performance. Techniques like RAID 0 (striping), RAID 1 (mirroring), or RAID 5
(striping with parity) can enhance disk performance and data availability.
9. Solid-State Drives (SSDs): Consider using solid-state drives instead of traditional magnetic disks. SSDs
offer faster access times and better random I/O performance, which can significantly boost DBMS
performance.
10. Regular Maintenance: Perform regular maintenance tasks such as disk cleanup, removing unnecessary
files, and optimizing database indexes. These activities can help maintain optimal disk performance
over time.
It's important to note that the performance optimization techniques may vary depending on the specific DBMS,
hardware, and workload characteristics. Therefore, it's recommended to analyze the system, benchmark
performance, and apply appropriate optimizations based on the specific environment and requirements.
RAID
• RAID is used to provide redundancy so that data is not lost when a disk fails.
• RAID 2 uses the Hamming code error-detection method to correct errors in data.
• RAID 3 does byte-level data striping and has parity bits for each data word.
• RAID 6 has two parity blocks, which can handle at most two disk failures.
In the RAID technique, the combined disks are treated as a single logical disk by the operating system. The individual disks use different methods to store data, depending on the RAID level used. The commonly used RAID levels are:
• RAID 0
• RAID 1
• RAID 2
• RAID 3
• RAID 4
• RAID 5
• RAID 6
RAID 0
RAID 0 implements data striping. The data blocks are placed in multiple disks without redundancy. None of the
disks are used for data redundancy so if one disk fails then all the data in the array is lost.
Pros of RAID 0
• Data requests can be on multiple disks and not on a single disk hence improving the throughput.
Cons of RAID 0
• Failure of one disk can lead to complete data loss in the respective array.
• No data Redundancy is implemented so one disk failure can lead to system failure.
RAID 1
RAID 1 implements mirroring which means the data of one disk is replicated in another disk. This helps in
preventing system failure as if one disk fails then the redundant disk takes over.
Pros of RAID 1
• Failure of one Disk does not lead to system failure as there is redundant data in other disk.
Cons of RAID 1
• Extra space is required, as the data of each disk is also copied to another disk.
RAID 2
RAID 2 is used when errors in data have to be checked at the bit level, using a Hamming-code error-detection method. Two groups of disks are used in this technique: one stores the bits of each data word, and the other stores the error-correcting code (parity bits) for the data words. The structure of this RAID level is complex, so it is not commonly used.
Pros of RAID 2
• One full disk is used to store parity bits, which helps in detecting errors.
Cons of RAID 2
• The structure is complex and extra disks are needed for the error-correcting code, so it is rarely used in practice.
RAID 3
RAID 3 implements byte-level striping of data. Data is stored across the disks, with its parity bits stored on a separate disk. The parity bits help to reconstruct the data when there is a data loss.
Pros of RAID 3
• Data can be reconstructed using the parity disk if any single disk fails.
• Large sequential reads and writes are fast because the data is spread across all the data disks.
Cons of RAID 3
• Every access involves all the disks, so performance is poor for small, random I/O requests.
• The dedicated parity disk can become a bottleneck for writes.
RAID 4
RAID 4 implements block-level striping of data with a dedicated parity drive. If the data on any one disk is lost, it can be reconstructed with the help of the parity drive. Parity is calculated with the help of an XOR operation over the corresponding blocks of each data disk.
Pros of RAID 4
• Parity helps to reconstruct the data if the data on at most one disk is lost.
Cons of RAID 4
• If data is lost from more than one disk, then parity cannot help us reconstruct the data.
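An illustrative sketch of how XOR parity (as used in RAID 4 and RAID 5) allows a single lost block to be reconstructed; the block contents are made-up example bytes:

from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

d0, d1, d2 = bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])
parity = xor_blocks(d0, d1, d2)          # stored on the dedicated parity disk

# Disk holding d1 fails: XOR the surviving blocks with the parity block.
recovered_d1 = xor_blocks(d0, d2, parity)
assert recovered_d1 == d1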
RAID 5
RAID 5 is similar to RAID 4 with only one difference: the parity rotates among the disks.
Pros of RAID 5
• Parity is distributed over the disks, which improves performance.
Cons of RAID 5
• Parity is useful only when there is data loss on at most one disk. If data is lost on more than one disk, then parity is of no use.
RAID 6
If more than one disk fails, the RAID 6 implementation helps in that case. In RAID 6 there are two parity blocks in each array/row. It is similar to RAID 5 with extra parity.
Pros of RAID 6
• The two parity blocks allow the array to tolerate up to two simultaneous disk failures.
Cons of RAID 6
• Extra space is needed for the second parity block, and writes are slower because two parity blocks must be computed and updated.
FILE ORGANIZATION
o The File is a collection of records. Using the primary key, we can access the records. The type and frequency of
access can be determined by the type of file organization which was used for a given set of records.
o File organization is a logical relationship among various records. This method defines how file records are mapped
onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of blocks, and the blocks are
placed on the storage medium.
o The first approach to map the database to files is to use several files and store only fixed-length records of a single type in any given file. An alternative approach is to structure our files so that they can accommodate records of multiple lengths.
o Files of fixed length records are easier to implement than the files of variable length records.
o Sequential file organization:- This method is the easiest method for file organization. In this method, files are stored sequentially. This method can be implemented in two ways:
1. Pile file method: Records are stored one after another in the order in which they are inserted. Suppose we have records R1, R3, and so on up to R9 and R8 in a sequence (each record is simply a row in the table). If a new record R2 is to be inserted, it is placed at the end of the file.
2. Sorted file method: Suppose there is a pre-existing sorted sequence of records R1, R3, and so on up to R6 and R7. If a new record R2 has to be inserted, it is first inserted at the end of the file, and then the sequence is sorted again.
o Heap file organization:-
✓ It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, the records are inserted at the file's end. When the records are inserted, it doesn't require any sorting or ordering of the records.
✓ When the data block is full, the new record is stored in some other block. This new data block need not to
be the very next data block, but it can select any data block in the memory to store new records. The heap
file is also known as an unordered file.
✓ In the file, every record has a unique id, and every page in a file is of the same size. It is the DBMS
responsibility to store and manage the new records.
o Hash file organization:- Hash File Organization uses the computation of hash function on some fields of
the records. The hash function's output determines the location of disk block where the records are to
be placed.
In this method, there is no effort for searching and sorting the entire file. In this method, each record
will be stored randomly in the memory.
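A toy sketch of the idea: a hash function applied to the key chooses the block (bucket) in which a record is placed, so a lookup touches only one block. The number of blocks and the records are illustrative assumptions:

NUM_BLOCKS = 4
blocks = [[] for _ in range(NUM_BLOCKS)]     # each block holds some records

def block_for(key):
    """Hash function on the key field decides the disk block."""
    return hash(key) % NUM_BLOCKS

def insert(record):
    blocks[block_for(record["id"])].append(record)

def lookup(key):
    # Only one block has to be examined; no full-file search or sorting.
    return next((r for r in blocks[block_for(key)] if r["id"] == key), None)

insert({"id": 17, "name": "Asha"})
insert({"id": 42, "name": "Bob"})
print(lookup(42))   # {'id': 42, 'name': 'Bob'}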
o B+ file organization:-
B+ tree file organization is the advanced method of an indexed sequential access method. It uses a
tree-like structure to store records in File.
It uses the same concept of key-index where the primary key is used to sort the records. For each
primary key, the value of the index is generated and mapped with the record.
The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this
method, all the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf
nodes. They do not contain any records.
o ISAM (Indexed Sequential Access Method):-
In this method, records are stored in the file using the primary key, and an index value is generated for each primary key and mapped with the record. If any record has to be retrieved based on its index value, then the address of the data block is fetched and the record is retrieved from memory.
Pros of ISAM:
o Since each record has the address of its data block in this method, searching for a record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is based on the
primary key values, we can retrieve the data for the given range of value. In the same way, the partial
value can also be easily searched, i.e., the student name starting with 'JA' can be easily searched.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to maintain the
sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise, the
performance of the database will slow down.
o Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. For example, an EMPLOYEE and DEPARTMENT relationship clustered on the cluster key DEP_ID is an indexed cluster: all the records with the same DEP_ID are grouped and stored together.
o Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records
based on the cluster key, we generate the value of the hash key for the cluster key
and store the records with the same hash key value.
ORDERED INDICES
The indices are usually sorted to make searching faster. The indices which are sorted are known as ordered
indices.
Ordered indexing is the traditional way of storing that gives fast retrieval. The indices are stored in a sorted
manner hence it is also known as ordered indices.
1. Dense Indexing: In dense indexing, the index table contains records for every search key value of the
database. This makes searching faster but requires a lot more space. It is like primary indexing but
contains a record for every search key.
2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is a bit slower as well. We do not include an index entry for every record; instead, we store index entries for only some search-key values, each of which points to a block. The pointed-to block contains a group of records. Sometimes we have to perform a second search within the block, which makes sparse indexing a bit slower.
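A small sketch contrasting the two: on a file sorted by key, the dense index keeps one entry per record while the sparse index keeps one entry per block, so a sparse lookup first locates the block and then scans inside it. The data values and block size below are illustrative assumptions:

import bisect

# File of records sorted on the search key, grouped into blocks of 3.
records = [(k, f"row-{k}") for k in (2, 5, 7, 11, 13, 17, 19, 23, 29)]
BLOCK_SIZE = 3
file_blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record.   Sparse index: one entry per block.
dense_index = {k: i // BLOCK_SIZE for i, (k, _) in enumerate(records)}
sparse_index = [(block[0][0], b) for b, block in enumerate(file_blocks)]

def sparse_lookup(key):
    keys = [k for k, _ in sparse_index]
    b = bisect.bisect_right(keys, key) - 1          # find the candidate block
    if b < 0:
        return None
    return next((v for k, v in file_blocks[b] if k == key), None)  # scan inside it

print(sparse_lookup(13))   # 'row-13'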
NETWORK MODEL
• The network model in DBMS represents data as records connected in a graph, which allows it to express many-to-many relationships among the database contents.
• The network model in DBMS has a structure similar to the hierarchical database model, but it differs from it in that a member (child) record can have numerous parents.
• In the network model in DBMS, there are multiple paths to the same record which helps in avoiding
data redundancy problems.
• In the network model in DBMS, there is data integrity as every member entity has one or more owners.
Only the prime parent has no owner but it has various inter-related children.
• The network database model is very complicated due to several entities inter-related with each other.
So, managing is also quite difficult.
Operations on Network Model in DBMS
• Insertion Operation - We can insert or add a new record in the network database model but before
adding any new record the database administrator or the user needs to understand the whole
structure.
• Update Operation - We can update the data record(s). If a certain data is updated then all its children
entities are also affected.
• Deletion Operation - We can delete the data record(s) but the deletion is a very crucial operation.
Before deleting any record, we should first look out for the various connected entities so that the
corresponding entities do not get affected by the deletion.
• Retrieval Operation - The retrieval of records in the network model in DBMS is quite complex to
program but it is very fast as the entities are interconnected and various paths lead to certain records.
Advantages of Network Model in DBMS
• In the network model in DBMS, there are multiple paths to the same record, which helps in avoiding data redundancy problems.
• In the network model in DBMS, there is data integrity as every member entity has one or more owners.
Only the prime parent has no owner but it has various inter-related children.
• The data retrieval is faster in the case of the network model in DBMS because the entities and the data
are more interrelated.
• Due to the parent-child relationship, if there is a change in the parent entity, it is reflected in the child entities as well. This also saves time, as we do not need to update each related child entity separately.
Disadvantages of Network Model in DBMS
• In the case of the addition of new entities, the database administrator or the user needs to understand the whole structure.
• Due to complex inter-related structure the addition, update, as well as deletion are very difficult.
• We need to use a pointer for navigation hence the operational anomalies exist.
HIERARCHICAL MODELS
The hierarchical data model has been in use since the 1960s; in it, data is organized like a tree structure.
In 1966, IBM introduced the Information Management System (IMS), which is based on this hierarchical data model, but the model is now rarely used.
The hierarchical model organizes the data into a tree structure consisting of a single root node, where each record has one parent record and may have many child records, so the structure expands like a tree.
A hierarchical database is a set of tables arranged in the form of a parent-child relationship. Each set of parents
can have a relationship with any number of children. But every child can have a relationship with only one set of
parents.
A hierarchical database model is a one-to-many relationship. You can think of it as an upside-down tree with the
root at the top. To access data from the database, the whole tree has to be traversed starting from the root
downwards.
• So, the hierarchical model is a collection of rooted trees, and the relationships that exist in the hierarchical model are one-to-many and one-to-one.
Advantages
• It is easy to understand.
Disadvantages
• Data can be lost or become inconsistent when a parent node is deleted, because this results in the deletion of its child nodes.
• Complex to design.
DBTG MODEL
The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL),
the group responsible for standardization of the programming language COBOL.
The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the requirements of many distinct programming languages, not just COBOL. The user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on network model. In addition to proposing a formal notation for networks (the Data Definition
Language or DDL), the DBTG has proposed a Subschema Data Definition Language (Subschema DDL) for defining
views of conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing applications programs that manipulate the conceptual scheme
or a view.
The following table compares the hierarchical, network, and relational data models:
Hierarchical Model | Network Model | Relational Model
One-to-many or one-to-one relationships. | Allows the network model to support many-to-many relationships. | One-to-one, one-to-many, and many-to-one relationships.
Retrieval algorithms are complex. | Retrieval algorithms are complex. | Retrieval algorithms are simple.
Based on the parent-child relationship. | A record can have many parents as well as many children. | Based on relational data structures.
Does not provide an independent stand-alone query interface. | Its language was defined by CODASYL (the Conference on Data Systems Languages). | Relational databases bring many sources into a common query language such as SQL.
Cannot insert the information of a child that does not have a parent. | Does not suffer from any insertion anomaly. | Does not suffer from any insertion anomaly.
Multiple occurrences of child records lead to problems of inconsistency during the update operation. | Free from update anomalies. | Free from update anomalies.
Deletion of a parent results in deletion of the child records. | Free from delete anomalies. | Free from delete anomalies.
This model lacks data independence. | There is partial data independence. | It provides data independence.