MCA 201 DBMS
A Database Management System (DBMS) is defined as the software system that allows users to define, create,
maintain and control access to the database. DBMS makes it possible for end users to create, read, update and
delete data in database. It is a layer between programs and data.
A database management system (DBMS) is essentially a group of linked data and a collection of computer
applications and tools that retrieve, analyze, and alter data.
A database management system is a software tool used to create and manage one or more databases, offering
an easy way to create a database, update tables, retrieve information, and enhance data. A DBMS is where data
is accessed, modified and locked to prevent conflicts.
A database management system (DBMS) is a software program that allows users to create, maintain, and
interact with a database. A database is a data collection organized in a specific format, making it easy to access,
manage, and manipulate. DBMSs are the intermediaries between users and databases, handling all
communication and data processing.
DBMS is a collection of programs that enables users to create and maintain a database.
Advantages of DBMS
• Controlling Redundancy
• Restricting unauthorized access
• Providing persistent storage:- A DBMS can store program objects persistently, so that such an object can later be directly retrieved by another program.
• Atomicity in data
• Permitting inferencing and actions
• Representing complex relationships among data:- DBMS has the capability to represent a variety of
complex relationships among data, as well as to retrieve and update related data efficiently. A good
example of that is the use of foreign keys!
• Provides multiple user interfaces
• Enforcing integrity constraints
• Data integration
• Providing backup and recovery
• No concurrent access anomalies
• No data inconsistency
• Data Searching
• Data Security
• Data Concurrency
• Low Maintenance Cost
• View of data in DBMS describes the abstraction of data at three levels, i.e. the physical level, the logical level, and the view level.
• View of data in DBMS describes how the data is visualized at each level of data abstraction. Data abstraction allows developers to keep complex data structures away from the users. The developers achieve this by hiding the complex data structures through levels of abstraction.
• One more feature that should be kept in mind is data independence: changing the data schema at one level of the database must not require modifying the data schema at the next level.
DATA INDEPENDENCE
➢ Data independence can be explained using
the three-schema architecture.
o Logical data independence refers to the characteristic of being able to change the conceptual schema without
having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
o Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the Conceptual structure of
the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal levels.
Schema
The overall design of the database is called schema or description of database.
• The schema of a database can be modified only by DDL statements; it does not change when operations like insertion, updating, and deletion are performed.
• The database schema explains the integrity constraints of the database, the domains of all attributes, and the foreign and primary keys of all the relations.
o The schema is a complete description of a database, including the names and descriptions of
all areas, records, elements, and sets. The major purpose of the schema is to provide
definitions from which to generate subschemas.
Types of Schema
Schema is of three types, which are as follows −
• View Schema − The design of a database at a view level is called view schema. This schema
generally shows the user interaction with the database system.
• Logical Schema − The design of a database at the logical level is called a logical schema. A
database administrator (DBA) and the programmers work at this level. This level
describes all the entities, attributes and their relationship with the integrity constraints.
• Physical Schema − The design of a database at the physical level is called a physical schema.
This schema describes how the data is stored in the secondary storage devices. There is only one logical schema and one physical schema per database, but there can be more than one view schema.
A schema is also called the intension of the database.
Sub schema
It is a subset of the schema and inherits the same properties that a schema has. It gives the users a window through which they can view only the part of the database that they want.
For example − for a Student table in a database, the programmer can access all fields of the table, but an end user may be able to access only two or three fields of it. A subschema describes such a partial view of the database.
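As a small illustration, a subschema can be realized in SQL as a view that exposes only some columns; the Student table and column names here are hypothetical, not from the source notes:
CREATE VIEW Student_Public AS
SELECT roll_no, name
FROM Student;
A user who queries Student_Public sees only these two fields, even though the underlying Student table may contain many more.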
A data model helps design the database at the conceptual, physical and logical levels.
Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
It provides a clear picture of the base data and can be used by database developers to create a physical
database.
Though the initial creation of data model is labor and time consuming, in the long run, it makes your IT
infrastructure upgrade and maintenance cheaper and faster.
➢ The main goal of designing a data model is to make certain that data objects offered by the functional
team are represented accurately.
➢ The data model should be detailed enough to be used for building the physical database.
➢ The information in the data model can be used for defining the relationship between tables, primary
and foreign keys, and stored procedures.
➢ A data model helps the business to communicate within and across organizations.
➢ A data model helps to document data mappings in the ETL process.
➢ Help to recognize correct sources of data to populate the model
Disadvantages of a data model:
➢ To develop a data model, one should know the characteristics of how the data is physically stored.
➢ This is a navigational system that makes application development and management complex; thus, it requires detailed knowledge of the underlying data.
➢ Even a small change made in the structure requires modification in the entire application.
➢ There is no set data manipulation language in DBMS.
There are mainly three different types of data models: conceptual data models, logical data models, and physical
data models, and each one has a specific purpose. The data models are used to represent the data and how it is
stored in the database and to set the relationship between data items.
1. Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically
created by Business stakeholders and Data Architects. The purpose is to organize, scope and define
business concepts and rules.
2. Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This
model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and data structures.
➢ Describes data needs for a single project but could integrate with other logical data models
based on the scope of the project.
➢ Designed and developed
independently from the DBMS.
➢ Data attributes will have datatypes
with exact precisions and length.
➢ Normalization processes to the model
is applied typically till 3NF.
3. Physical Data Model: This Data Model describes HOW the system will be implemented using a specific DBMS.
➢ The physical data model describes data needs for a single project or application, though it may be integrated with other physical data models based on project scope.
➢ Data Model contains relationships between tables that which addresses cardinality and
nullability of the relationships.
➢ Developed for a specific version of a DBMS, location, data storage or technology to be used in
the project.
➢ Columns should have exact datatypes, lengths assigned and default values.
➢ Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.
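As a rough sketch of what a physical data model translates into, the following DDL (table and column names are illustrative, not taken from the notes) fixes exact datatypes, lengths, default values, keys and an index:
CREATE TABLE Department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);
CREATE TABLE Employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100) NOT NULL,
    salary   DECIMAL(10,2) DEFAULT 0,
    dept_id  INT REFERENCES Department(dept_id)   -- foreign key to Department
);
CREATE INDEX idx_employee_dept ON Employee(dept_id);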
DATABASE LANGUAGES
In our daily lives, we make use of certain languages to communicate and share our thoughts with other
individuals. It's an essential part of our lives as it helps others understand what we want to convey to them.
Similarly, in the data world, we need some special kind of programming languages to make the DBMS software
understand our needs and manage the data stored in the databases accordingly. These programming languages
are known as database languages or query languages.
Database languages are used to perform a variety of critical tasks that help a database management system
function correctly. These tasks can be certain operations such as read, update, insert, search, or delete the data
stored in the database.
Database Language is a special type of programming language used to define and manipulate a database. Based
on their application, database languages are classified into four different types: DDL, DML, DCL, and TCL.
DDL is used for specifying the database schema. It is used for creating tables, schema, indexes, constraints etc. in
database.
All of these commands either define or update the database schema; that is why they come under Data Definition Language.
DML is used for accessing and manipulating data in a database. The following operations on a database come under DML:
In practice, the data definition language, data manipulation language and data control language are not separate languages; rather, they are parts of a single database language such as SQL.
The changes in the database that we make using DML commands are either committed or rolled back using TCL.
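A minimal sketch of how these language types appear in SQL (the Student table and user name are hypothetical):
-- DDL: defines the schema
CREATE TABLE Student (roll_no INT PRIMARY KEY, name VARCHAR(50));
-- DML: manipulates the data
INSERT INTO Student VALUES (1, 'Asha');
UPDATE Student SET name = 'Usha' WHERE roll_no = 1;
-- DCL: controls access
GRANT SELECT ON Student TO clerk_user;
-- TCL: commits or undoes the DML changes
COMMIT;    -- or ROLLBACK;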
TRANSACTION MANAGEMENT
A transaction is a logical unit of work performed on a database. They are logically ordered units of work
completed by the end-user or an application.
A transaction is made up of one or more database modifications, such as creating, updating, or deleting a record from a table.
A transaction can be seen as a set of operations that are used to perform some logical unit of work. A transaction is used to make changes to data in a database, which can be done by inserting new data, altering the existing data, or deleting existing data.
The lifetime of a transaction passes through multiple states; these states update the system about the current status of the transaction and also tell the user how to plan further processing.
Transaction states
There are various database transaction states as follows.
1. Active state - this is the state in which a transaction execution process begins. Operations such as read
or write are performed on the database.
1. Atomicity - A transaction cannot be subdivided and can only be executed as a whole; it is treated as an atomic unit. Either all the operations are carried out or none are performed.
2. Consistency - After any transaction is carried out in a database it should remain consistent. No
transaction should affect the data residing in the database adversely.
3. Isolation - When several transactions need to be conducted in a database at the same time, each
transaction is treated as if it were a single transaction. As a result, the completion of a single
transaction should have no bearing on the completion of additional transactions.
4. Durability - Durability means that all changes made must be permanent, such that once the transaction is committed, the effects of the transaction cannot be reversed. In case of system failure or unexpected shutdown, if the changes made by a completed transaction have not yet been written to the disk, then during restart the changes should be remembered and restored.
1. Database Administrator (DBA) : Database Administrator (DBA) is a person/team who defines the
schema and also controls the 3 levels of database. The DBA will then create a new account id and
password for the user if he/she needs to access the database. DBA is also responsible for providing security to the database and allows only authorized users to access/modify the database. DBA is
responsible for the problems such as security breaches and poor system response time.
• DBA also monitors the recovery and backup and provide technical support.
• The DBA has a DBA account in the DBMS which is called a system or superuser account.
• DBA is the one having privileges to perform DCL (Data Control Language) operations such as
GRANT and REVOKE, to allow/restrict a particular user from accessing the database.
2. Naive / Parametric End Users : Parametric end users are unsophisticated users who don't have any DBMS knowledge but frequently use database applications in their daily life to get the desired results. For example, Railway ticket booking users are naive users. Clerks in any bank are naive users because they don't have any DBMS knowledge but they still use the database and perform their given task.
3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They check whether
all the requirements of end users are satisfied.
4. Sophisticated Users : Sophisticated users can be engineers, scientists, or business analysts who are familiar
with the database. They can develop their own database applications according to their requirement.
They don’t write the program code but they interact the database by writing SQL queries directly
through the query processor.
5. Database Designers : Database designers are the users who design the structure of the database, which
includes tables, indexes, views, triggers, stored procedures and constraints which are usually enforced
before the database is created or populated with data. He/she controls what data must be stored and how the data items are to be related. It is the responsibility of database designers to understand the
requirements of different user groups and then create a design which satisfies the need of all the user
groups.
7. Casual Users / Temporary Users : Casual users are the users who occasionally use/access the database, but each time they access the database they require new information; for example, middle or higher level managers.
DATA DICTIONARY
A data dictionary in Database Management System (DBMS) can be defined as a component that stores the
collection of names, definitions, and attributes for data elements that are being used in a database. The Data
Dictionary stores metadata, i.e., data about the database.
Data Dictionary is made up of two words, data which means the collected information through multiple sources,
and dictionary meaning the place where all this information is made available.
A data dictionary is a crucial part of a relational database as it provides additional information about the
relationships between multiple tables in a database. The data dictionary in DBMS helps the user to arrange data
in a neat and well-organized way, thus preventing data redundancy.
Data Dictionary in DBMS provides additional information about relationships between multiple database tables,
helps to organize data, and prevents data redundancy in DBMS.
A data dictionary is a set of files that contain a database's metadata. Thus, it is also known as a metadata repository. Storing the relational schemas and other metadata about the relations in a structure is known as a Data Dictionary or System Catalog.
A data dictionary is like the A-Z dictionary of the relational database system holding all information of each
relation in the database.
There are mainly two types of data dictionary in a database management system: an active data dictionary, which is maintained automatically by the DBMS itself, and a passive data dictionary, which is maintained manually, separately from the database.
THREE LEVEL ARCHITECTURE
A database system provides three levels of data abstraction:
• External level.
• Conceptual level.
• Internal level.
The main objective of the three-level architecture is to separate each user's view of the data from the way the database is physically represented. The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all other users.
• ER Diagram: ER diagrams are the diagrams that are sketched out to design the database. They are
created based on three basic concepts: entities, attributes, and relationships between them. In ER
diagram we define the entities, their related attributes, and the relationships between them. This helps
in illustrating the logical structure of the databases.
• Database Design: The Entity-Relationship model helps the database designers to build the database in
a very simple and conceptual manner.
• Graphical Representation helps in Better Understanding: ER diagrams are very easy and simple to
understand and so the developers can easily use them to communicate with stakeholders.
• Easy to build: The ER model is very easy to build.
• The extended E-R features: Some of the additional features of ER model are specialization, upper and
lower-level entity sets, attribute inheritance, aggregation, and generalization.
• Integration of ER model: This model can be integrated into a common dominant relational model and is
widely used by database designers for communicating their ideas.
• Simplicity and various applications of ER model: It provides a preview of how all your tables should
connect, and what fields are going to be on each table, which can be used as a blueprint for
implementing data in specific software applications.
DESIGN ISSUES
Entity-Relationship Design Issues
The notions of an entity set and a relationship set are not precise, and it is possible to define a set of entities
and the relationships among them in a number of different ways.
Basic issues in the design of an E-R database schema include whether to use an attribute or an entity set to represent an object, whether to use an entity set or a relationship set to represent a real-world concept, and whether to use a binary or an n-ary relationship set. For example, a ternary course-registration relationship can be replaced by two binary relationship sets, one to relate course-registration records to students and one to relate course-registration records to sections.
ER Design Methodologies
The guidelines that should be followed while designing an ER diagram are discussed below:
• Relationship sets
• Dependencies
• Discriminators
• Design diagram
MAPPING CONSTRAINT
Cardinality means how the entities are related to each other, or what the relationship structure between entities in a relationship set is.
In a Database Management System, Cardinality represents a number that denotes how many times an entity is
participating with another entity in a relationship set. The Cardinality of DBMS is a very important attribute in
representing the structure of a Database. In a table, the number of rows or tuples represents the Cardinality.
Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the mapping of one entity set to another
entity set in a relationship set. We generally take the example of a binary relationship set where two entities are
mapped to each other.
Cardinality is very important in the Database of various businesses. For example, if we want to track the
purchase history of each customer then we can use the one-to-many cardinality to find the data of a specific
customer. The Cardinality model can be used in Databases by Database Managers for a variety of purposes, but
corporations often use it to evaluate customer or inventory data.
1. One to one
2. Many to one
3. One to many
4. Many to many
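For instance, under one-to-many cardinality between customers and purchases, the purchase history of one customer can be retrieved with a join; the Customer and Purchase tables and their columns below are hypothetical:
SELECT c.name, p.item, p.amount
FROM Customer c
JOIN Purchase p ON p.customer_id = c.customer_id
WHERE c.customer_id = 101;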
KEYS
KEYS in DBMS is an attribute or set of attributes which helps you to identify a row(tuple) in a relation(table).
They allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by a
combination of one or more columns in that table.
1. Super Key
2. Primary Key
3. Candidate Key
4. Alternate Key
5. Foreign Key
6. Compound Key
7. Composite Key
8. Surrogate Key
• Super Key – A super key is a group of single or multiple keys which identifies rows in a table.
• Primary Key – is a column or group of columns in a table that uniquely identify every row in that table.
• Candidate Key – is a set of attributes that uniquely identify tuples in a table. A candidate key is a super key with no redundant attributes.
• Alternate Key – is a column or group of columns in a table that uniquely identify every row in that table; any candidate key that is not chosen as the primary key is an alternate key.
• Foreign Key – is a column that creates a relationship between two tables. The purpose of Foreign keys is
to maintain data integrity and allow navigation between two different instances of an entity.
• Compound Key – has two or more attributes that allow you to uniquely recognize a specific record. It is
possible that each column may not be unique by itself within the database.
• Composite Key – is a combination of two or more columns that uniquely identify rows in a table. The
combination of columns guarantees uniqueness, though individual uniqueness is not guaranteed.
• Surrogate Key – An artificial key which aims to uniquely identify each record is called a surrogate key. These kinds of keys are unique; they are created when you don't have any natural primary key.
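The following sketch (hypothetical tables and columns) shows several of these key types side by side:
CREATE TABLE Student (
    student_id INT PRIMARY KEY,               -- surrogate key used as the primary key
    roll_no    VARCHAR(10) UNIQUE NOT NULL,   -- candidate key not chosen as primary = alternate key
    email      VARCHAR(100) UNIQUE,           -- another candidate/alternate key
    name       VARCHAR(50)
);
CREATE TABLE Enrollment (
    student_id INT REFERENCES Student(student_id),   -- foreign key
    course_id  INT,
    PRIMARY KEY (student_id, course_id)               -- composite key
);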
ER DIAGRAM
• ER diagram was proposed by Peter Chen in 1971 as a visual tool to represent the ER model.
• ER diagrams are created based on three basic concepts: entities, attributes, and relationships between
them.
• Any object that physically exists and is logically constructed in the real world is called an entity.
• Strong entities are those entity types that have a key attribute, whereas weak entity type doesn’t have
a key attribute and so we cannot uniquely identify them by their attributes alone.
• Attributes are the characteristics or properties which define the entity type.
• Relationship is nothing but an association among two or more entities.
• A simple attribute is an attribute that cannot be decomposed further. A composite attribute is
an attribute that can be decomposed into simpler attributes.
• A multivalued attribute is an attribute that can have multiple values, and a derived attribute is
an attribute that can be derived from other attributes of an entity type.
• When only one instance of an entity is associated with the relationship to one instance of
another entity, then it is known as one to one relationship.
• If only one instance of the entity on the left side of the relationship is linked to multiple instances of the entity on the right side, then it is known as a one-to-many relationship.
• If multiple instances of the entity on the left side of the relationship are linked to only one instance of the entity on the right side, then it is known as a many-to-one relationship.
• If multiple instances of the entity on the left are linked by the relationship to multiple instances of the entity on the right, then it is known as a many-to-many relationship.
Component of ER Diagram
1. Entities:- An entity is anything in the real world, such as an object, class, person, or place. Objects that
physically exist and are logically constructed in the real world are called entities. Each entity consists of
several characteristics or attributes that describe that entity. For example, if a person is an entity,
its attributes or characteristics are age, name, height, weight, occupation, address, hobbies, and so on.
2. Attributes:- Attributes are the characteristics or properties which define the entity type. In ER diagram,
the attribute is represented by an oval.
For example, here id, Name, Age, and Mobile_No are the attributes that define the entity type Student.
1. Simple attribute: Attributes that cannot be further decomposed into sub-attributes are called simple attributes. A simple attribute holds an atomic value and is represented by an oval shape in ER diagrams; if it is also the key attribute, its name is underlined.
For example, the roll number of a student, or the student's contact number are examples of simple
attributes.
2. Composite attribute: An attribute that is composed of many other attributes and can be decomposed
into simple attributes is known as a composite attribute in DBMS. The composite attribute is
represented by an ellipse.
For example, a student's address can be divided into city, state, country, and pin code or a full name can
be divided into first name, middle name, and last name.
3. Multivalued attribute: Multivalued attributes in DBMS are attributes that can have more than one
value. The double oval is used to represent a multivalued attribute.
For example, the mobile_number of a student is a multivalued attribute as one student can have more
than one mobile number.
4. Derived attribute: Derived attributes in DBMS are the ones that can be derived from other attributes of
an entity type. The derived attributes are represented by a dashed oval symbol in the ER diagram.
For example, the age attribute can be derived from the date of birth (DOB) attribute. So, it's a derived
attribute.
3. Relationships:- The concept of relationship in DBMS is used to describe the relationship between
different entities. This is denoted by the diamond or a rhombus symbol. For example, the
teacher entity type is related to the student entity type and their relation is represented by the
diamond shape.
1. One-to-One Relationships: When only one instance of an entity is associated with the relationship to
one instance of another entity, then it is known as one to one relationship. For example, let us assume
that a male can marry one female and a female can marry one male. Therefore the relation is one-to-
one.
2. One-to-Many Relationships: If only one instance of the entity on the left side of the relationship is linked to multiple instances of the entity on the right side, then this is a one-to-many relationship. For example, a Scientist can invent many inventions, but each invention is made by only that specific scientist.
3. Many-to-One Relationships: If multiple instances of the entity on the left side of the relationship are linked to only one instance of the entity on the right side, then this is a many-to-one relationship. For example, a Student enrolls for only one course, but a course can have many students.
4. Many to Many Relationships: If multiple instances of the entity on the left are linked by the relationship to multiple instances of the entity on the right, this is a many-to-many relationship. For example, one employee can be assigned many projects, and one project can be assigned to many employees.
2. Weak Entity – Weak entity type doesn’t have a key attribute and so we cannot uniquely identify them
by their attributes alone. Therefore, a foreign key must be used in combination with its attributes to
create a primary key. They are called weak entity types because they can't be identified on their own; a weak entity relies on another strong entity for its unique identity. A weak entity is represented by a double-
outlined rectangle in ER diagrams.
For example, the address can't be used to uniquely identify students as there can be many students from the
same locality. So, for this, we need an attribute of Strong Entity Type i.e ‘student’ to uniquely
identify entities of Address Entity Type.
The relationship between a weak entity type and a strong entity type is shown with a double-outlined diamond
instead of a single-outlined diamond. This representation can be seen in the image given below.
Each entity in the set of strong entities can be uniquely identified because it has a primary key, whereas an entity in a set of weak entities cannot be uniquely identified on its own, because it has no primary key and may contain redundant entities.
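A weak entity type is usually mapped to a table whose primary key combines the owner's key with the weak entity's partial key (discriminator); the tables below are only a sketch of the Student/Address example, with hypothetical columns:
CREATE TABLE Student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50)
);
CREATE TABLE Address (
    roll_no  INT REFERENCES Student(roll_no),  -- key of the owning strong entity
    addr_no  INT,                              -- partial key of the weak entity
    city     VARCHAR(50),
    pin_code VARCHAR(10),
    PRIMARY KEY (roll_no, addr_no)
);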
Generalization
o Generalization is a bottom-up approach in which two or more lower-level entities are combined to form a single higher-level entity on the basis of their common features.
o For example, the Faculty and Student entities can be generalized to create a higher-level entity Person.
Specialization
o Specialization is a top-down approach, and it is the opposite of Generalization. In specialization, one higher-level entity can be broken down into two or more lower-level entities.
o Specialization is used to identify the subset of an entity set that shares some distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and relationship sets are then added.
o For example: In an Employee management system, the EMPLOYEE entity can be specialized as TESTER or DEVELOPER based on what role they play in the company.
AGGREGATION
In aggregation, the relationship between two entities is treated as a single entity. In aggregation, a relationship together with its corresponding entities is aggregated into a higher-level entity.
For example: the Center entity and the Course entity, together with the relationship "offers" between them, act as a single entity that is in a relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, he will never enquire about the Course alone or just about the Center; instead, he will enquire about both.
INHERITANCE
Inheritance is an important feature of Generalization
and Specialization. It allows lower-level entities to
inherit the attributes of higher-level entities.
DESIGN OF ER SCHEMA
ER Models in Database Design
They are widely used to design relational databases. The entities in the ER schema become tables, attributes become columns, and relationships are converted into the database schema. Since they can be used to visualize database tables and their relationships, ER models are also commonly used for database troubleshooting.
REDUCTION OF ER
The notations in an ER diagram can be used to represent the database, and these notations can be reduced to a collection of tables.
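As a hedged illustration of this reduction (entity and attribute names are hypothetical), each entity set becomes a table, its attributes become columns, and a one-to-many relationship is reduced to a foreign key on the "many" side:
CREATE TABLE Course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(50)
);
CREATE TABLE Section (
    section_id INT PRIMARY KEY,
    course_id  INT REFERENCES Course(course_id)   -- relationship reduced to a foreign key
);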
RELATIONS
• A one-to-one relationship means when a single record in the first table is related to only one record in
the other table.
• A one-to-many relationship is defined as when a single record in the first table is related to one or more
records in the other table, but a single record in the other table is related to only one record in the first
table.
• A many-to-many relationship can be defined as when a single record in the first table is related to one or more records in the second table and a single record in the second table is related to one or more records in the first table.
• A well-defined relationship adds more integrity to the table structure and makes the DBMS more
efficient.
KIND OF RELATIONS
• one-to-one
• one-to-many, and
• many-to-many
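A many-to-many relationship is usually implemented with a third, linking (junction) table; this is only a sketch with hypothetical names:
CREATE TABLE Employee (emp_id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE Project  (proj_id INT PRIMARY KEY, title VARCHAR(50));
CREATE TABLE Assignment (                            -- junction table
    emp_id  INT REFERENCES Employee(emp_id),
    proj_id INT REFERENCES Project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);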
RELATIONAL DATABASE
A relational database is a collection of information that organizes data in predefined relationships where data is
stored in one or more tables (or "relations") of columns and rows, making it easy to see and understand how
different data structures relate to each other.
A relational database (RDB) is a way of structuring information in tables, rows, and columns.
An RDB has the ability to establish links, or relationships, between information by joining tables, which makes it easy to understand and gain insights about the relationship between various data points.
A relational database is a type of database that stores and provides access to data points that are related to one
another.
Relational databases are based on the relational model, an intuitive, straightforward way of representing data in
tables.
In a relational database, each row in the table is a record with a unique ID called the key. The columns of the
table hold attributes of the data, and each record usually has a value for each attribute, making it easy to
establish the relationships among data points.
Candidate Key
A candidate key is a super key that contains no redundant attributes; in other words, it is a minimal super key.
The role of a candidate key is to identify a table row uniquely. Also, the value of a candidate key cannot be Null. A candidate key is described as having "no redundant attributes" and as being a "minimal representation of a tuple".
Primary Key
A Primary Key is the minimal set of attributes of a table that has the task to uniquely identify the rows, or we
can say the tuples of the given particular table.
A primary key of a relation is one of the possible candidate keys which the database designer chooses as the primary one. It may be selected for convenience, performance and many other reasons.
There are certain keys in DBMS that are used for different purposes, from which the most commonly known is
the Primary Key.
o A primary key may be composed of a single attribute known as single primary key or more than one
attribute known as composite key.
o The data values for the primary key attribute should not be null.
o Attributes which are part of a primary key are known as Prime attributes.
o If the primary key is made of more than one attribute then those attributes are irreducible.
o We use the convention that the attributes that form the primary key of a relation are underlined.
o Columns that are defined as LONG or LONG RAW cannot be part of a primary key.
FOREIGN KEY
In the relational databases, a foreign key is a field or a column that is used to establish a link between two
tables.
In simple words, you can say that a foreign key in one table is used to point to the primary key in another table.
A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the
data in two tables to control the data that can be stored in the foreign key table.
RELATIONAL ALGEBRA
Relational algebra uses operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation, and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
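For orientation, each fundamental operation has a rough SQL counterpart; the Student and related tables below are hypothetical, and EXCEPT (called MINUS in Oracle) is used for set difference:
-- Select (σ): choose rows satisfying a condition
SELECT * FROM Student WHERE age > 20;
-- Project (π): choose columns (duplicates removed)
SELECT DISTINCT name FROM Student;
-- Union
SELECT roll_no FROM CS_Student UNION SELECT roll_no FROM Math_Student;
-- Set difference
SELECT roll_no FROM CS_Student EXCEPT SELECT roll_no FROM Math_Student;
-- Cartesian product
SELECT * FROM Student, Course;
-- Rename: give a relation or column a new name
SELECT s.name AS student_name FROM Student s;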
CODD’S RULE
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came up with twelve
rules of his own, which according to him, a database must obey in order to be regarded as a true relational
database.
Rule 0: The Foundation Rule
These rules can be applied to any database system that manages stored data using only its relational capabilities. This is a foundation rule, which acts as a base for all the other rules. The database must be in relational form so that the system can handle the database through its relational capabilities.
Rule 1: Information Rule
A database contains various information, and this information must be stored in the cells of tables in the form of rows and columns.
Rule 2: Guaranteed Access Rule
Every single piece of data (atomic value) must be logically accessible from the relational database using a combination of table name, primary key value, and column name.
Rule 3: Systematic Treatment of Null Values
This rule defines the systematic treatment of Null values in database records. A null value has various meanings in the database, such as missing data, no value in a cell, inappropriate information, or unknown data; a primary key value should never be null.
Rule 4: Active Online Catalog Rule
The entire logical structure (description) of the database must be stored online in a catalog known as the data dictionary. Authorized users can access this catalog using the same query language that they use to access the database itself.
Rule 5: Comprehensive Data Sub-language Rule
The relational database may support various languages, but there must be at least one language with a well-defined, linear syntax and character-string representation that comprehensively supports data definition, view definition, data manipulation, integrity constraints, and transaction management operations. If the database allows access to the data without any such language, it is considered a violation of this rule.
Rule 6: View Updating Rule
All views that are theoretically updatable must also be practically updatable by the database system.
Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
A database system should follow high-level relational operations such as insert, update, and delete in each level
or a single row. It also supports union, intersection and minus operation in the database system.
Rule 8: Physical Data Independence Rule
All stored data must be physically independent of the applications that access it. The data should not depend on other data or on an application. If data is updated or the physical structure of the database is changed, it should not have any effect on the external applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means that if any changes occur at the logical level (table structures), they should not affect the user's view (application). For example, if a table is split into two tables, or two tables are joined to create a single table, these changes should not impact the user's view or application.
Rule 10: Integrity Independence Rule
A database must maintain integrity independence when inserting data into tables' cells using the SQL query language. Entered values should not have to rely on any external factor or application to maintain integrity. This also helps in making the database independent of each front-end application.
Rule 11: Distribution Independence Rule
The distribution independence rule states that a database must work properly even if its data is stored in different locations and used by different end users. When a user accesses the database through an application, the user should not be aware that the data is distributed; it should appear as if all the data is located at a single site. End users should be able to run their SQL queries in the same way regardless of where the data actually resides.
Rule 12: Non-Subversion Rule
The non-subversion rule states that the RDBMS uses the SQL language to store and manipulate the data in the database. If a system provides a low-level or separate language other than SQL to access the database, that language must not be able to subvert or bypass the integrity rules when transforming the data.
SET OPERATIONS
The SQL Set operation is used to combine the two or more SQL SELECT statements.
1. Union :-
Union
o The SQL Union operation is used to combine the result of two or more SQL SELECT queries.
o In the union operation, the number of columns and their datatypes must be the same in both the tables on which the UNION operation is being applied.
o The union operation eliminates the duplicate rows from its resultset.
Syntax
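A generic form (table and column names are placeholders):
SELECT column_list FROM table1
UNION
SELECT column_list FROM table2;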
2. Union All:-
The Union All operation is similar to the Union operation, but it returns the result set without removing duplicates and without sorting the data.
Syntax:
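A generic form (placeholders as before):
SELECT column_list FROM table1
UNION ALL
SELECT column_list FROM table2;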
3. Intersect:-
o It is used to combine two SELECT statements. The Intersect operation returns the common rows from
both the SELECT statements.
o In the Intersect operation, the number of columns and their datatypes must be the same.
o It has no duplicates and it arranges the data in ascending order by default.
Syntax
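A generic form (placeholders as before):
SELECT column_list FROM table1
INTERSECT
SELECT column_list FROM table2;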
4. Minus:-
o It combines the result of two SELECT statements. Minus operator is used to display the rows which are
present in the first query but absent in the second query.
o It has no duplicates and data arranged in ascending order by default.
Syntax:
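A generic form (placeholders as before; MINUS is Oracle's keyword, standard SQL uses EXCEPT):
SELECT column_list FROM table1
MINUS
SELECT column_list FROM table2;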
AGGREGATE FUNCTIONS
o SQL aggregation function is used to perform the calculations on multiple rows of a single column of a
table. It returns a single value.
An aggregate function performs a calculation on a set of values, and returns a single value. Except for COUNT(*),
aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the
SELECT statement.
Aggregate functions are a vital component of database management systems. They allow us to perform
calculations on large data sets quickly and efficiently.
• Count():-
o COUNT function is used to Count the number of rows in a database table. It can work on both numeric
and non-numeric data types.
o The COUNT(*) form of the function returns the count of all the rows in a specified table; COUNT(*) considers duplicate and Null values.
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
• Sum():-
The SUM function is used to calculate the sum of the values of the selected column. It works on numeric fields only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
• Avg():-
The AVG function is used to calculate the average value of the numeric type. AVG function returns the average
of all non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
• Min():-
MIN function is used to find the minimum value of a certain column. This function determines the smallest value
of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )
• Max():-
MAX function is used to find the maximum value of a certain column. This function determines the largest value
of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
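Putting the aggregate functions together on a hypothetical Result(roll_no, course_id, marks) table:
SELECT COUNT(*), SUM(marks), AVG(marks), MIN(marks), MAX(marks)
FROM Result;
-- grouped form, one row per course
SELECT course_id, AVG(marks)
FROM Result
GROUP BY course_id;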
NULL VALUES
A field with a NULL value is a field with no value.
If a field in a table is optional, it is possible to insert a new record or update a record without adding a value to
this field. Then, the field will be saved with a NULL value.
NULL in SQL represents a column field in the table with no value. NULL is different from a zero value and from
"none".
SUBQUERIES
Important Rule:
o A subquery can be placed in a number of SQL clauses like WHERE clause, FROM clause, HAVING clause.
o You can use Subquery with SELECT, UPDATE, INSERT, DELETE statements along with the operators like =,
<, >, >=, <=, IN, BETWEEN, etc.
o A subquery is a query within another query. The outer query is known as the main query, and the inner
query is known as a subquery.
o In the Subquery, ORDER BY command cannot be used. But GROUP BY command can be used to perform
the same function as ORDER BY command.
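A small sketch of a subquery in a WHERE clause (Student and Result are hypothetical tables):
SELECT name
FROM Student
WHERE roll_no IN (SELECT roll_no FROM Result WHERE marks > 80);
Here the inner SELECT is the subquery and the outer SELECT is the main query.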
DERIVED RELATIONS
A derived relation is a relation instance resulting from the evaluation of a relational algebra expression over a
database instance.
DDL IN SQL
The Data Definition Language is made up of SQL commands that can be used to design the database structure. It
simply handles database schema descriptions and is used to construct and modify the structure of database
objects in the database.
• CREATE Command: The database or its objects are created with this command (like table, index,
function, views, store procedure, and triggers). There are two types of CREATE statements in SQL, one
is for the creation of a database and the other for a table.
• DROP Command: The DROP command can be used to delete a whole database or simply a table that
means entire data will also be deleted. The DROP statement deletes existing objects such as databases,
tables, indexes, and views.
• ALTER Command: In an existing table, this command is used to add, delete/drop, or edit columns. It
can also be used to create and remove constraints from a table that already exists.
• TRUNCATE Command: It is used to mark the table's extents for deallocation (empty for reuse). This
procedure removes all data from a table quickly, usually circumventing a number of integrity checking
processes. It was included in the SQL:2008 standard for the first time. It is somewhat equivalent to the
delete command.
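Hedged examples of these commands on a hypothetical Student table:
CREATE TABLE Student (roll_no INT PRIMARY KEY, name VARCHAR(50));
ALTER TABLE Student ADD email VARCHAR(100);     -- add a column
ALTER TABLE Student DROP COLUMN email;          -- remove a column
TRUNCATE TABLE Student;                         -- remove all rows, keep the table
DROP TABLE Student;                             -- remove the table itself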
DOMAIN RULES
A domain is a unique set of values that can be assigned to an attribute in a database. For example, a domain of
strings can accept only string values.
1. NOT NULL :
The Not Null constraint prevents a column from accepting null values. This implies that you can't create
a new record or change an existing one without first putting a value in the field.
Example :
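A minimal sketch (the Student table and its columns are illustrative, not from the notes):
CREATE TABLE Student (
    roll_no INT NOT NULL,
    name    VARCHAR(50) NOT NULL
);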
2. Check :
It restricts the value of a column across ranges. It can also be understood as it's like a condition or filter
checking before saving data into a column since it defines a condition that each row must satisfy.
Example :
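A minimal sketch (the Result table is illustrative):
CREATE TABLE Result (
    roll_no INT,
    marks   INT CHECK (marks BETWEEN 0 AND 100)
);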
ATTRIBUTE RULES
▪ Attribute rules enhance the editing experience and improve data integrity for
geodatabase datasets. They are user-defined rules that can be used to automatically
populate attributes, restrict invalid edits during edit operations, and perform quality
assurance checks on existing features.
▪ Attribute rules are complementary to existing rules used in the geodatabase, such as
domains and subtypes.
▪ When you create an attribute rule, you must specify the rule type to use. The
attribute rule type chosen depends on the task and at what point in the editing
process the rule needs to be evaluated.
▪ Attribute rules are viewed, created, and managed in their own tabular-style view
called the Attribute Rules view.
▪ The Attribute Rules view can be accessed using the context menu of the dataset
directly from the Catalog or Contents pane.
▪ It can also be accessed by clicking the Attribute Rules button in the Data Design group
on the Data tab for a feature layer or Standalone Table tab for a table when an active
layer in the map view is selected or when using the Fields or Subtypes view.
A trigger is a database object that is associated with the table, it will be activated when a defined action is
executed for the table. The trigger can be executed when we run the following statements:
1. INSERT
2. UPDATE
3. DELETE
Syntax –
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
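A small illustration using MySQL-style syntax (the Student and Student_Log tables are hypothetical):
CREATE TRIGGER student_insert_log
AFTER INSERT ON Student
FOR EACH ROW
INSERT INTO Student_Log (roll_no, action) VALUES (NEW.roll_no, 'INSERT');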
Assertions are different from check constraints in the way that check constraints are rules that relate to one single
row only. Assertions, on the other hand, can involve any number of other tables, or any number of other rows in
the same table. Assertions also check a condition, which must return a Boolean value.
The main differences between assertions and triggers are:
1. Assertions are used when we know that the given condition is always true; triggers can be used even when a particular condition may or may not be true.
2. Assertions are not linked to a specific table or event; they perform the task specified or defined by the user. Triggers help in maintaining the integrity constraints in the database tables, especially when the primary key and foreign key constraints are not defined.
3. Assertions do not maintain any track of changes made in a table; triggers maintain track of all changes that occur in a table.
4. Assertions have a small syntax compared to triggers; triggers have a large syntax to indicate each and every specific of the created trigger.
5. Modern databases do not use assertions; triggers are very well used in modern databases.
6. Granularity: an assertion applies to the entire database; a trigger applies to a specific table or view.
7. Syntax: assertions use SQL statements; triggers use procedural code (e.g. PL/SQL, T-SQL).
8. Error handling: a violated assertion causes the transaction to be rolled back; a trigger can ignore errors or handle them explicitly.
9. Debugging: assertions are easy to debug as SQL statements; triggers are more difficult to debug because of their procedural code.
10. Examples: assertions: CHECK constraints, FOREIGN KEY constraints; triggers: AFTER INSERT triggers, INSTEAD OF triggers.
▪ There are four types of data integrity in SQL. Domain integrity, Entity integrity, Referential integrity,
and User-defined integrity. All these ensure that data integrity is maintained in any table in SQL.
▪ Integrity constraints are a set of rules. It is used to maintain the quality of information.
▪ Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
▪ Thus, integrity constraint is used to guard against accidental damage to the database.
In SQL, we basically have four types of data integrity. Let’s go through all of them in detail:
Domain integrity:- The authenticity of inputs for a particular column is called domain integrity in SQL.
Entity integrity:- Entity integrity requires that every row in the table should have distinct records. Thus, there
must be no duplicate rows.
Referential integrity:- Relationships are fundamental to referential integrity in SQL. Whenever two or more tables are linked, we must guarantee that the foreign key value always matches a value of the primary key in the main table. A situation in which the foreign key's value has no corresponding primary key value in the main table is
incorrect. As a result, the record would be considered an orphaned record in SQL.
User-defined integrity:- This type of integrity allows the user to implement business rules to any database which
are not covered by the other 3 types of data integrity.
UNIT III
FUNCTIONAL DEPENDENCIES AND NORMALIZATION: BASIC DEFINITIONS
Functional dependencies are relationships between attributes in a database. They describe how one attribute is
dependent on another attribute.
Functional dependencies can be used to design a database in a way that eliminates redundancy and ensures
data integrity.
• In Trivial functional dependency, a dependent is always a subset of the determinant. In other words, a
functional dependency is called trivial if the attributes on the right side are the subset of the attributes
on the left side of the functional dependency.
• X → Y is called a trivial functional dependency if Y is a subset of X. For example, in a Student relation, {roll_no, name} → name is a trivial dependency.
Need:-
• Working with the set containing extraneous functional dependencies increases the computation time.
• Therefore, the given set is reduced by eliminating the useless functional dependencies.
• This reduces the computation time and working with the irreducible set becomes easier.
INTRODUCTION TO NORMALIZATION
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
o Normalization is the process of organizing the data and the attributes of a database. It is performed to
reduce the data redundancy in a database and to ensure that data is stored logically.
o Normalization is a database design technique that reduces data redundancy and eliminates undesirable
characteristics like Insertion, Update and Deletion Anomalies.
o Normalization rules divides larger tables into smaller tables and links them using relationships.
o The purpose of Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is
stored logically.
o The inventor of the relational model Edgar Codd proposed the theory of normalization of data with the
introduction of the First Normal Form, and he continued to extend theory with Second and Third
Normal Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
o Normalization in DBMS is a process which helps produce database systems that are cost-effective and
have better security models.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
FD DIAGRAM
In a functional dependency diagram (FDD), functional dependency is represented by rectangles representing
attributes and a heavy arrow showing dependency. The figure shows a functional dependency diagram for the simplest functional dependency, that is, FD: Y -> X.
In functional dependency diagram, each FD is displayed as a horizontal line.
The left-hand side attributes of the FD, i.e. Determinants, are connected by Vertical lines to line representing
the FD.
The right-hand side attributes are connected by arrows pointing towards the attributes.
o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
o In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation must be in third normal
form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
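A sketch of removing a transitive dependency (hypothetical relation Student(roll_no, dept_id, dept_name), where roll_no → dept_id and dept_id → dept_name):
-- dept_name depends on roll_no only transitively, so it is moved to its own table
CREATE TABLE Department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50)
);
CREATE TABLE Student (
    roll_no INT PRIMARY KEY,
    dept_id INT REFERENCES Department(dept_id)
);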
DEPENDENCY PRESERVATION
A form of decomposition known as dependency-preserving decomposition preserves the dependencies between the attributes. This implies that the original table's functional dependencies will still hold in the decomposed tables. A dependency-preserving decomposition is not necessarily a lossless-join decomposition, though.
o In the dependency preservation, at least one decomposed table must satisfy every dependency.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part
of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A->BC
is a part of relation R1(ABC).
BCNF
Boyce Codd normal form (BCNF)
o BCNF is the advance version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Boyce and Codd Normal Form is a higher version of the Third Normal Form. This form deals with certain types of anomalies that are not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, the following conditions must be satisfied:
1. The relation should be in 3NF.
2. For every functional dependency X → Y, X should be a super key of the table.
• If a relation is in 4NF and does not contain any join dependencies, it is in 5NF.
• To avoid redundancy, 5NF is satisfied when all tables are divided into as many tables as possible.
• A relation is said to have join dependency if it can be recreated by joining multiple sub relations and
each of these sub relations has a subset of the attributes of the original relation.
UNIT IV
TRANSACTION
o A transaction can be defined as a group of tasks. A single task is the minimum processing unit which
cannot be divided further.
o The transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions. It is performed by a single user to perform operations for
accessing the contents of the database.
o A transaction usually means that the data in the database has changed. One of the major uses of DBMS
is to protect the user’s data from system failures. It is done by ensuring that all the data is restored to a
consistent state when the computer is restarted after a crash.
o The transaction is any one execution of the user program in a DBMS. One of the important properties
of the transaction is that it contains a finite number of steps. Executing the same program multiple
times will generate multiple transactions.
Operations in Transaction-
The main operations in a transaction are-
1. Read Operation
2. Write Operation
1. Read Operation-
• Read operation reads the data from the database and then stores it in the buffer in main memory.
• For example- Read(A) instruction will read the value of A from the database and will store it in the
buffer in main memory.
2. Write Operation-
• Write operation writes the updated data value back to the database from the buffer.
• For example- Write(A) will write the updated value of A from the buffer to the database.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to
B's account. This very simple and small transaction involves several low-level tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
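The same transfer expressed as an SQL transaction (MySQL-style syntax; the Account table and its columns are hypothetical):
START TRANSACTION;
UPDATE Account SET balance = balance - 500 WHERE account_no = 'A';
UPDATE Account SET balance = balance + 500 WHERE account_no = 'B';
COMMIT;   -- or ROLLBACK; if anything went wrong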
The principles of concurrency in operating systems are designed to ensure that multiple processes or threads
can execute efficiently and effectively, without interfering with each other or causing deadlock.
ACID PROPERTIES
There are properties that all transactions should follow and possess. The four basic properties are, in combination, termed the ACID properties. ACID properties are used for maintaining the integrity of the database during transaction processing. ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability. The ACID properties of a transaction were put forward by Haerder and Reuter in 1983.
Atomicity
All changes to data are performed as if they are a single operation. That is, all the changes are performed, or
none of them are.
The term atomicity defines that the data remains atomic. It means if any operation is performed on the data,
either it should be performed or executed completely or should not be executed at all. It further means that the
operation should not break in between or execute partially.
For example, in an application that transfers funds from one account to another, the atomicity property ensures
that, if a debit is made successfully from one account, the corresponding credit is made to the other account.
Consistency
Data is in a consistent state when a transaction starts and when it ends.
The word consistency means that the value should remain preserved always. In DBMS, the integrity of the data
should be maintained, which means if a change in the database is made, it should remain preserved always. In
the case of transactions, the integrity of the data is very essential so that the database remains consistent before
and after the transaction. The data should always be correct.
For example, in an application that transfers funds from one account to another, the consistency property
ensures that the total value of funds in both the accounts is the same at the start and end of each transaction.
Isolation
The intermediate state of a transaction is invisible to other transactions. As a result, transactions that run
concurrently appear to be serialized.
The term 'isolation' means separation. In DBMS, isolation is the property that concurrently executing transactions do not affect one another; each transaction behaves as if no other transaction were running at the same time.
For example, in an application that transfers funds from one account to another, the isolation property ensures
that another transaction sees the transferred funds in one account or the other, but not in both, nor in neither.
Durability
After a transaction successfully completes, changes to data persist and are not undone, even in the event of a
system failure.
Durability ensures the permanency of something. In DBMS, the term durability ensures that the data after the
successful execution of the operation becomes permanent in the database.
For example, in an application that transfers funds from one account to another, the durability property ensures
that the changes made to each account will not be reversed.
TRANSACTION STATES
A transaction goes through many different states throughout its life cycle.
1. Active state-
• This is the first state in the life cycle of a transaction.
• A transaction is called in an active state as long as its instructions are getting executed.
• All the changes made by the transaction now are stored in the buffer in main memory.
2. Partially committed state-
• After the last instruction of the transaction has been executed, it enters into a partially committed state.
• After entering this state, the transaction is considered to be partially committed.
• It is not considered fully committed because all the changes made by the transaction are still stored in
the buffer in main memory.
3. Committed state-
• After all the changes made by the transaction have been successfully stored into the database, it enters
into a committed state.
• Now, the transaction is considered to be fully committed.
Note-
• After a transaction has entered the committed state, it is not possible to roll back the transaction.
• In other words, it is not possible to undo the changes that have been made by the transaction.
• This is because the system has been updated to a new consistent state.
• The only way to undo the changes is by carrying out another transaction called as compensating
transaction that performs the reverse operations.
4. Failed state-
• When a transaction is getting executed in the active state or partially committed state and some failure
occurs due to which it becomes impossible to continue the execution, it enters into a failed state.
5. Aborted state-
• After the transaction has failed and entered into a failed state, all the changes made by it have to be
undone.
• To undo the changes made by the transaction, it becomes necessary to roll back the transaction.
• After the transaction has rolled back completely, it enters into an aborted state.
6. Terminated state-
• This is the last state in the life cycle of a transaction.
• After entering the committed state or aborted state, the transaction finally enters into a terminated
state where its life cycle finally comes to an end.
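The life cycle above can be summarized as a small state machine. The sketch below is only illustrative; the transition table is an assumption that mirrors the description above and is not taken from any specific DBMS:

from enum import Enum, auto

class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    TERMINATED = auto()

# Legal moves between states, mirroring the list above.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.COMMITTED: {TxState.TERMINATED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: {TxState.TERMINATED},
    TxState.TERMINATED: set(),
}

def move(current, nxt):
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt

state = TxState.ACTIVE
state = move(state, TxState.PARTIALLY_COMMITTED)
state = move(state, TxState.COMMITTED)
state = move(state, TxState.TERMINATED)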
Atomicity:
One of the key characteristics of transactions in database management systems (DBMS) is atomicity, which guarantees that every operation within a transaction is handled as a single, indivisible unit of work.
Durability:
One of the key characteristics of transactions in database management systems (DBMS) is durability, which
guarantees that changes made by a transaction once it has been committed are permanently kept in the
database and will not be lost even in the case of a system failure or catastrophe.
Implementation of Atomicity:
A number of strategies are used to establish atomicity in DBMS to guarantee that either all operations inside a
transaction are correctly done or none of them are executed at all.
o Undo Log: An undo log is a mechanism used to keep track of the changes made by a transaction before
it is committed to the database. If a transaction fails, the undo log is used to undo the changes made by
the transaction, effectively rolling back the transaction. By doing this, the database is guaranteed to
remain in a consistent condition.
o Redo Log: A redo log is a mechanism used to keep track of the changes made by a transaction after it is
committed to the database. If a system failure occurs after a transaction is committed but before its
changes are written to disk, the redo log can be used to redo the changes and ensure that the database
is consistent.
o Two-Phase Commit: Two-phase commit is a protocol used to ensure that all nodes in a distributed
system commit or abort a transaction together. This ensures that the transaction is executed atomically
across all nodes and that the database remains consistent across the entire system.
o Locking: Locking is a mechanism used to prevent multiple transactions from accessing the same data
concurrently. By ensuring that only one transaction can edit a specific piece of data at once, locking
helps to avoid conflicts and maintain the consistency of the database.
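A minimal sketch of the undo-log idea described above; the in-memory dictionary standing in for the database and the log format are illustrative assumptions:

# Toy undo log: record the old value before each write so a failed
# transaction can be rolled back.
db = {"A": 1000, "B": 200}
undo_log = []

def write(item, new_value):
    undo_log.append((item, db[item]))   # remember the old value first
    db[item] = new_value                # then apply the change

def rollback():
    while undo_log:                     # undo in reverse order
        item, old_value = undo_log.pop()
        db[item] = old_value

write("A", 500)
write("B", 700)
rollback()                              # simulate a failure before commit
assert db == {"A": 1000, "B": 200}      # original state restored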
The implementation of durability in DBMS involves several techniques to ensure that committed changes are
durable and can be recovered in the event of failure.
o Write-Ahead Logging: Write-ahead logging is a mechanism used to ensure that changes made by a
transaction are recorded in the redo log before they are written to the database. This makes sure that
the changes are permanent and that they can be restored from the redo log in the event of a system
failure.
o Checkpointing: Checkpointing is a technique used to periodically write the database state to disk to
ensure that changes made by committed transactions are permanently stored. Checkpointing aids in
minimizing the amount of work required for database recovery.
o Redundant storage: Redundant storage is a technique used to store multiple copies of the database or
its parts, such as the redo log, on separate disks or systems. This ensures that even in the event of a
disk or system failure, the data can be recovered from the redundant storage.
o RAID: In order to increase performance and reliability, a technology called RAID (Redundant Array of
Inexpensive Disks) is used to integrate several drives into a single logical unit. RAID can be used to
implement redundancy and ensure that data is durable even in the event of a disk failure.
Here are Some Common Techniques used by DBMS to Implement Atomicity and Durability:
o Transactions: Transactions are used to group related operations that need to be executed atomically.
They are either committed, in which case all their changes become permanent, or rolled back, in which
case none of their changes are made permanent.
o Logging: Logging is a technique that involves recording all changes made to the database in a separate
file called a log. The log is used to recover the database in case of a failure. Write-ahead logging is a
common technique that guarantees that data is written to the log before it is written to the database.
o Shadow Paging: Shadow paging is a technique in which modifications are made to a copy of the affected database pages while the original (shadow) pages are left untouched. The shadow copy provides a consistent view of the database in case of failure, and the new pages replace the shadow pages only after the transaction has been committed.
o Backup and Recovery: In order to guarantee that the database can be recovered to a consistent state in
the event of a failure, backup and recovery procedures are used. This involves making regular backups
of the database and keeping track of changes made to the database since the last backup.
CONCURRENT EXECUTIONS
The execution of a concurrent program consists of multiple processes active at the same time. This process is
called a concurrent execution.
If the computer has multiple processors then instructions from a number of processes, equal to the number of
physical processors, can be executed at the same time. This is sometimes referred to as parallel
or real concurrent execution.
In computer science, serializability is a property of a system describing how different processes operate on shared data. A system is serializable if its result is the same as if the operations were executed in some sequential order, meaning there is no overlap in execution. In a DBMS, this can be accomplished by locking data so that no other process can access it while it is being read or written.
Serializability guarantees that the final result is equivalent to some sequential execution, while allowing improved performance because operations that do not conflict with each other may execute concurrently.
Serializability is the property of a schedule whereby each transaction appears to execute atomically and independently, even though they actually execute concurrently. In other words, when several transactions are executed concurrently, the net effect should be as if they had been executed one after another in some serial order.
Types of Serializability
In a database management system (DBMS), serializability requires that transactions appear to happen in a
particular order, even if they execute concurrently. Transactions that are not serializable may produce incorrect
results.
1. Conflict Serializability
Conflict serializability is a type of serializability in which conflicting operations on the same data items are ordered in a way that preserves database consistency. Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write; a schedule is conflict serializable if it can be rearranged into a serial schedule by swapping only non-conflicting operations. For example, consider a database with two tables, Customers and Orders, where a customer can have multiple orders but each order is associated with one customer; if two transactions touch the same order row and at least one of them writes it, those operations conflict and must be ordered as in some serial schedule.
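In practice, conflict serializability is usually tested with a precedence graph: an edge Ti → Tj is added whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when this graph has no cycle. The sketch below is illustrative; the encoding of the schedule as (transaction, operation, item) tuples is an assumption:

# Schedule: list of (transaction_id, operation, data_item) in execution order.
schedule = [(1, "R", "X"), (2, "W", "X"), (1, "W", "X"), (2, "R", "Y")]

def conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    # Detect a cycle in the precedence graph with depth-first search.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def has_cycle(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return not any(has_cycle(n) for n in graph if n not in done)

print(conflict_serializable(schedule))  # False: T1 -> T2 and T2 -> T1 form a cycle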
2. View Serializability
View serializability is a type of serializability in which each transaction produces results that are equivalent to
some well-defined sequential execution of all transactions in the system.
Concurrency Control in DBMS is a procedure of managing simultaneous transactions ensuring their atomicity,
isolation, consistency and serializability.
Concurrency control comes under the topic of transactions in a database management system (DBMS). It is a procedure in DBMS that manages simultaneous processes so that they execute without conflicting with each other; such conflicts occur in multi-user systems.
Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency. If many transactions try to access the same data, then inconsistency can arise. Concurrency control is required to maintain data consistency.
Advantages
The advantages of concurrency control are as follows −
Executing a single transaction at a time increases the waiting time of the other transactions, which may result in a delay in the overall execution. Hence, to increase the overall throughput and efficiency of the system, several transactions are executed concurrently.
Concurrency control is a very important concept of DBMS which ensures the simultaneous execution or manipulation of data by several processes or users without resulting in data inconsistency.
Concurrency control provides a procedure that is able to control concurrent execution of the operations in the
database.
• There are 4 Coffman conditions out of which if one or more are true, then there might occur a
deadlock in the system.
• Deadlock handling and its avoidance are methods to deal with the situation, while the wait-die and wound-wait schemes are two prominent ways of preventing a deadlock.
1. Centralized Approach: This is the simplest and easiest way of deadlock detection as in this only a single
resource is responsible for detecting the deadlock. But it also has its own disadvantages, such as
excessive load on a single node and having only a single point of failure that makes the system less
reliable.
2. Distributed Approach: Unlike the centralized approach, multiple nodes are responsible for detecting deadlock. Because multiple nodes share the work, the load is properly balanced and there is no single point of failure, which further increases the speed of deadlock detection.
3. Hierarchical Approach: This Approach integrates both centralized and distributed approaches for
deadlock detection. In this, a single node is made to handle a particular selected set of nodes
responsible for detecting deadlock.
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up
locks. Deadlock is said to be one of the most feared complications in DBMS as no task ever gets finished and is in
waiting state forever.
Deadlock Avoidance
o It is better to avoid a deadlock than to abort and restart transactions after the database has become stuck in a deadlock state, since aborting and restarting wastes time and resources.
o Deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like
"wait for graph" is used for detecting the deadlock situation but this method is suitable only for the
smaller database. For the larger database, deadlock prevention method can be used.
Deadlock Prevention
o Deadlock prevention method is suitable for a large database. If the resources are allocated in such a
way that deadlock never occurs, then the deadlock can be prevented.
o The database management system analyzes the operations of the transaction to determine whether they can create a deadlock situation. If they can, then the DBMS never allows that transaction to be executed.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
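A small illustrative sketch of detecting a cycle in such a wait-for graph; the graph contents are made-up example data:

# Wait-for graph: transaction -> set of transactions it is waiting for.
wait_for = {
    "T1": {"T2"},
    "T2": {"T3"},
    "T3": {"T1"},   # T3 waits for T1, closing a cycle => deadlock
}

def has_deadlock(graph):
    visiting, done = set(), set()
    def dfs(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(node) for node in graph if node not in done)

print(has_deadlock(wait_for))  # True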
FAILURE CLASSIFICATION
A failure is always related to a required function, which is often specified together with a performance requirement. A failure occurs when the function can no longer be performed or no longer meets its required performance. Failures in a DBMS are classified as follows:
1. Transaction failure
The transaction failure occurs when a transaction fails to execute or when it reaches a point from where it can't go any further. If a transaction or process is damaged part-way through, this is called a transaction failure.
1. Logical errors: If a transaction cannot complete due to some code error or an internal error condition, then a logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction because the database system is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.
2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure which destroys all or part of the disk storage.
HEAP
The non-keyed storage structure with sequential data entry and access. There is also a compressed heap
structure (cheap) with trailing blanks removed.
HASH
A keyed storage structure with algorithmically chosen addresses based on key data values. There is also a
compressed hash structure (chash) with trailing blanks removed.
ISAM
A keyed storage structure in which data is sorted by values in key columns for fast access. The index is static and needs to be reorganized as the table grows. There is also a compressed ISAM structure (cISAM) with trailing blanks removed.
BTREE
A keyed storage structure in which data is sorted by values in key columns, but the index is dynamic and grows
as the table grows. There is also a compressed B-tree structure (cB-tree) with trailing blanks removed.
STABLE STORAGE IMPLEMENTATION
To implement such storage, we need to replicate the needed information on multiple storage devices (usually
disks) with independent failure modes. We need to coordinate the writing of updates in a way that guarantees
that a failure during an update will not leave all the copies in a damaged state and that, when we are recovering
from a failure, we can force all copies to a consistent and correct value, even if another failure occurs during the
recovery. In this section, we discuss how to meet these needs. A disk write results in one of three outcomes:
1. Successful completion:- The transferred data was written correctly on the disk.
2. Partial failure:- A failure occurred in the midst of the transfer, so only some of the sectors were written with the new data, and the sector being written during the failure may have been corrupted.
3. Total failure:- The failure occurred before the disk write started, so the previous data values on the disk remain intact.
Whenever a failure occurs during the writing of a block, the system needs to detect it and invoke a recovery procedure to restore the block to a consistent state.
DATA ACCESS
Data access is the ability to retrieve, modify, copy, or move data from IT systems in any location, whether the
data is in motion or at rest.
Data access works with complementary technologies, including data virtualization and master data
management, to put your data to work on premise, in the cloud and everywhere in between.
Users who have data access can store, retrieve, move or manipulate stored data, which can be stored on a wide
range of hard drives and external devices.
There are two ways to access stored data: random access and sequential access. The sequential method requires
information to be moved within the disk using a seek operation until the data is located. Each segment of data
has to be read one after another until the requested data is found. Reading data randomly allows users to store
or retrieve data anywhere on the disk, and the data is accessed in constant time.
Oftentimes when using random access, the data is split into multiple parts or pieces and located anywhere
randomly on a disk. Sequential files are usually faster to load and retrieve because they require fewer seek
operations.
The goal of data access is to provide individuals and organizations with the ability to access or retrieve data
stored within a repository so users can retrieve, move, or manipulate it across a wide range of use cases.
Data access can involve a range of technologies, tools, and processes. For example, it may involve using a
database management system to store and retrieve data, implementing data security measures to protect
against unauthorized access, and using data analytics tools to visualize, process, and unlock insights.
Overall, data access plays a crucial role in modern organizations, as it enables businesses to reach siloed data
sources to make informed decisions.
Log-Based Recovery
o The log is a sequence of records. The log of each transaction is maintained in some stable storage so that, if any failure occurs, the database can be recovered from it.
o If any operation is performed on the database, then it will be recorded in the log.
o But the process of storing the logs should be done before the actual transaction is applied in the
database.
When the system is crashed, then the system consults the log to find which transactions need to be undone and
which need to be redone.
1. If the log contains both the records <Ti, Start> and <Ti, Commit>, then the transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then the transaction Ti needs to be undone.
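A hedged sketch of this redo/undo decision; the representation of the log as a list of (transaction, action) tuples is an assumption:

# Each log record is (transaction_id, action), e.g. ("T1", "Start").
log = [
    ("T1", "Start"), ("T1", "Commit"),
    ("T2", "Start"),                    # T2 never committed or aborted
]

def classify(log):
    started, finished = set(), set()
    for tid, action in log:
        if action == "Start":
            started.add(tid)
        elif action in ("Commit", "Abort"):
            finished.add(tid)
    redo = {tid for tid, action in log if action == "Commit"}   # committed: redo
    undo = started - finished                                    # incomplete: undo
    return redo, undo

redo, undo = classify(log)
print("redo:", redo)   # {'T1'}
print("undo:", undo)   # {'T2'}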
Atomicity property of DBMS states that either all the operations of transactions must be performed or none. The
modifications done by an aborted transaction should not be visible to database and the modifications done by
committed transaction should be visible. To achieve our goal of atomicity, user must first output to stable
storage information describing the modifications, without modifying the database itself. This information can
help us ensure that all modifications performed by committed transactions are reflected in the database. This
information can also help us ensure that no modifications made by an aborted transaction persist in the
database. Log is a sequence of records, which maintains the records of actions performed by a transaction. It is
important that the logs are written prior to the actual modification and stored on a stable storage media, which
is failsafe.
o The deferred modification technique occurs if the transaction does not modify the database until it has
committed.
o In this method, all the logs are created and stored in the stable storage, and the database is updated
when a transaction commits.
o The Immediate modification technique occurs if database modification occurs while the transaction is
still active.
o In this technique, the database is modified immediately after every operation. It follows an actual
database modification.
CHECKPOINTS
o The checkpoint is a type of mechanism where all the previous logs are removed from the system and
permanently stored in the storage disk.
o The checkpoint is like a bookmark. During the execution of a transaction, such checkpoints are marked, and as the transaction executes, log records are created for its steps.
o When the checkpoint is reached, all the updates recorded in the log up to that point are written to the database, and the log records up to the checkpoint are removed from the log file. The log file is then filled with the steps of the transaction until the next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in the consistent state, and all
transactions were committed.
DISTRIBUTED DATABASES
A distributed database system stores data across multiple sites and typically provides:
o Location independence
o Distributed query processing
o Distributed transaction management
o Hardware independence
o Operating system and network independence
o Transaction transparency
o DBMS independence
Types:
1. Homogeneous Database:- A homogeneous database stores data uniformly across all locations. All
sites utilize the same operating system, database management system, and data structures. They are
therefore simple to handle.
2. Heterogeneous Database:- With a heterogeneous distributed database, different sites may employ different software and schemas, which may cause issues for queries and transactions. Moreover, one site may not even be aware of the existence of the other sites. Different machines may use different operating systems and database applications, and they may even employ different database data models. Translations are therefore necessary for communication between sites.
1. Replication –
In this approach, the entire relation is stored redundantly at two or more sites. If the entire database is available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of data.
This is advantageous as it increases the availability of data at different sites. Also, now query requests can be
processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated: any change made at one site needs to be recorded at every site where that relation is stored, or else it may lead to inconsistency, which is a lot of overhead. Also, concurrency control becomes much more complex, as concurrent access now needs to be checked over a number of sites.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the fragments
is stored in different sites where they’re required. It must be made sure that the fragments are such that they
can be used to reconstruct the original relation (i.e, there isn’t any loss of data).
Fragmentation is advantageous as it doesn't create copies of data, so consistency is not a problem.
DATA REPLICATION
Data replication is the process of making multiple copies of data and storing them at different locations for
backup purposes, fault tolerance and to improve their overall accessibility across a network.
Although data replication can be demanding in terms of cost, computational, and storage requirements,
businesses widely use this database management technique to achieve one or more of the following goals:
1. Improve the availability of data
Data Replication is the process of storing data in more than one site or node. It is useful in improving the
availability of data. It is simply copying data from a database from one server to another server so that all the
users can share the same data without any inconsistency.
Advantages of replication:
1. Improved performance, as data can be read from a local copy of the data instead of a remote one.
2. Increased data availability, as copies of the data can be used in case of a failure of the primary database.
3. Improved scalability, as the load on the primary database can be reduced by reading data from the replicas.
Disadvantages of replication:
1. Increased risk of data inconsistencies, as data can be updated simultaneously on different replicas.
2. Increased storage and network usage, as multiple copies of the data need to be stored and transmitted.
Data replication is widely used in various types of systems, such as online transaction processing systems, data warehousing systems, and distributed systems.
Horizontal Fragmentation
As the name suggests, here the data/records are fragmented horizontally, i.e., a horizontal subset of the table data is created and the subsets are stored in different databases in the DDB.
For example, consider the employees working at different locations of the organization, such as India, the USA, and the UK. The number of employees across all these locations is huge. When the details of any one employee are required, the whole table would need to be accessed to get the information, and the employee table may be stored at any location in the world. But the idea of a DDB is to place data in the nearest database so that it can be accessed quickly. Hence, the entire employee table is divided horizontally based on location.
Vertical Fragmentation
This is the vertical subset of a relation; that is, a relation/table is fragmented by considering its columns. The vertical fragmentation of the EMPLOYEE table may divide it into different tables, each with one or more columns from EMPLOYEE.
Mixed (Hybrid) Fragmentation
This is the combination of horizontal and vertical fragmentation. It applies horizontal fragmentation to obtain subsets of rows to be distributed over the DDB, and vertical fragmentation to obtain subsets of the table's columns. Mixed fragmentation can be done in any order; there is no fixed sequence, and it is based solely on user requirements, but it must still satisfy the fragmentation conditions.
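The sketch below illustrates horizontal and vertical fragmentation of an EMPLOYEE table in plain Python; the column names, rows, and locations are illustrative assumptions:

# EMPLOYEE rows: (emp_id, name, location, salary)
employee = [
    (1, "Asha",  "India", 50000),
    (2, "Bob",   "USA",   70000),
    (3, "Chris", "UK",    65000),
]

# Horizontal fragmentation: subsets of rows, e.g. one fragment per location.
emp_india = [row for row in employee if row[2] == "India"]
emp_usa   = [row for row in employee if row[2] == "USA"]

# Vertical fragmentation: subsets of columns; the key (emp_id) is kept in
# every fragment so the original relation can be reconstructed by a join.
emp_personal = [(eid, name) for eid, name, loc, sal in employee]
emp_payroll  = [(eid, sal) for eid, name, loc, sal in employee]

# Reconstruction of the vertical fragments (a join on emp_id).
rebuilt = [(eid, name, next(sal for e, sal in emp_payroll if e == eid))
           for eid, name in emp_personal]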
UNIT V
EMERGING FIELDS IN DBMS: OBJECT ORIENTED DATABASES-BASIC IDEA AND THE MODEL
Object-oriented databases (OODBs) are an emerging field in DBMS (Database Management Systems) that aim to
provide a data model and storage structure specifically designed for object-oriented programming paradigms.
OODBs extend the traditional relational database model to support the storage and retrieval of complex,
structured objects directly within the database.
The basic idea behind object-oriented databases is to bridge the gap between programming languages and
databases by integrating the concepts of object-oriented programming into the data management system. This
allows developers to work with persistent objects in a more natural and seamless manner.
Object-oriented databases provide a more natural and efficient way to manage complex and structured data that
aligns well with object-oriented programming paradigms. They are particularly useful in domains where complex
data structures and relationships are prevalent, such as computer-aided design (CAD), multimedia applications,
scientific research, and data-intensive software systems.
It's worth noting that while object-oriented databases have their advantages, they are not as widely adopted as
traditional relational databases. Relational databases, such as SQL-based systems, still dominate the mainstream
due to their maturity, standardization, and extensive tooling ecosystem. However, object-oriented concepts and
features are often integrated into relational databases through extensions or object-relational mapping
frameworks, providing a compromise between the two paradigms.
1. Object Structure: The object structure refers to the way in which data is organized and stored within
the OODB. In an OODB, data is represented as objects, which are instances of predefined classes or
types. Each object has a unique identity and encapsulates both data attributes and the methods or
functions that operate on those attributes. The object structure defines the composition and
arrangement of these objects within the database.
• Complex Data Representation: Objects allow for the representation of complex data structures, such as
nested objects or object hierarchies. This enables the modeling of real-world entities and their
relationships in a more natural and intuitive manner.
• Encapsulation: Objects in an OODB encapsulate both data and behavior, following the principles of
object-oriented programming. Encapsulation ensures that an object's internal data and implementation
details are hidden and can only be accessed through defined methods or functions.
2. Object Class: In an OODB, an object class is a blueprint or template for creating objects. It defines the
structure, behavior, and properties that objects of that class will possess. Object classes in OODBs are
similar to classes in object-oriented programming languages.
Object classes provide a structured and organized approach to representing data in an OODB. They define the
characteristics and behaviors of objects, facilitating data modeling, code reuse, and maintaining data integrity.
Overall, the combination of object structure and object classes in OODBs allows for the effective representation,
manipulation, and organization of complex data within a database, aligning well with the principles of object-
oriented programming.
INHERITANCE
Inheritance in DBMS (Database Management Systems) refers to a mechanism that allows the creation of new
database objects based on existing objects, inheriting their attributes, relationships, and behaviors. It is a
concept borrowed from object-oriented programming and is commonly used in object-relational databases.
1. Code Reusability: Inheritance enables the reuse of existing database objects, reducing redundancy and
promoting code reusability. Common attributes, relationships, and behaviors defined in the superclass
need not be redefined in each subclass.
2. Data Consistency: Inheritance helps maintain data consistency by ensuring that common attributes and
relationships are inherited across related objects. Changes made to the superclass propagate to the
subclasses, promoting data integrity and reducing data duplication.
3. Simplified Database Design: Inheritance allows for a more modular and organized database design. By
capturing similarities and differences between related objects in a hierarchical structure, the design
becomes more maintainable and scalable.
It's important to note that inheritance in DBMS is typically found in object-relational databases or object-
oriented extensions of relational databases. Traditional relational databases may not provide explicit support for
inheritance, although relationships between tables can be established to simulate some aspects of inheritance.
In summary, inheritance in DBMS provides a means to create hierarchical relationships between objects,
facilitating code reuse, data consistency, and modular database design. It is a powerful mechanism that brings
the benefits of object-oriented programming to database systems.
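The following small Python sketch illustrates the object-class and inheritance ideas described above; the class names and attributes are assumptions for illustration, not the API of any particular object-oriented database:

class Person:
    """Superclass: common attributes and behaviour."""
    def __init__(self, person_id, name):
        self.person_id = person_id   # object identity
        self.name = name             # encapsulated attribute

    def describe(self):
        return f"{self.person_id}: {self.name}"

class Employee(Person):
    """Subclass: inherits Person's attributes and methods, adds its own."""
    def __init__(self, person_id, name, salary):
        super().__init__(person_id, name)
        self.salary = salary

    def describe(self):              # behaviour can be specialised
        return super().describe() + f" (salary {self.salary})"

e = Employee(101, "Asha", 50000)
print(e.describe())   # the subclass reuses and extends the superclass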
DATA WAREHOUSE
A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than transaction
processing. It includes historical data derived from transaction data from single and multiple sources.
A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on providing support for
decision-makers for data modeling and analysis.
A Data Warehouse is a group of data specific to the entire organization, not only to a particular group of users. It is not used for daily operations and transaction processing but is used for making decisions.
A Data Warehouse can be viewed as a data system with the following attributes:
o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of information.
o Its usage is read-intensive.
o It contains a few large tables.
DATA MINING
Data mining is the process of sorting through large data sets to identify patterns and relationships that can help
solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict
future trends and make more-informed business decisions.
Data mining is one of the most useful techniques that help entrepreneurs, researchers, and individuals to extract
valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Database (KDD).
The knowledge discovery process includes Data cleaning, Data integration, Data selection, Data transformation,
Data mining, Pattern evaluation, and Knowledge presentation.
Data Mining is a process used by organizations to extract specific data from huge databases to solve business
problems. It primarily turns raw data into useful information.
DATABASE ON WWW
Databases on the WWW provide a means for storing, managing, and accessing data over the internet. They play
a crucial role in powering various web applications, e-commerce platforms, content management systems, and
other online services.
A database on the World Wide Web (WWW) in database management systems (DBMS) refers to a collection of
structured data that is accessible and stored on the internet. It allows users to access, retrieve, and manipulate
data using web browsers or other web-based applications.
MULTIMEDIA DATA FORMATS
1. Image Formats:
• JPEG (Joint Photographic Experts Group): A widely used format for compressed images,
suitable for photographs and complex images.
• PNG (Portable Network Graphics): A lossless format for images with support for transparency
and high-quality graphics.
• GIF (Graphics Interchange Format): A format commonly used for animated images and simple
graphics.
• BMP (Bitmap): A basic format that stores uncompressed image data pixel by pixel, resulting in
large file sizes.
2. Audio Formats:
• MP3 (MPEG-1 Audio Layer 3): A compressed audio format that achieves high audio quality
while reducing file size.
• WAV (Waveform Audio File Format): A standard format for uncompressed audio files, often
used for high-fidelity recordings.
• AAC (Advanced Audio Coding): A format that offers better sound quality and smaller file sizes
compared to MP3.
• FLAC (Free Lossless Audio Codec): A lossless compression format that preserves audio quality
while reducing file size.
3. Video Formats:
• MP4 (MPEG-4 Part 14): A popular video format that supports audio, video, and subtitles in a
single file. It provides efficient compression while maintaining good quality.
• AVI (Audio Video Interleave): A container format that can store audio and video data in various
codecs, commonly used on Windows systems.
• MKV (Matroska Video): A flexible container format that can hold multiple audio, video, and
subtitle streams. It supports high-quality video and audio compression.
• MOV (QuickTime File Format): A multimedia container format developed by Apple that can
store video, audio, and other media types.
4. Other Formats:
• PDF (Portable Document Format): A format primarily used for documents, but can also store
multimedia elements such as images, audio, and video.
• SVG (Scalable Vector Graphics): A format for vector graphics that can be scaled without losing
quality. It is often used for icons, logos, and illustrations.
These are just a few examples of multimedia data formats commonly used in DBMS. Depending on the specific
requirements and applications, there are many other formats available, each with its own characteristics and
features. DBMS systems often provide support for multiple formats, allowing users to store and retrieve
multimedia data in a variety of ways.
PHYSICAL STORAGE MEDIA
1. Magnetic Disks: Magnetic disks, also known as hard disk drives (HDDs), are the most prevalent storage
media in DBMS. They consist of rotating platters coated with a magnetic material, and data is stored in
binary form on these platters. Magnetic disks provide high capacity and are suitable for storing large
amounts of data. They offer relatively lower cost per unit of storage, but their access times and data
transfer rates are slower compared to other storage media like solid-state drives (SSDs).
2. Solid-State Drives (SSDs): Solid-state drives use flash memory to store data. Unlike magnetic disks, they
have no moving parts, resulting in faster access times and higher data transfer rates. SSDs offer
improved random I/O performance, making them suitable for applications that require fast data
retrieval. However, SSDs are generally more expensive per unit of storage compared to magnetic disks.
They are often used as a cache or for storing frequently accessed data in DBMS environments.
3. Optical Disks: Optical disks, such as CDs (Compact Discs), DVDs (Digital Versatile Discs), and Blu-ray
discs, are another form of physical storage media. They use optical technology to store data in a non-
volatile manner. Optical disks provide high data durability and are primarily used for long-term archival
purposes rather than frequent data access. They offer slower access times compared to magnetic disks
and SSDs.
4. Magnetic Tapes: Magnetic tapes are sequential access storage media that use a magnetic recording
method to store data. They consist of a long strip of tape coated with a magnetic material. Magnetic
tapes offer high storage capacity but have slower access times compared to disk-based storage media.
They are typically used for backups, archives, and large-scale data storage where infrequent access is
required.
It's important to note that the choice of physical storage media depends on factors such as the specific DBMS
requirements, performance needs, budget constraints, and the nature of the data being stored. In many cases, a
combination of storage media is employed, with faster media like SSDs used for frequently accessed data and
magnetic disks or tapes used for less frequently accessed or archival data.
The following techniques can be used to optimize disk performance in a DBMS:
1. Disk Layout and Partitioning: Proper disk layout and partitioning can significantly impact performance.
Partitioning techniques such as striping or RAID (Redundant Array of Independent Disks) can distribute
data across multiple disks, improving parallelism and reducing disk contention.
2. Disk Scheduling Algorithms: The choice of disk scheduling algorithms can affect the order in which disk
requests are serviced. Algorithms like SCAN, C-SCAN, LOOK, or C-LOOK can minimize seek times and
improve overall disk performance.
3. Buffering and Caching: Using a disk buffer or cache can reduce the frequency of disk reads and writes.
Frequently accessed data can be cached in memory, reducing disk I/O operations and improving
response times.
4. File Organization: Choosing the appropriate file organization can impact disk performance. Techniques
such as indexing, clustering, or hashing can optimize data retrieval and minimize disk seeks.
5. Compression and Encoding: Data compression and encoding techniques can reduce the amount of data
stored on disk, resulting in reduced disk I/O operations and improved performance. However,
compression techniques may introduce overhead during data retrieval and updates.
6. I/O Parallelism: Exploiting parallelism in disk I/O operations can improve performance. Techniques such
as asynchronous I/O, parallel I/O, or multi-threading can overlap disk operations and reduce idle time.
7. Disk Defragmentation: Regularly defragmenting the disk can optimize performance by rearranging data
to minimize seek times. Defragmentation reduces the fragmentation of data blocks and improves
sequential access patterns.
8. RAID Configurations: Redundant Array of Independent Disks (RAID) configurations can provide fault
tolerance and improved performance. Techniques like RAID 0 (striping), RAID 1 (mirroring), or RAID 5
(striping with parity) can enhance disk performance and data availability.
9. Solid-State Drives (SSDs): Consider using solid-state drives instead of traditional magnetic disks. SSDs
offer faster access times and better random I/O performance, which can significantly boost DBMS
performance.
10. Regular Maintenance: Perform regular maintenance tasks such as disk cleanup, removing unnecessary
files, and optimizing database indexes. These activities can help maintain optimal disk performance
over time.
It's important to note that the performance optimization techniques may vary depending on the specific DBMS,
hardware, and workload characteristics. Therefore, it's recommended to analyze the system, benchmark
performance, and apply appropriate optimizations based on the specific environment and requirements.
RAID
• RAID is used to provide redundancy so that data is not lost when a disk fails.
• RAID 2 uses the Hamming code error-detection method to correct errors in data.
• RAID 3 does byte-level data striping and has parity bits for each data word.
• RAID 6 has two parity blocks, which can handle at most two disk failures.
In the RAID technique, the combined disks are treated as a single logical disk by the operating system. The individual disks use different methods to store data, depending on the RAID level used. The commonly used RAID levels are:
• RAID 0
• RAID 1
• RAID 2
• RAID 3
• RAID 4
• RAID 5
• RAID 6
RAID 0
RAID 0 implements data striping. The data blocks are placed in multiple disks without redundancy. None of the
disks are used for data redundancy so if one disk fails then all the data in the array is lost.
Pros of RAID 0
• Data requests can be on multiple disks and not on a single disk hence improving the throughput.
Cons of RAID 0
• Failure of one disk can lead to complete data loss in the respective array.
• No data Redundancy is implemented so one disk failure can lead to system failure.
RAID 1
RAID 1 implements mirroring which means the data of one disk is replicated in another disk. This helps in
preventing system failure as if one disk fails then the redundant disk takes over.
Pros of RAID 1
• Failure of one Disk does not lead to system failure as there is redundant data in other disk.
Cons of RAID 1
• Extra space is required, as the data of each disk is also copied to another disk.
RAID 2
RAID 2 is used when errors in data have to be checked at the bit level, using a Hamming-code error-detection method. Two groups of disks are used in this technique: one stores the bits of each data word, and the other stores the error-correcting code (parity bits) for the data words. The structure of this RAID level is complex, so it is not commonly used.
Pros of RAID 2
• One full disk is used to store parity bits, which helps in detecting errors.
Cons of RAID 2
• The structure is complex and extra disks are needed for the error-correcting code, so it is rarely used in practice.
RAID 3
RAID 3 implements byte-level striping of data. Data is stored across the disks, with its parity bits stored on a separate disk. The parity bits help to reconstruct the data when there is a data loss.
Pros of RAID 3
• Data can be reconstructed using the parity disk if any single disk fails.
• Large sequential reads and writes are fast because the data is spread across all the data disks.
Cons of RAID 3
• Every access involves all the disks, so performance is poor for small, random I/O requests.
• The dedicated parity disk can become a bottleneck for writes.
RAID 4
RAID 4 implements block-level striping of data with a dedicated parity drive. If the data on any one disk is lost, it can be reconstructed with the help of the parity drive. Parity is calculated with the help of an XOR operation over the corresponding blocks of each data disk.
Pros of RAID 4
• Parity helps to reconstruct the data if the data on at most one disk is lost.
Cons of RAID 4
• If data is lost from more than one disk, then parity cannot help us reconstruct the data.
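An illustrative sketch of how XOR parity (as used in RAID 4 and RAID 5) allows a single lost block to be reconstructed; the block contents are made-up example bytes:

from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

d0, d1, d2 = bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])
parity = xor_blocks(d0, d1, d2)          # stored on the dedicated parity disk

# Disk holding d1 fails: XOR the surviving blocks with the parity block.
recovered_d1 = xor_blocks(d0, d2, parity)
assert recovered_d1 == d1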
RAID 5
RAID 5 is similar to RAID 4 with only one difference: the parity rotates among the disks.
Pros of RAID 5
• Parity is distributed over the disks, which improves performance.
Cons of RAID 5
• Parity is useful only when there is data loss on at most one disk. If data is lost on more than one disk, then parity is of no use.
RAID 6
If more than one disk fails, the RAID 6 implementation helps in that case. In RAID 6 there are two parity blocks in each array/row. It is similar to RAID 5 with extra parity.
Pros of RAID 6
• The two parity blocks allow the array to tolerate up to two simultaneous disk failures.
Cons of RAID 6
• Extra space is needed for the second parity block, and writes are slower because two parity blocks must be computed and updated.
FILE ORGANIZATION
o The File is a collection of records. Using the primary key, we can access the records. The type and frequency of
access can be determined by the type of file organization which was used for a given set of records.
o File organization is a logical relationship among various records. This method defines how file records are mapped
onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of blocks, and the blocks are
placed on the storage medium.
o The first approach to map the database to files is to use several files and store only fixed-length records of a single type in any given file. An alternative approach is to structure our files so that they can accommodate records of multiple lengths.
o Files of fixed length records are easier to implement than the files of variable length records.
o Sequential file organization:- This method is the easiest method for file organization. In this method, files are stored sequentially. This method can be implemented in two ways:
1. Pile file method: Records are stored one after another in the order in which they are inserted. Suppose we have records R1, R3, and so on up to R9 and R8 in a sequence (each record is simply a row in the table). If a new record R2 is to be inserted, it is placed at the end of the file.
2. Sorted file method: Suppose there is a pre-existing sorted sequence of records R1, R3, and so on up to R6 and R7. If a new record R2 has to be inserted, it is first inserted at the end of the file, and then the sequence is sorted again.
o Heap file organization:-
✓ It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, the records are inserted at the file's end. When the records are inserted, it doesn't require any sorting or ordering of the records.
✓ When the data block is full, the new record is stored in some other block. This new data block need not to
be the very next data block, but it can select any data block in the memory to store new records. The heap
file is also known as an unordered file.
✓ In the file, every record has a unique id, and every page in a file is of the same size. It is the DBMS
responsibility to store and manage the new records.
o Hash file organization:- Hash File Organization uses the computation of hash function on some fields of
the records. The hash function's output determines the location of disk block where the records are to
be placed.
In this method, there is no effort for searching and sorting the entire file. In this method, each record
will be stored randomly in the memory.
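A toy sketch of the idea: a hash function applied to the key chooses the block (bucket) in which a record is placed, so a lookup touches only one block. The number of blocks and the records are illustrative assumptions:

NUM_BLOCKS = 4
blocks = [[] for _ in range(NUM_BLOCKS)]     # each block holds some records

def block_for(key):
    """Hash function on the key field decides the disk block."""
    return hash(key) % NUM_BLOCKS

def insert(record):
    blocks[block_for(record["id"])].append(record)

def lookup(key):
    # Only one block has to be examined; no full-file search or sorting.
    return next((r for r in blocks[block_for(key)] if r["id"] == key), None)

insert({"id": 17, "name": "Asha"})
insert({"id": 42, "name": "Bob"})
print(lookup(42))   # {'id': 42, 'name': 'Bob'}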
o B+ file organization:-
B+ tree file organization is the advanced method of an indexed sequential access method. It uses a
tree-like structure to store records in File.
It uses the same concept of key-index where the primary key is used to sort the records. For each
primary key, the value of the index is generated and mapped with the record.
The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this
method, all the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf
nodes. They do not contain any records.
o ISAM (Indexed Sequential Access Method):-
In this method, records are stored in the file using the primary key, and an index value is generated for each primary key and mapped with the record. If any record has to be retrieved based on its index value, then the address of the data block is fetched and the record is retrieved from memory.
Pros of ISAM:
o Since each record has the address of its data block in this method, searching for a record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is based on the
primary key values, we can retrieve the data for the given range of value. In the same way, the partial
value can also be easily searched, i.e., the student name starting with 'JA' can be easily searched.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to maintain the
sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise, the
performance of the database will slow down.
o Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. For example, an EMPLOYEE and DEPARTMENT relationship clustered on the cluster key DEP_ID is an indexed cluster: all the records with the same DEP_ID are grouped and stored together.
o Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records
based on the cluster key, we generate the value of the hash key for the cluster key
and store the records with the same hash key value.
ORDERED INDICES
The indices are usually sorted to make searching faster. The indices which are sorted are known as ordered
indices.
Ordered indexing is the traditional way of storing that gives fast retrieval. The indices are stored in a sorted
manner hence it is also known as ordered indices.
1. Dense Indexing: In dense indexing, the index table contains records for every search key value of the
database. This makes searching faster but requires a lot more space. It is like primary indexing but
contains a record for every search key.
2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is a bit slower as well. We do not include an index entry for every record; instead, we store index entries for only some search-key values, each of which points to a block. The pointed-to block contains a group of records. Sometimes we have to perform a second search within the block, which makes sparse indexing a bit slower.
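A small sketch contrasting the two: on a file sorted by key, the dense index keeps one entry per record while the sparse index keeps one entry per block, so a sparse lookup first locates the block and then scans inside it. The data values and block size below are illustrative assumptions:

import bisect

# File of records sorted on the search key, grouped into blocks of 3.
records = [(k, f"row-{k}") for k in (2, 5, 7, 11, 13, 17, 19, 23, 29)]
BLOCK_SIZE = 3
file_blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record.   Sparse index: one entry per block.
dense_index = {k: i // BLOCK_SIZE for i, (k, _) in enumerate(records)}
sparse_index = [(block[0][0], b) for b, block in enumerate(file_blocks)]

def sparse_lookup(key):
    keys = [k for k, _ in sparse_index]
    b = bisect.bisect_right(keys, key) - 1          # find the candidate block
    if b < 0:
        return None
    return next((v for k, v in file_blocks[b] if k == key), None)  # scan inside it

print(sparse_lookup(13))   # 'row-13'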
NETWORK MODEL
• The network model in DBMS represents data as records connected in a graph, which allows it to express many-to-many relationships among the database contents.
• The network model in DBMS has a structure similar to the hierarchical database model, but it differs from it in that a member (child) record can have numerous parents.
• In the network model in DBMS, there are multiple paths to the same record which helps in avoiding
data redundancy problems.
• In the network model in DBMS, there is data integrity as every member entity has one or more owners.
Only the prime parent has no owner but it has various inter-related children.
• The network database model is very complicated due to several entities inter-related with each other.
So, managing is also quite difficult.
Operations on Network Model in DBMS
• Insertion Operation - We can insert or add a new record in the network database model but before
adding any new record the database administrator or the user needs to understand the whole
structure.
• Update Operation - We can update the data record(s). If a certain data is updated then all its children
entities are also affected.
• Deletion Operation - We can delete the data record(s) but the deletion is a very crucial operation.
Before deleting any record, we should first look out for the various connected entities so that the
corresponding entities do not get affected by the deletion.
• Retrieval Operation - The retrieval of records in the network model in DBMS is quite complex to
program but it is very fast as the entities are interconnected and various paths lead to certain records.
Advantages of Network Model in DBMS
• In the network model in DBMS, there are multiple paths to the same record, which helps in avoiding data redundancy problems.
• In the network model in DBMS, there is data integrity as every member entity has one or more owners.
Only the prime parent has no owner but it has various inter-related children.
• The data retrieval is faster in the case of the network model in DBMS because the entities and the data
are more interrelated.
• Due to the parent-child relationship, if there is a change in the parent entity, it is reflected in the child entities as well. This also saves time, as we do not need to update each related child entity separately.
Disadvantages of Network Model in DBMS
• In the case of the addition of new entities, the database administrator or the user needs to understand the whole structure.
• Due to complex inter-related structure the addition, update, as well as deletion are very difficult.
• We need to use a pointer for navigation hence the operational anomalies exist.
HIERARCHICAL MODELS
The hierarchical data model has been in use since the 1960s; in it, data is organized like a tree structure.
In 1966, IBM introduced the Information Management System (IMS), which is based on this hierarchical data model, but the model is now rarely used.
The hierarchical model organizes the data into a tree structure consisting of a single root node, where each record has one parent record and may have many child records, so the structure expands like a tree.
A hierarchical database is a set of tables arranged in the form of a parent-child relationship. Each set of parents
can have a relationship with any number of children. But every child can have a relationship with only one set of
parents.
A hierarchical database model is a one-to-many relationship. You can think of it as an upside-down tree with the
root at the top. To access data from the database, the whole tree has to be traversed starting from the root
downwards.
• So, the hierarchical model is a collection of rooted trees, and the relationships that exist in the hierarchical model are one-to-many and one-to-one.
Advantages
• It is easy to understand.
Disadvantages
• Data can be lost or become inconsistent when a parent node is deleted, because this results in the deletion of its child nodes.
• Complex to design.
DBTG MODEL
The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL),
the group responsible for standardization of the programming language COBOL.
The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the requirements of many distinct programming languages, not just COBOL. The user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on network model. In addition to proposing a formal notation for networks (the Data Definition
Language or DDL), the DBTG has proposed a Subschema Data Definition Language (Subschema DDL) for defining
views of conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing applications programs that manipulate the conceptual scheme
or a view.
The following table compares the hierarchical, network, and relational data models:
Hierarchical Model | Network Model | Relational Model
One-to-many or one-to-one relationships. | Allows the network model to support many-to-many relationships. | One-to-one, one-to-many, and many-to-one relationships.
Retrieval algorithms are complex. | Retrieval algorithms are complex. | Retrieval algorithms are simple.
Based on the parent-child relationship. | A record can have many parents as well as many children. | Based on relational data structures.
Does not provide an independent stand-alone query interface. | Its language was defined by CODASYL (the Conference on Data Systems Languages). | Relational databases bring many sources into a common query language such as SQL.
Cannot insert the information of a child that does not have a parent. | Does not suffer from any insertion anomaly. | Does not suffer from any insertion anomaly.
Multiple occurrences of child records lead to problems of inconsistency during the update operation. | Free from update anomalies. | Free from update anomalies.
Deletion of a parent results in deletion of the child records. | Free from delete anomalies. | Free from delete anomalies.
This model lacks data independence. | There is partial data independence. | It provides data independence.