
Distributed Database system

UNIT - I

INTRODUCTION TO DISTRIBUTED DATABASE

A database is an ordered collection of related data that is built for a specific purpose. A database may be organized as a collection of multiple tables, where a table represents a real-world element or entity. Each table has several different fields that represent the characteristic features of the entity.
For example, a company database may include tables for projects, employees, departments, products and financial records. The fields in the Employee table may be Name, Company_Id, Date_of_Joining, and so forth.

Types of DBMS

Hierarchical DBMS
In hierarchical DBMS, the relationships among data in the database are established so that one data element exists as a subordinate of another. The data elements have parent-child relationships and are modelled using the “tree” data structure. These are very fast and simple.

Figure 1.1 Hierarchical DBMS


Network DBMS
Network DBMS is one where the relationships among data in the database are of type many-to-many in the form of a network. The structure is generally complicated due to the existence of numerous many-to-many relationships. Network DBMS is modelled using the “graph” data structure.
Figure 1.2 Network DBMS
Relational DBMS
In relational databases, the database is represented in the form of relations. Each relation models an entity and is represented as a table of values. In the relation or table, a row is called a tuple and denotes a single record. A column is called a field or an attribute and denotes a characteristic property of the entity. RDBMS is the most popular database management system.

For example − A Student Relation −

Figure 1.3 A Student Relation


Object Oriented DBMS
Object-oriented DBMS is derived from the model of the object-oriented programming paradigm. It is helpful in representing both persistent data, as stored in databases, and transient data, as found in executing programs. It uses small, reusable elements called objects. Each object contains a data part and a set of operations which work upon the data. The object and its attributes are accessed through pointers instead of being stored in relational table models.

For example − A simplified Bank Account object-oriented database −

Figure 1.4 A simplified Bank Account object-oriented database

Centralised Database
A centralized database is stored at a single location such as a mainframe computer. It is maintained and modified from that location only, and is usually accessed over a network connection such as a LAN or WAN. Centralized databases are used by organisations such as colleges, companies, banks etc. All the information for the organisation is stored in a single database, and this database is known as the centralized database.

Advantages
Some advantages of Centralized Database Management System are −

• Data integrity is maximised as the whole database is stored at a single physical location. This means that it is easier to coordinate the data and keep it as accurate and consistent as possible.

• Data redundancy is minimal in the centralised database. All the data is stored together and not scattered across different locations, so it is easier to ensure there is no redundant data.

• Since all the data is in one place, stronger security measures can be put around it. So, the centralised database is much more secure.

• Data is easily portable because it is stored in one place.

• The centralized database is cheaper than other types of databases as it requires less power and maintenance.

• All the information in the centralized database can be easily accessed from the same location and at the same time.

Disadvantages
• Since all the data is at one location, it takes more time to search and access it. If the network is slow, this process takes even more time.
• There is a lot of data access traffic for the centralized database. This may create a bottleneck situation.
• Since all the data is at the same location, if multiple users try to access it simultaneously it creates a problem. This may reduce the efficiency of the system.
• If there are no database recovery measures in place and a system failure occurs, then all the data in the database will be destroyed.
Distributed DBMS
A distributed database is a set of interconnected databases that is distributed over a computer network or the internet. A Distributed Database Management System (DDBMS) manages the distributed database and provides mechanisms so as to make the databases transparent to the users. In these systems, data is intentionally distributed among multiple nodes so that all computing resources of the organization can be optimally used.
A distributed database is a collection of multiple interconnected databases, which are spread physically across various locations that communicate via a computer network.

Features
· Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
· Data is physically stored across multiple sites. Data in each site can be managed by
a DBMS independent of the other sites.
· The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
· A distributed database is not a loosely connected file system.
· A distributed database incorporates transaction processing, but it is not synonymous
with a transaction processing system.
Factors Encouraging DDBMS
· Distributed Nature of Organizational Units − Most organizations in the current times are subdivided into multiple units that are physically distributed over the globe. Each unit requires its own set of local data. Thus, the overall database of the organization becomes distributed.
· Need for Sharing of Data − The multiple organizational units often need to communicate with each other and share their data and resources. This demands common databases or replicated databases that should be used in a synchronized manner.
· Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) work upon diversified systems which may have common data. Distributed database systems aid both kinds of processing by providing synchronized data.
· Database Recovery − One of the common techniques used in DDBMS is replication of data across different sites. Replication of data automatically helps in data recovery if the database at any site is damaged. Users can access data from other sites while the damaged site is being reconstructed. Thus, database failure may become almost inconspicuous to users.
· Support for Multiple Application Software − Most organizations use a variety of application software, each with its specific database support. DDBMS provides a uniform functionality for using the same data among different platforms.
Advantages of Distributed Databases
· Modular Development − If the system needs to be expanded to new locations or new units, in centralized database systems the action requires substantial effort and disruption of the existing functioning. However, in distributed databases, the work simply requires adding new computers and local data to the new site and finally connecting them to the distributed system, with no interruption in current functions.

· More Reliable − In case of database failures, the total system of centralized databases comes to a halt. However, in distributed systems, when a component fails, the functioning of the system continues, possibly at reduced performance. Hence DDBMS is more reliable.

· Better Response − If data is distributed in an efficient manner, then user requests can be met from local data itself, thus providing faster response. On the other hand, in centralized systems, all queries have to pass through the central computer for processing, which increases the response time.

· Lower Communication Cost − In distributed database systems, if data is located locally where it is mostly used, then the communication costs for data manipulation can be minimized. This is not feasible in centralized systems.

Disadvantages of Distributed Databases

· Need for complex and expensive software − DDBMS demands complex and often expensive software to provide data transparency and co-ordination across the several sites.
· Processing overhead − Even simple operations may require a large number of communications and additional calculations to provide uniformity in data across the sites.
· Data integrity − The need for updating data at multiple sites poses problems of data integrity.
· Overheads for improper data distribution − Responsiveness of queries is largely dependent upon proper data distribution. Improper data distribution often leads to very slow response to user requests.

Distributed Database Vs Centralized Database

Centralized DBMS | Distributed DBMS
The database is stored at only one site | The database is stored at different sites and is accessed with the help of a network
Database and DBMS software reside at a single computer site and can be used by multiple users | Database and DBMS software are distributed over many sites connected by a computer network
Database is maintained at one site | Database is maintained at a number of different sites
If the centralized system fails, the entire system is halted | If one site fails, the system continues to work with the other sites
It is less reliable | It is more reliable

Centralized database

Figure 1.5 Centralized database

Distributed database
Figure 1.6 Distributed database

Features of Distributed DBMS

A number of features make DDBMS very popular in organizing data.
• Data Fragmentation: The overall database is divided into smaller subsets called fragments. Fragmentation can be of three types: horizontal (divided by rows depending upon conditions), vertical (divided by columns depending upon conditions), and hybrid (horizontal + vertical).

• Data Replication: DDBMS maintains and stores multiple copies of the same data in its different fragments to ensure data availability, fault tolerance, and seamless performance.

• Data Allocation: It determines whether all data fragments are required to be stored at all sites or not. This feature is used to reduce network traffic and optimize performance.

• Data Transparency: DDBMS hides all the complexities from its users and provides transparent access to data and applications.
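The fragmentation types above can be sketched with plain Python structures. This is a minimal illustration, not from the text: the EMPLOYEE rows and the 30,000 salary cut-off are hypothetical.

```python
# Horizontal and vertical fragmentation of a small relation,
# modelled as a list of dicts. Table contents are hypothetical.
employees = [
    {"Id": 1, "Name": "Asha",  "Dept": "Sales", "Salary": 25000},
    {"Id": 2, "Name": "Ravi",  "Dept": "HR",    "Salary": 40000},
    {"Id": 3, "Name": "Meena", "Dept": "Sales", "Salary": 52000},
]

# Horizontal: divide by rows depending on a condition.
low  = [r for r in employees if r["Salary"] <= 30000]
high = [r for r in employees if r["Salary"] > 30000]

# Vertical: divide by columns; keep the key (Id) in every fragment
# so the original table can be reconstructed.
personal = [{"Id": r["Id"], "Name": r["Name"]} for r in employees]
payroll  = [{"Id": r["Id"], "Dept": r["Dept"], "Salary": r["Salary"]} for r in employees]

# Reconstruction checks ("reconstructiveness").
assert sorted(low + high, key=lambda r: r["Id"]) == employees
rejoined = [{**p, **q} for p in personal for q in payroll if p["Id"] == q["Id"]]
assert sorted(rejoined, key=lambda r: r["Id"]) == employees
```

Hybrid fragmentation would simply apply one of these steps to the output of the other.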

Types of Distributed Databases

Figure 1.7 Types of Distributed Databases


Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database environments.
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its properties are −
· The sites use very similar software.
· The sites use identical DBMS or DBMS from the same vendor.
· Each site is aware of all other sites and cooperates with other sites to process user requests.
· The database is accessed through a single interface as if it is a single database.

Heterogeneous Distributed Databases

In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data models. Its properties are −
· Different sites use dissimilar schemas and software.
· The system may be composed of a variety of DBMSs like relational, network, hierarchical or object oriented.
· Query processing is complex due to dissimilar schemas. Transaction processing is complex due to dissimilar software.
· A site may not be aware of other sites, so there is limited co-operation in processing user requests.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
· Non-replicated and non-fragmented
· Fully replicated
· Partially replicated
· Fragmented
· Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so that it is in close proximity to the site where it is used most. It is most suitable for database systems where the percentage of queries needed to join information in tables placed at different sites is low. If an appropriate distribution strategy is adopted, then this design alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since each site has its own copy of the entire database, queries are very fast, requiring negligible communication cost. On the contrary, the massive redundancy in data incurs a huge cost during update operations. Hence, this is suitable for systems where a large number of queries is required to be handled and the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is done in accordance with the frequency of access. This takes into consideration the fact that the frequency of accessing the tables varies considerably from site to site. The number of copies of the tables (or portions) depends on how frequently the access queries execute and the sites which generate them.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions, and each fragment can be stored at different sites. This considers the fact that it seldom happens that all data stored in a table is required at a given site. Moreover, fragmentation increases parallelism and provides better disaster recovery. Here, there is only one copy of each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
· Vertical fragmentation
· Horizontal fragmentation
· Hybrid fragmentation

Mixed Distribution: This is a combination of fragmentation and partial replication. Here, the tables are initially fragmented in any form (horizontal or vertical), and then these fragments are partially replicated across the different sites according to the frequency of accessing the fragments.
Design Strategies
The strategies can be broadly divided into replication and fragmentation. However, in most
cases, a combination of the two is used.
Data Replication
Data replication is the process of storing separate copies of the database at two or more sites.
It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication
· Reliability − In case of failure of any site, the database system continues to work since a copy is available at another site(s).
· Reduction in Network Load − Since local copies of data are available, query processing can be done with reduced network usage, particularly during prime hours. Data updating can be done at non-prime hours.
· Quicker Response − Availability of local copies of data ensures quick query processing and consequently quick response time.
· Simpler Transactions − Transactions require fewer joins of tables located at different sites and minimal coordination across the network. Thus, they become simpler in nature.

Disadvantages of Data Replication

· Increased Storage Requirements − Maintaining multiple copies of data is associated with increased storage costs. The storage space required is in multiples of the storage required for a centralized system.
· Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update needs to be reflected in all the copies of the data at the different sites. This requires complex synchronization techniques and protocols.
· Undesirable Application–Database Coupling − If complex update mechanisms are not used, removing data inconsistency requires complex co-ordination at application level. This results in undesirable application–database coupling.

Types of data replication

1. Snapshot Replication
• Periodically copies the entire dataset from the source to the target database.
• Suitable for systems where real-time updates are not necessary.
• Example: A reporting system that updates once a day.
2. Transactional Replication
• Continuously replicates changes (INSERT, UPDATE, DELETE) in real time.
• Ensures that the target database remains in sync with the source.
• Example: Banking systems that require immediate consistency.
3. Merge Replication
• Changes can be made at multiple locations, and they are merged later.
• Used in scenarios where databases operate independently and sync periodically.
• Example: Mobile applications that work offline and sync when connected.
4. Full Replication
• The entire database is copied to multiple locations.
• Improves read performance but increases storage and update overhead.
• Example: Content delivery networks (CDNs) storing replicated website data.
5. Partial Replication
• Only selected portions of the database are replicated based on need.
• Reduces storage costs while maintaining availability for critical data.
• Example: A global company replicating regional customer data to local servers.
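The contrast between snapshot and merge replication can be sketched with in-memory dictionaries. This is a toy illustration under stated assumptions: the account names are hypothetical, and conflict resolution uses a simple "last writer wins" rule on an integer version counter.

```python
# Snapshot replication: periodically copy the whole source to the replica.
source = {"acct1": 100, "acct2": 250}
replica = dict(source)          # full snapshot taken at sync time
source["acct1"] = 120           # later change, not yet visible at the replica
assert replica["acct1"] == 100

# Merge replication: sites change data independently; a later merge
# resolves conflicts, here by "last writer wins" on a version counter.
site_a = {"acct1": (120, 2)}                     # key -> (value, version)
site_b = {"acct1": (110, 1), "acct2": (300, 3)}

merged = {}
for site in (site_a, site_b):
    for key, (value, version) in site.items():
        if key not in merged or version > merged[key][1]:
            merged[key] = (value, version)

assert merged == {"acct1": (120, 2), "acct2": (300, 3)}
```

Real DDBMS products use far richer conflict-resolution policies, but the shape of the problem is the same: each site accumulates independent changes that must later be reconciled.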
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments. Fragmentation can be of three types:
horizontal,
vertical, and
hybrid (combination of horizontal and vertical).
Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This requirement is called “reconstructiveness.”
Advantages
1. Permits a number of transactions to be executed concurrently.
2. Results in parallel execution of a single query.
3. Increases the level of concurrency, also referred to as intra-query concurrency.
4. Increases system throughput.
5. Since data is stored close to the site of usage, efficiency of the database system is increased.
6. Local query optimization techniques are sufficient for most queries since data is locally available.
7. Since irrelevant data is not available at the sites, security and privacy of the database system can be maintained.

Disadvantages
1. Applications whose views are defined on more than one fragment may suffer performance degradation if the applications have conflicting requirements.
2. Simple tasks like checking for dependencies would result in chasing after data at a number of sites.
3. When data from different fragments are required, the access speeds may be very low.
4. In case of recursive fragmentation, the job of reconstruction will need expensive techniques.
5. Lack of back-up copies of data at different sites may render the database ineffective in case of failure of a site.

Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical fragmentation can be used to enforce privacy of data.

Grouping
· Starts by assigning each attribute to one fragment.
o At each step, joins some of the fragments until some criterion is satisfied.
· Results in overlapping fragments.
Splitting
· Starts with a relation and decides on a beneficial partitioning based on the access behaviour of applications to the attributes.
· Fits more naturally within the top-down design.
· Generates non-overlapping fragments.
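Splitting can be sketched with `sqlite3`, Python's built-in SQL engine. This is a minimal example under stated assumptions: the simplified STUDENT columns, fragment names and rows are hypothetical; each fragment keeps the primary key Regd_No so a join reconstructs the original relation.

```python
import sqlite3

# Vertical fragmentation sketch: each fragment keeps the primary key
# (Regd_No) so the original STUDENT relation can be rebuilt by a join.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Fees INTEGER)")
con.execute("INSERT INTO STUDENT VALUES (1, 'Asha', 5000), (2, 'Ravi', 6000)")

# Splitting: non-overlapping fragments, both carrying the key.
con.execute("CREATE TABLE STD_INFO AS SELECT Regd_No, Name FROM STUDENT")
con.execute("CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT")

# Reconstructiveness: the join of the fragments yields the original rows.
rebuilt = con.execute(
    "SELECT i.Regd_No, i.Name, f.Fees FROM STD_INFO i "
    "JOIN STD_FEES f USING (Regd_No) ORDER BY i.Regd_No"
).fetchall()
assert rebuilt == [(1, 'Asha', 5000), (2, 'Ravi', 6000)]
```

Grouping would instead produce overlapping fragments, with the same reconstruction requirement.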
For example, let us consider that a University database keeps records of all registered students in a Student table having the following schema.

STUDENT (Regd_No, Name, Course, Address, Semester, Fees, Marks)

Now, suppose the fees details are to be maintained in the accounts section. The designer will vertically fragment the table as follows −

CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;

Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment must have all columns of the original base table.

· Primary horizontal fragmentation is defined by a selection operation on the owner relation of a database schema.
· Given relation R, its horizontal fragments are given by

Ri = σFi(R), 1 ≤ i ≤ w

where Fi is the selection formula used to obtain fragment Ri.

For example −

Emp1 = σSal ≤ 20K(Emp)
Emp2 = σSal > 20K(Emp)

For example, in the student schema, if the details of all students of the Computer Science course need to be maintained at the School of Computer Science, then the designer will horizontally fragment the database as follows −

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE Course = 'Computer Science';
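The Emp1/Emp2 selection above and the reconstruction of the owner relation can be exercised with `sqlite3`. This is a sketch under stated assumptions: the EMP rows are hypothetical, and UNION ALL of the fragments plays the role of reconstruction.

```python
import sqlite3

# Primary horizontal fragmentation: Emp1 = selection(Sal <= 20K),
# Emp2 = selection(Sal > 20K), expressed in SQL on hypothetical rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (Eno INTEGER PRIMARY KEY, Ename TEXT, Sal INTEGER)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                [(1, "Asha", 18000), (2, "Ravi", 25000), (3, "Meena", 20000)])

con.execute("CREATE TABLE EMP1 AS SELECT * FROM EMP WHERE Sal <= 20000")
con.execute("CREATE TABLE EMP2 AS SELECT * FROM EMP WHERE Sal > 20000")

# Reconstructiveness: the union of the fragments is the original relation.
rebuilt = con.execute(
    "SELECT * FROM EMP1 UNION ALL SELECT * FROM EMP2 ORDER BY Eno").fetchall()
assert rebuilt == con.execute("SELECT * FROM EMP ORDER BY Eno").fetchall()
```

Because the selection formulas are complementary, every tuple lands in exactly one fragment and no tuple is duplicated or lost.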

Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible fragmentation technique since it generates fragments with minimal extraneous information. However, reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
· At first, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.
· At first, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.

Transparency
Transparency in DBMS refers to the separation of the high-level semantics of the system from the low-level implementation issues. High-level semantics concerns the end user, while low-level implementation concerns how the data is physically stored in the database. Using data independence in the various layers of the database, transparency can be implemented in DBMS.
Distribution transparency is the property of distributed databases by virtue of which the internal details of the distribution are hidden from the users. The DDBMS designer may choose to fragment tables, replicate the fragments and store them at different sites. However, since users are oblivious of these details, they find the distributed database as easy to use as any centralized database.
Unlike a normal DBMS, a DDBMS deals with a communication network, replicas and fragments of data. Thus, transparency also involves these three factors.
Following are three types of transparency:
1. Location transparency
2. Fragmentation transparency
3. Replication transparency

Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s) of a table as if they were stored locally at the user's site. The fact that the table or its fragments are stored at a remote site in the distributed database system should be completely oblivious to the end user. The address of the remote site(s) and the access mechanisms are completely hidden. In order to incorporate location transparency, DDBMS should have access to an updated and accurate data dictionary and DDBMS directory which contains the details of locations of data.
Fragmentation Transparency
Fragmentation transparency enables users to query upon any table as if it were unfragmented. Thus, it hides the fact that the table the user is querying on is actually a fragment or union of some fragments. It also conceals the fact that the fragments are located at diverse sites. This is somewhat similar to users of SQL views, where the user may not know that they are using a view of a table instead of the table itself.
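The SQL-view analogy can be made concrete with `sqlite3`. In this sketch the fragment names, rows and courses are hypothetical: a view presents the union of two horizontal fragments as one table, so the user queries STUDENT without knowing it is fragmented.

```python
import sqlite3

# Fragmentation transparency sketch: a view hides the fragments.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT_CS (Regd_No INTEGER, Name TEXT, Course TEXT)")
con.execute("CREATE TABLE STUDENT_EE (Regd_No INTEGER, Name TEXT, Course TEXT)")
con.execute("INSERT INTO STUDENT_CS VALUES (1, 'Asha', 'Computer Science')")
con.execute("INSERT INTO STUDENT_EE VALUES (2, 'Ravi', 'Electrical')")
con.execute("CREATE VIEW STUDENT AS "
            "SELECT * FROM STUDENT_CS UNION ALL SELECT * FROM STUDENT_EE")

# The user queries the view as if the table were unfragmented.
rows = con.execute("SELECT Name FROM STUDENT ORDER BY Regd_No").fetchall()
assert rows == [('Asha',), ('Ravi',)]
```

In a real DDBMS the two fragments would live at different sites; the principle of hiding the union behind one name is the same.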
Replication Transparency
Replication transparency ensures that replication of databases is hidden from the users. It enables users to query upon a table as if only a single copy of the table exists. Replication transparency is associated with concurrency transparency and failure transparency. Whenever a user updates a data item, the update is reflected in all the copies of the table. However, this operation should not be known to the user; this is concurrency transparency. Also, in case of failure of a site, the user can still proceed with his queries using replicated copies without any knowledge of the failure; this is failure transparency.
Combination of Transparencies
In any distributed database system, the designer should ensure that all the stated transparencies are maintained to a considerable extent. The designer may choose to fragment tables, replicate them and store them at different sites, all oblivious to the end user. However, complete distribution transparency is a tough task and requires considerable design effort.

Database Security Control

Database control refers to the task of enforcing regulations so as to provide correct data to authentic users and applications of a database. In order that correct data is available to users, all data should conform to the integrity constraints defined in the database. Besides, data should be screened away from unauthorized users so as to maintain the security and privacy of the database. Database control is one of the primary tasks of the database administrator (DBA).
The three dimensions of database control are −
· Authentication and Authorisation
· Access Control
· Integrity Constraints

Authentication
In a distributed database system, authentication is the process through which only legitimate users can gain access to the data resources.
Authentication can be enforced at two levels −
Controlling Access to Client Computer − At this level, user access is restricted while logging in to the client computer that provides the user interface to the database server. The most common method is a username/password combination. However, more sophisticated methods like biometric authentication may be used for high-security data.
Controlling Access to the Database Software − At this level, the database software/administrator assigns some credentials to the user. The user gains access to the database using these credentials. One of the methods is to create a login account within the database server.
Access Rights
A user's access rights refer to the privileges that the user is given regarding DBMS operations, such as the rights to create a table, drop a table, add/delete/update tuples in a table or query upon the table.
In distributed environments, since there are a large number of tables and a yet larger number of users, it is not feasible to assign individual access rights to users. So, DDBMS defines certain roles. A role is a construct with certain privileges within a database system. Once the different roles are defined, the individual users are assigned one of these roles. Often a hierarchy of roles is defined according to the organization's hierarchy of authority and responsibility.
For example, the following SQL statements create a role "Accountant" and then assign this role to user "ABC".

CREATE ROLE ACCOUNTANT;
GRANT SELECT, INSERT, UPDATE ON EMP_SAL TO ACCOUNTANT;
GRANT INSERT, UPDATE, DELETE ON TENDER TO ACCOUNTANT;
GRANT INSERT, SELECT ON EXPENSE TO ACCOUNTANT;
GRANT ACCOUNTANT TO ABC;
COMMIT;
Authorization: Authorization determines what actions authenticated users are allowed to perform within
the database. It defines access control policies based on user roles, privileges, and permissions. Here are
common authorization mechanisms used in database security:

Role-Based Access Control (RBAC): RBAC assigns permissions to roles, and users are assigned to
these roles based on their job responsibilities or organizational roles. This simplifies access management
by grouping users with similar access requirements.
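The RBAC idea can be sketched in a few lines of Python. The role and table names echo the ACCOUNTANT example earlier in this section, but the exact permission set and the user list are hypothetical; permissions attach to roles, and users acquire them only through their role.

```python
# Minimal role-based access control sketch: permissions attach to roles,
# users get permissions only through their assigned role.
role_permissions = {
    "ACCOUNTANT": {("EMP_SAL", "SELECT"), ("EMP_SAL", "INSERT"),
                   ("EMP_SAL", "UPDATE"), ("TENDER", "DELETE")},
}
user_roles = {"ABC": "ACCOUNTANT"}

def is_allowed(user, table, action):
    """True when the user's role grants `action` on `table`."""
    role = user_roles.get(user)
    return role is not None and (table, action) in role_permissions.get(role, set())

assert is_allowed("ABC", "EMP_SAL", "UPDATE")
assert not is_allowed("ABC", "EMP_SAL", "DELETE")   # not granted to the role
assert not is_allowed("XYZ", "EMP_SAL", "SELECT")   # unknown user
```

Changing what all accountants may do then means editing one role entry, not touching every user.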

Semantic Integrity Control


Semantic integrity control defines and enforces the integrity constraints of the database system.
The integrity constraints are as follows −
· Data type integrity constraint
· Entity integrity constraint
· Referential integrity constraint

Data Type Integrity Constraint
A data type constraint restricts the range of values and the type of operations that can be applied to the field with the specified data type.
For example, let us consider that a table "HOSTEL" has three fields − the hostel number, hostel name and capacity. The hostel number should start with capital letter "H" and cannot be NULL, and the capacity should not be more than 150. The following SQL command can be used for data definition −

CREATE TABLE HOSTEL (
   H_NO VARCHAR2(5) NOT NULL,
   H_NAME VARCHAR2(15),
   CAPACITY INTEGER,
   CHECK ( H_NO LIKE 'H%' ),
   CHECK ( CAPACITY <= 150 )
);
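The HOSTEL constraints above can be exercised with `sqlite3` (SQLite accepts the VARCHAR2 type name and enforces CHECK clauses, though a production system would use Oracle or similar for this DDL). The inserted rows are hypothetical.

```python
import sqlite3

# Exercising the CHECK constraints from the HOSTEL example.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE HOSTEL (
    H_NO VARCHAR2(5) NOT NULL,
    H_NAME VARCHAR2(15),
    CAPACITY INTEGER,
    CHECK ( H_NO LIKE 'H%' ),
    CHECK ( CAPACITY <= 150 ))""")

con.execute("INSERT INTO HOSTEL VALUES ('H101', 'Ganga', 120)")  # satisfies both checks
try:
    con.execute("INSERT INTO HOSTEL VALUES ('H102', 'Yamuna', 200)")  # capacity > 150
    raised = False
except sqlite3.IntegrityError:
    raised = True
assert raised   # the DBMS rejected the row, not the application
```

The point of a data type constraint is exactly this: the rejection happens inside the DBMS, so no application can slip an invalid row past it.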
Entity Integrity Control
Entity integrity control enforces the rules so that each tuple can be uniquely identified from other tuples. For this a primary key is defined. A primary key is a minimal set of fields that can uniquely identify a tuple. The entity integrity constraint states that no two tuples in a table can have identical values for the primary key and that no field which is a part of the primary key can have a NULL value.
For example, in the above hostel table, the hostel number can be assigned as the primary key through the following SQL statement (ignoring the checks) −

CREATE TABLE HOSTEL (
   H_NO VARCHAR2(5) PRIMARY KEY,
   H_NAME VARCHAR2(15),
   CAPACITY INTEGER
);

Referential Integrity Constraint

The referential integrity constraint lays down the rules of foreign keys. A foreign key is a field in a data table that is the primary key of a related table. The referential integrity constraint lays down the rule that the value of the foreign key field should either be among the values of the primary key of the referenced table or be entirely NULL.
For example, let us consider a student table where a student may opt to live in a hostel. To include this, the primary key of the hostel table should be included as a foreign key in the student table. The following SQL statement incorporates this −

CREATE TABLE STUDENT (
   S_ROLL INTEGER PRIMARY KEY,
   S_NAME VARCHAR2(25) NOT NULL,
   S_COURSE VARCHAR2(10),
   S_HOSTEL VARCHAR2(5) REFERENCES HOSTEL
);
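The foreign key rule (match a HOSTEL primary key, or be NULL) can be demonstrated with `sqlite3`; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The rows are hypothetical.

```python
import sqlite3

# Referential integrity sketch: S_HOSTEL must match an existing HOSTEL
# key or be NULL. SQLite needs the pragma to enforce foreign keys.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE HOSTEL (H_NO VARCHAR2(5) PRIMARY KEY)")
con.execute("""CREATE TABLE STUDENT (
    S_ROLL INTEGER PRIMARY KEY,
    S_NAME VARCHAR2(25) NOT NULL,
    S_HOSTEL VARCHAR2(5) REFERENCES HOSTEL)""")

con.execute("INSERT INTO HOSTEL VALUES ('H1')")
con.execute("INSERT INTO STUDENT VALUES (1, 'Asha', 'H1')")   # valid reference
con.execute("INSERT INTO STUDENT VALUES (2, 'Ravi', NULL)")   # NULL is allowed
try:
    con.execute("INSERT INTO STUDENT VALUES (3, 'Meena', 'H9')")  # no such hostel
    raised = False
except sqlite3.IntegrityError:
    raised = True
assert raised
```

A bare `REFERENCES HOSTEL` (without naming a column) points at the referenced table's primary key, matching the DDL in the text.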


DATABASE BACKUP AND RECOVERY

Database Backup
· A database backup is storage of data, that is, a copy of the data.
· It is a safeguard against unexpected data loss and application errors.
· It protects the database against data loss.
· If the original data is lost, then using the backup it can be reconstructed.

The backups are divided into two types,


1. Physical Backup
2. Logical Backup

1. Physical backups
· Physical Backups are the backups of the physical files used in storing and recovering your database, such as
datafiles, control files and archived redo logs, log files.
· It is a copy of files storing database information to some other location, such as disk, some offline storage like mag-
netic tape.
· Physical backups are the foundation of the recovery mechanism in the database.
· Physical backup provides the minute details about the transaction and modification to the database.
2. Logical backup
· A logical backup contains logical data extracted from a database.
· It includes backups of logical objects such as views, procedures, functions and tables.
· It is a useful supplement to physical backups in many circumstances, but it is not sufficient protection against data loss
without physical backups, because a logical backup provides only structural information.

Importance Of Backups
· Planning and testing backups guards against failures of media, the operating system, software and any other kind of failure that causes a serious data crash.
· The backup plan determines the speed and success of recovery.
· A physical backup extracts data from physical storage (usually from disk to tape); an operating-system-level file copy is an example of a physical backup.
· A logical backup extracts data from the database using SQL and stores it in a binary file.
· A logical backup is used to restore database objects into the database, so logical backup utilities allow the DBA
(Database Administrator) to back up and recover selected objects within the database.
Methods of Backup
The different methods of backup in a database are:

· Full Backup - This method takes a lot of time, as a full copy of the database is made, including both the data and
the transaction records.
· Transaction Log - Only the transaction logs are saved as the backup in this method. To keep the backup file
as small as possible, the previous transaction log details are deleted once a new backup record is made.
· Differential Backup - This is similar to a full backup in that it stores both the data and the transaction records.
However, only information that has changed since the last full backup is saved. Because of
this, differential backup leads to smaller files.
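The distinction between a full and a differential backup can be illustrated with a toy sketch (purely illustrative, not a real backup tool): the "database" is a dict of pages, and each differential keeps only the pages changed since the last full backup, so restoring needs the full backup plus the latest differential only.

```python
import copy

# Toy "database": page name -> contents
db = {"p1": "alice", "p2": "bob", "p3": "carol"}

full_backup = copy.deepcopy(db)  # full backup: every page
baseline = copy.deepcopy(db)     # state at full-backup time

db["p2"] = "bobby"  # day 1 change
diff1 = {k: v for k, v in db.items() if baseline.get(k) != v}

db["p3"] = "carole"  # day 2 change
diff2 = {k: v for k, v in db.items() if baseline.get(k) != v}

# diff2 holds BOTH changes since the full backup, so restore is
# simply full backup overlaid with the newest differential:
restored = {**full_backup, **diff2}
print(diff1)            # {'p2': 'bobby'}
print(diff2)            # {'p2': 'bobby', 'p3': 'carole'}
print(restored == db)   # True
```

Note how each differential grows over time (unlike an incremental, which would record only changes since the previous backup of any kind).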

WHY Plan Backups?


There can be multiple reasons for failure in a database, which is why a database backup and recovery plan is required. Some of these reasons are:
· A database holds a huge amount of data and transactions.
· If the system crashes or a failure occurs, it is very difficult to recover the database without a backup.

There are some common causes of failures such as,


1. System Crash
2. Transaction Failure
3. Network Failure
4. Disk Failure
5. Media Failure
· Every transaction must preserve the ACID properties; if the system fails to maintain them, this constitutes a failure of the database
system.
1. System Crash
· A system crash occurs when there is a hardware or software failure, or an external factor such as a power failure.
· The data in secondary memory is not affected when the system crashes, because secondary storage is
non-volatile. Checkpoints help prevent the loss of committed work that has not yet reached secondary memory.
2. Transaction Failure
· A transaction failure affects only a few tables or processes, because it stems from logical errors in the code.
· This failure also occurs when there are system errors such as deadlock or unavailability of the system resources needed to execute
the transaction.
3. Network Failure
· A network failure occurs when the communication network connecting a client–server configuration or a distributed database system
breaks down.
4. Disk Failure
· Disk failure occurs when there are issues with hard disks, such as the formation of bad sectors, a disk head crash, or unavailability of the disk.
5. Media Failure
· Media failure is the most dangerous failure because it takes more time to recover from than any other kind of failure.
· A disk controller or disk head crash is a typical example of media failure.
· Natural disasters like floods, earthquakes and power failures can also damage the media.
6. User Error
Normally, user error is the biggest cause of data destruction or corruption in a database. To rectify the error, the
database needs to be restored to the point in time before the error occurred.

Hardware Protection and Type of Hardware Protection

Hardware protection is divided into three categories: CPU protection, memory protection, and I/O protection. These are
explained below.

1. CPU Protection:
CPU protection means that the CPU cannot be given to a process forever; a process should hold it only for a limited time,
otherwise other processes will not get a chance to execute. A timer is used for this purpose:
each process is given a certain amount of CPU time, and when the timer expires a
signal is sent to the process to leave the CPU. Hence a process cannot hold the CPU indefinitely.
2. Memory Protection:
Memory protection concerns the situation where two or more processes are in memory and one
process may access another process's memory. To prevent this, two registers are used:
Base register
Limit register
The base register stores the starting address of the process and the limit register stores the size of the process, so
whenever a process wants to access memory, the hardware checks whether the access is allowed.
3. I/O Protection:
When I/O protection is enforced, the following can never occur in the system:
1. Terminating the I/O of another process
2. Viewing the I/O of another process
3. Giving priority to a particular process's I/O
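The base/limit check described under memory protection can be sketched in a few lines; the register values here are hypothetical, and in a real system the comparison is done in hardware on every access.

```python
def memory_access_allowed(address, base, limit):
    """Base/limit check: a process may touch only [base, base + limit)."""
    return base <= address < base + limit

BASE, LIMIT = 3000, 1200  # hypothetical register contents for one process

print(memory_access_allowed(3500, BASE, LIMIT))  # True  - inside the process's range
print(memory_access_allowed(4500, BASE, LIMIT))  # False - past base + limit (4200)
```

An access outside the range would raise a trap to the operating system rather than simply return False, but the test performed is the same.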
Redundancy
Data redundancy is a condition created within a database or data storage technology in which the same piece of data is
held in two separate places.
This can mean two different fields within a single database, or two different spots in multiple software environments or
platforms. Whenever data is repeated, this constitutes data redundancy. It can occur by accident, but it is also done deliberately for backup and recovery purposes.
Hardware redundancy
Hardware redundancy is achieved by providing two or more physical copies of a hardware component. When other
techniques, such as use of more reliable components, manufacturing quality control, test, design simplification, etc.,
have been exhausted, hardware redundancy may be the only way to improve the dependability of a system.

What Is Recovery?
· Recovery is the process of restoring a database to the correct state in the event of a failure.
· It ensures that the database is reliable and remains in a consistent state in case of a failure.

Database recovery can be classified into two parts;


1. Rolling Forward applies redo records to the corresponding data blocks.
2. Rolling Back applies rollback segments to the datafiles to undo uncommitted changes; the undo information is recorded in transaction tables.

Database Recovery
There are two methods that are primarily used for database recovery. These are:

· Log based recovery - In log based recovery, logs of all database transactions are stored in a secure area so
that in case of a system failure, the database can recover the data. All log information, such as the time of the
transaction, its data etc. should be stored before the transaction is executed.
· Shadow paging - In shadow paging, after the transaction is completed its data is automatically stored for
safekeeping. So, if the system crashes in the middle of a transaction, changes made by it will not be reflected
in the database.
[Figure: shadow page table and current page table]
Log-Based Recovery
· Logs are a sequence of records that record the actions performed by a transaction.
· In log-based recovery, the log of each transaction is maintained in stable storage. If any failure occurs, the database
can be recovered from the log.
· The log contains information about the transaction being executed, the values that have been modified and the transaction state.
· All this information is stored in the order of execution.

Example:
Assume a transaction modifies the address of an employee. The following logs are written for this transaction,
Log 1: Transaction is initiated, writes 'START' log.
Log: <Tn START>

Log 2: Transaction modifies the address from 'Pune' to 'Mumbai'.


Log: <Tn, Address, 'Pune', 'Mumbai'>
Log 3: Transaction is completed. The log indicates the end of the transaction.
Log: <Tn COMMIT>

There are two methods of creating the log files and updating the database,
1. Deferred Database Modification
2. Immediate Database Modification

1. In Deferred Database Modification, all the logs for the transaction are created and stored in a stable storage system first. In the above example, the three log records are created and saved to storage, and only then is the database
updated with those steps.

2. In Immediate Database Modification, the database is modified immediately after each log record is created. In the above example, the database is modified at each step of log entry: after the
first log entry the transaction hits the database to fetch the record, the second log entry is followed
by updating the employee's address, and the third log entry is followed by committing the database changes.
Recovery from Transaction Failures

Summary
A transaction may fail due to logical errors, deadlocks, or system crashes. Recovery techniques include:

A. Undo (Rollback) & Redo (Rollforward)

• Undo (Rollback): Reverses uncommitted changes when a transaction fails.


• Redo (Rollforward): Reapplies committed changes if they were lost due to a failure.

ROLLBACK; -- Undo uncommitted changes
COMMIT;   -- Make changes permanent
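The effect of ROLLBACK versus COMMIT can be observed with Python's built-in sqlite3 module (the table and values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# An uncommitted update is reversed by ROLLBACK (undo)...
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()
print(conn.execute("SELECT balance FROM account").fetchone()[0])  # 100

# ...while COMMIT makes the change permanent (it will survive failures).
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")
conn.commit()
print(conn.execute("SELECT balance FROM account").fetchone()[0])  # 250
```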

B. Deferred Update (No Undo, Redo Required)

• Updates are written to logs first and applied only after the transaction commits.
• If a crash occurs before commit, nothing is applied, and no rollback is needed.

C. Immediate Update (Undo & Redo Required)

• Updates are applied immediately to the database and logged.


• If a failure occurs:
◦ Redo committed transactions.
◦ Undo uncommitted transactions.

2. Recovery from System Failure (Crash Recovery)


A. Log-Based Recovery

• Before updating a database, log changes in the redo log.


• Uses undo logs and redo logs for rollback and rollforward.

<T1, Start>
<T1, Update(A, 100)>
<T1, Commit>
• If the system crashes after commit → Redo changes.
• If the system crashes before commit → Undo changes.

B. Checkpointing

• A checkpoint is a point where the database writes all committed changes from logs to
the database.
• Reduces the time for recovery after a crash.

Process:

1. A checkpoint is taken periodically.


2. Logs before the checkpoint are removed.
3. On recovery, only logs after the checkpoint are used.
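The saving a checkpoint buys during recovery can be sketched as follows (log records and values are purely illustrative): everything before the checkpoint is already on disk, so only the tail of the log is replayed.

```python
# Toy redo log; the CHECKPOINT record marks the point up to which all
# committed changes have already been written to the database files.
log = [
    ("update", "A", 10),
    ("update", "B", 20),
    ("CHECKPOINT",),      # everything above is already on disk
    ("update", "A", 15),
    ("update", "C", 30),
]

disk = {"A": 10, "B": 20}  # state the checkpoint flushed to disk

# Crash recovery: find the last checkpoint, redo only what follows it
last_cp = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
for _, key, value in log[last_cp + 1:]:
    disk[key] = value

print(disk)  # {'A': 15, 'B': 20, 'C': 30}
```

Without the checkpoint, recovery would have to scan and replay the entire log from the start, which is exactly the cost checkpointing reduces.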

3. Recovery from Media Failure


If disk storage is damaged, recovery is done using backups.

Shadow Paging

• Instead of modifying data pages in place, the transaction writes to fresh copies of the pages while a shadow copy of the page table preserves the originals.


• If a failure occurs, the original data remains unchanged, and the modified (current) copies are discarded.

ACID PROPERTIES: atomicity, consistency, isolation, and durability.


Atomicity
A transaction's changes to the state are atomic: either all happen or none happen. These
changes include database changes, messages, and actions on transducers.
Consistency
A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state.
Isolation
Even though transactions execute concurrently, it appears to each transaction T, that others
executed either before T or after T, but not both.
Durability
Once a transaction completes successfully (commits), its changes to the database survive failures and are retained.
