Distributed Database System
Chin Narong
2008
Objective
The Master of Science in Information Technology (M.Sc. (Information Technology)) degree aims to provide knowledge and understanding of modern information technology systems, in order to prepare students for practical careers within the information technology sector. To this end, the curriculum focuses on building understanding in the fields of computer networking, internet programming, software systems engineering, project management and high-performance computer systems.
Assessment
Students performing at a high level during their degree are selected to research and write a master's thesis during their final year. In the first intake of students, six of the 48 students submitted a thesis. Those who are not selected to submit a thesis attend classes and sit examinations to attain the equivalent value of credits.
According to the above assessment, this report was prepared for the Department of Information Technology of the Royal University of Phnom Penh. The project was advised by Mr. Ouk Chhieng, Dean of the Computer Science Department and Coordinator of the Royal University of Phnom Penh. This report is in the public domain; authorization to reproduce it in whole or in part is granted, and permission to reprint this publication is not necessary. To order copies of this report, a student may:
- Write to Dr. Ouk Chhieng, Coordinator of the Royal University of Phnom Penh, Room #101, Campus I, or call (855) 12 754-344; or
- Write to Mr. Chin Narong, the owner of this study report, at the following addresses:
[email protected]; [email protected]
Upon request, this report is available in alternate formats. For more information, please contact the IT Center, RUPP at 855-23-881-285.
Acknowledgements
The writer of this study report would like to express his gratitude to the university, its staff, the dean, and especially to Mr. Ouk Chhieng, who advised the preparation of this report. Their contributions to the study, such as assessments, observations and surveys in the field of Distributed Database Systems, are deeply appreciated. On behalf of the writer, I would like to say thanks for their generosity of time and spirit.
Finally, I would like to thank my friends, who gave their recommendations and some resources to support my research.
Abbreviations:
DBMS  : Database Management System
DDBMS : Distributed Database Management System
MDBS  : Multi-Database System
GCS   : Global Conceptual Schema
DDTS  : Distributed Defect Tracking System
API   : Application Programming Interface
DHF   : Derived Horizontal Fragmentation
PHF   : Primary Horizontal Fragmentation
VF    : Vertical Fragmentation
RS    : Read Set
WS    : Write Set
BS    : Base Set
TM    : Transaction Management
DCC   : Distributed Concurrency Control
CC    : Concurrency Control
TO    : Timestamp Ordering
2PL   : Two-Phase Locking
WFG   : Wait-For Graph
GWFG  : Global Wait-For Graph
DPS   : Distributed Processing System
DPR   : Distributed Program Reliability
DSR   : Distributed System Reliability
FSTs  : File Spanning Trees
Contents
I/ INTRODUCTION
    1 Objective of the Study
    2 Significance of the Study
    3 Layout of the Study
II/ REVIEW OF LITERATURE
    Chapter 1: Introduction
        Course Outline
        What is a Distributed Database System?
        Implicit Assumptions
        Motivation
        Distributed Computing
        What is Distributed?
        What is a Site?
        DDBS Environment
        Distributed Database Graphic
        What is not a Distributed Database System?
        Shared-Memory Multiprocessor
        Centralized DBMS on a Network
        Why Distribute a Database?
        Advantages of DDBMSs
        Disadvantages of DDBMSs
        Applications
        Issues with DDBMS
        Distributed Transaction Management
        Data Fragmentation
        Fragmentation Independence
        DBMS Independence
        Operating System Independence
        Hardware Independence
        Who Should Provide Transparency
        Complexity
        Cost
        Distribution Control
        Security
        Distributed Database Design
        Distributed Query Processing
        Distributed Directory Management
        Distributed Concurrency Control
        Distributed Deadlock Management
        Reliability of Distributed Databases
        Operating System Support
        Heterogeneous Databases
    Chapter 2: Distributed DBMS Architecture
        1/ Objective
        2/ Types of DDBMS Architecture
        3/ Distribution
        4/ Data Processor
        5/ Heterogeneity
        6/ Architectural Alternatives
        7/ Implementation Alternatives
        8/ Multi-DBS Architecture
    Chapter 3: Distributed Database Design
        1/ Design Problem
        2/ Alternative Design Strategies
        3/ Horizontal Fragmentation
        4/ Primary Horizontal Fragmentation
        5/ Derived Horizontal Fragmentation
        6/ Minterm Fragments
        7/ Vertical Fragmentation
        8/ Hybrid Fragmentation
        9/ Allocation Alternatives
        10/ Allocation Problem
        11/ Reasons for Replication
        12/ Rule of Thumb
    Chapter 4: Transaction Management
        1/ Definition of Transaction
        Unit of Computing
        2/ Database Consistency
        3/ Transaction Consistency
        4/ Replica Consistency
        5/ Reliability
        6/ Flat Transactions
        7/ Nested Transactions
        8/ Characterization of Transactions
        9/ Properties of Transactions
        10/ Transaction Manager
        11/ Scheduler
        12/ Local Recovery Manager
    Chapter 5: Distributed Concurrency Control (DCC)
        1/ CC in Distributed DBMS
        2/ Key Issue
        3/ Serializability Theory
        4/ Taxonomy
        5/ Locking-Based CC Algorithms
        6/ Deadlock
        7/ Why Deadlocks
        8/ Methods
        9/ Methodology
        10/ Detection
        11/ Key Issues
        Solution
    Chapter 6: Distributed Reliability
        1/ Fundamental Definitions
        2/ Distributed Reliability Protocols
        3/ Two-Phase Commit Protocol
        4/ State Transitions in 2PC
        5/ Three-Phase Commit
        6/ Quorum Protocols for Replicated Databases
        7/ Network Partitioning
        8/ Open Problems
        9/ In-Place Update Recovery Information
        10/ Out-of-Place Update Recovery Information
        11/ Execution Strategies
        12/ Checkpoints
III/ APPLIED METHOD ON ORACLE
    1/ Oracle Database Architecture
        Memory Components
    2/ Tasks of an Oracle Database Administrator
    3/ Database Planning
    4/ Oracle Management Framework
        4.1/ Startup Command
        4.2/ Shutdown Command
        4.3/ How Table Data is Stored
        4.4/ Automatic Storage Management
    5/ Database Concurrency
        5.1/ PL/SQL
        5.2/ Locks
    6/ Database Reliability
        6.1/ Principle of Least Privilege
        6.2/ Applying the Principle of Least Privilege
        6.3/ Monitoring for Suspicious Activity
        6.4/ Backup and Recovery
    7/ Database Efficiency
        7.1/ Listener
    8/ Database Performance
IV/ SUMMARY
V/ APPLY
VI/ CONCLUSION
VII/ REFERENCES
I/ INTRODUCTION
In former generations, most knowledge management was based on documentation, written so that the next generation of learners could study and improve fields such as management, business technology, troubleshooting, architecture, law, and regulation. Since the introduction of cutting-edge computer science technology, improvements in business management and many other fields have helped create new things to meet human demands. Databases play the role of data keeper to support the business in terms of:
- Cost
- Time
- Accountability
- Effectiveness; and
- Transparency
First of all, I would like to answer the question:
+ What is a distributed database?
A simple answer from www.webopedia.com: a database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data.
Others describe a distributed database as a database under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common processor; it may be stored on multiple computers located in the same physical location, or dispersed over a network of interconnected computers. For example, collections of data in a database can be distributed across multiple physical locations (partitions/fragments), and each partition of a distributed database may be replicated.
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.
+ Basic architecture
Database users access the distributed database through:
- Local applications: applications which do not require data from other sites.
- Global applications: applications which do require data from other sites.
+ Important considerations
Care must be taken with a distributed database to ensure the following:
- The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.
- Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into sub-transactions, with each sub-transaction affecting one database system.
+ Advantages of distributed databases
- Reflects organizational structure: database fragments are located in the departments they relate to.
- Local autonomy: a department can control the data about itself (as it is the one familiar with it).
- Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.
- Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
- Economics: it costs less to create a network of smaller computers with the power of a single large computer.
- Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).
- Fault tolerance: the ability of a computer system or component, designed so that in the event that a component fails, a backup component or procedure can immediately take its place with no loss of service. Fault tolerance can be provided in software, embedded in hardware, or provided by some combination of the two.
+ Disadvantages of distributed databases
- Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
- Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
- Security: remote database fragments must be secured, and since they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
- Difficulty of maintaining integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
- Inexperience: distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
- Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
- Database design more complex: in addition to the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.
events happening in close proximity. Preparing a paper on a qualitative evaluation led me to think about the sources of responsiveness and the architecture of database systems. I volunteered to provide some documentation for coursework master's dissertations using Oracle 10g Database Administration Workshop I, to accompany a similar document for other forms which I had learned. This particular document originated as a document for people who wish to learn how to administer a database system; that was the urgent priority at the time. After completing it, I realized it was suitable for coursework master's dissertations too. By then it had become larger than intended, but perusing it persuaded me that the length was justified by the topic. I apologize to those readers who would want this paper to go further; it cannot be larger than this, owing to my limited time. Even so, I had some very encouraging responses from outside people and other educational institutions. So here it is.
My experience suggests to me that the changing of computer technology requires a non-positivist approach, and this was confirmed by my reading. It appears that many academics who find themselves in the role of change agents are led eventually towards a more flexible approach to their technical service. However, while in sympathy with the actual processes they used in field settings, I thought their supporting arguments were sometimes inadequate. Constructivism provides one example. The positivist view, or so it seems to me, depends upon reality being directly knowable. Many advisors oppose this with the view that our theories and language inevitably colour what we see. It seems apparent to me that my mental frameworks colour what I would like to describe. I was encouraged to find such views expressed in the literature. However, in this afterword let me try to make my own views clearer than I chose to in the body of this document.
It seems to me that, to judge a good database paradigm, it is reasonable to take into account the purpose of choosing the right database. From my experience analyzing in-source database development, the aims of this study are:
- To introduce the important concepts, algorithms and techniques in the design of high-performance distributed database systems (DDBS, also called DDBMS); and
- To explain the differences between DDBSs and DDB applications (database systems vs. distributed database systems).
An important purpose of the course is to introduce WHAT will happen after you have submitted your program (transaction) to a distributed database system for execution, and HOW the system meets the performance requirements (and what those performance requirements are). We hope that the concepts, techniques and algorithms covered in this course will be useful to you:
- when you develop database applications (as an application programmer) with a distributed database system; and
- when you design a new distributed database system (as a database system designer).
[Figure: A distributed database system — several sites, each running a DBMS with its own data, connected by a network.]
The figure above illustrates database processing at each of the sites in a distributed DBMS environment.
Implicit Assumptions
- Data is stored at a number of sites; each site logically consists of a single processor.
- Processors at different sites are interconnected by a computer network (as opposed to multiprocessors, which give parallel database systems).
- A distributed database is a database, not a collection of files: the data is logically related, as exhibited in the users' access patterns (relational data model).
- A D-DBMS is a full-fledged DBMS: not a remote file system, and not a TP system.
Motivation
- Decentralization
- Disaster recovery and backup manipulation
- Technology integration
- Data replication
Distributed Computing
A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
Synonymous terms:
- distributed function
- distributed data processing
- multi-processors/multi-computers
- satellite processing
- backend processing
- dedicated/special-purpose computers
- timeshared systems
- functionally modular systems
What is Distributed?
- Processing logic: in fact, the definition of distributed computing implies that processing elements, or processing logic, are distributed.
- Function: various functions of a computer system could be delegated to various pieces of hardware or software.
- Data: data used by a number of applications may be distributed to a number of processing sites.
- Control: the control of the execution of various tasks might be distributed instead of being performed by one computer system.
DDBS Environment
Shared-Memory Multiprocessor
[Figure: A shared-memory multiprocessor — several processor units sharing a common memory and I/O system.]
Advantages of DDBMSs
- Reflects organizational structure
- Improved shareability and local autonomy
- Improved availability
- Improved reliability
- Improved performance
- Economics
Disadvantages of DDBMSs
- Complexity
- Cost
- Security
- Integrity control more difficult
- Lack of standards
- Lack of experience
- Database design more complex
Applications
- Manufacturing, especially multi-plant manufacturing
- Military command and control
- Corporate MIS
- Airlines
- Hotel chains
- Payment systems
- Any organization with a decentralized organizational structure
Before we go ahead to the next step, there are a few transparency requirements that motivate the use of a distributed database system.
Network Transparency
In a distributed database management environment, the network needs to be shielded in the same manner that data is shielded in a centralized DBMS. Preferably, the user should be protected from the operational details of the network. Furthermore, it is desirable to hide even the existence of the network; then there would be no difference between database applications running on a centralized database and those running on a distributed database.
Distribution Transparency
- Location transparency: the command used to perform a task is independent of both the location of the data and the system on which the operation is carried out.
- Naming transparency: a unique name is provided for each object in the database. In the absence of naming transparency, users are required to embed the location name as part of the object name.
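As an illustration, the Oracle-style sketch below contrasts a command that embeds a location name with one that relies on location and naming transparency. The database-link name tokyo and the sample data values are assumptions made for the example:

-- Without location transparency: the site name is embedded in the
-- object name (the link name "tokyo" is hypothetical).
SELECT ENAME FROM EMP@tokyo WHERE TITLE = 'Elect. Eng.';

-- With transparency: a synonym gives the object one unique name and
-- hides where EMP actually resides, so the command is unchanged if
-- the data later moves.
CREATE SYNONYM EMP FOR EMP@tokyo;
SELECT ENAME FROM EMP WHERE TITLE = 'Elect. Eng.';

If the data migrates to another site, only the synonym definition changes; user queries are untouched.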
Location Independence
Users should not have to know where data is physically stored; rather, they should be able to behave, at least from a logical standpoint, as if the data were all stored at their own local site. Location independence is desirable because it simplifies user programs and terminal activities, and because it allows data to migrate from one site to another without invalidating any of those programs or activities. Such migratability is desirable because it allows data to be moved around the network in response to changing performance requirements. Location independence is just an extension, to the distributed case, of the familiar concept of physical data independence.
Data Replication
For performance, reliability, and availability reasons, it is usually desirable to be able to distribute data in a replicated fashion across the machines in a network. Such replication helps performance, since diverse and conflicting user requirements can more easily be accommodated: data that is commonly accessed by one user can be placed on that user's local machine as well as on the machine of another user with the same access requirements. This increases the locality of reference. Further, if one machine fails, a copy of the same data is still available on another machine on the network.
Replication Transparency
Assuming that data is replicated, the transparency issue to be addressed is whether the users should be aware of the existence of copies, or whether the system should handle the management of copies while the user acts as if there were a single copy of the data. From the user's perspective the answer is obvious; from the system's perspective it is not that simple. It is not the system that decides whether or not to have copies, and how many copies to have, but the user application. It is desirable that replication transparency be provided as a standard feature of a DBMS. Distributing these replicas across a network in a transparent manner is the domain of network transparency.
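As a rough sketch of how the system, rather than the user, can manage copies, Oracle-style SQL can maintain a periodically refreshed local replica and hide it behind a single name. The link name hq and the refresh interval are assumptions made for the example:

-- Keep a local, periodically refreshed copy of the remote EMP relation.
CREATE MATERIALIZED VIEW EMP_LOCAL
    REFRESH COMPLETE NEXT SYSDATE + 1/24   -- re-synchronize hourly
    AS SELECT * FROM EMP@hq;

-- Users query EMP as if there were a single copy; the synonym decides
-- which replica actually serves them.
CREATE SYNONYM EMP FOR EMP_LOCAL;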
Data Fragmentation
In a distributed database environment it is commonly desirable to divide each database relation into smaller fragments and treat each fragment as a separate database object. This is commonly done for reasons of performance, reliability, and availability. Furthermore, fragmentation reduces the negative effects of replication. Each replica is not the full relation but only a subset of it, thus less space is required and fewer data items need to be managed.
There are two general fragmentation alternatives:
o Horizontal fragmentation: a relation is partitioned into a set of sub-relations, each of which has a subset of the tuples (rows) of the original relation.
o Vertical fragmentation: each sub-relation is defined on a subset of the attributes (columns) of the original relation.
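Both alternatives can be sketched in SQL using the engineering database relations described later in this report (PROJ with a BUDGET attribute, and EMP with ENO, ENAME and TITLE); the budget threshold here is an arbitrary illustration:

-- Horizontal fragmentation: each fragment holds a subset of PROJ's rows.
CREATE TABLE PROJ1 AS SELECT * FROM PROJ WHERE BUDGET <= 200000;
CREATE TABLE PROJ2 AS SELECT * FROM PROJ WHERE BUDGET >  200000;

-- Vertical fragmentation: each fragment holds a subset of EMP's columns,
-- repeating the key ENO so that the relation can be rebuilt.
CREATE TABLE EMP_V1 AS SELECT ENO, ENAME FROM EMP;
CREATE TABLE EMP_V2 AS SELECT ENO, TITLE FROM EMP;

-- Reconstruction: union for horizontal fragments, join on the key for
-- vertical fragments.
SELECT * FROM PROJ1 UNION ALL SELECT * FROM PROJ2;
SELECT V1.ENO, V1.ENAME, V2.TITLE
FROM   EMP_V1 V1 JOIN EMP_V2 V2 ON V1.ENO = V2.ENO;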
Fragmentation Independence
When database objects are fragmented, user queries that were specified on an entire relation must now be dealt with on the sub-relations. Typically this requires a translation from what is called a global query into several fragment queries.
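Continuing the PROJ sketch above, a global query and its fragment queries might look as follows; the fragment predicates are the ones assumed in the previous example:

-- Global query, written against the whole relation:
SELECT PNAME FROM PROJ WHERE BUDGET > 300000;

-- After localization: PROJ1 (BUDGET <= 200000) cannot contribute any
-- rows and is eliminated, so only PROJ2 needs to be queried.
SELECT PNAME FROM PROJ2 WHERE BUDGET > 300000;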
DBMS Independence
Under this heading, all that is really needed is that the DBMS instances at different sites all support the same interface; they do not necessarily all have to be copies of the same DBMS software. If Ingres and Oracle both supported the official SQL standard, then it might be possible to get an Ingres site and an Oracle site to talk to each other in the context of a distributed system. Support for heterogeneity is definitely desirable: real-world computer installations typically run not only many different machines and many different operating systems, they very often run different DBMSs as well, and it would be nice if those different DBMSs could all participate somehow in a distributed system. In other words, an ideal distributed system should provide DBMS independence.
Hardware Independence
Real-world computer installations typically involve a multiplicity of different machines — IBM machines, ICL machines, HP machines, PCs and workstations of various kinds, and so on — and there is a real need to be able to integrate the data on all of those systems and present the user with a single system image. Thus it is desirable to run the same DBMS on different hardware platforms, and furthermore to have all those machines participate as equal partners in the distributed system.
The second layer at which transparency is provided is the operating system level. State of the art operating systems provide some level of transparency to system users. Providing transparent access to resources at the operating system level can obviously be extended to the distributed environment, where the management of the network resource is taken over by the distributed operating system.
The third layer at which transparency can be supported is within the DBMS. It is the responsibility of the DBMS to make all necessary translations from the operating system to the higher level user interface.
Complexity
Distributed DBMS problems are inherently more complex than centralized database management ones, as they include not only the problems found in the centralized environment but also a new set of unresolved problems.
Cost
Distributed systems require additional hardware (communication mechanisms, etc.) and thus have increased hardware costs, although the trend towards decreasing hardware costs makes this an insignificant factor. The most important cost component is the replication of effort (manpower), which usually results in an increase in personnel in the data processing operations. Therefore, the trade-off between increased profitability, due to more efficient and timely use of information, and increased personnel costs has to be analyzed carefully.
Distribution Control
This point was stated previously as an advantage of DDBSs. Unfortunately, distribution creates problems of synchronization and coordination (a disadvantage). Distributed control can therefore easily become a liability if care is not taken to adopt adequate policies to deal with these issues.
Security
One of the major benefits of centralized databases has been the control they provide over access to data: security can easily be enforced in a central location, with the DBMS enforcing the rules. In a distributed database system, however, a network is involved, which is a medium with its own security requirements, and it is well known that there are serious problems in maintaining adequate security over computer networks. Thus the security problems in distributed database systems are by nature more complicated than in centralized ones.
Distributed Database Design
The research in this area mostly involves mathematical programming, in order to minimize the cost of storing the database, processing transactions against it, and communication.
Heterogeneous Databases
When there is no homogeneity among databases at various sites either in terms of the way data is logically structured (data model) or in terms of mechanisms provided for accessing it (data language), it becomes necessary to provide a translation mechanism between database systems. This translation mechanism usually involves a canonical form to facilitate data translation, as well as program templates for translating data manipulation instructions.
First Attempts
In 1972, the Computer & Information Processing Committee of the American National Standards Institute (ANSI) established a study group on DBMS under the auspices of its Standards Planning and Requirements Committee (SPARC).
The Mission
To study the feasibility of setting up standards in the area of database management systems, and to determine all areas that can be standardized, if feasible.
Proposal
The architectural framework proposed came to be known as the ANSI/SPARC architecture. The study proposed that the interfaces be standardized, and defined an architectural framework that contained 43 interfaces, 14 of which would deal with the physical storage subsystem of the computer.
Standardization Approaches
A reference model can be described according to three different approaches.
+ Based on Components
The components of the system are defined, together with the interrelationships between the components. Thus a DBMS consists of a number of components, each of which provides some functionality. The orderly and well-defined interaction of these components provides the total system functionality.
+ Based on Functions
The different classes of users are identified, and the functions that the system will perform for each class are defined. This results in a hierarchical system architecture with well-defined interfaces between the functionalities of the different layers. The ISO architecture falls into this category.
+ Based on Data The different types of data are identified, and an architectural framework is specified which defines the functional units that will realize or use data according to these different views. Since data is the central resource that a DBMS manages, this approach (data-logical approach) is claimed to be the preferable choice for standardization activities
+ Internal View
At the lowest level of the architecture is the internal view, which deals with the physical definition and organization of data. The location of data on different storage devices, and the access mechanisms used to reach and manipulate data, are the issues dealt with at this level.
Internal Schema
At the internal level, the storage details of these relations are described. Assume that the EMP relation is stored in an indexed file, where the index is defined on the key attribute (ENO) and is called EMINX. Assume also that a HEADER field is associated with each record, which might contain flags (delete, update, etc.) and other control information. The internal schema definition of the relation may then be as follows:
INTERNAL_REL EMPL [
    INDEX ON E# CALL EMINX
    FIELD = {
        HEADER : BYTE(1)
        E#     : BYTE(9)
        ENAME  : BYTE(15)
        TIT    : BYTE(10)
    }
]
+ External View
At the other extreme is the external view, which is concerned with how users view the database. An individual user's view represents the portion of the database that will be accessed by that user, as well as the relationships the user would like to see among the data. A view can be shared among a number of users, with the collection of user views making up the external schema. Finally, the external views can be described using SQL notation. Consider two applications as examples: one that calculates the payroll payments for engineers, and a second that produces a report on the budget of each project.
External Schema
CREATE VIEW PAYROLL (ENO, ENAME, SAL)
AS SELECT EMP.ENO, EMP.ENAME, PAY.SAL
FROM   EMP, PAY
WHERE  EMP.TITLE = PAY.TITLE
The second application is simply a projection of the PROJ relation, which can be specified as:
CREATE VIEW BUDGET (PNAME, BUD)
AS SELECT PNAME, BUDGET
FROM   PROJ
+ Conceptual Schema
In between these two extremes is the conceptual schema, which is an abstract definition of the database: the real-world view of the enterprise being modeled in the database. As such, it is supposed to represent the data and the relationships among data without considering the requirements of individual applications or the restrictions of the physical storage media.
An example: considering the engineering database example with its four relations,
o EMP,
o PROJ,
o ASG, and
o PAY,
the conceptual schema should describe each relation with respect to its attributes and key. The descriptions might look like the following:
1/ RELATION PAY (Conceptual)
RELATION PAY [
    KEY = {TITLE}
    ATTRIBUTES = {
        TITLE : CHARACTER(10)
        SAL   : NUMERIC(6)
    }
]
2/ RELATION PROJ (Conceptual)
RELATION PROJ [
    KEY = {PNO}
    ATTRIBUTES = {
        PNO    : CHARACTER(7)
        PNAME  : CHARACTER(20)
        BUDGET : NUMERIC(7)
    }
]
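The excerpt defines only PAY and PROJ. For completeness, the remaining two relations would be described in the same notation; the attribute lists below are a plausible reconstruction, consistent with the internal schema (ENO, ENAME, TITLE) and the external views shown earlier, rather than a verbatim copy of the original definitions:

3/ RELATION EMP (Conceptual)
RELATION EMP [
    KEY = {ENO}
    ATTRIBUTES = {
        ENO   : CHARACTER(9)
        ENAME : CHARACTER(15)
        TITLE : CHARACTER(10)
    }
]
4/ RELATION ASG (Conceptual)
RELATION ASG [
    KEY = {ENO, PNO}
    ATTRIBUTES = {
        ENO  : CHARACTER(9)
        PNO  : CHARACTER(7)
        RESP : CHARACTER(10)
        DUR  : NUMERIC(3)
    }
]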
ANSI/SPARC Architecture
The investigation of the ANSI/SPARC architecture with respect to functions results in a considerably more complicated view. The conventions used in the schematic are:
o Square boxes : processing functions
o Hexagons : administrative roles
o Arrows : data, command and program flows
o I-shaped bars : interfaces
o Triangle : data dictionary
[Figure: Partial schematic of the ANSI/SPARC architectural model — the enterprise administrator, database administrator, system programmer and application programmer roles, the GD/D (data dictionary/directory), and the internal and external databases with their application programs.]
Data Dictionary/Directory
The major component that permits mappings between the different organizational views of data is the data dictionary/directory (depicted as a triangle), which is a meta-database containing the schema and mapping definitions. It also contains usage statistics, access control information and the like. It serves as the central component both in processing the different schemas and in providing mappings among them.
Roles
- The database administrator is responsible for maintaining the internal schema definition.
- The enterprise administrator is responsible for defining the conceptual schema.
- The application administrator is responsible for preparing the external schemas for the applications.
Architectural Models
Classification
Considering the possible ways in which multiple databases may be put together, systems can be classified with respect to:
o the autonomy of the local systems,
o their distribution, and
o their heterogeneity.
Autonomy
Autonomy refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Autonomy is a function of a number of factors, such as whether the component systems exchange information, whether they can independently execute transactions, and whether one is allowed to modify them.
Requirements of Autonomous Systems
According to Gligor and Popescu-Zeletin:
o The local operations of the individual DBMSs are not affected by their participation in the multidatabase system.
o The manner in which the individual DBMSs process queries and optimize them should not be affected by the execution of global queries that access multiple databases.
o System consistency or operation should not be compromised when individual DBMSs join or leave the multidatabase confederation.
According to Du and Elmagarmid:
o Design autonomy: individual DBMSs are free to use the data models and transaction management techniques that they prefer.
o Communication autonomy: each individual DBMS is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution.
o Execution autonomy: each DBMS can execute the transactions that are submitted to it in any way that it wants to.
Aspects of Classification
Tight Integration
o In tightly integrated systems, the data managers are implemented so that one of them is in control of the processing of each user request, even if that request is serviced by more than one data manager. The data managers do not operate as independent DBMSs, even though they usually have the functionality to do so.
o A single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases. From the user's perspective, the data is logically centralized in one database.
Semiautonomous Systems
o These consist of DBMSs that can operate independently, but have decided to participate in a federation to make their local data shareable.
o Each DBMS determines what parts of its own database it will make accessible to users of other DBMSs. They are not fully autonomous systems, because they need to be modified to enable them to exchange information with one another.
Total Isolation
o Here the individual systems are stand-alone DBMSs which know neither of the existence of other DBMSs nor how to communicate with them.
o In such systems, the processing of user transactions that access multiple databases is especially difficult, since there is no global control over the execution of the individual DBMSs.
3/ Distribution
Whereas autonomy refers to the distribution of control, the distribution dimension deals with data. There are a number of ways in which DBMSs have been distributed; mainly, they are of two types.
3.1.2/ General Idea
The general idea is simple and elegant: distinguish the functionality that needs to be provided, and divide these functions into two classes, server functions and client functions.
3.1.3/ Client-Server Reference Architecture
3.1.4/ Process-Centric View
o Any process that requests the services of another process is its client, and vice versa. However, it is important to note that client-server computing and client-server DBMS, as the terms are used in their most modern context, do not refer to processes but to actual machines. Thus the focus is on what software should run on the client machines and what software should run on the server machine.
3.1.5/ Task Management (Server)
o The first and most important point in a client/server architecture is that the server does most of the data management work. This means that all query processing, query optimization, transaction management and storage management is done at the server.
3.1.6/ Task Management (Client)
o The client provides the application and the user interface, together with a DBMS client module that is responsible for managing the data cached at the client, managing transaction locks, and managing consistency checking of user queries at the client side.
o Of course, operating system and communication software runs on both the clients and the server, but communication between client and server is at the level of SQL statements: the client passes SQL queries to the server without trying to understand or optimize them, and the server does most of the work and returns the result relation to the client.
Advantages
- More efficient division of labor
- Horizontal and vertical scaling of resources
- Better price/performance on client machines
- Ability to use familiar tools on client machines
- Client access to remote data (via standards)
- Full DBMS functionality provided to client workstations
- Overall better system price/performance
Classification
There are a number of different types of client-server architecture, mainly:
- multiple client / single server
- multiple client / multiple server
Multiple Client-Single Server
o The simplest case is where there is only one server, accessed by multiple clients. From a data management perspective this is not much different from centralized databases, since the database is stored on only one machine, which also hosts the software to manage it. However, there are some important differences in the way transactions are executed and caches are managed.
Problems
o The server forms a bottleneck
o The server forms a single point of failure
o Database scaling is difficult
Multiple Client-Multiple Server
o A more sophisticated client/server architecture has multiple servers in the system. In this case two alternative management strategies are possible (called direct and indirect connection in Oracle): either each client manages its own connection to the appropriate server, or each client knows of only its home server, which then communicates with other servers as required.
3.2.2/ Three-Layer Architecture
o Since the data in a distributed database is usually fragmented and replicated, the logical organization of the data at each site needs to be described in order to handle fragmentation and replication. Therefore a third layer is needed in the architecture, the local conceptual schema (LCS); the global conceptual schema (GCS) is then the union of the local conceptual schemas. Finally, user applications and user access to the database are supported by external schemas (ESs).
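A minimal sketch of this layering in Oracle-style SQL, assuming EMP is horizontally fragmented into EMP1 and EMP2 at two sites reachable through the hypothetical database links site1 and site2:

-- Local conceptual schemas: EMP1 at site 1 and EMP2 at site 2, each
-- holding one fragment of the global EMP relation.
-- Global conceptual schema: EMP is defined as the union of the LCSs.
CREATE VIEW EMP AS
    SELECT * FROM EMP1@site1
    UNION ALL
    SELECT * FROM EMP2@site2;

External schemas and user queries are then defined against this global EMP, never against the site-specific fragments.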
3.2.3/ What Does It Support
o This architectural model provides all the necessary levels of transparency.
Data independence is supported, since the model is an extension of ANSI/SPARC, which provides such independence naturally. Location and replication transparency are supported by the definitions of the local and global conceptual schemas and the mappings between them. Network transparency, on the other hand, is supported by the definition of the global conceptual schema: the user queries data irrespective of its location or of which local component of the distributed database system will service it, as the distributed DBMS translates global queries into a group of local queries, which are executed by distributed DBMS components at the different sites that communicate with one another.
3.2.4/ Functional Description
In terms of the detailed functional description of the model, the ANSI/SPARC model is extended by the addition of a global directory/dictionary (GD/D) that permits the required global mappings; the local mappings are still performed by a local directory/dictionary. The local database management components are thus integrated by means of global DBMS functions. Since the local conceptual schemas are mappings of the global schema onto each site, such a database is designed in a top-down fashion, and therefore all external view definitions are made globally. There is a local database administrator at each site, in order to retain local control over the administration of data, which is one of the primary motivations of distributed processing.
3.2.5/ Component Description
A distributed DBMS consists of a number of components: one handles the interaction with users, the other deals with storage. The first major component, called the user processor, consists of four elements:
3.2.5.1/ User Interface Handler: responsible for interpreting user commands as they come in and formatting the result data as it is sent back to the user. This component establishes the link between the system at one end and the user at the other.
3.2.5.2/ Semantic Data Controller: uses the integrity constraints and authorizations defined as part of the global conceptual schema to check whether the user query can be processed. This component is responsible for authorization and other such functions.
3.2.5.3/ Global Query Optimizer and Decomposer: determines an execution strategy that minimizes a cost function, and translates global queries into local ones using the global and local conceptual schemas as well as the global directory. The global query optimizer is responsible, among other things, for generating the best strategy to execute distributed join operations.
3.2.5.4/ Distributed Execution Monitor: coordinates the distributed execution of requests; it is also called the distributed transaction manager. In executing queries in a distributed fashion, the execution monitors at the various sites may, and usually do, communicate with one another.
[Figure: The user processor — user requests flow from the user to the user processor, which consults the GD/D and returns system responses.]
4/ Data Processor
The second major component of a distributed DBMS, and the primary component Oracle uses, is the data processor. It consists of three elements: the local query optimizer, the local recovery manager, and the run-time support processor.
[Figure: The data processor and its components, including the system log.]
5/ Heterogeneity
Heterogeneity may occur in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. The important ones relate to data models, query languages, and transaction management protocols. Representing data with different modeling tools creates heterogeneity because of the inherent expressive powers and limitations of the individual data models. Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages even when the individual systems use the same data model; different query languages that use the same data model often select very different methods for expressing identical requests (e.g., DB2 uses SQL, while INGRES uses QUEL).
6/ Architectural Alternatives
The alternatives along each dimension are identified by the numbers 0, 1 or 2, and these numbers of course have different meanings along each dimension. Along the autonomy dimension, 0 represents tight integration, 1 represents semiautonomous systems, and 2 represents total isolation. Along the heterogeneity dimension, 0 identifies homogeneous systems, while 1 stands for heterogeneous systems.
7/ Implementation Alternatives
8/ Multi-DBS Architecture
The differences in the level of autonomy between distributed multi-DBMSs and distributed DBMSs are also reflected in their architectural models. The fundamental difference relates to the definition of the global conceptual schema.
8.5.2/ Multilingual Multi-DBMS
An alternative is the multilingual architecture, where the basic philosophy is to permit each user to access the global database by means of an external schema defined using the language of the user's local DBMS. The GCS definition is quite similar in the multilingual architecture and the unilingual approaches, the major difference being the definition of the external schemas of the local databases. Queries against the global database are made using the language of the local DBMS, but they generally require some processing to be mapped to the global conceptual schema. The multilingual approach obviously makes querying the database easier from the user's perspective; however, it is more complicated, because translation of queries is required at run time. The multilingual approach is used in Sirius-Delta and in the HD-DBMS project.
[Figure: The top-down design process, with feedback loops from observation and monitoring back to the earlier design steps.]
Requirement Analysis
The top-down design process begins with requirement analysis, which defines the environment of the system and elicits both the data and the processing needs of all potential database users. The requirement study also specifies where the final system is expected to stand with respect to the objectives of a distributed DBMS.
View Design
The requirement document is input to two parallel activities: view design and conceptual design. The view design activity deals with defining the interfaces for the end users.
Conceptual Design
Conceptual design is the process by which the enterprise is examined to determine entity types and the relationships among these entities. It can be divided into two related activity groups:
+ Entity analysis: concerned with determining the entities, their attributes, and the relationships among them.
+ Functional analysis: concerned with determining the fundamental functions in which the modeled enterprise is involved.
Relationship
Conceptual design is also an integration of user views. View integration should be used to ensure that the entity and relationship requirements of all the views are covered in the conceptual schema. The conceptual model should support not only the existing applications but future applications as well.
Activities In conceptual design and view design the user needs to specify the data entities and must determine the applications that will run on the database, as well as statistical information about these applications. Statistical information includes the specification of the frequency of user applications, the volume of various kinds of information, and the like.
GCS & Access Pattern Information Design From the conceptual design step come the definition of the global conceptual schema and the access pattern information. Note: the GCS and the access pattern information are the inputs to the distribution design step.
Distribution Design The objective at this stage is to design the local conceptual schemas by distributing the entities over the sites of the distributed system. It is possible to treat each entity as a unit of distribution; in the relational model, entities correspond to relations. Rather than distributing whole relations, it is quite common to divide them into sub-relations, called fragments, which are then distributed. Thus the distribution design activity consists of two steps: fragmentation and allocation.
Physical Design Physical design is the last step in the design process; it maps the local conceptual schemas to the physical storage devices available at the corresponding sites. The inputs to this process are the local conceptual schemas and the access pattern information about the fragments in them.
Observation and Monitoring Design and development is an ongoing activity requiring constant monitoring and periodic adjustment and tuning. Here one monitors not only the behavior of the database implementation but also the suitability of the user views. The result is some form of feedback, which may lead to backing up to one of the earlier steps in the design.
2.1.2/ Bottom-Up Design Top-down design is a suitable approach when a database system is being designed from scratch. Commonly, however, a number of databases already exist, and the design task involves integrating them into one database. The bottom-up approach is suitable for this type of environment.
Design Approach The starting point of bottom-up design is the individual local conceptual schemas. The process consists of integrating the local schemas into the global conceptual schema. This type of environment exists primarily in the context of heterogeneous databases.
Distribution Design Issues: + Why fragment at all? + How to fragment? + How much to fragment? + How to test the correctness of a decomposition? + How to allocate? + What are the information requirements?
Unit of Fragmentation With respect to fragmentation, the important issue is the appropriate unit of distribution. A relation is not a suitable unit, for a number of reasons.
Relation Subsets First, application views are usually subsets of relations. Therefore, the locality of access of applications is defined not on entire relations but on their subsets. Hence, it is natural to consider subsets of relations as distribution units. If the applications whose views are defined on a given relation reside at different sites, two alternatives can be followed with the entire relation as the unit of distribution: either the relation is not replicated and is stored at only one site, or it is replicated at all or some of the sites where the applications reside.
Problem Areas The former results in an unnecessarily high volume of remote data accesses. The latter, on the other hand, involves unnecessary replication, which causes problems in executing updates and may not be desirable if storage is limited.
Advantages Fragments, each being treated as a unit, permit a number of transactions to execute concurrently. In addition, the fragmentation of relations typically results in the parallel execution of a single query by dividing it into a set of subqueries that operate on fragments. Thus fragmentation typically increases the level of concurrency and therefore the system throughput.
Disadvantages If the applications have conflicting requirements that prevent decomposition of the relation into mutually exclusive fragments, those applications whose views are defined on more than one fragment may suffer performance degradation: it might be necessary to retrieve data from two fragments and then take their union or their join, which is costly. Avoiding this is a fundamental fragmentation issue. The second problem is related to semantic data control, specifically to integrity checking. As a result of fragmentation, attributes participating in a dependency may be decomposed into different fragments which might be allocated to different sites. In this case, even the simple task of checking for dependencies would result in chasing after data at a number of sites.
Fragmentation Alternatives Relation instances are essentially tables, so the issue is finding alternative ways of dividing a table into smaller ones. There are clearly two alternatives: dividing horizontally or dividing vertically.
Degree of Fragmentation The extent to which the database should be fragmented is an important decision that affects the performance of query execution. The degree of fragmentation goes from one extreme, not fragmenting at all, to the other extreme, fragmenting to the level of individual tuples (in the case of horizontal fragmentation) or to the level of individual attributes (in the case of vertical fragmentation). What is needed is a suitable level of fragmentation that is a compromise between the two extremes. Such a level can only be defined with respect to the applications that will run on the database.
How Fragments are characterized with respect to a number of parameters; according to the values of these parameters, individual fragments can be identified.
Correctness of Fragmentation The following three rules, enforced during fragmentation, together ensure that the database does not undergo semantic change during fragmentation.
+ Completeness: This property, which is identical to the lossless decomposition property of normalization, is important in fragmentation since it ensures that the data in a global relation is mapped into fragments without any loss. If a relation instance R is decomposed into fragments R1, R2, …, Rn, each data item that can be found in R must also be found in one or more of the Ri's.
+ Reconstruction: If a relation R is decomposed into fragments R1, R2, …, Rn, then there should exist some relational operator ∇ such that R = ∇Ri, for all Ri ∈ FR. The operator will be different for different forms of fragmentation. The reconstructability of the relation from its fragments ensures that the constraints defined on the data in the form of dependencies are preserved.
+ Disjointness: If a relation R is decomposed into fragments R1, R2, …, Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j). This criterion ensures that horizontal fragments are disjoint. If relation R is vertically decomposed, its primary key attributes are typically repeated in all its fragments; therefore, in the case of vertical partitioning, disjointness is defined only on the non-primary-key attributes of a relation.
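These rules lend themselves to a mechanical check. The following minimal Python sketch, which assumes relations are modeled as lists of tuples (a representation chosen here purely for illustration), tests completeness, reconstruction by union (the reconstruction operator for horizontal fragments), and disjointness:

R = [(1, "a"), (2, "b"), (3, "c")]
fragments = [[(1, "a")], [(2, "b"), (3, "c")]]

def complete(R, fragments):
    # Completeness: every data item of R appears in at least one fragment.
    return all(any(t in f for f in fragments) for t in R)

def reconstructible(R, fragments):
    # Reconstruction: for horizontal fragments the operator is set union.
    return set(R) == set().union(*(set(f) for f in fragments))

def disjoint(fragments):
    # Disjointness: no tuple belongs to two different fragments.
    seen = set()
    for f in fragments:
        for t in f:
            if t in seen:
                return False
            seen.add(t)
    return True

print(complete(R, fragments), reconstructible(R, fragments), disjoint(fragments))
# -> True True True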
3/ Horizontal Fragmentations
There are two versions of horizontal partitioning: primary and derived. Primary horizontal partitioning of a relation is performed using predicates that are defined on that relation. Derived horizontal partitioning is the partitioning of a relation that results from predicates defined on another relation.
PROJ
PNO   PNAME             BUDGET   LOC
P1    Instrumentation   150000   Montreal
P2    Database Develop  135000   New York
P3    CAD/CAM           250000   New York
P4    Maintenance       310000   Paris
P5    CAD/CAM           500000   Boston
PROJ1 : Projects with budgets less than $200,000 PROJ2 : Projects with budgets greater than or equal to $200,000
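As a sketch of how these two fragments are obtained, the snippet below models PROJ as a list of dictionaries (an assumed layout, not part of the original design) and applies the two selection predicates:

# Sketch: primary horizontal fragmentation of PROJ on the BUDGET predicate.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",  "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM",          "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance",      "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM",          "BUDGET": 500000, "LOC": "Boston"},
]

# Each fragment is a selection (sigma) on PROJ defined by a simple predicate.
PROJ1 = [t for t in PROJ if t["BUDGET"] < 200000]   # budgets under $200,000
PROJ2 = [t for t in PROJ if t["BUDGET"] >= 200000]  # budgets $200,000 and up

print([t["PNO"] for t in PROJ1])  # ['P1', 'P2']
print([t["PNO"] for t in PROJ2])  # ['P3', 'P4', 'P5']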
3.2.1/ Quantitative Information In terms of quantitative information about user applications, two sets of data are required.
3.2.1.1/ Minterm Selectivity [sel(mi)]: the number of tuples of a relation that would be accessed by a user query specified by the minterm predicate mi. For example, the selectivity of m1 is 0, since there are no tuples in PROJ that satisfy that minterm predicate, while the selectivity of m2 is 2.
3.2.1.2/ Access Frequency [acc(qi)]: the frequency with which user applications access data. If Q = {q1, q2, …, qn} is a set of user queries, acc(qi) indicates the access frequency of query qi in a given period.
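The following small Python sketch shows how both quantities can be computed over the PROJ instance above. The two minterm predicates are hypothetical stand-ins, chosen only so that sel(m1) = 0 and sel(m2) = 2 as in the text, since the report does not list the original minterms:

# Sketch: computing minterm selectivity over PROJ (data layout assumed).
PROJ = [
    {"PNO": "P1", "BUDGET": 150000}, {"PNO": "P2", "BUDGET": 135000},
    {"PNO": "P3", "BUDGET": 250000}, {"PNO": "P4", "BUDGET": 310000},
    {"PNO": "P5", "BUDGET": 500000},
]

def sel(minterm):
    # sel(mi): number of tuples of the relation accessed by minterm mi.
    return sum(1 for t in PROJ if minterm(t))

def m1(t): return t["BUDGET"] < 100000                # hypothetical minterm
def m2(t): return 200000 <= t["BUDGET"] < 400000      # hypothetical minterm

print(sel(m1), sel(m2))  # -> 0 2

# acc(qi): access frequencies are simply recorded per query for a period.
acc = {"q1": 15, "q2": 50}  # hypothetical accesses per day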
PROJ1
PNO   PNAME            BUDGET   LOC
P1    Instrumentation  150000   Montreal

PROJ2
PNO   PNAME             BUDGET   LOC
P2    Database Develop  135000   New York
P3    CAD/CAM           250000   New York

PROJ3
PNO   PNAME        BUDGET   LOC
P4    Maintenance  310000   Paris
Example Given a link L1 where owner(L1) = PAY and member(L1) = EMP:
EMP1 = EMP ⋉ PAY1
EMP2 = EMP ⋉ PAY2
where
PAY1 = σ SAL ≤ 30000 (PAY)
PAY2 = σ SAL > 30000 (PAY)
EMP1
ENO   ENAME     TITLE
E3    A.Lee     Mech.Eng.
E4    J.Miller  Programmer
E7    R.Davis   Mech.Eng.

EMP2
ENO   ENAME     TITLE
E1    J.Doe     Elect.Eng.
E2    M.Smith   Syst.Anal.
E5    B.Casey   Syst.Anal.
E6    L.Chu     Elect.Eng.
E8    J.Jones   Syst.Anal.
Needed To carry out a derived horizontal fragmentation, three inputs are needed: the set of partitions of the owner relation, the member relation, and the set of semijoin predicates between the owner and the member (e.g., EMP.TITLE = PAY.TITLE).
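A minimal Python sketch of this construction follows. The PAY salary values and the EMP subset are assumed for illustration; only the SAL ≤ 30000 / SAL > 30000 split is given above:

# Sketch: derived horizontal fragmentation of EMP by semijoin with the
# partitions of PAY (data values assumed for illustration).
PAY = [
    {"TITLE": "Elect.Eng.", "SAL": 40000},
    {"TITLE": "Syst.Anal.", "SAL": 34000},
    {"TITLE": "Mech.Eng.",  "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
EMP = [
    {"ENO": "E1", "ENAME": "J.Doe",    "TITLE": "Elect.Eng."},
    {"ENO": "E3", "ENAME": "A.Lee",    "TITLE": "Mech.Eng."},
    {"ENO": "E4", "ENAME": "J.Miller", "TITLE": "Programmer"},
]

# Owner fragments: PAY1 = sigma(SAL <= 30000), PAY2 = sigma(SAL > 30000).
PAY1 = [t for t in PAY if t["SAL"] <= 30000]
PAY2 = [t for t in PAY if t["SAL"] > 30000]

def semijoin(member, owner, attr="TITLE"):
    # member ⋉ owner: member tuples whose join attribute matches the owner.
    keys = {t[attr] for t in owner}
    return [t for t in member if t[attr] in keys]

EMP1 = semijoin(EMP, PAY1)  # employees with low-salary titles
EMP2 = semijoin(EMP, PAY2)  # employees with high-salary titles
print([t["ENO"] for t in EMP1], [t["ENO"] for t in EMP2])  # ['E3', 'E4'] ['E1']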
6/ Minterm Fragments
A horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a minterm predicate mi. Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates. This set of horizontal fragments is also referred to as the set of minterm fragments.
7/ Vertical Fragmentations
PROJ
PNO   PNAME             BUDGET   LOC
P1    Instrumentation   150000   Montreal
P2    Database Develop  135000   New York
P3    CAD/CAM           250000   New York
P4    Maintenance       310000   Paris
P5    CAD/CAM           500000   Boston
[Figure: PROJ divided into two vertical fragments, each retaining the key column PNO (P1–P5)]
7.1/ Need
Vertical fragmentation of a relation R produces fragments R1, R2, …, Rr, each of which contains a subset of R's attributes as well as the primary key of R. The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the user applications will run on only one fragment. In this context, an optimal fragmentation is one that produces a fragmentation scheme which minimizes the execution time of the user applications that run on these fragments.
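A minimal sketch of such a partitioning, assuming PROJ is modeled as a list of dictionaries and assuming one plausible attribute grouping, PROJ1(PNO, BUDGET) and PROJ2(PNO, PNAME, LOC), chosen here only for illustration:

# Sketch: vertical fragmentation of PROJ; the primary key PNO is repeated
# in every fragment so the relation can be reconstructed by a join.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",  "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop", "BUDGET": 135000, "LOC": "New York"},
]

def project(relation, attrs):
    # pi_attrs(relation): keep only the listed attributes of every tuple.
    return [{a: t[a] for a in attrs} for t in relation]

PROJ1 = project(PROJ, ["PNO", "BUDGET"])
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])

# Reconstruction: join the fragments back on the primary key PNO.
rebuilt = [dict(u, **v) for u in PROJ1 for v in PROJ2 if u["PNO"] == v["PNO"]]
assert sorted(rebuilt, key=lambda t: t["PNO"]) == PROJ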
7.2/ Motivation
VF in the context of a design tool allows user queries to deal with smaller relations, thus causing a smaller number of page accesses. It has also been suggested that the most active sub-relations can be identified and placed in a faster memory subsystem where memory hierarchies are supported.
7.4.1/ Grouping Grouping starts by assigning each attribute to one fragment and, at each step, joins some of the fragments until some criterion is satisfied. This technique was first suggested for centralized databases and was later used for distributed databases.
7.4.2/ Splitting Splitting starts with a relation and decides on beneficial partitionings based on the access behavior of the applications to the attributes. Splitting is the preferred approach, as it fits more naturally within the top-down design methodology. Furthermore, splitting generates non-overlapping fragments, whereas grouping typically produces overlapping fragments. Of course, non-overlapping refers only to non-primary-key attributes.
8/ Hybrid Fragmentation
In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications. In such cases a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree-structured partitioning. Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.
9/ Allocation Alternatives
Assuming that the database is fragmented properly, one has to decide on the allocation of the fragments to the various sites on the network. When data is allocated, it may either be replicated or maintained as a single copy. There are two allocation alternatives:
9.1/ Non-replicated
Partitioned: each fragment resides at only one site
9.2/ Replicated
Fully Replicated: each fragment at each site Partially Replicated: each fragment at some of the sites
10.2/ Performance
The allocation strategy is designed to maintain a performance metric. The objective is to minimize response time and maximize the system throughput at each site.
10.3.3/ Site Information - unit cost of storing data at a site - unit cost of processing at a site 10.3.4/ Network Information - communication cost per frame between two sites - frame size
Example: A Simple SQL Query Consider an SQL query that increases the budget of the CAD/CAM projects by 10%:
UPDATE PROJ
SET BUDGET = BUDGET * 1.1
WHERE PNAME = 'CAD/CAM'
Assume BUDGET_UPDATE is the name of the transaction. In embedded SQL, the transaction can be structured as below:
Begin_transaction BUDGET_UPDATE
begin
  EXEC SQL UPDATE PROJ
           SET BUDGET = BUDGET * 1.1
           WHERE PNAME = 'CAD/CAM'
end.
Example: An airline database with the relations:
- FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
- CUST(CNAME, ADDR, BAL)
- FC(FNO, DATE, CNAME, SPECIAL)
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  EXEC SQL UPDATE FLIGHT
           SET STSOLD = STSOLD + 1
           WHERE FNO = flight_no AND DATE = date;
  EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
           VALUES (flight_no, date, customer_name, null);
  output("reservation completed")
end. {Reservation}
Unit of Computing With plain queries there is no concept of consistent execution or reliable computation associated with the notion of a query. Practical experience raises questions such as: What happens if two queries attempt to update the same data item concurrently? What happens when a system failure occurs during the execution of a query? Thus the concept of a transaction is used within the database domain as the basic unit of consistent and reliable computing. The further concepts a database specialist needs are given below.
2/Database Consistency
A database is in a consistent state if it obeys all of the consistency (integrity) constraints defined over it. State changes occur due to modifications, insertions, and deletions (together called updates). Ideally, the database should never enter an inconsistent state. An important point is that the database can be temporarily inconsistent during the execution of a transaction, but it must be consistent again when the transaction terminates.
3/ Transaction Consistency
Transaction consistency refers to the actions of concurrent transactions. The database should remain in a consistent state even if a number of user requests are concurrently accessing the database. A complication arises when replicated databases are considered.
4/ Replica Consistency
A replicated database is in a mutually consistent state if all the copies of every data item in it have identical values. This is called one-copy equivalence, since all replica copies are forced to assume the same state at the end of a transaction's execution.
5/ Reliability
Reliability refers both to the resiliency of a system to various system failures and to its capability to recover from them. A resilient system is tolerant of system failures and can continue to provide services even when failures occur. A recoverable DBMS is one that can get to a consistent state (by moving back or forward) following various types of failures.
Example:
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  EXEC SQL SELECT STSOLD, CAP
           INTO temp1, temp2
           FROM FLIGHT
           WHERE FNO = flight_no AND DATE = date;
  if temp1 = temp2 then
    output("no free seats");
    Abort
  else
    EXEC SQL UPDATE FLIGHT
             SET STSOLD = STSOLD + 1
             WHERE FNO = flight_no AND DATE = date;
    EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
             VALUES (flight_no, date, customer_name, null);
    Commit;
    output("reservation completed")
  end-if
end. {Reservation}
6/ Flat Transactions
Flat transactions have a single start point (Begin_transaction) and a single termination point (End_transaction). A flat transaction consists of a number of primitive operations embraced between the begin and end markers.
7/ Nested transaction
The operations of a transaction may themselves be transactions; that is, flat transactions may be nested inside one another:
Begin_transaction Reservation
  Begin_transaction Airline
  end. {Airline}
  Begin_transaction Hotel
  end. {Hotel}
end. {Reservation}
8/ Characterization of Transaction
The data items that a transaction reads are said to constitute its read set (RS). Similarly, the data items that a transaction writes are said to constitute its write set (WS). Finally, the union of the read set and the write set of a transaction constitutes its base set (BS = RS ∪ WS). Example:
RS[Reservation] = {FLIGHT.STSOLD, FLIGHT.CAP}
WS[Reservation] = {FLIGHT.STSOLD, FC.FNO, FC.DATE, FC.CNAME, FC.SPECIAL}
BS[Reservation] = {FLIGHT.STSOLD, FLIGHT.CAP, FC.FNO, FC.DATE, FC.CNAME, FC.SPECIAL}
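A small Python sketch of this characterization; the (operation, item) trace encoding is an assumption made here for illustration:

# Sketch: deriving RS, WS and BS from a transaction's operation trace.
trace = [
    ("read",  "FLIGHT.STSOLD"), ("read",  "FLIGHT.CAP"),
    ("write", "FLIGHT.STSOLD"), ("write", "FC.FNO"),
    ("write", "FC.DATE"), ("write", "FC.CNAME"), ("write", "FC.SPECIAL"),
]

RS = {item for op, item in trace if op == "read"}
WS = {item for op, item in trace if op == "write"}
BS = RS | WS   # base set = read set union write set

print(sorted(RS), sorted(WS), sorted(BS), sep="\n")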
9/ Properties of Transactions
The consistency and reliability aspects of transactions are due to four properties, commonly referred to as the ACID properties of transactions.
9.1/ Atomicity
Atomicity refers to the fact that a transaction is treated as a single unit of operation: either all of the transaction's actions are completed, or none of them are. This is also called the all-or-nothing property. Atomicity requires that, if the execution of a transaction is interrupted by any sort of failure, the DBMS is responsible for determining what to do with the transaction upon recovery from the failure. The transaction can either be completed by finishing the remaining actions, or it can be terminated by undoing all the actions that have already been executed.
9.1.1/ Failure Classification Generally there are two types of failure. First, a transaction may fail due to input data errors, deadlocks, and other factors; in this case either the transaction aborts itself, or the DBMS may abort it while handling deadlocks. Maintaining transaction atomicity in the presence of this type of failure is called transaction recovery.
9.1.2/ Crash Recovery Second, a transaction may fail because of system crashes, such as storage media failures, processor failures, communication link breakages, power outages, and so on. Ensuring transaction atomicity in the presence of system crashes is called crash recovery.
9.2/ Consistency
The consistency of a transaction is simply its correctness; in other words, a transaction is a correct program that maps one consistent database state to another. Transaction consistency is ensured by semantic data control and by concurrency control mechanisms.
9.2.1/ Consistency Classification This classification groups databases into four levels of consistency. It uses the concept of dirty data, which refers to data values that have been updated by a transaction prior to its commitment. Based on the concept of dirty data, the four consistency degrees are defined as follows:
+ Degree 0: Transaction T does not overwrite dirty data of other transactions.
+ Degree 1: Degree 0, and T does not commit any writes before EOT.
+ Degree 2: Degree 1, and T does not read dirty data from other transactions.
+ Degree 3: Degree 2, and other transactions do not dirty any data read by T before T completes.
9.3/ Isolation
Isolation is the property of transactions that requires each transaction to see a consistent database at all times; in other words, an executing transaction cannot reveal its results to other concurrent transactions before its commitment.
Isolation Levels Based on the phenomena defined below (dirty read, fuzzy read, phantom), the isolation levels are defined as:
+ Read Uncommitted: all three phenomena are possible.
+ Read Committed: fuzzy reads and phantoms are possible, but dirty reads are not.
+ Repeatable Read: only phantoms are possible.
+ Anomaly Serializable: none of the phenomena is possible.
9.3.1/ Serializability If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.
9.3.2/ Incomplete Results An incomplete transaction cannot reveal its results to other transactions before its commitment. This is necessary to avoid cascading aborts.
9.3.3/ Dirty Read A dirty read refers to a data item whose value has been modified by a transaction that has not yet committed: T1 modifies x, which is then read by T2 before T1 terminates; if T1 aborts, T2 has read a value that never existed in the database. Example:
…, W1(x), …, R2(x), …, C1 (or A1), …, C2 (or A2)    or
…, W1(x), …, R2(x), …, C2 (or A2), …, C1 (or A1)
9.3.4/ Non-repeatable or Fuzzy Read Transaction T1 reads the value of a data item. Another transaction T2 then modifies or deletes that data item and commits. If T1 then attempts to reread the data item, it either reads a different value or cannot find the data item at all; thus two reads within the same transaction T1 return different results.
Example: …, R1(x), …, W2(x), …, C2 (or A2), …, R1(x), …, C1 (or A1)
9.3.5/ Phantom The phantom condition occurs when T1 does a search with a predicate and T2 inserts new tuples that satisfy the predicate. Example:
…, R1(P), …, W2(y in P), …, C1 (or A1), …, C2 (or A2)    or
…, R1(P), …, W2(y in P), …, C2 (or A2), …, C1 (or A1)
9.4/ Durability
Durability refers to the property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database. The DBMS therefore ensures that the results will survive subsequent system failures. The durability property brings forth the issue of database recovery: how to recover the database to a consistent state in which all committed actions are reflected.
10/ Transaction Model
[Figure: transaction model]
11/ Scheduler
(In Oracle, by loose analogy, the archived redo log records all activities that have been performed.) The scheduler is responsible for the implementation of a specific concurrency control algorithm; it synchronizes database access by coordinating the various database operations with the data processors.
2/ Key Issue
The level of concurrency, i.e., the number of concurrently executing transactions, is probably the most important parameter in distributed systems. Therefore, the concurrency control mechanism attempts to find a suitable trade-off between maintaining the consistency of the database and maintaining a high level of concurrency. Isolating transactions from one another in terms of their effects on the database is an important issue for a distributed DBMS: if the concurrent execution of transactions leaves the database in a state that could also be achieved by some serial execution, problems such as lost updates are avoided.
3.2/ Schedule
A schedule S is defined over a set of transactions T = {T1, T2, …, Tn} and specifies an interleaved order of execution of these transactions' operations. A (partial) schedule is a prefix of a complete schedule, containing only some of the operations and only some of the ordering relationships.
3.2.1/ Serial Schedule A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule.
3.2.2/ Serializable Schedule A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. Result equivalence: two schedules are result equivalent if they produce the same final state of the database; note, however, that two different schedules may accidentally produce the same final state.
3.3/ Problem
- Input: a schedule S created by a set of transactions T = {T1, T2, …, Tn}
- Output: determine whether S is serializable or non-serializable; if S is serializable, find a serial schedule that is equivalent to S.
Algorithm to Test a Schedule for Serializability by Lock This algorithm looks only at the Lock and Unlock operations, from which it constructs a precedence graph (also called a serialization graph): a directed graph G = (N, E) that consists of a set of nodes N = {T1, T2, …, Tn} and a set of directed edges E = {e1, e2, …, em}. The edges can optionally be labeled by the name of the data item that led to creating the edge.
3.3.1/ Lock All transactions indicate their intentions by requesting locks from the scheduler (called the lock manager). Locks are either read locks (rl) [also called shared locks] or write locks (wl) [also called exclusive locks]. Locking allows concurrent processing of transactions because: a transaction locks an object before using it; when an object is locked by another transaction, the requesting transaction must wait; and once a transaction releases a lock, it may not request another lock. Note: read locks and write locks conflict, because Read and Write operations are incompatible:
            Read Lock   Write Lock
Read Lock   yes         no
Write Lock  no          no
A cycle in a directed graph is a sequence of edges C = (T1 → T2 → … → Tn−1 → Tn → T1) with the property that the starting node of each edge, except the first, is the ending node of the previous edge, and the starting node of the first edge is the ending node of the last edge.
3.3.3/ Linear Order or Topological Order Repeatedly look in G for a node Ti that is not preceded by any directed edge (i.e., no edge points toward Ti); delete Ti and the edges leaving it from G, and append Ti to the order. Continue until no node is left. The resulting order is the linear, or topological, order. Consider the schedule S below and determine whether it is serializable or not.
Algorithm to Test a Schedule for Serializability by RLock, WLock
- Input: a schedule S created by a set of transactions T = {T1, T2, …, Tn}
- Output: determine whether S is serializable or non-serializable; if S is serializable, find a serial schedule that is equivalent to S.
There are some differences in how the edges are determined: if transaction Ti in schedule S executes RLock(X) or WLock(X), and Tj (j ≠ i) is the next transaction that executes WLock(X), create an edge Ti → Tj in the precedence graph. If transaction Ti executes WLock(X) in schedule S, and transaction Tm (m ≠ i) executes RLock(X) after Ti executes Unlock(X) but before any other transaction executes WLock(X), create an edge Ti → Tm in the precedence graph. If G has a cycle, schedule S is non-serializable; otherwise S is serializable, and its linear (topological) order gives an equivalent serial schedule. Consider the schedule S below and determine whether it is serializable or not; a sketch of the whole test follows.
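The following minimal Python sketch illustrates the lock-based test. It assumes a schedule encoded as a list of (transaction, action, item) triples; the sample schedule is invented, since the original figure is not reproduced here:

# Sketch: build a precedence graph from lock operations, then test it.
schedule = [
    ("T1", "wlock", "x"), ("T1", "unlock", "x"),
    ("T2", "wlock", "x"), ("T2", "unlock", "x"),
]

def precedence_graph(schedule):
    edges = set()
    for i, (ti, act, item) in enumerate(schedule):
        if act not in ("rlock", "wlock"):
            continue
        # The next transaction to write-lock the same item must follow Ti.
        for tj, act2, item2 in schedule[i + 1:]:
            if item2 == item and act2 == "wlock" and tj != ti:
                edges.add((ti, tj))
                break
    return edges

def topo_order(nodes, edges):
    # Return a serial order if the graph is acyclic, else None.
    order, remaining, edges = [], set(nodes), set(edges)
    while remaining:
        free = [n for n in remaining
                if not any(e[1] == n for e in edges)]  # no incoming edge
        if not free:
            return None  # a cycle remains: the schedule is non-serializable
        n = free[0]
        remaining.discard(n)
        edges = {e for e in edges if e[0] != n}
        order.append(n)
    return order

edges = precedence_graph(schedule)
print(edges)                             # {('T1', 'T2')}
print(topo_order({"T1", "T2"}, edges))   # ['T1', 'T2'] -> serializable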
4/ Taxonomy
There are a number of ways that concurrency control approaches can be classified. Broadly, the mechanisms fall into two classes: pessimistic and optimistic approaches.
4.2.1/ Lock-Based Approach In the locking-based approach, the synchronization of transactions is achieved by employing physical or logical locks on some portion, or granule, of the database. The size of these portions (the locking granularity) is an important issue. There are three variants of the lock-based approach:
4.2.1.1/ Centralized Locking In centralized locking, one of the sites in the network is designated as the primary site, where the lock tables for the entire database are stored. Note: this site is charged with the responsibility of granting locks to transactions.
4.2.1.2/ Primary Copy Locking In primary copy locking, one of the copies of each lock unit is designated as the primary copy, and it is this copy that has to be locked for the purpose of accessing data. If the database is not replicated, primary copy locking distributes the lock management among all the sites.
4.2.1.3/ Decentralized Locking In decentralized locking, the lock management duty is shared by all the sites of the network. In this case the execution of a transaction involves the participation and coordination of schedulers at more than one site. Each local scheduler is responsible for the lock units local to its site.
4.2.2/ Timestamp Ordering The timestamp ordering (TO) class involves organizing the execution order of transactions so that they maintain mutual and internal consistency. This ordering is maintained by assigning timestamps to both the transactions and the data items. The various types of timestamp ordering algorithms are: basic timestamp ordering, multiversion timestamp ordering, conservative timestamp ordering, and hybrid algorithms.
The lock manager then checks whether the lock unit that contains the data item is already locked. If so, and if the existing lock mode is incompatible with that requested by the current transaction, the current operation is delayed. Otherwise, the lock is set in the desired mode and the database operation is passed on to the data processor for actual database access. The transaction manager is then informed of the result of the operation. The termination of a transaction results in the release of its locks and in the initiation of another transaction that might be waiting for access to the same data item.
5.4/ Requisites
The lock manager has to know that the transaction has obtained all its locks and will not need to lock another data item; it also needs to know that the transaction no longer needs to access a given data item, so that its lock can be released.
6/ Deadlock
A transaction is deadlocked if it is blocked and will remain blocked until there is outside intervention. Any locking-based concurrency control algorithm may result in deadlocks, since there is mutual exclusion of access to shared resources (data) and transactions may wait on locks; some TO-based algorithms that require transactions to wait may also cause deadlocks. For instance, if transaction Ti waits for another transaction Tj to release a lock on an entity, an edge Ti → Tj appears in the wait-for graph (WFG).
7/ Why Deadlocks
Deadlock is a permanent phenomenon: if one exists in a system, it will not go away without outside intervention. A deadlock can occur because transactions wait for one another; informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism. The outside intervention may come from the user, the system operator, or the software system (the operating system or the distributed DBMS).
8/ Methods
There are three known methods for handling deadlocks: prevention, avoidance, and detection and resolution.
1/ Wait-Die Rule If Ti requests a lock on a data item that is already locked by Tj, Ti is permitted to wait if and only if Ti is older than Tj. If Ti is younger than Tj, then Ti is aborted (dies) and is restarted with the same timestamp.
begin
  Ti requests lock on data item currently held by Tj
  if ts(Ti) < ts(Tj) (Ti is older than Tj) then
    Ti waits for Tj
  else
    Ti dies (is rolled back)
  end-if
end
2/ Wound-Wait Rule If Ti requests a lock on a data item that is already locked by Tj, then Ti is permitted to wait if and only if it is younger than Tj. Otherwise, Tj is aborted (wounded) and the lock is granted to Ti.
begin
  Ti requests lock on data item currently held by Tj
  if ts(Ti) < ts(Tj) (Ti is older than Tj) then
    Tj is wounded (rolled back)
  else
    Ti waits for Tj
  end-if
end
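A minimal Python sketch of the two rules, where a smaller timestamp means an older transaction:

def wait_die(ts_requester, ts_holder):
    # Ti requests a lock held by Tj.
    if ts_requester < ts_holder:
        return "wait"   # Ti is older: it is allowed to wait
    return "die"        # Ti is younger: abort Ti, restart with same timestamp

def wound_wait(ts_requester, ts_holder):
    if ts_requester < ts_holder:
        return "wound"  # Ti is older: abort (wound) Tj, grant the lock to Ti
    return "wait"       # Ti is younger: it is allowed to wait

print(wait_die(1, 2), wait_die(2, 1))      # wait die
print(wound_wait(1, 2), wound_wait(2, 1))  # wound wait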
8.3.3/ Deadlock Detection There are three fundamental methods of detecting distributed deadlocks:
1- Centralized Deadlock Detection In the centralized approach, one site is designated as the deadlock detector for the entire system. Periodically, each lock manager transmits its local WFG (LWFG) to the deadlock detector, which then forms the global WFG (GWFG) and looks for cycles in it. Centralized deadlock detection has been proposed for distributed INGRES; this method is simple and would be a natural choice if the concurrency control algorithm were centralized 2PL.
2- Hierarchical Deadlock Detection An alternative is to build a hierarchy of deadlock detectors. Deadlocks that are local to a single site are detected at that site using its local WFG. Each site also sends its local WFG to the deadlock detector at the next level; thus, a distributed deadlock involving two or more sites is detected by the lowest-level deadlock detector that has control over these sites. The hierarchical method reduces the dependence on a central site, thereby reducing communication cost. Note, however, that it is more complicated to implement and involves nontrivial modifications to the lock and transaction manager algorithms.
3- Distributed Deadlock Detection Distributed deadlock detection algorithms delegate the responsibility of detecting deadlocks to the individual sites: there is a local deadlock detector at each site, and these communicate their local WFGs with one another.
9/ Methodology
The local WFG at each site is formed and then modified as follows: since each site receives the potential deadlock cycles from other sites, those edges are added to its local WFG; the edges in the local WFG showing that local transactions are waiting for transactions at other sites are joined with the edges showing that remote transactions are waiting for local ones.
10/ Detection
Local deadlock detectors look for two things: (1) if there is a cycle that does not include the external edges, there is a local deadlock that can be handled locally; (2) if there is a cycle involving the external edges, there is a potential distributed deadlock, and the cycle information has to be communicated to the other deadlock detectors.
Solution
Let the path that has the potential of causing a distributed deadlock in the local WFG of a site be Ti → Tj. A local deadlock detector forwards the cycle information only if ts(Ti) < ts(Tj). This reduces the average number of message transmissions by one half.
1/ Fundamental Definitions
The problem considered here is how to maintain the atomicity and durability of transactions in the face of failures.
1.1/ Reliability
Reliability is a measure of the success with which a system conforms to some authoritative specification of its behavior: the probability that the system has not experienced any failures within a given time period. It is typically used to describe systems that cannot be repaired, or where the continuous operation of the system is critical.
1.2/ Availability
Availability is the fraction of the time that a system meets its specification: the probability that the system is operational at a given time t.
1.3/ Failure
A failure is a deviation of a system from the behavior described in its specification. There are four main types of failures:
1.3.1/ Transaction Failures Mostly, this failure occurs when a transaction aborts, for example due to deadlock; reliability studies report that around 3% of transactions abort abnormally.
1.3.2/ System Failures These are typically failures of the processor, main memory, or power supply; main memory contents are lost, but secondary storage contents remain safe.
1.3.3/ Media Failures These are failures of secondary storage devices such that the stored data is lost, for example a head crash or a controller failure.
1.3.4/ Communication Failures These are network failures, generally lost or undeliverable messages, or network partitioning.
[Figure: 2PC state transition diagrams for the coordinator and the participants]
Observations 1. A participant can unilaterally abort before it answers "yes". 2. Once a participant answers "yes", it must be prepared to commit and cannot change its vote. 3. While a participant is READY, it can move either to abort or to commit, depending on the decision of the coordinator. 4. The global termination decision is commit if all participants vote "yes", and abort if any participant votes "no". 5. The coordinator and the participants may be left in a waiting state; a time-out method can be used to exit.
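Observation 4 is the heart of the protocol. A one-function Python sketch of the coordinator's global decision rule:

def global_decision(votes):
    # votes: the "yes"/"no" answers collected in the voting phase.
    # Commit only if every participant voted "yes"; any "no" aborts.
    return "commit" if all(v == "yes" for v in votes) else "abort"

print(global_decision(["yes", "yes", "yes"]))  # commit
print(global_decision(["yes", "no", "yes"]))   # abort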
+ Participants: on a timeout in the INITIAL state, the coordinator must have failed in its INITIAL state, so the participant unilaterally aborts; on a timeout in the READY state, the participant stays blocked.
5/ Three-Phase Commit
3PC is non-blocking. A commit protocol is non-blocking if it is synchronous within one state transition and its state transition diagram contains no state that is adjacent to both a commit and an abort state, and no non-committable state that is adjacent to a commit state. Adjacent: it is possible to go from one state to the other with a single state transition. Committable: all sites have voted to commit the transaction (e.g., the COMMIT state).
7/ Network Partitioning
A simple modification of the ROWA rule handles partitioning: when the replica control protocol attempts to read or write a data item, it first checks whether a majority of the sites are in the same partition as the site on which the protocol is running (by counting their votes). If so, it executes the ROWA rule within that partition. This assumes that failures are clean, which means that failures which change the network's topology are detected by all sites instantaneously, and that each site has a view of the network consisting of all the sites it can communicate with. A sketch of the majority test follows.
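A minimal sketch of the majority test, with sites and partitions modeled as plain sets and one vote per site (both assumptions made for illustration):

def in_majority_partition(my_partition, all_sites):
    # Proceed with ROWA inside this partition only if it holds a
    # majority of the sites' votes (one vote per site here).
    return len(my_partition) > len(all_sites) / 2

all_sites = {"S1", "S2", "S3", "S4", "S5"}
print(in_majority_partition({"S1", "S2", "S3"}, all_sites))  # True
print(in_majority_partition({"S4", "S5"}, all_sites))        # False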
8/ Open Problems
Replication protocols: experimental validation; replication of computation and communication.
Transaction models: changing requirements — cooperative sharing vs. competitive sharing, interactive transactions, longer durations, complex operations on complex data, relaxed semantics, non-serializable correctness criteria.
Logging The log contains the information used by the recovery process to restore the consistency of the system. This information may include: the transaction identifier; the type of operation (action); the items accessed by the transaction to perform the action; the old value (state) of the item (the before image); and the new value (state) of the item (the after image).
9.1.1/ REDO Protocol REDOing an action means performing it again. The REDO operation uses the log information and performs the action that might have been done before, or that was not done due to a failure; it generates the new image.
9.1.2/ UNDO Protocol UNDOing an action means restoring the object to its before image. The UNDO operation uses the log information and restores the old value of the object.
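A small Python sketch of the two operations, assuming log records carry before and after images in the layout shown (an illustrative format, not a real DBMS log):

db = {"x": 10}
log = [{"txn": "T1", "item": "x", "before": 10, "after": 42}]

def redo(db, record):
    # REDO re-installs the after image (idempotent: safe to repeat on crash).
    db[record["item"]] = record["after"]

def undo(db, record):
    # UNDO restores the before image.
    db[record["item"]] = record["before"]

redo(db, log[0]); print(db)  # {'x': 42}
undo(db, log[0]); print(db)  # {'x': 10}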
9.2.1/ Note If the system crashes before a transaction is committed, then all its operations must be undone; for this, only the before images are needed (the undo portion of the log). Once a transaction is committed, some of its actions might have to be redone; for this, the after images are needed (the redo portion of the log).
9.2.2/ WAL Protocol Before the stable database is updated, the undo portion of the log must be written to the stable log. When a transaction commits, the redo portion of the log must be written to the stable log prior to the updating of the stable database.
9.2.3/ Logging Interface
11.1/ No-Fix/No-Flush
Abort: the buffer manager may have written some of the updated pages into the stable database; the LRM performs transaction undo (or partial undo).
Commit: the LRM writes an end_of_transaction record into the log.
Recover: for those transactions that have both a begin_transaction and an end_of_transaction record in the log, the LRM initiates a partial redo; for those transactions that have only a begin_transaction record in the log, the LRM executes a global undo.
11.2/ No-Fix/Flush
Abort: the buffer manager may have written some of the updated pages into the stable database; the LRM performs transaction undo (or partial undo).
Commit: the LRM issues a flush command to the buffer manager for all updated pages, then writes an end_of_transaction record into the log.
Recover: no redo is needed; perform global undo.
11.3/ Fix/No-Flush
Abort: none of the updated pages has been written into the stable database; release the fixed pages.
Commit: the LRM writes an end_of_transaction record into the log, then sends an unfix command to the buffer manager for all pages that were previously fixed.
Recover: perform partial redo; no global undo is needed.
11.4/ Fix/Flush
Abort: none of the updated pages has been written into the stable database; release the fixed pages.
Commit (the following have to be done atomically): the LRM issues a flush command to the buffer manager for all updated pages, sends an unfix command to the buffer manager for all pages that were previously fixed, and writes an end_of_transaction record into the log.
Recover: nothing needs to be done.
12/ Checkpoints
Checkpoints simplify the task of determining which actions of which transactions need to be undone or redone when a failure occurs. A checkpoint record contains a list of the active transactions. The steps are: write a begin_checkpoint record into the log; collect the checkpoint data into stable storage; write an end_checkpoint record into the log.
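A minimal sketch of these three steps, with the log modeled as an in-memory list and a hypothetical set of active transactions:

log = []
active = ["T7", "T9"]   # hypothetical active transactions

def checkpoint(log, active):
    log.append(("begin_checkpoint", list(active)))  # step 1
    # step 2: collect the checkpoint data into stable storage (elided here)
    log.append(("end_checkpoint",))                 # step 3

checkpoint(log, active)
print(log)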
Memory Components
Shared/System Global Area (SGA) — shared memory made up of the following components:
1. Shared pool: its size is maintained by an initialization parameter called shared_pool_size. The larger the shared pool, the better the performance will be; it is divided into the library cache and the data dictionary cache. Note: correct configuration of the shared pool affects performance.
2. Buffer cache (db_cache_size): basically made up of buffers. The parameters that determine it are db_block_size (e.g., 8K) and db_cache_size (e.g., 80 MB).
3. Log buffer: log_buffer
4. Java pool: java_pool_size
5. Streams pool: streams_pool_size; used for Oracle Streams.
6. Redo log buffer: used for instance recovery.
There are five mandatory background processes; if any of them is killed or goes down, the Oracle instance stops working:
1. DBWR (database writer)
2. LGWR (log writer)
3. SMON (system monitor)
4. PMON (process monitor)
5. CKPT (checkpoint)
Note: Tablespaces consist of one or more data files; a data file belongs to only one tablespace. The SYSTEM and SYSAUX tablespaces are mandatory: they are created at the time of database creation and must be online. The SYSTEM tablespace is used for core functionality (for example, the data dictionary tables), while the auxiliary SYSAUX tablespace is used for additional database components (such as the Enterprise Manager repository). Segments exist within a tablespace; segments are made up of a collection of extents; extents are collections of data blocks; and data blocks are mapped to disk blocks.
Memory structures:
o System Global Area (SGA): database buffer cache, redo log buffer, and various pools
o Program Global Area (PGA)
Process structures:
o User process and server process
o Background processes: SMON, PMON, DBWn, CKPT, LGWR, ARCn, and so on
Storage structures:
o Logical: database, schema, tablespace, segment, extent, and Oracle block
o Physical: files for data, parameters, and redo, and OS blocks
[Figure: storage hierarchy — a segment is made up of extents, extents of data blocks, and data blocks map to disk blocks]
3/ Database Planning
As a DBA, you must plan: The logical storage structure of the database and its physical implementation: o How many disk drives do you have for this? o How many data files will you need? (Plan for growth.) o How many tablespaces will you use? o Which type of information will be stored? o Are there any special storage requirements due to type or size? The overall database design A backup strategy for the database
Dynamic Performance Views These views are owned by the SYS user. Different views are available at different times: after the instance has been started, after the database is mounted, and after the database is open. You can query V$FIXED_TABLE to see all the view names. These views are often referred to as v-dollar views; read consistency is not guaranteed on them because the data is dynamic. In Oracle, SQL*Plus and iSQL*Plus provide additional interfaces to the database for performing database management operations and for executing SQL commands to query, insert, update, and delete data. SQL*Plus is a command-line tool used interactively or in batch mode; iSQL*Plus is not a command-line tool but a web-based interface.
After creating a database, an instance, and a listener, the sysdba user has to create the database owner user. First, create an initialization parameter file (pfile). The following are the steps to create a database manually on Windows:
1. create folders for the database: 1. data1 (disk 1) 2. data2 (disk 2) 3. cdump (core dump) 4. bdump (background dump) 5. udump (user dump) 6. backup (for backup and recovery)
2. create the service (oradim) — only on Windows
3. configure the pfile
4. start up the instance
5. create the database
6. create the data dictionary views (@oracle_home\rdbms\admin\catalog.sql) — as sysdba
7. create the built-in PL/SQL packages (@oracle_home\rdbms\admin\catproc.sql) — as sysdba
8. create the user profile information (@oracle_home\sqlplus\admin\pupbld.sql) — as system/manager
Initialization Parameter Files Example: create a sample pfile, initdb1.ora:
db_name = db1 shared_pool_size = 100m db_cache_size = 120m log_buffer = 5000000 background_dump_dest = d:\bdump user_dump_dest = d:\udump core_dump_dest = d:\cdump control_files = d:\data1\control01.ctl
[Example server parameter file name: spfileorcl.ora]
Example steps:
* switch database, since there are many Oracle instances running
C:\>set oracle_sid=dba1
C:\>sqlplus / as sysdba
* start up the instance only, no mount or open
sql>startup nomount
>> this gives an error, since it cannot locate initdba1.ora
sql>startup nomount pfile=d:\data1\initdba1.ora
>>> successful; then create a database manually
sql>create database dba1
  2> datafile 'd:\data1\system01.dbf' size 200m
  3> logfile group 1 'd:\data1\log1a.rdo' size 5m,
  4>         group 2 'd:\data1\log2a.rdo' size 5m
  5> sysaux datafile 'd:\data2\sysaux01.dbf' size 80m;
>>> Database created.
*** Note: the larger the sizes, the longer it takes to create the database, because the space must be allocated on the physical disk.
sql>select name from v$database;
NAME
--------
DBA1
sql>show parameter shared_pool;
* oracle_home\rdbms\admin\catalog.sql
sql>@d:\oracle\product\10.2.0\db_1\rdbms\admin\catalog.sql
* oracle_home\rdbms\admin\catproc.sql
sql>@d:\oracle\product\10.2.0\db_1\rdbms\admin\catproc.sql
* run the user profile information script, connected as system
sql>conn system/manager
sql>@d:\oracle\product\10.2.0\db_1\sqlplus\admin\pupbld.sql
sql>select group#, member from v$logfile;
*** note: a member is a mirror copy of a log file
sql>alter database add logfile member
  2>'d:\data2\log1b.rdo' to group 1,
  3>'d:\log2b.rdo' to group 2;
>>> Database altered.
sql>select group#, member from v$logfile;
* switch the log writer to a different group, because we need to drop a member of the current one
sql>alter system switch logfile;
sql>alter database drop logfile member 'd:\log2b.rdo';
sql>alter database add logfile member
  2>'d:\data2\log2b.rdo' to group 2;
sql>select group#, member from v$logfile;
RESULT >>>
GROUP#  MEMBER
1       D:\DATA1\LOG1A.RDO
2       D:\DATA1\LOG2A.RDO
1       D:\DATA2\LOG1B.RDO
2       D:\DATA2\LOG2B.RDO
>> this shows that the log files are mirrored to another location, D:\data2
* switch to sysdba to shut down and start up the instance and database
sql>conn / as sysdba
sql>shutdown immediate
* create another control file copy from d:\data1 in d:\data2 (copy and paste)
* modify the 'control_files' parameter in initdba1.ora:
control_files=d:\data1\control01.ctl, d:\data2\control02.ctl
* start up the instance and the database using 'startup' alone
sql>startup
>>> Oracle instance started.
sql>select name from v$controlfile;
>>> NAME
d:\data1\control01.ctl
d:\data2\control02.ctl
*** Note: the control files are identical copies, so their sizes must be exactly the same; otherwise there is a problem.
[Figure: instance startup stages (STARTUP → MOUNT → OPEN); at OPEN, all files are opened as described by the control file for this instance]
Note: after an instance startup that follows a 'shutdown abort', Oracle performs instance recovery automatically; the redo information (log file contents) is used to write the committed, logged transactions into the data files.
[Figure: a database consists of tablespaces, which are made up of data files]
ASM Concept
[Figure: ASM sits between the database and the operating system — the database maps to ASM disk groups containing ASM files; tablespaces map to data files, segments to extents, and allocation units to physical blocks]
5/ Database concurrency
As time and space in this paper are limited, I will not cover how to create users, roles, and privileges; instead I would like to say more about Oracle concurrency. If you want to know more about these topics, please contact me: [email protected].
5.1/ PL/SQL
There are many types of PL/SQL database objects: package, package body, type body, procedure, function, and trigger. Oracle's Procedural Language extension to SQL (PL/SQL) is a fourth-generation programming language (4GL). It provides: procedural extensions to SQL; portability across platforms and products; a higher level of security and data integrity protection; and support for object-oriented programming.
5.2/ Locks
Locks prevent multiple sessions from changing the same data at the same time. They are automatically obtained at the lowest possible level for a given statement. They do not escalate.
Example 1: two sessions update the same row. Transaction 2 must wait until Transaction 1 commits or rolls back, because both need the row lock on employee 100.
Transaction 1
SQL> UPDATE employees
  2  SET salary=salary+100
  3  WHERE employee_id=100;
Transaction 2
SQL> UPDATE employees
  2  SET salary=salary*1.1
  3  WHERE employee_id=100;
Example 2: the sessions update different rows (employees 100 and 101), so the row locks do not conflict and both transactions proceed.
Transaction 1
SQL> UPDATE employees
  2  SET salary=salary+100
  3  WHERE employee_id=100;
Transaction 2
SQL> UPDATE employees
  2  SET salary=salary*1.1
  3  WHERE employee_id=101;
6/ Database Reliability
A secure system ensures the confidentiality of the data that it contains. There are several aspects of security: Restricting access to data and services Authenticating users Monitoring for suspicious activity
Restrict the directories accessible by users. Limit users with administrative privileges. Restrict remote database authentication:
REMOTE_OS_AUTHENT=FALSE
For Oracle databases, Enterprise Manager uses Recovery Manager (RMAN) to perform backup and recovery operations. RMAN is a command-line client for advanced functions; it has a powerful control and scripting language, has a published API that enables interfacing with most popular backup software, backs up files to disk or tape, and backs up data files, control files, archived log files, and server parameter files. After the instance is open, it fails in the case of the loss of any control file, of a data file belonging to the SYSTEM or undo tablespaces, or of an entire redo log group; as long as at least one member of a group is available, the instance remains open. Starting with Oracle 9i, the Flashback technology is a revolutionary advance in recovery. Traditional recovery techniques are slow: the entire database or a whole file (not just the incorrect data) has to be restored, and every change in the database log must be examined. Flashback is fast: changes are indexed by row and by transaction, and only the changed data is restored. Flashback commands are easy: no complex multiple-step procedures are involved. Flashback Database brings the database to an earlier point in time by undoing all changes made since that time. Flashback Table recovers a table to a point in time in the past without having to restore from a backup. Flashback Drop restores accidentally dropped tables.
Note: If a control file is lost or corrupted, the instance normally aborts, at which time you must perform the following steps: 1. Shut down the instance, if it is still open. 2. Restore the missing control file by copying an existing control file. 3. Start the instance. If a member of a redo log file group is lost, as long as the group still has at least one member, then: 1. Normal operation of the instance is not affected 2. You receive a message in the alert log notifying you that a member cannot be found. 3. You can restore the missing log file by copying one of the remaining files from the same group.
If the database is in NOARCHIVELOG mode, and any data file is lost, perform the following tasks: 1. Shut down the instance if it is not already down. 2. Restore the entire database, including all data and control files, from the backup. 3. Open the database. 4. Have users reenter all changes made since the last backup. If a data file is lost or corrupted, and that file does not belong to the SYSTEM or UNDO tablespace, then restore and recover the missing data file. If a data file is lost or corrupted, and that file belongs to the SYSTEM or UNDO tablespace: 1. The instance may or may not shut down automatically. If it does not, use SHUTDOWN ABORT to bring the instance down. 2. Mount the database 3. Restore and recover the missing data file 4. Open the database
[Figure: an incoming connection request reaches the listener, which performs names resolution]
Commands from the listener control utility can be issued from the command line or from the LSNRCTL prompt. UNIX or Linux command-line syntax:
$ lsnrctl <command name> $ lsnrctl start $ lsnrctl status
Prompt syntax:
LSNRCTL> <command name> LSNRCTL> start LSNRCTL> status
Oracle Net supports several methods of resolving connection information. Easy connect naming uses a TCP/IP connect string, requires no client-side configuration, and is enabled by default; however, it offers no support for advanced connection options such as connect-time failover, source routing, and load balancing.
SQL> CONNECT hr/[email protected]:1521/dba10g
Local naming: uses a local configuration file, which requires a client-side names resolution file; it supports all Oracle Net protocols and the advanced connection options.
SQL> CONNECT hr/hr@orcl
Directory naming: uses a centralized LDAP-compliant directory server; it supports all Oracle Net protocols and the advanced connection options, and it requires an LDAP directory loaded with Oracle Net names resolution information: o Oracle Internet Directory o Microsoft Active Directory Services
External naming: uses a supported non-Oracle naming service, including: o Network Information Service (NIS) external naming o Distributed Computing Environment (DCE) Cell Directory Services (CDS)
8/ Database performance
In Oracle, tuning advisors point out problems such as complicated SQL structure, data access issues, and missing indexes; acting on this advice improves database performance.
[Figure: the Automatic Tuning Optimizer checks statistics and the optimization mode; performance problems include memory allocation issues, input/output device contention, resource contention, application code problems, and network bottlenecks, which the DBA addresses, for example by restructuring SQL]
IV/ SUMMARY
The preceding discussion shows students and newcomers to software development what matters when building applications on a distributed database system: efficiency, flexibility, availability, reliability, incremental growth, and a powerful database engine. Moreover, such a system can be shared, acting as a real-time, replicable multi-database system.
V/ APPLY
A distributed database system is a cutting-edge form of business computerization that can carry out, manage, control, evaluate, and recover business transactions all around the world. Developers and students are introduced to interoperable, distributed data processing architectures associated with access to heterogeneous data sources, and the traditional distribution issues are addressed in the context of relational database systems, e.g., distributed query processing and distributed database design. Hence, this subject can be applied in many fields of human society, in order to: - provide a strong foundation for addressing the issues of distributed database processing; - understand the essentials of reliability and concurrency control in database systems; - meet the requirements of both centralization and decentralization of distributed databases; - communicate well at a lower cost.
VI/ CONCLUSION
In my point of view, a distributed database system is not only a new computer technology that supports business management, evaluation, and decision making; it also helps keep data manageable in the event of crashes. Moreover, it will be a good contribution to the development of the community. Hence, I would like to recommend to the dean of the Royal University of Phnom Penh and to the dean of the Computer Science department that this course be expanded, to give students more opportunities to apply it in real experiments rather than studying its theory alone.
91
VII/ REFERENCES
This report was prepared with the following references:
- Lecture slides by Pok Leakmony
- Oracle Database 10g: Administration Workshop I (Oracle course material)
- M. T. Özsu and P. Valduriez, Principles of Distributed Database Systems (2nd edition), Prentice-Hall, ISBN 0-13-659707-6
- R. Elmasri and S. B. Navathe, Fundamentals of Database Systems (3rd edition), Addison-Wesley Longman, ISBN 0-201-54263-3