Distributed Database System
Chin Narong
2008
Objective
The Master of Science in Information Technology (M.Sc. (Information Technology)) degree aims to provide knowledge and understanding of modern information technology systems, in order to prepare students for practical careers within the information technology sector. To this end, the curriculum focuses on building understanding in the fields of computer networking, internet programming, software systems engineering, project management and high-performance computer systems.
Assessment
Students performing at a high level during their degree are selected to research and write a master's thesis during their final year. In the first intake of students, six of the 48 students submitted a thesis. Those who are not selected to submit a thesis attend classes and sit examinations to attain the equivalent value of credits.
According to the above assessment, this report was prepared for the Department of Information Technology of the Royal University of Phnom Penh. The project was advised by Mr. Ouk Chhieng, Dean of the Computer Science Department and Coordinator of the Royal University of Phnom Penh. This report is in the public domain; authorization to reproduce it in whole or in part is granted, and permission to reprint this publication is not necessary. To order copies of this report, a student may:
- Write to Dr. Ouk Chhieng, Coordinator of the Royal University of Phnom Penh, Room #101, Campus I, or call (855) 12 754-344; or
- Write to Mr. Chin Narong, the owner of this study report, at the following addresses:
[email protected]; [email protected]
Upon request, this report is available in alternate formats. For more information, please contact the IT Center, RUPP at 855-23-881-285.
Acknowledgements
The writer of this study report would like to express his gratitude to the university, its staff, the dean, and especially to Mr. Ouk Chhieng, who advised the preparation of this report. Their contributions to the study, such as assessments, observations and surveys in the field of Distributed Database Systems, are deeply appreciated. On behalf of the writer, I would like to say thanks for their generosity of time and spirit.
Finally, I would like to thank my friends, who gave their recommendations and some resources to support my research.
Abbreviations:
DBMS  : Database Management System
DDBMS : Distributed Database Management System
MDBS  : Multi-Database System
GCS   : Global Conceptual Schema
DDTS  : Distributed Defect Tracking System
API   : Application Programming Interface
DHF   : Derived Horizontal Fragmentation
PHF   : Primary Horizontal Fragmentation
VF    : Vertical Fragmentation
RS    : Read Set
WS    : Write Set
BS    : Base Set
TM    : Transaction Management
DCC   : Distributed Concurrency Control
CC    : Concurrency Control
TO    : Timestamp Ordering
2PL   : Two-Phase Locking
WFG   : Wait-For Graph
GWFG  : Global Wait-For Graph
DPS   : Distributed Processing System
DPR   : Distributed Program Reliability
DSR   : Distributed System Reliability
FSTs  : File Spanning Trees
Contents
I/ INTRODUCTION
    1 Objective of the Study
    2 Significance of the Study
    3 Layout of the Study
II/ REVIEW OF LITERATURE
    Chapter 1: Introduction
        Course Outline
        What is a Distributed Database System?
        Implicit Assumptions
        Motivation
        Distributed Computing
        What is Distributed?
        What is a Site?
        DDBS Environment
        Distributed Database Graphic
        What is not a Distributed Database System?
        Shared-Memory Multiprocessor
        Centralized DBMS on a Network
        Why Distribute a Database?
        Advantages of DDBMSs
        Disadvantages of DDBMSs
        Applications
        Issues with DDBMS
        Distributed Transaction Management
        Data Fragmentation
        Fragmentation Independence
        DBMS Independence
        Operating System Independence
        Hardware Independence
        Who Should Provide Transparency
        Complexity
        Cost
        Distribution Control
        Security
        Distributed Database Design
        Distributed Query Processing
        Distributed Directory Management
        Distributed Concurrency Control
        Distributed Deadlock Management
        Reliability of Distributed Databases
        Operating System Support
        Heterogeneous Databases
    Chapter 2: Distributed DBMS Architecture
        1/ Objective
        2/ Types of DDBMS Architecture
        3/ Distribution
        4/ Data Processor
        5/ Heterogeneity
        6/ Architectural Alternatives
        7/ Implementation Alternatives
        8/ Multi-DBS Architecture
    Chapter 3: Distributed Database Design
        1/ Design Problem
        2/ Alternative Design Strategies
        3/ Horizontal Fragmentation
        4/ Primary Horizontal Fragmentation
        5/ Derived Horizontal Fragmentation
        6/ Minterm Fragments
        7/ Vertical Fragmentation
        8/ Hybrid Fragmentation
        9/ Allocation Alternatives
        10/ Allocation Problem
        11/ Reasons for Replication
        12/ Rule of Thumb
    Chapter 4: Transaction Management
        1/ Definition of Transaction
        Unit of Computing
        2/ Database Consistency
        3/ Transaction Consistency
        4/ Replica Consistency
        5/ Reliability
        6/ Flat Transactions
        7/ Nested Transactions
        8/ Characterization of Transactions
        9/ Properties of Transactions
        10/ Transaction Manager
        11/ Scheduler
        12/ Local Recovery Manager
    Chapter 5: Distributed Concurrency Control (DCC)
        1/ CC in Distributed DBMS
        2/ Key Issue
        3/ Serializability Theory
        4/ Taxonomy
        5/ Locking-Based CC Algorithms
        6/ Deadlock
        7/ Why Deadlocks
        8/ Methods
        9/ Methodology
        10/ Detection
        11/ Key Issues
        Solution
    Chapter 6: Distributed Reliability
        1/ Fundamental Definitions
        2/ Distributed Reliability Protocols
        3/ Two-Phase Commit Protocol
        4/ State Transitions in 2PC
        5/ Three-Phase Commit
        6/ Quorum Protocols for Replicated Databases
        7/ Network Partitioning
        8/ Open Problems
        9/ In-Place Update Recovery Information
        10/ Out-of-Place Update Recovery Information
        11/ Execution Strategies
        12/ Checkpoints
III/ APPLIED METHOD ON ORACLE
    1/ Oracle Database Architecture
        Memory Components
    2/ Tasks of an Oracle Database Administrator
    3/ Database Planning
    4/ Oracle Management Framework
        4.1/ Startup Command
        4.2/ Shutdown Command
        4.3/ How Table Data is Stored
        4.4/ Automatic Storage Management
    5/ Database Concurrency
        5.1/ PL/SQL
        5.2/ Locks
    6/ Database Reliability
        6.1/ Principle of Least Privilege
        6.2/ Applying the Principle of Least Privilege
        6.3/ Monitoring for Suspicious Activity
        6.4/ Backup and Recovery
    7/ Database Efficiency
        7.1/ Listener
    8/ Database Performance
IV/ SUMMARY
V/ APPLY
VI/ CONCLUSION
VII/ REFERENCES
I/ INTRODUCTION
In former generations, most knowledge management was based on documentation, written so that the next generation of learners could study and improve fields such as management, business technology, troubleshooting, architecture, law, and regulation. Since the introduction of cutting-edge computer science technology, improvements in business management and many other fields have helped create new things to meet human demands. Databases play the role of data keeper to support the business in terms of:
- Cost
- Time
- Accountability
- Effectiveness; and
- Transparency
First of all, I would like to answer the question:
+ What is a distributed database?
A simple answer from www.webopedia.com: a database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data.
Others describe a distributed database as a database under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common processor; it may be stored on multiple computers located in the same physical location, or dispersed over a network of interconnected computers. For example, collections of data in a database can be distributed across multiple physical locations (partitions/fragments), and each partition of a distributed database may be replicated.
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. The implementation of these technologies can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.
+ Basic architecture
Database users access the distributed database through:
- Local applications: applications which do not require data from other sites.
- Global applications: applications which do require data from other sites.
+ Important considerations
Care must be taken with a distributed database to ensure the following:
- The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.
- Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into sub-transactions, with each sub-transaction affecting one database system.
+ Advantages of distributed databases
- Reflects organizational structure: database fragments are located in the departments they relate to.
- Local autonomy: a department can control the data about itself (as it is the one familiar with it).
- Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.
- Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
- Economics: it costs less to create a network of smaller computers with the power of a single large computer.
- Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).
- Fault tolerance: the ability of a computer system or component, designed so that in the event that a component fails, a backup component or procedure can immediately take its place with no loss of service. Fault tolerance can be provided in software, embedded in hardware, or provided by some combination of the two.
+ Disadvantages of distributed databases
- Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
- Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
- Security: remote database fragments must be secured, and since they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
- Difficulty of maintaining integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
- Inexperience: distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
- Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
- Database design more complex: in addition to the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.
events happening in close proximity. Preparing a paper on a qualitative evaluation led me to think about the sources of responsiveness and the architecture of database systems. I volunteered to provide some documentation for coursework master's dissertations using Oracle 10g Database Administration Workshop I, to accompany a similar document for other forms which I had learned. This particular document originated as a document for people who wish to learn how to administer a database system; that was the urgent priority at the time. After completing it, I realized it was suitable for coursework master's dissertations too. By then it had become larger than intended, but perusing it persuaded me that the length was justified by the topic. I apologize to those readers who would want this paper to go further; it cannot be larger than this, owing to my limited time. Even so, I had some very encouraging responses from outside people and other educational institutions. So here it is.
My experience suggests to me that the changing of computer technology requires a non-positivist approach, and this was confirmed by my reading. It appears that many academics who find themselves in the role of change agents are led eventually towards a more flexible approach to their technical service. However, while in sympathy with the actual processes they used in field settings, I thought their supporting arguments were sometimes inadequate. Constructivism provides one example. The positivist view, or so it seems to me, depends upon reality being directly knowable. Many advisors oppose this with the view that our theories and language inevitably colour what we see. It seems apparent to me that my mental frameworks colour what I would like to describe. I was encouraged to find such views expressed in the literature. However, in this afterword let me try to make my own views clearer than I chose to in the body of this document.
It seems to me that, to judge a good database paradigm, it is reasonable to take into account the purpose of choosing the right database. From my experience analyzing in-source database development, the aims of this study are:
- To introduce the important concepts, algorithms and techniques in the design of high-performance distributed database systems (DDBS, also called DDBMS); and
- To explain the differences between DDBSs and DDB applications (database systems vs. distributed database systems).
An important purpose of the course is to introduce WHAT will happen after you have submitted your program (transaction) to a distributed database system for execution, and HOW the system meets the performance requirements (and what those performance requirements are). We hope that the concepts, techniques and algorithms covered in this course will be useful to you:
- when you develop database applications (as an application programmer) with a distributed database system; and
- when you design a new distributed database system (as a database system designer).
[Figure: A distributed database system — several sites, each running a DBMS with its own data, connected by a network.]
The figure above illustrates database processing at each of the sites in a distributed DBMS environment.
Implicit Assumptions
- Data is stored at a number of sites; each site logically consists of a single processor.
- Processors at different sites are interconnected by a computer network (as opposed to multiprocessors, which give parallel database systems).
- A distributed database is a database, not a collection of files: the data is logically related, as exhibited in the users' access patterns (relational data model).
- A D-DBMS is a full-fledged DBMS: not a remote file system, and not a TP system.
Motivation
- Decentralization
- Disaster recovery and backup manipulation
- Technology integration
- Data replication
Distributed Computing
A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
Synonymous terms:
- distributed function
- distributed data processing
- multi-processors/multi-computers
- satellite processing
- backend processing
- dedicated/special-purpose computers
- timeshared systems
- functionally modular systems
What is Distributed?
- Processing logic: in fact, the definition of distributed computing implies that processing elements, or processing logic, are distributed.
- Function: various functions of a computer system could be delegated to various pieces of hardware or software.
- Data: data used by a number of applications may be distributed to a number of processing sites.
- Control: the control of the execution of various tasks might be distributed instead of being performed by one computer system.
DDBS Environment
Shared-Memory Multiprocessor
[Figure: A shared-memory multiprocessor — several processor units sharing a common memory and I/O system.]
Advantages of DDBMSs
- Reflects organizational structure
- Improved shareability and local autonomy
- Improved availability
- Improved reliability
- Improved performance
- Economics
Disadvantages of DDBMSs
- Complexity
- Cost
- Security
- Integrity control more difficult
- Lack of standards
- Lack of experience
- Database design more complex
Applications
- Manufacturing, especially multi-plant manufacturing
- Military command and control
- Corporate MIS
- Airlines
- Hotel chains
- Payment systems
- Any organization with a decentralized organizational structure
Before we go ahead to the next step, there are a few transparency requirements that motivate the use of a distributed database system.
Network Transparency
In a distributed database management environment, the network needs to be shielded in the same manner that data is shielded in a centralized DBMS. Preferably, the user should be protected from the operational details of the network. Furthermore, it is desirable to hide even the existence of the network; then there would be no difference between database applications running on a centralized database and those running on a distributed database.
Distribution Transparency
- Location transparency: the command used to perform a task is independent of both the location of the data and the system on which the operation is carried out.
- Naming transparency: a unique name is provided for each object in the database. In the absence of naming transparency, users are required to embed the location name as part of the object name.
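As an illustration, the Oracle-style sketch below contrasts a command that embeds a location name with one that relies on location and naming transparency. The database-link name tokyo and the sample data values are assumptions made for the example:

-- Without location transparency: the site name is embedded in the
-- object name (the link name "tokyo" is hypothetical).
SELECT ENAME FROM EMP@tokyo WHERE TITLE = 'Elect. Eng.';

-- With transparency: a synonym gives the object one unique name and
-- hides where EMP actually resides, so the command is unchanged if
-- the data later moves.
CREATE SYNONYM EMP FOR EMP@tokyo;
SELECT ENAME FROM EMP WHERE TITLE = 'Elect. Eng.';

If the data migrates to another site, only the synonym definition changes; user queries are untouched.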
Location Independence
Users should not have to know where data is physically stored; rather, they should be able to behave, at least from a logical standpoint, as if the data were all stored at their own local site. Location independence is desirable because it simplifies user programs and terminal activities, and because it allows data to migrate from one site to another without invalidating any of those programs or activities. Such migratability is desirable because it allows data to be moved around the network in response to changing performance requirements. Location independence is just an extension, to the distributed case, of the familiar concept of physical data independence.
Data Replication
For performance, reliability, and availability reasons, it is usually desirable to be able to distribute data in a replicated fashion across the machines in a network. Such replication helps performance, since diverse and conflicting user requirements can more easily be accommodated: data that is commonly accessed by one user can be placed on that user's local machine as well as on the machine of another user with the same access requirements. This increases the locality of reference. Further, if one machine fails, a copy of the same data is still available on another machine on the network.
Replication Transparency
Assuming that data is replicated, the transparency issue to be addressed is whether the users should be aware of the existence of copies, or whether the system should handle the management of copies while the user acts as if there were a single copy of the data. From the user's perspective the answer is obvious; from the system's perspective it is not that simple. It is not the system that decides whether or not to have copies, and how many copies to have, but the user application. It is desirable that replication transparency be provided as a standard feature of a DBMS. Distributing these replicas across a network in a transparent manner is the domain of network transparency.
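As a rough sketch of how the system, rather than the user, can manage copies, Oracle-style SQL can maintain a periodically refreshed local replica and hide it behind a single name. The link name hq and the refresh interval are assumptions made for the example:

-- Keep a local, periodically refreshed copy of the remote EMP relation.
CREATE MATERIALIZED VIEW EMP_LOCAL
    REFRESH COMPLETE NEXT SYSDATE + 1/24   -- re-synchronize hourly
    AS SELECT * FROM EMP@hq;

-- Users query EMP as if there were a single copy; the synonym decides
-- which replica actually serves them.
CREATE SYNONYM EMP FOR EMP_LOCAL;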
Data Fragmentation
In a distributed database environment it is commonly desirable to divide each database relation into smaller fragments and treat each fragment as a separate database object. This is commonly done for reasons of performance, reliability, and availability. Furthermore, fragmentation reduces the negative effects of replication. Each replica is not the full relation but only a subset of it, thus less space is required and fewer data items need to be managed.
There are two general fragmentation alternatives:
o Horizontal fragmentation: a relation is partitioned into a set of sub-relations, each of which has a subset of the tuples (rows) of the original relation.
o Vertical fragmentation: each sub-relation is defined on a subset of the attributes (columns) of the original relation.
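Both alternatives can be sketched in SQL using the engineering database relations described later in this report (PROJ with a BUDGET attribute, and EMP with ENO, ENAME and TITLE); the budget threshold here is an arbitrary illustration:

-- Horizontal fragmentation: each fragment holds a subset of PROJ's rows.
CREATE TABLE PROJ1 AS SELECT * FROM PROJ WHERE BUDGET <= 200000;
CREATE TABLE PROJ2 AS SELECT * FROM PROJ WHERE BUDGET >  200000;

-- Vertical fragmentation: each fragment holds a subset of EMP's columns,
-- repeating the key ENO so that the relation can be rebuilt.
CREATE TABLE EMP_V1 AS SELECT ENO, ENAME FROM EMP;
CREATE TABLE EMP_V2 AS SELECT ENO, TITLE FROM EMP;

-- Reconstruction: union for horizontal fragments, join on the key for
-- vertical fragments.
SELECT * FROM PROJ1 UNION ALL SELECT * FROM PROJ2;
SELECT V1.ENO, V1.ENAME, V2.TITLE
FROM   EMP_V1 V1 JOIN EMP_V2 V2 ON V1.ENO = V2.ENO;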
Fragmentation Independence
When database objects are fragmented, user queries that were specified on an entire relation must now be dealt with on the sub-relations. Typically this requires a translation from what is called a global query into several fragment queries.
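Continuing the PROJ sketch above, a global query and its fragment queries might look as follows; the fragment predicates are the ones assumed in the previous example:

-- Global query, written against the whole relation:
SELECT PNAME FROM PROJ WHERE BUDGET > 300000;

-- After localization: PROJ1 (BUDGET <= 200000) cannot contribute any
-- rows and is eliminated, so only PROJ2 needs to be queried.
SELECT PNAME FROM PROJ2 WHERE BUDGET > 300000;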
DBMS Independence
Under this heading, all that is really needed is that the DBMS instances at different sites all support the same interface; they do not necessarily all have to be copies of the same DBMS software. If Ingres and Oracle both supported the official SQL standard, then it might be possible to get an Ingres site and an Oracle site to talk to each other in the context of a distributed system. Support for heterogeneity is definitely desirable: real-world computer installations typically run not only many different machines and many different operating systems, they very often run different DBMSs as well, and it would be nice if those different DBMSs could all participate somehow in a distributed system. In other words, an ideal distributed system should provide DBMS independence.
Hardware Independence
Real-world computer installations typically involve a multiplicity of different machines — IBM machines, ICL machines, HP machines, PCs and workstations of various kinds, and so on — and there is a real need to be able to integrate the data on all of those systems and present the user with a single system image. Thus it is desirable to run the same DBMS on different hardware platforms, and furthermore to have all those machines participate as equal partners in the distributed system.
The second layer at which transparency is provided is the operating system level. State of the art operating systems provide some level of transparency to system users. Providing transparent access to resources at the operating system level can obviously be extended to the distributed environment, where the management of the network resource is taken over by the distributed operating system.
The third layer at which transparency can be supported is within the DBMS. It is the responsibility of the DBMS to make all necessary translations from the operating system to the higher level user interface.
Complexity
Distributed DBMS problems are inherently more complex than centralized database management ones, as they include not only the problems found in the centralized environment but also a new set of unresolved problems.
Cost
Distributed systems require additional hardware (communication mechanisms, etc.) and thus have increased hardware costs, although the trend towards decreasing hardware costs makes this an insignificant factor. The most important cost component is the replication of effort (manpower), which usually results in an increase in personnel in the data processing operations. Therefore, the trade-off between increased profitability, due to more efficient and timely use of information, and increased personnel costs has to be analyzed carefully.
Distribution Control
This point was stated previously as an advantage of DDBSs. Unfortunately, distribution creates problems of synchronization and coordination (a disadvantage). Distributed control can therefore easily become a liability if care is not taken to adopt adequate policies to deal with these issues.
Security
One of the major benefits of centralized databases has been the control they provide over access to data: security can easily be enforced in a central location, with the DBMS enforcing the rules. In a distributed database system, however, a network is involved, which is a medium with its own security requirements, and it is well known that there are serious problems in maintaining adequate security over computer networks. Thus the security problems in distributed database systems are by nature more complicated than in centralized ones.
Distributed Database Design
The research in this area mostly involves mathematical programming, in order to minimize the cost of storing the database, processing transactions against it, and communication.
Heterogeneous Databases
When there is no homogeneity among databases at various sites either in terms of the way data is logically structured (data model) or in terms of mechanisms provided for accessing it (data language), it becomes necessary to provide a translation mechanism between database systems. This translation mechanism usually involves a canonical form to facilitate data translation, as well as program templates for translating data manipulation instructions.
First Attempts
In 1972, the Computer & Information Processing Committee of the American National Standards Institute (ANSI) established a study group on DBMS under the auspices of its Standards Planning and Requirements Committee (SPARC).
The Mission
To study the feasibility of setting up standards in the area of database management systems, and to determine all areas that can be standardized, if feasible.
Proposal
The architectural framework proposed came to be known as the ANSI/SPARC architecture. The study proposed that the interfaces be standardized, and defined an architectural framework that contained 43 interfaces, 14 of which would deal with the physical storage subsystem of the computer.
Standardization Approaches
A reference model can be described according to three different approaches.
+ Based on Components
The components of the system are defined, together with the interrelationships between the components. Thus a DBMS consists of a number of components, each of which provides some functionality. The orderly and well-defined interaction of these components provides the total system functionality.
+ Based on Functions
The different classes of users are identified, and the functions that the system will perform for each class are defined. This results in a hierarchical system architecture with well-defined interfaces between the functionalities of the different layers. The ISO architecture falls into this category.
+ Based on Data The different types of data are identified, and an architectural framework is specified which defines the functional units that will realize or use data according to these different views. Since data is the central resource that a DBMS manages, this approach (data-logical approach) is claimed to be the preferable choice for standardization activities
+ Internal View
At the lowest level of the architecture is the internal view, which deals with the physical definition and organization of data. The location of data on different storage devices, and the access mechanisms used to reach and manipulate data, are the issues dealt with at this level.
Internal Schema
At the internal level, the storage details of these relations are described. Assume that the EMP relation is stored in an indexed file, where the index is defined on the key attribute (ENO) and is called EMINX. Assume also that a HEADER field is associated with each record, which might contain flags (delete, update, etc.) and other control information. The internal schema definition of the relation may then be as follows:
INTERNAL_REL EMPL [
    INDEX ON E# CALL EMINX
    FIELD = {
        HEADER : BYTE(1)
        E#     : BYTE(9)
        ENAME  : BYTE(15)
        TIT    : BYTE(10)
    }
]
+ External View
At the other extreme is the external view, which is concerned with how users view the database. An individual user's view represents the portion of the database that will be accessed by that user, as well as the relationships the user would like to see among the data. A view can be shared among a number of users, with the collection of user views making up the external schema. Finally, the external views can be described using SQL notation. Consider two applications as examples: one that calculates the payroll payments for engineers, and a second that produces a report on the budget of each project.
External Schema
CREATE VIEW PAYROLL (ENO, ENAME, SAL)
AS SELECT EMP.ENO, EMP.ENAME, PAY.SAL
FROM   EMP, PAY
WHERE  EMP.TITLE = PAY.TITLE
The second application is simply a projection of the PROJ relation, which can be specified as:
CREATE VIEW BUDGET (PNAME, BUD)
AS SELECT PNAME, BUDGET
FROM   PROJ
+ Conceptual Schema
In between these two extremes is the conceptual schema, which is an abstract definition of the database: the real-world view of the enterprise being modeled in the database. As such, it is supposed to represent the data and the relationships among data without considering the requirements of individual applications or the restrictions of the physical storage media.
An example: considering the engineering database example with its four relations,
o EMP,
o PROJ,
o ASG, and
o PAY,
the conceptual schema should describe each relation with respect to its attributes and key. The descriptions might look like the following:
1/ RELATION PAY (Conceptual)
RELATION PAY [
    KEY = {TITLE}
    ATTRIBUTES = {
        TITLE : CHARACTER(10)
        SAL   : NUMERIC(6)
    }
]
2/ RELATION PROJ (Conceptual)
RELATION PROJ [
    KEY = {PNO}
    ATTRIBUTES = {
        PNO    : CHARACTER(7)
        PNAME  : CHARACTER(20)
        BUDGET : NUMERIC(7)
    }
]
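The excerpt defines only PAY and PROJ. For completeness, the remaining two relations would be described in the same notation; the attribute lists below are a plausible reconstruction, consistent with the internal schema (ENO, ENAME, TITLE) and the external views shown earlier, rather than a verbatim copy of the original definitions:

3/ RELATION EMP (Conceptual)
RELATION EMP [
    KEY = {ENO}
    ATTRIBUTES = {
        ENO   : CHARACTER(9)
        ENAME : CHARACTER(15)
        TITLE : CHARACTER(10)
    }
]
4/ RELATION ASG (Conceptual)
RELATION ASG [
    KEY = {ENO, PNO}
    ATTRIBUTES = {
        ENO  : CHARACTER(9)
        PNO  : CHARACTER(7)
        RESP : CHARACTER(10)
        DUR  : NUMERIC(3)
    }
]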
ANSI/SPARC Architecture
The investigation of the ANSI/SPARC architecture with respect to functions results in a considerably more complicated view. The conventions used in the schematic are:
o Square boxes : processing functions
o Hexagons : administrative roles
o Arrows : data, command and program flows
o I-shaped bars : interfaces
o Triangle : data dictionary
[Figure: Partial schematic of the ANSI/SPARC architectural model — the enterprise administrator, database administrator, system programmer and application programmer roles, the GD/D (data dictionary/directory), and the internal and external databases with their application programs.]
Data Dictionary/Directory
The major component that permits mappings between the different organizational views of data is the data dictionary/directory (depicted as a triangle), which is a meta-database containing the schema and mapping definitions. It also contains usage statistics, access control information and the like. It serves as the central component both in processing the different schemas and in providing mappings among them.
Roles
- The database administrator is responsible for maintaining the internal schema definition.
- The enterprise administrator is responsible for defining the conceptual schema.
- The application administrator is responsible for preparing the external schemas for the applications.
Architectural Models
Classification
Considering the possible ways in which multiple databases may be put together, systems can be classified with respect to:
o the autonomy of the local systems,
o their distribution, and
o their heterogeneity.
Autonomy
Autonomy refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Autonomy is a function of a number of factors, such as whether the component systems exchange information, whether they can independently execute transactions, and whether one is allowed to modify them.
Requirements of Autonomous Systems
According to Gligor and Popescu-Zeletin:
o The local operations of the individual DBMSs are not affected by their participation in the multidatabase system.
o The manner in which the individual DBMSs process queries and optimize them should not be affected by the execution of global queries that access multiple databases.
o System consistency or operation should not be compromised when individual DBMSs join or leave the multidatabase confederation.
According to Du and Elmagarmid:
o Design autonomy: individual DBMSs are free to use the data models and transaction management techniques that they prefer.
o Communication autonomy: each individual DBMS is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution.
o Execution autonomy: each DBMS can execute the transactions that are submitted to it in any way that it wants to.
Aspects of Classification
Tight Integration
o In tightly integrated systems, the data managers are implemented so that one of them is in control of the processing of each user request, even if that request is serviced by more than one data manager. The data managers do not operate as independent DBMSs, even though they usually have the functionality to do so.
o A single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases. From the user's perspective, the data is logically centralized in one database.
Semiautonomous Systems
o These consist of DBMSs that can operate independently, but have decided to participate in a federation to make their local data shareable.
o Each DBMS determines what parts of its own database it will make accessible to users of other DBMSs. They are not fully autonomous systems, because they need to be modified to enable them to exchange information with one another.
Total Isolation
o Here the individual systems are stand-alone DBMSs which know neither of the existence of other DBMSs nor how to communicate with them.
o In such systems, the processing of user transactions that access multiple databases is especially difficult, since there is no global control over the execution of the individual DBMSs.
3/ Distribution
Whereas autonomy refers to the distribution of control, the distribution dimension deals with data. There are a number of ways in which DBMSs have been distributed; mainly, they are of two types.
3.1.2/ General Idea
The general idea is simple and elegant: distinguish the functionality that needs to be provided, and divide these functions into two classes, server functions and client functions.
3.1.3/ Client-Server Reference Architecture
3.1.4/ Process-Centric View
o Any process that requests the services of another process is its client, and vice versa. However, it is important to note that client-server computing and client-server DBMS, as the terms are used in their most modern context, do not refer to processes but to actual machines. Thus the focus is on what software should run on the client machines and what software should run on the server machine.
3.1.5/ Task Management (Server)
o The first and most important point in a client/server architecture is that the server does most of the data management work. This means that all query processing, query optimization, transaction management and storage management is done at the server.
3.1.6/ Task Management (Client)
o The client provides the application and the user interface, together with a DBMS client module that is responsible for managing the data cached at the client, managing transaction locks, and managing consistency checking of user queries at the client side.
o Of course, operating system and communication software runs on both the clients and the server, but communication between client and server is at the level of SQL statements: the client passes SQL queries to the server without trying to understand or optimize them, and the server does most of the work and returns the result relation to the client.
Advantages
- More efficient division of labor
- Horizontal and vertical scaling of resources
- Better price/performance on client machines
- Ability to use familiar tools on client machines
- Client access to remote data (via standards)
- Full DBMS functionality provided to client workstations
- Overall better system price/performance
Classification
There are a number of different types of client-server architecture, mainly:
- multiple client / single server
- multiple client / multiple server
Multiple Client-Single Server
o The simplest case is where there is only one server, accessed by multiple clients. From a data management perspective this is not much different from centralized databases, since the database is stored on only one machine, which also hosts the software to manage it. However, there are some important differences in the way transactions are executed and caches are managed.
Problems
o The server forms a bottleneck
o The server forms a single point of failure
o Database scaling is difficult
Multiple Client-Multiple Server
o A more sophisticated client/server architecture has multiple servers in the system. In this case two alternative management strategies are possible (called direct and indirect connection in Oracle): either each client manages its own connection to the appropriate server, or each client knows of only its home server, which then communicates with other servers as required.
3.2.2/ Three-Layer Architecture
o Since the data in a distributed database is usually fragmented and replicated, the logical organization of the data at each site needs to be described in order to handle fragmentation and replication. Therefore a third layer is needed in the architecture, the local conceptual schema (LCS); the global conceptual schema (GCS) is then the union of the local conceptual schemas. Finally, user applications and user access to the database are supported by external schemas (ESs).
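A minimal sketch of this layering in Oracle-style SQL, assuming EMP is horizontally fragmented into EMP1 and EMP2 at two sites reachable through the hypothetical database links site1 and site2:

-- Local conceptual schemas: EMP1 at site 1 and EMP2 at site 2, each
-- holding one fragment of the global EMP relation.
-- Global conceptual schema: EMP is defined as the union of the LCSs.
CREATE VIEW EMP AS
    SELECT * FROM EMP1@site1
    UNION ALL
    SELECT * FROM EMP2@site2;

External schemas and user queries are then defined against this global EMP, never against the site-specific fragments.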
3.2.3/ What Does It Support
o This architectural model provides all the necessary levels of transparency.
Data independence is supported, since the model is an extension of ANSI/SPARC, which provides such independence naturally. Location and replication transparency are supported by the definitions of the local and global conceptual schemas and the mappings between them. Network transparency, on the other hand, is supported by the definition of the global conceptual schema: the user queries data irrespective of its location or of which local component of the distributed database system will service it, as the distributed DBMS translates global queries into a group of local queries, which are executed by distributed DBMS components at the different sites that communicate with one another.
3.2.4/ Functional Description
In terms of the detailed functional description of the model, the ANSI/SPARC model is extended by the addition of a global directory/dictionary (GD/D) that permits the required global mappings; the local mappings are still performed by a local directory/dictionary. The local database management components are thus integrated by means of global DBMS functions. Since the local conceptual schemas are mappings of the global schema onto each site, such a database is designed in a top-down fashion, and therefore all external view definitions are made globally. There is a local database administrator at each site, in order to retain local control over the administration of data, which is one of the primary motivations of distributed processing.
3.2.5/ Component Description
A distributed DBMS consists of a number of components: one handles the interaction with users, the other deals with storage. The first major component, called the user processor, consists of four elements:
3.2.5.1/ User Interface Handler: responsible for interpreting user commands as they come in and formatting the result data as it is sent back to the user. This component establishes the link between the system at one end and the user at the other.
3.2.5.2/ Semantic Data Controller: uses the integrity constraints and authorizations defined as part of the global conceptual schema to check whether the user query can be processed. This component is responsible for authorization and other such functions.
3.2.5.3/ Global Query Optimizer and Decomposer: determines an execution strategy that minimizes a cost function, and translates global queries into local ones using the global and local conceptual schemas as well as the global directory. The global query optimizer is responsible, among other things, for generating the best strategy to execute distributed join operations.
3.2.5.4/ Distributed Execution Monitor: coordinates the distributed execution of requests; it is also called the distributed transaction manager. In executing queries in a distributed fashion, the execution monitors at the various sites may, and usually do, communicate with one another.
[Figure: The user processor — user requests flow from the user to the user processor, which consults the GD/D and returns system responses.]
4/ Data Processor
The second major component of a distributed DBMS, and the primary component Oracle uses, is the data processor. It consists of three elements: the local query optimizer, the local recovery manager, and the run-time support processor.
[Figure: The data processor and its components, including the system log.]
5/ Heterogeneity
Heterogeneity may occur in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. The important ones relate to data models, query languages, and transaction management protocols. Representing data with different modeling tools creates heterogeneity because of the inherent expressive powers and limitations of the individual data models. Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages even when the individual systems use the same data model; different query languages that use the same data model often select very different methods for expressing identical requests (e.g., DB2 uses SQL, while INGRES uses QUEL).
6/ Architectural Alternatives
The alternatives along each dimension are identified by the numbers 0, 1 or 2, and these numbers of course have different meanings along each dimension. Along the autonomy dimension, 0 represents tight integration, 1 represents semiautonomous systems, and 2 represents total isolation. Along the heterogeneity dimension, 0 identifies homogeneous systems, while 1 stands for heterogeneous systems.
7/ Implementation Alternatives
8/ Multi-DBS Architecture
The differences in the level of autonomy between distributed multi-DBMSs and distributed DBMSs are also reflected in their architectural models. The fundamental difference relates to the definition of the global conceptual schema.
8.5.2/ Multilingual Multi-DBMS
An alternative is the multilingual architecture, where the basic philosophy is to permit each user to access the global database by means of an external schema defined using the language of the user's local DBMS. The GCS definition is quite similar in the multilingual architecture and the unilingual approaches, the major difference being the definition of the external schemas of the local databases. Queries against the global database are made using the language of the local DBMS, but they generally require some processing to be mapped to the global conceptual schema. The multilingual approach obviously makes querying the database easier from the user's perspective; however, it is more complicated, because translation of queries is required at run time. The multilingual approach is used in Sirius-Delta and in the HD-DBMS project.
[Figure: The top-down design process, with feedback loops from observation and monitoring back to the earlier design steps.]
Requirement Analysis
The top-down design process begins with requirement analysis, which defines the environment of the system and elicits both the data and the processing needs of all potential database users. The requirement study also specifies where the final system is expected to stand with respect to the objectives of a distributed DBMS.
View Design
The requirement document is input to two parallel activities: view design and conceptual design. The view design activity deals with defining the interfaces for the end users.
Conceptual Design
Conceptual design is the process by which the enterprise is examined to determine entity types and the relationships among these entities. It can be divided into two related activity groups:
+ Entity analysis: concerned with determining the entities, their attributes, and the relationships among them.
+ Functional analysis: concerned with determining the fundamental functions in which the modeled enterprise is involved.
Relationship
Conceptual design is also an integration of user views. View integration should be used to ensure that the entity and relationship requirements of all the views are covered in the conceptual schema. The conceptual model should support not only the existing applications but future applications as well.
Activities In conceptual design and view design the user needs to specify the data entities and must determine the applications that will run on the database, as well as statistical information about these applications. Statistical information includes the specification of the frequency of user applications, the volume of various kinds of information, and the like.
GCS & Access Pattern Information Design From the conceptual design step come the definition of the global conceptual schema and the access pattern information. Note: the GCS and the access pattern information are the inputs to the distribution design step.
Distribution Design The objective at this stage is to design the local conceptual schemas by distributing the entities over the sites of the distributed system. It is possible to treat each entity as a unit of distribution; in the relational model, entities correspond to relations. Rather than distributing whole relations, it is quite common to divide them into sub-relations, called fragments, which are then distributed. Thus the distribution design activity consists of two steps: fragmentation and allocation.
Physical Design Physical design is the last step in the design process; it maps the local conceptual schemas to the physical storage devices available at the corresponding sites. The inputs to this process are the local conceptual schemas and the access pattern information about the fragments in them.
Observation and Monitoring Design and development is an ongoing activity requiring constant monitoring and periodic adjustment and tuning. Here one monitors not only the behavior of the database implementation but also the suitability of the user views. The result is some form of feedback, which may lead to backing up to one of the earlier steps in the design.
2.1.2/ Bottom-Up Design Top-down design is a suitable approach when a database system is being designed from scratch. Commonly, however, a number of databases already exist, and the design task involves integrating them into one database. The bottom-up approach is suitable for this type of environment.
Design Approach The starting point of bottom-up design is the individual local conceptual schemas. The process consists of integrating the local schemas into the global conceptual schema. This type of environment exists primarily in the context of heterogeneous databases.
Distribution Design Issues: + Why fragment at all? + How to fragment? + How much to fragment? + How to test the correctness of a decomposition? + How to allocate? + What are the information requirements?
Unit of Fragmentation With respect to fragmentation, the important issue is the appropriate unit of distribution. A relation is not a suitable unit, for a number of reasons.
Relation Subsets First, application views are usually subsets of relations. Therefore, the locality of access of applications is defined not on entire relations but on their subsets. Hence, it is natural to consider subsets of relations as distribution units. If the applications whose views are defined on a given relation reside at different sites, two alternatives can be followed with the entire relation as the unit of distribution: either the relation is not replicated and is stored at only one site, or it is replicated at all or some of the sites where the applications reside.
Problem Areas The former results in an unnecessarily high volume of remote data accesses. The latter, on the other hand, involves unnecessary replication, which causes problems in executing updates and may not be desirable if storage is limited.
Advantages Fragments, each being treated as a unit, permit a number of transactions to execute concurrently. In addition, the fragmentation of relations typically results in the parallel execution of a single query by dividing it into a set of subqueries that operate on fragments. Thus fragmentation typically increases the level of concurrency and therefore the system throughput.
Disadvantages If the applications have conflicting requirements that prevent decomposition of the relation into mutually exclusive fragments, those applications whose views are defined on more than one fragment may suffer performance degradation: it might be necessary to retrieve data from two fragments and then take their union or their join, which is costly. Avoiding this is a fundamental fragmentation issue. The second problem is related to semantic data control, specifically to integrity checking. As a result of fragmentation, attributes participating in a dependency may be decomposed into different fragments which might be allocated to different sites. In this case, even the simple task of checking for dependencies would result in chasing after data at a number of sites.
Fragmentation Alternatives Relation instances are essentially tables, so the issue is finding alternative ways of dividing a table into smaller ones. There are clearly two alternatives: dividing horizontally or dividing vertically.
Degree of Fragmentation The extent to which the database should be fragmented is an important decision that affects the performance of query execution. The degree of fragmentation goes from one extreme, not fragmenting at all, to the other extreme, fragmenting to the level of individual tuples (in the case of horizontal fragmentation) or to the level of individual attributes (in the case of vertical fragmentation). What is needed is a suitable level of fragmentation that is a compromise between the two extremes. Such a level can only be defined with respect to the applications that will run on the database.
How Fragments are characterized with respect to a number of parameters; according to the values of these parameters, individual fragments can be identified.
Correctness of Fragmentation The following three rules, enforced during fragmentation, together ensure that the database does not undergo semantic change during fragmentation.
+ Completeness: This property, which is identical to the lossless decomposition property of normalization, is important in fragmentation since it ensures that the data in a global relation is mapped into fragments without any loss. If a relation instance R is decomposed into fragments R1, R2, …, Rn, each data item that can be found in R must also be found in one or more of the Ri's.
+ Reconstruction: If a relation R is decomposed into fragments R1, R2, …, Rn, then there should exist some relational operator ∇ such that R = ∇Ri, for all Ri ∈ FR. The operator will be different for different forms of fragmentation. The reconstructability of the relation from its fragments ensures that the constraints defined on the data in the form of dependencies are preserved.
+ Disjointness: If a relation R is decomposed into fragments R1, R2, …, Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j). This criterion ensures that horizontal fragments are disjoint. If relation R is vertically decomposed, its primary key attributes are typically repeated in all its fragments; therefore, in the case of vertical partitioning, disjointness is defined only on the non-primary-key attributes of a relation.
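These rules lend themselves to a mechanical check. The following minimal Python sketch, which assumes relations are modeled as lists of tuples (a representation chosen here purely for illustration), tests completeness, reconstruction by union (the reconstruction operator for horizontal fragments), and disjointness:

R = [(1, "a"), (2, "b"), (3, "c")]
fragments = [[(1, "a")], [(2, "b"), (3, "c")]]

def complete(R, fragments):
    # Completeness: every data item of R appears in at least one fragment.
    return all(any(t in f for f in fragments) for t in R)

def reconstructible(R, fragments):
    # Reconstruction: for horizontal fragments the operator is set union.
    return set(R) == set().union(*(set(f) for f in fragments))

def disjoint(fragments):
    # Disjointness: no tuple belongs to two different fragments.
    seen = set()
    for f in fragments:
        for t in f:
            if t in seen:
                return False
            seen.add(t)
    return True

print(complete(R, fragments), reconstructible(R, fragments), disjoint(fragments))
# -> True True True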
3/ Horizontal Fragmentations
There are two versions of horizontal partitioning: primary and derived. Primary horizontal partitioning of a relation is performed using predicates that are defined on that relation. Derived horizontal partitioning is the partitioning of a relation that results from predicates defined on another relation.
PROJ
PNO   PNAME             BUDGET   LOC
P1    Instrumentation   150000   Montreal
P2    Database Develop  135000   New York
P3    CAD/CAM           250000   New York
P4    Maintenance       310000   Paris
P5    CAD/CAM           500000   Boston
PROJ1 : Projects with budgets less than $200,000 PROJ2 : Projects with budgets greater than or equal to $200,000
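As a sketch of how these two fragments are obtained, the snippet below models PROJ as a list of dictionaries (an assumed layout, not part of the original design) and applies the two selection predicates:

# Sketch: primary horizontal fragmentation of PROJ on the BUDGET predicate.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",  "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM",          "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance",      "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM",          "BUDGET": 500000, "LOC": "Boston"},
]

# Each fragment is a selection (sigma) on PROJ defined by a simple predicate.
PROJ1 = [t for t in PROJ if t["BUDGET"] < 200000]   # budgets under $200,000
PROJ2 = [t for t in PROJ if t["BUDGET"] >= 200000]  # budgets $200,000 and up

print([t["PNO"] for t in PROJ1])  # ['P1', 'P2']
print([t["PNO"] for t in PROJ2])  # ['P3', 'P4', 'P5']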
3.2.1/ Quantitative Information In terms of quantitative information about user applications, two sets of data are required.
3.2.1.1/ Minterm Selectivity [sel(mi)]: the number of tuples of a relation that would be accessed by a user query specified by the minterm predicate mi. For example, the selectivity of m1 is 0, since there are no tuples in PROJ that satisfy that minterm predicate, while the selectivity of m2 is 2.
3.2.1.2/ Access Frequency [acc(qi)]: the frequency with which user applications access data. If Q = {q1, q2, …, qn} is a set of user queries, acc(qi) indicates the access frequency of query qi in a given period.
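The following small Python sketch shows how both quantities can be computed over the PROJ instance above. The two minterm predicates are hypothetical stand-ins, chosen only so that sel(m1) = 0 and sel(m2) = 2 as in the text, since the report does not list the original minterms:

# Sketch: computing minterm selectivity over PROJ (data layout assumed).
PROJ = [
    {"PNO": "P1", "BUDGET": 150000}, {"PNO": "P2", "BUDGET": 135000},
    {"PNO": "P3", "BUDGET": 250000}, {"PNO": "P4", "BUDGET": 310000},
    {"PNO": "P5", "BUDGET": 500000},
]

def sel(minterm):
    # sel(mi): number of tuples of the relation accessed by minterm mi.
    return sum(1 for t in PROJ if minterm(t))

def m1(t): return t["BUDGET"] < 100000                # hypothetical minterm
def m2(t): return 200000 <= t["BUDGET"] < 400000      # hypothetical minterm

print(sel(m1), sel(m2))  # -> 0 2

# acc(qi): access frequencies are simply recorded per query for a period.
acc = {"q1": 15, "q2": 50}  # hypothetical accesses per day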
PROJ1
PNO   PNAME            BUDGET   LOC
P1    Instrumentation  150000   Montreal

PROJ2
PNO   PNAME             BUDGET   LOC
P2    Database Develop  135000   New York
P3    CAD/CAM           250000   New York

PROJ3
PNO   PNAME        BUDGET   LOC
P4    Maintenance  310000   Paris
Example Given a link L1 where owner(L1) = PAY and member(L1) = EMP:
EMP1 = EMP ⋉ PAY1
EMP2 = EMP ⋉ PAY2
where
PAY1 = σ SAL ≤ 30000 (PAY)
PAY2 = σ SAL > 30000 (PAY)
EMP1
ENO   ENAME     TITLE
E3    A.Lee     Mech.Eng.
E4    J.Miller  Programmer
E7    R.Davis   Mech.Eng.

EMP2
ENO   ENAME     TITLE
E1    J.Doe     Elect.Eng.
E2    M.Smith   Syst.Anal.
E5    B.Casey   Syst.Anal.
E6    L.Chu     Elect.Eng.
E8    J.Jones   Syst.Anal.
Needed To carry out a derived horizontal fragmentation, three inputs are needed: the set of partitions of the owner relation, the member relation, and the set of semijoin predicates between the owner and the member (e.g., EMP.TITLE = PAY.TITLE).
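A minimal Python sketch of this construction follows. The PAY salary values and the EMP subset are assumed for illustration; only the SAL ≤ 30000 / SAL > 30000 split is given above:

# Sketch: derived horizontal fragmentation of EMP by semijoin with the
# partitions of PAY (data values assumed for illustration).
PAY = [
    {"TITLE": "Elect.Eng.", "SAL": 40000},
    {"TITLE": "Syst.Anal.", "SAL": 34000},
    {"TITLE": "Mech.Eng.",  "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
EMP = [
    {"ENO": "E1", "ENAME": "J.Doe",    "TITLE": "Elect.Eng."},
    {"ENO": "E3", "ENAME": "A.Lee",    "TITLE": "Mech.Eng."},
    {"ENO": "E4", "ENAME": "J.Miller", "TITLE": "Programmer"},
]

# Owner fragments: PAY1 = sigma(SAL <= 30000), PAY2 = sigma(SAL > 30000).
PAY1 = [t for t in PAY if t["SAL"] <= 30000]
PAY2 = [t for t in PAY if t["SAL"] > 30000]

def semijoin(member, owner, attr="TITLE"):
    # member ⋉ owner: member tuples whose join attribute matches the owner.
    keys = {t[attr] for t in owner}
    return [t for t in member if t[attr] in keys]

EMP1 = semijoin(EMP, PAY1)  # employees with low-salary titles
EMP2 = semijoin(EMP, PAY2)  # employees with high-salary titles
print([t["ENO"] for t in EMP1], [t["ENO"] for t in EMP2])  # ['E3', 'E4'] ['E1']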
6/ Minterm Fragments
A horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a minterm predicate mi. Given a set of minterm predicates M, there are as many horizontal fragments of relation R as there are minterm predicates. This set of horizontal fragments is also referred to as the set of minterm fragments.
7/ Vertical Fragmentations
PROJ
PNO   PNAME             BUDGET   LOC
P1    Instrumentation   150000   Montreal
P2    Database Develop  135000   New York
P3    CAD/CAM           250000   New York
P4    Maintenance       310000   Paris
P5    CAD/CAM           500000   Boston
[Figure: PROJ divided into two vertical fragments, each retaining the key column PNO (P1–P5)]
7.1/ Need
Vertical fragmentation of a relation R produces fragments R1, R2, …, Rr, each of which contains a subset of R's attributes as well as the primary key of R. The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the user applications will run on only one fragment. In this context, an optimal fragmentation is one that produces a fragmentation scheme which minimizes the execution time of the user applications that run on these fragments.
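A minimal sketch of such a partitioning, assuming PROJ is modeled as a list of dictionaries and assuming one plausible attribute grouping, PROJ1(PNO, BUDGET) and PROJ2(PNO, PNAME, LOC), chosen here only for illustration:

# Sketch: vertical fragmentation of PROJ; the primary key PNO is repeated
# in every fragment so the relation can be reconstructed by a join.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",  "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop", "BUDGET": 135000, "LOC": "New York"},
]

def project(relation, attrs):
    # pi_attrs(relation): keep only the listed attributes of every tuple.
    return [{a: t[a] for a in attrs} for t in relation]

PROJ1 = project(PROJ, ["PNO", "BUDGET"])
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])

# Reconstruction: join the fragments back on the primary key PNO.
rebuilt = [dict(u, **v) for u in PROJ1 for v in PROJ2 if u["PNO"] == v["PNO"]]
assert sorted(rebuilt, key=lambda t: t["PNO"]) == PROJ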
7.2/ Motivation
VF in the context of a design tool allows user queries to deal with smaller relations, thus causing a smaller number of page accesses. It has also been suggested that the most active sub-relations can be identified and placed in a faster memory subsystem where memory hierarchies are supported.
7.4.1/ Grouping Grouping starts by assigning each attribute to one fragment and, at each step, joins some of the fragments until some criterion is satisfied. This technique was first suggested for centralized databases and was later used for distributed databases.
7.4.2/ Splitting Splitting starts with a relation and decides on beneficial partitionings based on the access behavior of the applications to the attributes. Splitting is the preferred approach, as it fits more naturally within the top-down design methodology. Furthermore, splitting generates non-overlapping fragments, whereas grouping typically produces overlapping fragments. Of course, non-overlapping refers only to non-primary-key attributes.
8/ Hybrid Fragmentation
In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications. In such cases a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree-structured partitioning. Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.
9/ Allocation Alternatives
Assuming that the database is fragmented properly, one has to decide on the allocation of the fragments to the various sites on the network. When data is allocated, it may either be replicated or maintained as a single copy. There are two allocation alternatives:
9.1/ Non-replicated
Partitioned: each fragment resides at only one site
9.2/ Replicated
Fully Replicated: each fragment at each site Partially Replicated: each fragment at some of the sites
10.2/ Performance
The allocation strategy is designed to maintain a performance metric. The objective is to minimize response time and maximize the system throughput at each site.
10.3.3/ Site Information - unit cost of storing data at a site - unit cost of processing at a site 10.3.4/ Network Information - communication cost per frame between two sites - frame size
Example: A Simple SQL Query Consider an SQL query that increases the budget of the CAD/CAM projects by 10%:
UPDATE PROJ
SET BUDGET = BUDGET * 1.1
WHERE PNAME = 'CAD/CAM'
Assume BUDGET_UPDATE is the name of the transaction. In embedded SQL, the transaction can be structured as below:
Begin_transaction BUDGET_UPDATE
begin
  EXEC SQL UPDATE PROJ
           SET BUDGET = BUDGET * 1.1
           WHERE PNAME = 'CAD/CAM'
end.
Example: An airline database with the relations:
- FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
- CUST(CNAME, ADDR, BAL)
- FC(FNO, DATE, CNAME, SPECIAL)
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  EXEC SQL UPDATE FLIGHT
           SET STSOLD = STSOLD + 1
           WHERE FNO = flight_no AND DATE = date;
  EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
           VALUES (flight_no, date, customer_name, null);
  output("reservation completed")
end. {Reservation}
Unit of Computing With plain queries there is no concept of consistent execution or reliable computation associated with the notion of a query. Practical experience raises questions such as: What happens if two queries attempt to update the same data item concurrently? What happens when a system failure occurs during the execution of a query? Thus the concept of a transaction is used within the database domain as the basic unit of consistent and reliable computing. The further concepts a database specialist needs are given below.
2/Database Consistency
A database is in a consistent state if it obeys all of the consistency (integrity) constraints defined over it. State changes occur due to modifications, insertions, and deletions (together called updates). Ideally, the database should never enter an inconsistent state. An important point is that the database can be temporarily inconsistent during the execution of a transaction, but it must be consistent again when the transaction terminates.
3/ Transaction Consistency
Transaction consistency refers to the actions of concurrent transactions. The database should remain in a consistent state even if a number of user requests are concurrently accessing the database. A complication arises when replicated databases are considered.
4/ Replica Consistency
A replicated database is in a mutually consistent state if all the copies of every data item in it have identical values. This is called one-copy equivalence, since all replica copies are forced to assume the same state at the end of a transaction's execution.
5/ Reliability
Reliability refers both to the resiliency of a system to various system failures and to its capability to recover from them. A resilient system is tolerant of system failures and can continue to provide services even when failures occur. A recoverable DBMS is one that can get to a consistent state (by moving back or forward) following various types of failures.
Example:
Begin_transaction Reservation
begin
  input(flight_no, date, customer_name);
  EXEC SQL SELECT STSOLD, CAP
           INTO temp1, temp2
           FROM FLIGHT
           WHERE FNO = flight_no AND DATE = date;
  if temp1 = temp2 then
    output("no free seats");
    Abort
  else
    EXEC SQL UPDATE FLIGHT
             SET STSOLD = STSOLD + 1
             WHERE FNO = flight_no AND DATE = date;
    EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL)
             VALUES (flight_no, date, customer_name, null);
    Commit;
    output("reservation completed")
  end-if
end. {Reservation}
6/ Flat Transactions
Flat transactions have a single start point (Begin_transaction) and a single termination point (End_transaction). A flat transaction consists of a number of primitive operations embraced between the begin and end markers.
7/ Nested transaction
The operations of a transaction may themselves be transactions; that is, flat transactions may be nested inside one another:
Begin_transaction Reservation
  Begin_transaction Airline
  end. {Airline}
  Begin_transaction Hotel
  end. {Hotel}
end. {Reservation}
8/ Characterization of Transaction
The data items that a transaction reads are said to constitute its read set (RS). Similarly, the data items that a transaction writes are said to constitute its write set (WS). Finally, the union of the read set and the write set of a transaction constitutes its base set (BS = RS ∪ WS). Example:
RS[Reservation] = {FLIGHT.STSOLD, FLIGHT.CAP}
WS[Reservation] = {FLIGHT.STSOLD, FC.FNO, FC.DATE, FC.CNAME, FC.SPECIAL}
BS[Reservation] = {FLIGHT.STSOLD, FLIGHT.CAP, FC.FNO, FC.DATE, FC.CNAME, FC.SPECIAL}
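A small Python sketch of this characterization; the (operation, item) trace encoding is an assumption made here for illustration:

# Sketch: deriving RS, WS and BS from a transaction's operation trace.
trace = [
    ("read",  "FLIGHT.STSOLD"), ("read",  "FLIGHT.CAP"),
    ("write", "FLIGHT.STSOLD"), ("write", "FC.FNO"),
    ("write", "FC.DATE"), ("write", "FC.CNAME"), ("write", "FC.SPECIAL"),
]

RS = {item for op, item in trace if op == "read"}
WS = {item for op, item in trace if op == "write"}
BS = RS | WS   # base set = read set union write set

print(sorted(RS), sorted(WS), sorted(BS), sep="\n")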
9/ Properties of Transactions
The consistency and reliability aspects of transactions are due to four properties, commonly referred to as the ACID properties of transactions.
9.1/ Atomicity
Atomicity refers to the fact that a transaction is treated as a single unit of operation: either all of the transaction's actions are completed, or none of them are. This is also called the all-or-nothing property. Atomicity requires that, if the execution of a transaction is interrupted by any sort of failure, the DBMS is responsible for determining what to do with the transaction upon recovery from the failure. The transaction can either be completed by finishing the remaining actions, or it can be terminated by undoing all the actions that have already been executed.
9.1.1/ Failure Classification Generally there are two types of failure. First, a transaction may fail due to input data errors, deadlocks, and other factors; in this case either the transaction aborts itself, or the DBMS may abort it while handling deadlocks. Maintaining transaction atomicity in the presence of this type of failure is called transaction recovery.
9.1.2/ Crash Recovery Second, a transaction may fail because of system crashes, such as storage media failures, processor failures, communication link breakages, power outages, and so on. Ensuring transaction atomicity in the presence of system crashes is called crash recovery.
9.2/ Consistency
The consistency of a transaction is simply its correctness; in other words, a transaction is a correct program that maps one consistent database state to another. Transaction consistency is ensured by semantic data control and by concurrency control mechanisms.
9.2.1/ Consistency Classification This classification groups databases into four levels of consistency. It uses the concept of dirty data, which refers to data values that have been updated by a transaction prior to its commitment. Based on the concept of dirty data, the four consistency degrees are defined as follows:
+ Degree 0: Transaction T does not overwrite dirty data of other transactions.
+ Degree 1: Degree 0, and T does not commit any writes before EOT.
+ Degree 2: Degree 1, and T does not read dirty data from other transactions.
+ Degree 3: Degree 2, and other transactions do not dirty any data read by T before T completes.
9.3/ Isolation
Isolation is the property of transactions that requires each transaction to see a consistent database at all times; in other words, an executing transaction cannot reveal its results to other concurrent transactions before its commitment.
Isolation Levels Based on the phenomena defined below (dirty read, fuzzy read, phantom), the isolation levels are defined as:
+ Read Uncommitted: all three phenomena are possible.
+ Read Committed: fuzzy reads and phantoms are possible, but dirty reads are not.
+ Repeatable Read: only phantoms are possible.
+ Anomaly Serializable: none of the phenomena is possible.
9.3.1/ Serializability If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.
9.3.2/ Incomplete Results An incomplete transaction cannot reveal its results to other transactions before its commitment. This is necessary to avoid cascading aborts.
9.3.3/ Dirty Read A dirty read refers to a data item whose value has been modified by a transaction that has not yet committed: T1 modifies x, which is then read by T2 before T1 terminates; if T1 aborts, T2 has read a value that never existed in the database. Example:
…, W1(x), …, R2(x), …, C1 (or A1), …, C2 (or A2)    or
…, W1(x), …, R2(x), …, C2 (or A2), …, C1 (or A1)
9.3.4/ Non-repeatable or Fuzzy Read Transaction T1 reads the value of a data item. Another transaction T2 then modifies or deletes that data item and commits. If T1 then attempts to reread the data item, it either reads a different value or cannot find the data item at all; thus two reads within the same transaction T1 return different results.
Example: …, R1(x), …, W2(x), …, C2 (or A2), …, R1(x), …, C1 (or A1)
9.3.5/ Phantom The phantom condition occurs when T1 does a search with a predicate and T2 inserts new tuples that satisfy the predicate. Example:
…, R1(P), …, W2(y in P), …, C1 (or A1), …, C2 (or A2)    or
…, R1(P), …, W2(y in P), …, C2 (or A2), …, C1 (or A1)
9.4/ Durability
Durability refers to the property of transactions which ensures that once a transaction commits, its results are permanent and cannot be erased from the database. The DBMS therefore ensures that the results will survive subsequent system failures. The durability property brings forth the issue of database recovery: how to recover the database to a consistent state in which all committed actions are reflected.
10/ Transaction Model
[Figure: transaction model]
11/ Scheduler
(In Oracle, by loose analogy, the archived redo log records all activities that have been performed.) The scheduler is responsible for the implementation of a specific concurrency control algorithm; it synchronizes database access by coordinating the various database operations with the data processors.
2/ Key Issue
The level of concurrency, i.e., the number of concurrently executing transactions, is probably the most important parameter in distributed systems. Therefore, the concurrency control mechanism attempts to find a suitable trade-off between maintaining the consistency of the database and maintaining a high level of concurrency. Isolating transactions from one another in terms of their effects on the database is an important issue for a distributed DBMS: if the concurrent execution of transactions leaves the database in a state that could also be achieved by some serial execution, problems such as lost updates are avoided.
3.2/ Schedule
A schedule S is defined over a set of transactions T = {T1, T2, …, Tn} and specifies an interleaved order of execution of these transactions' operations. A (partial) schedule is a prefix of a complete schedule, containing only some of the operations and only some of the ordering relationships.
3.2.1/ Serial Schedule A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule.
3.2.2/ Serializable Schedule A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. Result equivalence: two schedules are result equivalent if they produce the same final state of the database; note, however, that two different schedules may accidentally produce the same final state.
3.3/ Problem
- Input: a schedule S created by a set of transactions T = {T1, T2, …, Tn}
- Output: determine whether S is serializable or non-serializable; if S is serializable, find a serial schedule that is equivalent to S.
Algorithm to Test a Schedule for Serializability by Lock This algorithm looks only at the Lock and Unlock operations, from which it constructs a precedence graph (also called a serialization graph): a directed graph G = (N, E) that consists of a set of nodes N = {T1, T2, …, Tn} and a set of directed edges E = {e1, e2, …, em}. The edges can optionally be labeled by the name of the data item that led to creating the edge.
3.3.1/ Lock All transactions indicate their intentions by requesting locks from the scheduler (called the lock manager). Locks are either read locks (rl) [also called shared locks] or write locks (wl) [also called exclusive locks]. Locking allows concurrent processing of transactions because: a transaction locks an object before using it; when an object is locked by another transaction, the requesting transaction must wait; and once a transaction releases a lock, it may not request another lock. Note: read locks and write locks conflict, because Read and Write operations are incompatible:
            Read Lock   Write Lock
Read Lock   yes         no
Write Lock  no          no
A cycle in a directed graph is a sequence of edges C = (T1 → T2 → … → Tn−1 → Tn → T1) with the property that the starting node of each edge, except the first, is the ending node of the previous edge, and the starting node of the first edge is the ending node of the last edge.
3.3.3/ Linear Order or Topological Order Repeatedly look in G for a node Ti that is not preceded by any directed edge (i.e., no edge points toward Ti); delete Ti and the edges leaving it from G, and append Ti to the order. Continue until no node is left. The resulting order is the linear, or topological, order. Consider the schedule S below and determine whether it is serializable or not.
Algorithm to Test a Schedule for Serializability by RLock, WLock
- Input: a schedule S created by a set of transactions T = {T1, T2, …, Tn}
- Output: determine whether S is serializable or non-serializable; if S is serializable, find a serial schedule that is equivalent to S.
There are some differences in how the edges are determined: if transaction Ti in schedule S executes RLock(X) or WLock(X), and Tj (j ≠ i) is the next transaction that executes WLock(X), create an edge Ti → Tj in the precedence graph. If transaction Ti executes WLock(X) in schedule S, and transaction Tm (m ≠ i) executes RLock(X) after Ti executes Unlock(X) but before any other transaction executes WLock(X), create an edge Ti → Tm in the precedence graph. If G has a cycle, schedule S is non-serializable; otherwise S is serializable, and its linear (topological) order gives an equivalent serial schedule. Consider the schedule S below and determine whether it is serializable or not; a sketch of the whole test follows.
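The following minimal Python sketch illustrates the lock-based test. It assumes a schedule encoded as a list of (transaction, action, item) triples; the sample schedule is invented, since the original figure is not reproduced here:

# Sketch: build a precedence graph from lock operations, then test it.
schedule = [
    ("T1", "wlock", "x"), ("T1", "unlock", "x"),
    ("T2", "wlock", "x"), ("T2", "unlock", "x"),
]

def precedence_graph(schedule):
    edges = set()
    for i, (ti, act, item) in enumerate(schedule):
        if act not in ("rlock", "wlock"):
            continue
        # The next transaction to write-lock the same item must follow Ti.
        for tj, act2, item2 in schedule[i + 1:]:
            if item2 == item and act2 == "wlock" and tj != ti:
                edges.add((ti, tj))
                break
    return edges

def topo_order(nodes, edges):
    # Return a serial order if the graph is acyclic, else None.
    order, remaining, edges = [], set(nodes), set(edges)
    while remaining:
        free = [n for n in remaining
                if not any(e[1] == n for e in edges)]  # no incoming edge
        if not free:
            return None  # a cycle remains: the schedule is non-serializable
        n = free[0]
        remaining.discard(n)
        edges = {e for e in edges if e[0] != n}
        order.append(n)
    return order

edges = precedence_graph(schedule)
print(edges)                             # {('T1', 'T2')}
print(topo_order({"T1", "T2"}, edges))   # ['T1', 'T2'] -> serializable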
4/ Taxonomy
There are a number of ways that concurrency control approaches can be classified. Broadly, the mechanisms fall into two classes: pessimistic and optimistic approaches.
4.2.1/ Lock-Based Approach In the locking-based approach, the synchronization of transactions is achieved by employing physical or logical locks on some portion, or granule, of the database. The size of these portions (the locking granularity) is an important issue. There are three variants of the lock-based approach:
4.2.1.1/ Centralized Locking In centralized locking, one of the sites in the network is designated as the primary site, where the lock tables for the entire database are stored. Note: this site is charged with the responsibility of granting locks to transactions.
4.2.1.2/ Primary Copy Locking In primary copy locking, one of the copies of each lock unit is designated as the primary copy, and it is this copy that has to be locked for the purpose of accessing data. If the database is not replicated, primary copy locking distributes the lock management among all the sites.
4.2.1.3/ Decentralized Locking In decentralized locking, the lock management duty is shared by all the sites of the network. In this case the execution of a transaction involves the participation and coordination of schedulers at more than one site. Each local scheduler is responsible for the lock units local to its site.
4.2.2/ Timestamp Ordering The timestamp ordering (TO) class involves organizing the execution order of transactions so that they maintain mutual and internal consistency. This ordering is maintained by assigning timestamps to both the transactions and the data items. The various types of timestamp ordering algorithms are: basic timestamp ordering, multiversion timestamp ordering, conservative timestamp ordering, and hybrid algorithms.
The lock manager then checks whether the lock unit that contains the data item is already locked. If so, and if the existing lock mode is incompatible with that requested by the current transaction, the current operation is delayed. Otherwise, the lock is set in the desired mode and the database operation is passed on to the data processor for actual database access. The transaction manager is then informed of the result of the operation. The termination of a transaction results in the release of its locks and in the initiation of another transaction that might be waiting for access to the same data item.
5.4/ Requisites
The lock manager has to know that the transaction has obtained all its locks and will not need to lock another data item; it also needs to know that the transaction no longer needs to access a given data item, so that its lock can be released.
6/ Deadlock
A transaction is deadlocked if it is blocked and will remain blocked until there is outside intervention. Any locking-based concurrency control algorithm may result in deadlocks, since there is mutual exclusion of access to shared resources (data) and transactions may wait on locks; some TO-based algorithms that require transactions to wait may also cause deadlocks. For instance, if transaction Ti waits for another transaction Tj to release a lock on an entity, an edge Ti → Tj appears in the wait-for graph (WFG).
7/ Why Deadlocks
Deadlock is a permanent phenomenon: if one exists in a system, it will not go away without outside intervention. A deadlock can occur because transactions wait for one another; informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism. The outside intervention may come from the user, the system operator, or the software system (the operating system or the distributed DBMS).
8/ Methods
There are three known methods for handling deadlocks: prevention, avoidance, and detection and resolution.
1/ Wait-Die Rule If Ti requests a lock on a data item that is already locked by Tj, Ti is permitted to wait if and only if Ti is older than Tj. If Ti is younger than Tj, then Ti is aborted (dies) and is restarted with the same timestamp.
begin
  Ti requests lock on data item currently held by Tj
  if ts(Ti) < ts(Tj) (Ti is older than Tj) then
    Ti waits for Tj
  else
    Ti dies (is rolled back)
  end-if
end
2/ Wound-Wait Rule If Ti requests a lock on a data item that is already locked by Tj, then Ti is permitted to wait if and only if it is younger than Tj. Otherwise, Tj is aborted (wounded) and the lock is granted to Ti.
begin
  Ti requests lock on data item currently held by Tj
  if ts(Ti) < ts(Tj) (Ti is older than Tj) then
    Tj is wounded (rolled back)
  else
    Ti waits for Tj
  end-if
end
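A minimal Python sketch of the two rules, where a smaller timestamp means an older transaction:

def wait_die(ts_requester, ts_holder):
    # Ti requests a lock held by Tj.
    if ts_requester < ts_holder:
        return "wait"   # Ti is older: it is allowed to wait
    return "die"        # Ti is younger: abort Ti, restart with same timestamp

def wound_wait(ts_requester, ts_holder):
    if ts_requester < ts_holder:
        return "wound"  # Ti is older: abort (wound) Tj, grant the lock to Ti
    return "wait"       # Ti is younger: it is allowed to wait

print(wait_die(1, 2), wait_die(2, 1))      # wait die
print(wound_wait(1, 2), wound_wait(2, 1))  # wound wait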
8.3.3/ Deadlock Detection There are three fundamental methods of detecting distributed deadlocks:
1- Centralized Deadlock Detection In the centralized approach, one site is designated as the deadlock detector for the entire system. Periodically, each lock manager transmits its local WFG (LWFG) to the deadlock detector, which then forms the global WFG (GWFG) and looks for cycles in it. Centralized deadlock detection has been proposed for distributed INGRES; this method is simple and would be a natural choice if the concurrency control algorithm were centralized 2PL.
2- Hierarchical Deadlock Detection An alternative is to build a hierarchy of deadlock detectors. Deadlocks that are local to a single site are detected at that site using its local WFG. Each site also sends its local WFG to the deadlock detector at the next level; thus, a distributed deadlock involving two or more sites is detected by the lowest-level deadlock detector that has control over these sites. The hierarchical method reduces the dependence on a central site, thereby reducing communication cost. Note, however, that it is more complicated to implement and involves nontrivial modifications to the lock and transaction manager algorithms.
3- Distributed Deadlock Detection Distributed deadlock detection algorithms delegate the responsibility of detecting deadlocks to the individual sites: there is a local deadlock detector at each site, and these communicate their local WFGs with one another.
9/ Methodology
The local WFG at each site is formed and then modified as follows: since each site receives the potential deadlock cycles from other sites, those edges are added to its local WFG; the edges in the local WFG showing that local transactions are waiting for transactions at other sites are joined with the edges showing that remote transactions are waiting for local ones.
10/ Detection
Local deadlock detectors look for two things: (1) if there is a cycle that does not include the external edges, there is a local deadlock that can be handled locally; (2) if there is a cycle involving the external edges, there is a potential distributed deadlock, and the cycle information has to be communicated to the other deadlock detectors.
Solution
Let the path that has the potential of causing a distributed deadlock in the local WFG of a site be Ti → Tj. A local deadlock detector forwards the cycle information only if ts(Ti) < ts(Tj). This reduces the average number of message transmissions by one half.
1/ Fundamental Definitions
The problem considered here is how to maintain the atomicity and durability of transactions in the face of failures.
1.1/ Reliability
Reliability is a measure of the success with which a system conforms to some authoritative specification of its behavior: the probability that the system has not experienced any failures within a given time period. It is typically used to describe systems that cannot be repaired, or where the continuous operation of the system is critical.
1.2/ Availability
Availability is the fraction of the time that a system meets its specification: the probability that the system is operational at a given time t.
1.3/ Failure
A failure is a deviation of a system from the behavior described in its specification. There are four main types of failures:
1.3.1/ Transaction Failures Mostly, this failure occurs when a transaction aborts, for example due to deadlock; reliability studies report that around 3% of transactions abort abnormally.
1.3.2/ System Failures These are typically failures of the processor, main memory, or power supply; main memory contents are lost, but secondary storage contents remain safe.
1.3.3/ Media Failures These are failures of secondary storage devices such that the stored data is lost, for example a head crash or a controller failure.
1.3.4/ Communication Failures These are network failures, generally lost or undeliverable messages, or network partitioning.
[Figure: 2PC state transition diagrams for the coordinator and the participants]
Observations 1. A participant can unilaterally abort before it answers "yes". 2. Once a participant answers "yes", it must be prepared to commit and cannot change its vote. 3. While a participant is READY, it can move either to abort or to commit, depending on the decision of the coordinator. 4. The global termination decision is commit if all participants vote "yes", and abort if any participant votes "no". 5. The coordinator and the participants may be left in a waiting state; a time-out method can be used to exit.
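Observation 4 is the heart of the protocol. A one-function Python sketch of the coordinator's global decision rule:

def global_decision(votes):
    # votes: the "yes"/"no" answers collected in the voting phase.
    # Commit only if every participant voted "yes"; any "no" aborts.
    return "commit" if all(v == "yes" for v in votes) else "abort"

print(global_decision(["yes", "yes", "yes"]))  # commit
print(global_decision(["yes", "no", "yes"]))   # abort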
+ Participants: on a timeout in the INITIAL state, the coordinator must have failed in its INITIAL state, so the participant unilaterally aborts; on a timeout in the READY state, the participant stays blocked.
5/ Three-Phase Commit
3PC is non-blocking. A commit protocol is non-blocking if it is synchronous within one state transition and its state transition diagram contains no state that is adjacent to both a commit and an abort state, and no non-committable state that is adjacent to a commit state. Adjacent: it is possible to go from one state to the other with a single state transition. Committable: all sites have voted to commit the transaction (e.g., the COMMIT state).
7/ Network Partitioning
A simple modification of the ROWA rule handles partitioning: when the replica control protocol attempts to read or write a data item, it first checks whether a majority of the sites are in the same partition as the site on which the protocol is running (by counting their votes). If so, it executes the ROWA rule within that partition. This assumes that failures are clean, which means that failures which change the network's topology are detected by all sites instantaneously, and that each site has a view of the network consisting of all the sites it can communicate with. A sketch of the majority test follows.
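A minimal sketch of the majority test, with sites and partitions modeled as plain sets and one vote per site (both assumptions made for illustration):

def in_majority_partition(my_partition, all_sites):
    # Proceed with ROWA inside this partition only if it holds a
    # majority of the sites' votes (one vote per site here).
    return len(my_partition) > len(all_sites) / 2

all_sites = {"S1", "S2", "S3", "S4", "S5"}
print(in_majority_partition({"S1", "S2", "S3"}, all_sites))  # True
print(in_majority_partition({"S4", "S5"}, all_sites))        # False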
8/ Open Problems
Replication protocols: experimental validation; replication of computation and communication.
Transaction models: changing requirements — cooperative sharing vs. competitive sharing, interactive transactions, longer durations, complex operations on complex data, relaxed semantics, non-serializable correctness criteria.
Logging The log contains the information used by the recovery process to restore the consistency of the system. This information may include: the transaction identifier; the type of operation (action); the items accessed by the transaction to perform the action; the old value (state) of the item (the before image); and the new value (state) of the item (the after image).
9.1.1/ REDO Protocol REDOing an action means performing it again. The REDO operation uses the log information and performs the action that might have been done before, or that was not done due to a failure; it generates the new image.
9.1.2/ UNDO Protocol UNDOing an action means restoring the object to its before image. The UNDO operation uses the log information and restores the old value of the object.
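A small Python sketch of the two operations, assuming log records carry before and after images in the layout shown (an illustrative format, not a real DBMS log):

db = {"x": 10}
log = [{"txn": "T1", "item": "x", "before": 10, "after": 42}]

def redo(db, record):
    # REDO re-installs the after image (idempotent: safe to repeat on crash).
    db[record["item"]] = record["after"]

def undo(db, record):
    # UNDO restores the before image.
    db[record["item"]] = record["before"]

redo(db, log[0]); print(db)  # {'x': 42}
undo(db, log[0]); print(db)  # {'x': 10}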
9.2.1/ Note If the system crashes before a transaction is committed, then all its operations must be undone; for this, only the before images are needed (the undo portion of the log). Once a transaction is committed, some of its actions might have to be redone; for this, the after images are needed (the redo portion of the log).
9.2.2/ WAL Protocol Before the stable database is updated, the undo portion of the log must be written to the stable log. When a transaction commits, the redo portion of the log must be written to the stable log prior to the updating of the stable database.
9.2.3/ Logging Interface
11.1/ No-Fix/No-Flush
Abort: the buffer manager may have written some of the updated pages into the stable database; the LRM performs transaction undo (or partial undo).
Commit: the LRM writes an end_of_transaction record into the log.
Recover: for those transactions that have both a begin_transaction and an end_of_transaction record in the log, the LRM initiates a partial redo; for those transactions that have only a begin_transaction record in the log, the LRM executes a global undo.
11.2/ No-Fix/Flush
Abort: the buffer manager may have written some of the updated pages into the stable database; the LRM performs transaction undo (or partial undo).
Commit: the LRM issues a flush command to the buffer manager for all updated pages, then writes an end_of_transaction record into the log.
Recover: no redo is needed; perform global undo.
11.3/ Fix/No-Flush
Abort: none of the updated pages has been written into the stable database; release the fixed pages.
Commit: the LRM writes an end_of_transaction record into the log, then sends an unfix command to the buffer manager for all pages that were previously fixed.
Recover: perform partial redo; no global undo is needed.
11.4/ Fix/Flush
Abort: none of the updated pages has been written into the stable database; release the fixed pages.
Commit (the following have to be done atomically): the LRM issues a flush command to the buffer manager for all updated pages, sends an unfix command to the buffer manager for all pages that were previously fixed, and writes an end_of_transaction record into the log.
Recover: nothing needs to be done.
12/ Checkpoints
Checkpoints simplify the task of determining which actions of which transactions need to be undone or redone when a failure occurs. A checkpoint record contains a list of the active transactions. The steps are: write a begin_checkpoint record into the log; collect the checkpoint data into stable storage; write an end_checkpoint record into the log.
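A minimal sketch of these three steps, with the log modeled as an in-memory list and a hypothetical set of active transactions:

log = []
active = ["T7", "T9"]   # hypothetical active transactions

def checkpoint(log, active):
    log.append(("begin_checkpoint", list(active)))  # step 1
    # step 2: collect the checkpoint data into stable storage (elided here)
    log.append(("end_checkpoint",))                 # step 3

checkpoint(log, active)
print(log)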
Memory Components
Shared/System Global Area (SGA) — shared memory made up of the following components:
1. Shared pool: its size is maintained by an initialization parameter called shared_pool_size. The larger the shared pool, the better the performance will be; it is divided into the library cache and the data dictionary cache. Note: correct configuration of the shared pool affects performance.
2. Buffer cache (db_cache_size): basically made up of buffers. The parameters that determine it are db_block_size (e.g., 8K) and db_cache_size (e.g., 80 MB).
3. Log buffer: log_buffer
4. Java pool: java_pool_size
5. Streams pool: streams_pool_size; used for Oracle Streams.
6. Redo log buffer: used for instance recovery.
There are five mandatory background processes; if any of them is killed or goes down, the Oracle instance stops working:
1. DBWR (database writer)
2. LGWR (log writer)
3. SMON (system monitor)
4. PMON (process monitor)
5. CKPT (checkpoint)
Note: Tablespaces consist of one or more data files; a data file belongs to only one tablespace. The SYSTEM and SYSAUX tablespaces are mandatory: they are created at the time of database creation and must be online. The SYSTEM tablespace is used for core functionality (for example, the data dictionary tables), while the auxiliary SYSAUX tablespace is used for additional database components (such as the Enterprise Manager repository). Segments exist within a tablespace; segments are made up of a collection of extents; extents are collections of data blocks; and data blocks are mapped to disk blocks.
Memory structures:
o System Global Area (SGA): database buffer cache, redo log buffer, and various pools
o Program Global Area (PGA)
Process structures:
o User process and server process
o Background processes: SMON, PMON, DBWn, CKPT, LGWR, ARCn, and so on
Storage structures:
o Logical: database, schema, tablespace, segment, extent, and Oracle block
o Physical: files for data, parameters, and redo, and OS blocks
[Figure: storage hierarchy — a segment is made up of extents, extents of data blocks, and data blocks map to disk blocks]
3/ Database Planning
As a DBA, you must plan: The logical storage structure of the database and its physical implementation: o How many disk drives do you have for this? o How many data files will you need? (Plan for growth.) o How many tablespaces will you use? o Which type of information will be stored? o Are there any special storage requirements due to type or size? The overall database design A backup strategy for the database
Dynamic Performance Views These views are owned by the SYS user. Different views are available at different times: after the instance has been started, after the database is mounted, and after the database is open. You can query V$FIXED_TABLE to see all the view names. These views are often referred to as v-dollar views; read consistency is not guaranteed on them because the data is dynamic. In Oracle, SQL*Plus and iSQL*Plus provide additional interfaces to the database for performing database management operations and for executing SQL commands to query, insert, update, and delete data. SQL*Plus is a command-line tool used interactively or in batch mode; iSQL*Plus is not a command-line tool but a web-based interface.
After creating a database, an instance, and a listener, the sysdba user has to create the database owner user. First, create an initialization parameter file (pfile). The following are the steps to create a database manually on Windows:
1. create folders for the database: 1. data1 (disk 1) 2. data2 (disk 2) 3. cdump (core dump) 4. bdump (background dump) 5. udump (user dump) 6. backup (for backup and recovery)
2. create the service (oradim) — only on Windows
3. configure the pfile
4. start up the instance
5. create the database
6. create the data dictionary views (@oracle_home\rdbms\admin\catalog.sql) — as sysdba
7. create the built-in PL/SQL packages (@oracle_home\rdbms\admin\catproc.sql) — as sysdba
8. create the user profile information (@oracle_home\sqlplus\admin\pupbld.sql) — as system/manager
Initialization Parameter Files Example: create a sample pfile, initdb1.ora:
db_name = db1 shared_pool_size = 100m db_cache_size = 120m log_buffer = 5000000 background_dump_dest = d:\bdump user_dump_dest = d:\udump core_dump_dest = d:\cdump control_files = d:\data1\control01.ctl
[Example server parameter file name: spfileorcl.ora]
Example steps:
* switch database, since there are many Oracle instances running
C:\>set oracle_sid=dba1
C:\>sqlplus / as sysdba
* start up the instance only, no mount or open
sql>startup nomount
>> this gives an error, since it cannot locate initdba1.ora
sql>startup nomount pfile=d:\data1\initdba1.ora
>>> successful; then create a database manually
sql>create database dba1
  2> datafile 'd:\data1\system01.dbf' size 200m
  3> logfile group 1 'd:\data1\log1a.rdo' size 5m,
  4>         group 2 'd:\data1\log2a.rdo' size 5m
  5> sysaux datafile 'd:\data2\sysaux01.dbf' size 80m;
>>> Database created.
*** Note: the larger the sizes, the longer it takes to create the database, because the space must be allocated on the physical disk.
sql>select name from v$database;
NAME
--------
DBA1
sql>show parameter shared_pool;
* oracle_home\rdbms\admin\catalog.sql
sql>@d:\oracle\product\10.2.0\db_1\rdbms\admin\catalog.sql
* oracle_home\rdbms\admin\catproc.sql
sql>@d:\oracle\product\10.2.0\db_1\rdbms\admin\catproc.sql
* run the user profile information script, connected as system
sql>conn system/manager
sql>@d:\oracle\product\10.2.0\db_1\sqlplus\admin\pupbld.sql
sql>select group#, member from v$logfile;
*** note: a member is a mirror copy of a log file
sql>alter database add logfile member
  2>'d:\data2\log1b.rdo' to group 1,
  3>'d:\log2b.rdo' to group 2;
>>> Database altered.
sql>select group#, member from v$logfile;
* switch the log writer to a different group, because we need to drop a member of the current one
sql>alter system switch logfile;
sql>alter database drop logfile member 'd:\log2b.rdo';
sql>alter database add logfile member
  2>'d:\data2\log2b.rdo' to group 2;
sql>select group#, member from v$logfile;
RESULT >>>
GROUP#  MEMBER
1       D:\DATA1\LOG1A.RDO
2       D:\DATA1\LOG2A.RDO
1       D:\DATA2\LOG1B.RDO
2       D:\DATA2\LOG2B.RDO
>> this shows that the log files are mirrored to another location, D:\data2
* switch to sysdba to shut down and start up the instance and database
sql>conn / as sysdba
sql>shutdown immediate
* create another control file copy from d:\data1 in d:\data2 (copy and paste)
* modify the 'control_files' parameter in initdba1.ora:
control_files=d:\data1\control01.ctl, d:\data2\control02.ctl
* start up the instance and the database using 'startup' alone
sql>startup
>>> Oracle instance started.
sql>select name from v$controlfile;
>>> NAME
d:\data1\control01.ctl
d:\data2\control02.ctl
*** Note: the control files are identical copies, so their sizes must be exactly the same; otherwise there is a problem.
[Figure: instance startup stages (STARTUP → MOUNT → OPEN); at OPEN, all files are opened as described by the control file for this instance]
Note: after an instance startup that follows a 'shutdown abort', Oracle performs instance recovery automatically; the redo information (log file contents) is used to write the committed, logged transactions into the data files.
[Figure: a database consists of tablespaces, which are made up of data files]
ASM Concept
[Figure: ASM sits between the database and the operating system — the database maps to ASM disk groups containing ASM files; tablespaces map to data files, segments to extents, and allocation units to physical blocks]
5/ Database concurrency
As time and space in this paper are limited, I will not cover how to create users, roles, and privileges; instead I would like to say more about Oracle concurrency. If you want to know more about these topics, please contact me: [email protected].
5.1/ PL/SQL
There are many types of PL/SQL database objects: package, package body, type body, procedure, function, and trigger. Oracle's Procedural Language extension to SQL (PL/SQL) is a fourth-generation programming language (4GL). It provides: procedural extensions to SQL; portability across platforms and products; a higher level of security and data integrity protection; and support for object-oriented programming.
5.2/ Locks
Locks prevent multiple sessions from changing the same data at the same time. They are automatically obtained at the lowest possible level for a given statement. They do not escalate.
Example 1: two sessions update the same row. Transaction 2 must wait until Transaction 1 commits or rolls back, because both need the row lock on employee 100.
Transaction 1
SQL> UPDATE employees
  2  SET salary=salary+100
  3  WHERE employee_id=100;
Transaction 2
SQL> UPDATE employees
  2  SET salary=salary*1.1
  3  WHERE employee_id=100;
Example 2: the sessions update different rows (employees 100 and 101), so the row locks do not conflict and both transactions proceed.
Transaction 1
SQL> UPDATE employees
  2  SET salary=salary+100
  3  WHERE employee_id=100;
Transaction 2
SQL> UPDATE employees
  2  SET salary=salary*1.1
  3  WHERE employee_id=101;
6/ Database Reliability
A secure system ensures the confidentiality of the data that it contains. There are several aspects of security: Restricting access to data and services Authenticating users Monitoring for suspicious activity
Restrict the directories accessible by users. Limit users with administrative privileges. Restrict remote database authentication:
REMOTE_OS_AUTHENT=FALSE
For Oracle databases, Enterprise Manager uses Recovery Manager (RMAN) to perform backup and recovery operations. RMAN is a command-line client for advanced functions; it has a powerful control and scripting language, has a published API that enables interfacing with most popular backup software, backs up files to disk or tape, and backs up data files, control files, archived log files, and server parameter files. After the instance is open, it fails in the case of the loss of any control file, of a data file belonging to the SYSTEM or undo tablespaces, or of an entire redo log group; as long as at least one member of a group is available, the instance remains open. Starting with Oracle 9i, the Flashback technology is a revolutionary advance in recovery. Traditional recovery techniques are slow: the entire database or a whole file (not just the incorrect data) has to be restored, and every change in the database log must be examined. Flashback is fast: changes are indexed by row and by transaction, and only the changed data is restored. Flashback commands are easy: no complex multiple-step procedures are involved. Flashback Database brings the database to an earlier point in time by undoing all changes made since that time. Flashback Table recovers a table to a point in time in the past without having to restore from a backup. Flashback Drop restores accidentally dropped tables.
Note: If a control file is lost or corrupted, the instance normally aborts, at which time you must perform the following steps: 1. Shut down the instance, if it is still open. 2. Restore the missing control file by copying an existing control file. 3. Start the instance. If a member of a redo log file group is lost, as long as the group still has at least one member, then: 1. Normal operation of the instance is not affected 2. You receive a message in the alert log notifying you that a member cannot be found. 3. You can restore the missing log file by copying one of the remaining files from the same group.
If the database is in NOARCHIVELOG mode, and any data file is lost, perform the following tasks: 1. Shut down the instance if it is not already down. 2. Restore the entire database, including all data and control files, from the backup. 3. Open the database. 4. Have users reenter all changes made since the last backup. If a data file is lost or corrupted, and that file does not belong to the SYSTEM or UNDO tablespace, then restore and recover the missing data file. If a data file is lost or corrupted, and that file belongs to the SYSTEM or UNDO tablespace: 1. The instance may or may not shut down automatically. If it does not, use SHUTDOWN ABORT to bring the instance down. 2. Mount the database 3. Restore and recover the missing data file 4. Open the database
[Figure: an incoming connection request reaches the listener, which performs names resolution]
Commands from the listener control utility can be issued from the command line or from the LSNRCTL prompt. UNIX or Linux command-line syntax:
$ lsnrctl <command name> $ lsnrctl start $ lsnrctl status
Prompt syntax:
LSNRCTL> <command name> LSNRCTL> start LSNRCTL> status
Oracle Net supports several methods of resolving connection information. Easy connect naming uses a TCP/IP connect string, requires no client-side configuration, and is enabled by default; however, it offers no support for advanced connection options such as connect-time failover, source routing, and load balancing.
SQL> CONNECT hr/[email protected]:1521/dba10g
Local naming: uses a local configuration file, which requires a client-side names resolution file; it supports all Oracle Net protocols and the advanced connection options.
SQL> CONNECT hr/hr@orcl
Directory naming: uses a centralized LDAP-compliant directory server; it supports all Oracle Net protocols and the advanced connection options, and it requires an LDAP directory loaded with Oracle Net names resolution information: o Oracle Internet Directory o Microsoft Active Directory Services
External naming: uses a supported non-Oracle naming service, including: o Network Information Service (NIS) external naming o Distributed Computing Environment (DCE) Cell Directory Services (CDS)
8/ Database performance
In Oracle, tuning advisors point out problems such as complicated SQL structure, data access issues, and missing indexes; acting on this advice improves database performance.
[Figure: the Automatic Tuning Optimizer checks statistics and the optimization mode; performance problems include memory allocation issues, input/output device contention, resource contention, application code problems, and network bottlenecks, which the DBA addresses, for example by restructuring SQL]
IV/ SUMMARY
The preceding discussion shows students and newcomers to software development what matters when building applications on a distributed database system: efficiency, flexibility, availability, reliability, incremental growth, and a powerful database engine. Moreover, such a system can be shared, acting as a real-time, replicable multi-database system.
V/ APPLY
A distributed database system is a cutting-edge form of business computerization that can carry out, manage, control, evaluate, and recover business transactions all around the world. Developers and students are introduced to interoperable, distributed data processing architectures associated with access to heterogeneous data sources, and the traditional distribution issues are addressed in the context of relational database systems, e.g., distributed query processing and distributed database design. Hence, this subject can be applied in many fields of human society, in order to: - provide a strong foundation for addressing the issues of distributed database processing; - understand the essentials of reliability and concurrency control in database systems; - meet the requirements of both centralization and decentralization of distributed databases; - communicate well at a lower cost.
VI/ CONCLUSION
In my point of view, a distributed database system is not only a new computer technology that supports business management, evaluation, and decision making; it also helps keep data manageable in the event of crashes. Moreover, it will be a good contribution to the development of the community. Hence, I would like to recommend to the dean of the Royal University of Phnom Penh and to the dean of the Computer Science department that this course be expanded, to give students more opportunities to apply it in real experiments rather than studying its theory alone.
91
VII/ REFERENCES
This report was prepared with the following references:
- Lecture slides by Pok Leakmony
- Oracle Database 10g: Administration Workshop I (Oracle course material)
- M. T. Özsu and P. Valduriez, Principles of Distributed Database Systems (2nd edition), Prentice-Hall, ISBN 0-13-659707-6
- R. Elmasri and S. B. Navathe, Fundamentals of Database Systems (3rd edition), Addison-Wesley Longman, ISBN 0-201-54263-3