Database Performance Tuning and Optimization Using Oracle PDF
Includes CD-ROM
Sitansu S. Mittra
Senior Principal Engineer and Database Management Specialist
Computer Sciences Corporation
5 Cambridge Center
Cambridge, MA 02139
USA
mittra4@aol.com
ACM Computing Classification (1998): C.4, E.1, E.2, H.2, H.3, H.4
www.springer-ny.com
Disclaimer:
This eBook does not include the ancillary media that was
packaged with the original printed version of the book.
Preface
Scope
The book provides comprehensive coverage of database performance tuning and optimi-
zation using Oracle 8i as the RDBMS. The chapters contain both theoretical discussions
dealing with principles and methodology as well as actual SQL scripts to implement the
methodology. The book combines theory with practice so as to make it useful for DBAs
and developers irrespective of whether they use Oracle 8i. Readers who do not use Oracle
8i can implement the principles via scripts of their own written for the particular RDBMS
they use. I have tested each script for accuracy and have included the sample outputs
generated from them.
An operational database has three levels: conceptual, internal, and external. The con-
ceptual level results from data modeling and logical database design. When it is imple-
mented via an RDBMS such as Oracle, it is mapped onto the internal level. Database ob-
jects of the conceptual level are associated with their physical counterparts in the internal
level. An external level results from a query against the database and, as such, provides a
window to the database. There are many external levels for a single conceptual level.
The performance of an OLTP database is measured by the response times of the data-
base transactions. The response time depends on the efficiency of all three levels. A
query on a well-designed conceptual level can run slowly if the SQL formulation of the
query is poorly done, or if the database objects involved in the query are fragmented, or if
a table used by the query has excessive chaining. Likewise, a well-formulated query can
run slowly if the design of the conceptual level is bad. Such examples abound in database
applications. The book addresses each level separately by focusing first on the underlying
principles and root causes of problems and then offering solutions, both on a theoretical
level and with Oracle SQL scripts with sample outputs.
Even if all three levels of a database are kept properly tuned, its performance may
suffer due to other factors. For example, the CPU may remain pegged at or very near
to 100%, the memory may be inadequate causing excessive paging and swapping bor-
dering on thrashing, disk controllers may be inefficient, etc. These factors are outside the
realm of the database and, therefore, are not treated in this book. Some discussion of
tuning the CPU and the memory as a part of the internal level of a database appears in
Chapter 6.
The theory of relational databases as propounded by Codd has its foundation rooted in
mathematics. Consequently, database topics can often be discussed using the mathemati-
cal language and notations. Due to the inherent precision of such language I have used it
in my treatment of database topics, whenever appropriate.
Part 1: Chapters 1 to 3
Part 1, Methodology, consists of Chapters 1 to 3 that cover the methodology aspect of
database performance tuning and optimization. The goal of Part 1 is to establish a sound
conceptual framework for identifying tuning issues and taking a well-planned approach
to address them. As such, it is primarily theoretical in nature and avoids, as far as practi-
cable, references to any particular RDBMS. The methods and principles discussed in this
part can be applied to any RDBMS with which the reader works.
Chapter 1, Database Application Development, contains a detailed discussion of
the five phases of building a database application starting with the information require-
ments analysis, continuing through logical and physical database designs, and ending
in database implementation. This is a one-time effort. When the database becomes op-
erational, its ongoing maintenance phase begins. The issues of performance tuning hap-
pen primarily during this phase, although initial performance checks are done during
the development. This chapter is targeted primarily for the developers and the system
analysts.
Chapter 2, Performance Tuning Methodology, describes the three levels of a database
and emphasizes that a database must be well tuned at all three levels in order to run opti-
mally. It provides the methodology for performance tuning. Both the DBAs and the de-
velopers will benefit from this chapter.
Chapter 3, Tuning the Conceptual Level of a Database, explores a variety of issues
underlying the tuning process of the conceptual level. It covers three major areas: denor-
malization, partitioning of tables and indices, and data replication among multiple loca-
tions of a database. The primary readers of this chapter are the developers.
Part 2: Chapters 4 to 10
Part 2, Oracle Tools for Tuning and Optimization, consists of Chapters 4 to 10 that cover
the Oracle tools for monitoring the performance of a database and tuning and optimizing
its internal and external levels, as needed. This part is specific to Oracle 8i with a glimpse
into Oracle 9i in Chapter 10. It is intended for the DBAs. The goal of Part 2 is to provide
the underlying tuning principles and a repertoire of Oracle tools to implement these prin-
ciples. Chapters 4 to 7 discuss the tuning of the internal level and Chapters 8 to 9 that of
the external level. Chapter 10 describes several features of Oracle 8i not covered in the
earlier chapters and gives an overview of several tuning features of Oracle 9i.
Chapter 4, Internal Level of an Oracle Database, introduces the structure of the inter-
nal level comprising an instance and a database. The instance consists of a set of mem-
ory-resident data structures and a set of background processes. The database consists of a
set of disk-resident data structures, namely, the tablespaces, tables, indices, etc.
Chapter 5, Tuning of Disk-Resident Data Structures, discusses in detail the tuning
principles of the components of an Oracle database. Fragmentation and chaining are two
major areas that the DBAs need to track and address for tuning. Ample scripts are pro-
vided to implement the tuning principles.
Chapter 6, Tuning of Memory-Resident Data Structures, discusses in detail the tuning
principles of the System Global Area (SGA) of an Oracle instance and the administration
and optimization of the background processes. The chapter includes separate sections
dealing with the tuning of the CPU and memory-related objects. Ample scripts are pro-
vided to implement the tuning principles.
Chapter 7, Oracle Utility for Tuning and Optimization, involves a detailed discussion
of the two Oracle diagnostic tools, UTLBSTAT and UTLESTAT, that collect detailed
statistics during a data collection period specified by the DBA and generate a report con-
taining an analysis of the collected statistics along with recommendations for improve-
ment, as needed. The chapter includes detailed directions for interpreting the output and
for taking corrective actions to address the deficiencies identified in the output.
Chapter 8, Optimization of the External Level of a Database, introduces the mathe-
matical theory underlying the query optimization process. This is followed by a detailed
treatment of Oracle’s optimization tools such as EXPLAIN PLAN, SQLTRACE and
TKPROF, and AUTOTRACE.
Chapter 9, Query Tuning and Optimization Under Oracle 8i, discusses the rule-based
and the cost-based optimizers, various joining techniques, and the use of hints in queries
to suggest specific query execution plans.
Chapter 10, Special Features of Oracle 8i and a Glimpse into Oracle 9i, covers several
special features of Oracle 8i pertaining to performance tuning and optimization that were
not covered in the earlier chapters. It closes with an overview of some of the features of
Oracle 9i.
Appendices A to E
The five appendices discuss several DBA issues, although they are not directly related to
performance tuning and optimization. They are included here to make the book useful for
addressing issues outside the realm of performance monitoring and tuning.
Appendix A, Sizing Methodology in Oracle 8i, contains the algorithms and two C
programs for estimating the storage space needed for tables, indices, and tablespaces
during the physical design phase, as discussed in Chapter 1. The two C programs imple-
ment the sizing algorithms. Sizing of the tables, indices, and tablespaces constitutes the
capacity planning activity which is extremely important for smooth and optimal operation
of a database.
Appendix B, Instance and Database Creation, contains detailed instructions for creat-
ing an Oracle instance and an Oracle database under UNIX. Sample files such as init.ora,
config.ora, etc. are included as attachments to clarify the steps. Inexperienced DBAs of-
ten find the creation of an instance quite a formidable job because of the following two
reasons:
• Various files and directories must be properly set up before the instance can be cre-
ated.
• After the database is created, its tablespaces, tables, indices, views, user roles and
privileges, etc. must be created within the database.
Appendix C, Instance and Database Removal, offers a similar step-by-step methodol-
ogy to drop an instance and its associated database.
Unique Features
The book offers several unique features that distinguish it from other books with similar
titles and scope that are currently available in the market.
(a) Three Levels of a Database: A clear knowledge of the three levels and their mutual
relationships is crucial to the understanding of database operations and performance
tuning issues. Chapter 2 explains this background information with examples and
emphasizes that a database must run optimally at each level for overall optimal per-
formance. Also, tuning may have to be done at multiple levels to resolve a problem.
(b) Ideal Mix of Theory and Practice: Throughout the book I have described the under-
lying principles for handling database performance and then included Oracle SQL
scripts to implement these principles. This is the standard error-fixing methodology
for software and database problems. One first localizes a problem, then finds the root
cause(s), and finally corrects it. My treatment allows the reader to understand the
principles and then apply them to other RDBMSs besides Oracle.
(c) Web-Based Databases: In today's software industry n-tier (n ≥ 3) architecture with
the database residing on the nth tier is very common. The principles of database de-
sign and performance tuning remain largely the same in such applications. However,
a few nuances need to be clarified. Chapter 12 addresses that area.
(d) Complete Instructions for Creating an Instance and a Database: From my own ex-
perience I have found that creating an Oracle instance often poses a more serious
problem than running a script to create the database. Creation of an instance involves
an understanding of the disk storage structure, defining the initialization parameters
properly, and setting up the necessary directories. Appendix B contains a complete
set of instructions to implement a database and all its objects.
(e) Mathematical Foundation of Relational Databases: Dr. Edgar Codd established the
relational database theory on a sound mathematical foundation. He introduced rela-
tional algebra and relational calculus as the two alternative but equivalent mecha-
nisms for query languages and query processing. Appendix E contains the relevant
mathematical materials along with the three data structures that are used for sequen-
tial, indexed, and direct searches of a relational database.
(f) References and Further Reading: Titles included in this section at the end of each
chapter will help the more inquisitive readers in finding additional materials relevant
to the topics. Some of the references offer an alternative approach to handle the per-
formance and tuning issues.
(g) Exercises: The exercises at the end of each chapter are optional. Occasionally they introduce topics that are an extension of the main body of the text. They are intended primarily for students and instructors when the book is used as a text for a database course. Colleges and universities today often offer certificate courses in specialized areas such as performance tuning of Oracle databases; the exercises can then be used as assignments for the students.
Reader Community
The book assumes an understanding of relational database concepts and a familiarity with
Oracle. Some knowledge of college-level mathematics will be helpful. Thus, it is not in-
tended for the beginners. It is designed primarily for database professionals such as
DBAs, developers, and system analysts with two to three years of experience in the de-
velopment, implementation, and maintenance of relational databases using some
RDBMS, preferably Oracle. A junior DBA or a developer can use the scripts of the book
under the guidance of a senior DBA or developer. The principles and tools presented here
arise out of my personal experience with Oracle and several other RDBMSs (e.g., SQL
Server, INGRES, LOGIX, RIM, etc.) that I have used since 1983. I have been using them
regularly as a part of my current and previous positions as a senior Oracle DBA, Data
Architecture Group Manager, Development and Architecture Manager, and QA Manager,
in industry. In addition, I have taught graduate-level database courses at Boston Univer-
sity, Northeastern University, and Wentworth Institute of Technology. I have used the
collective experience from these activities in writing this book. The scripts and their sam-
ple output included in the book were written over several years as my ideas shaped up
through academic and industrial experience.
Colleges and universities do not offer full semester courses dealing with database per-
formance and tuning per se. Usually such materials are included in other database courses
at an advanced level, in the certificate and state-of-the-art programs offered for the in-
dustry professionals, or simply as courses in continuing education programs. The book
can be used as a text for such courses. The instructors can use the exercises at the end of
each chapter as student assignments. Some of the exercises explore new topics not dis-
cussed in the accompanying chapters. These can be used as group projects for the stu-
dents.
In order to keep the book self-contained I have avoided the use of any third party
tools. The only software being used is Oracle 8i RDBMS along with SQL*Plus and
PL/SQL. Almost all of the scripts included in the book are completely portable between
UNIX and NT and can be used in any Oracle 8i installation under either operating sys-
tem. The scripts are collected into a separate CD that is included with the book.
Some readers will read the book from beginning to end. Others will use it as a refer-
ence and utilize the tuning guidelines and accompanying scripts to resolve specific prob-
lems. Possibilities are quite varied.
Acknowledgments
I got much valuable information from Oracle MetaLink (accessible through the web site
http://metalink.oracle.com). I was able to resolve several issues and clarify concepts with
the help of the information. In addition, I benefited from phone conversations with mem-
bers of the Oracle Technical Group in response to TARs that I opened with Oracle. I am
highly impressed with the courtesy, promptness, and professionalism of the Oracle tech-
nical staff.
I acknowledge the friendly support of the staff of Springer-Verlag to make the publi-
cation of the book a success. In particular, I thank Wayne Yuhasz, the Executive Editor
of Computing and Information Science, Wayne Wheeler, the Assistant Editor of Com-
puting and Information Science, and Robert Wexler, who converted the whole manu-
script into the format needed for publication.
Contents
Preface
Part 1 Methodology
1 Database Application Development
Outline
Overview of the Chapter
1.1 1970s Software Technology Era
1.2 Role of the Database in SDLC and SMLC
1.3 Enterprise Modeling
1.4 Logical Database Design
1.5 Physical Database Design
1.6 Database Implementation
1.7 Database Maintenance
1.8 Naming Guidelines for Database Objects
Key Words
References and Further Reading
Exercises
Appendices
Appendix A Sizing Methodology in Oracle 8i
Outline
Overview of the Appendix
A1. Transition from Logical to Physical Database Design
A2. Space Usage via Extents
A3. Algorithms for Sizing Tables, Indices, Tablespaces
A4. STORAGE Clause Inclusion: Table and Index Levels
A5. Sizing Methodology
A6. RBS, SYSTEM, TEMP, and TOOLS Tablespace Sizing
Key Words
References and Further Reading
Index
Part 1
Methodology
Part 1 consists of three chapters that cover the methodology aspect of database perform-
ance tuning and optimization. The goal of Part 1 is to establish a sound conceptual
framework for identifying tuning issues and taking a well-planned approach to address
them. As such, it is primarily theoretical in nature and avoids, as far as practicable, refer-
ences to any particular RDBMS. The methods and principles discussed in this part can be
applied to any RDBMS with which the reader works.
Chapter 1, Database Application Development, contains a detailed discussion of the
five phases of building a database application starting with the information requirements
analysis, continuing through logical and physical database designs, and ending in data-
base implementation. This is provided as a basis for the subsequent discussions involving
performance tuning.
Chapter 2, Performance Tuning Methodology, describes the three levels of a database
and emphasizes that a database must be well tuned at all three levels in order to run opti-
mally. The chapter describes two metrics that are used for measuring database perform-
ance.
Chapter 3, Tuning the Conceptual Level of a Database, explores a variety of issues
underlying the tuning process of the conceptual level. Since this level is built during the
logical database design, it is essentially independent of any RDBMS. Consequently, I
have included this chapter in Part 1.
I have used Oracle 8i as the RDBMS in giving specific implementation bound exam-
ples, if needed. A case in point is a tuning example appearing in Section 2.3.1.
1
Database Application Development
Outline
1.1 1970s Software Technology Era
1.2 Role of the Database in SDLC and SMLC
1.3 Enterprise Modeling
1.4 Logical Database Design
1.5 Physical Database Design
1.6 Database Implementation
1.7 Database Maintenance
1.8 Naming Guidelines for Database Objects
Key Words
References and Further Reading
Exercises
offered in the chapter as a foundation for introducing the subsequent topics in the fol-
lowing chapters. The chapter concludes with a detailed set of naming guidelines for data-
base objects.
1.1 1970s Software Technology Era
Up to the early 1970s only nonrelational data models were used for databases in the form
of hierarchical, network, or inverted file systems. Dr. E.F. Codd, at that time a member of
the IBM Research Laboratory in San Jose, California, first introduced the theory of the
relational data model in his paper, A Relational Model of Data for Large Shared Data
Banks (see [3]). Subsequently he published several papers between 1970 and 1973 for-
mulating the relational database technology consisting of the principles of database de-
sign, detailed syntax and examples of the query language, and database administration is-
sues. He introduced the query language as a nonprocedural language based on two
separate but logically equivalent mathematical paradigms of relational algebra and rela-
tional calculus. Appendix E offers a discussion of these two paradigms along with other
mathematical topics underlying relational database systems. The query language com-
mands fall into two major categories, Data Definition Language (DDL) and Data Ma-
nipulation Language (DML). Being a mathematician himself, Codd founded the entire
theory of relational database systems on mathematics. The first commercially available
relational database management system (RDBMS) was INGRES, marketed in 1979 by
Relational Technology, Inc. This was followed shortly by ORACLE, marketed in the
same year by Relational Software, Inc., which was later renamed Oracle Corporation.
The INGRES query language was based on relational calculus, and that of ORACLE
primarily used the relational algebra.
In 1982 the American National Standards Institute (ANSI) asked its Database Com-
mittee (X3H2) to develop a proposal for a standard relational database language. The
X3H2 proposal was finally ratified by ANSI in 1986 resulting in ANSI SQL, popularly
called SQL/86, as the industry standard for query languages using a hybrid of the two
paradigms of relational algebra and relational calculus. The International Organization
for Standardization (ISO) accepted SQL/86 in 1987. This standard was extended in 1989
to include an Integrity Enhancement Feature and was called SQL/89 or SQL1. In 1991 a
group of vendors known as the SQL Access Group published a set of enhancements to
support the interoperability of SQL/89 across different operating systems. Subsequently,
ANSI and ISO jointly published a revised and greatly expanded version of SQL/89. This
became a ratified standard in late 1992 under the name of International Standard ISO/IEC
9075:1992, Database Language SQL (see [5]). Informally, it was called SQL/92 or
SQL2.
From mid-1992 efforts had been underway to formulate and publish the next set of
enhancements to SQL/92. By 1998 over 900 pages of specifications had been written un-
der the name SQL3. In the U.S.A. the entirety of SQL3 was being processed as both an
ANSI domestic project and as an ISO project. In late 1999 these specifications were rati-
fied under the name SQL:1999, which became the next update of SQL/92. Informally
SQL:1999 is still referred to as SQL/3 [6, p. 423] and is the current version of the SQL
specifications.
Throughout the early 1970s MIS applications were developed using a variety of mostly
ad hoc methodologies. By the mid-1970s several parallel efforts started to build a uni-
form methodology to develop applications. Three separate schools of thought were orga-
nized by Yourdon and Constantine, Gane and Sarson, and Jackson. Their approaches
were similar so that by the late 1970s an integrated methodology emerged and was called
the software development life cycle or SDLC. Under this methodology an application de-
velopment consists of five distinct phases, as described below (see [9]):
(a) Problem definition and feasibility study,
(b) Requirements analysis,
(c) Preliminary system design,
(d) Detailed system design, and
(e) System implementation, maintenance, and evaluation.
These five phases are grouped into three logical categories: analysis consisting of (a)
and (b), design consisting of (c) and (d), and implementation consisting of (e) alone.
Strictly speaking, maintenance and evaluation under (e) fall outside the SDLC, which
ends with the installation of the application as an operational system. After an application
is installed, it enters the maintenance phase which is governed by a methodology similar
to SDLC and is called the software maintenance life cycle or SMLC (see [11]).
The analysis category is guided by the keyword WHAT, i.e., what functions will the
application provide for the end users. Its end product consists of a logical system specifi-
cation (also called functional requirements document) that describes in detail the tasks
that end users will be able to do when the system becomes operational. The design cate-
gory is guided by the keyword HOW, i.e., how the functions described in the logical
system specification will be implemented. The end product here is called the physical
system specification that describes the architecture of the proposed system, screen for-
mats with samples, report formats with samples, the logical design of the database, the
process flow and logic, required hardware and software tools, architecture of the applica-
tion, resource estimates, etc. This document is a “paper” system built in response to the
logical system specification. The implementation category consists of converting the
“paper” system into an “electronic” system and ends with a successful acceptance test by
end users.
More than 25 years have elapsed since the SDLC was first introduced in the informa-
tion industry for the purpose of building applications. Various technologies have ap-
peared since then such as object-oriented design and programming, first two-tier and then
n-tier (n > 2) client server architecture, Web-based design, event-driven programming in-
stead of logic-driven programming, etc. Although the tools used for building applications
in each of these areas are widely different, SDLC still remains the underlying methodol-
ogy.
The main differences between SDLC and SMLC arise from the focus: SMLC oper-
ates with an existing application, whereas SDLC builds a new application. The salient
points of difference are listed below.
• An SMLC job is initiated by a software maintenance request form, whereas an
SDLC job starts as a new application development.
• SMLC deals with a code and a database that are already in existence and need modi-
fication, whereas SDLC involves a new code and database.
• The analysis and design phases of SDLC are much longer and require in-depth dis-
cussion with the end users, whereas they are much simpler in SMLC.
In an n-tier client-server application, database transactions, i.e., data retrieval and update requests, are done on the nth or the most
remote tier. Typically the client tier, which is often called a thin client, is equipped with a
Web browser that sends client requests to the application tier(s). The application server
processes the client request and returns the result to the client tier. It contains code to
handle complex business logic such as transaction management, fault tolerance, scalabil-
ity, and sharing/reusing code modules. In no case does a client access the database server
tier directly.
The analysis and design phases of the SDLC and SMLC result in the following data-
base tasks,
(a) Enterprise modeling,
(b) Logical database design,
(c) Physical database design,
(d) Database implementation, and
(e) Database maintenance.
Except for (a) and (b), the remaining activities require the use of a specific RDBMS.
The database performance and tuning principles are related to activities covered under (b)
through (e) and need an RDBMS for their implementation.
1.3 Enterprise Modeling
The next task is to bring order into this data pool chaos by grouping the data into separate
sets such that each set consists of data elements representing an atomic concept such as
person, place, object, event, etc. The guiding principle here is not to mix two or more
atomic concepts into a single set. Therefore, the data modeler performs
Step 2: Create distinct sets of data elements from the pool such that each set represents
an atomic concept.
The sets thus created do not normally have common data elements among them. Since
the data in a database must be related to one another, the data modeler needs to create
links among the sets, which leads to
Step 3: Create additional sets, as needed, so that distinct atomic sets are linked by
them. Such link sets do not represent atomic concepts. Instead, they relate two
or more atomic concepts.
Steps (1) through (3) are iterative in nature in that a final enterprise data model results
after several attempts with the three steps.
In 1976, Peter Chen proposed an entity relationship (ER) model to formalize these
steps (see [2]). Each atomic set is called an entity, and each linking set a relationship.
Thus, an enterprise data model consists of entities and relationships. The component data
elements of entities and relationships are called attributes.
A database transaction involves accessing one or more records located among one or
more entities and relationships in the database. There are two types of transactions: re-
trieval, which is READ ONLY access, and update, which is WRITE access and is of
three types—insert a new record, modify an existing record, or delete an existing record.
Thus, a transaction requires searching for record(s) matching a given set of criteria. This
is facilitated by creating keys in entities and relationships. A key is defined as a set of at-
tribute(s) in an entity or a relationship such that given a value of the key the search re-
turns either no record or only one record. When multiple sets of attribute(s) qualify for a
key, each set is called a candidate key. One of them is designated a primary key and the
rest are called alternate key(s).
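As an illustration, consider the EMPLOYEE entity discussed later in this section, whose natural key is Social Security Number. If the enterprise also assigned every employee a unique badge number (a hypothetical attribute used here only for illustration), both would be candidate keys; Social Security Number could be designated the primary key, and the badge number would then be an alternate key. When the entity is eventually implemented as a table (Section 1.4), the designation might be declared as sketched below, with column names and sizes chosen only for the example and constraint names following the guidelines of Section 1.8.5.

    CREATE TABLE EMPLOYEE
      (SSN        CHAR (9),       -- primary key
       BADGE_NO   NUMBER (6),     -- alternate (candidate) key
       EMP_NAME   VARCHAR2 (30),
       HIRE_DATE  DATE,
       CONSTRAINT PK_EMPLOYEE PRIMARY KEY (SSN),
       CONSTRAINT UNQ_BADGE_NO UNIQUE (BADGE_NO));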
Two entities A and B, say, can be related to each other in one of three possible ways:
one to one (1:1), one to many (1:N), and many to many (M:N). If a given value of
the primary key of A matches that of only one record in B, then A and B are related in a
1:1 manner. If a given value of the primary key of A matches that of one or more records
in B, then A and B are related in a 1:N manner. Finally, if a given value of the primary
key of A matches that of one or more records in B and vice versa, then A and B are
related in an M:N manner. In this case, a new relationship R, say, can be created such that
A and R as well as B and R are each related in a 1:N manner. The data modeler thus
performs
Step 4: Determine how different entities are related and replace each M:N type of re-
lationship with two distinct relationships each of type 1:N. As a minimum,
each relationship contains the primary keys of both linked entities as attributes.
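For example, suppose the entities SUPPLIER and PART (used again in Section 1.8) are related in an M:N manner. Step 4 replaces the M:N relationship with a relationship SHIPMENT that is linked 1:N to each entity and carries both primary keys. A minimal sketch of the corresponding table follows; it assumes SUPPLIER and PART already exist with primary keys SUPPLIER_NO and PART_NO, and QUANTITY is a purely illustrative attribute.

    CREATE TABLE SHIPMENT
      (SUPPLIER_NO  NUMBER (5),
       PART_NO      NUMBER (5),
       QUANTITY     NUMBER (6),
       -- the relationship contains the primary keys of both linked entities
       CONSTRAINT PK_SHIPMENT PRIMARY KEY (SUPPLIER_NO, PART_NO),
       CONSTRAINT FK_SUPPLIER_NO FOREIGN KEY (SUPPLIER_NO) REFERENCES SUPPLIER (SUPPLIER_NO),
       CONSTRAINT FK_PART_NO FOREIGN KEY (PART_NO) REFERENCES PART (PART_NO));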
An entity may contain vector-valued attributes. For example, consider the following Stu-
dent entity,
STUDENT (ID, Name, Major, Course Number, Grade)
with the sample data values shown in Figure 1.1.
The attributes, Course Number and Grade, of the first two records are vector valued
since more than one value is recorded there. In general, an RDBMS cannot process such
records. Consequently, the data modeler needs to perform
Step 5: Break down each vector valued attribute into multiple records, each containing
a single value from the vector and repeat the remaining attributes.
The student entity then will appear as shown in Figure 1.2.
However, Oracle 8i supports a datatype called VARRAY that can handle vector-
valued attributes.
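A minimal sketch of such a declaration is given below; the type names and column sizes are illustrative only. Using a collection type in place of Step 5 is a design decision with its own querying implications.

    CREATE TYPE COURSE_GRADE AS OBJECT
      (COURSE_NUMBER  VARCHAR2 (10),
       GRADE          CHAR (1));
    /
    CREATE TYPE COURSE_GRADE_LIST AS VARRAY (10) OF COURSE_GRADE;
    /
    CREATE TABLE STUDENT
      (ID       NUMBER (6),
       NAME     VARCHAR2 (30),
       MAJOR    VARCHAR2 (20),
       COURSES  COURSE_GRADE_LIST,   -- holds up to ten (course, grade) pairs per row
       CONSTRAINT PK_STUDENT PRIMARY KEY (ID));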
Two or more entities can have several common attributes and a few distinct attri-
butes. For example, an organization can have different types of employees such as
Salaried, Contract, and Wage-Based. All of them have many common attributes such
as Name, Address, Date of Birth, Hire Date, Department, etc. But a salaried employee
has Annual Salary, a contract employee has Contract Amount, and a Wage-Based
employee has Hourly Wage as their respective unique attributes. In such a case, the
data modeler creates a supertype called EMPLOYEE containing all the common attrib-
utes, and three subtypes called SALARIED EMPLOYEE, CONTRACT EMPLOYEE,
and WAGE BASED EMPLOYEE. Each subtype contains the primary key, Social
Security Number, of the supertype EMPLOYEE, and all the distinct attribute(s) of
that subtype. For example, SALARIED EMPLOYEE may contain only two attributes:
Social Security Number and Annual Salary. Figure 1.3 shows this situation, where ISA
stands for “is a supertype of”. Thus, the data modeler completes the enterprise model
with
Step 6: Identify entities that can be classified as supertypes and subtypes and separate
them into distinct entities such that each subtype is linked to its supertype via
an ISA relationship.
These six steps complete the enterprise modeling and produce a data model that works as
the starting point of the next phase, the logical database design.
Figure 1.3: The EMPLOYEE supertype linked by ISA relationships to the SALARIED, CONTRACT, and WAGE BASED EMPLOYEE subtypes.
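A minimal sketch of how the supertype/subtype structure of Figure 1.3 might be implemented is given below; it assumes an EMPLOYEE table keyed on SSN, as sketched earlier in this section, and the column sizes are illustrative only. CONTRACT EMPLOYEE and WAGE BASED EMPLOYEE would be declared analogously with Contract Amount and Hourly Wage, respectively.

    -- Each subtype carries the supertype's primary key plus its own distinct attributes.
    CREATE TABLE SALARIED_EMPLOYEE
      (SSN            CHAR (9),
       ANNUAL_SALARY  NUMBER (9,2),
       CONSTRAINT PK_SALARIED_EMPLOYEE PRIMARY KEY (SSN),
       CONSTRAINT FK_SSN FOREIGN KEY (SSN) REFERENCES EMPLOYEE (SSN));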
1.4 Logical Database Design
This phase starts by performing the following conversions, almost mechanically, from the
enterprise data model.
• All the entities and relationships become tables and their respective attributes be-
come the columns of the corresponding tables.
• The primary key of an entity becomes the primary key of the corresponding table.
• The primary key of a relationship consists of the primary keys of the entities linked
by the relationship.
Functional dependency and normalization are driving principles for refining the logi-
cal database design. A set of columns B, say, in a table T is functionally dependent (FD)
on another set of columns A, say, in T if for a given value of A we always get only one
set of values of B. Thus, any non-PK column in a table is functionally dependent on the
PK of the table. Normalization is the mechanism by which each table is made to represent
an atomic entity. The level of atomicity depends on the specific normal form to which a
table belongs. There are six normal forms defined in a progressively restrictive manner:
first, second, third, Boyce-Codd, fourth, and fifth. For logical design purposes it is
enough to strive for a database with all of its tables in the third normal form.
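A small hypothetical example may make this concrete. Suppose the GRADE table of Figure 1.4 had been designed as
    GRADE (ID, Course Number, Grade, Student Name)
with primary key (ID, Course Number). Student Name would then be functionally dependent on ID alone rather than on the entire primary key, so the table would not even be in second normal form: a student's name would be repeated in every GRADE row and could become inconsistent. Removing Student Name from GRADE, and keeping it only in STUDENT where it depends on the whole primary key ID, restores the third normal form.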
Figure 1.5 shows the entity relationship diagram for the above 3-table database.
Figure 1.5: Entity relationship diagram showing the Student and Course entities of the 3-table database.
The principle of referential integrity involving foreign keys plays a crucial role in Step
(c), Section 1.4.2 above. The principle is stated as follows.
One or more columns in a table T constitute a foreign key (FK) of T if they match the
primary key (PK) or a unique key (UK) of a table S, where T and S can be the same or
different tables. A value of FK can be inserted or modified in T only if its matching PK
or UK in S already contains that value. A value of the PK or UK of S can be deleted only
if there are no matching FK values in T.
In most cases, tables T and S are different. For example, in Figure 1.4, GRADE.ID is
an FK in GRADE matching STUDENT.ID as the PK of STUDENT. They are two differ-
ent tables. Now consider the table SUBJECT given below.
SUBJECT (Course Number, Course Description, Prerequisite
Course Number)
Here Course Number is the PK of SUBJECT. But Prerequisite Course Number must
always match Course Number. Hence Prerequisite Course Number is an FK matching the
PK Course Number of the same table. Such tables are called self-referencing tables.
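In Oracle the self-referencing constraint on SUBJECT might be declared as sketched below; the column sizes are illustrative and the constraint names follow the guidelines of Section 1.8.5.

    CREATE TABLE SUBJECT
      (COURSE_NUMBER               VARCHAR2 (10),
       COURSE_DESCRIPTION          VARCHAR2 (60),
       PREREQUISITE_COURSE_NUMBER  VARCHAR2 (10),
       CONSTRAINT PK_SUBJECT PRIMARY KEY (COURSE_NUMBER),
       -- the foreign key references the primary key of the same table
       CONSTRAINT FK_PREREQUISITE_COURSE_NUMBER
         FOREIGN KEY (PREREQUISITE_COURSE_NUMBER)
         REFERENCES SUBJECT (COURSE_NUMBER));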
Based on the principle of referential integrity the tables in a database are divided into
two categories, validation and transaction. A validation table contains the valid column
values for the matching columns in transaction table(s). If a new data value in a transac-
tion table does not match an existing data value in a validation table, then it cannot be
entered, thereby ensuring data integrity. On the other hand, if a transaction table contains
data matching row(s) in the corresponding validation table, then the row(s) in the valida-
tion table can be deleted only after the corresponding row(s) in the transaction table have
been deleted. This phenomenon is called cascading of updates and deletes. To cascade an
update, a change made in a validation table is propagated to all the related transaction ta-
bles. To cascade a delete, when a row is deleted from a validation table, all rows in all
transaction tables depending on the deleted row are also deleted. If the RDBMS does not
support the cascade option through DDL, it should be enforced via triggers.
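Oracle, for instance, supports cascaded deletes declaratively through the ON DELETE CASCADE option of the foreign key clause; cascaded updates, however, must be coded with triggers. A minimal sketch using the STUDENT and GRADE tables of Figure 1.4:

    -- Deleting a STUDENT row automatically deletes the matching GRADE rows.
    ALTER TABLE GRADE
      ADD CONSTRAINT FK_ID FOREIGN KEY (ID)
      REFERENCES STUDENT (ID)
      ON DELETE CASCADE;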
During the enterprise modeling and logical database design a lot of information is gener-
ated about the structure of the proposed database. A data dictionary provides a central
repository to document the work done during Phases (1) and (2), and is often described as
a database about a database, or a meta database. It should preferably be built with a
CASE (Computer Aided Software Engineering) tool such as ERwin, Power Designer,
Oracle Designer, etc. although some desktop RDBMS software such as Microsoft Access
can also be used.
A CASE tool automatically builds the data dictionary as the enterprise data model is
created. If, however, RDBMS software is used to create the data dictionary, the following
structure is recommended.
As a minimum, the data dictionary should consist of the following three tables to
document the structure of the database for the application.
1.5 Physical Database Design
The transition from logical to physical database design involves the mapping of the logi-
cal data structures onto physical disk storage. This requires the sizing of tables, indices,
and tablespaces. The internal storage requirements of these database objects are properly
estimated in order to assign values to their initial and next extents. Appendix A contains
detailed algorithms and two C programs to estimate data storage requirements.
Oracle partitions the physical disk space of a database into a hierarchy of compo-
nents: tablespace, segment, extent, and block. A database consists of multiple table-
spaces, which are logical concepts. Each tablespace spans one or more physical disk files
that are explicitly assigned to it. A tablespace consists of many segments such as table,
index, temporary, rollback, etc. Thus, a segment is a logical concept and is mapped onto
multiple chunks of physical storage areas called extents. The extents belonging to a seg-
ment need not be contiguous. Each extent, however, consists of multiple contiguous
blocks, which are the lowest indivisible units of disk space. See Figure 1.6.
An Oracle block can be of size 2, 4, 8, 16, 32, or 64 K with 2 K being the default,
where K = 1,024 bytes. A segment consists of an initial extent and many next extents
with the total number of extents not exceeding a parameter called maxextent. The default
value of the maxextent depends on the operating system and the block size. For most op-
erating systems the default values are given by the following table.
Figure 1.6: The storage hierarchy of tablespaces, segments, extents, and blocks; an extent is a set of contiguous blocks.
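The extent sizes of an individual table or index are controlled through its STORAGE clause; Appendix A discusses how to estimate the values. A minimal sketch with hypothetical sizes, using a tablespace named according to the conventions of Section 1.8.6:

    CREATE TABLE SUPPLIER
      (SUPPLIER_NO    NUMBER (5),
       SUPPLIER_NAME  VARCHAR2 (30),
       CONSTRAINT PK_SUPPLIER PRIMARY KEY (SUPPLIER_NO))
      TABLESPACE DATA_TRANS
      STORAGE (INITIAL 1M NEXT 512K PCTINCREASE 0 MAXEXTENTS 121);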
The tasks under the physical database design, therefore, consist of the following.
(a) Decide on the unique and nonunique indices needed for each table. This depends on
the reporting needs and may not be fully known at this stage. The best possible esti-
mates will suffice. Remember that all the PKs, UKs, and FKs have already been de-
termined during the logical database design.
(b) Estimate the total storage space needed for each table and each index including the
PKs and FKs.
(c) Estimate the initial and next extents for each table and each index.
(d) Estimate the total storage space needed for each tablespace containing the tables and
indices.
(e) Estimate the total storage space needed for the rollback tablespace, temporary table-
space, user tablespace, and tools tablespace.
Capacity planning for a database is critical before implementing it. Various algo-
rithms are available for this purpose. Appendix A contains a detailed procedure with ex-
amples for sizing tables, indices, and tablespaces.
Depending on the future growth of a database in production an appropriate archiving
policy should be considered at this stage. The policy is guided by business and legal re-
quirements for the organization, e.g., how long the data need be available online, how
much storage is currently available, what type of storage, disk or tape/cartridge, will be
used, etc. Periodic archiving, purging of old data from the current database, and saving
them in archived storage are standard components of an archiving policy. Such archiving
keeps tables at a manageable size and thereby improves performance. The archived data
can be accessed whenever necessary, e.g., to meet legal issues, perform analysis for busi-
ness trends, etc.
1.6 Database Implementation
Preparation
Create all the requisite directories for various system files such as initSID.ora, configSID.ora, listener.ora, etc. and destinations of datafiles for tablespaces, where SID is the instance name.
Instance Creation
Using Server Manager in line mode and connecting as internal, perform the following
tasks.
• Start up the instance via the initSID.ora file.
• Using the CREATE DATABASE command create the redo log file groups and the
SYSTEM datafile.
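A skeletal Server Manager session is sketched below; the instance name (ORCL), file names, and sizes are purely illustrative, and a production script (see Appendix B) would specify more initialization parameters, redo log groups, and file sizes.

    $ svrmgrl
    SVRMGR> CONNECT INTERNAL
    SVRMGR> STARTUP NOMOUNT PFILE=/u01/app/oracle/admin/ORCL/pfile/initORCL.ora
    SVRMGR> CREATE DATABASE ORCL
              LOGFILE GROUP 1 ('/u02/oradata/ORCL/redo01.log') SIZE 5M,
                      GROUP 2 ('/u03/oradata/ORCL/redo02.log') SIZE 5M
              DATAFILE '/u04/oradata/ORCL/system01.dbf' SIZE 200M;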
Database Creation
Using Server Manager in line mode and connecting as internal, create the
• Data dictionary (Oracle catalog) in SYSTEM tablespace;
Schema Creation
Log in as a DBA privileged user other than SYSTEM/MANAGER and then create all ta-
bles, indices, constraints, triggers, procedures, functions, and packages. Maintain the
proper order in creating tables with constraints so as to enforce the referential integrity
via PK – UK/FK relationships. Thereby all the database objects in the schema are owned
by this DBA privileged user instead of by SYSTEM.
An internally consistent database structure with all of its data validation routines has
now been implemented. We next need to validate this structure through data loading to
populate all the tables.
Database Loading
Database loading is affected by several factors such as the sizes and numbers of tables
and indices, the order of loading tables linked by PK – UK/FK relationship to maintain
referential integrity, triggers, functions, and procedures, etc. For loading very large tables
it is recommended that indices be dropped and later recreated after the loading is com-
plete. If the order of input data does not match the order of the referential integrity con-
straints, all FK constraints should be disabled before the loading and then be enabled af-
ter the loading is complete.
• Perform necessary data conversions, if appropriate, to prepare clean data for popu-
lating all the data tables.
• Data can be loaded manually through data entry screens by data entry operators or
automatically through program control or some bulk data loading utility such as
SQL*Loader from Oracle.
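For example, a foreign key constraint can be disabled before a bulk load and re-enabled afterwards, and SQL*Loader can then be invoked from the operating system prompt; the table, constraint, control file, and account names below are hypothetical.

    ALTER TABLE GRADE DISABLE CONSTRAINT FK_ID;
    -- bulk load, e.g.:  sqlldr userid=opsuser/password control=grade.ctl log=grade.log
    ALTER TABLE GRADE ENABLE CONSTRAINT FK_ID;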
The above steps should be implemented through scripts, as far as possible. Appendix
B contains a complete set of scripts and procedures for this purpose.
• Index,
• Tablespace,
• Synonym, and
• Sequence.
Sections 1.8.1 through 1.8.8 describe the naming guidelines for the above objects.
No guidelines are provided for naming packages, procedures, functions, and triggers,
because these are programs rather than data objects. Normally, such objects are named so
as to indicate their purpose. Examples are UPDATE_CUSTNO, CHECK_BALANCE,
etc.
CASE tools such as ERwin or Oracle Designer often use their own default naming
conventions. In such cases, the following guidelines may not always be applicable.
1.8.1 Database
1. A database name will be the same as the application name in capital letters followed
by _DB. Example: ORDER_DB.
2. If multiple databases are used in an application, then the application name will have a
suffix starting with 1 followed by _DB. Example: ORDR1_DB, ORDR2_DB,
ORDR3_DB, etc.
1.8.2 Column
1. A column name will be expressive enough to indicate the data element it represents.
Example: STATUS, COLOR, etc.
2. If a column name has multiple parts, each part will be connected to another by an
underscore (_). Example: SUPPLIER_NO, PART_CITY, etc.
3. Column names using generic descriptors like name, description, number, etc. will be
prefixed with the table name. Example: SUPPLIER_NAME instead of NAME,
PART_DESCRIPTION instead of DESCRIPTION, etc.
4. A column representing a date will have two or more parts, where the last part will be
DATE. The remaining part(s) will indicate the type of the date being named. Exam-
ple: SHIPMENT_DATE, INVOICE_CREATION_DATE, etc.
5. The data type of a column will be in uppercase. Example: NUMBER, DATE, etc.
6. If the data type is followed by a size indicator, then these conventions will be used:
• Single number size indicator: data type, one blank space, left parenthesis, number, right parenthesis. Example: CHAR (6), NUMBER (5).
• Paired number size indicator: data type, one blank space, left parenthesis, first number, comma, second number, right parenthesis. Example: NUMBER (4,1).
1.8.3 Table
1. The table name will be expressive enough to indicate the contents of the table. Ex-
ample: SUPPLIER, PART, etc.
2. If a table name has multiple parts, each part will be connected to another by an un-
derscore (_). Example: DETENTION_PAYMENT_DETAIL, AMTRAK_SHIPMENT, INBOUND_CAR_LOCATOR_MESSAGE, etc.
1.8.4 View
1. A view based on a single table will have the same name as the table followed with a
suffix _V (V for view).
2. The name of a view based on multiple tables will consist of the component table
names connected by #(s) and followed with a suffix _V. If thereby the name exceeds
30 characters (Oracle restriction on names), then the component table name(s) will
be abbreviated meaningfully to comply with the 30-character restriction. Example:
SUPPLIER#PART_V, SHIPMENT#AMTRAK_SHIPMENT_V, etc.
1.8.5 Index
The standards will handle four types of indices: PK, FK, unique, and nonunique. An in-
dex can be single or composite. There can be only one PK for a table.
1. The PK of a table will be named PK_(table name).
Example: PK_SUPPLIER = SUPPLIER_NO single index
PK_SHIPMENT = (SUPPLIER_NO, PART_NO) composite index.
The PRIMARY KEY clause of the CREATE TABLE command will not be used to cre-
ate the PK of a table since it does not allow the assignment of a selected name to the PK.
Instead Oracle assigns a default name of the form SYS_Cnnnnn (e.g., SYS_C00972),
which is very obscure. The PK will be created via the clause
CONSTRAINT PK_(name) PRIMARY KEY (column(s))
of the CREATE TABLE command.
2. The FK will be named FK_(column name(s)). If the FK is based on multiple col-
umns, the column names will be connected by #(s).
Example: FK_SUPPLIER_NO, FK_PART_NO,
FK_PAYMENT_ID#PAYMENT_HISTORY_ID, etc.
The REFERENCES clause of the CREATE TABLE command will not be used to create
the FK(s) in a table since it does not allow the assignment of a selected name to the FK.
Instead Oracle assigns a default name of the form SYS_Cnnnnn (e.g., SYS_C00972),
which is very obscure. The FK will be created via the clause
CONSTRAINT FK_(name) FOREIGN KEY (column(s)) REFERENCES (table name) (column(s))
of the CREATE TABLE command.
Example: FK_UNIT_ID#UNIT_HISTORY_ID_1,
FK_UNIT_ID#UNIT_HISTORY_ID_2, etc.
4. A unique index will be named UNQ_(column name(s)). If the index is based on
multiple columns, the column names will be connected by #(s).
Example: UNQ_UNIT_ID#UNIT_HISTORY_ID_1,
UNQ_UNIT_ID#UNIT_HISTORY_ID_2, etc.
6. A nonunique index will be named IND_(column name(s)). If the index is based on
multiple columns, the column names will be connected by #(s).
Example: IND_UNIT_ID#UNIT_HISTORY_ID_1,
IND_UNIT_ID#UNIT_HISTORY_ID_2, etc.
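For instance, a unique index on SUPPLIER_NAME and a nonunique index on PART_CITY, named according to guidelines (4) and (6) above, might be created as follows; the choice of indexed columns is illustrative only.

    CREATE UNIQUE INDEX UNQ_SUPPLIER_NAME ON SUPPLIER (SUPPLIER_NAME);
    CREATE INDEX IND_PART_CITY ON PART (PART_CITY);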
1.8.6 Tablespace
A database consists of many tablespaces and each tablespace is made up of one or more
datafiles. Tablespaces are the mechanisms by which logical database objects such as ta-
bles, indices, rollback segments, etc. are mapped onto the physical storage structure.
These tablespaces are normally used in a database:
• System,
• Data,
• Index,
• RBS (rollback segments),
• TEMP (temporary segments),
• Tools, and
• Users.
Except for the System tablespace, there can be one or more of the remaining table-
spaces depending on the size and complexity of the database.
1. System: The System tablespace will be named SYSTEM and will contain only Ora-
cle’s data dictionary tables (owned by SYS), the V$ views, and the System rollback
segment.
2. Data: If the database contains both dynamic transaction tables and relatively static
validation tables, then there will be at least two tablespaces called DATA_TRANS
and DATA_VALS for these two types of tables. Additional tablespaces, if needed,
will be named DATA_TRANS1, DATA_TRANS2, DATA_VALS1,
DATA_VALS2 etc.
3. Index: If the database contains both dynamic transaction tables and relatively static
validation tables, then there will be at least two tablespaces called INDEX_TRANS
and INDEX_VALS for the indices created on these two types of tables. Additional
tablespaces, if needed, will be named INDEX_TRANS1, INDEX_TRANS2,
INDEX_VALS1, INDEX_VALS2 etc.
4. RBS: If a single rollback tablespace is needed, it will be named RBS. If multiple roll-
back tablespaces are needed, they will be named RBS_1, RBS_2, etc.
5. TEMP: The temporary segment will be named TEMP. If certain users of an applica-
tion require much larger temporary segments than the rest of the application’s users,
then a separate temporary segment will be created for them under the name
TEMP_USER.
6. Tools: Many Oracle and third party tools store their data segments in the SYSTEM
tablespace because they store them under the SYSTEM database account, which has
the SYSTEM tablespace as its default tablespace. To avoid this situation, the
SYSTEM account’s default tablespace will be named TOOLS and its quota will be
revoked on the SYSTEM tablespace. If multiple TOOLS tablespaces are needed,
they will be named TOOLS_1, TOOLS_2, etc.
If a database shows a lot of activity against the TOOLS tablespace, then the indices
for these tools’ data tables should be moved to a different tablespace. This tablespace will
be named TOOLS_IND. If multiple TOOLS_IND tablespaces are needed, they will be
named TOOLS_IND1, TOOLS_IND2, etc.
7. Users: If users are allowed the privilege of creating database objects in the test data-
base during the test phase, users’ quotas on other tablespaces will be revoked and
their default tablespace will be named USERS.
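Under these conventions the transaction data and index tablespaces might be created as sketched below; the datafile names and sizes are illustrative only.

    CREATE TABLESPACE DATA_TRANS
      DATAFILE '/u05/oradata/ORCL/data_trans01.dbf' SIZE 500M
      DEFAULT STORAGE (INITIAL 1M NEXT 1M PCTINCREASE 0);

    CREATE TABLESPACE INDEX_TRANS
      DATAFILE '/u06/oradata/ORCL/index_trans01.dbf' SIZE 250M
      DEFAULT STORAGE (INITIAL 512K NEXT 512K PCTINCREASE 0);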
1.8.7 Synonym
A synonym is a name assigned to a table or a view that may thereafter be used to refer to
it and is created via the CREATE SYNONYM command.
The name of a synonym will be the same as the name of the table or view to which it
refers but without the owner as a prefix. Example: SMITTRA.SUPPLIER table created
by SMITTRA will have synonym SUPPLIER, OPSUSER.SUPPLIER#PART_V view
created by OPSUSER will have the synonym SUPPLIER#PART_V, etc.
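For example, the two synonyms mentioned above would be created with statements like the following (as private synonyms, or as public ones if the PUBLIC keyword is added):

    CREATE SYNONYM SUPPLIER FOR SMITTRA.SUPPLIER;
    CREATE SYNONYM SUPPLIER#PART_V FOR OPSUSER.SUPPLIER#PART_V;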
1.8.8 Sequence
A sequence is used to generate unique integer values for primary keys with data type
NUMBER or NUMBER (integer) (e.g., NUMBER (6)) and is created via the CREATE
SEQUENCE command.
The name of a sequence will be the same as the name of the PK it sequences and will
be of the form SEQ_(PK name).
Example: SEQ_PK_PART refers to the sequence that assigns unique integer values
to the primary key PK_PART comprising the column PART_NO of the PART table.
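A minimal sketch of creating and using such a sequence follows; the INSERT statement and its values are illustrative only.

    CREATE SEQUENCE SEQ_PK_PART START WITH 1 INCREMENT BY 1;

    INSERT INTO PART (PART_NO, PART_DESCRIPTION)
      VALUES (SEQ_PK_PART.NEXTVAL, 'Sample part');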
Key Words
1:1 relationship, 1:N relationship, 1NF, 2NF, 3NF, alternate key, archiving, atomic concept, attribute, block, candidate key, cardinality, cascading, chaining, class, client-server application, conceptual level, Data Definition Language, data dictionary, Data Manipulation Language, data validation, database transaction
References and Further Reading
Some prior database experience is needed in order to understand the topics discussed
in this chapter. McFadden and Hoffer [8] and Mittra [10] provide ample coverage of such
materials. Codd’s paper [3] is regarded as a classic in relational database systems. It in-
troduces and builds the entire theory and application of RDBMS with mathematics as its
foundation, which was Codd’s goal. The terms relation, tuple, and attribute have their
origin in mathematics, and the same applies to functional dependency and transitive de-
pendency. Under his leadership the IBM System Research Institute built System R, the
very first RDBMS, although it was never marketed for commercial use. SDLC and
SMLC are tangentially related to the database application development process. Mittra
[9] offers adequate coverage of the related concepts. Mittra [11] discusses how SDLC
and SMLC are used in a client-server environment. Yourdon [12] deals with the struc-
tured methodology from a management perspective. Also, any textbook dealing with
systems analysis and design can be used to gain additional information in this area. The
five-phase approach to database development is fairly standard in industry, although the
exact packaging of tasks in each phase is somewhat open-ended. I have given my own
view here. McFadden and Hoffer [8, Chapters 3, 5] discuss this topic in a more descrip-
tive manner. Burleson [1, Chapter 1] discusses the enterprise modeling and logical data-
base design at length as a preparation for database tuning. An efficient physical database
design requires a solid understanding of the architecture of the underlying RDBMS that is
used for implementing the database. I have assumed here Oracle as the RDBMS and de-
scribed the storage structure accordingly. Additional discussion of Oracle-specific con-
cepts and algorithms related to physical design, implementation, and maintenance of da-
tabases are available in Corey et al. [4] and Loney [7].
Exercises
The Philanthropy Foundation, Inc. (PFI) is a Boston-based nonprofit agency that awards
grants for philanthropic activities to organizations in the U.S.A. and abroad. Started in
1966 as a small five-member private business it has now grown into a company with over
100 employees. PFI operates with these four goals:
• Invite organizations worldwide to submit proposals for receiving PFI grants;
2
Performance Tuning Methodology
Outline
2.1 Three Levels of a Database
2.2 Optimization at Each Level
2.3 Process and Metric for Performance Tuning
Key Words
References and Further Reading
Exercises
longing to the database. From a logical design standpoint it includes the definitions of ta-
bles, indices, constraints, views, triggers, procedures, functions, packages, synonyms, etc.
From a physical design standpoint it contains the specifications of tablespaces, datafiles,
and the storage clauses of all tables and indices. To sum up, the conceptual level provides
all the details for implementing a database. One only needs to write the DDL commands
and programs to convert the conceptual level into an operational database. Since the con-
ceptual level is independent of implementation, it can be developed using only pseu-
docode. But a data dictionary built with a CASE tool is the best vehicle for building such
a conceptual level. As an example, the CASE tool Oracle Designer offers such a facility.
One can draw an E/R diagram with Oracle Designer and start to build at the entity-
attribute level, and then gradually make a transition to the logical design and then to the
physical design phases. Oracle Designer captures all the information in its repository and
can also generate all the script files from it with which the developers or the DBAs can
then implement the database.
The internal level, also called the physical level, is the one closest to the physical
storage, where the data contents of the conceptual level are stored on disk files. In addi-
tion, the internal level contains a set of memory resident data structures that Oracle calls
the System Global Area, and a set of background processes. Similar to the conceptual
level, there is only one internal level. The rows of the tables in the conceptual level are
often called logical records, and their counterparts at the internal level are called physical
records.
The external level is closest to the users. It is concerned with the way in which the us-
ers view the data for their own use. Thus, different users access different external levels
of the database. The user can be an end user, a database developer, or a DBA working on
some specific query generation. Each external level consists of a subset of the conceptual
level. In this sense, the latter can be regarded as a set-theoretic union of all possible ex-
ternal levels. Hence, for a given database, there can be many external levels, but only one
conceptual level. Some authors call the conceptual level the logical level.
The tools used for building the external levels are SQL-based and can be GUI type or
executed at the command line. In an n-tier (n ≥ 3) client-server architecture the GUI type
external levels normally reside on the application tier(s) comprising levels 2 through n –
1. They access the database residing on the database server at the nth tier through the in-
tervention of application programs. But the external level of any user accessing the data-
base through the command line interface resides on the database server at the nth tier. An
example is a DBA who wants to test the performance of a query and types the query di-
rectly at the SQL prompt.
During the five-phase database development process discussed in Chapter 1 the con-
ceptual level is designed starting with the enterprise modeling (Phase (1)) and continuing
through the logical database design (Phase (2)). The external levels are directly related to
the reporting needs of the users. Hence their design starts during the logical database de-
sign (Phase (2)) and continues through the physical database design (Phase (3)). The in-
ternal level is designed during the physical database design (Phase (3)). All three levels
as designed are then implemented during the database implementation (Phase (4)). The
performance monitoring and tuning of the database at all three levels are conducted dur-
ing database maintenance (Phase (5)).
The three levels communicate with one another through sets of mappings (see [4,
Chapter 1]), as shown in Figure 2.1:
1. External to conceptual level mapping, and
2. Conceptual to internal level mapping.
The first mapping defines the correspondence between a particular external level and
the conceptual level. It maps each column referenced in the external level to its counter-
part in the conceptual level. Since there are many external levels but only one conceptual
level, this mapping is many-to-one. The second mapping specifies how the columns in
the conceptual level map onto their stored counterparts in the internal level. Since each
logical record maps in a one-to-one manner to a physical record, this mapping is one-to-
one.
The external/conceptual level mapping ensures the logical data independence. As the
database grows, it may become necessary to modify the conceptual level by adding new
data structures. But such changes do not affect the existing external levels. New external
levels can be generated to utilize the modified conceptual level. Similarly, the concep-
tual/internal level mapping implements the physical data independence. This means that
the internal storage structure for database objects can be changed without affecting the
conceptual level. Internally, the DBMS keeps track of the actual storage locations of all
objects. Application programs using the conceptual level need not be changed at all. The
DBMS will provide the necessary connections to run such programs under the changed
internal level.
FIGURE 2.1: The three levels of a database. The many external levels map many-to-one onto the single conceptual level, which in turn maps one-to-one onto the internal level.
Since the starting point of a database application is the conceptual level, the optimization
process should start here. If the five-phase database development process described in
Sections 1.3 through 1.7 is followed properly and is well documented in the data dic-
tionary as outlined there, then the resulting conceptual level is optimized for perform-
ance, when implemented. If, however, some queries at the external level require five or
more joins due to normalization necessitated by 3NF tables, then some level of denor-
malization may be needed for performance reasons. This should be the only deviation
from the normalization principle.
Two other measures adopted for improving performance are partitioning of the con-
ceptual level, as allowed under Oracle 8i, and data replication. Very large tables may be
partitioned into separate smaller fragments for parallel processing and conceptual levels
may be copied at different locations through data replication to reduce network traffic in
a distributed database. But these two measures do not interfere with normalization.
As noted in Section 2.1, the conceptual level is implemented through a one-to-one map-
ping onto the internal level. If the internal level is properly designed according to the
guidelines described in Sections 1.5 and 1.6, it starts at an optimal stage. But fragmenta-
tion of used space of the tables and of free space in their respective tablespaces can occur
due to improper sizing. Fragmentation occurs when the extents of a table are not con-
tiguous, or when the free space of a tablespace consists of many small sets of non-
contiguous extents instead of a few large sets of contiguous extents. It degrades database
performance because a single table with many noncontiguous extents is scattered over
many small isolated fragments. Therefore, a data search may have to traverse many
fragments to access the required data.
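A quick way to spot such segments is to count the extents each segment occupies in the data dictionary view DBA_SEGMENTS. A minimal sketch follows; the threshold of 100 extents is only an illustrative choice.

SELECT owner, segment_name, segment_type, extents
FROM   dba_segments
WHERE  extents > 100
ORDER  BY extents DESC;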
Chaining and migration of rows (see Section 5.9) in a table can occur due to inserts
and updates of records when its PCTFREE parameter is set incorrectly or when columns
with VARCHAR2 data types that were initially NULL start getting updated with nonnull
values. Excessive chaining degrades performance because retrieval of chained data in-
volves following a chain of pointers, which is slower than retrieving physically contigu-
ous data. Tables with LONG or LONG RAW data types deserve special attention since
they almost always cause chaining. Adequate planning during the physical database de-
sign discussed in Section 1.5 is required to handle such issues. Constant monitoring of
such phenomena during database maintenance is needed to keep the internal level opti-
mized.
Query optimization is the tuning mechanism for the external level. A given query can
usually be implemented in multiple ways. The query optimizer in an RDBMS functions
as follows (see Sections 8.2 and 8.3).
• Formulates one or more internal representations of the query using relational algebra
or relational calculus, the two underlying mathematical formalisms for representing
queries;
• Converts the representation(s) into a single equivalent canonical form that can be
processed more efficiently;
• Selects an optimal access path, often called the execution plan for the query, to im-
plement the query.
This improvement in implementation strategy is called query optimization. The strat-
egy applies to both interactive and embedded forms of SQL. The latter is used in pro-
grams written in procedural languages such as PL/SQL and host language interfaces such
as Pro*C, Pro*C++, Pro*COBOL, etc. Query optimization strives to minimize the re-
sponse time for the query or maximize the throughput.
A word of caution! The example discussed below assumes a good level of knowledge
about the internal and external levels of a database. If the reader lacks such knowledge,
the example will not be very meaningful. Please note, however, that the different chapters
of the book cover all these materials. Therefore, the reader may want to revisit this exam-
ple to understand its contents after completing Chapters 1 through 8.
Now to the example! I was working with an Oracle8 database comprising some 70 ta-
bles and 300 indices including primary and foreign keys. Some of the tables contained
millions of rows and the entire database at production level would comprise some 200
gigabytes of storage. One developer was running a view based on five joins involving six
tables and was getting very poor performance. When she complained to me about the
situation and asked me for a suggestion, my first reaction was to go for denormalization
to reduce the number of joins. We often find that the query performance starts to degrade
when the number of joins exceeds four. However, I regard denormalization as the last re-
sort, because it changes the structure of the conceptual level and can have adverse side ef-
fects that may not be uncovered and resolved very easily. Therefore, I proceeded with my
three-step principle as follows.
(a) Localize the problem and trace it to its root.
Since the long response time was caused by the query, I used the EXPLAIN PLAN
utility with SET AUTOTRACE ON to derive and analyze the execution plan of the query
(see Section 8.7 and its subsections). Nothing of a glaring nature appeared there. The
query was using a driving table with full table scan and was accessing all the other tables
in indexed search, as expected. No additional indexing would help. So, the root cause was
not there.
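A minimal sketch of how such an execution plan can be displayed from SQL*Plus is shown below. The two-table query on the demo tables EMP and DEPT merely stands in for the actual six-table view, and AUTOTRACE assumes that PLAN_TABLE and the PLUSTRACE role have already been set up.

SET AUTOTRACE TRACEONLY EXPLAIN
SELECT e.ename, d.dname
FROM   emp e, dept d
WHERE  e.deptno = d.deptno;
SET AUTOTRACE OFF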
Next I wanted to find the level of chaining (see Section 5.9) in the tables, if any. Note
that one has to ANALYZE a table to get its chaining level. So, I ANALYZEd each of the
six tables with the COMPUTE STATISTICS option. Then I queried the Oracle data dic-
tionary view DBA_TABLES to retrieve the values of three of its columns, NUM_ROWS,
CHAIN_CNT, and AVG_ROW_LEN, for each of the six tables. The driving table that
was being searched sequentially had the statistics:
NUM_ROWS = 10,003
CHAIN_CNT = 9,357
AVG_ROW_LEN = 8,522
The driving table had 93.5% (= 9,357/10,003) chaining, which is excessive. Almost
every row was being chained! The reason was not difficult to find. The database was us-
ing blocks of size 8 K (=8,192 bytes), whereas an average row needed 8,522 bytes to be
stored. Hence almost every row had to be chained to a second data block for storage. So,
indeed I found the root cause!
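A minimal sketch of the two steps; the owner and table names are hypothetical stand-ins for the actual driving table.

ANALYZE TABLE driving_tab COMPUTE STATISTICS;

SELECT num_rows, chain_cnt, avg_row_len
FROM   dba_tables
WHERE  owner = 'APPOWNER'
AND    table_name = 'DRIVING_TAB';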
(b) Fix the root cause and test to ensure that the problem does not reappear.
I could not fix the root cause by increasing the data block size. One assigns the data
block size at the time of creating the database and it cannot be changed short of recreating
the database. So, I looked at the columns of the offending table and found one column
with its data type as LONG. Since Oracle allows a LONG data type to be of size up to
two gigabytes, the reason for such excessive chaining became obvious.
I split the driving table P, say, into parts, P1 and P2. P1 contained all the columns ex-
cept the column with the LONG data type, and P2 contained only the primary key of the
original table and the column with the LONG data type. P1 and P2 were related in a
PK/FK relationship that was maintained by referential integrity. The view that started this
whole range of tuning investigation did not use the column with the LONG data type and,
therefore, was reformulated with P1 replacing the driving table P.
I ran the query with SET TIMING ON and got an average of two seconds as the run-
ning time, which was well within the performance goal.
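A minimal sketch of such a split is given below. The table P, its primary key PK, and the remaining columns are hypothetical; note that LONG data cannot be copied with INSERT ... SELECT, so the SQL*Plus COPY command or application code is one way to move it into P2.

CREATE TABLE p1 AS
  SELECT pk, col_a, col_b          -- every column of P except the LONG column
  FROM   p;

ALTER TABLE p1 ADD CONSTRAINT p1_pk PRIMARY KEY (pk);

CREATE TABLE p2
  (pk        NUMBER      CONSTRAINT p2_pk PRIMARY KEY,
   long_col  LONG,
   CONSTRAINT p2_fk FOREIGN KEY (pk) REFERENCES p1 (pk));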
(c) Look for any adverse side effects caused by the fix.
This was a clean solution since an adverse effect could arise only from using the table
P2 mentioned above. All queries that used P before but did not need the LONG column
were reformulated to use P1. Any query that needed the LONG column and one or more
column(s) from P1 was formulated using both P1 and P2 with a join via P1.PK and
P2.PK. Those happened to be rare occasions and did not cause any problem.
Tuning goals are always set in measurable terms so that it can be verified objectively if
the goals have been met. A database transaction involves data retrieval and update. Since
data reside within a data block in auxiliary storage, every transaction requires access to a
block. The data must be fetched in memory for processing. Since the memory access time
is enormously fast, the time needed for any computation on the data inside a block resi-
dent in memory is negligible compared to the time needed for transferring the data from
auxiliary storage to memory. The block access time for data needed by a query is defined
as the time to fetch those data from auxiliary storage to memory. Let
tb = block access time for data needed by a query
tr = response time for that query.
Then,
tr = O (tb), where O stands for the order of magnitude.
Since tb is much simpler to compute compared to tr, we shall use tb as the metric for
measuring performance. Tuning activities are guided by the goal of reducing tb as far as
practicable. As we show in the succeeding chapters, tb is affected by a variety of factors
such as hit ratios, memory utilization, query design, logical database design, fragmenta-
tion of free space, chaining and migration of rows in a table, etc. Each of these issues
needs to be addressed.
Since it is easier to rectify flaws during the development and test phases instead of the
production phase, Oracle recommends that a new application should be tuned in the order
(see [5, pp. 1–8]):
• Design,
• Application Code,
• Memory,
• Auxiliary storage,
• Contention, and
• Operating System.
If the application is already in production, the order of the above six steps should be
reversed. Normally, the first two steps are the responsibility of the system architects and
the developers, and the last four are that of the DBAs. We should strive for an environ-
ment where the two groups work together.
Conceptual Level
• Denormalization
• Partitioning of Very Large Tables
• Data Replication in Distributed Database
Internal Level
• Data Storage
Two main reasons for inefficiency of data storage are fragmentation of tablespaces, and
chaining and migration of rows in tables.
• Memory Usage
Here the goal is to make data available in memory instead of retrieving them from the
disks since the data transfer rate for memory is an order of magnitude faster than that
from auxiliary storage.
External Level
• Data Retrieval
Oracle’s EXPLAIN PLAN utility (see Section 8.7.1) allows the developer to examine the
access path and to determine whether any changes should be made in indexing the ta-
ble(s) involved, reducing the number of joins, etc. The selectivity of an indexed column
determines if a full table scan is preferable to an indexed search.
• Data Update
If a large table is heavily indexed besides its PK, it may be worthwhile to drop all the
indices before starting the bulk update. It really depends on the number of rows and the
time and convenience of dropping and creating the indices. Since Oracle rearranges the
B*-tree for each index with an update of the indexed table, the process becomes very
slow for bulk updates. As a rule of thumb, if the data load with indices takes longer than
10 minutes or so, then I drop the indices first, load the data into the table, and then create
the indices for the updated table. Bulk deletes are best handled through the TRUNCATE
command. Although this command uses rollback segments to handle adjustments to the
segment header and to the data dictionary, the rollback segment space used is very small.
Hence the error of rollback segments being too small does not arise. Also, TRUNCATE
retains all the structural information of the table such as grants, indices, and constraints.
In addition, DELETE does not reclaim the space occupied by the deleted records. But
TRUNCATE with its default DROP STORAGE option deallocates all the extents of the
table except its INITIAL extent.
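A minimal sketch of both practices; the table, index, and column names are hypothetical.

-- bulk load: drop the non-PK index, load the data, then recreate the index
DROP INDEX bulk_tab_n1;
-- ... load the data (e.g., via SQL*Loader or INSERT statements) ...
CREATE INDEX bulk_tab_n1 ON bulk_tab (col_a);

-- bulk delete of all rows
TRUNCATE TABLE bulk_tab;                 -- default DROP STORAGE deallocates all but the INITIAL extent
-- or, to keep the allocated extents for an immediate reload:
TRUNCATE TABLE bulk_tab REUSE STORAGE;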
Key Words
access path
auxiliary storage
cache memory
canonical form
CASE
chaining
client-server architecture
conceptual level
Data Definition Language (DDL)
data dictionary
data replication
data transfer rate
database, distributed
datafile
denormalization
driving table
execution plan
external level
fragmentation
full table scan
host language interface
indexed search
internal level
join
logical data independence
logical design
logical record
mapping, conceptual to internal level
mapping, external to conceptual level
N-tier
optimization
Oracle Designer
parallel processing
partitioning
PCTFREE
physical data independence
physical design
physical record
primary key
pseudocode
query optimization
referential integrity
response time
selectivity
System Global Area
tablespace
unique key
References and Further Reading
Mittra [4, Chapter 1] contains a good discussion of the three levels of a database. The
notion of optimizing the database performance at each level was introduced in Mittra [3]
at an international Oracle User Week Conference. The performance and tuning methods
of databases at all levels are covered in Corey et al. [1] and Loney [2]. These topics are
explored in significant detail in the rest of the book. Shasha [6, Chapter 1] describes how
tuning activities can be handled in a systematic well-planned manner. Ullman [7, Chapter
6] discusses the physical data organization from a mathematical viewpoint and explains
the basis of using block access time as a metric for tuning.
Exercises
1. The dictum of “Divide and Conquer” is often used as a principle in solving problems
in various disciplines including computer science.
a. Can you use this principle to handle database performance tuning?
b. How would it differ from the three-step methodology described in Section 2.3?
2. Use the database that you designed for GAMES (Exercises in Chapter 1).
a. Using structured English or pseudocode describe the external levels below:
• List of all grantees in Argentina, India, and Japan for their projects dealing
with the establishment of eye clinics in their countries;
• List of all project names with respective start and end dates that received
“adverse” findings on scrutiny.
b. Identify the entities referenced by each external level above.
3. Refer to the entity RFP in the GAMES database of Chapter 1. Assume that you have
created a table RFP matching this entity under Oracle. The following questions apply
to this Oracle table RFP.
a. Can you think of any column in RFP that should have the LONG data type? If
so, how would you split RFP into two separate tables to avoid the kind of prob-
lems described in Section 2.3.1?
b. Can you think of more than one column in RFP that should have the LONG data
type? Since the table RFP can have only one column with LONG datatype, how
would you resolve the problem?
c. Repeat parts (a) and (b) above for the table GRANT_SCRUTINY matching the
entity GRANT SCRUTINY in its structure.
4. A tuning action mostly results in some structural change of the database. For exam-
ple, a new index may be created, size of next extents may be changed, a new table-
space may be created, etc. As a result, some or all of the three levels of the database
may change.
a. How will you ensure that the three levels will still continue to be mutually con-
sistent?
b. What tools (e.g., CASE tool, desktop DBMS, spreadsheet, etc.) do you need to
implement your proposal?
3
Tuning the Conceptual Level of a Database
Outline
3.1 Three Versions of the Conceptual Level
3.2 Performance Issues at the Conceptual Level
3.3 Denormalization of the Conceptual Level
3.4 Optimal Indexing of Tables
3.5 Integration of Views into Queries
3.6 Partitioning of Tables and Indices
3.7 Data Replication
Key Words
References and Further Reading
Exercises
Very large tables in a database can be partitioned to allow parallel processing of data
yielding lower response time. Only tables and indices can be partitioned dynamically into
smaller components under Oracle 8i. All partitions have the same logical attributes (e.g.,
column names, data types, etc.), but may have different physical attributes related to stor-
age. The partitions of an index need not match those of the table to which the index be-
longs. The partitions can be implemented at the time of creating the tables or indices, or
can be added later.
Data replication applies to a database that resides in multiple sites, as with a distrib-
uted database. It reduces network traffic generated by queries accessing data from multi-
ple sites. The query is processed with the local replica of the database, as far as practica-
ble. Oracle 8i offers two types of data replication, basic and advanced. Basic replication
provides at each remote site read-only replicas of tables selected for data replication. Ad-
vanced replication extends the features of basic replication by allowing updatable replicas
at remote sites.
The development database is designed initially with the goal of making all tables in third
normal form (3NF). As a result, data are scattered over many tables. But the user view of
the database is geared towards running ad hoc queries and generating reports. It is not
surprising for a 3NF-only database to use joins of 6 or more tables to create a report or a
view. Normally, response time starts to degrade when more than 4 tables are involved in
a multi-table join. I have seen queries involving over 12 tables take 18 to 20 minutes to
produce the result. That is unacceptable in an OLTP environment. An Oracle expert once
remarked to the effect that no production database can run entirely in 3NF. Let us examine
why a large number (> 4) of joins leads to poor performance.
A join is a binary operation. When n (> 2) tables are joined in a query, Oracle desig-
nates one of them as the driving table and performs a full table scan of it to start the first
join. The resulting row set is labeled a driving set and is joined with the third table to
produce the next driving set. This process is repeated n - 1 times to exhaust the n tables
and return the final result. Depending on the availability and selectivity of the indices in
the tables and the sizes of the driving sets, the query uses indexed search and/or full table
scans. The actual query processing and optimization are discussed in Chapters 8 and 9.
But the concept of denormalization can be understood without knowing the details of
query processing.
The response time of a join query can improve if we can reduce the number of joins
in the query. To do that, we eliminate one or more tables by bringing their relevant col-
umns into another table. Thereby we degenerate 3NF tables into 2NF or even 1NF tables
and allow transitive dependency to enter into such tables. This is called denormalization,
which is incorporated at the conceptual level and thereby enforced at the internal level.
We strive for improving query performance at the cost of introducing possible update
anomalies. However, an update anomaly is more of a theoretical issue than an operational
reality for a database with tables below 3NF. Hence denormalization rarely causes loss of
data through updates.
Chris Date has offered a slightly different view of denormalization. Date recommends
that denormalization should happen "at the physical storage level, not at the logical or
base relation level" [3, p. 5]. This means that the mapping between the conceptual and the
internal levels need not be one-to-one (see Section 2.1). One possible implementation of
this principle is through clustering. See Section 8.6.5 for a discussion of clusters.
Let us take an example of denormalization. Figure 3.1, A five-table database, shows the
schema involving five tables, CUSTOMER, TRANSPORT_ROUTE, ROUTE_PLAN,
ORDER, and STATE. The business situation is as follows.
A CUSTOMER places an ORDER for one or more items. Each ORDER is tied to a
TRANSPORT_ROUTE, which follows a ROUTE_PLAN. Two of the columns in the
ROUTE_PLAN table are ORIGIN_STATE and DESTINATION_STATE, which are
validated through its validation table STATE. For the customer named Acme Corpora-
tion, a developer wants to determine the name(s) of state(s) through which all orders ex-
ceeding $100,000 from that customer will be transported. She uses the following logic for
this query.
(a) Select the customer;
(b) Select the order;
(c) Select the order’s transport route;
(d) Select the route plan for the route;
(e) Select the state(s) through which this route will go.
Accordingly, she writes the following query.
SELECT STATE_NAME FROM CUSTOMER, ORDER, TRANSPORT_ROUTE,
ROUTE_PLAN, STATE WHERE
CUSTOMER_NAME = 'ACME CORPORATION' AND
ORDER_VALUE > 100000 AND
CUSTOMER.CUSTOMER_ID = ORDER.CUSTOMER_ID AND
ORDER.ORDER_NUM = TRANSPORT_ROUTE.ORDER_NUM AND
TRANSPORT_ROUTE.ROUTE_ID = ROUTE_PLAN.ROUTE_ID AND
ROUTE_PLAN.ORIGIN_STATE = STATE.STATE_CODE AND
ROUTE_PLAN.DESTINATION_STATE = STATE.STATE_CODE;
As we can see, the query involves a five-table join and may perform poorly. We can
modify the database design to reduce the number of tables to four, as shown in Figure
3.2, and the number of joins to two. The above query now can be reformulated as a three-
table join:
SELECT STATE_NAME FROM ORDER, TRANSPORT_ROUTE,
ROUTE_PLAN WHERE
ORDER.CUSTOMER_NAME = 'ACME CORPORATION' AND
ORDER_VALUE > 100000 AND
ORDER.ORDER_NUM = TRANSPORT_ROUTE.ORDER_NUM AND
TRANSPORT_ROUTE.ROUTE_ID = ROUTE_PLAN.ROUTE_ID;
The database has been denormalized in that both ORDER and ROUTE_PLAN are
now in 2NF and have transitive dependency. We sacrifice normalization to improve per-
formance.
On Line Transaction Processing (OLTP) applications update tables through online trans-
actions performed by the users. The GUI front-end accepts input data and transfers them
to the database residing on a remote tier. Data validation is maintained through referential
integrity, triggers, and stored procedures. Tables in third normal form perform well in
this environment.
A decision support system (DSS) or a data warehouse (see Chapter 11), on the other
hand, poses a different type of demand on the database. Here joins of five or more tables
can occur frequently to produce reports needed by users. Denormalization reduces the
number of component tables by combining them, thereby improving the response time.
This is the advantage of denormalization.
But denormalization has its downside too. Three major disadvantages of denormali-
zation are extra storage needed for the database, extra redundancy of data in the database,
and possible change of existing code due to change(s) in the conceptual level. As shown
in Section 3.3.2, the denormalized database eliminates the table STATE by including the
State_Name with the State_Code in the table ROUTE_PLAN and in all other tables using
State_Code. Thus, instead of being stored in only one table STATE, the column
State_Name is now stored in many tables. This leads to the need for extra storage space
for the database. The extra redundancy of data is caused by the repetition of the same
column(s) in multiple tables. As a result, when such redundant columns are updated, the
updates must be propagated and synchronized among several tables. For example, if a
new state, say, Puerto Rico with a code PR, is added to USA, then State_Name = Puerto
Rico must be added to all tables where State_Code = PR appears. Synchronizing multiple tables in this way adds overhead to every such update.
Table Level
If partitioning is planned during the conceptual level design, it is implemented with the
option of PARTITION BY RANGE added to the CREATE TABLE statement. If it is
implemented after a table has been created, one uses the ALTER TABLE command with
the option of ADD PARTITION. An existing partition on a table can be changed via any
of the six commands:
• DROP PARTITION,
• MODIFY PARTITION,
• MOVE PARTITION,
• RENAME PARTITION,
• TRUNCATE PARTITION, or
• SPLIT PARTITION.
The CREATE TABLE statement for a partitioned table specifies:
(a) Logical attributes of the table, such as column and constraint definitions;
(b) Physical attributes of the table specifying defaults for the individual partitions of the
table;
(c) Partition specification which includes:
• The table-level algorithm used to map rows to partitions based on the values of the
partition key, and
• A list of partition descriptions, one for each partition in the table.
Each partition description includes a clause that defines supplemental, partition-level
information about the algorithm used to map rows to partitions. It also includes, option-
ally, a partition name and physical attributes for the partition.
Partitioned tables cannot have any columns with LONG or LONG RAW data types,
LOB data types (BLOB, CLOB, NCLOB, or BFILE), or object types.
Index Level
Partitioning is done for indices in the same way as it is done for tables, i.e., either by the
CREATE INDEX command with the PARTITION BY RANGE clause, or by the
ALTER INDEX command with the ADD PARTITION clause. An existing partitioned
index can be changed with the five options for modification, DROP, MODIFY, MOVE,
RENAME, and TRUNCATE, as listed above for tables.
It is not necessary to partition the indices of a partitioned table. An index should be
partitioned only if it is large enough to justify partitioning. An index on a partitioned ta-
ble may be created according to the same range values as were used to partition the table,
or with a different set of range values. In the former case the table and its associated in-
dices are said to be equipartitioned. Such a choice often improves query performance
since data and their indices are accessed on the same range of values. Also, an index that
is equipartitioned with its table is easier to implement (see [1, pp. 187 – 189]). However,
the tablespace of a table partition should be placed on a different drive than that of its as-
sociated index partition to avoid data contention problems, because a data table and its
index table are accessed together during an indexed search.
See Section 10.3 for additional information on partitions.
Let us suppose that the table ORDER of Section 3.3.2 is partitioned on the column
ORDER_VALUE, which has the data type NUMBER. We create the partitioned table as
follows:
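A minimal sketch of such a statement is given below. The table is named ORDERS here because ORDER is an Oracle reserved word; the columns, partition names, range boundaries, and tablespaces are illustrative assumptions.

CREATE TABLE orders
  (order_num     NUMBER        CONSTRAINT orders_pk PRIMARY KEY,
   customer_name VARCHAR2(30),
   order_value   NUMBER)
  PARTITION BY RANGE (order_value)
  (PARTITION orders_low  VALUES LESS THAN (100000)   TABLESPACE data_1,
   PARTITION orders_mid  VALUES LESS THAN (1000000)  TABLESPACE data_2,
   PARTITION orders_high VALUES LESS THAN (MAXVALUE) TABLESPACE data_3);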
Key Words
1NF
2NF
3NF
acceptance test
advanced replication
basic replication
CHECK (constraint)
conceptual level
contention
data consistency
data dictionary
data integrity
References and Further Reading
Very few books discuss conceptual level tuning. Loney [4, p. 163–170] contains some
materials about the three versions of a database. Cursory references to denormalization
appear in [1] and [5], but no examples are given. Partitioning is treated in detail with ex-
amples of sample SQL code in [1, Chapter 5], [2, Chapter 7], and [4, Chapter 12]. In par-
ticular, [4] contains SQL commands with syntax for creating and modifying partitions.
Data replication is treated as a network traffic issue in both [2, Chapter 11] and [4,
Chapter 16].
bles and indices as possible modes of tuning the conceptual level. However, clustering is
not offered by all RDBMS vendors. Oracle supports clustering. Denormalization, on the
other hand, is independent of the RDBMS used. Date [3, pp. 4–5] wants denormalization
only at the physical level.
Exercises
1. As a tuning method denormalization does not go well with updates. Why? Can you
identify update situations where denormalization is less harmful?
2. Read the reference Shasha [5] cited in References and Further Reading above. Ex-
plore clustering as a tuning option for the conceptual level. Describe your findings
with actual examples of both denormalization and clustering.
3. Using your own experience as a database professional describe situations where par-
titioning helped to improve response time.
4. Identify the disadvantages of overindexing tables in a database.
5. How will you implement denormalization after a database application has gone into
production?
6. Repeat Question 5 for partitioning.
Part 2
Oracle Tools for Tuning and Optimization
Part 2 consists of seven chapters that cover the Oracle tools for monitoring the perform-
ance of a database and tuning and optimizing its internal and external levels, as needed.
This part is specific to Oracle 8i with a glimpse into Oracle 9i in Chapter 10. The goal of
Part 2 is to provide the underlying tuning principles and a repertoire of Oracle tools to
implement these principles. Chapters 4 to 7 discuss the tuning of the internal level and
Chapters 8 and 9 that of the external level. Chapter 10 describes several features of Ora-
cle 8i not covered in the earlier chapters and gives an overview of several tuning features
of Oracle 9i.
Chapter 4, Internal Level of an Oracle Database, introduces the structure of the inter-
nal level comprising an instance and a database. The instance consists of a set of mem-
ory-resident data structures and a set of background processes. Their tuning principles are
treated in detail in Chapter 6, Tuning of Memory-Resident Data Structures. The database
consists of a set of disk-resident data structures, namely, the tablespaces, tables, indices,
etc. Their tuning principles are treated in detail in Chapter 5, Tuning of Disk-Resident
Data Structures. Chapter 7, Oracle Utility for Tuning and Optimization, involves a de-
tailed discussion of the two Oracle diagnostic tools UTLBSTAT and UTLESTAT.
Chapter 8, Optimization of the External Level of an Oracle Database, introduces the
mathematical theory underlying the query optimization process. This is followed by a
detailed treatment of Oracle’s optimization tools such as EXPLAIN PLAN, SQLTRACE
and TKPROF, and AUTOTRACE. Then, Chapter 9, Query Tuning and Optimization
Under Oracle 8i, discusses the rule-based and the cost-based optimizers, various joining
techniques, and the use of hints in queries to suggest specific query execution plans.
Chapter 10, Special Features of Oracle 8i and a Glimpse into Oracle 9i, covers several
special features of Oracle 8i pertaining to performance tuning and optimization that were
not covered in the earlier chapters. It closes with an overview of some of the features of
Oracle 9i.
4
Internal Level of an Oracle Database
Outline
4.1 Components of the Internal Level
4.2 Oracle Instance
4.3 Oracle Database
4.4 Tuning Methodology for Internal Level
4.5 Oracle Data Dictionary
4.6 V$ Views and X$ Tables
4.7 Initialization Parameters for an Oracle Database
Key Words
References and Further Reading
Exercises
The SGA and the background processes together constitute an Oracle instance. Sec-
tion 4.2 describes the function of each structure and each background process of the Ora-
cle instance in more detail.
FIGURE: Instance (SGA and background processes) + Database (auxiliary storage files).
Multiple instances can map onto a single database. The number of instances that can be
associated with a given database is determined by the value of MAXINSTANCES in the
CREATE DATABASE command. Sections 4.2.1 and 4.2.2 describe respectively the
memory structures and the background processes that together constitute an Oracle in-
stance.
When an instance is started, Oracle creates the operational background processes.
When the database is mounted, it is available for DBA activities only. Next as the data-
base is opened, all the physical structures become operational. At this time, the database
becomes available to all users. Installation is a separate process that may not actually cre-
ate a database or open an instance.
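A minimal sketch of these stages, as they can be stepped through manually from Server Manager or SQL*Plus connected with administrative privileges:

STARTUP NOMOUNT              -- instance started: SGA allocated, background processes created
ALTER DATABASE MOUNT;        -- database mounted: available for DBA activities only
ALTER DATABASE OPEN;         -- database open: available to all users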
The memory resident structures of an Oracle instance reside in the System Global Area.
The size of the SGA can be found as follows:
SQL> select * from V$SGA;
NAME VALUE
------------------ ----------
Fixed Size 69616
Variable Size 75911168
Database Buffers 81920000
Redo Buffers 278528
A transaction in an Oracle database is defined as a logical unit of work that executes
a series of SQL statements such as INSERT, UPDATE, and DELETE to update objects.
The transaction begins with the first executable INSERT, UPDATE, or DELETE state-
ment in the logical unit and ends either by saving it via COMMIT or discarding it via
ROLLBACK. Although a retrieval operation involving the SELECT command does not
update any object, it can be regarded as a transaction in an extended sense that includes
data retrieval. A database transaction, retrieval or update, has the following features.
(a) It needs data from segments such as tables, indices, clusters, etc.
(b) An update transaction must be recorded in some log file so that in case of a system
crash before the transaction is committed or rolled back, the database can be recov-
ered and data integrity can be maintained.
(c) The locations of the needed data elements are available from appropriate views of
the Oracle data dictionary. The parsed and compiled form of the query used by the
transaction should be stored in some location for later retrieval, if needed. If the
same query is reused, it need not be recompiled.
The SGA is a location in memory that is divided into three main components, called
data block buffers, redo log buffers, and shared SQL pool corresponding respectively to
the features (a), (b), and (c), listed above.
The redo log buffer is a circular buffer. Redo entries that have already been written to
the redo log files are overwritten by new entries. LGWR normally writes fast enough to
ensure that space is always available in the redo log buffer for new entries.
One can obtain a complete list of the background processes running in an Oracle instance
as follows.
COLUMN DESCRIPTION FORMAT A30
SELECT NAME, DESCRIPTION FROM V$BGPROCESS ORDER BY NAME;
A partial output is shown below.
NAME DESCRIPTION
---- -------------------
ARCH Archival
CKPT checkpoint
DBW0 db writer process 0
DBW1 db writer process 1
DBW2 db writer process 2
DBW3 db writer process 3
DBW4 db writer process 4
DBW5 db writer process 5
DBW6 db writer process 6
DBW7 db writer process 7
DBW8 db writer process 8
DBW9 db writer process 9
LCK0 MI Lock Process 0
LCK1 MI Lock Process 1
LCK2 MI Lock Process 2
LCK3 MI Lock Process 3
LCK4 MI Lock Process 4
LCK5 MI Lock Process 5
LCK6 MI Lock Process 6
LCK7 MI Lock Process 7
LCK8 MI Lock Process 8
LCK9 MI Lock Process 9
LGWR Redo etc.
Archiver (ARCH)
An Oracle instance can run in ARCHIVELOG mode or in NOARCHIVELOG mode,
which is the default. Under the latter option, when one redo log file becomes full, LGWR
writes to the next redo log file, and continues this process until the last redo log file is
full. Then, LGWR writes to the first redo log file overwriting its contents. If the instance
runs in ARCHIVELOG mode, the contents of a redo log file are copied onto a tape or a
disk file before LGWR is allowed to overwrite its contents. This archiving function is
performed by the background process ARCH. In this case, the database recovery from a
disk crash can be done quickly and online since no transactions are lost.
An instance can be made to run in ARCHIVELOG mode by issuing the command
ALTER DATABASE ARCHIVELOG;
The automatic archiving is started by setting the initialization parameter
LOG_ARCHIVE_START to the value TRUE. Under Oracle 8i multiple ARCH I/O slave
processes can run at the same time, thereby improving performance. The number of
ARCH I/O slaves that can run simultaneously at a given time is determined by the ini-
tialization parameter ARCH_IO_SLAVES.
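The ALTER DATABASE ARCHIVELOG command must be issued while the database is mounted but not open. A minimal sketch of the sequence, with the archive destination shown as a hypothetical path:

SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;

The related init.ora entries would then read, for example:

log_archive_start = true
log_archive_dest  = /u01/oradata/arch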
If an instance runs under ARCHIVELOG mode, three background processes, LGWR,
CKPT, and ARCH, need to be properly coordinated. LGWR writes sequentially in batch
the contents of the redo log buffer to the online redo log files. It then causes DBWR to
write to the datafiles all of the data blocks that have been modified since the last check-
point. When DBWR is finished with its task, CKPT is invoked. Then, CKPT updates the
datafile headers and control files to record the checkpoint. The frequency of invoking
CKPT is determined by the initialization parameter LOG_CHECKPOINT_INTERVAL.
ARCH performs the archiving function by making a copy on a disk device of each online
redo log file before overwriting it. As a result, a delay occurs if ARCH is busy archiving
while LGWR and CKPT wait for overwriting the already full online redo log file. It is a
tradeoff situation. But I always recommend ARCHIVELOG mode so that the database
can be recovered from media failure. EXPORT/IMPORT only protects from instance
failure.
The delay caused by ARCHIVELOG mode can be addressed as follows. Until the
system becomes stable, set LOG_CHECKPOINT_INTERVAL to a value that enforces a
checkpoint uniformly distributed over the entire file. For example, suppose that each redo
log file is 50 MB. Compute a value for the parameter LOG_CHECKPOINT_INTERVAL
to enforce a checkpoint at every 10 MB level, say. Then there will be five uniformly dis-
tributed checkpoints for each redo log file. When the system is deemed stable and, there-
fore, crashes very rarely, set LOG_CHECKPOINT_INTERVAL to a value larger than
the online redo log file, thereby forcing a checkpoint to coincide with a log switch, which
happens only when the redo log file is full. This will minimize the contention among
LGWR, CKPT, and ARCH processes.
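As a rough illustration, LOG_CHECKPOINT_INTERVAL is expressed in operating system blocks; assuming the typical 512-byte redo log block, a checkpoint at every 10 MB of a 50 MB redo log file corresponds to

LOG_CHECKPOINT_INTERVAL = (10 * 1024 * 1024) / 512 = 20,480

so the init.ora entry would read

log_checkpoint_interval = 20480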
Snapshots (SNPn)
SNP processes refresh snapshots and handle the scheduling of internal job queues. The suffix “n”
can be a number or a letter to signify each distinct SNP. The total number of SNPn proc-
esses is determined by the initialization parameter JOB_QUEUE_PROCESSES.
Recoverer (RECO)
RECO is used only for distributed databases to resolve failures. It accesses the distributed
transactions that are suspected to have failed and resolves the transactions by commit or
rollback.
Lock (LCKn)
LCK processes are used only when the Oracle Parallel Server option is used. Here the
suffix “n” ranges from 0 to 9. The processes LCKn are used for inter-instance locking.
The total number of LCKn processes is determined by the initialization parameter
GC_LCK_PROCS.
Dispatcher (Dnnn)
Dnnn processes are part of the multithreaded server (MTS) architecture. They help in
minimizing resource needs by handling multiple connections. These are created at data-
base startup based on the SQL*Net configuration and can be removed or additional proc-
esses can be created while the database is open.
Server (Snnn)
Server processes are created for managing those database connections that require a dedi-
cated server.
Figure 4.3 shows a schematic view of an Oracle instance.
FIGURE 4.3: Schematic view of an Oracle instance: the background processes (SMON, PMON, DBWR, LGWR, CKPT, ARCH, and the server processes Snnn) operating against the database files (datafiles, online and archived redo log files, and control files).
Oracle's Optimal Flexible Architecture (OFA) simplifies the operation of a database by separating its tablespaces by their types and activities. Since all
other database objects reside within their respective tablespaces, the configuration created
by the OFA optimizes database performance.
The basic OFA consists of the tablespaces with their contents listed below:
• SYSTEM: Data dictionary views, V$ views, SYSTEM rollback segment;
• DATA: Data tables for the application;
• INDEX: Indices created on all the data tables in DATA;
• TOOLS: Objects created by Oracle and other third party vendors;
• RBS: Rollback segments;
• TEMP: Temporary segments; and
• USERS: Objects (tables, indices, views, etc.) created by the end users.
The basic OFA is extended to handle more complex databases by including the two
additional tablespaces:
• DATA_2: The data tables usually fall into two categories, dynamic for transactions
and static for validation; DATA_1 (new name of DATA) holds the former and
DATA_2 the latter; and
• INDEX_2: Two separate tablespaces, INDEX_1 (new name for INDEX) and
INDEX_2 to hold the indices for data tables in DATA_1 and DATA_2 respectively.
Additional tablespaces may be added if necessary.
Control File
This is a binary file and is not human readable. It contains control information about all
the files in the database. The file is used for maintaining the physical architecture and in-
ternal consistency of the database, and for any database recovery operation. This file is
vital for the database in that a database cannot start without its control file. For this rea-
son, it is recommended that multiple, say, three, copies of the control file be kept on three
separate disk devices. In case of media crash at least one copy will hopefully survive to
bring the database back into operation.
Trace Files
There are two types of trace files: the background process trace file and user process trace
file. Each background process running in an Oracle instance keeps a trace file to record
significant events encountered by the process. There are as many trace files as there are
running background processes. These files are used for uncovering the cause of major
failure of the database. A user process trace file can be generated by server processes at
user request to display resource consumption during statement processing.
The location of the background process trace file is determined by the initialization
parameter BACKGROUND_DUMP_DEST. Its default values under UNIX and NT are
respectively $ORACLE_HOME/rdbms/log and %ORACLE_HOME%\Rdbms80\Trace.
The location of the user process trace file is determined by the parameter
USER_DUMP_DEST. Its default values under UNIX and NT are respectively
$ORACLE_HOME/rdbms/log and %ORACLE_HOME%\Rdbms80\Trace. The size of
the user process trace file is determined by the initialization parameter
MAX_DUMP_FILE_SIZE. User process tracing can be enabled at the instance level by
setting the initialization parameter SQL_TRACE to TRUE. Alternatively, tracing can be
enabled at session level by the command
ALTER SESSION SET SQL_TRACE=TRUE;
All trace files have .TRC as the extension.
FIGURE: An Oracle database consists of datafiles for tablespaces, redo log file groups (online and archived), and control files.
The data dictionary views DICT and DICT_COLUMNS serve as a directory to the data dictionary: DICT lists every dictionary view with a brief description, and DICT_COLUMNS lists each view with its component columns and their descriptions. Figures 4.5 and 4.7 contain
two scripts for querying DICT and DICT_COLUMNS respectively to show the kind of
information that is available from them. Figures 4.6 and 4.8 contain partial listings of the
sample output from these scripts. The data dictionary views belong to three major catego-
ries, ALL, DBA, and USER with the names of the views starting with prefixes ALL_,
DBA_, and USER_ respectively.
Views prefixed with USER_ contain information on all database objects owned by the
account performing the query. Those with the prefix ALL_ include the information in
USER_ views as well as the information about objects on which privileges have been
granted to PUBLIC or to the user. The views prefixed with DBA_ are all-inclusive in that
they contain information on all database objects irrespective of the owner.
Figure 4.6 contains a partial listing of the output from the script.
Figure 4.8 contains a partial listing of the output generated by the script file shown in
Figure 4.7.
FIGURE 4.8 (continued): Data Dictionary Views with Columns (Partial List)
4.6.1 V$ Views
Figure 4.9 contains a script for querying the view V$FIXED_VIEW_DEFINITION and
prints a list of all such views with their respective definitions.
Figure 4.10 contains a partial output from running the script file shown in Figure 4.9.
4.6.2 X$ Tables
Oracle provides very scanty documentation about the X$ tables that underlie the V$
views. These tables store up-to-date information about database activities since the last
startup. The tables cannot be updated or dropped. As they reside in memory, there is very
limited access to them, which makes them very difficult to use. Also, unlike the data dic-
tionary views and the V$ views, the names of the X$ tables are very cryptic and do not
give any indication of their contents. The SELECT statement is the only command that
can be run against them. An error occurs if one attempts to grant the SELECT privilege
on these tables to another user. As a result, SQL*Plus reporting commands cannot be
executed on the X$ tables. However, one can use a "bypass" procedure described in Fig-
ure 4.11 to create views from two tables, x$kcbrbh and x$kcbcbh, and then execute
SQL*Plus reporting commands against these views.
Now, we can write script files using SQL*Plus commands to query the views
x$kcbrbh and x$kcbcbh. See Figures 6.7 and 6.8 for two examples of such script files.
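A minimal sketch of that bypass, run as SYS; the view names and the grantee are illustrative assumptions, not the names used in Figure 4.11.

CREATE VIEW x_kcbrbh AS SELECT * FROM x$kcbrbh;
CREATE VIEW x_kcbcbh AS SELECT * FROM x$kcbcbh;
GRANT SELECT ON x_kcbrbh TO scott;     -- hypothetical grantee
GRANT SELECT ON x_kcbcbh TO scott;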
Values of the updateable (also called variable) parameters can be changed at any time
by editing the parameter file. But the new values will take effect only after the instance is
shut down and then restarted. The only exception to this rule is the set of dynamic pa-
rameters. They can be changed as follows via ALTER SESSION or ALTER SYSTEM
command while the instance is running:
ALTER SESSION SET parameter = value
ALTER SYSTEM SET parameter = value
The contents of a parameter file can be displayed by invoking the Server Manager and
typing
SHOW PARAMETERS
Alternatively, the script file in Figure 4.12 can be used to display the contents of a pa-
rameter file.
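Such a script is essentially a query against the dynamic view V$PARAMETER; a minimal sketch, with the SQL*Plus formatting commands of the original script omitted:

SELECT name, value, isdefault, description
FROM   v$parameter
ORDER BY name;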
Figure 4.13 shows a partial listing of the contents of the parameter file.
May 17th, 2001 Initialization Parameter Listing for EXAMPLE Database Page 1
Parameter                Parameter         Default   Parameter
Name                     Value             Value     Description
-----------------------  ----------------  --------  ----------------------------------------
always_anti_join         NESTED_LOOPS      FALSE     always use this anti-join when possible
audit_trail              TRUE              FALSE     enable system auditing
b_tree_bitmap_plans      FALSE             TRUE      enable the use of bitmap plans for
                                                     tables with only B-tree indexes
blank_trimming           FALSE             TRUE      blank trimming semantics parameter
cache_size_threshold     480               FALSE     maximum size of table or piece to be
                                                     cached (in blocks)
checkpoint_process       TRUE              FALSE     create a separate checkpoint process
Key Words
alert log file
ARCH
ARCHIVELOG
background process
checkpoint
CKPT
control file
data block buffer
data dictionary
database transaction
datafile
DBWR
initialization parameter
instance
LGWR
library cache
log switch
memory cache
Optimal Flexible Architecture
redo log buffer
redo log file
rollback segment
SGA
shared SQL pool
Exercises
The exercises given below identify some of the areas not specifically covered in Chapter
4 and should be considered an extension of the text.
1. A user schema in a database is defined as a collection of all database objects owned
by the user. The data dictionary view DBA_TABLES contains information about all
tables created by all the users of the database. How can you query DBA_TABLES to
extract information about all the tables created by a particular user?
2. Which of the following two statements is true? Why?
a. An instance can be associated with multiple databases.
b. A database can be associated with multiple instances.
3. Which of the following two statements is true? Why?
a. A log switch enforces a checkpoint.
b. A checkpoint enforces a log switch.
4. The init.ora file is created for an instance. How does the instance identify its data-
base?
5. How can you tell by looking at the init.ora file whether a database runs in the
ARCHIVELOG mode?
6. What is the difference between an alert log file and a user trace file with respect to
the information content of the files?
5
Tuning of Disk-Resident Data Structures
Outline
5.1 Disk-Resident Data Structures
5.2 Performance Tuning of Disk-Resident Data Structures
5.3 Baseline of Disk-Resident Data Structures
5.4 Changes to Database Schema
5.5 Data Block Structure
5.6 Used Space Fragmentation at the Segment Level
5.7 Severity of Free Space Shortage
5.8 Free Space Fragmentation at Tablespace Level
5.9 Row Chaining and Row Migration in Tables
5.10 Performance Tuning of Rollback Segments
Key Words
References and Further Reading
Exercises
Sections 5.3.1 through 5.3.7 provide a set of scripts to establish the baseline for the stored
data.
Figure 5.2 shows the sample output from the script file shown in Figure 5.1.
May 17th, 2001 Data File Listing for EXAMPLE Database Page 1
Tablespace File File Size in
Name Name Oracle Blocks
--------------- ------------------------------ -----------
EXPLORE_DATA D:\ORANT\DATABASE\USR2ORCL.ORA 50
D:\ORANT\DATABASE\USR3ORCL.ORA 2,560
ROLLBACK_DATA D:\ORANT\DATABASE\RBS1ORCL.ORA 2,560
SYSTEM D:\ORANT\DATABASE\SYS1ORCL.ORA 10,240
TEMPORARY_DATA D:\ORANT\DATABASE\TMP1ORCL.ORA 1,024
USER_DATA D:\ORANT\DATABASE\USR1ORCL.ORA 1,536
Script File: My_Directory\DATAFILE.sql
Spool File: My_Directory\DATAFILE.lst
The report shows that EXPLORE_DATA has two datafiles allocated to it, and the
other four tablespaces have one datafile each.
FIGURE 5.3 (continued): Script for Listing Tablespaces with Extent Information
Figure 5.4 shows the sample output from the script shown in Figure 5.3.
May 17th, 2001 Tablespace Extent Listing for EXAMPLE Database Page 1
Tablespace Initial Next Max
Name Extent Extent Extents
----------- --------- -------- --------
EXPLORE_DATA 516,096 5,120,000 505
ROLLBACK_DATA 40,960 40,960 505
SYSTEM 16,384 16,384 505
TEMPORARY_DATA 40,960 40,960 505
USER_DATA 40,960 40,960 505
Script File: My_Directory\TABLESPC.sql
Spool File: My_Directory \TABLESPC.lst
The report shows the size of the initial and next extents of each tablespace in bytes.
Each tablespace can claim up to 505 extents depending on available space in its
datafile(s).
Figure 5.6 shows a partial listing of the report against the EXAMPLE database.
Figure 5.6 provides the same information for segments that Figure 5.4 provides for
tablespaces.
FIGURE 5.7 (continued): Script for Listing Segments with Extent Allocation
Figure 5.8 shows a partial listing of the sample report from the EXAMPLE database.
May 17th, 2001 Tablespace Segment and Extent - EXAMPLE Database Page 1
Tablespace Segment Segment Ext
Name Type Name Ext Blocks ID Blocks
------------- -------- --------- ---- ------ ---- ------
ROLLBACK_DATA ROLLBACK RB1 2 150 0 25
2 150 1 125
RB2 2 150 0 25
2 150 1 125
RB3 2 150 0 25
2 150 1 125
RB4 2 150 0 25
2 150 1 125
RB5 2 150 0 25
USER_DATA INDEX PK_EMP 1 5 0 5
SYS_C00532 1 5 0 5
SYS_C00535 1 5 0 5
The report provides a complete map of the extents allocated to each segment. For example, the
rollback segment RB1 has two extents comprising 150 blocks of storage (one block =
2,048 bytes). This space is taken from the datafile D:\ORANT\DATABASE\
RBS1ORCL.ORA of size 2,560 blocks assigned to the tablespace ROLLBACK_DATA
(see Figure 5.2). The two extents are identified by their Extent IDs of 0 and 1. Similarly,
the segment CALL has one extent of size 5 blocks and with Extent ID of 0. This space is
taken from the datafile of the tablespace USER_DATA containing the segment CALL.
Oracle always assigns sequential values to Extent IDs starting with 0.
Figure 5.9 gives the table structure listing and Figure 5.10 shows a sample report for
EXAMPLE database.
Each table in Figure 5.10 is listed with its column names, column data types, and an
indication if a column can remain blank.
Figure 5.11 gives the index structure listing and Figure 5.12 shows a partial report for the
EXAMPLE database.
Figure 5.12 shows, for example, that CALL_ID is a single column index in CALL,
and (CALL_ID, CSR_ID) is a two-column composite index in CALL_CSR.
The script for listing constraints is provided in Figure 5.13, and Figure 5.14 shows a
sample partial report for the EXAMPLE database, showing the constraints for each table.
Unless the user gives a name for a constraint, Oracle assigns a name in the format
SYS_Cnnnnnn, where n is an integer. The type of a constraint can be C for CHECK, P
for primary key, R for foreign key, U for unique index, and N for the NOT NULL re-
quirement. Thus, Figure 5.14 shows that CALL_ID is the single column primary key of
CALL, (CALL_ID, CSR_ID) is the two-column primary key of CALL_CSR, CSR_ID is
a foreign key in CALL_CSR, etc. We recall that this information matches the corre-
sponding information from Figure 5.12.
Oracle uses the SYSTEM tablespace to store database objects needed for database ad-
ministration. No user objects should be stored in this tablespace. However, database ob-
jects created by any Oracle or third party tools use SYSTEM as the default database ac-
count and, therefore, any objects created by them use SYSTEM as the tablespace. To
prevent this situation, designate TOOLS as the default tablespace for the SYSTEM ac-
count and reduce the quota of this account on SYSTEM tablespace to zero by running the
following command.
alter user SYSTEM
quota 30M on TOOLS
quota 0 on SYSTEM
default tablespace TOOLS
temporary tablespace TEMP;
As a result, no database objects can be created by the SYSTEM account on the
SYSTEM tablespace. Oracle's Optimal Flexible Architecture described in Section 4.3
recommends this policy.
Run the script in Figure 5.15 to determine if any user has SYSTEM as its default or
temporary tablespace:
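A minimal sketch of such a check against DBA_USERS (not necessarily the exact script of Figure 5.15) is:
SELECT username, default_tablespace, temporary_tablespace
FROM dba_users
WHERE default_tablespace = 'SYSTEM'
OR temporary_tablespace = 'SYSTEM';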
If this script returns any user name, change its default or temporary tablespace to
TOOLS via the ALTER USER command.
As the database changes during production use, the above components also change.
The DBA must track these changes by running the above script files on a regular basis. If
the structure of any database object changes, it must be recorded at the conceptual level.
When rows are deleted from a table without dropping the table, the emptied space remains allocated to the table instead of being re-
turned as free space to the holding tablespace. This phenomenon can make the used space
in the table full of "holes". The analogy of Swiss cheese is often used to describe the
situation. When a table is truncated, the option DROP STORAGE is the default. Conse-
quently, a truncated table retains its structure, but releases its storage space for reuse. If it
is necessary to retain the storage space of a truncated table instead of releasing it, then
use the option REUSE STORAGE with the TRUNCATE command. This becomes useful
if a table is truncated and immediately afterward is populated with new data. By using the
REUSE STORAGE option Oracle avoids the overhead involved in first reclaiming the
truncated space and then immediately allocating new space for storing the new data.
For each table, Oracle maintains one or more linked lists for free data blocks available
within the extents allocated to the table. These blocks are used for INSERTs to the table.
FREELISTS, a parameter of the CREATE TABLE command with a default value of 1,
represents the number of these linked lists. Each INSERT statement refers to the
FREELISTS to identify an available block.
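As an illustration, FREELISTS is specified within the STORAGE clause when the table is created; the table and the storage values below are hypothetical.
CREATE TABLE call_log (
   call_id NUMBER,
   csr_id  NUMBER)
STORAGE (INITIAL 512K NEXT 512K PCTINCREASE 0 FREELISTS 4);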
When the data belonging to a table are scattered over many noncontiguous extents, the
used space of the segment becomes fragmented and the data retrieval takes longer. A rule
of thumb is that a segment with 100 or more extents is starting to get fragmented and
should, therefore, be examined for defragmentation. This leads us to the following rule:
(a) Run the script in Figure 5.7 regularly to monitor the number of extents (“Ext” col-
umn in Figure 5.8) allocated to each segment. Identify the segments for which this
number exceeds 100.
(b) For each segment identified in step (a), run the script of Figure 5.16. Its output (see
Figure 5.17 and its accompanying analysis below) lists each segment with all of its
extents identified with their Extent_IDs, Block_IDs, and sizes in Oracle blocks.
(c) If the extents allocated to a segment are mostly contiguous, the performance may not
suffer. Otherwise, corrective action is necessary.
May 17th, 2001 Used Space Extent Map - EXAMPLE Database Page 1
Segment Segment Ext Block
Name Type Ext ID ID Blocks
------- -------- ------- ----- ---- ------
BONUS TABLE 1 0 307 5
CALL TABLE 1 0 477 5
ITEM TABLE 2 0 362 5
1 367 5
When a segment has multiple extents, we can test if they are contiguous by using the
following rule.
IF (Block_ID)(n+1) = (Block_ID)n + (Blocks)n,
THEN (Extent_ID)(n-1) and (Extent_ID)n are contiguous;
ELSE (Extent_ID)(n-1) and (Extent_ID)n are not contiguous.
Here (Block_ID)n and (Blocks)n are the Block ID and block count of the nth extent of a given
segment, and (Extent_ID)(n-1) is its Extent ID (since Extent IDs start at 0, the nth extent has
Extent_ID n - 1), where n = 1, 2, 3, . . . .
Looking at Figure 5.17 we find that the table ITEM has two extents having the property
367 = (Block_ID)2 = 362 + 5 = (Block_ID)1 + (Blocks)1
Hence its two extents with Extent_IDs 0 and 1 are contiguous. On the other hand, the
rollback segment RB1 has two extents with the property
2402 = (Block_ID)2 ≠ 2 + 25 = (Block_ID)1 + (Blocks)1
Hence its two extents with Extent_IDs 0 and 1 are not contiguous.
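The extent maps used here are built from the data dictionary view DBA_EXTENTS. A minimal sketch of a query returning the columns needed to apply the rule (the segment name is illustrative) is given below; note that the rule applies only to consecutive extents residing in the same datafile, i.e., having the same FILE_ID.
SELECT segment_name, segment_type, extent_id, file_id, block_id, blocks
FROM dba_extents
WHERE segment_name = 'RB1'
ORDER BY extent_id;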
The root cause of segment fragmentation is wrong sizing of the initial and next extents of
the fragmented segments, primarily tables and rollback segments. Appendix A provides
algorithms and C programs to compute the extent sizes of segments. One of the input pa-
rameters needed by the C programs in Appendix A is the ROWCOUNT of each table.
Since this value is often not known during the initial sizing, an incorrect estimate of the
INITIAL and NEXT extents may occur. The DBA, therefore, needs to monitor the rate of
acquisition of new extents by the segments and then resize the NEXT EXTENTs, if nec-
essary. A better estimate of the ROWCOUNTs of the tables is needed for this exercise.
The following five-step methodology addresses the root cause of fragmentation.
(a) Perform a resizing of the table(s) and estimate the new value of the NEXT EXTENT
for each table.
(b) Perform an export of the affected tables labeled table_1, table_2, …,table_n, say,
with the following specification
exp username/password file=expdat.dmp compress=Y grants=Y
indexes=Y tables=table_1,table_2,...,table_n
The command “exp” (short for Oracle’s Export utility) exports all the n tables along
with their respective indices and grants to a compressed file called expdat.dmp. The
utility determines the total amount of space allocated to a table and then writes to the
expdat.dmp file a newly computed INITIAL EXTENT storage parameter equal in
size to the total of all the fragmented storage extents that were allocated to that table.
As a result, the newly sized INITIAL EXTENTs of each table and its associated in-
dices contain all the current data.
(c) Drop the tables table_1, table_2, . . . ,table_n.
(d) Perform an import of the tables, table_1, table_2,..., table_n, with the following
specification
imp username/password file=expdat.dmp commit=Y full=Y
buffer=1000000
The command “imp” (short for Oracle’s Import utility) now imports the earlier ex-
ported tables with the new storage parameters. The option “full=Y” ensures that all
the exported tables in expdat.dmp file are imported.
(e) For each imported table, run the command
ALTER TABLE table_name STORAGE (NEXT size);
where “size” is the newly computed value of the NEXT EXTENT for the table.
A few comments are in order here.
• The value of “buffer” in step (d) must be estimated according to the sizes of the ta-
bles being exported. It is usually set at a high value, say, over 200,000. Each “com-
mit” occurs when the buffer becomes full.
• If the tables being exported do not contain columns of the LONG data type, then use
the option “direct=Y” with the “exp” command. This speeds up the export process.
• Steps (c) and (d) imply that the tables, table_1, table_2, . . . , table_n, are not avail-
able to the users between the instant when they are dropped and the instant when
they are imported successfully. Consequently, for heavily used OLTP applications
the above five-step procedure should be run when users do not access these tables,
say, during late evenings (after 10 P.M.) or weekends. If the export dump file in Step
(b) gets corrupted, then that step has to be repeated.
• If a table is very large, say, with millions of rows and occupying over 1 GB of space,
its newly computed INITIAL EXTENT may not fit into its holding tablespace. In
that case, proceed as follows.
• Using the script in Figure 5.18 find the largest chunk of contiguous blocks avail-
able in the tablespace.
• Create the table with its INITIAL EXTENT at about 70% of the largest chunk
found above and its NEXT EXTENT as computed in Step (e).
• Run the “imp” command of Step (d) with the option “ignore=Y”. This option
causes the exported data from Step (b) to be imported into the newly created ta-
ble without any problems regarding available storage. Also, it ignores error mes-
sages when the import utility finds that the table already exists.
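A minimal sketch of a query that returns, for each tablespace, both the total free space and the largest contiguous chunk (the kind of information produced by Figure 5.18) is:
SELECT tablespace_name,
       SUM (blocks) "Total Free Blocks",
       MAX (blocks) "Largest Contiguous Chunk"
FROM dba_free_space
GROUP BY tablespace_name
ORDER BY tablespace_name;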
Note that for DATA_B, INDEX_C, and RBS, the largest contiguous chunk is signifi-
cantly smaller than the total free space available. This indicates some level of fragmenta-
tion of free space there. Since TEMP grows and shrinks depending on the database trans-
actions at the time, the difference between the two columns (995 versus 108) may not be
indicative of problems.
The DBA should track the usage of free space at the tablespace level and determine if
more datafiles should be added to a tablespace before it runs out of its free space. Figures
5.19 and 5.21 contain two scripts that provide information that will help the DBA in this
respect.
The script shown in Figure 5.19 lists for each tablespace except SYSTEM the amount of
free space in bytes for all the segments after the next largest extent is allocated.
The script shown in Figure 5.21 is an exception report. It returns only those segments that
cannot acquire the next extent within their tablespace(s) due to lack of free space.
Figure 5.22 shows the output from Figure 5.21. Only one segment, RBS_LARGE, is un-
able to acquire the next extent due to lack of adequate free space. Ideally, when this script is
run, the output should be "no rows selected.” Otherwise, the DBA must take immediate
measures such as adding more datafile(s) to the listed tablespace(s).
Comparing Figures 5.20 and 5.22 we note that the segment RBS_LARGE needs
5,242,880 bytes to extend. But the tablespace RBS has only 2,052,096 bytes of free
space. The difference of –3,190,784 (=2,052,096 – 5,242,880) bytes is shown in Figure
5.20 as "Bytes Left" for this segment.
(Figure: a tablespace holding Table A and Table B, each created with Initial = 100K and Next = 50K, followed by a region of free space.)
There are some situations where two or more contiguous free extents are combined
into a single large contiguous free extent.
1. If the PCTINCREASE storage parameter is positive, the background process SMON
(System MONitor) automatically coalesces contiguous free extents into a single ex-
tent. However, we usually set PCTINCREASE = 0 while creating tablespaces to
avoid combinatorial explosion of NEXT EXTENT sizes. A compromise can be made
by setting PCTINCREASE = 1, the smallest positive integer, so that SMON can op-
erate and, at the same time, the NEXT EXTENT sizes increase very slowly.
2. Starting with Oracle 7.3 we can use the command
ALTER TABLESPACE tablespace_name COALESCE;
to coalesce manually two or more contiguous free extents into a single free extent.
Of course, it is necessary to establish via the procedure described below that the tar-
get tablespace indeed has many contiguous free extents.
We now describe a two-step procedure to reduce or eliminate tablespace fragmenta-
tion.
(a) Measure the level of fragmentation in each tablespace in the database.
Loney [2, p. 342–344] defined an index called the free space fragmentation index (FSFI)
to determine the level of fragmentation. The index is given below:
Ideally, FSFI should be 100. But as fragmentation occurs, FSFI gradually decreases.
In general, a tablespace with FSFI < 30 should be further examined for possible defrag-
mentation. Run the script in Figure 5.24 to calculate FSFI for each tablespace.
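Assuming the FSFI formula as defined by Loney, FSFI = 100 × sqrt(largest free chunk ÷ total free space) ÷ (number of free extents)^0.2, a minimal sketch of the calculation against DBA_FREE_SPACE is:
SELECT tablespace_name,
       ROUND (SQRT (MAX (blocks) / SUM (blocks)) *
              (100 / POWER (COUNT (blocks), 0.2))) "FSFI"
FROM dba_free_space
GROUP BY tablespace_name
ORDER BY tablespace_name;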
Figure 5.25 contains a sample output from the script. It shows that the tablespaces
INDEX_B and PD_DATA warrant further investigation with respect to free space frag-
mentation since their FSFIs are below 30.
(b) List the File IDs, Block IDs, and the total blocks for each tablespace with FSFI < 30
and determine how many of their free space extents are contiguous.
Run the script in Figure 5.26.
SET LINESIZE 78
SET PAGESIZE 41
SET NEWPAGE 0
SPOOL My_Directory\FREESPACE.lst
SELECT TABLESPACE_NAME, FILE_ID, BLOCK_ID, BLOCKS,
TO_CHAR (SysDate, 'fmMonth ddth, YYYY') TODAY
FROM DBA_FREE_SPACE
ORDER BY TABLESPACE_NAME, FILE_ID, BLOCK_ID;
SPOOL OFF
FIGURE 5.26 (continued): Script for Listing Free Space Extents in Tablespaces
Chaining occurs in a table when a row physically spans multiple data blocks within an
extent allocated to the table. Oracle then stores the data in the row in a chain of blocks by
setting up pointer(s) linking the physically separated parts of the row.
In order to detect chaining in tables run the PL/SQL procedure, examine_chaining.sql,
given in Figure 5.28. It contains a script file called generate_analyze.sql that ANA-
LYZEs each designated table, COMPUTEs statistics, and LISTs CHAINED ROWS into
the table called chained_rows. The Oracle utility UTLCHAIN.sql creates this table. If a
table has chaining, the PL/SQL procedure calls another PL/SQL procedure, se-
lected_rows_proc.sql, to return the count of chained rows in the table along with the
ROWID of the first part of each chained row. Figure 5.29 contains this latter script file.
examine_chaining.sql
REM Examine chaining in tables
REM Script File Name: My_directory\examine_chaining.sql
REM Author: NAME
REM Date Created: DATE
REM Purpose: Collect statistics via ANALYZE command and
REM list chained rows in tables.
REM
REM
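At its core, the generate_analyze.sql file called by this procedure issues statements of the following form for each designated table (the table name is illustrative; CHAINED_ROWS is created beforehand by the Oracle utility UTLCHAIN.sql):
ANALYZE TABLE call COMPUTE STATISTICS;
ANALYZE TABLE call LIST CHAINED ROWS INTO chained_rows;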
selected_rows_proc.sql
REM Return chained rows in tables
REM Script File Name: My_Directory\selected_rows_proc.sql
REM Author: NAME
REM Date Created: DATE
REM Purpose: For a given table name, return the ROWID
REM of the first part of a chained row in the
REM table. The table name is passed as a
REM parameter to the procedure.
REM
SET serveroutput on size 100000
CREATE OR REPLACE PROCEDURE selected_rows_proc (tbl_name IN VARCHAR2) AS
   hd_row   chained_rows.head_rowid%TYPE;
   time_run chained_rows.analyze_timestamp%TYPE;
BEGIN
   DECLARE
      CURSOR selected_rows IS
         SELECT head_rowid, analyze_timestamp FROM chained_rows
         WHERE table_name = tbl_name;
   BEGIN
      OPEN selected_rows;
      FETCH selected_rows INTO hd_row, time_run;
      DBMS_OUTPUT.PUT_LINE ('ROWID of the first part of chained row in table '
         || tbl_name || ' is ' || hd_row
         || ' and table was last ANALYZEd on ' || time_run);
      CLOSE selected_rows;
   END;
END;
/
Chaining usually becomes a problem when more than 10% of the rows in a table are
chained. Run the following query to return a list of tables with chained rows:
select table_name, round ((100*chain_cnt)/num_rows, 2) "%Chaining"
from user_tables where num_rows > 0 order by 1;
For each table with %Chaining > 10, proceed as follows to reduce chaining:
(a) If most of the transactions on the table are UPDATEs, increase the value of
PCTFREE for the table (see the example following this list). The default value of PCTFREE is 10, which usually suffices
for most tables. If, however, a table contains many columns with VARCHAR2 data
types that are initially NULL but expand subsequently through updates, then the de-
fault amount of 10% free space left in each data block may not be sufficient to allow
the expansion of the existing rows within that block. In this case, PCTFREE should
be set at higher than 10. A higher PCTFREE value leaves more space in an Oracle
data block for an updated row to expand within the block, thereby preventing the oc-
currence of chaining. Start with a new PCTFREE = 30 and run the above query
weekly to determine if %Chaining has stopped growing and started decreasing. If
chaining still continues to grow, try PCTFREE = 40 and examine if chaining has
stopped growing. If it still does not work, export and import the table with
PCTFREE = 40.
(b) If most of the transactions on the table are INSERTs, increasing the data block size
may help. But that option may cause other problems such as increased block conten-
tion on the index leaf blocks. Also, to increase the block size one has to drop and
then recreate the database. Hence unless there are other more significant reasons to
increase the block size, this option may not be worthwhile to pursue.
(c) If the chained table has a column with the LONG data type, row chaining may be
unavoidable. The problem can be minimized by splitting the table into two tables.
One table contains all the columns except the column with the LONG data type, and
the other table contains only the primary key of the original table and the column
with the LONG data type. All programs using the original chained table must be
modified to handle the new design involving two tables. See Section 2.3.1 for an ac-
tual case study.
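As mentioned in step (a), PCTFREE can be raised in place with an ALTER TABLE command; the table name and the value below are illustrative.
ALTER TABLE call PCTFREE 30;
The new value applies only to blocks used subsequently; rows already stored in existing blocks are not repacked, which is why chaining may decline only gradually and may ultimately require the export/import mentioned in step (a).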
Steps (a) and (b) above require that the frequency of INSERTs or UPDATEs on a
chained table must be determined. For this purpose the DBA needs to activate auditing on
the table and query the view DBA_AUDIT_TRAIL. The four steps listed below describe
that process.
• Set the initialization parameter AUDIT_TRAIL to the value TRUE in order to enable
auditing.
• Invoke the Server Manager and enter “connect internal” to access the database.
• For each table to be audited, enter the command
AUDIT ALL PRIVILEGES ON table_name BY ACCESS;
where table_name is the name of the table. The command activates the auditing
mechanism on the table. The option BY ACCESS causes an audit record to be writ-
ten to the table SYS.AUD$ once for each time the table is accessed. For example, if
a user performs four update transactions (INSERT, UPDATE, DELETE) on the ta-
ble, then four separate records are written to SYS.AUD$. The results can be viewed
by querying the view DBA_AUDIT_TRAIL.
• Run the following query against DBA_AUDIT_TRAIL:
select username, to_char (timestamp, 'dd-mon-yy hh24:mi:ss') "Transaction Time",
       obj_name, action_name
from dba_audit_trail
order by username;
The query returns the list of actions taken on the table. The DBA can determine the
frequency of INSERTs and UPDATEs from the list.
Migration occurs when an UPDATE statement increases the amount of data in a row
whereby the updated row can no longer fit into a single Oracle data block. Oracle uses
variable length records to store data of VARCHAR2 columns. As a result, if a table con-
tains many columns with VARCHAR2 data types that are initially blank but are subse-
quently populated through updates, row migration may result by the expansion of exist-
ing rows. Oracle tries to find another block with sufficient free space to accommodate the
updated row. If such a block is available, Oracle moves the entire row to that block. But it
keeps the original row fragment at its old location and sets up a pointer to link this frag-
ment to the new location. The ROWID of a migrated row does not change, nor are the in-
dices, if any, updated. If a new data block to contain the updated row is not available, the
row is physically split into two or more fragments stored at different locations. This is
row chaining as discussed above.
The scripts listed in Figures 5.28 and 5.29 will detect both chained and migrated
rows. Row migration can be addressed by increasing the value of PCTFREE in the af-
fected table, as described above in Section 5.9.1. A more rigorous method to eliminate
row migration is given below:
(a) Using the ROWIDs of the affected rows copy them to another table;
(b) Delete the affected rows from the original table;
(c) Insert the rows from Step (a) into the table. This eliminates migration, because mi-
gration is caused only by UPDATEs.
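A minimal sketch of steps (a) through (c), assuming the affected table is CALL and that CHAINED_ROWS has just been populated for it by an ANALYZE ... LIST CHAINED ROWS command, is given below. Referential constraints and triggers on the table, if any, must be taken into account before the rows are deleted and re-inserted.
CREATE TABLE call_migrated AS
   SELECT * FROM call
   WHERE rowid IN (SELECT head_rowid FROM chained_rows
                   WHERE table_name = 'CALL');
DELETE FROM call
   WHERE rowid IN (SELECT head_rowid FROM chained_rows
                   WHERE table_name = 'CALL');
INSERT INTO call SELECT * FROM call_migrated;
COMMIT;
DROP TABLE call_migrated;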
Both retrieval and update transactions of a database use rollback segments. An update
transaction involves the commands INSERT, UPDATE, and DELETE. The rollback
segments store the “before image” version of the data used by such a transaction. A re-
trieval transaction involves the SELECT command. If a query uses data that another user
is updating, Oracle uses the rollback segments to store the data as they existed before the
update began.
Each rollback segment belongs to a rollback tablespace that provides its physical stor-
age. As such, each segment consists of multiple extents, which are not necessarily con-
tiguous, and each extent consists of contiguous blocks of storage. A rollback segment
entry is the set of data blocks that contain the rows that are being modified by an update
transaction. If the transaction is committed, the entry is labeled inactive and its storage is
available for use by another transaction. If the transaction is rolled back or if the system
crashes before a COMMIT occurs, the entry is used to restore the before-image data to the table involved in the
transaction. A single rollback segment can contain multiple rollback segment entries, but
a single entry cannot span multiple rollback segments. Oracle assigns transactions to roll-
back segments in a round-robin fashion. This results in an even distribution of the num-
ber of transactions in each rollback segment. In order to use the rollback segment, a
transaction updates the rollback segment header block, which resides in the data block
buffer cache in the SGA.
Three major issues with rollback segments are wraps, extends, and waits. The goal in
designing rollback segments is twofold.
• The data for a transaction should fit within a single extent.
• A transaction should not wait to access the rollback segment header block.
When a transaction's rollback segment entry cannot be stored within a single extent,
the entry acquires the next extent within that segment if that extent is inactive, i.e., avail-
able. This is called a wrap. If the next extent is active (i.e., unavailable), the segment ex-
tends to acquire a new extent from its tablespace. If the current extent is the last extent in
the segment and the first extent is inactive, the transaction wraps to the first extent. If the
first extent is active, the segment extends to acquire a new extent from its tablespace.
Suppose that a rollback segment has six extents E1, . . ., E6. A transaction T currently
occupies extents E3 and E4, and needs another extent. If E5 is inactive, then T wraps to
E5, as shown in Figure 5.30. If E5 is active, then T extends to a new extent E7, as shown
in Figure 5.31.
Rollback segment header activity controls the writing of changed data blocks to the
rollback segment. The rollback segment header block resides in the data block buffer
cache. If a user process has to wait for access to the rollback segment header block, it is
counted as a wait. Excessive wraps, extends, and waits indicate problems for the data-
base.
A rollback segment should be created with an optimal size via the OPTIMAL option
in the STORAGE clause in the CREATE ROLLBACK SEGMENT command. During
regular transactions a rollback segment occasionally grows beyond this optimal size for a
particular transaction, but then it shrinks back after the transaction is complete. The dy-
namic performance views V$ROLLSTAT and V$ROLLNAME provide the statistics on
wraps, extends, shrinks, etc., as shown in the SQL script below.
FIGURE 5.30: Transaction T occupies extents E3 and E4; E5 and E6 are inactive, so T wraps to E5.
FIGURE 5.31: Transaction T occupies extents E3 and E4; E5 is active, so T extends the segment by acquiring a new extent E7.
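A minimal sketch of such a query, joining V$ROLLNAME and V$ROLLSTAT, is:
SELECT n.name, s.extents, s.rssize, s.optsize,
       s.wraps, s.extends, s.shrinks, s.aveshrink, s.waits
FROM v$rollname n, v$rollstat s
WHERE n.usn = s.usn
ORDER BY n.name;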
The column AVESHRINK shows the size by which a rollback segment shrinks.
SHRINKS and EXTENDS are exceptional situations and Oracle recommends that they
not happen during normal production runs. Each rollback segment listed above has an
optimal size of 80 MB.
The statistics on waits are available from the view V$WAITSTAT, as shown below:
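A minimal sketch of such a query is:
SELECT class, count
FROM v$waitstat
WHERE class LIKE 'undo%';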
The COUNT for the CLASS “undo header” records the total number of waits to ac-
cess the rollback segment header block that occurred since the startup of the database.
The following list, although not exhaustive, identifies the major problem areas with the
rollback segments:
(a) Large number of WRAPS,
(b) Large number of EXTENDS,
(c) Multiple transactions using the same rollback segment,
(d) High value of WAITS, and
(e) Oracle error message ORA-01555: snapshot too old (rollback segment too small).
Items (a) through (d) have been explained before. Let us examine item (e). Referring
to Figure 5.32 we make the following assumptions:
A long running query Q starts at instant x. Subsequently, a transaction T starts
at instant x + a, say, to update the data, some or all of which are being used by Q. To
provide a read-consistent image of the data to Q, Oracle uses the data as they existed be-
fore T began. These before image data currently reside in the three extents, E3, E4,
and E5, say. Suppose that T finishes execution at instant x + a + b. At that instant,
extents E3 through E5 are marked inactive and made available for use by other trans-
actions, if needed. But the query Q is not yet complete. Suppose now that at instant
x + a + b + c, another transaction U starts and acquires extents E1 through E4. Since
E1 and E2 are free, no conflict arises from their acquisition by U. However, E3 and
E4 contain inactive in-use data still needed by Q. When E3 and E4 are overwritten with
new data used by U, Q can no longer resolve the query since its data are partially
gone. As a result, Oracle returns the error message ORA-01555: snapshot too old (roll-
back segment too small). The underlying reason is that the extents of the rollback seg-
ment are too small and cause wraps to occur too quickly. Figure 5.32 shows the situation
graphically.
The symptoms (a) through (e) listed above arise out of two root causes:
• The size of each extent of a rollback segment is small; and
• The number of rollback segments is low and should be increased.
In order to resolve the problems we need to collect data about the rollback segment
usage while the database is running.
Instant           Events
x                 Q starts
x + a             T starts; updates data some or all of which are used by Q
x + a + b         T finishes; E3, E4, E5 are inactive, but contain data being
                  used by Q
x + a + b + c     U starts; acquires E1, E2, E3, E4; overwrites inactive but
                  in-use data in E3 and E4; Q cannot execute since some of
                  its data is gone
The scripts in Figures 5.33 and 5.34 can be run interactively or under the control of
command files. For interactive running, it is necessary to remove the REM clauses from
the two SPOOL commands in each file. Then spooling will be activated and the output
will be spooled to the designated .lst files. For running the scripts under the control of
command files in a UNIX environment, we can follow one of two paths. We can run both
scripts as "cron" jobs under UNIX, or execute two command files for the two scripts.
Figure 5.35 contains a sample "crontab" entry. Here both scripts are run every 10 minutes
for a 24-hour period and the output is sent to a .lst file in each case.
Figures 5.36 and 5.37 contain two UNIX shell scripts to execute the two SQL scripts
of Figures 5.33 and 5.34 respectively. In each case, the data collector, say, a DBA, is
prompted for the number of iterations to run and the time span between two consecutive
runs. For example, the scripts can be run every 15 minutes for a total of 8 hours, every 10
minutes for a 24-hour period, and so on.
#!/bin/csh
#
# File Name: RBS_Usage_Command
# Author: NAME
# Date Written: July 22, 1999
# Purpose: For each rollback segment, list the number of
# blocks used, and numbers of "wraps","extends",
# "shrinks"
# Connect to the production database in Oracle
# Execute the script file, RBS_Usage_Volume.sql, in Oracle
#
set Usage_ITER_COUNT = 1
echo -n "How many iterations do you want? "
set ITERATIONS_LIMIT = $<
echo The program will loop through $ITERATIONS_LIMIT iterations.
echo -n "Enter in number of seconds the time interval for
collecting data."
echo -n "For example, to collect data every 2 minutes, enter 120.
Number of seconds: "
set SLEEP_COUNTER = $<
echo The data will be collected every $SLEEP_COUNTER seconds.
while ($Usage_ITER_COUNT <= $ITERATIONS_LIMIT)
echo "Usage ITERATION COUNT = "$Usage_ITER_COUNT
sqlplus supread/supread@database_name @RBS_Usage_Volume >> RBS_Usage_Volume.lst
sleep $SLEEP_COUNTER
@ Usage_ITER_COUNT++
end
echo "Usage Loop terminates."
#!/bin/csh
#
# File Name: RBS_Transaction_Command
# Author: NAME
# Date Written: July 22, 1999
# Purpose: For each active transaction, list its start
# time, hexadecimal address, number of blocks
#              used, rollback segment name, SQL text, etc.
#
#
# Connect to the production database in Oracle
# Execute the script file, RBS_Transaction_Volume.sql, in Oracle
#
set Tran_ITER_COUNT = 1
echo -n "How many iterations do you want? "
set ITERATIONS_LIMIT = $<
echo The program will loop through $ITERATIONS_LIMIT iterations.
echo -n "Enter in number of seconds the time interval for
collecting data."
echo -n "For example, to collect data every 2 minutes, enter 120.
Number of seconds: "
set SLEEP_COUNTER = $<
echo The data will be collected every $SLEEP_COUNTER seconds.
August 2nd, 1999 Rollback Segment Data for Active Transactions Page 1
Start Transaction Rollback Blocks
RUNTIME Time Address Segment Used
---------------- -------------- --------- -------- ------
02-AUG-1999 124754 08/02/99 11:59:55 1E3339C8 RBS3 2
02-AUG-1999 124754 08/02/99 11:58:37 1E332FC0 RBS4 2
02-AUG-1999 124754 08/02/99 12:46:59 1E333D20 RBS5 2
02-AUG-1999 124754 08/02/99 12:30:21 1E3325B8 L_RBS_01 6,304
02-AUG-1999 124754 08/02/99 12:47:04 1E333318 L_RBS_01 1
Script File: /My_Directory/RBS_Transaction_Volumes.sql
Spool File: /My_Directory/RBS_Transaction_Volumes.lst
August 2nd, 1999 Rollback Segment Data for Active Transactions Page 1
Rollback
Segment XACTS SID SERIAL# User SQL_TEXT
------- ----- --- ------ ------ ------------
RBS3 1 33 16 VERTEX begin DBMS_APPLICATION_IN
FO.SET_MODULE(:1,NULL);
end;
RBS4 1 88 1984 VERTEX begin DBMS_APPLICATION_IN
FO.SET_MODULE(:1,NULL);
end;
RBS5 2 18 5 OPS$INVOICE insert into table
(column list) values
(value list);
L_RBS_01 3 65 9 OPS$INVOICE update table set column
= value where
condition(s);
Script File: /My_Directory/RBS_Transaction_Volumes.sql
Spool File: /My_Directory/RBS_Transaction_Volumes.lst
August 2nd, 1999 Rollback Segment Data for Active Transactions Page 2
Rollback
Segment XACTS SID SERIAL# User SQL_TEXT
--------- ------ ---- --------- --------- -------------
L_RBS_01 3 70 9 OPS$INVOICE INSERT INTO table
(column list) values
(value list);
Script File: /My_Directory/RBS_Transaction_Volumes.sql
Spool File: /My_Directory/RBS_Transaction_Volumes.lst
5 rows selected.
In an actual case study done during August 1999 involving a large production database,
the two command files were run every 10 minutes over an eight-hour period. The data-
base had six rollback segments, RBS1 through RBS5 and L_RBS_01. The last segment
was designed for handling large transactions in batch jobs. The scripts produced two vo-
luminous output files. A careful analysis of the files revealed the following.
1. Large number of WRITES and WRAPS against RBS3, RBS5 and L_RBS_01;
2. Very high WAITS against L_RBS_01;
3. Total of 23 transactions ran in eight hours; #UPDATEs = 12, and #INSERTs = 7
against large tables;
4. Most of the time 5 transactions ran concurrently; at one time there were 6
transactions;
5. The following 5 UPDATE transactions each consumed more than one extent in its roll-
back segment (extent = 1,025 blocks of size 4 K each).
Tran ID StartTime EndTime (est.) RBS Blocks Used
1E3325B8 12:30:21 16:30:15 L_RBS_01 6,304
1E334A80 12:55:25 16:30:15 RBS5 14,984
1E333D20 16:34:31 16:50:18 VL_RBS_01 1,158
1E3343D0 16:49:39 19:30:43 RBS3 21,676
1E332910 19:40:43 20:41:12 RBS4 15,296
6. "undo header" values in V$WAITSTAT view increased from 1,675 to 28,018 in
eight hours;
7. The EXTENDS values increased as follows on the segments listed below:
RBS Range of EXTENDS Time Span
RBS3 4 to 47 17:46:25 to 20:36:32
RBS5 3 to 30 14:36:04 to 20:36:32
L_RBS_01 2 to 98 13:55:53 to 16:26:18
L_RBS_01 5 to 24 18:46:27 to 20:36:32
Based on the above findings I made the following conclusions and recommendations.
(a) The large values of WAITS indicate that there is high contention for accessing the
rollback segment header block. This indicates that we need to add more rollback
segments.
(b) The high values of EXTENDS for some of the segments indicate that they are im-
properly sized. The extent size should be increased.
(c) L_RBS_01 should be sized much larger than its current size. Notice the large values
of EXTENDS for L_RBS_01 listed above.
In summary, I proposed the following steps to resolve the problems.
• Perform an appropriate sizing of the extents of the rollback segments.
• Determine the number of rollback segments needed for the database.
• Estimate a value for the OPTIMAL parameter of the storage clause.
/* PURPOSE:
The program prompts the user to enter the values of four
parameters:
Free Space (FS) = percentage of rollback segment reserved as free
space
Inactive In-use Data (IID) = percentage of rollback segment
reserved for use by inactive extents containing data being
used by other queries; this avoids the error message ORA-
01555: "snapshot too old"
Header Area (HA) = percentage of rollback segment reserved for use
by rollback segment header block
Commit Frequency (CF) = maximum number of records after which the
user enters COMMIT or ROLLBACK command
The program uses the following intermediary variables for
computation:
MPS = Minimum Possible Size
MTS = Minimum Total Size
MNRS = Minimum Number of Rollback Segments
LTS = Largest Transaction Size
MAX_TS = Maximum Total Size of All Transactions.
The program generates an interactive output, which is saved in a
text file RBS_Sizing.txt for future reference. The output file is
opened in "append" mode to allow the user to experiment with
different parameter values and compare the results.
PROGRAM FILE NAME: My_Directory\RBS_Sizing.c
OUTPUT FILE NAME: My_Directory\RBS_Sizing.txt
AUTHOR: NAME
DATE CREATED: September 27, 1999
*/
#include <stdio.h>
#include <math.h>
int main (void)
{
   FILE *out_file;
   int FS, IID, HA, CF, LTS, MTS_INT, MAX_TS, MNRS;
   float MPS, MTS;

   out_file = fopen ("RBS_Sizing.txt", "a");

   printf ("\nEnter as an integer the percentage reserved as free space: ");
   scanf ("%d", &FS);
   printf ("\nEnter as an integer the percentage reserved for inactive in-use extents: ");
   scanf ("%d", &IID);
   printf ("\nEnter as an integer the percentage reserved as header area: ");
   scanf ("%d", &HA);
   printf ("\nEnter the maximum number of records processed before COMMIT: ");
   scanf ("%d", &CF);
   printf ("\nEnter the largest transaction size in blocks: ");
   scanf ("%d", &LTS);
   printf ("\nEnter the maximum total size of all transactions in blocks: ");
   scanf ("%d", &MAX_TS);

   MPS = (CF * LTS) / (100 - (FS + IID + HA));
   MTS = (CF * MAX_TS) / (100 - (FS + IID + HA));
   MTS_INT = (int) ceil (MTS);
   MNRS = (int) ceil (MTS / MPS);

   printf ("\nYou entered the following parameter values:\n");
   printf ("\nFree Space = %d percent of each extent", FS);
   printf ("\nInactive In-use Data = %d percent of each extent", IID);
   printf ("\nHeader Area = %d percent of each extent", HA);
   printf ("\nMaximum Number of Records Processed before COMMIT = %d", CF);
   printf ("\nLargest Transaction Size in Blocks = %d", LTS);
   printf ("\nMaximum Total Size of All Transactions in Blocks = %d", MAX_TS);
   printf ("\nMPS = %f", MPS);
   printf ("\nMTS = %f", MTS);
   printf ("\nMTS_INT = %d", MTS_INT);
   printf ("\nMNRS = %d", MNRS);

   /* Append the inputs and the computed results to RBS_Sizing.txt for
      future reference, as described in the PURPOSE comment above. */
   fprintf (out_file, "\nFS = %d, IID = %d, HA = %d, CF = %d, LTS = %d, MAX_TS = %d",
            FS, IID, HA, CF, LTS, MAX_TS);
   fprintf (out_file, "\nMPS = %f, MTS = %f, MTS_INT = %d, MNRS = %d\n",
            MPS, MTS, MTS_INT, MNRS);
   fclose (out_file);
   return 0;
}
I ran the above program with two sets of input values, one for large rollback segments
and the other for regular ones. The output file RBS_Sizing.txt is given in Figure 5.41.
Niemiec [3, p. 121] offers the following guideline for determining the total number of
rollback segments.
# Concurrent Transactions # of Rollback Segments
Less than 16 4
16 to 32 8
Greater than 32, say N N/4, but not more than 50
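For example, a database that typically runs 40 concurrent transactions falls in the third row of the table and would need 40/4 = 10 rollback segments.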
If there is a conflict between the number derived from Figure 5.40 and the above ta-
ble, always take the greater of the two numbers.
For the production database in the case study, I ran the program RBS_Sizing.c twice,
once for rollback segments to handle very large transactions and then for rollback seg-
ments to handle normal transactions. The output in Figure 5.41 has, therefore, two parts.
The first part recommends that there be 3 rollback segments with extent size of 18,253
blocks (about 71 MB) for very large transactions. The second part recommends 10
rollback segments with extent size of 13,666 blocks (about 53 MB) for regular transac-
tions.
In order to use the designated large segments the programs involving large UPDATE
transactions must contain the following block of code before every transaction that must
use a designated segment called large_segment.
COMMIT;
SET TRANSACTION USE ROLLBACK SEGMENT large_segment;
-- (the statements of the large transaction follow here, ending with COMMIT)
COMMIT;
I implemented my recommendations with the new size, number, and types of rollback
segments, as listed below.
Large Rollback Segments: Number = 3
INITIAL = NEXT = 70 M,
MINEXTENTS = 20,
OPTIMAL = 1400 M
Regular Rollback Segments: Number = 10
INITIAL = NEXT = 50 M,
MINEXTENTS = 20,
OPTIMAL = 1000 M
Two weeks later I ran the scripts of Figures 5.33, 5.34, 5.36, and 5.37 to collect data.
Figures 5.42 and 5.43 contain a sample of rollback segment usage and transaction data
for this follow-up study.
August 17th, 1999 Rollback Segment Data for Active Transactions Page 1
Start Transaction Rollback Blocks
RUNTIME Time Address Segment Used
------------------ ----------------- --------- -------- ------
17-AUG-1999 114203 08/17/99 11:42:02 1E574BEC RBS5 2
17-AUG-1999 114203 08/17/99 11:42:03 1E573E8C RBS8 2
Script File: /My_Directory/RBS_Transaction_Volumes.sql
Spool File: /My_Directory/RBS_Transaction_Volumes.lst
August 17th, 1999 Rollback Segment Data for Active Transactions Page 1
Rollback
Segment XACTS SID SERIAL# User SQL_TEXT
-------- ----- --- ------- ------------ -----------------
RBS5 1 54 2 OPS$INVOICE insert into table
(column list) values (value
list);
Script File: /My_Directory/RBS_Transaction_Volumes.sql
Spool File: /My_Directory/RBS_Transaction_Volumes.lst
1 rows selected.
Key Words
audit trail Extent ID
before image extent map
bind variable fragmentation
block ID free space
chaining free space fragmentation index
constraint FREELIST GROUPS
contention FREELISTS
crontab import
data block buffers index
data block header initialization parameter
data dictionary instance
database transaction LOB
datafile migration
defragmentation rollback segment
dictionary cache rollback segment entry
export rollback segment header activity
export via conventional path temporary segment
export via direct path user object
EXTENDS WAITS
extent WRAPS
References and Further Reading
All the above references have discussed the tuning principles of the disk-resident struc-
tures. A clear-cut distinction between instance tuning and database tuning, as made in this chap-
ter and in Chapter 6, is not found in any of them. Aronoff et al. [1, Chapter 5], Niemiec [3,
p. 109–114], and Page et al. [4, Chapter 21] discuss the used space fragmentation at the
segment level, the chaining of rows in tables, and defragmentation of tables and indices
via export/import. Page et al. [4, Chapter 16] offer an excellent treatment of the tuning
principles in general and the various tools that are available in Oracle for handling them.
Loney [2, Chapter 7] discusses the general management of the rollback segments from
the viewpoint of a DBA. In particular, he provides algorithms [2, pp. 305–314] for cal-
culating the size and number of rollback segments and some guidelines for determining
the value of the OPTIMAL parameter in creating a rollback segment. Also, Loney [2,
Chapter 8] addresses the free space fragmentation in tablespaces.
Exercises
Theoretical exercises are of little value for this chapter since the best practice comes from
monitoring the performance of actual production databases and tuning them, as needed.
The exercises given below identify some of the areas not specifically covered in Chapter
5 and should be considered an extension of the text.
1. You have been collecting data on extents in tables and tablespaces. You want to gen-
erate a report on the trend of extent usage by these objects. Your goal is to be proac-
tive in predicting when an object will fail to extend. Devise an action plan to meet
your goal using the following guidelines.
(a) Write a script to collect the needed data.
(b) Decide on the frequency of data collection and then collect the data.
(c) Use a statistical forecasting technique such as linear trend, exponential trend,
etc. to forecast future trend usage.
(d) Write a program using some 3GL such as C, UNIX shell scripting, etc. that
prompts the user for a time value, computes the extent sizes for that date, and
then returns a message about the need if any for allocating more data files to the
objects. Caution: You need some understanding of statistical forecasting techniques to
do the above exercise.
2. Explain why the sizing algorithm of rollback segments (Figure 5.40) is different
from that for sizing data and index segments (Section A3 in Appendix A). How
would you size a rollback tablespace given the sizes of its component rollback seg-
ments?
3. Do you think that you should be concerned about fragmentation of rollback and tem-
porary tablespaces? Is FSFI (Figure 5.24) a valid measure of their fragmentation?
Give reasons for your answer.
4. A table T in an Oracle database under UNIX has over 18 million rows and currently
occupies nearly 3 GB of space. Also, T has a column of the LONG data type and is
heavily fragmented. You want to defragment T via export/import. List the steps that
you want to follow and some potential problems that you will face. (Hint: a UNIX
file size cannot exceed 2 GB.)
5. What is the exact difference between row chaining and row migration? Why are both
considered potentially harmful? What is the adverse impact of setting PCTFREE too
high, say, PCTFREE = 70 and PCTUSED = 30?
6
Tuning of Memory-Resident Data Structures
Outline
6.1 Memory-Resident Data Structures
6.2 Performance Tuning
6.3 Data Block Buffers
6.4 Redo Log Buffer
6.5 Shared SQL Pool
6.6 Background Processes
6.7 Tuning the Memory
6.8 Tuning the CPU
6.9 Pinning Packages in Memory
6.10 Latching Mechanism for Access Control
Key Words
References and Further Reading
Exercises
(Figure: the data block buffer cache, whose LRU blocks are exchanged with disk files.)
spool DBB_HR_value.lst
select name, value from v$sysstat
where name in ('consistent gets', 'db block gets', 'physical reads')
order by name;
select TO_CHAR (sysdate, 'MM/DD/YY HH:MI:SS') "RunTime",
       a.value + b.value LR, c.value PR,
       ROUND (((a.value + b.value - c.value) / (a.value + b.value)) * 100) "DBB-HR"
from v$sysstat a, v$sysstat b, v$sysstat c
where a.name = 'consistent gets'
and b.name = 'db block gets'
and c.name = 'physical reads';
spool off
NAME VALUE
--------------- ----------
consistent gets 546345397
db block gets 174344671
physical reads 116061101
RunTime LR PR DBB-HR
----------------- --------- ---------- -----
08/05/99 03:08:53 720690068 116061101 84
It is recommended that the DBB-HR should exceed 90. This minimizes the impact of
the CPU costs for I/O operations. The target value of the DBB-HR should be determined
based on the mix of OLTP and batch jobs run against the database. Figure 6.5 contains a
script file to compute the DBB-HR for specific Oracle and operating system users.
REM User Mix for Target Data Block Buffer Hit Ratio for Example
REM Database
REM Script File Name: /My_Directory/UserMix_DBB_HR.sql
REM Spool File Name: /My_Directory/UserMix_DBB_HR.lst
REM Author: NAME
REM Date Created: DATE
REM Purpose: Determine batch and online users for target hit
REM ratio for data block buffers at a given instant
REM Use the views V$SESSION and V$SESS_IO
REM
REM
SET PAGESIZE 41
SET NEWPAGE 0
SET FEEDBACK OFF
COLUMN username FORMAT A15
COLUMN osuser FORMAT A10
spool UserMix_DBB_HR.lst
select TO_CHAR (sysdate, 'MM/DD/YY HH:MI:SS') "RunTime", username,
osuser,
consistent_gets + block_gets LR, physical_reads PR,
ROUND (((consistent_gets + block_gets - physical_reads) /
(consistent_gets + block_gets)) * 100) "DBB-HR"
from v$session, v$sess_io
where v$session.SID = v$sess_io.SID
and consistent_gets + block_gets > 0
and username is NOT NULL
order by username, osuser;
spool off
The listing in Figure 6.6 shows that the batch users have nearly perfect DBB-HR val-
ues. As a guideline, if there are 20 or more users and batch users cause less than 50% of
the LRs, then the DBB-HR should be above 94. On the other hand, for less than 20 users,
the DBB-HR may range between 91 and 94.
There are two situations where the DBB-HR can be inflated but does not necessarily
imply that most of the requested data are available in the DBB.
Page Fault
A data block within the DBB may be moved out to virtual memory when it is associated
with an inactive process. If the block is needed later by another process, it is brought into
the DBB. That event is called a page fault (see Section 6.7). When Oracle brings in the
requested block to the DBB, it is recorded as an LR although the data block has been
fetched from disk, which is the virtual memory. Thus, DBB-HR increases despite a PR
caused by the page fault.
Oracle's system tables x$kcbrbh and x$kcbcbh track respectively the numbers of
cache hits and cache misses when the database is running. Figures 6.7 and 6.8 contain the
script files that show the changes in cache hits and cache misses respectively as we in-
crease or decrease the DBB by N blocks at a time, where N is a number supplied by the
user. You can use different values of N to experiment with the number of additional
cache hits (or misses) caused by the increments (or decrements) in the size of the DBB.
Alternatively, one can experiment with the effect of more or fewer blocks on cache
hits or misses by activating the two views, V$RECENT_BUCKET and V$CURRENT_
BUCKET. The view V$RECENT_BUCKET is activated by setting the initialization pa-
rameter DB_BLOCK_LRU_EXTENDED_STATISTICS to a positive integer N (the
default is zero). Then Oracle collects N rows of statistics to populate the view, each
row reflecting the effect on cache hits of adding one more buffer to the DBB up to the
maximum of N buffers. The view V$CURRENT_BUCKET is activated by setting the
initialization parameter DB_BLOCK_LRU_STATISTICS to TRUE, the default being
FALSE. Then the view V$CURRENT_BUCKET keeps track of the number of additional
cache misses that will occur as a result of removing buffers from the DBB. By enabling
these two initialization parameters, however, you cause a large performance loss to the
system. So, they should be enabled only when the system is lightly loaded. Using the
script files in Figures 6.7 and 6.8 to accomplish the same goal is preferable to enabling
these two views. Figures 6.7A and 6.8A contain the SELECT state-
ments that should be used instead of the SELECT statements appearing in Figures 6.7
and 6.8 respectively if the views V$RECENT_BUCKET and V$CURRENT_BUCKET
are used instead of the tables x$kcbrbh and x$kcbcbh. The rest of the programs in Figures
6.7 and 6.8 remain the same.
Increasing the value of the initialization parameter LOG_BUFFER will help decrease this statistic (redo log space requests). A positive value of this statistic
occurs when both the RLB and the active online redo log file are full so that the LGWR is
waiting for disk space in the form of an online redo log file to empty the contents of the
RLB. Disk space is made available by performing a log switch that starts a new redo log
file. However, for the log switch to occur, Oracle must ensure that all the committed dirty
buffers in the data block buffer cache have been written to disk files via the DBWR. If a
database has a large data block buffer full of dirty buffers and small redo log files, then a
log switch has to wait for the DBWR to write the dirty buffers into disk files before con-
tinuing. We can see, therefore, that a chain of events must occur to resolve the situation
triggered by a redo log space request.
Since prevention is always better than cure, the metric RLB-HR should be monitored
continuously and if it shows an increasing trend toward the value of 1/5,000 (= .0002),
the DBA should take corrective action. Figure 6.9 contains the script file to compute
RLB-HR and Figure 6.10 shows a sample output from the script file.
NAME VALUE
------------------------- --------
redo entries 5213968
redo log space requests 46
redo log space wait time 1945
RunTime Redo Entries RL Req RL Wait in Secs RLB-HR
----------------- ------- ----- --------------- ------
08/12/99 10:08:40 5213970 46 19 0
Library Cache
The LC contains the parse tree and the execution plan for all SQL and PL/SQL state-
ments that are encountered in the course of a transaction, both retrieval and update.
Chapter 8 discusses query processing and optimization in detail. For our purpose here, it
will suffice to understand that the parsing phase that produces the parse tree is the most
time consuming and resource intensive for the following reasons.
• The SQL syntax of the statement is checked.
• All syntax errors are resolved.
• A search is made to determine if an identical SQL statement already resides in
the LC.
• The execution plan is prepared.
The metric to measure the efficiency of LC is the library cache hit ratio (LC-HR). It
uses two statistics, pins and reloads, from the view V$LIBRARYCACHE. A pin indi-
cates a cache hit; i.e., the parse tree is available in cache. A reload indicates a cache miss,
i.e., the parse tree has been flushed from LC under the LRU algorithm and, therefore,
must be reloaded from disk. LC-HR represents the ratio pins / (pins + reloads)
and should be >99%. Figure 6.11 provides the script file to compute LC-HR and Figure
6.12 shows a sample output from running the script.
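A minimal sketch of the calculation against V$LIBRARYCACHE (not necessarily the exact script of Figure 6.11) is:
SELECT SUM (pins) "Pins", SUM (reloads) "Reloads",
       ROUND (SUM (pins) / (SUM (pins) + SUM (reloads)), 4) "LC-HR"
FROM v$librarycache;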
Here we have a perfect value of 1 (or 100%) for the LC-HR. To make the LC-HR ap-
proach 1, we need to make “reloads” approach zero, i.e., reduce the number of cache
misses. This is the primary goal for tuning the LC. Cache misses can be reduced by
keeping parsing to a minimum as follows.
(a) Use as much generic code as possible so that SQL statements can utilize a shared
SQL area in SSP.
(b) Use bind variables rather than constants (see the sketch following this list). A bind variable in a PL/SQL program is a
host variable to which a specific value is bound. The bind variable accepts its value
at runtime. Thus, even if its value changes, the parsed form of the SQL statement
remains the same and, therefore, the statement is not reparsed.
(c) Increase the size of the LC by increasing the value of the initialization parameter
SHARED_POOL_SIZE. The larger LC will be flushed less often by the LRU
algorithm.
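To illustrate item (b), the sketch below uses a hypothetical ORDERS table. Inside a PL/SQL block, referencing the PL/SQL variable v_id, rather than embedding a literal such as 1234 in the statement text, makes Oracle treat it as a bind variable, so the parsed form of the SELECT in the LC is reused for any value of v_id.
DECLARE
   v_id    orders.order_id%TYPE := 1234;    -- value supplied at run time
   v_total orders.order_total%TYPE;
BEGIN
   -- WHERE order_id = 1234 would be reparsed for every new literal;
   -- WHERE order_id = v_id is parsed once and shared.
   SELECT order_total INTO v_total
   FROM orders
   WHERE order_id = v_id;
END;
/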
Figure 6.13 contains a script file to compute SHARED_POOL_SIZE based on the
current load and employing a user-supplied padding factor for free space. It is recom-
mended that this factor be kept around 33% at start. The following seven-step algorithm
is used in this script.
1. Find the primary user of the database. This is usually a generic account representing
the application such as OPS$INVOICE, etc.
2. Find the amount of memory utilized by the user in (1).
3. Find the amount of memory in SSP currently in use.
4. Ask for an estimated number of concurrent users accessing the application.
5. Calculate the currently used SSP size as the expression (2) * (4) + (3).
6. Ask for the padding factor for free space, say PF, as a positive decimal number <1.
7. The optimal value of SHARED_POOL_SIZE is (5) * (1 + PF).
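The fragment below sketches how steps (2) and (3) might be obtained; the account name OPS$INVOICE, the use of the 'session uga memory' statistic, and the use of V$SQLAREA for the shared SQL area are illustrative assumptions rather than the exact contents of Figure 6.13.
REM (2) Memory currently used by each session of the primary application account
SELECT se.sid, ss.value "UGA Memory (bytes)"
FROM v$session se, v$sesstat ss, v$statname sn
WHERE se.sid = ss.sid
AND ss.statistic# = sn.statistic#
AND sn.name = 'session uga memory'
AND se.username = 'OPS$INVOICE';
REM (3) Shared SQL pool memory currently in use
SELECT SUM (sharable_mem) "SSP in Use (bytes)"
FROM v$sqlarea;
With these figures, steps (5) and (7) are simple arithmetic. For example, with (2) = 2 MB per session, (4) = 50 concurrent users, (3) = 40 MB, and PF = 0.33, the recommended SHARED_POOL_SIZE is (2 × 50 + 40) × 1.33, or about 186 MB.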
There is one phenomenon called invalidation that increases cache misses resulting in
a higher value of “reloads”. An invalidation occurs when a schema object referenced ear-
lier in a SQL statement is modified subsequently. If the object is a table or an index, such
modifications occur as a result of ALTER or DROP TABLE, ANALYZE TABLE,
ALTER or DROP INDEX, etc. If the object is a PL/SQL package, procedure, or function,
it is modified via recompilation. In any such case, the shared SQL area referencing that
object becomes invalidated and Oracle marks the area as "invalid." A SQL statement ref-
erencing the modified object must be reparsed the next time it is executed and, therefore,
the parsed form must be reloaded. Figure 6.15 contains a script file that returns the num-
ber of invalidations from V$LIBRARYCACHE. Figure 6.16 contains an Oracle session
transcript showing the impact of an ANALYZE command on reloads.
SQL> @Invalidation
RunTime NAMESPACE PINS RELOADS INVALIDATIONS
----------------- --------------- ------ ------- -------------
08/18/99 12:08:57 BODY 153 3 0
08/18/99 12:08:57 CLUSTER 5291 3 0
08/18/99 12:08:57 INDEX 41 0 0
08/18/99 12:08:57 OBJECT 0 0 0
08/18/99 12:08:57 PIPE 0 0 0
08/18/99 12:08:57 SQL AREA 302836 287 1312
08/18/99 12:08:57 TABLE/PROCEDURE 38295 563 0
08/18/99 12:08:57 TRIGGER 0 0 0
8 rows selected.
SQL> analyze table ORDER compute statistics;
Table analyzed.
SQL> @Invalidation
RunTime NAMESPACE PINS RELOADS INVALIDATIONS
---------------- -------------- ------ ------- -------------
08/18/99 12:08:53 BODY 153 3 0
08/18/99 12:08:53 CLUSTER 5323 3 0
08/18/99 12:08:53 INDEX 41 0 0
08/18/99 12:08:53 OBJECT 0 0 0
08/18/99 12:08:53 PIPE 0 0 0
08/18/99 12:08:53 SQL AREA 303199 294 1359
08/18/99 12:08:53 TABLE/PROCEDURE 38342 570 0
08/18/99 12:08:53 TRIGGER 0 0 0
8 rows selected.
Note that RELOADS and INVALIDATIONS for the SQL AREA are 287 and 1,312
respectively before the ANALYZE command is issued. Subsequently, they increase to
294 and 1,359 respectively.
Dictionary Cache
The dictionary cache (DC) contains relevant data from Oracle's data dictionary pertaining
to database objects that are referenced in SQL statements used by applications. If the
needed data are available in DC, we have a “gets.” If, however, the data have to be
brought into DC from disk, we have a “getmisses.” After a database has been running for
some time, most of the required data are normally found in DC so that the value of “get-
misses” becomes low. The efficiency of DC is measured by the dictionary cache hit ratio
(DC-HR), which is computed by the following formula: gets / (gets + getmisses).
Ideally, the DC-HR should be 1 implying that all the data dictionary information is
available in DC. If DC-HR <90%, the value of the initialization parameter
SHARED_POOL_SIZE should be increased using the script in Figure 6.13. Figure 6.17
contains the script file to compute the DC-HR and Figure 6.18 shows a sample output
from running this script.
The value of the DC-HR for the three initialization parameters, DC_USERS,
DC_USER_GRANTS, and DC_TABLE_GRANTS, should be kept above 95%, because
they are used during almost all SQL processing and hence should reside in the DC. Fig-
ure 6.19 contains the script file to compute the DC-HR for each initialization parameter
with (gets + getmisses) > 0. Figure 6.20 shows a sample output from running this script.
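Minimal sketches of the two calculations against the view V$ROWCACHE (not necessarily the exact scripts of Figures 6.17 and 6.19) are:
SELECT SUM (gets) "Gets", SUM (getmisses) "Get Misses",
       ROUND (SUM (gets) / (SUM (gets) + SUM (getmisses)), 4) "DC-HR"
FROM v$rowcache;
SELECT parameter, gets, getmisses,
       ROUND (gets / (gets + getmisses), 4) "DC-HR"
FROM v$rowcache
WHERE gets + getmisses > 0
ORDER BY parameter;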
6.6.1 DBWR
If an instance has only one DBWR process, it may cause a bottleneck during I/O opera-
tions even if data files are properly distributed among multiple devices. It is better to have
the effect of running multiple DBWRs for an instance. The following guideline can be
used for determining an optimal number of DBWRs.
Rule: Allow one DBWR for every 50 online users performing both retrievals and up-
dates and one DBWR for every two batch jobs performing updates.
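For example, a system with 150 concurrent online users and four update batch jobs would call for 150/50 + 4/2 = 3 + 2 = 5 DBWRs under this rule.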
We can have the effect of multiple DBWR processes in an instance in two ways by
using two different initialization parameters: DBWR_IO_SLAVES or DB_WRITER_
PROCESSES. Each procedure is described below.
To determine the number of DBWRs needed for an instance, proceed as follows:
(a) Determine the maximum number of concurrent logins since the database was started.
Figure 6.21 contains a script file along with an output to accomplish this task. Note
that the columns SESSIONS_CURRENT and SESSIONS_HIGHWATER in Figure
6.21 record, respectively, the number of sessions currently logged on and the highest
number of concurrent sessions reached since the instance was started.
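These columns belong to the dynamic performance view V$LICENSE, so a minimal sketch of the query is:
SELECT sessions_current, sessions_highwater
FROM v$license;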
(b) Determine the distribution of both READ and WRITE I/Os among the datafiles to
assess if the database is read-intensive or write-intensive. A write-intensive database
should have multiple DBWRs. A read-intensive database may need multiple DBWRs
dictated by the number of concurrent online users and batch processes running
against the database. Figure 6.22 contains a script file to extract the I/O load
information.
SPOOL IO_Load_Distribution.lst
SELECT NAME, PHYBLKRD, PHYBLKWRT, PHYBLKRD + PHYBLKWRT "I/O Volume",
ROUND (100 * PHYBLKRD / (PHYBLKRD + PHYBLKWRT)) "%READ Volume",
ROUND (100 * PHYBLKWRT / (PHYBLKRD + PHYBLKWRT)) "%WRITE Volume",
TO_CHAR (SysDate, 'fmMonth ddth, YYYY') TODAY
from V$FILESTAT, V$DATAFILE
WHERE V$FILESTAT.FILE# = V$DATAFILE.FILE#
ORDER BY ROUND (100 * PHYBLKWRT / (PHYBLKRD + PHYBLKWRT))
DESC;
SELECT NAME, PHYRDS, PHYWRTS, PHYRDS + PHYWRTS "I/O Count",
ROUND (100 * PHYRDS / (PHYRDS + PHYWRTS)) "%READ Count",
ROUND (100 * PHYWRTS / (PHYRDS + PHYWRTS)) "%WRITE Count",
TO_CHAR (SysDate, 'fmMonth ddth, YYYY') TODAY
from V$FILESTAT, V$DATAFILE
WHERE V$FILESTAT.FILE# = V$DATAFILE.FILE#
ORDER BY ROUND (100 * PHYWRTS / (PHYRDS + PHYWRTS)) DESC;
SPOOL OFF
Figure 6.23 contains a partial output file from running the script shown in Figure
6.22.
May 17th, 2001      I/O Load Distribution among Disk Files       Page 1
File                                            I/O   %READ  %WRITE
Name                  PHYBLKRD  PHYBLKWRT    Volume  Volume  Volume
--------------------  --------  ---------  --------  ------  ------
/abc02/temp01.dbf       133182    2049685   2182867       6      94
/abc01/rbs01.dbf        108905     861163    970068      11      89
/abc02/rbs04.dbf        131855     895593   1027448      13      87
/abc02/temp03.dbf       508201    2981511   3489712      15      85
/abc01/rbs02.dbf        140896     585761    726657      19      81
/abc02/temp02.dbf       117277     500856    618133      19      81
May 17th, 2001      I/O Load Distribution among Disk Files       Page 1
File                                            I/O   %READ  %WRITE
Name                    PHYRDS    PHYWRTS     Count   Count   Count
---------------------  -------  ---------  --------  ------  ------
/abc02/temp01.dbf        19943    2049685   2069628       1      99
/abc02/temp03.dbf        59698    2981511   3041209       2      98
/abc02/temp02.dbf        17241     500856    518097       3      97
/abc01/rbs01.dbf        108905     861163    970068      11      89
/abc02/rbs04.dbf        131855     895593   1027448      13      87
/abc01/rbs02.dbf        140896     585761    726657      19      81
/abc02/rbs03.dbf        199062     576986    776048      26      74
/abc05/billingx02.dbf    43676      69088    112764      39      61
(c) Determine the optimal number of DBWRs by using the rule of thumb given at the
beginning of this section. This number should be the same or very close to the num-
ber of database files or disks as listed in Figure 6.23.
Let us assume that we need to have the effect of running n DBWR processes. We can
implement that in two different ways:
• Via the Parameter DBWR_IO_SLAVES—Set the value of the parameter
DBWR_IO_SLAVES (called DB_WRITERS in Oracle7) to n;
• Via the Parameter DB_WRITER_PROCESSES—Set the value of the parameter
DB_WRITER_PROCESSES to n. Here n must be less than or equal to 10.
The effect of the first option is to have a single DBWR master process spawn n I/O
slave processes to parallelize the writing of the contents of the data block buffer cache
among these n processes. The effect of the second option is to have n (≤10) DBWR proc-
esses labeled DBW0 through DBW(n-1) to parallelize both the reading and the writing of the
contents of the data block buffer cache among these n processes. The only limitation of
this latter option is that n must be less than or equal to 10. From the throughput stand-
point, n DBWR processes (second option) deliver more throughput than one DBWR
process with n I/O slave processes (first option) can. However, if we take the first option,
only one DBWR process will be set regardless of the value of the initialization parameter
DB_WRITER_PROCESSES.
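As an illustration only (the value 4 is an example, not a recommendation), the two alternatives correspond to the following init.ora fragments; exactly one of them should be used:
# Option 1: one DBWR master process with four I/O slaves
DBWR_IO_SLAVES = 4
# Option 2: four independent database writer processes (DBW0 through DBW3)
DB_WRITER_PROCESSES = 4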
Let us now apply the above three-step procedure to the database. In doing so we come
across several contradictions which are resolved as follows.
• The rule appearing above Step (a) applied to Figure 6.21 suggests two DBWRs to
handle the online users. Also, it is known that three to four concurrent batch jobs are
run for the application. Hence the rule adds two more DBWRs bringing the total
count of DBWRs to four.
• We, therefore, expect to have four or five disks supporting the database, according to
Step (c). However, the system uses seven disks, contradicting Step (c).
• Looking at the full output, only a part of which appears in Figure 6.23, we find that
only 8 out of a total of 69 rows returned have both %WRITE Volume and %WRITE
Count above 60%. So, the application is not write-intensive. This may suggest a
smaller number of DBWRs than four.
• But we further notice that the I/O load is not evenly distributed. Disks /abc01 and
/abc02 are heavily loaded compared to the remaining five disks in the system. By
properly balancing the I/O load among them the total number of disks can be brought
down to five.
• Thus, the number of DBWRs is very close to the number of disks, which resolves the
contradiction with Step (c).
• Accordingly, we implement this configuration under Oracle 8i either by setting the
initialization parameter DBWR_IO_SLAVES to the value 4 (default being zero), or
by setting DB_WRITER_PROCESSES to 4 (default being one).
We now give a five-step procedure to distribute the I/O load evenly among disks. Let
us suppose that we need to move the datafile xyz.dbf allocated to the tablespace TBS from
the disk disk_1 to the disk disk_2. We then proceed as follows.
(a) Bring TBS offline via the following command issued from the Server Manager.
ALTER TABLESPACE TBS OFFLINE;
(b) Copy the file xyz.dbf from disk_1 to disk_2 using the operating system command for
copying files (e.g., cp /disk_1/xyz.dbf /disk_2/xyz.dbf in UNIX).
(c) Designate the file on disk_2 as the datafile for TBS via the following command.
ALTER TABLESPACE TBS
RENAME DATAFILE '/disk_1/xyz.dbf' TO '/disk_2/xyz.dbf';
(d) Bring TBS online with this new datafile via the following command issued from the
Server Manager.
ALTER TABLESPACE TBS ONLINE;
6.6.2 LGWR
An instance can be made to run multiple LGWR processes by setting the initialization pa-
rameter LGWR_IO_SLAVES to a positive value, the default being zero. There are two
cases to consider depending on whether the instance is running in the ARCHIVELOG or
in NOARCHIVELOG mode, the latter being the default. In the case of the former, there
is an additional background process ARCH that has to be coordinated with LGWR. In the
case of the latter, LGWR wakes up periodically and transfers the contents of the redo log
buffer into the online redo log files. When one file becomes full, a log switch occurs and
LGWR starts using the next redo log file. Figure 6.24 gives a script file to show the log
switch frequencies over a 24-hour period. Figure 6.25 shows its partial output.
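Figure 6.24 itself is not reproduced here. A minimal sketch of such a query, assuming the standard V$LOG_HISTORY view, lists every log switch recorded during the last 24 hours:
SELECT TO_CHAR (FIRST_TIME, 'MM/DD/YY HH24:MI:SS') "Log Switch Time"
FROM V$LOG_HISTORY
WHERE FIRST_TIME > SysDate - 1
ORDER BY FIRST_TIME;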
Figure 6.25 is a partial list of log switch times. It shows that the first eight switches
occurred every 15 minutes. Then the log switches occurred more frequently. The climax
was reached when two successive log switches took place only 13 seconds apart, the one
at 14:14:49 being followed by the next at 14:15:02. Such a situation usually arises during
very large batch jobs running for a long time. Subsequently, the situation improved, how-
ever. For a high-activity production database there should be enough redo log files of
adequate size so that log switches occur every 20 to 30 minutes. The database in Figure
6.25 has five redo log files of size 50 MB each (see Figure 6.27). That size is too small
for the production environment in which this database runs; as a starting point, it should
be at least tripled.
Figure 6.26 contains a script file that lists the names, sizes, and status of the redo log
files and Figure 6.27 shows its output.
Size in
MEMBER GROUP# MEMBERS MB STATUS
------------------------------- ------ ----- ------- ------
/d01/oradata/INVOICE/redo01b.log 1 2 50
/d01/oradata/INVOICE/redo02b.log 2 2 50
/d01/oradata/INVOICE/redo03b.log 3 2 50
/d01/oradata/INVOICE/redo04b.log 4 2 50
/d01/oradata/INVOICE/redo05b.log 5 2 50
/d02/oradata/INVOICE/redo01a.log 1 2 50
/d02/oradata/INVOICE/redo02a.log 2 2 50
/d02/oradata/INVOICE/redo03a.log 3 2 50
/d02/oradata/INVOICE/redo04a.log 4 2 50
/d02/oradata/INVOICE/redo05a.log 5 2 50
Note that STATUS = blank in Figure 6.27 indicates that all the files are in use.
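Figure 6.26 itself is not reproduced here. A minimal sketch of such a query, joining the standard views V$LOG and V$LOGFILE, is:
SELECT F.MEMBER, L.GROUP#, L.MEMBERS,
       L.BYTES / 1048576 "Size in MB",
       F.STATUS
FROM V$LOG L, V$LOGFILE F
WHERE L.GROUP# = F.GROUP#
ORDER BY F.MEMBER;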
6.6.3 CKPT
When a checkpoint occurs, two background processes act in concert:
• DBWR writes the modified contents of the data block buffer cache, which are called
dirty buffers, into the appropriate database files; and
• CKPT updates the control files and the headers in all database files to record the time
of the last checkpoint.
A log switch always triggers a checkpoint, but not vice versa. In scheduling a check-
point we examine two options.
(a) A checkpoint need not occur more frequently than the log switches.
In this case, set the initialization parameter LOG_CHECKPOINT_INTERVAL to a value
larger than the size of the online redo log file. This parameter specifies the number of
redo log blocks, measured in operating system blocks (not Oracle blocks), that LGWR
must write to the online redo log file before a checkpoint is triggered.
When the value exceeds the size of an online redo log file, a checkpoint occurs only at a
log switch. This means that if a system or media crash occurs, data will be lost only since
the last log switch occurred. This usually suffices for most production databases. How-
ever, databases for a mission-critical application may need more frequent checkpoints so
that in case of a crash data will be lost over a much smaller interval of time. This leads us
to the second option described below.
(b) A checkpoint must occur more frequently than the log switches.
In this case, set the value of LOG_CHECKPOINT_INTERVAL such that the check-
points are evenly distributed over the size of the redo log file. For example, if the redo
log file consists of 20,000 operating system blocks, set the value of the parameter to a
factor of 20,000, say, at 5,000 operating system blocks. This will cause a checkpoint to
occur four times as the redo log file fills up to its capacity. The fourth checkpoint will
coincide with a log switch. To enforce this option, set the initialization parameter
LOG_CHECKPOINT_TIMEOUT to zero, which is the default. This parameter measures
the number of seconds elapsed between two consecutive checkpoints. If, for instance, we
want a checkpoint every 15 minutes, we set the value of this parameter to 900. By setting
it to zero, we disable time-based checkpoints, i.e., no additional checkpoints occur be-
tween log switches or between checkpoints forced by LOG_CHECKPOINT_
INTERVAL.
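As an illustration of option (b), assuming an online redo log file of 20,000 operating system blocks, the corresponding init.ora fragment would be:
# Checkpoint every 5,000 operating system blocks of redo;
# the fourth checkpoint coincides with the log switch.
LOG_CHECKPOINT_INTERVAL = 5000
# Disable time-based checkpoints
LOG_CHECKPOINT_TIMEOUT = 0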
6.6.4 ARCH
ARCH is activated after the database is made to run in ARCHIVELOG mode as follows
from the Server Manager,
SVRMGR> connect internal
SVRMGR> startup mount database_name;
SVRMGR> alter database archivelog;
SVRMGR> archive log start;
SVRMGR> alter database open;
and then setting the initialization parameter LOG_ARCHIVE_START to TRUE, the
default being FALSE. The database can be returned to the default mode of
NOARCHIVELOG as follows,
svrmgrl
SVRMGR> connect internal
SVRMGR> startup mount database_name;
SVRMGR> alter database noarchivelog;
SVRMGR> alter database open;
Once activated, ARCH copies the online redo log files to archived redo log files in a
directory defined by the initialization parameter LOG_ARCHIVE_DEST. It is always
recommended to run a production database in the ARCHIVELOG mode to save the con-
tents of the online redo log files that have been overwritten by LGWR in a log switch. To
avoid contention, the online redo log files and the archived redo log files should reside on
different disks. Figure 6.28 shows the interaction of the LGWR and ARCH processes.
The only performance issue related to ARCH arises in the following situation. LGWR
wants to do a log switch and write to another online redo log file, but cannot because
ARCH is still copying the contents of that file to an archived redo log file. In this case,
one or more of the following Oracle wait events triggered by the waiting for a log switch
will occur.
Scenario: LGWR is ready for a log switch, but it enters a wait state because ARCH has
not yet finished copying the online file to the archived file. The entire database
enters a WAIT status.
• Log file switch (archiving needed): The target online redo log file has not been ar-
chived yet;
• Log file switch (checkpoint incomplete): Checkpoint is still in progress;
• Log file switch (clearing log file): The target redo log file is to be cleared due to a
CLEAR LOGFILE command;
• Log file switch completion: Waiting for the log switch to complete
The wait time in each case is one second. The root cause of this situation can be one
or more of three possibilities.
(a) ARCH works slowly compared to LGWR. To speed it up, set these three initializa-
tion parameters to nondefault values:
• ARCH_IO_SLAVES to a positive value such as 3 or 4, the default being zero.
This allows multiple ARCH processes to function in parallel;
• LOG_ARCHIVE_BUFFERS, which specifies the number of buffers to be allo-
cated for archiving, to 3 or 4;
• LOG_ARCHIVE_BUFFER_SIZE, which specifies the size of each archival
buffer in terms of operating system blocks, to 64. Normally, these three settings
work for most production systems (an illustrative init.ora fragment appears after this list).
(b) The online redo log files are small in size so that they wrap too quickly for the
LGWR to continue. The remedy is to increase their sizes.
(c) There are too few online redo log files so that the LGWR has to overwrite a file be-
fore it has been archived. The solution is to increase the number of these files.
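Purely as an illustration of the settings listed in item (a) above (the values shown are examples, not recommendations), the corresponding init.ora fragment would be:
ARCH_IO_SLAVES          = 4
LOG_ARCHIVE_BUFFERS     = 4
LOG_ARCHIVE_BUFFER_SIZE = 64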
It is recommended that a production database have five to six online redo log file
groups with two mirrored files in each group.
The value of free memory should be 5% or less of the total SGA size. A higher value
indicates that Oracle has aged objects out of the shared SQL pool, which has become
fragmented. Figure 6.30 shows that the free memory occupies 3.7% of the SGA indicat-
ing very little fragmentation of the shared SQL pool.
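Figure 6.30 itself is not reproduced here. A minimal sketch of such a query, assuming the standard views V$SGASTAT and V$SGA, computes the free memory in the shared pool as a percentage of the total SGA size:
SELECT ROUND (100 * FREE.BYTES / SGA.TOTAL, 1) "Free Memory as % of SGA"
FROM (SELECT BYTES FROM V$SGASTAT
      WHERE NAME = 'free memory' AND POOL = 'shared pool') FREE,
     (SELECT SUM (VALUE) TOTAL FROM V$SGA) SGA;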
Given that the SGA should reside in real memory, the following rule of thumb is of-
ten used to compute the required memory for the database server.
Rule: Server memory should be at least three times the sum of the SGA size and the
minimum memory required for installing Oracle. If the number of concurrent
users is more than 50, then a larger memory is needed.
In some operating systems, the DBA can lock the SGA in real memory so that it is
never paged out to disk. If that option is available, use it. Oracle performs better if the
entire SGA is kept in real memory.
If the initialization parameter PRE_PAGE_SGA is set to TRUE (default being
FALSE), Oracle reads the entire SGA into real memory at instance startup. Operating
system page table entries are then pre-built for each page of the SGA. This usually slows
down the instance startup time and may also slow down the individual process startup
times. But it reduces the amount of time needed by Oracle to reach its full performance
capability after the startup. Therefore, if the database normally runs around the clock and
the system does not create and destroy processes all the time (e.g., by doing continuous
logon/logoff), then the above setting improves performance. However, it does not prevent
the operating system from paging or swapping the SGA after it is initially read into real
memory. The issues specific to the operating system are discussed in the next paragraph.
To take the full advantage of the PRE_PAGE_SGA setting make the page size the largest
possible allowed by the operating system. In general, the page size is operating system
specific and cannot be changed. But some operating systems have a special implementa-
tion for shared memory that allows the system administrator to change the page size to a
larger value.
In any operating system most of the virtual memory physically resides in auxiliary stor-
age. When the operating system needs real memory to meet a service request but the real
memory has fallen below a predefined threshold, blocks are “paged out” from real mem-
ory to virtual memory and new processes are “paged into” real memory occupying the
space freed up by paging. Paging and swapping are the mechanisms used for managing
virtual memory. Thrashing occurs when there is excessive paging and swapping. It causes
blocks to be continually transferred back and forth (“thrashed”) between real and virtual
memory. Paging and swapping impose an overhead on the operating system. The goal
for tuning the real memory is to control paging and swapping so that thrashing does not
occur.
Paging occurs when a process needs a page (block) of memory that is no longer in
real memory but in virtual memory (disk space). The block must be read from there into
real memory. This is called paging in. The block that it replaces in real memory may
have to be written out to virtual memory. This is called paging out. Paging is usually
controlled by the LRU algorithm, as with the SGA, and generally involves inactive proc-
esses. The disk space in virtual memory where the page is transferred is called the page
space. Swapping is more serious and extensive than paging since an entire active process,
instead of only selected pages of a process, is written out from real memory to virtual
memory to make room for another process to execute in real memory. The disk space in
virtual memory where the process is transferred is called swap space. It is strongly rec-
ommended that the swap space for an Oracle database server be configured at least two to
four times the size of real memory. Insufficient swap space often results in a limited real
memory usage since the operating system is unable to reserve swap space for a new proc-
ess to be loaded into real memory. In the case of swapping, the pages that are written out
to virtual memory must later be read back into real memory to continue with their execu-
tion, because no process can execute in virtual memory. If there is insufficient real mem-
ory, the operating system may have to continuously page in and out of real memory re-
sulting in thrashing. Figure 6.31 shows a typical paging and swapping scenario.
Paging is triggered by a page fault, which happens as follows.
Suppose that a page associated with an inactive process has been written out to the
page space in the virtual memory under the LRU. A new process now needs that page,
but cannot find it. This event is called a page fault. Oracle then pages in that page to real
memory. Repeated page faults eventually lead to swapping, when an active process is
moved from real to virtual memory. These are clear symptoms of system degradation.
Consequently, the DBA or the system administrator must monitor very closely both pag-
ing and swapping.
The extent of paging and swapping and the amount of free memory can be monitored
in a UNIX System V environment via the command sar (system activity reporter) run
with four switches, -p, -g, -w, and -r. These switches have the following implications:
-p: Paging in page fault activities,
-g: Paging out activities,
-w: System swapping and switching activities, and
-r: Unused memory pages and disk blocks.
Figure 6.32 contains a script file that prompts the user for two parameters for sar and
then executes sar with those values for each of the four switches described above. Figure
6.33 shows the result of executing the script file of Figure 6.32.
FIGURE 6.31: Paging and Swapping Between Real Memory, Page Space, and Swap Space
#!/bin/csh
#
# File Name: sar_Switch_Command
# Author: NAME
# Date Written: DATE
# Purpose: Run the 'sar' command with four switches, -p,
# -g, -w, and -r.
# User enters the frequency of collecting
# statistics and the number of such collections.
#
#
printf "\n"
printf "\n"
echo "At what time interval do you want to collect statistics? "
echo -n "Enter the interval in number of seconds: "
set TIME_INTERVAL = $<
printf "\n"
echo "How many times do you want to collect statistics? "
echo -n "Enter the total number of statistics collection that you
want: "
set TOTAL_COUNTER = $<
printf "\n"
echo "The program will run 'sar' with switches, -p, -g, -w, and -r, "
echo "and will collect statistics every $TIME_INTERVAL seconds for
$TOTAL_COUNTER times."
printf "\n"
printf "\n"
echo sar -p $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
sar -p $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
printf "\n"
printf "\n"
echo sar -g $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
sar -g $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
printf "\n"
printf "\n"
echo sar -w $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
sar -w $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
printf "\n"
printf "\n"
echo sar -r $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
sar -r $TIME_INTERVAL $TOTAL_COUNTER
printf "\n"
printf "\n"
printf "\n"
echo Statistics collection is complete.
printf "\n"
printf "\n"
FIGURE 6.32 (continued): Script File for sar with User Supplied Parameters
The output from the script shown in Figure 6.32 appears in Figure 6.33.
The program will run 'sar' with switches, -p, -g, -w, and
-r, and will collect statistics every 8 seconds for 5 times.
sar -p 8 5
SunOS ABCD 5.6 Generic_105181-13 sun4u 08/30/99
11:32:06 atch/s pgin/s ppgin/s pflt/s vflt/s slock/s
11:32:14 0.00 0.00 0.00 9.84 12.95 0.00
11:32:22 0.37 0.50 0.50 25.09 119.48 0.00
11:32:30 0.00 0.00 0.00 0.00 0.00 0.00
11:32:38 0.00 0.00 0.00 9.86 9.49 0.00
11:32:46 0.25 0.37 0.37 8.97 12.08 0.00
Average 0.12 0.17 0.17 10.75 30.77 0.00
sar -g 8 5
SunOS ABCD 5.6 Generic_105181-13 sun4u 08/30/99
11:32:46 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
11:32:54 0.00 0.00 0.00 0.00 0.00
11:33:02 0.00 0.00 0.00 0.00 0.00
11:33:10 0.37 0.37 0.37 0.00 0.00
11:33:18 0.00 0.00 0.00 0.00 0.00
11:33:26 0.00 0.00 0.00 0.00 0.00
Average 0.07 0.07 0.07 0.00 0.00
sar -w 8 5
SunOS ABCD 5.6 Generic_105181-13 sun4u 08/30/99
11:33:26 swpin/s bswin/s swpot/s bswot/s pswch/s
11:33:34 0.00 0.0 0.00 0.0 587
11:33:42 0.00 0.0 0.00 0.0 581
11:33:50 0.00 0.0 0.00 0.0 573
11:33:58 0.00 0.0 0.00 0.0 625
11:34:06 0.00 0.0 0.00 0.0 626
Average 0.00 0.0 0.00 0.0 598
sar -r 8 5
SunOS ABCD 5.6 Generic_105181-13 sun4u 08/30/99
11:34:07 freemem freeswap
11:34:15 9031 13452815
11:34:23 9031 13452815
11:34:31 9032 13452815
11:34:39 8924 13435165
11:34:47 8760 13449823
Average 8956 13448684
Statistics collection is complete.
FIGURE 6.33 (continued): Output from Running sar with Four Switches
The meanings of the column titles for each switch and potential problem symptoms
identified by them are listed below.
Switch -p:
atch/s = Page faults per second that are satisfied by reclaiming a page currently in
memory;
pgin/s = Number of requests per second for paging in;
ppgin/s = Number of pages that are paged in per second;
pflt/s = Page faults per second from protection errors, i.e., illegal access to page, or
"copy-on-writes";
vflt/s = Page faults per second from address translation, i.e., valid page not in
memory;
slock/s = Page faults per second from software lock requests requiring physical I/O.
Problem symptom(s) caused by memory deficiency: A high number of page faults indi-
cated by one or more of atch/s, pflt/s, vflt/s, and slock/s.
Switch -g:
pgout/s = Number of requests per second for paging out;
ppgout/s = Number of pages that are paged out per second;
pgfree/s = Number of pages that are placed on free list per second by the page
stealing daemon;
pgscan/s = Pages scanned per second by the page stealing daemon;
%ufs_ipf = Percentage of UFS i-nodes taken off the free list by igets that had
reusable pages associated with them. This is the percentage of igets
with page flushes.
Problem symptom(s) caused by memory deficiency: A high value of ppgout/s
Switch -w:
swpin/s = Number of transfers per second for swapping in;
bswin/s = Number of 512-byte blocks transferred per second for swapping in;
swpot/s = Number of transfers per second for swapping out;
bswot/s = Number of 512-byte blocks transferred per second for swapping out
pswch/s = Number of process switches per second.
Problem symptom(s) caused by memory deficiency: High values of one or more of the pa-
rameters.
Switch -r:
freemem = Number of 512-byte blocks of free memory available to user processes;
freeswap = Number of 512-byte blocks available for page swapping.
sar -u 8 5
SunOS ABCD 5.6 Generic_105181-13 sun4u 09/01/99
12:07:45 %usr %sys %wio %idle
12:07:53 44 25 30 1
12:08:01 43 26 29 2
12:08:09 30 10 54 6
12:08:17 28 9 58 5
12:08:25 31 10 54 5
Average 35 16 45 4
The CPU utilization is poor here due to very low %idle and high %wio.
SET LINESIZE 80
SET PAGESIZE 41
SET NEWPAGE 0
col owner format a10
col name format a30
col type format a20
spool /My_Directory/Pinning_Objects.lst
(b) Execute the script DBMSPOOL.sql. For example, under UNIX, run the following
command,
@$ORACLE_HOME/rdbms/admin/dbmspool
This creates a package called dbms_shared_pool consisting of the following proce-
dures.
procedure sizes(minsize number)
procedure keep(name varchar2, flag char DEFAULT 'P')
procedure unkeep(name varchar2, flag char DEFAULT 'P')
procedure aborted_request_threshold(threshold_size
number)
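As an illustration of how the package can be used from SQL*Plus (the object SYS.STANDARD and the 100 KB threshold are examples only):
-- Pin a frequently used package in the shared pool
EXECUTE dbms_shared_pool.keep ('SYS.STANDARD', 'P');
-- List objects in the shared pool larger than 100 KB as candidates for pinning
SET SERVEROUTPUT ON SIZE 100000
EXECUTE dbms_shared_pool.sizes (100);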
Note that the list displayed in Figure 6.36 can alternatively be derived by executing
the command:
There are three areas where a DBA can tune the latches:
(a) RLB via redo allocation latch and redo copy latch,
(b) DBB via LRU latch, and
(c) Free lists for segments.
When a process needs to write to the RLB, it acquires a redo allocation latch. The initiali-
zation parameter LOG_SMALL_ENTRY_MAX_SIZE specifies the size in bytes of the
largest chunk that can be written to the RLB under the redo allocation latch. Its default
value is operating system dependent. If the value of this parameter is zero, then all
"writes" are considered small and are written to the RLB using this latch. After the writ-
ing is complete, the process releases the latch. If the parameter is set to a positive value
and the redo entry to be written to the RLB is larger than this value, the process uses the
redo allocation latch to allocate space in the RLB for writing and acquires a redo copy
latch for doing the actual "write." Once the space is allocated, the process releases the
redo allocation latch and uses the redo copy latch to do the actual writing. After the writ-
ing is complete, the process releases the redo copy latch.
An instance can have only one redo allocation latch and up to n redo copy latches,
where n = 6 * (number of CPUs). The actual value of n is determined by the initialization
parameter LOG_SIMULTANEOUS_COPIES. Multiple redo copy latches allow the
writing of multiple redo log entries simultaneously, thereby reducing contention among
these latches and enhancing performance.
Latches are of two types: willing-to-wait and immediate. The first type requests a
latch, waits if the latch is not available, requests again, and waits until the latch is avail-
able. The second type requests a latch and if it is not available continues processing. The
dynamic performance view V$LATCH contains the statistics for both types of latch con-
tention. The tuning goals for both redo allocation latch and redo copy latch are:
WAIT Hit Ratio > .99, NO-WAIT Hit Ratio > .99, and
SLEEPS / MISSES ≤ 1.
Figures 6.38 and 6.39 provide respectively the script file and its output showing the
above three metrics for the two latches, redo allocation and redo copy.
May 17th, 2001 Wait and No-Wait Latch Hit Ratios Page 1
Latch Name WAIT Hit Ratio NO-WAIT Hit Ratio SLEEPS/MISSES
--------------- -------------- ----------------- -------------
redo allocation .999 0 .016
redo copy .144 1 .068
Script File: /My_Directory/Latch_Tuning.sql
Spool File: /My_Directory/Latch_Tuning.lst
Figure 6.39 shows that the WAIT Hit Ratio for redo copy latch and NO-WAIT Hit
Ratio for the redo allocation latch do not meet the tuning goals. However, this in itself
may not be indicative of problems. The latch contention for redo log buffer access rarely
causes any database performance problem. However, if necessary, one can take the fol-
lowing corrective measures.
• For the redo copy latch, increase the value of the initialization parameter
LOG_SIMULTANEOUS_COPIES to its maximum allowed value of 6 * (number of
CPUs).
• For the redo allocation latch, reduce the value of the initialization parameter
LOG_SMALL_ENTRY_MAX_SIZE as noted at the beginning of this section.
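Figure 6.38 itself is not reproduced here. A minimal sketch of such a query, assuming the standard V$LATCH view, computes the three metrics for the two redo latches:
SELECT NAME "Latch Name",
       ROUND ((GETS - MISSES) / DECODE (GETS, 0, 1, GETS), 3) "WAIT Hit Ratio",
       ROUND ((IMMEDIATE_GETS - IMMEDIATE_MISSES) /
              DECODE (IMMEDIATE_GETS, 0, 1, IMMEDIATE_GETS), 3) "NO-WAIT Hit Ratio",
       ROUND (SLEEPS / DECODE (MISSES, 0, 1, MISSES), 3) "SLEEPS/MISSES"
FROM V$LATCH
WHERE NAME IN ('redo allocation', 'redo copy');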
When new data blocks are read into the DBB, the necessary space must exist there. Oth-
erwise, Oracle removes the least recently used blocks from the DBB via the LRU algo-
rithm to make room for the new data. LRU latches regulate the LRU lists used by the
DBB. Each latch controls a minimum of 50 buffers. The initialization parameter
DB_BLOCK_LRU_LATCHES determines the number of LRU latches available, default
being (Number of CPUs) / 2.
The tuning goal for the LRU latches is to minimize contention among processes that
are requesting the latch. The metric used to measure contention is defined by
LRU Hit Ratio = SLEEPS / GETS for the latch 'cache buffers lru chain'
and we need LRU Hit Ratio < .01. If this goal is not met, increase the value of the pa-
rameter DB_BLOCK_LRU_LATCHES to the following value,
min (6 x (Number of CPUs), (Number of buffers) / 50)
Figures 6.40 and 6.41 provide respectively the script file and its output showing the
value of the LRU Hit Ratio for the latch “cache buffers lru chain”. The LRU Hit Ratio
meets the desired goal.
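Figure 6.40 itself is not reproduced here. A minimal sketch of such a query, again assuming the standard V$LATCH view, is:
SELECT NAME "Latch Name",
       ROUND (SLEEPS / DECODE (GETS, 0, 1, GETS), 4) "LRU Hit Ratio"
FROM V$LATCH
WHERE NAME = 'cache buffers lru chain';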
For each data and index segment, Oracle maintains one or more free lists. A free list is a
list of data blocks that have been allocated for the extents of that segment and have free
space greater than PCTFREE for that segment. The list is implemented as a linked list to
make insertion and deletion of blocks to and from the free list simple. See Section E13 of
Appendix E for a discussion of linear linked lists. The blocks of the free list are made
available for inserting data into a segment. After a DELETE or an UPDATE statement is
executed, Oracle checks to see if the space being used in the block is less than the
PCTUSED value for the segment. If it is, the block is placed at the beginning of the free
list and becomes the first of the available blocks to be used. When an INSERT occurs in a
segment, the free list is used to determine what blocks are available for the INSERT. If
multiple processes try to insert data into the same segment, a contention arises for the free
lists of that segment. This incurs possible waits for the processes performing the IN-
SERTs. Since there is no initialization parameter setting up the number of free lists, they
cannot be created dynamically as needed. Consequently, a sufficient number of them
must be set up at the time of creating or altering the segment via the STORAGE clause.
This clause is an optional item in any CREATE or ALTER command. Two of the
STORAGE options are FREELISTS and FREELIST GROUPS defined below:
FREELISTS = number of free lists in each free list group
FREELIST GROUPS = number of groups of free lists,
default being one
The tuning goal in setting up free lists and their groups is to minimize contention
among multiple processes inserting data into a segment. The three dynamic performance
views V$SESSION_WAIT, V$WAITSTAT, and V$SYSTEM_EVENT are used to
identify problems with free list contention. The two data dictionary views
DBA_EXTENTS and DBA_SEGMENTS are used to identify the objects that need to be
altered to increase the number of their free lists. Figure 6.42 contains a script that exe-
cutes a three-step procedure to determine if there is free list contention:
(a) Find out if there is wait involved for free lists;
(b) Determine the amount of wait in seconds; and
(c) Determine the segment(s) with free list contention.
Figure 6.43 is a sample output. It shows that there is no free list contention.
If one or more segments are returned showing free list contention, then proceed as
follows to resolve contention.
• Drop the segment(s) showing contention.
• Recreate the segment(s) with larger value(s) for FREELISTS and FREELIST
GROUPS in the STORAGE clause, as sketched below.
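As a hypothetical illustration (the table name, columns, and storage values are examples only), a segment recreated with more free lists might look like this:
CREATE TABLE ORDER_LINE_ITEMS (
  ORDER_ID  NUMBER,
  LINE_NO   NUMBER,
  ITEM_DESC VARCHAR2 (100))
STORAGE (INITIAL 1M NEXT 1M
         FREELISTS 4
         FREELIST GROUPS 2);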
Key Words
ARCH background process
ARCHIVELOG bind variable
All the above references have discussed the tuning issues along with hit ratios, other
performance metrics, and their target values. A clearcut distinction of the instance and the
database tuning, as done here, is not found in any of them. A good treatment of hit ratios
and their associated benchmark values can be found in Aronoff et al. [1, Chapters 3 and
7] and Ault [2, Chapter 9]. Memory tuning, SGA sizing, and shared SQL pool tuning are
discussed in Aronoff et al. [1, Chapter 9], Ault [2, Chapter 11], Corey et al. [5, Chapter
2], and Dunham [6, Chapters 1 and 7]. Burleson [4, Chapter 3] covers memory tuning
from a more comprehensive viewpoint of tuning the Oracle architecture. Disk I/O tuning
and reducing I/O contention are treated well in Ault [2, Chapter 11] and Bobrowski [3,
Chapter 12]. Corey et al. [5, Chapter 4] treat the tuning of CPU in fair detail. Although
Aronoff et al. [1], Bobrowski [3], Corey et al. [5], and Dunham [6] cover only up to Ora-
cle7.3, most of the scripts included in their books apply equally well to Oracle 8i. Corey
et al. [5, Chapter 9] contain a large number of scripts with nice tuning tips arranged by
topics such as space management, table and index sizing, use of buffers in the SGA, etc.
The book by Niemiec [8] is almost encyclopedic in nature. Most of its chapters are la-
beled with the target readers such as Beginner Developer and Beginner DBA, Developer
and DBA, DBA, and Advanced DBA.
Exercises
Theoretical exercises are of little value for this chapter since the best practice comes from
monitoring the performance of actual production databases and tuning them, as needed.
The exercises given below identify some of the areas not specifically covered in Chapter
6 and should be considered an extension of the text.
1. Augmenting the procedures described in Figures 6.7A and 6.8A, prepare a report
showing the effect of additional cache hits and cache misses as you increase or de-
crease the size of the DBB by 75 blocks at a time. Analyze the report to determine if
there is a peak in additional cache hits and cache misses, or if there is no such visible
trend.
2. You need a test database for this exercise. Using the three-step procedure in Section
6.6.1 to estimate the optimal number of DBWRs, determine the number, say, N (≤
10), of DBWRs needed for an application. Set up this configuration in two different
ways by creating two separate instances for the application:
• One instance has N distinct DBWR processes, and
• The other instance has N distinct slaves of a single DBWR process.
Run the application under these two configurations and measure the runtimes of the
same batch jobs in the two cases. Do you see any difference in performance between
the configurations? Can you explain the difference?
3. Why does the use of bind variables help to reduce the number of “reloads”?
4. An Oracle instance has the following values.
Initialization Parameter PROCESSES = P
V$LICENSE. SESSIONS_HIGHWATER = H
V$LICENSE. SESSIONS_CURRENT = C
Will there be any inconsistency with the condition
P < min (H, C) ?
5. An Oracle instance is created with the following values:
Redo Log File = 24000 KB
LOG_CHECKPOINT_INTERVAL = 6000 KB
LOG_CHECKPOINT_TIMEOUT = 1800
It is assumed that a log switch has occurred at 14:00:00 and then occurs every 16
minutes between 14:00:00 and 15:30:00 and that the redo log files fill up uniformly
over time. Thus, for example, it takes 8 minutes to write 12,000 KB to the redo log
file. Under these assumptions complete the following table.
(Note that the first row of the table has been completed as the starting point.)
7
Oracle Utility for Tuning and Optimization
Outline
7.1 Scope of Oracle Utilities
7.2 UTLBSTAT and UTLESTAT Utilities
7.3 Location and Function of the Utility Scripts
7.4 Procedure for Running the Utilities
7.5 UTLBSTAT/ESTAT Performance Report Analysis
7.6 Comprehensive Tuning Plan for Internal Level
7.7 Performance Tracking
7.8 Tuning Activities
Key Words
References and Further Reading
Exercises
The preceding chapters dealt with the tuning of specific areas of the database, namely
the data structures residing in memory and in auxiliary storage. For comprehensive
performance tuning Oracle provides two utilities called utlbstat and utlestat that collect
systemwide statistics over an interval of time specified by the DBA. The major part of
this chapter discusses these two utilities and analyzes the performance report generated
by them.
The script UTLESTAT.sql creates a set of summary tables that collect statistics dur-
ing the specified interval of time, generates the output REPORT.TXT, and then drops all
the tables created by UTLBSTAT.sql and UTLESTAT.sql. The file REPORT.TXT re-
sides in the default directory of the account from which the script file UTLESTAT.sql is
run. Figure 7.2 shows the names of the tables and views created by UTLESTAT.sql and
the names of the source tables and views from which they collect the data.
Figures 7.3 and 7.4 contain respectively the full script files, UTLBSTAT.sql and
UTLESTAT.sql.
rem
rem $Header: utlbstat.sql 26-feb-96.19:20:51 gpongrac Exp $ bstat.sql
rem
Rem Copyright (c) 1988, 1996 by Oracle Corporation
Rem NAME
REM UTLBSTAT.SQL
Rem FUNCTION
Rem NOTES
Rem MODIFIED
Rem jloaiza 10/14/95 - add tablespace size
Rem jloaiza 09/19/95 - add waitstat
Rem jloaiza 09/04/95 - add per second and background waits
Rem drady 09/09/93 - merge changes from branch 1.1.312.2
Rem drady 03/22/93 - merge changes from branch 1.1.312.1
Rem drady 08/24/93 - bug 173918
Rem drady 03/04/93 - fix bug 152986
Rem glumpkin 11/16/92 - Renamed from UTLSTATB.SQL
rem
rem $Header: utlestat.sql 12-jan-98.13:50:59 kquinn Exp $ estat.sql
rem
Rem Copyright (c) 1988, 1996, 1998 by Oracle Corporation
Rem NAME
REM UTLESTAT.SQL
Rem FUNCTION
Rem This script will generate a report (in "report.txt") which
Rem will contain usefull information for performance
Rem monitoring. In particular information from v$sysstat,
Rem v$latch, and v$rollstat.
Rem NOTES
Rem Don't worry about errors during "drop table"s, they are
Rem normal.
Rem MODIFIED
Rem kquinn 01/12/98 - 607968: Correct nowait latch hit ratio calc
Rem jklein 08/23/96 - bug 316570 - fix typo
Rem akolk 08/09/96 - #387757: fix latch hitratios
Rem akolk 07/19/96 - #298462: correcting latch miss rate (Fixing)
Rem akolk 07/19/96 - #298462: correcting latch miss rate
Rem akolk 07/12/96 - #270507: remove db_block_write_batch
Rem jloaiza 10/14/95 - add tablespace sizes
Rem jloaiza 09/19/95 - add waitstat
Rem jloaiza 09/04/95 - per second stats, split background waits
Rem drady 09/09/93 - merge changes from branch 1.1.312.2
Rem drady 04/26/93 - Stat name changes for 7.1
Rem drady 03/22/93 - merge changes from branch 1.1.312.1
Rem drady 08/24/93 - bug 173918
Rem drady 03/04/93 - fix bug 152986
Rem glumpkin 11/23/92 - Creation
Rem glumpkin 11/23/92 - Renamed from UTLSTATE.SQL
set numwidth 27
Rem Average length of the dirty buffer write queue. If this is larger
Rem than the value of:
Rem 1. (db_files * db_file_simultaneous_writes)/2
Rem or
Rem 2. 1/4 of db_block_buffers
Rem which ever is smaller and also there is a platform specific limit
Rem on the write batch size (normally 1024 or 2048 buffers). If
Rem the average length of the dirty buffer write queue is larger
Rem than the value calculated before, increase
Rem db_file_simultaneous_writes or db_files.
Rem Also check for disks that are doing many more IOs than other
Rem disks.
select queue.change/writes.change "Average Write Queue Length"
from stats$stats queue, stats$stats writes
where queue.name = 'summed dirty queue length'
and writes.name = 'write requests';
set charwidth 32;
set numwidth 13;
Rem System wide wait events for non-background processes (PMON,
Rem SMON, etc). Times are in hundreths of seconds. Each one of
Rem these is a context switch which costs CPU time. By looking at
Rem the Total Time you can often determine what is the bottleneck
Rem that processes are waiting for. This shows the total time
Rem spent waiting for a specific event and the average time per
Rem wait on that event.
select n1.event "Event Name",
n1.event_count "Count",
n1.time_waited "Total Time",
round(n1.time_waited/n1.event_count, 2) "Avg Time"
from stats$event n1
where n1.event_count > 0
order by n1.time_waited desc;
Rem System wide wait events for background processes (PMON, SMON, etc)
select n1.event "Event Name",
n1.event_count "Count",
n1.time_waited "Total Time",
round(n1.time_waited/n1.event_count, 2) "Avg Time"
from stats$bck_event n1
where n1.event_count > 0
order by n1.time_waited desc;
set charwidth 18;
set numwidth 11;
Rem Latch statistics. Latch contention will show up as a large
Rem value for the 'latch free' event in the wait events above.
Rem Sleeps should be low. The hit_ratio should be high.
Figure 7.7 contains data showing the performance of the library cache, which is a part of
the shared SQL pool in the SGA (see Section 4.2.1). The data help us to determine if the
shared SQL statements are being reparsed due to insufficient memory being allocated to
the library cache. The crucial data to look for in Figure 7.7 are SQL AREA under
LIBRARY and its accompanying statistics under GETHITRATIO, PINHITRATIO, and
RELOADS. These are defined below:
GETHITRATIO = (number of times that a requested object was found in cache) /
              (total number of requests made);
PINHITRATIO = (number of times that a pinned object was in cache) /
              (total number of pin requests made).
SVRMGR>
SVRMGR> set charwidth 12
Charwidth 12
SVRMGR> set numwidth 10
Numwidth 10
SVRMGR> Rem Select Library cache statistics. The pin hit rate should be high.
SVRMGR> select namespace library,
2> gets,
3> round(decode(gethits,0,1,gethits)/decode(gets,0,1,gets),3)
4> gethitratio,
5> pins,
6> round(decode(pinhits,0,1,pinhits)/decode(pins,0,1,pins),3)
7> pinhitratio,
8> reloads, invalidations
9> from stats$lib;
LIBRARY GETS GETHITRATI PINS PINHITRATI RELOADS INVALIDATI
------------ ---- --------- ---- --------- ------- ---------
BODY 2 .5 2 .5 0 0
CLUSTER 0 1 0 1 0 0
INDEX 0 1 0 1 0 0
OBJECT 0 1 0 1 0 0
PIPE 0 1 0 1 0 0
SQL AREA 170 .9 502 .928 0 1
TABLE/PROCED 133 .842 133 .752 0 0
TRIGGER 0 1 0 1 0 0
8 rows selected.
Figure 7.8 contains a partial output of systemwide events statistics. It lists only those sta-
tistics that are used for our discussion here. This part of REPORT.TXT covers a wide
range of performance areas for the database. For each statistic, it displays the total num-
ber of operations, the total number of operations per user commit, per user logon, and per
second. "Per logon" will always be based on at least one logon since the script
UTLESTAT.sql logs on as "internal." We discuss only five major performance statistics
here.
(a) Buffer Hit Ratio: This has been discussed in Section 6.3. Figure 7.8 shows that
consistent gets = 1,013; db block gets = 460; physical reads = 216.
Therefore,
Logical reads = consistent gets + db block gets = 1,013 + 460 = 1,473.
Buffer hit ratio = (logical reads - physical reads) / logical reads = .85.
Since the ratio < .9, the data block buffer cache should be increased in size.
(b) Number of DBWR Processes: The following three statistics determine if we need to
increase the number of DBWR background process.
Dirty buffers inspected = Number of modified (dirty) buffers that were aged out
via the LRU algorithm (see Section 4.2.1).
Free buffer inspected = Number of buffers that were skipped by foreground proc-
esses to find a free buffer.
Free buffer requested = Number of times a free buffer was requested.
The guidelines here are as follows.
1. The value of “dirty buffers inspected” should be zero. If the value is positive,
then DBWR is not working fast enough to write all the dirty buffers to the data
files before they are removed from the data block buffer cache under LRU.
2. The ratio (free buffer inspected/free buffer requested) should be ≤.04. If this
ratio is greater than .04, then there may be too many unusable buffers in the
data block buffer cache. It may mean that checkpoints are occurring too fre-
quently so that DBWR cannot keep up.
If the guidelines above are not met, set the initialization parameter
DBWR_IO_SLAVES to a positive value, the default being zero.
From Figure 7.8 we get
dirty buffers inspected = 3,
free buffer inspected/free buffer requested
= 4 / 237 = .02
Here we get ambiguous results in that (1) does not hold, but (2) holds. The value
of DBWR_IO_SLAVES is zero for the database. Further investigation is needed be-
fore we set this parameter to a positive value.
(c) Redo Log Buffer Hit Ratio: This has been discussed in Section 6.4.
Figure 7.8 shows that
redo log space requests = 805, redo entries = 387,815
Therefore, redo log buffer hit ratio
= (redo log space requests)/(redo entries)
= 805/387,815 = .002
This exceeds the guideline that the ratio should be ≤.0002, as noted in Section
6.4. Hence the value of the initialization parameter LGWR_IO_SLAVES should be
increased from its current default value of zero.
(d) Sorting Area Size: It is recommended that all the sorts be done in memory instead of
through the creation of temporary segment(s) in the TEMP tablespace.
Figure 7.8 shows that
sorts(memory) = 447, sorts (disk) = 16
The guideline is as follows: sorts (disk) should be <5% of sorts(memory). Otherwise,
increase the initialization parameter SORT_AREA_SIZE. Since 5% of 447 = 22.35,
the guideline is satisfied here.
(e) Dynamic Extension of Segments: Rollback segments created with an OPTIMAL
clause are dynamically extended if needed. Too much dynamic extension is bad for
performance. The following guideline involving the two statistics, recursive calls and
user calls, can be used to detect if any dynamic extension is occurring.
If recursive calls/user calls > 30, too much dynamic extension is occurring. In
this case, resize the extents of rollback segments resulting in fewer but larger extents.
See Section 5.10 for further details about rollback segments.
From Figure 7.8 we find that the ratio
(recursive calls/user calls) = 1975 / 111 = 17.8
indicating that the dynamic extension is not an issue here.
SVRMGR>
SVRMGR> set charwidth 27;
Charwidth 27
SVRMGR> set numwidth 12;
Numwidth 12
SVRMGR> Rem The total is the total value of the statistic between the time
SVRMGR> Rem bstat was run and the time estat was run. Note that the estat
SVRMGR> Rem script logs on as "internal" so the per_logon statistics will
SVRMGR> Rem always be based on at least one logon.
SVRMGR> select n1.name "Statistic",
2> n1.change "Total",
3> round(n1.change/trans.change,2) "Per Transaction",
4> round(n1.change/logs.change,2) "Per Logon",
5> round(n1.change/(to_number(to_char(end_time, 'J'))*60*60*24 -
6> to_number(to_char(start_time, 'J'))*60*60*24 +
7> to_number(to_char(end_time, 'SSSSS')) -
8> to_number(to_char(start_time, 'SSSSS')))
9> , 2) "Per Second"
10> from stats$stats n1, stats$stats trans, stats$stats logs,
stats$dates
11> where trans.name='user commits'
12> and logs.name='logons cumulative'
13> and n1.change != 0
14> order by n1.name;
Statistic Total Per Transact Per Logon Per Second
----------------------- ----- ------------ --------- ----------
consistent gets 1013 1013 101.3 .18
db block gets 460 460 46 .08
dirty buffers inspected 3 3 .3 0
free buffer inspected 4 4 .4 0
free buffer requested 237 237 23.7 .04
physical reads 216 216 21.6 .04
recursive call 1975 1975 197.5 .35
redo entries 387815 387815 3878.15 65.27
redo log space requests 805 805 80.5 .13
sorts(disk) 16 16 1.6 0
sorts(memory) 447 447 44.7 .07
user calls 111 111 11.1 .02
74 rows selected
SVRMGR>
Figure 7.9 contains the output for this part. The value of this statistic should be very close
to zero. The "Rem" statements in Figure 7.9 provide the following guideline:
dirty buffer write queue ≤
min ((db_files * db_file_simultaneous_writes)/2,
db_block_buffers/4).
If this guideline is not met, increase the initialization parameter DB_FILES or
DB_FILE_SIMULTANEOUS_WRITES.
For this database, the values of the above three parameters are:
DB_FILES = 256, DB_FILE_SIMULTANEOUS_WRITES = 4,
DB_BLOCK_BUFFERS = 200
Since
0 = dirty buffer write queue ≤ 50 = min (512, 50),
the guideline is satisfied here.
SVRMGR>
SVRMGR> set numwidth 27
Numwidth 27
SVRMGR> Rem Average length of the dirty buffer write queue. If this is larger
SVRMGR> Rem than the value of:
SVRMGR> Rem 1. (db_files * db_file_simultaneous_writes)/2
SVRMGR> Rem or
SVRMGR> Rem 2. 1/4 of db_block_buffers
SVRMGR> Rem which ever is smaller and also there is a platform specific limit
SVRMGR> Rem on the write batch size (normally 1024 or 2048 buffers). If the average
SVRMGR> Rem length of the dirty buffer write queue is larger than the value
SVRMGR> Rem calculated before, increase db_file_simultaneous_writes or db_files.
SVRMGR> Rem Also check for disks that are doing many more IOs than other disks.
SVRMGR> select queue.change/writes.change "Average Write Queue Length"
2> from stats$stats queue, stats$stats writes
3> where queue.name = 'summed dirty queue length'
4> and writes.name = 'write requests';
Average Write Queue Length
---------------------------
0
1 row selected.
SVRMGR>
Figure 7.10 contains the output for this part. Each wait event is a context switch that costs
CPU time. By looking at the Total Time column in Figure 7.10 we can determine the
bottleneck for processes. Total Time and Avg Time represent respectively the total
amount of time and the average amount of time that processes had to wait for the event.
Both times are computed over the period of time during which the statistics were col-
lected. The time is measured in 1/100ths of a second.
The general guideline in interpreting the statistics is to regard all "waits" as bad. The
waits for non-background processes discussed in this section directly affect the user proc-
esses. The following events are of interest for performance tuning.
free buffer waits = wait because a buffer is not available
buffer busy waits = wait because a buffer is either being read
into the data block buffer cache by another
session or the buffer is in memory in an
incompatible mode; i.e., some other session
is changing the buffer
SVRMGR>
SVRMGR> set charwidth 32;
Charwidth 32
SVRMGR> set numwidth 13;
Numwidth 13
SVRMGR> Rem System wide wait events for non-background processes (PMON,
SVRMGR> Rem SMON, etc). Times are in hundreths of seconds. Each one of these
SVRMGR> Rem is a context switch
SVRMGR> Rem which costs CPU time. By looking at the Total Time you can
SVRMGR> Rem often determine what is the bottleneck that processes are
SVRMGR> Rem waiting for. This shows the total time spent waiting for a
SVRMGR> Rem specific event and the average time per wait on that event.
SVRMGR> select n1.event "Event Name",
2> n1.event_count "Count",
3> n1.time_waited "Total Time",
4> round(n1.time_waited/n1.event_count, 2) "Avg Time"
5> from stats$event n1
6> where n1.event_count > 0
7> order by n1.time_waited desc;
Event Name Count Total Time Avg Time
---------------------------- ----- ---------- --------
SQL*Net message from client 140 0 0
SQL*Net message from dblink 17 0 0
SQL*Net message to client 140 0 0
SQL*Net message to dblink 17 0 0
control file sequential read 17 0 0
db file sequential read 163 0 0
file open 18 0 0
log file sync 2 0 0
refresh controlfile command 4 0 0
9 rows selected.
SVRMGR>
Figure 7.11 contains the output for this part. The comments of Section 7.5.4 apply
equally well to this section. If ARCH, DBWR, or LGWR cannot keep up with the opera-
tion activities, increase respectively the values of the initialization parameters,
ARCH_IO_SLAVES, DBWR_ IO_SLAVES, or LGWR_ IO_SLAVES. See Sections
6.6.1, 6.6.2, and 6.6.4 for further details about these three parameters pertaining to the
background processes.
SVRMGR>
SVRMGR> Rem System wide wait events for background processes (PMON, SMON, etc)
SVRMGR> select n1.event "Event Name",
2> n1.event_count "Count",
3> n1.time_waited "Total Time",
4> round(n1.time_waited/n1.event_count, 2) "Avg Time"
5> from stats$bck_event n1
6> where n1.event_count > 0
7> order by n1.time_waited desc;
Event Name Count Total Time Avg Time
---------------------------- ------ ---------- --------
control file parallel write 1882 0 0
control file sequential read 8 0 0
db file parallel write 10 0 0
db file scattered read 11 0 0
db file sequential read 17 0 0
log file parallel write 6 0 0
pmon timer 1896 0 0
rdbms ipc message 5691 0 0
smon timer 19 0 0
9 rows selected.
SVRMGR>
Figure 7.12 contains the output for this part. Oracle defines a latch as a "low level seriali-
zation mechanism to protect shared data structures in the SGA," i.e., the data block buffer
cache, the redo log buffer cache, the library cache, and the dictionary cache. In essence, a
latch is a lock on a part of the SGA to control accesses to data structures in the SGA. A
server or a background process acquires a latch for very short time while manipulating or
looking at one of these structures. The number of latches on the data block buffer cache is
given by the formula:
2 x (number of CPUs), in Oracle 7.0
6 x (number of CPUs), in Oracle 8i
The library cache has only one latch. Latch contention occurs when multiple Oracle
processes concurrently attempt to obtain the same latch. See Section 6.10 for an addi-
tional discussion of the latch.
Three of the columns, gets, misses, and sleeps, in Figure 7.12 are described below:
Gets = Number of times a latch was requested and was available;
Misses = Number of times a latch was requested and was not available initially;
Sleeps = Number of times a latch was requested, was not available, and was
requested again.
The two metrics that are used as guidelines for managing latch contention are the
HIT_RATIO, which should exceed .99, and SLEEPS/MISS, which should be kept low.
SVRMGR>
SVRMGR> set charwidth 18;
Charwidth 18
SVRMGR> set numwidth 11;
Numwidth 11
SVRMGR> Rem Latch statistics. Latch contention will show up as a large value for
SVRMGR> Rem the 'latch free' event in the wait events above.
SVRMGR> Rem Sleeps should be low. The hit_ratio should be high.
SVRMGR> select name latch_name, gets, misses,
2> round((gets-misses)/decode(gets,0,1,gets),3)
3> hit_ratio,
4> sleeps,
5> round(sleeps/decode(misses,0,1,misses),3) "SLEEPS/MISS"
6> from stats$latches
7> where gets != 0
8> order by name;
LATCH_NAME GETS MISSES HIT_RATIO SLEEPS SLEEPS/MISS
------------------ ----- ------ --------- ------ -----------
Active checkpoint 1896 0 1 0 0
Checkpoint queue l 4219 0 1 0 0
Token Manager 9 0 1 0 0
cache buffer handl 1 0 1 0 0
cache buffers chai 2871 0 1 0 0
cache buffers lru 24 0 1 0 0
dml lock allocatio 11 0 1 0 0
enqueue hash chain 657 0 1 0 0
enqueues 1298 0 1 0 0
global transaction 477 0 1 0 0
global tx free lis 6 0 1 0 0
global tx hash map 32 0 1 0 0
ktm global data 19 0 1 0 0
library cache 2654 0 1 0 0
library cache load 70 0 1 0 0
list of block allo 1 0 1 0 0
messages 11397 0 1 0 0
modify parameter v 100 0 1 0 0
multiblock read ob 38 0 1 0 0
ncodef allocation 90 0 1 0 0
process allocation 9 0 1 0 0
redo allocation 1925 0 1 0 0
redo writing 3794 0 1 0 0
row cache objects 2080 0 1 0 0
sequence cache 27 0 1 0 0
Figure 7.13 contains the statistics for this part. As with Section 7.5.6, the metric
NOWAIT_HIT_RATIO should exceed .99. For Figure 7.13 this guideline is met.
Figure 7.14 contains the statistics for this part. Unfortunately, no statistics are shown
since there is no wait involved here. However, we discuss below the guidelines that Ora-
cle offers in handling these statistics when they appear.
The output has three columns, CLASS, COUNT, and TIME, explained below:
CLASS Class of the block; there are 14 classes altogether: bitmap block, bitmap index
block, data block, extent map, free list, save undo block, save undo header,
segment header, sort block, system undo block, system undo header, undo
block, undo header, and unused.
COUNT Number of times a process waited for a block of that class.
TIME Total amount of time that processes waited for blocks of that class.
It was noted in Section 7.5.4 that if (buffer busy waits) / (logical read) > .04, we need
to examine the specific CLASSes of data blocks that are involved in high contention. In
such cases, the output of this section lists CLASSes with positive values under COUNT
and TIME. We are listing below the corrective actions to take for some of the CLASSes
where contention is more frequent:
1. Data Blocks: Occurs when DBWR cannot keep up; increase the value of the
DBWR_IO_SLAVES parameter.
2. Free List: May occur when multiple data loading programs performing multiple IN-
SERTs run at the same time; try to stagger the runtimes to avoid simultaneous exe-
cution; otherwise, increase the value of FREELISTS in the STORAGE clause of the
affected segment(s). See Section 6.10.3 for more detail.
3. Segment Header: Occurs when there is free list contention; proceed as in paragraph
(2) above. See Section 6.10.3 for more detail.
4. Undo Header: Occurs when there are not enough rollback segments so that there is
contention for accessing the rollback segment header block; increase the number of
rollback segments. See Section 5.10 for a discussion of rollback segments.
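Between UTLBSTAT/ESTAT runs, the same CLASS, COUNT, and TIME figures can be sampled directly from the V$WAITSTAT view. The following query is a minimal ad hoc sketch and is not part of the utility scripts:
REM Ad hoc check of buffer busy wait contention by block class;
REM V$WAITSTAT exposes the CLASS, COUNT, and TIME columns described above.
SELECT class, count, time
FROM v$waitstat
WHERE count > 0
ORDER BY count DESC;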
SVRMGR>
SVRMGR> Rem Buffer busy wait statistics. If the value for 'buffer busy wait' in
SVRMGR> Rem the wait event statistics is high, then this table will identify
SVRMGR> Rem which class of blocks is having high contention. If there are high
SVRMGR> Rem 'undo header' waits then add more rollback segments. If there are
SVRMGR> Rem high 'segment header' waits then adding freelists might help. Check
Figure 7.15 contains the statistics for this part. The only guideline that Oracle provides is
the following.
If TRANS_TBL_WAITS is high, add more rollback segments.
Refer to Section 5.10 for much more detailed discussions about how to identify and
resolve problems related to the sizes and numbers of the rollback segments.
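The rollback segment statistics can also be sampled between runs from the V$ROLLSTAT view. The query below is a minimal sketch, not one of the book's scripts; a nonzero WAITS to GETS ratio points to contention:
REM Ad hoc check of rollback segment contention via V$ROLLSTAT.
SELECT usn, gets, waits,
       ROUND (waits / DECODE (gets, 0, 1, gets), 4) wait_ratio
FROM v$rollstat
ORDER BY usn;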
SVRMGR>
SVRMGR> set numwidth 19;
Numwidth 19
SVRMGR> Rem Waits_for_trans_tbl high implies you should add
rollback segments.
SVRMGR> select * from stats$roll;
UNDO_SEGMENT TRANS_TBL_GETS TRANS_TBL_WAITS UNDO_BYTES_WRITTEN SEGMENT_SIZE_BYTES XACTS SHRINKS WRAPS
------------ -------------- --------------- ------------------ ------------------ ----- ------- -----
           0             20               0                  0             407552     0       0     0
           1             21               0                  0            3151872     0       0     0
           2             20               0                  0            3151872     0       0     0
           3             27               0               2330            3151872    -1       0     0
           4             24               0                 54            3143680     0       0     0
           5             26               0               2382            3149824     1       0     0
           6             20               0                  0            3143680     0       0     0
           7             20               0                  0            3149824     0       0     0
           8             20               0                  0            3149824     0       0     0
           9             20               0                  0            3149824     0       0     0
          10             20               0                  0            3143680     0       0     0
          11             20               0                  0           10483712     0       0     0
12 rows selected.
SVRMGR>
Figure 7.16 contains the data for this part. This is a list of those initialization parameters
that have been modified from their respective default values.
NAME VALUE
---------------------------- -------------------------------
audit_file_dest /oracle/EXAMPLE/adump
background_dump_dest /oracle/EXAMPLE/bdump
compatible 8.0.5
control_files /oradata/EXAMPLE/control01.ctl,
/oradata/EXAMPLE/control02.ctl
core_dump_dest /oracle/EXAMPLE/cdump
db_block_buffers 200
db_block_size 2048
db_file_multiblock_read_count 16
db_files 256
db_name EXAMPLE
dml_locks 100
global_names FALSE
log_archive_dest /oracle/EXAMPLE/arch
log_archive_start FALSE
log_buffer 65536
log_checkpoint_interval 5000
log_files 255
max_dump_file_size 10240
max_enabled_roles 35
max_rollback_segments 100
nls_date_format DD-MON-YYYY HH24MISS
open_links 20
open_links_per_instance 20
processes 100
rollback_segments RBS1, RBS2, RBS3, L_RBS_01
sequence_cache_entries 10
sequence_cache_hash_buckets 10
shared_pool_size 10000000
user_dump_dest /oracle/EXAMPLE/udump
29 rows selected.
SVRMGR>
Figure 7.17 contains the statistics for this part. The columns of interest are GET_REQS,
GET_MISS, SCAN_REQ, and SCAN_MIS, explained below:
GET_REQS Number of requests for objects in the dictionary cache;
GET_MISS Number of times the requested object information was not in the cache;
SCAN_REQ Number of scan requests;
SCAN_MIS Number of times a scan failed to find the data in the cache.
The following guidelines are used for interpreting the data dictionary statistics.
Figure 7.18 contains the statistics for this part. The four columns, READS, BLKS_
READ, WRITES, and BLKS_WRT, show the I/O operations for the database. They are
described below:
SVRMGR>
SVRMGR> set charwidth 80;
Charwidth 80
SVRMGR> set numwidth 10;
Numwidth 10
SVRMGR> Rem Sum IO operations over tablespaces.
SVRMGR> select
2> table_space||' '
3> table_space,
4> sum(phys_reads) reads, sum(phys_blks_rd) blks_read,
5> sum(phys_rd_time) read_time, sum(phys_writes) writes,
6> sum(phys_blks_wr) blks_wrt, sum(phys_wrt_tim) write_time,
7> sum(megabytes_size) megabytes
8> from stats$files
9> group by table_space
10> order by table_space;
Figure 7.19 contains the statistics for this part. It includes the data of Figure 7.18 along
with the datafile names of the tablespaces, which is unique to this output. Hence the out-
put shown in Figure 7.19 has been reformatted to display only the first five columns of
the output. It omits the remaining four columns, WRITES, BLKS_WRT, WRITE_TIME,
and MEGABYTES, which are available from Figure 7.18. The I/O load should be evenly
distributed among the disk drives. This has been discussed in detail in Section 6.6.1.
SVRMGR>
SVRMGR> set charwidth 48;
Charwidth 48
SVRMGR> set numwidth 10;
Numwidth 10
SVRMGR> Rem I/O should be spread evenly across drives. A big difference between
SVRMGR> Rem phys_reads and phys_blks_rd implies table scans are going on.
SVRMGR> select table_space, file_name,
2> phys_reads reads, phys_blks_rd blks_read, phys_rd_time
3> read_time, phys_writes writes, phys_blks_wr blks_wrt,
4> phys_wrt_tim write_time, megabytes_size megabytes
5> from stats$files order by table_space, file_name;
TABLE_SP FILE_NAME READS BLKS_READ READ_TIME
-------- ------------------------- ----- --------- ---------
DATA /abc01/EXAMPLE/data01.dbf 1 1 0
IDX /abc07/EXAMPLE/idx01.dbf 1 1 0
RLBK /abc10/EXAMPLE/rlbk01.dbf 9 9 0
SYSTEM /abc06/EXAMPLE/system01.dbf 173 220 0
TEMP /abc09/EXAMPLE/temp01.dbf 0 0 0
5 rows selected.
SVRMGR>
7.5.14 Date/Time
Figure 7.20 contains the output for this part. It shows the start time and the end time of
the period during which the two utility scripts collected the data.
SVRMGR>
SVRMGR> set charwidth 25
Charwidth 25
SVRMGR> Rem The times that bstat and estat were run.
SVRMGR> select to_char(start_time, 'dd-mon-yy hh24:mi:ss') start_time,
2> to_char(end_time, 'dd-mon-yy hh24:mi:ss') end_time
3> from stats$dates;
START_TIME END_TIME
------------------ ------------------
13-sep-99 12:57:48 13-sep-99 14:32:56
1 row selected.
SVRMGR>
Figure 7.21 contains the output of this part. It shows the versions of all the Oracle prod-
ucts that were running at the time of generating the report.
SVRMGR>
SVRMGR> set charwidth 75
Charwidth 75
SVRMGR> Rem Versions
SVRMGR> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle8 Enterprise Edition Release 8.0.5.1.0 - Production
PL/SQL Release 8.0.5.1.0 - Production
CORE Version 4.0.5.0.0 - Production
TNS for Solaris: Version 8.0.5.0.0 - Production
NLSRTL Version 3.3.2.0.0 - Production
5 rows selected.
SVRMGR>
1. Establish the baseline of the database using the scripts in Figures 5.1, 5.3, 5.5, 5.7,
5.9, 5.11, and 5.13.
2. Run the script in Figure 5.15 to identify any user having SYSTEM as its default or
temporary tablespace. Assign TOOLS as the default tablespace and TEMP as the
temporary tablespace for these users.
3. Monitor all changes to the database at the schema level and update the conceptual
level accordingly. The conceptual and the internal levels must always match on
schema, i.e., tables, indices, views, constraints, procedures, functions, packages,
package bodies, sequences, and synonyms.
4. Monitor used space fragmentation in data and index tables with the script in Figure
5.16. Examine the contiguity of extents using the algorithm given in Section 5.6.
5. Monitor free space fragmentation in tablespaces with the scripts in Figures 5.18,
5.19, 5.21, 5.24, and 5.26. Examine the contiguity of extents using the algorithm
given in Section 5.8.
6. Monitor the chaining of tables with the scripts in Figures 5.28 and 5.29.
7. Monitor the contention and sizing of rollback segments with the scripts in Figures
5.33, 5.34, 5.36, and 5.37. Create two rollback tablespaces, if appropriate, as de-
scribed in Section 5.10.7.
1. Establish the baseline of the instance via its init<instance name>.ora file.
2. Monitor the performance of SGA components via the various hit ratios and other
performance metrics discussed in Sections 6.3 through 6.5.
3. Monitor the performance of the background processes by checking the adequacy of
DBWRs, LGWRs, and ARCHs. Also, monitor the efficiency of checkpoints and log
switches. Use the scripts given in Figures 6.3, 6.5, 6.9, 6.11, 6.17, 6.19, 6.22, 6.24,
6.26, 6.29, and 6.32 to help you in executing Steps (2) and (3).
4. Monitor the change of the initialization parameter values and ensure that the changed
values are taking effect. For example, in many cases you need to shut down and then
start up the database after any changes made to the init<instance name>.ora file.
5. Monitor ongoing database performance by using the V$ views.
6. For a stable database, run the UTLBSTAT/ESTAT utility scripts every three to four
weeks. Analyze REPORT.TXT generated by the scripts to identify problem areas.
See Sections 7.5 and its subsections for more details.
1. If the used space in tables or indices shows fragmentation, take appropriate correc-
tive action by resizing their extents. See Section 5.6 for further details. Refer to Ap-
pendix A for sizing algorithms with examples.
2. If the free space in tablespaces shows fragmentation, take appropriate corrective ac-
tion using the measures in Section 5.8.
3. If tables indicate excessive chaining, take corrective action with the measures in
Section 5.9.
4. If the rollback segments show contention, use the program in Figure 5.40 to estimate
their correct size and number.
1. If any of the hit ratios or performance metrics do not meet the target values, take ap-
propriate corrective actions. See Sections 6.3 through 6.5.
2. Through performance tracking assess the impact of Step (1). If the problem persists,
repeat Step (1) until the problem is resolved.
3. Experiment with the sizing of the data block buffer cache and shared SQL pool using
the scripts in Figures 6.7, 6.8, and 6.13.
Key Words
background process
data block buffers
data block header
data dictionary
DBWR
dictionary cache
hit ratio, data block buffer
hit ratio, dictionary cache
hit ratio, library cache
hit ratio, LRU
hit ratio, redo log buffer
initialization parameter
instance
latch
LGWR
library cache
memory cache
redo log buffer
redo log file
REPORT.TXT
rollback segment
SGA
shared SQL pool
UTLBSTAT
UTLESTAT
V$ view
WAITS
WRAPS
References and Further Reading
Corey et al. [2, Chapter 8] offer integrated coverage of performance tuning under
the title "Putting it All Together: A Wholistic Approach," which is helpful for junior
DBAs. Loney [3, Chapter 6] recommends that a separate database be created to monitor
the performance of one or more production databases, because otherwise the scripts run
against the production databases may skew the statistical findings. Burleson [1, Chapter
11] suggests creating an Oracle performance database with Oracle utilities for a compre-
hensive performance tracking and tuning plan. The database will be populated by a modi-
fied UTLESTAT utility that will dump the collected data into this database after
REPORT.TXT is created. The unmodified UTLESTAT utility currently deletes all the
tables that were created to collect statistical data (see the last section, Drop Temporary
Tables, of Figure 7.4).
Exercises
No exercises are provided here since this chapter discusses almost exclusively the two
Oracle utilities, UTLBSTAT and UTLESTAT. Running them on a regular basis and fol-
lowing the checklists of Sections 7.7 and 7.8 for performance tracking and tuning of da-
tabases will be the appropriate exercises.
8
Optimization of the External Level of a Database
Outline
8.1 Contents of the External Level
8.2 Principles of Query Optimization
8.3 Query Optimization in Oracle
8.4 Optimal Indexing Guidelines
8.5 Methodology for Optimal Indexing
8.6 Implementation of Indices in Oracle
8.7 Tools for Tracking Query Performance
Key Words
References and Further Reading
Exercises
Parse Phase
• The optimizer checks the syntax of the query for accuracy.
• It then identifies the database objects that are referenced in the query.
• It refers to the data dictionary to resolve all these references. The referred objects
must exist in the schema accessible by the user entering the query.
• It reports all errors encountered in the above steps.
• The cycle repeats until all the errors are rectified.
Execute Phase
• The optimizer performs the necessary read and write operations to support the parsed
query.
Fetch Phase
• The optimizer retrieves all data, if any, that are returned by the execute phase, sorts
them if necessary, and displays the final information to the user.
The parse phase is usually the most resource intensive and time consuming, because
during this phase the optimizer examines alternative execution plans for the query and
selects the one with the least cost. Consequently, the query optimizer concentrates on the
parse phase to optimize performance. The process of query optimization consists of the
following steps.
dominant when a query accesses remote databases or fetches data from different nodes in
a distributed database.
Of the three phases of query processing mentioned earlier, the parse phase corre-
sponds to Steps (1) and (2), and the execute and fetch phases together correspond to Step
(3).
The basic ingredient of any query processing is table search. The search can be un-
qualified as in
SELECT column(s) FROM table;
or qualified as in
SELECT column(s) FROM table(s) WHERE condition(s);
The search can be implemented as a sequential access or indexed access of the table.
In a sequential access of a table, all its rows are accessed; in the case of an unqualified
search all of them are returned, whereas in a qualified search zero or more rows matching
the selection criteria specified in the WHERE clause are returned. In an indexed access to
a table, a two-step search is made: first, a binary search of the index table to retrieve the
addresses (ROWIDs in Oracle) of the matching rows, and, then, a direct access of the
rows via their retrieved addresses.
An index can be unique or nonunique. For a unique index, each index value is dis-
tinct. For a nonunique index, duplicate values can exist. If an index consists of a single
column, it is called a single index. If an index includes two or more columns, it is called a
concatenated or a composite index.
may ignore the hint. Some of these hints enforce a set operation and some others a row
operation. For example, the hint ALL_ROWS ensures that the output is displayed only
after all the returned rows are ready and is, therefore, a set operation. On the other hand,
the hint FIRST_ROWS ensures that the output starts to be displayed as soon as the first
row is returned and is thus a row operation. For a join, the hint MERGE_JOIN is a set
operation, and NESTED_LOOPS is a row operation.
Optimizer Modes
Starting with Oracle7 two query optimizer modes are available: rule-based and cost-
based. The mode can be set at any one of three levels: instance, session, and statement.
• Instance level: This is set by the value of the initialization parameter
OPTIMIZER_MODE. Its default value is CHOOSE and its other possible values are
RULE, FIRST_ROWS, and ALL_ROWS.
• Session level: This is set by the ALTER SESSION command as follows,
ALTER SESSION SET OPTIMIZER_GOAL = value,
where value is any one of the four settings: CHOOSE, RULE, FIRST_ROWS, and
ALL_ROWS. The setting remains valid only during the session.
• Statement level: This is set by including hints as part of the query. See Sections 9.3.1
and 9.3.2 for a discussion of hints.
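As an illustration of the three levels, the fragment below shows one possible setting at each level; the table name and the chosen values are assumptions, not recommendations:
REM Instance level: a line in init<instance name>.ora
REM   OPTIMIZER_MODE = CHOOSE
REM Session level:
ALTER SESSION SET OPTIMIZER_GOAL = FIRST_ROWS;
REM Statement level: a hint embedded in the query (see Sections 9.3.1 and 9.3.2)
SELECT /*+ RULE */ *
FROM employee
WHERE state = 'MA';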
The query optimizer operates as follows under the four settings mentioned above:
Rule
Rule-based optimization is syntax driven in that it uses the query syntax to evaluate the
alternative access paths that a query can follow. Each path is assigned a score based on
the rank that Oracle assigns to that path for data retrieval. The ranks range from 1 to 15,
as listed in Figure 8.1. Rank 1 is the best and rank 15 the worst.
The lowest-ranked access paths in Figure 8.1 include:
12 Sort-merge join
13 MAX or MIN of indexed column
14 ORDER BY on indexed columns
15 Full table scan
The success of rule-based optimization depends on how well the query has been
tuned.
Choose
This is the default setting. If at least one of the tables used in a query has been ANA-
LYZEd with the COMPUTE STATISTICS or ESTIMATE STATISTICS option, then
CHOOSE enforces the cost-based optimizer for optimizing that query. Also, queries
against partitioned tables always use the cost-based optimizer irrespective of the presence
of statistics. In all other cases, the rule-based optimizer is used. In order to optimize a
query the cost-based optimizer uses statistical data about the tables used in the query. For
example, the optimizer knows how many rows the tables have, how selective the indices
are, how the data values are distributed within each table, etc. All of these statistics are
used in evaluating alternative execution plans and selecting the one with the least cost.
The cost of a full table scan of a table is computed by the formula:
Cost = (value of the BLOCKS column for the table in DBA_TABLES) /
       DB_FILE_MULTIBLOCK_READ_COUNT.
The cost of an indexed search via an index is computed by the formula:
Cost = (selectivity of the index) /
       (clustering factor of the index in DBA_INDEXES).
See Section 8.4.1 for the definition of selectivity and Section 8.6.1 for the definition
of clustering factor.
If the statistics are not up to date, the cost computation will be wrong and, as a result,
the path chosen may not be optimal. Hence it is essential to ANALYZE all the tables in a
database and to reANALYZE a table if it has undergone some significant update activi-
ties. Each time a table is ANALYZEd, the statistics pertaining to itself and all of its index
tables are updated. Also, if some of the tables in a database are ANALYZEd and the rest
are not, the cost-based optimizer may perform full table scans on the latter. This often de-
feats the purpose of query optimization. Therefore, tables should be ANALYZEd regu-
larly.
Oracle recommends that all queries use the cost-based optimizer since it is more reli-
able. Also, Oracle no longer enhances the rule-based optimizer. The role of the rule-based
optimizer will diminish over time. If, for some reason, one wants to use both the cost-
based and the rule-based optimizers, then the DBA should set OPTIMIZER_MODE to its
default value of CHOOSE. A developer can override the cost-based optimizer for a query
by explicitly using the hint RULE in its formulation. But such a hybrid approach should
be taken only if there is a strong justification for it.
FIRST_ROWS, ALL_ROWS
By setting the parameter to FIRST_ROWS or ALL_ROWS one can enforce the cost-
based optimizer. However, FIRST_ROWS is a row operation. Hence it causes the opti-
mizer to choose execution plans that minimize the response time. On the other hand,
ALL_ROWS is a set operation. Hence it causes the optimizer to choose execution plans
that maximize the throughput.
In conclusion, we recommend that the default setting of CHOOSE be used for a data-
base. All the tables must be ANALYZEd to start with and be reANALYZEd whenever
major updates occur. For OLTP applications with plenty of update transactions by the us-
ers it is advisable to ANALYZE the updated tables on a weekly basis. All index tables
associated with a data table are then automatically reANALYZEd.
Step 1: The column(s) targeted for indexing must have high selectivity.
The selectivity of a set of column(s) in a table is defined as the ratio m/n, where
m = number of distinct values of the set, often called its cardinality;
n = total number of rows in the table.
The selectivity of a set of columns is a measure of the usefulness of that set in reduc-
ing the I/O required by queries against the table using that set. The maximum value of
selectivity is 1 when every value of the set is distinct so that m = n. For example, the col-
umn(s) used as the primary key of a table have selectivity 1.
Given two columns C1 and C2 with respective selectivities s1 and s2, say, the selectivity
of C1 Ψ C2 is given by
s1 * s2, if Ψ is the AND operator;
s1 + s2 - s1 * s2, if Ψ is the OR operator.
For example, if s1 = .4 and s2 = .5, the AND combination has selectivity .2 and the OR combination has selectivity .7.
Figure 8.2 contains a script with a partial output that lists all indices in all tables with
their respective selectivities. Before using this script the following command must be run
for each table in the database,
ANALYZE TABLE table_name COMPUTE STATISTICS;
PROMPT
PROMPT
PROMPT Before running this script you must ANALYZE each table in the
PROMPT database with COMPUTE STATISTICS option.
PROMPT
PROMPT List of all indices with their selectivities
PROMPT
SELECT A.TABLE_NAME, A.INDEX_NAME,
ROUND ((A.DISTINCT_KEYS/DECODE(B.NUM_ROWS, 0, 1,
B.NUM_ROWS)), 2) SELECTIVITY
FROM USER_INDEXES A, USER_TABLES B
WHERE A.TABLE_NAME = B.TABLE_NAME
ORDER BY A.TABLE_NAME, A.INDEX_NAME;
PARTIAL OUTPUT:
Before running this script you must ANALYZE each table in the
database with COMPUTE STATISTICS option.
List of all indices with their selectivities
TABLE_NAME INDEX_NAME SELECTIVITY
---------- ---------- -----------
TABLE_1 PK_TABLE_1 1
TABLE_2 IND_TABLE_2 0.44
TABLE_3 PK_TABLE_3 1
TABLE_4 PK_TABLE_4 .98
TABLE_5 PK_TABLE_5 1.09
TABLE_6 UNQ_TABLE_6 .92
TABLE_7 IND_TABLE_7 .88
TABLE_8 UNQ_TABLE_8 .8
Figure 8.3 contains a script with partial output to compute the selectivity of individual
columns of any table designated by the user.
PROMPT
PROMPT
PROMPT Enter the name of the table for which you want the selectivity of
PROMPT its columns
PROMPT
ACCEPT TARGET_TABLE
PROMPT
PROMPT Table name is &TARGET_TABLE
PROMPT
SELECT B.COLUMN_NAME, NUM_DISTINCT, NUM_ROWS,
ROUND ( (NUM_DISTINCT/DECODE(NUM_ROWS, 0, 1, NUM_ROWS)), 2 )
SELECTIVITY
FROM USER_INDEXES A, USER_TAB_COL_STATISTICS B
WHERE B.TABLE_NAME = '&TARGET_TABLE'
AND A.TABLE_NAME = B.TABLE_NAME
ORDER BY NUM_DISTINCT DESC;
PARTIAL OUTPUT:
Enter the name of the table for which you want the selectivity of
its columns
TABLE_X
Table name is TABLE_X
old 4: WHERE B.TABLE_NAME = '&TARGET_TABLE'
new 4: WHERE B.TABLE_NAME = 'TABLE_X'
COLUMN_NAME NUM_DISTINCT NUM_ROWS SELECTIVITY
------------ ------------ -------- -----------
COLUMN_1 2430 2430 1.00
COLUMN_2 2225 2430 .92
COLUMN_3 1623 2430 .67
COLUMN_4 1053 2430 .43
COLUMN_5 1031 2430 .42
COLUMN_6 867 2430 .36
COLUMN_7 708 2430 .29
COLUMN_8 327 2430 .13
COLUMN_9 252 2430 .1
COLUMN_10 59 2430 .02
COLUMN_11 27 2430 .01
COLUMN_12 12 2430 0
COLUMN_13 7 2430 0
COLUMN_14 5 2430 0
The script in Figure 8.3 computes the selectivity of individual columns only. If we
need the selectivity of concatenated columns in a table, we need to query that table with
the specific concatenation of columns. For example, suppose that a three-column con-
catenated index is created on the three columns (COLUMN_X, COLUMN_Y,
COLUMN_Z). Then the selectivity of the concatenated columns is determined as fol-
lows.
COLUMN A HEADING 'Count of |Concatenated Columns'
COLUMN B HEADING 'Selectivity of |Concatenated Columns'
SELECT COUNT(DISTINCT COLUMN_X||'%'||COLUMN_Y||'%'||COLUMN_Z) A
FROM TABLE_NEEDED;
SELECT ROUND (COUNT(DISTINCT
COLUMN_X||'%'||COLUMN_Y||'%'||COLUMN_Z)
/ DECODE (COUNT (1), 0, 1, COUNT (1)), 2)
B FROM TABLE_NEEDED;
The output from the above code appears below.
Count of
Concatenated Columns
--------------------
257
Selectivity of
Concatenated Columns
--------------------
.86
Step 2: The leading column of a concatenated index must have high selectivity.
The first column of a concatenated index is called its leading column. This column
should be used in the predicates of the WHERE clause of a query. To make the indexed
search efficient, it is necessary that the leading column have high selectivity. The second
most used and most selective column should be the second column in the concatenated
index, and so on. If such columns are not available, the application design should be re-
visited. The selectivity of the leading column of a concatenated index plays a major role
in the selectivity of the entire index.
The decision in either case goes in favor of a multicolumn concatenated index. In reality, how-
ever, this desired option can offer substantial improvement over the other option, espe-
cially in the case of AND-EQUAL operation (see Section 8.7.1 under the paragraph
“OPERATION with OPTIONS”). If necessary, in addition to creating a multicolumn
concatenated index, one or more separate single-column indices may be created on some
of the component columns of the concatenated index.
It is possible to combine several single columns each with low selectivity into one
concatenated column set with high selectivity. In that case, creating a concatenated index
on the column set improves performance. As an example, consider a report that lists em-
ployees by gender belonging to each department of a company. The column “Sex” has
only two values, M and F, say. The column “Department” also has a low selectivity.
However, the pair (Department, Sex) has a relatively high selectivity. We put "Depart-
ment" as the leading column in the index (Department, Sex) since its cardinality is higher
than that of "Sex" whenever a company has three or more departments.
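A minimal sketch of such an index, using an assumed table name together with the column names from the example above:
REM Concatenated index with the higher-cardinality column leading.
CREATE INDEX emp_dept_sex ON employee (department, sex);
A query with predicates such as WHERE department = 'SALES' AND sex = 'F' can then use this index rather than a full table scan.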
Step 1: Identify the columns and tables that may need to be indexed.
As described in Section 8.1, each external level of a database involves a query using one
or more column(s) in one or more table(s). Make a complete list of these items in a for-
mat as follows:
Table created.
SQL> alter table X add constraint X_PK primary key (a, b, c);
Table altered.
SQL> desc X
Name Null? Type
------------------- -------------- ---------------------
A NOT NULL NUMBER
B NOT NULL CHAR(4)
C NOT NULL VARCHAR2(10)
D NUMBER(8,2)
E DATE
NOTE: The PK-columns are made NOT NULL.
SQL> create table Y
2 (aa number,
3 bb char (4),
4 cc varchar2 (10),
5 f date,
6 g number (7,3),
7 h varchar2 (22));
Table created.
SQL> alter table Y add constraint Y_FK
2 FOREIGN KEY (aa, bb, cc) references X (a, b, c);
Table altered.
SQL> insert into X (a, b, c, d)
2 values (1, 'JOHN', 'MA', 68.23);
1 row created.
SQL> insert into Y (aa, bb, cc, g)
2 values (NULL, NULL, 'MA', 77.231);
1 row created.
SQL> insert into Y (aa, bb, cc, g)
2 values (NULL, NULL, NULL, 65);
1 row created.
NOTE: Y(NULL, NULL, 'MA') and Y(NULL, NULL, NULL) both match X(1, 'JOHN',
'MA').
UK-FK Relationship:
SQL> edit
Wrote file afiedt.buf
1 create table P
2 (p number,
3 q char (4),
4 r varchar2 (10),
5 s number (8,2),
6* t date)
SQL> /
Table created.
SQL> alter table P add constraint P_UK unique (p, q, r);
Table altered.
SQL> desc P
Name Null? Type
-------------------- ---------- ---------
P NUMBER
Q CHAR(4)
R VARCHAR2(10)
S NUMBER(8,2)
T DATE
Table altered.
SQL> insert into P (p, q, r, s)
2 values (NULL, NULL, 'MA', 68.23);
1 row created.
SQL> insert into Q (pp, qq, rr, v)
2 values (1, 'JOHN', 'MA', 68.23);
1 row created.
SQL> insert into Q (pp, qq, rr, v)
2 values (1, NULL, 'MA', 68.23);
1 row created.
NOTE: Q(1, 'JOHN', 'MA') and Q(1, NULL, 'MA') both match P (NULL, NULL, 'MA').
We have discussed above the case of a unique index where a ROWID exists for every
data value. For a nonunique index, the ROWID is included in the key in sorted order so
that such indices are sorted by the index key value and the ROWID. Key values contain-
ing all NULLs are not indexed except for cluster indices (see Section 8.6.5). Two rows
can both contain all NULLs and still not violate a unique index constraint. For a concate-
nated index, the key value consists of the concatenation of individual component values
of the index.
(Figure: a B*-tree after a leaf node split, with branch entries <6500 and 6500 above leaf values 1000, 1700, 2000, 5000, 6000, 6400, 6500, 6900, 7000, 7500, 8000, 9000, and 9500.)
above the split leaf node, to point to the new leaf node. This process is repeated as new
rows are inserted into the indexed table until the rightmost branch node at level 1 be-
comes full and has to split in two. This pattern is repeated until the root node at level 0
can no longer contain the pointers to the nodes at level 1. Since Oracle does not split the
root node, it adds a new level of branch nodes immediately above the level containing the
leaf nodes. Then we get a four-level B*-tree where the leaf nodes now appear as level 3
and the branch levels above it are labeled as levels 2 and 1 pointing upward. The deeper
an index is, the less efficient it becomes. Oracle recommends that a B*-tree should not
extend to more than four levels. The performance of a B*-tree index degrades when the
tree contains more than four levels. In such cases, a rebuilding of the index is needed. See
Section 9.2.4 for more details about rebuilding indices.
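As a hedged sketch (the index name is assumed), the depth of an index can be checked from DBA_INDEXES and, if it exceeds the recommended limit, the index can be rebuilt:
REM Check the depth of an index; BLEVEL is the number of branch levels.
SELECT index_name, blevel
FROM dba_indexes
WHERE index_name = 'PK_ORDER';
REM Rebuild the index if it has become too deep.
ALTER INDEX pk_order REBUILD;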
Figure 8.5 shows an example of the statistics for a B*-tree index and Figure 8.6 con-
tains a sample output from the script.
SPOOL My_Directory\INDEX_STRUCTURE.lst
SELECT TABLE_NAME, INDEX_NAME, BLEVEL, DISTINCT_KEYS, LEAF_BLOCKS,
AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, CLUSTERING_FACTOR,
TO_CHAR (SysDate, 'fmMonth ddth, YYYY') TODAY
FROM DBA_INDEXES
WHERE TABLE_NAME NOT LIKE '%$' AND TABLE_NAME NOT LIKE '%#'
AND TABLE_NAME NOT LIKE '%#%#' AND TABLE_NAME NOT LIKE '%$%'
ORDER BY TABLE_NAME, INDEX_NAME;
SPOOL OFF
The script contains column names from DBA_INDEXES, some of which are ex-
plained below.
BLEVEL B*-Tree level; depth of the index from its root
block to its leaf blocks. A depth of zero indicates
that the root node and the leaf node are the same.
Even for very large indices the BLEVEL should not
exceed three. Each BLEVEL represents an addi-
tional I/O that must be performed against the B*-
tree.
LEAF_BLOCKS Number of leaf blocks in the tree.
DISTINCT_KEYS Number of distinct keys in the index, also called the
cardinality of the index; if this value is less than 15
or so, a bitmap index may perform better.
AVG_LEAF_BLOCKS_PER_KEY Average number of leaf blocks per key; this is al-
ways 1 except for nonunique indices.
AVG_DATA_BLOCKS_PER_KEY Average number of data blocks per key; this meas-
ures the size of the index; its value is high for indi-
ces with low cardinality (e.g., gender, status, etc.)
and for very large indices.
CLUSTERING_FACTOR It represents the total count of the indexed table
blocks that have to be fetched if all the I/O is done
via the leaf block records. If the clustering factor is
close to the number of blocks in the table, then the
index is considered to be well clustered. This yields
a low cost for the query. If the clustering factor is
close to the rowcount of the indexed table, then for
every leaf record a different block of the table has to
be fetched. This yields a high cost for the query.
Looking at Figure 8.6 we find that the depth of the tree is mostly 0 or 1, which is
good. Since all the indices are PKs, their distinct values equal the rowcounts of the un-
derlying indexed tables. Hence the clustering factors of TABLE1 and TABLE9 are very
close to their respective rowcounts. So, queries involving PK_TABLE1 and
PK_TABLE9 will have high costs.
The mathematical theory of B*-trees and the general tree traversal algorithm are dis-
cussed in Section E14, Appendix E.
them and then index the columns appropriately. Bitmap indices are especially suitable for
handling such situations where columns have low selectivity. Oracle was the first to ap-
ply this principle to a relational DBMS.
A bitmap index for a column is useful under the following circumstances.
(a) The column has low selectivity.
(b) The table is very infrequently updated.
(c) The table is very large, say over 500,000 rows.
(d) The query involves one or more columns connected by AND or OR.
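A minimal sketch of creating such an index, using the CLAIM_HISTORY example discussed below (the index name is an assumption):
REM Bitmap index on a low-selectivity column of a large, rarely updated table.
CREATE BITMAP INDEX claim_region_bmx ON claim_history (region);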
Internally a bitmap index maps the distinct values of the bitmap indexed column to
each record. One row is stored in the index table for each distinct value of the column.
The number of bits (0 or 1) in each row equals the number of rows in the indexed table.
When a new row is inserted in the table, an extra bit is added to each row in the index ta-
ble. The value of this bit depends on the value of the indexed column in the new row.
When an existing row in the table is updated or deleted, the bit in its corresponding posi-
tion in the bitmap index table is changed accordingly. During this transaction the entire
bitmap segment is locked, making it a costly operation. Hence bitmap indices are rec-
ommended only for tables that are not frequently updated. The following example shows
how a query on a bitmap indexed table is performed.
The bitmap index tables for the four columns have the following values.
Suppose now that the following row is inserted into the table: (1301, 50, Individual,
South, Active). Then an extra bit is added to each row of each bitmap index table as fol-
lows.
Suppose that we run the following query against the table CLAIM_HISTORY:
As shown below, the query is processed by first selecting the rows that have “1” for
each condition separately and then returning those that have “1” for all four conditions
together. The query optimizer quickly compares the bitmap values for each qualifying
condition in the query and returns the final result.
Therefore, rows 1 and 6 are returned by the query. Thus, the output appears as fol-
lows.
As shown above, the bitmap indices perform very fast Boolean operations on columns
with low selectivity. Complex AND and OR logic are performed within the index table
without accessing the underlying data table. The latter is accessed only to retrieve the
rows returned by the complete set of qualifying conditions. Without a bitmap index many
such queries would be processed via full table scans. The higher the number of columns
used in the qualifying conditions, the more valuable the bitmap indices are. Also, because
of the high number of repeating “1”s and “0”s in the index table, bitmap indices can be
compressed very effectively for storage and then decompressed at execution time. The
lower the selectivity, the better the compression. Thus, in the above example, the com-
pression of Client_Type and Claim_Status, each with only two distinct values, is better
than that for Region with four distinct values and Policy_Type with five distinct values.
In uncompressed form, the index files for Client_Type and Claim_Status are each half
the size of the index file for Region.
Given that bitmap indices are suitable for columns with low selectivity, use the script
in Figure 8.3 and the SELECT statements appearing below Figure 8.3 to identify the po-
tential candidates for bitmap indexing. A column with selectivity below .5 may benefit
from bitmap indexing. The ideal method is to create both B*-tree and bitmap indices on
such a column and then test the runtime for queries on that column under both situations.
The index that runs faster should be selected.
Update transactions treat B*-tree and bitmap indices differently. Suppose that n in-
dexed rows in a table T are being updated. If T is B*-tree indexed, then the tree itself is
restructured after each row is updated so as to keep it balanced. As a result, the B*-tree is
modified n times, thereby slowing down the updates. But if T is bitmap indexed, then the
index table is updated after all the n rows have been updated. During the update of each
row of T the updated bitmap column and the ROWID of each updated row are stored
temporarily in the sort buffer in memory. Hence the initialization parameter
SORT_AREA_SIZE must be sized properly for good performance with INSERTs and
UPDATEs on bitmap indices. In addition, two other initialization parameters must be
properly set to yield good performance on bitmap indices.
(a) CREATE_BITMAP_AREA_SIZE: This parameter specifies the amount of memory
allocated for bitmap indices. Its default value is 8 MB, although a larger value leads
to faster index creation. If the selectivity of the indexed column is very low, say .2 or
.3, the parameter may be set at KB level. As a general rule, the higher the selectivity,
the larger the value of the parameter to yield good performance.
(b) BITMAP_MERGE_AREA_SIZE: This parameter specifies the amount of memory
needed to merge bitmaps retrieved from a range scan of the index. Its default value is
1 MB. A larger value improves performance since the bitmap segments must be
sorted before being merged into a single bitmap.
Neither of these two parameters can be altered at the session level.
A reverse key index reverses the bytes of each column key value when storing them in
the index leaf nodes, but maintains the column order for a concatenated index. For ex-
ample, suppose that an Order_ID has been generated by Oracle’s sequence mechanism
and contains the following five values: 10001, 10002, 10003, 10004, and 10005. If an
index is created on Order_ID as a B*-tree index, the five sequential values are stored as
adjacent leaf nodes on the B*-tree. As the index continues to grow, the BLEVEL starts
to degrade due to repetitive splitting of the branch nodes. But if a reverse key index is
created on Order_ID, the index values are stored as 10001, 20001, 30001, 40001, and
50001. Thus the values spread the leaf nodes more evenly and reduce the repetitive splits
of the branch nodes.
In general, a reverse key index is beneficial when a sequential key value is used to
populate the index. Figure 8.8 shows the commands for creating or altering a reverse key
index. The key word “reverse” tells Oracle to create the index as a reverse key. To alter
an existing index, the “rebuild” option is necessary. After altering the index it is advis-
able to rename the altered index so as to contain the word “revkey” as a part of its name.
This helps to identify the reverse key indices by name when retrieved from the
DBA_INDEXES view.
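The general Oracle syntax is sketched below with assumed object names; this is an illustration, not a reproduction of Figure 8.8:
REM Create a new index as a reverse key index.
CREATE INDEX order_id_revkey ON orders (order_id) REVERSE;
REM Convert an existing B*-tree index; the rebuild option is required.
ALTER INDEX order_id_idx REBUILD REVERSE;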
An index organized table appears the same as a regular table with a B*-tree index on the
primary key (PK). But instead of maintaining a data table and a B*-tree index table sepa-
rately, Oracle maintains one single B*-tree index structure that contains both the PK val-
ues and the other column values for each row. Since a separate data table is not stored, an
index organized table uses less storage than a regular table. Also, it provides faster PK-
based access for queries involving an exact match or a range scan of the PK. The rows in
the table do not have ROWIDs, but instead are stored as ordered by the PK. No secon-
dary indices can be created on the table.
The storage space for an index organized table is divided into two portions: index and
overflow. The index portion is stored in an index tablespace and the overflow portion in a
separate data tablespace. The splitting of rows in the table between the two areas is gov-
erned by two clauses in the CREATE TABLE command for the table:
• PCTTHRESHOLD p, where p is an integer and 0 ≤ p ≤ 50; and
• INCLUDING column_name, where column_name is a column in the table.
For a given value of p, PCTTHRESHOLD p returns a value that equals p% of the
data block size for the database. Whenever the size of a new or an updated row exceeds
this value, the row is split into two parts for storage. The column in the INCLUDING
clause and all the columns that are listed before that column in the CREATE TABLE
command for the table are stored in the index tablespace. The remaining column(s), if
any, are stored in a different tablespace called the overflow area.
Figure 8.9 shows the command for creating an index organized table and a sample
output from running the command. Note that although the storage structure for the table
is a B*-tree index and it is stored in an index tablespace, it is created via the CREATE
TABLE command rather than the CREATE INDEX command.
The PCTTHRESHOLD for the table is 25. Thus, if the data block size is 8 K, any row
larger than 2,048 bytes (= 2 K) is split into two parts for storage. The first three columns,
ORDER_ID, LINE_NUM, and QTY_SOLD, of the row are stored in INDEX_2, and
the remaining two columns, ITEM_NAME and SHIP_DATE, are stored in DATA_2.
The clause ORGANIZATION INDEX signifies that the rows of the table are stored in the
order of the PK. The default clause is ORGANIZATION HEAP, which indicates that the
rows are stored in the order of their physical entry into the table.
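The sketch below illustrates the command just described; the table name is assumed, while the column names, tablespaces, and PCTTHRESHOLD value follow the description in the text (this is not a reproduction of Figure 8.9):
CREATE TABLE order_line_iot
  (order_id NUMBER,
   line_num NUMBER,
   qty_sold NUMBER,
   item_name VARCHAR2 (30),
   ship_date DATE,
   CONSTRAINT pk_order_line_iot PRIMARY KEY (order_id, line_num))
  ORGANIZATION INDEX TABLESPACE index_2
  PCTTHRESHOLD 25
  INCLUDING qty_sold
  OVERFLOW TABLESPACE data_2;
Rows that fit within 25% of the data block size are stored entirely in INDEX_2; for larger rows, the columns after QTY_SOLD overflow into DATA_2.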
8.6.5 Cluster
A cluster is a group of two or more tables that are physically stored together in the same
disk area. Tables benefit from clustering when the following three conditions hold.
(a) The tables have common column(s).
(b) They are related by a PK–FK relationship.
(c) The tables are frequently joined via the (PK, FK) columns.
The physical proximity of the data in a cluster improves the performance of join que-
ries involving the clustered tables. The common column(s) of the tables stored in a clus-
ter constitute the cluster key, which refers to the rows in every table. The cluster key can
be the PK or a part of the PK of one or more of the tables. As an example, consider two
tables, ORDER and LINE_ITEM, with the following structures.
If ORDER and LINE_ITEM are frequently joined via the column OrderID, they are
good candidates for being stored as clustered tables with OrderID as the cluster key. Note
that the cluster key is the PK of ORDER and a part of the PK of LINE_ITEM. Figure
8.10 shows the storage arrangement of ORDER and LINE_ITEM as unclustered tables
and Figure 8.11 shows the same when the tables are stored in a cluster.
ORDER: LINE_ITEM:
OrderID Other Columns OrderID Line_Num Other Columns
2153 ......... 6195 001 .........
1694 ......... 6195 002 .........
1188 ......... 1694 001 .........
6195 ......... 3244 001 .........
3244 ......... 3244 002 .........
3244 003 .........
1188 001 .........
2153 001 .........
2153 002 .........
2153 003 .........
2153 004 .........
Cluster Key
(OrderID)
1188 LINE_ITEM Other Columns
001 .........
1694 LINE_ITEM Other Columns
001 .........
2153 LINE_ITEM Other Columns
001 .........
002 .........
003 .........
004 .........
3244 LINE_ITEM Other Columns
001 .........
002 .........
003 .........
6195 LINE_ITEM Other Columns
001 .........
002 .........
Each distinct value of the cluster key is stored only once in a clustered table configu-
ration regardless of whether it occurs once or many times in the tables, thereby resulting
in less storage space. However, clustered tables require more storage space than unclus-
tered tables. The sizing algorithm for clustered tables and indices is much more complex
than that for unclustered tables and indices. See Loney [6, pp. 193–197] for the former
and Appendix A for the latter. When retrieving data from the clustered tables, each table
appears as if it contains all its rows including duplication, if any. When two clustered ta-
bles are joined via their cluster key, the response time is generally reduced since the re-
turned data are stored mostly in physically contiguous data blocks. However, full table
scans are generally slower for clustered tables than for unclustered tables. Clustering is
not recommended in the following situations.
• At least one of the clustered tables is frequently used for full table scans. Such a table
uses more storage than if it were stored as an unclustered table. As a result, Oracle
has to read more data blocks to retrieve the data than if the table were unclustered.
This increases the time for full table scans.
• The data from all the tables with the same cluster key value uses more than two data
blocks. Here again Oracle has to read more data blocks to retrieve the data than if the
table were unclustered.
• Partitioning is not compatible with clustering.
Creation of Clusters
A cluster of two or more tables is created via the CREATE CLUSTER command. Each
table in a cluster is created via the CREATE TABLE command with the CLUSTER
clause. The maximum length of all the cluster columns combined for a single CREATE
CLUSTER command is 239 characters. At least one cluster column must be NOT NULL.
Tables with LONG columns cannot be clustered. Figure 8.12 shows the commands to
create the cluster ORDER_PROCESS with two tables, ORDER and LINE_ITEM, in that
cluster.
Table created.
SQL> CREATE TABLE LINE_ITEM
2 (OrderID NUMBER (10),
3 LINE_NUM NUMBER (3),
4 Item_Price NUMBER (8,2),
5 PRIMARY KEY (OrderID, LINE_NUM))
6 CLUSTER ORDER_PROCESS (OrderID);
Table created.
The clause SIZE used in Figure 8.12 represents the average size of a row in the clus-
ter. Thus, the size of a row in the cluster ORDER_PROCESS is estimated as 500 bytes.
An index cluster can be dropped via the DROP CLUSTER command. When the
clause INCLUDING TABLES accompanies this command, all tables and indices in-
cluded in the cluster are also dropped. Figure 8.14 shows this feature.
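As a hedged sketch (not a reproduction of Figures 8.12 and 8.14), the cluster, its mandatory cluster index, and the drop command look as follows; the index name is assumed and the SIZE value of 500 follows the text:
CREATE CLUSTER ORDER_PROCESS (OrderID NUMBER (10)) SIZE 500;
REM An index cluster requires a cluster index before rows can be inserted.
CREATE INDEX order_process_idx ON CLUSTER ORDER_PROCESS;
REM Drop the cluster together with its tables and their indices.
DROP CLUSTER ORDER_PROCESS INCLUDING TABLES;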
A hash cluster uses a hash function to calculate the location of a row by using the
value of the cluster key for that row. When a row is inserted into the cluster, the hash
function converts the value of the hash key for that row into the address of the row. The
hash function can be user defined or system generated. If it is user defined, the hash
cluster is created with a special clause HASH IS, which designates one column of the
cluster for storing the hash value. This clause is optional. It should be used only when the
cluster key is a single numeric column and its value is uniformly distributed. If it is
omitted, Oracle uses a system generated hash function. Another specific clause, which is
mandatory for creating a hash cluster, is HASHKEYS. It represents the maximum num-
ber of cluster key entries allowed for the cluster. Its value should be a prime number.
Otherwise, Oracle rounds it up to the nearest prime number. The reason for making
HASHKEYS a prime lies in the hashing algorithm used for computing the address of a
row based on its cluster key. See Section E6, Appendix E, for the mathematical theory
underlying hashing. Figure 8.15 shows the creation of a hash cluster in two flavors: user
supplied hash function and system generated hash function.
In Figure 8.15, the cluster ORDER_PROCESS is created first without the HASH IS
clause so that Oracle uses a system generated hash function. Then the cluster is dropped
and recreated with the HASH IS clause so that Oracle now uses the data in the column
OrderID, which is numeric, to compute the address of the corresponding row.
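A hedged sketch of the two flavors (not a reproduction of Figure 8.15); the SIZE and HASHKEYS values are assumptions, with HASHKEYS chosen as a prime:
REM System generated hash function.
CREATE CLUSTER ORDER_PROCESS (OrderID NUMBER (10))
  SIZE 500 HASHKEYS 10007;
REM After dropping the cluster, recreate it with a user defined hash function
REM based on the numeric cluster key.
CREATE CLUSTER ORDER_PROCESS (OrderID NUMBER (10))
  SIZE 500 HASHKEYS 10007 HASH IS OrderID;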
As with other indexing options discussed so far, a cluster is beneficial only under certain
circumstances. Three alternative options are available in this respect: indexed unclustered
table, index cluster, and hash cluster. One needs to assess these two factors to make an in-
formed decision:
• Type of the tables: static (validation) versus dynamic (transaction); and
• Nature of qualifying conditions in a query: exact match, range scan, other.
Figure 8.16 shows a decision tree describing the selection process that considers the
above two factors. Since a dynamic table undergoes frequent updates, clustering is not
recommended there. For static tables, the choice is clear for exact match and range scan.
The category of “Other” is vague, as always. The developer has to exercise his or her
judgment to make a decision.
(Figure 8.16 is a decision tree: a dynamic table leads to an indexed unclustered table, while a static table branches on the qualifying condition into exact match, range scan, and other.)
8.6.6 Histogram
Histograms are statistical entities that are used for describing the distribution of data. We
can have a histogram of the data in any column of a table, not just an indexed column.
However, Oracle uses them to determine the distribution of the data of an indexed col-
umn, especially when the data are not uniformly distributed. The cost-based optimizer
can accurately estimate the execution cost of an indexed query if the data in the indexed
column are uniformly distributed. But if the data are skewed, the query optimizer benefits
from using a histogram for estimating the selectivity of an indexed column. If the selec-
tivity is low due to the skewed distribution of data, the query optimizer may decide to use
a full table scan. As an example, consider a table EMPLOYEE containing personnel data
of a large Boston-based company. Although some of the employees may live in New
Hampshire, Maine, or Rhode Island, a large majority would reside in Massachusetts.
Consequently, the data of the column EMPLOYEE.State will be severely skewed in favor
of State = MA. A histogram of the data in State tells the query optimizer that an indexed
search by State is undesirable.
A histogram consists of a set of buckets. Each bucket contains information from the
same number of rows and has a starting point and an ending point. These two points dis-
play respectively the starting and ending values of the indexed column for the given
number of rows. The data dictionary views, DBA_HISTOGRAMS and DBA_TAB_
COLUMNS, contain information about histograms. Figure 8.17 contains the code to gen-
erate histograms and a partial list of the accompanying statistics. Note that only the col-
umns ENDPOINT_NUMBER and ENDPOINT_VALUE are stored in the view DBA_
HISTOGRAMS, because the starting value of bucket N, say, immediately follows the
ending value of bucket N – 1. The parameter SIZE determines the maximum number of
buckets allocated to the column.
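A hedged sketch of generating and inspecting such a histogram (the table name is assumed; the column and bucket count follow the discussion of Figure 8.17 below):
REM Build a histogram with at most 100 buckets on CUST_ID.
ANALYZE TABLE customer COMPUTE STATISTICS FOR COLUMNS cust_id SIZE 100;
REM Inspect the bucket endpoints.
SELECT column_name, endpoint_number, endpoint_value
FROM dba_histograms
WHERE table_name = 'CUSTOMER'
AND column_name = 'CUST_ID'
ORDER BY endpoint_number;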
Figure 8.17 shows that CUST_ID uses only 74 buckets out of the 100 allocated and that
2,546 is the largest value of CUST_ID. On average, each bucket covers about 34.4 (=
2,546/74, rounded) values of CUST_ID.
8.6.7 In Conclusion
As discussed in Sections 8.6.1 through 8.6.6, Oracle offers six different options to create
indices and get the benefit of indexed search. In conclusion, we ask: which index do we
use and when? The answer is an obvious cliche: it depends. In this section we list the
features of each type of index so that the developers and the DBAs can decide which
types of indices to use and how to create them in an optimal fashion.
The qualifying condition of a query involves a search based on an exact match or a
range scan. The following list summarizes the search feature of each of the six types of
indices.
Index Type Features
B*-tree Exact match, range scan for columns with high selectivity
Bitmap Exact match, range scan for columns with low selectivity
Reverse Key Sequential key value used to populate the index
Index Organized Exact match, range scan using PK-based access
Cluster Tables related via PK–FK link and frequently used together in joins
Histogram Exact match, range scan for nonuniformly distributed data
In general, B*-tree indices are used most widely followed by bitmap indices. Clusters
help join queries, but have their drawbacks too. One major problem with clusters is to
size them properly. Clusters, especially hash clusters, often waste storage. Also, we need
to keep in mind that creating a bad index is almost as harmful for performance as not cre-
ating an index at all. When multiple columns are candidates for an index, we must weigh
the two possible alternatives: one concatenated index on all the columns versus separate
single column indices on each separate column. Use the selectivity of the columns as the
guiding factor in both cases. Section 8.4 contains further details on optimal indexing.
8.7.1 EXPLAIN PLAN
This utility allows the developer to examine the access path selected by the query optimizer to
execute a query. Its real benefit lies in the fact that the execution plan is displayed without
actually executing the query. The developer can view the plan, examine a set of alternative
plans, and then decide on an optimal plan to execute. The utility works as follows.
(a) Run the script file UTLXPLAN.sql to create a table named PLAN_TABLE in your
account. This script and a whole set of additional UTL*.sql scripts are usually found
in the Oracle home directory listed below:
$ORACLE_HOME/rdbms/admin (for UNIX)
orant\rdbms80\admin (for NT)
Figure 8.18 shows the structure of PLAN_TABLE so that you can create it via the
CREATE TABLE command, if necessary:
STATEMENT_ID VARCHAR2(30)
TIMESTAMP DATE
REMARKS VARCHAR2(80)
OPERATION VARCHAR2(30)
OPTIONS VARCHAR2(30)
OBJECT_NODE VARCHAR2(128)
OBJECT_OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(30)
OBJECT_INSTANCE NUMBER(38)
OBJECT_TYPE VARCHAR2(30)
OPTIMIZER VARCHAR2(255)
SEARCH_COLUMNS NUMBER
ID NUMBER(38)
PARENT_ID NUMBER(38)
POSITION NUMBER(38)
COST NUMBER(38)
CARDINALITY NUMBER(38)
BYTES NUMBER(38)
OTHER_TAG VARCHAR2(255)
PARTITION_START VARCHAR2(255)
PARTITION_STOP VARCHAR2(255)
PARTITION_ID NUMBER(38)
OTHER LONG
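Once PLAN_TABLE exists, EXPLAIN PLAN loads the execution plan of a statement into it without running the statement. The sketch below uses an assumed statement id and a query modeled on the CUSTOMER/ACCOUNT example discussed later in this section; the join column is an assumption:
EXPLAIN PLAN
  SET STATEMENT_ID = 'QUERY1'
  FOR
  SELECT C.Cust_name, A.Acct_Name
  FROM CUSTOMER C, ACCOUNT A
  WHERE C.Cust_ID = A.Cust_ID
  ORDER BY C.Cust_name, A.Acct_Name;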
The query execution plan is determined by querying PLAN_TABLE using the col-
umns Statement_ID, Operation, Options, Object_Name, ID, Parent_ID, and Position. Fig-
ures 8.20 and 8.21 contain two SELECT statements that put the execution plan of a query
in a tabular format and a nested format respectively. The tabular format displays each op-
eration and each option separately, whereas the nested format improves readability of the
plan, labels each step of the plan with a number, and assigns a cost for executing the
query under the cost-based optimizer.
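A minimal sketch of a nested-format query against PLAN_TABLE (not a reproduction of Figure 8.21); the statement id matches the EXPLAIN PLAN sketch above:
SELECT LPAD (' ', 2 * (LEVEL - 1)) || operation || ' ' || options || ' '
       || object_name
       || DECODE (id, 0, ' Cost = ' || position) query_plan
FROM plan_table
WHERE statement_id = 'QUERY1'
START WITH id = 0 AND statement_id = 'QUERY1'
CONNECT BY PRIOR id = parent_id AND statement_id = 'QUERY1';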
Figures 8.20 and 8.21 show that the cost of the query is 88. The cost is not absolute. It
is meaningful only with respect to a given query. Thus, two queries with the same cost C,
say, do not necessarily perform equally well. But if a given query has n execution plans
with respective costs Ci , i = 1, 2, . . . , n, then the plan with the cost = min (Ci , i = 1, 2, . .
. , n) performs the best. Both figures display the types of operations the database must
perform to resolve the query irrespective of whether the query has been run already.
Thus, the output helps the developer to experiment with alternative versions of a query to
decide on the best access path that the developer wants.
The execution plan in the nested format is read from innermost to outermost steps and
from top to bottom. We start at the highest step number and move towards the lowest.
Two steps labeled as m.n1 and m.n2, say, are read in the order m.n1 followed by m.n2 if n1 <
n2. Let us now interpret the execution plan in Figure 8.21.
The query optimizer executes the plan steps in the following order.
4.1 TABLE ACCESS FULL CUSTOMER
4.2 TABLE ACCESS FULL ACCOUNT
Each operation involves a full table scan and returns rows from a table when the
ROWID is not available for a row search. Since the query does not contain any qualifying
conditions on the columns of either table, the query optimizer performs a full table scan
of CUSTOMER followed by the same of ACCOUNT until all rows have been read. It
then returns them to Step 3.1 for further processing.
3.1 HASH JOIN
This operation joins two tables by building an in-memory hash table on one of the tables
and then using a hash function to locate the matching rows of the other table.
table is significantly smaller than the other table and fits into the available memory area,
a hash join is performed instead of the traditional NESTED LOOPS join. Even if an in-
dex is available for the join, a hash join may be preferable to a NESTED LOOPS join.
Since CUSTOMER is significantly smaller than ACCOUNT, it is read into memory.
Then a hash function is used to retrieve all the matching rows of ACCOUNT. The result
is then returned to Step 2.1.
2.1 SORT ORDER BY
This operation is used for sorting results without eliminating duplicate rows. The rows
returned by Step 3.1 are sorted first by Cust_name and then by Acct_Name, and the
sorted rows are returned to the user as the final result.
SELECT STATEMENT
This is the final statement of the execution plan. It merely indicates that the query in-
volves a SELECT statement as opposed to INSERT, UPDATE, or DELETE.
The two columns, OPERATION and OPTIONS, are most important for analyzing the
execution plan. Oracle offers 23 operations. An operation can have one or more
OPTIONS, as described below.
d. MINUS Subtracts the bits of one bitmap from another. This row
source is used for negated predicates and can be used
only if there are some nonnegated predicates yielding a
bitmap from which the subtraction can take place.
e. OR Computes the bitwise OR of two bitmaps.
3. CONCATENATION An operation that accepts multiple sets of rows and
returns the union of all the sets.
4. CONNECT BY A retrieval of rows in a hierarchical order for a query
containing a CONNECT BY clause.
5. COUNT An operation that counts the number of rows selected from a
table. It offers the following OPTION.
STOPKEY A count operation where the number of rows returned is
limited by the ROWNUM expression in the WHERE
clause.
6. FILTER An operation that accepts a set of rows, eliminates some of
them, and returns the rest.
7. FIRST ROW A retrieval of only the first row selected by a query.
8. FOR UPDATE An operation that retrieves and locks the rows selected by a
query containing a FOR UPDATE clause.
9. HASH JOIN An operation that joins two sets of rows, and returns the final
result. It offers two OPTIONS:
a. ANTI A hash antijoin.
b. SEMI A hash semijoin.
10. INDEX This is an access method. It offers three OPTIONS:
a. UNIQUE SCAN A retrieval of a single ROWID from an index.
b. RANGE SCAN A retrieval of one or more ROWIDs from an index.
Indexed values are scanned in ascending order.
c. RANGE SCAN A retrieval of one or more ROWIDs from
DESCENDING an index. Indexed values are scanned in descending
order.
11. INLIST ITERATOR It offers the OPTION:
CONCATENATED Iterates over the operation below it, for each value in
the “IN” list predicate.
12. INTERSECTION An operation that accepts two sets of rows and returns the
intersection of the sets, eliminating duplicates.
13. MERGE JOIN This is a join operation that accepts two sets of rows, each
sorted by a specific value, combines each row from one set
with the matching rows from the other, and returns the result.
It offers three OPTIONS:
a. OUTER A merge join operation to perform an outer join
statement.
As noted in Section 8.3, an operation can be a row or a set operation. A row operation
returns one row at a time, whereas a set operation processes all the rows and then returns
the entire set. We list below the 23 operations grouped under these two categories.
Row Operations: AND-EQUAL, BITMAP, CONCATENATION, CONNECT BY, COUNT, FILTER,
FIRST ROW, HASH JOIN (hybrid of row and set), INDEX, INLIST ITERATOR, NESTED
LOOPS, PROJECTION, PARTITION, REMOTE (can be row or set), SEQUENCE, and TABLE
ACCESS.

Set Operations: FOR UPDATE, HASH JOIN (hybrid of row and set), INTERSECTION,
MERGE JOIN, MINUS, REMOTE (can be row or set), SORT, UNION, and VIEW.
8.7.2 SQLTRACE
The SQLTRACE utility both parses and executes a query, whereas the EXPLAIN PLAN
utility only parses it. As a result, SQLTRACE provides more information than EXPLAIN
PLAN, but also takes longer to run. The following two steps are performed for enabling
SQLTRACE and displaying its output.
(a) Set three initialization parameters as follows:
• TIMED_STATISTICS = TRUE.
• MAX_DUMP_FILE_SIZE = your estimated size.
• USER_DUMP_DEST = designated directory.
The default value of TIMED_STATISTICS is FALSE. By setting it to TRUE, we
allow Oracle to collect runtime statistics such as CPU and elapsed times, various
statistics in the V$ dynamic performance views, etc.
When the SQL trace facility is enabled at the instance level, every call to the
server produces a text line in a file using the file format of the operating system. The
parameter MAX_DUMP_FILE_SIZE provides the maximum size of these files in
operating system blocks. Its default value is 10,000 operating system blocks. It can
accept a numerical value or a number followed by the suffix K or M, where K means
that the number is multiplied by 1,000 and M means a multiplication by 1,000,000.
The parameter can also assume the special value of UNLIMITED, which means that
there is no upper limit on the dump file size. In this case, the dump files can grow as
large as the operating system permits. This is an undesirable option due to its poten-
tial adverse impact on disk space and hence should be used with extreme caution. If
the trace output is truncated, increase the value of this parameter.
The parameter USER_DUMP_DEST specifies the pathname for a directory
where the server writes debugging trace files on behalf of a user process. The default
value for this parameter is the default destination for system dumps on the operating
system.
(b) Enable SQLTRACE at a session level or at the instance level as follows.
Session Level
Type the command ALTER SESSION SET SQL_TRACE = TRUE; Then enter the
query to be traced. At the end of the session, disable the utility by typing ALTER
SESSION SET SQL_TRACE = FALSE;
Instance Level
Set the initialization parameter SQL_TRACE = TRUE in the init<instance
name>.ora file. This causes Oracle to TRACE every query in the database resulting
in degradation of performance. Consequently, this option is not recommended. Al-
ternatively, this option may be activated for a short time, if needed, and then be de-
activated at the instance level by setting SQL_TRACE = FALSE, which is the de-
fault.
The output file of a SQLTRACE session under UNIX is named in the format
SID_ora_PID.trc, where SID is the instance name and PID is the server process ID. The
file consists of three sections described below.
SQL Statement
This section consists of the query that was executed. It allows the developer or the DBA
to examine the query when analyzing the rest of the output.
Statistical Information
This section has two parts. The first part is tabular in form and the second part is a list of
items. The tabular part has eight columns and four rows. The first column is labeled
“call” and has four values: parse, execute, fetch, and totals, where totals is the sum of
the first three rows. For each value of call, the remaining seven columns provide the corre-
sponding statistics. These eight columns are described below:
CALL Type of call made to the database, i.e., parse, execute, or fetch;
COUNT Number of times a statement was parsed, executed, or fetched;
CPU Total CPU time in seconds for all parse, execute, or fetch calls for the
statement;
ELAPSED Total elapsed time in seconds for all parse, execute, or fetch calls for the
statement;
DISK Total number of data blocks physically read from the datafiles on disk
for all parse, execute, or fetch calls;
QUERY Total number of buffers retrieved from the data block buffer cache for all
parse, execute, or fetch calls;
EXPLAIN PLAN
This section describes the execution plan of the query under two headers, rows and exe-
cution plan. The latter displays the steps of the plan in a nested format as in Figure 8.21
without the step numbers and the former displays the number of rows returned by each
step of the plan. Figure 8.22 shows the transcript of a SQLTRACE session.
8.7.3 TKPROF
TKPROF is an operating system utility that converts the output file from a SQLTRACE
session into a readable format. Its syntax is as follows.
TKPROF filename1 filename2 [PRINT=number] [AGGREGATE=NO]
[INSERT=filename] [SYS=YES/NO] [TABLE=filename]
[RECORD=filename] [EXPLAIN=username/password]
[SORT=parameters]
The arguments and their respective values are explained below.
filename1: Name of the input file containing statistics produced by SQLTRACE; it can
be a trace file for a single session or a file produced by concatenating individual trace
files from multiple sessions.
filename2: Name of the output file to which TKPROF writes its formatted output
[PRINT=number]: Number of statements to be included in the output file; if omitted,
TKPROF lists all traced SQL statements.
[AGGREGATE=NO]: If specified as AGGREGATE = NO, TKPROF does not aggre-
gate multiple users of the same SQL text.
[INSERT=filename]: Creates a SQL script labeled filename that stores the trace statistics
in the database; the script creates a table and inserts a row of statistics for each traced
SQL statement into the table.
[SYS=YES/NO]: Enables (YES) or disables (NO) the listing of recursive SQL statements
issued by the user SYS into the output file. The default is YES. Regardless of the value
selected for this parameter, the statistics for all traced SQL statements, including recur-
sive SQL statements, are inserted into the output file.
[TABLE=filename]: The value of filename specifies the schema and name of the table
into which TKPROF temporarily places execution plans before writing them to the output
file. If the specified table already exists, TKPROF deletes all rows in the table, uses it for
the EXPLAIN PLAN command (which writes more rows into the table), and then deletes
those rows. If this table does not exist, TKPROF creates it, uses it, and then drops it. The
specified user must be able to issue INSERT, SELECT, and DELETE statements against
the table. If the table does not already exist, the user must also be able to issue CREATE
TABLE and DROP TABLE statements.
[RECORD=filename]: Creates a SQL script with the specified filename for storing all of
the nonrecursive SQL in the trace file.
[EXPLAIN=username/password]: Prepares the execution plan for each SQL statement
in the trace file and writes these execution plans to the output file. TKPROF produces the
execution plans by issuing the EXPLAIN PLAN command after connecting to Oracle
with the specified username/password. TKPROF takes longer to process a large trace file
if the EXPLAIN option is used. This option is related to the option [TABLE=filename]
listed above. If this option is used without the option [TABLE=filename], TKPROF uses
the table PROF$PLAN_TABLE in the schema of the user specified by the EXPLAIN
option. If, on the other hand, the TABLE option is used without the EXPLAIN option,
TKPROF ignores the TABLE option.
[SORT=parameters]: Oracle provides a large number of sorting choices under this op-
tion. The traced SQL statements are sorted in descending order of the specified sort op-
tions before listing them in the output file. If more than one sort option is specified, the
output is sorted in descending order by the sum of the values specified in the sort options.
If this parameter is omitted, TKPROF lists statements in the output file in order of first
use. The available sort options are listed below:
PRSCNT = Number of times parsed;
PRSCPU = CPU time spent in parsing;
PRSELA = Elapsed time spent in parsing;
PRSDSK = Number of physical reads from disk during parse;
PRSQRY = Number of consistent mode block reads during parse;
PRSCU = Number of current mode block reads during parse;
PRSMIS = Number of library cache misses during parse;
EXECNT = Number of executes;
EXECPU = CPU time spent in executing;
EXEELA = Elapsed time spent in executing;
EXEDSK = Number of physical reads from disk during execute;
EXEQRY = Number of consistent mode block reads during execute;
EXECU = Number of current mode block reads during execute;
EXEROW = Number of rows processed during execute;
EXEMIS = Number of library cache misses during execute;
FCHCNT = Number of fetches;
FCHCPU = CPU time spent in fetching;
FCHELA = Elapsed time spent in fetching;
FCHDSK = Number of physical reads from disk during fetch;
FCHQRY = Number of consistent mode block reads during fetch;
FCHCU = Number of current mode block reads during fetch;
FCHROW = Number of rows fetched;
We describe below the procedure to generate an output file under TKPROF. Let us
assume that
• Trace file = Ora00118.TRC,
• USER_DUMP_DEST = G:\oradba\admin\ordr\udump, and
• Oracle Home directory = E:\orant\RDBMS80\TRACE.
Hence the trace file Ora00118.TRC resides in the directory G:\oradba\admin\ordr\udump.
Now proceed as follows.
• Run the query.
• From the Oracle Home directory run the command
tkprof80 G:\oradba\admin\ordr\udump\Ora00118.TRC
C:\Ora00118_TRC.out
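If, in addition, we want execution plans generated and the statements sorted, say, by
disk reads during the fetch phase, a variant of the command might look like the sketch
below; the username/password shown is only a placeholder for the schema that issued
the traced statements.

tkprof80 G:\oradba\admin\ordr\udump\Ora00118.TRC C:\Ora00118_TRC.out
   EXPLAIN=scott/tiger SORT=(FCHDSK, FCHELA) SYS=NO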
8.7.4 AUTOTRACE
AUTOTRACE produces the execution plan without using TKPROF to format the output.
To activate AUTOTRACE with timing, we issue the following commands within SQL*Plus
SQL> SET AUTOTRACE ON
SQL> SET TIMING ON
and then enter the query statement to be AUTOTRACEd. The output consists of four
parts:
(a) Result of executing the query;
(b) Execution time of the query measured in realtime, not the CPU time; the unit of time
is the millisecond, i.e., 1/1,000th of a second;
(c) Execution plan of the query;
(d) Statistics of system resources used such as logical and physical reads, sorts done,
number of rows returned, etc.
However, the output is less detailed than that produced by SQLTRACE and
TKPROF. Figure 8.24 contains the transcript of an AUTOTRACE session.
Statistics
----------------------------------------------------------
0 recursive calls
6 db block gets
11 consistent gets
0 physical reads
0 redo size
2586 bytes sent via SQL*Net to client
908 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
20 rows processed
Key Words
access, indexed; access, sequential; AND logic; AUTOTRACE; B*-tree; B*-tree index;
bitmap index; BLEVEL; bucket; cluster; cluster join; CLUSTERING_FACTOR;
concatenated index; cost of query processing; decision tree; dense index; driver;
driving table; dynamic performance view; ending point; equijoin; exact match;
execute phase; EXPLAIN PLAN; fetch phase; hash cluster; hash function;
hash function, system generated; hash function, user supplied; hash join; hashing;
hint; index cluster; index organized table; internal representation; leaf node;
LEAF_BLOCKS; least cost; limiting condition; nested loops join; nondense index;
optimization, cost-based
All of the above books except [7] discuss query tuning under both rule-based and
cost-based optimizers in Oracle. But none of them offers the unified view of addressing
the database performance tuning problems at all three levels jointly, i.e., conceptual
(Chapter 3), internal (Chapters 4 through 7), and external (Chapters 8 and 9), as discussed
here. Mittra [7, Chapter 5] discusses the theoretical issue of query optimization using re-
lational algebra and relational calculus. Corey et al. [5, pp. 173–174] discuss the three
phases of query processing. Aronoff et al. [1, pp. 264–269], Ault [2, Chapter 10], Bo-
browski [3, pp. 458–465], Burleson [4, pp. 191–196], Corey et al. [5, pp. 188–193], and
Niemiec [8, Chapter 8] treat the topic of optimal indexing and selectivity from various
viewpoints with many examples. In particular, we mention Bobrowski [3, pp. 465–472]
for examples of clustering, and Aronoff et al. [1, Chapter 13] for good examples of his-
tograms and an extensive treatment of the row and set operations with plenty of examples
of their EXPLAIN PLAN output.
Exercises
Theoretical exercises are of little value for this chapter since the best practice comes from
optimizing the performance of queries in your own development environment before they
migrate to the test or production databases. The exercises given below identify some of
the areas not specifically covered in Chapter 8 and should be considered an extension of
the text.
1. Refer to Figure 8.3. Write a program using some procedural language such as Pro*C,
UNIX shell scripting, etc. according to the following specifications:
(a) Ask the user to enter a column name for computing its selectivity.
(b) Ask the user if more column(s) need to be added to the set before computing the
selectivity.
(c) Repeat Step (b) until the user is finished with input.
(d) Connect to the Oracle database, and compute the selectivity of the column(s) en-
tered by the user.
(e) Format an output to display the result nicely.
2. A database contains the following objects.
CUSTOMER (Cust_ID, Cust_Name, Cust_Type, Cust_Region, ….)
ORDER (Order_ID, …)
LINE_ITEM (Order_ID, Line_Number, Line_Amount, …)
CUST_ORDER (Cust_ID, Order_ID, …)
CUSTOMER.PK = Cust_ID; ORDER.PK = Order_ID;
LINE_ITEM.PK = (Order_ID, Line_Number);
CUST_ORDER.PK = (Cust_ID, Order_ID)
In addition, two nonunique B*-tree indices are created on the two columns
CUSTOMER.Cust_Type and CUSTOMER.Cust_Region.
You want to produce a report with the specification:
• Headers: Customer Name, Region, Type, Total Order Amount;
• Region = East or North;
• Type = Residential.
Note that each order has one or more line items, each of which has a correspond-
ing line amount. You must add all the line amounts for a given order to get the or-
der amount for that order. Also, Region has four values: East, West, North, South.
Type has two values: Industrial, Residential.
(a) Run the query to generate the report specified above.
(b) Use EXPLAIN PLAN to find the execution plan of the query and interpret the
plan. In particular, find the driver of the query and determine if that table is in-
deed the best candidate for the driver.
(c) Get runtime statistics in two ways: using SQLTRACE with TKPROF, and using
AUTOTRACE with TIMING ON.
(d) Verify that you get the same execution plan in both cases.
3. Redo Exercise 2 with the change:
Drop the B*-tree indices on the two columns, CUSTOMER.Cust_Type and
CUSTOMER.Cust_Region and create two bitmap indices on them. What differences
do you notice in the execution plan and the runtime statistics under the new indices?
4. Refer to Exercises 2 and 3 above. You have collected runtime statistics in both cases
using SQLTRACE with TKPROF, and using AUTOTRACE with TIMING ON. List
the statistics collected in both cases under the three headers:
• Common to both SQLTRACE with TKPROF, and AUTOTRACE with
TIMING ON;
• Unique to SQLTRACE with TKPROF;
• Unique to AUTOTRACE with TIMING ON.
9
Query Tuning and Optimization Under Oracle 8i
Outline
9.1 Oracle Query Performance
9.2 Query Tuning in Oracle: General Principles
9.3 Query Tuning in Oracle: Cost-based Optimizer
9.4 Query Tuning in Oracle: Rule-based Optimizer
9.5 Tuning of Join Queries
9.6 Statistical Forecasting for Tracking Performance
Key Words
References and Further Reading
Exercises
eral guidelines that apply to both categories, the specific tools for the cost-based opti-
mizer are different from those for the rule-based optimizer. We discuss the general prin-
ciples in Section 9.2 and the principles specific to an optimizer in Sections 9.3 and 9.4.
A query performs poorly when its data have to be retrieved from disks via physical reads.
The higher the number of physical reads, the worse the performance. On the other hand,
if a query uses too much of the data block buffer cache in SGA, there may not be enough
left for use by other queries running simultaneously. If the latter situation persists, the
DBA should increase the value of the initialization parameter DB_BLOCK_BUFFERS.
Therefore, to identify the poorly performing queries we need to follow a three-step meth-
odology:
(a) Identify queries using high amount of physical reads;
(b) Identify queries using high amount of logical reads;
(c) Identify the user running queries that need tuning.
Figure 9.1 contains a script that displays queries, the numbers of physical and logical
reads for each query, and their respective users. Figure 9.2 contains a sample output from
running the script. Some of the columns from V$SQLAREA used in the script are ex-
plained below:
FIGURE 9.1: Script Showing Queries with Physical and Logical Reads
ACCEPT SPOOL_NAME
PROMPT
PROMPT Enter threshold for physical reads
PROMPT
ACCEPT P_READS_THRESHOLD
PROMPT
PROMPT Enter threshold for logical reads
PROMPT
ACCEPT L_READS_THRESHOLD
PROMPT
SPOOL &SPOOL_DIR\&SPOOL_NAME
SELECT B.USERNAME, A.DISK_READS, A.EXECUTIONS Executions,
       ROUND (A.DISK_READS / DECODE (A.EXECUTIONS, 0, 1, A.EXECUTIONS), 4) P_RATIO,
       A.SQL_TEXT, TO_CHAR (SysDate, 'fmMonth ddth, YYYY') TODAY
FROM V$SQLAREA A, DBA_USERS B
WHERE A.PARSING_USER_ID = B.USER_ID
AND A.DISK_READS > &P_READS_THRESHOLD
AND USERNAME != 'SYS'
ORDER BY A.DISK_READS DESC;

SELECT B.USERNAME, A.BUFFER_GETS, A.EXECUTIONS Executions,
       ROUND (A.BUFFER_GETS / DECODE (A.EXECUTIONS, 0, 1, A.EXECUTIONS), 4) L_RATIO,
       A.SQL_TEXT, TO_CHAR (SysDate, 'fmMonth ddth, YYYY') TODAY
FROM V$SQLAREA A, DBA_USERS B
WHERE A.PARSING_USER_ID = B.USER_ID
AND A.BUFFER_GETS > &L_READS_THRESHOLD
AND USERNAME != 'SYS'
ORDER BY A.BUFFER_GETS DESC;
FIGURE 9.1 (continued): Script Showing Queries with Physical and Logical Reads
FIGURE 9.2: Output Showing Queries with Physical and Logical Reads
FIGURE 9.2 (continued): Output Showing Queries with Physical and Logical Reads
Once we identify queries that need tuning, we need to contact the users running these
queries and ask them to take corrective action as described in Sections 9.3 and 9.4. This
is often an iterative process that needs close monitoring by the DBA.
Full table scans are mostly time-consuming operations. Since they are heavily used by
batch transactions, tuning queries involving full table scans benefits the batch jobs. The
data blocks read by a full table scan are always marked as least recently used and are,
therefore, removed quickly from the data block buffer cache in the SGA under the LRU
algorithm. This is known as buffer aging. As a result, if a subsequent query needs the
same data, the corresponding data blocks have to be fetched again into the data block
buffer cache. This phenomenon slows down the execution of queries using full table
scans. The problem can be addressed by using the hint CACHE (see Section 9.3 for full
details) in queries involving full table scans. This causes Oracle to retain the data blocks
used by full table scans in the data block buffer cache in the SGA so that subsequent que-
ries can access those blocks via logical reads. In addition, full table scans can be paral-
lelized by using the hint PARALLEL (see Section 9.3).
It was noted in Section 8.6.1 that the number of levels in a B*-tree should not exceed
four. As a new level is added to the tree, an extra data block has to be read to retrieve the
ROWID of the desired row. Since the data blocks are not read sequentially, they each re-
quire an extra disk I/O. The level of a B*-tree increases due to three factors: the size of
the table, a very narrow range of the index values, and a large number of deleted rows in
the index. This situation may arise due to continuous usage and update of the table.
Niemiec [3, pp. 76–78] recommends that if the number of deleted rows in an index ap-
proaches 20 to 25 % of the total rowcount of the index table, rebuilding the index is ap-
propriate. This will reduce the number of levels and the amount of empty space that is
being read during a disk I/O. The command for rebuilding an index is given below:
ALTER INDEX index REBUILD PARALLEL
TABLESPACE tablespace
STORAGE (storage clause);
The REBUILD option allows the index to be rebuilt using the existing index and
without using the underlying table. There must be enough disk space to store both indices
during the operation.
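For example, an index on the ORDER table might be rebuilt as follows; the index name,
tablespace, and storage values are illustrative only.

ALTER INDEX PK_ORDER REBUILD PARALLEL
   TABLESPACE INDX01
   STORAGE (INITIAL 2M NEXT 1M PCTINCREASE 0);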
See Section 10.11 for rebuilding indices online.
PL/SQL is Oracle’s proprietary procedural language that augments SQL, which is non-
procedural. PL/SQL provides the three programming structures, assignment, decision,
and iteration, allowing a developer to write structured programs and use embedded SQL
to interface with an Oracle database. One major difference between a structured program
and a SQL query is that the former works on one record at a time, and the latter works
with a group of records returned by the query. As a result, any database procedural lan-
guage such as PL/SQL needs a mechanism to process one record at a time from a group
of records returned by a SQL query. The cursor of a PL/SQL program provides this
mechanism. The program manages a cursor by using four commands:
• CURSOR cursor_name IS select_statement; (declared in the DECLARE section)
• OPEN cursor_name;
• FETCH cursor_name INTO variable(s);
• CLOSE cursor_name;
A cursor can be explicit or implicit. An explicit cursor is created by the developer as a
part of a PL/SQL program and is handled with the four commands listed above. An im-
plicit cursor is any SQL statement issued by a user. It can involve SELECT, INSERT,
UPDATE, or DELETE. It is an unnamed address where the SQL statement is processed
by SQL or by the cursor handler of PL/SQL. Every time a SQL operation is requested by
the user, an implicit cursor is used. It can be introduced in a PL/SQL program through a
query that does not identify the cursor by name, but instead contains the SQL statement
that defines the scope of the cursor. Figures 9.3 and 9.4 contain examples of an explicit
and an implicit cursor respectively.
Although functionally both explicit and implicit cursors are equivalent, the former is
preferable for query tuning. An explicit cursor as in Figure 9.3 is not only easy to read,
but also performs better. It issues a single call to the database for data. An implicit cursor
as shown in Figure 9.4 does not mention the cursor in the code. Consequently, Oracle
generates two calls to the database: first, to fetch the required data, and, second, to check
for any error condition that the first call may have detected. Therefore, it is preferable to
use explicit cursors in the PL/SQL programs.
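As a minimal sketch of an explicit cursor, consider the block below; the table and column
names are borrowed from the CUSTOMER table used in the exercises and are illustrative
only, not taken from Figure 9.3.

DECLARE
   CURSOR c_cust IS
      SELECT Cust_ID, Cust_Name FROM CUSTOMER WHERE Cust_Region = 'East';
   v_id    CUSTOMER.Cust_ID%TYPE;
   v_name  CUSTOMER.Cust_Name%TYPE;
BEGIN
   OPEN c_cust;
   LOOP
      FETCH c_cust INTO v_id, v_name;
      EXIT WHEN c_cust%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE (v_id || ' ' || v_name);  -- process one row at a time
   END LOOP;
   CLOSE c_cust;
END;
/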
9.2.6 Denormalization
We discussed denormalization in Section 3.3.1 from the viewpoint of the conceptual level
of a database. When the denormalization takes place, the queries using the changed tables
will be affected. For example, if computed columns such as TOTAL_SALE,
REVENUE_BY_REGION, etc. are added to a table, the query formulation will change.
If some lookup tables are dropped and pulled into a transaction table, join queries will be
affected. All such queries must then be examined via EXPLAIN PLAN for performance
and must be retuned, if necessary.
We discussed integration of views into queries in Section 3.5 from the viewpoint of the
conceptual level of a database. Here we address it from a query tuning viewpoint. If a
query includes a view, the query optimizer has two options for processing the query:
• Resolve the view first and then resolve the query by integrating the result set re-
turned by the view into the query;
• Integrate the definition of the view into the rest of the query and then execute the re-
sulting query.
Usually, the second option performs better with respect to response time, because the
view as a set operation is replaced with a row operation. In particular, if the view returns
a large set or if the result of the view is subjected to additional filtering conditions, the
query benefits from using the second option above. But if the view includes a grouping
operation such as SUM, COUNT, DISTINCT, GROUP BY, etc., the second option is not
available. In this case, the grouping operation may be transferred from the view to the
body of the query, the view redefined without the grouping operation, and then the query
reformulated to yield the same result. However, if such a reformulation is not possible
due to the nature of the operation, then the first option is the only one available.
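As a hedged illustration with hypothetical objects, suppose a view is defined as

CREATE VIEW EAST_CUSTOMER AS
   SELECT Cust_ID, Cust_Name FROM CUSTOMER WHERE Cust_Region = 'East';

and a query applies a further filter to it:

SELECT Cust_Name FROM EAST_CUSTOMER WHERE Cust_ID < 5000;

Under the second option the optimizer effectively processes

SELECT Cust_Name FROM CUSTOMER
WHERE Cust_Region = 'East' AND Cust_ID < 5000;

so that both filtering conditions are applied to the base table in a single pass instead of
materializing the full result set of the view first.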
(Flowchart: query tuning methodology. For a target query, run EXPLAIN PLAN; if the
indexing is not optimal, index properly and rerun the plan; if the query is time critical,
run TKPROF or AUTOTRACE for timing statistics and repeat until the execution time is
acceptable.)
A variety of hints is available for query optimization. The hints belong to six distinct
categories, which are listed below; the hints in each category are described in the
subsections that follow:
• Optimization approach and goal for a SQL statement,
• Access path for a table accessed by the statement,
• Join order for a join statement,
• Join operation in a join statement,
• Parallel execution, and
• Other.
Access Path
Hints belonging to this category cause the optimizer to use the specified access path only
if the access path is available based on the existence of an index or a cluster and the syn-
tax of the query. If a hint specifies an unavailable access path, the optimizer ignores it.
There are 15 hints in this category:
FULL (table) Uses a full table scan of table.
ROWID (table) Uses a table scan by ROWID of table.
CLUSTER (table) Uses a cluster scan of table; applies only to clustered objects.
HASH (table) Uses a hash scan of table; applies only to tables stored in a
cluster.
HASH_AJ Transforms a NOT IN subquery into a hash antijoin.
HASH_SJ Transforms a correlated EXISTS subquery into a hash semijoin.
INDEX (table Uses an index scan of table by index and may optionally specify
index…index) one or more indices as listed below.
• If a single available index is specified, the optimizer performs a scan on the index. It
does not consider a full table scan or a scan on another index on the table.
• If a list of available indices is specified, the optimizer considers the cost of a scan on
each index in the list and then performs the index scan with the least cost. The opti-
mizer may also choose to scan multiple indices from this list and merge the results, if
such an access path has the lowest cost. The optimizer does not consider a full table
scan or a scan on an index not listed in the hint.
• If no indices are specified, the optimizer considers the cost of a scan on each avail-
able index on the table and then performs the index scan with the least cost. The
optimizer may also choose to scan multiple indices and merge the results, if such an
access path has the least cost. The optimizer does not consider a full table scan.
INDEX_ASC Chooses an index scan of table by index for the specified table.
(table index) If the query uses an index range scan, Oracle scans the index en-
tries in ascending order of their indexed values.
INDEX_DESC Same as INDEX_ASC except that Oracle scans the index entries
(table index) in descending order of their indexed values.
INDEX_COMBINE If no indices are given as arguments for the hint, the optimizer
(table index) uses the Boolean combination of bitmap indices that has the best
cost estimate. If certain indices are given as arguments, the opti-
mizer tries to use some Boolean combination of those particular
bitmap indices.
INDEX_FFS Performs a fast full index scan of table by index instead of a full
(table index) table scan. It accesses only the index and not the corresponding
table. So, this hint is used only when all the information needed
by the query resides in the index as in an index-organized table.
See Section 8.6.4 for more detail.
Join Order
Hints belonging to this category cause the optimizer to use the join order specified in the
hint. There are two hints in this category.
ORDERED Joins tables in the order in which they appear in the FROM
clause. If this hint is omitted, the optimizer chooses the order in
which to join the tables. This hint benefits the optimization if the
developer knows something about the number of rows selected
from each table that the optimizer does not. For example, if the
tables have not been ANALYZEd recently, the statistics become
misleading. Here the developer can choose an inner and outer ta-
ble better than the optimizer.
STAR Forces a star query plan to be used if possible. A star plan places
the largest table in the query last in the join order and joins it with
a nested loops join on a concatenated index. This hint applies
when there are at least three tables, the concatenated index of the
large table has at least three columns, and there are no conflicting
access or join method hints. The optimizer also considers
different permutations of the small tables. The most precise
method is to order the tables in the FROM clause in the order of
the keys in the index, with the large table last.
Join Operation
Hints belonging to this category cause the optimizer to use the join operation specified in
the hint. Oracle recommends that USE_NL and USE_MERGE hints be used with the
ORDERED hint for optimizing performance. There are four hints in this category:
USE_NL (table) Joins each specified table to another row source with a nested
loops join using the specified table as the inner table. By default,
the inner table is the one that appears immediately after the
FROM clause.
USE_MERGE Joins each specified table with another row source with a sort–
(table) merge join.
USE_HASH Joins each specified table with another row source with a hash
(table) join.
DRIVING_SITE Executes the query at a site different from the one selected by the
(table) optimizer, using table as the driver. This hint can be used
with both rule-based and cost-based optimization.
Parallel Execution
Hints belonging to this category cause the optimizer to decide how statements are paral-
lelized or not parallelized when using parallel execution. There are six hints in this cate-
gory.
PARALLEL (table Specifies the desired number of concurrent servers that can be
integer integer) used for a parallel operation on table. The first integer specifies
the degree of parallelism for the given table; the second integer
specifies how the table is to be split among the instances of a par-
allel server. It applies to INSERT, UPDATE, and DELETE
statements in addition to SELECT. If any parallel restrictions are
violated, the hint is ignored.
NOPARALLEL Overrides a PARALLEL specification in the table clause for
(table) table. In general, hints take precedence over table clauses.
APPEND When used with INSERT, data are appended to a table without
using the existing free space in the block. If INSERT is parallel-
ized using the PARALLEL hint or clause, the append mode is
used by default. The NOAPPEND hint can be used to override
the append mode. The APPEND hint applies to both serial and
parallel inserts.
NOAPPEND Overrides APPEND hint.
PARALLEL_INDEX Specifies the desired number of concurrent servers that can be
(table index used to parallelize index range scans for partitioned indices.
integer integer)
NOPARALLEL_ Overrides a PARALLEL attribute setting on an index.
INDEX (table index)
Other Hints
This is a catch-all category that includes eight hints listed below.
CACHE (table) Specifies that the blocks retrieved for this table are placed at the
most recently used end of the LRU list in the buffer cache when a
full table scan is performed. This option is useful for small
lookup tables. See Section 9.2.3 for additional details.
NOCACHE (table) Specifies that the blocks retrieved for this table are placed at the
least recently used end of the LRU list in the buffer cache when a
full table scan is performed. This is the default behavior of blocks
in the data buffer cache in SGA.
MERGE (table) Works in conjunction with the setting of the initialization pa-
rameter COMPLEX_VIEW_MERGING. When this parameter is
set to TRUE, complex views or subqueries are merged for proc-
essing. When set to its default value FALSE, this parameter
causes complex views or subqueries to be evaluated before the
referencing query is processed. In this case, the MERGE hint
causes a view to be merged on a per-query basis.
NO_MERGE (table) Prevents the merging of mergeable views, thereby allowing the
developer to have more control over the processing of views.
When the initialization parameter COMPLEX_VIEW_
MERGING is set to TRUE, the NO_MERGE hint within the
view prevents a designated query from being merged.
PUSH_JOIN_PRED Works in conjunction with the setting of the initialization
(table) parameter PUSH_JOIN_PREDICATE. When this parameter is set
to TRUE, the optimizer can evaluate, on a cost basis, whether
pushing individual join predicates into the view will benefit the
query. This can enable more efficient access paths and join meth-
ods, such as transforming hash joins into nested loop joins, and
full table scans to index scans. If the initialization parameter
PUSH_JOIN_PREDICATE is set to FALSE, the PUSH_JOIN_
PRED hint forces the pushing of a join predicate into the view.
NO_PUSH_JOIN Prevents the pushing of a join predicate into the view when the
_PRED (table) initialization parameter PUSH_JOIN_PREDICATE is set to
TRUE.
PUSH_SUBQ Causes nonmerged subqueries to be evaluated at the earliest
possible place in the execution plan. Normally, such subqueries
are executed as the last step in the execution plan. But if the
subquery is relatively inexpensive and reduces the number of
rows significantly, performance improves if the subquery is
evaluated earlier. The hint has no effect if the subquery is applied
to a remote table or one that is joined using a merge join.
STAR_TRANSFORMATION Causes the optimizer to use the best plan in which the
transformation has been used. Without the hint, the optimizer
makes a cost-based decision to use the best plan generated with-
out the transformation, instead of the best plan for the trans-
formed query. Even if the hint is given, there is no guarantee that
the transformation will occur. The optimizer will generate the
subqueries if it seems reasonable to do so. If no subqueries are
generated, there is no transformed query, and the best plan for the
untransformed query is used regardless of the hint.
A hint is included in a query by using either of the following two forms of syntax.
SELECT /*+ hint_name */ column(s)
FROM table(s) WHERE condition(s);
SELECT --+ hint_name column(s)
FROM table(s) WHERE condition(s);
The slash asterisk (/*) or a pair of hyphens (--) can be used to signify a hint. The plus
sign (+) must immediately follow * or -- without an intervening space. If /* is used, the
hint must be closed with */. No such closing punctuation is needed for the -- version. The
hint_name can be any one of the hints described above. If multiple hints are used, each
must be separated from the next with a space. But multiple hints are not recommended
since the query optimizer may get confused by them and may even ignore some or all of
them. If an alias is used for a table, the alias must be used in the hint instead of the table
name. When using a hint it is advisable to run the hinted query with the EXPLAIN PLAN
or AUTOTRACE (see Sections 8.7.1 and 8.7.4) option to ensure that the hint has been
properly formulated and has indeed been used by the optimizer.
We now give below several examples of queries using some of the hints listed above.
FULL: SELECT /*+ FULL (A) */ Name, Balance
FROM CUST_ACCOUNT A WHERE Cust_ID = 1846;
INDEX_DESC: SELECT /*+ INDEX_DESC (ORDER PK_ORDER) */
Order_ID, Cust_Name, Order_Amount FROM ORDER;
ORDERED: SELECT --+ ORDERED ORDER.Order_ID, Line_Num,
Item_Name FROM ORDER, LINE_DETAIL WHERE
ORDER.Order_ID = LINE_DETAIL. Order_ID
AND Order_Amount > 10000;
DRIVING_SITE: SELECT /*+ DRIVING_SITE (A) */
* FROM ORDER, LINE_DETAIL@CHICAGO A WHERE
ORDER.Order_ID = A.Order_ID;
PUSH_SUBQ: SELECT --+ PUSH_SUBQ ORDER.Order_ID, Line_Num,
Item_Name FROM ORDER, LINE_DETAIL WHERE
ORDER.Order_ID = LINE_DETAIL. Order_ID AND
Name IN
(SELECT Name FROM ORDER WHERE
Order_Amount > 10000);
Statistics
------------------------------------------------------
0 recursive calls
6 db block gets
11 consistent gets
0 physical reads
0 redo size
2587 bytes sent via SQL*Net to client
908 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
20 rows processed
select /*+ RULE */ Cust_Name, Account_Name, Status
from Customer A, Account B
where A.Cust_ID = B.Cust_ID
and rownum < 21
order by Cust_name, Account_Name;
OUTPUT from Rule-based Optimizer:
20 rows selected.
real: 1903
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 SORT (ORDER BY)
2 1 COUNT (STOPKEY)
3 2 NESTED LOOPS
4 3 TABLE ACCESS (FULL) OF 'ACCOUNT'
5 3 TABLE ACCESS (BY INDEX ROWID) OF 'CUSTOMER'
6 5 INDEX (UNIQUE SCAN) OF 'PK_CUSTOMER' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
3 db block gets
43 consistent gets
2 physical reads
0 redo size
2588 bytes sent via SQL*Net to client
929 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
20 rows processed
FIGURE 9.6 (continued): Execution Plans Under Cost- and Rule-based Optimizers
Note that the two plans are not identical. The cost-based optimizer uses a hash join of
CUSTOMER and ACCOUNT, and the rule-based optimizer uses a nested loops join of
those two tables. Also, the runtime statistics are close but not identical. The real time is
slightly different, 1,922 (= 1.922 seconds) for cost versus 1,903 (= 1.903 seconds) for
rule.
Selection of the First Table in the Join, Which Is Called the Driving
Table or the Driver
This table may undergo a full table scan. Hence its size is crucial for the performance.
Under rule-based optimization the driver is selected as follows.
• If one table is indexed and the other is not, then the driving table is the one without
an index.
• If both tables are indexed, then the table having the index with a lower rank becomes
the driving table.
• If both tables are indexed and the indices have the same rank, then the table on the
right becomes the driving table.
Under cost-based optimization with an ORDERED hint the driver is the first table
listed under the FROM clause. If no hint is used, the cost-based optimizer selects the
driver according to the available statistics on the table and the associated cost of the
query.
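The step-by-step descriptions of the join methods below refer to a generic join of two
tables A and B on A.column_1 and B.column_1; a sketch of such a query, assumed here
only for the discussion, is

SELECT A.column_1, A.column_2, B.column_3
FROM A, B
WHERE A.column_1 = B.column_1;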
Nested Loops
The optimizer
1. Reads the first row of A;
2. Retrieves the value of A.column_1;
3. Finds all rows in B for which B.column_1 matches A.column_1;
4. If no match is found, reads the next row in A;
5. If a match is found, sends A.column_1, A.column_2, B.column_3 from the matching
rows in A and B to the result set;
6. Reads the next row of A;
7. Repeats Steps (2) through (6) until all rows in A are read and processed via a full ta-
ble scan of A.
Figure 9.7 contains a flowchart of a nested loops join.
FIGURE 9.7: Flowchart of a Nested Loops Join
The nested loops join works with any join condition, not necessarily just an equijoin,
i.e., where the join condition is an equality. It is very efficient when A is a small table
related to B in a 1:N relationship and B.column_1 is uniquely indexed or is highly selec-
tive in a nonunique index. Otherwise, this join can cause a performance drag.
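Where the developer knows that A is the small driving table, the method can be requested
explicitly; the following is a sketch only, using the generic tables above together with the
ORDERED and USE_NL hints described in Section 9.3.

SELECT /*+ ORDERED USE_NL (B) */ A.column_1, A.column_2, B.column_3
FROM A, B
WHERE A.column_1 = B.column_1;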
Sort–Merge Join
The optimizer
1. Sorts all the rows of A by A.column_1 to SORTED_A, say;
2. Sorts all the rows of B by B.column_1 to SORTED_B, say;
3. If SORTED_A.column_1 does not match SORTED_B.column_1, continues to read
the next row(s) of SORTED_A until SORTED_A.column_1 matches
SORTED_B.column_1;
4. When a match is found between SORTED_A.column_1 and SORTED_B.column_1,
continues to read all succeeding rows of SORTED_B for which
SORTED_A.column_1 matches SORTED_B.column_1;
5. Sends A.column_1, A.column_2, B.column_3 from the matching rows found above
to the result set;
6. Reads next row of SORTED_A;
7. Repeats steps 3 through 6 until all rows in SORTED_A are read and processed via a
full table scan of SORTED_A.
Figure 9.8 contains a flowchart of a sort–merge join.
The sort–merge join is effective when the index on the joined column is not highly
selective or when the join condition returns more than 10% of the rows in the driver. But
this join works only with equijoins. Its performance depends on the initialization pa-
rameter SORT_AREA_SIZE, because the tables involved in the join are sorted first. This
parameter specifies the maximum amount, in bytes, of the Program Global Area (PGA)
in memory that can be used for a sort. The PGA is an area in memory that is used by a
single Oracle user process. The memory in the PGA is not shareable. If multithreaded
servers (MTS) are enabled, the sort area is allocated from the SGA. After the completion
of the sort but before the fetching of the sorted rows to the user area, the memory is re-
leased down to the size specified by the initialization parameter SORT_AREA_RE-
TAINED_SIZE. After the last row is fetched, all of the memory is released back to the
PGA, not to the operating system.
FIGURE 9.8: Flowchart of a Sort–Merge Join
If the rows to be sorted do not all fit within SORT_AREA_SIZE, Oracle sorts as many rows
as possible in memory at a time, then stores the result in a temporary segment, and
continues this process until the sort is complete. If this parameter is set too low, excessive
disk I/O will be needed to transfer data between the temporary segment on disk and the
sort area in the PGA in memory for performing sorts. If this parameter is set too high, the
operating system may run out of physical memory and start swapping. The sizing of the
parameter SORT_AREA_SIZE is somewhat tricky. The default value, which is operating
system dependent, usually suffices for most OLTP applications. But the value may need
to be adjusted to 2 MB or more for decision support systems, batch jobs, or large
CREATE INDEX operations. Multiple allocations of the sorting space never exist. Hence
there is only one memory area of SORT_AREA_SIZE for each user process at any time.
This parameter can be set dynamically at the session level via the ALTER SESSION
command.
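For example, a session that is about to run a large sort–merge join or build a big index
can raise its own limit for the duration of the session; the value below is illustrative.

ALTER SESSION SET SORT_AREA_SIZE = 2097152;   -- 2 MB for this session only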
If necessary, the DBA and the developers can jointly determine the largest sorting
space needed for an application and compute an appropriate value for this parameter.
Remember that the SGA should not exceed 40% of the total physical memory and the
sorting space size determined by the value of SORT_AREA_SIZE must fit within the
SGA. See Section 6.7 for further details about tuning memory.
Hash Join
The hash join involves the concept of a hash table and hashing. A hash table data struc-
ture is an array of some fixed size containing the values of a variable and their respective
addresses. Hashing means searching the hash table for a specific value and, if found, ac-
cessing the full record via its address stored in the table. Section E15 in Appendix E dis-
cusses the hashing concepts in detail.
The optimizer
1. Reads all the values of B.column_1;
2. Builds in memory a hash table of all these values;
3. Reads the first row of A
4. Retrieves the value of A.column_1;
5. Finds from the hash table the addresses of all rows in B for which B.column_1
matches A.column_1;
6. If no match is found, reads the next row in A;
7. If a match is found, sends A.column_1, A.column_2, B.column_3 from the matching
rows in A and B to the result set;
8. Reads the next row of A;
9. Repeats Steps (4) through (8) until all rows in A are read and processed via a full ta-
ble scan of A.
As such, a hash join is similar to a nested loops join except that the optimizer first
builds a hash table of the values of B.column_1. Figure 9.9 contains the flowchart of a
hash join.
A hash join is effective when table B does not have a good and highly selective index
on B.column_1. But it needs sufficient memory space so as to fit the hash table of
B.column_1 entirely in memory. The performance of hash joins depends on three initiali-
zation parameters: HASH_AREA_SIZE, HASH_JOIN_ENABLED, and HASH_
MULTIBLOCK_IO_COUNT. All of them can be modified dynamically via the ALTER
SESSION command. HASH_AREA_SIZE determines the amount of memory, in bytes,
that is made available for building a hash table. Its default value is 2 * SORT_AREA_
SIZE. If HASH_AREA_SIZE is too small, a part of a large hash table may have to be
stored in a temporary segment in a temporary tablespace causing performance degrada-
tion. If it is too big, it may use up the physical memory. HASH_JOIN_ENABLED speci-
fies whether the optimizer should consider hash join as an option. Its default value is
TRUE, which allows the cost-based optimizer to use it. If it is set to FALSE, hash join is
not available as a join method. HASH_MULTIBLOCK_IO_COUNT specifies how many
sequential blocks a hash join reads and writes in one I/O. Its default value is 1. If the
multithreaded server mode is activated, this parameter is set at 1 regardless of the value
assigned to it. The maximum value of this parameter depends on the operating system,
but is always less than the maximum I/O size allowed by the operating system. As such,
its role for hash joins is similar to the role of DB_FILE_MULTIBLOCK_READ_
COUNT for sort–merge joins. Oracle provides formulas for estimating the value of
HASH_MULTIBLOCK_IO_COUNT.
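For instance, before running a large join a session might confirm that hash joins are
enabled and enlarge its own hash area; the values below are purely illustrative.

ALTER SESSION SET HASH_JOIN_ENABLED = TRUE;
ALTER SESSION SET HASH_AREA_SIZE = 4194304;   -- 4 MB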
Cluster Join
A cluster join is a special case of a nested loops join. If both of the tables A and B are
members of a cluster and if the join is an equijoin between the cluster keys A.column_1
and B.column_1, then the optimizer can use a cluster join. This join is very efficient since
both A and B reside in the same data blocks. But since the clusters have certain restric-
tions (see Section 8.6.5), cluster joins are not that popular.
FIGURE 9.9: Flowchart of a Hash Join
(b) f(x) is a quadratic function, i.e., f(x) = a + bx + cx^2, where a, b, and c are constants.
The trend is called quadratic.
(c) f(x) is an asymptotic growth curve, for example,
f(x) = k / (1 + 10^(a + bx))         logistic curve,
log (f(x)) = log k + (log a) b^x     Gompertz curve,
f(x) = k + ab^x                      modified exponential curve.
The linear trend mostly suffices for trend prediction in performance tracking. Spread-
sheets such as EXCEL provide the capability of predicting linear trends based on given
(x, y) values.
x y
1 3
2 10
3 15
4 19
5 35
6 38
7 44
8 61
9 100
10 105
Forecast value of y for x = 11: 104.8666667
For x = 11, we get y = 104.867. This means that for week 11, the number of extents
will remain the same at 105.
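The same least squares forecast can be computed directly in SQL if the (x, y) pairs are
kept in a table; the table name TREND_DATA below is hypothetical, and the query is only
a sketch of the standard formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

SELECT slope,
       mean_y - slope * mean_x                AS intercept,
       (mean_y - slope * mean_x) + slope * 11 AS forecast_at_11
FROM  (SELECT (SUM(x*y) - COUNT(*) * AVG(x) * AVG(y)) /
              (SUM(x*x) - COUNT(*) * AVG(x) * AVG(x))   AS slope,
              AVG(x) AS mean_x, AVG(y) AS mean_y
       FROM   TREND_DATA);

Applied to the ten (x, y) pairs above, this query should return the same forecast of about
104.87 for x = 11.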
Key Words
access, indexed; access, sequential; asymptotic growth curve; AUTOTRACE; B*-tree index;
bitmap index; buffer aging; cluster; cluster join; concatenated index; driver;
driving table; equijoin; exact match; execute phase; EXPLAIN PLAN; explicit cursor;
fetch phase; filtering condition; hash cluster; hash function; hash join; hashing; hint;
implicit cursor; index cluster; linear trend; MTS; multithreaded server; nested format;
nested loops join; optimization, cost-based; optimization, rule-based; optimizer mode;
optimizer, cost-based; optimizer, rule-based; parse phase; PGA; PLAN_TABLE;
Program Global Area; quadratic trend; query optimizer; range scan; row operation;
runtime statistics; set operation; sort–merge join; SQLTRACE; temporary tablespace
Niemiec [3] offers an extensive treatment of using hints in queries under cost-based
optimization in Chapter 7 and the tuning of join queries in Chapter 8. In addition,
Niemiec [3, pp. 398–412] discusses some statistical techniques for predicting query exe-
10
Special Features of Oracle 8i and a Glimpse into Oracle 9i
Outline
10.1 Scope of the Chapter
10.2 Evolution of Oracle Through Versions 8 and 8i
10.3 Partitioning of Tables and Indices
10.4 Materialized Views
10.5 Defragmentation via Local Tablespace Management
10.6 LOB Data Type Versus LONG Data Type
10.7 Multiple Buffer Pools in the SGA
10.8 Query Execution Plan Stability via Stored Outlines
10.9 Index Enhancements
10.10 Query Rewrite for Materialized Views
10.11 Online Index Creation, Rebuild, and Defragmentation
10.12 ANALYZE Versus DBMS_STATS
10.13 Optimization of Top_N Queries
10.14 Glimpse into Oracle 9i
Key Words
References and Further Reading
Version 8.1.5
• Content Management for Internet,
• InterMedia, Spatial, Time Series, and Visual Image Retrieval,
• Java Support,
• Data Warehousing and Very Large Data Bases (VLDB),
• Partitioning Enhancements,
• System Management,
• Oracle Parallel Server,
• Distributed Systems, and
• Networking Features and Advanced Security,
Version 8.1.6
• Improvements of Existing Oracle Features,
• Java Enhancements,
Version 8.1.7
• Additional Java Enhancements,
• Oracle Internet File Systems (iFS),
• Enhanced XML Support,
• Oracle Integration Server,
• Additional Enhancements of Security, InterMedia, and Spatial Features, and
• Improved Management of Standby Databases.
We list below a set of topics embodying some special features of Oracle 8i that per-
tain to the performance tuning and optimization issues of an Oracle database. As indi-
cated in Section 10.1, they are grouped under three separate categories: conceptual, inter-
nal, and external levels.
Conceptual Level
(a) Partitioning of Tables and Indices, and
(b) Materialized Views.
Internal Level
(a) Defragmentation via Local Tablespace Management,
(b) LOB Data Type Versus LONG Data Type, and
(c) Multiple Buffer Pools in the SGA.
External Level
(a) Query Execution Plan Stability via Stored Outlines,
(b) Index Enhancements,
(c) Query Rewrite for Materialized Views,
(d) Online Index Creation, Rebuild, and Defragmentation,
(e) ANALYZE Versus DBMS_STATS, and
(f) Optimization of Top_N Queries.
Sections 10.3 through 10.13 cover these topics.
is partitioned, its management for performance becomes easier to handle and data re-
trieval from the table becomes quicker. A partitioned table allows the DBA to manage
each partition of the table independently of the other partitions. Thus the availability of
data on a large partitioned table is higher than that on a comparably sized nonpartitioned
table. Also, if a disk failure damages only one partition of an object, the remaining parti-
tions can still be used while the DBA repairs the damaged partition.
Partitioning was first introduced in Oracle8 and only range partitioning (see Section
3.6.2) was available. Oracle 8i made several enhancements to partitioning that are sum-
marized below:
(a) Hash Partitioning,
(b) Composite Partitioning,
(c) Data Dictionary Views for Partitioned Objects, and
(d) ANALYZE Command for Partitioned Objects.
Sections 10.3.1 to 10.3.4 describe these four features.
When a table is partitioned by range, it is generally assumed that the DBA knows how
much data will fit into each partition. However, this information may not always be
available when the DBA wants to create a partition. Hash partitions alleviate this prob-
lem. They allow Oracle 8i to manage the dynamic distribution of data into the partitions.
Figure 10.1 contains the SQL command for creating a table using hash partitions.
SQL>
Wrote file afiedt.buf
1 create table cust_order
2 (order_id number (4),
3 order_description varchar2 (50),
4 order_date date,
5 customer_id number (4),
6 invoice_id number (4))
7 storage (initial 2M next 1M pctincrease 0)
8 partition by hash (order_id) partitions 8
9* store in (perf_data1, perf_data2, perf_data3, perf_data4)
SQL> /
Table created.
In creating a table with hash partitioning one must comply with the following re-
quirements.
SQL>
Wrote file afiedt.buf
1 create table new_order
2 (order_id number (4),
3 order_description varchar2 (50),
4 order_date date,
5 customer_id number (4),
6 invoice_id number (4))
7 storage (initial 2M next 1M pctincrease 0)
8 partition by hash (order_id)
9 (partition p1 tablespace perf_data1,
10 partition p2 tablespace perf_data2,
11 partition p3 tablespace perf_data3,
12 partition p4 tablespace perf_data4,
13 partition p5 tablespace perf_data1,
14 partition p6 tablespace perf_data2,
15 partition p7 tablespace perf_data3,
16* partition p8 tablespace perf_data4)
SQL> /
Table created.
Hash partitioning as described above can be combined with range partitioning to create
subpartitions by hash on the partitions by range. Although slightly more complex than
hash or range partitioning, the composite partitioning increases the overall potential for
parallelism. Figure 10.3 contains an example of composite partitioning. The table
COMP_ORDER is partitioned by range on the ORDER_DATE column and then sub-
partitioned by hashing on the ORDER_ID column.
SQL>
Wrote file afiedt.buf
1 create table comp_order
2 (order_id number (4),
3 order_description varchar2 (50),
4 order_date date,
5 customer_id number (4),
6 invoice_id number (4))
7 partition by range (order_date)
8 subpartition by hash (order_id) subpartitions 8
9 store in (perf_data1, perf_data2, perf_data3, perf_data4)
10  (partition co1 values less than (TO_DATE('10-NOV-2000', 'DD-MON-YYYY')),
11   partition co2 values less than (TO_DATE('10-JAN-2001', 'DD-MON-YYYY')),
12   partition co3 values less than (TO_DATE('10-MAR-2001', 'DD-MON-YYYY')),
13*  partition co4 values less than (MAXVALUE))
SQL> /
Table created.
The Oracle 8i data dictionary contains several new views that provide information about
the partitioned tables and indices. They provide additional information beyond what was
already available from the Oracle8 data dictionary views such as DBA_TABLES,
DBA_INDEXES, DBA_TAB_COLUMNS, and DBA_OBJECTS. The new views along
with their descriptions are listed below.
DBA_PART_TABLES Partition basis of tables including the partition
keys
DBA_PART_INDEXES Partition basis of indices including the partition
keys
DBA_PART_KEY_COLUMNS Partition keys used for all tables and indices
DBA_TAB_PARTITIONS Partitions of all tables in the database
DBA_IND_PARTITIONS Partitions of all indices in the database
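As a brief illustration of features (c) and (d), the following commands list the partitions of the NEW_ORDER table created above and then analyze one of its partitions. The column names are taken from the standard Oracle 8i definition of DBA_TAB_PARTITIONS and should be verified against your own data dictionary.
SELECT table_name, partition_name, tablespace_name
FROM dba_tab_partitions
WHERE table_name = 'NEW_ORDER'
ORDER BY partition_position;
ANALYZE TABLE new_order PARTITION (p1) COMPUTE STATISTICS;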
SQL>
Wrote file afiedt.buf
1 create materialized view links_view
2 tablespace perf_data
3 storage (initial 100K next 50K pctincrease 0)
4 refresh complete
5 start with SYSDATE
6 next SYSDATE + 7
7 as
8* select * from links_copy
SQL> /
Snapshot created.
SQL> create materialized view log on links_copy
2 tablespace perf_data
3 storage (initial 100K next 50K pctincrease 0)
4 with ROWID;
Snapshot log created.
Oracle 8i provides four options for the REFRESH clause for updating the data of a
materialized view by executing the view definition. Each option repopulates the view by
using the contents of the base table(s), which may have changed. The options are listed
below.
REFRESH COMPLETE Replace the data completely. This option is used when the
materialized view is first created as in Figure 10.5.
REFRESH FAST Replace only the data that have changed since the last re-
fresh. Oracle 8i uses the materialized view logs or
ROWID ranges to determine which rows have changed.
REFRESH FORCE Use REFRESH FAST, if possible; otherwise, use
REFRESH COMPLETE.
REFRESH NEVER Never refresh the view data.
If the data changes account for less than 25% of the rows in the base table, the
REFRESH FAST option is generally better than the REFRESH COMPLETE option.
In addition to the REFRESH options, the DBA must specify the intervals at which the
data will be refreshed. This can be done in two ways: automatic and manual. The auto-
matic refreshing occurs either when the underlying base table data changes are commit-
ted, or at regular intervals specified by the DBA. Manual refreshing is performed on demand
by the DBA by reexecuting the view definition.
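As a hedged sketch, a manual refresh of the materialized view LINKS_VIEW of Figure 10.5 might be invoked through the DBMS_MVIEW package, where 'C' requests a complete refresh; the exact call should be verified against the Oracle 8i documentation.
EXECUTE DBMS_MVIEW.REFRESH ('LINKS_VIEW', 'C');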
See Section 10.10 for the QUERY REWRITE option of materialized views related to
query optimization.
Oracle provides two methods for allocating extents in a locally managed tablespace:
• Autoallocate, and
• Uniform.
Autoallocate Method
Under the autoallocate method, which is the default, one can specify the size of the initial
extent and then Oracle determines the optimal sizes of the subsequent extents from a se-
lection of 64 KB, 1 MB, 8 MB, and 64 MB. When a segment is created in such a table-
space, Oracle assigns 64 KB to the next extents until the segment reaches 1 MB in size.
At that point, the subsequent extents are sized at 1 MB each. When the segment reaches a
size of 64 MB, the subsequent extent sizes are increased to 8 MB each. Finally, if the ta-
ble reaches a size of 1 GB, the subsequent extent size is increased for the last time to
64 MB.
Uniform Method
Under the uniform method, one can specify an extent size when the tablespace is created
or use the default extent size, which is 1 MB. All the extents of the tablespace and of the
segments created in that tablespace will be of that uniform size. No segment can be cre-
ated in such a tablespace with a different extent size. See Section 10.5.2 below for the
impact of the STORAGE clause at the segment level.
Figure 10.6 shows two locally managed tablespaces created with the uniform and the
autoallocate options.
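A minimal sketch of two such CREATE TABLESPACE commands follows; the datafile paths, datafile sizes, and the 8 MB uniform extent size are assumptions made only for illustration.
CREATE TABLESPACE loc_manage_data
DATAFILE '/oradata/perf/loc_manage_data01.dbf' SIZE 200M
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M;
CREATE TABLESPACE loc_manage_data_auto
DATAFILE '/oradata/perf/loc_manage_data_auto01.dbf' SIZE 200M
EXTENT MANAGEMENT LOCAL AUTOALLOCATE;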
FIGURE 10.7 (continued): Listing of Locally Managed and Dictionary Managed Tablespaces
The extent sizes of a locally managed tablespace are managed at the tablespace level and
not at the segment level. The STORAGE clause of a segment in such a tablespace has
practically no impact on the space allocated for that segment. More precisely, the situa-
tion is as follows.
If the STORAGE clause of a segment specifies the parameters (INITIAL, NEXT,
MINEXTENTS), then Oracle computes the total space determined by these three pa-
rameters and allocates that amount to the INITIAL EXTENT of the segment. But the
NEXT EXTENT is determined differently, as noted below.
(a) If the extent management is uniform, then NEXT EXTENT equals the value speci-
fied by the UNIFORM SIZE clause of the tablespace.
(b) If the extent management is autoallocate, then NEXT EXTENT and, therefore,
PCTINCREASE are ignored.
Figure 10.8 contains a session transcript showing the following steps.
• Two tables, TEST01 and TEST02, are created in two locally managed tablespaces,
one uniform (LOC_MANAGE_DATA) and the other autoallocate (LOC_
MANAGE_DATA_AUTO) respectively.
• The two tables have an identical STORAGE clause.
• Oracle creates an INITIAL EXTENT of both TEST01 and TEST02 that equals the
total space computed from the INITIAL, NEXT, and MINEXTENTS values of the
STORAGE clause.
• Oracle assigns the value of 8 MB to the NEXT EXTENT of TEST01, but does not
assign any value to the NEXT EXTENT of TEST02.
SESSION TRANSCRIPT
SQL> edit
Wrote file afiedt.buf
1 CREATE TABLE TEST01
2 (ABC NUMBER,
3 DEF CHAR (7),
4 TEST_DATE DATE)
5 TABLESPACE LOC_MANAGE_DATA
6* STORAGE (INITIAL 40K NEXT 2M MINEXTENTS 6 PCTINCREASE 0)
SQL> /
Table created.
SQL> edit
Wrote file afiedt.buf
1 CREATE TABLE TEST02
2 (GHI NUMBER,
3 JKL CHAR (7),
4 NEW_DATE DATE)
5 TABLESPACE LOC_MANAGE_DATA_AUTO
6* STORAGE (INITIAL 40K NEXT 2M MINEXTENTS 6 PCTINCREASE 0)
SQL> /
Table created.
SQL> COL TABLESPACE_NAME FORMAT A22
SQL> COL SEGMENT_NAME HEADING "SEGMENT|NAME" FORMAT A20
SQL> COL INITIAL_EXTENT HEADING "INITIAL|EXTENT"
SQL> COL NEXT_EXTENT HEADING "NEXT|EXTENT"
SQL> COL PCT_INCREASE HEADING "%INCR." FORMAT 99
SQL> select tablespace_name, segment_name, initial_extent,
next_extent,
2 pct_increase from dba_segments where owner = 'SEKHAR' AND
TABLESPACE_NAME IN
3 ('LOC_MANAGE_DATA', 'LOC_MANAGE_DATA_AUTO')
4 ORDER BY 1, 2;
TABLESPACE_ SEGMENT_ INITIAL_ NEXT_ %INCR.
NAME NAME EXTENT EXTENT
------------------ ------- -------- -------- -------
LOC_MANAGE_DATA TEST01 10526720 83886080 0
LOC_MANAGE_DATA_AUTO TEST02 10526720
SQL> edit
Wrote file afiedt.buf
1 SELECT TABLESPACE_NAME, SEGMENT_NAME, EXTENTS FROM dba_segments
2 where owner = 'SEKHAR' AND TABLESPACE_NAME IN
3 ('LOC_MANAGE_DATA', 'LOC_MANAGE_DATA_AUTO')
4* ORDER BY 1, 2
SQL> /
TABLESPACE_NAME SEGMENT_NAME EXTENTS
-------------------- ------------- --------
LOC_MANAGE_DATA TEST01 2
LOC_MANAGE_DATA_AUTO TEST02 11
For TEST01, the total space of 10,526,720 bytes (= 10.4 MB) is provided by 2 extents
of size 8 MB each, which is the UNIFORM SIZE value for its holding tablespace. For
TEST02, the total space of 10,526,720 bytes (= 10.4 MB) is provided by 11 extents of
size 1 MB each. Since the segment TEST02 needs over 1 MB of space, the extent size
becomes 1 MB each (see the paragraph, Autoallocate Method, in Section 10.5.1 above).
Therefore, 10.4 MB is provided by 11 extents.
From this point on, as we allocate additional extents to TEST01 and TEST02, the ex-
tent count increases accordingly. Figure 10.10 shows the result of adding four extents to
each table. Note that the column EXTENTS has increased by 4 for each table beyond
their earlier values in Figure 10.9.
SQL> /
Table altered.
SQL> ALTER TABLE TEST02 ALLOCATE EXTENT;
Table altered.
SQL> /
Table altered.
SQL> /
Table altered.
SQL> /
Table altered.
SQL> SELECT TABLESPACE_NAME, SEGMENT_NAME, EXTENTS FROM dba_segments
2 where owner = 'SEKHAR' AND TABLESPACE_NAME IN
3 ('LOC_MANAGE_DATA', 'LOC_MANAGE_DATA_AUTO')
4 ORDER BY 1, 2;
The built-in package DBMS_SPACE_ADMIN consists of a set of PL/SQL procedures and functions that help
the DBA to maintain the integrity of the locally managed tablespaces. For example, using
the procedure SEGMENT_VERIFY the DBA can verify the consistency of the extent
map of the segment, i.e., that the bitmaps properly reflect the way in which the extents
are allocated and that no two segments claim the same extent. There are two procedures
in this package called TABLESPACE_MIGRATE_TO_LOCAL and TABLESPACE_
MIGRATE_FROM_LOCAL that take a tablespace name as an argument and allow re-
spectively the migration of a dictionary managed tablespace to a locally managed table-
space and vice versa. Figures 10.11 and 10.12 contain the commands showing the execu-
tion of these two procedures and subsequent verification that the tablespaces indeed
migrated as desired. Note that the tablespace PERF_DATA is first migrated from dictionary
managed to locally managed and is then migrated back from locally managed (autoallocate)
to dictionary managed.
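A minimal sketch of the first of these migrations, assuming PERF_DATA starts out as a dictionary managed tablespace, is:
EXECUTE DBMS_SPACE_ADMIN.TABLESPACE_MIGRATE_TO_LOCAL ('PERF_DATA');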
SQL> begin
2 DBMS_SPACE_ADMIN.TABLESPACE_MIGRATE_FROM_LOCAL ('PERF_DATA');
3 end;
5 /
PL/SQL procedure successfully completed.
SQL> select tablespace_name, extent_management, allocation_type
2 from dba_tablespaces
3 order by 1;
TABLESPACE_NAME EXTENT_MAN ALLOCATIO
-------------------- ---------- ----------
LOC_MANAGE_DATA LOCAL UNIFORM
LOC_MANAGE_DATA_AUTO LOCAL SYSTEM
PERF_DATA DICTIONARY USER
The bitmap that stores the free and used extent information of the datafile underlying a
locally managed tablespace is stored in the location previously occupied by the start of the
first free extent in the datafile.
For further details about the DBMS_SPACE_ADMIN package run the following
SQL commands:
SET HEADING OFF
SET NEWPAGE 0
SELECT TEXT FROM DBA_SOURCE
WHERE NAME = 'DBMS_SPACE_ADMIN'
AND TYPE = 'PACKAGE';
When a segment is created in a locally managed tablespace under the UNIFORM SIZE
option, it does not get fragmented at all. Any isolated free extent in the tablespace
matches the uniform extent size and, therefore, gets reused when more space is needed.
In addition, the locally managed tablespaces offer several advantages over the dictionary
managed tablespaces, as listed below.
• Local management of extents automatically tracks adjacent free space, thereby
eliminating the need to coalesce adjacent free extents manually.
• A dictionary managed tablespace may incur recursive space management operations.
This happens when consuming or releasing space in an extent in the tablespace re-
sults in another operation that consumes or releases space in a rollback segment or a
data dictionary table. This scenario does not apply to locally managed tablespaces,
because they do not use rollback segments and do not update data dictionary tables.
Changes to the bitmaps representing the free or used status of extents in a locally
managed tablespace are not propagated to the data dictionary tables.
• Since a locally managed tablespace does not record free space information in the
data dictionary tables, it reduces contention on these tables.
1. If the data in a LOB column exceed 4,000 bytes, then only the pointer to the data is stored
inline with the table data. In other words, no LOB column will ever require more
than 4,000 bytes of space inline with other table data. As a result, SELECT state-
ments on the LONG column return the actual data, whereas the same statements on a
LOB column return only the pointer to the data if the data value exceeds 4,000 bytes.
2. LOB columns can assume values up to 4 GB, whereas LONG columns can be up to
2 GB. Thus, LOB columns are larger than LONG columns.
3. LOB data can be accessed piecewise, whereas LONG access is sequential. Only the
entire value of a LONG column can be obtained unless one defines a procedure to
retrieve the LONG column a chunk at a time. Even then, the procedure must read
through the contents of the LONG column sequentially. On the other hand, parts of
the LOB column can be obtained at random. This offers more flexibility in handling
LOB columns compared to LONG columns.
4. A LOB column can be included in partitioned tables as long as the LOB column is
not the partition key. LONG columns do not support that feature.
5. LOB columns can be distributed and replicated, but LONG columns cannot.
Oracle recommends that LONG columns be migrated to LOB columns since the
LONG data type will not be supported in the future. The function TO_LOB can be used
to convert a LONG data type column to a LOB data type column as part of an INSERT AS
SELECT operation.
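A hedged sketch of such a conversion appears below; the table and column names are hypothetical, and INVOICE_NEW is assumed to have been created with a CLOB column in place of the LONG column of INVOICE_OLD.
INSERT INTO invoice_new (order_id, order_text)
SELECT order_id, TO_LOB(order_text)
FROM invoice_old;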
Oracle 8i allows the DBB cache to be divided into three buffer pools:
• Keep pool,
• Recycle pool, and
• Default pool.
The underlying principle in assigning objects to a specific buffer pool is to reduce or
eliminate physical reads.
The keep buffer pool is used for storing data blocks of database objects that are needed to
stay in the pool for a long time. This pool is never flushed and consists of buffers that
need to be pinned in memory indefinitely. Objects in the keep pool are not removed from
memory, meaning that references to objects in the keep pool will not result in a physical
read. This pool should be used only for small tables that are frequently accessed and need
to stay in memory at all times.
The recycle buffer pool is used for storing data blocks of database objects that need
not stay in the pool for any length of time. These might include data blocks read in as part
of full table scans or blocks read in order to update a row, followed quickly by a commit.
The recycle pool is instantly flushed in order to reduce contention and waits in the pool
by leaving the LRU list empty at all times. This pool should be used for large, less
important tables that are accessed only infrequently.
The space left in the DBB cache after the keep and the recycle pools get their own
allocated space is assigned to the default pool. The DBB cache is the set-theoretic union
of the three pools. If the keep and the recycle pools are not set up for an instance via ini-
tialization parameters, the default pool spans the entire DBB cache.
The overall DBB cache storage and LRU latch allocation are set at instance startup time
by two initialization parameters, DB_BLOCK_BUFFERS and DB_BLOCK_LRU_
LATCHES. As noted in Section 6.10, a latch is an Oracle internal resource that governs
access to other resources. A user process must acquire a latch before it can access any
structure in the SGA such as the DBB cache. The LRU latch controls access to the DBB
cache by server processes acting on behalf of user processes.
The number of latches determined by the initialization parameter DB_BLOCK_
LRU_LATCHES for an instance must be proportional to the number of buffers in the DBB
cache determined by the initialization parameter DB_BLOCK_BUFFERS. The ratio
DB_BLOCK_BUFFERS / DB_BLOCK_LRU_LATCHES
must be ≥ 50. Otherwise, the instance will not start.
In order to set up the keep and recycle buffer pools we need the following two ini-
tialization parameters:
BUFFER_POOL_KEEP = (BUFFERS:n1, LRU_LATCHES:n2)
BUFFER_POOL_RECYCLE = (BUFFERS:n3, LRU_LATCHES:n4)
where n1 and n3 denote the numbers of buffers and n2 and n4 the numbers of LRU latches
assigned to the keep and the recycle pools, respectively.
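As an illustration, an init.ora file might contain the following entries; the values are assumptions chosen only to satisfy the constraints stated above, and the buffers and latches given to the keep and recycle pools are taken out of the totals set by DB_BLOCK_BUFFERS and DB_BLOCK_LRU_LATCHES, with the remainder going to the default pool.
DB_BLOCK_BUFFERS = 8000
DB_BLOCK_LRU_LATCHES = 8 # 8000 / 8 = 1000, which is >= 50
BUFFER_POOL_KEEP = (BUFFERS:2000, LRU_LATCHES:2)
BUFFER_POOL_RECYCLE = (BUFFERS:1000, LRU_LATCHES:2)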
A table is assigned to the keep or the recycle buffer pool via the command
ALTER TABLE table STORAGE (BUFFER_POOL KEEP);
or
ALTER TABLE table STORAGE (BUFFER_POOL RECYCLE);
Since each partition of a partitioned table can have its own STORAGE clause, each
partition can be assigned to different buffer pools, if necessary. A table that is not explic-
itly assigned to the keep or the recycle buffer pool is assigned to the default buffer pool.
Figure 10.13 shows that the table CUST_ORDER is assigned to the keep pool.
Note that when a table is assigned to a particular buffer pool, its data are not actually
loaded into the pool. The data remain as usual in the tablespace in which the table has
been created. But when the data of the table need to be fetched into the DBB cache for
processing, they are stored in the buffer pool to which the table has been assigned.
The performance information about multiple buffer pools can be found in the dynamic
performance view V$BUFFER_POOL_STATISTICS. The hit ratios for objects in the
buffer pools are determined by using this performance view. This view is created by run-
ning the script file catperf.sql located in the directory $ORACLE_HOME/rdbms/admin
in UNIX. It is quite possible that the DBB cache hit ratio (see Figures 6.3 and 6.4) for the
keep pool approaches but does not achieve the value 100.
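A hedged sketch of such a hit ratio computation follows; the formula mirrors the DBB cache hit ratio of Chapter 6, and the column names come from the standard definition of V$BUFFER_POOL_STATISTICS.
SELECT name,
1 - (physical_reads / (db_block_gets + consistent_gets)) AS hit_ratio
FROM v$buffer_pool_statistics;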
There are two ways by which a stored outline can be created for a query, as described below.
The CREATE OUTLINE statement creates a stored outline, which contains a set of
attributes that the optimizer uses to create an execution plan. The data dictionary views,
DBA_OUTLINES and DBA_OUTLINE_HINTS, provide information about stored out-
lines. The information about the stored outline created above is recorded in the data dic-
tionary view DBA_OUTLINES as shown in Figure 10.16.
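The outline INVOICE_LISTING reported in Figures 10.16 and 10.17 could have been created with a statement of the following general form; the category name and the query text are assumptions made only for illustration.
CREATE OR REPLACE OUTLINE invoice_listing
FOR CATEGORY development
ON
SELECT order_id, inv_date, acct_name FROM invoice ORDER BY 1;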
Figure 10.17 contains the corresponding information from the data dictionary view
DBA_OUTLINE_HINTS.
INVOICE_LISTING SEKHAR 1
3 0
ORDERED
INVOICE_LISTING SEKHAR 1
3 0
NO_FACT(INVOICE_LISTING)
INVOICE_LISTING SEKHAR 1
3 1
FULL(INVOICE_LISTING)
INVOICE_LISTING SEKHAR 1
2 0
NOREWRITE
INVOICE_LISTING SEKHAR 1
1 0
NOREWRITE
6 rows selected.
After creating the outlines, we need to specify that Oracle should use them. By default, Oracle
does not use stored outlines. Oracle can be forced to use stored outlines by running the following command:
ALTER SESSION SET USE_STORED_OUTLINES = value;
Here value can be TRUE or FALSE or a predefined category name. If the value is
TRUE, then Oracle uses the outlines stored under the DEFAULT category. If the value is
a category name, then the outlines in that category are used. If no outline exists in that
category for a query, Oracle 8i checks the DEFAULT category for an outline. If none is
found, Oracle 8i simply generates an execution plan for the query and uses that plan. If
the value is FALSE, then Oracle does not use the stored outlines.
Oracle creates a user OUTLN with DBA privilege for managing the execution plan sta-
bility with stored outlines. The directory $ORACLE_HOME/rdbms/admin in UNIX
contains the script catproc.sql that calls two scripts, dbmsol.sql and prvtol.plb. The script
dbmsol.sql creates the package OUTLN_PKG and the script prvtol.plb creates the body
of OUTLN_PKG. Using the package OUTLN_PKG the user OUTLN centrally manages
the stored outlines and their outline categories. This user is created automatically during
the installation of Oracle 8i. There are other tables (base tables), indices, grants, and
synonyms related to this package. In the case of upgrade, it is necessary to run the In-
staller again or the manual upgrade script c0800050.sql located in the directory
$ORACLE_HOME/rdbms/admin in UNIX.
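As a hedged example of central management through this package, the following command might move all stored outlines from one category to another; the category names are hypothetical, and the exact procedure signatures should be checked in the package specification created by dbmsol.sql.
EXECUTE OUTLN_PKG.UPDATE_BY_CAT ('DEVELOPMENT', 'PRODUCTION');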
Figure 10.18 shows the attributes of OUTLN as a user.
The tablespace SYSTEM is assigned both as the default and the temporary tablespace
for OUTLN. But one can change one or both to other tablespaces, if desired. Figure 10.19
shows that both tablespaces have been changed for the user OUTLN.
Function-Based Index
If a query uses an indexed column that is modified by an Oracle built-in function (e.g.,
UPPER, ROUND, SUM, etc.) or by a computational formula, then the query optimizer
performs a full table scan instead of an indexed search. For example, let us suppose that a
table INVOICE has an index on a column named ACCT_NAME. Then, the following
query will perform an indexed search on ACCT_NAME.
SELECT ORDER_ID, INV_DATE, DELETE_FLAG FROM INVOICE
WHERE ACCT_NAME = 'LITTLE ROCK CORP';
But when we modify ACCT_NAME by a function such as UPPER, a full table scan will
be made of the INVOICE table. Thus, the query below will not use the index on
ACCT_NAME:
SELECT ORDER_ID, INV_DATE, ACCT_NAME FROM INVOICE
WHERE UPPER (ACCT_NAME) = 'LITTLE ROCK CORP';
Similarly, the query below will perform a full table scan due to the function SUM:
SELECT SUM (INVOICE_AMT) FROM INVOICE;
Oracle 8i introduced the function-based index whereby an index can be created on
one or more columns that are modified by a computational formula or by a built-in Ora-
cle function. This improves the query performance by making it possible to perform an
indexed search on column(s) that have been modified.
To enable the use of function-based indices, we must issue the following two com-
mands at the session level.
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY=TRUSTED;
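With these settings in place, a function-based index on the modified column can be created. A minimal sketch, in which the index name is assumed, is:
CREATE INDEX acct_name_upper_idx ON INVOICE (UPPER(ACCT_NAME))
TABLESPACE perf_data;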
Figure 10.20 contains a session transcript showing that, with only a B*-tree index on
ACCT_NAME, the query performs a full table scan of INVOICE, whereas a function-based
index on UPPER (ACCT_NAME) allows an indexed search.
SESSION TRANSCRIPT:
SQL> create index ACCT_NAME_idx on INVOICE (ACCT_NAME)
2 tablespace perf_data;
Index created.
SQL>
Wrote file afiedt.buf
1 explain plan
2 set statement_id = 'TEST_QUERY_20'
3 for
4 select ORDER_ID, INV_DATE, ACCT_NAME from INVOICE
5 where
6 upper (ACCT_NAME) = 'LITTLE ROCK CORP'
7* order by 1
SQL> /
Explained.
SQL> SELECT DECODE(ID,0,'',
2 LPAD(' ',2*(LEVEL-1))||LEVEL||'.'||POSITION)||' '||
3 OPERATION||' '||OPTIONS||' '||OBJECT_NAME||' '||
4 OBJECT_TYPE||' '||
Bitmap indices can be function based. Function-based indices can also be partitioned.
Descending Index
The data in a B*-tree index are stored in ascending order, ordered from the lowest col-
umn value to the highest. But in Oracle 8i we can store the data in a B*-tree index in
descending order by creating an index on column(s) and specifying DESC as the order in
which the indexed data are stored. This feature becomes very useful in queries where
sorting operations are mixed, some ascending and some descending. For example, let us
consider the following table.
STUDENT (Student_id, Last_Name, First_Name, Grade, …)
We issue the following query on STUDENT.
SELECT LAST_NAME, FIRST_NAME, GRADE FROM STUDENT
ORDER BY LAST_NAME, FIRST_NAME, GRADE DESC;
If the STUDENT table is large, then without descending indices Oracle may need a
large amount of sort space to retrieve the indexed data for LAST_NAME and FIRST_
NAME in the ascending sort order and the GRADE data in the descending order. But if
we create a descending index on GRADE, then its B*-tree index will store the indexed
data in descending order. As a result, a large sort space will not be needed. Figure 10.21
shows the creation of a descending index on the column STUDENT.GRADE.
It is possible to create a concatenated index on multiple columns with mixed sort or-
der, i.e., some columns in ascending order and the rest in descending order. Figure 10.22
shows an example of such an index.
Finally, a function-based index can be combined with the descending index feature to
create a function-based descending index.
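Minimal sketches of what the definitions in Figures 10.21 and 10.22 might look like are given below; the index names are assumptions.
CREATE INDEX grade_desc_idx ON STUDENT (grade DESC);
CREATE INDEX name_grade_idx ON STUDENT (last_name ASC, first_name ASC, grade DESC);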
SQL>
Wrote file afiedt.buf
1 create materialized view invest_option_view
2 tablespace perf_data
3 storage (initial 100K next 50K pctincrease 0)
4 refresh complete
5 start with SYSDATE
6 next SYSDATE + 7
7 as
8* select * from invest_option
SQL> /
Snapshot created.
The optimizer hints /*+ REWRITE */ and /*+ NOREWRITE */ can be used to specify whether
the query rewrite option is applied to individual SELECT statements.
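For query rewrite to take effect, the materialized view must be rewrite enabled and the session (or instance) must allow rewrites. A hedged sketch using the view created above is:
ALTER MATERIALIZED VIEW invest_option_view ENABLE QUERY REWRITE;
ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;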
Several dictionary views are available for materialized view information. They are
listed below.
ALL_REFRESH_DEPENDENCIES List of tables used by materialized views.
DBA_MVIEW_AGGREGATES Information about grouping functions used
by materialized views.
DBA_MVIEW_ANALYSIS Information about materialized views sup-
porting query rewrites.
DBA_MVIEW_DETAIL_RELATIONS List of all objects referenced in a material-
ized view.
DBA_MVIEW_JOINS List of columns joined from base tables in
materialized views.
DBA_MVIEW_KEYS More information about the relationships
between objects identified in DBA_
MVIEW_DETAIL_RELATIONS.
(c) Oracle continues to rebuild the index by populating the index with data from the ta-
ble.
(d) Oracle also creates an index-organized copy of the table called a journal table. The
journal table stores data changes made by users while the index is being rebuilt.
(e) When the rebuild using the data from the original table is complete, Oracle compares
the data in the rebuilt index with changes recorded in the journal table, if any. Each
change is then merged into the index, and the row is simultaneously deleted from the
journal table as long as no user is making changes to that row.
(f) If it takes too many iterations through the journal table to clean out all the data, Ora-
cle places an exclusive lock on the journal table until the cleaning operation is com-
plete. This is the only time other than at the very beginning of the build or rebuild
operation when other users cannot change data of the table.
Figure 10.24 shows the command to rebuild an index online.
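The command presumably has the following general form, shown here with the ACCT_NAME_IDX index of Section 10.9 as an assumed example:
ALTER INDEX acct_name_idx REBUILD ONLINE;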
The above process applies primarily to B*-tree indices and their variants such as
function-based, descending, and reverse key indices, as well as to partitioned indices. It does
not apply to bitmap indices, cluster indices, or secondary indices on index-organized tables.
The DBMS_STATS package allows the DBA to view and modify optimizer statistics gathered for database objects. It also allows the
gathering of some statistics in parallel. The statistics can reside in two different locations:
• Specific data dictionary views such as DBA_TABLES, DBA_INDEXES, etc.;
• A table created in the user's schema for this purpose.
However, the cost-based optimizer uses only the statistics stored in the data dictionary
views. The statistics stored in a user's table have no impact on the optimizer.
The package is divided into three main sections, each section consisting of a set of
procedures.
Section 1: Procedures that gather certain classes of optimizer statistics and
have either improved or equivalent performance characteristics
compared to the ANALYZE command.
The following procedures belong to this section.
• GATHER_DATABASE_STATS—Collects statistics for all database objects,
• GATHER_SCHEMA_STATS—Collects statistics for all objects owned by a par-
ticular user;
• GATHER_INDEX_STATS—Collects statistics for a specified index;
• GATHER_TABLE_STATS—Collects statistics for a specified table.
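A hedged example of the schema-level procedure follows; the schema name is reused from the earlier transcripts, and CASCADE => TRUE requests index statistics as well.
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS (ownname => 'SEKHAR', cascade => TRUE);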
Section 2: Procedures that enable the storage, retrieval, and removal of the
statistics related to individual columns, indices, and tables, and re-
moval of statistics related to a schema and an entire database.
The following procedures belong to this section.
• SET_COLUMN_STATS—Sets column-related statistics;
• SET_INDEX_STATS— Sets index-related statistics;
• SET_TABLE_STATS— Sets table-related statistics;
• GET_COLUMN_STATS—Gets column-related statistics;
• GET_INDEX_STATS—Gets index-related statistics;
• GET_TABLE_STATS—Gets table-related statistics;
• DELETE_COLUMN_STATS—Deletes column-related statistics;
• DELETE_INDEX_STATS—Deletes index-related statistics;
• DELETE_TABLE_STATS—Deletes table-related statistics;
• DELETE_SCHEMA_STATS—Deletes statistics for a schema;
• DELETE_DATABASE_STATS—Deletes statistics for an entire database.
Section 3: Procedures that create and drop the statistics table and transfer
statistics between the data dictionary views and the statistics table.
The following procedures belong to this section.
• CREATE_STAT_TABLE—Creates the statistics table;
• DROP_STAT_TABLE—Drops the statistics table;
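As a hedged sketch of the Section 3 procedures, a private statistics table might be created and later dropped as follows; the table name and tablespace are assumptions.
EXECUTE DBMS_STATS.CREATE_STAT_TABLE (ownname => 'SEKHAR', stattab => 'MY_STATS', tblspace => 'PERF_DATA');
EXECUTE DBMS_STATS.DROP_STAT_TABLE (ownname => 'SEKHAR', stattab => 'MY_STATS');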
Oracle 9i offers new capabilities that extend the existing features of Oracle 8i aimed at the tradi-
tional relational database environment. In this section we discuss a set of features of Ora-
cle 9i relevant to the performance tuning and optimization issues.
Tablespace Management
When a tablespace is created via the CREATE TABLESPACE command, the clause
SEGMENT SPACE MANAGEMENT offers two options: MANUAL and AUTO. The option
MANUAL, which is the default, manages free space within segments with free lists, as in
Oracle 8i. The option AUTO manages free space within segments with bitmaps and is available
only for locally managed tablespaces (see Section 10.5). Whether a tablespace is dictionary
managed or locally managed is controlled by the EXTENT MANAGEMENT clause, and its
default depends on the setting of the initialization parameter COMPATIBLE, as noted below.
• If COMPATIBLE is set to 8.1.7 or lower, then the tablespace is created as dictionary
managed.
• If COMPATIBLE is set to 9.0.0.0 or higher, then the tablespace is created as locally
managed. A permanent locally managed tablespace cannot be assigned as a user's
temporary tablespace.
Flashback Query
This feature allows the user to retrieve data via SQL query from a point of time in the
past. The user first needs to specify the date and time for which he or she wants to query
the data. Subsequently, any SELECT statement issued by the user as the flashback query
operates on the data as they existed at the specified date and time. The underlying
mechanism is known as Oracle's multiversion read consistency capability, which is
achieved by restoring data by applying undo operations as needed. The DBA needs to
specify how long undo should be kept in the database. When a flashback query is exe-
cuted, a snapshot of the data is created via undo operations on the data. The DBA must
size the rollback segments large enough to contain the undo information for executing a
flashback query.
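A minimal sketch of a flashback query using the DBMS_FLASHBACK package appears below; the table name and timestamp are hypothetical, and the enable and disable calls should be checked against the Oracle 9i documentation.
EXECUTE DBMS_FLASHBACK.ENABLE_AT_TIME (TO_TIMESTAMP ('02-JAN-2002 09:00:00', 'DD-MON-YYYY HH24:MI:SS'));
SELECT * FROM invoice WHERE account_id = 1001; -- runs against the data as of the specified time
EXECUTE DBMS_FLASHBACK.DISABLE;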
The performance of a flashback query as measured by its response time depends on
two factors:
• Amount of data being queried and retrieved, and
• Number of changes made to the data between the current time and the user specified
flashback time.
There are several restrictions on the use of flashback queries, as noted below.
• They cannot be issued by the user SYS. In order to use such queries on data diction-
ary tables owned by SYS, it is necessary to grant the SELECT privilege on these ta-
bles to a non-SYS user who then issues these queries.
• These queries do not work with the QUERY REWRITE option of the materialized
views (see Section 10.10). This option must be disabled when using the flashback
queries.
When a query is executed for a time in the past, a snapshot of the data as they existed
at that time is recreated using the undo operation on the data saved in the rollback seg-
ments. For every piece of data that has changed since the time requested by the flashback
query, the corresponding undo data need to be retrieved and compiled. Hence the per-
formance of the flashback query can degrade significantly as the volume of data which
need to be recreated increases. As a result, a flashback query works best when it is used
to select a small set of data, preferably using indices. If a flashback query has to perform
a full table scan, its performance will depend on the amount of DML activities performed
on that table between the present time and the time of the data that the flashback query is
retrieving. Sometimes parallel query slaves can be used to improve the performance of
such full table scans.
It is necessary to set up the database for using the flashback query feature. The
init.ora file contains four “undo” initialization parameters with the following default
values:
NAME VALUE
-------------------- --------
undo_management MANUAL
undo_retention 900
undo_suppress_errors FALSE
undo_tablespace UNDOTBS
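To use automatic undo management for flashback queries, these parameters would typically be changed along the following lines; the retention value, given in seconds, is only an assumed example.
undo_management = AUTO
undo_retention = 10800 # keep undo for three hours
undo_tablespace = UNDOTBS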
FIRST_ROWS Hint
The FIRST_ROWS (n) hint is an improvement over the FIRST_ROWS hint (see Section
9.3.1) and is fully cost-based. Oracle recommends that this new hint be used in place of
the previous FIRST_ROWS hint which might give suboptimal results for "first n" queries
with a very selective CONTAINS clause. The FIRST_ROWS (n) hint is used like the
FIRST_ROWS hint in cases where we want the shortest response time. For example, to
obtain the first eight rows in the shortest possible time, use the new hint as follows.
select /*+ FIRST_ROWS(8) */ account_id from INVOICE
where contains (account_name, 'Little Rock Corp') > 0
order by invoice_date desc;
Dynamic SGA
The System Global Area (SGA) consists of several memory-resident data structures that
constitute an Oracle instance (see Section 4.2.1). The size of the SGA is static in the
sense that it is allocated at the start of an instance and cannot be changed dynamically
while the instance is running. Oracle 9i offers a new feature called the dynamic SGA that
allows a DBA to modify the SGA size dynamically. This provides an SGA that can grow
and shrink in response to a DBA command depending on the operational performance
needs of the database. The Oracle server can also modify its use of physical address space
in response to the operating system's need for physical memory.
Under the dynamic SGA infrastructure the DBA can
• Size the buffer cache, the shared pool, and the large pool without having to shut
down the instance; and
• Set limits at runtime on how much physical memory will be used for the SGA. The
instance can be started underconfigured and will use as much memory as the operating
system gives it.
The dynamic SGA creates a new unit of memory allocation called a granule, which is
a unit of contiguous virtual memory that can be allocated dynamically while the instance
is running. The size of a granule is calculated as follows depending on the estimated total
SGA size.
granule = 4 MB if SGA size < 128 MB,
granule = 16 MB otherwise
The buffer cache, the shared pool, and the large pool are allowed to grow and shrink
depending on granule boundaries. When the instance is started, Oracle allocates the gran-
ule entries, one for each granule to support an address space of size SGA_MAX_SIZE
bytes. Subsequently, each component acquires as many granules as it requires. The
minimum SGA size is three granules resulting from an allocation of one granule each to
the fixed SGA including the redo buffers, the buffer cache, and the shared pool.
The DBA can alter the granules allocated to components by using the ALTER
SYSTEM command. This command interchanges free granules among the three compo-
nents of the SGA, namely, the buffer cache, the shared pool, and the large pool. But the
granules are never automatically deallocated from one component of the SGA and allo-
cated to another. The granules are rounded up to the nearest default granule size (4 or 16
MB). Adding a number of granules to a component with the ALTER SYSTEM command
succeeds only if Oracle has sufficient free granules to satisfy the request. Oracle cannot
automatically allocate granules from one component to another. Instead, the DBA must
ensure that the instance has enough free granules to satisfy the increase of a component's
granule use. If the current size of the SGA memory is less than SGA_MAX_SIZE, then
Oracle can allocate more granules until the SGA size reaches the limit of
SGA_MAX_SIZE. The server (foreground) process that issues the ALTER SYSTEM command
reserves a set of granules for the corresponding SGA component and then hands off completion
of the operation to a background process. The background process completes
the operation by taking the reserved granules and adding them to the component's granule
list. This is referred to as growing the SGA memory area of a component.
SGA memory is tracked in granules by SGA components. One can monitor the
tracking via the V$BUFFER_POOL view.
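A hedged sketch of resizing two SGA components at runtime under Oracle 9i is shown below; the sizes are illustrative, and each request is rounded up to whole granules.
ALTER SYSTEM SET db_cache_size = 64M;
ALTER SYSTEM SET shared_pool_size = 48M;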
Miscellaneous Features
• Index Coalesce: Defragments free space in B*-tree index leaf blocks while the asso-
ciated table remains online for user access.
• Resumable Space Allocation: If a large operation such as a batch update or bulk data
load encounters too many “out of space” errors, the DBA can suspend the operation
temporarily, fix the problem by allocating additional datafiles as needed, and then re-
start the operation from the point where it was suspended. This can be done without
interrupting the normal database operation.
• Caching Query Execution Plans: These plans are cached in the shared pool so that
the DBA can investigate any reported performance problem without reexecuting the
offending queries.
• Basic Replication: Provides bidirectional replication with automated conflict detec-
tion and resolution; supports configurations that include a single updatable master
site with multiple updatable or read-only snapshot sites.
• Unused Index Maintenance: Allows the database server to track unused indices.
• DB_CACHE_SIZE: Specifies the size of the DEFAULT buffer pool for buffers cre-
ated with the block size determined by the DB_BLOCK_SIZE parameter. Oracle
recommends that DB_CACHE_SIZE be used to size the DEFAULT buffer pool in-
stead of DB_BLOCK_BUFFERS.
Key Words
buffer pool
buffer pool sizing
category
composite partitioning
DBMS_SPACE_ADMIN
DBMS_STATS
default pool
descending index
dictionary managed tablespace
dynamic SGA
flashback query
function-based index
granule
hash partitioning
index coalesce
inline view
journal table
keep pool
LOB data type
locally managed tablespace
locally managed tablespace, autoallocate
locally managed tablespace, uniform
LONG data type
materialized view
materialized view, QUERY REWRITE
materialized view, REFRESH clause
OUTLN user
OUTLN_PKG
rebuilt index
recycle pool
snapshot replication
STORE IN clause
stored outline
subpartition
top_N query
undo initialization parameters
The chapter covers a set of topics embodying new features in Oracle 8i and Oracle 9i
related primarily to database performance tuning and optimization. The best source of in-
formation on these topics is Oracle MetaLink which contains numerous papers dealing
with actual issues and case studies raised by Oracle DBAs and developers related to these
features. I have drawn heavily from many of these papers published in the MetaLink. The
site can be accessed at the URL http://metalink.oracle.com and searched for the relevant
topics. In addition, the references cited above contain more information on some of the
topics.
References [6] and [7] constitute the primary source of the contents of Section 10.2.
Couchman [1, Chapters 25–28] discusses a large set of topics dealing with features that
are unique to Oracle 8i, not all of which, however, pertain to database performance tun-
ing and optimization. Greenwald et al. [2] offer a comprehensive coverage of Oracle da-
tabase internals. In particular, [2, Chapters 2–4 and 6] addresses Oracle 9i features.
Loney and Koch [3] provide an in-depth discussion of snapshots and materialized views
in Chapter 23, and LOB data types in Chapter 30. Loney and Theriault [4, pp. 175–177]
provide some additional insights for creating and altering stored outlines. Niemiec [5,
Chapter 13] provides a few topics on Oracle 8i features. Scherer et al. [10, Chapter 5]
contain a comprehensive discussion of the DBMS_STATS package and the process of
automatic statistics gathering for cost-based optimization. References [8] and [9] consti-
tute the primary source of the contents of Section 10.14. As of June 9, 2001 Oracle has
published a list of open bugs not all of which, however, pertain to performance tuning
and optimization issues. See [8, § 43] for full details.
Part 3
Contemporary Issues
Part 3 consists of two chapters that discuss two contemporary issues, namely, the tuning
principles of data warehouses and of Web-based databases. The tuning principles of an
OLTP database are substantially the same as those for a data warehouse or a Web-based
database. In this part we capture only those tools and techniques that are unique to these
two special types of databases.
Chapter 11 starts with a discussion of the design principles of a data warehouse and
identifies the structural differences between a transactional database and a data ware-
house. It then introduces the data loading principles for a data warehouse. The chapter
closes with a discussion of the tuning principles for a data warehouse at the internal and
external levels.
Chapter 12 starts with an introduction to the three-tier and n-tier architectures of cli-
ent server applications with emphasis on Web-based applications. The Oracle product
OAS is discussed in detail and an overview is offered of the new product iAS. The chap-
ter closes with a discussion of the tuning principles for a Web-based database at the in-
ternal and external levels.
11
Tuning the Data Warehouse at All Levels
Outline
11.1 Advent of Data Warehouse
11.2 Features of Data Warehouse
11.3 Design Issues of Data Warehouse
11.4 Structure of Data Warehouse
11.5 Proliferation from Data Warehouse
11.6 Metadata
11.7 Implementation and Internal Level
11.8 Data Loading in Warehouse
11.9 Query Processing and Optimization
Key Words
References and Further Reading
Exercises
• Data are collected from various sources and integrated into the warehouse structure.
As a result, inconsistencies among data sources are resolved before the data are
loaded into the warehouse.
• Data are less detailed and are often summarized through rollups.
• Data are nonvolatile since they are loaded in bulk from the source feeds and snap-
shots of transactional databases. End users do not change the data through online
transactions.
The development life cycle of a data warehouse is the opposite of that of transactional
systems. Somewhat dramatically, Inmon [6, p. 24] calls it CLDS, which is SDLC (system
development life cycle) read backwards. While SDLC is driven by requirements, CLDS
is data driven and works as follows.
• Start with the data in transactional databases and legacy file systems.
• Combine multiple data sources through extraction programs, rollups, and denormali-
zation.
• Generate reports to analyze the data collected so far.
• Identify user requirements and decide on the final data models needed.
The hardware usage of a data warehouse is different from that of transactional sys-
tems. There are peaks and valleys in transactional processing, but ultimately a fairly pre-
dictable pattern emerges comprising batch jobs and interactive user transactions. For a
data warehouse, however, the usage pattern is ad hoc and mostly unpredictable except for
scheduled batch jobs that populate the data warehouse and prescheduled reports that run
at designated times. Consequently, a data warehouse and a transactional database should
not run on the same machine. A server can be optimized for a transactional application or
for a data warehouse, but not for both.
Granularity
Granularity refers to the level of detail, achieved through aggregation, that is captured by the
data elements in the warehouse. A high level of granularity means a low level of detail, which is the char-
acteristic of the warehouse. On the other hand, transactional databases exhibit a low level
of granularity, i.e., a high level of detail that is needed for day-to-day operations. As a re-
sult, an insurance warehouse can provide answers to questions such as: What is the aver-
age dollar value of auto insurance claims during the last 12 months? But it cannot answer
the question: What is the claim amount settled for client James Arnold? Often a middle
ground is followed based on time. Thus, the insurance warehouse may retain data with
low granularity for a sliding window of 30 days, say, while retaining only high-granularity
(summarized) data for periods older than 30 days. This is called multiple levels of granularity.
Determining the correct level of granularity poses a unique challenge for the design-
ers of data warehouses. Usually this is accomplished through stepwise refinement as fol-
lows.
• Build a small subset of the target warehouse and generate reports.
• Allow users to use these reports and collect their feedback.
• Modify the design to incorporate the feedback and adjust the levels of granularity.
• Ask users to use reports from the modified design and collect their feedback.
• Repeat the above incremental development until you achieve the correct level(s) of
granularity.
Partitioning
Partitioning represents the breaking up of large blocks of detailed data into smaller
physical units. Data can be partitioned by a combination of one or more of such attributes
as date, line of business, geographical region, organizational units, etc. If granularity and
partitioning are handled properly, the other design issues become easy to manage.
Denormalization
Denormalization is the mechanism by which data in the 3NF tables in transactional data-
bases are combined for inclusion in warehouse tables. As a result, both primitive and de-
rived data elements appear in a data warehouse. This situation introduces transitive de-
pendency in data warehouse tables. For example, a CUSTOMER ORDER table may
contain the columns Customer Name, Address, . . . , Monthly Order (appearing 12 times,
once for each month), and Annual Average Monthly Order. The last column is derived from
the 12 primitive columns, the Monthly Orders for 12 months. As such, the CUSTOMER
ORDER table is not in 3NF. It is quite common to have tables in 2NF or even in 1NF in a
data warehouse.
Various methods are used to denormalize data. Aggregation is one technique that has
been discussed above under granularity. A second strategy is to introduce data redun-
dancy in tables. For example, the two-character STATE CODE may be used in multiple
tables in a warehouse. Rather than creating a separate lookup table with STATE CODE
and STATE NAME and then enforcing a join operation each time the STATE NAME is
needed, we include the two columns, STATE CODE and STATE NAME, together in
every table where STATE CODE appears. This brings in data redundancy, but avoids
multiple joins.
We know that data redundancy causes multiple update problems to keep data syn-
chronized in a transactional database. Since tables in a data warehouse are updated only
through batch jobs but never via online user transactions, such multiple update problems
are handled in a warehouse through appropriate processing logic in the batch update
programs.
Wrinkle of Time
There is a time lag between record updates in a transactional database and those in a data
warehouse. This time lag is called the wrinkle of time. A wrinkle of 24 hours is normal.
The wrinkle implies that transactional data and warehouse data are loosely coupled. Data
in the former must settle down before they are transferred to the latter. This does not,
however, affect the usability of the warehouse data, because the warehouse does not need
instantaneous updates as does a transactional database.
industrywide. Figure 11.1 shows the structure of a star schema consisting of n fact tables
supported collectively by m dimension tables.
Each dimension table represents an attribute of a fact table. The queries supported by
a warehouse involve several dimensions related to a fact. Hence they are called multidi-
mensional and a data warehouse is often called a multidimensional database (MDDB).
The process of extracting information about the various dimensions of a fact is called di-
mensional analysis. For example, the following query may be posed to an insurance
claims data warehouse.
What is the average claim amount for auto insurance during the calendar year 1999
for the six New England states?
FIGURE 11.1: Structure of a star schema with fact tables FACT_1, . . . , FACT_N supported collectively by dimension tables DMN_1, . . . , DMN_M
The query involves three dimensions: insurance type (auto), time (1999), and region
(New England states). A fact table contains the basic summarized data such as claimant
name, total monthly claim, insurance type, month, state, etc. The four dimension tables,
insurance type, month, year, and state, provide the lookup data for the query.
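A hedged sketch of the corresponding star query follows; all table and column names are hypothetical.
SELECT AVG (f.claim_amount)
FROM claim_fact f, ins_type_dim t, month_dim m, year_dim y, state_dim s
WHERE f.ins_type_id = t.ins_type_id
AND f.month_id = m.month_id
AND f.year_id = y.year_id
AND f.state_id = s.state_id
AND t.ins_type = 'AUTO'
AND y.year_value = 1999
AND s.state_code IN ('CT', 'MA', 'ME', 'NH', 'RI', 'VT');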
11.5.1 Datamart
The data warehouse supports the DSS requirements of a large organization and handles
several different but related subject areas. Its fact tables may range from four to six in
number along with an even larger number of dimension tables. For example, an insurance
data warehouse may deal with three separate subject areas such as customer transactions,
insurance claims, and new market venture. As discussed in Section 11.2, a data ware-
house is built via CLDS in stepwise refinements. Thus, the warehouse may start with the
customer transactions component at first, then build the insurance claims piece, and fi-
nally complete the warehouse with the new market venture segment. Each component is
called a datamart. Thus, a datamart can be regarded as a subset of a data warehouse
dealing with a single subject area. Several datamarts focusing on several related subject
areas comprise a data warehouse. The above example shows that the insurance data
warehouse consists of three related datamarts. However, we must keep in mind that each
time a new datamart is added to the existing one(s), some data modeling and restructuring
will be needed.
The real appeal of a datamart over a data warehouse lies in two factors.
• Cost: A datamart is less costly to build than a data warehouse. Corey and Abbey [4,
p. 193] report that a data warehouse comes with a price tag of three to five million
dollars and takes up to three years for delivery.
• Ease of Use: Tools can handle datamarts more easily due to their smaller size, allowing
end users to retrieve data and produce reports more easily and quickly.
Datamarts are really the stepping stones to a data warehouse. However, in the rest of
the chapter we use the term data warehouse collectively to refer to a datamart and a data
warehouse.
In collaboration with many other vendors Oracle established a consortium called
Warehouse Technology Initiative (WTI). The members of WTI offer a variety of tools
that augment the functionality of Oracle warehousing and help customers to build a ro-
bust environment for their DSSs.
Users of a data warehouse access it to run ad hoc queries and generate predesigned re-
ports. Except for the DBAs and developers, the rest of the user community does not ac-
cess it via the command line using the SQL prompt. Instead a GUI front-end is provided
for the end users that allows them to formulate their queries or select reports from some
available lists under various categories. The underlying software converts the user re-
quests into appropriate SQL code and executes them or invokes some report executables
to return the results to the users. The goal of the software is to search the data warehouse,
drill down as far as necessary to retrieve and assemble the requested information, and
then display it for the users. An integral part of this search process is to understand the
relationships among different pieces of data, to discern trends or patterns among the data
elements, and to bring it out for the benefit of the users. Data mining is defined as the
discovery and pattern recognition process for data elements in a data warehouse.
The fact and dimension tables of a data warehouse contain information about a related
set of data belonging to a subject area. The discovery process of data mining explores the
hidden underlying relationships among them that are not clearly visible to end users. For
example, consider two fact tables, Customer Transaction (CT) and Insurance Claim (IC)
in an insurance data warehouse. IC may contain a relatively large volume of auto insur-
ance claims tied to a small set of zip codes. On exploring CT one may find that these zip
codes represent three cities each of which contains a large number of colleges and univer-
sities. Further data mining may lead to the discovery of the relationship between the high
number of auto accidents in these cities and the ages of the young students attending the
educational institutions there.
Pattern recognition involves the correlation between two sets of data such that a
change in one triggers a corresponding change in the other, either in the same direction or
in the opposite. The statistical theory of correlation and chi-square tests have been used to
handle pattern recognition in many areas. The process works in two steps. First, a hy-
pothesis is formulated claiming the existence of a pattern. Next, data are collected over a
period of time to validate or refute the hypothesis. Depending on the outcome we confirm
or deny the existence of the pattern. If a pattern exists, additional reports may be gener-
ated utilizing it for the benefit of the organization. For example, a company knows that
by advertising a product it can increase its sale. The data warehouse of the company
contains volumes of data supporting this hypothesis. However, the data on the surface
may not indicate that consumers with a specific level of education and a specific income
range can be targeted by advertising only in certain specific media. Data mining can un-
cover this pattern. Although using software for data mining can indeed benefit a com-
pany, the expense incurred in such use has to be justified through a cost-benefit analysis.
Refer to any college-level statistics book for further details on hypothesis testing, chi-
square test, and correlation and other related statistical methods.
11.6 Metadata
Metadata is described as a database about a database. Often the terms meta database or
simply metabase are used. It consists of a set of text descriptions of all the columns, ta-
bles, views, procedures, etc. used in a database, the schema diagrams at the logical level,
business rules implemented through various data validation algorithms such as declara-
tive integrity and triggers, sizing algorithms used for the physical database, etc. Section
1.4.4 describes the metadata for an OLTP relational database. Oracle's data dictionary
views and the V$ dynamic performance views are two examples of built-in metadata.
They collect realtime data and statistics about all the database objects, the components of
the SGA, background processes, and user transactions among others. The metadata for a
data warehouse contains all of the above data needed for an OLTP database and, in addi-
tion, includes data about all the mappings needed for extracting data from the raw data
sources and sending them to the target data tables in the warehouse.
The metadata is created, maintained, and processed through CASE tools such as Ora-
cle Designer 2000, ERwin, Power Designer, etc. Any such software must have the ability
to allow the users to run ad hoc queries and generate predesigned reports including dia-
grams. The users fall into two categories, technical and nontechnical. Technical users in-
clude DBAs and developers. Nontechnical users include business analysts.
Initialization Parameters
As discussed in Chapter 4, the initialization parameters control the settings of an instance.
A database is started up by identifying its parameter file (pfile) via the command: startup
pfile='pfile_path'. The parameters along with the guidelines listed below pertain to both
internal and external levels of a data warehouse.
• db_block_buffers: It represents the number of data blocks cached in the memory (see
Sections 4.2.1 and 6.3). Set this parameter so that the data block buffers occupy roughly 30% of the total memory.
• db_file_multiblock_read_count: It represents the number of data blocks that are read
in one I/O operation during a full table scan. Since I/O is a major activity in a data
warehouse, set this parameter between 16 and 32. For a heavily accessed warehouse,
set the value to the high end of the range.
• db_files: It represents the maximum number of database files that can remain open
concurrently. This number should be around 1,020 for a warehouse with partitioning
and with separate tablespaces for its fact and dimension tables and their respective
indices.
• log_buffer: It represents the size of the redo log buffer cache in bytes (see Sections
4.2.1 and 6.4). Set it to a large value such as 4 MB to handle large transactions dur-
ing warehouse updates. This will reduce the number of log switches and hence the number
of I/Os.
• open_cursors: It represents the maximum number of cursors that can remain open
simultaneously. Set this parameter to a value between 400 and 600.
• processes: It represents the maximum number of concurrent processes that can con-
nect to the Oracle server. Set this parameter between 200 and 300. For the UNIX op-
erating system, the number of semaphores must be set much higher than this pa-
rameter (see Section 6.7).
• rollback_segments: It specifies one or more rollback segments by name that are as-
signed to an instance. Due to the large volume of updates during the data load phase
of the warehouse, allocate six to eight segments of size 100 MB each with
MINEXTENTS = 20.
• shared_pool_size: It represents the size in bytes of the shared SQL pool in SGA (see
Sections 4.2.1 and 6.5). Set its value to 20 MB or more.
• sort_area_size: It represents in bytes the amount of memory allocated to do sorts for
user requested operations. Since warehouse users often perform large sorts, set its
value between 2 and 4 MB.
• star_transformation_enabled: If it is set to TRUE, the cost-based optimizer uses the
star transformation for processing a star query via the best execution plan.
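As a quick check of how an existing instance compares with these guidelines, the current settings can be read from the V$PARAMETER view; the following is a hedged sketch that can be edited to include whichever parameters are of interest.

SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_buffers', 'db_file_multiblock_read_count',
                'db_files', 'log_buffer', 'open_cursors', 'processes',
                'rollback_segments', 'shared_pool_size', 'sort_area_size',
                'star_transformation_enabled')
ORDER BY name;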
User Accounts
In a production database we normally create three categories of user accounts:
• SELECT privilege on all database objects;
• SELECT, INSERT, UPDATE, and DELETE privileges on all database objects; and
• CREATE, ALTER, DROP privileges on all database objects.
In general, the first category is assigned to developers, the second to end users, and
the third to DBAs.
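A hedged sketch of how these categories might be set up as roles is given below. The role names, the owner ABCD_OWNER, the table ACCT_FACT, and the account SCOTT are illustrative only, and the object-level grants must be repeated (or generated by a script) for every object in the schema.

CREATE ROLE query_role;
GRANT SELECT ON abcd_owner.acct_fact TO query_role;

CREATE ROLE dml_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON abcd_owner.acct_fact TO dml_role;

CREATE ROLE ddl_role;
GRANT CREATE ANY TABLE, ALTER ANY TABLE, DROP ANY TABLE TO ddl_role;

GRANT query_role TO scott;   -- assign a category to a user account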
Various other combinations including the DBA privilege are possible. In a data ware-
house, however, users do not make online updates. Instead data are loaded into the ware-
house via nightly batch jobs (see Section 11.8 below). As a result, the user accounts are
created primarily of the first category, i.e., with SELECT privilege only.
Aggregation algorithms reflecting the desired level of granularity and other design requirements
are applied to the staging database. Next, the processed data are loaded into the fact and
dimension tables of the data warehouse as the targets of the mappings. Due to the wrinkle
of time discussed in Section 11.3, the data finally loaded into the warehouse are not up to
date, and are not meant to be either. Figure 11.2 shows a schematic of the data loading
process for a data warehouse via the mappings.
[Figure 11.2: Schematic of the data loading process — raw data sources feed a data loader into a staging area; aggregation and data extraction software then populate the data warehouse (facts + dimensions).]
Various tools are available for implementing the mappings to load data from raw data
sources into a warehouse. We describe below two Oracle tools and make some comments
about third party tools in general.
SQL*Loader
SQL*Loader is used to load data from operating system files into the tables of an Oracle
database. Such files are mostly used in mainframe-based legacy applications. Thus,
SQL*Loader is helpful in the first stage of data loading where data from diverse sources
are collected in a staging area such as an intermediate Oracle database. The loader ses-
sion needs two inputs:
• The source data to be moved into the Oracle tables, and
• A set of command line parameters that provide the details of the mapping such as the
format and location of the source data, and the location of the target data.
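A hedged sketch of such a loader session appears below. The control file, data file, table, and account names are hypothetical; the keywords themselves are standard SQL*Loader syntax.

Control file acct_load.ctl:

LOAD DATA
INFILE 'acct_extract.dat'
APPEND
INTO TABLE stage_acct
FIELDS TERMINATED BY ','
(acct_id, cust_name, crnc_code, balance)

Invocation from the operating system prompt:

sqlldr userid=stage_owner/stage_pwd control=acct_load.ctl log=acct_load.log bad=acct_load.bad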
• Retrieve those rows from DT1, . . . , DTn that satisfy the qualifying conditions of
SQ.
• Create a table constituting a Cartesian product of these retrieved rows.
• Store this table in memory for fast processing of SQ.
• Join this table with FT as the last step of executing SQ and displaying the result set.
The star query execution plan requires that the following two conditions hold:
• FT must be much greater in size than each DTi, i = 1, . . . , n.
• The qualifying conditions return only a few rows from DT1, . . . , DTn so that the
Cartesian product becomes a small table that can be stored in memory.
One must use proper hints in the query formulation to enforce this plan. It joins the
fact table with the prior result set by means of a nested loops join on a concatenated in-
dex. It is advisable to verify with EXPLAIN PLAN that indeed the nested loops join is
used. The optimizer also considers different permutations of the small tables. If the tables
are ANALYZEd regularly, the optimizer will choose an efficient star query execution
plan. Oracle offers two alternative ways to process star queries:
• Using hints, and
• Using star transformation.
[Figure: Star schema with the fact table ACCT_FACT (ACCT_ID, CRNC_CODE, PERIOD_ID, PERF_ID, TOT_FISCAL_PERF, AVG_FISCAL_PERF) and the dimension tables ACCT_DIMN (ACCT_ID, ACCT_NAME, CUST_NAME, PLAN_NAME), PERIOD_DIMN (PERIOD_ID, PERIOD_END_DATE, MONTH, QUARTER), CRNC_DIMN (CRNC_CODE, CRNC_NAME), and PERF_DIMN (PERF_ID, PERF_NAME).]
The table ACCT_FACT has a four-column concatenated index on the set (ACCT_ID,
CRNC_CODE, PERIOD_ID, PERF_ID). We run the following query SQ against the
warehouse.
SQ: SELECT /*+ ORDERED USE_NL(E) INDEX(IND_CONCAT) */
    TOT_FISCAL_PERF, AVG_FISCAL_PERF
    FROM
    ACCT_DIMN A, CRNC_DIMN B, PERIOD_DIMN C, PERF_DIMN D,
    ACCT_FACT E
    WHERE
    A.CUST_NAME = 'value1' AND
    B.CRNC_NAME = 'value2' AND
    C.QUARTER = 3 AND
    D.PERF_NAME = 'value3' AND
    A.ACCT_ID = E.ACCT_ID AND
    B.CRNC_CODE = E.CRNC_CODE AND
    C.PERIOD_ID = E.PERIOD_ID AND
    D.PERF_ID = E.PERF_ID;
The star query SQ represents a star join. The first part of its WHERE clause consists of
four qualifying conditions on the four dimension tables; the second part contains four
join conditions. The concatenated index IND_CONCAT on the four FK columns facilitates
this type of join. The order of the columns in this index must match the order of the
dimension tables in the FROM clause, and the fact table must appear last; this is critical to
performance. Figure 11.4 shows the execution plan for SQ under this version.
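A hedged sketch of that verification is shown below; it assumes a PLAN_TABLE created with Oracle's standard utlxplan.sql script, and the STATEMENT_ID value is arbitrary. The nested loops join against ACCT_FACT should then appear among the listed steps.

EXPLAIN PLAN SET STATEMENT_ID = 'SQ1' FOR
SELECT /*+ ORDERED USE_NL(E) INDEX(IND_CONCAT) */
       TOT_FISCAL_PERF, AVG_FISCAL_PERF
FROM   ACCT_DIMN A, CRNC_DIMN B, PERIOD_DIMN C, PERF_DIMN D, ACCT_FACT E
WHERE  A.CUST_NAME = 'value1' AND B.CRNC_NAME = 'value2' AND
       C.QUARTER = 3 AND D.PERF_NAME = 'value3' AND
       A.ACCT_ID = E.ACCT_ID AND B.CRNC_CODE = E.CRNC_CODE AND
       C.PERIOD_ID = E.PERIOD_ID AND D.PERF_ID = E.PERF_ID;

SELECT LPAD(' ', 2*LEVEL) || operation || ' ' || options || ' ' || object_name plan_step
FROM   plan_table
START WITH id = 0 AND statement_id = 'SQ1'
CONNECT BY PRIOR id = parent_id AND statement_id = 'SQ1';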
STAR Hint
An alternative method to process a star query such as SQ is to use the STAR hint /*+
STAR */. This version of SQ reads as follows.
SQ: SELECT /*+ STAR */
TOT_FISCAL_PERF, AVG_FISCAL_PERF
FROM
ACCT_DIMN A, CRNC_DIMN B, PERIOD_DIMN C, PERF_DIMN D,
ACCT_FACT E
WHERE
A.CUST_NAME = 'value1' AND
B.CRNC_NAME = 'value2' AND
C.QUARTER = 3 AND
D.PERF_NAME = 'value3' AND
A.ACCT_ID = E.ACCT_ID AND
B.CRNC_CODE = E.CRNC_CODE AND
C.PERIOD_ID = E.PERIOD_ID AND
D.PERF_ID = E.PERF_ID;
Figure 11.5 shows the execution plan for SQ under this version.
Execution Plan
----------------------------
SELECT STATEMENT Cost = 9
2.1 NESTED LOOPS
3.1 NESTED LOOPS
4.1 NESTED LOOPS
5.1 MERGE JOIN CARTESIAN
6.1 TABLE ACCESS FULL PERF_DIMN
6.2 SORT JOIN
7.1 TABLE ACCESS FULL CRNC_DIMN
5.2 TABLE ACCESS FULL ACCT_FACT
4.2 TABLE ACCESS BY INDEX ROWID ACCT_DIMN
5.1 INDEX UNIQUE SCAN PK_ACCOUNT UNIQUE
3.2 TABLE ACCESS BY INDEX ROWID PERIOD_DIMN
4.1 INDEX UNIQUE SCAN PK_PERIOD_DIMN UNIQUE
13 rows selected.
The star transformation is an alternative way for executing star queries efficiently under
the cost-based query optimizer. The methods described in Section 11.9.1 work well for
star schemas with a small number of dimension tables and dense fact tables. The star
transformation provides an alternative method when one or more of the following condi-
tions hold.
• The number of dimension tables is large.
• The fact table is sparse.
• There are queries where not all dimension tables have qualifying predicates.
The star transformation does not compute a Cartesian product of the dimension tables.
As a result, it is better suited for those star queries where the sparsity of the fact table
and/or a large number of dimension tables would lead to a large Cartesian product with
few rows having actual matches in the fact table. In addition, rather than relying on con-
catenated indices, the star transformation is based on combining bitmap indices on indi-
vidual FK-columns of the fact table. Hence it is necessary to create a bitmap index on
each FK-column in the fact table. These columns have low cardinality and, therefore, are
ideal candidates for bitmap indexing. The star transformation can thus choose to combine
indices corresponding precisely to the constrained dimensions. There is no need to create
many concatenated indices where the different column orders match different patterns of
constrained dimensions in different queries.
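A hedged sketch of the setup this implies for the ACCT_FACT table used earlier in this section is given below; the index names are illustrative, and the parameter can also be set instance-wide as noted in Section 11.7.

ALTER SESSION SET star_transformation_enabled = TRUE;

CREATE BITMAP INDEX acct_fact_acct_bix   ON acct_fact (acct_id);
CREATE BITMAP INDEX acct_fact_crnc_bix   ON acct_fact (crnc_code);
CREATE BITMAP INDEX acct_fact_period_bix ON acct_fact (period_id);
CREATE BITMAP INDEX acct_fact_perf_bix   ON acct_fact (perf_id);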
The two hints, STAR and STAR_TRANSFORMATION, are mutually exclusive in
the sense that a query can use only one of these two hints. If a query does not mention
either of these two hints explicitly, then the query optimizer uses the STAR path when
the number of tables involved in the FROM clause of the query exceeds the value of the
initialization parameter OPTIMIZER_SEARCH_LIMIT.
To use the hint STAR_TRANSFORMATION, the query must satisfy certain conditions.
As with any hint, the query optimizer may choose to ignore the star transformation
hint if it regards an alternative path as better. In general, the queries for the star transfor-
mation have to be very selective (i.e., restrictive WHERE clause) for the query optimizer
to choose a star transformation join. Otherwise, it usually performs parallel hash joins. Oracle
does not allow the use of star transformation for tables with any of the following characteristics:
• Tables with a table hint that is incompatible with a bitmap access path,
• Tables with too few bitmap indices,
• Remote tables,
• Antijoined tables,
• Tables that are already used as a dimension table in a subquery,
• Tables that have a good single-table access path, and
• Tables that are too small for the transformation to be worthwhile.
The hint STAR works well if the Cartesian product produces few rows. The hint
STAR_TRANSFORMATION works well if equijoin predicates produce few rows. In
other situations, one must determine the best formulation of a star query by trial and error.
Key Words
aggregation
ANALYZE
Cartesian product
data mining
data redundancy
datamart
decision support system
denormalization
dimension table
dimensional analysis
DSS, see decision support system
EXPLAIN PLAN
export
extraction programs
fact table
homonym
import
life cycle
log switch
meta database
metabase
metadata
multidimensional database
parameter file
qualifying condition
redo log buffer
repository
rollback segment
schema diagram
semaphore
SGA
shared SQL pool
SQL*Loader
star query
star query execution plan
star schema
star transformation
synonym
transitive dependency
Warehouse Technology Initiative
wrinkle of time
WTI
Burleson [3, Chapters 1–3], Corey and Abbey [4, Chapters 1 and 2], Inmon [6,
Chapters 1–4], and Schur [9, Chapter 9] have discussed the evolution of the concept of a
data warehouse and the characteristics and design issues resulting from them. Sparley
Exercises
1. Refer to Exercise 3, Chapter 9. Assume that you want to design a datamart with
ORDER and LINE_ITEM combined into a single fact table and with CUSTOMER
and CUST_ORDER as two dimension tables. Remove Cust_Type and Cust_Region
from CUSTOMER and introduce them as dimension tables such as TYPE and
REGION. Make additional changes, as needed, to make the new structure represent
the conceptual level of a datamart.
(a) Run the query specified in the exercise against the datamart and record the exe-
cution plan and runtime.
(b) Run the same query using the hints described in Section 11.9.1 under “Combi-
nation of Three Hints” and record the execution plan and runtime.
(c) Run the same query again using the STAR hint described in Section 11.9.1 and
record the execution plan and runtime.
(d) Compare your findings in parts (a), (b), and (c) above to determine which exe-
cution plan is the best for your query.
2. Record the values of the initialization parameters listed in Section 11.7, except
star_transformation_enabled for an operational database and a data warehouse in
your workplace.
(a) Identify the differences in their values and try to justify the differences.
(b) If you are not convinced that the values are indeed appropriate, talk with your
DBA to explore further. If you are the DBA, then either satisfy yourself that the
values are justified or change the values as needed.
3. Explore the data loading tools that are used in a data warehouse environment with
which you are familiar.
(a) Prepare a metadata listing of all the mappings used in bulk data loading.
(b) For each mapping, identify the source, the target, and the transformation being
done to derive the target from the source.
(c) Describe how data inconsistencies are resolved during the transformation. Spe-
cifically, examine the four types of inconsistencies listed in Section 11.3.
4. Explore the wrinkle of time for a data warehouse with which you are familiar.
(a) Is there a single uniform wrinkle for all tables in the warehouse, or are there dif-
ferent wrinkles for different tables?
(b) In the latter case, determine the reason for the difference and examine if any data
inconsistency enters the warehouse due to the different wrinkles for different ta-
bles.
(c) Explore the possibility of setting up a single wrinkle equal to the maximum
value of all the different wrinkles. Will it introduce any operational difficulty?
12
Web-Based Database Applications
Outline
12.1 Advent of Web-Based Applications
12.2 Components of Web-Based Applications
12.3 Oracle Application Server (OAS)
12.4 Database Transaction Management Under OAS
12.5 Oracle Internet Application Server (iAS)
12.6 Performance Tuning of Web-Based Databases
12.7 Tuning of Internal Level
12.8 Tuning of External Level
Key Words
References and Further Reading
Exercises
A user logging in at the client level accesses the database on the server via the connection estab-
lished by the user request. As long as the user remains logged in, the connection is main-
tained. Hence the connection is labeled as persistent. Once a persistent connection is
opened, it stays open and results in quick response time. But idle user sessions waste
such resources. A nonpersistent connection, on the other hand, releases the resource as
soon as the transaction is complete, thereby allowing many more users to share this lim-
ited resource. No one user holds up a connection.
A Web-based database application optimizes database connectivity to achieve quick
response time with nonpersistent connections.
In order to perform the functions listed above we need the following set of tools that
serve as props for designing a Web-based application.
1. Web browser to provide end user interface;
2. Application server to establish nonpersistent connections and to process user re-
quests;
3. Database server running one or more RDBMSs such as Oracle, SQL Server, Infor-
mix, etc. to provide SQL as the nonprocedural query language and some procedural
languages such as PL/SQL (Oracle), Transact SQL (SQL Server), etc. for supporting
the database tier;
4. HTTP (Hyper Text Transfer Protocol), which is a stateless request/response proto-
col, to transmit objects back and forth between the Web browser and the application
server. An HTTP transaction consists of a request from the client directed to the
server and a response to the request returned by the server;
5. HTML (Hyper Text Markup Language) to code display objects. HTML pages con-
tain texts interspersed with tags that cause the Web browser to display these pages
with formats as directed by the tags. The pages are static in that they cannot change
their behavior in response to user generated events.
6. Compiled or interpreted programming languages such as Perl, C++, Java, UNIX
shell languages, etc. to code CGI-compliant application programs residing on the ap-
plication server that make pages dynamic. These server-resident executable pro-
grams generate customized results in response to a user’s request.
Figure 12.1 contains a schematic of a Web-based database application.
[Figure 12.1: Schematic of a Web-based database application — a Web browser on the client communicates over HTTP with the application server (OAS 4.0 with C, Java, etc.), which in turn connects to the database server running the Oracle RDBMS.]
[Figure: Components of OAS 4.0 — the HTTP Listener, the Web Request Broker (WRB) with its cartridges, and the ORB services (ORB Server, ORB Name Server, ORB Metrics Daemon, ORB Monitor Daemon).]
HTTP Listener
This layer offers the universal front-end for all Web-based applications. End users enter
their requests via a Web browser at this layer. They can request static pages or invoke
server-side programs to activate and display dynamic pages. The four components of this
layer perform the following functions.
• Listener: Handles incoming requests and routes them to the Dispatcher.
• Adapter: Embodies a common application programming interface (API) used by
Oracle and other third party listeners to connect to the Dispatcher.
• Dispatcher: Accepts a request from Listener and dispatches it to an appropriate car-
tridge instance (see the paragraph “Cartridges and Components” below).
• Virtual Path Manager: Provides the Dispatcher with cartridge mapping and authenti-
cation requirements for the cartridge instance.
The ultimate goal of WRB is to activate the cartridges and components to service end
user requests. The above 12 processes collectively accomplish that goal. The individual
steps involved in this activation are described below. The responsible processes are listed
within parentheses.
1. End user request arrives at Web browser.
2. HTTP listener identifies the cartridge instance, say, CART, needed by the request.
3. HTTP listener transmits the request to ORB services provided by Oracle Media Net
(ORB Server, ORB Name Server, ORB Log Server, ORB Metrics Daemon, ORB
Monitor Daemon).
4. If CART is available, then
(a) WRB is invoked (Configuration Provider, Logger, OAS Monitoring Daemon).
(b) CART instance is authenticated and provided (Authorization Host Server and
Authorization Server).
(c) WRB is started and CART instance is created in the cartridge factory (Resource
Manager, RM Proxy).
5. If CART is not available, then
(a) HTTP listener requests WRB that a new CART instance be created (Configura-
tion Provider, Logger, OAS Monitoring Daemon).
(b) Steps 4(b) and 4(c) are repeated.
Figure 12.3 shows the above sequence of steps graphically.
[Figure 12.3: Flowchart of the activation steps — the HTTP Listener identifies the cartridge instance CART; if CART is not available, a new instance is requested; CART is then authenticated, WRB is started, and the CART instance is created.]
Cartridges perform specialized functions such as computation of sales tax, estimation of inventory reorder level,
projection of human resource for a given level of customer needs and operating budget,
etc.
This layer performs four functions as listed below:
• Application: Controls cartridges executing application code;
• Cartridge: Uses configuration data and passed parameters to execute code in the
server;
• Cartridge Server: Manages and runs the cartridges as part of the application; and
• Cartridge Server Factory: Instantiates a cartridge when the Cartridge Server first
starts up and when requests for cartridges come in.
• Set the minimum and the maximum number of cartridges at 0.7N and 1.5N respec-
tively, assuming that these limits do not impose excessive paging. Oracle recom-
mends that when all cartridge instances are running, no more than 75% of the avail-
able swap space should be consumed.
The structure of the database on tier 3 and its tuning principles are the same irrespec-
tive of the application. To optimize the database performance it is necessary to monitor
and tune each level, i.e., conceptual, internal, and external. Therefore, the tools and prin-
ciples discussed in Chapters 3 through 10 for Oracle 8i apply equally well to the database
of a Web-based application. We discuss the tuning principles briefly in Sections 12.7 and
12.8.
[Figure: Deployment schematic — client machines reach the OAS Listener/Dispatcher on the application server tier, which connects to the database server hosting the database.]
Oracle 8i offers a set of built-in packages that help in running a well-tuned application
against the database. The list below gives an overview of some of these packages. Brown
[1, Chapter 9] contains a detailed discussion of these packages with sample code frag-
ments.
(a) DBMS_ALERT: Provides support for asynchronous notification of database events.
It allows the monitoring of database changes occurring from the start to the end of a
procedure.
(b) DBMS_APPLICATION_INFO: Provides a mechanism for registering the name of
the application module currently running and its current action within the database.
Using this package the DBA can determine usage and performance of real users and
take corrective action, if needed.
(c) DBMS_DDL: Allows the DBA to execute DDL commands from within a stored
procedure. Such commands include CREATE, ALTER, and ANALYZE.
(d) DBMS_SESSION: Provides access to the command ALTER SESSION and other
session information from within stored procedures. The DBA can alter session-
specific parameters dynamically with this package.
(e) DBMS_UTILITY: Contains a large number of procedures that help tuning both the
internal and the external levels. In particular, DBAs can use procedures and func-
tions such as analyze_schema, analyze_database, is_parallel_server, get_time, get_
parameter_value, db_version, etc. for their tasks.
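As an illustration, the following hedged PL/SQL fragment combines two of these packages to tag a unit of work and time it; the module and action names are hypothetical, and SET SERVEROUTPUT ON must be in effect for the elapsed time to be displayed.

DECLARE
   t_start NUMBER;
   t_end   NUMBER;
BEGIN
   -- register the module and action so the DBA can see them in V$SESSION
   DBMS_APPLICATION_INFO.SET_MODULE (module_name => 'WEB_ORDER',
                                     action_name => 'NIGHTLY_SUMMARY');
   t_start := DBMS_UTILITY.GET_TIME;    -- time in hundredths of a second
   -- the application statements being measured go here
   t_end := DBMS_UTILITY.GET_TIME;
   DBMS_OUTPUT.PUT_LINE ('Elapsed time: ' || (t_end - t_start) / 100 || ' seconds');
   DBMS_APPLICATION_INFO.SET_MODULE (module_name => NULL, action_name => NULL);
END;
/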
Key Words
3-tier architecture
application server
cartridge
Common Gateway Interface
component
HTTP Listener
Hyper Text Markup Language
Hyper Text Transfer Protocol
n-tier architecture
nonpersistent connection
persistent connection
thin client
Transaction Management
universal desktop
Web browser
Web server
World Wide Web
4. Oracle Corporation—Analytic Functions for Oracle 8i, An Oracle Technical White Paper,
October 1999.
5. J. Rodley—Developing Databases for the Web & Intranets, Coriolis Group Books, 1997.
Ju [3, Chapter 2] and Rodley [5, Chapter 1] provide a nice introduction to the new
paradigm of Web-based database applications. Dynamic Information Systems [2, Chap-
ter 2] and Rodley [5, Chapter 4] offer good descriptions of the architecture of Web-based
applications, primarily of the Web browser and the application server functions. Dynamic
Information Systems [2, Chapters 3, 10] discusses OWAS 3.0, which is the version prior
to OAS 4.0. Brown [1, Chapter 10] gives a comprehensive coverage of the features of the
four different versions of OAS. Transaction Management services constitute an important
feature of both OWAS 3.0 and OAS 4.0. Dynamic Information Systems [2, Chapter 10]
and Brown [1, Chapters 10, 14] discuss this feature of OWAS and OAS respectively.
Brown [1, Chapter 4] discusses the use of UNIX “sar” utility and the principle of load
balancing among nodes as tools for tuning the internal level of the Web database. Both of
them have been discussed earlier in Chapter 6 of the present book. Brown [1, Chapter 9]
gives a detailed treatment of Oracle 8i built-in packages for tracking database perform-
ance at both internal and external levels under Oracle 8i.
Exercises
This chapter does not introduce any new tools for performance tuning. Instead it refers to
Chapters 4 through 10 for the corresponding tools and methods. Hence the exercises be-
low involve the use of those tools in a Web-based database environment. The exercises
also assume that you have access to Web-based database applications as a DBA and that
these applications are built with Oracle8 and/or Oracle 8i.
1. Apply the tuning principles and tools for the internal level of a database as discussed
in Chapters 4 through 7 and 10 to a Web-based application built with Oracle 8i. As-
sess the effect of your tuning by using metrics such as hit ratios, shared SQL pool
size, “sar” utility for measuring the extent of paging and swapping, SGA size versus
real memory size, etc.
2. Use the tuning principles and tools for the external level of a database as discussed in
Chapters 8 through 10 to a Web-based application built with Oracle 8i. Compare the
results with those derived by using the built-in packages for tuning as discussed in
Section 12.7.
3. If you have experience with using OAS 4.0 under Oracle8 and Oracle 8i, describe
the differences in performance of OAS 4.0, if any, between these two versions of the
Oracle RDBMS.
Appendices
The five appendices included here discuss several DBA issues that are not directly related
to performance tuning and optimization.
Appendix A provides a sizing methodology for estimating the storage space needed
for tables, indices, and tablespaces. Two C programs are included to implement the sizing
algorithms.
Appendix B contains a step-by-step methodology to create an instance and its associ-
ated database. A large number of scripts are included in the appendix to help a DBA per-
form this task.
Appendix C offers a similar step-by-step methodology to drop an instance and its as-
sociated database.
Appendix D provides a methodology for refreshing an existing database with data
from another database with identical structure. The transportable tablespaces introduced
in Oracle 8i are used for this purpose.
Appendix E contains the mathematical foundation of relational database systems. It is
designed for mathematically inquisitive readers who are interested in the theory underly-
ing relations, normalization principles, query languages, and the search process.
Appendix A
Sizing Methodology in Oracle 8i
Outline
A1. Transition from Logical to Physical Database Design
A2. Space Usage via Extents
A3. Algorithms for Sizing Tables, Indices, Tablespaces
A4. STORAGE Clause Inclusion: Table and Index Levels
A5. Sizing Methodology
A6. RBS, SYSTEM, TEMP, and TOOLS Tablespace Sizing
Key Words
References and Further Reading
current extent becomes full. The NEXT extent is sized according to the following for-
mula,
NEXT = NEXT * (1 + PCTINCREASE / 100).
Oracle sets the default value of PCTINCREASE to 50. We can see right away that
PCTINCREASE = 50 will lead to a geometric explosion of the size of the NEXT extents,
because each NEXT extent will then be 1.5 times the size of the current NEXT extent.
Therefore, one should set PCTINCREASE = 0. However, see the caveat in
Section A4. The default value of MAXEXTENTS has already been listed in Section
1.5.1. The option MAXEXTENTS = UNLIMITED, which assigns the value of
2,147,483,645 to MAXEXTENTS, should be used with caution and with regular moni-
toring of the storage space in the tablespaces, because it may lead to a system crash when
a disk becomes full during an unusually long update transaction.
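As an illustration of that growth, with NEXT = 10 MB and PCTINCREASE = 50 the successive NEXT extents are 10 MB, 15 MB, 22.5 MB, 33.75 MB, and so on, each 1.5 times its predecessor. A hedged sketch of a STORAGE clause that avoids this follows; the table, column, tablespace, and size values are illustrative only.

CREATE TABLE order_fact
   (order_id NUMBER,
    amount   NUMBER(10,2))
   TABLESPACE data01
   STORAGE (INITIAL 50M NEXT 10M PCTINCREASE 0
            MINEXTENTS 1 MAXEXTENTS 505);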
[Figure: Layout of an Oracle data block, showing the block header and the PCTFREE area reserved within the block.]
Various algorithms are available to estimate the storage space needed for storing the
rows of a table. All of them need the rowcount of the table as one of the input parameters,
which is usually difficult to obtain precisely; it is therefore advisable to overestimate. I use
two C programs, one for tables and the other for indices.
Table
/* PURPOSE: This program reads the name of a table in the
database, its rowsize, and its rowcount from a text file.
Then it computes the number of Oracle blocks needed to store
the data of the table and prints the result in the output
file.
PROGRAM FILE NAME: tablsize.c
INPUT FILE NAME: tbls.txt
OUTPUT FILE NAME: tbls_size.txt
AUTHOR: NAME
DATE CREATED: DATE
*/
#include <stdio.h>
#define BLOCK_SIZE 8192
#define HEADER 90
#define PCTFREE 10.0
int main (void)
{
FILE *in_file, *out_file;
int num_blocks_needed, num_rows_per_block, block_space, rowsize;
float available_block_space, rows_per_block, blocks_needed, rowcount;
char line [81], table_name [50];
/* Open the input file tbls.txt in READ mode and
the output file tbls_size.txt in APPEND mode */
in_file = fopen ("tbls.txt", "r");
out_file = fopen ("tbls_size.txt", "a");
if (in_file == NULL || out_file == NULL)
{ fprintf (stderr, "Cannot open the input or the output file\n");
return 1;
}
/* Read each record in the file tbls.txt, disassemble it into
three separate fields, TABLE_NAME, ROWSIZE, and
ROWCOUNT, and compute the number of Oracle blocks
needed to store the records of the table */
while ( fgets (line, 81, in_file) != NULL )
{ sscanf (line, "%s %d %f", table_name, &rowsize,
&rowcount);
block_space = BLOCK_SIZE - HEADER;   /* space left after the block header */
available_block_space = block_space * (1 - PCTFREE / 100);
rows_per_block = available_block_space / rowsize;
num_rows_per_block = rows_per_block;   /* truncate to whole rows */
blocks_needed = rowcount / num_rows_per_block;
num_blocks_needed = blocks_needed + 1.0;   /* round up to the next whole block */
fprintf (out_file, "%-50s%d\n", table_name, num_blocks_needed);
}
fclose (in_file);
fclose (out_file);
return 0;
}
Index
/* PURPOSE: This program reads from input file the
following parameters:
-- an index name,
-- total length of indexed columns,
-- total number of indexed columns,
-- the number of columns exceeding 127
characters,
-- uniqueness,
-- and rowcount of the table.
Then it computes the number of Oracle blocks needed to
store the index data and prints the result in the output
file.
PROGRAM FILE NAME: indxsize.c
INPUT FILE NAME: indx.txt
OUTPUT FILE NAME: indx_size.txt
AUTHOR: NAME
DATE CREATED: DATE
*/
#include <stdio.h>
#define BLOCK_SIZE 8192
#define HEADER 161
#define HEADER_SPACE 8
#define PCTFREE 10.0
int main (void)
{
FILE *in_file, *out_file;
int num_blocks_needed, num_indices_per_block,
block_space, tot_length_ind_cols;
int tot_num_ind_cols, tot_num_long_cols,
index_space, unq_ind;
float available_block_space, indices_per_block,
blocks_needed, rowcount;
char line [81], index_name [50];
/* Open the input file indx.txt in READ mode and the
output file indx_size.txt in APPEND mode. */
in_file = fopen ("indx.txt", "r");
out_file = fopen ("indx_size.txt", "a");
if (in_file == NULL || out_file == NULL)
{ fprintf (stderr, "Cannot open the input or the output file\n");
return 1;
}
/* Read each record in the file indx.txt, disassemble it
into six separate fields, INDEX_NAME,
TOT_LENGTH_IND_COLS, TOT_NUM_IND_COLS,
TOT_NUM_LONG_COLS, UNQ_IND, and ROWCOUNT; then
compute the total number of Oracle blocks needed to
store the index, and print the result in the output
file indx_size.txt. */
while ( fgets (line, 81, in_file) != NULL )
{ sscanf (line, "%s %d %d %d %d %f",
index_name, &tot_length_ind_cols,
&tot_num_ind_cols, &tot_num_long_cols, &unq_ind,
&rowcount);
block_space = BLOCK_SIZE - HEADER;   /* space left after the block header */
available_block_space = block_space * (1 - PCTFREE / 100);
index_space = tot_length_ind_cols + tot_num_ind_cols +
tot_num_long_cols + HEADER_SPACE + unq_ind;
indices_per_block = available_block_space / index_space;
num_indices_per_block = indices_per_block;   /* truncate to whole entries */
blocks_needed = rowcount / num_indices_per_block;
num_blocks_needed = blocks_needed + 1.0;   /* round up to the next whole block */
fprintf (out_file, "%-50s%d\n", index_name, num_blocks_needed);
}
fclose (in_file);
fclose (out_file);
return 0;
}
Tablespace
A tablespace for tables or indices holds all the tables or indices respectively that are cre-
ated in that tablespace. The size of a tablespace, therefore, is calculated using the sizes of
its contents as follows.
RBS
We assume that RBS stores two rollback segments, RBS1 and RBS2. A rollback segment
stores the before image data for rows marked for update and delete. Oracle allocates ex-
tents for rollback segments in a round-robin fashion. Extents whose data have been com-
mitted or rolled back are marked inactive and are available for reuse. If a long query us-
ing the data marked for update or delete started before the update or delete occurred and
is still continuing, it may look for data in these inactive extent(s). But when the data in
some of these extent(s) are overwritten with new data, the query can no longer find the
data it needs. In such a case, Oracle returns a rather cryptic message, “snapshot too old.”
To remedy such situations, I have designated RBS2 as the rollback segment to be used
for long transactions via the SET TRANSACTION command given below.
commit; /* clear prior rollback segment assignment */
set transaction use rollback segment RBS2;
(Oracle long update transaction statements appear here)
commit; /* clear assignment of current rollback segment RBS2 */
The initial and next extents of RBS2 are, therefore, much larger than those of RBS1.
SYSTEM
It stores only the data dictionary, V$ tables, and several tables pertaining to stored objects
such as SOURCES$, TRIGGERS$, etc. But it must never store any user tables, indices,
constraints, etc.
TEMP
Its segment sizes change continually depending on the space needed by a sorting opera-
tion. Always designate the TEMP tablespace as "temporary," either in its CREATE
TABLESPACE statement or, for an existing tablespace, as follows.
alter tablespace TEMP temporary;
Then, only the temporary segments can be stored there. This allows each sorting op-
eration that uses TEMP to continue without dropping and then claiming a new segment,
thereby reducing fragmentation.
TOOLS
It stores database objects created by any Oracle or third party tools. By default such tools
use SYSTEM as the database account and, therefore, any objects created by them use
SYSTEM as the tablespace. This should never be done. To enforce this policy, designate
TOOLS as the default tablespace for the SYSTEM account and reduce the quota of this
account on SYSTEM tablespace to zero as follows.
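The statement that accomplishes this is essentially the one used in the SYS_MOD.sql script of Appendix B (Attachment 9):

ALTER USER SYSTEM
   QUOTA UNLIMITED ON TOOLS
   QUOTA 0 ON SYSTEM
   DEFAULT TABLESPACE TOOLS;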
If one uses Oracle’s data replication option, more storage in TOOLS will be needed to
store the catalog tables REPCAT$ and the error tables needed by replication. This should
be taken into consideration in sizing the TOOLS tablespace.
Key Words
block header, fixed
block header, variable
round-robin
SMON
Loney [2, Chapter 5] discusses an algorithm to compute the sizes of tables and indi-
ces. The C programs included in Section A3 use a slight variation of them. In addition,
Loney provides valuable information about the sizing of database objects and the space
management mechanism used by Oracle. Mittra [3] contains an earlier version of the
sizing algorithm used here.
Appendix B
Instance and Database Creation
Outline
B1. Preparation
B2. Instance Startup
B3. Database Creation
B4. Creation of Users, Roles, Privileges, Schema
B5. Miscellaneous Informational Items
Attachments 1 to 15
B1. Preparation
I recommend using the UNIX shell command "script" to create a complete transcript of the
sessions used to create the full database. It helps beginner DBAs keep track of their actions,
including errors, during the creation of the database. For example, type at the UNIX prompt:
script -a scriptfile
The file scriptfile will contain everything that is displayed on the screen. The option
"-a" appends the transactions from all sessions to the same scriptfile in case the database
creation spans multiple sessions.
Oracle requires a number of directories and files to be placed in certain specific loca-
tions before it starts the instance and creates the database. This section describes these
preparatory steps. The names and locations of these files and directories are dependent on
the installation. Change the steps below, as needed, to match your specific installation.
Oracle first starts the instance in the NOMOUNT state from the specifications of the
parameter file (“pfile”). It allocates memory to the SGA and activates the required back-
ground processes. Then it creates the database via the command CREATE DATABASE,
followed by the data dictionary views and all the tablespaces.
(a) Login as oracle/oradba or some other highly privileged account.
(b) cd $ORACLE_HOME/dbs
(This directory contains the file initABCD.ora or contains a link to another directory
containing that file. In the latter case, cd to that directory; e.g., type "cd
/oracle/admin/ABCD/pfile".) See Attachment 1 for the initABCD.ora file.
(c) Edit the file initABCD.ora, as needed. This file sets up the updatable initialization
parameters. For example, you may want to assign nondefault values to certain ini-
tialization parameters such as PROCESSES, DB_FILES, etc.
(d) Ensure that the file /oracle/admin/ABCD/pfile/configABCD.ora exists. This file
sets up the Oracle configuration for the instance ABCD. See Attachment 2 for the
configABCD.ora file.
(e) cd /etc
(f) Find the file “oratab” and ensure that the file contains the line:
ABCD:/oracle/product/8.1.7:Y
(This causes Oracle to recognize the new instance ABCD. The field value “Y” indi-
cates to the dbstart utility that the database will be brought up at the system boot
time.) See Attachment 3 for the oratab file.
(g) Edit the file /etc/listener.ora to include the following blocks.
(ADDRESS=
(PROTOCOL=IPC)
(KEY= ABCD)
)
(SID_DESC =
(SID_NAME = ABCD)
(ORACLE_HOME = /oracle/product/8.1.7)
)
This ensures that the Oracle listener will run when the instance and the database are
created. See Attachment 4 for the listener.ora file.
(h) cd /oradata
Normally the datafiles for the tablespaces in the database are created in one or more
subdirectories of this directory, although different locations are not uncommon. For
this appendix, it is assumed that 10 directories named u01a, u01b, u02a, u02b, ….,
u05a, u05b are created as subdirectories of this directory. Next, ensure that a subdi-
rectory named ABCD exists under each of u01a through u05b. All datafiles, control
file, and redo log files will be stored in these 10 subdirectories.
/oradata/u01a/ABCD
/oradata/u01b/ABCD
/oradata/u02a/ABCD
/oradata/u02b/ABCD
/oradata/u03a/ABCD
/oradata/u03b/ABCD
/oradata/u04a/ABCD
/oradata/u04b/ABCD
/oradata/u05a/ABCD
/oradata/u05b/ABCD
(i) cd /usr/lbin/oracle/bin
(j) ls -l
This displays the contents of this directory, which should include the file db_prms.
This file is the parameter file for the instance ABCD. Verify that the directory also
contains the file db_util, which must contain the line
prm_file=/usr/lbin/oracle/bin/db_prms
to signify that ABCD recognizes db_prms as its parameter file. See Attachment 5 for
the db_prms file and Attachment 6 for the db_util file.
(k) Insert the following line for ABCD in db_prms.
:ABCD:system:manager:applmgr:apps:fnd:/dev/rmt/5mn:Y:N:Y:Y:fnsnetdev
Connected
Total System Global Area 13409004 bytes
Fixed Size 47852 bytes
Variable Size 11649024 bytes
Database Buffers 1638400 bytes
Redo Buffers 73728 bytes
Statement processed
for all database objects owned by ABCD_OWNER so that any other user can access
these objects without putting a prefix ABCD_OWNER to each object name.)
See Attachment 12 for the PUBL_SYN.sql file.
(d) Type @SYNON_cr.sql
(This creates all the requisite public synonyms mentioned in Step (c) above.)
See Attachment 13 for the SYNON_cr.sql file.
(e) Run all the appropriate scripts for creating views, triggers, procedures, etc.
Attachments
ATTACHMENT 1: initABCD.ora
# $Header: initx.orc 12-jun-97.09:14:56 hpiao Exp $ Copyr (c) 1992 Oracle
#
# include database configuration parameters
ifile = /oracle/admin/ABCD/pfile/configABCD.ora
rollback_segments = (RBS1,RBS2,RBS3,RBS4,RBS5)
####################################################################
##
# Example INIT.ORA file
#
# This file is provided by Oracle Corporation to help you
# customize your RDBMS installation for your site. Important
# system parameters are discussed, and example settings given.
#
ATTACHMENT 2: configABCD.ora
#
# $Header: cnfg.orc 1.1 95/02/27 12:14:25 wyim Osd<unix> $ Copyr (c) 1992 Oracle
#
# cnfg.ora - instance configuration parameters
control_files = (/oradata/u01b/ABCD/ctl_001.ctl,
/oradata/u02b/ABCD/ctl_002.ctl,
/oradata/u03b/ABCD/ctl_003.ctl)
# Below for possible future use...
#init_sql_files = (?/dbs/sql.bsq,
# ?/rdbms/admin/catalog.sql,
# ?/rdbms/admin/expvew.sql)
background_dump_dest = /oratmp/dump/ABCD/bdump
core_dump_dest = /oratmp/dump/ABCD/cdump
user_dump_dest = /oratmp/dump/ABCD/udump
log_archive_dest = /oratmp/arch/ABCD/ABCD
db_block_size = 8192
# checkpoint_process = true
db_name = ABCD
db_files = 254
compatible = 8.1.5
open_cursors = 500
nls_date_format = DD-MON-RR
nls_language = AMERICAN
nls_territory = AMERICA
nls_numeric_characters = ".,"
nls_sort = binary
optimizer_mode = CHOOSE
_optimizer_undo_changes = true
hash_join_enabled = false
row_locking = always
OS_AUTHENT_PREFIX = "OPS$"
# unlimited_rollback_segments = true
ATTACHMENT 3: ORATAB
ORCL:/oracle/product/8.1.7:N
ABCD:/oracle/product/8.1.7:Y
ATTACHMENT 4: listener.ora
################
# Filename......: listener.ora
# Date..........: 10-NOV-00 13:38:20
################
LISTENER =
(ADDRESS_LIST =
(ADDRESS=
(PROTOCOL=IPC)
(KEY= ORCL.world)
)
(ADDRESS=
(PROTOCOL=IPC)
(KEY= ORCL)
)
(ADDRESS=
(PROTOCOL=IPC)
(KEY= ABCD)
)
(ADDRESS =
(COMMUNITY = FNS2)
(PROTOCOL = TCP)
(Host = abcnetde)
(Port = 1521)
)
)
STARTUP_WAIT_TIME_LISTENER = 0
CONNECT_TIMEOUT_LISTENER = 10
TRACE_LEVEL_LISTENER = OFF
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = ORCL)
(ORACLE_HOME = /oracle/product/8.1.7)
)
(SID_DESC =
(SID_NAME = ABCD)
(ORACLE_HOME = /oracle/product/8.1.7)
)
)
ATTACHMENT 5: db_prms
#*******************************************************************
#: Parameter  Description                            Example
#: ---------  -----------                            -------
#: $1         active or inactive SID                 " ":active, #:inactive
#: $2         Oracle SID name                        XXX_D, XXX_P
#: $3         database system username               sysusername
#: $4         database system password               syspassword
#: $5         AP manager password                    apmanager
#: $6         AP username                            apusername
#: $7         AP password                            appassword
#: $8         tape device name                       /dev/rmt/1m
#: $9         concurrent manager flag (financials)   Y or N
#: $10        turn nightly export on/off             Y or N
#: $11        turn nightly physical backup on/off    Y or N
#: $12        restart database after backup flag     Y or N
#: $13        UNIX system node name                  development:develop
#:                                                   production:waltham
#:******************************************************************
#:
:ORCL:system:manager::::/dev/rmt/5mn:Y:N:Y:Y:abcnetdev
:ABCD:system:manager::::/dev/rmt/5mn:Y:N:Y:Y:abcnetdev
ATTACHMENT 6: db_util
#!/bin/sh
#
# Input: $1 is the Oracle SID name
#
# Definition:
# Set Oracle SID specific UNIX environment
# variables for use in database maintenance scripts.
# SID specific parameters are maintained in
# file db_prms.
#
in1=$#   # number of command line arguments
#
in2=$1
# input parameter is ORACLE SID name
prm_file=/usr/lbin/oracle/bin/db_prms
# data base parameter file
#
# verify input parameter is a valid SID
# if input parameter is blank or an invalid SID then exit
#
if [ "$in1" -ne 1 -o "`grep "$in2" /etc/oratab`" = "" ]
then
echo ""
echo "db_envs : Invalid Oracle SID - $in2 "
echo "db_envs : Aborting db_envs"
echo ""
sleep 5
exit 1
else # exit if SID not found or not active in the db_prms file
if [ "`grep "$in2" $prm_file | awk -F: '{print $2}'`" = "" -o \
"`grep "$in2" $prm_file | awk -F: '{print $1}'`" = "#" ]
then
echo ""
echo "db_envs : Inactive Oracle SID - $in2"
echo "db_envs : Verify Oracle SID in db_prms"
echo "db_envs : Aborting db_envs"
echo ""
sleep 5
exit 1
fi
fi
#
# extract all administration parameters and
# assign them to global environment variables
#
db_excl=`grep ${in2} $prm_file | awk -F: '{print $1}'`
db_dsid=`grep ${in2} $prm_file | awk -F: '{print $2}'`
db_smgr=`grep ${in2} $prm_file | awk -F: '{print $3}'`
db_spas=`grep ${in2} $prm_file | awk -F: '{print $4}'`
db_amgr=`grep ${in2} $prm_file | awk -F: '{print $5}'`
db_ausr=`grep ${in2} $prm_file | awk -F: '{print $6}'`
db_apas=`grep ${in2} $prm_file | awk -F: '{print $7}'`
db_tdev=`grep ${in2} $prm_file | awk -F: '{print $8}'`
db_cflg=`grep ${in2} $prm_file | awk -F: '{print $9}'`
db_eflg=`grep ${in2} $prm_file | awk -F: '{print $10}'`
db_bflg=`grep ${in2} $prm_file | awk -F: '{print $11}'`
db_sflg=`grep ${in2} $prm_file | awk -F: '{print $12}'`
db_node=`grep ${in2} $prm_file | awk -F: '{print $13}'`
ATTACHMENT 7: create1_ABCD.SQL
spool $HOME/ABCD/create1_ABCD.log;
connect INTERNAL
startup nomount pfile=$ORACLE_HOME/dbs/initABCD.ora
CREATE DATABASE ABCD
LOGFILE GROUP 1 ('/oradata/u01a/ABCD/log_a01.dbf',
'/oradata/u04a/ABCD/log_a02.dbf') size 50M reuse,
GROUP 2 ('/oradata/u01a/ABCD/log_b01.dbf',
'/oradata/u04a/ABCD/log_b02.dbf') SIZE 50M reuse
MAXLOGFILES 32
MAXLOGMEMBERS 3
MAXLOGHISTORY 1
DATAFILE '/oradata/u03a/ABCD/sys_001.dbf' SIZE 300M
MAXDATAFILES 254
MAXINSTANCES 1
CHARACTER SET WE8ISO8859P1
NATIONAL CHARACTER SET WE8ISO8859P1;
spool off;
ATTACHMENT 8: create2_ABCD.sql
spool $HOME/ABCD/create2_ABCD.log;
set echo on
connect INTERNAL
CREATE ROLLBACK SEGMENT SYSROL TABLESPACE "SYSTEM"
STORAGE (INITIAL 100K NEXT 100K);
ALTER ROLLBACK SEGMENT "SYSROL" ONLINE;
@$ORACLE_HOME/rdbms/admin/catalog.sql;
@$ORACLE_HOME/rdbms/admin/catproc.sql;
@$ORACLE_HOME/rdbms/admin/catexp.sql;
@$ORACLE_HOME/rdbms/admin/catexp7.sql;
@$ORACLE_HOME/sqlplus/admin/pupbld.sql;
REM *** Create tablespace for rollback segments ***
CREATE TABLESPACE RBS DATAFILE '/oradata/u03b/ABCD/rbs_001.dbf'
SIZE 1000M
DEFAULT STORAGE (INITIAL 10M NEXT 10M MINEXTENTS 5 MAXEXTENTS 31
PCTINCREASE 0);
REM *** Alter SYSTEM tablespace ***
ALTER TABLESPACE SYSTEM
DEFAULT STORAGE (INITIAL 50M NEXT 5M MINEXTENTS 1 MAXEXTENTS 51
PCTINCREASE 0);
REM *** Create tablespace for data in transaction tables ***
ATTACHMENT 9: SYS_MOD.sql
REM Script for Altering SYS and SYSTEM Users
REM Script File: My_Directory\SYS_MOD.sql
REM Spool File: My_Directory\SYS_MOD.log
REM Author: NAME
REM Date Created: DATE
REM Purpose: Alter SYS and SYSTEM users for their default tablespaces
SPOOL /users/oracle/ABCD/SYS_MOD.log
ALTER USER SYS TEMPORARY TABLESPACE TEMP;
ALTER USER SYSTEM
QUOTA UNLIMITED ON TOOLS
QUOTA 0 ON SYSTEM
DEFAULT TABLESPACE TOOLS
TEMPORARY TABLESPACE TEMP;
SPOOL OFF
MAXTRANS value
STORAGE (INITIAL value NEXT value PCTINCREASE value
MAXEXTENTS value MINEXTENTS value )
TABLESPACE tablespace_name;
REM Create other objects such as triggers, functions, procedures,
REM packages after running the above script.
partition_id numeric,
other long);
Outline
C1. Preparation
C2. Locating the Components
C3. Removing the Components
C4. Verification
C1. Preparation
In order to drop an instance and its associated database it is necessary to remove these
components:
(a) Data files, log files, and control files constituting the database;
(b) Parameter file init<SID>.ora and configuration file config<SID>.ora;
(c) Destination files for bdump, cdump, udump, archive, and audit;
(d) Database links referencing the database;
(e) References to ORACLE_SID for the instance in the ORATAB file;
(f) References to the database in the files TNSNAMES.ora and LISTENER.ora, and
(g) References to the background processes.
Section C2 contains the steps along with SQL scripts to identify and locate the above
components so that they can be subsequently removed from the system. The database
must be running to allow extracting the relevant location information. Therefore, if nec-
essary, run the following command from Server Manager after connecting as "connect
internal".
startup pfile='pfile_path'
Also, it is recommended that you make a full cold backup of the database.
Locate Data Files, Log Files, and Control Files Constituting the Database
Locate the Destination Files for bdump, cdump, udump, archive, and audit
SVRMGR> show parameters dest
NAME TYPE VALUE
----------------------- ------ ----------------------
audit_file_dest string ?/rdbms/audit
background_dump_dest string /home/oracle/admin/ABCD/bdump
core_dump_dest string /home/oracle/admin/ABCD/cdump
log_archive_dest string /home/oracle/oradata/ABCD/admi
log_archive_dest_1 string
log_archive_dest_2 string
log_archive_dest_3 string
log_archive_dest_4 string
log_archive_dest_5 string
log_archive_dest_state_1 string enable
log_archive_dest_state_2 string enable
log_archive_dest_state_3 string enable
log_archive_dest_state_4 string enable
log_archive_dest_state_5 string enable
log_archive_duplex_dest string
log_archive_min_succeed_dest integer 1
standby_archive_dest string ?/dbs/arch
user_dump_dest string /home/oracle/admin/ABCD/udump
Here process_ID refers to the individual process numbers listed above. Repeat the
"kill -9" command for each process.
(f) Remove all the directories recursively that are related to the instance(s) using this
database. But if any other instances exist that refer to the files in these directories,
then do not proceed with this step.
C4. Verification
By removing all the files that are associated with the database we have eliminated all ref-
erences to the instance and freed up all resources tied to this database and all the in-
stances that reference it. Also, all the storage space has been reclaimed. To be on the safe
side, it is better to try starting up the instance with the init<SID>.ora file that was deleted
in Step C3(a). Oracle should return an error message that the instance cannot be started.
But if the instance starts even partially, then the steps described in Sections C2 and C3
should be rechecked and repeated until the instance and its associated database are totally
removed.
Appendix D
Database Refresh with
Transportable Tablespaces
Outline
D1. Database Refresh Process
D2. Detailed Methodology with Scripts
D3. Time Estimates
D4. Internal Inconsistency
Key Words
belonging to the set cannot reside on a tablespace outside that set. Oracle 8i provides a
PL/SQL procedure to test whether a set of tablespaces is self-contained.
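A hedged sketch of that check for the tablespace set used later in this appendix is shown below; DBMS_TTS.TRANSPORT_SET_CHECK and the TRANSPORT_SET_VIOLATIONS view are the standard Oracle 8i interfaces for this test. If the query returns no rows, the set is self-contained.

EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK ('DATA01,DATA02,INDEX01,INDEX02', TRUE);
SELECT * FROM transport_set_violations;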
The set of transportable tablespaces used in the refresh must satisfy the following
conditions.
• They must be identical in name and number on both the source and the destination
databases.
• The source and the destination database must be on the same hardware platform. For
example, one can transport a tablespace from a Sun Solaris platform to another Sun
Solaris platform, but not to an HP-UX platform.
• The source and the destination database must have the same block size; i.e., the ini-
tialization parameter db_block_size must be identical in value on both the source and
the destination databases.
• A tablespace on the source database cannot be transported to a destination database if
the latter already contains a tablespace with the same name.
Conceptually, the refresh process consists of the following steps.
• Identify a set of self-contained tablespaces in the source database that will be used in
the refresh process.
• Export the metadata, i.e., the structural information, of these tablespaces to an export
dump file.
• Copy the disk files comprising these tablespaces to the destination database file sys-
tem.
• Import the metadata into the destination database so as to be plugged into the copied
datafiles.
If, on the other hand, S is not self-contained, then Oracle displays a set of violations
as shown in Figure D.2. In that case, S cannot be used as a set of transportable table-
spaces for the database refresh.
(b) Assuming S to be self-contained, put the four tablespaces in READ ONLY mode as
follows.
alter tablespace DATA01 read only;
alter tablespace DATA02 read only;
alter tablespace INDEX01 read only;
alter tablespace INDEX02 read only;
Verify the altered status of the tablespaces by running the following command.
Select tablespace_name, status from dba_tablespaces
order by 1;
The four tablespaces, DATA01, DATA02, INDEX01, and INDEX02, will have their
status as READ ONLY. The segments in the above four tablespaces will be available
only for data retrieval until the tablespaces are altered again to the READ WRITE
mode.
(c) Use a copy utility at the operating system level to copy all the datafiles comprising
these four tablespaces of X to appropriate locations on Y. For example, under UNIX
one can use “cp” to copy files on the same server or “ftp” to copy files to a different
server. Ensure that there is enough space on Y to hold the copied files. If such space
is not available, proceed as follows.
• Execute Steps (a) and (b) of Section D2.2 (see below) so that the datafiles of Y
corresponding to the four tablespaces, DATA01, DATA02, INDEX01, and
INDEX02, are no longer in use. The space occupied by them now becomes
available.
• Copy the datafiles on X to their respective locations on Y that have just been
freed up.
After the copying is finished, verify that the size of each copied file exactly matches
the size of its source file. If a copied file is smaller than its source, then the copying
did not complete successfully. In that case, repeat the copying process for the file
until the size of the copied file equals that of the source file.
(d) Export the metadata of the four tablespaces using Oracle’s export utility for
transportable tablespaces.
exp parfile=exp_transp_tblspc.txt
Figure D.3 contains the code of the export parameter file (“parfile”) for performing
the export.
userid='sys as sysdba'
TRANSPORT_TABLESPACE=Y
TABLESPACES=DATA01,DATA02,INDEX01,INDEX02
file=X_tblspc.dmp
log=X_tblspc.log
Oracle prompts for the password when the export is run. Enter “sys as sysdba” in re-
sponse.
During this stage the tables and the tablespaces of Y matching the set S are dropped, the
metadata of the tablespaces are imported from the export dump file to Y, and each table-
space is matched with its disk file(s) already copied from X.
(a) Drop all tables cascading their constraints from the tablespaces DATA01 and
DATA02 on Y by running a series of statements of the form
drop table table_name cascade constraints;
for every table belonging to the tablespaces DATA01 and DATA02.
(b) Drop the four tablespaces on Y including their contents by running the following se-
ries of commands.
drop tablespace DATA01 including contents;
drop tablespace DATA02 including contents;
drop tablespace INDEX01 including contents;
drop tablespace INDEX02 including contents;
(c) Stop the instance by issuing the following command.
SVRMGR> shutdown immediate
(d) Start the instance by issuing the following command.
SVRMGR> startup pfile=’pfile_path’
The database Y no longer contains the four tablespaces DATA01, DATA02,
INDEX01, and INDEX02. Hence their metadata can be imported from the export
dump file created in Step (d), Section D2.1.
(e) Import the metadata of the four tablespaces using Oracle’s import utility for trans-
portable tablespaces.
imp parfile=imp_transp_tblspc.txt
Figure D.4 contains the code of the import parameter file (“parfile”) for performing
the import.
userid='sys as sysdba'
TRANSPORT_TABLESPACE=Y
TABLESPACES=DATA01,DATA02,INDEX01,INDEX02
DATAFILES=('copied disk file(s) for DATA01', 'copied disk file(s) for DATA02',
'copied disk file(s) for INDEX01', 'copied disk file(s) for INDEX02')
file=X_tblspc.dmp
log=Y_tblspc.log
The string 'copied disk file(s) for DATA01' stands for the full absolute path on Y of
each datafile copied from X comprising DATA01. A similar interpretation applies to
DATA02, INDEX01, and INDEX02.
(f) Put the four tablespaces DATA01, DATA02, INDEX01, and INDEX02 on X back to
READ WRITE mode by issuing the following commands:
alter tablespace DATA01 READ WRITE;
alter tablespace DATA02 READ WRITE;
alter tablespace INDEX01 READ WRITE;
alter tablespace INDEX02 READ WRITE;
Verify the altered status of these tablespaces by issuing the following command.
select tablespace_name, status from dba_tablespaces
order by 1;
Check that the four tablespaces DATA01, DATA02, INDEX01, and INDEX02 have
their status changed to ONLINE.
Including preparation and ancillary tasks the total refresh procedure for a database of
size 60+ GB takes about 6 hours. During this time the source and the destination data-
bases remain unavailable except that the source database can be accessed for data re-
trieval only. This does not always sit well with the business users. As a compromise, the
DBA can put the tablespaces on the source database X to READ WRITE mode after exe-
cuting Step (d), Section D2.1. Then, Step (f), Section D2.2 is not needed. X becomes
fully available to the users after about 4 1/2 hours instead of 6 hours. But this shortcut
may cause a problem as discussed in Section D4 below.
the time stamps of their copies made earlier on Y. In fact, the timestamps of the datafiles
on Y will be less than the new timestamps of the datafiles on X. When Step (e), Section
D2.2 is run with the new export dump file, Oracle detects the violation of internal con-
sistency due to the timestamp mismatch and returns the following error message.
IMP-00017: following statement failed with ORACLE error 19722:
“BEGIN
sys.dbms_plugts.checkDatafile(NULL,2385892692,6,256000,4,6,0,0,1970”
“6,72926999,1,NULL, NULL, NULL, NULL); END;”
IMP-00003: ORACLE error 19722 encountered
ORA-19722: datafile DATA01_a.dbf is an incorrect version
ORA-06512: at “SYS.DBMS_PLUGTS”, line 1594
ORA-06512: at line 1
IMP-00000: import terminated unsuccessfully
Figure D.5 shows the violation of internal consistency graphically. We assume that at
instant t3.1, where t3 < t3.1 < t4, the four tablespaces on X are put back into READ WRITE
mode. Then at instant t8.1, where t8 < t8.1, the export causes internal inconsistency.
[Figure D.5: Timeline of the refresh instants t1 through t8, with t8 marking the physical import into Y.]
Key Words
export parameter file
import parameter file
self-contained tablespaces
transportable tablespaces
Appendix E
Mathematical Foundation of
Relational Databases
Outline
E1. Relational Database Systems Foundation Pillars
E2. Relation
E3. Functional Dependency
E4. Query Languages
E5. Relational Algebra: Prescriptive Query Languages
E6. Primitive and Derived Operations
E7. Closure Property for Relational Algebra
E8. Relational Calculus: Descriptive Query Languages
E9. Tuple Relational Calculus
E10. Domain Relational Calculus
E11. Equivalence Theorem for Algebra and Calculus
E12. Data Structures for Search Algorithms
E13. Linear Linked List
E14. Search Tree
E15. Hash Table
E16. Performance Metrics
Key Words
References and Further Reading
SQL is based primarily on relational algebra, but it includes the EXISTS command, which is based on relational calculus. The three data structures used for designing efficient search algorithms are described with some discussion of search times in each case. Search algorithms constitute an active research area of theoretical computer science, a field that is itself rooted in mathematics.
E2. Relation
The Cartesian product of n sets A1, . . . , An is written as A1 × … × An and is defined as a
set of n-tuples (a1, a2, …, an) such that ai ∈ Ai, i = 1, 2, . . . , n. We use the notation
A1 × … × An = {(a1, a2, …, an) | ai ∈ Ai, i = 1, 2, . . . , n}.
An n-ary relation R(A1, . . . , An) defined on n sets A1, . . . , An is a subset of the Cartesian
product A1 × … × An. Each ai is called an attribute of R defined on the domain Ai, i =
1, 2, . . . , n. The integer n is called the degree or arity of the relation R.
Example
Consider the table CUSTOMER defined below.
Cust_ID NUMBER(5),
Name VARCHAR2 (20),
Balance NUMBER (10,2),
Credit_Status CHAR (10)
CUSTOMER is a quaternary or 4-ary relation with four attributes, Cust_ID, Name,
Balance, and Credit_Status. The respective domains of these attributes are given by:
Cust_ID {5-digit numbers}
Name {character strings of variable length up to 20}
Balance {10-digit numbers with two digits to the right
of the decimal}
Credit_Status {character strings of fixed length 10}
Obviously, none of the columns assumes all possible values from its domain.
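For reference, a minimal sketch of the Oracle DDL behind this example is given below; storage and constraint clauses are omitted.

CREATE TABLE CUSTOMER
   (Cust_ID        NUMBER(5),
    Name           VARCHAR2(20),
    Balance        NUMBER(10,2),
    Credit_Status  CHAR(10));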
Given a set S of functional dependencies, it is possible to determine the entire closure CL (S) of S. Korth and Silberschatz [3, pp. 187 and 223] have given two algorithms in pseudo Pascal language to compute CL (S) from a given S.
We pursue further this topic in Section E11 after discussing the relational algebra and
calculus.
Two relations R1 and R2 are said to be union compatible if R1 and R2 have the same de-
gree, say n, and the kth attribute of R1 has the same domain as the kth attribute of R2,
where k = 1, . . . , n. However, the corresponding attributes need not have the same name.
Let f1, …, fn be n distinct operations on a set S and let fi1, …, fik, k < n, be a proper sub-
set of these n operations. If each of the remaining n – k operations of the total set of n op-
erations can be written in terms of the k operations fi1, …, fik, then we say that these k op-
erations are primitive, and the remaining n – k are derived.
Two sets S1 and S2 are isomorphic if there exists a one-to-one and onto mapping be-
tween the elements of S1 and S2. Two finite sets with the same number of elements are
always isomorphic. An example of two isomorphic infinite sets is the following.
S1 = {n | n is a positive integer}; S2 = {n2 | n is a positive integer}.
The mapping n ∈ S1 ↔ n2 ∈ S2 establishes the isomorphism between S1 and S2.
We now describe the four set-theoretic operations below.
Union
Given two union compatible relations R1 and R2 of degree n each, the union of R1 and R2
is a relation of degree n, which is written as R1 ∪ R2 and is defined as follows.
R1 ∪ R2 = {r | r is an n-tuple, r ∈ R1 or r ∈ R2 or r ∈ (both R1 and R2)}.
Example: Let R1 = {all rows in CUSTOMER | Balance > 10,000}
R2 = {all rows in CUSTOMER | Credit_Status = “Excellent”}
Then, R1 ∪ R2 consists of all rows in CUSTOMER for which
Balance > 10,000 or Credit_Status is “Excellent” or both.
Although defined as a binary operation, union can be extended to m (>2) union compati-
ble relations as follows:
R1 ∪ R2 ∪ … ∪ Rm = {r | r is an n-tuple, r ∈ Ri for at least one i = 1, 2, . . . , m}
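In SQL the union of the two example relations can be sketched as follows; UNION removes duplicate rows, as set union requires.

SELECT * FROM CUSTOMER WHERE Balance > 10000
UNION
SELECT * FROM CUSTOMER WHERE Credit_Status = 'Excellent';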
Intersection
Given two union compatible relations R1 and R2 of degree n each, the intersection of R1
and R2 is a relation of degree n, which is written as R1 ∩ R2 and is defined as follows.
R1 ∩ R2 = {r | r is an n-tuple, r ∈ R1 and r ∈ R2}
Example: Let R1 = {all rows in CUSTOMER | Balance > 10,000}
R2 = {all rows in CUSTOMER | Credit_Status = “Excellent”}
Then, R1 ∩ R2 consists of all rows in CUSTOMER for which both
Balance > 10,000 and Credit_Status is “Excellent”.
Although defined as a binary operation, intersection can be extended to m (> 2) union
compatible relations as follows.
R1 ∩ R2 ∩ … ∩ Rm = {r | r is an n-tuple, r ∈ Ri for all i = 1, 2, . . . , m}.
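The intersection of the same two example relations can be sketched in SQL with the INTERSECT operator.

SELECT * FROM CUSTOMER WHERE Balance > 10000
INTERSECT
SELECT * FROM CUSTOMER WHERE Credit_Status = 'Excellent';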
Difference
Given two union compatible relations R1 and R2 of degree n each, the difference of R1 and
R2 is a relation of degree n, which is written as R1 – R2 and is defined as follows:
R1 – R2 = {r | r is an n-tuple, r ∈ R1 but r ∉ R2}
Example: Let R1 = {all rows in CUSTOMER | Balance > 10,000}
R2 = {all rows in CUSTOMER | Credit_Status = ‘Excellent’}
Then, R1 – R2 consists of all rows in CUSTOMER for which
Balance > 10,000 but Credit_Status is not “Excellent”.
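In Oracle SQL the difference is available as the MINUS operator; a sketch for the example above:

SELECT * FROM CUSTOMER WHERE Balance > 10000
MINUS
SELECT * FROM CUSTOMER WHERE Credit_Status = 'Excellent';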
Cartesian Product
Given two relations R1 and R2 of degree m and n respectively, the Cartesian product of R1
and R2 is a relation of degree m + n, which is written as R1 × R2 and is defined as follows.
R1 × R2 = {r | r is an (m + n)-tuple of the form (a1, …, am, b1, …, bn) such
that (a1, …, am) ∈ R1 and (b1, …, bn) ∈ R2}
If R1 and R2 have respectively p and q rows, then R1 × R2 has pq rows.
Example: Let R1 = CUSTOMER, as defined above
R2 = ORDER (Order_ID, Order_Amount, Order_Date).
Then, R1 × R2 is a relation of degree 7 with the structure:
R1 × R2 = (Cust_ID, Name, Balance, Credit_Status, Order_ID,
Order_Amount, Order_Date).
Every tuple of CUSTOMER is concatenated with every tuple of
ORDER to produce the Cartesian product CUSTOMER × ORDER.
If CUSTOMER has 100 rows, say, and ORDER has 1,000 rows,
then CUSTOMER × ORDER has 100,000 rows.
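In SQL a Cartesian product results from listing both tables in the FROM clause with no join condition. In the sketch below the second table is called ORDERS because ORDER is a reserved word in SQL.

SELECT * FROM CUSTOMER, ORDERS;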
Selection
Let R be a relation of degree n, a1 and a2 be two attributes of R, and Θ be a relational operator; i.e., Θ takes any one of six possible values, <, ≤, >, ≥, =, ≠. Then, the Θ-selection of R on a1 and a2 is defined to be the following set of tuples t of R: {t | t ∈ R, and a1 Θ a2 holds}.
In particular, a2 can be a constant c, say. In that case, the Θ-selection is defined by
{t | t ∈ R, and a1 Θ c holds}.
The Θ-selection of R is a unary operation and returns a horizontal subset of R, i.e., the set of all tuples of R that are related by the relational operator Θ. Hence the Θ-selection is implemented by a WHERE clause in SQL such as (R WHERE a1 Θ a2). The expression a1 Θ a2 is called the predicate of the Θ-selection. The predicate can be extended from a single relational operator Θ to a Boolean combination of such operators by using the three primitive Boolean operations, conjunction (Λ), disjunction (V), and negation (∼).
These compound predicates are defined as follows in terms of the Θ-selection and the
three set-theoretic operations, union, intersection, and difference.
• R WHERE p1 Λ p2 ≡ (R WHERE p1) ∩ (R WHERE p2)
• R WHERE p1 V p2 ≡ (R WHERE p1) ∪ (R WHERE p2)
• R WHERE ∼ p ≡ R − (R WHERE p)
Note that p, p1, and p2 are simple predicates of the form a1 Θ a2. Also, the binary combinations p1 Λ p2 and p1 V p2 can be extended to k-ary combinations such as p1 Λ p2 Λ … Λ pk and p1 V p2 V … V pk respectively, since union and intersection can be so extended.
The relational algebraic operation of Θ-selection is different from the command
SELECT of SQL. SELECT is a much more powerful operation than the Θ-selection and
is, in fact, used to implement the relational algebraic operations.
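A Θ-selection with a compound predicate maps directly onto a WHERE clause; a sketch on the CUSTOMER example showing conjunction and negation:

SELECT *
FROM   CUSTOMER
WHERE  Balance > 10000
AND    NOT (Credit_Status = 'Excellent');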
Projection
Let R be a relation of degree n with attributes a1, . . . , an. Let ai1, . . . , aik, k < n, be a subset of these n attributes. The k-projection of R on the k attributes ai1, . . . , aik is written as R[ai1, . . . , aik] and is defined by
R[ai1, . . . , aik] = {t | t ∈ R, t = (t1, . . . , tk) such that tj is the value assumed by t for the attribute aij, j = 1, . . . , k}
Clearly, R[ai1, . . . , aik] is a vertical subset of R. The projection is a unary operation.
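In SQL a projection is a SELECT of the chosen columns; DISTINCT is needed because a projection, being a set, contains no duplicate tuples. A sketch on CUSTOMER:

SELECT DISTINCT Name, Credit_Status FROM CUSTOMER;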
Join
Let R1 and R2 be two relations of degree m and n respectively and let Θ be a relational
operator, i.e., Θ = (<, ≤, >, ≥, =, ≠). If x and y are two attributes of R1 and R2 respectively,
then the Θ-join of the relation R1 on x with the relation R2 on y is defined as follows:
{t | t is a concatenation of tuples t1 ∈ R1 and t2 ∈ R2 such that t1.x Θ t2.y holds}
The Θ-join is a relation of degree m + n. If R1 has p rows and R2 has q rows, then their
Θ-join will have at most pq rows, because the join condition will eliminate those rows
that do not satisfy the condition t1.x Θ t2.y. We can express a Θ-join in terms of a Carte-
sian product and a Θ-selection as follows.
1. Let R = R1 × R2 so that R is of degree m + n.
2. Perform the Θ-selection of R such that R.x Θ R.y holds.
We can define six possible types of Θ-join corresponding to the six different values of
the operator Θ. If Θ is the equality operator, the Θ-join is called an equijoin. An equijoin
by definition has two identical columns since R has the condition t1.x = t2.y. Therefore,
we take a projection of an equijoin to remove one of the two identical columns. The re-
sulting relation is called a natural join, or simply a join. Therefore, the algorithm to com-
pute a join of R1 on x with R2 on y consists of Steps (1) and (2) above along with Step (3)
stated below:
3. Perform the projection of R on all its attributes except R.x or R.y.
We note that when R1 and R2 are of degree m and n respectively, the degree of their
join is at most m + n – 1.
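A sketch of an equijoin in SQL, assuming for illustration that the ORDERS table also carries a Cust_ID column (such a column is not part of the ORDER definition given earlier):

SELECT c.Cust_ID, c.Name, o.Order_ID, o.Order_Amount
FROM   CUSTOMER c, ORDERS o
WHERE  c.Cust_ID = o.Cust_ID;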
Division
The division operation divides a dividend relation R of degree m + n by a divisor relation
S of degree n to yield a quotient relation R / S of degree m. The (m + i)th attribute of R
and the ith attribute of S must be defined on the same domain for i = 1, . . . , n. Let us
write the tuples r and s of R and S as follows.
R = {r | r = (r1, . . . , rm, rm+1, . . . , rm+n)}, S = {s | s = (s1, . . . , sn)}.
Then the quotient R / S consists of tuples t such that the following condition holds:
R / S = {t | t = (t1, . . . , tm), and (t1, . . . , tm, s1, . . . , sn) ∈ R for all tuples (s1, . . . , sn) ∈ S}.
The notation resembles the arithmetical notation of multiplication and division. For example, if A, B, and C are real numbers and C = A / B, then we write A = B × C. Analogously, if T = R / S is the quotient relation, then the Cartesian product T × S is contained in R, i.e., T × S ⊆ R.
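SQL has no direct division operator; division is usually coded as a doubly nested NOT EXISTS. A sketch with two hypothetical tables R (Cust_ID, Product_ID) and S (Product_ID), returning the customers who ordered every product listed in S:

SELECT DISTINCT r.Cust_ID
FROM   R r
WHERE  NOT EXISTS
       (SELECT 'x' FROM S s
        WHERE NOT EXISTS
              (SELECT 'x' FROM R r2
               WHERE  r2.Cust_ID    = r.Cust_ID
               AND    r2.Product_ID = s.Product_ID));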
This completes the proof that Γ and, therefore, Ω are closed with respect to the opera-
tions of relational algebra.
AND (Λ):
p   q   p Λ q
T   T   T
T   F   F
F   T   F
F   F   F
OR (V):
p   q   p V q
T   T   T
T   F   T
F   T   T
F   F   F
Example: “For any two numbers a and b with a < b, there always exists at least one
number c between them” can be written as
(∀ a, b) ((a < b) ⇒ (∃ c) ((a < c) Λ (c < b)))
De Morgan’s laws connect conjunction, disjunction, and negation as follows.
∼(p Λ q) means ∼p V ∼q
∼(p V q) means ∼p Λ ∼q
The universal and the existential quantifiers are connected as follows.
∼(∀x P(x)) means ∃y (∼P(y))
∼(∃x P(x)) means ∀y (∼P(y)),
where P(x) and P(y) are predicates with arguments x and y respectively.
Atom
An atom is of three types:
(a) R (t), as defined above, to mean that t is a tuple of the relation R.
(b) s (i) Θ u (j), where s and u are tuple variables and Θ is a relational operator, to mean
that the ith component of s is related to the jth component of u by the operator Θ. For
example, s (3) > u (5) means that the third component of s is greater than the fifth
component of u.
(c) s (i) Θ c, where c is a constant and s (i) is defined as in type (b) above, to mean that
the ith component of s is related to the constant c by the operator Θ. For example, s
(2) != 6 means that the second component of s is not equal to 6.
Formula
A formula is defined recursively as follows.
(a) Every atom is a formula. All occurrences of tuple variables mentioned in the atom
are free in the formula.
(b) If ψ1 and ψ2 are formulas, so are ψ1 Λ ψ2, ψ1 V ψ2, and ∼ψ1. Occurrences of tuple variables are free or bound in ψ1 Λ ψ2, ψ1 V ψ2, and ∼ψ1 according to whether they are free or bound in the individual components ψ1 and ψ2.
(c) If ψ is a formula, then so is (∃ s | ψ (s)). The formula means that there exists a value
of s such that when we substitute this value for all free occurrences of s in ψ, the
formula becomes true. Occurrences of s that are free in ψ are bound to ∃ s in (∃s | ψ
(s)).
(d) If ψ is a formula, then so is (∀s | ψ (s)). The formula means that whatever value we
substitute for all free occurrences of s in ψ, the formula becomes true. Occurrences of
s that are free in ψ are bound to ∀ s in (∀s | ψ (s)).
(e) Parentheses may be placed around formulas, as needed, to override the default order
of precedence. The default order is: relational operators (<, ≤, >, ≥, =, and !=) the
highest, the quantifiers ∀ and ∃ the next, and then the three operators ∼, Λ, and V, in
that order.
(f) Nothing else is a formula.
R (r1, . . . , rn) asserts that values must be selected for the domain variables r1, . . . , rn that make (r1, . . . , rn) a tuple in R. The atom r Θ s means that the values chosen for r and s must make r Θ s true.
Formulas in domain relational calculus use the five operators, ∼, Λ, V, ⇒, ⇔, and the
two quantifiers, ∃ and ∀. But in each case the arguments must be domain variables in-
stead of tuple variables. Free and bound variables and the scope of a bound variable are
defined the same way as in tuple relational calculus. A domain relational calculus expression is of the form {(r1, . . . , rn) | ψ (r1, . . . , rn)}, where ψ is a formula whose only free domain variables are the distinct variables r1, . . . , rn. The domain relational calculus expression {(r1, . . . , rn) | ψ (r1, . . . , rn)} is defined to be safe if the following three conditions hold.
(a) ψ (r1, . . . , rn) is true implies that each ri is in ∆ (ψ).
(b) If (∃ s)(ϕ (s)) is a subformula of ψ, then ϕ (s) is true implies that s ∈ ∆ (ϕ).
(c) If (∀ s)(ϕ (s)) is a subformula of ψ, then ϕ (s) is true implies that s ∈ ∆ (ϕ).
Relational algebra prescribes a sequence of operations, whereas relational calculus does not require such a sequential order of execution. Despite this difference, query languages based on algebra and calculus are equivalent in their expressive power.
Codd [1, pp. 78–85] first proposed the tuple relational calculus in a formulation
slightly different from that described in Section E9. His objective was to use the calculus
as a benchmark for evaluating query languages. In [1] he introduced a query language
named ALPHA, which was never implemented commercially, as an example of a query
language based on tuple relational calculus. He established that a query language that
does not at least have the expressive power of the safe formulas of tuple or domain rela-
tional calculus, or of the five primitive operations of relational algebra is inadequate for
retrieval and update operations of relational database systems. SQL, which is the ANSI
4GL for relational databases, is a hybrid of relational algebra and calculus. SQL is pre-
dominantly relational algebra based, but it uses the existential quantifier of relational cal-
culus in its command EXISTS.
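A sketch of the existential quantifier surfacing through EXISTS, again assuming a hypothetical Cust_ID column in an ORDERS table: the query returns the customers for whom at least one order exists.

SELECT c.Name
FROM   CUSTOMER c
WHERE  EXISTS
       (SELECT 'x' FROM ORDERS o WHERE o.Cust_ID = c.Cust_ID);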
The equivalence of relational algebra and calculus is proved mathematically as fol-
lows.
• If E is a relational algebra expression, then E can be written as a safe tuple relational
calculus expression E', say. The proof uses the principle of finite induction on the
number of relational algebra operations used in E.
• If E is a safe tuple relational calculus expression, then E can be written as a safe do-
main relational calculus expression.
• If E is a safe domain relational calculus expression, then E can be written as a rela-
tional algebra expression.
The above three theorems together prove that the three alternative formulations of
query languages are mathematically equivalent. A detailed formal proof can be found in
Mittra [4, Chapter 4], Ullman [5, Chapter 5], and Yang [8, Chapter 3]. Ullman [6, Vol-
ume 1, Chapter 4] provides several examples of query languages including SQL that are
based on algebra or calculus.
A query language is said to be relationally complete if it is at least as powerful in its
expressive capability as the tuple relational calculus. Due to the equivalence theorem dis-
cussed above, a query language can be proved to be relationally complete if it can emu-
late the five primitive operations of relational algebra. SQL can be proved to be relation-
ally complete by writing five SQL expressions to implement the five primitive operations
of relational algebra, as shown below.
Union:               SELECT * FROM R
                     UNION
                     SELECT * FROM S;
Difference:          SELECT * FROM R
                     WHERE NOT EXISTS
                     (SELECT * FROM S WHERE
                      all-columns-of R = all-columns-of S);
Cartesian Product:   SELECT * FROM R, S;
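The remaining two primitive operations can be sketched in the same style; the predicate and the column list are placeholders to be supplied by the query.

Selection:           SELECT * FROM R WHERE predicate;
Projection:          SELECT DISTINCT column-list FROM R;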
The advantage of a linear linked list is that the insertion of a new node or the deletion
of an existing node can be accomplished by merely resetting the pointer part of the af-
fected node.
Insertion
Suppose that a list L contains n nodes l1, . . . , ln and that we want to insert a new node l
between the nodes lj and lk. We proceed as follows (see Figure E.2).
• Replace the pointer part of node l with the pointer part of lj ;
• Replace the pointer part of node lj with the starting address of node l.
Deletion
Suppose that we want to delete a node lj currently placed between the nodes li and lk. We proceed as follows (see Figure E.3).
• Replace the pointer part of li with the starting address of lk.
Modification
A modification of a node l is implemented as a deletion of l followed by an insertion of
the modified data part of l.
Search Algorithm
A linear linked list can be searched to return nodes whose data parts match given selec-
tion criteria. The search consists of the following steps.
• Start with the first node of the list.
• Return the data part of each matching node.
• Continue the search until you reach the last node.
• Terminate the search.
Figure E.4 shows the flowchart of the search algorithm. Clearly it represents a se-
quential search.
Thus, if a relation is implemented as a linear linked list, the update, i.e., insertion, de-
letion, and modification, and retrieval transactions are implemented as outlined above.
The retrieval via a linear linked list involves a full table scan.
[Figure E.4: flowchart of the sequential search — start with the first node; if its data part matches the selection criteria, store the data in the display area; follow the pointer to the next node and repeat; terminate when a NULL pointer is reached.]
A B-tree of order n is a special type of tree that is always balanced. This means that the tree satisfies the following four conditions.
(a) The root is either a leaf or has at least two children.
(b) Every nonleaf node other than the root has between ⌈n/2⌉ and n children.
(c) A nonleaf node with k children contains k – 1 key values arranged in ascending order.
(d) All leaf nodes appear at the same level, i.e., every path from the root to a leaf has the same length.
Search Algorithm
To search for data indexed by a given key value in a B-tree of order n we proceed as fol-
lows.
• Start at the root of the tree.
• Branch down in at most n directions at each succeeding lower level until you reach
the leaves.
• Search each leaf node to find the pointer to the data indexed by the given key value.
If the data do not exist, a NULL pointer is returned.
Insertion
To insert a new key into a B-tree of order n we proceed as in the search process described
above. When we reach the leaf nodes, we insert the key into a leaf node so as to maintain
the order of key values in the leaf nodes. However, this process may pose a problem, be-
cause a leaf can contain at most n keys. If the target node is already full, the new key
cannot be inserted. In this case, we split the node into two nodes and adjust the pointers
from their parents. This process may lead to an increase in the height of the tree causing
an increase in the search time.
B-trees are widely used in implementing indices in a relational database. The index B-
tree is kept in auxiliary storage such as a disk file. During the execution of a database
transaction such as retrieval via indexed search or updating an indexed table the relevant
portion of the tree is brought into the memory. As we know, the disk access speed is sev-
eral orders of magnitude slower than accessing the memory. The order n of an index B-
tree is chosen such that a nonleaf node fits into one database block, which is always a
multiple of the operating system block size. Usually when n belongs to the range 32 ≤ n ≤
256, this condition is satisfied. The maximum number of elements that are stored in a leaf
node is chosen so that a full leaf fits into one database block. This means that a record
can be found via an indexed search in a very few disk accesses since the level of a B-tree
usually does not exceed 4. The root and possibly the first level of nodes can even be kept
in memory.
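In Oracle such a B-tree index is created with the CREATE INDEX statement; a minimal sketch on the CUSTOMER table (the index name is illustrative):

CREATE INDEX customer_name_idx ON CUSTOMER (Name);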
Steps (b) and (c) are now applied to λ to compute the bucket number of v.
When an integer v is divided by a modulus N, it leaves N possible remainders 0, 1, 2,
…, N – 1. The infinitely many integers of the form kN + r, where k is any integer, leave
the same remainder r when divided by N. The set { kN + r | k is any integer} is called an
equivalence class of integers modulo N. Any modulus N partitions the set of integers into
N disjoint equivalence classes of congruent elements such that any two elements of the
same class leave the same remainder when divided by the modulus. Each bucket in a hash
table is an equivalence class of key values. Since we can map a virtually endless set of
keys onto a finite number of buckets given by the modulus N, multiple keys can and do
map onto the same bucket. This leads to the problem of collision, which is resolved in
two ways:
• Open hashing (also called separate chaining), and
• Closed hashing (also called open addressing)
We now describe each method of resolution.
Open Hashing
For each bucket in the hash table we maintain a linear linked list of all keys that hash to
the same bucket. Let us suppose that
N = size of hash table = number of buckets in the hash table
r = bucket number, where r = 0, 1, 2, . . . , N – 1
Then, bucket r consists of the key value of r and a pointer to a linear linked list contain-
ing all key values of the type kN + r, where k is an integer. Figure E.6 shows an example
where N = 7, a prime number.
To search for a given key value v, say, in a hash table with modulus N, we proceed as
follows.
• Let r = v mod N
• Access bucket r for the key = v
• Search the linear linked list originating at r until you find v
• If a NULL pointer is returned, then v is not in the hash table
Closed Hashing
Open hashing has the disadvantage of requiring pointers and a second data structure,
namely, the linear linked list. This tends to slow down the search time slightly due to the
time required to allocate new cells. Closed hashing, which is also known as open ad-
dressing, is an alternative method that does not use the linked lists.
[Figure E.6: open hashing with modulus N = 7 — bucket 0 heads the chain 0, 7, 14, …; bucket 1 heads the chain 1, 8, 15, …; and so on up to bucket 6, which heads the chain 6, 13, 20, ….]
Let
N = size of the hash table, i.e., N = modulus
v = key value that must be stored in a bucket
H (v) = v mod (N), where H is the hash function, i.e., H (v) is the bucket
where v is hashed via the hash function of modulo arithmetic.
If the bucket H (v) is already occupied, a collision arises. Under closed hashing all the
remaining buckets located after the bucket H (v) are tried sequentially until an empty one
is found to place v. The search can wrap around the hash table if the end of the table is
reached. The mathematical procedure is given below.
Let
hi (v) = bucket to store v under closed hashing, where i = 0, 1, 2, . . . , N – 1.
Define
hi (v) = (H (v) + f (i)) mod (N), where i = 0, 1, 2, . . . , N – 1
Then, f (i) is called the probing function and is used for resolving the collision. We define
f (0) = 0. Buckets h0 (v), h1 (v), . . . , h N-1 (v) are tried until a free bucket is found for stor-
ing v.
Two forms of the probing function are commonly used:
• Linear probing where we set f (i) = i
• Quadratic probing where we set f (i) = i2
[Figure E.7: closed hashing with linear probing and modulus N = 17. Keys 2042, 36, and 1753 all hash to bucket 2; 123 hashes to bucket 4; 5 to bucket 5; 372 and 1222 hash to bucket 15; 50 hashes to bucket 16. The resulting table is shown below.]
Bucket   Key
0        1222
2        2042
3        36
4        123
5        5
6        1753
15       372
16       50
(Buckets 1 and 7 through 14 remain empty.)
Since bucket 6 is not occupied, key 1753 is placed there. Proceeding similarly we find
the following hash locations for the remaining keys: 372 → 15, 50 → 16, and 1222 → 0.
Figure E.7 shows the hash table configuration with all these keys. We note that the search
for key 1222 starts at the computed hash location of bucket 15, then goes to bucket 16,
and then wraps around the hash table to find the ultimate location of bucket 0 where the
key 1222 is stored.
Since in closed hashing all keys are ultimately stored inside the table instead of being
stored in separate linked lists, the size of the hash table must be sufficiently large. Nor-
mally, the size N is taken as at least double the number of key values. The linear probing
amounts to trying buckets sequentially with wraparound until an empty bucket is located.
As a result, blocks of occupied buckets start to form clusters. This phenomenon is known
as primary clustering. It means that it will take longer to find an empty bucket to place a
key value that hashes into a bucket in the cluster area. Two examples of primary cluster-
ing can be found in Figure E.7 at the buckets 2 and 15. Under linear probing the hash ta-
ble should not be allowed to get more than 70% full. Under quadratic probing that num-
ber dwindles to below 50%. In fact, the following theorem can be proved mathematically.
Under quadratic probing with a prime modulus a key value can always be inserted if
the table is at least half empty.
Rehashing
As a hash table gets too full, the execution time of storing a key gets longer and empty
buckets become harder to find. In this case, the following strategy known as rehashing is
adopted.
(a) Build another hash table that is at least double the size of the original table.
(b) Design a new hash function.
(c) Scan the entire original hash table to compute the new hash value of each key.
(d) Insert each key into the new table according to the value computed in Step (c).
According to the division remainder algorithm using modulo arithmetic the new hash
function is defined as follows. Let H (v) = v mod (p1), where p1 is a prime, be the original
hash function. We define the rehash function RH (v) as follows.
RH (v) = v mod (p2), where p2 is a prime > 2p1
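Oracle exposes hashing to the DBA through hash clusters. A minimal sketch of a single-table hash cluster keyed on Cust_ID is given below; the cluster name and the SIZE and HASHKEYS values are illustrative only and must be sized for the actual data.

CREATE CLUSTER customer_hash (Cust_ID NUMBER(5))
   SIZE 512 HASHKEYS 1000;

CREATE TABLE CUSTOMER_HASHED
   (Cust_ID        NUMBER(5),
    Name           VARCHAR2(20),
    Balance        NUMBER(10,2),
    Credit_Status  CHAR(10))
   CLUSTER customer_hash (Cust_ID);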
B-tree
Let the set of n elements be arranged in a B-tree of order m. Then the average search time
for an element is O (logm (n)). The same estimate applies to a B*-tree, which is a special
case of a B-tree.
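For example, with m = 256 and n = 1,000,000 the estimate gives log256 (1,000,000) ≈ 2.5, so a search touches only about three nodes.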
Hash Table
Given a hash table T with m buckets and n keys, we define the load factor µ of T as µ = n / m. Thus the load factor represents the average number of keys stored in each bucket. Note that µ can be less than, equal to, or greater than 1. It is assumed that µ remains fixed as both n and m → ∞, i.e., in real terms, both n and m grow arbitrarily large.
Since the address, i.e., the bucket number, of a key element in a hash table is com-
puted from the value of the key, the access time of any key is constant, i.e., O (1). Based
on this fact the following theorems hold.
1. In open hashing the search time for a key irrespective of whether the key exists in the
table is O (1 + µ), where µ is the load factor.
2. In closed hashing with µ < 1, the expected number of probes in an unsuccessful
search, i.e., where the key does not reside in the hash table, is at most 1 / (1 − µ).
3. In closed hashing with µ < 1, the expected number of probes in a successful search,
i.e., where the key resides in the hash table, is at most
(1 / µ) * loge (1 / (1 − µ)) + 1 / µ
Theorems (2) and (3) assume that the keys are uniformly distributed over the hash
table.
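For example, with a load factor of µ = 0.5, Theorem (2) bounds an unsuccessful search by 1 / (1 − 0.5) = 2 probes, and Theorem (3) bounds a successful search by 2 loge 2 + 2 ≈ 3.4 probes.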
Key Words
Armstrong's axioms
atom
B-tree
bound variable
bucket
Cartesian product
closed hashing
closure property
collision
conjunction
De Morgan's laws
difference
disjunction
division remainder algorithm
domain
domain relational calculus
equijoin
equivalence class
equivalence theorem
existential quantifier
formula
free variable
fully functionally dependent
function
functional dependency
generation of languages
hash function
hash table
intersection
isomorphic
A B
acceptance test, 6, 42, 52 B-tree, 47, 79, 472, 473, 479
access B*-tree, 37, 245–250, 252–254, 263, 279,
indexed, 232 281–282, 289, 312, 344, 346, 349, 357,
sequential, 232 473, 478, 479
access path, 33, 37–38, 231–233, 238, 264, B*-tree index, 245–247, 249–250, 253–254,
268, 294–295, 297, 301, 383 263, 279, 281–282, 312, 344, 346, 349,
advanced replication, 52 357
aggregation, 365, 374, 384 background process, 30, 55, 57–61, 63, 65,
alert log file, 69, 79, 80, 417, 418 69, 79, 106, 137, 138, 146, 156, 161,
alternate key, 8, 25 184, 200, 206, 212, 224–226, 356, 370,
ANALYZE, 35, 110–111, 152–153, 234–
236, 245, 262, 299, 315, 317–318, 323, balanced tree, see B-tree
337, 349, 350, 351, 384, 399 basic replication, 42, 52
AND logic, 279 before image, 115, 119, 134, 288, 410
application server, 7, 387, 389–390, 394, bind variable, 134, 149, 184, 186, 383
396–397, 399, 400 bitmap index, 118, 216, 248–253, 269, 279,
ARCH, 58, 63, 65–66, 79, 156, 161, 164– 312, 382, 383
166, 184, 212 BLEVEL, 247–248, 253, 279
ARCHIVELOG, 65–66, 68, 79–80, 161, bound variable, 464–465, 467, 479
164–165, 184 bucket, 261, 263, 279, 474–477, 479
archiving, 18, 25, 65, 165, 166, 422 buffer
Armstrong's axioms, 454, 479 dirty, 147, 156, 164, 200, 206, 208, 209
asymptotic growth curve, 311–312 free, 206, 208, 210
atom, 464–466, 479
atomic concept, 7–8, 25 C
attribute, 8–10, 25, 27, 296, 368, 432–433, cache get, 138, 185
452–454, 456, 458–459, 461 cache hit, 138, 140, 143, 145, 148, 185, 186,
AUTOTRACE, 34, 55, 263–264, 277–279, 338
282, 291, 298–299, 309, 312 cache memory, 38, 61
auxiliary storage, 36–38, 57–59, 64, 67, 82– cache miss, 138, 143–145, 148–149, 152,
83, 115, 168, 190, 224, 473 185–186, 205
cache reload, 138, 185 conjunction, 137, 297, 351, 454, 458, 462–
caching algorithm, 58, 185 464, 479
candidate key, 8, 25 constraint, 47, 49–51, 93, 134, 241–242,
canonical form, 33, 38, 231 246, 429
cardinality, 25, 235, 239, 248, 382, 431, contention, 49–50, 52, 64, 66, 113, 115,
453–454 127, 134, 165, 175, 178–179, 181,
Cartesian product, 377, 382–383, 384, 452, 183–186, 200–201, 210, 213–214,
456–457, 459–461, 465, 479 216–217, 225, 319, 333, 335, 357,
cartridge, 18, 392–397, 399 397
cascading, 14, 25, 445 control file, 65, 68, 79, 164, 211–212, 417,
CASE, 14, 21, 28, 30, 38–39, 43, 371 419, 435, 438
CGI, see Common Gateway Interface crontab, 123, 134
chaining, 20, 25, 31, 33, 35–38, 82, 110–
115, 134–136, 225, 333, 404, 474 D
CHECK (constraint), 13, 15, 21, 28, 51–52, data
93, 404, 429, 443 mining, 368, 370, 384, 385
checkpoint, 58, 63–66, 79–80, 156, 163– redundancy, 366, 384
165, 185, 210, 214, 219, 421, 423 replication, 20, 32, 38, 42, 52–53, 413
CKPT, 58, 63, 65–66, 79, 156, 163–164, retrieval, 6, 36, 47, 52, 60, 97, 230, 233,
185, 418 318, 444, 447
class, 6, 25, 118, 121, 139, 183, 198, 201, transfer rate, 37, 38
216, 474 update, 52
client-server application, 6, 25, 28 validation, 13, 19, 25, 28, 42, 367, 370–
client-server architecture, 30, 38, 388 371
closed hashing, 476, 477, 479 warehouse, 28, 46, 53, 239, 249, 361,
closure property, 461, 479 363–373, 375–376, 384–386
cluster, 74, 233, 246, 255–260, 271, 279, data block buffers, 60–61, 64, 77, 134, 137–
294, 302, 307, 312, 349, 477 138, 140, 142, 185, 226
cluster index, 258 data block header, 134, 185, 226
cluster join, 233, 279, 307, 312 data consistency, 41, 52
CLUSTERING_FACTOR, 247–248, 279 Data Definition Language, 4, 25, 38
collision, 474, 476, 479 data dictionary, 14, 24–25, 28, 30, 32, 35,
Common Gateway Interface, 388, 399 37–38, 48, 52, 57, 60, 62, 64, 70, 74,
component, 7–8, 22, 46, 48, 49, 58, 71, 136, 76, 79, 80, 82, 134, 154, 183, 185, 219,
231, 239–240, 246, 316, 356, 369, 388, 226, 230, 245, 261, 274, 293, 320,
395, 399, 403, 409, 464, 466–467 326–327, 333, 340, 348–351, 354, 370,
concatenated index, 238–239, 244, 246, 253, 371, 411, 416
263, 279, 295, 312, 346, 376–379 data integrity, 14–15, 20, 47, 52, 60
conceptual level, 1, 20, 25, 29–32, 34, 38, Data Manipulation Language, 4, 25
41–44, 46–49, 52–54, 82, 96, 224, 230, database transaction, 8, 25, 36, 60, 64, 68,
235, 240, 290–291, 316, 318, 323, 363, 79, 102, 115, 134, 146, 288, 388, 469,
385, 452, 454 473
datafile, 18, 38, 49, 65, 67, 79, 84–85, 89, entity, 8–9, 11–12, 26, 28, 39
104, 134, 160–161, 194, 222, 326, 327, equijoin, 279, 303, 307, 312, 383, 459, 479
333, 418, 436, 446, 448 equipartitioned, 50–51, 53
datamart, 368–369, 384, 385 equivalence class, 474, 479
DBWR, 58, 61, 64–65, 79, 147, 156, 159– equivalence theorem, 468, 479
160, 164, 185–186, 190, 206–207, 210, Erwin, 12, 21, 43, 371, 406
212, 216, 226, 396, 418 execute phase, 231–232, 279, 312
DDL, see Data Definition Language execution plan, 33–34, 38, 47, 55, 62, 148,
De Morgan's laws, 479 229, 231, 234–235, 263–264, 266–269,
decision support system, 46, 53, 239, 306, 274–275, 277, 279, 281–284, 291, 297,
364, 384–385 299, 309, 339, 340, 342, 372, 379, 381,
decision tree, 260, 279 383, 385
declarative constraint, 13, 26 existential quantifier, 462–463, 464, 468,
defragmentation, 97, 105, 107, 134–135, 479
326 EXPLAIN PLAN, 20, 34, 37, 55, 263–265,
denormalization, 13, 20, 26, 32, 34, 38, 41, 272, 274–275, 279–281, 284, 291, 298,
43–44, 46–47, 52–54, 290, 365, 384 312, 339, 377, 379, 381, 384, 420
dense index, 279 explicit cursor, 289–290, 312
descriptive query language, 453, 487 export, 100, 110, 113, 134–136, 375, 384,
dictionary cache, 62, 64, 134, 148, 154–155, 418, 424, 426, 442, 444–449
167, 185, 212, 219–220, 226 export parameter file, 444, 449
difference (relational algebraic operation), 6, extends, 42, 52, 116, 118, 121, 123
80, 102, 105, 136, 186, 202, 222, 289, extent
386, 388, 456–458, 460–461, 467, 479 initial, 16, 95, 96, 105, 326
dimension table, 367–372, 374, 376, 377, next, 16–17, 39, 85, 95–96, 99, 103, 104–
379, 382–385 105, 116, 326, 371, 404, 410
dimensional analysis, 368, 384 extent map,, 216
disjunction, 458, 462, 464, 479 external level, 20, 26, 30–34, 38–39, 55,
distributed database, 28, 32, 42, 52, 66, 231– 229–230, 239, 310, 315–317, 361, 363,
232 372, 399, 400
division remainder algorithm, 474, 478, 479 extraction programs, 365, 384
domain relational calculus, 231, 451, 455,
466–469, 479 F
driver, see driving table fact table, 367–370, 376–377, 379, 382–385
driving set, 43, 52 fetch phase, 232, 279, 312
driving table, 35, 38, 44, 53, 244, 279, 301, filtering condition, 291, 312
312 first normal form, 26
DSS, see decision support system, FK, 14–15, 19, 22–23, 26, 35, 240–244,
dynamic performance view, 57, 70, 116, 255, 376, 377, 379, 404, 443
179, 183, 190, 272, 279, 338, 370 foreign key, 13–14, 26, 34, 93, 95, 240
formula, 96, 131, 140, 151, 154, 212,
E 221, 234, 343, 344, 405, 464–467,
ending point (of a histogram), 261, 279 479
SGA, (continued), 317, 334–335, 356–358, TKPROF, 55, 263–264, 275–278, 280, 282,
370, 372, 384, 400, 416, 418, 421 291, 338
shared SQL pool, 60, 62, 77, 79, 137–138, trace file, 69, 80, 273, 275–276, 338, 417,
148, 168, 176, 185–186, 204–205, 226, 418, 422
372, 384, 400 Transaction Management, 387, 395–396,
sibling, 480 399–400
SMLC, see software maintenance life cycle transitive dependency, 13, 26–27, 44, 46,
SMON, 58, 64, 80, 106, 190, 200, 212, 409, 366, 384
413 transportable tablespaces, 401, 441–446,
software development life cycle, 3, 5, 26 449
software maintenance life cycle, 3, 5, 26 tree, 62, 79, 148, 245–246, 248–249, 253,
sort–merge join, 234, 280, 293, 296–297, 289, 469, 471–473, 480
304–305, 307, 312 tree structured index, 245
SQL*Loader, 19, 374–375, 384 trigger, 26
SQLTRACE, 55, 263–264, 272–275, 278, truth table, 462–463, 480
280, 282, 291, 312, 338 tuple relational calculus, 231, 451, 455,
star query, 295, 372, 376–377, 379–380, 464–468, 480
383–384 2NF, see second normal form
star query execution plan, 376–377, 384
star schema, 363, 367, 376–377, 382, 384 U
star transformation, 363, 372, 377, 382–384 UGA, 62, 80, 148, 185
starting point, 7, 10, 32, 187, 261, 280, 403 union compatible, 456–457, 480
stored procedure, 46, 53, 399 universal desktop, 389, 399
subtype, 10, 26, 28 universal quantifier, 462–463, 480
supertype, 10, 26, 28 unqualified query, 243
swap space, 169, 185, 397 update anomaly, 44, 53
swapping, 168–169, 173, 185, 306, 334, user object, 80, 94, 134
400 user supplied hash function, 259
system generated hash function, 259–260 UTLBSTAT, 55, 70, 74, 77, 189–195, 203–
System Global Area, see SGA 204, 221, 225–227
UTLESTAT, 20, 55, 70, 74, 77, 189–192,
T 195–204, 206, 226–227
tabular format, 267, 280
Θ-join, 459 V
theta-join, 480 V$ view, 24, 57, 68, 70, 74–76, 80, 82, 203,
Θ-selection, 458–459 225–226
thin client, 7, 388, 399 validation table, 14, 24, 26, 45, 427
third normal form, 11, 13, 26, 43, 46 vector-valued attribute, 9–10, 26
thrashing, 168–169, 185
3NF, see third normal form W
three-tier architecture, 387, 388 Waits, 116, 118–121, 125, 127, 132, 134,
throughput, 33, 160, 230, 232, 235, 280, 146, 179, 183–184, 194–198, 201, 203,
284, 293, 299, 310 210, 216–217, 226, 235
Warehouse Technology Initiative, 369, 384 wrinkle of time, 367, 374, 384, 386
Web browser, 7, 388–390, 392–394, 399– WTI, 369, 375, 384
400
Web server, 389, 396, 399 Z
World Wide Web, 388, 399 X$ table, 57, 74, 76, 80
wraps, 116, 118–119, 121, 123, 198, 477