
UNIT 1 EMERGING DATABASE MODELS, TECHNOLOGIES AND APPLICATIONS-I
Structure

1.0 Introduction
1.1 Objectives
1.2 Multimedia Database
    1.2.1 Factors Influencing the Growth of Multimedia Data
    1.2.2 Applications of Multimedia Database
    1.2.3 Contents of MMDB
    1.2.4 Designing MMDBs
    1.2.5 State of the Art of MMDBMS
1.3 Spatial Database and Geographic Information Systems
1.4 Genome Databases
    1.4.1 Genomics
    1.4.2 Gene Expression
    1.4.3 Proteomics
1.5 Knowledge Databases
    1.5.1 Deductive Databases
    1.5.2 Semantic Databases
1.6 Information Visualisation
1.7 Summary
1.8 Solutions/Answers

1.0 INTRODUCTION
Database technology has advanced from the relational model to distributed DBMSs and object-oriented databases. The technology has also advanced to support data formats using XML. In addition, data warehousing and data mining technology has become very popular in industry from the viewpoint of decision making and planning.

Database technology is also being used in advanced applications and technologies. Some of these new applications include multimedia-based database applications, geographic applications, genome databases, knowledge databases, spatial databases and many more. These applications require some additional features from the DBMS; as they are special in nature, they are categorised as emerging database technologies.

This unit provides a brief introduction to the database requirements of these newer applications.

1.1 OBJECTIVES
After going through this unit, you should be able to:

• define the requirements of multimedia database systems;
• identify the basic features of geographic databases;
• list the features of genome databases;
• differentiate various knowledge databases and their advantages; and
• define the terms information visualisation and spatial databases.

Emerging Trends and Example DBMS Architectures
1.2 MULTIMEDIA DATABASE
Multimedia and its applications have experienced tremendous growth. Multimedia
data is typically defined as containing digital images, audio, video, animation and
graphics along with the textual data. In the past decade, with the advances in network
technologies, the acquisition, creation, storage and processing of multimedia data and
its transmission over networks have grown tremendously.

A Multimedia Database Management System (MMDBMS) provides support for multimedia data types. It also provides facilities for traditional DBMS functions such as database creation, data modelling, data retrieval, data access and organisation, and data independence. With the rapid development of network technology and multimedia database systems, multimedia information exchange is becoming very important, and any such application requires the support of a strong multimedia database technology. Let us look at some of the factors that influence the growth of multimedia data.

1.2.1 Factors Influencing the Growth of Multimedia Data


(i) Technological Advancements

Some of the technological advances that have contributed to the growth of multimedia data are:

• computers, their computational power and availability,


• availability of high-resolution devices for the capture and display of multimedia
data (digital cameras, scanners, monitors, and printers),
• development of high-density storage devices, and
• integration of all such technologies through digital means.

(ii) High Speed Data Communication Networks and Software


Second, high-speed data communication networks are common these days. These networks not only support high bandwidth but are also more reliable and support digital data transfer. The World Wide Web has grown rapidly, and software for manipulating multimedia data is now widely available.

(iii) Applications
With the rapid growth of computing and communication technologies, many new applications have come to the forefront, and future applications will increasingly support everyday life with multimedia data. This trend is expected to continue in the days to come.

1.2.2 Applications of Multimedia Database


Multimedia data has some exciting features. It has been found to be more effective in the dissemination of information in science, engineering, medicine, modern biology and the social sciences. It also facilitates the development of new paradigms in distance learning, and in interactive personal and group entertainment.

Some of the typical applications of multimedia databases are:

• media commerce
• medical media databases
• bioinformatics
• ease of use of home media
• news and entertainment
• surveillance

• wearable computing
• management of meeting/presentation recordings
• biometrics (people identification using image, video and/or audio data).

The huge amount of data in different multimedia-related applications needs databases as the basic support mechanism. This is primarily because databases provide consistency, concurrency, integrity, security and availability of data. From a user's perspective, databases provide ease of data manipulation, and of the query and retrieval of meaningful, relevant information from a huge collection of stored data.

Multimedia Databases (MMDBs) must cope with the large volume of multimedia
data, being used in various software applications. Some such applications may include
digital multimedia libraries, art and entertainment, journalism and so on. Some of
these qualities of multimedia data like size, formats etc. have direct and indirect
influence on the design and development of a multimedia database.

Thus, an MMDB needs to provide the features of a traditional database as well as some new and enhanced functionality. It must provide a homogeneous framework for storing, processing, retrieving, transmitting and presenting a wide variety of media types available in a large variety of formats.

1.2.3 Contents of MMDB


A MMDB needs to manage the following different types of information with respect
to the multimedia data:
Media Data: This includes the media data in the form of images, audio and video. These are captured, digitised, processed, compressed and stored. Such data is the actual information that is to be stored.

Media Format Data: This data defines the format of the media data after the
acquisition, processing, and encoding phases. For example, such data may consist of
information about sampling rate, resolution, frame rate, encoding scheme etc. of
various media data.

Media Keyword Data: This contains the keywords describing the media data. For example, for a video, this might include the date, time and place of recording, the person who recorded it, the scene description, etc. This is also known as content description data.

Media Feature Data: This contains the features derived from the media data. A
feature characterises the contents of the media. For example, this could contain
information on the distribution of colours, the kinds of textures and the different
shapes present in an image. This is also referred to as content dependent data.

The last three types are known as meta data, as they describe several different aspects of the media data. The media keyword data and media feature data are used as indices for searching purposes. The media format data is used to present the retrieved information.
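The separation above, media data plus three layers of meta data, can be pictured as a single record type. The following Python sketch is purely illustrative; the class name, field names and sample values are our own, not drawn from any particular MMDBMS:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MediaRecord:
    """One multimedia object together with its three kinds of meta data."""
    media_data: bytes                     # the digitised image/audio/video itself
    format_data: Dict[str, str]           # e.g. resolution, frame rate, codec
    keyword_data: List[str]               # content description: date, place, scene
    feature_data: Dict[str, List[float]]  # derived features, e.g. a colour histogram

# A toy video record: the keyword data serves as a search index, while the
# format data tells the presentation layer how to play the retrieved clip.
clip = MediaRecord(
    media_data=b"...",  # placeholder for the raw encoded frames
    format_data={"codec": "H.264", "frame_rate": "30", "resolution": "1920x1080"},
    keyword_data=["2004-03-01", "Delhi", "convocation"],
    feature_data={"colour_histogram": [0.2, 0.5, 0.3]},
)
print("Delhi" in clip.keyword_data)
```

A real MMDBMS would, of course, store each of these parts in storage structures suited to it rather than in one in-memory object; the sketch only shows how the four kinds of information relate to a single media object.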

1.2.4 Designing MMDBs


The following characteristics of multimedia data have direct and indirect impacts on
the design of MMDBs:

• the huge size of MMDBs,


• temporal nature of the data,
• richness of content through media, and
• complexity of representation and subjective interpretation, especially from the viewpoint of the meta data.

Challenges in Designing of Multimedia Databases


The major challenges in designing multimedia databases are due to the requirements
they need to satisfy. Some of these requirements are:
1) The database should be able to manage different types of input, output, and
storage devices. For example, the data may be input from a number of devices
that could include scanners, digital camera for images, microphone, MIDI
devices for audio, video cameras. Typical output devices are high-resolution
monitors for images and video, and speakers for audio.
2) The database needs to handle a variety of data compression and storage formats. Please note that data encoding has a variety of formats, even within a single application. For example, in a medical application, the MRI image of the brain should be lossless, thus placing very stringent quality requirements on the coding technique, while X-ray images of bones can be coded with lossy techniques as the requirements are less stringent. Also, radiological image data, ECG data, other patient data, etc. have widely varying formats.
3) The database should be able to support different computing platforms and
operating systems. This is due to the fact that multimedia databases are huge
and support a large variety of users who may operate computers and devices
suited to their needs and tastes. However, all such users need the same kind of
user-level view of the database.
4) Such a database must integrate different data models. For example, the textual
and numeric data relating to a multimedia database may be best handled using a
relational database model, while linking such data with media data as well as
handling media data such as video documents are better done using an object-
oriented database model. So these two models need to co-exist in MMDBs.
5) These systems need to offer a variety of query systems for different kinds of
media. The query system should be easy-to-use, fast and deliver accurate
retrieval of information. The query for the same item sometimes is requested in
different forms. For example, a portion of interest in a video can be queried by
using either:
(a) a few sample video frames as an example
(b) a clip of the corresponding audio track or
(c) a textual description using keywords.

6) One of the main requirements for such a database would be to handle different kinds of indices. Multimedia data is inexact and subjective in nature; thus, the keyword-based indices and exact range searches used in traditional databases are ineffective here. For example, the retrieval of records of students based on enrolment number is precisely defined, but the retrieval of records of students having certain facial features from a database of facial images requires content-based queries and similarity-based retrievals. Thus, the multimedia database may require content-dependent keyword indices.
7) The Multimedia database requires developing measures of data similarity that
are closer to perceptual similarity. Such measures of similarity for different
media types need to be quantified and should correspond to perceptual
similarity. This will also help the search process.

8) Multimedia data is created all over the world, so a multimedia database may have distributed database features covering the entire world as the geographic area. Thus, the media data may reside in many different distributed storage locations.

9) Multimedia data may have to be delivered over available networks in real-time. Please note, in this context, that audio and video data is temporal in nature. For example, video frames need to be presented at the rate of about 30 frames/sec for smooth motion.

10) One important consideration with regard to multimedia is the need to synchronise multiple media types relating to one single multimedia object. Such media may be stored in different formats or on different devices, and may have different frame transfer rates.

Multimedia data is now being used in many database applications. Thus, multimedia
databases are required for efficient management and effective use of enormous
amounts of data.
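Requirements 6 and 7 above call for content-based, similarity-based retrieval rather than exact matching. A minimal sketch of the idea, assuming each image is described by a simple colour histogram (the file names and histogram values below are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """A crude stand-in for perceptual similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy image database: each image is indexed by a 3-bin colour histogram.
images = {
    "sunset.jpg":   [0.7, 0.2, 0.1],
    "forest.jpg":   [0.1, 0.8, 0.1],
    "portrait.jpg": [0.4, 0.3, 0.3],
}

# The query is itself a histogram, e.g. computed from an example image
# supplied by the user ("query by example").
query = [0.6, 0.3, 0.1]
ranked = sorted(images, key=lambda name: cosine_similarity(query, images[name]),
                reverse=True)
print(ranked[0])  # the most similar image, not an exact match
```

The point is that the answer is a ranking by closeness, not a yes/no match; production systems use far richer feature vectors and index structures, but the retrieval model is the same.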

1.2.5 State of the Art of MMDBMS


The first multimedia database system, ORION, was developed in 1987. The mid-1990s saw several commercial MMDBMSs implemented from scratch. Some of them were MediaDB (now MediaWay), JASMINE, and ITASCA (the commercial
successor of ORION). They were able to handle different kinds of data and support
mechanisms for querying, retrieving, inserting, and updating data. However, most of
these products are not on offer commercially and only some of them have adapted
themselves successfully to hardware, software and application changes.

Such software provides support for a wide variety of media types, specifically different media file formats such as image and video formats. These files need to be managed, segmented, linked and searched.

The later commercial systems handle multimedia content by providing complex object
types for various kinds of media. In such databases the object orientation provides the
facilities to define new data types and operations appropriate for the media, such as
video, image and audio. Therefore, broadly speaking, MMDBMSs are extensible Object-Relational DBMSs (ORDBMSs). The most advanced solutions presently include Oracle 10g, IBM DB2 and IBM Informix. These solutions propose broadly similar approaches for extending video search facilities with similarity-based techniques.

Some of the newer projects address the needs of applications for richer semantic
content. Most of them are based on the newer MPEG standards, MPEG-7 and MPEG-21.

MPEG-7
MPEG-7 is the ISO/IEC 15938 standard for multimedia descriptions, issued in 2002. It is an XML-based multimedia meta-data standard and describes the various elements of the multimedia processing cycle, from capture and analysis/filtering to delivery and interaction.

MPEG-21 is the ISO/IEC 21000 standard and is expected to define an open multimedia framework. The intent is that the framework will cover the entire multimedia content delivery chain, including content creation, production, delivery, presentation, etc.

Challenges for Multimedia Database Technologies: Multimedia technologies need to evolve further. Some of the challenges posed by multimedia database applications are:

• the applications utilising multimedia data are very diverse in nature; there is a need for the standardisation of such database technologies,
• technology is ever-changing, creating further hurdles for multimedia databases, and
• there is still a need to refine the algorithms used to represent multimedia information semantically; this also creates problems with respect to information interpretation and comparison.
Check Your Progress 1
1) What are the reasons for the growth of multimedia data?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) List four application areas of multimedia databases.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) What are the contents of multimedia database?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
4) List the challenges in designing multimedia databases.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

1.3 SPATIAL DATABASE AND GEOGRAPHIC INFORMATION SYSTEMS
A spatial database keeps track of an object in a multi-dimensional space. A spatial
database may be used to represent the map of a country along with the information
about railways, roads, irrigation facilities, and so on. Such applications are known as
Geographic Information Systems (GIS). Let us discuss GIS in this section.
The idea of a geographic database is to provide geographic information; such databases are therefore referred to as Geographic Information Systems (GIS). A GIS is basically a collection of geographic information about the world. This information is stored and analysed on the basis of data held in the GIS. The data in a GIS normally defines the physical properties of the geographic world, which include:
• spatial data such as political boundaries, maps, roads, railways, airways, rivers,
land elevation, climate etc.
• non-spatial data such as population, economic data etc.

But what are the Applications of the Geographic Databases?


The applications of the geographic databases can be categorised into three broad
categories. These are:

• Cartographic Applications: These applications revolve around the capture and analysis of cartographic information in a number of layers. Some of the basic applications in this category are the analysis of crop yields, irrigation facility planning, evaluation of land use, facility and landscape management, traffic monitoring systems, etc. Each application needs to store the kinds of data it requires in various layers containing different attributes. For example, irrigation facility management would require the study of the various irrigation sources, land use patterns, fertility of the land, soil characteristics, rainfall patterns, etc. Such data will also require that any changes in the patterns be recorded. This data may be useful for decision makers to ascertain and plan the sources, types and means of irrigation.

• 3-D Digital Modelling Applications: Such applications store information about the digital representation of the land, and the elevations of parts of the earth's surface at sample points. A surface model is then fitted using interpolation and visualisation techniques. Such models are very useful in earth-science-oriented studies, air and water pollution studies at various elevations, water resource management, etc. This application requires the data to be attribute based, just as in the previous category.

• Geographic Object Applications: The third kind of application stores additional information about various regions or objects. For example, you can store information about the changes in buildings and roads over a period of time in a geographic area. Such applications may include the economic analysis of various products and services, etc.

Requirements of a GIS
The data in GIS needs to be represented in graphical form. Such data would require
any of the following formats:

• Vector Data: Here, data is represented using geometric objects such as lines, squares, circles, etc. For example, you can represent a road using a sequence of line segments.

• Raster Data: Here, data is represented using an attribute value for each pixel or voxel (a three-dimensional point). Raster data can be used to represent three-dimensional elevation using a format termed the digital elevation format.

For object-related applications, a GIS may also include a temporal structure that records information about movement-related details such as traffic movement.
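The two graphical formats can be sketched in a few lines of Python; the coordinates and elevation values below are invented for illustration:

```python
# Vector form: a road as a polyline, i.e. a sequence of coordinate pairs.
road = [(0.0, 0.0), (1.0, 0.5), (2.5, 0.5)]

# Raster form: elevations sampled on a regular grid, one value per cell
# (a tiny digital-elevation grid; values in metres).
elevation = [
    [110, 112, 115],
    [111, 114, 118],
    [113, 117, 121],
]

# Vector data supports geometric queries directly, e.g. the road's length...
length = sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
             for (x1, y1), (x2, y2) in zip(road, road[1:]))
# ...while raster data supports per-cell attribute queries, e.g. the peak elevation.
highest = max(max(row) for row in elevation)
print(round(length, 3), highest)
```

An integrated GIS has to answer queries that mix both forms, for instance "what is the highest elevation within 1 km of this road", which is one reason neither representation alone suffices.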

A GIS must also support the analysis of data. Some of the sample data analysis
operations that may be needed for typical applications are:

• analysing soil erosion


• measurement of gradients
• computing shortest paths
• use of DSS with GIS etc.
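Computing shortest paths, one of the analysis operations listed above, is classically done with Dijkstra's algorithm over a road network stored as vector data. A minimal sketch, using a hypothetical network of junctions and distances in km:

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm over a road network {node: [(neighbour, km), ...]}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    # Walk the predecessor chain back from the destination.
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    return [src] + path[::-1], dist[dst]

roads = {
    "A": [("B", 4.0), ("C", 2.0)],
    "B": [("D", 5.0)],
    "C": [("B", 1.0), ("D", 8.0)],
}
print(shortest_path(roads, "A", "D"))  # (['A', 'C', 'B', 'D'], 8.0)
```

A production GIS would run this over millions of road segments with spatial indexing and turn restrictions, but the underlying graph search is the same.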

One of the key requirements of a GIS is to represent information in an integrated fashion, using both vector and raster data. In addition, it must take care of data in various temporal structures, making the data amenable to analysis.

Another issue is the capture of information in two-dimensional and three-dimensional space in digital form. The source data may be captured by a remote sensing satellite, and can then be further appended by ground surveys if the need arises. Pattern recognition is very important here for the capture and automation of information input.

Once the data is captured in a GIS, it may be processed through some special operations. Some such operations are:

• Interpolation for locating elevations at some intermediate points with reference to sample points.
• Some operations may be required for data enhancement, smoothing the data,
interpreting the terrain etc.
• Creating a proximity analysis to determine the distances among the areas of
interest.
• Performing image enhancement using image processing algorithms for the
raster data.
• Performing analysis related to networks of specific type like road network.
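The first operation, interpolating an elevation at an intermediate point from surrounding sample points, can be illustrated with bilinear interpolation over a raster grid (the sample elevations below are invented):

```python
def bilinear(grid, x, y):
    """Estimate the elevation at (x, y) from the four surrounding grid samples.
    grid[i][j] holds the sampled elevation at the integer point (x=j, y=i)."""
    i, j = int(y), int(x)
    fy, fx = y - i, x - j  # fractional position inside the cell
    top = grid[i][j] * (1 - fx) + grid[i][j + 1] * fx
    bottom = grid[i + 1][j] * (1 - fx) + grid[i + 1][j + 1] * fx
    return top * (1 - fy) + bottom * fy

# Elevations (metres) sampled at the four corners of one grid cell.
samples = [
    [100.0, 120.0],
    [110.0, 130.0],
]
print(bilinear(samples, 0.5, 0.5))  # estimate at the cell centre: 115.0
```

Real terrain models use the same idea over large digital-elevation grids, often with smoother (e.g. spline-based) interpolants for the surface-fitting step mentioned earlier.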

A GIS also requires the process of visualisation in order to display the data in a proper visual form.

Thus, a GIS is not a database that can be implemented using either the relational or the object-oriented model alone; much more needs to be done to support it. A detailed discussion of these topics is beyond the scope of this unit.

1.4 GENOME DATABASES


One of the major areas of application of information technology is the field of genetics. Here, the computer can be used to create models based on the information obtained about genes. These information models can be used to study:

• the transmission of characteristics from one generation to next,


• the chemical structure of genes and the related functions of each portion of
structure, and
• the variations of gene information of all organisms.

Biological data is by nature enormous. Bioinformatics is one such key area that has emerged in recent years; it addresses the issues of information management for genetic data related to DNA sequences. A detailed discussion of this topic is beyond the scope of this unit. However, let us identify some of the basic characteristics of biological data.

Biological Data – Some Characteristics:

• Biological data consists of complex structures and relationships.


• The size of the data is very large and the data also has a lot of variations across
the same type.
• The schema of the database keeps evolving, typically once or twice a year; moreover, the schema versions created by different people for the same data may differ.
• Most of the accesses to the database would be read only accesses.
• The context of data defines the data and must be preserved along with the data.
• Old value needs to be kept for future references.
• Complex queries need to be represented here.

The Human Genome Initiative is an international research initiative for the creation of
detailed genetic and physical maps for each of the twenty-four different human
chromosomes and the finding of the complete deoxyribonucleic acid (DNA) sequence
of the human genome. The term Genome is used to define the complete genetic
information about a living entity. A genetic map shows the linear arrangement of
genes or genetic marker sites on a chromosome. There are two types of genetic maps−
genetic linkage maps and physical maps. Genetic linkage maps are created on the

12
basis of the frequency with which genetic markers are co-inherited. Physical maps are Emerging Database
Models, Technologies
used to determine actual distances between genes on a chromosome. and Applications-I

The Human Genome Initiative has six strong scientific objectives:

• to construct a high-resolution genetic map of the human genome,

• to produce a variety of physical maps of the human genome,

• to determine the complete sequence of human DNA,

• to analyse, in parallel, the genomes of a selected number of well-characterised non-human model organisms,

• to create instrumentation technology to automate genetic mapping, physical


mapping and DNA sequencing for the large-scale analysis of complete
genomes,

• to develop algorithms, software and databases for the collection, interpretation


and dissemination of the vast quantities of complex mapping and sequencing
data that are being generated by human genome research.

Genome projects generate enormous quantities of data. Such data is stored in a molecular database, which is composed of an annotated collection of all publicly available DNA sequences. One such database is GenBank of the National Institutes of Health (NIH), USA. But what would be the size of such a database? In February 2000, the GenBank molecular database contained 5,691,000 DNA sequences, which were composed of approximately 5,805,000,000 deoxyribonucleotides.

One of the major uses of such databases is in computational Genomics, which refers
to the applications of computational molecular biology in genome research. On the
basis of the principles of the molecular biology, computational genomics has been
classified into three successive levels for the management and analysis of genetic data
in scientific databases. These are:

• Genomics.
• Gene expression.
• Proteomics.

1.4.1 Genomics
Genomics is a scientific discipline that focuses on the systematic investigation of the
complete set of chromosomes and genes of an organism. Genomics consists of two
component areas:

• Structural Genomics which refers to the large-scale determination of DNA


sequences and gene mapping, and

• Functional Genomics, which refers to the attachment of information


concerning functional activity to existing structural knowledge about DNA
sequences.

Genome Databases
Genome databases are used for the storage and analysis of genetic and physical maps.
Chromosome genetic linkage maps represent distances between markers based on
meiotic re-combination frequencies. Chromosome physical maps represent distances
between markers based on numbers of nucleotides.

Genome databases should define four data types:

• Sequence
• Physical
• Genetic
• Bibliographic

Sequence data should include annotated molecular sequences.

Physical data should include eight data fields:

• Sequence-tagged sites
• Coding regions
• Non-coding regions
• Control regions
• Telomeres
• Centromeres
• Repeats
• Metaphase chromosome bands.

Genetic data should include seven data fields:

• Locus name
• Location
• Recombination distance
• Polymorphisms
• Breakpoints
• Rearrangements
• Disease association

Bibliographic references should cite the primary scientific and medical literature.

Genome Database Mining


Genome database mining is an emerging technology. The process of genome database
mining is referred to as computational genome annotation. Computational genome
annotation is defined as the process by which an uncharacterised DNA sequence is
documented by the location along the DNA sequence of all the genes that are involved
in genome functionality.
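As a toy illustration of locating genes along an uncharacterised DNA sequence, the sketch below finds simple open reading frames: an ATG start codon followed, in the same reading frame, by a stop codon. Real annotation pipelines use far more sophisticated statistical gene models; the function name and threshold here are our own:

```python
STOP = {"TAA", "TAG", "TGA"}  # the three standard stop codons

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of crude open reading frames in `dna`."""
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue  # ORFs begin at a start codon
        # Read forward in triplets until a stop codon closes the frame.
        for end in range(start + 3, len(dna) - 2, 3):
            if dna[end:end + 3] in STOP:
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end + 3))
                break
    return orfs

sequence = "CCATGGCTGATTAAGG"
print(find_orfs(sequence))  # [(2, 14)]: ATG GCT GAT TAA
```

Even this crude scan conveys the flavour of computational annotation: the raw sequence is documented by the locations of candidate functional regions along it.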

1.4.2 Gene Expression


Gene expression is the use of the quantitative messenger RNA (mRNA)-level
measurements of gene expression in order to characterise biological processes and
explain the mechanisms of gene transcription. The objective of gene expression is the
quantitative measurement of mRNA expression particularly under the influence of
drugs or disease perturbations.

Gene Expression Databases


Gene expression databases provide integrated data management and analysis systems
for the transcriptional expression of data generated by large-scale gene expression
experiments. Gene expression databases need to include fourteen data fields:

• Gene expression assays


• Database scope
• Gene expression data
• Gene name
• Method or assay
• Temporal information

• Spatial information
• Quantification
• Gene products
• User annotation of existing data
• Linked entries
• Links to other databases
o Internet access
o Internet submission.

Gene expression databases have not yet established defined standards for the collection, storage, retrieval and querying of gene expression data derived from libraries of gene expression experiments.

Gene Expression Database Mining


Gene expression database mining is used to identify intrinsic patterns and
relationships in gene expression data.

Gene expression data analysis uses two approaches:

• Hypothesis testing and


• Knowledge discovery.

Hypothesis testing formulates a hypothesis and checks whether the results of perturbing a biological process match the predicted results. The objective of knowledge discovery, in contrast, is to detect the internal structure of the biological data. Knowledge discovery in gene expression data analysis employs two methodologies:

• Statistics functions such as cluster analysis, and


• Visualisation.

Data visualisation is used to display the partial results of cluster analysis generated from large gene expression databases.
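The cluster analysis mentioned above can be illustrated with a plain k-means grouping of expression vectors. The gene profiles, measurement values and the choice of two clusters below are invented for illustration:

```python
def kmeans(points, centroids, rounds=10):
    """Plain k-means: group expression vectors around the nearest centroid."""
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each vector to its nearest centroid (squared distance).
            best = min(range(len(centroids)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(p, centroids[i])))
            clusters[best].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# mRNA levels of five genes measured at three time points.
expression = [
    [0.1, 0.2, 0.1],    # gene A: low throughout
    [0.2, 0.1, 0.2],    # gene B: low throughout
    [0.9, 1.0, 0.8],    # gene C: high throughout
    [1.0, 0.9, 1.0],    # gene D: high throughout
    [0.15, 0.15, 0.2],  # gene E: low throughout
]
low, high = kmeans(expression, centroids=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(len(low), len(high))  # 3 2
```

Genes that end up in the same cluster have similar expression profiles, which is the intrinsic pattern the mining step is after; visualisation tools then display such clusters, e.g. as heat maps.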

1.4.3 Proteomics
Proteomics is the use of quantitative protein-level measurements of gene expression in
order to characterise biological processes and describe the mechanisms of gene
translation. The objective of proteomics is the quantitative measurement of protein
expression particularly under the influence of drugs or disease perturbations. Gene
expression monitors gene transcription whereas proteomics monitors gene translation.
Proteomics provides a more direct response to functional genomics than the indirect
approach provided by gene expression.
Proteome Databases
Proteome databases also provide integrated data management and analysis systems for
the translational expression data generated by large-scale proteomics experiments.
Proteome databases integrate expression levels and properties of thousands of proteins
with the thousands of genes identified on genetic maps and offer a global approach to
the study of gene expression.
Proteome databases address five research problems that cannot be resolved by DNA
analysis:

• Relative abundance of protein products,


• Post-translational modifications,
• Subcellular localisations,
• Molecular turnover and
• Protein interactions.

The creation of comprehensive databases of genes and gene products will lay the foundation for the further construction of comprehensive databases of higher-level mechanisms, e.g., the regulation of gene expression, metabolic pathways and signalling cascades.

Proteome Database Mining


Proteome database mining is used to identify intrinsic patterns and relationships in
proteomics data. Proteome database mining has been performed in areas such as
Human Lymphoid Proteins and the evaluation of Toxicity in drug users.

Some Databases Relating to the Genome

The following describes some important databases that have been developed for the genome.

GenBank
  Characteristics: Keeps information on DNA/RNA sequences and information on proteins.
  Problem Areas: The schema is always evolving, and the database requires linking to many other databases.

GDB
  Characteristics: Stores information on genetic map linkages as well as non-human sequence data.
  Problem Areas: Faces the same problems of schema evolution and database linking; it also has very complex data objects.

ACEDB
  Characteristics: Stores information on genetic map linkages as well as non-human sequence data. It uses object-oriented database technology.
  Problem Areas: It too has the problems of schema evolution and database linking, and very complex data objects.

A detailed discussion on these databases is beyond the scope of this Unit. You may
wish to refer to the further readings for more information.

Check Your Progress 2


1) What is GIS? What are its applications?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) List the requirements of a GIS.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) What are the database requirements for Genome?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
1.5 KNOWLEDGE DATABASES

Knowledge databases are databases for knowledge management. But what is
knowledge management? Knowledge management is the way an organisation gathers,
manages and uses its knowledge. The basic objectives of knowledge management
are to achieve improved performance, competitive advantage and higher levels of
innovation in the various tasks of an organisation.

Knowledge is the key to such systems. Knowledge has several aspects:

• Knowledge can be implicit (called tacit knowledge), which is internalised, or
  it can be explicit knowledge.
• Knowledge can be captured before, during, or even after knowledge activity is
  conducted.
• Knowledge can be represented in logical form, semantic network form or
  database form.
• Knowledge, once properly represented, can be used to generate more knowledge
  using automated deductive reasoning.
• Knowledge may sometimes be incomplete. In fact, one of the most important
  aspects of a knowledge base is that it should contain up-to-date information
  of excellent quality.

Simple knowledge databases may consist of the explicit knowledge of an
organisation, including articles, user manuals, white papers, troubleshooting
information, etc. Such a knowledge base would provide basic solutions to some of
the problems of less experienced employees.
A good knowledge base should have:

• good quality articles having up to date information,


• a good classification structure,
• a good content format, and
• an excellent search engine for information retrieval.
One of the knowledge base technologies is based on deductive database technology.
Let us discuss more about it in the next sub-section.
1.5.1 Deductive Databases
A deductive database is a database system that can be used to make deductions from
the available rules and facts that are stored in such databases. The following are the
key characteristics of the deductive databases:

• the information in such systems is specified using a declarative language in the


form of rules and facts,
• an inference engine that is contained within the system is used to deduce new
facts from the database of rules and facts,
• these databases use concepts from the relational database domain (relational
  calculus) and the logic programming domain (the Prolog language),
• the variant of Prolog known as Datalog is used in deductive databases;
  Datalog has a different way of executing programs than Prolog, and
• the data in such databases is specified with the help of facts and rules. For
  example, the fact that Rakesh is the manager of Mohan will be represented as:
  Manager(Rakesh, Mohan) having the schema:
  Manager(Mgrname, beingmanaged)
Similarly, the following represents a rule:
Manager(Rakesh, Mohan) :- Managedby(Mohan, Rakesh)

• Please note that in the representation of a fact, the data is given as
  attribute values only, not attribute names; the attribute name is determined
  by the position of the data. For instance, in the example above, Rakesh is
  the Mgrname.
• The rules in Datalog do not contain the data. They are evaluated against the
  stored data in order to deduce more information.

Deductive databases normally operate in very narrow problem domains. These
databases are quite close to expert systems, except that deductive databases use
the database to store facts and rules, whereas expert systems store facts and
rules in main memory. Expert systems also acquire their knowledge from experts,
whereas deductive databases derive their knowledge from the data. Deductive
databases are applied to knowledge discovery and hypothesis testing.
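The facts-and-rules idea above can be sketched in a few lines. The following Python sketch (an illustration only, not how a real Datalog engine is implemented) applies the Managedby-to-Manager rule of the example by forward chaining until no new fact can be derived; the extra fact about Sita is invented for the demonstration:

```python
# Facts are tuples: ("Managedby", employee, manager).
facts = {
    ("Managedby", "Mohan", "Rakesh"),
    ("Managedby", "Sita", "Rakesh"),
}

def apply_rules(known):
    """Forward chaining: derive new facts until a fixed point is reached."""
    derived = set(known)
    while True:
        # Rule: Manager(X, Y) :- Managedby(Y, X)
        new = {("Manager", mgr, emp)
               for (pred, emp, mgr) in derived if pred == "Managedby"}
        if new <= derived:          # nothing new could be deduced, stop
            return derived
        derived |= new

all_facts = apply_rules(facts)
print(("Manager", "Rakesh", "Mohan") in all_facts)  # → True
```

A real inference engine evaluates such rules against the stored relations rather than an in-memory set, but the fixed-point idea is the same.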

1.5.2 Semantic Databases


Information in most database management systems is represented using simple
tables with records and fields. However, such simple database models fall short
for applications that require complex relationships and rich constructs to be
represented in the database. So how do we address this problem? Do we employ
object oriented models, or a more natural data model that represents the
information using semantic models? Semantic modeling provides a much richer set
of data structuring capabilities for database applications. A semantic model
contains many constructs that can represent structurally complex
inter-relationships among data in a more natural way. Please note that such
complex inter-relationships typically occur in commercial applications.

Semantic modeling is one of the tools for representing knowledge especially in


Artificial Intelligence and object-oriented applications. Thus, it may be a good idea to
model some of the knowledge databases using semantic database system.

Some of the features of semantic modeling and semantic databases are:

• these models represent information using high-level modeling abstractions,


• these models reduce the semantic overloading of data type constructors,
• semantic models represent objects explicitly along with their attributes,
• semantic models are very strong in representing relationships among objects,
and
• they can also be modeled to represent IS A relationships, derived schema and
also complex objects.
Some of the applications that may be supported by such database systems in addition
to knowledge databases may be applications such as bio-informatics, that require
support for complex relationships, rich constraints, and large-scale data handling.
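To make the object-and-relationship idea concrete, the following Python sketch (a toy illustration; the entity names are invented) stores objects and their typed relationships as triples and follows IS A links transitively, one of the features listed above:

```python
# Knowledge stored as (subject, relation, object) triples; names are invented.
triples = {
    ("Student", "IS-A", "Person"),
    ("GraduateStudent", "IS-A", "Student"),
    ("Person", "has-attribute", "name"),
}

def isa_closure(entity):
    """All classes an entity belongs to, following IS-A links transitively."""
    result, frontier = set(), {entity}
    while frontier:
        e = frontier.pop()
        for (s, r, o) in triples:
            if s == e and r == "IS-A" and o not in result:
                result.add(o)
                frontier.add(o)
    return result

print(sorted(isa_closure("GraduateStudent")))  # → ['Person', 'Student']
```

A semantic database generalises this: many relation types, constraints on them, and derived schema computed from such closures.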

1.6 INFORMATION VISUALISATION


Relational databases offer one of the simplest forms of information
visualisation: the table. However, with complex database technologies and
complex inter-relationship structures, it is important that the information is
presented to the user in a simple and understandable form. Information
visualisation is the branch of Computer Graphics that deals with the
presentation of digital images and interactions to users in a form that they
can handle with ease. Information visualisation may result in the presentation
of information using trees, graphs or similar data structures.
Another similar term used in the context of visualisation is knowledge
visualisation, the main objective of which is to improve the transfer of
knowledge using visual formats that include images, mind maps, animations, etc.

Please note the distinction here: information visualisation mainly focuses on
computer-supported tools for exploring and presenting large amounts of data in
formats that may be easily understood.

You can refer to more details on this topic in the fifth semester course.

Check Your Progress 3


1) What is a good knowledge base?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) What are the features of deductive databases?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

3) State whether the following are True or False:


(a) Semantic model is the same as an object.

(b) IS A relationship cannot be represented in a semantic model.

(c) Information visualisation is used in GIS.

(d) Knowledge visualisation is the same as information visualisation.

1.7 SUMMARY
This unit provides an introduction to some of the later developments in the area of
database management systems. Multimedia databases are used to store and deal with
multimedia information in a cohesive fashion. Multimedia databases are very large in
size and also require support of algorithms for searches based on various media
components. Spatial database primarily deals with multi-dimensional data. GIS is a
spatial database that can be used for many cartographic applications such as irrigation
system planning, vehicle monitoring system etc. This database system may represent
information in a multi-dimensional way.

Genome databases are other very large database systems, used for the purposes of
genomics, gene expression and proteomics. Knowledge databases store information
either as a set of facts and rules or as semantic models. These databases can be
utilised to deduce more information from the stored rules using an inference
engine. Information visualisation is an important area that may be linked to
databases from the point of view of visual presentation of information for
better user interaction.

1.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1) a) Advanced technology in terms of devices that were digital in nature and
support capture and display equipment.
b) High speed data communication network and software support for
multimedia data transfer.
c) Newer application requiring multimedia support.
2) Medical media databases
Bio-informatics
Home media
News etc.
3) Content can be of two basic types:
a) Media Content
b) Meta data, which includes media, format data, media keyword data and
media feature data.
4) Some of the challenges are:
a) Support for different types of input/output
b) Handling many compressions algorithms and formats
c) Differences in OS and hardware
d) Integrating to different database models
e) Support for queries for a variety of media types
f) Handling different kinds of indices
g) Data distribution over the world etc.
Check Your Progress 2

1) GIS is a spatial database application where the spatial and non-spatial data is
represented along with the map. Some of the applications of GIS are:
• Cartographic applications
• 3-D Digital modeling applications like land elevation records
• Geographic object applications like traffic control system.
2) A GIS has the following requirements:
• Data representation through vector and raster
• Support for analysis of data
• Representation of information in an integrated fashion
• Capture of information
• Visualisation of information
• Operations on information
3) The data may need to be organised for the following three levels:
• Genomics: where four different types of data are represented. The
physical data may be represented using eight different fields.
• Gene expression: Where data is represented in fourteen different fields
• Proteomics: Where data is used for five research problems.
Check Your Progress 3
1) A good knowledge database will have good information, good classification and
structure and an excellent search engine.

2) They represent information using facts and rules


New facts and rules can be deduced
Used in expert system type of applications.

3) a) False b) False c) True d) False.

UNIT 2 EMERGING DATABASE MODELS, TECHNOLOGIES AND APPLICATIONS - II
Structure Page Nos.
2.0 Introduction 21
2.1 Objectives 21
2.2 Mobile and Personal Database 22
2.2.1 The Basic Framework for Mobile Computing
2.2.2 Characteristics of Mobile Databases
2.2.3 Wireless Networks and Databases
2.3 Web Databases 24
2.4 Accessing Databases on different DBMSs 28
2.4.1 Open Database Connectivity (ODBC)
2.4.2 Java Database Connectivity (JDBC)
2.5 Digital Libraries 32
2.6 Data Grid 33
2.7 Summary 35
2.8 Solutions/Answers 36

2.0 INTRODUCTION
Database applications have advanced along with the advancement of technology from
age old Relational Database Management Systems. Database applications have
moved to Mobile applications, Web database applications, Digital libraries and so on.
The basic issue in all advanced applications is making up-to-date data available
to the user, in the desired format, anywhere, anytime. Such technology requires
advanced communication technologies, advanced database distribution models and
advanced hardware. In this unit, we will introduce the concepts of mobile and
personal databases, web databases and the issues concerned with such databases. In
addition, we will also discuss the concepts and issues related to Digital Libraries, Data
Grids and Wireless Communication and its relationship with Databases.
Mobile and personal databases revolve around a mobile computing environment and
focus on the issues related to a mobile user. Web databases, on the other hand,
are specific to web applications. We will examine some of the basic issues of
web databases and also discuss the development of a simple web database.
We will examine some of the concepts associated with ODBC and JDBC − the two
standards for connecting to different databases. Digital Libraries are a common
source of information and are available through the database environment. Such
libraries are equipped to handle information dissemination, searching and
reusability. Data grids allow distributed storage of data anywhere as one major
application unit. Thus, all these technologies have a major role to play in the
present information society. This unit discusses the issues and concerns of
technology with respect to these database systems.

2.1 OBJECTIVES
After going through this unit, you should be able to:
• define the requirements of a mobile database systems;
• identify and create simple web database;
• use JDBC and ODBC in databases;
• explain the concept of digital libraries, and
• define the concept of a data grid.
2.2 MOBILE AND PERSONAL DATABASE

In recent years, wireless technology has become very useful for the
communication of information. Many new applications of wireless technology are
already emerging, such as electronic wallets which allow electronic money
anywhere, anytime, mobile cells, mobile reporting, etc. The availability of
portable computing devices that support wireless-based communication helps
database users access relevant information from anywhere, at any time. Let us
discuss mobile database systems and the issues related to them in this section.

2.2.1 The Basic Framework for Mobile Computing


Mobile Computing is primarily a distributed architecture. This basic architecture is
shown in Figure 1.

[Diagram omitted. Mobile units communicate over wireless links with mobile
support stations or a wireless LAN; these connect through a high-speed wired
host network to fixed hosts and wired LANs.]

Figure 1: A typical view of Mobile Computing

A mobile computing environment consists of:

• host computers, which are fixed,
• mobile support stations,
• mobile client units, and
• networks.

Please note that the basic network may be a wired network but there are possibilities
of Wireless LANs as well.
2.2.2 Characteristics of Mobile Databases
The mobile environment has the following characteristics:
1) Communication Latency: Communication latency results due to wireless
transmission between the sources and the receiver. But why does this latency
occur? It is primarily due to the following reasons:
a) due to data conversion/coding into the wireless formats,
b) tracking and filtering of data on the receiver, and
c) the transmission time.

2) Intermittent wireless connectivity: Mobile stations are not always connected to the
base stations. Sometimes they may be disconnected from the network.

3) Limited battery life: The size of the battery and its life is limited. Information
communication is a major consumer of the life of the battery.

4) Changing location of the client: The wireless client may move from its
   present mobile support station to another as the device moves. Thus, in
   general, the topology of such networks keeps changing, and the place from
   where data is requested also changes. This requires the implementation of
   dynamic routing protocols.
Because of the above characteristics the mobile database systems may have the
following features:

Very often mobile databases are designed to work offline by caching replicas of the
most recent state of the database that may be broadcast by the mobile support station.
The advantages of this scheme are:

• it allows uninterrupted work, and


• reduces power consumption as data communication is being controlled.

However, the disadvantage is the inconsistency of data due to the communication gap
between the client and the server.
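The caching scheme described above can be sketched as a toy example. The following Python sketch (illustrative only; the class, method and key names are our own invention) shows a client that keeps a replica of the last broadcast state and serves reads from it while disconnected:

```python
class MobileClient:
    """Toy mobile host that works offline from a cached broadcast replica."""

    def __init__(self):
        self.cache = {}          # replica of the last broadcast database state
        self.cache_version = -1  # version number of that broadcast

    def receive_broadcast(self, version, snapshot):
        """Refresh the replica only when a newer broadcast arrives."""
        if version > self.cache_version:
            self.cache = dict(snapshot)
            self.cache_version = version

    def read(self, key):
        # Served locally: works while disconnected and saves battery,
        # but the value may be stale if broadcasts were missed.
        return self.cache.get(key)

client = MobileClient()
client.receive_broadcast(1, {"stock:ABC": 120})
client.receive_broadcast(0, {"stock:ABC": 90})   # stale broadcast, ignored
print(client.read("stock:ABC"))  # → 120
```

The version check is what keeps the replica from going backwards; the residual risk, as noted above, is that a read may return data older than the server's current state.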
However, some of the challenges for mobile computing are:
(i) Scalability: As the number of stations increases, latency increases, and
    thus the time for servicing clients also increases. The higher latency, in
    turn, creates more problems for data consistency.

    The solution: broadcast data from the mobile support stations, making the
    most recent information available to all clients and thus eliminating much
    of the latency.

(ii) Mobile data problem: Client locations keep changing in such networks.
     Thus, keeping track of the location of the client is important for the
     data server, and data should be made available to the client from the
     server that has minimum latency to the client.
2.2.3 Wireless Networks and Databases
A mobile support station covers a large geographic area and will service all the mobile
hosts that are available in that wireless geographical area − sometimes referred to as
the cell. A mobile host traverses many mobile zones or cells, requiring its information
to be passed from one cell to another, not necessarily adjacent, as there is an
overlapping of cells. However, with the availability of wireless LANS now, mobile
hosts that are in some LAN areas may be connected through the wireless LAN rather
than a wide area cellular network. Thus, reducing the cost of communication as well
as overheads of the cell movement of the host.
23
With wireless LAN technology it is now possible for some of the local mobile
hosts to communicate directly with each other without the mobile support
station. However, please note that such communication should be based on a
standard protocol/technology. Fortunately, the Bluetooth standard is available.
This standard allows wireless connectivity over short ranges (10-50 metres) at
nearly 1 megabit per second, thus allowing easy use of PDAs, mobile phones and
intelligent devices. There are also wireless LAN standards such as 802.11 and
802.16. In addition, wireless technology has improved, and packet-based cellular
networking has moved to the third generation or beyond, allowing high-speed
digital data transfer and applications. Thus, a combination of all these
technologies, viz., Bluetooth, wireless LANs and 3G cellular networks, allows a
low cost infrastructure for wireless communication among different kinds of
mobile hosts.

This has opened up a great potential area for mobile applications, where large,
real-time, low cost database applications can be created for areas such as
just-in-time accounting, teaching, and monitoring of resources and goods. The
major advantage here is that the communication of real-time data through these
networks is now possible at low cost.

A major drawback of mobile databases is the limitation of power and of available
display size. This has prompted newer technologies such as flash memory,
power-saving disks, and low-powered, power-saving displays. However, since
mobile devices are normally smaller in size, presentation standards need to be
created. One such protocol in the area of wireless networking is the Wireless
Application Protocol (WAP).

Thus, mobile wireless networks have opened up the potential for new mobile database
applications. These applications may be integrated into very strong web-based,
intelligent, real time applications.

2.3 WEB DATABASES


The term web database can be used in at least two different ways:
Definition 1: A web database may be defined as the organised listing of web pages
for a particular topic. Since the number of web pages may be large for a topic, a web
database would require strong indexing and even stronger spider or robot based search
techniques.

We are all familiar with the concept of web searching. Web searching is usually
text based and gathers hundreds or thousands of web pages together as a result
of a search. But how can we sort through these pages? A web database of this
kind could help solve this problem. However, this is not the basic issue for
discussion in this section. We would like to concentrate on the second
definition of a web database.

Definition 2: A web database is a database that can be accessed through the web.

This definition actually defines a web application with the database as the backend.
Let us discuss more about web database application systems.

Features of a Web Database Application


A Web database application should have the following features:

• it should have proper security,


• it should allow multiple users at a time,
• it should allow concurrent online transactions,
• it should produce a response in finite time,
• its response time should be low, and
• it should support a client-server (2-tier or 3-tier) architecture through the
  website; in the three-tier architecture an additional application server tier
  operates between the client and the server.

Creating a Web Database Application


How do we create a web database? Well, we would like to demonstrate this with
the help of a very simple example. To implement the example, we use Active
Server Pages (ASP), one of the older popular technologies, with MS-Access as
the backend. ASP is a very easy technology to use. Although Microsoft Access
does not support websites with a lot of traffic, it is quite suitable for our
example. A web database makes it very convenient to build a website.

Let us show the process of creating a web database. We would need to follow the
following steps to create a student database and to make it accessible as a web
database.

• The first step is to create a database using Ms-Access with the following
configuration:
Student-id Text (10) Primary Key
Name Text (25)
Phone Text (12)
Now, you would need to enter some meaningful data into the database and save it
with the name students.mdb.

• Put your database online by using ftp to transfer students.mdb to the web server
on which you are allowed access. Do not put the file in the same directory in
which your web site files are stored, otherwise, the entire databases may be
downloaded by an unauthorised person. In a commercial set up it may be better to
keep the data on the Database server. This database then can be connected through
a Data Source Name (DSN) to the website. Let us now build the required interface
from ASP to the Database. A simple but old method may be with connecting ASP
using ActiveX Data Object (ADO) library. This library provides ASP with the
necessary functionality for interacting with the database server.
• The first and most basic thing we need to do is to retrieve the contents of the
database for display. You can retrieve database records using ADO Recordset,
one of the objects of ADO.
Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")

The commands given above create a variable (recordsettest) to store a new
Recordset object, using the Server object's CreateObject method.

• Now fill this Recordset with records from the database with its Open method.
Open takes two parameters:
o the table name that contains the records to be fetched and
o the connection string for the database.

Now, the name of the table is straightforward, as we would obviously create the
table with a name. However, the connection string is slightly more complex.
Since the ADO library is capable of connecting to many database servers and
other data sources, the string must tell Recordset not only where to find the
database (the path and file name) but also how to read the database, by giving
the name of its database provider.
25
A database provider is software with a very important role: it allows ADO to
communicate with the given type of database in a standard way. ADO has
providers for MS-Access, SQL Server, Oracle, ODBC database servers, etc.
Assuming that we are using the provider Jet.OLEDB to connect to an Access
database, the connection string would be:

Provider = Microsoft.Jet.OLEDB.version; Data Source = ~\student\student.mdb

A connection string for MS SQL may be like:


Provider = SQLOLEDB; Data Source = servername; Initial Catalog = database name;
User Id = username; Password = password.

So a sample code could be:

Dim recordsettest
Dim db_conn  ' The database connection string
Set recordsettest = Server.CreateObject("ADODB.Recordset")
db_conn = "Provider=Microsoft.Jet.OLEDB.version; " & _
          "Data Source=~\student\student.mdb"
recordsettest.Open "student", db_conn

However, since many ASP pages on the site will require such a string, it is common to
place the connection string in an application variable in the global workspace.

<SCRIPT LANGUAGE="VBScript" RUNAT="Server">
Sub Application_OnStart()
    Dim db_conn
    db_conn = "Provider=Microsoft.Jet.OLEDB.version; " & _
              "Data Source=~\student\student.mdb"
    Application("db_conn") = db_conn
End Sub
</SCRIPT>

The code to retrieve the contents of the student table will now include only
the code for Recordset.

Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")
recordsettest.Open "Student", Application("db_conn")

Thus, we have established the connection and can get information from the
MS-Access database. But another problem remains. How do we display the
results?

• Now, we can access the recordset, which holds the result set of the query,
  with each data row corresponding to one database record. In our example, we
  have put the contents of the student table into the recordsettest object. An
  opened Recordset keeps track of the current record; initially the first
  record is the current record. The MoveNext method of the Recordset object
  moves the pointer to the next record in the set, if any, and the EOF property
  of the Recordset becomes true past the end of the Recordset. Thus, to display
  student id and name, you may write the following code:

Do While Not recordsettest.EOF
    Response.Write "<li><p>" & recordsettest("student-id") & " "
    Response.Write recordsettest("name") & "</p></li>"
    recordsettest.MoveNext
Loop
If recordsettest.BOF Then
    Response.Write "<p>No student data in the database.</p>"
End If
Please note that after the loop, the BOF property is still true only if the
recordset was empty.

• Once you have completed the task, you must close the recordset:

  recordsettest.Close

  This command frees the connection to the database. As the number of available
  connections may not be very large, they should not be kept open longer than
  necessary.

Thus, the final code may look like:

<html>
<head>
<title>Student Data</title>
</head>
<body>
<ol>
<%
Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")
recordsettest.Open "Student", Application("db_conn")
Do While Not recordsettest.EOF
    Response.Write "<li><p>" & recordsettest("student-id") & " "
    Response.Write recordsettest("name") & "</p></li>"
    recordsettest.MoveNext
Loop
If recordsettest.BOF Then
    Response.Write "<p>No student data in the database.</p>"
End If
recordsettest.Close
%>
</ol>
</body>
</html>

Save this file on the web server. Now, test this program by storing suitable
data in the database. This application should produce a simple list of the
students. However, you can create more complex queries on the data using SQL.
Let us try to explain that with the help of an example.

Example: Get the list of those students who have a phone number specified by
you. More than one student may be using this phone number as a contact number.
Please note that in an actual implementation you would need to create more
complex queries.

Please also note that the only change we need to make in the program given
above is in the Open statement, where we insert a SQL command so that, on
opening the recordset, only the required data is transferred to it from the
database. Thus, the code for this modified database access may be:

<html>
<head>
<title>Student Data</title>
</head>
<body>
<ol>
<%
Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")
recordsettest.Open "SELECT [student-id], name FROM Student " & _
    "WHERE phone = '" & Request("phone") & "'", Application("db_conn")
Do While Not recordsettest.EOF
    Response.Write "<li><p>" & recordsettest("student-id") & " "
    Response.Write recordsettest("name") & "</p></li>"
    recordsettest.MoveNext
Loop
If recordsettest.BOF Then
    Response.Write "<p>No student data in the database.</p>"
End If
recordsettest.Close
%>
</ol>
</body>
</html>

You can build on more complex queries. You may also refer to more advanced ASP
versions and connections. A detailed discussion on these is beyond the scope of the
unit.
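For comparison, the same query pattern can be sketched outside ASP using Python's built-in sqlite3 module (a stand-in assumption; the unit's example uses MS-Access, and the table and sample rows here are invented). Note the "?" placeholder: a parameterised query avoids the risks of concatenating user input, such as the phone number above, directly into the SQL string:

```python
import sqlite3

# Build a tiny in-memory stand-in for the students database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, "
             "name TEXT, phone TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("S001", "Asha", "011-234"), ("S002", "Ravi", "011-234")])

phone = "011-234"  # in a web application this would come from the request
rows = conn.execute(
    "SELECT student_id, name FROM student WHERE phone = ? "
    "ORDER BY student_id", (phone,)).fetchall()
for student_id, name in rows:
    print(student_id, name)
conn.close()
```

The structure mirrors the ASP version: open a connection, run the SELECT, iterate over the result rows, and close the connection when done.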

Check Your Progress 1


1) What are the different characteristics of mobile databases?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2) What are the advantages of using wireless LAN?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3) What are the steps required to create a web database?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………

2.4 ACCESSING DATABASES ON DIFFERENT DBMSs
Database access from different interfaces may require some standards. Two
standards that are quite useful are ODBC, which allows access to a database
from any DBMS environment that supports it, and JDBC, which allows Java
programs to access databases. Let us discuss them in more detail in this
section.
2.4.1 Open Database Connectivity (ODBC)
Open Database Connectivity (ODBC) is a standard API (Application Programming
Interface) that allows access to a database that is part of any DBMS supporting
this standard. By using ODBC statements in your program you may access any
database in MS-Access, SQL Server, Oracle, DB2, even an Excel file, etc.,
provided an ODBC driver is available for each of the databases being accessed.
ODBC allows programs to use standard SQL commands for accessing databases.
Thus, you need not master the native interface of any specific DBMS.
For implementing ODBC in a system the following components will be required:
• the applications themselves,
• a core ODBC library, and
• the database drives for ODBC.
The library functions as the translator/interpreter between the application and
the database drivers. Database drivers are written to hide DBMS-specific
details. Such a layered configuration allows you to use standard types and
features in your DBMS applications without knowing the details of the specific
DBMS that the applications may encounter. It also simplifies the implementation
of a database driver, as the driver needs to know only the core library. This
makes ODBC modular. Thus, ODBC provides standard data access using ODBC
drivers, which exist for a large variety of data sources, including drivers for
non-relational data such as spreadsheets and XML files.

ODBC also has certain disadvantages. These are:

• if a large number of client machines require different drivers and DLLs to be connected through ODBC, then there is a large and complex administration overhead. Thus, large organisations are on the lookout for server-side ODBC technology.

• the layered architecture of ODBC may introduce a minor performance penalty.

The Call Level Interface (CLI) specifications of SQL are used by ODBC as its base. ODBC and its applications are becoming stronger. For example, the Common Object Request Broker Architecture (CORBA), which is a distributed object architecture, and the Persistent Object Service (POS), an API, are supersets of both the Call-Level Interface and ODBC. If you need to write a database application program in Java, then you need the Java Database Connectivity (JDBC) application program interface. From JDBC you may use a JDBC-ODBC "bridge" program to reach ODBC-accessible databases.

The designers of ODBC wanted to make it independent of any programming


language, Operating System or DBMS.

Connecting Database to ODBC


A database application needs to have the following types of statements for connecting
to ODBC:

• linkage of core ODBC Library to client program,


• call results in communication to the request of the client library to the server, and
• results fetched at the client end.

But how do we actually write a code using ODBC? The following steps are needed to
write a code using ODBC:

• establish the connection,


• run a SQL query that may be written by using embedded SQL, and
• release the connection.

But how do we establish the connection? Some of the ways of establishing a connection are:

An ODBC/DSN connection to MySQL using ASP:

Dim db_conn
set db_conn = server.createobject("adodb.connection")
db_conn.open("dsn=DSNmysql; uid = username; pwd = password; database = student")

What is a DSN in the connection above? It is a Data Source Name that has been given to a specific user connection. DSNs are more secure because you must be the user defined in the DSN; otherwise, you will not be allowed to use the data.

ODBC setting for Oracle:


To set up a connection you may need to edit the odbc.ini file in certain installations of Oracle. The basic objective here is to allow you to connect to the relational data source of Oracle. Let us assume that we are using Oracle Version 8 and trying to connect to an Oracle data source oradata using an ODBC driver. You need to write the following:

[ODBC Data Sources]


oradata=Oracle8 Source Data

[oradata]
Driver = / . . . /odbc/drivername
Description=my oracle source
ServerName=OracleSID

The last line defines the name of an Oracle database defined in the environment file of
Oracle.

Once the connection is established through the required ODBC driver to the database, using the user-id and password, you can write the appropriate queries using SQL. Finally, you need to close the connection using the close( ) method.

2.4.2 Java Database Connectivity (JDBC)


Accessing a database in Java requires Java Database Connectivity (JDBC). JDBC allows you to access a database in your applications and applets using a set of JDBC drivers.

What is JDBC?
Java Database Connectivity (JDBC) provides a standard API that is used to access databases, regardless of the DBMS, through Java. There are many JDBC drivers that support popular DBMSs. However, if no such driver exists for the DBMS that you have selected, then you can use a driver provided by Sun Microsystems to connect to any ODBC-compliant database. This is called the JDBC-ODBC Bridge. For such an application, you may need to create an ODBC data source for the database before you can access it from the Java application.

Connecting to a Database
In order to connect to a database, let us say an oracle database, the related JDBC
driver has to be loaded by the Java Virtual Machine class loader successfully.

// Try loading the oracle database driver

try
{
    Class.forName("oracle.jdbc.driver.OracleDriver").newInstance();
}
catch (ClassNotFoundException ce) // driver not found
{
    System.err.println("Driver not found");
    // Code to handle error
}

Now, you can connect to the database using the DriverManager class, which selects the appropriate driver for the database. In more complex applications, we may use
different drivers to connect to multiple databases. We may identify our database using
a URL, which helps in identifying the database. A JDBC URL starts with “jdbc:” that
indicates the use of JDBC protocol.

A sample database URL may be

jdbc:[email protected]:2000:student

To connect to the database, we need to connect with a username and password.


Assuming it to be “username” and “password”, the connection string would be:

// Try creating a database connection

Connection db_conn =
    DriverManager.getConnection(
        "jdbc:[email protected]:student", "username", "password");

Thus, you will now be connected. Now the next step is to execute a query.

You can create a query using the following lines:

Statement stmt = db_conn.createStatement();

try {
    stmt.executeQuery(
        // Write your SQL query using SQL and host language
    );
} catch (………) {……}
stmt.close();
db_conn.close();

Thus, the JDBC standard allows you to handle databases through JAVA as the host
language. Please note that you can connect to any database that is ODBC compliant
through JAVA either through the specialised driver or through the JDBC – ODBC
bridge if no appropriate driver exists.

) Check Your Progress 2


1) Why is there a need for ODBC?
…………………………………………..…………………………..………………
………………………………………..…………………………………..…………
………………………………………………………………………………
2) What are the components required for implementing ODBC?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3) Why do you need JDBC? What happens when a DBMS does not have a JDBC
driver?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………

2.5 DIGITAL LIBRARIES
Let us now define a digital library.
“A digital library is a library that provides almost the same functionality as a traditional library, but has most of its information resources in digital form, stored using multimedia repositories. Access to the digital library may be through the web or an intranet.”

The following may be some of the objectives of the digital library:

• to allow online catalog access,


• to search for articles and books,
• to permit search on multiple resources at a time,
• to refine the search to get better results,
• to save the search results for future use, and
• to access search results by just clicking on the result.

Please note that most of the objectives above can easily be fulfilled because a digital library allows web access.

But what are the advantages of the digital library?

A digital library has the following advantages:

• very large storage space,


• low cost of maintenance,
• information is made available in rich media form,
• round the clock operation,
• multi-user access, and
• can become a part of the world knowledge base/knowledge society.

Now the next question is what is the cost involved in creating and maintaining Digital
Libraries?

The following are the cost factors for the creation and maintenance of digital libraries:

• cost of conversion of material into digital form,

• cost of maintaining digital services, including the cost of online access, and
• cost of maintaining archival information.

So far, we have seen the advantages of digital libraries, but do digital libraries have
any disadvantages?

Following are some of the disadvantages of digital libraries:

• only information available in the public domain can be made available,


• a digital library does not produce the same environment as a physical library, and
• the person needs to be technology friendly in order to use the library.

At present many Universities offer digital library facility online.


Functionality of Digital library
A digital library supports the following functions:

• Searching for information: A digital library needs a very strong search facility. It should allow search on various indexes and keywords. It should have a distributed search mechanism to answer individual users' searches.
• Content management: A digital library must have a facility for the continuous updating of source information. Older information in such cases may need to be archived.
• Licenses and rights management: Only appropriate/authorised users are given access to protected information. A library needs to protect the copyright.
• All the links to further information should be thoroughly checked.
• The library should store the metadata in a proper format. One such structure for storing the metadata is the Dublin Core.
• Library information should be represented in standard data formats. One such
format may be XML. The contents may be represented in XML, HTML, PDF,
JPEG, GIF, TIFF etc.
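As a small illustration of the standard formats mentioned above, the sketch below emits a minimal metadata record as XML. The element names dc:title, dc:creator and dc:date, and the namespace URI, are genuine Dublin Core conventions; the helper's name and the sample values are invented for this example.

```java
public class DublinCoreSketch {
    /** Builds a minimal Dublin Core record as an XML string. */
    static String dcRecord(String title, String creator, String date) {
        return "<record xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n"
             + "  <dc:title>" + title + "</dc:title>\n"
             + "  <dc:creator>" + creator + "</dc:creator>\n"
             + "  <dc:date>" + date + "</dc:date>\n"
             + "</record>";
    }

    public static void main(String[] args) {
        // Sample values invented for illustration.
        System.out.println(dcRecord("Database Systems", "C. J. Date", "2003"));
    }
}
```

A real digital library would of course populate many more of the fifteen Dublin Core elements and validate the XML, but the record shape is the same.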

Technology Required for the Digital Library


A digital library would require expensive technology. Some of these requirements
may be:

• Very large, reliable storage technologies supporting tera or even peta bytes of information. You may use Storage Area Networks (SANs).
• The transfer rate of information from the storage point to the computer should be high in order to fulfil the requests of many users.
• High Internet bandwidth to serve many users at a time.
• A distributed array of powerful servers that can process large numbers of access requests at the same time.
• Very reliable library software. Such software is developed on top of an RDBMS to support features such as full-text indexing, metadata indexing, etc.

2.6 DATA GRID


Let us first define the term data grid. The concept of a data grid is somewhat difficult to define. A data grid can be seen as a virtual database, created across hardware, of almost all the data that exists in some form. Thus, the key concept is the Virtual Database. The basic tenet behind a data grid is that an application need not know either the place or the DBMS where the data is stored; rather, the application is only interested in getting the correct results. Figure 2 shows a typical structure for a data grid.

[Figure: applications access a virtual database through the data grid structure and its placement and data policy manager; the virtual database spans several underlying databases.]

Figure 2: A Data Grid

A data grid should address the following issues:
• It should allow data security and domain specific description of data. Domain
specific data helps in the identification of correct data in response to a query.
• It should have a standard way of representing information. [The possible
candidates in this category may be XML].
• It should allow simple query language like SQL for accessing information.
• The data requirements should be fulfilled with a level of confidence (that is, there has to be a minimum quality of service in place).
• There needs to be a different role assigned to the data administrator. The data
administrators should not be concerned with what the data represent. Data domain
specification should be the sole responsibility of the data owner. Unfortunately,
this does not happen in the present database implementations.
• The separation of the data administration and the data manager will help the
database administrator to concentrate on data replication and query performance
issues based on the DBMS statistics.
Thus, a data grid virtualises data. Many DBMSs today, have the ability to separate the
roles of the database administrator and the data owner from the application access.
This needs to be extended to the grid across a heterogeneous infrastructure.

What are the implications of using a data grid?


An application written for a data grid references a virtual data source with a specified quality of service. Such an application need not be compiled again even if the data moves from one server to another, there is a change in the infrastructure, or the data access methods change. This happens because, in the data grid, data sources are transparent to the application. Thus, it moves the concept of data independence further up the ladder.

A data grid provides a data provider with facilities that allow current and future applications not to explicitly mention the location and structure of the data. This information can be published in the data dictionary or registry. Grid middleware is then needed to query the registry in order to locate this information for the applications. An administrator in a data grid is primarily concerned with the infrastructure, its optimum use, and providing the required qualities of service. Some of the things that the administrator is allowed to do, based on the statistics stored in the data dictionary, include enabling replication, failure recovery, partitioning, changing the infrastructure, etc.

Data Grid Applications


A data grid includes most of the relational database capabilities including schema
integration, format conversion of data, distributed query support etc. In addition, a
data grid should be scalable, reliable and should support efficient file access. These
things are not easy to do. However, in this sub-section let us try to define some of the
applications that a data grid may have.

A Medical Grid: Consider a situation in which a patient is admitted to a hospital


where s/he has no record. By querying the data grid on a key field (say, a voter's id number), all the information about the patient's previous medical history can be obtained from the data grid of hospitals. The hospitals may have their own independent database systems, each of which is a part of the data grid. A hospital need not hand over all its confidential data as part of the data grid. Please note that the main

feature here is that the hospital is in complete control of its data. It can change, hide, and secure any part of its own database while participating in the data grid federation.
Thus, a virtual medical database that is partitioned across hospitals can be made.
Please note that a query can be run across the whole set of hospitals and can retrieve
consolidated results. A query may not need to retrieve data from all the hospitals
participating in the grid to get significant information, for example, a query about the
symptoms of a disease that has been answered by 70 percent of the hospitals can
return meaningful results. On the other hand, some queries would require all the data grid members to participate. For instance, a query to find a patient's complete medical record, wherever its parts are stored, would require all the hospitals to answer.
Some of the major requirements of a data grid are:

• Handling the failure of a data source: A grid may have replicas and caches, but grid applications tend to access a single data resource. What happens when this data source fails? A data grid should be flexible: it should have middleware that automatically moves the operations to either another data resource with a similar data set or to multiple data resources, each with a subset of the data.
• Parallel access for local participants: Some organisations may have very large data sets; a query on these would take longer, and could fall outside the performance criteria set for the grid. Therefore, it may be a good idea to use a "virtual distributed" database to fetch data in parallel and keep processing it.
• Global access through SQL: A data grid would require dynamic selection of data
sources. Therefore, it requires a complete SQL query transformation and
optimisation capability.
• A data grid application may need to access data sources such as content
management systems, Excel files, or databases not yet supported.
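The consolidation behaviour described for the medical grid can be sketched with a toy simulation that involves no real DBMS: each participating "hospital" below is modelled as an in-memory map from a patient's key field to a record fragment, and the grid query broadcasts the lookup to every member and merges whatever fragments come back. All names and values here are invented for illustration, and a real grid would of course query independent databases over a network rather than local maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GridSketch {
    /** Broadcasts a key lookup to every member and consolidates the answers. */
    static List<String> gridQuery(List<Map<String, String>> members, String patientId) {
        List<String> fragments = new ArrayList<>();
        for (Map<String, String> member : members) {
            // Each member answers only for the data it actually holds.
            String part = member.get(patientId);
            if (part != null) {
                fragments.add(part);
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        Map<String, String> hospitalA = new HashMap<>();
        hospitalA.put("V-1001", "2019: appendectomy");
        Map<String, String> hospitalB = new HashMap<>();
        hospitalB.put("V-1001", "2023: fracture, left arm");
        Map<String, String> hospitalC = new HashMap<>(); // holds no record here

        // The consolidated "medical record" for patient V-1001 is assembled
        // from the two members that answered.
        System.out.println(gridQuery(List.of(hospitalA, hospitalB, hospitalC), "V-1001"));
    }
}
```

Note that, as in the text, a member with no matching data simply contributes nothing: partial answers still yield a meaningful consolidated result.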
) Check Your Progress 3
1) What are the basic functions of a digital library?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2) What are the advantages of Data Grid?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3) What are different types of hardware and software required for digital libraries?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………

2.7 SUMMARY
This unit introduced you to several concepts related to emerging database applications. It also provided insight into some of the practical issues in connecting to databases from any DBMS or from Java, and analysed simple applications related to web databases.
The unit introduces the requirements of mobile databases. Although most mobile applications in the past have worked on data broadcasting, this may change
in the era when new wireless LAN technologies are available. This may give rise to
new real-time database applications. ODBC and JDBC are two standards that can be used to connect to any database of any DBMS, from any operating system and from Java respectively. A web database may be the backend for a browser front end, or for browser and application server tiers. However, it basically provides access to the data through the web. Digital libraries are a very useful application that has emerged in recent years. Digital libraries follow metadata and storage standards, and they also support distributed search. A data grid is a virtual database created for the purpose of information sharing. The grid is loosely controlled.

2.8 SOLUTIONS/ANSWERS
Check Your Progress 1

1) The following are the characteristics of mobile databases:


• A mobile database relies on the broadcast of data
• Mobile stations may be working in standalone mode most of the time
• The data on a mobile workstation may be inconsistent
• Mobile databases may be made scalable
• Mobile databases are closer to distributed database technology
• A mobile unit may be changing its physical location, yet it must be reachable at all locations.

2) Wireless LANs may allow low cost communication between two mobile units that
are located in the same LAN area. Thus, it may result in reduced cost of operation.

3) The following steps are required to create a web database:


• Create a database and put it on a database server.
• Create a connection string to connect to the server through a valid username
and password
• Open the connection to bring in the required data based on the suitable query
interactions
• Format and display the data at the client site.

Check Your Progress 2

1) ODBC allows standard SQL commands to be used on any database of any DBMS that supports it. It gives the application designer freedom from learning the features of individual DBMSs, operating systems, etc. Thus, it simplifies the task of database programmers.

2) The following three components are required to implement ODBC:


• The application
• The core ODBC library
• The database driver for ODBC

3) JDBC is an API that allows Java programmers to access any database through the
set of this standard API. In case a JDBC driver is not available for a DBMS then
the ODBC-JDBC bridge can be used to access data.
Check Your Progress 3
1) A digital library supports
• Content management
• Search
• License management
• Link management
• Meta data storage

2) A data grid is helpful in sharing a large amount of information on a particular topic, thus allowing a worldwide repository of information while still giving full control of the information to its creator.

3) Digital libraries require:


• Very large secondary storage
• Distributed array of powerful servers
• Large bandwidth and data transfer rate
• A reliable library software.

UNIT 3 POSTGRESQL
Structure Page Nos.
3.0 Introduction 38
3.1 Objectives 38
3.2 Important Features 38
3.3 PostgreSQL Architectural Concepts 39
3.4 User Interfaces 41
3.5 SQL Variation and Extensions 43
3.6 Transaction Management 44
3.7 Storage and Indexing 46
3.8 Query Processing and Evaluation 47
3.9 Summary 49
3.10 Solutions/Answers 49

3.0 INTRODUCTION
PostgreSQL is an open-source object-relational DBMS (ORDBMS). This DBMS was developed in the academic world, and thus has its roots in academia. It was first developed as a database called Postgres (at UC Berkeley in the early 80s). It was officially renamed PostgreSQL around 1996, mostly to reflect the added ANSI SQL compliant translator. It is one of the most feature-rich, robust open-source databases. In this unit we will discuss the features of this DBMS. Some of the topics that are covered in this unit include its architecture, user interface, SQL variation, transactions, indexes etc.

3.1 OBJECTIVES
After going through this unit, you should be able to:

• define the basic features of PostgreSQL;


• discuss the user interfaces in this RDBMS;
• use the SQL version of PostgreSQL, and
• identify the indexes and query processing used in PostgreSQL.

3.2 IMPORTANT FEATURES


PostgreSQL is an object relational database management system. It supports basic
object oriented features including inheritance and complex data types along with
special functions to deal with these data types. But, basically most of it is relational.
In fact, most users of PostgreSQL do not take advantage of its extensive object
oriented functionality. Following are the features of PostgreSQL that make it a very
good DBMS:

• Full ANSI-SQL 92 compliance: It supports:

o most of ANSI 99 compliance as well,


o extensive support for the transactions,
o BEFORE and AFTER triggers, and
o implementation of stored procedures, constraints, referential integrity with
cascade update/delete.

• Many high-level languages and native interfaces can be used for creating user-defined database functions.
• You can use native SQL, PL/pgSQL (the Postgres counterpart to Oracle's PL/SQL or SQL Server/Sybase Transact-SQL), Java, C, C++, and Perl.
• Inheritance of table structures - this is probably one of the rarely used useful
features.
• Built-in complex data types such as IP Address, Geometries (Points, lines,
circles), arrays as a database field type and ability to define your own data types
with properties, operators and functions on these user-defined data types.
• Ability to define Aggregate functions.
• Concept of collections and sequences.
• Support for multiple operating systems like Linux, Windows, Unix, Mac.
• It may be considered one of the important databases for implementing Web-based applications, because it is fast and feature-rich. PostgreSQL is a reasonably fast database with proper support for web languages such as PHP and Perl. It also supports ODBC and JDBC drivers, making it easily usable from other languages such as ASP, ASP.Net and Java. It is often compared with MySQL, one of the fastest databases on the web (open source or not). Its querying speed is in line with MySQL's. In terms of features, though, PostgreSQL is definitely a database worth a second look.

3.3 POSTGRESQL ARCHITECTURAL CONCEPTS


Before we dig deeper into the features of PostgreSQL, let us take a brief look at some of the basic concepts of the Postgres system architecture, and define how the parts of Postgres interact. This will make the concepts simpler to understand. Figure 1 shows the basic architecture of PostgreSQL on the Unix operating system.

[Figure: client applications use a client interface library to contact the postmaster for the initial connection and authentication; the postmaster then hands each client over to a backend server process, and queries and results flow directly between client and backend. The server processes use shared memory (shared disk buffers and shared tables) and the disk storage of the operating system.]

Figure 1: The Architecture of PostgreSQL

Postgres uses a simple process-per-user client/server model.

A session on a PostgreSQL database consists of the following co-operating processes:
• A supervisory daemon process (also referred to as the postmaster),
• The front-end user application process (e.g., the psql program), and
• One or more backend database server processes (the PostGres process itself).

The single postmaster process manages the collection of databases (also called an installation or site) on a single host machine. Client applications that want access to a database stored at a particular installation make calls to the client interface library. The library sends user requests over the network to the postmaster, which in turn starts a new backend server process. The postmaster then connects the client process to the new server. From this point onwards, the client process and the backend server process communicate with each other without any intervention on the part of the postmaster. Thus, the postmaster process is always running, waiting for requests from new clients. Please note, the client and server processes will be created and destroyed over a period of time as the need arises.

Can a client process make multiple connections to a backend server process? The libpq library allows a single client to make multiple connections to backend server processes. However, please note that these client processes are not multi-threaded processes; at present, multithreaded front-end/backend connections are not supported by libpq. Since the client/server (front-end/back-end) combination of processes may run on different machines, files that can be accessed on a client machine may not be accessible (or may only be accessed using a different filename) on the database server machine.

Please also note that the postmaster and postgres servers run with the user-id of the Postgres superuser or the administrator. Note that the Postgres superuser does not have to be a special user, and that the Postgres superuser should definitely not be the UNIX superuser (called root). All the files related to the database belong to this Postgres superuser.

Figure 2 shows the establishment of a connection in PostgreSQL.

Figure 2: Establishing a connection in PostgreSQL

3.4 USER INTERFACES
Having discussed the basic architecture of PostgreSQL, the question that now arises is: how does one access databases? PostgreSQL has the following interfaces for accessing information:
• Postgres terminal monitor programs (e.g. psql): It is a SQL command level
interface that allows you to enter, edit, and execute SQL commands
interactively.
• Programming Interface: You can write a C program using the LIBPQ subroutine library. This allows you to submit SQL commands from the host language, C, and get responses and status messages back to your program.
But how would you be referring to this interface? To do so, you need to install
PostgreSQL on your machine. Let us briefly point out some facts about the
installation of PostgreSQL.
Installing Postgres on your machine: Since Postgres is a client/server DBMS, as a user you need only the client portion of the installation (an example of a client application interface is the interactive monitor psql). One common directory where Postgres may be installed on Unix machines is /usr/local/pgsql; therefore, we will assume that Postgres has been installed in the directory /usr/local/pgsql. If you have installed Postgres in a different directory, then you should substitute that directory name accordingly. All Postgres commands are installed in the directory /usr/local/pgsql/bin. Therefore, you need to add this directory to your shell command path in Unix.
For example, on the Berkeley C shell or its variants such as csh or tcsh, you need to add:
% set path = ( /usr/local/pgsql/bin $path )

in the .login file in the home directory.

On the Bourne shell or its variants such as sh, ksh, or bash, you need to add:
% PATH=/usr/local/pgsql/bin:$PATH
% export PATH

to the .profile file in your home directory.

Other Interfaces
Some other user interfaces that are available for the PostGres are:
pgAdmin 3 from http://www.pgadmin.org for Windows/Linux/BSD/nix (experimental Mac OS X port). This interface, released under the Artistic License, is a complete PostgreSQL administration interface. It is somewhat similar to Microsoft's Enterprise Manager and is written in C++ and wxWindows. It allows administration of almost all database objects and ad-hoc queries.

PGAccess from http://www.pgaccess.org for most platforms is the original PostgreSQL GUI. It is an MS Access-style database browser written in Tcl/Tk. It allows browsing, adding and editing of tables, views, functions, sequences, databases, and users, as well as graphically-assisted queries. A form and report designer are also under development.

Many similar open source tools are available on the Internet.


Let us now describe the most commonly used interface for PostgreSQL, i.e., psql, in more detail.

Starting the Interactive Monitor (psql)
You can process an application from a client if:

• the site administrator has properly started the postmaster process, and
• you are authorised to use the database with the proper user id and password.

As of Postgres v6.3, two different styles of connections are supported. These are:

• TCP/IP network connections or


• Restricted database access to local (same-machine) socket connections only.

These choices are significant if you encounter problems in connecting to a database. For example, if you get the following error message from a Postgres command (such as psql or createdb):
% psql template1
Connection to database 'postgres' failed.
connectDB() failed: Is the postmaster running and accepting connections at 'UNIX
Socket' on port '5432'?
or
% psql -h localhost template1
Connection to database 'postgres' failed.
connectDB() failed: Is the postmaster running and accepting TCP/IP (with -i)
connections at 'localhost' on port '5432'?

This means that either the postmaster is not running, or you are attempting to connect to the wrong server host. Similarly, the following error message means that the site administrator has started the postmaster as the wrong user:

FATAL 1:Feb 17 23:19:55:process userid (2360) != database owner (268)

Accessing a Database
Once you have a valid account, the next step is to start accessing the database. To access the Postgres database mydb you can use the command:
% psql mydb

You may get the following message:


Welcome to the POSTGRESQL interactive sql monitor:
Please read the file COPYRIGHT for copyright terms of POSTGRESQL

type \? for help on slash commands


type \q to quit
type \g or terminate with semicolon to execute query
You are currently connected to the database: template1
mydb=>

The prompt indicates that the terminal monitor is ready for your SQL queries. These
queries are input into a workspace maintained by the terminal monitor. The psql
program also responds to escape codes, which begin with the backslash character, “\”
(much like the escape sequences you may have used while programming in C). For
example, you can get help on the syntax of PostgreSQL SQL commands by typing:
mydb=> \h

Once you have completed the query you can pass the contents of the workspace to the
Postgres server by typing:
mydb=> \g

This indicates to the server that it may process the query. In case you terminate the
query with a semicolon, the “\g” is not needed. psql automatically processes the
queries that are terminated by a semicolon.

You can store your queries in a file. To read your queries from such a file you may
type:
mydb=> \i filename

To exit psql and return to UNIX, type


mydb=> \q

White space (i.e., spaces, tabs and new line characters) may be used in SQL queries.
You can also enter comments. Single-line comments are denoted by “--”. Multiple-
line comments, and comments within a line, are denoted by “/* ... */”.
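For instance, a query mixing whitespace and both comment styles might look like the
following sketch (the table and column names are illustrative):

```sql
SELECT first_name,   -- a single-line comment
       last_name     /* a comment within a line */
FROM   Student;      /* multiple-line comments
                        can span lines like this */
```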

Check Your Progress 1


1) What are the basic features of PostgreSQL?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) What are the basic processes in PostgreSQL?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) What are the different types of interfaces in PostgreSQL?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

3.5 SQL VARIATION AND EXTENSIONS


The SQL was standardised by the American National Standards Institute (ANSI) in
1986. The International Standards Organisation (ISO) standardised it in 1987. The
United States Government’s Federal Information Processing Standard (FIPS) adopted
the ANSI/ISO standard. In 1989, a revised standard, known commonly as SQL89 or
SQL1, was published.

The SQL89 standard was intentionally left incomplete to accommodate commercial


DBMS developer interests. However, the standard was strengthened by the ANSI
committee, with the SQL92 standard that was ratified in 1992 (also called SQL2). This
standard addressed several weaknesses in SQL89 and set forth conceptual SQL
features that exceeded the capabilities of the RDBMSs of that time. In fact, the SQL92
standard was approximately six times the length of its predecessor. Since SQL92 was
large, the authors of the standard defined three levels of
SQL92 compliance: Entry-level conformance (only the barest improvements to
SQL89), Intermediate-level conformance (a generally achievable set of major
advancements), and Full conformance (total compliance with the SQL92 features).
More recently, in 1999, the ANSI/ISO released the SQL99 standard (also called
SQL3). This standard addresses some of the more advanced and previously ignored
areas of modern SQL systems, such as object-relational database concepts, call level
interfaces, and integrity management. SQL99 replaces the SQL92 levels of
compliance with its own degrees of conformance: Core SQL99 and Enhanced SQL99.
PostgreSQL presently conforms to most of the Entry-level SQL92 standard, as well as
many of the Intermediate- and Full-level features. Additionally, many of the features
new in SQL99 are quite similar to the object-relational concepts pioneered by
PostgreSQL (arrays, functions, and inheritance).

PostgreSQL also provides several extensions to standard SQL. Some of these
extensions are:

• PostgreSQL supports many non-standard types. These include abstract data
types like complex, domains, cstring, record, trigger, void etc. It also includes
polymorphic types like anyelement and anyarray.

• It supports triggers. It also allows creation of functions which can be stored and
executed on the server.

• It supports many procedural programming languages like PL/pgSQL, PL/Tcl,
PL/Python etc.
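As an illustration of such server-stored functions, the following sketch defines a
simple function in PL/pgSQL (the function name and logic are made up for the
example, and it assumes the PL/pgSQL language has been installed in the database):

```sql
-- A hypothetical server-side function written in PL/pgSQL.
-- In older PostgreSQL releases the function body is given as a quoted string.
CREATE FUNCTION add_bonus(base integer) RETURNS integer AS '
BEGIN
    RETURN base + 10;
END;
' LANGUAGE 'plpgsql';

-- The function can then be called from ordinary SQL:
SELECT add_bonus(90);    -- returns 100
```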

3.6 TRANSACTION MANAGEMENT


Every SQL query is executed as a transaction. This results in a desirable property:
queries that make modifications are all-or-nothing. This ensures the integrity and
recoverability of queries. For example, consider the query:
UPDATE students SET marks = marks +10;

Assume that the query above has modified the first 200 records and is in the process
of modifying the 201st record out of the 2000 records. Suppose a user terminates the
query at this moment by resetting the computer; then, on the restart of the database,
the recovery mechanism will make sure that none of the student records is modified.
The query is required to be run again to make the desired update of marks. Thus,
PostgreSQL has made sure that the query causes no recovery or integrity related
problems.

This is a very useful feature of this DBMS. Suppose you were executing a query to
increase the salary of employees of your organisation by Rs. 500 and there is a power
failure during the update procedure. Without transaction support, the query may have
updated records of some of the persons, but not all. It would be difficult to know
where the UPDATE failed. You would like to know: “Which records were updated,
and which ones were not?” You cannot simply re-execute the query, because some
people who may have already received their Rs. 500 increment would also get another
increase of Rs. 500. With the transaction mechanism in place, you need not bother
about this: when the DBMS starts again, it will first recover from the failure, undoing
any partial update to the data. Thus, you can simply re-execute the query.

Multistatement Transactions

By default in Postgres each SQL query runs in its own transaction. For example,
consider the following two equivalent forms of the same INSERT:
mydb=> INSERT INTO table1 VALUES (1);
INSERT 1000 1
OR
mydb=> BEGIN WORK;
BEGIN
mydb=> INSERT INTO table1 VALUES (1);
INSERT 1000 1
mydb=> COMMIT WORK;
COMMIT

The former is a typical INSERT query. Before PostgreSQL starts the INSERT it
automatically begins a transaction. It performs the INSERT, and then commits the
transaction. This step occurs automatically for any query with no explicit transaction.
However, in the second version, the INSERT uses explicit transaction statements.
BEGIN WORK starts the transaction, and COMMIT WORK commits the transaction.
Both queries result in the same database state, the only difference being the implied
BEGIN WORK...COMMIT WORK statements. However, the real utility of these
transaction-related statements can be seen in the ability to club multiple queries into a
single transaction. In such a case, either all the queries will execute to completion or
none at all. For example, in the following transaction either both INSERTs will
succeed or neither.
mydb=> BEGIN WORK;
BEGIN
mydb=> INSERT INTO table1 VALUES (1);
INSERT 1000 1
mydb=> INSERT INTO table1 VALUES (2);
INSERT 2000 1
mydb=> COMMIT WORK;
COMMIT
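A transaction opened with BEGIN WORK can also be abandoned explicitly. The
following sketch, using the same illustrative table, undoes its INSERT with
ROLLBACK WORK:

```sql
BEGIN WORK;
INSERT INTO table1 VALUES (3);
-- Abandon the transaction: the INSERT above is undone and
-- the row never becomes visible to other sessions.
ROLLBACK WORK;
```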

PostgreSQL has implemented both the two-phase locking and multi-version
concurrency control protocols. The multi-version concurrency control supports all the
isolation levels of the SQL standard. These levels are:

• Read uncommitted
• Read committed
• Repeatable Read, and
• Serialisable.
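The isolation level of a transaction can be chosen explicitly. A minimal sketch (the
table and column names are illustrative):

```sql
BEGIN WORK;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- All reads in this transaction now see a single consistent snapshot.
SELECT marks FROM students WHERE roll_no = 1;
COMMIT WORK;
```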

Check Your Progress 2


1) List the add-on non-standard types in PostgreSQL?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) How are the transactions supported in PostgreSQL?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

3) State True or False

a) PostgreSQL is fully compliant with the SQL 99 standard.

b) PostgreSQL supports server-based triggers.

c) PostgreSQL supports all levels of isolation as defined by the SQL standard.

d) COMMIT is not a keyword in PostgreSQL.

3.7 STORAGE AND INDEXING


After discussing the basics of transactions, let us discuss some of the important
concepts used in PostgreSQL from the point of view of storage and indexing of tables.
An interesting point here is that PostgreSQL defines a number of system columns in
all tables. These system columns are normally invisible to the user; however, explicit
queries can report these entries. These columns, in general, contain metadata, i.e., data
about the data contained in the records of a table.

Thus, any record would have attribute values for the system-defined columns as well
as the user-defined columns of a table. The following table lists the system columns.

Column Name: Description

oid (object identifier): The unique object identifier of a record. It is automatically
added to all records. It is a 4-byte number that is never re-used within the same table.

tableoid (table object identifier): The oid of the table that contains a row. The
pg_class system table relates the name and oid of a table.

xmin (transaction minimum): The transaction identifier of the inserting transaction of
a tuple.

cmin (command minimum): The command identifier, starting at 0, associated with the
inserting transaction of a tuple.

xmax (transaction maximum): The transaction identifier of a tuple’s deleting
transaction. If a tuple has not been deleted then this is set to zero.

cmax (command maximum): The command identifier associated with the deleting
transaction of a tuple. Like xmax, if a tuple has not been deleted then this is set to
zero.

ctid (tuple identifier): This identifier describes the physical location of the tuple
within the database. A ctid is a pair of numbers: the block number, and the tuple index
within that block.

Figure 3: System Columns

If the database creator does not create a primary key explicitly, it would become
difficult to distinguish between two records with identical column values. To avoid
such a situation PostgreSQL appends to every record its own object identifier number,
or OID, which is unique to that table. Thus, no two records in the same table will ever
have the same OID, which also means that, to the system, no two records in a table
are ever identical.

Internally, PostgreSQL stores data in operating system files. Each table has its own
file, and data records are stored in a sequence in the file. You can create an index on
the database. An index is stored as a separate file that is sorted on one or more
columns as desired by the user. Let us discuss indexes in more detail.

Indexes
Indexes allow fast retrieval of specific rows from a table. For a large table, finding a
specific row using an index takes a fraction of a second, while a scan of non-indexed
entries will require much more time to find the same information. PostgreSQL does
not create indexes automatically. Indexes are user-defined for attributes or columns
that are frequently used for retrieving information.

For example, you can create an index such as:


mydb=> CREATE INDEX stu_name ON Student (first_name);

Although you can create many indexes, they should justify the benefits they provide
for retrieval of data from the database. Please note that an index adds overhead in
terms of disk space and performance, as a record update may also require an index
update. You can also create an index on multiple columns. Such multi-column indexes
are sorted by the first indexed column, and then by the second indexed column.
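For instance, a multi-column index on the illustrative Student table might be created
as follows (the column names are assumptions for the example):

```sql
-- Entries are sorted on last_name first, then on first_name
-- within equal last_name values.
CREATE INDEX stu_full_name ON Student (last_name, first_name);
```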

PostgreSQL supports many types of index implementations. These are:

• B-Tree Indexes: These are the default type index. These are useful for
comparison and range queries.
• Hash Indexes: This index uses linear hashing. Such indexes are not preferred in
comparison to B-tree indexes.
• R-Tree Indexes: Such indexes are created on built-in spatial data types such as
box and circle, for determining operations like overlap etc.
• GiST Indexes: These indexes are created using Generalised Search Trees. Such
indexes are useful for full text indexing, and are thus useful for retrieving
information.

3.8 QUERY PROCESSING AND EVALUATION


This section provides a brief introduction to the query processing operations of
PostgreSQL. It will define the basic steps involved in query processing and evaluation
in the PostgreSQL. The query once submitted to PostgreSQL undergoes the following
steps, (in sequential order) for solving the query:

• A connection from the client application program to the PostgreSQL server is


established. The application program transmits the query to the server and waits
for the server to process the query and return the results.

• The parser at the server checks the syntax of the query received from the client
and creates a query tree.

• The rewrite system takes the query tree created by the parser as the input, and
selects the rules stored in the system catalogues that may apply to the query
tree. It then performs the transformation given as per the rules. It also rewrites
any query made against a view to a query accessing the base tables.

• The planner/optimiser takes the (rewritten) query tree and creates a query plan
that forms the input to the executor. It creates a structured list of all the possible
paths leading to the same result. Finally, the cost for the execution of each path
is estimated and the cheapest path is chosen. This cheapest path is expanded
into a complete query evaluation plan that the executor can use.
• The executor recursively steps through the query evaluation plan tree supplied
by the planner and creates the desired output.

A detailed description of the process is given below:


A query is sent to the backend (the query processor) via data packets that arrive
through TCP/IP, in the case of remote database access, or through local Unix domain
sockets. The query is then loaded into a string and passed to the parser, where the
lexical scanner, scan.l, tokenises the words of the query string. The parser then uses
another component, the grammar gram.y, together with the tokens to identify the
query type, such as Create or Select queries. Now the proper query-specific structure
is loaded.

The statement is then identified as complex (SELECT / INSERT / UPDATE /
DELETE) or as simple, e.g., CREATE USER, ANALYSE etc. Simple utility
commands are processed by statement-specific functions; complex statements,
however, need further processing.

Complex queries, such as SELECT, return columns of data, or, like INSERT and
UPDATE, specify the columns that need to be modified. The references to these
columns are converted to TargetEntry entries, which are linked together to create the
target list of the query. The target list is stored in Query.targetList.

Now, the Query is modified for the desired VIEWS or implementation of RULES that
may apply to the query.

The optimiser then, creates the query execution plan based on the Query structure and
the operations to be performed in order to execute the query. The Plan is passed to the
executor for execution, and the results are returned to the client.

Query Optimisation
The task of the planner/optimiser is to create an optimal execution plan out of the
available alternatives. The query tree of a given SQL query can actually be executed
in a wide variety of different ways, each of which essentially produces the same set
of results. It is not feasible for the query optimiser to examine each of these possible
execution plans and choose the one that is expected to run the fastest. Thus, the
optimiser must find a reasonable (not necessarily optimal) query plan within
predefined time and space limits. For this, PostgreSQL uses a genetic query
optimiser.

After the cheapest path is determined, a full-fledged plan tree is built in order to pass
it to the executor. This represents the desired execution plan in sufficient detail for the
executor to run it.
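The plan actually chosen by the optimiser can be inspected with the EXPLAIN
command. A sketch against the illustrative Student table:

```sql
-- EXPLAIN prints the chosen plan and its cost estimates
-- without producing the query's normal result rows.
EXPLAIN SELECT * FROM Student WHERE first_name = 'Asha';
```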

Check Your Progress 3


1) What are system columns?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) What are the different types of indexes in PostgreSQL?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

3) What are the different steps for query evaluation?

……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

3.9 SUMMARY
This unit provided an introduction to some of the basic features of the PostgreSQL
DBMS, a very suitable example of a DBMS available as open source software. This
database provides most of the basic features of relational database management
systems. It also supports some features of object oriented programming like
inheritance, complex data types, declaration of functions and polymorphism; thus, it
is in the category of object relational DBMSs. PostgreSQL has a client server
architecture with the postmaster, server and client as the main processes. It supports
all the basic features of SQL 92 and many more features recommended by SQL 99.
PostgreSQL implicitly treats even single SQL queries as transactions. This helps in
maintaining the integrity of the database at all times. PostgreSQL places many system
related attributes in the tables defined by the users. Such attributes help the database
with tasks such as indexing and linking of records. It also supports different types of
indexes, including B-Tree, Hash, R-Tree and GiST indexes, making it suitable for
many different types of applications, such as spatial and multi-dimensional database
applications. It has a standard process for query evaluation and optimisation.

3.10 SOLUTIONS/ANSWERS
Check Your Progress 1
1) PostgreSQL supports the following features:

• ANSI SQL 2 compliance,


• Support for transactions, triggers, referential integrity and constraints,
• High level language support,
• Inheritance, complex data types and polymorphism,
• Built in complex data types like IP address,
• Aggregate functions, collections and sequences,
• Portability, and
• ODBC and JDBC drivers.

2) PostgreSQL has the following three processes:

• Postmaster
• Server process
• Client processes

3) PostgreSQL supports both the terminal monitor interfaces and program driven
interfaces.

Check Your Progress 2


1) Some of these types are: complex, domains, cstring, record, trigger, void etc.

2) Each SQL statement is treated as a transaction. It also has provision for multi-
statement transactions.

3) (a) False (b) True (c) True (d) False


Check Your Progress 3
1) System columns are added by PostgreSQL to all the tables. These are oid
(object identifier), tableoid, xmin, cmin, xmax, cmax and ctid.
2) PostgreSQL supports the following four types of indexes: B-Tree, Hash, R-Tree
and GiST.

3) The steps are:

• Query submission,
• Parsing by the parser to a query tree,
• Transformation of the query by the rewrite system,
• Creation of the query evaluation plan by the optimiser, and
• Execution by the executor.

UNIT 4 ORACLE
Structure Page Nos.
4.0 Introduction 51
4.1 Objectives 52
4.2 Database Application Development Features 52
4.2.1 Database Programming
4.2.2 Database Extensibility
4.3 Database Design and Querying Tools 57
4.4 Overview of Oracle Architecture 50
4.4.1 Physical Database Structures
4.4.2 Logical Database Structures
4.4.3 Schemas and Common Schema Objects
4.4.4 Oracle Data Dictionary
4.4.5 Oracle Instance
4.4.6 Oracle Background Processes
4.4.7 How Oracle Works?
4.5 Query Processing and Optimisation 71
4.6 Distributed Oracle 75
4.6.1 Distributed Queries and Transactions
4.6.2 Heterogenous Services
4.7 Data Movement in Oracle 76
4.7.1 Basic Replication
4.7.2 Advanced Replication
4.7.3 Transportable Tablespaces
4.7.4 Advanced Queuing and Streams
4.7.5 Extraction, Transformation and Loading
4.8 Database Administration Tools 77
4.9 Backup and Recovery in Oracle 78
4.10 Oracle Lite 79
4.11 Scalability and Performance Features of Oracle 79
4.11.1 Concurrency
4.11.2 Read Consistency
4.11.3 Locking Mechanisms
4.11.4 Real Application Clusters
4.11.5 Portability
4.12 Oracle DataWarehousing 83
4.12.1 Extraction, Transformation and Loading (ETL)
4.12.2 Materialised Views
4.12.3 Bitmap Indexes
4.12.4 Table Compression
4.12.5 Parallel Execution
4.12.6 Analytic SQL
4.12.7 OLAP Capabilities
4.12.8 Data Mining
4.12.9 Partitioning
4.13 Security Features of Oracle 85
4.14 Data Integrity and Triggers in Oracle 86
4.15 Transactions in Oracle 87
4.16 SQL Variations and Extensions in Oracle 88
4.17 Summary 90
4.18 Solutions/Answers 91

4.0 INTRODUCTION
The relational database concept was first described by Dr. Edgar F. Codd in an IBM
research publication titled “System R4 Relational”, which appeared in 1970. Initially,
it was unclear whether any system based on this concept could achieve commercial
success. However, looking back, there have been many products which support most
of the features of relational database models and much more. Oracle is one such product
that was created by Relational Software Incorporated (RSI) in 1977. They released
Oracle V.2 as the world’s first relational database within a couple of years. In 1983,
RSI was renamed Oracle Corporation to avoid confusion with a competitor named
RTI. During this time, Oracle developers made a critical decision to create a portable
version of Oracle (Version 3) that could run not only on Digital VAX/VMS systems,
but also on Unix and other platforms. Since the mid-1980s, the database deployment
model has evolved from dedicated database application servers to client/servers to
Internet computing implemented with PCs and thin clients accessing database
applications via browsers – and, to the grid with Oracle Database 10g.

Oracle introduced many innovative technical features to the database as computing


and deployment models changed (from offering the first distributed database to the
first Java Virtual Machine in the core database engine). Oracle also continues to
support emerging standards such as XML and .NET.
We will discuss some of the important characteristics and features of Oracle in this
unit.

4.1 OBJECTIVES
After going through this unit, you should be able to:

• define the features of application development in Oracle;


• identify tools for database design and query;
• describe the Oracle architecture;
• discuss the query processing and optimisation in Oracle;
• define the distributed Oracle database features;
• identify tools for database administration including backup and recovery;
• discuss the features of Oracle including scalability, performance, security etc.,
and
• define newer applications supported by Oracle.

4.2 DATABASE APPLICATION DEVELOPMENT


FEATURES
The main use of the Oracle database system is to store and retrieve data for
applications. Oracle has evolved over the past 20 years. The following table traces the
historical facts about Oracle.

Year: Feature

1979: Oracle Release 2, the first commercially available relational database to use
SQL.

1980-1990: Single code base for Oracle across multiple platforms; portable toolset;
client/server Oracle relational database; CASE and 4GL toolset; Oracle Financial
Applications built on the relational database.

1991-2000: Oracle Parallel Server on massively parallel platforms; cost-based
optimiser; parallel operations including query, load, and create index; universal
database with extended SQL via cartridges, thin client, and application server; Oracle
8 with inclusion of object-relational and Very Large Database (VLDB) features.
Oracle8i added a new twist to the Oracle database – a combination of enhancements
that made the Oracle8i database the focal point of the world of Internet (the i in 8i)
computing. A Java Virtual Machine (JVM) was added into the database and Oracle
tools were integrated in the middle tier.

2001: Oracle9i Database Server is generally available: Real Application Clusters;
OLAP and data mining API in the database.

2003: Oracle Database 10g enables grid (the g in 10g) computing. A grid is simply a
pool of computers that provides needed resources for applications on an as-needed
basis. The goal is to provide computing resources that can scale transparently to the
user community, much as an electrical utility company can deliver power, to meet
peak demand, by accessing energy from other power providers’ plants via a power
grid. Oracle Database 10g further reduces the time, cost, and complexity of database
management through the introduction of self-managing features such as the
Automated Database Diagnostic Monitor, Automated Shared Memory Tuning,
Automated Storage Management, and Automated Disk Based Backup and Recovery.
One important key to Oracle Database 10g’s usefulness in grid computing is the
ability to provision CPUs and data.

However, in this section we will concentrate on the tools that are used to create
applications. We have divided the discussion in this section into two categories:
database programming and database extensibility options. Later in this unit, we will
describe the Oracle Developer Suite, a set of optional tools used in Oracle Database
Server and Oracle Application Server development.

4.2.1 Database Programming


All flavors of the Oracle database include different languages and interfaces that allow
programmers to access and manipulate data in the database. The following are the
languages and interfaces supported by Oracle.

SQL
All operations on the information in an Oracle database are performed using SQL
statements. A statement must be the equivalent of a complete SQL sentence, as in:
SELECT last_name, department_id FROM employees;

Only a complete SQL statement can run successfully. A SQL statement can be
thought of as a very simple, but powerful, computer instruction. SQL statements of
Oracle are divided into the following categories.

Data Definition Language (DDL) Statements: These statements create, alter,
maintain, and drop schema objects. DDL statements also include statements that
permit a user to grant other users the privilege of accessing the database and specific
objects within the database.

Data Manipulation Language (DML) Statements: These statements manipulate


data. For example, querying, inserting, updating, and deleting rows of a table are all
DML operations. Locking a table and examining the execution plan of a SQL
statement are also DML operations.

Transaction Control Statements: These statements manage the changes made by
DML statements. They enable a user to group changes into logical transactions.
Examples include COMMIT, ROLLBACK, and SAVEPOINT.

Session Control Statements: These statements let a user control the properties of the
current session, including enabling and disabling roles and changing language
settings. The two session control statements are ALTER SESSION and SET ROLE.

System Control Statements: These statements change the properties of the Oracle
database instance. The only system control statement is ALTER SYSTEM. It lets
users change settings, such as the minimum number of shared servers, kill a session,
and perform other tasks.

Embedded SQL Statements: These statements incorporate DDL, DML, and


transaction control statements in a procedural language program. Examples include
OPEN, CLOSE, FETCH, and EXECUTE.

Datatypes
Each attribute and constant in a SQL statement has a datatype, which is associated
with a specific storage format, constraints, and a valid range of values. When you
create a table, you must specify a datatype for each of its columns.

Oracle provides the following built-in datatypes:


• Character datatypes
• Numeric datatypes
• DATE datatype
• LOB datatypes
• RAW and LONG RAW datatypes
• ROWID and UROWID datatypes.
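As a sketch of how these datatypes are used (the table and column names are
illustrative), a datatype is assigned to each column when a table is created:

```sql
CREATE TABLE purchase_orders (
    order_id   NUMBER(10),      -- numeric datatype
    customer   VARCHAR2(40),    -- character datatype
    order_date DATE,            -- DATE datatype
    notes      CLOB             -- LOB datatype
);
```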
New object types can be created from any in-built database types or any previously
created object types, object references, and collection types. Metadata for user-defined
types is stored in a schema available to SQL, PL/SQL, Java, and other published
interfaces.

An object type differs from native SQL datatypes in that it is user-defined, and it
specifies both the underlying persistent data (attributes) and the related behaviours
(methods). Object types are abstractions of real-world entities, for example, purchase
orders.

Object types and related object-oriented features, such as variable-length arrays and
nested tables, provide higher-level ways to organise and access data in the database.
Underneath the object layer, data is still stored in columns and tables, but you can
work with the data in terms of real-world entities – customers and purchase orders,
that make the data meaningful. Instead of thinking in terms of columns and tables
when you query the database, you can simply select a customer.
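A user-defined object type of this kind might be sketched as follows (the type name,
attributes and method are illustrative, not from any particular schema):

```sql
CREATE TYPE customer_t AS OBJECT (
    cust_id NUMBER,
    name    VARCHAR2(40),
    -- behaviour (a method) is declared together with the data
    MEMBER FUNCTION full_details RETURN VARCHAR2
);
/
```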

PL/SQL
PL/SQL is Oracle’s procedural language extension to SQL. PL/SQL combines the
ease and flexibility of SQL with the procedural functionality of a structured
programming language, such as IF, THEN, WHILE, and LOOP.

When designing a database application, consider the following advantages of using


stored PL/SQL:
• PL/SQL code can be stored centrally in a database. Network traffic between
applications and the database is reduced, hence, both application and system
performance increases. Even when PL/SQL is not stored in the database,
applications can send blocks of PL/SQL to the database rather than individual
SQL statements, thereby reducing network traffic.

• Data access can be controlled by stored PL/SQL code. In this case, PL/SQL
users can access data only as intended by application developers, unless another
access route is granted.

• PL/SQL blocks can be sent by an application to a database, running complex


operations without excessive network traffic.

• Oracle supports PL/SQL Server Pages, so the application logic can be invoked
directly from your Web pages.

The PL/SQL program units can be defined and stored centrally in a database. Program
units are stored procedures, functions, packages, triggers, and anonymous blocks.

Procedures and functions are sets of SQL and PL/SQL statements grouped together as
a unit to solve a specific problem or to perform a set of related tasks. They are created
and stored in compiled form in the database and can be run by a user or a database
application. Procedures and functions are identical, except that all functions always
return a single value to the user. Procedures do not return values.
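A stored function might be sketched as below (the name and logic are made up for
the example); unlike a procedure, it returns a value:

```sql
CREATE OR REPLACE FUNCTION get_bonus (p_salary IN NUMBER)
RETURN NUMBER
IS
BEGIN
    -- a function always returns a single value to the caller
    RETURN p_salary * 0.10;
END;
/
```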

Packages encapsulate and store related procedures, functions, variables, and other
constructs together as a unit in the database. They offer increased functionality (for
example, global package variables can be declared and used by any procedure in the
package). They also improve performance (for example, all objects of the package are
parsed, compiled, and loaded into memory once).

Java Features and Options


Oracle8i introduced the use of Java as a procedural language with a Java Virtual
Machine (JVM) in the database (originally called JServer). JVM includes support for
Java stored procedures, methods, triggers, Enterprise JavaBeans (EJBs), CORBA, and
HTTP. The Accelerator is used for project generation, translation, and compilation,
and can also be used to deploy/install shared libraries. The inclusion of Java within
the Oracle database allows Java developers to leverage their skills as Oracle
application developers as well. Java applications can be deployed in the client, Application
Server, or database, depending on what is most appropriate. Oracle data warehousing
options for OLAP and data mining provide a Java API. These applications are
typically custom built using Oracle’s JDeveloper.

Large Objects
Interest in the use of large objects (LOBs) continues to grow, particularly for storing
non-traditional data types such as images. The Oracle database has been able to store
large objects for some time. Oracle8 added the capability to store multiple LOB
columns in each table. Oracle Database 10g essentially removes the space limitation
on large objects.

Object-Oriented Programming
Support of object structures has been included since Oracle8i to allow an object-
oriented approach to programming. For example, programmers can create user-
defined data types, complete with their own methods and attributes. Oracle’s object
support includes a feature called Object Views through which object-oriented
programs can make use of relational data already stored in the database. You can also
store objects in the database as varying arrays (VARRAYs), nested tables, or index
organised tables (IOTs).
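The object-relational features just mentioned can be sketched as follows; the type and table names are hypothetical, and the method body would be supplied separately with CREATE TYPE BODY:

```sql
-- A user-defined object type with attributes and a declared method.
CREATE TYPE address_t AS OBJECT (
  street VARCHAR2(40),
  city   VARCHAR2(30),
  MEMBER FUNCTION one_line RETURN VARCHAR2
);
/

-- A varying array (VARRAY) of up to five phone numbers.
CREATE TYPE phone_list_t AS VARRAY(5) OF VARCHAR2(15);
/

-- A table whose columns use both user-defined types.
CREATE TABLE customers (
  id     NUMBER PRIMARY KEY,
  addr   address_t,
  phones phone_list_t
);
```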

Emerging Trends and Example DBMS Architecture

Third-Generation Languages (3GLs)

Programmers can interact with the Oracle database from C, C++, Java, COBOL, or
FORTRAN applications by embedding SQL in those applications. Prior to compiling
applications using a platform’s native compilers, you must run the embedded SQL
code through a precompiler. The precompiler replaces SQL statements with library
calls the native compiler can accept. Oracle provides support for this capability
through optional “programmer” precompilers for languages such as C and C++
(Pro*C) and COBOL (Pro*COBOL). More recently, Oracle added SQLJ, a
precompiler for Java that replaces SQL statements embedded in Java with calls to a
SQLJ runtime library, also written in Java.

Database Drivers

All versions of Oracle include database drivers that allow applications to access
Oracle via ODBC (the Open DataBase Connectivity standard) or JDBC (the Java
DataBase Connectivity open standard). Also available are data providers for OLE DB
and for .NET.

The Oracle Call Interface

This interface is available for the experienced programmer seeking optimum
performance. They may choose to define SQL statements within host-language
character strings and then explicitly parse the statements, bind variables for them, and
execute them using the Oracle Call Interface (OCI). OCI is a much more detailed
interface that requires more programmer time and effort to create and debug.
Developing an application that uses OCI can be time-consuming, but the added
functionality and incremental performance gains often make spending the extra time
worthwhile.

National Language Support

National Language Support (NLS) provides character sets and associated
functionality, such as date and numeric formats, for a variety of languages. Oracle9i
featured full Unicode 3.0 support. All data may be stored as Unicode, or select
columns may be incrementally stored as Unicode. UTF-8 and UTF-16 encodings
provide support for more than 57 languages and 200 character sets. Oracle
Database 10g adds support for Unicode 3.2. Extensive localisation is provided (for
example, for data formats) and customised localisation can be added through the
Oracle Locale Builder. Oracle Database 10g includes a Globalisation Toolkit for
creating applications that will be used in multiple languages.

4.2.2 Database Extensibility


The Internet and corporate intranets have created a growing demand for storage and
manipulation of non-traditional data types within the database. The standard
functionality of a database therefore needs to be extended for storing and
manipulating image, audio, video, spatial, and time series information. These capabilities are
enabled through extensions to standard SQL.
Oracle Text and InterMedia
Oracle Text can identify the gist of a document by searching for themes and key
phrases in the document.

Oracle interMedia bundles additional image, audio, video, and locator functions and is
included in the database license. Oracle interMedia offers the following capabilities:
• The image portion of interMedia can store and retrieve images.
• The audio and video portions of interMedia can store and retrieve audio and
video clips, respectively.
• The locator portion of interMedia can retrieve data that includes spatial
coordinate information.

Oracle Spatial Option


The Spatial option is available for Oracle Enterprise Edition. It can optimise the
display and retrieval of data linked to coordinates and is used in the development of
spatial information systems. Several vendors of Geographic Information Systems
(GIS) products now bundle this option and leverage it as their search and retrieval
engine.

XML
Oracle added native XML data type support to the Oracle9i database and XML and
SQL interchangeability for searching. The structured XML object is held natively in
object relational storage meeting the W3C DOM specification. The XPath syntax for
searching in SQL is based on the SQLX group specifications.
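A minimal sketch of native XML storage and XPath-based retrieval follows; the table and column names are illustrative:

```sql
-- Store an XML document natively in an XMLType column.
CREATE TABLE orders (
  id  NUMBER PRIMARY KEY,
  doc XMLTYPE
);

INSERT INTO orders VALUES (
  1, XMLTYPE('<order><item qty="2">widget</item></order>')
);

-- extract() navigates the stored document using XPath syntax.
SELECT o.doc.extract('/order/item/text()').getStringVal() AS item
  FROM orders o;
```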

4.3 DATABASE DESIGN AND QUERYING TOOLS


Many Oracle tools are available to developers to help them present data and build
more sophisticated Oracle database applications. This section briefly describes the
main Oracle tools for application development: Oracle Forms Developer, Oracle
Reports Developer, Oracle Designer, Oracle JDeveloper, Oracle Discoverer
Administrative Edition and Oracle Portal. Oracle Developer Suite was known as
Oracle Internet Developer Suite with Oracle9i.

SQL*Plus
SQL*Plus is an interactive and batch query tool that is installed with every Oracle
Database Server or Client installation. It has a command-line user interface, a
Windows Graphical User Interface (GUI) and the iSQL*Plus web-based user
interface.

SQL*Plus has its own commands and environment, and it provides access to the
Oracle Database. It enables you to enter and execute SQL, PL/SQL, SQL*Plus and
operating system commands to perform the following:
• format, perform calculations on, store, and print from query results,
• examine table and object definitions,
• develop and run batch scripts, and
• perform database administration.

You can use SQL*Plus to generate reports interactively, to generate reports as batch
processes, and to output the results to text file, to screen, or to HTML file for
browsing on the Internet. You can generate reports dynamically using the HTML
output facility of SQL*Plus, or using the dynamic reporting capability of iSQL*Plus
to run a script from a web page.
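A small SQL*Plus script combining formatting commands with a query gives the flavour of this; the emp table is illustrative:

```sql
-- report.sql: run from SQL*Plus as  @report
SET PAGESIZE 24
SET LINESIZE 80
COLUMN ename HEADING 'Employee' FORMAT A20
COLUMN sal   HEADING 'Salary'   FORMAT 99,990

SPOOL report.txt           -- capture query output to a file
SELECT ename, sal
  FROM emp
 WHERE sal > 1000
 ORDER BY sal DESC;
SPOOL OFF
```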

Oracle Forms Developer


Oracle Forms Developer provides a powerful tool for building forms-based
applications and charts for deployment as traditional client/server applications or as
three-tier browser-based applications via Oracle Application Server. Developer is a
fourth-generation language (4GL). With a 4GL, you define applications by defining
values for properties, rather than by writing procedural code. Developer supports a
wide variety of clients, including traditional client/server PCs and Java-based clients.
The Forms Builder includes an in-built JVM for previewing web applications.

Oracle Reports Developer

Oracle Reports Developer provides a development and deployment environment for
rapidly building and publishing web-based reports via Reports for Oracle’s
Application Server. Data can be formatted in tables, matrices, group reports, graphs,
and combinations. High-quality presentation is possible using the HTML extension
Cascading Style Sheets (CSS).

Oracle JDeveloper

Oracle JDeveloper was introduced by Oracle in 1998 to develop basic Java
applications without writing code. JDeveloper includes a Data Form wizard, a Beans
Express wizard for creating JavaBeans and BeanInfo classes, and a Deployment
wizard. JDeveloper includes database development features such as various Oracle
drivers, a Connection Editor to hide the JDBC API complexity, database components
to bind visual controls, and a SQLJ precompiler for embedding SQL in Java code,
which you can then use with Oracle. You can also deploy applications developed with
JDeveloper using the Oracle Application Server. Although JDeveloper uses wizards to
allow programmers to create Java objects without writing code, the end result is
generated Java code. This Java implementation makes the code highly flexible, but it
is typically a less productive development environment than a true 4GL.

Oracle Designer

Oracle Designer provides a graphical interface for Rapid Application Development
(RAD) for the entire database development process—from building the business
model to schema design, generation, and deployment. Designs and changes are stored
in a multi-user repository. The tool can reverse-engineer existing tables and database
schemas for re-use and re-design from Oracle and non-Oracle relational databases.

Designer also includes generators for creating applications for Oracle Developer,
HTML clients using Oracle’s Application Server, and C++. Designer can generate
applications and reverse-engineer existing applications or applications that have been
modified by developers. This capability enables a process called round-trip
engineering, in which a developer uses Designer to generate an application, modifies
the generated application, and reverse-engineers the changes back into the Designer
repository.

Oracle Discoverer

Oracle Discoverer Administration Edition enables administrators to set up and
maintain the Discoverer End User Layer (EUL). The purpose of this layer is to shield
business analysts using Discoverer as an ad hoc query or ROLAP tool from SQL
complexity. Wizards guide the administrator through the process of building the EUL.
In addition, administrators can put limits on resources available to analysts monitored
by the Discoverer query governor.

Oracle Portal
Oracle Portal, introduced as WebDB in 1999, provides an HTML-based tool for
developing web-enabled applications and content-driven web sites. Portal application
systems are developed and deployed in a simple browser environment. Portal includes
wizards for developing application components incorporating “servlets” and access to
other HTTP web sites. For example, Oracle Reports and Discoverer may be accessed
as servlets. Portals can be designed to be user-customisable. They are deployed to the
middle-tier Oracle Application Server.

Oracle Portal enhances WebDB with the ability to create and use portlets,
which allow a single web page to be divided into different areas that can
independently display information and interact with the user.

Check Your Progress 1


1) What are Data Definition statements in Oracle?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

2) ………………………………… manage the changes made by DML statements.

3) What are the in- built data types of ORACLE?


……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

4) Name the main ORACLE tools for Application Development.


……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

4.4 OVERVIEW OF ORACLE ARCHITECTURE


The following section presents the basic architecture of the Oracle database. A
schematic diagram of the Oracle database is given below:

Figure 1: Oracle 10g Architecture (Source: www.oracle.com)

4.4.1 Physical Database Structures


The following sections explain the physical database structures of an Oracle database,
including datafiles, redo log files, and control files.

Datafiles
Every Oracle database has one or more physical datafiles. The datafiles contain all the
database data. The data of logical database structures, such as tables and indexes, is
physically stored in the datafiles allocated for a database.
The characteristics of datafiles are:
• a datafile can be associated with only one database,

• datafiles can have certain characteristics set to let them automatically extend
when the database runs out of space,
• one or more datafiles form a logical unit of database storage called a tablespace.

Data in a datafile is read, as needed, during normal database operation and stored in
the memory cache of Oracle. For example, assume that a user wants to access some
data in a table of a database. If the requested information is not already in the memory
cache for the database, then it is read from the appropriate datafiles and stored in the
memory.

Modified or new data is not necessarily written to a datafile immediately. To reduce
the amount of disk access and to increase performance, data is pooled in memory and
written to the appropriate datafiles all at once, as determined by the database writer
process (DBWn) background process.

Control Files
Every Oracle database has a control file. A control file contains entries that specify the
physical structure of the database. For example, it contains the following information:
• database name,
• names and locations of datafiles and redo log files, and
• time stamp of database creation.

Oracle can multiplex the control file, that is, simultaneously maintain a number of
identical control file copies, to protect against failure involving the control file.
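The control file itself is binary, but its location and the information it records can be inspected through dynamic performance views:

```sql
-- Location of every control file copy (multiplexed or not).
SELECT name FROM v$controlfile;

-- Key facts the control file tracks about the database.
SELECT name, created, log_mode FROM v$database;
```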

Every time an instance of an Oracle database is started, its control file identifies the
database and redo log files that must be opened for database operation to proceed. If
the physical makeup of the database is altered (for example, if a new datafile or redo
log file is created), then the control file is automatically modified by Oracle to reflect
the change. A control file is also used in database recovery.

Redo Log Files


Every Oracle database has a set of two or more redo log files. The set of redo log files
is collectively known as the redo log for the database. A redo log is made up of redo
entries (also called redo records).

The primary function of the redo log is to record all changes made to the data. If a
failure prevents modified data from being permanently written to the datafiles, then
the changes can be obtained from the redo log, so work is never lost. To protect
against a failure involving the redo log itself, Oracle allows a multiplexed redo log so
that two or more copies of the redo log can be maintained on different disks.

The information in a redo log file is used only to recover the database from a system
or media failure that prevents database data from being written to the datafiles. For
example, if an unexpected power failure terminates database operation, then the data
in memory cannot be written to the datafiles, and the data is lost. However, the lost
data can be recovered when the database is opened, after power is restored. By
applying the information in the most recent redo log files to the database datafiles,
Oracle restores the database to the time at which the power failure occurred. The
process of applying the redo log during a recovery operation is called rolling forward.

Archive Log Files


You can also enable automatic archiving of the redo log. Oracle automatically
archives log files when the database is in ARCHIVELOG mode.

Parameter Files
Parameter files contain a list of configuration parameters for that instance and
database.
Oracle recommends that you create a server parameter file (SPFILE) as a dynamic
means of maintaining initialisation parameters. A server parameter file lets you store
and manage your initialisation parameters persistently in a server-side disk file.

Alert and Trace Log Files


Each server and background process can write to an associated trace file. When an
internal error is detected by a process, it dumps information about the error to its trace
file. Some of the information written to a trace file is intended for the database
administrator, while other information is for Oracle Support Services. Trace file
information is also used to tune applications and instances.

The alert file, or alert log, is a special trace file. The alert file of a database is a
chronological log of messages and errors.

Backup Files
To restore a file is to replace it with a backup file. Typically, you restore a file when a
media failure or user error has damaged or deleted the original file.

User-managed backup and recovery requires you to restore the backup files yourself
before recovery can be attempted.

Server-managed backup and recovery manages the backup process, such as
scheduling of backups, as well as recovery processes such as applying the correct
backup file when recovery is needed.

4.4.2 Logical Database Structures


The logical storage structures, including data blocks, extents, and segments, enable
Oracle's fine-grained control of disk space use.

Tablespaces
A database is divided into logical storage units called tablespaces, which group related
logical structures together. For example, tablespaces commonly group together all
application objects to simplify administrative operations.

Each database is logically divided into one or more tablespaces. One or more datafiles
are explicitly created for each tablespace to physically store the data of all logical
structures in a tablespace. The combined size of the datafiles in a tablespace is the
total storage capacity of the tablespace. Every Oracle database contains a SYSTEM
tablespace and a SYSAUX tablespace. Oracle creates them automatically when the
database is created. The system default is to create a smallfile tablespace, which is the
traditional type of Oracle tablespace. The SYSTEM and SYSAUX tablespaces are
created as smallfile tablespaces.

Oracle also lets you create bigfile tablespaces up to 8 exabytes (8 million terabytes) in
size. With Oracle-managed files, bigfile tablespaces make datafiles completely
transparent for users. In other words, you can perform operations on tablespaces,
rather than on the underlying datafiles.
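Creating a tablespace backed by datafiles, including a bigfile variant, can be sketched as follows; the file paths and sizes are illustrative:

```sql
-- A traditional (smallfile) tablespace with an autoextending datafile.
CREATE TABLESPACE app_data
  DATAFILE '/u01/oradata/app_data01.dbf' SIZE 100M
  AUTOEXTEND ON NEXT 10M MAXSIZE 2G;

-- A bigfile tablespace: one very large datafile, transparent to users.
CREATE BIGFILE TABLESPACE app_big
  DATAFILE '/u01/oradata/app_big01.dbf' SIZE 1G;

-- Operations address the tablespace, not the underlying datafile.
ALTER TABLESPACE app_big RESIZE 2G;
```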

Online and Offline Tablespaces


A tablespace can be online (accessible) or offline (not accessible). A tablespace is
generally online, so that users can access information in the tablespace. However,

sometimes a tablespace is taken offline, in order to make a portion of the database
unavailable while allowing normal access to the remainder of the database. This
makes many administrative tasks easier to perform.

Oracle Data Blocks


At the finest level of granularity, Oracle database data is stored in data blocks. One
data block corresponds to a specific number of bytes of physical database space on
disk. The standard block size is specified by the DB_BLOCK_SIZE initialization
parameter. In addition, you can specify up to five other block sizes. A database uses
and allocates free database space in Oracle data blocks.

Extents
The next level of logical database space is an extent. An extent is a specific number of
contiguous data blocks, obtained in a single allocation, used to store a specific type of
information.

Segments
The next level of logical database storage is the segment. A segment is a set of
extents allocated for a certain logical structure. The following table describes the
different types of segments.

Segment Description
Data Each non-clustered table has a data segment. All table data is stored in
segment the extents of the data segment.
For a partitioned table, each partition has a data segment.
Each cluster has a data segment. The data of every table in the cluster is
stored in the cluster’s data segment.
Index Each index has an index segment that stores all of its data.
segment For a partitioned index, each partition has an index segment.

Temporary Temporary segments are created by Oracle when a SQL statement needs
segment a temporary database area to complete execution. When the statement
has been executed, the extents in the temporary segment are returned to
the system for future use.

Rollback If you are operating in automatic undo management mode, then the
segment database server manages undo space-using tablespaces. Oracle
recommends that you use automatic undo management.

Earlier releases of Oracle used rollback segments to store undo


information. The information in a rollback segment was used during
database recovery for generating read-consistent database information
and for rolling back uncommitted transactions for users.

Space management for these rollback segments was complex. Oracle
now uses the undo tablespace method of managing
undo; this eliminates the complexities of managing rollback segment
space.

Oracle does use a SYSTEM rollback segment for performing system

transactions. There is only one SYSTEM rollback segment and it is
created automatically at CREATE DATABASE time and is always
brought online at instance startup. You are not required to perform any
operation to manage the SYSTEM rollback segment.

Oracle dynamically allocates space when the existing extents of a segment become
full. In other words, when the extents of a segment are full, Oracle allocates another
extent for that segment. Because extents are allocated as needed, the extents of a
segment may or may not be contiguous on a disk.
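The block/extent/segment hierarchy described above can be observed through the data dictionary; the segment name EMP is illustrative:

```sql
-- How many extents and blocks does each of my segments occupy?
SELECT segment_name, segment_type, extents, blocks, bytes
  FROM user_segments;

-- The individual (possibly non-contiguous) extents of one table.
SELECT extent_id, blocks, bytes
  FROM user_extents
 WHERE segment_name = 'EMP';
```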

4.4.3 Schemas and Common Schema Objects


A schema is a collection of database objects. A schema is owned by a database user
and has the same name as that user. Schema objects are the logical structures that refer
directly to the database’s data. Schema objects include structures like tables, views,
and indexes. (There is no relationship between a tablespace and a schema. Objects in
the same schema can be in different tablespaces, and a tablespace can hold objects
from different schemas). Some of the most common schema objects are defined in the
following section.

Tables
Tables are the basic unit of data storage in an Oracle database. Database tables hold all
user-accessible data. Each table has columns and rows.

Indexes
Indexes are optional structures associated with tables. Indexes can be created to
increase the performance of data retrieval. An index provides an access path to table
data.

When processing a request, Oracle may use some or all of the available indexes in
order to locate the requested rows efficiently. Indexes are useful when applications
frequently query a table for a range of rows (for example, all employees with salaries
greater than 1000 dollars) or a specific row.

An index is automatically maintained and used by the DBMS. Changes to table data
(such as adding new rows, updating rows, or deleting rows) are automatically
incorporated into all relevant indexes with complete transparency to the users. Oracle
uses B-trees to store indexes in order to speed up data access.

An index in Oracle can be considered an ordered list of values divided into
block-wide ranges (leaf blocks). The end points of the ranges, along with pointers to
the blocks, can be stored in a search tree, so that a value can be found in O(log n)
time for n entries. This is the basic principle behind Oracle indexes.
The following Figure 2 illustrates the structure of a B-tree index.


Figure 2: Internal Structure of an Oracle Index (Source: www.oracle.com)

The upper blocks (branch blocks) of a B-tree index contain index data that points to
lower-level index blocks. The lowest level index blocks (leaf blocks) contains every
indexed data value and a corresponding rowid used to locate the actual row. The leaf
blocks are doubly linked. Indexes in columns containing character data are based on
the binary values of the characters in the database character set.

Index contains two kinds of blocks:


• Branch blocks for searching, and
• Leaf blocks that store values.

Branch Blocks: Branch blocks store the following:


• the minimum key prefix needed to make a branching decision between two
keys, and
• the pointer to the child block containing the key.

If the blocks have n keys then they have n+1 pointers. The number of keys and
pointers is limited by the block size.

Leaf Blocks: All leaf blocks are at the same depth from the root branch block. Leaf
blocks store the following:
• the complete key value for every row, and
• ROWIDs of the table rows

All key and ROWID pairs are linked to their left and right siblings. They are sorted by
(key, ROWID).
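Creating and exercising a B-tree index can be sketched as follows; the table and column names are illustrative:

```sql
-- A B-tree index on the salary column.
CREATE INDEX emp_sal_idx ON emp (sal);

-- A range query the optimiser can satisfy by descending the branch
-- blocks to the first qualifying leaf, then following sibling links.
SELECT ename, sal
  FROM emp
 WHERE sal > 1000;
```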

Views
Views are customised presentations of data in one or more tables or other views. A
view can also be considered as a stored query. Views do not actually contain data.
Rather, they derive their data from the tables on which they are based, (referred to as
the base tables of the views). Views can be queried, updated, inserted into, and deleted
from, with some restrictions.
Views provide an additional level of table security by restricting access to a
pre-determined set of rows and columns of a table. They also hide data complexity and
store complex queries.
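A view that restricts both the rows and the columns of a base table can be sketched as follows; the table, view, and user names are illustrative:

```sql
-- Expose only department 10, and hide the salary column entirely.
CREATE VIEW dept10_emp AS
  SELECT empno, ename, hiredate
    FROM emp
   WHERE deptno = 10;

-- Users granted access to the view never see the other rows or columns.
GRANT SELECT ON dept10_emp TO analyst;
```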
Clusters
Clusters are groups of one or more tables physically stored together because they
share common columns and are often used together. Since related rows are physically
stored together, disk access time improves.

Like indexes, clusters do not affect application design. Whether a table is part of a
cluster is transparent to users and to applications. Data stored in a clustered table is
accessed by SQL in the same way as data stored in a non-clustered table.
Synonyms
A synonym is an alias for any table, view, materialised view, sequence, procedure,
function, package, type, Java class schema object, user-defined object type, or another
synonym. Because a synonym is simply an alias, it requires no storage other than its
definition in the data dictionary.
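Synonyms in use can be sketched as follows; the schema and object names are illustrative:

```sql
-- A private synonym hiding the owning schema.
CREATE SYNONYM emp FOR hr.employees;

-- A public synonym visible to all users.
CREATE PUBLIC SYNONYM emp_all FOR hr.employees;

-- Queries now need no schema qualifier.
SELECT COUNT(*) FROM emp;
```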

4.4.4 Oracle Data Dictionary


Each Oracle database has a data dictionary. An Oracle data dictionary is a set of tables
and views that are used as a read-only reference on the database. For example, a data
dictionary stores information about the logical and physical structure of the database.
A data dictionary also stores the following information:
• the valid users of an Oracle database,
• information about integrity constraints defined for tables in the database, and
• the amount of space allocated for a schema object and how much of it is in use.

A data dictionary is created when a database is created. To accurately reflect the status
of the database at all times, the data dictionary is automatically updated by Oracle in
response to specific actions, (such as, when the structure of the database is altered).
The database relies on the data dictionary to record, verify, and conduct on-going
work. For example, during database operation, Oracle reads the data dictionary to
verify that schema objects exist and that users have proper access to them.
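Typical read-only queries against the dictionary views illustrate the points above; the table name EMP is illustrative:

```sql
-- Tables owned by the current user.
SELECT table_name FROM user_tables;

-- Integrity constraints defined on one table.
SELECT constraint_name, constraint_type
  FROM user_constraints
 WHERE table_name = 'EMP';

-- Valid database users (requires DBA privileges).
SELECT username, created FROM dba_users;
```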

4.4.5 Oracle Instance


An Oracle database server consists of an Oracle database and an Oracle instance.
Every time a database is started, a system global area (SGA) (please refer to Figure1)
is allocated and Oracle background processes are started. The combination of the
background processes and memory buffers is known as an Oracle instance.
Real Application Clusters: Multiple Instance Systems
Some hardware architectures (for example, shared disk systems) enable multiple
computers to share access to data, software, or peripheral devices. Real Application
Clusters (RAC) take advantage of such architecture by running multiple instances that
share a single physical database. In most applications, RAC enables access to a single
database by users on multiple machines with increased performance.
An Oracle database server uses memory structures and processes to manage and
access the database. All memory structures exist in the main memory of computers
that constitute the database system. Processes are jobs that work in the memory of
these computers.
Instance Memory Structures

Oracle creates and uses memory structures to complete several jobs. For example,
memory stores program code being run and data shared among users. Two basic
memory structures associated with Oracle are: the system global area and the program
global area. The following subsections explain each in detail.

System Global Area


The System Global Area (SGA) is a shared memory region that contains data and
control information for one Oracle instance. Oracle allocates the SGA when an
instance starts and deallocates it when the instance shuts down. Each instance has its
own SGA.

Users currently connected to an Oracle database share the data in the SGA. For
optimal performance, the entire SGA should be as large as possible (while still fitting
into real memory) to store as much data in memory as possible and to minimise
disk I/O.

The information stored in the SGA is divided into several types of memory structures,
including the database buffers, redo log buffer, and the shared pool.
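These SGA components can be inspected from SQL through dynamic performance views:

```sql
-- Overall SGA sizing.
SELECT * FROM v$sga;

-- Per-component breakdown: buffer cache, shared pool, redo buffer, etc.
SELECT name, bytes FROM v$sgainfo;
```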

Database Buffer Cache of the SGA


Database buffers store the most recently used blocks of data. The set of database
buffers in an instance is the database buffer cache. The buffer cache contains modified
as well as unmodified blocks. Because the most recently (and often, the most
frequently) used data is kept in the memory, less disk I/O is required, and the
performance is improved.

Redo Log Buffer of the SGA


The redo log buffer stores redo entries – a log of changes made to the database. The
redo entries stored in the redo log buffers are written to an online redo log, which is
used if database recovery is necessary. The size of the redo log buffer is static.

Shared Pool of the SGA


The shared pool contains shared memory constructs, such as shared SQL areas. A
shared SQL area is required to process every unique SQL statement submitted to a
database. A shared SQL area contains information such as the parse tree and execution
plan for the corresponding statement. A single shared SQL area is used by multiple
applications that issue the same statement, leaving more shared memory for other
uses.
Statement Handles or Cursors
A cursor is a handle or name for a private SQL area in which a parsed statement and
other information for processing the statement are kept. (Oracle Call Interface, OCI,
refers to these as statement handles). Although most Oracle users rely on automatic
cursor handling of Oracle utilities, the programmatic interfaces offer application
designers more control over cursors.

For example, in precompiler application development, a cursor is a named resource
available to a program and can be used specifically to parse SQL statements
embedded within the application. Application developers can code an application so
that it controls the phases of SQL statement execution and thus improves application
performance.
Program Global Area

The Program Global Area (PGA) is a memory buffer that contains data and control
information for a server process. A PGA is created by Oracle when a server process is
started. The information in a PGA depends on the configuration of Oracle.

4.4.6 Oracle Background Processes



The architectural features discussed in this section enables the Oracle database to
support:
• many users concurrently accessing a single database, and
• the high performance required by concurrent multiuser, multiapplication
database systems.

Oracle creates a set of background processes for each instance. The background
processes consolidate functions that would otherwise be handled by multiple Oracle
programs running for each user process. They asynchronously perform I/O and
monitor other Oracle processes to provide increased parallelism for better
performance and reliability. There are numerous background processes, and each
Oracle instance can use several background processes.

Process Architecture
A process is a “thread of control” or a mechanism in an operating system that can run
a series of steps. Some operating systems use the terms job or task. A process
generally has its own private memory area in which it runs. An Oracle database server
has two general types of processes: user processes and Oracle processes.

User (Client) Processes


User processes are created and maintained to run the software code of an application
program or an Oracle tool (such as Enterprise Manager). User processes also manage
communication with the server process through the program interface, which is
described in a later section.

Oracle Processes
Oracle processes are invoked by other processes to perform functions on behalf of the
invoking process. Oracle creates server processes to handle requests from connected
user processes. A server process communicates with the user process and interacts
with Oracle to carry out requests from the associated user process. For example, if a
user queries some data not already in the database buffers of the SGA, then the
associated server process reads the proper data blocks from the datafiles into the SGA.

Oracle can be configured to vary the number of user processes for each server process.
In a dedicated server configuration, a server process handles requests for a single user
process. A shared server configuration lets many user processes share a small number
of server processes, minimising the number of server processes and maximising the
use of available system resources.
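The difference between the dedicated and shared configurations is essentially how user requests map onto server processes. The following toy model (illustrative Python, not Oracle code; all names are invented) sketches a shared-server arrangement as a small pool of workers serving a longer queue of user requests:

```python
import queue
import threading

def shared_server_demo(num_users=8, num_servers=2):
    """Toy model of a shared server configuration: many user requests
    are served by a small pool of server processes (threads here)."""
    requests = queue.Queue()          # the shared request queue
    results = []
    lock = threading.Lock()

    def server_worker():
        while True:
            user_id = requests.get()
            if user_id is None:       # shutdown signal for the worker
                break
            with lock:
                results.append(user_id)   # "execute" the user's request
            requests.task_done()

    servers = [threading.Thread(target=server_worker) for _ in range(num_servers)]
    for s in servers:
        s.start()
    for user_id in range(num_users):  # many users, only a few servers
        requests.put(user_id)
    requests.join()                   # wait until every request is served
    for _ in servers:
        requests.put(None)
    for s in servers:
        s.join()
    return sorted(results)
```

Every user request is eventually served even though there are far fewer "server processes" than users, which is the point of the shared configuration.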

On some systems, the user and server processes are separate, while on others they are
combined into a single process. If a system uses the shared server or if the user and
server processes run on different machines, then the user and server processes must be
separate. Client/server systems separate the user and server processes and run them on
different machines.

Oracle

Let us discuss a few important processes briefly next.

Database Writer (DBWR)


The Database Writer process writes database blocks from the database buffer cache in
the SGA to the datafiles on disk. An Oracle instance can have up to 10 DBWR
processes, from DBW0 to DBW9, to handle the I/O load to multiple datafiles. Most
instances run one DBWR. DBWR writes blocks out of the cache for two main
reasons:
• If Oracle needs to perform a checkpoint (i.e., to update the blocks of the
datafiles so that they “catch up” to the redo logs). Oracle writes the redo for a
transaction when it’s committed, and later writes the actual blocks. Periodically,
Oracle performs a checkpoint to bring the datafile contents in line with the redo
that was written out for the committed transactions.

• If Oracle needs to read blocks requested by users into the cache and there is no
free space in the buffer cache. The blocks written out are the least recently used
blocks. Writing blocks in this order minimises the performance impact of losing
them from the buffer cache.
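The least-recently-used write-out described in the second case can be modelled with a tiny buffer cache. The sketch below is a simplification for illustration only (it is not Oracle's actual algorithm; the class and its behaviour are invented):

```python
from collections import OrderedDict

class BufferCache:
    """Toy buffer cache: when the cache is full, the least recently used
    block is evicted, and if it is dirty (modified) it is first written
    to "disk" -- the DBWR-like behaviour described above."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                  # a dict standing in for datafiles
        self.buffers = OrderedDict()      # block_id -> (data, dirty flag)

    def access(self, block_id, data=None):
        """Read a block (data=None) or write new data into it."""
        if block_id in self.buffers:
            old_data, dirty = self.buffers.pop(block_id)   # becomes MRU
        else:
            if len(self.buffers) >= self.capacity:
                victim, (vdata, vdirty) = self.buffers.popitem(last=False)
                if vdirty:
                    self.disk[victim] = vdata   # flush the LRU dirty block
            old_data, dirty = self.disk.get(block_id), False
        if data is not None:
            self.buffers[block_id] = (data, True)    # modified: now dirty
        else:
            self.buffers[block_id] = (old_data, dirty)
```

Accessing a third block in a two-block cache forces the oldest modified block out to "disk" before its buffer is reused.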

Log Writer (LGWR)


The Log Writer process writes the redo information from the log buffer in the SGA to
all copies of the current redo log file on disk. As transactions proceed, the associated
redo information is stored in the redo log buffer in the SGA. When a transaction is
committed, Oracle makes the redo information permanent by invoking the Log Writer
to write it to disk.
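The commit-time flush performed by LGWR can be sketched as follows (a toy Python model; the class and method names are invented, and details such as group commit are omitted):

```python
class RedoLog:
    """Toy model of LGWR behaviour: redo entries accumulate in a memory
    buffer and are made permanent (written to the log "file") only when
    a transaction commits."""

    def __init__(self):
        self.log_buffer = []     # redo entries waiting in the SGA
        self.log_file = []       # entries made permanent on disk

    def record_change(self, txn_id, change):
        self.log_buffer.append((txn_id, change))   # not yet durable

    def commit(self, txn_id):
        # On commit, force the buffered redo to disk, then mark the commit.
        self.log_file.extend(self.log_buffer)
        self.log_buffer.clear()
        self.log_file.append((txn_id, "COMMIT"))
```

Until `commit` is called, nothing has reached the log file; after it, the change and its commit marker are durable and the buffer is empty.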

System Monitor (SMON)


The System Monitor process maintains the overall health and safety of an Oracle
instance. SMON performs crash recovery when the instance is started after a failure
and coordinates and performs recovery for a failed instance when there is more than
one instance accessing the same database, as with Oracle Parallel Server/Real
Application Clusters. SMON also cleans up adjacent pieces of free space in the
datafiles by merging them into one piece and gets rid of space used for sorting rows
when that space is no longer needed.

Process Monitor (PMON)

The Process Monitor process watches over the user processes that access the database.
If a user process terminates abnormally, PMON is responsible for cleaning up any of
the resources left behind (such as memory) and for releasing any locks held by the
failed process.

Archiver (ARC)
The Archiver process reads the redo log files once Oracle has filled them and writes a
copy of the used redo log files to the specified archive log destination(s).

An Oracle instance can have up to 10 Archiver processes, numbered as described for
DBWR above. LGWR will start additional Archivers as needed, based on the load, up
to the limit specified by the initialisation parameter
LOG_ARCHIVE_MAX_PROCESSES.
Checkpoint (CKPT)

The Checkpoint process works with DBWR to perform checkpoints. CKPT updates
the control file and database file headers with the checkpoint data when the
checkpoint is complete.

Recover (RECO)
The Recover process automatically cleans up failed or suspended distributed
transactions.

4.4.7 How Oracle Works?


The following example describes the most basic level of operations that Oracle
performs. This illustrates an Oracle configuration where the user and associated server
process are on separate machines (connected through a network).

1) An instance has started on the computer running Oracle (often called the host or
database server).

2) A computer running an application (a local machine or client workstation) runs


the application in a user process. The client application attempts to establish a
connection to the server using the proper Oracle Net Services driver.

3) The server is running the proper Oracle Net Services driver. The server detects
the connection request from the application and creates a dedicated server
process on behalf of the user process.

4) The user runs a SQL statement and commits the transaction. For example, the
user changes a name in a row of a table.

5) The server process receives the statement and checks the shared pool for any
shared SQL area that contains a similar SQL statement. If a shared SQL area is
found, then the server process checks the user’s access privileges to the
requested data, and the previously existing shared SQL area is used to process
the statement. If not, then a new shared SQL area is allocated for the statement,
so it can be parsed and processed.

6) The server process retrieves any necessary data values from the actual datafile
(table) or those stored in the SGA.

7) The server process modifies data in the system global area. The DBWn process
writes modified blocks permanently to the disk when doing so is efficient.
Since the transaction is committed, the LGWR process immediately records the
transaction in the redo log file.

8) If the transaction is successful, then the server process sends a message across
the network to the application. If it is not successful, then an error message is
transmitted.

9) Throughout this entire procedure, the other background processes run, watching
for conditions that require intervention. In addition, the database server manages
other users’ transactions and prevents contention between transactions that
request the same data.

Check Your Progress 2

1) What are the different files used in ORACLE?


……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) ………………………Storage Structure enables Oracle to have fine-grained
control of disk space use.

3) ……………………..are the basic unit of data storage in an ORACLE database.

4) What are the advantages of B+ tree structure?


……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
5) What does a Data Dictionary store?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………

6) ………………………is a handle for a private SQL area in which a parsed


statement and other information for processing the statement are kept.

7) ……………………………is a memory buffer that contains data and control


information for a server process.

8) What is a process in Oracle?


……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
9) Expand the following:
a) LGWR
b) ARC
c) RECO.

4.5 QUERY PROCESSING AND OPTIMISATION


This section describes the various stages in the execution of a SQL statement. The
following stages are necessary for each type of statement processing:


Figure 3: Stages in Query Processing (Source: www.oracle.com)

Stage 1: Create a Cursor


A program interface call creates a cursor. This cursor is not the Oracle PL/SQL cursor
but the reference to memory space that is used to keep and manipulate the data in the
primary memory. The cursor is created independent of any SQL statement: it is
created in expectation of any SQL statement. In most applications, the cursor creation
is automatic. However, in pre-compiler programs, cursor creation can either occur
implicitly or be explicitly declared.

Stage 2: Parse the Statement


During parsing, the SQL statement is passed from the user process to Oracle, and a
parsed representation of the SQL statement is loaded into a shared SQL area. Many
errors may be detected during this stage of statement processing.
Parsing is the process of:
• translating a SQL statement, verifying it to be a valid statement,
• performing data dictionary lookups to check table and column definitions,
• acquiring parse locks on required objects so that their definitions do not change
during the statement’s parsing,
• checking privileges to access referenced schema objects,
• determining the optimal execution plan for the statement,
• loading it into a shared SQL area, and
• routing all or part of distributed statements to remote nodes that contain
referenced data.

Oracle parses a SQL statement only if a shared SQL area for a similar SQL statement
does not exist in the shared pool. In this case, a new-shared SQL area is allocated, and
the statement is parsed.

The parse stage includes processing requirements that need to be done only once, no
matter, how many times the statement is executed. Oracle translates each SQL
statement only once, re-executing that parsed statement during subsequent references
to the statement.
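This parse-once behaviour amounts to caching parsed statements by their text. A minimal sketch (hypothetical Python, not Oracle internals; class and attribute names are invented):

```python
class SharedPool:
    """Toy shared pool: a statement is hard-parsed only when no shared
    SQL area for the identical statement text already exists."""

    def __init__(self):
        self.shared_sql_areas = {}    # statement text -> parsed form
        self.hard_parses = 0

    def parse(self, sql_text):
        if sql_text not in self.shared_sql_areas:
            self.hard_parses += 1                      # parse once...
            self.shared_sql_areas[sql_text] = ("PARSED", sql_text)
        return self.shared_sql_areas[sql_text]         # ...reuse afterwards
```

Submitting the identical statement text twice causes only one hard parse; a different statement causes a new one.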

Although parsing a SQL statement validates that statement, parsing only identifies
errors that can be found before the execution of the statement. Thus, some errors
cannot be caught by parsing. For example, errors in data conversion or errors in data
(such as an attempt to enter duplicate values in a primary key) and deadlocks are all
errors or situations that can be encountered and reported only during the execution
stage.

Note: Queries are different from other types of SQL statements because, if successful,
they return data as results, whereas other statements simply return success or failure.
A query can return one row or thousands of rows. The results of a query are always in
tabular format, and the rows of the result are fetched (retrieved) either a row at a time
or in groups.

Several issues are related only to query processing. Queries include not only explicit
SELECT statements but also the implicit queries (subqueries) in other SQL
statements. For example, each of the following statements, require a query as a part of
its execution:
INSERT INTO table SELECT...

UPDATE table SET x = y WHERE...

DELETE FROM table WHERE...

CREATE table AS SELECT...


In particular, queries:

• require read consistency,


• can use temporary segments for intermediate processing, and
• can require the describe, define, and fetch stages of SQL statement processing.

Stage 3: Describe Results of a Query


The describe stage is necessary only if the characteristics of a query’s result are not
known; for example, when a query is entered interactively by a user. In this case, the
describe stage determines the characteristics (datatypes, lengths, and names) of a
query’s result.
Stage 4: Define Output of a Query

In the define stage of a query, you specify the location, size, and datatype of variables
defined to receive each fetched value. Oracle performs datatype conversion if
necessary.
Stage 5: Bind any Variables
At this point, Oracle knows the meaning of the SQL statement but still does not have
enough information to execute the statement. Oracle needs values for any variables
listed in the statement; for example, Oracle needs a value for DEPT_NUMBER. The
process of obtaining these values is called binding variables.

A program must specify the location (memory address) of the value. End users of
applications may be unaware that they are specifying bind variables, because the
Oracle utility can simply prompt them for a new value.
Because you specify the location (binding by reference), you need not rebind the
variable before re-execution. You can change its value and Oracle will look up the
value on each execution, using the memory address.

You must also specify a datatype and length for each value (unless they are implied or
defaulted) in order for Oracle to perform datatype conversions.

Stage 6: Parallelise the Statement


Oracle can parallelise queries (SELECTs, INSERTs, UPDATEs, MERGEs,
DELETEs), and some DDL operations such as index creation, creating a table with a
subquery, and operations on partitions. Parallelisation causes multiple server
processes to perform the work of the SQL statement so it can be completed faster.

Stage 7: Execute the Statement


At this point, Oracle has all the necessary information and resources, so the statement
is executed. If the statement is a query or an INSERT statement, no rows need to be
locked because no data is being changed. If the statement is an UPDATE or DELETE
statement, however, all rows that the statement affects are locked from use by other
users of the database until the next COMMIT, ROLLBACK, or SAVEPOINT for the
transaction. This ensures data integrity.

For some statements you can specify a number of executions to be performed. This is
called array processing. Given n number of executions, the bind and define locations
are assumed to be the beginning of an array of size n.
Stage 8: Fetch Rows of a Query
In the fetch stage, rows are selected and ordered (if requested by the query), and each
successive fetch retrieves another row of the result until the last row has been fetched.
Stage 9: Close the Cursor
The final stage of processing a SQL statement is closing the cursor.
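The nine stages above are specific to Oracle, but the overall lifecycle (create a cursor, parse, bind, execute, fetch, close) is visible in any call-level interface. The following sketch uses Python's built-in sqlite3 module (SQLite, not Oracle; the mapping of stages is approximate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()                    # Stage 1: create a cursor

cur.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER)")
cur.execute("INSERT INTO emp VALUES (7369, 'SMITH', 20)")
conn.commit()

# Stages 2 and 5: the statement text is parsed, and the value 20 is bound
# to the placeholder rather than being spliced into the SQL string.
cur.execute("SELECT ename FROM emp WHERE deptno = ?", (20,))

rows = cur.fetchall()                  # Stage 8: fetch rows of the query
cur.close()                            # Stage 9: close the cursor
print(rows)                            # [('SMITH',)]
```

Binding by placeholder, as in Stage 5 above, lets the same parsed statement be re-executed with new values without re-parsing the text.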

Query Optimisation
An important facet of database system performance tuning is the tuning of SQL
statements. SQL tuning involves three basic steps:

• Identifying high load or top SQL statements that are responsible for a large
share of the application workload and system resources, by reviewing past SQL
execution history available in the system.
• Verifying that the execution plans produced by the query optimiser for these
statements perform reasonably.

• Implementing corrective actions to generate better execution plans for poorly
performing SQL statements.

These three steps are repeated until the system performance reaches a satisfactory
level or no more statements can be tuned.

A SQL statement can be executed in many different ways, such as full table scans,
index scans, nested loops, and hash joins. The Oracle query optimiser determines the
most efficient way to execute a SQL statement after considering many factors related
to the objects referenced and the conditions specified in the query. This determination
is an important step in the processing of any SQL statement and can greatly affect
execution time.

The query optimiser determines the most efficient execution plan by considering available
access paths and by factoring in information based on statistics for the schema objects
(tables or indexes) accessed by the SQL statement. The query optimiser also considers
hints, which are optimisation suggestions placed in a comment in the statement.

The query optimiser performs the following steps:


1) The optimiser generates a set of potential plans for the SQL statement based on
available access paths and hints.

2) The optimiser estimates the cost of each plan based on statistics in the data
dictionary for data distribution and storage characteristics of the tables, indexes,
and partitions accessed by the statement.
The cost is an estimated value proportional to the expected resource use needed
to execute the statement with a particular plan. The optimiser calculates the cost
of access paths and join orders based on the estimated computer resources,
which includes I/O, CPU, and memory.

Serial plans with higher costs take more time to execute than those with lesser
costs. When using a parallel plan, however, resource use is not directly related
to elapsed time.

3) The optimiser compares the costs of the plans and chooses the one with the
lowest cost.
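In essence, steps 1 to 3 generate candidate plans, cost each one, and keep the cheapest. A toy illustration in Python (the cost weights and the plan figures are invented for the example):

```python
def estimate_cost(plan):
    """Toy cost model: a weighted sum of estimated I/O, CPU and memory
    use, echoing the resources the optimiser factors in."""
    return 1.0 * plan["io"] + 0.3 * plan["cpu"] + 0.1 * plan["memory"]

def choose_plan(candidate_plans):
    # Steps 2-3: cost every candidate plan and keep the cheapest one.
    return min(candidate_plans, key=estimate_cost)

plans = [
    {"name": "full table scan",  "io": 500, "cpu": 100, "memory": 10},
    {"name": "index range scan", "io": 40,  "cpu": 60,  "memory": 5},
    {"name": "hash join",        "io": 120, "cpu": 300, "memory": 200},
]
best = choose_plan(plans)
print(best["name"])                    # index range scan
```

With these invented figures the index range scan has the lowest estimated cost, so it is the plan chosen.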

4.6 DISTRIBUTED ORACLE


One of the strongest features of the Oracle database is its ability to scale up to
handling extremely large volumes of data and users. Oracle scales not only by running
on more and more powerful platforms, but also by running in a distributed
configuration. Oracle databases on separate platforms can be combined to act as a
single logical distributed database. This section describes some of the basic ways that
Oracle handles database interactions in a distributed database system.

4.6.1 Distributed Queries and Transactions


Data within an organisation is often spread among multiple databases for reasons of
both capacity and organisational responsibility. Users may want to query this
distributed data or update it as if it existed within a single database. Oracle, first,
introduced distributed databases in response to the requirements for accessing data on
multiple platforms in the early 1980s. Distributed queries can retrieve data from
multiple databases. Distributed transactions can insert, update, or delete data on
distributed databases. Oracle’s two-phase commit mechanism guarantees that all the
database servers that are part of a transaction will either commit or roll back the
transaction. Background recovery processes can ensure database consistency in the
event of system interruption during distributed transactions. Once the failed system
comes back online, the same process will complete the distributed transactions.
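The all-or-nothing guarantee of two-phase commit can be sketched as a toy coordinator (illustrative Python, not Oracle's implementation; participant behaviour is reduced to a yes/no vote):

```python
def two_phase_commit(participants):
    """Toy two-phase commit coordinator. Each participant is a callable
    returning True if it is able to prepare (vote yes), else False."""
    # Phase 1 (prepare): every database server in the transaction votes.
    votes = [prepare() for prepare in participants]
    # Phase 2: commit only if all voted yes; otherwise roll back everywhere.
    return "COMMIT" if all(votes) else "ROLLBACK"

# One failed participant forces the whole distributed transaction back.
outcome = two_phase_commit([lambda: True, lambda: True, lambda: False])
print(outcome)                         # ROLLBACK
```

A single "no" vote in the prepare phase rolls the transaction back on every server, so no server commits while another rolls back.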

Distributed transactions can also be implemented using popular transaction
processing (TP) monitors that interact with Oracle via XA, an industry standard
(X/Open) interface. Oracle8i added native transaction coordination with the Microsoft
Transaction Server (MTS), so you can implement a distributed transaction initiated
under the control of MTS through an Oracle database.
4.6.2 Heterogeneous Services
Heterogeneous Services allow non-Oracle data and services to be accessed from an
Oracle database through generic connectivity via ODBC and OLE-DB included with
the database. Optional Transparent Gateways use agents specifically tailored for a
variety of target systems. Transparent Gateways allow users to submit Oracle SQL
statements to a non-Oracle distributed database source and have them automatically
translated into the SQL dialect of the non-Oracle source system, which remains
transparent to the user. In addition to providing underlying SQL services,
Heterogeneous Services provide transaction services utilising Oracle’s two-phase
commit with non-Oracle databases and procedural services that call for third-
generation language routines on non-Oracle systems. Users interact with the Oracle
database as if all objects are stored in the Oracle database, and Heterogeneous
Services handle the transparent interaction with the foreign database on the user’s
behalf.

4.7 DATA MOVEMENT IN ORACLE


Moving data from one Oracle database to another is often a requirement when using
distributed databases, or when a user wants to implement multiple copies of the same
database in multiple locations to reduce network traffic or increase data availability.
You can export data and data dictionaries (metadata) from one database and import
them into another. Oracle Database 10g introduces a new high speed data pump for
the import and export of data. Oracle also offers many other advanced features in this
category, including replication, transportable tablespaces, and Advanced Queuing.
The ensuing sections describe the technology used to move data from one Oracle
database to another automatically.

4.7.1 Basic Replication


You can use basic replication to move recently added and updated data from an
Oracle “master” database to databases on which duplicate sets of data reside. In basic
replication, only the single master is updated. You can manage replication through the
Oracle Enterprise Manager (OEM or EM). While replication has been a part of all
recent Oracle releases, replication based on logs is a more recent addition, appearing
for the first time in Oracle9i Release 2.

4.7.2 Advanced Replication


You can use advanced replication in multi-master systems in which any of the
databases involved can be updated and in which conflict-resolution features are
required to resolve inconsistencies in the data. Because, there is more than one master
database, the same data may be updated on multiple systems at the same time.
Conflict resolution is necessary to determine the “true” version of the data. Oracle’s
advanced replication includes a number of conflict-resolution scenarios and also
allows programmers to write their own conflict-resolution scenarios.

4.7.3 Transportable Tablespaces

Transportable tablespaces were introduced in Oracle8i. Instead of using the
export/import process, which dumps data and the structures that contain it into
intermediate files for loading, you simply put the tablespaces in read-only mode,
move or copy them from one database to another, and mount them. You must export
the data dictionary (metadata) for the tablespace from the source and import it at the
target. This feature can save a lot of time during maintenance, because it simplifies the
process. Oracle Database 10g allows you to move data with transportable tablespaces
between different platforms or operating systems.

4.7.4 Advanced Queuing and Streams


Advanced Queuing (AQ), first introduced in Oracle8, provides the means to
asynchronously send messages from one Oracle database to another. Because
messages are stored in a queue in the database and sent asynchronously when a
connection is made, the amount of overhead and network traffic is much lower than it
would be using traditional guaranteed delivery through the two-phase commit
protocol between source and target. By storing the messages in the database, AQ
provides a solution with greater recoverability than other queuing solutions that store
messages in file systems.

Oracle messaging adds the capability to develop and deploy a content-based publish
and subscribe solution using the rules engine to determine relevant subscribing
applications. As new content is published to a subscriber list, the rules on the list
determine which subscribers should receive the content. This approach means that a
single list can efficiently serve the needs of different subscriber communities.
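Content-based delivery of this kind amounts to evaluating each subscriber's rule against the published message. A minimal sketch (hypothetical Python; the rules and message fields shown are invented):

```python
def deliver(message, subscriber_rules):
    """Toy content-based publish/subscribe: each subscriber on the list
    registers a rule (a predicate over the message), and the rules
    decide who receives newly published content."""
    return [name for name, rule in subscriber_rules.items() if rule(message)]

rules = {
    "billing":  lambda msg: msg.get("amount", 0) > 0,
    "shipping": lambda msg: msg.get("type") == "order",
}
recipients = deliver({"type": "order", "amount": 99}, rules)
print(recipients)                      # ['billing', 'shipping']
```

A single subscriber list can thus serve different communities: a message that matches no rule is simply delivered to nobody.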

In the first release of Oracle9i, AQ added XML support and Oracle Internet Directory
(OID) integration. This technology is leveraged in Oracle Application Interconnect
(OAI), which includes adapters to non-Oracle applications, messaging products, and
databases.

The second release of Oracle9i introduced Streams. Streams have three major
components: log-based replication for data capture, queuing for data staging, and user-
defined rules for data consumption. Oracle Database 10g includes support for change
data capture and file transfer solutions via Streams.

4.7.5 Extraction, Transformation and Loading


Oracle Warehouse Builder is a tool for the design of target data stores including data
warehouses and a metadata repository, but it also provides a front-end to building
source-to-target maps and for generating extraction, transformation, and loading
(ETL) scripts. OWB leverages key embedded ETL features first made available in the
Oracle9i database.

4.8 DATABASE ADMINISTRATION TOOLS


Oracle includes many features that make the database easier to manage. We will
discuss two types of tools: Oracle Enterprise Manager, and add-on packs in this
section. The tools for backup and recovery, and database availability will be explained
in the next section.
Oracle Enterprise Manager
As part of every Database Server, Oracle provides the Oracle Enterprise Manager
(EM), a database management tool framework with a graphical interface to manage
database users, instances, and features (such as replication) that can provide additional

information about the Oracle environment. EM can also manage Oracle’s Application
Server, Collaboration Suite, and E-Business Suite.
Prior to the Oracle8i database, the EM software was installed on Windows-based
systems. Each repository was accessible only by a single database manager at a time.
EM evolved to a Java release providing access from a browser or Windows-based
system. Multiple database administrators could then access the EM repository at the
same time.

More recently, an EM HTML console was released with Oracle9iAS. This console has
important new application performance management and configuration management
features. The HTML version supplemented the earlier available Java-based Enterprise
Manager. Enterprise Manager 10g, released with Oracle Database 10g, also comes in
Java and HTML versions. EM can be deployed in several ways: as a central console
for monitoring multiple databases leveraging agents, as a “product console” (easily
installed with each individual database), or through remote access, also known as
“studio mode”. The HTML-based console includes advanced management capabilities
for rapid installation, deployment across grids of computers, provisioning, upgrades,
and automated patching.

Oracle Enterprise Manager 10g has several additional options (sometimes called
packs) for managing the Oracle Enterprise Edition database. These options, which are
available for the HTML-based console, the Java-based console, or both, include:
• Database Diagnostics Option
• Application Server Diagnostics Option
• Database Tuning Option
• Database Change Management Option
• Database Configuration Management Option
• Application Server Configuration Management Option.

Standard management pack functionality for managing the Standard Edition is now
also available for the HTML-based console.

4.9 BACKUP AND RECOVERY IN ORACLE


As every database administrator knows, backing up a database is a rather mundane but
necessary task. An improper backup makes recovery difficult, if not impossible.
Unfortunately, people often realise the extreme importance of this everyday task only
when it is too late – usually after losing critical business data due to the failure of a
related system.

Recovery Manager
Typical backups include complete database backups (the most common type),
tablespace backups, datafile backups, control file backups, and archived redo log
backups. Oracle8 introduced the Recovery Manager (RMAN) for server-managed
backup and recovery of the database. Previously, Oracle’s Enterprise Backup Utility
(EBU) provided a similar solution on some platforms. However, RMAN, with its
Recovery Catalogue we stored in an Oracle database, provides a much more complete
solution. RMAN can automatically locate, back up, restore, and recover datafiles,
control files, and archived redo logs. RMAN, since Oracle9i, can restart backups,
restore and implement recovery window policies when backups expire. The Oracle
Enterprise Manager Backup Manager provides a GUI-based interface to RMAN.
Oracle Enterprise Manager 10g introduces a new improved job scheduler that can be
used with RMAN and other scripts, and that can manage automatic backups to disk.

Incremental Backup and Recovery

RMAN can perform incremental backups of Enterprise Edition databases. Incremental
backups back up only the blocks modified since the last backup of a datafile,
tablespace, or database; thus, they are smaller and faster than complete backups.
RMAN can also perform point-in-time recovery, which allows the recovery of data
until just prior to an undesirable event (such as the mistaken dropping of a table).
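The idea of backing up only modified blocks, and of restoring a full backup plus incrementals, can be sketched as follows (a toy Python model of the concept, not RMAN's format; blocks are modelled as dictionary entries):

```python
def full_backup(datafile):
    """A complete backup copies every block."""
    return dict(datafile)

def incremental_backup(datafile, last_backup):
    """An incremental backup copies only blocks changed since the last one."""
    return {blk: data for blk, data in datafile.items()
            if last_backup.get(blk) != data}

def restore(full, incrementals):
    """Apply the full backup, then each incremental backup in order."""
    restored = dict(full)
    for inc in incrementals:
        restored.update(inc)
    return restored

datafile = {1: "a", 2: "b", 3: "c"}     # block number -> contents
base = full_backup(datafile)
datafile[2] = "b2"                      # only one block is modified
inc = incremental_backup(datafile, base)
print(inc)                              # {2: 'b2'}
```

The incremental copy holds one block instead of three, which is why incremental backups are smaller and faster, while the full backup plus the incrementals still reconstructs the current datafile.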

Oracle Storage Manager and Automated Disk Based Backup and Recovery
Various media-management software vendors support RMAN. Since Oracle8i, a
Storage Manager has been developed with Oracle to provide media-management
services, including the tracking of tape volumes, for up to four devices. RMAN
interfaces automatically with the media-management software to request the mounting
of tapes as needed for backup and recovery operations.

Oracle Database 10g introduces Automated Disk Based Backup and Recovery. The
disk acts as a cache, and archives and backups can then be copied to tape. The disk
“cache” can also serve as a staging area for recovery.

4.10 ORACLE LITE


Oracle Lite is Oracle’s suite of products for enabling mobile use of database-centric
applications. The key components of Oracle Lite include the Oracle Lite Database,
Mobile Development Kit, and Mobile Server (an extension of the Oracle Application
Server).

Although the Oracle Lite Database engine runs on a much smaller platform than other
Oracle implementations (it requires a 50K to 1 MB footprint depending on the
platform), Mobile SQL, C++, and Java-based applications can run against the
database. ODBC is also supported. Java support includes Java stored procedures
and JDBC. The database is self-tuning and self-administering. In addition to
Windows-based laptops, Oracle Lite also supports handheld devices running
Windows CE, Palm’s Computing Platform, and Symbian EPOC.

4.11 SCALABILITY AND PERFORMANCE FEATURES OF ORACLE
Oracle includes several software mechanisms to fulfil the following important
requirements of an information management system:
• Data concurrency of a multiuser system must be maximised.

• Data must be read and modified in a consistent fashion. The data a user is
viewing or changing is not changed (by other users) until the user is finished
with the data.

• High performance is required for maximum productivity from the many users
of the database system.

4.11.1 Concurrency
A primary concern of a multiuser database management system is controlling
concurrency, which is the simultaneous access of the same data by many users.
Without adequate concurrency controls, data could be updated or changed improperly,
compromising data integrity.

One way to manage data concurrency is to make each user wait for a turn. The goal of
a database management system is to reduce that wait so it is either nonexistent or
negligible to each user.
with as little interference as possible, and undesirable interactions among concurrent
transactions must be prevented. Neither performance nor data integrity can be
compromised.

Oracle resolves such issues by using various types of locks and a multiversion
consistency model. These features are based on the concept of a transaction. It is the
application designer’s responsibility to ensure that transactions fully exploit these
concurrency and consistency features.

4.11.2 Read Consistency


Read consistency, as supported by Oracle, does the following:

• Guarantees that the set of data seen by a statement is consistent with respect to a
single point in time and does not change during statement execution (statement-
level read consistency).
• Ensures that readers of database data do not wait for writers or other readers of
the same data.
• Ensures that writers of database data do not wait for readers of the same data.
• Ensures that writers only wait for other writers if they attempt to update
identical rows in concurrent transactions.

The simplest way to think of Oracle’s implementation of read consistency is to
imagine each user operating a private copy of the database, hence the multiversion
consistency model.

To manage the multiversion consistency model, Oracle must create a read-consistent
set of data when a table is queried (read) and simultaneously updated (written). When
an update occurs, the original data values changed by the update are recorded in the
database undo records. As long as this update remains part of an uncommitted
transaction, any user that later queries the modified data views the original data
values. Oracle uses current information in the system global area and information in
the undo records to construct a read-consistent view of a table’s data for a query.

Only when a transaction is committed are the changes of the transaction made
permanent. Statements that start after the user’s transaction is committed only see the
changes made by the committed transaction.

Transactions are the key to Oracle’s strategy for providing read consistency. This unit
of committed (or uncommitted) SQL statements dictates the start point for read-
consistent views generated on behalf of readers, and controls when modified data can
be seen by other transactions of the database for reading or updating.

By default, Oracle guarantees statement-level read consistency. The set of data
returned by a single query is consistent with respect to a single point in time.
However, in some situations, you might also require transaction-level read
consistency. This is the ability to run multiple queries within a single transaction, all
of which are read-consistent with respect to the same point in time, so that queries in
this transaction do not see the effects of intervening committed transactions. If you
want to run a number of queries against multiple tables and if you are not doing any
updating, you would prefer a read-only transaction.
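Such a read-only transaction can be requested explicitly. The sketch below is
illustrative only (the orders and payments tables are assumed, not part of this unit):

```sql
-- Transaction-level read consistency: every query in this transaction
-- sees the database as of the transaction's starting point.
SET TRANSACTION READ ONLY;

SELECT COUNT(*) FROM orders;        -- both reporting queries read
SELECT SUM(amount) FROM payments;   -- the same consistent snapshot

COMMIT;   -- ends the read-only transaction
```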

4.11.3 Locking Mechanisms

Oracle also uses locks to control concurrent access to data. When updating
information, the data server holds that information with a lock until the update is
submitted or committed. Until that happens, no one else can make changes to the
locked information. This ensures the data integrity of the system.

Oracle provides unique non-escalating row-level locking. Unlike other data servers
that “escalate” locks to cover entire groups of rows or even the entire table, Oracle
always locks only the row of information being updated. Because Oracle includes the
locking information with the actual rows themselves, Oracle can lock an unlimited
number of rows so users can work concurrently without unnecessary delays.

Automatic Locking
Oracle locking is performed automatically and requires no user action. Implicit
locking occurs for SQL statements as necessary, depending on the action requested.
Oracle’s lock manager automatically locks table data at the row level. By locking
table data at the row level, contention for the same data is minimised.

Oracle’s lock manager maintains several different types of row locks, depending on
the type of operation that established the lock. The two general types of locks are:
exclusive locks and share locks. Only one exclusive lock can be placed on a resource
(such as a row or a table); however, many share locks can be placed on a single
resource. Both exclusive and share locks always allow queries on the locked resource
but prohibit other activity on the resource (such as updates and deletes).

Manual Locking
Under some circumstances, a user might want to override default locking. Oracle
allows manual override of automatic locking features at both the row level (by first
querying for the rows that will be updated in a subsequent statement) and the table
level.
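Both overrides can be sketched as follows; the employees table and its columns are
illustrative, not part of this unit:

```sql
-- Row-level override: query and lock the rows a later UPDATE will touch.
SELECT emp_id, salary
  FROM employees
 WHERE dept_id = 10
   FOR UPDATE;          -- rows remain locked until COMMIT or ROLLBACK

-- Table-level override: explicitly lock the whole table.
LOCK TABLE employees IN EXCLUSIVE MODE;
```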

4.11.4 Real Application Clusters


Real Application Clusters (RAC) comprises several Oracle instances running on
multiple clustered machines, which communicate with each other by means of an
interconnect. RAC uses cluster software to access a shared database that resides on a
shared disk. RAC combines the processing power of these multiple interconnected
computers to provide system redundancy, near linear scalability, and high availability.
RAC also offers significant advantages for both OLTP and data warehouse systems
and all systems and applications can efficiently exploit clustered environments.

You can scale applications in RAC environments to meet increasing data processing
demands without changing the application code. As you add resources such as nodes
or storage, RAC extends the processing powers of these resources beyond the limits of
the individual components.

4.11.5 Portability
Oracle provides unique portability across all major platforms and ensures that your
applications run without modification after changing platforms. This is because the
Oracle code base is identical across platforms, so you have identical feature
functionality across all platforms, for complete application transparency. Because of
this portability, you can easily upgrade to a more powerful server as your
requirements change.

) Check Your Progress 3


1) What is Parsing?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) What are the three basic steps involved in SQL tuning?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) The Oracle …………………………………determines the most efficient way to
execute a SQL statement after considering many factors related to the object
references and the conditions specified in the Query.

4) What are heterogeneous services?


……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
5) Name the basic technology used to move data from one ORACLE database to
another.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
6) What is Oracle Enterprise manager?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
7) What are the different types of backup?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
8) ……………………….. is Oracle’s suite of products for enabling mobile use of
database-centric applications.

9) …………………………….. comprises several ORACLE instances running on
multiple clustered machines.

4.12 ORACLE DATA WAREHOUSING


A data warehouse is a relational database designed for query and analysis rather than
for transaction processing. It usually contains historical data derived from transaction
data, but it can include data from other sources. It separates analysis workload from
transaction workload and enables an organisation to consolidate data from several
sources.

In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications that
manage the process of gathering data and delivering it to business users.

4.12.1 Extraction, Transformation and Loading (ETL)


You must load your data warehouse regularly so that it can serve its purpose of
facilitating business analysis. To do this, data from one or more operational systems
must be extracted and copied into the warehouse. The process of extracting data from
source systems and bringing it into the data warehouse is commonly called ETL,
which stands for extraction, transformation, and loading.

4.12.2 Materialised Views


A materialised view provides indirect access to table data by storing the results of a
query in a separate schema object. Unlike an ordinary view, which does not take up
any storage space or contain any data, a materialised view contains the rows resulting
from a query against one or more base tables or views. A materialised view can be
stored in the same database as its base tables or in a different database.

Materialised views stored in the same database as their base tables can improve
query performance through query rewrites. Query rewrites are particularly useful in a
data warehouse environment.
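As a sketch, a materialised view over an assumed sales table could be defined as
follows; the ENABLE QUERY REWRITE clause is what allows the optimiser to answer
matching queries from the stored result:

```sql
-- Precompute an aggregate; REFRESH COMPLETE ON DEMAND recomputes it
-- only when explicitly requested (e.g., after a warehouse load).
CREATE MATERIALIZED VIEW sales_by_region
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
  SELECT region, SUM(amount) AS total_sales
    FROM sales
   GROUP BY region;
```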

4.12.3 Bitmap Indexes


Data warehousing environments typically have large amounts of data and ad hoc
queries, but a low level of concurrent database manipulation language (DML)
transactions. For such applications, bitmap indexing provides:
• reduced response time for large classes of ad hoc queries,
• reduced storage requirements compared to other indexing techniques,
• dramatic performance gains even on hardware with a relatively small number of
CPUs or a small amount of memory, and
• efficient maintenance during parallel DML and loads.

Fully indexing a large table with a traditional B-tree index can be prohibitively
expensive in terms of space because the indexes can be several times larger than the
data in the table. Bitmap indexes are typically only a fraction of the size of the
indexed data in the table.
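A sketch of bitmap indexing on low-cardinality columns of an assumed customers
table:

```sql
CREATE BITMAP INDEX idx_cust_gender ON customers (gender);
CREATE BITMAP INDEX idx_cust_region ON customers (region);

-- Ad hoc predicates over several such columns can be answered by
-- combining the bitmaps with fast bitwise AND/OR operations:
SELECT COUNT(*)
  FROM customers
 WHERE gender = 'F' AND region = 'EAST';
```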

4.12.4 Table Compression


To reduce disk use and memory use (specifically, the buffer cache), you can store
tables and partitioned tables in a compressed format inside the database. This often
leads to better scaleup for read-only operations. Table compression can also speed up
query execution. There is, however, a slight cost in CPU overhead.
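A minimal sketch (the exact compression clauses vary between Oracle versions, and
the sales_history table is illustrative):

```sql
-- Store a large, mostly-read history table in compressed form.
CREATE TABLE sales_history (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(10,2)
) COMPRESS;
```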

4.12.5 Parallel Execution


When Oracle runs SQL statements in parallel, multiple processes work together
simultaneously to run a single SQL statement. By dividing the work necessary to run
a statement among multiple processes, Oracle can run the statement more quickly than
if only a single process ran it. This is called parallel execution or parallel processing.

Parallel execution dramatically reduces response time for data-intensive operations on
large databases, because statement processing can be split up among many CPUs on a
single Oracle system.
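Parallelism can be requested per statement with a hint or declared on the table; a
sketch over an assumed sales table (a degree of 4 is arbitrary):

```sql
-- Per-statement: ask for four parallel server processes for this scan.
SELECT /*+ PARALLEL(sales, 4) */ SUM(amount)
  FROM sales;

-- Per-table: declare a default degree of parallelism.
ALTER TABLE sales PARALLEL 4;
```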
4.12.6 Analytic SQL
Oracle has many SQL operations for performing analytic operations in the database.
These include ranking, moving averages, cumulative sums, ratio-to-reports, and
period-over-period comparisons.
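Two of these operations, sketched over assumed employees and monthly_sales tables:

```sql
-- Ranking: position of each employee's salary within its department.
SELECT dept_id, emp_name, salary,
       RANK() OVER (PARTITION BY dept_id
                    ORDER BY salary DESC) AS sal_rank
  FROM employees;

-- Moving average: mean of the current and two preceding months.
SELECT month, amount,
       AVG(amount) OVER (ORDER BY month
                         ROWS BETWEEN 2 PRECEDING
                              AND CURRENT ROW) AS moving_avg
  FROM monthly_sales;
```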

4.12.7 OLAP Capabilities


Application developers can use SQL online analytical processing (OLAP) functions
for standard and ad-hoc reporting. For additional analytic functionality, Oracle OLAP
provides multidimensional calculations, forecasting, modeling, and what-if scenarios.
This enables developers to build sophisticated analytic and planning applications such
as sales and marketing analysis, enterprise budgeting and financial analysis, and
demand planning systems. The data can also be stored in either relational tables or
multidimensional objects.

Oracle OLAP provides the query performance and calculation capability, previously
found only in multidimensional databases to Oracle’s relational platform. In addition,
it provides a Java OLAP API that is appropriate for the development of internet-ready
analytical applications. Unlike other combinations of OLAP and RDBMS technology,
Oracle OLAP is not a multidimensional database using bridges to move data from the
relational data store to a multidimensional data store. Instead, it is truly an OLAP-
enabled relational database. As a result, Oracle provides the benefits of a
multidimensional database along with the scalability, accessibility, security,
manageability, and high availability of the Oracle database. The Java OLAP API,
which is specifically designed for internet-based analytical applications, offers
productive data access.

4.12.8 Data Mining


With Oracle Data Mining, data never leaves the database — the data, data preparation,
model building, and model scoring results all remain in the database. This enables
Oracle to provide an infrastructure for application developers to integrate data mining
seamlessly with database applications. Some typical examples of applications in
which data mining is used are call centers, ATMs, ERM, and business planning
applications. Data mining functions such as model building, testing, and scoring are
provided through a Java API.

4.12.9 Partitioning
Partitioning addresses key issues in supporting very large tables and indexes by letting
you decompose them into smaller and more manageable pieces called partitions. SQL
queries and DML statements do not need to be modified in order to access partitioned
tables. However, after partitions are defined, DDL statements can access and
manipulate individual partitions rather than entire tables or indexes. This is how
partitioning can simplify the manageability of large database objects. Also,
partitioning is entirely transparent to applications.

Partitioning is useful for many different types of applications, particularly applications
that manage large volumes of data. OLTP systems often benefit from improvements in
manageability and availability, while data warehousing systems benefit from
performance and manageability.
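A sketch of range partitioning for an assumed sales fact table, followed by a
partition-level DDL operation of the kind described above:

```sql
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(10,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2004 VALUES LESS THAN
    (TO_DATE('01-01-2005', 'DD-MM-YYYY')),
  PARTITION p2005 VALUES LESS THAN
    (TO_DATE('01-01-2006', 'DD-MM-YYYY'))
);

-- DDL can address one partition rather than the whole table:
ALTER TABLE sales TRUNCATE PARTITION p2004;
```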

4.13 SECURITY FEATURES OF ORACLE


Oracle includes security features that control access to and use of a database. For
example, security mechanisms in Oracle:
• prevent unauthorised database access,
• prevent unauthorised access to schema objects, and
• audit user actions.

Associated with each database user is a schema by the same name. By default, each
database user creates and has access to all objects in the corresponding schema.

Database security can be classified into two categories: system security and data
security.

System security includes mechanisms that control the access and use of the database
at the system level. For example, system security includes:
• valid user name/password combinations,
• the amount of disk space available to a user’s schema objects, and
• the resource limits for a user.

System security mechanisms check whether a user is authorised to connect to the
database, whether database auditing is active, and the system operations that a user
has been permitted to perform.

Data security includes mechanisms that control the access and use of the database at
the schema object level. For example, data security includes:
• users with access to a specific schema object and the specific types of actions
permitted to each user on the schema object (for example, user MOHAN can
issue SELECT and INSERT statements but not DELETE statements using the
employees table),
• the actions, if any, that are audited for each schema object, and
• data encryption to prevent unauthorised users from bypassing Oracle and
accessing the data.

Security Mechanisms
The Oracle database provides discretionary access control, which is a means of
restricting access to information based on privileges. The appropriate privilege must
be assigned to a user in order for that user to access a schema object. Appropriately
privileged users can grant other users privileges at their discretion.

Oracle manages database security using several different facilities:


• authentication to validate the identity of the entities using your networks,
databases, and applications,
• authorisation processes to limit access and actions, limits that are linked to
user’s identities and roles,
• access restrictions on objects, like tables or rows,
• security policies, and
• database auditing.
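Several of these facilities appear directly in SQL. A sketch using the user MOHAN
from the earlier example (the password, quota, and table are illustrative):

```sql
-- Authentication: a user with a password and a disk-space limit.
CREATE USER mohan IDENTIFIED BY a_password
  QUOTA 10M ON users;

-- Authorisation: exactly the privileges the earlier example permits.
GRANT CREATE SESSION TO mohan;
GRANT SELECT, INSERT ON employees TO mohan;   -- but no DELETE

-- Auditing: record SELECT statements against a sensitive table.
AUDIT SELECT ON employees;
```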

4.14 DATA INTEGRITY AND TRIGGERS IN ORACLE
Data must adhere to certain business rules, as determined by the database
administrator or application developer. For example, assume that a business rule says
that no row in the inventory table can contain a numeric value greater than nine in the
sale_discount column. If an INSERT or UPDATE statement attempts to violate this
integrity rule, then Oracle must undo the invalid statement and return an error to the

application. Oracle provides integrity constraints and database triggers to manage data
integrity rules.
Database triggers let you define and enforce integrity rules, but a database trigger is
not the same as an integrity constraint. Among other things, a database trigger does
not check data already loaded into a table. Therefore, it is strongly recommended that
you use database triggers only when the integrity rule cannot be enforced by integrity
constraints.
Integrity Constraints
An integrity constraint is a declarative way to define a business rule for a column of a
table. An integrity constraint is a statement about table data that is always true and
that follows these rules:
• If an integrity constraint is created for a table and some existing table data does
not satisfy the constraint, then the constraint cannot be enforced.

• After a constraint is defined, if any of the results of a DML statement violate
the integrity constraint, then the statement is rolled back, and an error is
returned.
Integrity constraints are defined with a table and are stored as part of the table’s
definition in the data dictionary, so that all database applications adhere to the same
set of rules. When a rule changes, it only needs be changed once at the database level
and not for each application.

The following integrity constraints are supported by Oracle:


• NOT NULL: Disallows nulls (empty entries) in a table’s column.
• UNIQUE KEY: Disallows duplicate values in a column or set of columns.
• PRIMARY KEY: Disallows duplicate values and nulls in a column or set of
columns.
• FOREIGN KEY: Requires each value in a column or set of columns to match a
value in a related table’s UNIQUE or PRIMARY KEY. FOREIGN KEY
integrity constraints also define referential integrity actions that dictate what
Oracle should do with dependent data if the referenced data is altered.
• CHECK: Disallows values that do not satisfy the logical expression of the
constraint.
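The sale_discount rule from the beginning of this section, together with the other
constraint types, can be declared in one table definition (the column names and the
suppliers table are illustrative):

```sql
CREATE TABLE inventory (
  item_id       NUMBER       PRIMARY KEY,   -- no duplicates, no nulls
  item_code     VARCHAR2(10) UNIQUE,        -- no duplicate values
  item_name     VARCHAR2(50) NOT NULL,      -- no empty entries
  supplier_id   NUMBER       REFERENCES suppliers (supplier_id),
  sale_discount NUMBER(2)    CHECK (sale_discount <= 9)  -- business rule
);
```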

Keys
Key is used to define several types of integrity constraints. A key is the column or set
of columns included in the definition of certain types of integrity constraints. Keys
describe the relationships between the different tables and columns of a relational
database. Individual values in a key are called key values.
The different types of keys include:
• Primary key: The column or set of columns included in the definition of a
table’s PRIMARY KEY constraint. A primary key’s value uniquely identifies
the rows in a table. Only one primary key can be defined for each table.

• Unique key: The column or set of columns included in the definition of a
UNIQUE constraint.

• Foreign key: The column or set of columns included in the definition of a
referential integrity constraint.

• Referenced key: The unique key or primary key of the same or a different table
referenced by a foreign key.

Triggers

Triggers are procedures written in PL/SQL, Java, or C that run (fire) implicitly,
whenever a table or view is modified or when some user actions or database system
action occurs.

Triggers supplement the standard capabilities of Oracle to provide a highly
customised database management system. For example, a trigger can restrict DML
operations against a table to those issued during regular business hours.
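The business-hours restriction mentioned above might be sketched in PL/SQL as
follows (the hours, table, and error message are all illustrative):

```sql
CREATE OR REPLACE TRIGGER restrict_to_business_hours
  BEFORE INSERT OR UPDATE OR DELETE ON inventory
BEGIN
  -- Reject DML issued outside 09:00-17:59.
  IF TO_NUMBER(TO_CHAR(SYSDATE, 'HH24')) NOT BETWEEN 9 AND 17 THEN
    RAISE_APPLICATION_ERROR(-20001,
      'Changes are allowed only during business hours');
  END IF;
END;
/
```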

4.15 TRANSACTIONS IN ORACLE


A transaction is a logical unit of work that comprises one or more SQL statements run
by a single user. According to the ANSI/ISO SQL standard, with which Oracle is
compatible, a transaction begins with the user’s first executable SQL statement. A
transaction ends when it is explicitly committed or rolled back by that user.

Transactions let users guarantee consistent changes to data, as long as the SQL
statements within a transaction are grouped logically. A transaction should consist of
all necessary parts for one logical unit of work – no more and no less. Data in all
referenced tables are in a consistent state before the transaction begins and after it
ends. Transactions should consist of only SQL statements that make one consistent
change to the data.

Consider a banking database. When a bank customer transfers money from a savings
account to a checking account, the transaction can consist of three separate operations:
decrease in the savings account, increase in the checking account, and recording the
transaction in the transaction journal.

The transfer of funds (the transaction) includes increasing one account (one SQL
statement), decreasing another account (one SQL statement), and recording the
transaction in the journal (one SQL statement). All actions should either fail or
succeed together; the credit should not be committed without the debit. Other non-
related actions, such as a new deposit to one account, should not be included in the
transfer of funds transaction. Such statements should be in other transactions.

Oracle must guarantee that all three SQL statements are performed to maintain
accounts accurately. When something prevents one of the statements in the transaction
from running (such as a hardware failure), then the other statements of the transaction
must be undone. This is called rolling back. If an error occurs in making any of the
updates, then no updates are made.
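The transfer can be sketched as one transaction (the account numbers, amount, and
table names are illustrative):

```sql
-- All three statements succeed or fail together.
UPDATE savings  SET balance = balance - 500 WHERE acct_no = 1001;
UPDATE checking SET balance = balance + 500 WHERE acct_no = 2001;
INSERT INTO journal (from_acct, to_acct, amount, tx_date)
VALUES (1001, 2001, 500, SYSDATE);

COMMIT;    -- makes all three changes permanent
-- On any failure the application issues ROLLBACK instead,
-- undoing whatever part of the transfer had already run.
```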

Commit and Undo Transactions


The changes made by the SQL statements that constitute a transaction can be either
committed or rolled back. After a transaction is committed or rolled back, the next
transaction begins with the next SQL statement.

Committing a transaction makes permanent the changes resulting from all SQL
statements in the transaction. The changes made by the SQL statements of a
transaction become visible only to other user sessions’ transactions that start after the
transaction is committed.

Rolling back a transaction retracts any of the changes resulting from the SQL
statements in the transaction. After a transaction is rolled back, the affected data is left
unchanged, as if the SQL statements in the transaction were never run.

Savepoints
Savepoints divide a long transaction with many SQL statements into smaller parts.
With savepoints, you can arbitrarily mark your work at any point within a long
transaction. This gives you the option of later rolling back all work performed from
the current point in the transaction to a declared savepoint within the transaction.
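Continuing the banking example above, a savepoint lets part of the transfer be undone
without abandoning the whole transaction (names and values are illustrative):

```sql
UPDATE savings SET balance = balance - 500 WHERE acct_no = 1001;
SAVEPOINT after_debit;               -- mark progress within the transaction

UPDATE checking SET balance = balance + 500 WHERE acct_no = 2001;
-- Suppose the credit was applied to the wrong account: undo to the mark.
ROLLBACK TO SAVEPOINT after_debit;   -- the debit above is still pending

UPDATE checking SET balance = balance + 500 WHERE acct_no = 2002;
COMMIT;                              -- commits the debit and corrected credit
```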

4.16 SQL VARIATIONS AND EXTENSIONS IN ORACLE
Oracle complies with industry-accepted standards and participates actively in SQL
standards committees, namely the American National Standards Institute (ANSI) and
the International Organisation for Standardisation (ISO), affiliated to the International
Electrotechnical Commission (IEC). Both ANSI and ISO/IEC have accepted SQL as
the standard language for relational databases. Oracle 10g conforms to Core
SQL:2003 for most features, except the following partially supported and unsupported
features.

Oracle partially supports these sub-features:

• CHARACTER VARYING data type (Oracle does not distinguish a zero-length


VARCHAR string from NULL).
• Character literals (Oracle regards the zero-length literal '' as being null).
• Correlation names in FROM clause (Oracle supports correlation names, but not
the optional AS keyword).
• WITH HOLD cursors (in the standard, a cursor is not held through a
ROLLBACK, but Oracle does hold through ROLLBACK).
• ALTER TABLE statement: ADD COLUMN clause (Oracle does not support
the optional keyword COLUMN in this syntax).
• User-defined functions with no overloading.
• User-defined procedures with no overloading.
• In the standard, the mode of a parameter (IN, OUT or INOUT) comes before
the parameter name, whereas in Oracle it comes after the parameter name.
• The standard uses INOUT, whereas Oracle uses IN OUT.
• Oracle requires either IS or AS after the return type and before the definition of
the routine body, while the standard lacks these keywords.

Oracle does not support the following sub-features:

• Rename columns in the FROM clause.


• DROP TABLE statement: RESTRICT clause.
• DROP VIEW statement: RESTRICT clause.
• REVOKE statement: RESTRICT clause.
• Features and conformance views.
• Distinct data types.
Oracle supports the following subfeatures in PL/SQL but not in Oracle SQL:

• RETURN statement.
Oracle’s triggers differ from the standard as follows:

• Oracle does not provide the optional syntax FOR EACH STATEMENT for the
default case, the statement trigger.
• Oracle does not support OLD TABLE and NEW TABLE; the transition tables
specified in the standard (the multiset of before and after images of affected
rows) are not available.
• The trigger body is written in PL/SQL, which is functionally equivalent to the
standard’s procedural language PSM, but not the same.

• In the trigger body, the new and old transition variables are referenced
beginning with a colon.
• Oracle’s row triggers are executed as the row is processed, instead of buffering
them and executing all of them after processing all rows. The standard’s
semantics are deterministic, but Oracle’s in-flight row triggers are more
performant.
• Oracle’s before-row and before-statement triggers may perform DML
statements, which is forbidden in the standard. On the other hand, Oracle’s
after-row triggers may not perform DML, while it is permitted in the
standard.
• When multiple triggers apply, the standard says they are executed in order of
definition; in Oracle the execution order is non-deterministic.
In addition to traditional structured data, Oracle is capable of storing, retrieving, and
processing more complex data.

• Object types, collection types, and REF types provide support for complex
structured data. A number of standard-compliant multiset operators are now
supported by the nested table collection type.
• Large objects (LOBs) provide support for both character and binary
unstructured data. A single LOB can reach a size of 8 to 128 terabytes,
depending on the size of the database block.
• The XML datatype provides support for semistructured XML data.
) Check Your Progress 4
1) What is ETL?
……………………………………………………………………………………
……………………………………………………………………………………
2) What is Parallel Execution?
……………………………………………………………………………………
……………………………………………………………………………………
3) With Oracle …………………………………….data never leaves the database.

4) What does system security include?


……………………………………………………………………………………
……………………………………………………………………………………
5) How does Oracle manage database security?
……………………………………………………………………………………
……………………………………………………………………………………
6) …………………………….is a declarative way to define a business rule for a
column of a table.

7) What are the different integrity constraints supported by oracle?


……………………………………………………………………………………
……………………………………………………………………………………
8) Name the different types of keys.
……………………………………………………………………………………
……………………………………………………………………………………
9) What is a trigger?
……………………………………………………………………………………
……………………………………………………………………………………
10) …………………………….divides a long transaction with many SQL
statements into smaller parts.

4.17 SUMMARY
This unit provided an introduction to Oracle, one of the commercial DBMSs in the
market. Some other such products include MS SQL Server, DB2 by IBM, and so on.
Oracle, being a commercial DBMS, supports the features of the relational model and
also some of the object oriented models. It has very powerful database application
development features that are used for database programming. Oracle supports the
Structured Query Language (SQL) and the use of embedded languages including C,
Java, etc.
The Oracle Architecture defines the physical and logical database components of the
database that is stored in the Oracle environment. All database objects are defined
using a schema definition. This information is stored in the data dictionary. This data
dictionary is used actively to find the schema objects, integrity and security
constraints on those objects etc. Oracle defines an instance as the present database
state. There are many Oracle backend processes that support various operations on
oracle database. The database is stored in the SGA area of the memory.
Oracle’s basic relational technology supports query optimisation. Query optimisation
is needed for commercial databases, as the size of such databases may be very large.
Oracle supports indexing using the B-tree structure; in addition, the bitmap index
makes indexing even faster. Oracle also supports distributed database technology. It
supports replication, transportability of these replicas, and advanced queuing and
streams.
Oracle has many tools for system administration, ranging from basic user
management to backup and recovery tools. Oracle Lite is the version of Oracle used
for mobile databases. Oracle uses standard implementation methods for concurrency
control. Oracle has data warehousing capabilities: it contains Extraction,
Transformation and Loading (ETL) tools, materialised views, bitmap indexes, table
compression, and data mining and OLAP in support of a data warehouse. Oracle
supports both system and data security features.
The unit also explained data integrity, transactions, SQL Variations and Extensions in
Oracle.

4.18 SOLUTIONS/ANSWERS
Check Your Progress 1
1) These statements create, alter, maintain and drop schema objects.
2) Transaction Control Statements.
3) (a) Character
(b) Numeric
(c) DATE data type
(d) LOB data type
(e) RAW and LONGRAW
(f) ROWID and UNROWID data type
4) ORACLE forms Developer
ORACLE Reports Developer
ORACLE Designer
ORACLE J Developer
ORACLE Discoverer Administrative Edition
ORACLE Portal.

Check Your Progress 2


1) (a) Data files
(b) Redo log files
(c) Control files
2) Logical.
3) Tables
4)
(a) All Leaf Block have same depth,
(b) B-Tree indexes are balanced,
(c) Blocks of the B+ tree are 3 quarters full on the average,
(d) B+ tree have excellent retrieval performance,
(e) Inserts, updates and deletes are all efficient, and
(f) Good Performance.
5)
(1) Valid users of the Oracle database,
(2) Information about integrity constraints defined for the tables, and
(3) The amount of space allocated for a schema object.
6) Cursor
7) Program Global Area
8) A process is a thread of control or a mechanism in an operating system that can
run a series of steps.
9)
(a) Log writer,
(b) Archiver, and
(c) Recoverer.

Check Your Progress 3


1) Translating a SQL statement and verifying it to be a valid statement.
2)
(1) Identifying high load on top of SQL Statements.
(2) Verifying that the execution plans perform reasonably, and
(3) Implementing corrective actions to generate better execution plans.
3) Query optimiser

4) It allows non-Oracle data and services to be accessed from an Oracle database
through generic connectivity.
5)
(1) Basic replication,
(2) Advanced replication,
(3) Transportable replication,
(4) Advanced queuing and streams, and
(5) Extraction, transformation, loading.

6) A complete interface for enterprise-wide application development.

7)
(a) Complete database,
(b) Tablespace,
(c) Data file,
(d) Control file, and
(e) Archived Redo Log.
8) Oracle LITE.
9) Real application clusters.

Check Your Progress 4


1) ETL means extraction, transformation and loading. It is basically the process of
extracting data from source systems and transferring it to the data warehouse.
2) The process by which multiple processes work together simultaneously to run a single SQL statement.
3) Data Mining.
4) It includes the mechanisms that control access and use of the database at the
system level.
5) Oracle manages database security using:
• Authentication,
• Authorisation,
• Access restrictions,
• Security policies, and
• Database Auditing.
6) Integrity constraints.
7)
• NOT NULL
• UNIQUE KEY
• PRIMARY KEY
• FOREIGN KEY
• CHECK
8)
• Primary
• Unique
• Foreign
• Referenced
9) It is an event-driven procedure.

10) Save points.
