Block 4
UNIT 1 EMERGING DATABASE MODELS, TECHNOLOGIES AND APPLICATIONS - I
1.0 INTRODUCTION
Database technology has advanced from the relational model to distributed DBMSs
and object oriented databases. The technology has also advanced to support data
formats using XML. In addition, data warehousing and data mining technologies have
become very popular in the industry from the viewpoint of decision making and
planning.
1.1 OBJECTIVES
After going through this unit, you should be able to:
Emerging Trends and Example DBMS Architectures
1.2 MULTIMEDIA DATABASE
Multimedia and its applications have experienced tremendous growth. Multimedia
data is typically defined as containing digital images, audio, video, animation and
graphics along with the textual data. In the past decade, with the advances in network
technologies, the acquisition, creation, storage and processing of multimedia data and
its transmission over networks have grown tremendously.
(iii) Applications
With the rapid growth of computing and communication technologies, many
multimedia applications have come to the forefront, and future applications will
increasingly support daily life with multimedia data. This trend is expected to
continue in the days to come. Some of these application areas are:
• media commerce
• medical media databases
• bioinformatics
• ease of use of home media
• news and entertainment
• surveillance
• wearable computing
• management of meeting/presentation recordings
• biometrics (people identification using image, video and/or audio data).
Multimedia Databases (MMDBs) must cope with the large volume of multimedia
data being used in various software applications. Some such applications include
digital multimedia libraries, art and entertainment, journalism and so on. Some of
the qualities of multimedia data, such as its size and formats, have a direct or indirect
influence on the design and development of a multimedia database.
Media Format Data: This data defines the format of the media data after the
acquisition, processing, and encoding phases. For example, such data may consist of
information about sampling rate, resolution, frame rate, encoding scheme etc. of
various media data.
Media Keyword Data: This contains the keywords related to the description of the media
data. For example, for a video, this might include the date, time, and place of
recording, the person who recorded it, the scene description, etc. This is also known as
content description data.
Media Feature Data: This contains the features derived from the media data. A
feature characterises the contents of the media. For example, this could contain
information on the distribution of colours, the kinds of textures and the different
shapes present in an image. This is also referred to as content dependent data.
These three types of data are known as meta data, as they describe several different
aspects of the media data. The media keyword data and media feature data are used as
indices for searching purposes. The media format data is used to present the retrieved
information.
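To make the three kinds of meta data concrete, the following sketch shows how a single multimedia record might hold all three. This is only an illustrative structure written in Python; the field names (encoding, sampling_rate, keywords, colour_histogram, etc.) are assumptions for illustration, not part of any standard schema.

```python
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class MediaMetadata:
    # Media format data: fixed after acquisition, processing and encoding
    encoding: str
    sampling_rate: int        # e.g. audio samples per second
    frame_rate: float         # e.g. video frames per second
    # Media keyword data: content description supplied by people
    keywords: List[str] = field(default_factory=list)
    # Media feature data: derived from the media content itself
    colour_histogram: Dict[str, float] = field(default_factory=dict)

clip = MediaMetadata(encoding="H.264", sampling_rate=44100, frame_rate=30.0,
                     keywords=["lecture", "classroom"],
                     colour_histogram={"red": 0.2, "green": 0.5, "blue": 0.3})
# The keyword and feature fields serve as search indices; the format
# fields are used when presenting the retrieved media.
print(clip.keywords[0])
```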
6) One of the main requirements for such a database is to handle different
kinds of indices. Multimedia data is inexact and subjective in nature, so
the keyword-based indices and exact range searches used in traditional
databases are ineffective in such databases. For example, the retrieval of records
of students based on enrolment number is precisely defined, but retrieving the
records of students having certain facial features from a database of facial
images requires content-based queries and similarity-based retrievals. Thus,
the multimedia database may require content-dependent keyword indices.
7) A multimedia database requires measures of data similarity that are close to
perceptual similarity. Such measures need to be quantified for each media
type so that they correspond to perceptual similarity. This also helps the
search process.
8) Multimedia data is created all over world, so it could have distributed database
features that cover the entire world as the geographic area. Thus, the media data
may reside in many different distributed storage locations.
9) Multimedia data may have to be delivered over available networks in real-time.
Please note that, in this context, audio and video data is temporal in nature. For
example, video frames need to be presented at a rate of about 30 frames/sec for
smooth motion.
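Requirement 7 above can be made concrete with a small example. One commonly used content-based measure is histogram intersection over normalised colour histograms; the Python sketch below, with invented feature values, shows how such a similarity score may be quantified.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Normalised colour distributions of two images (illustrative values only)
image_a = [0.2, 0.5, 0.3]
image_b = [0.3, 0.4, 0.3]
print(histogram_intersection(image_a, image_b))   # close to 0.9: highly similar
```

A content-based query would rank stored images by this score against the query image's feature vector, instead of performing an exact match.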
Multimedia data is now being used in many database applications. Thus, multimedia
databases are required for efficient management and effective use of enormous
amounts of data.
MMDB software must provide support for a wide variety of media types,
specifically different media file formats such as image, audio and video formats.
These files need to be managed, segmented, linked and searched.
Later commercial systems handle multimedia content by providing complex object
types for the various kinds of media. In such databases, object orientation provides the
facilities to define new data types and operations appropriate for the media, such as
video, image and audio. Therefore, broadly speaking, MMDBMSs are extensible
Object-Relational DBMSs (ORDBMSs). The most advanced solutions presently include
Oracle 10g, IBM DB2 and IBM Informix. These solutions propose broadly similar
approaches to extending the search facility for video with similarity-based techniques.
Some of the newer projects address the needs of applications for richer semantic
content. Most of them are based on the new MPEG standards MPEG-7 and MPEG-21.
MPEG-7
MPEG-7 is the ISO/IEC 15938 standard for multimedia descriptions, issued in 2002.
It is an XML-based multimedia meta-data standard that describes the elements of the
multimedia processing cycle, from capture and analysis/filtering to delivery and
interaction.
• the applications utilising multimedia data are very diverse in nature; there is a
need for the standardisation of such database technologies,
• technology is ever changing, thus creating further hurdles in the way of
multimedia databases,
• there is still a need to refine the algorithms used to represent multimedia
information semantically. This also creates problems with respect to information
interpretation and comparison.
Check Your Progress 1
1) What are the reasons for the growth of multimedia data?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) List four application areas of multimedia databases.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) What are the contents of multimedia database?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
4) List the challenges in designing multimedia databases.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
Requirements of a GIS
The data in GIS needs to be represented in graphical form. Such data would require
any of the following formats:
A GIS must also support the analysis of data. Some of the sample data analysis
operations that may be needed for typical applications are:
Once the data is captured in GIS it may be processed through some special
operations. Some such operations are:
The GIS also requires the process of visualisation in order to display the data in a
proper visual form.
Thus, GIS is not a database that can be implemented using either the relational or
object oriented database alone. Much more needs to be done to support them. A
detailed discussion on these topics is beyond the scope of this unit.
Biological data is by nature enormous. Bioinformatics is one such key area that has
emerged in recent years; it addresses the issues of information management for
genetic data related to DNA sequences. A detailed discussion on this topic is beyond
the scope of this unit. However, let us identify some of the basic characteristics of
biological data.
The Human Genome Initiative is an international research initiative for the creation of
detailed genetic and physical maps for each of the twenty-four different human
chromosomes and the finding of the complete deoxyribonucleic acid (DNA) sequence
of the human genome. The term Genome is used to define the complete genetic
information about a living entity. A genetic map shows the linear arrangement of
genes or genetic marker sites on a chromosome. There are two types of genetic maps−
genetic linkage maps and physical maps. Genetic linkage maps are created on the
basis of the frequency with which genetic markers are co-inherited. Physical maps are
used to determine actual distances between genes on a chromosome.
One of the major uses of such databases is in computational genomics, which refers
to the application of computational molecular biology in genome research. On the
basis of the principles of molecular biology, computational genomics has been
classified into three successive levels for the management and analysis of genetic data
in scientific databases. These are:
• Genomics.
• Gene expression.
• Proteomics.
1.4.1 Genomics
Genomics is a scientific discipline that focuses on the systematic investigation of the
complete set of chromosomes and genes of an organism. Genomics consists of two
component areas:
Genome Databases
Genome databases are used for the storage and analysis of genetic and physical maps.
Chromosome genetic linkage maps represent distances between markers based on
meiotic recombination frequencies. Chromosome physical maps represent distances
between markers based on numbers of nucleotides.
Genome databases should define four data types:

• Sequence
• Physical
• Genetic
• Bibliographic

Physical data may be represented using the following fields:

• Sequence-tagged sites
• Coding regions
• Non-coding regions
• Control regions
• Telomeres
• Centromeres
• Repeats
• Metaphase chromosome bands.

Genetic data may include fields such as:

• Locus name
• Location
• Recombination distance
• Polymorphisms
• Breakpoints
• Rearrangements
• Disease association

Bibliographic references should cite primary scientific and medical literature.
1.4.2 Gene Expression

Gene expression databases may represent data in fields such as:

• Spatial information
• Quantification
• Gene products
• User annotation of existing data
• Linked entries
• Links to other databases
  o Internet access
  o Internet submission.
Gene expression databases have not yet established standards for the collection,
storage, retrieval and querying of gene expression data derived from libraries of gene
expression experiments.
Data visualisation is used to display the partial results of cluster analysis generated
from large gene expression databases.
1.4.3 Proteomics
Proteomics is the use of quantitative protein-level measurements of gene expression in
order to characterise biological processes and describe the mechanisms of gene
translation. The objective of proteomics is the quantitative measurement of protein
expression particularly under the influence of drugs or disease perturbations. Gene
expression monitors gene transcription whereas proteomics monitors gene translation.
Proteomics provides a more direct response to functional genomics than the indirect
approach provided by gene expression.
Proteome Databases
Proteome databases also provide integrated data management and analysis systems for
the translational expression data generated by large-scale proteomics experiments.
Proteome databases integrate expression levels and properties of thousands of proteins
with the thousands of genes identified on genetic maps and offer a global approach to
the study of gene expression.
Proteome databases address five research problems that cannot be resolved by DNA
analysis:
The creation of comprehensive databases of genes and gene products will lay the
foundation for the further construction of comprehensive databases of higher-level
mechanisms, e.g., regulation of gene expression, metabolic pathways and signalling
cascades.
A detailed discussion on these databases is beyond the scope of this Unit. You may
wish to refer to the further readings for more information.
Knowledge databases are databases for knowledge management. But what is
knowledge management? Knowledge management is the way an organisation gathers,
manages and uses its knowledge. The basic objectives of knowledge management
are to achieve improved performance, competitive advantage and higher levels of
innovation in the various tasks of an organisation.
• Please note that in the representation of a fact, the data is represented using
attribute values only, and not attribute names. The attribute is determined by
the position of the data. For instance, in the example above, Rakesh is the
Mgrname.
• The rules in Datalog do not contain the data. They are evaluated against the
stored facts in order to deduce more information.
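The two observations above can be illustrated with a toy Datalog-style evaluation. The facts and the rule below are hypothetical examples written in Python (not actual Datalog syntax): each fact is a positional tuple carrying values only, and the rule, which contains no data, deduces new tuples from the stored facts.

```python
# Facts: positional tuples — the attributes (manager, subordinate) are
# implied by position, not named inside the fact itself.
manages = {("Rakesh", "Mohan"), ("Mohan", "Sunita")}

# Rule (contains no data): superior(X, Z) :- manages(X, Y), manages(Y, Z).
# Evaluating it against the stored facts deduces new information.
def evaluate_superior(facts):
    return {(x, z) for (x, y1) in facts for (y2, z) in facts if y1 == y2}

print(evaluate_superior(manages))   # {('Rakesh', 'Sunita')}
```

An inference engine repeats such rule evaluations until no new facts can be deduced.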
Please note the distinction here. Information visualisation mainly focuses on
computer-supported tools for exploring and presenting large amounts of data in
formats that may be easily understood.
You can refer to more details on this topic in the fifth semester course.
1.7 SUMMARY
This unit provides an introduction to some of the later developments in the area of
database management systems. Multimedia databases are used to store and deal with
multimedia information in a cohesive fashion. Multimedia databases are very large in
size and also require support of algorithms for searches based on various media
components. Spatial database primarily deals with multi-dimensional data. GIS is a
spatial database that can be used for many cartographic applications such as irrigation
system planning, vehicle monitoring system etc. This database system may represent
information in a multi-dimensional way.
A genome database is another very large database system, used for the purposes of
genomics, gene expression and proteomics. Knowledge databases store information
either as a set of facts and rules or as semantic models. These databases can be used
to deduce further information from the stored rules using an inference engine.
Information visualisation is an important area that may be linked to databases from
the point of visual presentation of information for better user interactions.
1.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1) a) Advances in digital devices that support the capture and display of
multimedia data.
b) High speed data communication networks and software support for
multimedia data transfer.
c) Newer applications requiring multimedia support.
2) Medical media databases
Bio-informatics
Home media
News etc.
3) Content can be of two basic types:
a) Media Content
b) Meta data, which includes media, format data, media keyword data and
media feature data.
4) Some of the challenges are:
a) Support for different types of input/output
b) Handling many compression algorithms and formats
c) Differences in OS and hardware
d) Integrating to different database models
e) Support for queries for a variety of media types
f) Handling different kinds of indices
g) Data distribution over the world etc.
Check Your Progress 2
1) GIS is a spatial database application where the spatial and non-spatial data is
represented along with the map. Some of the applications of GIS are:
• Cartographic applications
• 3-D Digital modeling applications like land elevation records
• Geographic object applications like traffic control system.
2) A GIS has the following requirements:
• Data representation through vector and raster
• Support for analysis of data
• Representation of information in an integrated fashion
• Capture of information
• Visualisation of information
• Operations on information
3) The data may need to be organised at the following three levels:
• Genomics: Where four different types of data are represented. The
physical data may be represented using eight different fields.
• Gene expression: Where data is represented in fourteen different fields.
• Proteomics: Where data is used for five research problems.
Check Your Progress 3
1) A good knowledge database will have good information, good classification and
structure and an excellent search engine.
UNIT 2 EMERGING DATABASE MODELS, TECHNOLOGIES AND APPLICATIONS - II
Structure

2.0 Introduction
2.1 Objectives
2.2 Mobile and Personal Database
      2.2.1 The Basic Framework for Mobile Computing
      2.2.2 Characteristics of Mobile Databases
      2.2.3 Wireless Networks and Databases
2.3 Web Databases
2.4 Accessing Databases on different DBMSs
      2.4.1 Open Database Connectivity (ODBC)
      2.4.2 Java Database Connectivity (JDBC)
2.5 Digital Libraries
2.6 Data Grid
2.7 Summary
2.8 Solutions/Answers
2.0 INTRODUCTION
Database applications have advanced along with the advancement of technology from
age old Relational Database Management Systems. Database applications have
moved to Mobile applications, Web database applications, Digital libraries and so on.
The basic issue common to all these advanced applications is making up-to-date data
available to users, in the desired format, anywhere, anytime. Such technology requires
advanced communication technologies, advanced database distribution models and
advanced hardware. In this unit, we will introduce the concepts of mobile and
personal databases, web databases and the issues concerned with such databases. In
addition, we will also discuss the concepts and issues related to Digital Libraries, Data
Grids and Wireless Communication and its relationship with Databases.
Mobile and personal databases revolve around a mobile computing environment and
focus on the issues related to a mobile user. Web databases, on the other hand, are
very specific to web applications. We will examine some of the basic issues of web
databases and also walk through the development of a simple web database.
We will examine some of the concepts associated with ODBC and JDBC − the two
standards for connecting to different databases. Digital Libraries are a common source
of information and are available through the databases environment. Such libraries are
equipped to handle information dissemination, searching and reusability. Data grids
allow data stored at distributed locations to be used as one major application unit.
Thus, all these technologies have a major role to play in the present information
society. This unit discusses the issues and concerns of technology with respect to
these database systems.
2.1 OBJECTIVES
After going through this unit, you should be able to:
• define the requirements of a mobile database system;
• identify and create a simple web database;
• use JDBC and ODBC in databases;
• explain the concept of digital libraries, and
• define the concept of a data grid.
2.2 MOBILE AND PERSONAL DATABASE
In recent years, wireless technology has become very useful for the communication of
information. Many new applications of wireless technology are already emerging,
such as electronic wallets that allow electronic money to be used anywhere, anytime,
mobile cells, mobile reporting, etc. The availability of portable computing devices
that support wireless communication helps database users access relevant
information from anywhere, at any time. Let us discuss mobile database systems and
the issues related to them in this section.
2.2.1 The Basic Framework for Mobile Computing

[Figure: A mobile computing environment — mobile units communicate over wireless LANs with mobile support stations, which connect through a high-speed network to wired LANs and fixed hosts.]
Please note that the basic network may be a wired network but there are possibilities
of Wireless LANs as well.
2.2.2 Characteristics of Mobile Databases
The mobile environment has the following characteristics:
1) Communication Latency: Communication latency results due to wireless
transmission between the sources and the receiver. But why does this latency
occur? It is primarily due to the following reasons:
a) due to data conversion/coding into the wireless formats,
b) tracking and filtering of data on the receiver, and
c) the transmission time.
2) Intermittent wireless connectivity: Mobile stations are not always connected to the
base stations. Sometimes they may be disconnected from the network.
3) Limited battery life: The size of the battery and its life is limited. Information
communication is a major consumer of the life of the battery.
4) Changing location of the client: A wireless client may move from the coverage
area of one mobile support station to another. Thus, in general, the topology of
such networks keeps changing, and the place from which data is requested also
changes. This requires the implementation of dynamic routing protocols.
Because of the above characteristics the mobile database systems may have the
following features:
Very often mobile databases are designed to work offline by caching replicas of the
most recent state of the database that may be broadcast by the mobile support station.
The advantages of this scheme are:
However, the disadvantage is the inconsistency of data due to the communication gap
between the client and the server.
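A minimal sketch of this caching scheme is given below, assuming the mobile support station periodically broadcasts a numbered snapshot of the database state. The class and method names are illustrative assumptions, not a standard API.

```python
class MobileClientCache:
    """Caches the most recently broadcast replica so the client can work offline."""
    def __init__(self):
        self.version = -1
        self.replica = {}

    def on_broadcast(self, version, snapshot):
        # Keep only the newest state broadcast by the mobile support station
        if version > self.version:
            self.version = version
            self.replica = dict(snapshot)

    def read(self, key):
        # Served locally: no network round trip, works while disconnected,
        # but the value may be stale (the scheme's main disadvantage)
        return self.replica.get(key)

cache = MobileClientCache()
cache.on_broadcast(1, {"stock_A": 100})
cache.on_broadcast(2, {"stock_A": 95})
print(cache.read("stock_A"))   # 95
```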
However, some of the challenges for mobile computing are:
(i) Scalability: As the number of mobile stations increases, so does the time taken
to service each client. This increased latency creates further problems for data
consistency.
The solution: broadcast data from the mobile support stations, making the most
recent information available to all clients and largely eliminating the latency of
individual requests.
(ii) Data mobility problem: Client locations keep changing in such networks, so
keeping track of the location of each client is important for the data server. Data
should be made available to the client from the server that has the minimum
latency to that client.
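Challenge (ii) essentially reduces to selecting, for the client's current location, the support station with minimum latency. A simple sketch follows; the station names and latency figures are invented for illustration.

```python
def nearest_station(latencies_ms):
    """Pick the mobile support station with minimum latency to the client."""
    return min(latencies_ms, key=latencies_ms.get)

# Measured latencies from the client's current location (illustrative values)
latencies = {"station_north": 120, "station_city": 35, "station_south": 80}
print(nearest_station(latencies))   # station_city
```

In practice, these latencies change as the client moves, so the selection must be repeated as part of the dynamic routing described earlier.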
2.2.3 Wireless Networks and Databases
A mobile support station covers a large geographic area and will service all the mobile
hosts that are available in that wireless geographical area − sometimes referred to as
the cell. A mobile host traverses many mobile zones or cells, requiring its information
to be passed from one cell to another, not necessarily an adjacent one, as cells
overlap. However, with the availability of wireless LANs now, mobile hosts in some
LAN areas may be connected through the wireless LAN rather than a wide area
cellular network, thus reducing the cost of communication as well as the overheads of
the host's movement between cells.
With wireless LAN technology it is now possible for some of the local mobile hosts
to communicate directly with each other, without the mobile support station.
However, please note that such communication should be based on a standard
protocol/technology. Fortunately, the Bluetooth standard is available. This standard
allows wireless connectivity over short ranges (10-50 metres) at nearly 1 megabit
per second, thus allowing easy use of PDAs, mobile phones and intelligent devices.
There are also wireless LAN standards such as 802.11 and 802.16.
In addition, wireless technology has improved, and packet based cellular networking
has moved to the third generation or beyond, allowing high-speed digital data
transfer and applications. Thus, a combination of all these technologies, viz.,
Bluetooth, wireless LANs and 3G cellular networks, allows a low cost infrastructure
for wireless communication among different kinds of mobile hosts.
This has opened up a great potential area for mobile applications where large, real-
time, low cost database applications can be created for areas such as, just in time
accounting, teaching, monitoring of resources and goods etc. The major advantage
here would be that the communication of real time data through these networks would
now be possible at low costs.
The major drawbacks of mobile databases are the limited power and small display
size of mobile devices. These limitations have driven newer technologies such as
flash storage, power-saving disks, and low-powered, power-saving displays.
Moreover, since mobile devices are normally small in size, presentation standards
need to be created for them. One such protocol in the area of wireless networking is
the Wireless Application Protocol (WAP).
Thus, mobile wireless networks have opened up the potential for new mobile database
applications. These applications may be integrated into very strong web-based,
intelligent, real time applications.
2.3 WEB DATABASES

We are all familiar with the concept of web searching. Web searching is usually text
based and gathers hundreds or thousands of web pages together as the result of a
search. But how can we sort through these pages? Could a web database help solve
this problem? However, that is not the basic issue for discussion in this section. We
would like to concentrate on the second definition of a web database.
Definition 2: A web database is a database that can be accessed through the web.
This definition actually defines a web application with the database as the backend.
Let us discuss more about web database application systems.
Let us show the process of creating a web database. We would need to follow the
following steps to create a student database and to make it accessible as a web
database.
• The first step is to create a database using MS-Access with the following
configuration:
Student-id Text (10) Primary Key
Name Text (25)
Phone Text (12)
Now, you would need to enter some meaningful data into the database and save it
with the name students.mdb.
• Put your database online by using ftp to transfer students.mdb to the web server
on which you are allowed access. Do not put the file in the same directory in
which your web site files are stored, otherwise, the entire databases may be
downloaded by an unauthorised person. In a commercial set up it may be better to
keep the data on the Database server. This database then can be connected through
a Data Source Name (DSN) to the website. Let us now build the required interface
from ASP to the Database. A simple but old method may be with connecting ASP
using ActiveX Data Object (ADO) library. This library provides ASP with the
necessary functionality for interacting with the database server.
• The first and most basic thing we need to do is to retrieve the contents of the
database for display. You can retrieve database records using ADO Recordset,
one of the objects of ADO.
Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")
The commands given above create a variable (recordsettest) to store a new
Recordset object, using the Server object's CreateObject method.
• Now fill this Recordset with records from the database with its Open method.
Open takes two parameters:
o the table name that contains the records to be fetched and
o the connection string for the database.
Now, the name of the table is straightforward, as we would obviously have created
the table with a name. However, the connection string is slightly more complex. Since
the ADO library is capable of connecting to many database servers and other data
sources, the string must tell the Recordset not only where to find the database (the
path and file name) but also how to read the database, by giving the name of its
database provider.
A database provider is software with a very important role: it allows ADO to
communicate with the given type of database in a standard way. ADO has providers
for MS-Access, SQL Server, Oracle, ODBC database servers, etc. Assuming that we
are using the Jet.OLEDB provider to connect to an Access database, the connection
string would be:

Provider = Microsoft.Jet.OLEDB.version; Data Source = ~\student\student.mdb

However, since many ASP pages on the site will require such a string, it is common to
place the connection string in an application variable in the global workspace.
The code to retrieve the contents of the student table will now include only
the code for the Recordset:

Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")
recordsettest.Open "Student", Application("db_conn")

Thus, we have established the connection and can get information from the
MS-Access database. But another problem remains: how do we display the
results?
• A Recordset is organised like a database table: it holds the result set, with each
data row representing one database record. In our example, we have put the
contents of the student table into the recordsettest object. An opened Recordset
keeps track of the current record; initially, the first record is the current record.
The MoveNext method of the Recordset object moves the pointer to the next
record in the set, if any, and the EOF property of the Recordset becomes true at
the end of the Recordset. Thus, to display student id and name, you may write
the following code:
• Once you have completed the task, you must close the recordset:
Recordsettest.Close
This command frees the connection to the database. Since the number of such
connections may be limited, they should not be kept open longer than
necessary.
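The same open-fetch-close discipline applies in any database API. As ADO/ASP may not be available for you to experiment with, the sketch below mirrors the Recordset workflow using Python's built-in sqlite3 module, with an in-memory stand-in for the Student table (the table contents are invented):

```python
import sqlite3

# Stand-in for the students.mdb database: an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (student_id TEXT PRIMARY KEY, "
             "name TEXT, phone TEXT)")
conn.execute("INSERT INTO Student VALUES ('S001', 'Rakesh', '12345')")

# Open a "recordset" and step through it, like MoveNext until EOF
cursor = conn.execute("SELECT student_id, name FROM Student")
for student_id, name in cursor:
    print(student_id, name)

# Free the connection as soon as it is no longer needed
conn.close()
```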
Save this file on the web server. Now, test this program by storing suitable data in the
database. This application should produce the simple list of the student. However, you
can create more complex queries to data using SQL. Let us try to explain that with the
help of an example.
Example: Get the list of those students who have a phone number specified by you.
More than one student may be using this phone number as a contact number.
Please note that in actual implementation you would need to create more complex
queries.
Please also note that the only change we need to make in the program given
above is in the Open statement, where we insert an SQL command so that, on
opening the recordset, only the required data is transferred from the database.
Thus, the code for this modified database access may be:
<html>
<head>
<title> Student Data</title>
</head>
<body>
<ol>
<%
Dim recordsettest
Set recordsettest = Server.CreateObject("ADODB.Recordset")
recordsettest.Open "SELECT [student-id], name FROM Student " & _
    "WHERE phone = '" & Request("phone") & "'", Application("db_conn")
If recordsettest.EOF Then
    Response.Write "<p>No student data in the database.</p>"
End If
Do While Not recordsettest.EOF
    Response.Write "<li>" & recordsettest("student-id")
    Response.Write "<p>" & recordsettest("name") & "</p></li>"
    recordsettest.MoveNext
Loop
recordsettest.Close
%>
</ol>
</body>
</html>
You can build on more complex queries. You may also refer to more advanced ASP
versions and connections. A detailed discussion on these is beyond the scope of the
unit.
• if a large number of client machines require different drivers and DLLs to be
connected through ODBC, there is a complex and large administration
overhead. Thus, large organisations are on the lookout for server-side ODBC
technology.
The Call Level Interface (CLI) specification of SQL is used by ODBC as its base.
ODBC and its applications are becoming stronger. For example, the Common
Object Request Broker Architecture (CORBA), a distributed object architecture, and
the Persistent Object Service (POS), an API, are supersets of both the Call-Level
Interface and ODBC. If you need to write a database application program in Java,
then you need the Java Database Connectivity (JDBC) application program interface.
From JDBC you may use a JDBC-ODBC "bridge" program to reach ODBC-
accessible databases.
But how do we actually write code using ODBC? The following steps are needed to
write such code:
What is a DSN in the connection above? It is a Data Source Name that has been given
to a specific user connection. A DSN is more secure, since you have to be the user
defined in the DSN; otherwise, you will not be allowed to use the data.
The last line defines the name of an Oracle database defined in the environment file of
Oracle.
Once the connection is established to the database through the required ODBC driver,
using the user-id and password, you can write the appropriate queries using SQL.
Finally, you need to close the connection using close( ) method.
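The sequence described above, open a connection, run SQL queries, then close the connection, is common to most database APIs, not only ODBC. As a rough illustration (not ODBC itself), the sketch below uses Python's built-in sqlite3 module, whose DB-API follows the same pattern; the table, column names and data are invented for the example, and an in-memory SQLite database stands in for a DSN-based connection.

```python
import sqlite3

# Open a connection. With ODBC, this is where the DSN, user-id and
# password would be supplied; SQLite just takes a database name
# (":memory:" here so the example is self-contained).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Set up a toy table so the query below has something to read.
cur.execute("CREATE TABLE student (student_id TEXT, name TEXT, phone TEXT)")
cur.execute("INSERT INTO student VALUES ('S1', 'Asha', '12345')")

# Write the appropriate query using SQL, as the text describes.
cur.execute("SELECT student_id, name FROM student WHERE phone = ?", ("12345",))
rows = cur.fetchall()
print(rows)  # [('S1', 'Asha')]

# Finally, close the connection, mirroring the close() step above.
conn.close()
```

The parameter placeholder (`?`) keeps the user-supplied phone number out of the SQL string itself, a practice worth carrying over to ODBC and JDBC code as well.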
What is JDBC?
Java Database Connectivity (JDBC) provides a standard API that is used to access
databases through Java, regardless of the DBMS. There are many drivers for JDBC
that support popular DBMSs. However, if no such driver exists for the DBMS that you
have selected, then you can use a driver provided by Sun Microsystems to connect to
any ODBC-compliant database. This is called the JDBC-ODBC Bridge. For such an
application, you may need to create an ODBC data source for the database before
you can access it from the Java application.
Connecting to a Database
In order to connect to a database, let us say an Oracle database, the related JDBC
driver has to be loaded successfully by the Java Virtual Machine class loader.
Now, you can connect to the database using the driver manager class, which selects the
appropriate driver for the database. In more complex applications, we may use
different drivers to connect to multiple databases. We identify our database using
a URL. A JDBC URL starts with "jdbc:", which indicates the use of the JDBC
protocol. For example (with a placeholder for the host name):
jdbc:oracle:thin:@<host>:2000:student
Thus, you will now be connected. Now the next step is to execute a query.
Thus, the JDBC standard allows you to handle databases through JAVA as the host
language. Please note that you can connect to any database that is ODBC compliant
through JAVA either through the specialised driver or through the JDBC – ODBC
bridge if no appropriate driver exists.
2.5 DIGITAL LIBRARIES
Let us now define a digital library.
"A digital library is a library that provides almost the same functionality as a
traditional library, but has most of its information resources in digital form,
stored in multimedia repositories. Access to the digital library may be
through the web or an intranet."
Please note that most of the objectives above can easily be fulfilled because
a digital library allows web access.
Now the next question is what is the cost involved in creating and maintaining Digital
Libraries?
The following are the cost factors for the creation and maintenance of digital libraries:
So far, we have seen the advantages of digital libraries, but do digital libraries have
any disadvantages?
• Searching for information: A digital library needs a very strong search facility. It
should allow search on various indexes and keywords. It should have a distributed
search mechanism to provide answers to individual users' searches.
• Content management: A digital library must have a facility for the continuous
updating of source information. Older information in such cases may need to be
archived.
• Licenses and rights management: Only appropriate/authorised users are given
access to protected information. A library needs to protect the copyright.
• All the links to further information should be thoroughly checked.
• The library should store the meta-data in a proper format. One such structure for
storing meta-data is the Dublin Core.
• Library information should be represented in standard data formats. One such
format may be XML. The contents may be represented in XML, HTML, PDF,
JPEG, GIF, TIFF etc.
• Very large, reliable storage technologies supporting terabytes or even petabytes of
information. You may use Storage Area Networks (SAN).
• The transfer rate of information from the storage point to the computer should be
high in order to fulfil the request of many users.
• High Internet bandwidth to serve many users at a time.
• A distributed array of powerful servers that may process large access requests at
the same time.
• Very reliable library software. Such software is developed on top of an RDBMS to
support features such as full-text indexing, meta-data indexing, etc.
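Meta-data in the Dublin Core structure mentioned above is typically recorded as XML elements such as dc:title, dc:creator and dc:date. The following sketch builds one such record with Python's standard xml.etree module; the field values are invented for illustration, while the namespace URI is the standard Dublin Core one.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core namespace
ET.register_namespace("dc", DC)

# A Dublin Core record for one (hypothetical) digital library item.
record = ET.Element("record")
for field, value in [("title", "Introduction to DBMS"),
                     ("creator", "A. Author"),
                     ("date", "2005"),
                     ("format", "application/pdf")]:
    elem = ET.SubElement(record, f"{{{DC}}}{field}")
    elem.text = value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Storing meta-data in a standard structure like this is what allows distributed search across digital libraries: every participating library can parse every other library's records.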
[Figure: Structure of a data grid. Applications access a virtual database through a placement and data policy manager, which spans several underlying databases.]
A data grid should address the following issues:
• It should allow data security and domain-specific description of data. Domain-
specific data helps in the identification of correct data in response to a query.
• It should have a standard way of representing information. [The possible
candidates in this category may be XML].
• It should allow simple query language like SQL for accessing information.
• The data requirements should be fulfilled with a level of confidence (that is, there
has to be a minimum quality of service in place).
• There needs to be a different role assigned to the data administrator. The data
administrators should not be concerned with what the data represent. Data domain
specification should be the sole responsibility of the data owner. Unfortunately,
this does not happen in the present database implementations.
• The separation of the data administration and the data manager will help the
database administrator to concentrate on data replication and query performance
issues based on the DBMS statistics.
Thus, a data grid virtualises data. Many DBMSs today, have the ability to separate the
roles of the database administrator and the data owner from the application access.
This needs to be extended to the grid across a heterogeneous infrastructure.
A data grid provides a data provider with facilities by which the location and
structure of the data need not be explicitly mentioned to current and future
applications. This information can be published in the data dictionary or registry.
Grid middleware is then needed to query the registry in order to locate this
information for the applications. An
administrator in a data grid is primarily concerned with the infrastructure and its
optimum use and providing the required quality of service. Some of the things that
the administrator is allowed to do, based on the statistics stored in the data
dictionary, include enabling replication, failure recovery, partitioning, and
changing the infrastructure.
feature here is that the hospital is in complete control of its data. It can change, hide,
and secure any part of its own database while participating in the data grid federation.
Thus, a virtual medical database that is partitioned across hospitals can be made.
Please note that a query can be run across the whole set of hospitals and can retrieve
consolidated results. A query may not need to retrieve data from all the hospitals
participating in the grid to get significant information, for example, a query about the
symptoms of a disease that has been answered by 70 percent of the hospitals can
return meaningful results. However, on the other hand some queries would require all
the data grid members to participate. For instance, a query to find the patient's
complete medical record wherever its parts are stored, would require all the hospitals
to answer.
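The idea of one query spanning a partitioned, federated database can be mimicked on a small scale with SQLite's ATTACH mechanism. The two "hospital" database files below are an invented toy stand-in for grid members, not actual data grid middleware; the point is only that a single consolidated query can run over physically separate databases.

```python
import os
import sqlite3
import tempfile

# Build two separate "hospital" database files (toy data, invented).
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, "hospital_a.db"), os.path.join(tmp, "hospital_b.db")]
datasets = [[("P1", "flu")], [("P2", "flu"), ("P3", "cold")]]
for path, rows in zip(paths, datasets):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE patient (pid TEXT, diagnosis TEXT)")
    db.executemany("INSERT INTO patient VALUES (?, ?)", rows)
    db.commit()
    db.close()

# A "virtual" database over both members: attach the second file and
# run one consolidated query, the way a grid query spans hospitals.
virtual = sqlite3.connect(paths[0])
virtual.execute("ATTACH DATABASE ? AS b", (paths[1],))
count = virtual.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT pid FROM main.patient WHERE diagnosis = 'flu'"
    " UNION ALL"
    " SELECT pid FROM b.patient WHERE diagnosis = 'flu')").fetchone()[0]
print(count)  # 2: flu patients found across both hospitals
virtual.close()
```

A real data grid middleware does this location and schema resolution through its registry rather than by attaching files, but the consolidated-query behaviour the text describes is the same.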
Some of the major requirements of a data grid are:
• Handling of failure of a data source: A grid may have replicas and caches, but
grid applications tend to access a single data resource. What happens when
this data source fails? A data grid should be flexible: it should have middleware that
automatically moves the operations either to another data resource with a similar
data set, or to multiple data resources, each holding a subset of the data.
• Parallel access for local participants: Some organisations may have very large
data sets; a query on these would take longer, and such a query could fall outside
the performance criteria set for the grid. Therefore, it may be a good idea to use a
"virtual distributed" database to fetch data in parallel and keep processing it.
• Global access through SQL: A data grid would require dynamic selection of data
sources. Therefore, it requires a complete SQL query transformation and
optimisation capability.
• A data grid application may need to access data sources such as content
management systems, Excel files, or databases not yet supported.
) Check Your Progress 3
1) What are the basic functions of a digital library?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2) What are the advantages of Data Grid?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3) What are different types of hardware and software required for digital libraries?
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
2.7 SUMMARY
This unit introduced you to several concepts related to emerging database
applications. It also provided insight into some of the practical issues in
connecting to databases of any DBMS, including from Java, and analysed simple
applications related to web databases.
The unit introduced the requirements of mobile databases. Although most mobile
applications in the past worked on data broadcasting, this may change in an era
when new wireless LAN technologies are available. This may give rise to new
real-time database applications. ODBC and JDBC are two standards that can be
used to connect to a database of any DBMS from any operating system and from Java,
respectively. A web database may serve as the backend for a browser front end, or
for browser and application server tiers. However, it basically provides access to the data
through the web. Digital libraries are a very useful application that has emerged in
recent years. Digital libraries follow some of the meta-data and storage standards.
The digital libraries also support distributed search. A Data grid is a virtual database
created for the purpose of information sharing. The grid is loosely controlled.
2.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1) ODBC allows standard SQL commands to be used on any database of any DBMS.
It gives the application designer freedom from learning the features of
individual DBMSs, operating systems, etc. Thus, it simplifies the task of database
programmers.
2) Wireless LANs may allow low cost communication between two mobile units that
are located in the same LAN area. Thus, it may result in a reduced cost of operation.
3) JDBC is an API that allows Java programmers to access any database through
this set of standard APIs. In case a JDBC driver is not available for a DBMS,
the JDBC-ODBC bridge can be used to access data.
Check Your Progress 3
1) A digital library supports
• Content management
• Search
• License management
• Link management
• Meta data storage
UNIT 3 POSTGRESQL
Structure Page Nos.
3.0 Introduction 38
3.1 Objectives 38
3.2 Important Features 38
3.3 PostgreSQL Architectural Concepts 39
3.4 User Interfaces 41
3.5 SQL Variation and Extensions 43
3.6 Transaction Management 44
3.7 Storage and Indexing 46
3.8 Query Processing and Evaluation 47
3.9 Summary 49
3.10 Solutions/Answers 49
3.0 INTRODUCTION
PostgreSQL is an open-source object-relational DBMS (ORDBMS). This DBMS was
developed by the academic world, and thus has its roots in academia. It was first
developed as a database called Postgres (developed at UC Berkeley in the 1980s). It
was officially called PostgreSQL around 1996, mostly to reflect the added ANSI SQL-
compliant translator. It is one of the most feature-rich and robust open-source databases. In
this unit we will discuss the features of this DBMS. Some of the topics covered
in this unit include its architecture, user interfaces, SQL variations, transactions,
indexes, etc.
3.1 OBJECTIVES
After going through this unit, you should be able to:
• Many high-level languages and native interfaces can be used for creating user-
defined database functions.
• You can use native SQL, PL/pgSQL (the Postgres counterpart to Oracle PL/SQL or MS
SQL Server's/Sybase's Transact-SQL), Java, C, C++, and Perl.
• Inheritance of table structures: this is probably one of the rarely used but useful
features.
• Built-in complex data types such as IP Address, Geometries (Points, lines,
circles), arrays as a database field type and ability to define your own data types
with properties, operators and functions on these user-defined data types.
• Ability to define Aggregate functions.
• Concept of collections and sequences.
• Support for multiple operating systems like Linux, Windows, Unix, Mac.
• It may be considered one of the important databases for implementing web-based
applications, because it is fast and feature-rich. PostgreSQL is a reasonably fast
database with proper support for web languages such as PHP and Perl. It also
supports ODBC and JDBC drivers, making it easily usable from other languages
such as ASP, ASP.Net and Java. It is often compared with MySQL, one of the
fastest databases on the web (open source or not). Its querying speed is in line
with MySQL's. In terms of features, though, PostgreSQL is definitely a database
worth a second look.
[Figure: How a connection is established in PostgreSQL. Client applications interact through the client interface library with the postmaster, which handles the initial connection and authentication and then connects each client to a backend server process; the backend server processes share disk buffers, shared tables and disk storage, and exchange queries and results with the clients.]
A session on a PostgreSQL database consists of the following co-operating processes:
• a supervisory daemon process (also referred to as the postmaster),
• the front-end user application process (e.g., the psql program), and
• one or more backend database server processes (the postgres process itself).
A single postmaster process manages a given collection of databases (also called an
installation or site) on a single host machine. Client applications that want access
to a database stored at a particular installation make calls to the client interface library.
The library sends user requests over the network to the postmaster, which in turn starts a new
backend server process. The postmaster then connects the client process to the new
server. From this point onwards, the client process and the backend server process
communicate with each other without any intervention on the part of the postmaster.
Thus, the postmaster process is always running, waiting for requests from new
clients. Please note that client and server processes will be created and destroyed over
a period of time as the need arises.
Can a client process make multiple connections to a backend server process? The
libpq library allows a single client to make multiple connections to backend server
processes. However, please note that these client processes are not multi-threaded
processes. At present, multithreaded front-end/backend connections are not
supported by libpq. Another implication of this client-server or front-end/back-end
combination is that the processes may run on different machines: files that can be
accessed on a client machine may not be accessible (or may only be accessed using a
different filename) on the database server machine.
Please, also note that the postmaster and postgres servers run with the user-id of the
Postgres superuser or the administrator. Please also note, that the Postgres superuser
does not have to be a special user and also that the Postgres superuser should
definitely not be the UNIX superuser (called root). All the files related to the database
belong to this Postgres superuser.
3.4 USER INTERFACES
Having discussed the basic architecture of PostgreSQL, the question that now
arises is: how does one access databases? PostgreSQL has the following interfaces
for accessing information:
• Postgres terminal monitor programs (e.g. psql): It is a SQL command level
interface that allows you to enter, edit, and execute SQL commands
interactively.
• Programming Interface: You can write a C program using the LIBPQ
subroutine library. This allows you to submit SQL commands from the host
language - C and get responses and status messages back to your program.
But how would you get access to these interfaces? To do so, you need to install
PostgreSQL on your machine. Let us briefly point out some facts about the
installation of PostgreSQL.
Installing Postgres on your machine: Since Postgres is a client/server DBMS,
as a user you need the client portion of the installation (an example of a
client application interface is the interactive monitor psql). A common
directory in which Postgres may be installed on Unix machines is /usr/local/pgsql.
Therefore, we will assume that Postgres has been installed in the
directory /usr/local/pgsql. If you have installed Postgres in a different directory, then
you should substitute that directory name accordingly. All
Postgres commands are installed in the directory /usr/local/pgsql/bin. Therefore, you
need to add this directory to your shell command path in Unix.
For example, on the Berkeley C shell or its variants such as csh or tcsh, you need to
add:
% set path = ( /usr/local/pgsql/bin $path )
On the Bourne shell or its variants such as sh, ksh, or bash, you need to add:
% PATH=/usr/local/pgsql/bin:$PATH
% export PATH
Other Interfaces
Some other user interfaces that are available for the PostGres are:
pgAdmin 3, from http://www.pgadmin.org, for Windows/Linux/BSD/Unix (with an
experimental Mac OS X port). This interface, released under the Artistic License,
is a complete PostgreSQL administration interface. It is somewhat similar to
Microsoft's Enterprise Manager and is written in C++ and wxWindows. It allows
administration of almost all database objects and ad-hoc queries.
Starting the Interactive Monitor (psql)
You can run an application from a client if:
• the site administrator has properly started the postmaster process, and
• you are authorised to use the database with the proper user id and password.
As of Postgres v6.3, two different styles of connections are supported. These are:
Such an error occurs either because the postmaster is not running, or because you are
attempting to connect to the wrong server host. Similarly, an error message of this
kind can mean that the site administrator started the postmaster as the wrong user.
Accessing a Database
Once you have a valid account then the next thing is to start accessing the database.
To access the database with PostGres mydb database you can use the command:
% psql mydb
The prompt indicates that the terminal monitor is ready for your SQL queries. These
queries need to be input into a workspace maintained by the terminal monitor. The
psql program also responds to escape codes (you must have used them while
programming in C) that begin with the backslash character, “\”. For example, you can
get help on the syntax of PostGres SQL commands by typing:
mydb=> \h
42
Once you have completed the query, you can pass the contents of the workspace to the
Postgres server by typing:
mydb=> \g
This indicates to the server that it may process the query. In case you terminate the
query with a semicolon, the “\g” is not needed. psql automatically processes the
queries that are terminated by a semicolon.
You can store your queries in a file. To read your queries from such a file you may
type:
mydb=> \i filename
White space (i.e., spaces, tabs and new line characters) may be used in SQL queries.
You can also enter comments. Single-line comments are denoted by “--”. Multiple-
line comments, and comments within a line, are denoted by “/* ... */”.
• It supports triggers. It also allows creation of functions which can be stored and
executed on the server.
Assume that the query above has modified the first 200 records and is in the process of
modifying the 201st record out of 2000 records. Suppose a user terminates the query at
this moment by resetting the computer. Then, on restart of the database, the
recovery mechanism will make sure that none of the student records is
modified. The query needs to be run again to make the desired update of marks.
Thus, Postgres has made sure that the query causes no recovery or integrity
related problems.
This is a very useful feature of this DBMS. Suppose you were executing a query to
increase the salary of the employees of your organisation by Rs. 500 and there was a power
failure during the update procedure. Without transaction support, the query may have
updated the records of some of the employees, but not all. It would be difficult to know
where the UPDATE failed. You would like to know: "Which records were updated,
and which ones were not?" You cannot simply re-execute the query, because some
people who had already received their Rs. 500 increment would get another
increase of Rs. 500. With the transaction mechanism in place, you need not bother
about this: when the DBMS starts again, it will first recover from the failure,
undoing any partial updates to the data. Thus, you can simply re-execute the query.
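The guarantee described above, that an interrupted update leaves no half-done changes, is the atomicity of a transaction, and PostgreSQL provides it automatically. The same effect can be observed with any transactional store; the sketch below uses SQLite (table and salary figures invented), with an explicit rollback standing in for the recovery that runs after a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("A", 10000), ("B", 12000), ("C", 15000)])
conn.commit()

# Begin the Rs. 500 increment, then simulate a failure before COMMIT:
# the rollback plays the role of the recovery that undoes partial work.
conn.execute("UPDATE employee SET salary = salary + 500")
conn.rollback()
after_rollback = conn.execute("SELECT SUM(salary) FROM employee").fetchone()[0]
print(after_rollback)  # 37000: nobody received a partial increment

# Re-executing the whole query and committing applies it exactly once.
conn.execute("UPDATE employee SET salary = salary + 500")
conn.commit()
after_commit = conn.execute("SELECT SUM(salary) FROM employee").fetchone()[0]
print(after_commit)    # 38500: every employee got exactly Rs. 500 more
```

Because the rolled-back attempt left no trace, re-running the query is safe, which is exactly why the text says you can simply re-execute it after recovery.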
Multistatement Transactions
By default, in Postgres each SQL query runs in its own transaction. For example,
consider two equivalent queries:
mydb=> INSERT INTO table1 VALUES (1);
INSERT 1000 1
OR
mydb=> BEGIN WORK;
BEGIN
mydb=> INSERT INTO table1 VALUES (1);
INSERT 1000 1
mydb=> COMMIT WORK;
COMMIT
The former is a typical INSERT query. Before PostgreSQL starts the INSERT it
automatically begins a transaction. It performs the INSERT, and then commits the
transaction. This step occurs automatically for any query with no explicit transaction.
However, in the second version, the INSERT uses explicit transaction statements.
BEGIN WORK starts the transaction, and COMMIT WORK commits the transaction.
Both queries result in the same database state, the only difference being whether the
BEGIN WORK...COMMIT WORK statements are implied or explicit. However, the real utility of these
transaction-related statements lies in the ability to club multiple queries into a
single transaction. In such a case, either all the queries execute to completion or
none at all. For example, in the following transaction either both INSERTs will
succeed or neither will.
mydb=> BEGIN WORK;
BEGIN
mydb=> INSERT INTO table1 VALUES (1);
INSERT 1000 1
mydb=> INSERT INTO table1 VALUES (2);
INSERT 2000 1
mydb=> COMMIT WORK;
COMMIT
• Read uncommitted
• Read committed
• Repeatable Read, and
• Serialisable.
3) State True or False
a) PostgreSQL is fully compliant with the SQL 99 standard.
Thus, any record has attribute values for the system-defined columns as well
as for the user-defined columns of a table. The following table lists the system columns.
If the database creator does not explicitly create a primary key, it would become
difficult to distinguish between two records with identical column values. To avoid
such a situation, PostgreSQL appends to every record its own object identifier
number, or OID, which is unique within that table. Thus, no two records in the same table
will ever have the same OID, which also means that no two records in a
table are identical. The OID makes sure of this.
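PostgreSQL's per-record OID resembles the implicit rowid that SQLite attaches to every record. The sketch below uses that analogy (the table is invented for illustration) to show a system-maintained identifier distinguishing two rows whose user-defined columns are identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")  # no primary key declared
conn.execute("INSERT INTO t VALUES (1, 'x')")
conn.execute("INSERT INTO t VALUES (1, 'x')")       # identical column values

# The system-assigned rowid still tells the two records apart,
# much as the OID does for a PostgreSQL table.
rows = conn.execute("SELECT rowid, a, b FROM t").fetchall()
print(rows)  # [(1, 1, 'x'), (2, 1, 'x')]
```

Either identifier lets the system (and the user) address one specific record even when no user-defined column can.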
Internally, PostgreSQL stores data in operating system files. Each table has its own
file, and data records are stored in a sequence in the file. You can create an index on
the database. An index is stored as a separate file that is sorted on one or more
columns as desired by the user. Let us discuss indexes in more details.
Indexes
Indexes allow fast retrieval of specific rows from a table. For a large table, finding a
specific row using an index takes a fraction of a second, while a non-indexed search
requires much more time for the same information. PostgreSQL does not create
indexes automatically. Indexes are user-defined for attributes or columns that are
frequently used for retrieving information.
Although you can create many indexes, each should justify the benefit it provides
for the retrieval of data from the database. Please note that an index adds overhead in
terms of disk space and performance, since a record update may also require an index
update. You can also create an index on multiple columns. Such multi-column indexes
are sorted by the first indexed column and then by the second indexed column.
• B-Tree Indexes: These are the default index type. They are useful for
comparison and range queries.
• Hash Indexes: This index type uses linear hashing. Such indexes are generally not
preferred over B-tree indexes.
• R-Tree Indexes: Such indexes are created on built-in spatial data types such as
box and circle, for operations like determining overlap.
• GiST Indexes: These indexes are created using Generalised Search Trees. They
are useful for full-text indexing and hence for information retrieval.
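The behaviour of a multi-column index, sorted by the first indexed column and then the second, can be observed with EXPLAIN. PostgreSQL's EXPLAIN output looks different, but SQLite's EXPLAIN QUERY PLAN (used below as a runnable stand-in; the table and index are invented) shows the same idea: a condition on the leading column can use the index, while a condition on the second column alone cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (course TEXT, student TEXT, score INTEGER)")
conn.execute("CREATE INDEX idx_course_student ON marks (course, student)")

def plan(sql):
    # The detail column of EXPLAIN QUERY PLAN names the chosen access path.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

leading = plan("SELECT * FROM marks WHERE course = 'DBMS'")
second_only = plan("SELECT * FROM marks WHERE student = 'S1'")
print(leading)      # a SEARCH using idx_course_student
print(second_only)  # a full scan: the index cannot help here
```

This is why the order of columns in a multi-column index matters: put the column you filter on most often first.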
• The parser at the server checks the syntax of the query received from the client
and creates a query tree.
• The rewrite system takes the query tree created by the parser as the input, and
selects the rules stored in the system catalogues that may apply to the query
tree. It then performs the transformation given as per the rules. It also rewrites
any query made against a view to a query accessing the base tables.
• The planner/optimiser takes the (rewritten) query tree and creates a query plan
that forms the input to the executor. It creates a structured list of all the possible
paths leading to the same result. Finally, the cost for the execution of each path
is estimated and the cheapest path is chosen. This cheapest path is expanded
into a complete query evaluation plan that the executor can use.
• The executor recursively steps through the query evaluation plan tree supplied
by the planner and creates the desired output.
Queries may, like SELECT, return columns of data, or may specify columns that
need to be modified, like INSERT and UPDATE. The references to these columns are
converted into TargetEntry entries, which are linked together to create the target list
of the query. The target list is stored in Query.targetList.
Now, the Query is modified for the desired VIEWS or implementation of RULES that
may apply to the query.
The optimiser then, creates the query execution plan based on the Query structure and
the operations to be performed in order to execute the query. The Plan is passed to the
executor for execution, and the results are returned to the client.
Query Optimisation
The task of the planner/optimiser is to create an optimal execution plan out of the
available alternatives. The query tree of a given SQL query can actually be executed
in a wide variety of different ways, each of which essentially produces the same set
of results. It is not always possible for the query optimiser to examine each of these
possible execution plans to choose the one that is expected to run fastest. Thus,
the optimiser must find a reasonable (possibly non-optimal) query plan within
predefined time and space limits. For such cases PostgreSQL uses a genetic query optimiser.
After the cheapest path is determined, a full-fledged plan tree is built in order to pass
it to the executor. This represents the desired execution plan in sufficient detail for the
executor to run it.
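The planner's choice of the cheapest path can be watched in miniature. PostgreSQL shows its chosen plan with EXPLAIN; the runnable stand-in below uses SQLite's EXPLAIN QUERY PLAN (table and index names invented) to show the plan for the same query changing once a cheaper access path becomes available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, msg TEXT)")

query = "SELECT msg FROM events WHERE ts BETWEEN 10 AND 20"

def chosen_plan():
    # The detail column describes the access path the planner picked.
    return conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

before = chosen_plan()  # only one path exists: scan the whole table
conn.execute("CREATE INDEX idx_ts ON events (ts)")
after = chosen_plan()   # a cheaper range-search path is now chosen
print(before)
print(after)
```

The executor then simply runs whichever plan tree the planner hands it, which is why creating an index speeds up existing queries without any change to their SQL.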
3) What are the different steps for query evaluation?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3.9 SUMMARY
This unit provided an introduction to some of the basic features of the PostgreSQL
DBMS, a very suitable example of a DBMS available as open source software.
This database provides most of the basic features of the relational database
management systems. It also supports some of the features of object oriented
programming like inheritance, complex data types, declarations of functions and
polymorphism. Thus, it is in the category of object-relational DBMSs. PostgreSQL has
a client-server architecture, with the postmaster, server and client as
the main processes. It supports all the basic features of SQL 92 and many more
features recommended by SQL 99. PostgreSQL treats even single SQL queries as
transaction implicitly. This helps in maintaining the integrity of the database at all
times. PostgreSQL places many system related attributes in the tables defined by the
users. Such attributes help the database with tasks that may be useful for indexing and
linking of records. It also supports different types of indexes, including B-Tree,
Hash, R-Tree and GiST indexes, making it suitable for many different
types of applications, such as spatial and multi-dimensional database applications. It has a
standard process for query evaluation and optimisation.
3.10 SOLUTIONS/ANSWERS
Check Your Progress 1
1) PostgreSQL has the following main processes:
• the postmaster,
• server processes, and
• client processes.
2) Each SQL statement is treated as a transaction. PostgreSQL also has provision for
multi-statement transactions.
3) PostgreSQL supports both terminal monitor interfaces and program-driven
interfaces.
UNIT 4 ORACLE
Structure Page Nos.
4.0 Introduction 51
4.1 Objectives 52
4.2 Database Application Development Features 52
4.2.1 Database Programming
4.2.2 Database Extensibility
4.3 Database Design and Querying Tools 57
4.4 Overview of Oracle Architecture 50
4.4.1 Physical Database Structures
4.4.2 Logical Database Structures
4.4.3 Schemas and Common Schema Objects
4.4.4 Oracle Data Dictionary
4.4.5 Oracle Instance
4.4.6 Oracle Background Processes
4.4.7 How Oracle Works?
4.5 Query Processing and Optimisation 71
4.6 Distributed Oracle 75
4.6.1 Distributed Queries and Transactions
4.6.2 Heterogeneous Services
4.7 Data Movement in Oracle 76
4.7.1 Basic Replication
4.7.2 Advanced Replication
4.7.3 Transportable Tablespaces
4.7.4 Advanced Queuing and Streams
4.7.5 Extraction, Transformation and Loading
4.8 Database Administration Tools 77
4.9 Backup and Recovery in Oracle 78
4.10 Oracle Lite 79
4.11 Scalability and Performance Features of Oracle 79
4.11.1 Concurrency
4.11.2 Read Consistency
4.11.3 Locking Mechanisms
4.11.4 Real Application Clusters
4.11.5 Portability
4.12 Oracle DataWarehousing 83
4.12.1 Extraction, Transformation and Loading (ETL)
4.12.2 Materialised Views
4.12.3 Bitmap Indexes
4.12.4 Table Compression
4.12.5 Parallel Execution
4.12.6 Analytic SQL
4.12.7 OLAP Capabilities
4.12.8 Data Mining
4.12.9 Partitioning
4.13 Security Features of Oracle 85
4.14 Data Integrity and Triggers in Oracle 86
4.15 Transactions in Oracle 87
4.16 SQL Variations and Extensions in Oracle 88
4.17 Summary 90
4.18 Solutions/Answers 91
4.0 INTRODUCTION
The relational database concept was first described by Dr. Edgar F. Codd in an IBM
research publication titled "System R4 Relational", which appeared in 1970. Initially, it was
unclear whether any system based on this concept could achieve commercial success.
However, if we look back there have been many products, which support most of the
features of relational database models and much more. Oracle is one such product
that was created by Relational Software Incorporated (RSI) in 1977. They released
Oracle V.2 as the world’s first relational database within a couple of years. In 1983,
RSI was renamed Oracle Corporation to avoid confusion with a competitor named
RTI. During this time, Oracle developers made a critical decision to create a portable
version of Oracle (Version 3) that could run not only on Digital VAX/VMS systems,
but also on Unix and other platforms. Since the mid-1980s, the database deployment
model has evolved from dedicated database application servers to client/servers to
Internet computing implemented with PCs and thin clients accessing database
applications via browsers – and, to the grid with Oracle Database 10g.
4.1 OBJECTIVES
After going through this unit, you should be able to:
Year Feature
1979 Oracle Release 2, the first commercially available relational database to use SQL.
1980-1990 Single code base for Oracle across multiple platforms; portable toolset; client/server Oracle relational database; CASE and 4GL toolset; Oracle Financial Applications built on the relational database.
1991-2000 Oracle Parallel Server on massively parallel platforms; cost-based optimiser; parallel operations including query, load, and create index; universal database with extended SQL via cartridges, thin client, and application server; Oracle8 with inclusion of object-relational and Very Large Database (VLDB) features; Oracle8i added a new twist to the Oracle database, a combination of enhancements that made the Oracle8i database the focal point of the world of Internet (the i in 8i) computing; the Java Virtual Machine (JVM) was added into the database and Oracle tools were integrated in the middle tier.
2001 Oracle9i Database Server is generally available: Real Application Clusters; OLAP and data mining API in the database.
However, in this section we will concentrate on the tools that are used to create
applications. We have divided the discussion in this section into two categories:
database programming and database extensibility options. Later in this unit, we will
describe the Oracle Developer Suite, a set of optional tools used in Oracle Database
Server and Oracle Application Server development.
SQL
All operations on the information in an Oracle database are performed using SQL
statements. A statement must be the equivalent of a complete SQL sentence, as in:
SELECT last_name, department_id FROM employees;
Only a complete SQL statement can run successfully. A SQL statement can be
thought of as a very simple, but powerful, computer instruction. SQL statements of
Oracle are divided into the following categories.
Session Control Statements: These statements let a user control the properties of the
current session, including enabling and disabling roles and changing language
settings. The two session control statements are ALTER SESSION and SET ROLE.
System Control Statements: These statements change the properties of the Oracle
database instance. The only system control statement is ALTER SYSTEM. It lets
users change settings, such as the minimum number of shared servers, kill a session,
and perform other tasks.
Datatypes
Each attribute and constant in a SQL statement has a datatype, which is associated
with a specific storage format, constraints, and a valid range of values. When you
create a table, you must specify a datatype for each of its columns.
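Since an Oracle instance may not be to hand, the following Python sketch uses the built-in sqlite3 module to illustrate the point: every column is declared with a datatype (and, optionally, a constraint), and values that violate the declaration are rejected by the database itself. The table name, column names, and the Oracle types mentioned in the comments are illustrative only.

```python
import sqlite3

# In-memory database, illustrative only. Oracle would use types such as
# NUMBER, VARCHAR2, and DATE; SQLite has its own type names.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE employees (
        employee_id  INTEGER PRIMARY KEY,     -- Oracle: NUMBER
        last_name    TEXT NOT NULL,           -- Oracle: VARCHAR2(25)
        salary       REAL CHECK (salary > 0)  -- Oracle: NUMBER(8,2)
    )
""")
con.execute("INSERT INTO employees VALUES (100, 'King', 24000.0)")

# A value outside the valid range is rejected at the datatype/constraint
# level, not by application code.
try:
    con.execute("INSERT INTO employees VALUES (101, 'Smith', -1.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The second INSERT never reaches the table: the declaration on the column enforces the valid range of values, exactly the role datatypes and constraints play above.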
An object type differs from native SQL datatypes in that it is user-defined, and it
specifies both the underlying persistent data (attributes) and the related behaviours
(methods). Object types are abstractions of real-world entities, for example, purchase
orders.
Object types and related object-oriented features, such as variable-length arrays and
nested tables, provide higher-level ways to organise and access data in the database.
Underneath the object layer, data is still stored in columns and tables, but you can
work with the data in terms of real-world entities – customers and purchase orders,
that make the data meaningful. Instead of thinking in terms of columns and tables
when you query the database, you can simply select a customer.
PL/SQL
PL/SQL is Oracle’s procedural language extension to SQL. PL/SQL combines the
ease and flexibility of SQL with the procedural functionality of a structured
programming language, with constructs such as IF-THEN, WHILE, and LOOP.
• Data access can be controlled by stored PL/SQL code. In this case, PL/SQL
users can access data only as intended by application developers, unless another
access route is granted.
• Oracle supports PL/SQL Server Pages, so the application logic can be invoked
directly from your Web pages.
The PL/SQL program units can be defined and stored centrally in a database. Program
units are stored procedures, functions, packages, triggers, and anonymous
blocks.
Procedures and functions are sets of SQL and PL/SQL statements grouped together as
a unit to solve a specific problem or to perform a set of related tasks. They are created
and stored in compiled form in the database and can be run by a user or a database
application. Procedures and functions are alike, except that a function always
returns a single value to the caller; a procedure does not return a value.
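The practical consequence, that a function returns a single value and can therefore be used inside a query expression, can be sketched in Python with sqlite3, whose create_function call registers a user-defined function with the database. The function and table names here are invented; in Oracle the function would be written in PL/SQL and stored in the database itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (last_name TEXT, salary REAL)")
con.executemany("INSERT INTO employees VALUES (?, ?)",
                [("King", 24000.0), ("Kochhar", 17000.0)])

# A function always returns a single value, so it can appear inside a
# query expression; a procedure (no return value) could not be used here.
def annual_salary(monthly):
    return monthly * 12

con.create_function("annual_salary", 1, annual_salary)
rows = con.execute(
    "SELECT last_name, annual_salary(salary) FROM employees "
    "ORDER BY last_name"
).fetchall()
print(rows)  # [('King', 288000.0), ('Kochhar', 204000.0)]
```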
Packages encapsulate and store related procedures, functions, variables, and other
constructs together as a unit in the database. They offer increased functionality (for
example, global package variables can be declared and used by any procedure in the
package). They also improve performance (for example, all objects of the package are
parsed, compiled, and loaded into memory once).
Large Objects
Interest in the use of large objects (LOBs) continues to grow, particularly for storing
non-traditional data types such as images. The Oracle database has been able to store
large objects for some time. Oracle8 added the capability to store multiple LOB
columns in each table. Oracle Database 10g essentially removes the space limitation
on large objects.
Object-Oriented Programming
Support of object structures has been included since Oracle8i to allow an object-
oriented approach to programming. For example, programmers can create user-
defined data types, complete with their own methods and attributes. Oracle’s object
support includes a feature called Object Views through which object-oriented
programs can make use of relational data already stored in the database. You can also
store objects in the database as varying arrays (VARRAYs), nested tables, or index
organised tables (IOTs).
Third-Generation Languages (3GLs)
Programmers can interact with the Oracle database from C, C++, Java, COBOL, or
FORTRAN applications by embedding SQL in those applications. Prior to compiling
applications using a platform’s native compilers, you must run the embedded SQL
code through a precompiler. The precompiler replaces SQL statements with library
calls the native compiler can accept. Oracle provides support for this capability
through optional “programmer” precompilers for languages such as C and C++
(Pro*C) and COBOL (Pro*COBOL). More recently, Oracle added SQLJ, a
precompiler for Java that replaces SQL statements embedded in Java with calls to a
SQLJ runtime library, also written in Java.
Database Drivers
All versions of Oracle include database drivers that allow applications to access
Oracle via ODBC (the Open DataBase Connectivity standard) or JDBC (the Java
DataBase Connectivity open standard). Also available are data providers for OLE DB
and for .NET.
Oracle interMedia bundles additional image, audio, video, and locator functions and is
included in the database license. Oracle interMedia offers the following capabilities:
• The image portion of interMedia can store and retrieve images.
• The audio and video portions of interMedia can store and retrieve audio and
video clips, respectively.
• The locator portion of interMedia can retrieve data that includes spatial
coordinate information.
XML
Oracle added native XML data type support to the Oracle9i database and XML and
SQL interchangeability for searching. The structured XML object is held natively in
object relational storage meeting the W3C DOM specification. The XPath syntax for
searching in SQL is based on the SQLX group specifications.
SQL*Plus
SQL*Plus is an interactive and batch query tool that is installed with every Oracle
Database Server or Client installation. It has a command-line user interface, a
Windows Graphical User Interface (GUI) and the iSQL*Plus web-based user
interface.
SQL*Plus has its own commands and environment, and it provides access to the
Oracle Database. It enables you to enter and execute SQL, PL/SQL, SQL*Plus and
operating system commands to perform the following:
• format, perform calculations on, store, and print from query results,
• examine table and object definitions,
• develop and run batch scripts, and
• perform database administration.
You can use SQL*Plus to generate reports interactively, to generate reports as batch
processes, and to output the results to a text file, to the screen, or to an HTML file for
browsing on the Internet. You can generate reports dynamically using the HTML
output facility of SQL*Plus, or using the dynamic reporting capability of iSQL*Plus
to run a script from a web page.
Oracle JDeveloper
Oracle Designer
Designer also includes generators for creating applications for Oracle Developer,
HTML clients using Oracle’s Application Server, and C++. Designer can generate
applications and reverse-engineer existing applications or applications that have been
modified by developers. This capability enables a process called round-trip
engineering, in which a developer uses Designer to generate an application, modifies
the generated application, and reverse-engineers the changes back into the Designer
repository.
Oracle Discoverer
Oracle Discoverer Administration Edition enables administrators to set up and
maintain the Discoverer End User Layer (EUL). The purpose of this layer is to shield
business analysts using Discoverer as an ad hoc query or ROLAP tool from SQL
complexity. Wizards guide the administrator through the process of building the EUL.
In addition, administrators can put limits on resources available to analysts monitored
by the Discoverer query governor.
Oracle Portal
Oracle Portal, introduced as WebDB in 1999, provides an HTML-based tool for
developing web-enabled applications and content-driven web sites. Portal application
systems are developed and deployed in a simple browser environment. Portal includes
wizards for developing application components incorporating “servlets” and access to
other HTTP web sites. For example, Oracle Reports and Discoverer may be accessed
as servlets. Portals can be designed to be user-customisable. They are deployed to the
middle-tier Oracle Application Server.
Oracle Portal has enhanced the WebDB, with the ability to create and use portlets,
which allow a single web page to be divided into different areas that can
independently display information and interact with the user.
Datafiles
Every Oracle database has one or more physical datafiles. The datafiles contain all the
database data. The data of logical database structures, such as tables and indexes, is
physically stored in the datafiles allocated for a database.
The characteristics of datafiles are:
• a datafile can be associated with only one database,
• datafiles can have certain characteristics set to let them automatically extend
when the database runs out of space,
• one or more datafiles form a logical unit of database storage called a tablespace.
Data in a datafile is read, as needed, during normal database operation and stored in
the memory cache of Oracle. For example, assume that a user wants to access some
data in a table of a database. If the requested information is not already in the memory
cache for the database, then it is read from the appropriate datafiles and stored in the
memory.
Control Files
Every Oracle database has a control file. A control file contains entries that specify the
physical structure of the database. For example, it contains the following information:
• database name,
• names and locations of datafiles and redo log files, and
• time stamp of database creation.
Oracle can multiplex the control file, that is, simultaneously maintain a number of
identical control file copies, to protect against failure involving the control file.
Every time an instance of an Oracle database is started, its control file identifies the
database and redo log files that must be opened for database operation to proceed. If
the physical makeup of the database is altered (for example, if a new datafile or redo
log file is created), then the control file is automatically modified by Oracle to reflect
the change. A control file is also used in database recovery.
Redo Log Files
The primary function of the redo log is to record all changes made to the data. If a
failure prevents modified data from being permanently written to the datafiles, then
the changes can be obtained from the redo log, so work is never lost. To protect
against a failure involving the redo log itself, Oracle allows a multiplexed redo log so
that two or more copies of the redo log can be maintained on different disks.
The information in a redo log file is used only to recover the database from a system
or media failure that prevents database data from being written to the datafiles. For
example, if an unexpected power shortage terminates database operation, then the data
in the memory cannot be written to the datafiles, and the data is lost. However, lost
data can be recovered when the database is opened, after power is restored. By
applying the information in the most recent redo log files to the database datafiles,
Oracle restores the database to the state it was in when the power failure occurred. The
process of applying the redo log during a recovery operation is called rolling forward.
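The rolling-forward idea can be sketched in a few lines of Python: every change is appended to a log before it counts as durable, and replaying the log in order reconstructs the datafile state. This is a toy model, not Oracle's actual redo record format; the keys and values are invented.

```python
# Minimal sketch of rolling forward: changes are recorded in a redo log
# first, so the datafile state can be rebuilt by replaying the log.
redo_log = []          # stands in for the on-disk redo log files
datafile = {}          # stands in for data blocks in the datafiles

def commit_change(key, value):
    redo_log.append((key, value))   # the log record is written first
    # (the actual datafile write may happen later, and may be lost)

def roll_forward(log):
    recovered = {}
    for key, value in log:          # apply redo records in order
        recovered[key] = value
    return recovered

commit_change("emp:100:name", "King")
commit_change("emp:100:name", "Kingsley")   # a later update wins
commit_change("emp:101:name", "Kochhar")

# Simulated failure: the datafile writes never happened, but the redo
# log survived, so no committed work is lost.
datafile = roll_forward(redo_log)
print(datafile)
```

Replaying records strictly in order is what makes the later update to the same row win, which is why redo must be applied sequentially during recovery.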
Parameter Files
Parameter files contain a list of configuration parameters for that instance and
database.
Oracle recommends that you create a server parameter file (SPFILE) as a dynamic
means of maintaining initialisation parameters. A server parameter file lets you store
and manage your initialisation parameters persistently in a server-side disk file.
The alert file, or alert log, is a special trace file. The alert file of a database is a
chronological log of messages and errors.
Backup Files
To restore a file is to replace it with a backup file. Typically, you restore a file when a
media failure or user error has damaged or deleted the original file.
User-managed backup and recovery requires you to restore backup files
before a trial recovery of the backups can be attempted.
Tablespaces
A database is divided into logical storage units called tablespaces, which group related
logical structures together. For example, tablespaces commonly group together all
application objects to simplify administrative operations.
Each database is logically divided into one or more tablespaces. One or more datafiles
are explicitly created for each tablespace to physically store the data of all logical
structures in a tablespace. The combined size of the datafiles in a tablespace is the
total storage capacity of the tablespace. Every Oracle database contains a SYSTEM
tablespace and a SYSAUX tablespace. Oracle creates them automatically when the
database is created. The system default is to create a smallfile tablespace, which is the
traditional type of Oracle tablespace. The SYSTEM and SYSAUX tablespaces are
created as smallfile tablespaces.
Oracle also lets you create bigfile tablespaces up to 8 exabytes (8 million terabytes) in
size. With Oracle-managed files, bigfile tablespaces make datafiles completely
transparent for users. In other words, you can perform operations on tablespaces,
rather than on the underlying datafiles.
A tablespace can be taken offline to make a portion of the database
unavailable while allowing normal access to the remainder of the database. This
makes many administrative tasks easier to perform.
Extents
The next level of logical database space is an extent. An extent is a specific number of
contiguous data blocks, obtained in a single allocation, used to store a specific type of
information.
Segments
The level of logical database storage above an extent is the segment. A segment is a set of
extents allocated for a certain logical structure. The following table describes the
different types of segments.
Data segment: Each non-clustered table has a data segment. All table data is stored in
the extents of the data segment. For a partitioned table, each partition has a data
segment. Each cluster has a data segment; the data of every table in the cluster is
stored in the cluster’s data segment.
Index segment: Each index has an index segment that stores all of its data. For a
partitioned index, each partition has an index segment.
Temporary segment: Temporary segments are created by Oracle when a SQL
statement needs a temporary database area to complete execution. When the statement
has been executed, the extents in the temporary segment are returned to the system for
future use.
Rollback segment: If you are operating in automatic undo management mode, then the
database server manages undo space using undo tablespaces. Oracle recommends that
you use automatic undo management. In earlier releases, undo was kept in rollback
segments, and space management for these rollback segments was complex; the undo
tablespace method of managing undo eliminates the complexities of managing
rollback segment space. Oracle still uses a SYSTEM rollback segment for system
transactions. There is only one SYSTEM rollback segment; it is
created automatically at CREATE DATABASE time and is always
brought online at instance startup. You are not required to perform any
operation to manage the SYSTEM rollback segment.
Oracle dynamically allocates space when the existing extents of a segment become
full. In other words, when the extents of a segment are full, Oracle allocates another
extent for that segment. Because extents are allocated as needed, the extents of a
segment may or may not be contiguous on a disk.
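The allocate-on-demand behaviour just described can be sketched as follows. The Segment class, the block numbers, and the extent size are invented for illustration and do not reflect Oracle's internal bookkeeping.

```python
# Sketch of dynamic extent allocation: a segment owns a list of extents,
# and a new extent (a run of contiguous blocks) is allocated only when
# the existing extents are full.
class Segment:
    def __init__(self, extent_size):
        self.extent_size = extent_size
        self.extents = []        # each extent: a list of block numbers
        self.used = 0            # rows stored so far

    def store_row(self, allocator):
        capacity = len(self.extents) * self.extent_size
        if self.used == capacity:                 # all extents are full
            self.extents.append(allocator(self.extent_size))
        self.used += 1

next_block = 0
def allocator(n):
    # Hands out the next n contiguous blocks. Successive extents of one
    # segment can still end up far apart if other segments allocate
    # blocks in between, which is why extents need not be contiguous.
    global next_block
    start = next_block
    next_block += n
    return list(range(start, start + n))

seg = Segment(extent_size=4)
for _ in range(9):            # 9 rows need ceil(9 / 4) = 3 extents
    seg.store_row(allocator)
print(len(seg.extents))  # 3
```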
Tables
Tables are the basic unit of data storage in an Oracle database. Database tables hold all
user-accessible data. Each table has columns and rows.
Indexes
Indexes are optional structures associated with tables. Indexes can be created to
increase the performance of data retrieval. An index provides an access path to table
data.
When processing a request, Oracle may use some or all of the available indexes in
order to locate the requested rows efficiently. Indexes are useful when applications
frequently query a table for a range of rows (for example, all employees with salaries
greater than 1000 dollars) or a specific row.
An index is automatically maintained and used by the DBMS. Changes to table data
(such as adding new rows, updating rows, or deleting rows) are automatically
incorporated into all relevant indexes with complete transparency to the users. Oracle
uses B-trees to store indexes in order to speed up data access.
An index in Oracle can be considered an ordered list of values divided into
block-wide ranges (leaf blocks). The end points of the ranges, along with pointers to
the blocks, are stored in a search tree, so a value can be found in O(log n) time for n
entries. This is the basic principle behind Oracle indexes.
The following Figure 2 illustrates the structure of a B-tree index.
The upper blocks (branch blocks) of a B-tree index contain index data that points to
lower-level index blocks. The lowest-level index blocks (leaf blocks) contain every
indexed data value and a corresponding rowid used to locate the actual row. The leaf
blocks are doubly linked. Indexes in columns containing character data are based on
the binary values of the characters in the database character set.
If the blocks have n keys then they have n+1 pointers. The number of keys and
pointers is limited by the block size.
Leaf Blocks: All leaf blocks are at the same depth from the root branch block. Leaf
blocks store the following:
• the complete key value for every row, and
• ROWIDs of the table rows
All key and ROWID pairs are linked to their left and right siblings. They are sorted by
(key, ROWID).
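The two-level search described above, a binary search over branch entries followed by a binary search within a leaf block, can be sketched in Python with the standard bisect module. The keys and ROWID strings are invented for illustration.

```python
import bisect

# Keys are kept sorted and divided into block-wide ranges (leaf blocks);
# a branch level stores each block's highest key, so a lookup does two
# binary searches instead of scanning every entry.
leaf_blocks = [
    [(10, "rowid-a"), (20, "rowid-b")],      # leaf 1: keys 10..20
    [(30, "rowid-c"), (40, "rowid-d")],      # leaf 2: keys 30..40
    [(50, "rowid-e"), (60, "rowid-f")],      # leaf 3: keys 50..60
]
high_keys = [block[-1][0] for block in leaf_blocks]   # branch entries

def index_lookup(key):
    b = bisect.bisect_left(high_keys, key)   # pick the candidate leaf
    if b == len(leaf_blocks):
        return None
    keys = [k for k, _ in leaf_blocks[b]]
    i = bisect.bisect_left(keys, key)        # search within the leaf
    if i < len(keys) and keys[i] == key:
        return leaf_blocks[b][i][1]          # the ROWID locates the row
    return None

print(index_lookup(40))   # rowid-d
print(index_lookup(45))   # None (no such key)
```

Both bisect calls are binary searches, which is where the O(log n) lookup cost of the index comes from.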
Views
Views are customised presentations of data in one or more tables or other views. A
view can also be considered as a stored query. Views do not actually contain data.
Rather, they derive their data from the tables on which they are based (referred to as
the base tables of the views). Views can be queried, updated, inserted into, and deleted
from, with some restrictions.
Views provide an additional level of table security by restricting access to a
predetermined set of rows and columns of a table. They also hide data complexity and
store complex queries.
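These points can be shown in a runnable sketch using Python's sqlite3 module rather than Oracle: the view stores only its defining query, exposes a restricted set of rows and columns, and holds no data of its own. The table and view names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, "
    "salary REAL)"
)
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [("King", 90, 24000.0), ("Hunold", 60, 9000.0)])

# The view is a stored query: it exposes only one department's rows and
# hides the salary column, but contains no data itself.
con.execute("""
    CREATE VIEW it_staff AS
    SELECT last_name FROM employees WHERE department_id = 60
""")
print(con.execute("SELECT * FROM it_staff").fetchall())  # [('Hunold',)]
```

Granting users access to the view instead of the base table is how the security restriction described above is enforced in practice.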
Clusters
Clusters are groups of one or more tables physically stored together because they
share common columns and are often used together. Since related rows are physically
stored together, disk access time improves.
Like indexes, clusters do not affect application design. Whether a table is part of a
cluster is transparent to users and to applications. Data stored in a clustered table is
accessed by SQL in the same way as data stored in a non-clustered table.
Synonyms
A synonym is an alias for any table, view, materialised view, sequence, procedure,
function, package, type, Java class schema object, user-defined object type, or another
synonym. Because a synonym is simply an alias, it requires no storage other than its
definition in the data dictionary.
Oracle Data Dictionary
A data dictionary is created when a database is created. To accurately reflect the status
of the database at all times, the data dictionary is automatically updated by Oracle in
response to specific actions, (such as, when the structure of the database is altered).
The database relies on the data dictionary to record, verify, and conduct on-going
work. For example, during database operation, Oracle reads the data dictionary to
verify that schema objects exist and that users have proper access to them.
Oracle creates and uses memory structures to complete several jobs. For example,
memory stores program code being run and data shared among users. Two basic
memory structures associated with Oracle are: the system global area and the program
global area. The following subsections explain each in detail.
System Global Area (SGA)
Users currently connected to an Oracle database share the data in the SGA. For
optimal performance, the entire SGA should be as large as possible (while still fitting
in to the real memory) to store as much data in memory as possible and to minimise
disk I/O.
The information stored in the SGA is divided into several types of memory structures,
including the database buffers, redo log buffer, and the shared pool.
Program Global Area (PGA)
The Program Global Area (PGA) is a memory buffer that contains data and control
information for a server process. A PGA is created by Oracle when a server process is
started. The information in a PGA depends on the configuration of Oracle.
The architectural features discussed in this section enable the Oracle database to
support:
• many users concurrently accessing a single database, and
• the high performance required by concurrent multiuser, multiapplication
database systems.
Oracle creates a set of background processes for each instance. The background
processes consolidate functions that would otherwise be handled by multiple Oracle
programs running for each user process. They asynchronously perform I/O and
monitor other Oracle processes to provide increased parallelism for better
performance and reliability. There are numerous background processes, and each
Oracle instance can use several background processes.
Process Architecture
A process is a “thread of control” or a mechanism in an operating system that can run
a series of steps. Some operating systems use the terms job or task. A process
generally has its own private memory area in which it runs. An Oracle database server
has two general types of processes: user processes and Oracle processes.
Oracle Processes
Oracle processes are invoked by other processes to perform functions on behalf of the
invoking process. Oracle creates server processes to handle requests from connected
user processes. A server process communicates with the user process and interacts
with Oracle to carry out requests from the associated user process. For example, if a
user queries some data not already in the database buffers of the SGA, then the
associated server process reads the proper data blocks from the datafiles into the SGA.
Oracle can be configured to vary the number of user processes for each server process.
In a dedicated server configuration, a server process handles requests for a single user
process. A shared server configuration lets many user processes share a small number
of server processes, minimising the number of server processes and maximising the
use of available system resources.
On some systems, the user and server processes are separate, while on others they are
combined into a single process. If a system uses the shared server or if the user and
server processes run on different machines, then the user and server processes must be
separate. Client/server systems separate the user and server processes and run them on
different machines.
Let us now discuss a few important processes briefly.
Database Writer (DBWn)
The Database Writer process writes modified (dirty) blocks from the database buffer
cache to the datafiles. If Oracle needs to read blocks requested by users into the cache
and there is no free space in the buffer cache, DBWn writes out the least recently used
blocks. Writing blocks in this order minimises the performance impact of losing
them from the buffer cache.
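The least-recently-used policy can be sketched with an ordered dictionary. The BufferCache class below is a toy stand-in for the database buffer cache and its writer process, not Oracle's actual implementation.

```python
from collections import OrderedDict

# Sketch of an LRU buffer cache: when the cache is full, the block that
# has gone unused longest is written out and evicted to make room.
class BufferCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffers = OrderedDict()       # block id -> contents
        self.written_out = []              # blocks the writer flushed

    def read_block(self, block_id):
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)   # most recently used
        else:
            if len(self.buffers) == self.capacity:
                victim, _ = self.buffers.popitem(last=False)  # LRU block
                self.written_out.append(victim)
            self.buffers[block_id] = f"data-{block_id}"
        return self.buffers[block_id]

cache = BufferCache(capacity=2)
cache.read_block(1)
cache.read_block(2)
cache.read_block(1)        # touch block 1, so block 2 becomes the LRU
cache.read_block(3)        # cache full: block 2 is written out
print(cache.written_out)   # [2]
```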
Process Monitor (PMON)
The Process Monitor process watches over the user processes that access the database.
If a user process terminates abnormally, PMON is responsible for cleaning up any of
the resources left behind (such as memory) and for releasing any locks held by the
failed process.
Archiver (ARC)
The Archiver process reads the redo log files once Oracle has filled them and writes a
copy of the used redo log files to the specified archive log destination(s).
Checkpoint (CKPT)
The Checkpoint process works with DBWR to perform checkpoints. CKPT updates
the control file and database file headers to update the checkpoint data when the
checkpoint is complete.
Recover (RECO)
The Recover process automatically cleans up failed or suspended distributed
transactions.
How Oracle Works?
The following example describes, at the most basic level, the operations Oracle
performs when a user runs a SQL statement against a database:
1) An instance has started on the computer running Oracle (often called the host or
database server).
3) The server is running the proper Oracle Net Services driver. The server detects
the connection request from the application and creates a dedicated server
process on behalf of the user process.
4) The user runs a SQL statement and commits the transaction. For example, the
user changes a name in a row of a table.
5) The server process receives the statement and checks the shared pool for any
shared SQL area that contains a similar SQL statement. If a shared SQL area is
found, then the server process checks the user’s access privileges to the
requested data, and the previously existing shared SQL area is used to process
the statement. If not, then a new shared SQL area is allocated for the statement,
so it can be parsed and processed.
6) The server process retrieves any necessary data values from the actual datafile
(table) or those stored in the SGA.
7) The server process modifies data in the system global area. The DBWn process
writes modified blocks permanently to the disk when doing so is efficient.
Since the transaction is committed, the LGWR process immediately records the
transaction in the redo log file.
8) If the transaction is successful, then the server process sends a message across
the network to the application. If it is not successful, then an error message is
transmitted.
9) Throughout this entire procedure, the other background processes run, watching
for conditions that require intervention. In addition, the database server manages
other users’ transactions and prevents contention between transactions that
request the same data.
4.5 QUERY PROCESSING AND OPTIMISATION
Oracle parses a SQL statement only if a shared SQL area for a similar SQL statement
does not exist in the shared pool. In this case, a new-shared SQL area is allocated, and
the statement is parsed.
The parse stage includes processing requirements that need to be done only once, no
matter how many times the statement is executed. Oracle translates each SQL
statement only once, re-executing that parsed statement during subsequent references
to the statement.
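The parse-once, execute-many behaviour can be sketched by keying a cache on the statement text, in the spirit of shared SQL areas in the shared pool. The parse function here is a trivial stand-in for real parsing.

```python
# Sketch of parse-once-execute-many: the statement text keys a cache of
# "parsed" forms, so re-executing the same statement skips the parse
# step, as the shared pool does for shared SQL areas.
shared_pool = {}
parse_count = 0

def parse(sql_text):
    global parse_count
    parse_count += 1
    return ("PARSED", sql_text)        # stand-in for a real parse tree

def execute(sql_text):
    if sql_text not in shared_pool:    # no shared SQL area yet
        shared_pool[sql_text] = parse(sql_text)
    plan = shared_pool[sql_text]
    return plan                        # a real system would now run it

for _ in range(3):
    execute("SELECT last_name FROM employees")
execute("SELECT salary FROM employees")
print(parse_count)   # 2: one parse per distinct statement text
```

Four executions cost only two parses, which is exactly the saving the shared pool provides for frequently repeated statements.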
Although parsing a SQL statement validates that statement, parsing only identifies
errors that can be found before the execution of the statement. Thus, some errors
cannot be caught by parsing. For example, errors in data conversion or errors in data
(such as an attempt to enter duplicate values in a primary key) and deadlocks are all
errors or situations that can be encountered and reported only during the execution
stage.
Note: Queries are different from other types of SQL statements because, if successful,
they return data as results, whereas other statements simply return success or failure.
A query can return one row or thousands of rows. The results of a query
are always in tabular format, and the rows of the result are fetched (retrieved), either a
row at a time or in groups.
Several issues are related only to query processing. Queries include not only explicit
SELECT statements but also the implicit queries (subqueries) in other SQL
statements. For example, each of the following statements requires a query as a part of
its execution:
INSERT INTO table SELECT...
Stage 4: Define Output of a Query
In the define stage of a query, you specify the location, size, and datatype of variables
defined to receive each fetched value. Oracle performs datatype conversion if
necessary.
Stage 5: Bind any Variables
At this point, Oracle knows the meaning of the SQL statement but still does not have
enough information to execute the statement. Oracle needs values for any variables
listed in the statement; for example, Oracle needs a value for DEPT_NUMBER. The
process of obtaining these values is called binding variables.
A program must specify the location (memory address) of the value. End users of
applications may be unaware that they are specifying bind variables, because the
Oracle utility can simply prompt them for a new value.
Because you specify the location (binding by reference), you need not rebind the
variable before re-execution. You can change its value and Oracle will look up the
value on each execution, using the memory address.
You must also specify a datatype and length for each value (unless they are implied or
defaulted) in order for Oracle to perform datatype conversions.
For some statements you can specify a number of executions to be performed. This is
called array processing. Given n number of executions, the bind and define locations
are assumed to be the beginning of an array of size n.
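Both ideas, binding values into a fixed statement text and array processing, can be sketched with Python's sqlite3 module: qmark placeholders stand in for Oracle bind variables such as :dept_number, and executemany supplies n sets of bind values for a single statement. The table and names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept_number INTEGER)")

# Array processing: one statement, n sets of bind values.
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("King", 90), ("Hunold", 60), ("Ernst", 60)])

# Binding: the statement text stays fixed; only the bound value changes
# between executions, so the cached parsed form can be reused.
stmt = "SELECT name FROM emp WHERE dept_number = ?"
for dept in (60, 90):
    print(dept, con.execute(stmt, (dept,)).fetchall())
```

Because the statement text is identical on every execution, only the bound value changes, which is what lets the parsed form be reused rather than reparsed.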
Stage 8: Fetch Rows of a Query
In the fetch stage, rows are selected and ordered (if requested by the query), and each
successive fetch retrieves another row of the result until the last row has been fetched.
Stage 9: Close the Cursor
The final stage of processing a SQL statement is closing the cursor.
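The stages above map naturally onto the cursor lifecycle exposed by any call-level interface. As a hedged sketch, the following snippet uses Python's standard sqlite3 module (not Oracle) to act out parsing, binding a variable, array processing, fetching row by row, and closing the cursor; the `dept` table and its data are invented for illustration.

```python
import sqlite3

# In-memory database standing in for Oracle; the dept table and its
# columns (dept_number, dept_name) are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_number INTEGER, dept_name TEXT)")

# Array processing: one bind location, n executions of the same statement.
rows = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]
conn.executemany("INSERT INTO dept VALUES (?, ?)", rows)

# Open a cursor, parse the statement, bind DEPT_NUMBER, and execute.
cur = conn.cursor()
cur.execute("SELECT dept_name FROM dept WHERE dept_number = ?", (20,))

# Fetch stage: retrieve a row at a time until the last row has been fetched.
result = []
row = cur.fetchone()
while row is not None:
    result.append(row[0])
    row = cur.fetchone()

cur.close()    # final stage: close the cursor
print(result)  # ['RESEARCH']
```

Because the bind value is passed by reference on each `execute`, re-running the statement with a different tuple needs no re-parse, mirroring the rebinding behaviour described above.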
Query Optimisation
An important facet of database system performance tuning is the tuning of SQL
statements. SQL tuning involves three basic steps:
• Identifying high load or top SQL statements that are responsible for a large
share of the application workload and system resources, by reviewing past SQL
execution history available in the system.
• Verifying that the execution plans produced by the query optimiser for these
statements perform reasonably.
• Implementing corrective actions to generate better execution plans for poorly
performing SQL statements.
These three steps are repeated until the system performance reaches a satisfactory
level or no more statements can be tuned.
A SQL statement can be executed in many different ways, such as full table scans,
index scans, nested loops, and hash joins. The Oracle query optimiser determines the
most efficient way to execute a SQL statement after considering many factors related
to the objects referenced and the conditions specified in the query. This determination
is an important step in the processing of any SQL statement and can greatly affect
execution time.
The query optimiser determines the most efficient execution plan by considering available
access paths and by factoring in information based on statistics for the schema objects
(tables or indexes) accessed by the SQL statement. The query optimiser also considers
hints, which are optimisation suggestions placed in a comment in the statement.
2) The optimiser estimates the cost of each plan based on statistics in the data
dictionary for data distribution and storage characteristics of the tables, indexes,
and partitions accessed by the statement.
The cost is an estimated value proportional to the expected resource use needed
to execute the statement with a particular plan. The optimiser calculates the cost
of access paths and join orders based on the estimated computer resources,
which includes I/O, CPU, and memory.
Serial plans with higher costs take more time to execute than those with lower
costs. When using a parallel plan, however, resource use is not directly related
to elapsed time.
3) The optimiser compares the costs of the plans and chooses the one with the
lowest cost.
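The optimiser's choice among access paths can be observed in any cost-based SQL engine. As an illustrative stand-in for Oracle's EXPLAIN PLAN facility, this sketch uses SQLite's EXPLAIN QUERY PLAN (via Python's sqlite3) to show the access path switching from a full table scan to an index scan once an index exists; the table and index names are invented.

```python
import sqlite3

# sqlite3 stands in for Oracle here; EXPLAIN QUERY PLAN plays the role
# of inspecting the optimiser's chosen access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, dept_number INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(i, i % 10) for i in range(1000)])

# Without an index, the only available access path is a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE emp_id = 500").fetchall()
before = plan[0][3]   # detail column, e.g. 'SCAN emp'

# After an index is created, the optimiser can choose an index scan instead.
conn.execute("CREATE INDEX emp_pk ON emp (emp_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE emp_id = 500").fetchall()
after = plan[0][3]    # e.g. 'SEARCH emp USING INDEX emp_pk (emp_id=?)'

print(before, "->", after)
```

The cheaper plan is picked automatically; no change to the SELECT statement itself is needed, which is exactly the property that makes SQL tuning a matter of statistics and access structures rather than query rewriting by hand.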
4.7.3 Transportable Tablespaces
Advanced Queuing and Streams
Oracle messaging adds the capability to develop and deploy a content-based publish
and subscribe solution using the rules engine to determine relevant subscribing
applications. As new content is published to a subscriber list, the rules on the list
determine which subscribers should receive the content. This approach means that a
single list can efficiently serve the needs of different subscriber communities.
In the first release of Oracle9i, AQ added XML support and Oracle Internet Directory
(OID) integration. This technology is leveraged in Oracle Application Interconnect
(OAI), which includes adapters to non-Oracle applications, messaging products, and
databases.
The second release of Oracle9i introduced Streams. Streams have three major
components: log-based replication for data capture, queuing for data staging, and user-
defined rules for data consumption. Oracle Database 10g includes support for change
data capture and file transfer solutions via Streams.
information about the Oracle environment. EM can also manage Oracle’s Application
Server, Collaboration Suite, and E-Business Suite.
Prior to the Oracle8i database, the EM software was installed on Windows-based
systems. Each repository was accessible only by a single database manager at a time.
EM evolved to a Java release providing access from a browser or Windows-based
system. Multiple database administrators could then access the EM repository at the
same time.
More recently, an EM HTML console was released with Oracle9iAS. This console has
important new application performance management and configuration management
features. The HTML version supplemented the Java-based Enterprise Manager earlier
available. Enterprise Manager 10g, released with Oracle Database 10g, also comes in
Java and HTML versions. EM can be deployed in several ways: as a central console
for monitoring multiple databases leveraging agents, as a “product console” (easily
installed with each individual database), or through remote access, also known as
“studio mode”. The HTML-based console includes advanced management capabilities
for rapid installation, deployment across grids of computers, provisioning, upgrades,
and automated patching.
Oracle Enterprise Manager 10g has several additional options (sometimes called
packs) for managing the Oracle Enterprise Edition database. These options, which are
available for the HTML-based console, the Java-based console, or both, include:
• Database Diagnostics Option
• Application Server Diagnostics Option
• Database Tuning Option
• Database Change Management Option
• Database Configuration Management Option
• Application Server Configuration Management Option.
Standard management pack functionality for managing the Standard Edition is now
also available for the HTML-based console.
Recovery Manager
Typical backups include complete database backups (the most common type),
tablespace backups, datafile backups, control file backups, and archived redo log
backups. Oracle8 introduced the Recovery Manager (RMAN) for server-managed
backup and recovery of the database. Previously, Oracle’s Enterprise Backup Utility
(EBU) provided a similar solution on some platforms. However, RMAN, with its
Recovery Catalogue stored in an Oracle database, provides a much more complete
solution. RMAN can automatically locate, back up, restore, and recover datafiles,
control files, and archived redo logs. Since Oracle9i, RMAN can restart backups and
restores, and can implement recovery window policies that control when backups
expire. The Oracle
Enterprise Manager Backup Manager provides a GUI-based interface to RMAN.
Oracle Enterprise Manager 10g introduces a new improved job scheduler that can be
used with RMAN and other scripts, and that can manage automatic backups to disk.
Incremental Backup and Recovery
Oracle Storage Manager and Automated Disk Based Backup and Recovery
Various media-management software vendors support RMAN. Since Oracle8i, a
Storage Manager has been developed with Oracle to provide media-management
services, including the tracking of tape volumes, for up to four devices. RMAN
interfaces automatically with the media-management software to request the mounting
of tapes as needed for backup and recovery operations.
Oracle Database 10g introduces Automated Disk Based Backup and Recovery. The
disk acts as a cache, and archives and backups can then be copied to tape. The disk
“cache” can also serve as a staging area for recovery.
Although the Oracle Lite Database engine runs on a much smaller platform than other
Oracle implementations (it requires a 50K to 1 MB footprint depending on the
platform), Mobile SQL, C++, and Java-based applications can run against the
database. ODBC is also supported. Java support includes Java stored procedures
and JDBC. The database is self-tuning and self-administering. In addition to
Windows-based laptops, Oracle Lite also supports handheld devices running on
WindowsCE, Palm’s Computing Platform, and Symbian EPOC.
• Data must be read and modified in a consistent fashion. The data a user is
viewing or changing is not changed (by other users) until the user is finished
with the data.
• High performance is required for maximum productivity from the many users
of the database system.
4.11.1 Concurrency
A primary concern of a multiuser database management system is controlling
concurrency, which is the simultaneous access of the same data by many users.
Without adequate concurrency controls, data could be updated or changed improperly,
compromising data integrity.
One way to manage data concurrency is to make each user wait for a turn. The goal of
a database management system is to reduce that wait so it is either nonexistent or
negligible to each user. All data manipulation language statements should proceed
with as little interference as possible, and undesirable interactions among concurrent
transactions must be prevented. Neither performance nor data integrity can be
compromised.
Oracle resolves such issues by using various types of locks and a multiversion
consistency model. These features are based on the concept of a transaction. It is the
application designer’s responsibility to ensure that transactions fully exploit these
concurrency and consistency features.
Oracle’s multiversion consistency model:
• Guarantees that the set of data seen by a statement is consistent with respect to a
single point in time and does not change during statement execution (statement-
level read consistency).
• Ensures that readers of database data do not wait for writers or other readers of
the same data.
• Ensures that writers of database data do not wait for readers of the same data.
• Ensures that writers only wait for other writers if they attempt to update
identical rows in concurrent transactions.
Only when a transaction is committed are the changes of the transaction made
permanent. Statements that start after the user’s transaction is committed only see the
changes made by the committed transaction.
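This visibility rule can be acted out with two database sessions. The following sketch uses two sqlite3 connections to the same file as a stand-in for two Oracle sessions; the `accounts` table is invented. The reading session sees the writer's insert only after COMMIT.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file stand in for two Oracle sessions.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)
writer.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
writer.commit()

# The writer changes data but has not committed yet.
writer.execute("INSERT INTO accounts VALUES (1, 100)")
before = reader.execute("SELECT COUNT(*) FROM accounts").fetchall()[0][0]

# Only after COMMIT do other sessions see the change.
writer.commit()
after = reader.execute("SELECT COUNT(*) FROM accounts").fetchall()[0][0]
print(before, after)  # 0 1
```

Note that SQLite uses file-level locking rather than Oracle's multiversion model, so this illustrates only the commit-visibility rule, not reader/writer non-blocking behaviour.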
The transaction is the key to Oracle’s strategy for providing read consistency. This
unit of committed (or uncommitted) SQL statements dictates the start point for read-
consistent views generated on behalf of readers, and controls when modified data can
be seen by other transactions of the database for reading or updating.
Oracle also uses locks to control concurrent access to data. When updating
information, the data server holds that information with a lock until the update is
submitted or committed. Until that happens, no one else can make changes to the
locked information. This ensures the data integrity of the system.
Oracle provides unique non-escalating row-level locking. Unlike other data servers
that “escalate” locks to cover entire groups of rows or even the entire table, Oracle
always locks only the row of information being updated. Because Oracle includes the
locking information with the actual rows themselves, Oracle can lock an unlimited
number of rows so users can work concurrently without unnecessary delays.
Automatic Locking
Oracle locking is performed automatically and requires no user action. Implicit
locking occurs for SQL statements as necessary, depending on the action requested.
Oracle’s lock manager automatically locks table data at the row level. By locking
table data at the row level, contention for the same data is minimised.
Oracle’s lock manager maintains several different types of row locks, depending on
the type of operation that established the lock. The two general types of locks are:
exclusive locks and share locks. Only one exclusive lock can be placed on a resource
(such as a row or a table); however, many share locks can be placed on a single
resource. Both exclusive and share locks always allow queries on the locked resource
but prohibit other activity on the resource (such as updates and deletes).
Manual Locking
Under some circumstances, a user might want to override default locking. Oracle
allows manual override of automatic locking features at both the row level (by first
querying for the rows that will be updated in a subsequent statement) and the table
level.
You can scale applications in RAC environments to meet increasing data processing
demands without changing the application code. As you add resources such as nodes
or storage, RAC extends the processing powers of these resources beyond the limits of
the individual components.
4.11.5 Portability
Oracle provides unique portability across all major platforms and ensures that your
applications run without modification after changing platforms. This is because the
Oracle code base is identical across platforms, so you have identical feature
functionality across all platforms, for complete application transparency. Because of
this portability, you can easily upgrade to a more powerful server as your
requirements change.
In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications that
manage the process of gathering data and delivering it to business users.
Materialised views, stored in the same database as their base tables, can improve
query performance through query rewrites. Query rewrites are particularly useful in a
data warehouse environment.
Fully indexing a large table with a traditional B-tree index can be prohibitively
expensive in terms of space because the indexes can be several times larger than the
data in the table. Bitmap indexes are typically only a fraction of the size of the
indexed data in the table.
Oracle OLAP provides the query performance and calculation capability, previously
found only in multidimensional databases to Oracle’s relational platform. In addition,
it provides a Java OLAP API that is appropriate for the development of internet-ready
analytical applications. Unlike other combinations of OLAP and RDBMS technology,
Oracle OLAP is not a multidimensional database using bridges to move data from the
relational data store to a multidimensional data store. Instead, it is truly an OLAP-
enabled relational database. As a result, Oracle provides the benefits of a
multidimensional database along with the scalability, accessibility, security,
manageability, and high availability of the Oracle database. The Java OLAP API,
which is specifically designed for internet-based analytical applications, offers
productive data access.
4.12.9 Partitioning
Partitioning addresses key issues in supporting very large tables and indexes by letting
you decompose them into smaller and more manageable pieces called partitions. SQL
queries and DML statements do not need to be modified in order to access partitioned
tables. However, after partitions are defined, DDL statements can access and
manipulate individual partitions rather than entire tables or indexes. This is how
partitioning can simplify the manageability of large database objects. Also,
partitioning is entirely transparent to applications.
Associated with each database user is a schema by the same name. By default, each
database user creates and has access to all objects in the corresponding schema.
Database security can be classified into two categories: system security and data
security.
System security includes mechanisms that control the access and use of the database
at the system level. For example, system security includes:
• valid user name/password combinations,
• the amount of disk space available to a user’s schema objects, and
• the resource limits for a user.
Data security includes mechanisms that control the access and use of the database at
the schema object level. For example, data security includes:
• users with access to a specific schema object and the specific types of actions
permitted to each user on the schema object (for example, user MOHAN can
issue SELECT and INSERT statements but not DELETE statements using the
employees table),
• the actions, if any, that are audited for each schema object, and
• data encryption to prevent unauthorised users from bypassing Oracle and
accessing the data.
Security Mechanisms
The Oracle database provides discretionary access control, which is a means of
restricting access to information based on privileges. The appropriate privilege must
be assigned to a user in order for that user to access a schema object. Appropriately
privileged users can grant other users privileges at their discretion.
application. Oracle provides integrity constraints and database triggers to manage data
integrity rules.
Database triggers let you define and enforce integrity rules, but a database trigger is
not the same as an integrity constraint. Among other things, a database trigger does
not check data already loaded into a table. Therefore, it is strongly recommended that
you use database triggers only when the integrity rule cannot be enforced by integrity
constraints.
Integrity Constraints
An integrity constraint is a declarative way to define a business rule for a column of a
table. An integrity constraint is a statement about table data that is always true and
that follows these rules:
• If an integrity constraint is created for a table and some existing table data does
not satisfy the constraint, then the constraint cannot be enforced.
Keys
Keys are used to define several types of integrity constraints. A key is the column or set
of columns included in the definition of certain types of integrity constraints. Keys
describe the relationships between the different tables and columns of a relational
database. Individual values in a key are called key values.
The different types of keys include:
• Primary key: The column or set of columns included in the definition of a
table’s PRIMARY KEY constraint. A primary key’s value uniquely identifies
the rows in a table. Only one primary key can be defined for each table.
• Referenced key: The unique key or primary key of the same or a different table
referenced by a foreign key.
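Primary and referenced (foreign) keys can be sketched in any SQL engine. The following uses Python's sqlite3 as a stand-in for Oracle; the `dept` and `emp` tables are invented. Both a duplicate primary key value and a foreign key with no matching referenced key raise integrity errors.

```python
import sqlite3

# sqlite3 stands in for Oracle; the PRAGMA enables foreign key enforcement,
# which SQLite leaves off by default.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (dept_number INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE emp (
                  emp_id INTEGER PRIMARY KEY,
                  dept_number INTEGER REFERENCES dept (dept_number))""")
conn.execute("INSERT INTO dept VALUES (10)")
conn.execute("INSERT INTO emp VALUES (1, 10)")  # satisfies both constraints

violations = []
for stmt in ("INSERT INTO dept VALUES (10)",    # duplicate primary key value
             "INSERT INTO emp VALUES (2, 99)"): # no such referenced key
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        violations.append(stmt.split()[2])      # record the offending table
print(violations)  # ['dept', 'emp']
```

The point of the declarative style is visible here: neither rule required any procedural checking code, only the constraint declarations on the tables.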
Triggers
Triggers are procedures written in PL/SQL, Java, or C that run (fire) implicitly
whenever a table or view is modified or when some user action or database system
action occurs.
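A minimal row trigger can be sketched as follows, with sqlite3 standing in for Oracle (where the trigger body would be written in PL/SQL rather than SQLite's trigger syntax); the `emp` and `salary_audit` tables and the trigger name are invented. The AFTER UPDATE row trigger records each salary change automatically.

```python
import sqlite3

# sqlite3 stands in for Oracle; OLD and NEW give the before and after
# images of the row being modified, as in Oracle row triggers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER, salary INTEGER);
    CREATE TABLE salary_audit (emp_id INTEGER, old_sal INTEGER, new_sal INTEGER);
    CREATE TRIGGER emp_salary_audit
    AFTER UPDATE OF salary ON emp
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.emp_id, OLD.salary, NEW.salary);
    END;
""")
conn.execute("INSERT INTO emp VALUES (1, 1000)")
conn.execute("UPDATE emp SET salary = 1200 WHERE emp_id = 1")  # fires the trigger
audit = conn.execute("SELECT * FROM salary_audit").fetchall()
print(audit)  # [(1, 1000, 1200)]
```

No application code wrote the audit row; the trigger fired implicitly when the table was modified, which is the defining property described above.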
Transactions let users guarantee consistent changes to data, as long as the SQL
statements within a transaction are grouped logically. A transaction should consist of
all necessary parts for one logical unit of work – no more and no less. Data in all
referenced tables are in a consistent state before the transaction begins and after it
ends. Transactions should consist of only SQL statements that make one consistent
change to the data.
Consider a banking database. When a bank customer transfers money from a savings
account to a checking account, the transaction can consist of three separate operations:
decrease in the savings account, increase in the checking account, and recording the
transaction in the transaction journal.
The transfer of funds (the transaction) includes increasing one account (one SQL
statement), decreasing another account (one SQL statement), and recording the
transaction in the journal (one SQL statement). All actions should either fail or
succeed together; the credit should not be committed without the debit. Other non-
related actions, such as a new deposit to one account, should not be included in the
transfer of funds transaction. Such statements should be in other transactions.
Oracle must guarantee that all three SQL statements are performed to maintain
accounts accurately. When something prevents one of the statements in the transaction
from running (such as a hardware failure), then the other statements of the transaction
must be undone. This is called rolling back. If an error occurs in making any of the
updates, then no updates are made.
Committing a transaction makes permanent the changes resulting from all SQL
statements in the transaction. The changes made by the SQL statements of a
transaction become visible only to other user sessions’ transactions that start after the
transaction is committed.
Rolling back a transaction retracts any of the changes resulting from the SQL
statements in the transaction. After a transaction is rolled back, the affected data is left
unchanged, as if the SQL statements in the transaction were never run.
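The funds-transfer example can be sketched as a single atomic unit of work. This illustration uses Python's sqlite3 in place of Oracle; the account names and amounts are invented. All three statements are committed together, or the exception handler rolls them all back.

```python
import sqlite3

# sqlite3 stands in for Oracle; the accounts and journal tables are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (acct TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE journal (acct_from TEXT, acct_to TEXT, amount INTEGER);
    INSERT INTO accounts VALUES ('savings', 500), ('checking', 100);
""")
conn.commit()

def transfer(conn, src, dst, amount):
    """All three statements succeed together or are rolled back together."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE acct = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE acct = ?",
                     (amount, dst))
        conn.execute("INSERT INTO journal VALUES (?, ?, ?)", (src, dst, amount))
        conn.commit()    # make all three changes permanent
    except sqlite3.Error:
        conn.rollback()  # undo the partial work

transfer(conn, "savings", "checking", 200)
balances = dict(conn.execute("SELECT acct, balance FROM accounts"))
print(balances)  # {'savings': 300, 'checking': 300}
```

The design choice mirrors the text: the transfer contains exactly the statements for one logical unit of work, and an unrelated deposit would belong in a separate transaction.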
Savepoints
Savepoints divide a long transaction with many SQL statements into smaller parts.
With savepoints, you can arbitrarily mark your work at any point within a long
transaction. This gives you the option of later rolling back all work performed from
the current point in the transaction to a declared savepoint within the transaction.
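A savepoint sketch follows, again with sqlite3 standing in for Oracle (SQLite shares the SAVEPOINT and ROLLBACK TO syntax); the table and savepoint names are invented. Only the work done after the savepoint is undone, and the rest of the transaction commits normally.

```python
import sqlite3

# isolation_level=None gives explicit transaction control in sqlite3,
# so BEGIN, SAVEPOINT, and COMMIT can be issued directly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT after_first")    # mark this point in the work
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO after_first")  # undo only the second insert
conn.execute("COMMIT")

rows = conn.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)]
```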
• RETURN statement.
Oracle’s triggers differ from the standard as follows:
• Oracle does not provide the optional syntax FOR EACH STATEMENT for the
default case, the statement trigger.
• Oracle does not support OLD TABLE and NEW TABLE; the transition tables
specified in the standard (the multiset of before and after images of affected
rows) are not available.
• The trigger body is written in PL/SQL, which is functionally equivalent to the
standard’s procedural language PSM, but not the same.
• In the trigger body, the new and old transition variables are referenced
beginning with a colon.
• Oracle’s row triggers are executed as the row is processed, instead of buffering
them and executing all of them after processing all rows. The standard’s
semantics are deterministic, but Oracle’s in-flight row triggers are more
performant.
• Oracle’s before row and before-statement triggers may perform DML
statements, which is forbidden in the standard. On the other hand, Oracle’s
after-row statements may not perform DML, while it is permitted in the
standard.
• When multiple triggers apply, the standard says they are executed in order of
definition; in Oracle the execution order is non-deterministic.
In addition to traditional structured data, Oracle is capable of storing, retrieving, and
processing more complex data.
• Object types, collection types, and REF types provide support for complex
structured data. A number of standard-compliant multiset operators are now
supported by the nested table collection type.
• Large objects (LOBs) provide support for both character and binary
unstructured data. A single LOB can reach a size of 8 to 128 terabytes, depending
on the size of the database block.
• The XML datatype provides support for semistructured XML data.
Check Your Progress 4
1) What is ETL?
……………………………………………………………………………………
……………………………………………………………………………………
2) What is Parallel Execution?
……………………………………………………………………………………
……………………………………………………………………………………
3) With Oracle …………………………………….data never leaves the database.
4.17 SUMMARY
This unit provided an introduction to Oracle, one of the commercial DBMS in the
market. Some other products include MS SQL Server, DB2 by IBM and so on. Oracle,
being a commercial DBMS, supports the features of the Relational Model and also
some of the Object oriented models. It has a very powerful Database Application
Development Feature that is used for database programming. Oracle supports the
Structured Query Language (SQL) and the use of embedded languages including C,
JAVA etc.
The Oracle Architecture defines the physical and logical database components of the
database that is stored in the Oracle environment. All database objects are defined
using a schema definition. This information is stored in the data dictionary. This data
dictionary is used actively to find the schema objects, integrity and security
constraints on those objects etc. Oracle defines an instance as the present database
state. There are many Oracle background processes that support various operations on
an Oracle database. Database buffers and shared structures are held in the SGA area of
memory.
Oracle follows the relational model, and its core technology supports query
optimisation. Query optimisation is needed for commercial databases, as the size of
such databases may be very high. Oracle supports indexing using the B-tree structure;
in addition, bitmap indexes make some queries even faster. Oracle also supports
distributed database
technology. It supports replication, transportability of these replicas and advanced
queuing and streams.
Oracle has many tools for system administration. These range from basic user
management to backup and recovery tools. Oracle Lite is the version of Oracle used
for mobile databases. Oracle uses standard implementation methods for concurrency
control. Oracle has Data Warehousing capabilities. It contains Extraction,
Transformation and Loading (ETL) tools, materialised views, bitmap indexes, table
compression and Data mining and OLAP in support of a Data Warehouse. Oracle
supports both system and database security features.
The unit also explained data integrity, transactions, SQL Variations and Extensions in
Oracle.
4.18 SOLUTIONS/ANSWERS
Check Your Progress 1
1) These statements create, alter, maintain and drop schema objects.
2) Transaction Control Statements.
3) (a) Character
(b) Numeric
(c) DATE data type
(d) LOB data type
(e) RAW and LONG RAW
(f) ROWID and UROWID data types
4) Oracle Forms Developer
Oracle Reports Developer
Oracle Designer
Oracle JDeveloper
Oracle Discoverer Administrative Edition
Oracle Portal.
4) It allows non-Oracle data and services to be accessed from an Oracle database
through generic connectivity.
5)
(1) Basic replication,
(2) Advanced replication,
(3) Transportable replication,
(4) Advanced queuing and streams, and
(5) Extraction, transformation, loading.
7)
(a) Complete database,
(b) Tablespace,
(c) Data file,
(d) Control file, and
(e) Archived Redo Log.
8) Oracle Lite.
9) Real Application Clusters.