Customizing the Informix Dynamic Server for Your Environment
An administration free zone to reduce the cost of systems management
Chuck Ballard
Rosario Annino
Alan Caldera
Sergio Dias
Jacques Roy
Santosh Sajip
Vinayak Shenoi
Robert Uleman
Suma Vinod
ibm.com/redbooks
International Technical Support Organization
June 2008
SG24-7522-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page xi.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
The team that wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
5.6.3 Scheduling a procedure to run at regular intervals . . . . . . . . . . . . . 214
5.6.4 Viewing the task in the Open Admin Tool . . . . . . . . . . . . . . . . . . . . 216
9.6 Communicating with the outside world . . . . . . . . . . . . . . . . . . . . . . . . . . 392
9.6.1 Sending information to a file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
9.6.2 Misbehaved functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
9.6.3 Calling a user-defined function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
9.6.4 Sending a signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
9.6.5 Opening a network connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
9.6.6 Integrating message queues. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
9.6.7 Other possibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
9.7 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks or registered trademarks of other companies in the United States, other countries, or both:
SAP, and SAP logos are trademarks or registered trademarks of SAP AG in Germany and in several other
countries.
Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation
and/or its affiliates.
Adobe, and Portable Document Format (PDF) are either registered trademarks or trademarks of Adobe
Systems Incorporated in the United States, other countries, or both.
EJB, Enterprise JavaBeans, J2EE, Java, Java runtime environment, JavaBeans, JavaSoft, JDBC, JDK, JRE,
JVM, Solaris, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.
Excel, Internet Explorer, Microsoft, PowerPoint, Virtual Earth, Visual Basic, Visual C++, Visual Studio,
Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Intel, Pentium 4, Pentium, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered
trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
We certainly realize that there are many other functions, features, and advantages of using IDS that come into play as you customize. An in-depth focus on all of them is beyond the scope of this document, but we briefly touch on some of them.
All these capabilities can result in a lower total cost of ownership (TCO). For
example, many of the typical database administrator operations are
self-managed by the IDS database, making it nearly hands free. Administration activities can also be controlled from within an application via the SQL
API. IDS customers report that they are using one-third or less of the staff
typically needed to manage other database products. Shortened development
cycles are also realized due to rapid deployment capabilities and the choice of
application development environments and languages.
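For example, a minimal sketch of what this looks like through the SQL administration API follows; the dbspace name, chunk path, and size are hypothetical placeholders, not values from this book:
-- Connect to the sysadmin database, which exposes the SQL administration API.
DATABASE sysadmin;
-- Create a dbspace entirely from SQL; the name, path, and size below are
-- illustrative only and must be adapted to your system and IDS version.
EXECUTE FUNCTION task('create dbspace', 'datadbs1',
                      '/ifxchunks/datadbs1_chunk1', '500 MB', '0');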
There are also flexible choices for business continuity with replication and the
Continuous Availability (CA) Feature for shared disk cluster solutions. This
means that there is no necessity for a “one-size-fits-all” solution. You can
customize IDS to your environment.
The Continuous Availability Feature offers significant cost savings with support
for cluster solutions, providing scalability to meet growing business demands,
and failover recovery from any server to ensure continuous business operations.
It provides the ability for a secondary server to automatically take over in the
case of a system failure, accessing the same data disk.
All of this calls for a data server that is flexible and can accommodate change
and growth in applications, data volume, and numbers of users. It must also be
able to scale in performance as well as in functionality. The new suite of business
availability functionality provides greater flexibility and performance in backing up
and restoring an instance, automated statistical and performance metric
gathering, improvements in administration, and reductions in the cost to operate
the data server.
The technology used by IDS enables efficient use of existing hardware and
software, including single- and multi-processor architectures. It also helps you
keep up with technological growth, including the requirement to support complex
applications, which often calls for the use of nontraditional or rich data types that
cannot be stored in simple character or numeric form.
Built on the IBM Informix Dynamic Scalable Architecture (DSA), IDS provides
one of the most effective solutions available, including a next-generation parallel
data server architecture that delivers mainframe-caliber scalability, manageability
and performance, minimal operating system overhead, automatic distribution of
workload, and the capability to extend the server to handle new types of data.
IDS delivers proven technology that efficiently integrates new and complex data
directly into the database. It handles time-series, spatial, geodetic, Extensible
Markup Language (XML), video, image and other user-defined data side by side
with traditional data to meet today’s most rigorous data and business demands. It also helps businesses lower their TCO through its well-regarded ease of use and administration, as well as its support of existing standards for development tools and systems infrastructure. IDS is a development-neutral
environment and supports a comprehensive array of application development
tools for rapid deployment of applications under Linux®, UNIX®, and Microsoft®
Windows® operating environments.
Alan Caldera is currently a team lead for the IBM Informix
DataBlade development group and member of the IDS
Architecture Board. He has over 20 years of experience in
the IT industry as a software developer, database
administrator, and consultant. Alan joined the Informix
Software Professional Services Organization in 1998 as a
consultant, working with customers and business partners
on DataBlade implementations, application design,
replication, and IDS performance tuning. He holds a
bachelor’s degree in computer science from Indiana
University.
Thanks to the following people who have either contributed directly to the content
of this book or to its development and publication:
A special thanks to:
– Alexander Koerner for his valuable contributions to this book and ongoing
support of the ITSO. Alexander is a member of the IBM worldwide sales
enablement organization, located in Munich, Germany.
– Donald Payne for his significant contribution to this book, particularly in
the area of Informix DataBlades. Donald is an IT Specialist with the
WorldWide Enablement Center in New York, NY, USA.
– Prasad Mujumdar for his valuable contributions and technical guidance in
the area of user-defined access methods. Prasad is a senior member of
the IDS development team, located in San Jose, CA, USA.
– Keshava Murthy for his valuable contributions and technical guidance in
the area of user-defined access methods. Keshava is an IDS
SQL/Extensibility Architect, located in San Jose, CA, USA.
Thanks also to the following people for their contributions to this project:
– From IBM Locations Worldwide
• Cindy Fung, Software Engineer, IDS Product Management, Menlo
Park, CA
• Pat Moffatt, Program Manager, Education Planning and Development,
Markham, Ontario, Canada
– From the International Technical Support Organization
• Mary Comianos, Publications Management
• Emma Jacobs, Graphics
• Deanna Polm, Residency Administration
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you will develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
With such capabilities, you can perform tasks such as the following:
Perform implementation and maintenance with minimal resources
Easily embed IDS in application systems
Create your own “administration free zone” (See Chapter 5, “The
administration free zone” on page 161, for details.)
Through the use of IBM Informix DataBlade technology, the capabilities of the database can be extended to meet specific organizational requirements. All types of data can be managed, including text, images, sound, video, time series, and spatial data.
All of this calls for a data server that is flexible and can accommodate change
and growth, in applications, data volume, and numbers of users. It must be able
to scale in performance and functionality. The IDS suite of business availability
functionality provides greater flexibility for backing up and restoring an instance,
automated statistical and performance metrics gathering, improvements in
administration, and reductions in the cost to operate and maintain.
The technology used by IDS enables efficient use of existing hardware and
software, including single- and multi-processor architectures. It also helps you
keep up with technological growth, including such requirements as more complex
application support, which often calls for the use of nontraditional or rich data
types that cannot be stored in simple character or numeric form.
IDS has all this, and better yet, it can all be easily customized to support you and
your particular business environment. How? Let us take a look.
From a simplistic point of view, a transaction can be defined as a unit of work that
consists of receiving input data, performing some processing on that data, and
delivering a response. An OLTP system is one that manages the execution of
these transaction applications. All of this sounds fairly straightforward, except
that, over time, expanded meanings have been given to the term OLTP, as in the
following examples:
A category of transactions, with specific characteristics
An environment for processing transactions online
A category of applications, with specific characteristics
A process with specific characteristics
A category of tools used to manage the transaction processing
A set of specific requirements for transaction completion and recovery
A specific database organization that best supports transaction processing
Processing characteristics that differentiate OLTP from other categories
Therefore, you must be familiar with all of these particulars as you discuss OLTP.
For example, the following characteristics, among others, differentiate OLTP:
Processing the data results in new, or changed, data.
The duration of processing a transaction is relatively short.
The volume of data processed in a single transaction is small.
It typically deals with large numbers of transactions.
A transaction recovery process is required so that data is not lost.
The number of database accesses during a transaction is small.
Processing time must be short, since it is performed online.
The entire transaction must be completed successfully or rolled back.
DSS applications are typically read-only and, therefore, do not have the stringent recovery and availability requirements of OLTP. If an online DSS application fails for any reason, it can simply be rerun after recovery. Nevertheless, because a DSS application can be run online, it must still be managed by the system.
The set of characteristics and requirements for DSS can be quite different from
OLTP, as in the following examples:
The applications typically access multiple databases and servers.
The duration of the application processing is relatively long.
The volume of data processed by the application is large.
Typically fewer, longer running applications are processed.
A recovery process might not be required in a read-only environment.
The number of database accesses during a transaction is larger.
Processing time can be long, even when performed online.
Application results need to be repeatable.
The business world contains many similar, but distinct, operating environments, each shaped by differing business requirements.
Throughout this book, we describe the capabilities of IDS that enable it to be
customized to support all of them. The message that we want to convey is that
IDS can be customized to support your environment.
In this book, we discuss and describe many of the IDS capabilities for
customizing your environment. However, it is not our intention to provide detailed
function and feature descriptions, but simply to discuss some of them in the
context of how and where to use them and how you can use them to help you
customize IDS to support your particular business requirements.
Figure 1-1 IDS capabilities positioning (keywords: secure, adaptable, rapid response, agile, fast to changing needs, flexible, hidden, efficient, invisible, minimal and easy, affordable)
Each of the categories and capabilities upon which the IDS positioning is based
is briefly described as follows:
Resilient
The business environment today is more global in nature, and the demand for
high availability of systems and applications is growing dramatically to enable
support of that environment. IDS capabilities are also growing to meet those
demands.
– Reliable
IDS offers a broad spectrum of business continuity options to protect your
data server environment. Some business situations require backup
servers without duplicating data, while others need full and independent
copies of the entire processing environment for failover and workload
balancing around the world.
– Available
IDS 11 offers flexible choices that work seamlessly together for an
availability solution to fit nearly any situation. For example, the IDS
continuous availability feature enables you to build a cluster of IDS
instances around a single set of shared storage devices. All instances
synchronize memory structures, providing nearly seamless failover
options at a fraction of the cost of having full replicas of the primary data
server environment. Properly written applications can easily leverage this
architecture for load balancing or practically uninterrupted data services,
even in the event that one or more servers fail.
That is the positioning of IDS from a high-level perspective. Now we discuss
some examples of the special features of IDS:
High availability to provide the systems access and support that is required to
successfully compete in a global environment
Significant security and encryption, such as LBAC and Common Criteria
certification
Spatial and Geodetic Web services for location-based services
Significant extensibility with DataBlade technology
Further reduction in total cost of ownership (TCO) with improved administration functions, which are easy yet powerful, and which can be scheduled or executed automatically based on systems monitoring and programmatic alerts
Advanced application development, with XML and SOA, for standardization
and reduced cost through reusable code modules
Enhanced solutions integration, an Admin API, and a customizable footprint
that results in faster and easier administration with minimum resource
requirements
A Web-based GUI administration tool, the Open Admin Tool (OAT), is also available for IDS 11. This tool uses the new features in IDS 11 to provide a simple interface for performing IDS administration tasks.
In this chapter, we provide a brief description of these features and how they
make administration simple and automated in IDS 11.
By extending the data server with new data types, functions, and
application-specific structures, developers build solutions that achieve the
following tasks:
Take advantage of data models that closely match the problem domain
Depart from strict relational normalization to achieve better performance
Implement powerful business logic in the data tier of the software stack
Handle new types of information
We call these solutions robust because the elegance and economy of their
architecture gives them higher performance, maintainability, responsiveness, and
flexibility in the face of changes in environment, assumptions, and requirements.
The basis for functional customization in IDS is the ability to add components,
which are packages that contain data types, functions, index methods, and
whatever else is needed to teach the data server new tricks. These components
are referred to as DataBlades, which are, in fact, extensions of the database.
For those who want to develop their database extensions in the C programming
language, IDS contains a comprehensive set of header files, public data type
structures, and public functions via the DataBlade API.
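As an illustration only (the function name, library path, and entry point are hypothetical), a routine written in C against the DataBlade API is made visible to SQL with a CREATE FUNCTION statement that points to the compiled shared library:
-- Register a C user-defined routine with the server. The library file and
-- symbol name are placeholders that must match your compiled DataBlade code.
CREATE FUNCTION distance(point1 LVARCHAR, point2 LVARCHAR)
    RETURNING FLOAT
    EXTERNAL NAME '$INFORMIXDIR/extend/geo.1.0/geo.bld(geo_distance)'
    LANGUAGE C;
-- Once registered, the routine is called like any built-in function, for
-- example: SELECT distance(loc_a, loc_b) FROM sites;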
IDS has the unique capability of registering functions that will be executed when
specific events occur. An IDS event lives in the context of a database connection.
This means that the events and callback functions are specific to a user session,
which then must register the callbacks before the events are generated. IDS 11
includes new stored procedures that are executed when a database is opened or
when it is closed.
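These procedures are sysdbopen() and sysdbclose(). A minimal sketch of a session-initialization procedure follows; the session settings shown are illustrative only, not recommendations:
-- A public sysdbopen() procedure runs for every user who opens the database.
CREATE PROCEDURE public.sysdbopen()
    -- Illustrative session settings; replace with whatever your
    -- applications actually require.
    SET LOCK MODE TO WAIT 30;
    SET PDQPRIORITY 10;
END PROCEDURE;
A matching sysdbclose() procedure can perform cleanup, such as logging, when the database is closed.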
With the advent of Web services, SOA, and Web 2.0, it is evident that centralized
sources of data are a thing of the past. More and more we see disparate data
sources being joined in order to extract interesting information from the huge
volumes of data. This is also evident in the current trend of Web 2.0 applications
called mashups. The idea is to be able to integrate multiple Web services, or for
that matter, any non-relational data source, into IDS and be able to query the
data by using simple SQL statements.
1.5 Summary
We have now provided an overview of the material presented in this book. This
guide will assist you in choosing and prioritizing your reading selections. In the
remainder of this book, we discuss and describe the capabilities of IDS in a way
that will make it easier for you to see how they can be applied and enable you to
customize IDS to your specific business environment, so that you can start taking
advantage of the benefits.
We also discuss the network infrastructure and how to configure the connections
to meet your requirements.
Let us start by discussing the IDS server solutions that are available today for
your consideration.
In this section, we provide a brief description of the different server solutions that
are available and of a large number of additional products that allow Informix to
satisfy a wide range of business needs.
For more details about these products, see the Informix product family Web page
for IDS 11 at the following address:
http://www-306.ibm.com/software/data/informix/
Also visit the IBM library Web site to download the user guides:
http://www-306.ibm.com/software/data/informix/pubs/library/
In this section, we provided a general overview of the major features and tools
that are integrated with IBM Informix database servers. For more information,
see the IBM Informix library at the following links:
Informix product family page
http://www-306.ibm.com/software/data/informix/
Informix library
http://www-306.ibm.com/software/data/informix/pubs/library/
The development company also wants an installation procedure that has the
following characteristics:
Requires little space
Easily installs and configures the database with almost no interaction from the
users
Creates the same directory structure for all the users
Has a remote administration tool that can provide easy administration of all
the Informix instances
With IDS 11, the custom installation has been improved. To accomplish this, IDS is now divided into multiple components based on functionality. Each component can then be individually selected for installation or removal as needed.
Important: You must choose the custom installation to access the option to remove the IDS components.
In Figure 2-1, you can see the Informix Deployment Wizard component tree.
Figure 2-1 IDS Deployment Wizard component tree (labels include server media, audit, monitoring, global language support, Tivoli Storage Manager, Informix Storage Manager, DBLoad/Unload, and miscellaneous utilities)
For more information about the Installation Wizard, and IDS features and
components, see the IBM Informix Dynamic Server Installation Guide for
Microsoft Windows, G251-2776, or IBM Informix Dynamic Server Installation
Guide for UNIX and Linux, G251-2777.
All these tasks can be done, but we do not describe the details here. For more
information about the sysadmin database, refer to Chapter 5, “The administration
free zone” on page 161.
Now we show how you can create a script that performs all the administrative tasks mentioned previously, without interaction from the user, by using the following steps. The user must only execute the script as user informix.
1. Create a file named tailor_admin.sql.
2. Write the code as shown in Example 2-1.
The last line of Example 2-1 generates a file named command_history.txt that contains a list of all commands that the administration API ran, along with the result of each command. (A sketch of what such a script might contain follows these steps.)
3. Execute the configuration tasks.
From user informix, open a command window and execute the following
command:
dbaccess - tailor_admin.sql
4. Check the result of the commands that have been executed.
Analyze the file command_history.txt that was generated by the script
tailor_admin.sql. The file command_history.txt should look much like the
example shown in Figure 2-5. The sixth column contains the error codes,
which we have highlighted with the ovals. In this example, the commands
executed correctly. Therefore the error codes that are displayed are zeros.
When the sixth column is different from zero, you must analyze the error and
fix it manually. For a description of the error, open an Informix command
window and execute the following command:
finderr error_number
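Example 2-1 itself is not reproduced here, but the following sketch suggests what a script such as tailor_admin.sql might contain. It assumes the sysadmin database and its SQL administration API; every dbspace name, chunk path, and size is a placeholder, and the exact API syntax should be checked against your IDS version:
-- tailor_admin.sql (illustrative sketch only)
DATABASE sysadmin;
-- Create the dbspaces that the application needs; names, paths, and
-- sizes are placeholders.
EXECUTE FUNCTION task('create dbspace', 'app_data',
                      '/ifxchunks/app_data_chunk1', '500 MB', '0');
EXECUTE FUNCTION task('create dbspace', 'app_temp',
                      '/ifxchunks/app_temp_chunk1', '200 MB', '0');
-- Adjust a configuration parameter dynamically through the same API.
EXECUTE FUNCTION admin('ONMODE', 'wf', 'AUTO_CKPTS=1');
-- The last statement writes the results of all API calls to a file for
-- later review, as described in step 2.
UNLOAD TO 'command_history.txt' SELECT * FROM command_history;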
At this point, you should be connected to the instance that you selected as shown
in Figure 2-24 on page 73. Here you can manage the instance remotely. In
addition, you can check the logs, spaces, checkpoints, message log, and much
more.
In this section, we show how IDS can be dynamically changed from supporting
one environment to supporting another environment, without a requirement to
reboot the instance, demonstrating the flexibility of IDS. This type of mixed
environment is more typically supported with medium-sized instances. For large
DSS or OLTP systems, it might be better to consider a dedicated machine for
each environment, or to reboot the instance to change the onconfig file when
moving between those environments.
OLTP: Few rows read per transaction. DSS: Many rows read per transaction.
OLTP: Fast response time (less than 2 seconds). DSS: Long response time (minutes or hours).
OLTP: Read cache rate of 95% or better. DSS: Read cache rate below 80%.
OLTP: Write cache rate of 85% or better. DSS: Write cache rate below 80%.
As you can see, RA_PAGES and RA_THRESHOLD can impact the number
of light scan buffers, and they cannot be changed dynamically. You can
consider creating dbspaces that are dedicated to the DSS activity, giving
them a larger page size. When increasing the PAGESIZE, IDS increases the
number of light scan buffers (see the previous equation). The page size must
be a multiple of the operating system page size, but not greater than 16
kilobytes (KB). Pay attention to the size of your rows. Each page can contain a maximum of 255 rows. Therefore, if the row size is small and the page size is large, you risk wasting disk space. To find the maximum row size, use the following command:
oncheck -pt databasename:tablename
Then check the line “Maximum row size.”
To create a dbspace with a customized page size in KB, you can use the
following command:
onspaces -c -d DBspace [-t] [-k pagesize] -p path -o offset -s size
[-m path offset]
– BUFFERPOOL
The BUFFERPOOL configuration parameter specifies the values for
BUFFERS, LRUs, LRU_MAX_DIRTY, and LRU_MIN_DIRTY for both the
default page size buffer pool and for any non-default page size buffer
pools. However, if you create a dbspace with a non-default page size, the
dbspace must have a corresponding buffer pool. For example, if you
create a dbspace with a page size of 8 KB, you must create a buffer pool
with a page size of 8 KB. The BUFFERPOOL onconfig parameter can be
useful to reduce the number of buffers and force IDS to use the light scan.
For a DSS environment, you can set the buffers to a low number, for
example 5000.
BUFFERPOOL
size=8K,buffers=5000,lrus=8,lru_min_dirty=50,lru_max_dirty=60
PDQPRIORITY 100
MAX_PDQPRIORITY 100
You can monitor the PDQ behavior by using the onstat -g mgm command.
PDQ queries use memory from the Virtual Shared Memory segments, not
from the BUFFERS.
DBSPACETEMP
Define multiple temporary dbspaces in DBSPACETEMP to allow parallelism. Also
consider how much additional space is needed. For example, hash joins can
use a significant amount of memory and can potentially overflow to temporary
space on disk. You can use the following formula to estimate the amount of
memory that is required for the hash table in a hash join:
hash_table_size = (32 bytes + row_size) * num_rows_table
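For example, with illustrative numbers only, a hash join that builds its table from 1,000,000 rows with an average row size of 200 bytes needs roughly (32 + 200) * 1,000,000 = 232,000,000 bytes, or a little over 220 MB, for the hash table; anything beyond the memory granted to the query overflows to the temporary dbspaces listed in DBSPACETEMP.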
Fragmentation
There are many considerations when fragmenting. For example, you must
understand the workload and then consider how to fragment both the table
and the indexes based on that workload.
Keep in mind that, in an OLTP environment, you want to accomplish the following
tasks:
Tune the onconfig parameters to have fast response time.
Keep read and write buffer cache rates above 90%.
Take fast checkpoints.
Have a short recovery time objective (RTO).
Maximize I/O throughput.
Optimize fragmentation strategy.
Optimize index utilization.
We look at the primary OLTP factors in detail in the sections that follow.
Here we describe some of the important parameters that are involved in the
OLTP configuration. However, this section is not about performance tuning; that topic is beyond the scope of this book. For information about performance tuning,
see the IBM Informix Dynamic Server Performance Guide, G229-6385, and the
IBM Informix Dynamic Server Administrator’s Reference, G229-6360. For some
of the onconfig parameters, we provide an initial value, but this does not mean
they are the most suitable values for your particular implementation.
Fragmentation
Fragmentation is a data distribution scheme used by the database server to
distribute rows or index entries to data fragments. The expression-based
distribution schemes put rows that contain particular specified values in the same
fragment. A fragmentation expression defines the criteria for assigning a set of
rows to each fragment, either as a range rule or some arbitrary rule. A remainder
fragment can be specified that holds all rows that do not match the criteria for any
other fragment, although a remainder fragment reduces the efficiency of the
expression-based distribution scheme.
Index utilization
Typically the queries involved in the OLTP environment do not request a scan of
the entire table. Instead indexes are typically used to select the rows that are
needed to process the query. In large OLTP environments the database
administrator analyzes the tables when they are created. However, it can be
difficult to continue monitoring their usage. Therefore, new users or new
applications can query the tables by using a different WHERE condition that is not yet optimized. This can generate sequential scans, which in some circumstances can create significant performance problems because they increase both the number of pages that are read from disk and the number of locks.
OLTP performance can be increased by removing the sequential scans using the
following methods:
Use the onstat -p command to check the value of the seqscans field. If
seqscans has a high value, say more than 200, you must investigate and
determine which tables have a high number of sequential scans.
SELECT dbsname, tabname, b.partnum, pf_dskreads, pf_dskwrites, pf_seqscans
    FROM systabnames AS a, sysptntab AS b
    WHERE pf_seqscans > 200
      AND a.partnum = b.partnum;
output:
dbsname       database_name
tabname       table_name
partnum       2097154
pf_dskreads   265432456
pf_dskwrites  543678954
pf_seqscans   34000
After you identify a table with a high number of sequential scans, monitor the
activity on the table to determine if there are missing indexes. You can do this
by using the OAT as described in the list item “SQL Trace” on page 202 of
Chapter 5, “The administration free zone” on page 161.
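Once a missing index has been identified, the remedy is usually to create it and refresh the optimizer statistics. The following is a sketch only; the table, column, and index names are hypothetical:
-- Create the index that supports the problematic WHERE condition.
CREATE INDEX ix_orders_custno ON orders(customer_num);
-- Refresh distribution statistics so the optimizer can use the new index.
UPDATE STATISTICS MEDIUM FOR TABLE orders(customer_num);
-- Optionally verify the new access plan for a representative query.
SET EXPLAIN ON;
SELECT * FROM orders WHERE customer_num = 104;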
One important consideration is the source of data to which users need access.
For example, consider a telecommunications company that saves all the
customer call detail, such as phone number, call start and finish time, duration,
location, and so on. Some users might need access to all the detail, but many
others might not. They might only need a summarized version of the data, which
in this scenario, might include data such as the total number of calls per day,
average number of calls per hour, average duration of a call, and so on. This is
true, for example, when considering OLTP and DSS users.
For a scenario that includes OLTP and DSS users, data tables might be
categorized into the following three groups:
The DSS group contains aggregate tables for the DSS users. For this
category, create the tables in dbspaces with large page sizes.
The OLTP group contains the tables with detailed data for the OLTP users. Create these tables in dbspaces with a smaller page size, such as 2 KB or 4 KB.
The OLTP and DSS group contains the tables that are used by both OLTP and DSS users. Create these tables in dbspaces with a page size somewhere between the sizes of the OLTP and DSS dbspaces.
This classification of the tables makes it possible to use different dbspaces and different BUFFERPOOL sizes, and to set parameters based on the results of the performance tuning activity.
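As a sketch (the table definitions and dbspace names are hypothetical), once dbspaces with the appropriate page sizes exist, each table is simply created in the dbspace that matches its group:
-- DSS aggregate table placed in a large-page-size dbspace.
CREATE TABLE call_summary (
    call_date    DATE,
    total_calls  INT,
    avg_duration DECIMAL(8,2)
) IN dss_dbs;
-- OLTP detail table placed in a small-page-size dbspace.
CREATE TABLE call_detail (
    phone_number CHAR(15),
    call_start   DATETIME YEAR TO SECOND,
    duration_sec INT
) IN oltp_dbs;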
For example, consider a 32-bit operating system with 2,000 pages. The onconfig
file can contain several different BUFFERPOOL configurations, for example:
For DSS:
BUFFERPOOL size=6k,buffers=2000, lrus=2, lru_min_dirty=50,
lru_max_dirty=60
To change dynamically from OLTP to DSS, consider Example 2-3 on page 50. In
this example, we use generic values that must be changed for your environment.
The appropriate values that are used can be determined from your performance
tuning activity.
The first step is to create a file named oltp2dss.sql. In that file, place the SQL
script that is shown in Example 2-3.
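Example 2-3 is not reproduced here, but based on the ph_task row shown in the output that follows, one of its task definitions could look like the following sketch (run against the sysadmin database; the column list and interval syntax might need adjusting for your version):
-- oltp2dss.sql (sketch of a single scheduler task)
DATABASE sysadmin;
INSERT INTO ph_task (tk_name, tk_type, tk_group, tk_description,
                     tk_execute, tk_start_time, tk_frequency)
VALUES ('oltp2dss_CKPTS', 'TASK', 'SERVER',
        'change AUTO_CKPTS for DSS',
        'execute function admin(''ONMODE'', ''wf'', ''AUTO_CKPTS=0'')',
        DATETIME(00:00:00) HOUR TO SECOND,
        INTERVAL (1) DAY TO DAY);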
The SQL script only needs to be executed once. Then the IDS scheduler can
execute the command every night at midnight as specified in the ph_task table.
To execute the script, from user informix, open a command window and execute
the following command:
dbaccess - oltp2dss.sql
Output
tk_name oltp2dss_CKPTS
tk_type TASK
tk_group SERVER
tk_description change AUTO_CKPTS for DSS
tk_execute execute function admin('ONMODE', 'wf',
'AUTO_CKPTS=0');
tk_next_execution 2007-11-02 00:00:00
tk_start_time 00:00:00
tk_stop_time
tk_frequency 1 00:00:00
Double check that the tk_execute and tk_next_execution fields are correct as you
specified in your script. Check all the other tasks that you wrote in the script, as
shown in Example 2-5.
cmd_number 131
cmd_exec_time 2007-12-31 00:00:00
cmd_user informix
cmd_hostname NA
cmd_executed ONMODE
cmd_ret_status 0
cmd_ret_msg OK
After the first scheduled execution, check the command_history table for the new
tasks that are created. Look at the cmd_ret_status field. If it is different from zero,
the command failed.
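A simple way to do that (a sketch only, run against the sysadmin database) is to query command_history for non-zero return codes:
-- Report any scheduled commands that did not complete successfully.
SELECT cmd_number, cmd_exec_time, cmd_executed, cmd_ret_status, cmd_ret_msg
    FROM command_history
    WHERE cmd_ret_status != 0
    ORDER BY cmd_exec_time DESC;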
At this point, you should be able to change dynamically from the OLTP
environment to the DSS environment.
Instead of using the oltp2dss.sql script, use the dss2oltp.sql script as described
in Example 2-6. Primarily, the following fields need to be changed:
tk_name
tk_description
tk_next_execution
tk_start_time
The script only needs to be executed once. The IDS scheduler will then execute the command every morning at 7:00 a.m., as you specified in the ph_task table.
At this point, you should be able to dynamically change from the DSS to the
OLTP environment.
Note: After the installation, you must configure XAMPP to work with
Informix.
You have now verified that the Web server, PHP, and Informix are installed and
configured properly. Proceed with installing the OAT.
2. On the IBM Informix Free Product Download main page (Figure 2-11), scroll
down to the IBM Informix Open Admin section. Select the OS version and
documentation that you want to download, and click Download Now at the
bottom of the page.
If you click the Detail button, you can see the downloaded file, its location, and its size, as shown in Figure 2-13.
For the installation and configuration, we use localhost for all the examples that involve an action with Apache. However, you can use the machine name instead of localhost in all the commands described in this section, and the results will be the same.
1. Open the Web browser and type the following URL in the address bar:
http://localhost/oat/install/
2. On the installation Welcome Page (Figure 2-14), select the I accept the
terms in the license agreement check box and click Next.
The signup process provides a key that you can enter in the GOOGLEMAPKEY
field. From there, you can determine, for example, the specific latitude and
longitude of the server. When you finish, click Save.
The OAT Admin page opens (Figure 2-20) on which you can perform the
following actions:
Change the OAT configuration parameters (refer to Figure 2-16 on page 65).
Add a new OAT administrator group.
Add new connections to IDS instances.
Associate a location map to the IDS instances.
Connect to an IDS instance.
To find the Host Name, connect to the IDS server and enter the following
command:
hostname
To find the port number, connect to the IDS server, open the services file, and search for the instance name. Depending on your platform, the file is in one of the following locations:
For Windows, in C:\WINDOWS\system32\drivers\etc\services
For UNIX or Linux, in /etc/services
The port number is in the second column. Sometimes the port name and port number used by the instance are not listed in the services file. In this case, enter the following command:
onstat -g ntt
The port number is in the row that is related to the thread name, soctcplst.
If you are not planning to use the location map, you can leave the Latitude and
Longitude fields empty. If you are using the location map, those values can be
determined, for example, by using Google Maps. For more information, refer to
“Googlemapkey” on page 65.
At this point you have completed the configuration and can now monitor and
manage the instance by using the OAT. For more details about managing the
instance, refer to Chapter 5, “The administration free zone” on page 161.
EDA comprises the high availability (HA) and data replication solutions that
are embedded in IDS. Such solutions include High Availability Data Replication
(HDR), Remote Standalone Secondary (RSS), Shared Disk Secondary (SDS),
Continuous Log Restore (CLR), and Enterprise Replication (ER). These
solutions are key to enabling an effective, flexible, and efficient way to maximize
availability of data, provide disaster recovery, and ensure consistent delivery of
that data wherever and whenever it is needed.
In this chapter, we do not give details about how to configure and implement the
solutions presented here because such information is well beyond the scope of
the chapter. However, upon completion of this chapter, you will have a better
understanding of the technology and solutions that are available to you with IDS.
High Availability Data Replication is extremely robust, having been part of IDS
for over ten years. However, with HDR, there can only be one secondary
instance. At this time, the user can only write to the primary instance, which
might not enable the desired degree of load balancing.
The latest requirement is to have both the ease of use of HDR and the
extensibility and one-to-many relationships of ER. With IDS 11, this functionality
has been delivered with two new replication technologies, Remote Standalone
Secondary and Shared Disk Secondary servers. Additionally, a new Continuous
Log Restore feature makes it possible to manually maintain a backup system.
In this section, we provide a brief overview of the high availability and data
replication technologies that are embedded in IDS. With this information, you will
have a better understanding of how to apply and enable these EDA features to
address your specific business and application needs.
HDR
HDR employs a log record shipping technique to transfer the logical log records
from the primary server to the secondary server. The secondary server is in
perpetual roll-forward mode, so that data on the secondary server remains
current with data on the primary server.
HDR provides manual or automatic failover. If the primary server fails, the HDR
secondary server automatically takes over and switches to a standard or primary
server allowing minimal disruption to the clients. When the original primary
server becomes available, it is synchronized when HDR is restarted.
The HDR secondary server can be used for read-only operations while in a
functional HDR pair. As such, read-only applications, such as reports, can be
executed against the secondary instance, thus reducing the load on the primary
server. It can also be used as a hot backup server for additional availability in
case of unplanned outages or disaster recovery scenarios.
Figure 3-2 Remote Standalone Secondary servers: a primary server in New York with RSS servers in Miami and Los Angeles
The use of fully duplexed communication means that RSS servers have little impact on the performance of the primary server.
As shown in Figure 3-2 on page 78, remote application servers can access local
database servers to minimize latency and improve performance.
RSS can also be used as multiple remote backup servers for additional
availability in the event of unplanned outages or any catastrophe at the location
of the primary or other HA secondary servers.
Shared Disk
Figure 3-3 Shared Disk Secondary
Like RSS servers, SDS servers also use the SMX layer, which is an internal
component that is implemented to support the full duplexed communication
protocol. SDS does not support synchronous mode, which is similar to RSS, but
different from HDR.
The SDS architecture provides the ability to set up multiple database servers
sharing the entire dbspace set that is defined by a primary database server. It
can be used for defining database servers on the same physical machine or
different machines with an underlying shared file system.
Multiple SDS servers provide the opportunity to dedicate specific SDS servers
for specific tasks, such as data warehousing as a DSS-oriented server or Web
application server with an online transaction processing (OLTP) approach, with
the appropriate differences in the configuration for parallel database query (PDQ)
and memory requirements. The SDS environment can also be used simply for additional capacity.
An SDS server can be made available quickly. When configured, an SDS server
joins an existing system and is ready for immediate use.
The benefits of this feature in terms of resources, in comparison with HDR and
RSS, are a significantly lower requirement on disk space and a slight reduction in
network traffic. The simple setup and configuration requirements do not tie up additional DBA resources.
partitioning can be achieved by dynamically adding and removing SDS servers in
an existing infrastructure.
Shared disk file systems: Several shared disk file systems are available in
the market that guarantee concurrent use by different systems in a high
availability cluster. For example, the IBM General Parallel File System™
(GPFS™) is a high performance shared disk file system that can provide fast,
reliable data access from all servers for AIX and Linux cluster systems.
Similarly, other shared disk technologies, such as Veritas Storage Foundation
Cluster File System, Redundant Array of Independent Disks (RAID), and
Storage Area Network (SAN), can also be used to set up an SDS cluster.
However, we do not recommend the use of a mounted Network File System
(NFS) for the Shared Disk Secondary servers, for performance reasons.
(Figure: Continuous Log Restore, showing log backup on the primary server, log transport, and log apply on the backup server)
Should the primary server become unavailable, a final log recovery is performed
on the backup server, which is brought up in online mode as the primary server.
CLR is useful when the backup database server is required to be fairly current,
but the two systems need to be completely independent of each other for
reasons such as security and network availability. CLR can also be useful when
the cost of maintaining a persistent network connection is too high. With CLR, log
files are manually transferred to a backup database server where they are
restored.
(Figure: Enterprise Replication connecting HA clusters)
ER provides mechanisms to easily set up and deploy replication for systems with
large numbers of tables and servers. It also provides support for Online Schema
Evolution that allows modifications in replication definitions or replicated tables
for an active ER system without interrupting the data replication.
All the features of ER can result in a wide spectrum of benefits, including reliable
and fast replication of data across a distributed or global organization, improved
data availability, capacity relief, and increased performance.
The EDA features are built into the server and can interact with each other, making IDS a powerful database server that is nevertheless simple to configure and administer, thus minimizing DBA activity and overall cost of ownership.
3.2.1 HA clusters
Figure 3-6 shows an HA cluster, which is a combination of all the possible HA
solutions including CLR, HDR, RSS, and SDS nodes. Depending on the
business needs, the HA cluster can include more RSS and SDS nodes, or it can
be a subset of the configuration shown.
Figure 3-6 HA cluster with CLR, HDR, RSS, and SDS servers
The HA cluster can provide such capabilities as high availability, failover, disaster
recovery, and workload balancing. It can also support planned or unplanned
outages.
Typically, planned outages are required in situations where one of the servers in the HA cluster is scheduled for maintenance, such as hardware or OS maintenance.
In these situations, a DBA can choose the available failover options. These
include failover of the primary to a secondary, switching roles between the
primary and secondary, switching the secondary to another type, changing a
server to standard mode, or just removing a server from the HA cluster.
In this case, only the primary server of an HA cluster is included in the replication
network. The backup or secondary servers are not directly configured for ER.
The order of creating the systems does not matter. An HA cluster can be created
and then added to an ER network, or any stand-alone system in an ER network
can be converted to an HA cluster.
What matters is to ensure that paths are available, so that the failure of any single
system does not leave sets of systems cut off from one another. For example, in
Figure 3-7 on page 87, if one of the central servers, in New York or London, is not
available, then none of the regional servers will be able to connect to the rest of
the enterprise. In this case, each central server and its regional servers are good
candidates to be HA clusters.
Figure 3-7 An ER network with central servers in New York and London and regional servers in Munich, Paris, Los Angeles, and Chicago
Figure 3-8 HA clusters at the central sites, each with a primary, an HDR secondary, RSS, and SDS servers on shared disk
Figure 3-9 Combination of HA clusters and an ER network
The high availability requirement can be obtained with the failover, disaster
recovery, and data distribution capabilities provided by HDR, RSS, SDS, CLR,
and ER technologies included in IDS. In the same way, the high performance
goal can be achieved by using IDS and all of its built-in features that make it a
fast database server, along with the capacity relief, workload balancing, and data
sharing capabilities provided by the HA and ER solutions of IDS.
High availability and high performance requirements not only depend on the
database server, but also on the other components that comprise the system
such as hardware, OS, network, and the application. Nevertheless, the Informix
EDA solutions discussed in this chapter can be used as an important part of the
overall solution.
Table 3-1 Comparison of capabilities and failover options for EDA technologies
Disk layout of primary or source compared to secondary, backup, or target: HDR identical; RSS identical; SDS shared; CLR identical; ER custom
Fully duplexed (SMX) versus half duplexed communication: HDR half; RSS fully; SDS fully; CLR N/A; ER N/A
Only one shared primary server for any existing secondaries in an HA cluster: HDR yes; RSS yes; SDS yes; CLR yes; ER N/A
Coexists and works together with other EDA solutions: HDR yes; RSS yes; SDS yes; CLR yes; ER yes
Scenario: Periodic requirement to increase reporting capacity. Recommendation: Use SDS or RSS servers. If the amount of data is large and maintaining multiple copies is difficult, use SDS servers.
Scenario: You are using SAN devices, which provide ample disk hardware availability, but you are concerned about server failures. Recommendation: Use SDS servers.
Scenario: You are using SAN devices, which provide ample disk hardware mirroring, but you also want a second set of servers that can be brought online if the primary operation should fail (and the limitations of mirrored disks are not a problem). Recommendation: Consider using two blade centers running SDS servers at the two sites.
Scenario: You want to have a backup site a moderate distance away, but cannot tolerate any loss of data during failover. Recommendation: Consider using two blade centers, with SDS servers on the primary blade center and an HDR secondary on the remote server.
Scenario: You want a highly available system in which no transaction is ever lost, but that must also have a remote system on the other side of the world. Recommendation: Consider using an HDR secondary located nearby running in SYNC mode and an RSS server on the other side of the world.
Scenario: You want a high availability solution, but because of the networks in your region, there is a large latency. Recommendation: Consider using an RSS server.
Scenario: You want a backup site, but you do not have any direct communication with the backup site. Recommendation: Consider using CLR with backup and recovery.
Scenario: You can tolerate a delay in the delivery of data as long as the data arrives eventually. However, you need quick failover in any case. Recommendation: Consider using SDS servers with hardware disk mirroring in conjunction with ER.
Scenario: You need additional write processing power, can tolerate some delay in the delivery of those writes, need something highly available, and can partition the workload. Recommendation: Consider using ER with SDS servers.
The first layer provides availability solutions to deal with transitory local failures.
For example, this might include having a couple of blade servers attached to a
single disk subsystem running SDS servers. Placing the SDS servers in several
locations throughout your campus makes it possible to provide seamless failover
in the event of a local outage.
Figure 3-10 A layered availability configuration: SDS servers on blade servers A1 (Building-A) and A2 (Building-B) in New Orleans sharing a mirrored disk, HDR traffic to an HDR secondary on blade server B in Memphis, and RSS traffic to an RSS server on blade server C in Denver
Now suppose that a local outage occurred in Building-A on the New Orleans
campus. Perhaps a pipe burst in the machine room causing water damage to the
blade server and the primary copy of the shared disk subsystem. You can switch the role of the primary server to Building-B by running the onmode -d command on one of the SDS servers running on the blade server in Building-B to make it the new primary server. This causes all other secondary nodes to automatically
connect to the new primary node, as shown in Figure 3-11 on page 95.
Figure 3-11 Failover within the New Orleans campus: the primary role moves to an SDS server on blade server A2 in Building-B while blade server A1 is offline
Should there be a regional outage in New Orleans such that both Building-A and Building-B are lost, you can shift the primary server role to Memphis. In
addition, you might also want to make Denver an HDR secondary and possibly
add more SDS servers to the machine in Memphis. Figure 3-12 illustrates this
scenario.
Figure 3-12 Regional outage scenario: the primary role shifts to Memphis, Denver becomes an HDR secondary, and the New Orleans blade servers are offline
Figure 3-14 Organization with data warehouse and OLTP using SDS and ER (two primary servers, hq1 and hq2, each with SDS servers on a shared disk volume and connected by ER)
Figure 3-15 Handling transient need for extra OLTP client requirements (hq1 and hq2 primaries with SDS servers on blade servers A and B, shared disk volumes 1 and 2, and the data warehouse and OLTP workloads)
Call center
Organizations might want a consistent set of records that can be updated at any
site in a peer-to-peer fashion. That is an update-anywhere replication. This
capability allows users to function autonomously and continue to function even
when other servers or networks in the replication system are not available.
Retail pricing
A company might want to use a data dissemination model where data is updated
in a central location and then replicated to multiple, read-only sites. This method
is useful when information must be distributed, for example, to multiple sales
locations.
For example, a hotel chain might want to send reservation information to the
various hotels, or a book store chain headquarters might need to send updated
price lists of available books to its stores on a nightly basis. To ensure that this
data is consistent, the stores have read-only access to the information while the
headquarters has read-write capability.
An example of such an environment is a retail store chain that, throughout the day,
gathers point-of-sale information. At the end of the business day, the stores must
transmit the data to the headquarters, where it is consolidated into the central
data warehouse to be used in various business intelligence processes, such as
trend analysis and inventory control systems.
For example, consider an HR system where the European site has ownership of its partition and can modify employee records for personnel in its region. Any
changes to the data are then replicated to the U.S. sites. While the European site
can query or read the other partitions, it cannot update them. Similarly, the U.S.
sites can change data only within their own respective partitions, but can query
and read data in all partitions. Any changes to the U.S. data are replicated to the
European site.
As an example, Figure 3-16 shows how two ER nodes, New York and London,
replicate a database table, Table X, which has four columns. A set of
applications, App V1, are connected to each of the ER nodes and execute
transactions against Table X.
Figure 3-17 Altering the table at one ER node with older client versions migrated
Figure 3-18 Deployment of new version of application that is aware of the new schema
Thus without any server downtime, we can smoothly upgrade the applications
from one version to another version.
[Figure 3-20: primary serv1, SDS instance sds_1, HDR secondary sec_1, and RSS instance rss_1 grouped as the logical instance g_serv1]
In Figure 3-20, the primary, HDR secondary, RSS, and SDS instances are
configured as one logical instance named g_serv1. This configuration is
accomplished through entries in the client and server SQLHOSTS files as shown
in Table 3-3.
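Conceptually, the SQLHOSTS entries define a group and attach each physical instance to it with the g= option. The following is only a sketch; the host names and service names are hypothetical:

g_serv1  group     -       -          i=10
serv1    onsoctcp  host_a  svc_serv1  g=g_serv1
sec_1    onsoctcp  host_b  svc_sec1   g=g_serv1
rss_1    onsoctcp  host_c  svc_rss1   g=g_serv1
sds_1    onsoctcp  host_d  svc_sds1   g=g_serv1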
In this section, we briefly describe the first four methods and then show an
example of the MACH 11 feature of the OAT.
More information: For more details about using the message log file,
console, event alarms, onstat utility, and sysmaster database for monitoring
availability and replication features of IDS, refer to the Redbooks publication
Informix Dynamic Server 11: Extending Availability and Replication,
SG24-7488, which is available in softcopy at the following address:
http://www.redbooks.ibm.com/abstracts/sg247488.html?Open
You can either customize the ALARMPROGRAM scripts provided with IDS, or
write and use your own shell script, batch file, or binary program.
For further details about configuring the event alarms for HA and ER servers,
refer to the IBM Informix Dynamic Server Administrator's Guide, G229-6359, or
to the IBM Informix Dynamic Server Enterprise Replication Guide, G229-6371.
More information: You can also use OAT for monitoring and administering
other features and components of IDS. For more details about OAT, refer to
2.4, “Installing the Open Admin Tool” on page 55, and 5.5, “The Open Admin
Tool for administration” on page 192.
Consider that you have an HA cluster with one primary database server
(newyork), an SDS (newyork_c1), an HDR secondary (miami), and an RSS
(losangeles). You have already configured the servers by using the Connection
Admin page and included the latitude and longitude for every server. After a login
to the Informix server newyork in the OAT, you see an overview page.
To access the Cluster functionality of OAT, under Menu in the left pane of the
overview page, click Mach 11.
You can use the cluster page to monitor the status of the servers in the
HA cluster.
The DBSA role is more powerful than that of a DBA, who is typically responsible
only for a particular database. The Informix Dynamic Server (IDS) contains a rich
set of features that provide DBSAs with a framework to create a robust
environment for a high performance database system.
Raw devices, however, are much faster than cooked devices because they
bypass the file system caching and use kernel AIO (KAIO). But, they are
comparatively more difficult to manage and extend.
Cooked files have the benefit of a high cache hit rate, read-ahead, and
asynchronous I/O. However, these benefits are largely redundant because the
database server has its own buffering and read-ahead mechanisms.
Direct I/O is available on AIX, HP-UX, Solaris, Linux, and Windows, provided that
it is supported by the underlying file system. Some of the file systems that
support direct I/O are Solaris UFS, VxFS, JFS, and NTFS. Refer to the man page
of the mount command to find the argument that enables direct I/O.
IDS 11 supports direct I/O on cooked devices only if it is supported by the file
system for the page size of the dbspace chunk. Direct I/O is used by default on
Windows (NTFS). To enable it on UNIX, set the onconfig parameter DIRECT_IO
to 1. If the database server is configured to use KAIO, IDS uses KAIO along with
direct I/O on cooked devices. The performance of direct I/O and KAIO on cooked
files is close to raw device performance.
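For example, the onconfig entry is simply the following; it takes effect for cooked chunks whose file system supports direct I/O:

DIRECT_IO 1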
For temporary dbspaces, use of the file system buffering for regular files can be
faster than using direct I/O because there is no need to synchronize writes to
ensure that writes are committed to disk. Therefore, for regular files belonging to
temporary dbspaces, IDS does not use direct I/O, even if direct I/O is enabled.
Recommendations
We recommend that you use raw disk devices for the best performance. Based
on your environment, requirements, and benchmark tests on your table layout,
you can decide on the option that is most appropriate and convenient. Even
though all your chunks are on raw devices, you might want to set DIRECT_IO if
you load a large number of rows by using flat files that are generated from
heterogeneous systems. One such example is a data warehouse environment
where a large number of rows is bulk loaded from various data marts running on
other databases.
On Windows, both raw disks and NTFS use KAIO. Because NTFS files are a
more standard method of storing data, you can use NTFS files instead of raw
disks because it also supports direct I/O. Consider using raw disks if your
database server requires a large amount of disk access.
To reduce contention for the root dbspace, the physical and logical logs can be
moved out from the root dbspace to a separate disk.
Page size
The default system page size is platform dependent (4K on Windows and 2K on
most UNIX platforms). However, you might want to create multiple dbspaces with
differing page sizes that are multiples of the system page size. Each page size
can have its own BUFFERPOOL setting in the onconfig file. The maximum
allowable page size is 16K.
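As a sketch, a 16 KB dbspace and its matching buffer pool might be configured as follows; the path, dbspace size, and buffer count are illustrative only:

$ onspaces -c -d dbs16k -k 16 -p /work/chunks/dbs16k -o 0 -s 1024000

# matching onconfig buffer pool entry for the 16K page size
BUFFERPOOL size=16K,buffers=10000,lrus=8,lru_min_dirty=70,lru_max_dirty=80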
At the time of disk initialization (oninit -iy), you can use the TBLTBLFIRST and
TBLTBLNEXT configuration parameters to specify the first and next extent sizes
for the tblspace tblspace belonging to the root dbspace. For non-root dbspaces,
you can use the onspaces utility to specify the initial and next extent sizes for the
tblspace tblspace, when creating this dbspace. The first and next extent sizes
cannot be changed after the creation of the dbspace. You cannot specify these
extent sizes for temporary dbspaces, sbspaces, blobspaces, or external spaces.
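For a non-root dbspace, a sketch of such a command might look like the following, where -ef and -en give the first and next extent sizes (in KB) for the tblspace tblspace; the path and sizes are illustrative:

$ onspaces -c -d dbs4 -p /work/chunks/dbs4 -o 0 -s 102400 -ef 2048 -en 1024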
The number of pages in the tblspace tblspace is equal to the total number of
table and detached index fragments including any system objects that reside in
the dbspace. As shown in Example 4-1 on page 117, dbs4 is created with an
initial extent size and next extent size of 2 MB and 1 MB for tblspace tblspace.
$ oncheck -pe
>>>>>>
Chunk Pathname Pagesize(k) Size(p) Used(p) Free(p)
9 /work3/ssajip/INSTALL/dbspaces/dbs4 2 5120 1003 4117
set explain on ;
select * from shipment where ship_date = '2007-04-01';
EOF
$ cat sqexplain.out
QUERY: (OPTIMIZATION TIMESTAMP: 01-22-2008 15:38:30)
------
select * from shipment where ship_date = '2007-04-01'
Estimated Cost: 3
Estimated # of Rows Returned: 1
IDS supports simple large objects primarily for compatibility with earlier versions
of Informix applications. When you write new applications that need to access
large objects, we recommend that you use smart large objects to hold character
(CLOB) and binary (BLOB) data. CLOBs can be used only for text data, whereas
BLOBs can be used for any binary data.
Smart large objects have the following advantages over simple large objects:
They can store up to 4 TB as opposed to 2 GB.
They support random access to the data.
You can read or rewrite only specified portions of the smart large object.
Temporary dbspaces
Temporary dbspaces are never backed up, nor are they physically or logically
logged. Therefore, if you create a logged temporary table in a logged database, it
resides only in the logged (non-temp) dbspaces of the DBSPACETEMP
configuration parameter.
The database server keeps track of the most recently used temporary dbspace
and uses the next available dbspace (in a round-robin pattern) to allocate I/O
operations approximately evenly among those dbspaces. You can define a
different page size for temporary dbspaces, so that the temporary tables have a
separate buffer pool.
Extspaces
An extspace is a logical name that is associated with an arbitrary string that
signifies the location of external data. The resource that the extspace references
depends on the user-defined access method for accessing its contents. The
server does not manage this space. It is created by using the onspaces utility.
Virtual tables/index interface and basic text search (BTS) are two areas where
extspaces are used.
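For example, a sketch of creating an extspace follows; the name and location string are hypothetical and are meaningful only to the access method that uses them:

$ onspaces -c -x ext_space1 -l "/data/external/part_files"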
For DSS, data can be fragmented, and the index can be non-fragmented and can
reside in a different dbspace (detached index). For DSS environments that
contain tables that are accessed sequentially, or if there are no columns that can
be used as a fragment expression, you can choose a round-robin fragmentation.
Round-robin fragmentation evenly distributes data across the disks.
For large DSS tables that contain columns representing date, region, or country,
expression-based fragmentation is ideal because it can benefit from fragment
elimination. Fragments can also be detached or attached based on, for example,
the date column: month-end processing might involve detaching the fragment for
the first month of last year and attaching the fragment that contains data for the
new month, as sketched below.
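A sketch of that month-end rotation; the table, dbspace, and date values are hypothetical and assume the table is fragmented by expression on the date column:

-- move the oldest fragment into its own table for archiving
ALTER FRAGMENT ON TABLE orders DETACH dbs_old orders_archive;

-- attach a pre-loaded table holding the new month as a new fragment
ALTER FRAGMENT ON TABLE orders
  ATTACH orders_new AS order_date >= '2008-06-01' AND order_date < '2008-07-01';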
For OLTP environments, you can fragment the data and index to reduce
contention among sessions. Smaller tables do not need to be fragmented
because the overhead of creating the scan threads to access the fragments can
exceed the time taken to sequentially scan the table.
The database server tests all six of the inequality conditions when it attempts
to insert a row with a value of 25:
x >= 1 and x <= 10 in dbspace1
x > 10 and x <= 20 in dbspace2
x > 20 and x <= 30 in dbspace3
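A sketch of a table fragmented by those expressions (the table and second column are hypothetical); a row with x = 25 lands in dbspace3 after all six comparisons are evaluated:

CREATE TABLE accounts (x INT, data CHAR(20))
  FRAGMENT BY EXPRESSION
    x >= 1 AND x <= 10 IN dbspace1,
    x > 10 AND x <= 20 IN dbspace2,
    x > 20 AND x <= 30 IN dbspace3;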
Fast recovery time is the time from when the DBA starts the IDS server until the
server comes to online or quiescent mode. It comprises the following times:
Boot up time
This is the time it takes to boot up the server infrastructure. Typically most of
this time is spent in the initialization of the shared memory.
Physical recovery time
This is the time that physical recovery takes to restore the database to a
physically consistent state. It depends on the number of pages being
physically restored and the I/O speed of the physical log disk.
In the following sections, we describe how to set and tune the configuration
parameter to predict fast recovery time. We also discuss the effects of this setting
on other configuration parameters.
Tuning RTO_SERVER_RESTART
More frequent checkpoints are triggered for a lower value of this setting. You can
start with an aggressive (smaller) value of RTO_SERVER_RESTART that is
suitable for your environment, and then monitor the frequency and type of
checkpoints by using the onstat -g ckp command. If the checkpoints are too
frequent and are affecting transactional performance, use the onmode -wf
command to dynamically increase RTO_SERVER_RESTART as shown in
Example 4-3 on page 123.
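For instance, to raise the target to 120 seconds (an illustrative value) and then watch the checkpoint behavior:

$ onmode -wf RTO_SERVER_RESTART=120
$ onstat -g ckp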
As part of logical replay, each of the log records can generate an I/O. If that I/O
requires a page to be read from disk, log replay performance will be adversely
affected. Also, these random I/Os can cause unpredictable recovery time.
In IDS 11, the physical log size, location, or both can be changed without
restarting the server. You use the onparams command to make the change.
However, this can only be done while the server is in quiescent or administration
mode, which means it must be done at a time when access to the data can be
interrupted.
Example 4-4 on page 124 shows how to change the physical log size from 9 MB
to 11 MB. This example has just the default BUFFERPOOL setting for 2K page
size. If you have BUFFERPOOL settings for other page sizes, you must sum
them to calculate the optimum physical log size.
$ onstat -l
Physical Logging
Buffer bufused bufsize numpages numwrits pages/io
P-1 16 64 15 0 0.00
phybegin physize phypos phyused %used
1:15525 4500 34 16 0.36
$ onparams -p -s 11000 -y
Log operation started. To monitor progress, use the onstat -l command.
** WARNING ** Because the physical log has been modified, a level 0
archive must be taken of the following spaces before an incremental
archive will be permitted for them: rootdbs
(see Dynamic Server Administrator's manual)
$ onstat -l
Physical Logging
Buffer bufused bufsize numpages numwrits pages/io
P-2 2 64 16 1 16.00
phybegin physize phypos phyused %used
1:20025 5500 0 2 0.04
PHYSBUFF
The default value for the physical log buffer size is 128 KB. When the
RTO_SERVER_RESTART configuration parameter is enabled, the default
physical log buffer size changes to 512 KB. If a smaller value is used, IDS writes
a performance advisory message to the online.log indicating that optimal
performance might not be attained, as shown in Example 4-5. Using a physical
log buffer smaller than the default size affects only performance, not transaction
integrity.
4.3.1 AUTO_CKPTS
The default value of AUTO_CKPTS is 1(ON) in the onconfig.std file. When
AUTO_CKPTS is set to 1, the server calculates the minimum physical log size
using RAS_PLOG_SPEED.
In Example 4-4 on page 124, we set PHYSFILE to 11 MB, which was 110% of
the total buffer pool size. In that example, we had set AUTO_CKPTS to 0. After
we set AUTO_CKPTS to 1, we saw a performance advisory message in the
online.log as shown in Example 4-7 to increase the log size to 14 MB. If the log
size is less than this minimum value, automatic checkpointing is disabled. You
can use the onparams -p command to increase PHYSFILE to the recommended
value of 14 MB.
The server also calculates the minimum logical log space based on
RAS_LLOG_SPEED and writes an advisory message with the recommended
value.
If automatic checkpoint is ON, the server also calculates the time it takes to flush
the buffer pool and writes an advisory to the online.log if the physical log size is
too small. By the time the flushing completes, transactions can potentially use up
the remaining physical log causing checkpoints to block out transactions. The
server writes this informative message, so that the DBA can take a corrective
action, as shown in Example 4-8 on page 128.
To change the default setting, you can use the onmode -wm (temporary for this
current server session) or onmode -wf (permanent change in onconfig) options to
turn it OFF (value 0).
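For example:

# turn off automatic checkpoints for the current server session only
$ onmode -wm AUTO_CKPTS=0

# turn off automatic checkpoints and record the change in the onconfig file
$ onmode -wf AUTO_CKPTS=0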
4.3.2 AUTO_LRU_TUNING
In IDS versions prior to 11, it was typical to set low thresholds for
LRU_MIN_DIRTY and LRU_MAX_DIRTY. This setting limited the number of dirty
buffers and improved checkpoint performance. The reduced checkpoint time
blocked user transactions for a shorter duration than when the
LRU_MIN_DIRTY and LRU_MAX_DIRTY values were higher. These parameters
might even be set to values of less than 1 on systems with very large buffer
pools. With IDS 11, you can relax the LRU settings, because checkpoints no
longer block transactions. This can result in a dramatic increase in performance.
The following settings are a good starting point for setting the LRU flushing
parameters:
lru_min_dirty=70
lru_max_dirty=80
You can also let IDS auto tune the LRU settings depending on its usage by
setting the AUTO_LRU_TUNING configuration parameter to 1.
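A sketch of the corresponding onconfig entries (the buffer count is illustrative):

BUFFERPOOL size=2K,buffers=50000,lrus=8,lru_min_dirty=70,lru_max_dirty=80
AUTO_LRU_TUNING 1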
4.3.3 AUTO_AIOVPS
Traditionally, the number of AIO VPs was tuned to accommodate the peak
number of I/O requests. The maxlen field of the onstat -g ioq command shows
the largest backlog of I/O requests for each file descriptor. The gfd column of this
output can be used to get the path name from onstat -g iof, which can then be
mapped to the onstat -d output to get the chunk name. Generally, it is not
detrimental to allocate too many AIO virtual processors.
With IDS 11, you can set AUTO_AIOVPS to enable IDS to add AIO VPs on an
as-needed basis. The default value for AUTO_AIOVPS is 1. When using cooked
file chunks, if the AIO VPs are too busy, the server automatically increases the
number of flushers and AIO VPs. However, we recommend that you monitor
these automatic increases in your particular environment.
Even if the database setup contains only raw devices that use KAIO, the server
uses AIO for writing to the online.log and sqexplain.out files or reading from the
onconfig file or load files.
Important: Do not disable the hosts.equiv lookup in database servers that are
used in distributed database operations.
1. A user who is trusted on the machine can connect from a remote client
without specifying the USER clause in the CONNECT statement. To make the
user trusted, add the remote client to the .rhosts file in the user’s home
directory:
> connect to 'db1@ids_server';
Connected.
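A sketch of such an .rhosts entry for the first step, assuming the remote client host is named clientbox and the connecting user is jdoe (both names are hypothetical):

clientbox jdoe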
The following example shows the basic steps that are required to enable PAM.
The occurrence of s=4 in the fifth field of the sqlhosts file represents a PAM entry
that includes the PAM service name and the type of authorization, password or
challenge.
1. Update the sqlhosts file entry for the PAM entry:
ids ontlitcp ramsay 9801 s=4,pam_serv=pam_ids,pamauth=(challenge)
2. Add an entry to the PAM configuration file /etc/pam.conf:
pam_ids auth required /usr/lib/security/pam_ids.so
3. Write a program to create the shared library pam_ids, so that it authenticates
the user by defining the function pam_sm_authenticate. This function can
throw various challenges. The remaining interfaces, including
pam_sm_setcred, pam_sm_acct_mgmt, pam_sm_open_session,
pam_sm_close_session, and pam_sm_chauthtok, can be left in dummy
status.
You can find the sample shared library used in the previous example at the
following Web address:
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0704anbalagan
When the server is changed to administration mode, all sessions for users other
than user informix, the DBSA group users, and those identified in the user list
will lose their database server connection.
If you want only user informix to connect to the server, you can set the
configuration parameter ADMIN_USER_MODE_WITH_DBSA to 0. A value of 1
means that the DBSA group users, user informix, and administration mode
users, as listed in ADMIN_MODE_USERS, can connect when the server is in
administrator only mode.
DAC is an access control policy that verifies whether the user has been granted
the required privileges to perform an operation. DAC is a simpler system that
involves less overhead than MAC and can be implemented in the following
ways:
Controlling who is allowed to create databases by using the
DBCREATE_PERMISSION configuration parameter
Restricting the users who are allowed to register external UDRs by using the
IFX_EXTEND_ROLE configuration parameter
Controlling operations on database objects by using roles (RBAC)
In the subsequent sections, we briefly discuss these DAC options and explain
with some examples how you can benefit from the LBAC functionality.
Only jdoe can create external UDRs in a database, if the following statement is
the only GRANT EXTEND statement in that database:
GRANT EXTEND TO 'jdoe';
You can also create a default role and assign that role to individual users or to
PUBLIC on a per-database level. The default role is automatically applied when a
user establishes a connection with the database. This enables a user to connect
to a database without issuing a SET ROLE statement. The default role can also
be attained by setting the role in the user.sysdbopen procedure. Each user that is
assigned to a default role receives the privileges of that role in addition to the
other privileges that are granted individually to the user.
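A minimal sketch of setting up such a default role; the role, table, and user names follow the example that is discussed next:

CREATE ROLE sales_role;
GRANT ALL ON sales TO sales_role;
GRANT DEFAULT ROLE sales_role TO jdoe;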
Example 4-13 on page 137 shows that user jdoe who works for the sales
department has a default role of sales_role with all permissions on the sales
table. maryk works for the human resource (HR) department and has all
permissions on the employee table but no permissions on the sales table. jdoe
can only access the employee and department columns from the employee table.
He cannot access any confidential columns, such as the salary column, from this
table.
When a user attempts to access a protected table, IDS enforces two levels of
access control. The first level is DAC, which you can implement by using roles
(RBAC). With DAC, IDS verifies whether the user who is attempting to access the
table has been granted the required privileges to perform the requested
operation on that table. The second level is LBAC, which controls access at the
row level, column level, or both levels.
sales_person usr3
(expression) Engineer:Sales:Northern California
region North CA
1 row(s) retrieved.
e. Let us assume that usr2, who is the Western region Sales manager, wants
to insert rows. If the label column is omitted, IDS automatically inserts the
appropriate label that belongs to usr2. usr2 can see the row inserted by
usr3 in the previous step because usr2 is higher in the hierarchy in the
levels ARRAY and is from the West region:
as usr2:
> insert into sales (sales_date, sales_person, region, sales)
> values (today, "usr2", "West", 999);
1 row(s) inserted.
sales_person usr3
(expression) Engineer:Sales:Northern California
region North CA
sales_person usr2
(expression) Manager:Sales:West
region West
2 row(s) retrieved.
f. usr3 is unable to see the usr2 row because that row is at a higher level:
as usr3:
> select sales_person, seclabel_to_char('policy1',lbac_tag),
> region from sales;
sales_person usr3
(expression) Engineer:Sales:Northern California
region North CA
1 row(s) retrieved.
sales_person usr4
(expression) Manager:Sales:East
region East
1 row(s) retrieved.
h. usr1 is the president and can view all the rows:
as usr1:
> select sales_person, seclabel_to_char('policy1',lbac_tag),
> region from sales;
sales_person usr3
(expression) Engineer:Sales:Northern California
region North CA
sales_person usr2
(expression) Manager:Sales:West
region West
sales_person usr4
(expression) Manager:Sales:East
region East
3 row(s) retrieved.
No rows found.
b. usr2 cannot access the highly confidential column ssn. The SELECT
succeeds if usr2 selects only the allowable columns:
as usr2:
> select * from employee;
8245: User cannot perform READ access to the protected column
(ssn).
emp_no salary
No rows found.
c. usr3 can access only the columns that are marked as unclassified:
as usr3:
> select emp_name,gender,dept from employee;
No rows found.
You can also have tables that require a combination of row level and column level
security. The steps to implement this are similar to the ones that we just
explained.
4.5.5 Auditing
You can use the secure auditing feature to detect any unauthorized access. This
feature allows you to audit database events that access or modify data. The
database system security officer (DBSSO) can configure the system to audit
certain user activities and periodically analyze the audit trail. The audit event has
predefined mnemonics, such as CRTB for CREATE TABLE, that can be used to
define the audit masks. The onaudit command is used to create, modify, and
maintain the audit masks and configuration.
Auditing can also be used for diagnostic purposes. For example, consider an
application that drops and recreates tables where the CREATE TABLE fails
intermittently with an error indicating that the table already exists, even though it
was DROPPED just before creation. This means that another session is trying to
run a CREATE TABLE between this session’s DROP and CREATE. You can turn
on auditing and audit the CREATE TABLE and DROP TABLE commands run by
usr1, by entering the following commands:
$ onaudit -l 1
$ onaudit -p /tmp/audit
$ onaudit -a -u usr1 -e +CRTB,DRTB
The file created in /tmp/audit shows an entry if usr1 creates a table t1 as shown
in this example:
ONLN|2007-11-01 11:52:17.000|ramsay|17280|ramsay_install|usr1
|0:CRTB:san:101:t1:usr1:0:-
Users might enter unencrypted data into columns that are meant to contain
encrypted data. To ensure that data entered into a field is always encrypted, use
views and INSTEAD OF triggers that internally call the encryption function.
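A sketch of that approach, assuming a session encryption password has already been set with SET ENCRYPTION PASSWORD; the table, view, and column names are hypothetical:

CREATE TABLE customer (cust_id INT, card_enc LVARCHAR(256));

-- the view decrypts on read
CREATE VIEW customer_v (cust_id, card)
  AS SELECT cust_id, DECRYPT_CHAR(card_enc) FROM customer;

-- the INSTEAD OF trigger encrypts on write
CREATE TRIGGER customer_v_ins INSTEAD OF INSERT ON customer_v
  REFERENCING NEW AS n
  FOR EACH ROW
  (INSERT INTO customer VALUES (n.cust_id, ENCRYPT_AES(n.card)));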
For more details and examples, refer to Chapter 5 of the Redbooks publication
Informix Dynamic Server 11: Advanced Functionality for Modern Business,
SG24-7465.
For internal backups, transactions can continue executing on the server when the
backup is running. There is a small time interval when the server blocks
transactions (updates) during the archive checkpoint. However, cold restore
requires the server to be in offline mode when a critical dbspace, such as root
dbspace or a dbspace that contains the physical or logical log, must be restored.
An onbar restore can take hours depending on the database size. If you have a
set of important tables that require immediate access, you can run a cold restore
on the dbspaces housing that data to bring the server online.
The external method handles the physical backup and restore, but the logical log
backup and restore still has to be done by using ON-Bar. External backup and
restore can be done by using cp, dd, or tar on UNIX or copy on Windows, or a file
backup program. If you use hardware disk mirroring and external backup and
restore, the DBSA blocks the server for a short time period when the mirrors are
broken. During this time, reads are still allowed but updates block. The server is
then unblocked to allow updates, and onbar is run to back up all the logs
including the current one. The offline mirrored disks can then be backed up to
backup media by using external commands, during which time the server is
online and accepts new connections and updates. Mirroring can be resumed
after the backup is complete. You can get your system online faster with external
restore than with ON-Bar.
IDS supports internal backup and restore by using the ontape and onbar utilities:
ontape offers the following advantages:
– Can perform a sequential backup and sequential restore
– Can back up to stdio, a file, or a directory
– Can change the logging mode of a database but the recommended way is
to use ondblog
ontape requires the following considerations:
– Does not use a storage manager
– Cannot specify specific storage spaces
– Cannot do a Point-in-Time (PIT) restore from backup
– Cannot use multiple tapes concurrently
– Cannot restart a restore
The backup tapes produced by ontape and ON-Bar are not compatible. For
example, you cannot create a backup with ontape and then restore it with
ON-Bar, or vice versa. In the following sections, we discuss the various backup
and restore techniques and how they can fit your needs and your environment.
We recommend that you establish a backup schedule that keeps level-1 and
level-2 backups small. Schedule frequent level-0 backups to avoid restoring large
level-1 and level-2 backups or many logical-log backups. The frequency of the
database backup depends on the frequency, amount, and importance of the data
updates.
The ontape utility writes the backup data directly to tape media. The configuration
parameters TAPEDEV and LTAPEDEV point to the tape device. When backing
up to tape, ensure that an operator is available and that there is sufficient media.
A backup can require multiple tapes. After a tape fills, ontape rewinds the tape,
displays the tape number for labeling, and prompts the operator to mount the
next tape. The full tapes should then be labeled and new tapes mounted.
As shown in Example 4-14, the same directory is used by two instances on the
same node ramsay, one with SERVERNUM 10 and the other with 14. Two level-0
backups have been run on the latter instance. During restore, ontape looks for
the file ramsay_14_L0 in this directory. If you must restore from an older image, it
can be renamed to this standard format and used for the restore.
$ ontape -s -L 0
File created: /home/data/backup/ramsay_14_L0
Program over.
$ ls -ltr /home/data/backup
-rw-rw---- 1 informix informix 15342850 Oct 1 11:02 ramsay_10_L0
-rw-rw---- 1 informix informix 11042816 Nov 1 13:52
ramsay_14_20071101_135254_L0
-rw-rw---- 1 informix informix 11239424 Nov 1 22:48 ramsay_14_L0
A set of all the important tables can reside in separate dbspaces. These
dbspaces, along with the root dbspace and the dbspaces that contain the logical
and physical logs, can be backed up more often than the remaining dbspaces.
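For example, a sketch of backing up just those critical spaces with ON-Bar; the dbspace names are hypothetical:

$ onbar -b -L 0 rootdbs logdbs physdbs dbs_critical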
Customizing ALARMPROGRAM
The ALARMPROGRAM configuration parameter can be set to the
Informix-provided alarm program script $INFORMIXDIR/etc/alarmprogram.sh to
capture certain administrative events. These events can be informative, such as
log complete, backup complete, or long transaction detected, or they can be
error conditions, such as a chunk going offline or an out-of-memory condition.
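A minimal sketch of a custom handler, assuming the usual argument order of severity, class ID, class message, specific message, and see-also file (verify the order against the documentation for your version):

#!/bin/sh
# ALARMPROGRAM handler sketch: mail the DBA for severe events
SEVERITY=$1
CLASS_ID=$2
CLASS_MSG=$3
SPECIFIC_MSG=$4
SEE_ALSO=$5

if [ "$SEVERITY" -ge 4 ]; then
    echo "IDS event $CLASS_ID: $CLASS_MSG - $SPECIFIC_MSG" | \
        mail -s "IDS ALERT (severity $SEVERITY)" dba@example.com
fi
exit 0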
The second table lists the time taken to spawn the daemon, to update the
sysutils catalog for a backup, or to read the boot file in the case of a warm
restore. If there are performance issues, compare these times to see whether the
bottleneck is due to the storage manager or to IDS.
TRANSFER RATES:
--------------------------------------------------------------------------------------------------------------------------
| OBJECT | XBSA API | SERVER API |
| NAME | xfer-kbytes xfer-time RATIO(kb/s) API-TIME | xfer-kbytes xfer-time RATIO(kb/s) API-TIME |
--------------------------------------------------------------------------------------------------------------------------
| 9 | 64 0.147 436 0.261 | 64 0.013 4967 0.014 |
| dbs1 | 62 0.105 589 0.156 | 62 0.000 134183 0.291 |
| rootdbs | 10230 0.362 28233 0.440 | 10292 0.114 90338 0.293 |
--------------------------------------------------------------------------------------------------------------------------
| PID = 9008 | 10356 0.614 16859 0.856 | 10418 0.127 81855 0.598 |
--------------------------------------------------------------------------------------------------------------------------
2007-11-07 11:04:57.845472 9008 9006 PERFORMANCE INFORMATION
PROCESS CLOCKS:
--------------------------------------------------------------------------------------------------------------------------
| PID | CLOCK DESCRIPTION | TIME SPENT (s) |
--------------------------------------------------------------------------------------------------------------------------
| 9008 | To execute the master OnBar deamon. | 2.796 |
| 9008 | To update 'sysutils' for Log 9. | 0.036 |
| 9008 | To update 'sysutils' for Dbspace dbs1. | 0.036 |
| 9008 | To update 'sysutils' for Dbspace rootdbs. | 0.032 |
| 9008 | To update 'sysutils' for Log 9. | 0.024 |
| 9008 | To update 'sysutils' for Log 9. | 0.021 |
| 9008 | To update 'sysutils' for Log 9. | 0.069 |
| 9008 | To update 'sysutils' for Dbspace dbs1. | 0.015 |
| 9008 | To update 'sysutils' for Dbspace dbs1. | 0.018 |
| 9008 | To update 'sysutils' for Dbspace dbs1. | 0.044 |
| 9008 | To update 'sysutils' for Dbspace rootdbs. | 0.022 |
| 9008 | To update 'sysutils' for Dbspace rootdbs. | 0.042 |
| 9008 | To update 'sysutils' for Dbspace rootdbs. | 0.112 |
--------------------------------------------------------------------------------------------------------------------------
2007-11-07 11:04:57.857574 9008 9006 /work3/ssajip/INSTALL/xps/bin/onbar_d complete, returning 0 (0x00)
You can use the onsmsync utility to regenerate the emergency boot file and expire
old backups.
For the syntax and options available with onsmsync, refer to the IBM Informix
Backup and Restore Guide, Version 11.1, G229-6361-01.
The database server and ON-Bar do not track external backups. To track the
external backup data, use a third-party storage manager or track the data
manually. Table 4-1 shows the items that we recommend to track for an external
backup.
ins_copyid_hi and ins_copyid_lo Copy ID that the storage manager assigns to each
backup object
Backup date and time The times that the database server was blocked
and unblocked
Database server version The database server version from which the
backup was taken
IDS supports a cold or warm external restore. Refer to the IBM Informix Backup
and Restore Guide, Version 11.1, G229-6361-01, to do a cold or warm restore
from an external backup.
AC_STORAGE /tmp
AC_MSGPATH /tmp/ac_msg.log
AC_VERBOSE on
AC_TAPEBLOCK 62 KB
AC_IXBAR /home/informix/INSTALL/etc/ixbar.14
Dropping old log control tables
Extracting table db1:src into db1:dest
Scan PASSED
Control page checks PASSED
Table checks PASSED
Table extraction commands 1
Tables found on archive 1
LOADED: db1:dest produced 4 rows.
Creating log control tables
Staging Log 12
Refer to the IBM Informix Backup and Restore Guide, Version 11.1,
G229-6361-01, for more information.
This happens when a table to which the statement in the PREPARE refers is
renamed or altered, which can change the structure of the table, or even when
UPDATE STATISTICS is run on the table. By setting the configuration
parameter AUTO_REPREPARE to 1, IDS automatically re-optimizes SPL
routines and re-prepares prepared objects after the schema of a table
referenced by the SPL routine or by the prepared object has changed. You
can also enable this behavior at the session level by using SET ENVIRONMENT
IFX_AUTO_REPREPARE.
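For example, to enable it server-wide (recording the change in the onconfig file) or only for the current session:

$ onmode -wf AUTO_REPREPARE=1

SET ENVIRONMENT IFX_AUTO_REPREPARE '1';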
For more information, refer to the IBM Informix Dynamic Server Administrator's
Reference Guide, Version 11.1, G229-6359-01.
For example, IDS 11 now has SQL-based administration. That is, now most of
the administrative tasks can be performed by using SQL. It also provides a
framework where SQL statements or stored procedures can be defined to
execute administrative tasks, collect statistical information, and perform database
system monitoring without database administrator (DBA) intervention. IDS also
has the capability to trace SQL statements for a particular user or a set of users.
DBAs can retrieve the trace information in several ways and in formats that are
more understandable to them.
The Web-based GUI administration tool called Open Administration Tool (OAT) is
also available for IDS 11. This tool uses the new features in IDS 11 to provide a
simple interface for performing the IDS administration tasks.
In this chapter, we provide a brief description of these features and explain how
they make administration simple and automated in IDS 11. We also discuss
example scenarios for using the new functionality. Finally we show you a real life
example that uses the different components of the database administration
system.
In IDS 11, you can use a set of user-defined routines (UDRs) to do most of the
administrative tasks previously performed by command line utilities, such as
finderr, oninit, onmode, onspaces, onparams, ondblog, and oncheck (-c options only).
IDS 11 also has a component called the DBScheduler, which you can use to
schedule administrative activities within the database
server. You can specify when to execute a specific administration task and
whether to run it at specific intervals or at predefined times. For example, you can
specify that the database server is to take the archive of the system at a certain
time every week without manual intervention. You can also configure the
database server to detect certain situations and take corrective actions as
needed. For example, when the logical logs are full, the server blocks until the
logical logs are backed up or new logs are available.
You can write scripts to enable the server to detect when the logical logs are full
and add a logical log dynamically. You can also define SQL scripts and store
them in the database server to gather statistical information and monitor
database activity. For example, you can enable the server to monitor how much
memory each session is using or how many users are logged onto the system at
a certain time.
You can also generate reports for later analysis. For example, you can create
reports for the total memory used by sessions or for the SQL statement statistics
and purge these reports every day. The DBA does not need to rely on OS cron
jobs or shell scripts for these routine tasks. Because the tasks are scheduled
inside the server, they are portable across all platforms.
You can expand on this list by creating new tasks and scheduling them according
to your needs. The SQL Administration APIs can help with this activity. For
example, to schedule an administrative task, such as adding a chunk to a
dbspace, you must rely on the SQL Administration API. But to schedule an
administrative task, such as executing a checkpoint, you can use the onmode
command in the scheduler. In the sections that follow, we explain how to perform
administrative tasks by using SQL and show how this helps in scheduling tasks.
The sysadmin database is a logged database that contains tables that are used
by the DBScheduler. These tables contain tasks that are created by the
Scheduler for collecting data and monitoring system activities.
The sysadmin database is created in the root dbspace by default. If you have a
number of tasks scheduled and heavily use the SQL Administration APIs, the
root dbspace can fill up fast. If you usually use the root dbspace for your
databases, then you can run out of space soon. In that case, you can move the
sysadmin database to another dbspace to make more space in the root dbspace.
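One way to do that is with the SQL Administration API; we believe the relevant command is "reset sysadmin", which recreates the sysadmin database in the named dbspace and discards its current contents, so treat the following as a sketch to verify against your documentation:

EXECUTE FUNCTION task("reset sysadmin", "dbspace2");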
PH_RUN Contains information about how and when each task was
executed.
PH_TASK Lists tasks and contains information about how and when the
database server will execute each task.
Each row in the ph_task table is a task or a sensor that is executed by the
DBScheduler as defined. The task properties determine what SQL or stored
procedures to execute, when to execute them, and where the results should be
stored.
The ph_run table contains an entry for every execution of a task or a sensor from
the ph_task table. You can query this table to see if a task or a sensor has been
executed successfully.
The ph_alert table also contains user-defined and system-defined alerts. If a task
or a sensor created in the ph_task table failed to execute, there is a row in the
ph_alert table indicating that the SQL specified in the task failed to execute and
the error it returned. These are system-defined alerts. You can also insert rows in
to the ph_alert table to create an alert for when a specific event occurs. These
are user-defined alerts.
Note: Only user informix has permissions to access the sysadmin database
by default.
The routine task() returns a descriptive message indicating the success or failure
of the command. The routine admin() returns an integer whose absolute value
can be used to query the command_history table in the sysadmin database to
obtain more information about the command that was executed. A positive
integer indicates a success and a negative integer indicates a failure. The
function task() can be used to execute a command by itself and see the return
message. The function admin() can be used in SQL scripts or stored procedures.
Each execution of task() and admin() gets logged into the command_history
table automatically. You can query this table to retrieve information about the user
who executed the command, the time the command was executed, the command
itself, and the message returned when the database server completed running
the command.
Note: Only the database server administrator (DBSA), root user, and informix
user have permission to execute the task() and admin() functions.
Adding more space is done by using the onspaces command, as in previous IDS
versions. We now describe how to add a dbspace using the SQL Administration
API. Example 5-1 shows the commands to create a 20 MB dbspace with 0 offset
by using the task() UDR.
The SQL in Example 5-2 creates the dbs2 file with correct permissions in
$INFORMIXDIR/chunks as part of creating the dbspace. If a file named dbs2
already exists, then this SQL command will fail.
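A sketch of what such a call might look like, run while connected to the sysadmin database; the chunk path is hypothetical (use the expanded value of $INFORMIXDIR), and the size and offset are given in KB:

EXECUTE FUNCTION task("create dbspace", "dbs2",
    "/opt/IBM/informix/chunks/dbs2", "20480", "0");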
The same results can be obtained by executing the function admin() as shown in
Example 5-3.
The SQL in Example 5-3 creates the dbs2 file with correct permissions in
$INFORMIXDIR/chunks while creating the dbspace. The only difference to task()
is in the return value. Example 5-4 shows the return value.
To see detailed information about the admin() function executed in Example 5-3,
query the command_history table as shown in Example 5-5.
1 row(s) retrieved.
Suppose that you need to update a table and the dbspace does not have enough
room to fit the table after the update. You can add more space to the dbspace by
adding a chunk to it. Example 5-7 on page 169 shows how you can add a 10 MB
chunk to dbspace dbs1 by using the admin() function.
database sysadmin;
When adding a chunk, the file should exist in the specified directory with the
correct permissions.
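A sketch of such a call; the chunk path is hypothetical and, as noted, the file must already exist with the correct permissions (size and offset are in KB):

EXECUTE FUNCTION admin("add chunk", "dbs1",
    "/opt/IBM/informix/chunks/dbs1_chunk2", "10240", "0");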
In IDS 11, checkpoints are non-blocking, which means that transactions can
make updates while a checkpoint is active. These transactions consume physical
log and logical log space during their processing. If the server is short of critical
resources, such as physical log and logical log space, then the transactions are
blocked to let the checkpoint finish.
To increase the physical log size, use the onparams command or use the SQL
Administration API as shown in Example 5-9.
If the logical log is too small and transactions are blocked during checkpoints,
you see performance advisories in your online.log recommending that you
increase the logical log size. To increase the logical log size, add more
logical logs by using the SQL Administration API shown in Example 5-10.
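Sketches of both calls with the SQL Administration API, using the "alter plog" and "add log" commands; the dbspace names are hypothetical and the sizes are in KB:

-- grow the physical log to roughly 60 MB in dbspace physdbs
EXECUTE FUNCTION task("alter plog", "physdbs", "60000");

-- add a 10 MB logical log file in dbspace logdbs
EXECUTE FUNCTION task("add log", "logdbs", "10000");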
If needed, you can also drop a logical log file to increase the amount of space in
a dbspace. The database server requires a minimum of three logical log files at
all times. You cannot drop a log if your logical log is composed of only three log
files.
Some of the command line commands can also be passed as arguments to the
task() and admin() functions, as shown in Example 5-12. Here the onmode -l
command is passed as an argument to the task() function. Then onmode -l is
executed to switch to the next logical log. You might want to switch to the next
logical log file before the current log file becomes full for the following reasons:
Back up the current log
Activate new blobspaces and blobspace chunks
Execute the command shown in Example 5-12 to switch to the next available log
file.
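A sketch of such a call:

EXECUTE FUNCTION task("onmode", "l");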
Suppose you are packaging the database server in your application and the
database server in each package should have the space configuration as shown
in Table 5-3.
dbspace1 40 MB
dbspace2 30 MB
sbspace1 50 MB
bspace1 50 MB
physdbs 40 MB
logdbs 40 MB
tempdbs 10 MB
$INFORMIXDIR/chunks/chunk1 in dbspace1 10 MB
$INFORMIXDIR/chunks/chunk2 in dbspace1 10 MB
The requirement might be to add the spaces after the server comes online. To do
this, develop an SQL script that creates a table that contains the name, type,
path, offset, and size of the dbspaces needed and another table that contains the
information about the chunks. Example 5-13 shows the SQL Administration API
functions to create the dbspaces and chunks.
Now create a UNIX script, as shown in Example 5-14, to initialize the server and
then execute the SQL script to create the specified dbspaces and chunks. The -w
option that is used with oninit makes the shell script wait until the server is
online. It saves you from writing additional code to check whether the server is
online before executing dbaccess.
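A minimal sketch of such a script, assuming the dbspace-creation SQL from the previous step is saved as create_spaces.sql:

#!/bin/sh
# bring the instance up and wait until it is online
oninit -w

# create the dbspaces and chunks defined in the SQL script
dbaccess sysadmin create_spaces.sql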
You can also perform more administrative tasks by using the SQL Administration
API functions. To see a full list of available SQL Administration APIs, refer to
Chapter 7 of the Redbooks publication Informix Dynamic Server 11: Advanced
Functionality for Modern Business, SG24-7465.
Only task properties, not configuration parameters, define what the Scheduler
collects and executes.
5.3.1 Tasks
A task is a means to execute a specific job at a specific time or interval. A task is
executed by invoking an SQL statement, a stored procedure, a C UDR, or a Java
UDR. It is a way to monitor the database server and take corrective actions as
needed.
In IDS 11, checkpoints are non-blocking. That is, transactions can make changes
to the data while the checkpoint is in progress. The server still blocks
transactions from making any updates, if it is short of critical resources, such as
physical log and logical log, to complete the checkpoint.
If you create this task at a time later than 5:00 a.m., it is executed immediately on
that day, but it executes at 5:00 a.m. on all subsequent days. You can also specify
a date in the tk_next_execution field if you want the task to start executing from
that day.
To check whether your scheduled task was executed successfully, query the ph_run
table by using the tk_id of the task in ph_task. In Example 5-17, tk_id is the
identifier of the task in the ph_task table that you want to check.
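A sketch of such a query, assuming a task ID of 5 and the ph_run columns run_task_id, run_time, run_duration, and run_retcode (column names can vary by version):

SELECT run_task_id, run_time, run_duration, run_retcode
  FROM ph_run
 WHERE run_task_id = 5
 ORDER BY run_time DESC;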
Example 5-18 Task to run the update statistics on the mydb database
INSERT INTO ph_task(
tk_name, tk_type, tk_group, tk_description, tk_execute,
tk_start_time, tk_stop_time, tk_frequency, tk_next_execution)
VALUES(
"UpdateStat_mydb", "TASK", "SERVER",
"Do update statistics on mydb every Sunday at 5am",
"database mydb; update statistics high;",
DATETIME(05:00:00) HOUR TO SECOND,
NULL,
"7 0:00:00",
"2007-11-04 05:00:00"
);
Improvements are expected in the next release that will provide even more
flexibility and power with this feature.
5.3.2 Sensors
Sensors are specialized tasks that are geared more toward collecting data.
Sensor data can be saved for status logging or later analysis. Sensors provide a
portable and simple way of collecting information without using OS calls, and
because they can automatically purge old data, they are useful for creating reports.
Example 5-19 on page 177 demonstrates the creation of a sensor to track the
number of sessions on the database server every 5 minutes and to delete the
data every day. This is one of the predefined tasks that is automatically created in
the sysadmin database.
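A rough sketch of what such a sensor definition might look like; this is not the exact predefined task, and the result table name is hypothetical:

INSERT INTO ph_task (
  tk_name, tk_type, tk_group, tk_description, tk_result_table,
  tk_create, tk_execute, tk_start_time, tk_stop_time,
  tk_frequency, tk_delete)
VALUES (
  "mon_num_sessions", "SENSOR", "SERVER",
  "Track the number of sessions every 5 minutes",
  "mon_num_sessions",
  "create table mon_num_sessions (ID integer, num_sessions integer)",
  "insert into mon_num_sessions select $DATA_SEQ_ID, count(*) from sysmaster:syssessions",
  NULL, NULL,
  "0 0:05:00",
  "1 0:00:00");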
Tasks and sensors can be used for diagnostic purposes. Suppose there is a
memory leak in your environment every day between 4 a.m. and 5 a.m. You must
determine the sessions that are active during that time and the total memory
used by these sessions to help determine the session that is causing the
memory leak. Example 5-20 shows how to create a sensor that collects this
information and stores it in a table, called sess_mem, every 60 seconds during
the time frame in question.
Important: The tk_create and tk_delete fields in the ph_task table are
enabled only for sensors. Even if you specify a value for these fields when you
create tasks, it will be ignored.
As rows are added to a table, the database server allocates disk space in units of
extents, which is a block of physically contiguous pages from the dbspace. Even
when the dbspace includes more than one chunk, each extent is allocated
entirely within a single chunk, so that it remains contiguous.
Because table sizes are not known, the database server cannot preallocate table
space. Therefore, the database server adds extents only as they are needed, but
all the pages in any one extent are contiguous for better performance. In addition,
when the database server creates a new extent that is adjacent to the previous
one, it treats both as a single extent.
Monitoring disk usage by table is particularly important when you are using table
fragmentation, and you want to ensure that table data and table index data are
distributed appropriately over the fragments.
You can also use startup tasks to turn on diagnostic flags at server startup time.
Suppose that you need to set AFDEBUG 5 minutes after the server starts.
AFDEBUG causes the engine to hang, instead of aborting, when an assertion
failure occurs, so that you can collect diagnostic information. Example 5-22
shows how to create a
startup task for this.
Example 5-23 Startup sensor to track the database server startup environment
INSERT INTO ph_task (
tk_name,
tk_type,
tk_group,
tk_description,
tk_result_table,
tk_create,
tk_execute,
tk_stop_time,
tk_start_time,
tk_frequency,
tk_delete )
VALUES (
"mon_sysenv",
"STARTUP SENSOR",
"SERVER",
"Tracks the database servers startup environment.",
"mon_sysenv",
"create table mon_sysenv (ID integer, name varchar(250), value
lvarchar(1024))",
"insert into mon_sysenv select $DATA_SEQ_ID, env_name, env_value FROM
sysmaster:sysenv",
NULL,
NULL,
"0 0:01:00",
"60 0:00:00" );
By default this feature is turned off, but you can turn it on for all users or for a
specific set of users. When this feature is enabled with its default configuration,
the database server tracks the last 1000 SQL statements that ran, along with the
profile statistics for those statements.
Be aware that the memory required by this feature is quite large if you plan to
keep historical information. The default amount of space required for SQL history
tracing is 2 MB. You can expand or reduce the amount of storage according to
your requirements or disable SQL history tracing.
The following information is an example of what the SQL trace output shows:
The user ID of the user who ran the command
The database session ID
The name of the database
The type of SQL statement
The duration of the SQL statement execution
The time this statement completed
The text of the SQL statement or a function call list (also called stack trace)
with the statement type
Statistics including:
– Number of buffer reads and writes
– Number of page reads and writes
– Number of sorts and disk sorts
– Number of lock requests and waits
– Number of logical log records
– Number of index buffer reads
– Estimated number of rows
– Optimizer estimated cost
– Number of rows returned
Database isolation level
Any user who can modify the $INFORMIXDIR/etc/$ONCONFIG file can modify
the value of the SQLTRACE configuration parameter and affect the startup
configuration.
The built-in SQL Administration API functions task() and admin() from the
sysadmin database provide the same functionality as the SQLTRACE
configuration parameter. However, setting or changing the tracing values by
using the API functions does not require you to restart the server. Tracing that is
enabled or disabled by using these API functions remains in effect only until the
engine is restarted. After the engine is restarted, the SQLTRACE setting from the
configuration file is used.
Do you know of a way to enable tracing when the database server starts without
using the SQLTRACE onconfig variable?
Tip: Use the Scheduler to enable tracing when the database server starts.
Setting adjustment: Adjust the trace settings to make the trace buffers big
enough on systems where there are many users.
Because the tracing data is stored in memory buffers, setting the tracing only to
required sessions gives you more control of the tracing memory usage.
When you notice a performance degradation, but are not sure what sessions to
trace, switch on the global mode of tracing. Monitor the sessions and SQL history
trace information to help identify the sessions that do not need tracing. You can
disable tracing at the session level by using the SQL Administration API
functions.
Example 5-24 specifies that the database server is to gather low-level trace
information for all sessions with default values. The server collects about 1000 KB of trace
information for 1000 SQL statements.
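A sketch of the corresponding onconfig entry (the size field is the per-statement trace buffer in KB):

SQLTRACE level=low,ntraces=1000,size=1,mode=global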
To enable low-level global tracing with default values when the server is online,
use the SQL Administration API shown in Example 5-26.
Example 5-26 SQL Administration API to switch on tracing with default values
dbaccess sysadmin -
Database selected.
1 row(s) retrieved.
Example 5-27 specifies that the database server is to gather medium-level trace
information for all sessions. It specifies to trace 3000 SQL statements and to
collect 4 KB of trace data for each SQL statement. Medium-level tracing gathers
more information than low-level tracing.
Example 5-28 shows how you can gather the same information by using the SQL
Administration API function of task().
Database selected.
1 row(s) retrieved.
After monitoring the system for a while, it is possible to determine which sessions
do not need tracing. Suppose that you decide that you want to disable tracing for
a particular session, for example session 20. Use the SQL Administration API
shown in Example 5-29 to disable tracing.
Database selected.
After enabling global tracing, you might decide that the informix user sessions
are acceptable. To disable tracing for all sessions of user informix, use the SQL
shown in Example 5-30.
When you know which user sessions must be traced, switch on the user mode of
tracing. You can monitor the sessions by a specific user or set of users to help
identify the sessions that are causing a performance bottleneck.
Example 5-31 shows how to enable user-mode tracing when the database server
starts. The actual tracing does not start until an SQL Administration API is
executed to indicate which user sessions to trace.
Database selected.
The tracing is switched on for all user sessions of usr1 and usr2 by using the
tracing values specified in the SQLTRACE onconfig parameter.
To start tracing a particular session, when the user-mode tracing is enabled, use
the SQL shown in Example 5-34.
The onstat -g his command shows the host variable values for a statement if
the tracing has captured them.
Example 5-35 shows the output of the onstat -g his command. It has been
truncated to show the trace information for one SQL statement.
Statement history:
Database: 0x1000B2
Statement text:
SELECT FIRST 1 {+first_rows} tk_id, (tk_next_execution -
CURRENT)::INTERVAL SECOND(9) TO SECOND::char(20)::integer as tm_rem FROM
ph_task WHERE tk_id NOT IN ( ?,?,?,?,?,?,?,?,?,? ) AND
tk_next_execution IS NOT NULL AND tk_enable ORDER BY tk_next_execution,
tk_priority
Iterator/Explain
================
ID Left Right Est Cost Est Rows Num Rows Type
2 0 0 8 1 16 Seq Scan
1 2 0 1 1 1 Sort
Statement information:
Sess_id User_id Stmt Type Finish Time Run Time
19 200 SELECT 16:14:06 0.0028
Statement Statistics:
Page Buffer Read Buffer Page Buffer Write
Read Read % Cache IDX Read Write Write % Cache
0 45 100.00 0 0 0 0.00
As you can see in Example 5-35, the onstat -g his output shows the following
information:
Trace settings information
– This information includes the trace level, trace mode, size of the buffer,
number of statements to trace, trace flags and control block. Some of
these are dynamically configurable using the SQL Administration API
functions.
– From the output shown in Example 5-35, you can see that it is a low-level
global trace for 1000 SQL statements. The buffer size is shown as 984
bytes, which is almost equal to 1 KB.
Statement information
– This information shows the statement text, the database name, iterator,
and explain information for the statement. From this information, it is
possible to determine the type of query, the user ID, session ID, the
estimated number of rows, and the type of scans done on the tables.
– The database name shows up correctly only for medium and high levels of
tracing.
Statement statistics
– This information is most important because it is required for
troubleshooting performance problems. It shows the time spent in page
reads, buffer reads, lock waits, and I/O waits.
– From the output in Example 5-35, you can see that there are zero page
reads and 45 buffer reads. Therefore, the percentage read from the cache
is 100%.
You can query these tables to get trace information related to a specific SQL
statement or SQL statements related to a specific user or session.
Suppose that you want to retrieve the SQL trace information for a particular
session, for example 19, to see the SQL statements executed by that session
and their statistics. Use the query in Example 5-36 to see the trace information
for session 19.
Example 5-37 shows the output for this query. The output has been truncated to
show just one row of data. If many SQL statements were executed in that session,
or if this query has been executed a number of times, the output can be long. In
that case, you can unload the output to a file and analyze it later.
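As a rough illustration of such a query, assuming the sysmaster table syssqltrace and its columns sql_sid, sql_runtime, and sql_statement (column names can vary by version):

SELECT sql_id, sql_runtime, sql_statement
  FROM sysmaster:syssqltrace
 WHERE sql_sid = 19;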
You can find instructions to install and configure the OAT in Chapter 2,
“Optimizing IDS for your environment” on page 15. In this section, we focus on
the different functionality of OAT.
Home is the first item in Menu. By clicking Home, you see the Map view on the
right side of the page.
Online Log displays the database server message log. The MSGPATH
configuration parameter specifies the full path name of the database server
message log. You should specify a path to an existing directory with an
appropriate amount of space available. If you specify a file name only in the
MSGPATH configuration parameter, the server creates the message log in the
working directory in which you started the database server. For example, if
you started the server from /usr/mydata on UNIX, the message log is written
to that directory. This is the log that the server uses to write all messages,
errors, and warnings.
Onbar Activity Log displays the ON-Bar activity log file. The BAR_ACT_LOG
configuration parameter specifies the full path name of the ON-Bar activity
log. You should specify a path to an existing directory with an appropriate
amount of space available or use $INFORMIXDIR/bar_act.log. Whenever a
backup or restore activity or error occurs, ON-Bar writes a brief description to
the activity log. The format of the file resembles the format of the database
server message log.
You can examine the activity log to determine the results of ON-Bar actions. If
you specify a file name only in the BAR_ACT_LOG parameter, ON-Bar
creates the ON-Bar activity log in the working directory in which you started
ON-Bar. For example, if you started ON-Bar from /usr/mydata on UNIX, the
activity log is written to that directory.
Alert List
ID    Type     Message                                         Time                 Recommendation     Alert State
7472  WARNING  TASK NAME UpdateStat_mydb LOCATION ERROR -201   2007-10-30 09:57:16  Re-Check / Ignore  NEW
               A syntax error has occurred.
For more information about the alert colors, refer to Table 5-4 on page 197.
You can selectively view the alerts depending on their Severity (color), Error
Type, or State.
Warning   From the database that was automatically addressed   A future event that needs to be addressed   A predicted failure is imminent. Action is necessary now.
Task List shows the different tasks and sensors that are created in the
Scheduler. This page shows the name, group, description, time of next
execution, and frequency of execution of each task in the ph_task table of the
sysadmin database. You can view this page to monitor the tasks that are
created in the database and to know the names of the existing tasks when
you are ready to create a new one.
On the Task List page shown in Figure 5-5 on page 197, you can see such
predefined tasks as mon_config_startup and mon_sysenv as discussed in
5.1, “IDS administration” on page 162. You can view all the tasks or
selectively view the tasks by using the Group to View drop-down list, which is
also shown in Figure 5-5.
Scheduler shows part of the information from the ph_task table about
scheduled tasks, such as start time, stop time, and frequency.
By clicking the Name column, you see detailed information about each
dbspace.
Chunk shows details about all the chunks in each dbspace, such as size,
page size, offset, free space, and used space. It also shows the reads and
writes in each of the chunks.
Usually the queries executed on a system are monitored to see if they take
longer than expected. If they do, they are analyzed further to see if they are
performing sequential scans, and then decisions are made regarding whether
to create indexes on the tables involved. By using the OAT SQL Trace
interface, you can monitor a particular type of query or all queries.
SQL Tool Box is a powerful menu in OAT and contains the following subitems:
Databases shows a list of databases (Figure 5-14). When you click a database,
the tables and SPL routines created in it are displayed. The tables can be
further examined to see the column details and the data.
SQL shows the SQL Query Editor like the example in Figure 5-16. You can
write an SQL statement or import from a file and execute it against any of the
databases in the server.
The Dashboard menu item shows a graphical view of the memory, space, locks,
and transactions in the server.
In this section, we examine how a DBA can take advantage of the different
components of the Database Admin System to solve a real-life problem. The
problem that we explore is how to remove users who have been idle for more
than a specified length of time, but only during work hours. Prior to the
Database Admin System, a DBA had to rely on several operating system tools,
such as shell scripts and cron. If this is a pre-packaged system, these new
scripts and cron entries must also be integrated into an installation script,
and all of it must be portable across the supported platforms.
With the Database Admin System, you only have to add a few lines to
your schema file and you are done. Because this is only SQL, it has the advantage
of being portable across different flavors of UNIX and Windows.
task_name The name of the task in the ph_task table associated with this threshold
We insert a new parameter into the ph_threshold table that can be updated to
reflect the current idle timeout period, saving the work of re-writing the stored
procedure if conditions change as shown in Example 5-39. We then allow the
OAT to display this as a configurable item for the task.
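A minimal sketch of such an insert follows. The exact column list of the ph_threshold table, the task name, and the 60-minute value used here are assumptions, not the literal Example 5-39:

-- Sketch: register an IDLE TIMEOUT threshold for the idle-user task
INSERT INTO ph_threshold
    ( name, task_name, value, value_type, description )
VALUES
    ( "IDLE TIMEOUT", "Idle User Timeout", "60", "NUMERIC",
      "Maximum number of minutes a session may remain idle" );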
Next we retrieve the thresholds from the ph_threshold table. We do this by using
a simple select and casting the result into our desired data type (an integer). See
Example 5-41.
Example 5-41 Select statement to retrieve the threshold from ph_threshold table
SELECT value::integer
INTO time_allowed
FROM ph_threshold
WHERE name = "IDLE TIMEOUT"
The main part of our stored procedure is the SELECT statement (Example 5-42
on page 210) to find all users who have been idle for more than a specified
number of minutes. From the systcblst table, we select the last time a thread has
executed on a virtual processor. If this time is longer than our predetermined idle
threshold and this thread is an sqlexec thread (that is, not a system thread), then
we pass the session ID (sid) to the SQL Administration API function admin(). The
admin() function has been set up to call onmode -z to terminate a session.
Alert List
ID    Type     Message                                                   Time                 Recommendation     Alert State
6336  INFO     User suma@yogi sid(689) terminated due to idle timeout.   2007-10-26 13:19:29  Re-Check / Ignore  ADDRESSED
6294  WARNING  TASK NAME UpdateStat_mydb LOCATION ERROR -201             2007-10-26 09:57:16  Re-Check / Ignore  NEW
               A syntax error has occurred.
Figure 5-17 Show Alerts page showing that session 689 has been terminated
{*** Find all users who are idle longer than the threshold ***}
FOREACH SELECT admin("onmode","z",A.sid), A.username, A.sid,
hostname
INTO rc, sys_username, sys_sid, sys_hostname
FROM sysmaster:sysrstcb A , sysmaster:systcblst B,
sysmaster:sysscblst C
WHERE A.tid = B.tid
AND C.sid = A.sid
AND lower(name) in ("sqlexec")
AND CURRENT - DBINFO("utc_to_datetime",last_run_time) >
time_allowed UNITS MINUTE
AND lower(A.username) NOT IN( "informix", "root")
tk_type The type of task (TASK, SENSOR, STARTUP TASK, and STARTUP
SENSOR).
tk_group The name of the group to associate the task with, for organization
purposes. See the tk_group table for more details.
By extending the data server with new data types, functions, and
application-specific structures, developers build solutions that have the following
characteristics:
Take advantage of data models that closely match the problem domain
Depart from strict relational normalization to achieve better performance
Implement powerful business logic in the data tier of the software stack
Handle new types of information
We call these solutions robust because the elegance and economy of their
architecture give them higher performance, maintainability, responsiveness, and
flexibility in the face of changes in environment, assumptions, and requirements.
Here, we introduce object-relational extensibility, which is the concept that
makes it all possible. We also show how the Informix Dynamic Server (IDS) has
taken this key ingredient farther than any other data server, putting the full power
of customization in the hands of developers and users, to enable customization
of IDS for your environment.
While relational database management systems (RDBMS) have long allowed for
scripting through stored procedures, they are mostly large, closed, monolithic
programs with no facility for adding new capabilities. IDS has the well-crafted
architecture, tools, and interfaces to make it an effective container for
components big and small, allowing it to respond to any data management
challenge. Instead of the plugins, add-ons, and add-ins of other software, for IDS
we use the term DataBlade or, more generically, extension. However, these are
basically different words for the same idea: software components.
A problem arises when such code must connect to a database. That is, the
relational model forces us to map the problem structure to a schema consisting
of tables with visible columns whose types come from a limited set of
fundamental alphanumeric types, whose connections are encoded with primary
and foreign keys, and whose general structure is a poor match with the
object-oriented picture.
Figure: A point (x,y) in the two-dimensional X-Y coordinate plane with origin O
In a relational table, you can easily represent this, as shown in Example 6-1.
Figure: The distance d from a point (x,y) to the point (15,12): d = sqrt((x - 15)^2 + (y - 12)^2)
id name distance
1 one 5.38516
... ... ...
Even better, we can write user-defined routines (UDRs) in SPL, C, or Java that
make such queries easier to express and more self-documenting. We can also
put the implementation in one place to be used by every client application. For
the sake of simplicity, Example 6-3 uses SPL to implement a distance function.
The output is the same as in Example 6-2, and therefore, it is not repeated here.
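For illustration, such an SPL function might look like the following minimal sketch; the actual Example 6-3 may differ in its details:

-- Sketch: distance between points (x1,y1) and (x2,y2)
CREATE FUNCTION Distance(x1 FLOAT, y1 FLOAT, x2 FLOAT, y2 FLOAT)
RETURNING FLOAT
WITH (NOT VARIANT);
    RETURN SQRT((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
END FUNCTION;

With the relational sites table of Example 6-1 (assuming its coordinate columns are named x and y), a query such as SELECT id, name, Distance(x, y, 15, 12) AS distance FROM sites ORDER BY 3 then produces the output of Example 6-2.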
Still, the problem is that, for nearly all purposes, x and y only have meaning when
used together. Individually, they mean (almost) nothing. Moreover, the distance
function takes four FLOAT arguments. But what is the relationship between them:
is it x1-y1-x2-y2 or x1-x2-y1-y2? It can quickly become confusing if we add
more arguments, for example, for working in three dimensions or to carry along
information about the coordinate system for each location.
CREATE TABLE sites (id INT, name VARCHAR(40), ..., location Point);
INSERT INTO sites VALUES (1, 'one', Point(20, 10));
...
SELECT id, name, Distance(location, Point(15, 12)) AS distance
FROM sites ORDER BY 3;
This still puts too much of the implementation (the BETWEEN predicates) in the
application. To avoid this, we can define a new type, called Box. It represents a
rectangular area whose edges are parallel to the coordinate axes, defined by its
southwest (or lower-left) and northeast (or upper-right) corners, as shown in
Figure 6-3.
Figure 6-3 A Box defined by its southwest (sw) and northeast (ne) corner points, with edges parallel to the X and Y axes
Example 6-6 A window query using a Box data type and Within predicate function
CREATE ROW TYPE Box (sw Point, ne Point);
CREATE FUNCTION Box (sw Point, ne Point) -- Constructor function
RETURNING Box -- for better syntax
WITH (NOT VARIANT)
RETURN ROW(sw, ne)::Box;
END FUNCTION;
All of this is an improvement from a problem modeling perspective, because it is
closer to the way we think of maps and coordinate systems, with points,
distances, and rectangles.
The spatial column has the type Point, and the Distance function simply takes
two Point arguments. For a query that selects on points that lie within a given
rectangular window, we use a Box data type and apply a Within predicate.
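For example, such a window query might look like the following sketch; the argument order of Within and the coordinates used are assumptions:

-- Sketch: select the sites that lie within a rectangular window
SELECT id, name
  FROM sites
 WHERE Within(location, Box(Point(10, 5), Point(20, 15)));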
Further, we have created reusable functions, Distance and Within. These
functions give the significant benefit of having a single implementation of our
logic, in the part of the software stack that is closest to the data. Applications can
use them, and any changes and enhancements apply to those applications
without difficult roll-out and installation of updates.
The main problem with row types, aside from performance limitations, is that they
offer no encapsulation. The implementation of the Point data type as two FLOAT
elements is visible for all to see, and likewise for the Box type. In effect, row types
and their elements are merely a twist on tables and their columns. Row types
tend to be inextricably tied to the data model and database schema for which
they were originally developed. This means that application code can access the
elements of the row type directly (for example, location.x), which makes it
impossible for the implementation to evolve with changing requirements.
For example, what if our experience of using the data types that we developed in
the previous examples suggests any of the following improvements:
Compress the coordinate values in Point, currently two FLOAT elements taking
16 bytes, when storing them on disk.
Perhaps there is not much to be gained in a single point value, but you can
see that it can make a big difference for geometric shapes with many points,
such as lines and polygons.
Use INTEGER coordinates.
For performance and algorithmic robustness, some spatial software packages
internally use integer, not floating-point coordinates.
Implement a Box as (southwest corner, width, height) instead of
(southwest corner, northeast corner).
This is largely a matter of taste, but sometimes one implementation can be
more efficient than the other.
Implement a Box as (center point, width, height).
For some map display software, the center of the window is a much more
important location than the corners. Therefore, it might make sense to reflect
that in the implementation.
Implement a Box as (center point, radius).
This might look like a mistake, but in some situations, it makes little sense to
talk about rectangles, while searching over a circular area is highly efficient.
Yet mapping applications expect to deal with boxes.
Indexing: For all but the simplest extensions, it is important to support any
new data types with an index. Good data modeling is helpful, but offers little
practical value if the data server cannot execute queries quickly. If the
information represented by a new data type does not lend itself to B-tree
indexing (because it cannot be ordered numerically or alphabetically), a new
secondary access method must be added by using the Virtual Index Interface
(VII). The only realistic way to implement an access method is to use a tight,
procedural language such as C. Therefore, it is not a big step to use C for the
type itself.
Let us go back to the example with Point and Box. Figure 6-4 shows how the
principle of encapsulation applies in this context. The specific implementation of
each type as a C structure with supporting functions (functions not shown) is
hidden inside a box whose border is opaque. Only the public interfaces (UDRs)
are visible to the developer or user.
Figure 6-4 Example of Point and Box as opaque data types with encapsulation
Multiple implementations for each are possible, hidden from the application code.
Only the public methods or interfaces, indicated by the ball-on-stick symbols, can
be used by the application code. Many alternative interfaces can be published,
regardless of actual implementation.
A delivery service
Imagine a delivery service that has overnight, second-day, and ground (five days)
service and delivers only on business days. Simple business logic can benefit
from a variation on the DATE type that properly differentiates between business
days and non-business days.
The BusDate UDT and its BusInterval companion are conceptually simple types
that can have a big impact on application consistency and simplicity. Most
systems implement the kind of logic outlined previously in application code.
However, such code must be reused or re-implemented consistently in many
applications, which is a much more complex challenge than making it available
once, in the data server.
This is not to say that the implementation itself is trivial. Knowing about holidays
requires the maintenance of a calendar table, which must be accessible for
localization purposes and annual updates. However, that only amplifies the point.
It is better to maintain the calendar in the database, and run the computations
that use it in the data server. This is preferred to having the calendar in multiple
applications or requiring those applications to retrieve the information from the
database for their computations.
Finally, because we are discussing a fairly simple twist on a built-in data type, it is
possible to gain most of the benefits of extensibility without implementing entirely
new types. All it takes is some UDRs that operate on normal DATEs:
A check constraint on IsBusinessDay(pick_up_date) validates entries.
NextBusinessDay(pick_up_date, 5) computes estimated_delivery_date.
BusinessDays(actual_delivery_date, estimated_delivery_date) calculates
a delivery delay.
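A minimal sketch of how these UDRs might be applied follows. The column names come from the list above; the shipments table name, the id column, and the exact constraint placement are illustrative assumptions:

-- Sketch: applying the business-day UDRs to ordinary DATE columns
CREATE TABLE shipments
(
    id                      SERIAL,
    pick_up_date            DATE,
    estimated_delivery_date DATE,
    actual_delivery_date    DATE,
    CHECK ( IsBusinessDay(pick_up_date) )      -- validates entries
);

-- Ground service: five business days after pickup
UPDATE shipments
   SET estimated_delivery_date = NextBusinessDay(pick_up_date, 5);

-- Delivery delay, in business days
SELECT id,
       BusinessDays(actual_delivery_date, estimated_delivery_date) AS delay
  FROM shipments;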
In some cases, it is obvious that a new UDT is required. In others, anything but a
UDR or two is clearly overkill. Sometimes, however, it is a matter of preference.
The next example takes all its power from a single UDR.
Assume that we have a table called BC_members, that records the contact
information and date of birth (in a DATE column called dob) for all members. The
logic behind the mailing is simple enough. That is, once a week, run a query
(shown in Example 6-8) to retrieve names and addresses for all BC members
whose birthdays are coming up in a week, which is between 8 and 14 days from
now.
Unfortunately, this query does not give quite the results we want, because those birth
dates have years attached to them, and none of them will be in the future. In
reality, we want to use the birth days (without the year), not birth dates, since we
really do not care how old the members are. Of course, for a different business
process, such as purging a kids-only club of members who have become adults,
the year may be significant.
In this case, a single UDR might provide the solution. NextBirthday(dob DATE)
returns a DATE value that is the actual date, in the current or the next year, of the
next birthday for someone born on the given date. This is not simply the original
date of birth stripped of its year and extended again with the current year, as in
the following expression:
EXTEND(EXTEND(dob, MONTH TO DAY), YEAR TO DAY)
In contrast, this function makes sure that any birthday earlier on the calendar
than today is placed in the following year. Perhaps more interestingly, it also
ensures that any leap-day date of birth (February 29) is mapped to an actual
calendar date: February 28 in non-leap years and February 29 in leap years.
Example 6-9 shows the new weekly mailing query and examples of the output of
NextBirthDay.
If TODAY is 07/15/2006:
NextBirthday( '10/24/1991' ) returns 10/24/2006
NextBirthday( '04/12/2000' ) returns 04/12/2007
NextBirthday( '02/29/1996' ) returns 02/28/2007
If TODAY is 07/15/2007:
NextBirthday( '02/29/1996' ) returns 02/29/2008
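One possible SPL sketch of NextBirthday, consistent with the behavior shown above, follows. It is an illustration only, not the implementation discussed in the text:

CREATE FUNCTION NextBirthday(dob DATE)
RETURNING DATE;
    DEFINE yr INTEGER;
    DEFINE dd INTEGER;
    DEFINE candidate DATE;

    LET yr = YEAR(TODAY) - 1;
    LET candidate = TODAY - 1;   -- force at least one loop iteration

    WHILE candidate < TODAY
        LET yr = yr + 1;
        LET dd = DAY(dob);
        -- A February 29 birth date falls back to February 28 in non-leap years
        IF MONTH(dob) = 2 AND dd = 29 AND MONTH(MDY(2, 28, yr) + 1) = 3 THEN
            LET dd = 28;
        END IF
        LET candidate = MDY(MONTH(dob), dd, yr);
    END WHILE

    RETURN candidate;
END FUNCTION;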
6.2.3 Fractions
A cadastre is an agency that records units of land and their ownership. This
involves maintaining both the legal description and geographic survey of each
parcel (the map), as well as the ownership and other rights (such as mineral or
grazing rights) that apply to the parcel, including the recording of ownership
transactions. Naturally, a cadastre requires industrial-strength systems and
software for managing legal information, transactions, and spatial surveys and
maps. A small industry segment of IT vendors specializes in these types of systems.
Some cadastral departments are faced with peculiar requirements. For example,
one department, an IDS user, received complaints about the apportionment of
ownership (and the property tax liability that goes with it) when a parcel had not
one, but many owners. This arises frequently when a parcel is owned by a family
and passed down through the generations. The problem stemmed from the use
of the DECIMAL (or NUMERIC) type, commonly used in databases of all types,
to represent the ownership fraction.
FLOAT and REAL: The floating-point data types FLOAT and REAL, being
approximate numeric types, are generally avoided in databases unless they
represent data of a scientific or physically measured nature, such as the
geographic coordinates discussed in 6.2.1, “Coordinates” on page 222.
If you see the idea behind this problem and appreciate its importance in a
real-world situation as well as the relief that a well-chosen UDT can bring, you
may want to skip ahead to 6.3, “Denormalization for performance and modeling” on
page 242. However, if you need a little more convincing or want a better feel for
the specifics of a rational-number type, read on.
oid  value          rate
1    $1,000,000.00  0.015
4    $789,243.16    0.015
The parcels’ ownership is recorded in the Ownership table (Example 6-10 and
resulting Table 6-2). Each owner’s interest in each parcel is a fractional amount
(up to 1), recorded (for illustration purposes) as both a DECIMAL(5,3) number and
a fraction.
oid  owner  interest_dec  interest_fr
1    A      1.000         1/1
4    B      1.000         1/1
Example 6-11 Internal representation and SQL definition of the Fract data type
typedef struct /* Internal representation in C */
{
mi_integer numerator;
mi_integer denominator;
}
Fraction;
An important design choice is implicit in this structure. The domain is the set of
rational numbers m/n, with m and n as whole numbers (integers) representable
by 4-byte signed integers, that is, in the range –2,147,483,647 to 2,147,483,647.
Alternatives are 8-byte (mi_int8) or short (mi_smallint) integers.
In general, with IDS, you can use any name you choose. However, the SQL
parser might return an error in some situations if the name matches a
keyword, unless it is used as a delimited identifier (enclosed in double
quotation marks; environment variable DELIMIDENT must be set). It is better to
avoid any potential for collision and not use keywords. For more information
about keywords, refer to the IBM Informix Guide to SQL: Syntax,
G229-6375-01.
For an opaque type, the support functions that convert between the type’s
internal and external representations are identified by casts. In this case, a few
additional casts and constructors are useful to connect the Fract type to the other
numeric types:
IMPLICIT CAST (Fract AS DECIMAL) turns 3/8 into 0.375 and lets any fraction
participate easily in numeric expressions as in the following example:
SELECT o.interest_fr*p.value*p.tax_rate FROM ...;
IMPLICIT CAST (INTEGER AS Fract) turns 7 into 7/1, which is convenient in
the special case of whole values that must be treated as fractions.
FUNCTION Fract(numerator INTEGER, denominator INTEGER) RETURNING
Fract is a useful constructor from the integer numerator and denominator
values, without going through a text representation. Fract(3, 6) returns 1/2.
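The SQL registration of these casts and the constructor might look like the following sketch; the support function names and the shared-library path are assumptions:

-- Sketch: connect Fract to the built-in numeric types
CREATE IMPLICIT CAST (Fract AS DECIMAL WITH FractToDecimal);
CREATE IMPLICIT CAST (INTEGER AS Fract WITH IntegerToFract);

-- Constructor from numerator and denominator: Fract(3, 6) returns 1/2
CREATE FUNCTION Fract(numerator INTEGER, denominator INTEGER)
RETURNING Fract
WITH (NOT VARIANT)
EXTERNAL NAME "$INFORMIXDIR/extend/fract/fract.bld(fract_from_ints)"
LANGUAGE C;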
Finally, the Fract type needs all the appropriate functions to make it as usable as
any numeric type:
Relational operators equal (=), notequal (<> and !=), greaterthan (>),
greaterthanorequal (>=), lessthan (<), and lessthanorequal (<=)
automatically associated with their operator symbols
A compare function, for server operations such as sorting and indexing
Arithmetic operators plus (+), minus (–), times (*), divide (/), and negate
(unary –)
Additional algebraic functions Abs (absolute value) and Recip (reciprocal)
(others are possible)
Important for this application are the arithmetic operations, which preserve the
rational nature of the result, as shown in Example 6-12.
decimal 0.20833333333333
fraction 5/24
Table 6-3 Owners’ tax bills for initial state of ownership table
owner num_parcels parcels ownertax_dec ownertax_fr
At this point, with each owner owning the entire parcel outright, there is no
difference between the tax bills computed by using decimal or fractional interest.
Likewise, we can compute for each parcel the total owners’ interest and tax
assessed as shown in Example 6-14.
Example 6-14 Computation of total tax and owners’ interest per parcel
SELECT
o.oid AS parcel,
Sum (o.interest_dec) AS interest_dec,
Sum (o.interest_fr) AS interest_fr,
Sum (o.interest_dec*p.rate*p.value) AS totaltax_dec,
Sum (o.interest_fr*p.rate*p.value) AS totaltax_fr
FROM
ownership o, parcels p
WHERE
p.oid = o.oid
GROUP BY
o.oid
ORDER BY
o.oid;
Table 6-4 Total owners’ interest and tax per parcel for initial state
parcel interest_dec interest_fr totaltax_dec totaltax_fr
As expected, the total of all owners’ interest in each parcel is exactly 1, and the
total tax assessed for each parcel is the same regardless of how the owner’s
interest is recorded.
As shown, the total tax for parcel 4 has not changed. However, due to
rounding to three digits of the ownership interest, D is paying more than
necessary, while B is paying less by the same amount. Of course, whether
rounding really affects the end result depends on the number of digits kept
(the DECIMAL(m,n) column declaration), the parcel value, and the tax rate.
Table 6-6 Owners’ tax bills for after subdividing parcel 1 and selling off a part
owner num_parcels parcels ownertax_dec ownertax_fr error
While the total for parcel 4 is correct in both cases, masking the overpayment of
one owner as it is compensated by the underpayment of another, the total for
parcel 1 shows a discrepancy. That is, the total owners’ interest computed by
using decimal values is greater than 100%, and the total tax for that parcel is off
by $60. This is clearly a nonsensical situation. The actual tax error depends
on the specifics of the parcel and the number of digits recorded, but with
rounded decimal fractions it cannot be avoided entirely.
A national cadastre had this problem, was under a legislative mandate to solve it,
and did so by extending the database with a new data type for fractions. How
many databases around the world suffer from the same inaccuracies, with real
financial consequences?
12C34 1 Developer
12C34 2 Manager
5G678 1 Clerk
5G678 2 Administrator
5G678 3 Manager
The structure shown in Example 6-15 on page 242 and Table 6-8 violates First
Normal Form (1NF) due to the non-atomic jobs_held column. One of the
reasons this is undesirable in a relational model is that it makes it difficult to ask
questions such as, “Which employees have management experience?”, or, more
generally, to work with the individual jobs in SQL.
Figure 6-6 A sample line shape with vertices (3,2), (5,9), (12,6), (8,3), and (1,7)
In applications, it is highly unlikely that we want to find all lines that have a vertex
at (21,6). Instead, we want to find lines that cross a given line, intersect or are
contained within a given region, lie within a given distance of a given point, and
so on. Each of these possibilities requires examining the entire line and, more
importantly, considering all points lying on the line segments that connect the
vertices. That is, the natural object that matches the problem domain most
closely is the whole line, not the individual vertex.
In a strictly relational data server, with no ability to create extended types, this
presents a problem. Because a line can have an arbitrary number of vertices
(two or more), we cannot resort to separate columns for the individual X and Y
coordinates in the main table.
Table 6-11 shows the sample main line table for Figure 6-6.
Table 6-12 shows the sample auxiliary line coordinate table for Figure 6-6.
117 1 3 2
117 2 5 9
117 3 12 6
117 4 8 3
117 5 1 7
This normalized representation is perfectly correct and can work, but it has two
serious defects. First, it forces any application or database procedure to
reassemble each line from its individual coordinate rows, potentially thousands or
even millions of them, before it can work on any spatial expression or process.
This hugely complicates all code and makes it difficult to determine the solution
when examining the schema and the application logic. This is a classic
consequence of the mismatch between the application object domain and the
relational model.
With extensibility, the right design is easy to see. (How you implement a
variable-length data type that can get arbitrarily large is another matter, which we
are not concerned with here). An opaque UDT, Line, can represent the entire
shape as a single column value, as in Example 6-18 and Table 6-13.
The results of the lines table creation in Example 6-18 are depicted in Table 6-13.
It is the sample line table, lines, with UDT (arbitrary text representation shown).
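The table definition sketched below illustrates the idea; apart from the lines table and its shape column of type Line, which are named in the text, the other column names are assumptions:

-- Sketch: one Line value holds the entire shape
CREATE TABLE lines
(
    id    INTEGER,
    name  VARCHAR(40),
    shape Line
);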
With the Line data type, we incur the row and column overhead only once for
each line. We also have the ability to create a spatial index on the shape column
and apply predicates and functions, such as Within and Distance (see
Example 6-7 on page 229) to formulate spatial queries. Best of all, the schema is
simple and matches how we think of geometric shapes.
Fundamentally, the encapsulation provided by the opaque data type renders the
value atomic from the point of view of the logical schema. There is no difference
in principle between a Line value that contains an array of ordered pairs and, say,
a VARCHAR value that contains an array of characters. Nor is there a difference
between such a Line value and a FLOAT that contains 64 bits in several groups
(sign, mantissa, exponent) with a specific meaning.
The next example is less obviously a modeling improvement and more purely a
performance trick.
Figure 6-7 IBM and ORCL share prices, 3rd quarter 2007 (daily prices from 2-Jul through 24-Sep)
The relational and, with a few specialized exceptions, most common way to
handle a time series is to give each observation or element its own row.
Depending on the number and volatility of sources (such as telemetry devices
and stock symbols), each source can have its own table, or all sources of the
same type of information can share a single table.
Table 6-14 shows the first three days’ worth of data from the two time series of
Figure 6-7 on page 248.
For moderate amounts of data, this works fine. But as the table grows, and
especially as the rate at which the data comes in accelerates, the overhead of
recording each sample in its own row hurts the server’s ability to keep up and to
return query results quickly. Figure 6-8 on page 250 shows the conceptual data
volume for a database managing 3,000 stock symbols over 24 years, tracking 65
different quantities or variables, such as open, close, high, low, and volume.
Figure 6-8 Volume of rows in a relational representation of share price time series: one row per (symbol, timestamp, variables…), for 3,000 securities, 24 years @ 250 days of data, and 65 variables
In contrast, assume that we have a TimeSeries data type that, like the Line type of
6.3.1, “Line shapes” on page 244, can hold a data array of arbitrary length.
Unlike the Line type with its fixed (X,Y) coordinate pair elements, it can handle
array elements of any structure, as long as that structure has been defined as a
row type. The table now looks something like the one represented by
Example 6-20 and Table 6-15 on page 251.
Important: What follows is loosely based on the design and capabilities of the
IBM Informix TimeSeries DataBlade product, but it only discusses the general
principles, not the specific implementation of the DataBlade. Do not use this
book as a reference for the product or draw conclusions from this discussion
about its behavior, capabilities, or implementation.
Example 6-20 Table for share prices using time series UDT
CREATE ROW TYPE DailyStats
(
market_date DATE,
high MONEY,
low MONEY,
close MONEY,
volume INTEGER,
...
);
CREATE TABLE share_prices_ts
(
symbol CHAR(6) PRIMARY KEY,
history Timeseries( DailyStats )
);
Now the data quantity, in terms of rows, looks like a two-dimensional stack as in
Figure 6-9, rather than the three-dimensional volume of Figure 6-8 on page 250.
Here the total number of rows is 3,000. Each row contains two columns. Symbol
and time stamp are not repeated for each element in the time series array.
Figure 6-9 Stack of rows in UDT-based representation of share price time series: 3,000 securities, each row holding a symbol and a Timeseries(DailyStats) column covering 65 variables, 24 years @ 250 days
In the next section, we briefly discuss how the time series idea can be
generalized to other areas.
IDS already has a collection type, LIST, that can manage ordered sets of arbitrary
element structures. It can even manage lists of elements that are themselves
LIST objects, which would simulate a multidimensional array. However, the
collection types were implemented as a simple extension to SQL to aid the
management of small collections of manually entered values, not arbitrarily large
arrays of automatically collected information. In addition, their semantics are
limited and not tailored to specific applications such as statistical analysis.
Real problems occur, however, when a solution requires queries that involve
more than one of these unusual types of information. Consider a simple query for
a location-based service, for example, in which we search for restaurants that
serve a specific dish and are located in a given search area. Let us say that our
restaurants table has the columns menu, which contains XML or PDF documents
that can be searched through a text index, and location, a spatial point whose
location with respect to a search area can be checked through spatial predicates.
In this case, the query looks as shown in Example 6-21.
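A sketch of such a query follows; the Contains and Within signatures and the way the search area is supplied are assumptions, not the literal Example 6-21:

-- Sketch: restaurants that serve Peking Duck within the 5 km search area
SELECT r.name
  FROM restaurants r
 WHERE Contains(r.menu, "Peking Duck")
   AND Within(r.location, :search_area);   -- 5 km-radius area bound by the client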
Assume that the table contains a million restaurants, 5000 serving Peking Duck,
and 100 that are in a 5 km-radius search area. Only two restaurants both serve
Peking Duck and are in the search area. Ideally, the optimizer has enough
information to determine that the spatial Within predicate is more selective, runs
an index scan on the spatial index, and applies the text Contains predicate as a
filter on the 100 rows in the intermediate result set.
The data server performs the least possible amount of work and returns only the
desired results to the client in response to a single query. Even if the situation is
not ideal, performance suffers only a little. Most text search implementations
require an index, in which case, it is not possible to apply the Contains predicate
as a filter. The text index must be used for the primary index scan, and the Within
predicate will be applied as a filter. In our example, this means that the
intermediate result set has 5000 rows, not 100. Still, the data server can handle
this sort of situation efficiently and only returns the two-row final result to the
application.
Figure: The middleware alternative, in which the application issues separate spatial and text search queries against the 1,000,000-row restaurants table (the 5 km spatial search returns 100 rows, the Peking Duck text search returns 5,000 rows) and merges them into the 2-row final result
The application must formulate two separate middleware requests and send one
(1) to the spatial and the other (2) to the text search middleware. Each
middleware process submits its own translated query to the database (3, 4),
some of which might involve access to local, proprietary index files (4a), and
retrieves the rows found (5, 6). The application receives the results returned by
the middleware processes and merges them into an intersection (7), keeping
only those rows that occur in both result sets. To reduce network load, the
application might request only primary key results on the first pass and issue a
third query (8) to retrieve the full row contents for the final result set (9).
Thus, the need for a data server platform with the ability to support any kind of
business logic never goes away. And extensibility, of course, is the key ingredient
that makes this ability truly powerful and universally flexible.
Stored procedures written in SPL have long been used for this kind of server-side
processing. However, SPL is limited as an algorithmic language, and performance
for computationally intensive steps is not as good as in a mainstream procedural
language such as C. Moreover, the set of standard SQL data types is limiting for
many problem domains. Often, this has forced processing outside the database.
Extensibility simply means that the limitations no longer apply. Business logic can
now be implemented in the software tier where it makes the most sense, not
where the limitations of traditional SQL and SPL force it.
Enter the Virtual Table Interface, a specification and interface that gives
developers the power to make anything look like a regular table, including such
diverse objects as the following types:
A hydrological stream gauge
An operating system’s file system
An in-memory structure for ingesting high-volume, real-time data streams
Another table that contains a time series column
Nearly identical to the VTI in interface and internal mechanism is the Virtual
Index Interface, which has a different purpose. With the VII, developers can
create new indexing methods to manage those new types of data for which the
traditional B-tree index is inadequate. That is, what VTI does for primary access
methods (tables), VII does for secondary access methods.
Like UDTs and UDRs, user-defined access methods are an important tool in the
solution developer’s toolbox. They convey the same benefits of performance,
simplicity, flexibility, and so on to the overall solution. Many database products
support views, database federation, and external data sources, which to one
degree or another do some of what VTI can do. Likewise, many database
products support the extension of the B-tree index to other types of data, as long
as that data type can be mapped to a linear value domain. Only IDS, however,
has VTI and VII, which like opaque data types, represent the extra level of power
and performance that are required to turn the underlying concept from an
interesting research topic to a practical mechanism for robust solutions. We
discuss VTI and VII further in Chapter 10, “The world is relational” on page 401.
Stock trades
In the previous time series example, we kept track of daily market data, which is a
cumulative measure of all the individual trades that go on during a day. But what
if we want to record and analyze all those individual transactions, as soon as they
come in on one of the proprietary financial-data feeds?
It was for just this application that the Real-Time Loader DataBlade, which works
specifically with the TimeSeries DataBlade, was developed. It employs tricks
such as shared-memory storage of incoming data, making that in-memory data
visible to queries as soon as it arrives.
GPS-enabled devices
Cars and trucks, PDAs and mobile phones, jets and missiles: pretty much all
moving and movable things now locate themselves by using a global
positioning system (GPS) and report their changing position as they move.
Imagine tracking the millions of subscribers, updated every few seconds, for a
mobile phone service provider. In many ways, it is exactly the same problem as
that of the stock trades of the previous section, and the same tricks apply. We
can keep up with hundreds of thousands of moving objects whose position is
updated every second. This includes not only updating each object’s position as
the new coordinates come in, but querying the new configuration for spatial
relationships to detect conditions such as objects on a collision course, objects of
a specific description being closer than a specified distance, or objects entering
or leaving a specified area.
Another way of expressing the rule from the previous paragraph is that the data
server’s first priority is to keep your data safe and preserve its integrity, and then
to let you find the data you are looking for with speed and precision. This has
strong implications for deciding what is appropriate for a DataBlade. For
example, a spatial data type, with spatial predicate functions (for example,
Within) supported by a spatial index, can help quickly find rows of data by spatial
criteria. Alternatively, we can implement a DataBlade that adds sophisticated
image processing functionality (Fourier Transforms, filters, multispectral
classification) to the data server, but why burden a data management program
with this kind of highly specialized, CPU-intensive computational load?
When the novelty of the DataBlade idea prompted many to propose DataBlades
for everything, one question was used to separate the promising ideas from the
merely curious and the bad ones: “Does it go in the WHERE clause?” Not every
function that fails to be an index-supported predicate is useless, but the question
forces us to think a little harder before proposing an extension that does not
come with index-supported query capability.
Naturally, there will always be gray areas. Sometimes it is convenient to find the
same functionality in the data tier as in other tiers of the software stack. But in
general, processing that does not help the server find data quickly is better left to
another tier.
With this, the case for extensibility should be clear. The rest of this book looks at
specific applications of extensibility and the details of some of the facilities and
tools that IDS provides.
Finally, IDS defines two built-in functions that return the current date value.
CURRENT returns a DATETIME value, and TODAY returns today's date.
(constant)
09/02/2092
1 row(s) retrieved.
If DBCENTURY is set to P, the inferred century is the one that yields the closest
date preceding the current date, as described in Example 7-2.
(constant)
09/02/1992
1 row(s) retrieved.
order_date
2007-05-20
1 row(s) retrieved.
For a more elaborate date display, you can use the TO_CHAR function and
provide a format as previously described and shown in Example 7-4.
1 row(s) retrieved.
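For illustration, a TO_CHAR call of this kind might look like the following sketch; the format string and the table used here are assumptions:

-- Sketch: format order_date as, for example, "Wednesday 20 May 1998"
SELECT TO_CHAR(order_date, "%A %d %B %Y") AS order_day
  FROM orders
 WHERE order_num = 1001;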
You can use some of the provided functions to extract values, such as the month
and the day, and use those values when defining table fragmentation by
expression. You can also use them for grouping in SQL statements. For example,
to determine how many orders you have per month, you can use the statement
shown in Example 7-5.
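Such a statement might look like the following sketch; the table and column names follow the stores demonstration database and are assumptions here:

-- Sketch: number of orders per month
SELECT YEAR(order_date) AS yr, MONTH(order_date) AS mon, COUNT(*) AS num_orders
  FROM orders
 GROUP BY 1, 2
 ORDER BY 1, 2;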
This type of grouping can be useful in all sorts of reporting. You can do much
more if you are willing to take advantage of the basic extensibility features of IDS.
With this wrapper function, you can create an index such as the one shown in
Example 7-7.
Then you can use an SQL statement such as the one shown in Example 7-8 that
takes advantage of the index.
You can also use the function in an EXECUTE FUNCTION statement, setting a
value in a function or stored procedure. Or you can use it in an SQL statement,
such as the one shown in Example 7-10.
order_date d_o_y
05/20/1998 140
1 row(s) retrieved.
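Putting the pieces together, the wrapper function, the functional index, and a query that uses them might look like the following sketch; the function and index names are assumptions:

-- Sketch: day-of-year wrapper, functional index, and a query that uses it
CREATE FUNCTION day_of_year(dt DATE)
RETURNING INTEGER
WITH (NOT VARIANT);
    RETURN (dt - MDY(1, 1, YEAR(dt))) + 1;
END FUNCTION;

CREATE INDEX orders_doy_idx ON orders ( day_of_year(order_date) );

SELECT order_date, day_of_year(order_date) AS d_o_y
  FROM orders
 WHERE day_of_year(order_date) = 140;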
The week of the year function is slightly trickier. It is a similar calculation, except
that we must divide by seven days per week, as shown in Example 7-11.
The key to this function is to understand the offset provided by the WEEKDAY
built-in function. The WEEKDAY function returns zero for Sunday up to six for
Saturday. If January 1 is a Sunday, then we know that January 8 is the following
Sunday, week 2. If January 1 starts on any other day, this means that the first
week is shorter. The WEEKDAY built-in function gives us the offset that allows
us to determine the week of the year.
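The idea can be sketched in SPL as follows; this is an approximation of Example 7-11, not the exact code:

CREATE FUNCTION week_of_year(dt DATE)
RETURNING INTEGER
WITH (NOT VARIANT);
    DEFINE jan1 DATE;
    LET jan1 = MDY(1, 1, YEAR(dt));
    -- Days elapsed since January 1, shifted by the weekday of January 1,
    -- divided by seven, gives a zero-based week number
    RETURN TRUNC(((dt - jan1) + WEEKDAY(jan1)) / 7) + 1;
END FUNCTION;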
The week_of_year() function has a problem with the last week of one year and
the first week of the next year. For example, 31 December 2004 was a Friday and
is reported as week 53, while 1 January 2005, a Saturday in the same
Sunday-to-Saturday week, is reported as week 1.
(expression)
53
1 row(s) retrieved.
(expression)
1 row(s) retrieved.
This problem has been addressed in the ISO 8601 standard. Jonathan Leffler, of
IBM, has written stored procedures that implement the standard. They can be
found on the International Informix Users Group Web site at
the following address:
http://www.iiug.org
Let us continue our discussion on date manipulation. We ignore the ISO 8601
standard to keep the functions simple. As you have previously seen, you can
easily adapt the functions to be compliant with the standard.
To calculate the week of the month, use the same function, but instead of using
January 1 as the starting date, use the first day of the month coming from the
date passed as argument, as shown in Example 7-15.
Many companies want to calculate the quarter based on a business year that
differs from the calendar year. Some organizations must even calculate the
quarter in more than one way, depending on the purpose. For example,
some school districts must calculate a calendar quarter, a school year quarter,
and a business quarter.
In this implementation, the year is included as part of the quarter. For example,
the third quarter of 2005 is represented by 200503. This simplifies the processing
when an SQL statement spans more than one year. You can decide to create a
different implementation, such as returning a character string instead of an
integer. This is determined by your particular requirements.
-- Function header (the routine name shown here is illustrative)
CREATE FUNCTION business_quarter(dt DATE)
RETURNING INTEGER;
DEFINE yr INTEGER;
DEFINE mm INTEGER;
LET yr = YEAR(dt);
LET mm = MONTH(dt)+4; -- sept to jan is 4 months
IF mm > 12 THEN
LET yr = yr + 1;
LET mm = mm - 12;
END IF
RETURN (yr * 100) + 1 + (mm - 1)/3;
END FUNCTION;
quarter count
199802 16
199803 7
2 row(s) retrieved.
In addition, you can create indexes on the functions as shown in Example 7-19.
This gives you the flexibility to have the database return the information that you
are looking for. IDS extensibility can be useful in many other areas.
By using these date manipulation techniques, you can adapt IDS to fit your
environment. If the date functions provided here do not exactly fit your
environment, you can easily modify them. The flexibility provided by IDS means
that the IDS capabilities should be considered early in the design phase of an
application. The result can be greater performance and scalability, and a simpler
implementation.
Data handling
With the data handling class of functions, you can obtain information about the
data type with which you are working, manipulate it, convert it to a different data
type, convert it to a different code set, or transfer it to another computer running
IDS. Functions in this class are typically used in UDRs that work with
user-defined types (UDTs), complex IDS types (such as sets, lists, and
collections), as well as more common tasks such as handling NULL values and
SERIAL values that have been retrieved from the server.
For more information about processing SQL statements, see IBM Informix
DataBlade API Programmer's Guide, G229-6365.
Function execution
With the function execution class of functions, you can obtain information about,
and execute, UDRs from your routine. A UDR can be invoked through an SQL
statement, such as EXECUTE FUNCTION or SELECT, and its data retrieved as
with any other SQL statement. If the UDR that is called resides in the same
shared object library as the caller, then it can be invoked just as any other C
function. If it resides in a different shared object library or DataBlade, then you
must use the Fastpath interface to call the UDR. The Fastpath interface allows a
UDR to directly invoke another UDR, bypassing the overhead associated with
parsing, compiling, and executing an SQL statement. There are other functions in
this class, with which you can obtain information about trigger execution as well
as HDR status information.
Memory management
Because UDRs execute inside the IDS server context, traditional operating
system memory allocation functions, such as malloc(), calloc(), realloc() and
free(), should not be used. The DataBlade API provides the memory
management class of functions so that you can manage IDS server memory
within the UDR, and allocate and free memory from the virtual pool in IDS. These
functions are mi_alloc(), mi_dalloc(), mi_zalloc(), and mi_free().
Memory that is allocated and freed during UDR execution is managed in terms
of durations. The default duration for memory during a UDR is PER_ROUTINE.
That is, memory allocated during the execution of the UDR is deallocated at the
end or when it is explicitly freed by the UDR. Table 7-1 describes the various
memory durations that are available.
PER_COMMAND Subquery
This class of functions also includes the ability to create named memory. Named
memory is memory allocated from IDS shared memory, but instead of accessing
it purely by address, you can assign a name and a duration to the memory. This
makes it easier to write functions that can share state across multiple invocations
of UDRs and user sessions. There are also functions provided to lock and unlock
memory to help with concurrency.
Important: Some objects, such as large objects and long varchars, provide
special functions for allocating and deallocating memory that is associated
with those objects. Consult the IBM Informix DataBlade API Function
Reference, G229-6364, for more information about functions related to these
objects to determine the method to allocate and deallocate memory.
Exception handling
With the exception handling class of functions, you can trap and manage events
that occur within IDS. The most common events are errors and transaction
events such as an instruction to rollback the current transaction. You can find
examples of this type of function in Chapter 9, “Taking advantage of database
events” on page 373.
Tracing
The tracing class of functions allows for the embedding and enablement of
tracing messages during UDR run time. With this facility, you can create trace
classes and define levels of trace messages that can be embedded in your UDR.
You can also specify the file name to which the trace output is written. All IBM
DataBlades include a special trace function to help diagnose problems that are
related to that DataBlade.
When constructing a C UDR, remember to include the header file mi.h. This
reference includes most of the common definitions and other header files that will
be used in most UDRs. For more information about these functions, header files
and data types, consult the following manuals in the IDS 11 Information Center at
the following address:
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp?topic=/
com.ibm.start.doc/welcome.htm
IBM Informix DataBlade API Programmer's Guide, G229-6365
IBM Informix DataBlade API Function Reference, G229-6364
You should be aware of the limitations of Java UDRs, which include the following:
Commutator functions
A UDR can be defined as the same operation as another one with the
arguments in reverse order. For example, lessthan() is a commutator function
to greaterthanorequal() and vice versa.
Cost functions
A cost function estimates the amount of system resources that a UDR requires,
so that the optimizer can take that cost into account when planning a query.
Operator class functions
These functions are used in the context of the Virtual Index Interface (VII).
Before you start to use Java UDRs, you must configure IDS for Java support. In
brief, you must complete the following tasks:
Include a default sbspace (SBSPACENAME onconfig parameter).
Create a jvp.properties file in $INFORMIXDIR/extend/krakatoa.
Add or modify the Java parameters in your onconfig file.
(Optional) Set some environment variables.
A Java UDR is implemented as a static method within a Java class. For example,
to create a quarter() function in Java, see the sample code provided in
Example 7-20.
To compile the class, use the javac command from the Java Development Kit
(JDK™). For Util.java, you can use the following command:
javac Util.java
If you are using Java Database Connectivity (JDBC) features or a special class,
such as one that keeps track of state information in an iterator function, you must
add two Java archive (JAR) files to your classpath. The following command
illustrates their use:
javac -classpath
$INFORMIXDIR/extend/krakatoa/krakatoa.jar;$INFORMIXDIR/extend/krakatoa/
jdbc.jar Util.java
After the Java code is compiled, you must put it in a JAR file with a deployment
descriptor and a manifest file. The deployment descriptor allows you to include in
the JAR file the SQL statements for creating and dropping the UDR. In this
example, the deployment descriptor, possibly called Util.txt, can be as shown in
Example 7-21.
"BEGIN REMOVE
DROP FUNCTION jquarter(date);
END REMOVE"
}
Before you can create the function, you must install the JAR file in the database.
Identify the location of the JAR file and give it a name that is then used as part of
the external name in the create statement:
EXECUTE PROCEDURE install_jar(
"file:$INFORMIXDIR/extend/jars/Util.jar", "util_jar");
The install_jar procedure takes a copy of the JAR file from the location given as
the first argument and loads it into a smart binary large object (BLOB) stored in
the default smart BLOB space defined in the onconfig configuration file. The JAR
file is then referenced by the name given as the second argument.
The CREATE FUNCTION statement defines a function jquarter that takes a date
as input and returns a varchar(10). The modifier in the function indicates that this
function can run in parallel if IDS decides to split the statement into multiple
threads of execution.
The external name identifies the JAR file in which to find the class Util. This name is
the one defined in the execution of the install_jar procedure. The class name is
followed by the static function name and the fully qualified argument name.
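The statement might look like the following sketch; the Java signature and the routine modifier are inferred from the description above and may differ from the actual example:

CREATE FUNCTION jquarter(dt DATE)
RETURNING VARCHAR(10)
WITH (PARALLELIZABLE)
EXTERNAL NAME "util_jar:Util.jquarter(java.sql.Date)"
LANGUAGE JAVA;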
At this point, you can use the jquarter() function by itself in an EXECUTE
FUNCTION statement or in an SQL statement such as the following example:
SELECT jquarter(order_date), SUM(amount)
FROM transactions
WHERE jquarter(order_date) LIKE "2005%"
GROUP BY 1
ORDER BY 1;
For simple functions, such as jquarter(), it might be more desirable to use SPL or
C, rather than Java. By its nature, Java requires more resources to run than SPL
or C. However, this does not mean that it is a bad choice for extensibility.
If you do not have demanding performance requirements, it does not matter
whether you use Java. In some cases, the complexity of the processing in the function
makes the call overhead insignificant. In other cases, the functionality provided in
Java makes it a natural choice. It is much easier to communicate with outside
processes or access the Web in Java than with any other extensibility interface.
Java is a natural fit to access Web services. Since the UDR runs in a Java virtual
machine (JVM™), you can also use any class library to do the processing.
Choosing Java as the primary extensibility language does not exclude using
either SPL or C, so do not hesitate to use Java for extensibility if you feel it is the
right choice.
DBDK is a graphical user interface (GUI) that includes the following parts:
BladeSmith
BladeSmith helps you manage the project and assists in the creation of the
functions based on the definition of the arguments and the return value. It also
generates header files, makefiles, functional test files, SQL scripts,
messages, and packaging files.
BladePack
BladePack can create a simple directory tree that includes files to be
installed. The resulting package can be registered easily in a database by
using BladeManager. It assumes that you have created your project by using
BladeSmith.
BladeManager
BladeManager is a tool that is included with IDS on all platforms. It simplifies
the registration, upgrade, and de-registration of DataBlades.
The first line includes a file that defines most of the functions and constants of
the DataBlade API. Other include files might be needed in some cases. This
include file is located in $INFORMIXDIR/incl/public.
The line that follows defines the function quarter() as taking a date as an
argument and returning a character string (CHAR, VARCHAR, or LVARCHAR).
Note that the DataBlade API defines a set of types to match the SQL types. The
function also has an additional argument that can be used to detect whether the
argument is NULL, as well as do other tasks.
The rest of the function is straightforward. We extract the month, day, and year
from the date argument by using the ESQL/C function rjulmdy(), calculate the
quarter, create a character representation of that quarter, and transform the
result into an mi_lvarchar before returning it.
The next step is to compile the code and create a shared library. Assuming that
the C source code is in a file called quarter.c, you can do this with the following
steps:
cc -DMI_SERVBUILD -I$INFORMIXDIR/incl/public -c quarter.c
ld -G -o quarter.bld quarter.o
chmod a+x quarter.bld
The ld command creates the shared library named quarter.bld and includes the
object file quarter.o. The .bld extension is a convention that indicates that the
library is a DataBlade (server extension) module.
Obviously these commands vary from platform to platform. To make it easier, IDS
has a directory, $INFORMIXDIR/incl/dbdk, that includes files that can be used in
makefiles. These files provide definitions for the compiler name, linker name, and
the different options that can be used. These files are different depending on the
platform in use. Example 7-23 shows a simple makefile to create quarter.bld.
include $(INFORMIXDIR)/incl/dbdk/makeinc.linux
MI_INCL = $(INFORMIXDIR)/incl
CFLAGS = -DMI_SERVBUILD $(CC_PIC) -I$(MI_INCL)/public $(COPTS)
LINKFLAGS = $(SHLIBLFLAG) $(SYMFLAG)
all: quarter.bld
quarter.o: quarter.c
$(CC) $(CFLAGS) -o $@ -c $?
quarter.bld: quarter.o
$(SHLIBLOD) $(LINKFLAGS) -o quarter.bld quarter.o
To use this makefile on another platform, you must change the include file name
on the first line from makeinc.linux to makeinc with the suffix from another
platform.
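After the library is built, the function is registered in SQL with a CREATE FUNCTION statement along the following lines; the library location and routine modifiers are assumptions:

CREATE FUNCTION quarter(dt DATE)
RETURNING LVARCHAR
WITH (NOT VARIANT)
EXTERNAL NAME "$INFORMIXDIR/extend/quarter/quarter.bld(quarter)"
LANGUAGE C;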
If there is a need to remove the function from the database, it can be done with
the following statement:
DROP FUNCTION quarter(date);
Starting with DataBlade modules released in 2007, such as Spatial 8.21.xC1 and
Geodetic 3.12.xC1, a single installation command can help simplify installation
and provide new capabilities such as console, GUI and silent installation. This
command is in the following format:
blade.major.minor.fixpack.platform.bin
Refer to the Quick Start guides that are provided with the DataBlade module for
more information about how to access these options.
You must register a DataBlade module in a database before it is available for
use. The registration might create new types, UDFs, tables, and views. The
registration process is made easy through the use of the DataBlade Manager
(blademgr) utility.
In this example, we start by executing the blade manager program. The prompt
indicates with which instance we are working. This example uses the IDS 11
instance. The first command, list demo, looks into the demo database to see if
any DataBlade modules are already installed. The next command, show modules,
provides a list of the DataBlade modules that are installed in the server under
$INFORMIXDIR/extend. The names correspond to the directory names in the
extend directory. The blade manager utility looks into the directories to make sure
they are proper DataBlade directories. This means that other directories can
exist under extend, but are not listed by the show modules command.
With access to the source code, you can study the implementation and then
enhance it to suit your business requirements and environment. These
extensions can provide key functionality that can save time, cost, resources, and
effort.
Flat File Access Method A complete access method that lets you build virtual tables
based on operating system files.
JPEG Image Bladelet Provides a UDT for JPEG images so that you can
manipulate images and extract and search on image
properties.
For a more complete list of Bladelets and example code, refer to the following
Web address:
http://www.ibm.com/developerworks/db2/zones/informix/library/samples/db_downloads.html
IDS has the extensibility components and functionality that can help as you
customize IDS for your particular environment.
We also discuss how to consume Web Services and explain new capabilities for
geospatial mapping and location-based services by using the Web Feature
Service (WFS) available in IDS 11. Finally, we show how you can index your data
in a non-traditional way by using the “sound” of the data.
The simplest example of an iterator function is an SPL function that uses the
RETURN WITH RESUME construct, which is shown in Example 8-1. This
function takes two arguments that represent the start and end date and returns
all of the business days (defined as Monday through Friday) between them.
Sample usage:
(expression)
11/19/2007
11/20/2007
11/21/2007
11/22/2007
11/23/2007
7 row(s) retrieved.
This state information is passed internally via the MI_FPARAM structure. This
structure is declared for all user-defined routines (UDRs) written in C that can be
accessed from SQL. This structure is commonly used to examine and modify
information such as the type, length, precision, and null state of the arguments
passed to the UDR. It can also examine and modify the same type of information
for the return arguments from the UDR.
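As a small illustration (the function and its logic are ours, not from the book),
a C UDR typically uses MI_FPARAM accessor functions such as mi_fp_argisnull()
and mi_fp_setreturnisnull() in the following way:

#include <mi.h>

UDREXPORT mi_integer double_it(mi_integer value, MI_FPARAM *fp)
{
    /* If the first argument is NULL, make the return value NULL as well */
    if (mi_fp_argisnull(fp, 0) == MI_TRUE)
    {
        mi_fp_setreturnisnull(fp, 0, MI_TRUE);
        return 0;    /* the returned value is ignored when the result is NULL */
    }
    return value * 2;
}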
The iterator state information is examined and modified by using the following
two functions:
mi_fp_request(MI_FPARAM)
This function is an accessor function, with which you can obtain the current
state of the iterator. It takes a pointer to the MI_FPARAM structure that was
passed to your UDR.
In addition, you can also store the address of private-state information, called
user-state information, in a special field of the MI_FPARAM structure. IDS
passes the same MI_FPARAM structure to each invocation of the UDR within the
same routine sequence. When your user-state information is part of this
structure, your UDR can access this across all invocations. This user-state
information is examined and modified by using the following two functions:
mi_fp_funcstate(MI_FPARAM)
With this function, you can obtain the user-state pointer from the MI_FPARAM
structure of the UDR. It takes a pointer to an MI_FPARAM structure as its only
argument. It returns a void pointer that should be cast to a pointer to your
user-state information structure.
mi_fp_setfuncstate(MI_FPARAM, void)
With this function, you can set the user-state pointer in the MI_FPARAM
structure of the UDR. It takes a pointer to the MI_FPARAM structure for your
UDR, as well as a pointer to your user-state information structure. It does not
return any values.
5. After compiling the file or files that contain your C-based iterator into a shared
library, as discussed in Chapter 7, “Easing into extensibility” on page 263,
register the function in the database with an SQL CREATE FUNCTION statement.
The iterator function that you created can be used in either an SQL EXECUTE
FUNCTION statement or an SQL SELECT statement, as demonstrated in
Example 8-4.
Method 2:
switch (mi_fp_request( fp ))
{
case SET_INIT:
/* Allocate memory for user-state information */
TelIterator = mi_dalloc(sizeof(TelIterState), PER_COMMAND);
memset(TelIterator, 0, sizeof(TelIterState));
TelIterator->currval = 0;
TelIterator->remvals = numvals;
/* All allocations in the structure need same duration */
TelIterator->initval = mi_dalloc(8, PER_COMMAND);
memset(TelIterator->initval, 0, 8);
sprintf(TelIterator->initval, "%03d-%03d-", npa, nxx);
/* Store the user-information state */
mi_fp_setfuncstate(fp, (void *)TelIterator);
mi_fp_setreturnisnull(fp, 0, MI_TRUE);
break;
case SET_RETONE:
/* Retrieve the user-information state */
TelIterator = (TelIterState *)mi_fp_funcstate(fp);
if (TelIterator->remvals-- == 0)
{
/* We are done, terminate the active set */
mi_fp_setisdone(fp, MI_TRUE);
After compiling this file into a shared object, we register it to the database as
shown in Example 8-6.
Now that the function is registered, we can use it as shown in Example 8-7.
newnumber 801-438-0000
newnumber 801-438-0001
Since IDS does not provide the capability to parameterize views, we demonstrate
a technique that can be used to provide the equivalent functionality. This
technique involves using a combination of global variables that can be defined by
using an SPL stored procedure and an SPL iterator function that accesses it. A
global SPL variable has the following characteristics:
Requires a DEFAULT value
Can be used in any SPL routine, although it must be defined in each routine in
which it is used
Carries its value from one SPL routine to another until the session ends
Can be of any type, except a collection data type (SET, MULTISET, or LIST)
It is also possible to use a C UDR that declares a piece of named memory that is
valid for a given session or even for all sessions, but we do not discuss this in
detail here. For simplicity in showing the example, we use SPL. Example 8-8
shows how to define and access a global variable in SPL, as well as sample
usage.
Sample usage:
Routine executed.
(expression)
1000002
1 row(s) retrieved.
In addition to global variables, IDS has several built-in variables that can also be
referenced in this manner. They come in the form of operators, such as
CURRENT, SYSDATE, TODAY, and USER, but there are also values that can be
accessed via the SQL DBINFO() function, such as DBINFO('sessionid'). These
variables can be useful if you want to control the result set based on the user
name (by using USER) or based on values that are maintained in a table keyed
by the current session identifier.
In Example 8-10, we define the initial view that gets materialized in full. The first
view shown summarizes the call_log table with a count of calls and sum of the
call durations by phone number. The second view combines the first view with
some other metadata to expand the detail. A sample explain output for the query
is also shown.
Sample query:
Explain output:
As you can see in the explain output for the query, a temporary table is necessary
to materialize the call_summary_view embedded in the definition of the
phone_bill_view.
To improve the query plan, and thus performance, we define two SPL functions
to set and retrieve a session global variable, as well as a modified view that
uses them.
Now that we have created a new view that uses this iterator function, we
demonstrate how it is executed in Example 8-12. First, we initialize the value to
pass to the iterator function. Next, we query the new view, still providing a
phone_number to be queried. This is still necessary for the filter to get folded into
the view execution when it attempts to retrieve data from the mobile_phones
table.
Query:
SELECT * FROM phone_bill_view_new WHERE phone_number = '801-555-0015';
Explain output:
Estimated Cost: 66
Estimated # of Rows Returned: 19
Filters: informix.filter_call_detail.phone_number =
'801-555-0015'
8.1.4 A challenge
As discussed in 8.1.2, “Generating data with iterators” on page 300, one of the
potential uses of iterator functions is to create new data sets. One of the biggest
challenges faced by developers and database administrators is testing
applications when the data sets are small and do not include all the ranges of
values that should be exercised through the system. It is possible, by using the
DataBlade API, to construct an iterator function that examines the structure of a
table and produces test data based on the data types that it finds.
For example, the iterator function can take a table name, the number of rows
desired, and an unnamed IDS ROW type that represents valid values for each
column. These values can include specific values, ranges, and sequences. The
return from this iterator can be an unnamed ROW type that can be inserted via
an SQL INSERT statement.
This type of function would be enjoyed by many others in the IDS community. We
encourage you to either submit your completed function to the International
Informix User’s Group (IIUG) Software Repository or to IBM via developerWorks
at the following address:
https://www.ibm.com/developerworks/secure/myideas.jsp?start=true&domain=
In this section, we have discussed creating iterator functions in SPL and C
that can be used to generate data or to improve performance by subsetting
complex views. By using this functionality, you can truly pump up your data: not
only can you save processing time on the client, but you can also create data in
larger volumes to help you test applications and database layouts and tune
configuration parameters.
Table 8-1 describes the various built-in aggregates and the operators that
support them.
To extend an existing IDS aggregate with a new data type, you must write the
appropriate C-based UDR or UDRs that implement the required operator
functions for the new data type and register them by using SQL CREATE
FUNCTION statements. For a detailed example of how to implement this type of
aggregate, see Chapter 15 of the IBM Informix DataBlade API Programmer’s
Guide, G229-6365.
In order to define a UDA, you must provide aggregate support functions that
implement the tasks of initializing, calculating, and returning the aggregate result.
The purpose of the optional INIT aggregate function is to return a pointer to the
initialized aggregate state. Whether the INIT function is required depends on
whether the aggregate has a simple state or non-simple state.
A simple state is an aggregate state variable whose data type is the same as the
aggregate argument. This is enough in aggregates such as the built-in aggregate
SUM(), which requires only a running sum as its state. However, the AVG()
built-in aggregate requires both a sum and a running count of records processed.
Therefore, it is said to have a non-simple state, because special processing is
needed after all the iterations to produce the final result.
As with the iterator functions described in 8.1.2, “Generating data with iterators”
on page 300, the state must be maintained in the PER_COMMAND memory
pool, so that the different threads have access to the state.
In Example 8-13, we define a structure that holds the state of the aggregate to be
used during the calculation. This structure holds the running sum and the count
of the records used, and must be passed via an mi_pointer C data type. When
this and the other support functions are registered later, we use an equivalent
SQL POINTER type in the definition of the functions, for both arguments and
return variables.
/* Excerpt: dagStateInitMean() allocates and initializes the aggregate   */
/* state, and the SQL-callable INIT wrapper returns it as an mi_pointer. */
DagState_t *dagStateInitMean(void);
[…]
retval = (mi_pointer)dagStateInitMean();
return(retval);
}
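For reference, a minimal sketch of what the state structure and its allocation
function can look like follows. The DagState_t and dagStateInitMean() names come
from the excerpt above; the field names are assumptions, and for brevity the
running sum is kept as mi_double_precision rather than DECIMAL.

#include <mi.h>

typedef struct
{
    mi_double_precision sum;    /* running sum of the aggregated values */
    mi_integer          count;  /* number of records accumulated so far */
} DagState_t;

DagState_t *dagStateInitMean(void)
{
    DagState_t *state;

    /* PER_COMMAND duration so that every thread of the command can reach it */
    state = (DagState_t *)mi_dalloc(sizeof(DagState_t), PER_COMMAND);
    if (state == NULL)
        mi_db_error_raise(NULL, MI_EXCEPTION, "MEAN(): memory allocation failed");
    state->sum = 0.0;
    state->count = 0;
    return state;
}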
The ITER aggregate function performs the sequential aggregation or iteration for
the UDA. It merges the single aggregate argument into the partial result, which
the aggregate state contains. Example 8-14 shows how we implement the ITER
function for the MEAN() aggregate that we previously described.
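Example 8-14 is not reproduced here; an ITER function for MEAN() follows this
general pattern (a sketch with assumed names and argument types, building on
the DagState_t structure sketched above):

UDREXPORT mi_pointer MeanIter(mi_pointer state_arg,
                              mi_double_precision *value,
                              MI_FPARAM *fp)
{
    DagState_t *state = (DagState_t *)state_arg;

    /* Fold one more argument value into the partial result held in the state */
    state->sum += *value;
    state->count++;
    return (mi_pointer)state;
}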
With the COMBINE aggregate function, the UDA can execute in a parallel query.
In IDS, when a query that contains a UDA is processed in parallel, each thread
operates on a subset of selected rows. The COMBINE aggregate function
merges the partial results from two such subsets. It ensures that the result of
aggregating over a group of rows sequentially is the same as aggregating over
the two subsets of rows in parallel and then combining the results.
It is important to note that a COMBINE function can be called even when a query
is not parallelized. Therefore, you must provide a COMBINE function for every
UDA. Example 8-15 shows a COMBINE function for our implementation of the
MEAN() aggregate.
In this example, we take two aggregate states (State1 and State2), which
represent two possible parallel threads, and merge the results. The COMBINE
function is called for each pair of threads that IDS allocates. There is no need to
write functions to handle more than two states. IDS simply uses the COMBINE
function for states repeatedly until only one state remains. Before the function
returns, it must also release any resources that were associated with one of the
partial results.
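A COMBINE function along these lines merges two partial states and releases the
one that is no longer needed (again a sketch with assumed names, building on the
DagState_t structure sketched above):

UDREXPORT mi_pointer MeanCombine(mi_pointer state1_arg,
                                 mi_pointer state2_arg,
                                 MI_FPARAM *fp)
{
    DagState_t *state1 = (DagState_t *)state1_arg;
    DagState_t *state2 = (DagState_t *)state2_arg;

    state1->sum   += state2->sum;
    state1->count += state2->count;
    mi_free(state2);    /* release the partial result that has been merged */
    return (mi_pointer)state1;
}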
Now that we have created all the supported UDRs for our MEAN() aggregate, we
must compile these functions into a shared-object library and register them via
the SQL CREATE FUNCTION statement. Example 8-17 shows how to register
these four functions.
Finally, we can create our MEAN() aggregate via the SQL CREATE
AGGREGATE statement, as shown in Example 8-18.
mean
98.9553000000000
1 row(s) retrieved.
By using this template, we can also define aggregates to compute the median
and mode of a set of DECIMAL values.
To calculate the median exactly, you must store every value in the aggregate
state. Because the UDA cannot know in advance the number of values in the set,
the state must be a variable size, allocated dynamically as needed with the
following aggregate:
mi_dalloc(size, PER_COMMAND)
For the sample implementation (which is available as a download from the IBM
Redbooks Web site, as explained in Appendix A, “Additional material” on
page 465), the state is a structure that contains a count of values and a pointer to
the values. The values are stored in a binary tree. Each node in the tree contains
a decimal value, the count of rows containing that value, and pointers to two
nodes with a lesser and greater value, respectively.
This tree structure allows the ITER function to add values to the state in time of
order N*log(N) in the best case, where N is the number of values (order N^2 in the
worst case, that is, with ordered incoming data). However, it is most easily navigated
by a recursive function. In order not to exceed the size of the stack that IDS has
allocated for the function, the recursive functions must use the DataBlade API
mi_call() function to call themselves.
The count field in the node allows the UDA to store repeated values in one node,
reducing memory consumption.
The ITER function compares the incoming value to the value in the first or root
node. If the incoming value is less than that in the node, the function recursively
calls itself with the left child node. If the value is greater than in the node, the
function recursively calls itself with the right child node. It repeats this until either
of the following situations occurs:
The values compare equal. (The count is incremented.)
A node is not found or NULL. (A node is allocated with the mi_dalloc()
function and initialized with the incoming value and a count of 1.)
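The following is a minimal sketch of such a node structure and of this insertion
logic. The names are ours, the values are stored as mi_double_precision rather
than DECIMAL for brevity, and the function calls itself directly rather than
through mi_call(), so treat it as an illustration of the algorithm rather than as
the downloadable implementation.

#include <mi.h>

typedef struct ValNode
{
    mi_double_precision value;  /* the stored value                     */
    mi_integer          count;  /* number of rows containing this value */
    struct ValNode     *left;   /* subtree holding lesser values        */
    struct ValNode     *right;  /* subtree holding greater values       */
} ValNode;

static ValNode *tree_insert(ValNode *node, mi_double_precision value)
{
    if (node == NULL)
    {
        /* Not found: allocate a node and initialize it with a count of 1 */
        node = (ValNode *)mi_dalloc(sizeof(ValNode), PER_COMMAND);
        node->value = value;
        node->count = 1;
        node->left = node->right = NULL;
    }
    else if (value == node->value)
        node->count++;                   /* repeated value: bump the count */
    else if (value < node->value)
        node->left = tree_insert(node->left, value);
    else
        node->right = tree_insert(node->right, value);
    return node;
}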
The COMBINE function selects one of the two given state structures as the
source and the other as the target. It takes each node from the source tree and
moves it to the target tree. A move adds the node to the target if it did not already
exist. Otherwise, the count in the source node is added to that of the target, and
the source node can be freed. The function navigates down both source and
target trees recursively. When finished, the source tree is empty, and the source
state can be freed.
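Continuing the ValNode sketch above, the move that is described here can be
written along the following lines (again with assumed names and direct recursion
instead of mi_call()):

/* Move one detached source node into the target tree */
static ValNode *move_node(ValNode *target, ValNode *node)
{
    if (target == NULL)
    {
        node->left = node->right = NULL;    /* not present yet: link it in */
        return node;
    }
    if (node->value == target->value)
    {
        target->count += node->count;       /* already present: add counts */
        mi_free(node);                      /* and free the source node    */
    }
    else if (node->value < target->value)
        target->left = move_node(target->left, node);
    else
        target->right = move_node(target->right, node);
    return target;
}

/* Empty the source tree into the target tree and return the merged tree */
static ValNode *merge_trees(ValNode *target, ValNode *source)
{
    ValNode *left, *right;

    if (source == NULL)
        return target;
    left = source->left;
    right = source->right;
    target = move_node(target, source);     /* the source node is consumed */
    target = merge_trees(target, left);
    return merge_trees(target, right);
}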
The FINAL function takes the number of values maintained in the state and
divides it by two to get the position of the value to retrieve. It then navigates the
tree in ascending order, using the count field in each node to keep track of the
number of values visited. When it reaches the correct position, it retrieves the
current value. Before returning, it frees the state. Both the navigation and the
freeing of the state tree are recursive.
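The position-based retrieval can be sketched as an in-order walk that uses the
per-node counts (continuing the ValNode sketch above, with assumed names):

/* Return the node that holds the value at position *remaining (0-based,
 * counting values in ascending order); *remaining is decremented as the
 * walk passes over values. */
static ValNode *nth_node(ValNode *node, mi_integer *remaining)
{
    ValNode *found;

    if (node == NULL)
        return NULL;
    found = nth_node(node->left, remaining);    /* smaller values first */
    if (found != NULL)
        return found;
    if (*remaining < node->count)
        return node;                            /* the position falls here */
    *remaining -= node->count;
    return nth_node(node->right, remaining);
}

The FINAL function can then call nth_node() with the total value count divided by
two, read the value from the node that is found, and free the state tree.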
This UDA takes a parameter to indicate the size of memory to be allocated. This
is the memory size per invocation.
The INIT function allocates memory buffers of the given size for the incoming
data. However, this aggregate can operate in parallel, and therefore, the INIT
function can be called multiple times, allocating buffers of this size each time. A
The ITER function adds a value to a buffer. If the buffers are full, the function
collapses the data to fit into fewer buffers, by sorting it and taking evenly-spaced
samples. This frees the remaining buffers to receive new data.
The COMBINE function combines and frees buffers, similar to the collapse in
ITER.
The FINAL function retrieves the median value, frees the buffers, and returns.
Enhancements
A more advanced UDA can take a parameter for the number of quantiles (1, 3,
100, and so on), and return those quantiles.
Some quantile algorithms enable you to compute the variance for a given
memory size or vice versa. Therefore, another more advanced UDA can take a
parameter indicating the allowed error in the result and compute the required
memory size from it.
The INIT function allocates the state structure and initializes counters to zero,
which is similar to MedianExact(), except that Mode() does not need a total count
of values.
The ITER function adds the incoming value to the state, which is the same as in
MedianExact().
The FINAL function navigates the state tree, keeping track of the node with the
highest count seen so far. The difference from MedianExact() is that Mode() must
navigate the entire state tree, not just half of it.
Summary of UDAs
In this section, we have discussed how UDAs can be used to put together your
data in new ways. By doing so, you can reduce the amount of time spent
transmitting data to the client to compute these types of aggregates.
There are several ways to integrate IDS 11 into an SOA framework. You must
differentiate between providing services and consuming services, in addition to
foundation technologies such as XML support and reliable messaging
integration. With the industry-leading extensible architecture and features that
are unique to Version 11, you can easily achieve an SOA integration.
IDS 11 developers have several options for providing Web services. Most of them
use one of the many application development options, such as those that follow,
that are available for IDS:
Java-based Web services (through the IDS JDBC driver)
.NET 2.0-based Web services (through the new IDS .NET 2.0 Provider)
IBM Enterprise Generation Language (EGL)-based Web services
PHP, Ruby, Perl, and C/C++ Web services
Finally, with the introduction of the Web Feature Service for geospatial data,
IDS 11 is capable of providing an Open GeoSpatial Consortium
(OGC)-compliant Web service (just add an HTTP server to IDS) to integrate
easily with geospatial applications that are WFS compliant. For more information
about WFS, see 8.4, “Publishing location data with a Web Feature Service” on
page 324.
The advantages of having Web services accessible from SQL include easy
access through SQL and standardized APIs (for example, Open Database
Connectivity (ODBC) and Java Database Connectivity (JDBC)). They also
include moving the Web service results closer to the data processing in the
database server, which can speed up applications, and providing Web service
access to the non-Java or non-C++ developers.
Example 8-20 illustrates the C source code of the UDR in Figure 8-1.
#include "CurrencyExchange.h"
Gen_Con = NULL;
soap_init(soap);
Gen_RetVal =
(mi_double_precision *)mi_alloc( sizeof( mi_double_precision ) );
if( Gen_RetVal == 0)
{
DBDK_TRACE_ERROR( "CurrencyExchange", ERRORMESG2, 10 );
}
return Gen_RetVal;
}
Further reading
For more details about how to integrate IDS with an SOA environment, refer to
the Redbooks publication Informix Dynamic Server V10 . . . Extended
Functionality for Modern Business, SG24-7299.
Briefly, the WFS is an open standard for retrieving and applying insert, update,
and delete transactions to geographic data through an HTTP- and XML-based
protocol. With IDS 11, IBM has introduced the WFS DataBlade to implement this
standard. The main purpose of this section is to describe the WFS DataBlade
and its use in detail. However, we begin with an overview of spatial data, its uses,
and the DataBlades that help us manage this type of data.
These organizations have long relied on their own ability to survey and collect
geographic information to create their maps. Collectively they have developed
specialized tools and techniques to do so, including software tools for managing
the amassed data. The general term for these software tools is geographic
information system (GIS). An entire industry has sprung up around this,
providing tools, data, and services for everything from basic map display to
advanced editing, cartographic publishing, and geographic processing and
analysis.
Fortunately, for most countries, data and services are available that make
possible the conversion from text-based address to coordinate-based point
location, which is called geocoding. A full discussion of geocoding is beyond the
scope of this book. For our purposes, it is enough to know that we can turn
addresses into point locations.
Both developments have led to the evolution of GIS from a specialized niche to
an integrated aspect of mainstream IT, and to the addition of geographic, or
spatial, data management capabilities to enterprise-level database products.
Data server vendors, such as IBM, have teamed up with GIS software vendors,
such as ESRI and MapInfo (now a unit of Pitney Bowes), to make the integration
as seamless as possible.
Recently, the trend toward mainstream use of geographic technologies and map
visualization has accelerated. Mapquest, Google, Yahoo, and Microsoft mapping
and route planning sites have put the power of this approach in the hands of the
general public. Even corporate CEOs are now asking their IT departments why
they cannot see their own data this way. Mashup and Web services
developments have made advanced capabilities easier to access and integrate
than ever before. They have raised expectations that we can apply them to all our
business processes and data. IDS is meeting the challenge with new capabilities,
such as the WFS DataBlade, building on the spatial data management
foundation it has had for over a decade. Let us now look at this foundation.
Since the background to this is the map-making tradition that underlies nearly all
GIS practices, the specification is for shapes in the two-dimensional plane only,
where straightforward Euclidean geometry (the kind taught in secondary school)
applies. The term simple here refers to the limitation that each feature has a
single geometric representation that is an instance of one of the types defined in
the standard. There is no provision for managing objects that are complex, that
is, composed of multiple geometric shapes (each potentially with its own set of
attributes), but treated as a single value for programming and data transfer
purposes.
[Figure: The ST_Geometry type hierarchy, with abstract types (such as ST_Geometry, ST_Curve, ST_Surface, and ST_GeomCollection) and instantiable types (such as ST_Point, ST_MultiLineString, and ST_MultiPolygon)]
The basic types of geometry, which are point, line (here called linestring), and
polygon, should be familiar. Note the following aspects of the type hierarchy as
well:
The SFS defines specific formats for representing values of these types in binary
and text form. With no hint of modesty, these are designated Well-Known Binary
(WKB) and Well-Known Text (WKT), respectively. For example, the WKT string in
Example 8-21 defines a line segment. The choice of separators (spaces and
commas) and delimiters (parentheses) is determined by the standard. The
format is self-explanatory. A line segment is defined by two points (vertices, the
plural form of vertex, in official usage), and each point is defined by a coordinate
pair (X, Y).
An aspect of spatial data that adds complexity is that coordinates, such as those
in Example 8-21, only mean something in terms of a real location on the earth if
the coordinate reference frame is known. That is roughly, what is the origin of the
coordinate axes and what is the unit along those axes. The Spatial DataBlade
has facilities for defining such Spatial Reference Systems (SRSs) in terms of map
projections.
Each SRS is designated by a numeric ID, the SRID. Every ST_Geometry object
carries with it the SRID of the SRS in which its coordinates are defined. SRIDs
only have local significance in the current database. The map projection on which
the SRS is based, however, can be referenced to an outside authority to ensure
more reliable interactions between applications. We do not describe this
mechanism further, except to say that it must be carefully managed. In the WFS,
which by definition is about communication between systems, SRIDs are not
used, but similar principles apply.
The Spatial DataBlade implements many functions to operate on its data types.
The most important of these are the spatial operators, which are the predicate
functions that go in the WHERE clause to support queries that select rows on the
basis of spatial criteria. (For more information, see Chapter 6, “An extensible
architecture for robust solutions” on page 219.) Because the WFS standard
builds on the SFS, it is not surprising to find that the set of spatial operators in the
WFS matches the set in the SFS. These operators are described in Table 8-2 on
page 339. Of course, these operators are backed up by a capable spatial index
(the R-tree index), without which none of this would be of any practical interest.
However, all projections from the round surface of the earth to the flat plane of a
map introduce distortions, and these become more severe as the area covered
by the projected reference system grows. For example, Greenland, which is
about one-quarter the land area of Brazil, looks as big as South America on
many world maps. Naturally, this causes more frequent problems now that
organizations are consolidating data from what used to be isolated, mostly local
projects into seamless enterprise databases.
Finally, and more dramatically, all maps have edges. If you travel far enough in
one direction, eventually you fall off the edge. No map can account for the fact
that you can travel around the world in any direction without ever encountering an
edge, and this is why no map can serve as a useful model for global applications.
None of this is helped by substituting the usual global coordinates, latitude and
longitude (usually in reverse order), for the X and Y of a projected map
coordinate system.
Figure 8-3 illustrates one of the problems introduced by assuming the earth is
flat, resulting in erroneous distance measurements along nonsensical
trajectories. The straight-line distance depicted in view (a), on the left, is
meaningless. Not only does it traverse most of the northern hemisphere when a
much shorter path is available, the actual path described by the straight line is
not particularly well defined and utterly dependent on the particular projection
used. In view (b), on the right, the path shown is truly the great-circle, shortest
path on the globe, which is shown in this picture (itself a projection, but only for
display purposes) as a curved line. On the round earth, there is no such thing as
a straight line.
Figure 8-3 Distance from Anchorage to Tokyo from a flat earth and round earth view
Of course, this has been known for a long time, but maps are such useful and
convenient tools and have served humanity so well for so long, that we tend to
forget their limitations. Globes, which are arguably a more spatially accurate
representation of the earth, are too difficult to carry and impossible to get at large
enough scales to show sufficient detail. When software tools came along to help
the geography and cartography communities, the standard practice of making
maps was carried over as the fundamental paradigm for GIS software.
There are many parallels between Spatial and Geodetic DataBlades. For the
present purpose, it is sufficient to assume that they are roughly equivalent in
terms of the kinds of geometric shapes they manage and the kinds of spatial
predicates and operations they support. A few differences, however, are worth
pointing out.
Important: While the Geodetic DataBlade is based on a model for the earth
that is a spherical shape in a three-dimensional space, it does not manage
three-dimensional geometric objects (solids) or perform spatial analysis in
three dimensions. All geometric shapes represented by this product are
confined to the surface of the conceptual earth. Curved though it may be, that
surface is only a two-dimensional space, which is why two coordinates are
sufficient to locate yourself on it. The Geodetic DataBlade is not a 3-D product.
Example 8-22 shows such a spatiotemporal query. With a suitably defined R-tree
index, this query on both space and time is resolved by a single index scan.
This query suggests the management of a library of satellite images. The search
is for images whose spatial footprint (meaning, the area of the earth covered by
the image) overlaps with a small area in California (latitude is the first coordinate,
longitude the second) and that were taken during a two-day period in June,
2007. Note that, by using the GeoPolygon data type, the footprint values
themselves can stretch over a period of time (a time range), not just a single
point in time.
As indicated previously, the unique design of the Geodetic DataBlade, with its
undoubted superiority in handling certain kinds of data and scenarios, also
renders it nonstandard and difficult to access using common tools. As we shall
see, this is one reason why the WFS can be so valuable. It abstracts access
away from the specifics of the SQL interface. If a WFS request includes a filter
based on a time attribute as well as spatial criteria, the WFS implementation is
free to map that filter to an integrated, high-performance Geodetic query under
the covers.
Now that we have a basic understanding of the two DataBlades on which the
WFS operates, we are ready to look at the WFS itself.
One of the solutions for providing this platform independence is the OGC WFS
specification. It provides a generic method for accessing and creating geographic
data via a Web service. A WFS provides the following capabilities:
Query a data set and retrieve the features.
Find the feature definition.
Add features to a data set.
Delete features from a data set.
Update features in a data set.
Lock features to prevent modification.
While a Web Mapping Service (WMS), another OGC specification, returns a map
image to a client, the output of a WFS is an XML document that uses GML to
encode the spatial information so that it can be parsed and processed by using
traditional XML parsing techniques. Requests to a WFS can be made via either
an HTTP GET method by using key-value pairs or an HTTP POST method by
using an XML document. A WFS, backed up by a spatially enabled data server
such as IDS with the Spatial DataBlade, can answer such queries as those that
follow:
Find all the Italian restaurants within one mile of my location and rank them by
diner rating.
Tell me whether my container is still in the port.
Show all the earthquakes in this area that were above 5.0 magnitude and that
occurred this century.
Show me the areas of common bird migrations in Canada.
Show me the areas forecasted to have severe weather in the next 24 hours.
These queries can contain purely spatial requests or be combined with traditional
relational fields to form a rich query environment to map geographic data,
provide location-based services, and perform complex spatial analysis.
Three types of WFS are discussed in the OGC Web Feature Service with
Transactional extension (OGC WFS-T) version 1.1.0 specification, upon which
the IBM Informix WFS DataBlade is based:
Basic WFS
This is the minimum implementation. It provides for the creation of a read-only
WFS that responds to queries and returns results.
The WFS specification has some specific terminology that needs to be put in
database terms for the discussion here. A namespace refers to an IDS database,
a feature type refers to a table, a feature refers to a row, and a feature id refers to
a table primary key. All features presented by the WFS must be uniquely
identifiable, and the identifier must consist of a single column primary key.
The WFS DataBlade implements a Transaction WFS that supports the following
operations:
GetCapabilities
DescribeFeatureType
GetFeature
Transaction
The GetCapabilities operation is responsible for defining what the service can
provide. It lists the operations that it supports, the spatial types it can operate on,
query predicates, output formats, and feature types that are available in that
particular instance. Example 8-23 on page 335 shows the two methods of
requesting a GetCapabilities document as well as a possible response.
This operation is typically used by geospatial mapping tools that contain WFS
clients to determine what is available from the offered service. It also provides
information to a prospective consumer about the data to see the types of data
sets that are offered and the kinds of query capabilities they can use against the
service.
Sample response:
<?xml version="1.0"?>
<xsd:schema targetNamespace="http://wfs.somegeo.net/polboundaries"
elementFormDefault="qualified" version="0.1">
<xsd:import namespace="http://www.opengis.net/gml"
schemaLocation="http://schemas.opengis.net/gml/3.1.1/base/gml.xsd"/>
<xsd:element name="congress110" type="polboundaries:congress110_Type"
substitutionGroup="gml:_Feature"/>
<xsd:complexType name="congress110_Type">
<xsd:complexContent>
<xsd:extension base="gml:AbstractFeatureType">
<xsd:sequence>
<xsd:element name="se_row_id" type="xsd:integer"
minOccurs="0" maxOccurs="1"/>
<xsd:element name="state" nillable="true" minOccurs="0"
maxOccurs="1">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="2"/>
</xsd:restriction>
</xsd:simpleType>
The GetFeature operation is the heart of the WFS server. It is responsible for the
retrieval and presentation of the data requested. GetFeature requests are based
on the CQL predicates listed in Table 8-2.
PropertyIsLessThan              <
PropertyIsGreaterThan           >
PropertyIsLessThanOrEqualTo     <=
PropertyIsGreaterThanOrEqualTo  >=
PropertyIsEqualTo               =
PropertyIsNotEqualTo            !=
PropertyIsLike                  LIKE
PropertyIsBetween               BETWEEN
PropertyIsNull                  IS NULL
Touches This is a test to see if the interiors of the two geometries do not
intersect and the boundary of either geometry intersects the other’s
interior or boundary.
Crosses This is a test to see whether the boundaries of two geometries cross
each other. This test is typically done between the spatial types of
MultiPoint/LineString, LineString/LineString, MultiPoint/Polygon,
and LineString/Polygon.
Within This is a test to see if the first geometry is completely within the
second geometry. The boundary and the interior of the first
geometry are not allowed to intersect the exterior of the second
geometry.
DWithin This is a test to find objects that are within a stated distance of the
geometry provided.
Beyond This is a test to see if the objects are outside the stated distance. It
is equivalent to saying Not DWithin(...).
And This is a logical operator that combines two CQL predicates to test
whether both operations are true. It is equivalent to the SQL AND
operation.
In addition to the CQL operators listed in Table 8-2, a GetFeature request can
take additional parameters, which are listed in Table 8-3.
RESULTTYPE This parameter is used to determine whether the actual features are returned
(results) or if a count of qualifying features should be returned.
MAXFEATURES This parameter is used to control the maximum number of features returned.
TYPENAME A list of type names to query. This is the only mandatory element to a query. If
no other parameters are specified, all feature instances are returned in the
response document.
FILTER This describes a set of features to retrieve. The filter is defined using the
predicates defined in Table 8-2 on page 339. There should be one filter
specification for each type name requested in the TYPENAME parameter. This
parameter is mutually exclusive with FEATUREID and BBOX.
SORTBY This parameter is used to specify the property names (column names) by
which the feature result list should be ordered when returned in the XML
response document.
PROPERTYNAME This parameter is used to list the property names (column names) that should
be presented in the XML response document. The default is all properties.
Example 8-25 shows the two possible forms for a GetFeature request. The first
example encodes the request by using the KVP syntax that can be sent by an
HTTP GET method, while the second example shows an XML document that can
be sent via an HTTP POST method. In both examples, the feature type quakes is
queried via a spatial bounding box for a maximum of 200 features. The output is
returned in an XML document.
Figure 8-4 shows a sample WFS GetFeature transaction that was entered into a
browser. We have shown a sample section of the XML output, for familiarity with
the format of the output. All documents that are returned from a successful
GetFeature operation have the number of features returned, a time stamp, and a
bounding box element that shows the entire spatial extent from all feature types
requested. This spatial extent is based on all of the type names (table names)
requested and not purely on the result of the transaction. Each row that satisfies
the query is returned as a gml:featureMember with its unique identifier.
The Transaction operation, as the name suggests, allows for the creation
(INSERT), modification (UPDATE), and deletion (DELETE) of features that are
stored in the WFS. The INSERT and UPDATE Transaction operations are
required to be formed by using an XML document, while DELETE has the ability
to be sent via GET using key-value pairs. Example 8-26 on page 344 shows
three sample operations that can be done and a sample of the XML response
that comes back from a Transaction operation.
The first example inserts a row into a table that contains restaurant ratings. In this
statement, we request that the database generate a new value for the feature
identifier. This identifier is returned as part of the TransactionResponse XML
document in InsertResults returned at the end of the transaction. All fields in the
table must be specified in the INSERT transaction. When you want to specify a
NULL field, specify an empty tag (for example <column_name/>). You can specify
one to many feature instances in each insert transaction.
The third example shows a DELETE transaction where we remove the identified
restaurant from our feature type. The DELETE transaction only needs to contain
a predicate (Filter) that uses CQL similar to the GetFeature request. Like the
UPDATE example discussed earlier, we retrieve the row based on a feature
identifier (natlrestaurants.44223). The number of rows removed from the table is
displayed in the XML TransactionResponse document. As with UPDATE, if no
rows qualify for the filter specified, it is not considered an error. Unlike INSERT
and UPDATE, since DELETE only requires a FILTER specification, it can be
specified by using the KVP syntax.
Sample response:
[Figure 8-5 WFS request flow: a browser or other WFS client sends an HTTP request to the Web server, which invokes the wfsdriver CGI program; wfsdriver reads wfs.cnf (map path, database, user, and encrypted password entries) and calls the WFSExplode() UDR, which resolves the request against the Spatial or Geodetic tables and returns an XML response.]
A typical WFS conversation has the following flow as illustrated in Figure 8-5:
1. The Web server (IBM HTTP Server or other Common Gateway Interface
(CGI) compliant server) receives the HTTP GET/POST request from a client.
A client can be a Web browser or custom program.
2. The Web server examines its configuration file (for example, httpd.conf) and
invokes the CGI program wfsdriver (wfsdriver.exe on the Windows platform).
3. The CGI program reads the configuration file (wfs.cnf) and obtains
information about the location of IDS client libraries, database name, user
name, and the encrypted password with which to connect to the database.
4. The CGI program connects to IDS and calls the WFSExplode() UDR in the
shared object library (wfs.bld).
5. The WFSExplode() UDR processes the transaction and formats an XML
document to be returned to the wfsdriver CGI program. If any errors
occurred during processing, an XML error document is returned.
6. The XML document is returned to the client.
7. Register the WFS DataBlade module and either the Spatial or Geodetic
DataBlades as shown in Example 8-28.
8. Create a directory that has the same name as your database to be used by
the CGI program in which the Web server is installed. This directory contains
a copy of the wfsdriver configuration file (wfs.cnf) and the wfsdriver CGI
program.
9. Run wfssetup, which is located in the
$INFORMIXDIR/extend/wfs.1.00.xC1/wfsdriver directory in your installation.
This is a UNIX script. Therefore, users on Microsoft Windows must create the
wfs.cnf manually, like the one shown in Example 8-29. This file must be
placed with a copy of wfsdriver/wfsdriver.exe in the CGI directory created in
step 8.
Example:
$ cd $INFORMIXDIR/extend/wfs.1.00.UC1/wfsdriver
$ wfspwcrypt wfsdemo informix demokey
password 5b29c069a79c2c9efc9f366a4d08c15e
password_key demokey
Important: If you are using Microsoft Windows as the Web server, you
must ensure that the INFORMIXDIR and INFORMIXSERVER values
declared in the wfs.cnf match the values as set by using the setnet32
utility. This utility is included in both the Informix-Connect and Informix
Client SDK.
10.Configure the Web server for the address:port that you want your service to
be broadcast on and include the ScriptAlias line similar to the one shown in
Example 8-31.
11.If you have existing data that you want to present, skip to the next step. In
Appendix A, “Additional material” on page 465, we provide links to sample
data for you to download. This data is compatible with the Spatial DataBlade.
To load it, you use the loadshp utility.
Beginning transaction.
12.If your table has a spatial column type, declare the table in
sde.geometry_columns via an INSERT statement. This table is created when
either the Spatial or Geodetic DataBlade is registered to the database. This
allows WFS to understand the spatial reference system that is being used in
that particular table. The loadshp utility, which is included with the Spatial
DataBlade, automatically creates a row in this table when tables are created
or initialized. The table contains the following columns:
– f_table_catalog
This column typically contains the name of the catalog (database) when
used with software such as ESRI ArcGIS. For a table to be enabled with
WFS, this value needs to be set to WFS.
– f_table_schema
This column contains the schema owner name for the table.
– f_table_name
This column contains the table name.
– f_geometry_column
This column contains the name of the column that stores the point,
linestring, or polygon geometry. If there are multiple such columns in the
table, there must be one row in this table per column.
– storage_type
This column can be set to NULL.
– geometry_type
This column must be set to the type of geometry stored in the column. The
following, more common values are valid for this column. For a more
complete listing, refer to Appendix F of the IBM Informix Spatial DataBlade
Module User’s Guide, G229-6405.
1 ST_Point
3 ST_LineString
5 ST_Polygon
11 ST_MultiPolygon
– coord_dimension
This column can be set to NULL.
Example 8-33 Inserting a row into the geometry_columns table for WFS
INSERT INTO 'sde'.geometry_columns
VALUES('WFS', 'informix', 'landplaces', 'geom', NULL, 1, NULL, 4);
1 row(s) inserted.
(expression)
OK
1 row(s) updated.
Similar metadata fields are found in the wfsmetaserviceid table. These fields
are displayed in the ServiceIdentification section in the XML response for a
GetCapabilities transaction.
14.Verify connectivity by using a GetCapabilities request from your Web browser.
The request has the following format:
http://yourhostname:[http_port]/mapname/wfsdriver[.exe]?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities
While Gaia cannot connect to databases, it knows about OGC Web Services,
such as WFS, WMS, and Web Coverage Service (WCS), and mapping services,
such as Yahoo! Maps and Microsoft Virtual Earth™. It also knows about
processing common geospatial file formats, such as GML, ESRI Shapefiles
(shp), GoogleEarth (kml), AutoDesk (dxf), and MapInfo (mif). Any combination of
these types can be used in layers to create a rich display for analysis and
annotation.
Figure 8-7 Adding the IDS WFS service to Gaia (courtesy of the Carbon Project)
Figure 8-8 Layer presentation window from Gaia (courtesy of the Carbon Project)
While the specific details of adding services to geospatial mapping tools might
vary, the basic information, such as URL, Service Type, and Version, are fairly
typical across a number of geospatial mapping tools that contain a WFS client.
Other tools and possibly future versions of this tool shown might enable other
spatial operations and more complex queries to be specified to allow richer
presentation and the ability to conduct analysis.
Consider a GPS-enabled mobile phone-based application that can show you the
restaurants within a given radius of your current location. These restaurants can
be selected by type and ranked by user rating. Example 8-35 shows a sample
DWithin query along with the SQL that is generated by the WFSExplode() UDR
that such an application might use.
Equivalent SQL:
Note: In version 11, the WFS DataBlade only supports translation of the mile
(mi), nautical mile (nm), meter (m), kilometer (km), and foot (ft) distance-type
arguments with the DWithin and Beyond CQL predicates. In the Spatial
DataBlade, the unit of measure depends on the spatial reference system
specified. If the spatial reference system does not specify a linear unit of
measure, it does the calculation based on degrees of latitude, which gives a
less precise result.
After or during the meal, the user can upload comments, ratings, or both of the
establishment by using an INSERT operation such as the one shown in
Example 8-26 on page 344.
id 1
event GeoPoint((52.388,177.872),(3.5,3.5),(2007-11-16
18:44:18.00000,2007-11-16 18:44:18.00000))
id 3
event GeoPoint((-9.820,108.670),(4.9,4.9),(2007-11-16
15:00:04.00000,2007-11-16 15:00:04.00000))
3 row(s) retrieved.
Table 8-4 explains how each field is mapped from the GeoObject type, with
columnname (such as event in Example 8-35) representing the name of the
column that is a GeoObject data type.
SQL:
Summary
In this section, we have discussed enabling the publication of location-based
data by using the WFS DataBlade in combination with the Spatial and Geodetic
DataBlades. This opens a new capability for your business environment to use
geospatial mapping tools to present colorful maps for analysis and presentation.
It also gives you the ability to provide location-based services over the World
Wide Web without having to learn complex spatial SQL functions and predicates
or design complex protocols for exchanging data.
Several algorithms have been developed for searching based on the Soundex
code, such as the following examples:
Russell Soundex
NYSIIS
Celko Improved Soundex
Metaphone/Double-Metaphone
Daitch-Mokotoff Soundex
In Example 8-39, a standard SQL query on our example Soundex data type has
asked for employees where the name equals “Smith”. Note that “Smythe,”
“Smyth,” “Smithie,” and “Smithy” are all returned in addition to “Smith”. This is due
to the fact that comparison takes place on the sound attributes rather than the
actual text contents of the name column.
empid name
13 Smith
15 Smythe
17 Smithie
19 Smyth
20 Smithy
5 row(s) retrieved.
In the sections that follow, we demonstrate how to create this new data type
along with the functions that allow it to be indexed, how to create indexes on this
type, and how to extend the functionality even further to explore regular
expression matching on the new data type.
typedef struct {
    mi_char data[256];   /* the original text value                  */
    mi_char sound[30];   /* the Soundex-style code computed from it  */
} TSndx;
We must also define this type to IDS, which we do by using the CREATE
OPAQUE TYPE command, as shown in Example 8-41.
All data types require a way to convert from the externally given format to an
internal representation, and back again. For example, an integer stored as an
SQL SMALLINT with the value 32767 does not require 5 bytes to store the value.
Instead it uses 2 bytes for the value internally and converts it back to five
characters when it is returned from the database.
/* Excerpt from the tail of the input function: conversion of the */
/* external text, followed by the final cleanup and return.       */
textval = mi_lvarchar_to_string(InValue);
mi_free(thesound);
mi_free(textval);
return(result);
}
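The lines above are only the tail of the input function. A fuller sketch of its
shape follows; the function name and signature are assumptions, it relies on the
TSndx structure shown earlier and on the Sndx() helper that also appears in the
LIKE support code later in this section, and it assumes that Sndx() returns a
string allocated with mi_alloc().

#include <string.h>
#include <mi.h>

mi_char *Sndx(mi_char *text);    /* computes the sound code (helper from the example) */

UDREXPORT TSndx *TSndxInput(mi_lvarchar *InValue, MI_FPARAM *fp)
{
    mi_char *textval, *thesound;
    TSndx   *result;

    result = (TSndx *)mi_alloc(sizeof(TSndx));
    textval = mi_lvarchar_to_string(InValue);
    thesound = Sndx(textval);                       /* derive the sound code */

    strncpy(result->data, textval, sizeof(result->data) - 1);
    result->data[sizeof(result->data) - 1] = '\0';
    strncpy(result->sound, thesound, sizeof(result->sound) - 1);
    result->sound[sizeof(result->sound) - 1] = '\0';

    mi_free(thesound);
    mi_free(textval);
    return (result);
}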
Similarly, we create an output function so that the database knows which part of
the opaque type to return when it is retrieved as demonstrated in Example 8-43.
Example 8-43 Source code for the output function and function creation in IDS
UDREXPORT mi_lvarchar *TSndxOutput(TSndx *value)
{
return (mi_string_to_lvarchar(value->data));
}
Instead of having to convert (or cast) from TSndx to char and back again when
necessary, IDS allows us to provide what are referred to as implicit casts. That
means that in the right context, IDS does not return an incorrect data type error.
Instead we give it permission to make certain assumptions and do the casting (or
converting) for us. See Example 8-44.
Now that we have defined how the data will be stored and displayed, we can
create a table and insert data into it. Example 8-45 shows how to create the table
with a TSndx column and the INSERT statements. We use character strings in
the INSERT, even though the name column is of type TSndx. With the casting
functions defined in Example 8-44, IDS can convert this data automatically, and
the user is unaware that the name column is much more than a character type.
If we now select from the employee table, shown in Example 8-46, we see the
output equivalent as though name were a simple character type.
18 row(s) retrieved.
What is probably not evident from this example is that the normal equal (=)
operator used in SQL is now available to compare sound values. The full listing of
all SQL and C code in Appendix A, “Additional material” on page 465, shows the
complete implementation with the addition of the overloads for >, <, >=, <=, and
!= operators. While it is debatable whether a string sounds better or worse than
any other, if you can order the sounds in the database, then you can index them
and gain performance benefits.
Assuming that we have different criteria from the normal compare() functionality,
how can we index on this value? We can do this by using the functional indexes
feature. With functional indexes, you can create an index on the result of a
function applied to one or more columns, rather than on the column values
themselves.
To display the sound value for each row in the table, use the SELECT statement
as shown in Example 8-49.
name namesound
Davis 180
Dilbert 15941
Dogbert 17941
Duffies 180
Genes 620
Jonbert 62941
Laurence 5420
Jones 620
MacLoud 3751
Lawrence 5420
McLewid 3751
Ratbert 41941
Smith -31
Toffies 180
18 row(s) retrieved.
Now we can create a functional index on the table, as shown in Example 8-50.
Now compare how data might be stored using an index on the employee name
as opposed to an index on its associated sound. Figure 8-9 shows that the
similar-sounding employees are grouped together, from a sound perspective,
using the functional index and alphabetically using a regular index. Note that
Duffies and Davis are stored near Toffies and Doofus. In the example, the spread
of the data is greatly exaggerated at three rows per page to emphasize how
having to search for n similar-sounding objects can mean having to read n pages.
In this case, no intelligence can be built around the data type, which might mean
a worst-case scenario of having to perform a full table sequential scan.
Figure 8-9 How data might be physically stored using two different indexes
/* LIKE support for the TSndx type (name and signature assumed; see the */
/* full listing in Appendix A): match the pattern's sound code against  */
/* the stored sound value.                                              */
UDREXPORT mi_boolean TSndxLike(TSndx *p_string, mi_lvarchar *p_pattern)
{
    mi_char *pattern, *patternsound;
    mi_boolean flag;
    pattern = mi_lvarchar_to_string(p_pattern);
    patternsound = Sndx(pattern);
    flag = strmatch(patternsound, p_string->sound);
    mi_free(pattern);
    mi_free(patternsound);
    return(flag);
}
Now that IDS knows what is meant by LIKE when referring to a TSndx data type,
we can use the SQL LIKE operator in queries. Example 8-52 shows the query:
“Find all names beginning with the “J” sound and ending with an “S” sound.”
2 row(s) retrieved.
7 row(s) retrieved.
8.6 Summary
In this chapter, we have shown how you can use the power of IDS extensibility to
look at your data in new ways and create data sets by using iterators. We have
also shown how you can use iterator and aggregate functions to increase
performance. In addition, we have shown how you can customize your
environment to consume and provide Web services using IDS. In this changing
IT landscape, extensibility can play a key role in reducing the time needed to
produce applications yet provide rich presentation capabilities for your business
needs.
We see this in many environments. For example, application servers might run
Java beans and Enterprise JavaBeans™ (EJBs) that communicate with each
other based on the demands of the application. These components can be
distributed over a large network and communicate with each other by using
mechanisms, such as messages, or lower level communications, such as
opening a network socket or sending a signal.
We also hear a lot about new types of applications, such as those called
mashups. These applications take advantage of public interfaces that provide
specific functionality to manipulate the information being processed to show it
differently. Or they use such interfaces to add other information sources to
complement the data and generate new information and new business insights.
The Informix Dynamic Server (IDS) provides significant other capabilities that
can benefit the implementation of a software architecture, as we have
demonstrated in several chapters of this book. We can take it one step further
and integrate the database into the flow of the architecture. Here, it is not only
used as an end node, but as an intermediate node that can obtain information
from multiple sources and provide a complete set of results or final result. This
way, we can reduce the overall complexity of the application code since it does
not have to access multiple sources and join the information together to get to the
final result. In addition to reducing complexity, it can optimize the use of multiple
resources, such as the network bandwidth, and increase the system’s scalability.
Figure 9-1 on page 375 illustrates this integrated design.
Figure 9-1 represents the ability of IDS to access outside sources to complete
the information provided by the database. The following types of information, as
examples, can be provided:
Validation of an address during an INSERT through a Web service
Collection of geocoding information during a SELECT
Discovery of any outstanding traffic violation
Understanding credit ratings, history, or both
Learning about creditor actions
With this information, you can create what is called a data-centric mashup.
The approach can also enable optimization by joining the outside data with the
database tables, by taking advantage of the optimized code of the server. It can
also save operations since the server can keep track of what was retrieved and
not retrieve the same information multiple times.
It is common in the travel industry, for example, to have an application that needs
to access another system, often a mainframe, to complete a reservation. By
using this new approach, the problem can be solved as illustrated in Figure 9-2
on page 376.
The diagram shows the application, IDS, and the validation of requirements based
on business rules.
Figure 9-2 Travel service architecture
Without the events, the application first has to access the database to validate
the travel request against the established business rules. If the request is outside
the guidelines, it can either be rejected or flagged with a warning.
The application then needs to access the booking service to request the
appropriate booking. This can result in an error due to no availability or a return
of the itinerary. The application then must store the resulting itinerary in the
database and return the information to the user.
We can take this approach one step further and communicate with the outside
world based on the modifications made to the database tables. In this case, we
must be concerned about the completion of these modifications, which is where
database events come in to play. The following types of operations can be
performed:
Placing an order to a supplier based on an inventory change
Using outside resellers, such as amazon.com, overstock.com, and others, to
sell overstock merchandise
The figure shows the inventory application, IDS, and two resellers (Reseller-1 and
Reseller-2), connected by steps 1 through 3.
One advantage of this approach is that if the resellers change, are removed, or
are added, the inventory application itself does not need to be changed or even
aware of these changes.
The use of database server events provides another benefit in that the
application does not need to be concerned with these tasks. This means that if
the business needs change, such as adding additional resellers, there is no need
to change the application itself because only the database processing changes.
This benefit is compounded when multiple applications are subject to common
rules, because the change then occurs at one place rather than in multiple
applications.
By using this approach, you can register an event when the connection is
established, in addition to setting any connection attributes, such as the isolation
level. The processing must be limited to the general information that can be
gathered from the environment when the event occurs. However, this general
approach is likely too restrictive to be useful.
Another way to register events is to relate them to specific tables. This means
that the registration of an event is done through a trigger on a specific table.
Example 9-1 shows the CREATE TRIGGER syntax.
In Example 9-1, we execute an action for each row that is processed in the
statement execution. Since we want to process events, we must register a
callback, which is done in the BEFORE action. Note that this action is executed
even if the triggering statement does not process any rows. This is acceptable
since the processing is performed in the FOR EACH ROW action. If nothing is
processed, there is simply no event information to act on.
The SELECT trigger capability is unique to IDS. It was added in the IDS 9.21
time frame, sometime during the year 2000. Later, IDS 10.0 introduced the
ability to create triggers on views (INSTEAD OF triggers).
The new row is available in the case of INSERT and UPDATE operations. The old
row is available for the DELETE and UPDATE operations.
That is what can be done with triggers. Now we return to the callback functions.
Two questions need to be answered: how can a callback function be created, and
how can it be registered? Let us take a look.
MI_CB_CONTINUE This is the only status, other than an exception, that a
callback can return. IDS continues looking for other
callbacks to execute for the given event. If an exception
callback returns this status and no other registered
callback exists, the DataBlade API aborts the user-defined
routine and any current transaction.
The last argument is particularly useful since memory can be allocated that is
passed to the callback function to provide any type of information that is desired.
We return to it in 9.4.5, “Memory duration” on page 383.
The body of the callback is just like any C user-defined function (UDF). It
contains calls to DataBlade API functions, standard C functions, and possibly
system calls. Example 9-3 shows a simple callback that allows for testing to
make sure the callback is called as expected.
change_type = mi_transition_type(event_data);
DPRINTF("logger", 80, ("Entering cbfunc0()"));
switch(change_type) {
case MI_BEGIN: str = "BEGIN";
In this callback, we write to a trace file that the callback was called and indicate
the type of event that occurred. We obtain the latter information by calling the
DataBlade API function mi_transition_type() by using the argument event_data
as input to it.
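For reference, a minimal sketch of what the complete routine might look like follows. The callback prototype, the MI_TRANSITION_TYPE type, and the MI_NORMAL_END and MI_ABORT_END values reflect our reading of the DataBlade API and should be verified against the DataBlade API Programmer's Guide.

#include <mi.h>

/* Sketch of a minimal end-of-transaction callback in the style of cbfunc0():
   trace the state transition and let processing continue. */
MI_CALLBACK_STATUS cbfunc0(MI_EVENT_TYPE event_type,
                           MI_CONNECTION *conn,
                           void *event_data,
                           void *user_data)
{
    MI_TRANSITION_TYPE change_type;
    char *str = "UNKNOWN";

    DPRINTF("logger", 80, ("Entering cbfunc0()"));
    change_type = mi_transition_type(event_data);
    switch (change_type) {
    case MI_BEGIN:      str = "BEGIN";    break;  /* transaction started     */
    case MI_NORMAL_END: str = "COMMIT";   break;  /* transaction committed   */
    case MI_ABORT_END:  str = "ROLLBACK"; break;  /* transaction rolled back */
    }
    DPRINTF("logger", 80, ("cbfunc0: transition is %s", str));

    /* Let IDS continue looking for other callbacks registered for this event. */
    return MI_CB_CONTINUE;
}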
Much more can be added to the callback to send information to the outside
world. The possibilities are discussed in 9.6, “Communicating with the outside
world” on page 392. The information is usually obtained through the user_data
parameter that is passed when registering the callback.
The block of memory can be of any format as long as both the registration
function and the callback agree on its content. Our example passes a NULL
pointer, but other information can be included, such as the context of the
registration (trigger on the table and the type of trigger (DELETE, INSERT,
SELECT, UPDATE)).
IDS has the concept of memory duration for blocks of memory allocated by a
user-defined routine (UDR). This concept must be well understood so that we do
not end up with either invalid pointers or memory that stays allocated longer than
needed. This is the subject of the next section.
PER_STMT_EXEC For the duration of the execution of the current SQL statement
The DataBlade API provides a set of functions for memory allocation. For
example, the mi_alloc() function allocates memory based on the current default
memory duration. This default can be changed with the command
mi_switch_mem_duration(). Another way to control the memory duration is to
use the mi_dalloc() functions that take an additional memory duration argument.
Regardless of which memory duration you are using, it is a good practice to free
the memory explicitly, using mi_free(), if possible.
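As a brief illustration of these allocation calls, consider the following sketch. The comment about PER_ROUTINE being the usual default duration is an assumption to verify against the DataBlade API documentation.

#include <mi.h>

void duration_examples(void)
{
    char *dflt, *session_buf;

    /* Default duration (typically PER_ROUTINE): gone when the UDR returns. */
    dflt = (char *) mi_alloc(256);

    /* Explicit duration: this block survives until the session ends. */
    session_buf = (char *) mi_dalloc(256, PER_SESSION);

    /* Change the default duration used by subsequent mi_alloc() calls. */
    mi_switch_mem_duration(PER_STMT_EXEC);

    /* Good practice: free explicitly as soon as the memory is no longer needed. */
    mi_free(dflt);
    mi_free(session_buf);
}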
When writing callback routines and passing information through memory, the
most likely duration used is PER_SESSION since we need to use the information
generated during the transaction after the transaction completes.
You might know that your callback needs to generate information that is reused
between transactions, such as some initialization values of a total count of
activities, or even between sessions. In this case, you might have to use a
PER_SYSTEM memory duration, which you most likely use with another
memory allocation scheme, named memory, which is discussed in the next
section.
sessionConnection = mi_get_session_connection();
/* Retrieve the session ID */
sessionId = mi_get_id(sessionConnection, MI_SESSION_ID);
/* Retrieve or create session memory */
sprintf(buffer, "session%d", sessionId);
if (MI_OK != mi_named_get(buffer, PER_SESSION, &pmem)) {
/* wasn't there, allocate it */
if (MI_OK != mi_named_zalloc(sizeof(NamedMemory_t), buffer,
PER_SESSION, &pmem)) {
mi_db_error_raise(NULL, MI_EXCEPTION,
"Logger memory allocation error", NULL);
}
/* initialize the memory structure */
. . .
}
After defining local variables, we retrieve the database connection for the current
session. This allows us to execute the mi_get_id() function and retrieve the
session identifier. We then use this session identifier to create a unique name for
the entire server.
At this point, we are ready to start manipulating the named memory block. The
first if statement test is to see if named memory with the name found in the
variable buffer has already been allocated. If it has been allocated, the memory
block is retrieved in the variable pmem.
The named memory block can contain pointers to other memory blocks. These
additional memory blocks do not need to be allocated as named memory since
we already have a means to retrieve where they are. However, the callback must
still be able to reach them, which is why pointers to them are kept in the named
memory block.
If there is a need to free up the named memory block, it can be done easily as
shown in Example 9-7.
This code is similar to the allocation code found in Example 9-6. The interesting
part is that the mi_named_free() function requires the additional argument that
indicates the memory duration of the named memory block. This implies that
multiple named memory blocks with the same name can exist as long as they
have different duration.
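Example 9-7 itself is not reproduced here, but a sketch of the idea, reusing the session-based name from Example 9-6 and assuming that mi_named_free() returns MI_OK on success, might look as follows.

#include <mi.h>
#include <stdio.h>

void free_session_memory(void)
{
    MI_CONNECTION *sessionConnection;
    mi_integer     sessionId;
    mi_string      buffer[32];

    /* Rebuild the same name that was used when the block was allocated. */
    sessionConnection = mi_get_session_connection();
    sessionId = mi_get_id(sessionConnection, MI_SESSION_ID);
    sprintf(buffer, "session%d", sessionId);

    /* The duration must match the one used at allocation time. */
    if (MI_OK != mi_named_free(buffer, PER_SESSION))
        mi_db_error_raise(NULL, MI_EXCEPTION,
                          "Logger named memory free error", NULL);
}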
The most common use of callbacks is to use information about the row
processed, manipulate it, and send it to a destination. This means that we use a
trigger that processes each row and a callback at the end of the transaction. Both
the row processing routines and the callback must have access to the same
memory. When we register the callback, we can give it a user memory buffer. To
ensure that it is the same as the one used by the row processing routines, we
use named memory with an agreed upon name. This way, each piece of code
has a common memory pointer.
Because this might seem a bit confusing, we provide a few examples. We look at
a few possibilities by using the event type MI_EVENT_END_XACT.
The first test involves using the callback code shown in Example 9-3 on page 381
and registering for the MI_EVENT_END_XACT event in the BEFORE action of a
trigger. Example 9-8 shows the INSERT operations.
In the second part, there are two INSERT statements that are part of the same
transaction. The trigger is executed twice in the transaction. Therefore, the
callback function is registered twice. The result is that one transaction calls two
callbacks resulting in two new entries in the tracing file.
We need to do one more test, which is to remove the trigger and register the
callback in each case, as shown in Example 9-9.
The first statement registers the callback function, and its execution ends with the
following error message:
7514: MI_EVENT_END_XACT callback can only be registered inside a
transaction.
The first statement executes properly without executing the callback. The
statements that are included between BEGIN and COMMIT execute properly
resulting in one call to the callback function. The last INSERT also executes
properly but without a call to the callback function.
mi_integer registerCallback1()
{
MI_CALLBACK_HANDLE *cbhandle;
MI_CONNECTION *sessionConnection;
mi_integer sessionId;
NamedMemory_t *pmem;
mi_string buffer[32];
Start by defining the structure to be used for the named memory. The key
component is to have a flag that indicates if the callback has been registered.
The size of the structure depends on how the memory will be used.
The code continues with the creation or retrieval of the named memory as shown
in Example 9-6 on page 385. Since the memory is initialized to zero, the
registered indicator is automatically cleared, indicating that the callback has not
been registered.
After the named memory is retrieved, we test the registered indicator to see if we
must register the callback. If we need to register the callback, we do so by
passing the named memory pointer as the user data argument. Then we change
the indicator after verifying that the registration was successful.
This method implies that the callback itself will either reset the registered
indicator to zero or free the named memory. Otherwise, the callback is registered
only once for the life of the named memory block.
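The rest of registerCallback1() is not shown here. The following self-contained sketch illustrates the same logic; the NamedMemory_t layout, the forward declaration of cbfunc0(), and the mi_register_callback() argument order (connection, event, callback function, user data, parent handle) are assumptions to check against the DataBlade API reference.

#include <mi.h>
#include <stdio.h>

/* Sketch of the named memory layout: the key component is the "registered"
   flag (the real structure can carry any additional fields that are needed). */
typedef struct {
    mi_integer registered;
} NamedMemory_t;

MI_CALLBACK_STATUS cbfunc0(MI_EVENT_TYPE, MI_CONNECTION *, void *, void *);

mi_integer registerEndXactCallback()
{
    MI_CALLBACK_HANDLE *cbhandle;
    MI_CONNECTION      *conn;
    mi_integer          sessionId;
    NamedMemory_t      *pmem;
    mi_string           buffer[32];

    /* Build a server-wide unique name from the session ID (as in Example 9-6). */
    conn = mi_get_session_connection();
    sessionId = mi_get_id(conn, MI_SESSION_ID);
    sprintf(buffer, "session%d", sessionId);

    /* Retrieve the named memory, or create it zero-filled the first time. */
    if (MI_OK != mi_named_get(buffer, PER_SESSION, (void **) &pmem)) {
        if (MI_OK != mi_named_zalloc(sizeof(NamedMemory_t), buffer,
                                     PER_SESSION, (void **) &pmem))
            mi_db_error_raise(NULL, MI_EXCEPTION,
                              "Logger memory allocation error", NULL);
    }

    /* Register only once per named memory block, passing the block as user data. */
    if (pmem->registered == 0) {
        cbhandle = mi_register_callback(conn, MI_EVENT_END_XACT, cbfunc0,
                                        (void *) pmem, NULL);
        if (cbhandle != NULL)
            pmem->registered = 1;
    }
    return MI_OK;
}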
Now that we know how to create and register callbacks, let us look at how to
architect a solution that makes the database server a more active participant in
the application architecture.
9.5.1 Option A
Figure 9-4 illustrates event processing using Option A.
The figure shows a statement, a trigger, a table, an event table, a callback, and a
monitor program, connected by steps 1 through 8 that run from statement
execution to commit or rollback.
We described the first part of this option earlier in this chapter. It is represented
by the following steps:
1. The statement executes and accesses a table.
2. The BEFORE trigger is called.
3. The callback function is registered for the desired event.
At this point each row is processed. This implementation involves writing the
result of the processing to an event table. That table can simply keep a
character representation of the result or can be more complex, including
multiple columns of varied types. The processing steps continue as follows:
4. Process each row from the statement.
5. IDS generates a COMMIT or ROLLBACK event.
6. Execute the callback.
This model only works when processing COMMIT events. If you want to process
ROLLBACK events by using this architecture, you will find the event table empty.
The event table (eventTable) is empty because all the write operations to it are in
the context of the current transaction. If the transaction is aborted, the operations
are rolled back, which includes removing all the data from the event table. This
means that you need a different approach when you want to generate some
events, even when transactions are rolled back.
9.5.2 Option B
This implementation option does not depend on a database table. It is necessary
when the objective is to generate event information even when a rollback occurs.
One way to do this is to write to an external file. The DataBlade API provides a
set of functions to manipulate files. By using this technique, the file can be read
back after a ROLLBACK and still provide the information to an outside program.
When performing this option, a specific file name can be agreed on, or a
message that provides the path to the result file can be used. However, a more
efficient method is to take advantage of named memory.
The corresponding figure for Option B shows the same components and steps,
with named memory taking the place of the event table.
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0410roy/
It might be difficult to manage the information provided by the events. How can
the events be removed from the file once they have been processed?
It makes sense that, instead of writing to a specific file, the callback writes to a
specific directory. Each event then generates a new file, and unique file names
can easily be generated by using a timestamp value. This way, events can be
archived or removed after they have been processed.
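As an illustration of that technique, the following sketch shows a helper that a callback could call to write one event to its own timestamped file. The /tmp/events directory is purely hypothetical, and the mi_file_open(), mi_file_write(), and mi_file_close() calls, together with the MI_O_* flags, reflect our understanding of the DataBlade API file functions, so verify them against the API reference.

#include <mi.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Write one event to its own file so a monitoring program can process it
   and then archive or remove the file. */
void write_event_file(char *event_text)
{
    char       fname[128];
    mi_integer fd;

    /* Unique name per event, based on a timestamp. */
    sprintf(fname, "/tmp/events/event_%ld.txt", (long) time(NULL));

    fd = mi_file_open(fname, MI_O_WRONLY | MI_O_CREAT | MI_O_APPEND, 0644);
    if (fd < 0) {
        mi_db_error_raise(NULL, MI_EXCEPTION, "Cannot create event file", NULL);
        return;
    }
    mi_file_write(fd, event_text, strlen(event_text));
    mi_file_close(fd);
}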
As you can see in the VPCLASS definition, it is possible to start more than one of
the virtual processors. This brings us back to the second problem discussed in
the beginning of this section. To solve that problem, the DataBlade API provides
the functions mi_module_lock() and mi_udr_lock() to request that a function be
rescheduled on the same virtual processor on which it initially ran.
Any function that is misbehaved should use these mechanisms. But what about
the callback function? It is not a registered UDF. From a callback function, it is
possible to call a UDF that restricts where it executes, which is the subject of the
next section.
A UDF can be called with no consideration for its implementation. This means
that the callback can call a UDF written in C, Java, or SPL.
A Java UDF can also be called. Since the Java runtime environment™ (JRE™)
included in IDS is a complete implementation, the full power of the language can
be used to do anything the language allows. This includes opening sockets,
sending signals, and so on. Java does not have all the functionality available in C.
However, this should not be a problem since the callback is written in C and can
do any processing unavailable in Java before calling the Java UDF.
The DataBlade API provides the following set of functions under the fastpath
interface:
mi_cast_get()
mi_func_desc_by_typeid()
mi_routine_end()
mi_routine_exec()
mi_routine_get()
mi_routine_get_by_typeid()
mi_td_cast_get()
The fastpath interface provides a more convenient (and faster) way to call a
function than to use an EXECUTE FUNCTION SQL statement. As when using
SQL statements, the fastpath interface allows functions to be called without
having to compile them in the shared library. The function can come from any
DataBlade module that is registered in the current database.
When using the fastpath interface, the calls follow this general sequence:
1. Retrieve a reference to the desired function.
2. Execute the function.
3. Release the resources associated with the function descriptor.
If the function is not found, the mi_routine_get() functions return a NULL pointer.
With the function descriptor, the function can be called and passed the proper
arguments. The mi_routine_exec() function returns a pointer to the value that is
returned by the execution of the target function.
When finished, the memory associated with the function descriptor can be
released.
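The following sketch puts the three steps together for a hypothetical registered UDF named notify_monitor(lvarchar). The routine signature string, the use of mi_string_to_lvarchar(), and the cleanup with mi_var_free() are assumptions to adapt to your own function.

#include <mi.h>

void call_notify_monitor(MI_CONNECTION *conn, char *message)
{
    MI_FUNC_DESC *fdesc;
    mi_lvarchar  *arg;
    mi_integer    error;

    /* 1. Retrieve a function descriptor for the target routine. */
    fdesc = mi_routine_get(conn, 0, "notify_monitor(lvarchar)");
    if (fdesc == NULL) {
        mi_db_error_raise(NULL, MI_EXCEPTION, "notify_monitor() not found", NULL);
        return;
    }

    /* 2. Execute the function, passing the message as an lvarchar argument. */
    arg = mi_string_to_lvarchar(message);
    (void) mi_routine_exec(conn, fdesc, &error, arg);

    /* 3. Release the resources associated with the function descriptor. */
    mi_var_free(arg);
    mi_routine_end(conn, fdesc);
}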
The kill() system call takes only two arguments: the ID of the process to which to
send the signal and the signal to send. Because of this, some issues must be
addressed when using this capability.
First, know which process identifier you want to target. Depending on the
operating system, it is possible to find information about the running processes
by using several methods. A simple, platform independent way is to agree on a
specific file where the process ID is written when the program is started. The
callback function can then use the DataBlade API functions to read the file to
retrieve the process number and send the signal.
Since there are two user-defined signals, information can be conveyed based on
which signal is sent. For example, one signal can be used for committed
transactions and the other signal can be used for rollbacks.
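A sketch of this technique follows. The /tmp/monitor.pid path is only an agreed-upon example, the choice of SIGUSR1 for commits and SIGUSR2 for rollbacks is arbitrary, and the mi_file_open(), mi_file_read(), and mi_file_close() calls are our reading of the DataBlade API file functions.

#include <mi.h>
#include <sys/types.h>
#include <signal.h>
#include <stdlib.h>

/* Read the monitor's process ID from an agreed-upon file and signal it. */
void signal_monitor(mi_integer committed)
{
    char       buf[32];
    mi_integer fd, nbytes;
    long       pid;

    fd = mi_file_open("/tmp/monitor.pid", MI_O_RDONLY, 0);
    if (fd < 0)
        return;                         /* no monitoring program registered */

    nbytes = mi_file_read(fd, buf, sizeof(buf) - 1);
    mi_file_close(fd);
    if (nbytes <= 0)
        return;

    buf[nbytes] = '\0';
    pid = atol(buf);
    if (pid > 0)
        kill((pid_t) pid, committed ? SIGUSR1 : SIGUSR2);
}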
After the monitoring process receives the signal, it knows to return to the
database server to retrieve the details of the event. If you are only concerned
with committed transactions, the easiest way is to have a communication table
that the monitoring process can query to retrieve those details.
When you also want to handle transaction rollbacks, the solution is slightly more
complicated since it is not possible to write the event to a communication table. If
you do so, the inserted row or rows disappear due to the rollback.
One solution is to write the event to an external file. This is done by defining a
directory for event files. After receiving a signal, the monitoring program can start
processing the event files from the given directory. When an event is processed,
the event file can be removed from the directory or moved to another location.
Another solution is to keep the information in named memory. After receiving the
signal, the monitoring process gets the information from named memory and
processes the events. Of course, the monitoring program cannot access named
memory directly. Therefore, this implementation must include additional UDRs to
provide this access.
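For illustration, such a UDR might look like the following sketch. The EventBuffer_t layout and the event_buffer name are hypothetical; the function would also have to be registered with CREATE FUNCTION before the monitoring program could call it with EXECUTE FUNCTION.

#include <mi.h>

/* Hypothetical layout of the named memory that the callback fills in. */
typedef struct {
    mi_integer nevents;
    char       text[2048];
} EventBuffer_t;

/* SQL-callable UDR that returns the pending event text to the caller. */
mi_lvarchar *get_pending_events(MI_FPARAM *fp)
{
    EventBuffer_t *pmem;

    if (MI_OK != mi_named_get("event_buffer", PER_SYSTEM, (void **) &pmem))
        return mi_string_to_lvarchar("");   /* nothing recorded yet */

    return mi_string_to_lvarchar(pmem->text);
}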
The DataBlade API does not provide functions to establish a network connection.
An easy way to work around this issue is to use the fastpath interface to call a
UDF written in Java.
A C UDF can still be used to access the network. This function is definitely a
misbehaved function. Therefore, a user-defined virtual processor is required.
The use of message queues does not require a callback function. The trigger can
simply either call the appropriate function or INSERT into a message queue table
if it is set up accordingly.
9.7 Conclusion
Integrating IDS in the architecture of a solution can simplify the design and
improve performance. It can also provide additional separation between business
processes and a specific business application.
In this chapter, we described how you can take advantage of the power of IDS
extensibility to gain a business advantage. The more you learn about the
capabilities of IDS, the more you can improve your efficiency. However,
integrating IDS into a solution architecture from the start is the best approach.
We also show a few examples that illustrate the power of this framework and
provide possible starting points for how to customize IDS to build complex data
integration applications.
IDS refers to VTI and VII as user-defined access methods (UDAMs). An access
method consists of software routines that open files, retrieve data into memory,
and write data to permanent storage, such as a disk. IDS natively supports
built-in access methods that work with relational data stored in IDS tables.
However, UDAMs are plug-ins that allow IDS to understand non-relational data or
data stored outside of IDS relational tables.
Purpose functions are implemented by the access method developers. They fill in
for the built-in purpose functions that are otherwise called for normal relational
tables. Since IDS does not understand the data handled by the access method, it
relies on purpose functions to transform data into the relational format.
Table 10-1 lists commonly used purpose functions, values, and flags.
am_parallel (flag): A flag that the database server sets to indicate which purpose
functions or methods can execute in parallel in a primary or secondary-access
method.
am_open (function): Name of the UDR that makes the virtual table or virtual index
available for access in queries.
am_rescan (function): Name of a UDR that scans for the next item from a previous
scan to complete a join or subquery.
am_getnext (function): Name of the required UDR that scans for the next item that
satisfies a query.
am_truncate (function): Name of a UDR that deletes all rows of a virtual table
(primary-access method) or that deletes all corresponding keys in a virtual index
(secondary-access method).
For a more complete list of purpose functions, values, and flags, refer to the IDS
11 information center at the following Web address:
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp
The USING clause associates an access method with the virtual table or virtual
index. The <identifier>=<value> block is optional and can be used to pass
configuration or customization parameters to the access method. This
information is accessed by the UDAM by using the descriptors that are passed to
the purpose functions. This mechanism provides access method developers
extra flexibility in designing a UDAM, so that a single UDAM can be used for
multiple virtual tables or indices.
System catalogs
IDS uses a set of system catalog tables to store the metadata information about
a UDAM and associated virtual tables and indices. The catalogs have the
following names:
informix.sysams
informix.sysprocedures
informix.systables
informix.sysindices
Example 10-5 shows the entry in sysams for a HASH access method.
am_name hash
am_owner informix
am_id 3
am_type P
am_sptype D
am_defopclass 0
am_keyscan 0
am_unique 0
am_cluster 0
am_rowids 1
am_readwrite 1
am_parallel 0
am_costfactor 1.000000000000
am_create 116
am_drop 0
am_open 117
am_close 118
am_insert 119
am_delete 121
am_update 120
am_stats 0
am_scancost 122
am_check 0
am_beginscan 123
am_endscan 0
am_rescan 124
am_getnext 125
am_getbyid 126
am_build 0
am_init 0
am_truncate 0
procname sha_create
owner informix
procid 116
mode d
retsize 102
symsize 260
datasize 0
codesize 0
numargs 1
isproc f
specificname
externalname (sha_create)
paramstyle I
langid 1
paramtypes pointer
variant t
client f
handlesnulls f
iterator f
percallcost 0
commutator
negator
selfunc
internal f
class
stack
parallelizable f
costfunc
selconst 0.00
collation en_US.819
Example 10-7 informix.systables entry for virtual table using HASH access method
> CREATE TABLE hash_table ( col1 INTEGER )
USING hash ( mode="static", hashkey="(col1)", number_of_rows="100" );
tabname hash_table
owner vshenoi
partnum 1048840
tabid 105
rowsize 4
ncols 1
nindexes 0
nrows 0.00
created 11/07/2007
version 6881281
tabtype T
locklevel P
npused 0.00
fextsize 16
nextsize 16
flags 0
site
dbname
type_xid 0
am_id 3
pagesize 2048
ustlowts
secpolicyid 0
protgranularity
Descriptors
The UDAM framework defines a set of descriptors to hold information that is
passed to the C functions that provide the implementation of purpose functions.
While many descriptors provide support when writing purpose functions, we only
discuss a few of the more important ones.
Table 10-2 shows the descriptors that are defined by the UDAM framework. The
accessor function prefix column provides the DataBlade API extension function
name prefixes that operate on the respective descriptors.
This implies that several SQL functions, including one called my_open, have
been registered with the database server by using the CREATE
PROCEDURE/FUNCTION statement (Example 10-9).
We look at some tasks that my_open must perform, which depend on the
requirements of the external data store:
Verifying that the user has authority to open the table or file
Initializing a user data structure with information that subsequent calls will
require.
Tip: Memory for user data structures should be allocated with the
mi_dalloc() DataBlade API memory allocator call, with PER_COMMAND
duration, so that it persists across function calls. For a discussion of
memory durations provided by the DataBlade API, refer to the IBM Informix
DataBlade API Function Reference, Version 11.1, G229-6364.
This work is done with a combination of the DataBlade API accessor function
calls, DataBlade API calls, and standard C. For example, to retrieve the name of
the table to which an MI_AM_TABLE_DESC refers, we make a call to
mi_tab_name(). We then get the name of the table owner with a call to
mi_tab_owner(). The accessor function prefix column from Table 10-2 on
page 410 lists the prefix of the function calls that operate on respective
descriptors. For a complete list of DataBlade API accessor functions, refer to the
IDS 11 information center at the following Web address:
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp
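Putting these pieces together, the following is a sketch of what an am_open purpose function such as my_open might do. The assumption that the purpose UDR receives an MI_AM_TABLE_DESC pointer (plus the MI_FPARAM that every C UDR gets) and returns MI_OK or MI_ERROR, as well as the MyTableState_t structure, are ours and should be checked against the Virtual-Table Interface documentation.

#include <mi.h>

/* Hypothetical per-table state kept for the duration of the command. */
typedef struct {
    mi_string *tabname;
    mi_string *owner;
    mi_integer is_open;
} MyTableState_t;

mi_integer my_open(MI_AM_TABLE_DESC *td, MI_FPARAM *fp)
{
    MyTableState_t *state;

    /* Allocate user data with PER_COMMAND duration so that it persists
       across the purpose function calls made for this command. */
    state = (MyTableState_t *) mi_dalloc(sizeof(MyTableState_t), PER_COMMAND);
    if (state == NULL)
        return MI_ERROR;

    /* Use the accessor functions to interrogate the table descriptor. */
    state->tabname = mi_tab_name(td);
    state->owner   = mi_tab_owner(td);
    state->is_open = 1;

    /* Cache the state so later calls (am_getnext, am_close) can retrieve it. */
    mi_tab_setuserdata(td, (void *) state);
    return MI_OK;
}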
Handling qualifiers
The task of the purpose functions is to break the qualification down into a series
of simple predicates. It is also to assign a value of MI_VALUE_TRUE,
MI_VALUE_FALSE, or MI_VALUE_NOT_EVALUATED to each qualifier by using
the accessor function mi_qual_setvalue().
When the purpose function has processed each of the simple predicates in a
qualification, it can make a call to mi_eval_am_qual() to finish evaluating the
qualification. This causes the database server to evaluate any predicates that
were set to MI_VALUE_NOT_EVALUATED and to assign a value of
MI_VALUE_TRUE or MI_VALUE_FALSE to the statement as a whole. If the set
of qualifiers as a whole has a value of MI_VALUE_TRUE, then the row satisfies
the qualification, and the purpose function can return it to the server. Otherwise,
the purpose function can skip to the next row.
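A sketch of that logic follows. Only mi_qual_setvalue(), mi_eval_am_qual(), and the MI_VALUE_* constants come from the discussion above; mi_qual_issimple(), mi_qual_nquals(), mi_qual_qual(), mi_qual_value(), and the surrounding signatures are assumptions to verify against the Virtual-Table Interface documentation.

#include <mi.h>

/* Walk the qualification tree. A real access method would compare the row
   value for the column (mi_qual_column) against the constant
   (mi_qual_constant) and set MI_VALUE_TRUE or MI_VALUE_FALSE; here every
   simple predicate is deferred to the server. */
static void defer_quals(MI_AM_QUAL_DESC *qd)
{
    mi_integer i, n;

    if (mi_qual_issimple(qd)) {
        mi_qual_setvalue(qd, MI_VALUE_NOT_EVALUATED);
        return;
    }

    /* Complex qualifier: recurse into each sub-qualifier. */
    n = mi_qual_nquals(qd);
    for (i = 0; i < n; i++)
        defer_quals(mi_qual_qual(qd, i));
}

/* Called from am_getnext for each candidate row: returns 1 if the row
   satisfies the qualification and can be returned to the server. */
mi_integer row_qualifies(MI_ROW *row, MI_AM_QUAL_DESC *qd)
{
    defer_quals(qd);
    mi_eval_am_qual(row, qd);           /* server evaluates the deferred parts */
    return (mi_qual_value(qd) == MI_VALUE_TRUE);
}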
10.1.3 Flow of DML and DDL with virtual tables and indices
Now that you understand the UDAM framework and the surrounding data
structures, we turn our focus to the flow of the purpose function calls for various
Data Manipulation Language (DML) and Data Definition Language (DDL)
operations.
The basic VTI flow for creating a virtual table is am_create, then am_open, then
am_close.
Figure 10-2 shows a VII flow that differs from the VTI flow in that, for every row in
the table (on which the index is created), the database server performs
am_insert to insert the index key value into the virtual index.
In Figure 10-2, the sequence is am_create, am_open, then am_insert (key)
repeated while more rows remain, and finally am_close.
The drop flow consists of am_open followed by am_drop.
Figure 10-4 UDAM IDU with row address (row ID) flow: am_open (fragment), then
am_insert, am_delete, or am_update for each row, then am_close.
The scan-driven insert, delete, and update (IDU) flows follow the sequence
am_scancost, am_open, am_beginscan, then repeated am_getnext calls; each row
returned with MI_ROWS is passed to am_insert, am_delete, or am_update, and
when am_getnext returns MI_NO_MORE_RESULTS the flow ends with am_close.
The query scan flow is am_scancost, am_open, am_beginscan, then repeated
am_getnext calls that evaluate the qualifiers and return MI_ROWS, until
MI_NO_MORE_RESULTS, followed by am_close.
The truncate and drop flow is am_open, am_truncate, then am_drop.
Transactions
Currently, the UDAM framework does not support the two-phase commit
protocol. Therefore, transaction management for external data is problematic. A
caveat to this restriction is that, starting with IDS v10.00xC1, XA-compliant data
sources can be registered with the database server, so that they can participate
in the two-phase commit protocol.
The difficulty comes when an error occurs during a commit. If the transaction
affects data from both the external data source and internal Informix tables, then
the external and internal data can be left in an inconsistent state.
Caching data
The UDAM developer has access to the following hooks for caching information
across API calls:
Caching at scan level
The UDAM can cache information during a scan (between am_beginscan and
am_endscan) by using the mi_scan_setuserdata and mi_scan_userdata
accessor functions on the MI_AM_SCAN_DESC descriptor. The database server
does not free this memory when the scan descriptor is destroyed. The access
method can either free it at am_endscan time or let the server free it when the
memory duration expires. A memory duration of PER_COMMAND is
sufficient for the time a scan is open. A sketch of this approach follows this list.
Caching at open table level
The UDAM can cache information during a table or index open (between
am_open and am_close) by using the mi_tab_setuserdata and mi_tab_userdata routines
on an MI_AM_TABLE_DESC descriptor. The server does not free this
memory when the table descriptor is destroyed. The UDAM can either free it
at am_close time or let the server free it when the memory duration expires. A
memory duration of PER_STATEMENT is sufficient for the time a table or
index is open.
Caching using named memory
The UDAM can allocate memory of any duration and associate a name with it
by calling the DataBlade API routines mi_named_alloc or mi_named_zalloc.
Later, UDAM can retrieve the address of that memory by calling
mi_named_get with the same name. Eventually, the access method can free
the memory by calling mi_named_free.
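As promised above, the following is a sketch of scan-level caching. The assumption that am_beginscan and am_endscan receive an MI_AM_SCAN_DESC pointer (plus the usual MI_FPARAM) and return MI_OK or MI_ERROR, as well as the MyScanState_t structure, are illustrative only.

#include <mi.h>

/* Hypothetical per-scan state cached between am_beginscan and am_endscan. */
typedef struct {
    mi_integer rows_returned;
    mi_integer current_offset;
} MyScanState_t;

mi_integer my_beginscan(MI_AM_SCAN_DESC *sd, MI_FPARAM *fp)
{
    MyScanState_t *ss;

    /* PER_COMMAND is long enough for the life of the scan. */
    ss = (MyScanState_t *) mi_dalloc(sizeof(MyScanState_t), PER_COMMAND);
    if (ss == NULL)
        return MI_ERROR;
    ss->rows_returned  = 0;
    ss->current_offset = 0;

    /* Cache the state on the scan descriptor for later purpose functions. */
    mi_scan_setuserdata(sd, (void *) ss);
    return MI_OK;
}

mi_integer my_endscan(MI_AM_SCAN_DESC *sd, MI_FPARAM *fp)
{
    /* Retrieve the cached state; free it here or let the duration expire. */
    MyScanState_t *ss = (MyScanState_t *) mi_scan_userdata(sd);

    if (ss != NULL)
        mi_free(ss);
    return MI_OK;
}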
However, if the data is maintained in an IDS SmartBLOB space, then logging and
recovery services of the SmartBLOB component can be leveraged by the UDAM.
Be aware that the data must be stored in a SmartBLOB that was created with the
MI_LO_ATTR_LOG flag turned on. Refer to the mi_lo_create, mi_lo_spec_init,
and mi_lo_specset_flags routines in the IBM Informix DataBlade API
Programmer's Guide, Version 11.1, G229-6365, for details.
Parallelization
To enable parallelization for a UDAM implementation, the following purpose
functions (UDRs) must be parallelizable:
am_open
am_close
am_beginscan
am_endscan
am_getnext
am_rescan
am_getbyid
One way to use INSERT query parallelism is to run either "INSERT INTO...
SELECT" or "SELECT .. INTO TEMP .." statements in parallel.
Example 10-14 The onstat command to display cached data and config parameters
onstat -g dic <table name>
Web services rely on simple open standards, such as XML and SOAP, and are
accessed through any kind of client application. Typically those applications are
written in Java, C++, or C#. For organizations that already have an application
based on an SQL database and already use business logic in the database server
through UDRs, developers might want to integrate access to Web services at the
SQL level.
Having Web services accessible from SQL offers the following advantages:
Easy access through SQL and standardized APIs (as examples, ODBC and
JDBC)
Movement of the Web service results closer to the data processing in the
database server, which can speed up applications
Web service access for developers who do not work in Java or C++
In this section, we look at how you can use the UDAM framework to access the
Amazon E-Commerce Web service as a relational table. Code examples are
provided as a quick start for application developers who are interested in
integrating Web services with IDS.
Web services can be most anything. Examples are theater review articles,
weather reports, credit checks, stock quotations, travel advisories, and airline
travel reservation processes. Each of these self-contained business services is
an application that can easily integrate with other services from the same or
different companies, to create a complete business process. This interoperability
allows businesses to dynamically publish, discover, and bind a range of Web
services through the Internet.
All of these tasks must be executed from the IDS SQL layer to achieve the
required portability.
A Web service client and Web service server code can both be generated by
gSOAP. In addition, gSOAP is self-contained, so that no additional libraries or
products are required. This enables an easier deployment of gSOAP-based IDS
extensions (DataBlades).
The gSOAP stub and skeleton compiler for C and C++ was developed by Robert
van Engelen of Florida State University. See the following Web sites for more
information:
Sourceforge.net gSOAP Toolkit
http://sourceforge.net/projects/gsoap2
gSOAP: C/C++ Web Services and Clients
http://www.cs.fsu.edu/~engelen/soap.html
Throughout the development of this book, we have used version 2.7.9l of gSOAP
for Linux x86 32 bit.
Since we must compile C source code files, ensure that a C compiler is installed
on your development platform. For the examples in this section, we have been using gcc
version 3.4.6 20060404 (Red Hat 3.4.6-3).
ECS is free, but requires registration for an Amazon Web Services (AWS)
account in order to obtain the access key and secret access key. All ECS
requests require the access key to be part of the request to access Amazon data.
For the examples provided in this section, we use the ItemLookup and
ItemSearch operations. The queries are restricted to Amazon only for “books.”
You can download the Web Service Description Language (WSDL) for ECS from
the following Web address:
http://webservices.amazon.com/AWSECommerceService/
AWSECommerceService.wsdl?
Tip: WSDL is an XML file that describes the Web service operation message
format and the structure of request and response messages. It also describes
the Web service endpoints that correspond to each operation.
Idtype Type of item identifier used to look up an item. We used the ISBN
lookup.
Table 10-4 lists some of the important parameters passed in through the
ItemSearchRequest message.
SearchIndex The category in which to search. We used category “books” for all
our searches.
Note: A wsvti example is provided with this book for you to download and try.
This example demonstrates the Amazon VTI capabilities. We start with
gSOAP code generation and then run the actual examples. See Appendix A,
“Additional material” on page 465, for details about how to download the
example code from the IBM Redbooks Web site and then to use it.
The -C option instructs the soapcpp2 tool to generate the Web service client
code only, and the -c option forces the generation of C language code
instead of C++ code.
From the generated files, the more important ones are:
– AWSECommerceService.h contains the function prototype and C structure
definitions for response and request messages.
– soapClient.c contains the implementation of client library functions to
invoke Amazon Web services through SOAP.
6. To test the installation, write a simple stand-alone C program to retrieve data
from ECS. Copy the C source code from Example 10-15 into ${WSVTI}/test.c.
Then replace <<Replace with Access Key>> with your Amazon access key.
int main()
{
    struct soap soap;                            /* gSOAP runtime context       */
    int i = 0;
    struct _ns1__ItemLookupResponse r;           /* ECS ItemLookup response     */
    struct _ns1__ItemLookup in;                  /* ItemLookup operation input  */
    struct ns1__ItemLookupRequest req;           /* the lookup request itself   */
    enum _ns1__ItemLookupRequest_IdType idt =
        _ns1__ItemLookupRequest_IdType__ISBN;    /* look the item up by ISBN    */
    char *isbn = "0091883768";
    char *repGrp[5] =
        {"ItemAttributes","Images","EditorialReview","Offers","BrowseNodes"};
                                                 /* response groups to request  */
    in.Request = &req;                           /* attach the single request   */
    in.__sizeRequest = 1;
    in.AWSAccessKeyId = <<Replace with Access Key>>;
2. Edit wsvti.h and change the macro AWSACCESSKEYID to use your Amazon
access key.
3. Run the MAKE command in the ${WSVTI} directory to generate the
IDSAmazonAccess UDAM shared object. The shared object name is
wsvti.bld and can be found in ${WSVTI}/linux-intel/.
4. Running as user informix, create the $INFORMIXDIR/extend/wsvti directory.
5. Running as user informix, copy ${WSVTI}/linux-intel/wsvti.bld to
$INFORMIXDIR/extend/wsvti. This step ensures that the IDSAmazonAccess
UDAM is copied into a common place where all IDS extensions reside.
6. Configure IDS to run the IDSAmazonAccess UDAM code in a separate virtual
processor, called wsvp. By adding a dedicated virtual processor class to the
IDS configuration, we can separate the execution of the blocking network calls
of the Web service consumer UDAM from the overall IDS query processing.
To enable at least one dedicated virtual processor class for that purpose, add
the following line to the ONCONFIG file of your IDS instance:
VPCLASS wsvp,num=1
7. Start, or restart, the IDS instance.
8. Execute the onstat -g glo command.
You should now see the additional virtual processor listed, as illustrated in
Example 10-19.
MT global info:
sessions threads vps lngspins
0 18 12 0
Database created.
10.Register the UDAM in the tryamazon database by using the wsvti.sql script as
shown in Example 10-21.
Database selected.
Routine created.
Routine created.
Routine created.
1 row(s) inserted.
Access_method created.
Database closed.
11.Run the test script provided in the ${WSVTI}/test directory. The test script
creates two relational tables, customer and orders, and then populates data
into those tables. Finally it creates the Amazon VTI table, called AmazonTable
(shown in Example 10-22), and performs a join between all three tables.
12.The join returns the latest price of a book from Amazon based on an ISBN
order from a particular customer. The query is shown in Example 10-23.
Refer to “Design considerations” on page 440 for details about the need for
the USE_NL optimizer directive when performing joins involving
AmazonTable.
Database selected.
isbn 1401908810
title Spiritual Connections: How to Find Spirituality Throughout All
the Relat
ionships in Your Life
price 24.9500000000000
isbn 0743292855
title Paula Deen: It Ain't All About the Cookin'
price 25.0000000000000
isbn 1591391105
title The First 90 Days: Critical Success Strategies for New Leaders
at All Le
vels
price 27.9500000000000
isbn 0072257121
title CISSP All-in-One Exam Guide, Third Edition (All-in-One)
price 79.9900000000000
isbn 0071410155
title The Six Sigma Handbook: The Complete Guide for Greenbelts,
Blackbelts, a
nd Managers at All Levels, Revised and Expanded Edition
price 89.9500000000000
5 row(s) retrieved.
select --+USE_NL(AmazonTable)
order_num,
AmazonTable.*, fname, lname, order_date from AmazonTable, orders,
customer where (AmazonTable.isbn = orders.isbn) and orders.customer_num
= customer.customer_num;
order_num 1001
isbn 0091883768
order_num 1002
isbn 0091883768
title Who Moved My Cheese?
price 20.6500000000000
fname Ludwig
lname Pauli
order_date 05/21/1998
order_num 1003
isbn 0954681320
title Six Sigma and Minitab: A complete toolbox guide for all Six
Sigma p
ractitioners (2nd edition)
price 49.9900000000000
fname Anthony
lname Higgins
order_date 05/22/1998
order_num 1004
isbn 0764525557
title Gardening All-in-One for Dummies
price 29.9900000000000
fname George
lname Watson
order_date 05/22/1998
4 row(s) retrieved.
Database closed.
Figure 10-9 shows the architectural diagram of the Amazon VTI implementation.
The diagram shows the IDS engine, containing the query optimizer and the UDAM
framework, together with the relational storage.
The IDS SQL query optimizer and query execution engine communicate with the
Amazon VTI implementation through the IDSAmazonAccess UDAM. In order to
Example 10-25 shows the Amazon VTI table schema. AmazonTable has an
ISBN column that allows ECS ISBN lookup type of operations. AmazonTable also
has the TITLE column that allows ECS Title Lookup type of operations. The
PRICE column is an information only column (no filters allowed) that carries the
ECS ListPrice of a queried item. This is a basic schema and can be expanded to
include any other information available through ECS.
wsvti_scancost If qualifiers are present, returns a lower cost than if they are
not present. This helps in cases where joins are performed
with the VTI table being the inner table. Refer to “Design
considerations” on page 440 for more details.
10.3.1 WebSphere MQ
In its simplest form, WebSphere MQ is a method to exchange messages
between two end points. It acts as an intermediary between two systems and
provides value added functionalities such as reliability and transactional
semantics.
If you are using the same software on all systems, for instance an SAP® stack,
the software itself usually comes with workflow management features. If the
modules are running in a homogeneous environment, such as Linux machines
running WebSphere and Informix, it is easier to exchange information via
distributed queries or enterprise replication. Alternatively, the application might
be running on heterogeneous systems, such as a combination of WebSphere,
DB2, Oracle and Informix. In this case, programming and setup of distributed
queries or replication becomes complex, and in many cases, does not meet the
application requirements.
The figure shows WebSphere MQ as the integration layer, connecting custom
applications, suppliers, and customers over the Internet with databases such as
Informix, DB2, and Oracle, across more than 80 platforms.
The figure shows the application-managed approach: the shipping application
moves messages between the WebSphere MQ queues (Queue 1, Queue 2) and
Informix Dynamic Server, with a separate transaction manager.
The order entry application writes custom code to exchange messages from and
to WebSphere MQ. Developing custom code every time an application wants to
interact with WebSphere MQ is costly. It requires you to train the programmers
for it or hire consultants to develop, debug, and maintain this code, and to modify
the code for new queues and applications. The data exchanged between the
database and WebSphere MQ flows through the application, which is not
efficient for high data volumes and necessitates a transaction manager.
The figure shows the IDS-managed approach: the shipping application and
Informix Dynamic Server exchange messages with the WebSphere MQ queues
(Queue 1, Queue 2, Queue 3) through the MQI connection functions, with the
transactions managed by IDS.
Example 10-27 shows how you can easily use IDS WebSphere MQ functionality to
send and receive a message to and from a WebSphere MQ queue.
When you roll back the transaction, as shown in Example 10-28, the message
received is restored to the book order queue, and the row is also removed from
shipping_tab.
MQReceiveClob() Receives a CLOB from the queue into IDS and removes it from
the queue
Parameter interpretation
The following describes how calls to MQSend() are interpreted. The
other functions follow the same pattern. When the four parameters are given,
translation is straightforward and is executed as given:
MQSend(serviceparam, policyparam, messageparam, correlationparam)
The following translation applies when one or more parameters are missing:
MQsend(messageparam) is translated as follows:
MQSend("IDS.DEFAULT.SERVICE", "IDS.DEFAULT_POLICY",
messageparam, "");
MQsend(serviceparam, messageparam) is translated as follows:
MQSend(serviceparam, "IDS.DEFAULT_POLICY", messageparam, "");
MQsend(serviceparam, policyparam, messageparam) is translated as
follows:
MQSend(serviceparam, policyparam, messageparam, "");
IDS does not implicitly start a new transaction for the EXECUTE statement.
Therefore, you must start a transaction explicitly as shown in Example 10-31.
If the transaction gets rolled back, all operations on WebSphere MQ are rolled
back, just as IDS rolls back its changes. See Example 10-32.
ROLLBACK WORK;
The read operation gets the message from the queue without deleting the
message from the queue. The receive operation removes the message from the
queue and gets the message. These functions can be called with zero or more
parameters. The parameters are interpreted similar to MQSend(). The
transactional behavior of the receive functions is the same as with MQSend.
MQRead() and MQReceive() can return up to 32,739 bytes. The maximum size
of the message itself is a WebSphere MQ configuration parameter. Larger
messages should be read or received as a CLOB. To WebSphere MQ, a message
is simply a message; IDS differentiates messages by length in order to map them
to data types.
SELECT mqreceive('SHIPPING.SERVICE','My.DEFAULT.POLICY')
FROM systables where tabid = 1;
Subscribers subscribe to a topic and specify the queue on which to receive the
messages. When a publisher inserts a message on that topic into the queue, the
WebSphere MQ broker routes the messages to all of the queues of each
specified subscriber. The subscribers retrieve the message from the queue by
using read or receive functions.
SELECT mqPublish('WeatherChannel',
"<weather><zip>94501</zip><date>7/27/2006</date><high>89</high><low>59<
/low></weather>","Weather")
FROM systables WHERE tabid = 1;
SELECT mqreceive('WeatherChannel',"Weather")
FROM systables WHERE tabid = 1;
IDS is aware of the correlation ID predicate and sends the correlation ID request
to MQ. WebSphere MQ matches the correlation ID and sends the matched
message.
You can create a table to transport BLOB data by using the statement shown in
Example 10-37.
To complete the first two SELECT statements, IDS retrieves all the messages in
the service myrecvorder and then applies the filters and returns the qualified
rows. The unqualified messages are lost. Use the READ tables if you want to
apply these predicates, but be aware of the message fetch overhead.
IDS implicitly starts a transaction when you issue DML (UPDATE, DELETE,
INSERT, or SELECT) and DDL statements (CREATE statements). Alternatively,
you can explicitly start a new transaction with BEGIN WORK statements. APIs,
such as JDBC, start a new transaction when you turn off autocommit. Note that
an EXECUTE FUNCTION/PROCEDURE statement does not start a transaction.
Therefore, you must start a transaction before invoking the WebSphere MQ
function in an EXECUTE statement. This is illustrated in Figure 10-13.
Figure 10-13 shows IDS and WebSphere MQ transaction management, involving
IDS, the MQ queue manager, and the MQ queue.
Environment
MQ functionality is provided with IDS, and the DataBlade is installed into
$INFORMIXDIR/extend when IDS is installed. The DataBlade is registered in the
database that is to invoke MQ functions. WebSphere MQ interaction is currently
supported in Informix logged databases. IDS communicates with WebSphere
MQ by using the server API. Therefore, WebSphere MQ must be installed on the
same machine as the server. However, this WebSphere MQ can channel the
messages to one or more remote WebSphere MQ servers. Each IDS instance
can connect to only one WebSphere MQ queue manager.
Platform support
Table 10-7 summarizes the IDS and WebSphere MQ versions by supported
platforms.
The answer to this question takes us back to the basic objective of the UDAM
framework, which is to provide an interface to integrate non-relational data into IDS.
The ffvti Bladelet was written by one of the authors of this book, Jacques Roy. It
provides FFAccess as the primary access method, which is a read-only interface
to make external files look like relational tables to IDS. We extend this access
method to make it read/write by adding the capability to INSERT rows into, and
TRUNCATE, the VTI table and its underlying flat file.
The diagram shows the IDS engine, containing the query optimizer, the query
execution engine, and the UDAM framework; the ffvti table goes through the
FFAccess access method, which performs file I/O calls against flat files,
alongside the relational storage.
We now look at each of the purpose functions and its associated tasks, which are
summarized in Table 10-8.
ffvti_open Extracts the flat file name and delim character from
options.
Opens flat file access in read, write, or read-write mode
based on the type of access on the VTI table.
Allocates state information to maintain the flat file
descriptor.
Caches state information in the MI_AM_TABLE_DESC
descriptor.
ffvti_beginscan Sets up the file seek position at the start of the file in
preparation for file scan.
ffvti_insert Extracts column values from the row provided by the IDS
SQL engine.
Converts the column values into character strings by
applying casting functions.
Assembles a buffer containing the delimited character
column values.
Appends the newly formed buffer to the end of a flat file.
ffvti_drop Empty function, because dropping the VTI table should not
drop the flat file.
Transaction support
Transactional support for ffvti is not available in the current implementation. This
means that the INSERT and TRUNCATE TABLE options are irreversible.
Transactional support can be included by adding another layer in between the flat
file and access method. However, that is beyond the scope of this chapter and
does not add any value to the common usage scenarios of ffvti.
ffutil.c Source code for utility functions for the flat-file access method.
Database created.
Database selected.
Database closed.
Database selected.
(expression)
0
1 row(s) retrieved.
a t
b line 1
c 04/17/2001
d 2001-04-17 10:34:20
e 10:34:20
f 17.2300000000000
g 3.141592600000
h 2.180000070000
i 123456789012
j 1
a t
b line 2
c 04/18/2001
d 2001-04-18 10:34:20
e 10:34:20
f 18.2300000000000
g 4.141592600000
h 3.180000070000
i 223456789012
j 2
a t
b line 3
c
d
e 10:34:20
f 18.2300000000000
g 4.141592600000
h 3.180000070000
i 223456789012
j 3
3 row(s) retrieved.
a t
b line 1
c 04/17/2001
d 2001-04-17 10:34:20
e 10:34:20
f 17.2300000000000
g 3.141592600000
h 2.180000070000
i 123456789012
j 1
a t
b line 2
c 04/18/2001
d 2001-04-18 10:34:20
e 10:34:20
f 18.2300000000000
g 4.141592600000
h 3.180000070000
i 223456789012
j 2
a t
b line 3
c
d
e 10:34:20
f 18.2300000000000
g 4.141592600000
h 3.180000070000
i 223456789012
j 3
a t
b new row
c 01/01/2008
d 2008-01-01 00:00:00
e 0:00:00
4 row(s) retrieved.
No rows found.
Database closed.
Example 10-43 first creates a VTI table called tab. Note the use of USING
FFAccess (path='/tmp/tab.txt', delim=';'), which sets up the VTI table to
use a flat file /tmp/tab.txt and delimiter ';'. Then we show a SELECT FIRST 3 on
the VTI to sample the first three rows in the flat file. Next, we INSERT a row into
the VTI table, followed by a SELECT, to illustrate the success of the previous
INSERT. Finally, we TRUNCATE the table, followed by a SELECT, to show that the
flat file indeed gets truncated.
Select the Additional materials and open the directory that corresponds with
the IBM Redbooks form number, SG247522.
In addition to the above configuration, gSOAP is required to run the Amazon Web
Service VTI. To use gSOAP, first download a recent version of gSOAP for your
desired development platform (UNIX, Linux, or Windows) from the following Web
address:
http://sourceforge.net/project/showfiles.php?group_id=52781
After downloading the gSOAP toolkit, extract the compressed file into a folder, for
example /work/gsoap-linux-2.7. In the following sections, we refer to this gSOAP
install location as the ${GSOAP_DIR}.
Since you need to compile C source code files, ensure that a C-compiler is
installed on your development platform. For the examples in this section, we have
been using the GCC version 3.4.6 20060404 (Red Hat 3.4.6-3).
Use the following steps to test the Amazon VTI source example:
1. Edit wsvti.h and change the macro AWSACCESSKEYID to use your Amazon
access key.
2. Run the MAKE command in the ${WSVTI} directory to generate the
IDSAmazonAccess UDAM shared object. The shared object name is
wsvti.bld and can be found at ${WSVTI}/linux-intel/.
3. Running as user informix, create the $INFORMIXDIR/extend/wsvti directory.
4. Running as user informix, copy ${WSVTI}/linux-intel/wsvti.bld to
$INFORMIXDIR/extend/wsvti. This step ensures that the IDSAmazonAccess
UDAM is copied to where all IDS extensions reside.
5. Configure IDS to run the IDSAmazonAccess UDAM code in a separate virtual
processor, called wsvp. By adding a dedicated virtual processor class to the
IDS configuration, you can separate the execution of the blocking network
calls of the Web service consumer UDAM from the overall IDS query
processing. To enable at least one dedicated virtual processor class for that
purpose, add the following line to the ONCONFIG file of your IDS instance
and restart the IDS instance:
VPCLASS wsvp,num=1
6. Create a new database to work with as shown in Example A-1.
Database created.
8. Run the test script provided in the ${WSVTI}/test directory. The test script
creates two relational tables, customer and orders. It then populates data into
Database selected.
isbn 1401908810
title Spiritual Connections: How to Find Spirituality Throughout All
the Relat
ionships in Your Life
price 24.9500000000000
isbn 0743292855
title Paula Deen: It Ain't All About the Cookin'
price 25.0000000000000
isbn 1591391105
title The First 90 Days: Critical Success Strategies for New Leaders
at All Le
vels
price 27.9500000000000
isbn 0072257121
title CISSP All-in-One Exam Guide, Third Edition (All-in-One)
price 79.9900000000000
isbn 0071410155
title The Six Sigma Handbook: The Complete Guide for Greenbelts,
Blackbelts, a
nd Managers at All Levels, Revised and Expanded Edition
price 89.9500000000000
select --+USE_NL(AmazonTable)
order_num,
AmazonTable.*, fname, lname, order_date from AmazonTable, orders,
customer where (AmazonTable.isbn = orders.isbn) and orders.customer_num
= customer.customer_num;
order_num 1001
isbn 0091883768
title Who Moved My Cheese?
price 20.6500000000000
fname Anthony
lname Higgins
order_date 05/20/1998
order_num 1002
isbn 0091883768
title Who Moved My Cheese?
price 20.6500000000000
fname Ludwig
lname Pauli
order_date 05/21/1998
order_num 1003
isbn 0954681320
title Six Sigma and Minitab: A complete toolbox guide for all Six
Sigma p
ractitioners (2nd edition)
price 49.9900000000000
fname Anthony
lname Higgins
order_date 05/22/1998
order_num 1004
isbn 0764525557
title Gardening All-in-One for Dummies
price 29.9900000000000
fname George
lname Watson
order_date 05/22/1998
4 row(s) retrieved.
Database closed.
You have successfully installed and used the Amazon Web Service VTI sample
code.
Database selected.
(expression)
1 row(s) retrieved.
a t
b line 1
c 04/17/2001
d 2001-04-17 10:34:20
e 10:34:20
f 17.2300000000000
g 3.141592600000
h 2.180000070000
i 123456789012
j 1
a t
a t
b line 3
c
d
e 10:34:20
f 18.2300000000000
g 4.141592600000
h 3.180000070000
i 223456789012
j 3
3 row(s) retrieved.
a t
b line 1
c 04/17/2001
d 2001-04-17 10:34:20
e 10:34:20
f 17.2300000000000
g 3.141592600000
h 2.180000070000
i 123456789012
j 1
a t
b line 2
a t
b line 3
c
d
e 10:34:20
f 18.2300000000000
g 4.141592600000
h 3.180000070000
i 223456789012
j 3
a t
b new row
c 01/01/2008
d 2008-01-01 00:00:00
e 0:00:00
f 12.6000000000000
g 12.60000000000
h 1.000000000000
i 123456789
j 100
4 row(s) retrieved.
No rows found.
Database closed.
You may copy, modify, and distribute this sample program in any form without
payment to IBM, for any purpose including developing, using, marketing or
distributing programs that include or are derivative works of the sample program.
The sample program is provided to you on an "AS IS" basis, without warranty of
any kind. IBM HEREBY EXPRESSLY DISCLAIMS ALL WARRANTIES EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. Some jurisdictions do not allow for the exclusion or limitation of
implied warranties, so the above limitations or exclusions may not apply to you.
IBM shall not be liable for any damages you suffer as a result of using, modifying
or distributing the sample program or its derivatives.
Each copy of any portion of this sample program or any derivative work, must
include the above copyright notice and disclaimer of warranty.
********************************************************************
C code sample
(c) Copyright IBM Corp. 2002 All rights reserved.
This sample program is owned by International Business Machines Corporation
or one of its subsidiaries ("IBM") and is copyrighted and licensed, not sold.
You may copy, modify, and distribute this sample program in any form without
payment to IBM, for any purpose including developing, using, marketing or
distributing programs that include or are derivative works of the sample program.
The sample program is provided to you on an "AS IS" basis, without warranty of
any kind. IBM HEREBY EXPRESSLY DISCLAIMS ALL WARRANTIES EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. Some jurisdictions do not allow for the exclusion or limitation of
implied warranties, so the above limitations or exclusions may not apply to you.
IBM shall not be liable for any damages you suffer as a result of using, modifying
or distributing the sample program or its derivatives.
Each copy of any portion of this sample program or any derivative work, must
include the above copyright notice and disclaimer of warranty.
Example 10-44 contains all the C code necessary to create your Soundex
DataBlade.
typedef struct {
    mi_char data[256];

// Define Sound-bytes

    // Clean up
    mi_free(thesound);
    mi_free(textval);
    return atoi(Value->sound);
}

    length = strlen(txt);

        /* Pass through only letters and the pattern metacharacters. */
        if (!isalpha(ccurr) &&
            (ccurr != RE_MULTI) && (ccurr != RE_SINGLE) &&
            (ccurr != '[') && (ccurr != ']') &&
            (ccurr != '^') && (ccurr != '!'))
            continue;

        /* Map the current character to a sound code. */
        switch (ccurr)
        {
        // Soft characters: vowels and near-vowels contribute no code
        case 'a':
        case 'e':
        case 'h':
        case 'i':
        case 'o':
        case 'u':
        case 'w':
        case 'y':
            sound = SOUND_SKIP;
            break;
        case 'b':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_B);
            break;
        case 'd':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_T);
            break;
        case 'f':
        case 'v':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_F);
            break;
        case 'j':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_SH);
            break;      /* assumed missing break: 'j' should not fall through to 'l' */
        case 'l':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_L);
            break;
        case 'm':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_M);
            break;
        case 'n':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_N);
            break;
        case 'q':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_K);
            break;
        case 'r':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_R);
            break;
        case 'z':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_S);
            break;
        case 'c':
            if (cnext == 'e') {
                sound = SOUND_S;
                break;
            }
            if (cnext == 'h') {
                sound = SOUND_SH;
                break;
            }
            if (cnext == 'i') {
                if ((cnext2 == 'o') && (cnext3 == 'u') && (cnext4 == 's')) {
                    sound = SOUND_SH;       /* "cious" sounds like SH */
                    i += 2;
                    break;
                }
            }
            sound = assign_no_prev_char(cprev, ccurr, SOUND_K);  /* assumed: hard 'c' maps to K */
            break;
        case 'g':
            if (cnext == 'e') {
                sound = SOUND_SH;
                break;
            }
            sound = assign_no_prev_char(cprev, 'g', SOUND_K);
            break;
        case 'k':
            if (cprev == 'c') {
                sound = SOUND_SKIP;         /* "ck" was already coded by the 'c' */
                break;
            }
            sound = assign_no_prev_char(cprev, 'k', SOUND_K);
            break;
        case 'p':
            if (cnext == 'h') {
                sound = SOUND_F;            /* "ph" sounds like F */
                break;
            }
            sound = assign_no_prev_char(cprev, 'p', SOUND_B);
            break;
        case 's':
            if (cnext == 'h') {
                sound = SOUND_SH;
                break;
            }
            sound = assign_no_prev_char(cprev, 's', SOUND_S);
            break;
        case 't':
            sound = assign_no_prev_char(cprev, ccurr, SOUND_T);  /* assumed: plain 't', like 'd', maps to T */
            break;
        case 'x':
            buf[pcntr]   = SOUND_K;         /* 'x' expands to two codes: K then S */
            buf[pcntr+1] = SOUND_S;
            buf[pcntr+2] = '\0';
            pcntr += 2;
            sound = SOUND_SKIP;             /* assumed: codes already written to buf */
            break;
        default:
            sound = ccurr;
        }
        if (sound != SOUND_SKIP) {
            buf[pcntr]   = sound;
            buf[pcntr+1] = '\0';
            pcntr++;
        }
    }
    return (buf);
}
//
// Pattern Match routine.
//
// Simplistic regular expression handler
//
// Allows use of:
// '%' to match any string
//
    /* Convert the SQL pattern argument, compute its sound string, then clean up. */
    pattern = mi_lvarchar_to_string(p_pattern);
    if (pattern == NULL) {
        mi_db_error_raise( NULL, MI_SQL, "error",
                           "FUNCTION %s", "TSndxRE 1", (char *)NULL);
    }
    patternsound = Sndx(pattern);
    mi_free(pattern);
    mi_free(patternsound);
    return flag;
}

            if (*n == '\0')
                return MI_FALSE;
            break;

        case RE_MULTI:              /* '%': match any run of characters */
            if (c == '\0')
                return MI_TRUE;
            {
                char c1 = tolower(c);
                for (--p; *n != '\0'; ++n)
                    if ((c == '[' || tolower(*n) == c1) &&
                        strmatch (p, n) == MI_TRUE)
                        return MI_TRUE;
                return MI_FALSE;
            }

        case '[':                   /* character set, optionally negated */
            {
                register int not;
                if (*n == '\0')
                    return MI_FALSE;
                c = *p++;
                if (c == '\0')
                    return MI_FALSE;
                c = tolower(*p++);
                c = *p++;
            }
            if (c == ']')
                break;
            }
            if (!not)
                return MI_FALSE;
            break;

        matched:;
            while (c != ']')
            {
                if (c == '\0')
                    return MI_FALSE;
                c = *p++;
            }
            if (not)
                return MI_FALSE;
        }

        default:                    /* literal character: case-insensitive compare */
            if (c != tolower(*n))
                return MI_FALSE;
        }
        ++n;
    }
    if (*n == '\0')
        return MI_TRUE;
    return MI_FALSE;
}
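After the shared object is built, routines such as these are exposed to SQL with CREATE FUNCTION and can then be used in ordinary queries. The statements below are a minimal sketch only: the SQL function names, signatures, shared-object path, and entry-point symbols are assumptions for illustration and must be adjusted to match the complete Example 10-44 listing and your build.

-- Hypothetical registration; the path and entry-point symbols are placeholders.
CREATE FUNCTION SndxCode(lvarchar)
    RETURNING lvarchar
    EXTERNAL NAME '$INFORMIXDIR/extend/soundex/soundex.bld(SndxCodeUDR)'
    LANGUAGE C;

CREATE FUNCTION TSndxRE(lvarchar, lvarchar)
    RETURNING boolean
    EXTERNAL NAME '$INFORMIXDIR/extend/soundex/soundex.bld(TSndxRE)'
    LANGUAGE C;

-- Hypothetical usage: match names by the way they sound rather than by spelling.
SELECT last_name, first_name
    FROM customer
    WHERE TSndxRE(last_name, 'Sm%th');

A Boolean user-defined routine registered in this way can serve directly as a predicate, so the sound-based match runs inside the server rather than in the application.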
Glossary

access control list (ACL). The list of principals that have explicit permission (to publish, to subscribe to, and to request persistent delivery of a publication message) against a topic in the topic tree. The ACLs define the implementation of topic-based security.

aggregate. Precalculated and prestored summaries, kept in the data warehouse to improve query performance.

aggregation. An attribute-level transformation that reduces the level of detail of available data, for example, having a Total Quantity by category of items rather than the individual quantity of each item in the category.

BLOB. Binary large object. A block of bytes of data (for example, the body of a message) that has no discernible meaning, but is treated as one solid entity that cannot be interpreted.

commit. An operation that applies all the changes made during the current unit of recovery or unit of work. After the operation is complete, a new unit of recovery or unit of work begins.

computer. A device that accepts information (in the form of digitalized data) and manipulates it for a result based on a program or sequence of instructions about how the data is to be processed.

configuration. The collection of brokers, their execution groups, the message flows and sets that are assigned to them, and the topics and associated access control specifications.

continuous data replication. See enterprise replication.

data append. A data loading technique where new data is added to the database leaving the existing data unaltered.

data federation. The process of enabling data from multiple heterogeneous data sources to appear as though it is contained in a single relational database. Can also be referred to as "distributed access."

Data Manipulation Language (DML). An INSERT, UPDATE, DELETE, or SELECT SQL statement.
data partition. A segment of a database that can be accessed and operated on independently even though it is part of a larger data structure.

data refresh. A data loading technique where all the data in a database is completely replaced with a new set of data.

data warehouse. A specialized data environment developed, structured, and used specifically for decision support and informational applications. It is subject oriented rather than application oriented. Data is integrated, non-volatile, and time variant.

database partition. Part of a database that consists of its own data, indexes, configuration files, and transaction logs.

DataBlades. These are program modules that provide extended capabilities for Informix databases and are tightly integrated with the database management system (DBMS).

DB Connect. Enables connection to several relational database systems and the transfer of data from these database systems into the SAP Business Information Warehouse.

dynamic SQL. SQL that is interpreted during execution of the statement.

engine. A program that performs a core or essential function for other programs. A database engine performs database functions on behalf of the database user programs.

enrichment. The creation of derived data. An attribute-level transformation performed by a type of algorithm to create one or more new (derived) attributes.

enterprise replication. An asynchronous, log-based tool for replicating data between IBM Informix Dynamic Server (IDS) database servers.

extenders. Program modules that provide extended capabilities for DB2 and are tightly integrated with DB2.

FACTS. A collection of measures and the information to interpret those measures in a given context.

federation. Providing a unified interface to diverse data.

Java Runtime Environment (JRE). A subset of the JDK that enables you to run Java applets and applications.

materialized query table. A table where the results of a query are stored for later reuse.

measure. A data item that measures the performance or behavior of business processes.

message domain. The value that determines how the message is interpreted (parsed).

message flow. A directed graph that represents the set of activities performed on a message or event as it passes through a broker. A message flow consists of a set of message processing nodes and message processing connectors.

message parser. A program that interprets the bit stream of an incoming message and creates an internal representation of the message in a tree structure. A parser is also responsible for generating a bit stream for an outgoing message from the internal representation.

metadata. Typically called data (or information) about data. It describes or defines data elements.

MOLAP. Multidimensional OLAP. Can be called MD-OLAP. Refers to OLAP that uses a multidimensional database as the underlying data structure.

nickname. An identifier that is used to reference the object located at the data source that you want to access.

node. An instance of a database or database partition.

node group. Group of one or more database partitions.

ODS. See operational data store.

online analytical processing (OLAP). Multidimensional data analysis, performed in real time. Not dependent on an underlying data schema.

Open Database Connectivity (ODBC). A standard API for accessing data in both relational and non-relational database management systems. Using this API, database applications can access data stored in database management systems on a variety of computers even if each database management system uses a different data storage format and programming interface. ODBC is based on the call-level interface (CLI) specification of the X/Open SQL Access Group.

operational data store (ODS). (1) A relational table for holding clean data to load into InfoCubes, and can support some query activity. (2) Online Dynamic Server, an older name for IDS.
optimization. The capability to enable a process to execute and perform in such a way as to maximize performance, minimize resource utilization, and minimize the process execution response time delivered to the user.

partition. Part of a database that consists of its own data, indexes, configuration files, and transaction logs.

pass-through. The act of passing the SQL for an operation directly to the data source without being changed by the federation server.

pivoting. Analysis operation where a user takes a different viewpoint of the results, for example, by changing the way the dimensions are arranged.

primary key. Field in a table that is uniquely different for each record in the table.

process. An instance of a program running in a computer.

program. A specific set of ordered operations for a computer to perform.

pushdown. The act of optimizing a data operation by pushing the SQL down to the lowest point in the federated architecture where that operation can be executed. More simply, a pushdown operation is executed at a remote server.

RSAM. Relational Sequential Access Method. The disk access method and storage manager for the Informix DBMS.

ROLAP. Relational OLAP. Multidimensional analysis using a multidimensional view of relational data. A relational database is used as the underlying data structure.

roll-up. Iterative analysis, exploring facts at a higher level of summarization.

server. A computer program that provides services to other computer programs (and their users) in the same or other computers. However, the computer that a server program runs in is also frequently referred to as a server.

shared nothing. A data management architecture where nothing is shared between processes. Each process has its own processor, memory, and disk space.

static SQL. SQL that has been compiled prior to execution. Typically provides best performance.

subject area. A logical grouping of data by categories, such as customers or items.

synchronous messaging. A method of communication between programs in which a program places a message on a message queue and then waits for a reply before resuming its own processing.

task. The basic unit of programming that an operating system controls. Also see multitasking.

thread. The placeholder information associated with a single use of a program that can handle multiple concurrent users. See also multithreading.

unit of work. A recoverable sequence of operations performed by an application between two points of consistency.

user mapping. An association made between the federated server user ID and password and the data source (to be accessed) user ID and password.

virtual database. A federation of multiple heterogeneous relational databases.

warehouse catalog. A subsystem that stores and manages all the system metadata.

xtree. A query-tree tool that enables you to monitor the query plan execution of individual queries in a graphical environment.
Abbreviations and acronyms
ASCII      American Standard Code for Information Interchange
AST        application summary table
ASYNC      asynchronous
AWS        Amazon Web Services
BLOB       binary large object
DB2 UDB    DB2 Universal Database™
DBA        database administrator
DBDK       DataBlade Development Kit
DBM        database manager
DBMS       database management system
Related publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.
IBM Redbooks
For information about ordering these publications, see “How to get Redbooks” on
page 503. Note that some of the documents referenced here may be available in
softcopy only.
Developing PHP Applications for IBM Data Servers, SG24-7218
Informix Dynamic Server V10 . . . Extended Functionality for Modern
Business, SG24-7299
Informix Dynamic Server V10: Superior Data Replication for Availability and
Distribution, SG24-7319
Informix Dynamic Server 11: Advanced Functionality for Modern Business,
SG24-7465
Informix Dynamic Server 11 Extending Availability and Replication,
SG24-7488
Other publications
These publications are also relevant as further information sources:
Built-In DataBlade Modules User’s Guide, G251-2770
C-ISAM DataBlade Module User’s Guide, Version 1.0, G251-0570
Data Director for Web Programmer’s Guide, Version 1.1, G251-0291
Data Director for Web User's Guide, Version 2.0, G210-1401
DataBlade API Function Reference, G251-2272
DataBlade API Programmer Guide, G251-2273
DataBlade Developer’s Kit User Guide, G251-2274
DataBlade Module Development Overview, G251-2275
DataBlade Module Installation and Registration Guide, G251-2276-01
Online resources
These Web sites are also relevant as further information sources:
IBM CIO article, “Improve Database Performance on File System Containers
in IBM DB2 Universal Database V8.2 using Concurrent I/O on AIX,” by Kenton
DeLathouwer, Alan Y Lee, and Punit Shah, August 2004.
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/
“Non-blocking checkpoints in Informix Dynamic Server,” by Scott Lashley
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0703lashley/index.html
Downloadable Bladelets and demos
http://www-128.ibm.com/developerworks/db2/zones/informix/library/samples/db_downloads.html
Object-relational database extensibility, including DataBlades
http://www.iiug.org/software/index_ORDBMS.html
Introduction to the TimeSeries DataBlade
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0510durity2/index.html
Informix Web DataBlade Architecture
http://www.ibm.com/developerworks/db2/library/techarticle/0207harrison/0207harrison.html
multiple partitions 117 DSS (Decision Support Systems) 4–5, 10, 15,
page size 116 34–35, 47–49, 119, 133, 155, 175, 184
tblspace tblspace extents 116 configuration for 36
temporary 119 data warehousing 36, 99
DBSPACETEMP 41, 119 PDQ parameter values 41
hash join 41 dynamic SQL 293
DBSSO (database system security officer) 146
DDL (Data Definition Language) 134, 137, 414, 454
DECIMAL data type 312
E
E-Commerce Service (ECS)
Decision Support Systems (DSS) 4–5, 10, 15,
Amazon 425
34–35, 41, 47–49, 119, 133, 155, 175, 184
gSOAP code 427
configuration for 36
ItemLookup operation 425
data warehousing 36, 99
ItemSearch operation 426
decision-support memory 39
ECS (E-Commerce Service)
decryption 147
Amazon 425
DEFAULT clause 140
gSOAP code 427
degree of parallelism 152
ItemLookup operation 425
DELETE 343–344, 380, 421, 448, 451, 454, 457
ItemSearch operation 426
denial-of-service flood attacks 133
EDA (enterprise data availability) 10, 75, 84
denormalization, data warehousing 242
capabilities and failover options 90
deployment 15
CLR 75–76, 81
Deployment Wizard 9–10, 26
ER 75–76, 83
component tree 26
HDR 75–77
DescribeFeatureType operation 336
recommended solutions 92
descriptor, UDAM 410
RSS 75–76, 78
direct I/O on cooked chunks 115
SDS 75–76, 80
disaster 75
solutions 10
disaster recovery 84–85, 90, 93, 147
solutions in IDS 76
Discretionary Access Control (DAC) 135
technologies and advantages 88
disk hardware mirroring 92
efficiency 256
disk management 11, 114
EGL (Enterprise Generation Language) 321
disk mirroring 148, 151, 155
encryption 9, 147
distributed joins 18
data 146
distributed queries 441
passwords 146
Distributed Relational Database Architecture
encryption, password 132
(DRDA) 19
end node 374
distribution 93, 98
enterprise data availability (EDA) 10, 75, 84
DLL (Data Definition Language) 414
capabilities and failover options 90
DML (Data Manipulation Language) 414, 454
CLR 75–76, 81
domain 236
ER 75–76, 83
DRDA (Distributed Relational Database Architec-
HDR 75–77
ture) 19
recommended solutions 92
DROP 282, 287, 405, 416, 436, 464
RSS 75–76, 78
DROP ACCESS_METHOD 406
SDS 75–76, 80
DROP TABLE 146, 436
solutions 10
DS_MAX_QUERIES 39
solutions in IDS 76
DS_MAX_SCANS 39
technologies and advantages 88
DS_NONPDQ_QUERY_MEM 39
Enterprise Gateway Manager 18
DS_TOTAL_MEMORY 39
srid 352 HA cluster 85
Geodetic DataBlade module 23 ER 86
Geodetic Web services 9 hardware mirroring 155
geographic analysis 324 hash join, DBSPACETEMP 41
geographic information system (GIS) 234, 324, 326 HDR (High Availability Data Replication) 8, 84–86,
software 328, 330 89, 91, 105, 278
techniques 325 data encryption 147
tools 325 EDA 75–77
geographic point locations 325 Health Center 196, 212
geographic processing 324 heterogeneous data sources 6
geographic technology 325 hierarchical server tree 84
Geography Markup Language (GML) 328, 353, 356 hierarchical tree topology 84
geometric shapes 326 high availability (HA) 2, 7, 9, 75–76, 84–85, 93,
geometry_type 351 105–106, 112–113, 164
GeoPoint 361 application redirection 103
GeoPolygon data type 332 cluster 85
geospatial file format 353 sysmaster database 106
geospatial mapping 353 High Availability Data Replication (HDR) 8, 84–86,
get and set methods 228 89, 91, 105, 278
GetCapabilities operation 334 data encryption 147
GetCapabilities transaction 353 EDA 75–77
document 356 High Performance Loader (HPL) 42, 120
GetFeature operation 338 Home 193
GIS (geographic information system) 234, 324, 326 host variables 182
software 328, 330 hot backup 81
techniques 325 HPL (High Performance Loader) 42, 120
tools 325 HTTP 290, 335, 423–424
global mode 183 human resources system 100
global positioning system (GPS) 259
GML (Geography Markup Language) 328, 353, 356
Google 325
I
I/O throughput 36
GoogleEarth 353
IBM Informix TimeSeries DataBlade 250
GOOGLEMAPKEY field 65
IDS
GPS (global positioning system) 259
administration 3
GPS-enabled devices 259
application development 3
GRANT user labels 140, 145
business environment 24
great-circle, shortest path 330
capabilities 6
GROUP BY 269, 276, 283, 303, 305, 307
database events 377
group, new 69
DataBlade 220
gSOAP code 427
decryption functions 147
GUI 284
EDA 75
EDA solutions 76
H encryption functions 147
HA (high availability) 2, 7, 9, 75–76, 84–85, 93, extensibility 3, 295, 372
105–106, 112–113, 164 features and tools 17
application redirection 103 LIST collection type 253
cluster 85 memory management 228
sysmaster database 106 mixed and changing environments 34
KVP syntax 341, 344 Mandatory Access Control (MAC) 135
MapInfo 325, 353
Mapquest 325
L mashup 14, 325, 374
label-based access control (LBAC) 8–9, 11, 114,
data-centric 375
135, 137
relational 422
column level 143
MAX_PDQPRIORITY 39
row-level 138
MAXAIOSIZE 38
Large Object Locator (LLD) module 291
MaxConnect 19
last committed isolation level 8
MEAN() 312
latency, minimizing 79
MedianApprox() 318
LBAC (label-based access control) 8–9, 11, 114,
MedianExact() 317
135, 137
memory 121, 163–164, 174, 177, 180–181, 184,
column-level 143
189–190, 208, 278, 297–299, 301–303, 312, 315,
row-level 138
317–319, 383, 388–389, 402, 404, 411, 420, 430
LBS (location-based services) 357
allocation 298, 383–385, 389, 392, 430
LDAP 132
duration 383
least recently used (LRU) 44, 114
for query 40
light append buffer 42
management, IDS 228
light scan 37
quantum 39
buffers 38
usage 184
line 244
utilization 36–37
linestring 244
message log file 105–106
Linux 43, 287, 423–424, 455, 466
message queue
LIST 303, 311
integration 398
LIST collection type 253
VTI tables 402
LLD (Large Object Locator) module 291
metadata 293, 305, 334, 406, 408–409, 423
load 162, 295, 434
Microsoft mapping 325
loadshp utility 349, 351
mirroring 148
location-based services (LBS) 357
MLS (multi-level security) 135
lock table 44
Mode() 319
locking 421
model 391, 401–402
LOCKS 44
Model-View-Controller (MVC) 260
log recovery 82
MQ 446
LOGBUFF 45
MQ Publish and Subscribe functions 449
logical log 78, 162, 170–171, 201, 205
MQ Read and Receive functions 449
backup 156
MQ Table mapping functions 451
buffer 45
MQ Utility functions 451
file 170
MQSeries DataBlade module 290, 398
logical recovery time 122
multi-level security (MLS) 135
logical restore 148
multimedia data management 17
logs 194, 205
multiple instances 162, 164
LRU (least recently used) 44, 114
MULTISET 303, 311
LTAPEDEV 150–151
multithreaded architecture 393
MVC (Model-View-Controller) 260
M
MAC (Mandatory Access Control) 135
MAILUTILITY 152 N
named memory 279, 384
mainstream IT 325
Oracle 441 primary-target replication 84
ORDER BY 269, 276, 283, 303 privileges 173
ordered set 139 database connection 114
out-of-memory error condition 152 processes 267, 283, 383, 396, 423
PropertyIsBetween CQL predicate 360
purpose function 403
P Python driver 21
page 181, 189, 200
cleaner threads 45
size 116 Q
PAGERMAIL 152 qualification descriptor 410
PAM (Pluggable Authentication Module) 8, 131 qualifier, VTI and VII 413
LDAP 132 quantile 318
parallel database query (PDQ) 36, 39–40, 49, 80, quantum 39
119 quantum unit 39
shared memory 40 quasi-spatial techniques 324
stored procedure 133 query
parallelism 152, 421 memory 40
parallelization 421 plan 305–306, 308
partitioning 113 queues 392, 442–443, 446, 450
password quotations 423
authentication 130
encryption 132, 146
PDQ (parallel database query) 36, 39–40, 49, 80,
R
RA_PAGES 44
119
RA_THRESHOLD 45
parameter values 41
RAS_LLOG_SPEED 125
shared memory 40
RAS_PLOG_SPEED 125
stored procedure 133
raw chunk 114
PDQPRIORITY 39
raw disk device 114–115
performance 2, 184, 201, 295, 303–304, 306, 309
RBAC (role-based access control) 11, 114, 136
features 11
read-ahead 114, 178
tools 19
read-ahead threshold 45
tuning 43
REAL 234
Perl Web services 321
real-time data 258
permissions, data access 136
GPS-enabled devices 259
PHP 55
stock trades 258
driver 21
Real-Time Loader DataBlade 258
OAT 18
recoverability 279
Web services 321
recovery 75, 121
PHYSBUFF 45, 125
recovery point objective (RPO) 114
PHYSDBS 125
recovery time objective (RTO) 11, 43, 45, 114
PHYSFILE 125
dependent onconfig parameters 123
physical recovery time 121
Redbooks Web site 465, 503
Pitney Bowes 325
Contact us xix
Pluggable Authentication Module (PAM) 8, 131
relational data sources 401
LDAP 132
relational mashup 422
polyline 244
relational tables 401
polymorphism, function 328
remote administration 164, 173
primary access method 258, 402, 405, 438
remote instances administration tool 33
service provider 424 Spatial DataBlade 290–291, 324, 327, 329, 347,
service-oriented architecture (SOA) 13, 256, 321, 358
422 globes 326
data integration 320 maps 326
foundation technologies in IDS 11 320 shapefile utilities 349
framework 320 SRS 328
set 139 Spatial DataBlade module 23
ordered 139 spatial extent 359–360
SET ENVIRONMENT 134 spatial operators 329
SET ISOLATION COMMITTED READ 159 Spatial Reference System (SRS) 328
SET PDQPRIORITY 134 Spatial Web services 9
SET ROLE 136 spatiotemporal query 332, 358
SFS (Simple Feature Specification) 326 SPL (Stored Procedure Language) 257, 263, 270,
shapefile utilities 349 283, 293, 296, 303–304, 306, 309
Shared Disk Secondary (SDS) 80, 84–86, 89–91, routines 160
105 SQL
architecture 80 data types 257
EDA 75–76, 80 Informix 21
SMX communications interface 80 level 422
workload balancing with ER 96 Web services 422
shared library 285–287, 299, 395 optimizer 120, 413
shared memory 43 UDAM syntax 403
SHMVIRTSIZE 45 SQL Administration API 9, 18, 163–164, 167, 170,
Show command 194 173, 179, 182–186, 208
signal, sending 396 SQLHOSTS file 104
named memory 397 sqlhosts file 130–131
silent configuration 31 srid 329, 352
silent installation 25, 27 SRS (Spatial Reference System) 328
silent log file 30 standards for data types 259
Simple Feature Specification (SFS) 326 startup sensors 179
smart large object 279 startup tasks 178–179
SMX communication 79 statistical information 161, 163
SMX layer 80 statistics 162, 176, 181–182, 189–190, 203, 405
SOA (service-oriented architecture) 13, 256, 321, descriptor 410
422 functions 281
data integration 320 stdout, backup to 150
ESB 442 stock trades 258
foundation technologies in IDS 11 320 storage manager 154
framework 320 Storage Manager commands 152
SOAP 55, 323, 422, 424, 428 storage_type 351
message 424 stored procedure 13, 133, 161, 166–167, 174,
software components 220 211–212, 270, 272, 379, 445
Soundex 362 Stored Procedure Language (SPL) 257, 263, 270,
DataBlade 363 283, 293, 296, 303–304, 306, 309
examples 363 routines 160
spatial analysis 324 subtypes 328
spatial coordinates 325 support functions 228
spatial data 324 synchronous (SYNC) mode 77
spatial data type 327, 331 syntax 182, 403, 405, 412
transactions 419 views 12, 287, 289, 295, 303, 305, 309
truncate flow 419 VII (Virtual Index Interface) 13, 228, 258, 280,
UDF (user-defined function) 287, 291, 296 401–402
calling 394 qualifier 413
UDR (user-defined routine) 12, 134, 162, 167, 174, Virtual Index Interface (VII) 13, 228, 258, 280,
220, 223, 231–232, 277–278, 280, 282–284, 289, 401–402
297–298, 300, 303, 311, 327, 384, 386, 397, qualifier 413
403–404, 407, 412, 422, 430 virtual processor 164, 205, 210, 279, 393–394,
UDT (user-defined type) 231, 234, 250, 327 430–432
Fract data type 236 Virtual Shared Memory segments 41
UNIX 19, 152, 194–195, 208, 287, 349, 396 Virtual Table Interface (VTI) 13, 252, 257, 281, 401
unloadshp utility 349 qualifier 413
UPDATE 343–344, 380, 421, 448, 451, 454, 457 VTI (Virtual Table Interface) 13, 252, 257, 281, 401
UPDATE STATISTICS 160 qualifier 413
update-anywhere replication 84
user label 140, 145
user mode 183
W
Web DataBlade module 24, 292
userdata field 410
Web development 20
user-defined
Web Feature Service (WFS) 13, 290, 326, 332
alert 166
architecture 346
primary access method 402
Basic 333
record 381
capabilities 333
secondary access method 402
DataBlade 290, 325, 327
statistics functions 281
installation and setup 347
table functions 281
overview 332
virtual processor 394
publishing location 324
network connection 397
spatiotemporal queries 358
user-defined access method (UDAM) 402
Transaction 334
SQL syntax 403
XLink 334
user-defined aggregate (UDA) 310–314, 316,
Web Feature Service API 8
318–319, 327
Web Mapping Service (WMS) 333
MEAN() 312
Web service 13, 283, 295, 325, 353, 375, 422–425,
MedianApprox() 318
428, 437–438, 468
MedianExact() 317
accessing 398
Mode() 319
Amazon E-Commerce Service 425
user-defined function (UDF) 287, 291, 296
IDS 321
calling 394
standards 422
user-defined routine (UDR) 12, 134, 162, 167, 174,
Web services
220, 223, 231–232, 277–278, 280, 282–284, 289,
.NET 2.0-based 321
297–298, 300, 303, 311, 327, 384, 386, 397,
EGL-based 321
403–404, 407, 412, 422, 430
Java-based 321
user-defined type (UDT) 231, 234, 250, 327
OGC 321
Fract data type 236
PHP, Ruby, Perl, and C/C++ 321
user-state information 298
Web Services Object Runtime Framework (WORF)
321
V WebSphere 441–442, 455
vertex 244 WebSphere Application Server 21
Video Foundation DataBlade module 24, 292 WebSphere MQ 441, 443, 455
X
XBSA API 149
Back cover
Customizing the Informix Dynamic Server for Your Environment