DB2 UDB V8.1 Family Application Development Certification:: Advanced Programming
DB2 UDB V8.1 Family Application Development Certification:: Advanced Programming
01 Apr 2004
In this tutorial you'll learn advanced programming skills for writing applications that
interface with the DB2 Universal Database. Topics include dynamic and static SQL,
use of UDTs, compound SQL, concurrency considerations, distributed units of work,
parameter markers, Unicode, and performance-enhancement features. This is the
sixth in a series of seven tutorials that you can use to help prepare for the DB2 UDB
V8.1 Family Application Development Certification exam (Exam 703).
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 1 of 49
developerWorks® ibm.com/developerWorks
You do not need a copy of DB2 Universal Database to complete this tutorial.
However, you can download a free trial version of IBM DB2 Universal Database from
the developerWorks downloads site for reference.
This tutorial is one of the tools that can help you prepare for Exam 703. You should
also review Resources at the end of this tutorial for more information.
This tutorial is designed to help you study for the advanced programming section of
the DB2 Family Application Development exam. The tutorial helps you better
understand how your application can interact with a database. For this tutorial to be
useful, you should have good background knowledge of how a database works. You
should also have a basic knowledge of:
• SQL statements
• Database objects
• Application compilation
• DB2 UDB products
Terminology review
Before you begin this tutorial, you need to be familiar with the concept of an SQL
access plan. This is a set of steps that the DB2 engine uses to execute a SQL
statement. It would include the indexes that are used, the times at which fields are
retrieved from the database tables, and the order in which the steps of the query are
executed. The access plan is created by a database engine based on an SQL
statement sent to that engine.
The term is used throughout this tutorial, so it's important to make sure you
Advanced programming
Page 2 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
understand it up front.
IBM, DB2, DB2 Universal Database, DB2 Information Integrator, WebSphere and
WebSphere MQ are trademarks or registered trademarks of IBM Corporation in the
United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of
others.
• Static SQL: This SQL is used for fixed statements, where the statement's
format is known ahead of time. It is used in stored procedures and for
queries that are known during application development.
• Dynamic SQL: This SQL is completely dynamic and can be built at run
time. This allows your application to adapt to the user's input.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 3 of 49
developerWorks® ibm.com/developerWorks
All of the SQL statements that are embedded in stored procedures are static SQL.
These procedures are therefore compiled ahead of time, which allows the stored
procedures to be executed quickly, without requiring a recompilation. Dynamic SQL
is built using strings and is not embedded in the procedure code. DB2 will therefore
not know the precise content of the SQL statements when the procedure is initially
built. Even if you had a dynamic SQL statement based on a string with the entire
SQL statement in it, the statement would still be dynamic. This is because DB2 does
not read the contents of the string when compiling a procedure. Here's an example
of static SQL:
SELECT empno,edLevel
FROM employee
Static SQL also uses the database statistics at compilation time to determine the
best access plan. This can have both positive and negative impacts on your
application. On the positive side, you know that the access plan is not going to
change and that the SQL statement will always be executed in the same way. On
the other hand, if the data in the database changes, the plans for the SQL
statements will still execute based on the old statistics. The plan can later be
updated by rebinding the package. This will then rebuild the plan based on any new
indexes or statistics.
• C/C++
• COBOL
• SQLJ (this is part of the Java language -- for more information on it, see
Resources for a link to Part 5 of this tutorial series)
• FORTRAN
Advanced programming
Page 4 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
• Stored procedures
• "Canned" queries for Management Information System (MIS) programs
• SQL statements that will be executed many times without changes
1. Connect to the database. Before you can do anything else, you must first have a
connection to the database in your program creation script. This is required because
the static SQL will actually be stored in the database.
2. Prepare the application. The code file containing the static SQL must be
prepared. This step checks all the syntax of the SQL statements that you have in
your program. An error will be returned if the syntax of any of the statements is
incorrect. The SQL statements are stored as packages in the database catalog
tables. These packages are then used when the statement is executed. If you use
the PRECOMPILE command then the binding stage can be performed as part of this
command.
3. Compile the application. In this step, you compile your application as you
normally would.
4. Bind to the database. During this phase, the SQL packages in the application
are bound into the database. If any of the tables, views, or other database objects to
which the SQL statements refer are dropped or altered, then the statements will
have to be recompiled. This is because the packages are bound into the database:
We want to ensure that you don't receive errors at runtime on SQL statements that
may no longer be correct. The bind file will contain all of the access plans for the
SQL that was compiled.
Let's consider how these steps would look for an example C program.
Connection:
Preparation:
Compilation:
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 5 of 49
developerWorks® ibm.com/developerWorks
Binding:
A dynamic SQL statement is built by joining together one or more strings. These
strings are then read by the database as a full SQL statement and executed. (We'll
discuss the steps in this process in more detail in the next panel.) Every time a
dynamic SQL statement is executed, it has to be recompiled. The recompilation time
can be a large problem, since for short-running OLTP-type SQL statements, the
compilation time can actually be longer than the run time.
One of the greatest advantages of a dynamic SQL statement is that it uses the most
current statistics on the database. This ensures that the best compilation path for the
SQL will be generated. Static SQL relies on old statistics, which sometimes results in
poorly running SQL.
• C/C++
• COBOL
• The Java language
• FORTRAN
• REXX
• Visual Basic
Some common uses of dynamic SQL code include:
Advanced programming
Page 6 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
platform)
1. Create the string. The actual SQL to be executed must first be written to a string.
The creation of such strings can be quite complex, involving dozens of lines of code,
or can be quite simple and dependent on a single IF/ELSE clause. The advantage
to dynamic SQL is that you are only limited by your creativity. The example below
shows you how to write the string in SQL PL:
2. Prepare the SQL string. Once you have created the SQL string, you must
prepare it. The PREPARE statement executes a number of steps in the database
engine that ensure that the string you have given it is semantically correct. The
engine then optimizes the execution of the string. STATEMENT objects store the
information about the access plan for the SQL string. The host-variable in the
following syntax diagram is the string that you stored the SQL in.
>>-PREPARE--statement-name--FROM--host-variable--------|
Here's an example:
3. Execute the SQL string. After the string has been prepared, you can execute it.
Here's the syntax diagram:
>>-EXECUTE--statement-name-------|
Here's an example:
EXECUTE statementVariable
This isn't the only way to build a dynamic SQL statement, however. There is another
choice you can make in the second step:
2. Execute the string immediately. You can also just execute the SQL string
immediately and combine the preparation step with the execution step. This is done
using the EXECUTE IMMEDIATE command. Here's the syntax:
>>-EXECUTE--IMMEDIATE--sql-statement-------|
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 7 of 49
developerWorks® ibm.com/developerWorks
Here's an example:
EXECUTE sqlString
The following figure shows the steps for dynamic and static SQL compilation.
Advanced programming
Page 8 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
• It is extremely flexible.
• It can be built based on user input.
• It uses the most recent statistics.
But dynamic SQL too has disadvantages:
Further reference
For more information on static and dynamic SQL, search the DB2 Information
Center (see Resources) for the following keywords:
• Dynamic SQL
• Static SQL
• PREPARE statement
• EXECUTE statement
UDTs are used to establish the context of a variable you are using. It will allow you
to keep track of how an object is being used in your application. You can also define
the relationships between different data types and UDTs.
Let's consider an example. If you have an application that determines the best
routes between stores in the US and Canada, you will most likely have to work with
distances in both metric and imperial measurements. You may not know if the data
in a table is stored in kilometers or in miles. You could use a UDT to create a KM
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 9 of 49
developerWorks® ibm.com/developerWorks
object and another UDT for a MILES object. This would make it clear which type of
object is stored in your table. You could also write a function that automatically adds
KM and MILES objects together properly.
Creating a UDT
You can create a UDT with the statement CREATE DISTINCT TYPE. The syntax of
the command is as follows:
The source data type can be any standard data type used by DB2.
When the UDT has been successfully created, DB2 will automatically generate
support for the standard comparison operators (=, <>, <, <=, >, and >=) for use with
the source data type.
Authorization
The privileges held by the authorization ID of the statement must include at least one
of the following:
Advanced programming
Page 10 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Using a UDT
People often generate a lot of errors when they first begin working with UDTs. This
is because UDTs interact differently with regular data types and each other.
Normally, DB2 automatically translates the data type for you, but with UDTs this will
not always occur.
Strong casting
DB2 will not automatically translate data types from one type to another. Some data
types will be translated if they are of the same data type group, as shown in the
diagram below. If they are not of the same group, then an error will occur.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 11 of 49
developerWorks® ibm.com/developerWorks
One common problem that programmers encounter: Within some other RDBMSs, a
character field will automatically be cast to a numeric field if mathematical equations
are applied against it. If you tried to use the following code with DB2, you would
receive an error.
This statement would fail because you cannot add 1 to a character field, even if the
field is the string representation of a number.
Advanced programming
Page 12 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Casting UDTs
Even though DB2 uses strong casting, you can still perform operations on UDTs and
have them interact with other data types and UDTs. If you want to compare two
different UDTs, you can cast them to the same data type using the CAST command.
Here's the syntax:
CAST--(--objectName--AS--target-data-type--)----|
Here's an example:
Let's look at a more in-depth example. First, create our distinct types and a table that
uses them:
Now, this next code snippet works because the data types are properly converted to
be the same data type. The SQL statement converts the integer to a KM object for
the comparison.
SELECT cityName
FROM cityInfo
WHERE width_k > km(10)
Now let's add some code that will cause a problem. This next snippet will not work
because DB2 will not know how to compare the MILES and KM data types:
SELECT cityName
FROM cityInfo
WHERE width_k > width_m
This final snippet will work, however, because the data types have both been cast to
INTEGER.
SELECT cityName
FROM cityInfo
WHERE CAST(width_k AS INTEGER) > CAST(width_m AS INTEGER)
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 13 of 49
developerWorks® ibm.com/developerWorks
Further reference
For more great information about UDTs, search the DB2 Information Center (see
Resources) for the following keywords.
Normally, SQL statements are executed individually. This allows for easy
programming, but it also means that each statement and result set must be
transmitted between the client and the server. To reduce the amount of network
traffic that your application generates, you can instead group a number of SQL
statements into a single group. This group is referred to as compound SQL.
With atomic SQL, the application receives a response from the database manager
when all substatements have completed successfully or when one substatement
ends in an error. If one substatement ends in an error, the entire block is considered
to have ended in an error. Any changes made to the database within the block are
rolled back.
With non-atomic SQL, the application receives a response from the database
manager when all substatements have completed. Each substatement within a block
is executed regardless of whether the preceding substatement completed
successfully. The group of statements can only be rolled back if the unit of work
containing the non-atomic compound SQL is rolled back.
• You reduce network traffic, since only the initial block of SQL and the final
result set will be transmitted.
Advanced programming
Page 14 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
• You improve overall application speed, because more work is done on the
server instead of the client.
However, there are disadvantages to using compound SQL as well.
• You can only receive one error message from the entire group of SQL
statements.
• You cannot tell which SQL statement failed if an error occurs.
The diagram below is an example of the reduction in network traffic that can occur
when you use use compound SQL. The thicker lines represent more network traffic.
The reduction is entirely dependent on the balance of SELECT, INSERT, UPDATE
and DELETE statements you have in your compound SQL.
There are several situations in which using compound SQL would be a good idea:
• You are going to be sending a great deal of data back and forth as part of
a block of SQL statements. Using compound SQL will reduce network
traffic.
• You have a slow client and want to utilize the high speed of the server.
Using compound SQL in a stored procedure ensures that the processing
will occur on the server and that only the final result is transmitted back.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 15 of 49
developerWorks® ibm.com/developerWorks
Compound SQL is used in stored procedures and also with the following languages:
• Embedded SQL
• DB2 Call Level Interface (CLI)
• JDBC
Dynamic compound SQL
For larger and more complex statements, however, it is a better strategy to use
stored procedures.
A compound statement block begins with the BEGIN keyword and ends with the END
keyword. Here's an example:
BEGIN ATOMIC
UPDATE employee
SET salary = salary * 1.1
WHERE edlevel > 12;
UPDATE dept
SET budget = budget * 0.8
WHERE profitMargin > 10;
INSERT INTO deptList (empno,salary)
VALUES (SELECT empno,salary
FROM employee WHERE edlevel = 15);
END
If the statement listed above failed at the first UPDATE statement, you would not be
able to tell which of the statements failed. The advantage to making this a compound
statement is that the only data being transmitted to the server and back are the SQL
statements and the final SQL information.
Further reference
You can search the DB2 Information Center (see Resources) for the following
keywords for more information about compound SQL.
Advanced programming
Page 16 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
DB2 concurrency
Every database programmer should understand how database concurrency works.
RDBMSs all follow core database data integrity principals, but they vary a great deal
in how they handle locking, transaction, and concurrency control. By learning more
about DB2 concurrency, you can both avoid errors and concurrency problems and
also potentially improve your application performance greatly with only small
changes.
To avoid confusion later on, it's important that we define a few terms first:
• Isolation level: DB2 uses different isolation levels to maintain and control
the integrity of the data in the database. The isolation level defines the
degree to which a process is protected (or isolated) from changes made
by other applications as they also execute in the database.
• Lock mode: This represents the type of access allowed for the lock
owner as well as the type of access permitted for concurrent users of the
locked object. This is sometimes referred to as the state of the lock.
• Deadlocks: This is a condition that occurs when two or more separate
processes are waiting for resources that the other process(es) in the
deadlock have locked. Neither process can progress because each
process is waiting on the others.
• Lock escalation: This occurs when a lock is escalated from a smaller
lock granularity to a higher one due to insufficient lock memory space.
This could, for instance, mean escalating a row lock to a table lock.
1. Uncommitted read (UR). Uncommitted read is the weakest of the isolation levels
and offers the least data protection. Uncommitted read will read any row during a
unit of work, even if it is being changed by another application process. These
changes do not even need to have been committed. This means that your
application may be returning data that would no longer be valid if another
application's transaction were rolled back. UC implies the possibility of reading
incorrect data and ignoring all locks, which is why it is sometimes referred to as dirty
read.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 17 of 49
developerWorks® ibm.com/developerWorks
Uses: UC is quite useful in large data warehouse systems where you may not want
to wait on a few locks that could be held in a generally read-only environment. It is
commonly used for reporting purposes to avoid all locks in the database.
2. Cursor stability (CS). Cursor stability is the default and most commonly used
isolation level. CS holds a lock on any row on which the cursor is currently
positioned. This ensures that the row cannot be changed by another process as it is
being read or altered. The lock on the row is held until another new row is fetched or
until the unit of work is terminated or completed.
If any row is updated, the lock will be held until the unit of work is completed. The CS
isolation level will not return uncommitted data.
Uses: The CS isolation level is the most commonly used because it ensures that you
do not read any rows that are currently locked and it also locks the row that you are
reading. Because the lock only holds the row currently being processed, contention
is minimal and locking is reduced, but data protection is still enforced. Most common
transaction processing systems use this isolation level.
Uses: This isolation level offers greater data protection, since all rows that are
returned by a query are locked during the UOW. However, it does greatly increase
the number of locks being held and can negatively affect concurrency.
4. Repeatable read (RR). This is the strongest isolation level; it offers the greatest
degree of data integrity. Locks will be held on all rows that are processed during the
running of a query. This means that rows that are not even returned in a result set
will be locked until the unit of work is completed. This has an extremely negative
impact on concurrency, since often entire tables are locked by this isolation level. It
does however ensure that if a query is repeated during a unit of work, the exact
same result will be returned. That is where the term repeatable read comes from.
The largest penalty of this isolation level is that table locks are often taken by the
optimizer. This effectively stops any other application from processing queries on the
table that is locked. No other application would be able to update, delete, or insert a
row into the table until the RR unit of work is done.
Uses: This isolation level is used when queries must return the exact result every
single time throughout a query, and no possibility of changes to the data set are
acceptable.
Advanced programming
Page 18 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Uncommitted data. This occurs when data can be read that has been altered or
inserted but not committed.
In the example above, a row is updated and the P-name field is changed. A SELECT
query is executed by another application using Uncommitted Read. Since UR
ignores the locks on the row that was changed, it will return the altered value.
However, the update transaction is rolled back before it is committed. The SELECT
query has therefore returned an incorrect row.
Nonrepeatable reads. These occur when SELECTs return different result sets within
the same result set.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 19 of 49
developerWorks® ibm.com/developerWorks
In the example above, an application A looks up the route from Dallas to Honolulu.
The transaction is repeatable, so no matter when in the transaction it is executed, it
will return the same result. However, if an application B added a row to the flight
table with a direct flight from Dallas to Honolulu, then a different result could occur in
application A depending when the query is executed. This means that if application
B is allowed to alter the table while application A is running its transaction the query
is therefore nonrepeatable.
Phantom reads. These occur when an application executes the same SQL code
twice in a transaction and the second execution returns additional rows.
Advanced programming
Page 20 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Lock escalation
DB2 keeps track of all the locks that are held by the database in a memory area
called the LOCKLIST. Each lock in DB2 takes up a very small amount of memory
(36 to 112 bytes), so it is rare that a database runs out of LOCKLIST memory.
However, if this occurs, some of the locks will be escalated. This involves increasing
the size of a lock to the next level of locking. The escalation would be from either a
row lock to a table lock. Increasing the lock size can cause a lot of concurrency
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 21 of 49
developerWorks® ibm.com/developerWorks
problems, since millions of rows in a table -- instead of just a few -- could suddenly
be locked by an application.
The first image illustrates an example where an application has exceeded the
percentage of the total lock memory that it is allowed to have (the MAXLOCKS value).
The application therefore has its locks escalated and the total amount of space
allocated to it reduced. The second image illustrates the more common occurrence
of the total lock memory limit being reached. The lock escalation problem can be
overcome by increasing the LOCKLIST size. However, if you continue to run into
locking problems, you should probably look to see if you are failing to commit your
transactions and therefore not freeing up locked rows.
Lock types
There are 12 types of locks currently used by DB2 V8.1, but it is more important to
understand how locking works than to memorize all the lock types. The lock mode is
the type of access allowed for the lock owner as well as the type of access permitted
for concurrent applications on the locked object.
The two major lock types are exclusive and share locks. Share locks, or S locks, are
held on rows that are being processed for reading; they prevent the row from being
deleted or altered as the row is being read by the database engine. Once the row is
no longer being read, the lock is released and can be changed. Exclusive locks, or X
locks, are held on rows that have been updated, deleted, or inserted. The exclusive
lock prevents the row from being read or updated until the transaction has been
committed or rolled back. This is done so uncommitted data is not read by another
application.
Advanced programming
Page 22 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
All lock types are ignored by the application that holds them, since the locks are only
used to prevent other applications from changing or accessing data that has been
altered or changed by the application holding the locks. The different lock modes all
interact with each other differently, as you'll see in the next panel.
The following list will get you acquainted with the most important types of locks.
Row locks
• Intent none: The holding application can only read data, including
uncommitted data; other applications can both read and update the object
• Intent share: Holding applications can only read the data, other
applications can both read and update the object.
• Intent exclusive: The lock owner can both read and update the data, as
can other applications.
• Update: The application can update. Other applications can read the data,
but cannot attempt to update it.
• Exclusive: The lock owner can both read and change the object. Other
applications can only read the data if they are using the Uncommitted
Read isolation level.
The following table outlines the compatibility among different types of locks (see the
DB2 Information Center for more on the abbreviations):
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 23 of 49
developerWorks® ibm.com/developerWorks
Deadlocks
A deadlock is a very serious concurrency issue, and its cause can often be difficult
to determine. A deadlock occurs when one application places locks on some objects
in the database and then requests to put locks on other objects. These objects are
already locked by another application which in turn wants locks on objects already
locked by the original application. The two (or more) applications will wait forever for
each other's locks to become free .
A deadlock can end when the lock timeout value is reached and the application
waiting on the locks the longest gives up and rolls back its transaction. The deadlock
can also be released if it is detected by the DB2 deadlock daemon, which is an
asynchronous process that runs at regular intervals and checks if there are any
deadlocks. The daemon will choose which process to roll back.
Reducing deadlocks
Advanced programming
Page 24 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Committing transactions
Data contention
Lock contention can also occur if one application wants to have a lock on the
resources that another application has already locked. Often lock contention, and
therefore lock wait, will be unavoidable. Sometimes, though, this lock contention is
due to DB2 reading in too many rows, which is a result of an index not existing on
the table. The lack of an index causes a table scan, and DB2 has to wait on rows
that have been locked by another application, even if the locks are not involved in
the transaction. This is sometimes hard to detect if the application has been
designed to avoid data contention, but the index that the application expects is
missing.
Most lock problems occur because exclusive locks have been left uncommitted. If
you are having locking problems, it is best to look for long-running transactions that
have changed data but have not been committed. You can then check to see if
those transactions can be split into more than one transaction and have some of the
locks released by using a commit.
Lock avoidance
In DB2 V8.1 Fixpak 4, a new locking strategy was added. To improve concurrency,
in some situations DB2 now permits the deferral of row locks for CS or RS isolation
scans until a record is known to satisfy the predicates of a query. By default, when
row locking is performed during a table or index scan, DB2 locks each row that is
scanned before determining whether the row qualifies for the query. To improve the
concurrency of scans, it may be possible to defer row locking until after DB2
determines that a row qualifies for a query.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 25 of 49
developerWorks® ibm.com/developerWorks
db2set db2_evaluncommitted=YES
By turning on the new lock setting, you will no longer have to wait when scanning an
index or table for rows that are locked by other processes but are not part of the final
result set. This will help improve concurrency, since this has traditionally been a
concurrency problem when migrating an application to DB2 from other RDBMSs.
Further reference
You can search the DB2 Information Center (see Resources) for the following
keywords for more information about concurrency.
• Isolation levels
• Lock types
• Concurrency control
• Locking type compatibility
• Lock attributes
Advanced programming
Page 26 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
• More than one database management server is updated per unit of work.
• The application directs the distribution of work, and initiates commit.
• There may be multiple requests per unit of work.
• There is one database management server per request.
• Commitment is coordinated across multiple database servers.
A federated database contains references to one or more remote data sources and
their characteristics. A nickname is a reference to a federated data source. This
nickname is referred to whenever you need to access the data source.
Let's look at an example in which we'll create and use a nickname. We'll create a
nickname, farawayTable, for a data source called ORA, which contains the object
DREW.DATATABLE1. Once the nickname is created, we can treat it like a regular
database table.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 27 of 49
developerWorks® ibm.com/developerWorks
The query does not have to know whether the object it refers to is part of the local
database or another RDBMS entirely. If the underlying object is moved, the
application does not have to be changed. This is known as location transparency.
The following image illustrates the potential sources and targets for federation:
Federation can also be used to reference nonrelational data sources, such as text or
XML files. To do this, you have to create a wrapper that tells DB2 how to SELECT,
UPDATE, DELETE, and INSERT from the data source. Once these are created, you
can treat the data source as if it were a table in DB2. The scientific community has
found this technique quite useful for working with DNA information in flat files.
Advanced programming
Page 28 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Further reference
Search the DB2 Information Center (see Resources) for the following keywords for
more information about DUOWs.
• Multisite update
• Distributed unit of work
• Nickname
• Federated
• Information integrator
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 29 of 49
developerWorks® ibm.com/developerWorks
and also to reduce the number of times that a statement has to be compiled.
If we hardcoded the value of the education level (edlevel) that we wanted to return
information about, we would have to write a new SQL statement for each value we
searched on. If we ran the query again but searched for level 13, DB2 would have to
recompile the statement and store another copy of it in the catalog tables (if it was
static SQL code).
Picture how much space and wasted time this would take if we were using a field
like lastname instead and there were thousands of values. To avoid this problem,
you can use a place marker in your query instead of the value. Here's how you
would do it for static SQL:
Now that you have used a parameter marker, the SQL statement will only have to be
compiled once instead of once for each value. You can also use multiple parameter
markers in your queries. You can imagine how often you would have to compile the
statement if you had to do it once for every combination of parameters! Here's an
example:
Advanced programming
Page 30 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Keep in mind that the only part of an SQL statement that a parameter marker can
replace is a variable value, as in the following code:
Let's look at three examples that demonstrate common mistakes developers make
when using parameter markers. Here's the first:
SELECT ?, lastname
FROM employee
WHERE empno = ?
This first example specifies a field in the SELECT list with a parameter marker. DB2
needs to know what fields are going to be returned in order to optimize a query. The
fields that are returned have a big impact on how DB2 retrieves the data. For
instance, they can determine things like the index that DB2 will use, or whether a
table scan will be used instead.
This second example specifies the name of the table in the FROM clause with a
parameter marker. DB2 would be unable to determine the table from which the data
is being extracted, so there is no way the SQL statement could be compiled.
This example specifies the field that the value 14 will be checked against. It also
uses two parameter markers to specify both the predicate field and the value field.
DB2 would be unable to determine which was the predicate or the value, so it could
not compile the statement.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 31 of 49
developerWorks® ibm.com/developerWorks
Using parameter markers can offer a number of advantages, chief among them is
reduced compilation time. DB2 first prepares a SQL statement before it can be
executed. The preparation phase creates the access plan that the SQL will use to
execute the SQL. When the SQL is being compiled, DB2 checks a section of DB2
memory called the package cache. The package cache contains all of the SQL
statements and their associated access plans. If an SQL statement matches the
query that you are trying to compile, then DB2 will not have to recompile it. It will
instead use the access plan stored in the package cache.
Keep in mind that the SQL statement will have to match exactly for the access plan
in the package cache to be reused. Let's look at some examples to illustrate what
would match and what would not. Imagine that this is the original statement in the
package cache:
The first unmatched statement fails because the mgrno, '000055', is not identical
to '000056'. DB2 uses a straight text match to see if the fields are identical. The
second unmatched statement fails because a parameter marker is used instead.
Using a text match, the ? symbol does not match the value '000056'. This third
example fails for a similar reason, but a host variable was used instead of a
parameter marker.
If, however, a parameter marker had been used in the original statement, then any
execution of the statement using a parameter marker would match. The SQL
statement using the parameter marker could then be executed using any value for
mgrno, such as '000055', '005600', '000123', and so on, and the statement
would still only need to be compiled once.
To illustrate more completely, imagine that this is the original statement in the
Advanced programming
Page 32 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
package cache:
The advantage of not having to compile becomes clear when you consider how long
a query may take to compile. Imagine that we have a query with a compile time of
0.001 seconds and an execution time of 0.001 seconds. How long will it take to
execute that query 10,000 times? Here's the formula to determine that:
Thus, in this case, using a parameter marker cuts our total query time in half!
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 33 of 49
developerWorks® ibm.com/developerWorks
price INTEGER,
amount INTEGER
);
These two queries illustrate the way in which using parameter markers could make a
big difference when working with this table:
In the first query, 800,000 records are going to be returned for the colour 'PINK'.
This list will then be reduced further based on how many of these pink toys are of
type 'CAR'. It would make sense to just scan the table instead of using an index,
since so many records are returned.
The second example returns only 5 records based on the colour 'BROWN', and
possibly less than that as it narrows its search to the type 'CAR'. It would be best to
use an index on colour to reduce the number of items you have to search for the
type 'CAR'.
Now, let's see what happens if we try to do a similar search using a parameter
marker:
Here, DB2 will have no idea what colour is being referred to. It doesn't know if 5 or
300,000 records are going to be returned. The different color values could have
created entirely different access plans. Since a parameter marker was used, only
one access plan will be created.
When parameter markers are involved, DB2 uses the average statistics for the
variable to generate an access plan. DB2 determines how many records on average
each colour has. Since 'PINK' is so common, the average number of records per
colour would be quite high, and thus DB2 would probably use a table scan for every
query on colour. This would be great for 'PINK', but bad for 'RED' or 'BROWN'.
Thus, it would probably have been better to not have used a parameter marker in
the query, since the extra compilation time could be a lot less than the extra
processing time that results from the improper use of a table scan.
Advanced programming
Page 34 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
• Queries that are executed multiple times and only have their parameter
value change
• Queries for which the compilation time can be longer than the execution
time
• Short, OLTP-type queries
However, there are some queries that do not benefit from the use of parameter
markers:
Further reference
Search the DB2 Information Center (see Resources) for the following keywords for
more information about parameter markers.
• Parameter markers
• Host variables
• Package cache
• PREPARE
• SQL compilation
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 35 of 49
developerWorks® ibm.com/developerWorks
What is Unicode?
When databases were first invented, everything was stored in them in standard
English character sets. This could have been ASCII for UNIX or Windows platforms
or EBCDIC for mainframe computers. This was fine when the data was not stored in
multiple languages. As the demand grew for other languages, specific coding sets
were created for non-English languages. You could then create a database
specifically for Japanese- or Russian-language data.
In today's on-demand world, a company seldom has the luxury of storing data in only
one language. Many databases store information from all over the world, and this
demands support for all languages. The Unicode language standard was created to
allow all languages to be stored as one encoding set.
Here's the definition of Unicode as provided by the Unicode home page: "Unicode
provides a unique number for every character, no matter what the platform, no
matter what the program, no matter what the language."
Unicode is already an essential part of the Java platform and other standards, such
as XML, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, and so on.
Advanced programming
Page 36 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Unicode is no cure-all, though. There are a number of problems that Unicode does
not solve:
• Text must still be input in the native code set and displayed and rendered
to the screen with the proper fonts.
• You must still understand the different languages and their requirements.
• You must write your own bidirectional algorithms for displaying characters
for right-to-left languages such as Hebrew or Arabic; the database does
not take care of this by default.
Types of Unicode
There are two main flavours of Unicode being used today, UTF-8 and UCS-2 (aka
UTF-16). Each has its own advantages and disadvantages.
UTF-8 stores all characters in a format from one to four bytes. All of the ASCII
characters are stored as a single byte, and languages such as Japanese or Korean
take three bytes per character. A few very, very rare characters take four bytes.
• The size of databases designed to store ASCII data will not need to
change.
• The translation between ASCII and UTF-8 is very quick since they have
similar encodings. ASCII is actually a subset of UTF-8: The binary
representation of an ASCII character is equivalent to the binary
representation of the same UTF-8 character, except that the UTF-8
representation will be preceded by 8 bits padded with zeros.
• UTF-8 data is stored as CHAR data in DB2, so no application changes are
needed for CHAR-based apps.
There are disadvantages to UTF-8 as well:
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 37 of 49
developerWorks® ibm.com/developerWorks
• The characters are all the same length, so manipulating and working with
strings is simple.
• UCS-2 is the native encoding for the Java language, so no character
encoding translation is required.
There are disadvantages as well:
• The size of ASCII data doubles, since all characters are two bytes. This
will double the amount of disk space that you need.
• The characters are stored as GRAPHICs in DB2, so your stored
procedures, functions, and applications will have to be redefined to
accommodate the new type.
Using UTF-8
UTF-8 is used when space is a large concern. Converting a data warehouse from a
non-Unicode standard to a Unicode database is a large task, since all the character
data will have to be converted. Data warehouse systems also normally require a
large amount of space. Translating a 2 TB database from ASCII to UTF-8 will keep it
the same size. Using UCS-2, however, would double the size of the database and
require an extra 2 TB of disk space! This would be an extremely expensive choice. It
would be smarter to use UTF-8 to reduce your infrastructure costs.
Using UCS-2
UCS-2 makes programming easier, since all the characters are a fixed length. Your
application can then easily translate your string manipulation functions to handle the
two-byte characters. The database string manipulation functions for UCS-2
automatically treat the two bytes as a single character, so you can treat it just as you
would an ASCII string. You would therefore not need to change any application
logic.
UCS-2 is also the default standard for the Java language and for Microsoft Windows.
This means that no encoding transformation on those platforms will be required if
your data is stored as UCS-2.
Advanced programming
Page 38 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
types. Large Object Binaries, or LOBs, are also stored differently. UTF-8 LOBs are
stored as Character Large Object Binaries, or CLOBs. UCS-2 LOBs are stored as
Double-Byte Character Large Object Binaries or DBCLOBs.
It is simple to create a Unicode database. When you issue the DB2 CREATE
DATABASE command, you have to specify the encoding scheme for the text in the
database. If you do not specify the encoding scheme, then by default DB2 will use
the OS's encoding scheme. If you wanted to store data in a Japanese encoding
scheme, then you would provide DB2 with this information. Unicode is viewed as just
another encoding scheme, albeit one that holds information for all languages.
--CREATE-DATABASE---database-Name---USING-CODESET--UTF-8---TERRITORY--territoryName--|
The territory that you use can be any valid territory code, such as US for the United
States. Changing the territory value will not change the Unicode data. However, this
piece of information is needed to keep the format of the CREATE DATABASE
command consistent for Unicode databases as well. The UTF-8 codeset variable
must be written in all uppercase characters.
The database encoding information is at the top of the output and will look like this:
Database territory = US
Database code page = 1208
Database code set = UTF-8
Database country code = 1
The codepage for UTF-8 is 1208 while the codepage of UCS-2 is 1200. The
configuration will show UTF-8 as the codeset even if you only use UCS-2 data.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 39 of 49
developerWorks® ibm.com/developerWorks
By default, the codepage of the client is equal to the codepage of the operating
system that it is running on. This can be overridden by setting the DB2 CLI setting
DB2CODEPAGE to be the codepage you want to use. This is useful if your application
stores information that is in a different language, or if you are using a Unicode client.
No data will be lost transmitting data from an ANSI client to a Unicode database.
The codepage of the data will be translated from the ANSI codepage to the Unicode
codepage when the data arrives at the server.
Advanced programming
Page 40 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
If, however, you were working with a Japanese client and you sent Japanese
Unicode data back to it, then the data would be processed without a problem. The
encoding would be translated at the server.
Collation order
The collation sequence for a language is the order in which letters will be sorted.
This is commonly known as the order of the alphabet. The order of the characters
varies from language to language, and is quite important when sorting character
strings. Each character encoding has its own unique collation sequence, which is
specific to the language. Some languages have multiple character encodings and
each encoding may have a slightly different sort order. Consider, for example, an
English encoding in which lowercase letters such as q are be sorted before
uppercase letters such as Q. Another encoding may have Q first and q later.
Unicode uses a single collation sequence for all languages. A number of prominent
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 41 of 49
developerWorks® ibm.com/developerWorks
linguists worked on the collation sequence to generate one that is closest to the
dominant sequence for each of the character sets stored in the encoding. You could
consider this sequence as having multiple alphabets all in one set. The Korean
character set would be sorted in one order, and Japanese in another. However, if
you displayed strings with both languages, you would always receive the same sort
order, since the characters are also part of a complete collation sequence for all of
Unicode. The collation sequences for UTF-8 and UCS-2 are identical.
As a result, a Unicode DB2 database only supports a single collation sequence. DB2
allows you to manually override the collation sequence of an encoding, but doing so
requires manually altering the file holding the collation sequence in a binary format.
Doing this for the Unicode sequence would be far too complex and prone to errors,
and would cause the database to behave differently from what people would expect
and rely on in their applications.
Here's what Unicode characters look like when displayed in the DB2 Control Center:
The data can also be displayed in the Command Window, as the figure below
Advanced programming
Page 42 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
illustrates. This allows you to run your queries and ensure that they are returning the
correct values, even if the results are in another language. This can be quite helpful
when you are troubleshooting applications that display multiple languages at the
same time.
Further reference
You can search the DB2 Information Center (see Resources) for the following
keywords for more information about Unicode:
• Unicode database
• UTF-8
• UCS-2
• Codepage
• Territory
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 43 of 49
developerWorks® ibm.com/developerWorks
In this final section of our tutorial, we'll look at two such features: buffered inserts
and MERGE statements.
Buffered inserts
Often, an application requires a large amount of information to be loaded into the
database. You may be fortunate to have the data in a text file, in which case you
could use the DB2 LOAD command to quickly load the data into tables. You may,
however, want to still be able to alter the table as you load the data, which LOAD
does not allow you to do. In that case, you could use the IMPORT command instead.
Often, though, the data being loaded in is based on user selections, so it is not in a
flat file but instead in your application's memory. Using a series of individual INSERT
statements in this situation can be quite costly, because each will be sent to the
database and executed, and the result code will be sent back, before the next insert
can occur. In applications where there is a long distance between the client and
database, the extra network latency can cause a serious performance problem. It
would be much more efficient if you could join together a series of INSERT
statements into one group. This can be achieved by using buffered inserts.
A buffered insert occurs when you send a group of INSERT statements over to the
database server from a client as one large group. To take advantage buffered
inserts, the application must be prepared or bound using the INSERT BUF option.
Buffered inserts are normally used when a single INSERT statement is repeated
multiple times in a loop, and no other database modification statements are issued in
between the inserts. Normally, the INSERT statements reference a value that
changes as the loop is iterated. Instead of sending the INSERT statements as
individual statements, they are all grouped together in a table queue and run as one
unit.
Error handling
Advanced programming
Page 44 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
the inserts failed if an error occurs. The entire batch of INSERT statements will fail if
even a single statement fails. The error is also not reported immediately, but instead
reported when the first statement that is an UPDATE or DELETE, or an INSERT on a
different table, is executed.
An inserted row may not be immediately visible to SELECT statements issued after
the INSERT by the same application program, if the SELECT is executed using a
cursor.
Now, suppose our file contains 8,000 values, but value 3,258 is not legal (it may, for
example, include a unique key violation). Each set of 1,000 inserts results in the
execution of another SQL statement, which then closes the INSERT INTO t2
statement. During the fourth group of 1,000 inserts, the error for value 3,258 will be
detected. It may be detected after the insertion of more values (not necessarily the
next one). In this situation, an error code is returned for the INSERT INTO t2
statement.
The error may also be detected when an insertion is attempted on table t3, which
closes the INSERT INTO t2 statement. In this situation, the error code is returned
for the INSERT INTO t3 statement, even though the error applies to table t2.
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 45 of 49
developerWorks® ibm.com/developerWorks
The example below shows how the MERGE command could be used to control the
insert or update of a group of rows into the account table from the transaction table.
The complete syntax of the MERGE statement is quite flexible, and therefore can be
quite complex if you choose. The command was added in DB2 V8.1 Fixpak 2 and
the syntax diagram is in the documentation for DB2 V8.1 Fixpak 2 or later.
Further reference
You can search the DB2 Information Center (see Resources) for the following
keywords for more information about these performance-enhancing features.
• Buffered insert
• MERGE
Summary
This tutorial was written to help you with the sixth part of the DB2 Application
Development exam (Exam 703). By reviewing this information, you should now have
a greater understanding of a number of key concepts related to running your
application on DB2. This tutorial was designed as an overview to these concepts. If
you are looking for more in-depth knowledge, then the DB2 Information Center (see
Resources) is the ideal place to look.
To make writing your application easier, it is important that you understand how the
database is going to interact with your program, and how you can utilize the
functionality it has built into it. By leveraging your new knowledge of database
concurrency and advanced techniques, you should be well on your way to getting
great performance of your application on DB2.
Advanced programming
Page 46 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Resources
Learn
• This tutorial will help you study for the application development exam, but there
is only so much space and time to explain the many concepts. Each of the
sections has a panel at the end that refers to you to keywords that you can look
up in the DB2 documentation for further information. The best source of
information for DB2 is the new Information Center. This Information Center was
added in DB2 V8.1 Fixpak 4 and has an entirely new interface written using the
Eclipse programming framework. It is quick and easy to use, and the search
utility built into it is far better than any of the previous incarnations. It also has
links to other sources of DB2 information on the Web, such as Google and the
DB2 Technotes. >All of my references at the end of the sections refer to this
source and not to the DB2 documentation included with the product itself. The
new Information Center is written by the DB2 Information Development team
and is the new official reference for DB2 help. Go out and try it! The DB2
Information Center is also available on the Web.
• If you need to know more about Unicode, check out the Unicode home page.
• For more information on the DB2 UDB V8.1 Family Application Development
Certification (Exam 703), see IBM DB2 Information Management -- Training and
certification for information on classes, certifications available and additional
resources.
• As mentioned earlier, this tutorial is just one tutorial in a series of seven to help
you prepare for the DB2 UDB V8.1 Family Application Development
Certification exam (Exam 703). The complete list of all tutorials in this series is
provided below:
2. Data Manipulation
4. ODBC/CLI Programming
5. Java Programming
6. Advanced Programming
7. User-Defined Routines
• Before you take the certification exam (DB2 UDB V8.1 Application
Development, Exam 703) for which this tutorial was created to help you
prepare, you should have already taken and passed the DB2 V8.1 Family
Fundamentals certification exam (Exam 700). Use the DB2 V8.1 Family
Fundamentals certification prep tutorial seriesto prepare for that exam. A set of
six tutorials covers the following topics:
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 47 of 49
developerWorks® ibm.com/developerWorks
• DB2 planning
• DB2 security
• Accessing DB2 UDB data
• Working with DB2 UDB data
• Working with DB2 UDB objects
• Data concurrency
• Use the DB2 V8.1 Database Administration certification prep tutorial seriesto
prepare for the DB2 UDB V8.1 for Linux, UNIX and Windows Database
Administration certification exam (Exam 701). A set of six tutorials covers the
following topics:
• Server management
• Data placement
• Database access
• Monitoring DB2 activity
• DB2 utilities
• Backup and recovery
Advanced programming
Page 48 of 49 © Copyright IBM Corporation 1994, 2007. All rights reserved.
ibm.com/developerWorks developerWorks®
Advanced programming
© Copyright IBM Corporation 1994, 2007. All rights reserved. Page 49 of 49