The InterBase and Firebird Developer Magazine, Issue 3, 2005

The third issue of "The InterBase and Firebird Developer Magazine", 2005. In this issue we compare the performance of InterBase 7.5, Firebird 1.5 and Yaffil 1.0 and come to some interesting results. This issue also highlights differences between Firebird and MS SQL from the developer's point of view, and there is a big article by Roman Rokytskyy, developer of the JayBird JDBC driver for Firebird, devoted to full-text search approaches in different databases.

Full Text Search in DBMS

Object-oriented development and RDBMS, Part 2

JayBird 2.0, a JCA/JDBC driver for Firebird

How I started to work with MS SQL Server and ADO

TPC based tests for InterBase & Firebird


www.ibdeveloper.com
2005 ISSUE 3

Keep on watching www.ibdeveloper.com: we're preparing special Christmas surprises and bonuses for all InterBase and Firebird fans!



CONTENTS

THE INTERBASE & FIREBIRD DEVELOPER MAGAZINE

Editor notes
by Alexey Kovyazin
Back to the Future ................................................. 4

Oldest Active
by Helen Borrie
Connecting Remotely ................................................ 5

Cover Story
by Roman Rokytskyy
Full Text Search in DBMS ........................................... 6

InterBase
by Bill Todd
New Cache Options In InterBase 7.5 ................................ 16

Development area
by Vladimir Kotlyarevsky
Object-oriented development and RDBMS, Part 2 ..................... 18

Development area
by Roman Rokytskyy
JayBird 2.0, a JCA/JDBC driver for Firebird ....................... 30

Development area
by Vladimir Kotlyarevsky
How I started to work with MS SQL Server and ADO .................. 31

TestBed
by Alexey Kovyazin
TPC based tests for InterBase & Firebird .......................... 39

Credits:
Alexey Kovyazin, Chief Editor
Dmitri Kouzmenko, Editor
Helen Borrie, Editor
Noel Cosgrave, Sub-editor
Lev Tashchilin, Designer
Natalya Polyanskaya, Blog editor

Editorial Office:
IBase IBDeveloper, office 5,
1st Novokuznetsky lane, 10,
Moscow, Russia, zip: 115184
Phone: +7095 6869763
Fax: +7095 9531334
Email: [email protected]
www.ibdeveloper.com

© Copyright 2005 by IB Developer. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy or any information storage and retrieval system, without permission. For promotional reprints, contact reprint coordinator Alexey Kovyazin, [email protected]. IBDeveloper reserves the right to revise, republish and authorize its readers to use the articles submitted for publication. All brand and product names used on these pages are trade names, service marks or trademarks of their respective owners.



EDITOR NOTES

Back to the Future
by Alexey Kovyazin

You know, the flow of life is like a spiral – old ideas become new with each convolution. Basic, apparently modern ideas were alive in Plato's ideal world of ideas. Today we have a good chance to prove spiral theory in the software development area – many ideas have been resurrected this year. I'd like to devote the first part of this editor's note to several new ideas with old histories.

Yukon and others

It was about 25 years ago when multi-generational (sometimes called multi-version for marketing reasons, I suppose) architecture (MGA) was implemented in the first versions of InterBase, and now it is "reinvented" by Microsoft. The newest version of MSSQL (Yukon) has multi-version abilities for record reading. So, one of the database market leaders has decided to use the approach that was used for years in InterBase and Firebird. Fans of locking servers frequently proclaimed MGA wrong and useless, yet it is clear that multi-version capabilities are valuable and useful for database development, especially for combined OLTP+OLAP systems. Of course, Yukon is only the first step. Microsoft folks need to study many things before they turn MSSQL into a full-scale multi-version engine (well, in every joke lies a piece of truth :).

Multi-version architecture is rising in popularity each year: it was implemented in the MySQL InnoDB storage engine and in Linter, and it seems like many servers are on the same course, too. I suppose that we will be seeing more and more multi-version implementations before long.

Vulcan

You know that Vulcan is a revolutionary new architectural prototype for Firebird development. Although based on the principles of the original InterBase architecture, it is implemented using modern techniques. All community members are anticipating the SMP support, scalability and easy extensibility that the Vulcan implementation will bring to subsequent releases of Firebird. We can also look forward to the release of Vulcan itself, which is already SMP-enabled, in quarter 1 of 2006. We intend to test the Beta version of Vulcan, once it is available, among other servers with tests based on TPC-R and TPC-C. TPC-C is especially interesting because it implies intensive use of SMP capabilities. Of course, at this point there is still a lot of work needed to make Vulcan stable and convenient enough for all users, but I believe it is only a question of time.

Issue 3 is out

Well, let's get back to the present. Issue 3 of "The InterBase and Firebird Developer Magazine" is out. "3" is a magical number and, with a third issue, we can say that we are on the right road. Growing interest and increasing readership instill in us some confidence that our magazine is a good and relevant idea.

What is in this issue?

I got an email from a reader who noted that there was no need to describe articles – if readers are too lazy to read the articles, they certainly won't read the editor's note :) The only thing I'd like to say, therefore, is that we've published a printed version of the third issue simultaneously, so if you prefer to read it beside the Christmas tree, you can. Back issues are available too - for details visit http://ibdeveloper.com/paper-version/.

In this issue we've added some comics and interviews to make it more fun and friendly. Hope you'll enjoy it!

Community

Actually, I am less than happy with the community feedback. It seems people just have no opinions about the articles. Please do not hesitate to post your comments in the blog at www.ibdeveloper.com or send them to [email protected]. Your thoughts and suggestions are very important for our authors and editors!

Firebird Conference and Borland Conference

In November we had two major events of the year – the Firebird Conference and the Borland Conference (with a track devoted to InterBase). We plan to publish special issues devoted to these events – watch the news!

Happy New Year!

This is the last issue of our magazine in 2005, so it is a good time to wish you Merry Christmas and Happy New Year! See you in 2006!

Sincerely yours,
Alexey Kovyazin,
Chief editor


OLDEST ACTIVE

Connecting Remotely
by Helen Borrie, [email protected]

The rave of the month has been, of course, the Firebird Conference. For the first time, the conference moved out of Germany and was staged in the Czech Republic, in beautiful, historic Prague. The blogs and photo albums tell it all: the fun bits as well as the serious bits. There was plenty of beer—a traditional element, given that the two preceding conferences took place in the homeland of German hops!—not to mention cake and cuckoo clocks and confrontations with armed soldiers and hotel maids.

People came from far and wide—Brazil, South Africa, Russia, Central Europe, Japan, North America, even New Zealand, which is about as far away from Prague as you can go and still have running tap water. It was great to have Russians there for the first time: it's been so hard in the past to get them to Germany.

The reasons for choosing Prague were several. Our past venue in Fulda had been great but it was also expensive, especially for visitors from outside the Euro currency zone. There had been persistent difficulties for some would-be participants to get visas. Prague appeared a good candidate to address both of those problems, with the added benefit of Very Cheap Beer, or so we were told by the locals. Was it true? Apparently—if the volume of empty beer bottles visible in nearly every photo is any kind of indicator.

Well, the upshot of it all was that a good time was had by all. The wish to return to Prague for next year's conference was unanimous—as long as it were not convened at Hotel Olsanka!

Now, who am I to be waxing lyrical about the Prague conference? I wasn't there. Too sad! There were many moments during those four days that I wished it were otherwise. What made it "happen" for me were the blogs. Martijn Tonies, from the Netherlands, started blogging the conference before it even began. Thanks to Holger Klemt and his team of techos from Germany, the conference attendees had Internet by wireless in one and only one place—the bar. So, while I was visualising lonely Martijn slaving over a hot keyboard in his hotel room at every spare moment, in fact he was slaving over a cool beer and a hot keyboard in the bar at every spare moment! It was a good formula.

Stefan Heymann, from Germany, also blogged, although not with the intensity and frequency exhibited by Martijn. Make a note, all you bloggers, for next year: there is no such thing as Too Many Blogs.

Another conference tradition is for Lucas Franzen to kidnap my coffee plunger and force me to pay a ridiculous price at his mad auction to get it back in order to survive the rest of my journey with proper coffee to drink... While in New Zealand at mid-year, I bought a replica of the Famous Coffee Plunger, intending to put it up for the auction, with the hopeful intention of being permitted to keep mine. I totally forgot to mail it over for the auction. Mea culpa! But it will keep for next year.

Amazingly, Luc's famous auction—which nobody can deny is a total rip-off—netted about 3000 Euros for the Firebird Foundation's coffers. That translates neatly into a grant allocation for a part-time QA person for Firebirds 2 and 3. Deep pockets are a wonderful thing for open source software development.

Oh and—yes, it's true, Prague in November is too cold for this sub-tropical dweller. I don't own an overcoat and I live in a part of Australia where a winter coat would be a collector's item if it were available at all and would be priced accordingly. Add the shortcomings of my cold-climate wardrobe to the overwhelming cost of airfares from here to anywhere—the speaker's fee would have got me about as far as Tokyo, one way—and one has fairly compelling reasons to stay put.


COVER STORY

Full-text search in RDBMS
by Roman Rokytskyy, [email protected]

What is full-text search

Full-text search came into our lives together with the Web and the Internet. Catalogs like Yahoo! are quite often inconvenient when looking for a particular piece of information that does not belong to a single category, and are just useless when the information was simply not added to the catalog. Google revolutionized the search industry by improving the quality of the search, usually showing the answer to the asked question in the first items returned in the result set.

So, what do people expect from a full-text search? Three main characteristics come to mind:

= Query language with text-specific predicates, not only Boolean ones. People expect to have the possibility of performing a search for documents containing some specific phrase, or documents where some words are close together.

= Fuzzy matching. People expect that not only the words specified in the query will be matched, but that variations of them will also be considered, e.g. word stemming, plurals, thesaurus search; and it would be nice if the search engine could anticipate common spelling errors.

= Result scoring. People expect that the results of the query are returned in the correct order, i.e. the most relevant documents are returned first.

This article aims to explain the common issues that arise with implementing full-text search support in databases and reviews the current situation with full-text search in various RDBMS systems and Firebird in particular.

Search and indexing

When we talk about search engines, "full-text search" is used as a synonym for a corresponding index. But in fact we do not need an index for this: one could create a function that would go through each document and check whether it satisfies the specified query or not. The index merely speeds up the search; the faster we get results, the happier we are.
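As a trivial sketch of such index-less searching in plain SQL (the table and column names here are invented for the example, not taken from any product):

-- no index involved: the engine reads every row
-- and checks each document for the word "phoenix"
SELECT doc_id
FROM documents
WHERE UPPER(body) LIKE '%PHOENIX%';

This works, but every query costs a scan of the whole document collection, which is exactly what the index structures described below avoid.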

Two main approaches to text indexing exist: an inverted file index and a signature file index. Below are short descriptions of both approaches; the reader can find more detailed information in [1].

Inverted file index

The inverted file index has two parts – a vocabulary, containing all distinct values being indexed (i.e. words, when we talk about text documents), and an inverted list, a mapping between the vocabulary entries and the documents containing those entries. A query is evaluated by obtaining the inverted lists for each term from the query and then either merging them for disjunctive queries or intersecting them for conjunctive queries. The final set contains "pointers" to the documents that satisfy the specified query. Additional structures allow the search algorithm to rank the documents in the result set.
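An inverted list maps naturally onto a plain table, and both query types reduce to set operations over document IDs. A minimal sketch (the names are invented; a real implementation would also store term positions and frequencies):

-- one row per distinct (term, document) pair
CREATE TABLE inverted_list (
  term   VARCHAR(50) NOT NULL,
  doc_id INTEGER NOT NULL,
  PRIMARY KEY (term, doc_id)
);

-- disjunctive query ('heal' OR 'ability'): merge the lists
SELECT DISTINCT doc_id
FROM inverted_list
WHERE term IN ('heal', 'ability');

-- conjunctive query ('heal' AND 'ability'): intersect the lists;
-- COUNT(*) = 2 works because each (term, doc_id) pair is unique
SELECT doc_id
FROM inverted_list
WHERE term IN ('heal', 'ability')
GROUP BY doc_id
HAVING COUNT(*) = 2;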
Signature file index

The signature file index groups various indexing approaches that have in common the fixed-length signature of each document. The simplest approach is that, for each document, a fixed-width bitstring of length w is assigned. Each word that appears in the document is hashed to determine the bit in the bit string that should be set to 1. It can and will happen that two different words set the same bit: in this case, no additional action is taken. The query terms are also hashed using the same algorithm, and then those documents whose signatures have the same bits set as in the query become candidates for retrieval. Later, each document is processed to discover any false matches that were caused by the fixed length of the bit string. Various modifications are possible in order to improve querying performance, a good overview of which can be found in [1] as well.
Anatomy of the text search engine

The inverted file approach is believed to produce a more compact and faster index than the signature file index, and this approach is the basis for the Lucene search engine. We use Lucene as a model of a text search engine. Below you will find a description of its main components; later, we will use this classification of components and algorithms for the RDBMS search solutions.

Lucene

The Lucene search engine indexes documents. Each document is an entity that consists of one or more text fields. When the document is added to the index, its fields are usually split into indexable terms which are generated by a "tokenizer" component. The union of all terms in the index forms an inverted file dictionary. In order to simplify the explanation we will suppose that our document contains only one field.

Constructing an index

The process of adding a new document to an index consists of a number of steps. First, the indexing component iterates through all fields in the document passed into it and calls the tokenizing component, which in turn splits the text of each field into a set of terms and performs additional steps, like converting words into singular form or extracting word stems. Later, the tokenized terms are stored in the index together with a pointer to the field to which each belongs, and the position of the term in the field.

Let's consider the following short documents:

"Fascinating creatures, phoenixes, they can carry immensely heavy loads, their tears have healing powers and they make highly faithful pets." (J.K.Rowling, Harry Potter)

"Fascinating Large Pack Animal: this creature can carry large and heavy pieces of equipment; can be equipped with large carry packs" (slightly modified phrase from a web site dedicated to a computer game)
"The Healing Power of Pets reveals the completeness of animals and their ability to heal our lives... A captivating and heartwarming look at the fascinating role of pets and relationships in health and healing." (Amazon.com editorial review page on the book "Healing Powers of Pets" by Dr. Marty Becker)

After passing these sentences through the tokenizer, the set of tokens is added to the dictionary. The inverted index can be represented by a table similar to the one in Illustration 1. The left column contains terms, the right column a list of the documents in which each term was found.

Term      Documents
ability   3
animal    2,3
can       1,2
...       ...

Illustration 1: Inverted index table

Already this table can be used to return a set of documents that satisfy a specified full-text query. The application can combine predicates with AND and OR operators, which is internally converted into operations over the sets of document IDs. But the answer to such a query is barely usable by a human: for a reasonably large document index, thousands of matches will be returned. The main task of the indexing component, therefore, is to construct structures that not only answer the question of which documents contain terms from the query, but that also help to rank the documents in the result.

A structure called a "term frequency histogram", an ordered list of <term, frequency> pairs for each indexed document, plays a major role in document ranking. It can be imagined as a spreadsheet whose first column contains terms from the index dictionary, whose first row contains document IDs, and the rest of whose cells contain the frequency of the term's appearance in the document. This frequency is computed from the number of times the specific term appears in the document. An example of the term frequency histogram can be found in Illustration 2, in the "term frequency" columns that contain term frequency values for each indexed document.

Another structure, called "document frequency", is built, containing the number of documents that contain the specific term. Document frequency allows an estimate of the "selectivity" of the term to be made.

It is clear that the term frequency histogram also works as an inverted index, so there is no need to keep a separate table for this purpose.

It is also worth mentioning that the indexing component additionally builds a "term proximity table", which stores the position of each term in the document. This structure is used by phrase queries, but it is too big to be shown in the illustration.

Searching

Searching is done in two steps. In the first step, the query is parsed and converted into a tree of atomic queries. Unlike in databases, each atomic query returns a score that rates how well that particular query matches a document.

The following atomic queries are supported:

= "term query", the most widely used query. Each word in the term query is processed with the tokenizer, which converts the word into its "normal" form; the query matches documents containing the specified term;

= "fuzzy query" matches documents containing terms for which the similarity is greater than some previously specified value. The similarity measure in Lucene is based on the Levenshtein algorithm, but other similarity measures like SOUNDEX can be used;

= "wildcard query" matches documents with terms that in turn match the specified wildcard pattern; this query can be very slow, especially when the "*" wildcard is used at the beginning of the term;

= "prefix query" is a special case of the wildcard query that has room for additional optimization; consequently, it has been extracted into a separate class;

= "phrase query" matches documents that contain a particular sequence of terms. Another variant of this query, called "sloppy phrase query", also matches sequences that might contain other terms between the specified ones;

= "boolean query" combines other atomic queries with Boolean operators.

The simplest implementation can combine the scores obtained from the atomic queries into a base score and use that as the basis for sorting the result set. This would provide much better results than simply matching on the inverted file index, but would still leave much to be desired with regard to relevance. The reason is that words are simply not equally relevant. To the best knowledge of the author there is no "silver bullet" that solves the question of which word is more relevant to the context in comparison to the others. Statistics play a great role here. The following scoring factors have been determined to improve the match relevancy:

= "term frequency" – score factor based on the number of times the specified term appears in the document. The more often a term appears in the document, the better that document corresponds to the topic. The corresponding data structure, created during the addition of a document to the index, was described earlier.

= "document frequency" – score factor based on the number of documents that contain the term. It is used to determine term relevance, since terms that occur in fewer documents are usually better indicators for the topic. Because a smaller value means better selectivity, an inverted value, also computed when adding a document to the index, is usually used.

= "length norm" – normalization factor for the "term frequency" score, based on the total number of tokens in a document. A match in a large document is less precise than a match in a short document, so a higher "term frequency" score is needed for a large document to become relevant. It is computed during the addition of the document to the index.
= "coord score" – score factor based on the fact that the more terms from the query match terms in the document, the better the match is; it is computed dynamically for each query.

= "sloppy match" – score factor applicable only to phrase searches; the closer the proximity of the terms from the query to one another in the document, the better the match. It is computed dynamically for each query, but requires the term proximity table that is created during the addition of the document to the index.

The search component applies these scoring factors to the outcome of each query and later recombines the results into an overall document score. This document score is the final score that is used to sort documents in the result set.

Let's return to our example, the inverted index for which is presented in Illustration 2, and consider two queries. The first one should search for all documents containing the words "healing" and "abilities"; the second one searches for the phrase "fascinating creatures". Let's compute the rank of each document for both queries.

The first query consists of two atomic term queries combined with a Boolean query. First, let's process the words with the imaginary tokenizer that was also used to construct the term frequency histogram. Our words after processing are "heal" and "ability". The rank of document k in our case can be computed as the sum of the term ranks normalized by the coord score, where the rank of the term i in document k is proportional to the term frequency and inversely proportional to the number of terms in the document and the document frequency of the term. The coord score is computed as the number of term matches in the document divided by the maximum number of matches in all documents.
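Written as a formula, the rank described above has roughly the following shape (a reconstruction from the description in this article, not the exact Lucene formula, which adds dampening factors such as square roots and logarithms but keeps the same proportionalities):

\[
r_k \;=\; \frac{m_k}{\max_j m_j} \;\cdot\; \sum_{t \in q} \frac{\mathrm{tf}(t, d_k)}{\mathrm{len}(d_k) \cdot \mathrm{df}(t)}
\]

Here the notation is ours: m_k is the number of query terms matched in document d_k (the first factor is the coord score), tf(t, d_k) is the term frequency, len(d_k) the total number of tokens in the document (the length norm) and df(t) the document frequency of term t.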
The formula used in Lucene gives ranks r1=0.06, r2=0 and r3=0.47, which completely corresponds to our expectations – the third document matches our query the best, since it talks about the abilities of pets to heal our life; the first one talks about healing powers, which semantically is quite similar to healing abilities, but because there is no 100% word match, it gets a lower rank; and finally the second document does not have any term match, so it will not be displayed in the result at all. The formula is constructed to give r=1 for an exact match.

                       Term frequency
Term            Doc 1   Doc 2   Doc 3   Doc freq
ability                           1         1
animal                    1       1         2
can               1       2                 2
captivating                       1         1
carry             1       1                 2
completeness                      1         1
creature          1       2                 2
equipment                 1                 1
equipped                  1                 1
faithful          1                         1
fascinating       1       1       1         3
have              1                         1
heal              1               3         2
health                            1         1
heartwarming                      1         1
heavy             1       1                 2
highly            1                         1
immensely         1                         1
large                     3                 1
life                              1         1
load              1                         1
look                              1         1
make              1                         1
pack                      2                 1
pet               1               2         2
phoenix           1                         1
piece                     1                 1
power             1               1         2
relationship                      1         1
reveal                            1         1
role                              1         1
tear              1                         1
Length norm      16      16      18

Illustration 2: Term frequency histogram for the sentences mentioned above
The second query is a phrase query, which matches only those documents that contain all terms specified in the query string, and therefore the third document will be filtered out before the ranking begins. We will use the term score and the sloppy phrase score to construct the document ranking. First, we compute the term rank, as in the previous case, and then combine it with the sloppy phrase score. That score is considered to be 1 when the search terms are found within the specified maximum distance between words, and 0 if one of the terms is not found or the distance is greater than the maximum allowed. In our case, we use a maximum distance of 4, to include also the second document.

This approach gives us ranks r1=0.37 and r2=0.21. For the sake of completeness we define r3=0. These results are in full agreement with our expectations.

It is also worth mentioning that, normally, the sloppy phrase scoring factor is limited to a relatively short distance between terms, usually 2. A scoring factor of two here would effectively filter out the second document, since the phrase "fascinating creatures" does not occur in it.

Finally, a few words about the Google ranking algorithm, which caused a revolution in Internet search engines. During some university research, Larry Page and Sergey Brin suggested that information on an Internet website is more relevant if many other sites contain hypertext links to it. This fact introduced an additional scoring factor, "page rank", that is proportional to the number of back-links to the web page; in essence, the more popular the page is, the more people have referenced the page and the more relevant it is.

This approach works very well when used with hypertext documents. However, it is useless when no back-links exist conceptually in the target domain, such as an RDBMS system. Recently, Google started a new service called "Google Database", by which people can store their information and let Google index it. It will be an interesting task to discover its behavior in the case of simple text files, and to compare the document ranking algorithm with Lucene, for example.

Text search in RDBMS: Case studies

Creating a usable full-text search solution for an RDBMS is a challenging task. To start with, full-text search systems work with the "document" concept. A way must be devised to map relations in an RDBMS to documents in the full-text search engine. The designer of the system must decide on the level at which the mapping is performed and what the result of the search is to be. Furthermore, the search engines that are widely used to index web pages use a "single writer" strategy. In other words, there is a crawler component populating an index and a search engine accessing the index in read-only mode. This is a complete contrast to a database engine, where transactions write to the database and run in concurrent mode.

Neither the relational data model nor SQL is of much use for full-text search, though it is relatively easy to build the inverted file index and corresponding data structures that were described above. The main issue is not the task of populating the index, but the querying itself, especially when terms are combined with the AND operator. The ranking of the results is also an issue. We recommend to the reader the article "Integrating Structured Data and Text", published in Intelligent Enterprise magazine, as a good overview of a pure SQL approach to text search [2].

The absence of an acceptable solution that uses relations and SQL does not mean that full-text search in databases is not possible. On the contrary, most database vendors have implemented this feature in their products. Some third-party products exist for Firebird, too, though they do not compare favourably with the solutions from the commercial database vendors from the perspective of usability.

Below we summarize the most notable approaches that are used in the products available on the market.

Oracle Text

Oracle Text was a part of a separate product, interMedia. Since Oracle version 9 it has shipped with the Standard Edition of the server. The Oracle Text feature is completely integrated with the database and supports replication, parallel execution, etc.

Creating a full-text index in Oracle is done by issuing the CREATE INDEX command and specifying the INDEXTYPE IS CTXSYS.CONTEXT clause. The CONTEXT index type corresponds to the general-purpose index described in the previous section. Oracle Text has two additional index types: CTXCAT, an index "designed specifically for eBusiness catalogs", and CTXRULE for building classification or routing applications. The CONTEXT index type also allows an application to specify the synchronization preference: manually, on commit, or at regular intervals. It also allows the use of a transactional text index, by which information becomes searchable right after inserting or updating. The CTXCAT index type is optimized for short documents and is always up-to-date.

Illustration 3: Example of CREATE INDEX statement in Oracle

CREATE INDEX description_idx ON product_information(product_description)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('sync (every "SYSDATE+5/1440")');

Illustration 3 shows an example of the CREATE INDEX command for the "product_description" column of the "product_information" table, with an index that is to be updated every five minutes.

Oracle supports the following query types:

= "Exact match". Corresponds to the "term query", without the tokenizer in our classification; matches only those documents that contain exactly the specified term;
= "Word positioning". Corresponds to the "phrase query" and "sloppy phrase" query; it matches documents that contain a specified phrase or words near to each other, and also allows searching for documents that contain the discrete words of the phrase in the same sentence or paragraph.

= "Inexact match". A sort of combination of the "predicate query" and "fuzzy query" with different matching algorithms, like SOUNDEX, stemming, prefix matching and thesaurus matching.

= "Intelligent match". This corresponds to the "fuzzy query" with the matching algorithm backed by some knowledge base that can determine the ontological similarity of the query terms and the terms found in documents.

= "Boolean combination". Corresponds to the "Boolean query" in our classification; allows other queries to be combined by means of AND, OR, NOT operators.

The query specification (Illustration 4) is somewhat unusual and requires explanation.

Illustration 4: Example of a full-text query in Oracle

SELECT
  score(1), product_id, product_name
FROM product_information
WHERE
  CONTAINS (product_description, 'monitor NEAR high resolution', 1) > 0
ORDER BY score(1) DESC;

First, the CONTAINS query operator: it takes two mandatory parameters – column name and query expression. It returns a value greater than 0 for all records where the specified column matches the query expression. The third, optional parameter assigns a label to the score returned for each record, which can later be accessed by the SCORE function, which takes the label as a parameter. Applications can use multiple CONTAINS clauses in one statement and combine the corresponding scores with boost factors or other arithmetic to obtain better document ranking.
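For instance, a query that weights matches in the product name twice as heavily as matches in the description might look as follows (a sketch in the spirit of Illustration 4; it assumes that both columns carry a CONTEXT index, and the second column is invented for the example):

SELECT product_id,
       2 * SCORE(1) + SCORE(2) AS combined_score
FROM product_information
WHERE CONTAINS (product_name, 'monitor', 1) > 0
   OR CONTAINS (product_description, 'monitor', 2) > 0
ORDER BY combined_score DESC;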
The technical side of the story looks not so bright. In fact, despite the white papers, even in Oracle 10g full-text search works rather more like an add-on than an architectural feature. First, the full-text search parameters can be changed only via procedure calls from the CTXSYS schema. The actual syntax for the call is a mix of Oracle SQL and component-specific commands. The full-text index itself is a potential bottleneck. As you can see from [3], when index synchronization is set to "sync on commit", Oracle effectively serializes all transactions. Additionally, the synchronization happens only after commit, causing a gap between when documents that were added or updated become visible to other transactions and when they become searchable. And, in the worst case, an error or failure in the full-text index has no influence on the outcome of the transaction.

Microsoft SQL Server

Microsoft took a different approach. The core search functionality is provided by the Microsoft Search (MSSearch) technology that is also used by Microsoft Exchange and Microsoft SharePoint Portal Server. MSSearch builds, maintains and queries full-text indexes stored in the file system (as opposed to inside SQL Server). The logical and physical storage unit used for full-text indexes by MSSearch is a catalog. A full-text catalog contains one or more full-text indexes per database—one full-text index may be created per table in SQL Server and may include one or more columns from that table in the index. Each table may belong to only one catalog, and only one index may be created on each table [4]. The querying mechanism supports searching for words or phrases, words in close proximity to each other and inflectional forms of verbs and nouns. Querying involves OLEDB access to the MSSearch component and returns a result set containing the ID of the document and its rank. Illustration 5 shows two examples of using the full-text search in Microsoft SQL Server 2000 (see [4] for details).

Illustration 5: Querying the MSSearch component and joining it with a table from the database in SQL Server 2000

-- [APPROACH 1:]
-- most expensive: select all, then join and filter
SELECT ARTICLES_TBL.Author, ARTICLES_TBL.Body, ARTICLES_TBL.Dateline,
       FT_TBL.[rank]
FROM FREETEXTTABLE(Articles, Body, 'Ichiro') AS FT_TBL
INNER JOIN Articles AS ARTICLES_TBL
        ON FT_TBL.[key] = ARTICLES_TBL.ArticleID
WHERE ARTICLES_TBL.Category = 'Sports'

-- [APPROACH 2:]
-- works, but can backfire and become slow or return inaccurate results:
-- perform filtering via Full-Text and only extract key and rank
-- (processing done at web server level)
SELECT [key], [rank]
FROM CONTAINSTABLE(Articles, *, 'FORMSOF(INFLECTIONAL, Ichiro) AND "sports"')
In the imminent Microsoft SQL Server 2005 release, the full-text search component was improved with respect to index maintenance, e.g., the full-text search index is backed up together with the database backup, which was not the case before. Microsoft also claims to have improved the actual indexing performance in the new version.

Netfrastructure
Netfrastructure is not positioned as a pure database system, but as a web application development environment. Netfrastructure is designed and developed by Jim Starkey, the person who created InterBase, but it targets a different application domain. Unfortunately, not much information about its full-text search feature can be found in public sources [5]. However, the approach used there is quite interesting and worth discussing here. All of the detail below was obtained either from posts by Jim Starkey in the Firebird-Architect list or from private correspondence with him.

There is no separate command to create a full-text search repository, since it comes into existence with the creation of the database. The only thing required is to define the fields of the tables as searchable, as shown in Illustration 6.
Illustration 6: Definition of a searchable column in Netfrastructure
CREATE TABLE product_information(
id INTEGER NOT NULL PRIMARY KEY,
product_description VARCHAR(2000) SEARCHABLE);

Adding the SEARCHABLE keyword after the text or CLOB column definition is enough to tell the engine to index the contents of the field on each INSERT or UPDATE. The added documents are directly searchable within the same transaction, and rollback removes added items from the index. Development is currently under way to allow applications to move the updating of the indexes to a separate thread. In this case, added documents might not be directly searchable in the current transaction.
There are two options for the full-text query. The first uses the MATCHING operator within the SELECT statement (Illustration 7);
the second one requires use of the API and, according to Jim Starkey, is exactly what application developers are interested in.
Illustration 7: Example of a SELECT-based full-text search in Netfrastructure
SELECT * FROM product_information
WHERE
product_description MATCHING '+monitor +”high resolution”';

The most interesting part of the full-text search using the API is that the search query returns all hits in all tables and all
fields. Each hit is ranked by the scoring algorithm (currently hardcoded) and is returned in the ResultList structure, represent-
ing a list of java.sql.ResultSet instances, each of them containing one record.
Additionally, a ResultList object provides schema and table names and the score of the hit. The key message here is that peo-
ple looking for something in the database are not interested in hits in a single table, but most likely in all tables. Naturally,
the search API allows the search scope to be limited to specified tables.
An example of using the Netfrastructure search API from Java is shown in Illustration 8.
Illustration 8: Example of using the search API from Java in Netfrastructure
Statement stmt = connection.createStatement();
ResultList rl = stmt.search("+monitor +\"high resolution\"");
while (rl.next()) {
String tableName = rl.getTableName();
double score = rl.getScore();
ResultSet rs = rl.fetchRecord();
... // process ResultSet object
}

Stop words in Netfrastructure are indexed but skipped during the scan. After a hit, the search engine goes back and checks
that the word was actually there. The phrase "Alexander the Great" scans for "Alexander" and "Great" with an intervening
word. When it finds an instance, it checks for "the" in between.
Netfrastructure supports the following atomic queries:
= Term or list of terms. Hitting more terms in the query is better than hitting few. An upper case letter matches an upper case letter; a lower case letter matches either case;
= Specifying the required term and the term that should not be contained in a document. This is somewhat similar to the
“Boolean query” described above, especially if we consider that the simple list of terms is combined with OR operator;
= Phrase search, which was explained above;
= Wildcard and prefix queries.



In addition to the standard scoring factors already mentioned, Netfrastructure supports "distance from the beginning of the document". No information about the reasons for this factor was found, however.

MySQL

MySQL's solution is somewhat similar to the solution used in Oracle, but it also has some interesting differences. It must be mentioned that full-text search is supported only on MyISAM tables, which is a pretty strong limitation.

An application can define a full-text index for a selected table and the list of columns to index (Illustration 9). The FULLTEXT "constraint" takes the list of columns that will be automatically indexed.

Illustration 9: Example of creating the full-text index for two columns in MySQL

CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title,body)
);

MySQL supports all basic query types except the fuzzy queries. A typical MySQL query using full-text search (query 1 in Illustration 10) performs only record filtering according to the specified query. In order to obtain the score of the hit, quite a strange construct is used, where the MATCH ... AGAINST clause appears in the column list section of the SELECT statement as well as in its WHERE clause (query 2 in Illustration 10). The query optimizer detects such a usage pattern, and the actual full-text query is executed only once.

Illustration 10: Example full-text search queries in MySQL

-- query 1
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('database');

-- query 2
SELECT id, body,
       MATCH (title,body) AGAINST
       ('Security implications of running MySQL as root')
       AS score
FROM articles
WHERE
       MATCH (title,body) AGAINST
       ('Security implications of running MySQL as root');

The most interesting feature used in MySQL is the so-called "query expansion". It is quite useful when the search query is short or when the query terms contain spelling errors. The idea is to execute two full-text queries. The first one is matched against the specified query, and its top results are used as a query for the second search. The documentation contains the following example that describes fuzzy matching with spelling corrections:

"Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell "Maigret". A search for "Megre and the reluctant witnesses" finds only "Maigret and the Reluctant Witnesses" without query expansion. A search with query expansion finds all books with the word "Maigret" on the second pass."
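In SQL, the second pass is requested with an extra modifier in the AGAINST clause; a minimal sketch against the articles table from Illustration 9, following the MySQL 5.0 manual [6]:

-- query 3: the top results of the first pass are fed back
-- as additional query terms for a second, expanded pass
SELECT id, title
FROM articles
WHERE MATCH (title,body) AGAINST ('Megre' WITH QUERY EXPANSION);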
Happy New Year Wishes

Marina Novikova: What do you think were the greatest FB project achievements in 2005 in general?

Helen Borrie: Getting the Fb 2.0 Beta out was more of an achievement than it is given credit for. The guys have done a fantastic job of cleaning up some serious messes and making a lot of things work better. The record of serious, thoughtful work is there for all to see, whether you read the core devs' reports at the website (http://firebird.sourceforge.net/index.php?op=devel?sub=engine) or whether you test Fb 2.0 against predecessors and competitors.

Paul Ruizendaal: We have achieved so much in 2005, it is hard to pick something. Perhaps our greatest achievement is that we are now recognised as one of a few leading open source databases, winning prestigious projects. Mostly that is because of all the great stuff we as a community have done, but in part it is also because we have gained a lot of self confidence: we radiate success. People see that and try our code because of it.

Vlad Horsun: There have been many events this year, but speaking about achievements the greatest one is probably the release of the first beta and, I hope, the second beta release before the end of this year.

Dmitry Yemanov: It was hard work making v2.0 stable and moving to the Beta stage. We have released nothing major this year, but a lot of internal work has been done and people may see the results in our field-test versions.

Alex Peshkov: Foremost it is stable Vulcan's work with huge databases in SMP mode in SAS.


Firebird: Present and the Future

Full-text search support in open-source databases is an exception rather than a commodity. MySQL provides a solution similar to the one used in Oracle, but PostgreSQL and Firebird do not have any built-in solutions. Existing third-party components have limited capabilities, especially when it comes to the level of integration with the engine and the querying capabilities. It is clear that Firebird needs full-text search capability, though it is not yet clear how to implement it. Until the issue is solved by the Firebird team, let's try to define the possible solutions that applications can use.

Fuzzy word matching

Before we start thinking about the full-text search approach, let's check whether we need it at all. Those applications looking for better fuzzy matching capabilities in short text fields do not need a true full-text search engine. Not only is it feature overkill, but it slows down the database. Short sentences and product names are not good candidates for full-text search algorithms.

Below is a short description of two products that have been available on the market for some time and can be used directly with Firebird. Neither of them provides hit scoring; both limit the search to "match"/"no match" behavior. Because of the querying and ranking limitations we do not regard them as full-text search solutions¹, but as fuzzy word matching components.

FastTextSearch for InterBase/Firebird

FastTextSearch is a set of UDFs and stored procedures to perform full-text search within text and BLOB fields. The architecture is conceptually similar to the one used in Lucene. However, due to limited querying capabilities and the absence of hit ranking, we did not include this component in our case studies.

The FastTextSearch component constructs a search repository using a proprietary binary index format. Since Firebird does not provide any mechanism for engine callback from within UDF functions, it has to use BLOB fields, the content of which is processed by the proprietary UDF. These functions are shipped in precompiled form.

Because of the restrictions described, the FastTextSearch component requires changes to the existing schema. First of all, in all tables that are to be indexed, a new integer field has to be added. This field will contain a "document ID" that is generated by the indexing component when the document is added to the index. Next, the application has to define triggers² that perform the mapping from the table record to an abstract, multi-field document (Illustration 11).
Illustration 11: Initial steps to enable the full-text search for a table CUSTOMER

-- step 1
ALTER TABLE customer ADD fts_id INTEGER;
CREATE INDEX customer_idx_fts ON customer(fts_id);

-- step 2
CREATE TRIGGER t_bi_customer_fts FOR customer ACTIVE BEFORE INSERT AS
DECLARE VARIABLE parser INTEGER;
DECLARE VARIABLE i INTEGER;
BEGIN
  IF (EXISTS(SELECT * FROM ts$opt WHERE enable > 0)) THEN BEGIN
    parser = parser_create();
    i = parser_add(parser, new.CUSTOMER);
    i = parser_add(parser, new.CONTACT_FIRST);
    i = parser_add(parser, new.CONTACT_LAST);
    i = parser_add(parser, new.ADDRESS_LINE1);
    i = parser_add(parser, new.ADDRESS_LINE2);
    i = parser_add(parser, new.CITY);
    i = parser_add(parser, new.COUNTRY);

    EXECUTE PROCEDURE ts$update(new.FTS_ID, 'customer', :parser)
      RETURNING_VALUES new.FTS_ID;

    i = parser_free(parser);
  END
END
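Since the insert and update logic is identical, on Firebird 1.5 and later the trigger from Illustration 11 can be collapsed into a single universal trigger (see footnote 2). A sketch, reusing the same UDFs as above; deletes would additionally need a cleanup call, whose name depends on the component's actual API:

CREATE TRIGGER t_biu_customer_fts FOR customer ACTIVE BEFORE INSERT OR UPDATE AS
DECLARE VARIABLE parser INTEGER;
DECLARE VARIABLE i INTEGER;
BEGIN
  IF (EXISTS(SELECT * FROM ts$opt WHERE enable > 0)) THEN BEGIN
    parser = parser_create();
    i = parser_add(parser, new.CUSTOMER);
    /* ... the remaining columns, as in Illustration 11 ... */
    i = parser_add(parser, new.COUNTRY);

    EXECUTE PROCEDURE ts$update(new.FTS_ID, 'customer', :parser)
      RETURNING_VALUES new.FTS_ID;

    i = parser_free(parser);
  END
END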

¹ Even though the "full-text search" term is used in their descriptions.


The parser_add UDF parses the content of the specified column and registers it under the ID that was generated by the parser_create UDF. The stored procedure call adds the preprocessed document to the index and returns its ID for storing in the column fts_id.

The querying capabilities of the FastTextSearch component are rather limited. It supports only term queries, with terms that can be combined with either AND or OR Boolean operators; the term matching can be either prefix or SOUNDEX, and the setting applies to all terms in the query (Illustration 12). FastTextSearch does not provide hit scoring, either, limiting the result to "match"/"no match" only.

Illustration 12: Example of a full-text query using the FastTextSearch component

SELECT c.* FROM
  ts$select_or('customer', 'tech corp', 0) ts
  LEFT JOIN customer c ON c.fts_id = ts.obj_id

IBObjects Full-Text Search Module

IBObjects [8] also has a module for fuzzy text matching, with features similar to the FastTextSearch component but using a different technology. Like the rest of the IBObjects components, it is limited to the Delphi/C++ Builder development environment and the Windows platform. Efforts have been made to port IBObjects to Kylix to run it under Linux, but no official release is available.

Similar to the FastTextSearch component, IBObjects FTS requires changes to the existing database schema. The distribution contains a handy GUI that simplifies this task and hides all complexities from the user. Unlike the FastTextSearch component, which performs text processing and index maintenance within the same request, IBObjects FTS works more elegantly. Triggers that were defined in the previous step populate an auxiliary table with the changed information and generate a Firebird event about the change. This event is processed by a standalone index synchronization component that fetches the information about the changed tables, processes the content of the text fields (including BLOB SUB_TYPE 1 fields) and updates the indexes.

The most elegant thing is the usage of the event facility of Firebird, which effectively decouples the transaction in which tables are modified from the index population and, since events are only posted if the transaction in which they are created is committed, saves CPU cycles in case of rollback. The drawback of this scheme is that, because the full-text query will not "see" updates performed in the same transaction, changes will be searchable only after commit.
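The shape of such a trigger is easy to picture in PSQL. The following is a hypothetical sketch of the scheme, not the actual code generated by the IBObjects FTS GUI; the auxiliary table, column and event names are invented:

CREATE TRIGGER t_aiu_customer_fts FOR customer ACTIVE AFTER INSERT OR UPDATE AS
BEGIN
  /* remember which record changed */
  INSERT INTO fts$pending (table_name, record_id)
  VALUES ('CUSTOMER', new.ID);

  /* delivered to the listening synchronization component
     only if this transaction commits */
  POST_EVENT 'FTS_CHANGED';
END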
IBObjects FTS supports exact and partial word matching, exact and partial metaphone code matching, SOUNDEX code matching, and synonym and antonym matching.

Searching the database is performed in a way similar to the FastTextSearch and MS SQL Server approaches, but IBObjects offers a separate Delphi component for these purposes. Internally the full-text query is evaluated against the index tables and then joined with the original table using the primary key column defined during index creation. IBObjects FTS does not provide hit ranking, only "match"/"no match" hits.

Integration with external tools

Another possible approach for Firebird users is integration with third-party tools. The author does not know about any such projects, but the development, at least theoretically, is not complicated. The only requirement is the ability to execute queries against external data sources, which is currently available only in Oracle-mode Firebird using its external stored procedure capabilities³ [9].

In this case an application can use a scheme similar to the IBObjects FTS one, where a stand-alone synchronization engine would listen for the events generated by the update triggers, fetch the modified data and feed it to an external full-text search engine, say, MSSearch via the COM interface. The querying would be performed via an external procedure that returns the table primary key and the score, which the application can join with the target table using syntax similar to the one used in MS SQL Server, shown in Illustration 5.
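Under these assumptions the query side could look roughly like the following sketch, where fts_search stands for a purely hypothetical selectable external procedure that forwards the query to the external engine and returns primary keys with scores:

SELECT c.*, fts.score
FROM fts_search('customer', 'monitor high resolution') fts
JOIN customer c ON c.id = fts.record_id
ORDER BY fts.score DESC;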

² In old versions of Firebird three triggers had to be defined: BEFORE INSERT, BEFORE UPDATE and BEFORE DELETE. Since Firebird 1.5 an application can use universal triggers and put all the logic into one PSQL block.

³ Warning: The C/C++ external procedure interface is not yet stable; significant changes are possible in the future, especially when this feature is ported to the Firebird main code base. Therefore, those who start development against the existing external procedure facility should be prepared to update the interface part.



About the author
Roman Rokytskyy works as a senior consultant in Germany. Since 2001 he has been participating in the Firebird project, primarily in the JCA/JDBC driver subproject, but he is also involved in other areas of Firebird development. Information retrieval, and full-text search in particular, is one of the author's interests, which he has applied in practice during the development of a high-performance content store for spatially distributed information.

References
1. Justin Zobel, Alistair Moffat, and Kotagiri Ramamohanarao. Inverted Files versus Signature Files for Text Indexing. ACM Transactions on Database Systems (TODS), Volume 23, Issue 4 (December 1998), pages 453-490.
2. David Grossman and Ophir Frieder. Integrating Structured Data and Text. Intelligent Enterprise Magazine, September 18, 2001, Vol 4 No 14, http://www.intelligententerprise.com/010918/414analytic1_1.jhtml, and October 24, 2001, Vol 4 No 16, http://www.intelligententerprise.com/011024/416analytic1_2.jhtml.
3. Oracle Text 10g. Technical Overview. http://www.oracle.com/technology/products/text/x/10g_tech_overview.html
4. Andrew B. Cencini. Building Search Applications for the Web Using Microsoft SQL Server 2000 Full-Text Search. Microsoft Corporation, December 2002. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_fulltextsearch.asp
5. Netfrastructure Inc., http://netfrastructure.com/
6. MySQL 5.0 Reference Manual. Chapter 12.7. Full-Text Search Functions. http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
7. FastTextSearch for InterBase. http://www.textolution.com/ftsib.asp
8. IBObjects. Full Text Search Module. http://www.ibobjects.com/ibofts.html
9. Fyracle: Oracle-mode Firebird. http://www.janus-software.com/fb_fyracle.html



INTERBASE

New Cache Options In InterBase 7.5
by Bill Todd, [email protected]

Write Caching

Before I discuss write caching I have to explain the confusing and inconsistent terminology that surrounds it. Nowhere will you find an option to turn write caching on or off. If you look at the General tab of the Database Properties dialog in IBConsole you will see an option called Forced Writes. Setting forced writes to true disables write caching by forcing all writes to disk immediately. If you examine the documentation for gfix in the Operations Guide you will see that it has a -write switch that can be followed by either async or sync. The sync option turns on synchronous writes, which is the same as turning off write caching. The async option enables asynchronous writes, which turns on write caching. In other words:

Write caching on = forced writes off = asynchronous writes
Write caching off = forced writes on = synchronous writes
the operating system cache manager by transactions that were committed
Because the terminology is inconsis- but which the writer thread has not yet
tent and confusing I will talk about will write the pages to disk. In Firebird
and in InterBase versions prior to 7.5 written to disk.
turning forced writes on or off. Just be
aware that how you do that and what it you have only two choices. Turn write That seems like a recipe for disaster
is called varies depending on whether caching off, be safe, and put up with because a transaction can be marked as
you use IBConsole or gfix. By the way, the slower performance or turn write committed even though it has not been
just to add to the confusion, write caching on to get the improved per- written to disk. It seems that, if the
caching is on by default on Linux and formance and hope that the server does server crashes, the changes made by
Solaris and off by default on Windows. not crash. In InterBase 7.5 you have the transaction will be lost, whilst
two new options. InterBase will think the transaction
Write caching is a two edged sword. It committed so it will not rollback the
can improve performance because Group Commit transaction when the server restarts.
pages are written to the operating sys-
With forced writes enabled you are Fortunately, that is not what happens.
tem's write cache and the operating
guaranteed that all user updates that The state of transactions is tracked on
system writes the data to disk at some
are part of a transaction are on disk and the transaction inventory pages (TIP).
later time. Take a simple case of a data-
have been written in careful order when However, there are two copies of the
base where one page holds, on average,
the transaction is marked as commit- TIP, one in memory and one on disk.
50 records. If you update 20 of the rows
ted. With force writes off you are not With group commit on, when a transac-
on that page the page will be written to
guaranteed that writes will take place tion commits it is marked as committed
disk once with write caching enabled.
in careful order and you are not guaran- in the in-memory copy of the TIP
With write caching disabled the page
teed that all changes are on disk when (called the transaction inventory page
will be written to disk at least once for
the transaction is marked committed. cache or TPC). The transaction is not
each transaction and perhaps more
often than that. When a transaction commits with marked as committed in the on-disk
forced writes and group commit copy of the TIP until after the writer
The disadvantage of write caching, and thread has finished writing all of the
enabled, the pages that have been
it is a big one, is that you are much transaction's pages to disk. If the serv-
updated by the transaction are passed
more likely to have a corrupt database er crashes, the in-memory TPC is lost.
to a background cache writer thread in
if the database server crashes with When IB restarts it looks at the on-disk
careful write order. As soon as the
write caching on than with it off. The TIP, sees that the transaction is still
updated pages have been passed to the
www.ibdeveloper.com 16 © Copyright 2005-2006, All right reserved www.ibdeveloper.com
2005 ISSUE 3 INTERBASE
active and rolls it back. flush disabled. The only advantage is
that you can be sure the cache is
ALTER DATABASE
To enable group commit execute:
flushed to disk at least every N sec-
SET GROUP COMMIT
onds.
To enable database flush, first make
ALTER DATABASE
To disable group commit use:
sure that forced writes are off for the
SET NO GROUP COMMIT database; then execute the following

ALTER DATATBASE
Although you can have group commit command:

SET FLUSH INTERVAL 5


set with forced writes off it makes no
sense to do so. The background writer
will be writing to the operating system where 5 is the number of seconds
cache, which cannot preserve careful between requests to flush the cache.
write order or guarantee when the You can set the flush interval to any
updated pages will actually be written value you choose.
to disk. To disable database flush use the fol-

ALTER DATABASE
lowing command:
Database Flush
If you want better performance than SET NO FLUSH INTERVAL
forced writes with group commit offers
and you are willing to take some addi- Which Option Should You Use?
tional risk, then database flush may be You now have five options for control-
right for your environment. Database ling physical writes to disk. The follow-
flush is an option you can use when ing two options are very safe because
forced writes are off. they use careful write order and
The problem with turning forced writes because they take place under transac-
off is that there is no way to predict tion control. Any transaction that has
when the operating system will actual- not been completely written to disk
ly flush writes in its cache to disk. This before a server crash will be rolled back
means that there is no way to estimate when InterBase restarts.
how many writes you may lose if the = Forced writes on (also known as syn-
database server crashes. chronous writes)
The new database flush option lets you = Forced writes + group commit
tell InterBase when it should flush all The last three options carry much more
cached writes to disk. You do this by risk because careful write order is not
setting the flush interval. The safest used and the changes made by a single
flush interval is zero. When the flush transaction may be partially written to
interval is set to zero InterBase flushes disk. This can cause logical or physical
the cache to disk each time a transac- database corruption.
tion commits. If multiple transactions
commit at the same time the cache is = Forced writes off + database flush
flushed once after all of the transac- after each commit
tions have committed. This is fairly = Forced writes off + database flush at
safe because the cache is flushed a fixed interval
= Forced writes off
immediately after each commit.
Nevertheless, because careful write
order is not used, a crash while the Choosing the best option is easy. Pick
cache manager is writing can still cor- the safest option that gives you the
rupt your database. performance you need. If choose any
option that does not include forced
For better performance you can set the
writes on, make sure your server is as
database flush interval to a positive
stable as possible by using a dedicated
integer that is the number of seconds
database server in a physically secure
between flushes. In this mode the
location with a UPS. This will reduce
InterBase cache writer thread ignores
the chance of a database crash.
transactions and tells the operating
system to flush the cache every N sec- If you use user defined function
onds. This is almost as risky as running libraries, and particularly if you write
with forced writes off and database your own UDFs, test them very careful-

© Copyright 2005-2006, All right reserved www.ibdeveloper.com 17 www.ibdeveloper.com


INTERBASE 2005 ISSUE 3
ly. Buggy UDFs are the leading cause of InterBase server crashes.
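To put the flush-based modes in one place: after forced writes have been turned off (through IBConsole or the gfix -write async switch described earlier), the "flush after each commit" option is simply the flush interval syntax shown above with a zero interval:

ALTER DATABASE SET FLUSH INTERVAL 0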

Database Linger
If at least one user is always connected to your database or if your client applica-
tion(s) use connection pooling so that one or more pooled connections are always
connected to the database, there is no reason to use database linger.
When the last user disconnects from a database the database is removed from
memory. This causes several problems. First, it may not give the garbage collector
thread enough time to delete old record versions that have been queued for dele-
tion. Second, when the first user connects to the database, memory for the cache
must be allocated and initialized and the database metadata must be loaded into
memory. This makes the first connection much slower than subsequent connec-
tions.
The first user to connect is not the only one that will pay a performance penalty.
Consider the case where user A runs a SELECT statement. When the SELECT is exe-
cuted, all of the required index and data pages are read from disk and placed in the
cache. If user B runs the same SELECT he/she will get better performance because
the index and data pages will be read from the cache instead of from disk. Now,
suppose that users A and B disconnect and the database is removed from memo-
ry. A few seconds latter user C connects and has to wait while the cache is allocat-
ed and the metadata loaded. Next user D connects and runs the same SELECT that
was run earlier by users A and B. However, user D does not get the fast execution
that user B experienced because the newly created cache is empty. All of the
index and data pages must be read from disk again.
Database linger keeps the database in memory for a specified period of time after
the last user disconnects. It has the same effect as always keeping a connection
open but it is safer because there is no risk of leaving a transaction open.

To enable database linger:

ALTER DATABASE SET LINGER INTERVAL 600

This command tells InterBase to keep the database in memory for 600 seconds (10 minutes) after the last user disconnects. Of course you can use any linger interval you choose.

There are two ways to disable database linger:

ALTER DATABASE SET NO LINGER INTERVAL

ALTER DATABASE SET LINGER INTERVAL 0


If your environment is such that all of the users might disconnect from the data-
base at the same time, enable database linger. It costs nothing and will improve
performance any time all users disconnect for a short period of time.
New Fields in RDB$DATABASE
You can determine the settings for any of the new cache attributes by querying
the RDB$DATABASE system table. The RDB$FLUSH_INTERVAL field contains the
current database flush interval. RDB$LINGER_INTERVAL contains the database
linger value.
The RDB$GROUP_COMMIT field shows whether group commit is enabled or not. A
“Y” in RDB$GROUP_COMMIT means group commit is on, while an “N” shows that
group commit is off.
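For example, all three settings can be read with one query (the field names are the ones listed above; the exact output format depends on the tool you use to run it):

select RDB$GROUP_COMMIT, RDB$FLUSH_INTERVAL, RDB$LINGER_INTERVAL
from RDB$DATABASE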

Bill Todd is a member of Team Borland supporting InterBase and Delphi on Borland's Internet newsgroups. He provides InterBase design, consulting and training services to clients throughout the world. You can contact Bill at [email protected]


Object-oriented development and RDBMS, Part 2

By Vladimir Kotlyarevsky, [email protected]

Links between objects
(See also [5], the "Links" chapter)

Links as object attributes

In my opinion, the traditional approach to designing relational databases leads to some problems in establishing connections between the relations.

1. A relational connection is poorly documented (or not documented at all) "in itself." That is to say, if there is no FK, it would be impossible to figure out whether there is a relationship or not without project documentation. (Here, of course, one can argue that if there is no FK, then there is no relational connection. I do not think that is quite true, since a link is always something more than just an FK, and an FK is nothing but a constraint.) If there is an FK, then a relationship is needed, though its essence is not always clear (unless the attribute names and the FK name are chosen properly). As you know, self-descriptiveness is one of the essential characteristics of a good programming style.

2. To open a relationship (i.e. to access a tuple from a certain relation via a link to it) one would need to apply a method specific to this particular relationship. A relationship in an ordinary object model means that a field of an object contains a pointer (link) to another object. For example, if all classes of a system are inherited from the MyObject class, which has the "Name" property (or they realize one appointed interface, or the language supports RTTI, etc.), then, with the help of a link to an unknown object (using polymorphism), we can figure out at least its "Name." In a case where RTTI is supported, we can find out everything else. We can get all this information with the help of a single link. What does a relational model offer us in such cases?

Let us use an order document as an example. One of the fields in this document is a link to a customer. How, using a standard projecting method, can we figure out the customer's name (the most frequently used attribute)? To do that, one would need to know where and how the customer list is stored, and how the relationship between an "order" and a "customer" is organized. Moreover, one must be sure that such a relationship exists and, in addition, be able to create some kind of expression for opening that relationship - a relational join, a stored procedure call or something else. Such things are usually hardcoded for each particular case, i.e., the relationship is defined during the projection process and implemented during programming. A particular linkage is given substance via an FK and some SQL expressions that are encoded somewhere for the purposes of sampling and storing the object. Worse, the linkage might be substantiated by no other means than a lookup control anchored to a record set field in a user interface.

The problem becomes even more obvious in another example, where it is not even clear where the link actually leads. Consider an accounting system based on transactions. The transactions are stored in a separate table; one record contains information about one transaction. Each transaction has the "amount" attribute (money, goods, etc.) and several analytical characteristics. Each characteristic field is a link to a certain directory. Which directory it is, however, is not known either at the stage of projecting or at the stage of programming; it also depends on several additional conditions, which occur at run-time only.

So what should we do in such cases, when a standard FK cannot be created? How can one figure out the name of the object to which the link refers? It is possible, if it is specified at the stage of projection. It can be done in the following way: get to the account description and see what analytical accounting object type is stored in this transaction field. After that, find out what table this object belongs to, and then figure out the name of the field in this table in which the object's name is stored. In this case, there is some self-descriptiveness - a description of analytical characteristics in account parameters. The downside is that this self-descriptiveness is too problem-oriented. What's to be done with other links in the database pertaining to these transactions? The common practice is to do nothing. The content of a "select" is left to accomplish the relationship…

The first two problems have a solution which, if incomplete, is at least simple and convenient. All links to objects are created only through the "OBJECTS" table and have the "TOID" type. Indeed, if each object has a corresponding record in OBJECTS, then why not? Furthermore, in the "Classes" table, each attribute of "link type" should have a distinct set of rules described, which would limit the number of objects the link may refer to (for example, "objects of the listed types only," or "only the objects from a certain folder").

1. Such a link is better documented since, even using just this link (a TOID value), you can easily figure out the type of the object it is linked to by referring to OBJECTS. The type's ClassId, if the id link to the object is known:

select ClassId
from OBJECTS
where OID = :id

The name of the type:

select o.ClassId, o1.name as ClassName
from OBJECTS o, OBJECTS o1
where o.ClassId = o1.OID and o.OID = :id
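For reference, here is a minimal sketch of the structures that queries like the two above assume. It is only a sketch: the real DDL, including the TOID domain used for object identifiers, is defined in the first part of this article, and the field sizes here are illustrative:

create domain TOID as integer;  /* sketch only; see Part 1 for the real definition */

create table OBJECTS (
  OID TOID not null primary key,
  ClassId TOID not null,   /* the object's class, itself a row in OBJECTS */
  Name varchar(32),
  Description varchar(128),
  change_date timestamp
);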
2. To open a link, a simple call and a link value would be enough:

select o.Name, o.Description, o.ClassId
from OBJECTS o
where o.OID = :id

If the link is not created yet (for example, a user is creating a new object and is about to enter a link), the rules of its creation can be got from "Classes."

External links between objects
(See [5], the "Links" chapter)

In the previous chapter, we considered the cases where a relationship (link) is an object attribute. However, there are links which, generally, are not a part of an object.

1. A relationship of the "what entailed what" type. For example, a document entails a whole chain of actions and documents, which together constitute a business process. However, the same documents and actions occur in completely different business processes as well, and those can be independent units. In this case, it does not make sense to create a field-link to an entailed object in each object: first, their types may differ; second, there can be several of them.

2. Purely logical entering of something into somewhere. For example, files are included in catalogs, but nevertheless are self-dependent objects, and their meaning (as a rule) does not depend on a catalog changing.

3. And so on.

This problem is solved by creating an extra link table:

create table links(
  link_type TOID,
  left TOID,
  right TOID,
  constraint LINKS_PK primary key (link_type, left, right))

This is an almost common table for organizing a M:M relationship. The "left" and "right" fields are the links to the left and right sides of the relationship, i.e., the two objects between which the relationship is established. I call it "almost common" because there is a difference. First, we have a common object set of all types, and there is only one way to refer to them (the "OBJECTS" table and the unique-keys system). Secondly, there is the link_type field. The first thing allows us to use the "links" table for any external relationships between any objects. Contrast that with the standard relational approach, which requires creation of separate tables for each M:M relation, because key data types are not standardized. If they are standardized (for example, integer only), their values remain unique only within the limits of the tables. The second (the link_type field) enables the type of a relationship to be defined, as well as rules for its creation.

What does all this mean for us? Let us consider the known method of realizing a tree-like structure in a relational table (id, parent_id; described, for example, in [6]). Using the "Objects" and "links" tables, you can easily get this structure with the help of the following query (for a certain relationship type):

select o.OID as id,
       l.left as parent_id,
       o.name,
       o.description
from objects o, links l
where o.OID = l.right and l.link_type = :link_type

Such a structure is useful and simple, and, besides, there are many visual components which realize the tree-like presentation. However, it is often necessary to realize a more complex relationship system

between objects. So, the "links" table (this is where the most interesting things begin to happen) enables almost any kind of trees and object networks to be defined. Why? To explain this, let us consider an example with the "order" object.

First, to organize a global logical object catalog in a database, it is desirable to simultaneously store all objects of "order" type in:
• the "Orders" folder;
• the folder of the manager who created it and holds the business process concerned with it;
• the folder with all the company's documents over the period of the current month;
• the personal folder of the clerk responsible for this particular order;
• somewhere else.

Secondly, consider the need to present the business process (BP) to which the order belongs as a chain and handle it as a single whole. For example:
• the BP began with a customer contract about permanent services during a year;
• as a consequence, the document named "delivery schedule" was created;
• then, according to the contract and schedule, the order appears;
• all this is followed by invoice, shipping, etc.

Now let us describe how such problems can be solved. It seems to me that, at least in systems dealing with document circulation, such tasks are quite widespread.

The main object catalog

The first task is accomplished right away because, in contrast to the id, parent_id structure, our structure allows use of an unlimited number of links to one and the same object (due to the fact that the notions of object and links are separated both logically and physically). Let's enter the "Logic Folder" folder type with ClassId = 100; also, let the Order type have ClassId = 200. Then the record set in our structure, which realizes the problem specification (though with no records in the Classes folder), will look as follows:

Objects

OID   Name            Description                                   ClassId
0     Root            Root folder                                   100
1000  PrivateFolders  Private folders of the members                100
1001  Documents       All documents of the company                  100
1002  Orders          All orders of the clients                     100
1003  Smith           Mr. Smith's (manager) private folder          100
1004  February2002    All documents obtained during February, 2002  100
1005  March2002       All documents obtained during March, 2002     100
1006  Roberts         Mr. Roberts' (clerk) private folder           100
1007  AnotherFolder   Just a folder                                 100
2001  Order0001       Our order                                     200


Links

Link_type  Left  Right  Link Content


0 0 1000 The PrivateFolders folder is included in root

0 0 1001 The Documents folder is included in root

0 0 1007 The AnotherFolder folder is included in root

0 1000 1003 The Smith folder is included in PrivateFolders

0 1000 1006 The Roberts folder is included in PrivateFolders

0 1001 1004 The February2002 folder is included in Documents

0 1001 1005 The March2002 folder is included in Documents

0 1001 1002 The Orders folder is included in Documents

0 1003 2001 Order0001 is included in Smith’s private folder

0 1006 2001 Order0001 is included in Roberts’ private folder

0 1005 2001 Order0001 is included in the March2002 folder

0 1002 2001 Order0001 is included in the Orders folder
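Populating these tables is plain SQL. For instance, filing Order0001 into the AnotherFolder folder as well would take just one extra row in "links" (the OIDs come from the tables above), with no change to the object itself:

insert into links (link_type, left, right)
values (0, 1007, 2001);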

The content of any folder with a known OID is selected by the query:
select o.*
from objects o, links l
where l.link_type = 0 and
o.OID = l.right and
l.left = :folder_id

The whole tree-like structure looks as follows (see the folder-tree figure): there is actually only one "Order0001" object in the "Objects" table, while there are several links in different folders referring to it.

A smarter simple directory, hierarchical

At the very beginning of the article, I promised to consider a method of building a directory that would be more sophisticated than the simplest one. The simplest one can be got from OBJECTS by a query:

select OID, Name, Description


from OBJECTS
where ClassId = :LookupId

The result will be all objects of a certain type from the "OBJECTS" table. Slightly changing this query (considering that a folder's ClassID equals 100, and the "logical entering" link type equals 0), let us write:

select o.OID, o.Name, o.Description
from OBJECTS o
inner join Links l
  on l.right = o.OID and l.link_type = 0
where l.left = :LookupRoot and
      (o.ClassId = :LookupClassId or o.ClassId = 100)

This query will return the first level of the hierarchical directory, which begins with the OID LookupRoot folder and contains all the sub-objects (in the catalog hierarchy) with ClassId = LookupClassId. The query to open any first-level folder will be similar, but LookupRoot should be substituted by the OID of that folder. It is a matter for the computer to display the result of this query in the dialog window in a usual form (with folders opening by double-clicking and selectable objects inside these folders). For example, this dialog may be exactly the same as a standard "Open file" system dialog. Given this dialog and a controlling run-time object (I call it DirectoryBrowser), we will get a universal user interface for most of the directories in the system.

Actually, this is true not only for the directories. In fact, we will get a universal API and user interface for browsing a catalog of any objects in the database, which is functionally similar to the Windows Shell API (IShellFolder, IShellBrowser etc.) bundled with Windows Explorer. You can also try to embed your mechanism into Windows Explorer, and make the objects accessible from a normal Explorer window. Without naming names, I know who tried to do it, and even succeeded in it!

A new directory type will be created by two clicks, setting LookupRoot and LookupClassId for it (it is possible to use several "ClassId"s, if the directory can contain several types). For example, a contractor list may contain both suppliers and purchasers (which can be divided into other directories). About how to create a directory which requires an extra field (in addition to those stored in Objects) for sampling, I will tell later, in the chapter about the organization of the access library.

Pathname in a Catalog

Since we can build a structure of catalog type in our database, it would be reasonable to use another "good old" abstraction: we can access the objects with the help of a pathname. For our example, the path to the Order0001 object (actually, it is one of several possible ways) can be written as /Documents/Orders/Order0001 or /PrivateFolders/Smith/Order0001. Indeed, in a catalog, the path to a certain object, which starts from a particular unit (root or something else), unambiguously identifies an object on condition that we provided for uniqueness of the object names within the catalog. The expression "unambiguously identify an object by a path" in our case means that, with the help of such a path, one can find the OID of the object. Indeed, there can be many paths to one and the same object (since you can have as many links to the object as you want); each path would point to only one object.

Uniqueness of the names within a catalog can be provided by imposing a constraint on the "Links" table with the help of a trigger that responds when link_type = 0. We will not discuss here the well-known problem concerning the impossibility to fully support such restrictions with the help of a trigger, especially because we can avoid this problem in our case.

There is no need to describe in detail the benefits of using pathnames instead of OIDs (or, more precisely, together with them), since there are many of them and they are obvious. Finding an object OID by a path can be realized easily in a stored database procedure, which would provide you with the possibility to use pathnames in queries and other stored procedures and triggers, thus increasing the database's logicality and cohesion. To realize such a procedure in InterBase, you would need to write a simple UDF for the path line analysis; for MS SQL, Transact-SQL's capabilities would be enough.

Other types of links

Using the features provided by the "Links" table, you can build a lot of useful structures for joining database objects, in addition to the described global logical catalog. For example, you can solve the problem of a linked business process (BP), which is described at the end of the previous chapter. The object type defining a certain BP type (for example, the already-described long-term attendance) in this case is a set of possible links between document types, which can be created in its context. The object which describes a particular BP (attendance contract #10) contains a set of the existing links between the already created documents in the BP chain, and also is a central link during a visual presentation of the chain of documents and objects in the general network, i.e. parts of this BP. In this case we would need a new link type (link_type), with its own creation rules and restrictions, for example, link_type = 1.

And so on. Everyone can create their own types. I can suggest the following two: a type for class hierarchy presentation, and a type for account hierarchy from the accounting card and some other directories.

The Links table can be used for links of the "master-detail" type, if a "detail" set is an object list. For example, the table part of a bank extract is a set of pay-sheets. The advantage of linking such relationships via the Links table is that you would not need to realize the full function and field

sets (master_id, etc.) thousands of times for each master-detail link. Instead, a standard, already realized and debugged relation method is used.

The access restriction system

Because calling any database object is accomplished via the "Objects" table, you can also realize an access restriction system at the document level (or at the record level), which is a "sweet dream" for many programmers who develop document-oriented databases. Indeed, this system will be independent from object types, i.e. when adding a new type, a user will not have to change the system. Here I will briefly describe two methods: the first is simple and fast, the second is more complex and flexible. They do not guarantee you absolute security (they can be omitted at the SQL level), but under these circumstances a hacker would not benefit from it. Generally speaking, these methods provide an application developer with a measure of convenience: one can quickly and directly figure out whether a user has particular rights on a certain object by, for example, adding a simple BOOL function check_access(char* user, int object, int right) or something along those lines.

Thus:

1) A standard structure for supporting users and groups is created:

create table users (
  UID integer not null primary key,
  name varchar(32) unique);

create table groups (
  UID integer not null primary key,
  name varchar(32) unique);

/* uniqueness of users' and groups' names is to your taste */

create table users_groups(
  UID integer not null,
  GID integer not null,
  constraint users_groups_pk primary key (UID, GID));

create procedure uid_by_name(name varchar(32))
returns (uid integer)
as
begin
  select uid from users where name = :name into :uid;
end

Two fields ("read_group integer" and "write_group integer") are added to the "Objects" table. These fields are links to "groups". A presentation is created:

create view s_objects as
select o.*
from objects o, users_groups ug
where (o.read_group = ug.gid and
       ug.uid = uid_by_name(CURRENT_USER));

grant select on s_objects to public;

"Objects" becomes inaccessible to everything but the "s_objects" presentation. This presentation provides a user with rights for reading all the objects whose read_group matches the one entered by the current user for this particular object. "Before insert," "before delete," and "before update" triggers are created for the "objects" table for checking access rights according to CURRENT_USER and write_group. The triggers' source text is trivial and is not included in the present article. You would also need to create "before update" triggers for each table containing extra object attributes. Those triggers will be absolutely identical. The "before update" trigger should do nothing but update the "objects.change_date" field, thus calling access rights checking by the "objects before update" trigger.
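The triggers themselves are omitted above as trivial; purely as an illustrative sketch of the delete case (the trigger and exception names here are invented, and a real version would cover insert and update in the same way):

create exception e_no_access 'No write access to this object';

create trigger objects_bd for objects
before delete
as
declare variable cur_uid integer;
begin
  execute procedure uid_by_name(CURRENT_USER) returning_values :cur_uid;
  /* allow the delete only if the current user belongs to the object's write group */
  if (not exists(select * from users_groups ug
                 where ug.gid = old.write_group and ug.uid = :cur_uid)) then
    exception e_no_access;
end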
2) The essence of the second variant is almost the same, but it is based on the well-known ACL algorithm. An ACL (access control list) enables you to specify a whole list of different access rights for different users, and is used in many multi-user file systems. Thus, instead of the "read_group" and "write_group" fields, an additional table is created:

create table ACL(
  id integer not null primary key,
  acl_id integer not null,  /* acl number */
  uid integer not null,     /* link to a user */
  right integer not null    /* identifier of a certain access right */
);

The “acl_id integer not null” field is added to the “Objects” table as a link to acl.acl_id. The query
select a.uid, a.right
from acl a, objects o
where a.acl_id = o.acl_id and o.OID = :oid

will return the ACL for any of the objects from objects. The number of the lists themselves may be significantly lower than the
number of the objects, due to the fact that several objects may share a single list, the way some file systems, such as NTFS, do.
The access control itself is accomplished in almost the same way as in the first example, except that the “s_objects” presen-
tation and access rights checking triggers become more complex and slower. I will not cite a definition of the s_objects pres-
entation. Even though it is quite complex, it is problem-dependent, i.e. it depends on the access rights set specified in the
particular system.
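Purely to illustrate the shape such a presentation could take - the encoding of rights is exactly the problem-dependent part mentioned above, so assume for this sketch that right = 1 stands for "read" - it might look roughly like this:

create view s_objects as
select o.*
from objects o
where exists(select * from acl a
             where a.acl_id = o.acl_id
               and a.uid = uid_by_name(CURRENT_USER)
               and a.right = 1 /* assumed code for "read" access */);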
Advantages and disadvantages. The advantages of the first way are simplicity of realization and speed of the relational sam-
pling with s_objects, since s_objects is a simple link, and its sampling is not significantly slower than sampling directly from
objects. The disadvantage is the low flexibility (the features provided by this method are not always enough). The second
method is much more flexible, but it is harder to realize. Moreover, using it in sampling with s_objects would possibly be com-
plicated due to a slower read rights verification procedure. In conclusion, the common disadvantages of both methods are:
a) necessity to create triggers for every new additional attributes storage table
b) neither way guarantees reading security for the additional tables of object
attributes if such a reading passes over s_objects. However, it is possible to take
certain extra actions, so that reading of the attributes tables would be of no use to malefactors. For example, you can store one or two essential items in objects, without which most of the objects would serve no purpose.

Bibliography

1. Mapping Objects to Relational Databases - White Paper. Scott W. Ambler. 26-FEB-1999. http://www.AmbySoft.com/mappingObjects.pdf
2. "A Description of the Ultima-S System" by Vladimir Ivanov. http://ivn73.tripod.com/ultima_overview.htm
3. The Design of a Robust Persistence Layer For Relational Databases. Scott W. Ambler. 28-NOV-2000. http://www.ambysoft.com/persistenceLayer.pdf
4. "Natural Keys vs. Artificial Keys". Anatoliy Tentser. July 20, 1999. http://www.ibase.ru/devinfo/NaturalKeysVersusAtrificialKeysByTentser.html
5. A Database as Objects Storage. Anatoliy Tentser. http://www.compress.ru/Article.asp?id=2006
6. "Tree-Type (Hierarchical) Data Structures in Relational Databases", Dmitriy Kuzmenko, iBase. http://www.ibase.ru/devinfo/treedb.htm, and other articles about trees and objects at http://www.ibase.ru/develop.htm


Happy New Year Wishes


This year's Firebird Conference has taken place at the Hotel Olsanka in Prague, Czech Republic. Here you can find a set of nice photos made by Dmitry Kouzmenko, Serg Vostrikov and Mary Novikova.

Marina Novikova: Who has done his best to push the project forward? Why?

Helen Borrie: They have all done their best and should be rightly proud of their year's work. Milan Babuskov, Marius Popa and Mauricio Longo deserve special commendation for their efforts to "get Firebird out there". And let's not forget to raise our hats to Paul Vinkenoog, whose solitary efforts have turned the Firebird docs system (originally developed by David Jencks) into a very usable system that anyone can use. But I particularly want to highlight the largely unheralded efforts of Claudio, whose job it is to scrutinise every bit of code that comes through and to get into debates (usually private, occasionally public, often tortuous) about pieces that don't measure up to standard. Claudio also spends many hours writing detailed and precise arguments - in English, not his native language, and often with a lot of wry wit. Why? The efforts of most of the core devs and the driver developers are very visible. Claudio has an essential, but horrible, job to do and he does it very, very well. He never stops, because the code never stops. Sometimes he has even had to put up with an insulting response from some programmer who believes himself to be above criticism. Where does one go for recourse in such a situation?

Paul Ruizendaal: We *all* have done our best to push the project forward. Who has done his best the most? I don't know. Much effort occurs in areas that I am not so familiar with. From my personal perspective I am most impressed by Dmitry Yemanov. Keeping many highly creative, highly talented, highly opinionated developers focused is a tough job. Linus Torvalds once described it as "herding cats". I think Dmitry has done an excellent job of keeping the world's most talented database engineering team focused.

Vlad Horsun: All did their best :) Why? Because it is pleasant for us ;)

Dmitry Yemanov: IMO, Vlad Horsun was the most active core developer during this year. Arno has done his best creating a more efficient index implementation which eliminates a few major performance issues; he's also been excellent in the optimizer improvements, as always. And I highly appreciate the efforts of Adriano dos Santos Fernandes in regard to the INTL issues.

Alex Peshkov: Certainly Jim. One of the reasons is that he can develop the project without digressing to other things.

Marina Novikova: What could be done better (is it possible to improve this in 2006)?

Helen Borrie: I hope that Dmitry and the team will get a good, clear run for the final Fb 2.0 release and the subsequent Fb 2-Vulcan merger work. I really, really hope that our international field-tester groups will become better-coordinated, as there will be a lot of new things to test. The Foundation has some funds in hand for grants to help one or two people who have some serious time to contribute to testing. I hope we soon find some good, experienced testers to spearhead the QA effort.

Paul Ruizendaal: Everything! We have a roadmap chock-full of improvements to the code base, our "google footprint" still is not as large as that of Oracle, our documentation set could be better, we could have pre-built binaries for more platforms, etc., etc. Two big themes in 2006 should be documentation and applications. I am impressed by IBPhoenix' donation of their doc set to the Firebird Foundation and hope that preliminary docs for the community to work with and on will be available soon. Packaged applications are important to further grow the user base. If Firebird gets strongly associated - perhaps even bundled - with exciting ready-to-use (open source) applications, our user base will grow and grow. My personal opinion is that midmarket and corporate business applications (such as ERP, groupware, Web2.0, accounting, etc.) are the most important for Firebird.

Vlad Horsun: In my opinion, beta-testing might have begun half a year earlier. I hope we shall reduce the release cycle next year. And the roadmap from DY confirms it :)

Dmitry Yemanov: First, we could move faster. The v2.0 Alpha testing stage was not performed as actively as it could have been. Second, we need to learn to make decisions in time. Too many things were discussed, but agreement was deferred till it was too late to do something in the current version. These issues will hopefully be improved over the next years.

Alex Peshkov: I think Vulcan was split off for nothing. If Firebird/Vulcan had not been divided, we would have a single system of better quality.

Marina Novikova: If you remember the horoscopes, the symbol of year 2006 is a dog.

Helen Borrie: Does that include all canines? :-)

Marina Novikova: Will Firebird be a lucky dog? Why?

Helen Borrie: Let's hope it won't be a "dog" (slang term: something that functions badly and runs slowly, as in "It's a real DOG!"). Why? I doubt that dogs have anything to do with luck! I remember the response given by an Australian entrepreneur in an interview some years ago. He was asked how important luck had been in his success. He replied, "Luck has been very important. And the harder I work, the luckier I get."

Paul Ruizendaal: I looked it up and my Chinese horoscope says: "People born in the Year of the Dog possess the best traits of human nature. They have a deep sense of loyalty, are honest, and inspire other people's confidence because they know how to keep secrets. But Dog People are somewhat selfish, terribly stubborn, and eccentric. They care little for wealth, yet somehow always seem to have money. They can find fault with many things and are noted for their sharp tongues. Dog people make good leaders." Yeah, I think it fits: Firebird is a lucky dog.

Vlad Horsun: Firebird will be a lucky Bird :) Because there are so many people loving it. And they love it in the Dog's year and in all other years :)

Dmitry Yemanov: Not sure about dogs, as I don't trust the horoscopes, but Firebird has been quite lucky and successful since its beginning. I don't expect anything to change here.


The third worldwide Firebird Conference is in session.

Marina Novikova: What was your most important contribution to Firebird in 2005?

Helen Borrie: Just being there, I guess, nothing extraordinary. Converting all of the Firebird 1.5 and Firebird 2.0 release notes to DocBook XML was a big job, worth doing. It is going to simplify translation and updating, besides making the contents of the release notes directly available for "lifting" into the manual-style documentation that the doc team is developing. The documentation task is enormous and the new sources will vastly lower the mountain top.

Paul Ruizendaal: Donating €800 to the Firebird Foundation. I hope it will be more in the future.

Vlad Horsun: I like my implementation of GLOBAL TEMPORARY TABLEs, though it has not come in this release :) Also I participated in the implementation of External Engines support together with Eugeney Putilin and Roman Rokytskyy. From those features which have been included in FB2, it would be desirable to note EXECUTE BLOCK, removal of the requirement of exclusive connections at the time of FK creation, and the advanced garbage collector. But the year is not completed yet, so there will be new important changes :) And I do not know which of those listed above is most important :)

Dmitry Yemanov: Did I really contribute anything? Are you sure? Ah, that roadmap thingy, you're right :-) Getting serious, I'm glad that a lot of bugs died with my direct assistance. There weren't many new features by me this year, although I consider my parser changes and optimizer improvements useful enough.

Alex Peshkov: Probably the renewed DB security structure and support of this structure in the engine and gsec. Also the release of beta1 for Amd64. This 64-bit structure certainly has a future.

Marina Novikova: If you were Santa Claus, what present would you give to Sparky and each member of the Team (any preferences)?

Helen Borrie: Sparky's present is to have a long holiday and one doesn't need Santa to help with that. I'm still vacuuming red, yellow and orange feathers up from strange places in my house, where they were blown by cooling fans. I think Santa and his elves got it right when they built their toy factory at the North Pole. :-) I hope each of the other team members gets the top items from his or her Christmas list. My top item? A winning Lotto ticket.

Paul Ruizendaal: Each member of the Team? That is tens, perhaps even hundreds of people, the way I see it. That list is too long for this interview. To Sparky? Perhaps a pair of sunglasses so that he can be a real cool dude during his global celebrity tour in 2006. Or should that be 'she' and 'dudette'? Another nice gift would be airplane tickets for the family reunion, as Sparky consists of 4 brothers and sisters now -- scattered around the world. I hope Jason has enough spare room to host the party.

Vlad Horsun: To Dmitry - a multicore, multiprocessor, 128-bit notebook for development and entertainment. To Nickolay - a little bit of "the smoke of the fatherland", though I am not sure that it is pleasant to him. To Jim - the books "How to Win Friends" and "How to Work in a Team", and some valerian. I would like to present something to Ann, Helen, Arno, Alex, Claudio and the rest, but my fantasy is exhausted now :) I like all these people and am very glad to work with them.

Dmitry Yemanov: The present is obvious: my live body at the next conference :-)


Faces of Firebird Conference 2005 ...

Marina Novikova: How are you going to celebrate Christmas and New Year's Eve?

Helen Borrie: Christmas I shall spend with my husband and children in Auckland, New Zealand. After Christmas, I'm going down to my mother in the South Island (Christchurch). My father died in June, unexpectedly. His computer is much better than her old 486, so I'll be setting up Dad's machine for Mum to use. For New Year, I'll be back at home but I have no plans. If it's a hot day on Dec. 31, I'll probably stay up all night with the garden hose ready, in case the local drunks do their usual thing and ignite fireworks in the eucalyptus forest that surrounds my house.

Paul Ruizendaal: 2005 has been a long hard slog for both my wife and myself. We will probably be in a very sunny place, doing absolutely nothing. Not quite sure where that will be; it seems to be rainy season in all the well-known places. Perhaps it will be Zanzibar.

Vlad Horsun: I'll celebrate at home with my family. The last few times I looked under a X-mas tree it was basically to put a gift there; I do not think that this time will be an exception ;)

Dmitry Yemanov: Don't know yet. Usually I'm told about the upcoming New Year in the middle of December when it's too late to schedule something unusual :-) Most probably, I will celebrate it with my girlfriend and friends at home. Yeah, a small Xmas tree, candles, champagne and the stuff.

Alex Peshkov: I have no plans yet. Most likely I will celebrate it with my family.


... happy faces

Marina Novikova: What would/would not you like to see under the X-mas tree?

Helen Borrie: I expect our 24-year-old tabby cat, Peggotty, will be under the tree, as usual. Not too many broken light-bulbs, I hope. A nice pile of books for me to read. Hint to Santa: "Thud" by Terry Pratchett?

Paul Ruizendaal: Would like to see: lots of presents. Would not like to see: lots of needles.

Dmitry Yemanov: Is Santa reading this? :-) Well, any presents are always welcome, I don't have any preferences. For me, it's more about kindness rather than about the material stuff.

Marina Novikova: When will you start working on Firebird in 2006 (will you have long NY vacations or not)?

Helen Borrie: No NY vacations. In Australia, the time from Dec. 24 to January 2 is when life stands still while we eat, drink and be merry. A lot of people take annual vacations in January but, for me, it will be all over by NY and I'll be back to the desk.

Paul Ruizendaal: Probably January 1st. The best ideas pop up in my mind when doing absolutely nothing and then I can't wait to get started. One thing that intrigues me is how to best implement materialized views for Firebird.

Vlad Horsun: I think on January 2nd I'll not hold out and will turn on my computer :)

Dmitry Yemanov: Since January 3, as always. The second day is usually spent asleep in an attempt to recover from the active celebration :-)

Alex Peshkov: I hope by January 3 I will have come to consciousness after the NY and start working on the project.

Marina Novikova: That's the place for your NY wishes to the Firebird community: ...

Helen Borrie: I wish everyone a great time! In Russia, I think you have all of your holidays in January. Keep warm! I wish everyone the best possible year in 2006. If Dmitry has his way, you will all be very busy.

Paul Ruizendaal: I wish all Firebird community members health and happiness for themselves and for their loved ones, onto the seventh generation at least (more if their OAT is ancient).

Vlad Horsun: I wish the community to grow and to take pleasure from Firebird and new releases in the new year!

Dmitry Yemanov: I wish everyone to live in peace, enjoy life and be happy with Firebird.


JayBird 2.0, a JCA/JDBC driver for Firebird


by Roman Rokytskyy, JayBird team, [email protected]

JayBird was started by David Jencks in 2001, and Alejandro Alberola wrote the first implementation of the wire protocol. The goal of the project was to provide a better JDBC driver than the InterClient that we inherited from Borland. It took us more than two years to complete the first release. Currently the release cycle is approximately 18 months, but point releases are published more often, about once every three months. (There is no rule, except "fix quick, release often".) Today we have seen ~130,000 downloads since the project started and more than 1,300 members in the Firebird-Java list.

Architecturally, JayBird consists of three layers: GDS, JCA and JDBC. The GDS layer is an abstraction of the physical interface to the server and represents the ibase.h file translated to Java. The JCA layer is responsible primarily for transaction management, including support for XA transactions. The JDBC layer implements the JDBC 3.0 specification. Two additional components, a connection pooling framework and database management, were added later to expose Firebird-specific features and to provide efficient resource management classes in Java.

We have two main GDS implementations: pure Java and a JNI-based one. The pure Java implementation connects to the Firebird server via a TCP socket and talks Firebird's wire protocol directly. This is the fastest way to connect to a remote server from Java. The JNI-based implementation was created primarily to support local IPC connections (~30-40% faster compared to TCP local loopback, on the AS3AP benchmark) and Firebird Embedded (up to 2x speed-up on the same benchmark). Two additional GDS implementations exist, one supporting Oracle-mode Firebird via its API library and one supporting the Vulcan client library, but they are not part of the JayBird distribution.

JayBird 2.0 introduced full support of updatable result sets, which allows seamless integration with GUI applications that allow in-place editing, as well as integration with the javax.sql.RowSet implementations from Sun Microsystems and Oracle. Another feature added in JayBird 2.0 is full support of the Services API, which allows invoking backup and restore procedures from within the Java VM, performing user management and database maintenance tasks, and collecting database statistics.

The main changes in the new version are not visible, however. JayBird 2.0 features a new pluggable architecture with runtime plug-in discovery, and other refactorings that were made to increase code maintainability and stability in all parts of the driver.

What's coming next?

Gabriel Reid already has a working version of event support for Firebird, which did not make it into JayBird 2.0 due to the release schedule. It will be committed to CVS in the next days/weeks and we will start releasing JayBird 2.1. We also have a pending change to support multiple client libraries on the JNI level, contributed by Evgeny Putilin and Vlad Horsun, which has already been included in Oracle-mode Firebird but has not yet been committed to JayBird due to the release schedule. Ludovic Orban is helping us to achieve better XA specification support.

JayBird 2.2 will target XA optimization to reduce the number of transactions needed when multiple transaction branches of the same global transaction are executed against the same resource. Depending on the transaction coordinator involved, the expected gain is one database page flush on commit instead of five. Also we plan to add a read-only optimization that will use one-phase commit if no changes were made to the database within the specified transaction branch.

Steven Jardine is pursuing a goal to improve the code layout and build process by migrating from Ant build scripts to Maven 2.0 project management. We hope to finish this task before the 2.1 release.

Upcoming versions of JayBird will feature support for the INSERT ... RETURNING statements introduced in Firebird 2.0, and we hope, as well, to see Java Stored Procedures, which are currently supported in Oracle-mode Firebird, merged with the main Firebird tree.


How I started to work with MS SQL Server and ADO


By Vladimir Kotlyarevsky
[email protected]

MS SQL Server – what are you talking about?!

The title of this article might seem surprising for an IB developer magazine. However, with six years' InterBase experience before I started with MS SQL Server in 2002, I thought an account of my thoughts and experiences in moving from IB to MS SQL might make an interesting read. I will try to describe the features of MS SQL that were strange for me as an IB programmer, features that appeared to be very suitable, and some that I found to be awful.

Of course, I understand that any direct comparison of "IB vs. MS SQL" is a somewhat dangerous topic for an author, since he risks being beaten by the fans from both sides. Please keep in mind that this paper is just an experiential account by one application programmer, and does not pretend to be either a complete description or a methodologically correct comparison.

My account encompasses not just the actual servers but also development tools, data access APIs and techniques. Many important features of SQL Server are left beyond this article because I have not used them and cannot say much about them (full text search, replication, Data Transformation Services, Analysis Services, etc.). Many things in ADO that have changed in ADO.NET are not covered. This article is not to be taken as a complete SQL Server guide :) but merely as "one developer's perspective".

When I use the mnemonic "IB" in this story, I usually mean InterBase up to version 7.x (sometimes it is Firebird 1.0, but that will be highlighted). SQL Server stands for MS SQL Server 2000.

How could this happen?

Background

In 2002, in my work with the Contek-Soft software company, I began trying to develop a kind of three-tier framework or "techniques toolbox" for our future multi-user applications. I called it "Kelly" – don't ask me why, I just liked this word :).

The initial Kelly attempts were done using IB 6.x as the RDBMS and an application server (middle-tier) written in Delphi 7. It ran as a COM+ application, had a COM API for remote client applications, and used FIBPlus for data access. It was a tried and tested approach for me, but it had a few important disadvantages.

I could not use COM+ transaction management, I could not pass datasets into the VBScript ActiveScripting engine, which I planned to use in both application server and client, and I had troubles with passing datasets from the application server to clients. I also had to implement my own database connection pool for the application server – not a very complicated task, but annoying.

All of the MS how-to samples about COM+ and ActiveScripting used MS ADO and OLE DB for data access.

For those who work with Borland products rather than with MS ones: ADO and OLE DB drivers are somewhat similar, respectively, to the Borland BDE plus the VCL data access components and SQL Links. ADO is a very simple, high-level data access interface for application programming, mostly for use with VB. You use the same ADO components for any data source, be it ORACLE, MS SQL or dbf. OLE DB is a rather low-level, complex but powerful COM API, intended for C++ developers. Every RDBMS requires its own OLE DB driver. ADO uses OLE DB drivers to access data on different RDBMSes, but the application programmer does not need to know how it does this. One just has to specify the data source type in the connection string (for example, "Provider=SQLOLEDB" selects the SQL Server OLE DB driver).

I knew that an implementation of an OLE DB driver for IB already existed – IBProvider, http://www.ibprovider.com – so I decided to check what ADO was. I found that:

1. ADO was well supported by COM+.

2. All ADO objects were implemented as ActiveX components. This meant I could use them in Delphi, any ActiveScript, MS Office VBA (which was very nice too) and any other language supporting COM and ActiveX.

3. ADO's Recordset object had well-implemented serialization, which would allow me to pass data sets over the network easily.

I found that, though it was still in the beta stage, IBProvider had already become a product with a quality which was (and still is) rare in our times. Congratulations and respect to its authors.

So the next iteration of Kelly used IBProvider and ADO. And it really worked nicely!

I was absolutely happy when I had written my first test in VBScript for the application server API. The API under test returned an ADO recordset. I could call it remotely just from VBScript and process the result recordset in VBScript or Excel VBA without any problems.

COM+ native connection pooling just worked: I needed to do nothing more with it. The COM+ transaction coordinator (MS DTC) worked too, so I no longer needed to be concerned with managing database transactions and connections on the application server. Anyone who has implemented these things in a three-tier environment would understand my happiness :).

Enter MS SQL Server

At this point, our company had secured a large contract for accounting software development. We decided to implement it using our new three-tier Kelly techniques.

One of the customer requirements was to use MS SQL Server 2000 as the RDBMS. It was just their corporate standard: they had licenses, support and trained administrators, and were not going to change that.

I decided to install and take a first look at the Developer version of MS SQL Server 2000 from our MSDN subscription.

The Developer version is the full one, with the same feature list as the Enterprise version, and it's free for development purposes only, not for commercial use.

Installation was quick and easy, just a few simple questions in wizard screens that I didn't have to think much about. It seemed to me not much more complex than installing InterBase or Firebird.

Tools

Enterprise Manager

After installing, I made a quick tour of the tools from the SQL Server set. SQL Server Enterprise Manager (EM) was a strange, heavy and slow application, mostly for database administrators. The DBA can use it to start and stop the server, manage server parameters, backup/restore databases, attach/detach databases, shrink databases, manage security, and to manage and perform replication and other administrative tasks. You can edit stored procedures there, but only in a small modal dialog. God knows why :).

The closest thing in IB to compare with EM is InterBase Server Manager. My summary assessment is that developers would not usually need this tool. Oh, no – there is one single function in EM that I use sometimes – "extract db metadata", which creates an SQL script, just like in IB. EM would be useful for newbie SQL Server DBAs, but an experienced DBA doesn't usually need any visual tool, since he knows SQL commands. In SQL Server, anything a DBA has to do can be done using Transact-SQL.

Query Analyzer

Query Analyzer is quite a nice tool for the developer. In short, it is just a multi-window MDI-style isql, which also has a database metadata tree pane. This is the main SQL tool for the SQL Server developer, and almost the only one. For comparison with IB, QA is something like IBConsole. But for IB we also have the well-known, excellent IBExpert (http://www.ibexpert.com/), which has a similar look, though is much superior. Concerning third-party tools for SQL Server, incidentally, as far as I know there are only a few of them – much fewer than for IB.

SQL Profiler

SQL Profiler is a wonderful tool! It is what I always dreamed about when working with IB. Its function is to show a full SQL trace for a selected SQL Server instance through a server-side interface. Because of this server-side interface it can show queries from all working clients. It seems to be an incredibly useful and suitable tool.

IB has no comparable tools. Well, there are FIB+ and IBX SQLMonitor, but they are built into client libraries and hence work only from the client side, tracing queries just from that client, and requiring special client builds. The SQL Server Profiler traces queries from all clients, yours or third party, independently of what data access library is used. You can use it not only to develop and debug your own applications but also to see how some interesting third-party software works with its database.

Index Tuning Wizard

This tool makes suggestions about how to change database index structures, based on query statistics from running applications. My own humble opinion is – "just throw it away". Its advice is useless for experienced developers and will damage the brains of newbies :).

Documentation

The SQL Server online documentation set is called Books Online. It consists of a number of "chm" (HTML Help) files linked through a single header. The documentation is of quite high quality, full of samples and hyperlinks. It is comfortable to use, both as online help and as a guide to study offline. The IB Langref, DataDefinitionGuide, etc. are really good books, often more interesting to read than fiction literature. As books, by comparison, I consider them superior to Books Online. They are not very comfortable for use as an online reference, though. The format and content organization of the IB documentation are its main disadvantage, and I don't regard that as very important.

Transact-SQL

Batches, SET and others

The SQL Server SQL dialect is called Transact-SQL, or just T-SQL.

The first surprise for an IB programmer is that a Transact-SQL command, called a batch, may contain many statements, much like a stored procedure. It can contain any variety of different statements, including mixed DDL and DML. It can even contain several different SELECT statements, each capable of returning results to a client. This feature is often very handy when you need to execute a complex statement and don't want to create a SP, or when you need to retrieve some complex object containing several data sets. I must say that, in comparison with IB, this feature is a pleasant one.

Editor's note: Firebird 2.0 introduces a very similar feature, namely EXECUTE BLOCK.
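To make this concrete, here is a minimal sketch of such a batch, with invented table and column names – DDL, DML and two SELECTs, each returning its own result set to the client:

-- names here are invented for illustration
CREATE TABLE batch_demo (id INT NOT NULL, name VARCHAR(50))
INSERT INTO batch_demo (id, name) VALUES (1, 'first')
INSERT INTO batch_demo (id, name) VALUES (2, 'second')
SELECT COUNT(*) AS cnt FROM batch_demo    -- first result set
SELECT id, name FROM batch_demo           -- second result set
DROP TABLE batch_demo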
The second surprise: an assignment statement in T-SQL requires the SET keyword, just like in old Basic :) – example: SET @TMP = 10. Furthermore, any variable in T-SQL has to begin with the character @. Example – DECLARE @TMP INT. I don't know why, but I suspect Microsoft is just too lazy to rewrite the language parser :). It's an ugly feature, I think – but it doesn't hurt.

Third surprise – you can write something like

SELECT 'test'

without any FROM clause! – and the client will get a recordset with one record, containing "test" :). Or you can write

SELECT @a = 10, @b = 'test'

– and the statement will just assign 10 to the @a variable and 'test' to the @b variable, without returning anything to the client.

This looks very strange to an IB developer and does not conform to ANSI SQL standards, but – why not, if it doesn't hurt?

Auto incrementation

There are no objects or functions like the IB GENERATOR. Creating an autoincrement field in a table is easy: in the CREATE TABLE statement you just define something like ID INT IDENTITY(1,1), where IDENTITY(1,1) means that the server will automatically insert incrementing integer values into this field, with seed 1 and step 1. Seems to be simpler than creating a generator and a trigger in IB, doesn't it?

However, if you need even slightly smarter behavior, this simplicity becomes a problem. For example, once I needed to create a field which should be incremented by every second record, because in my DB structure the "accounting entry" entity was stored as two records (debit and credit) in one table and had to have the same "entry id" attribute. With the IB GENERATOR implementation it would be trivial just to call GEN_ID to generate the next id and then insert two records with the same entry id. It is impossible to use the SQL Server IDENTITY feature to implement such behaviour. I had to implement my own T-SQL mechanism that worked similarly to the IB GEN_ID, but with serious limitations.

My opinion – the IB GENERATOR is a very useful feature with wide functionality that is absent in SQL Server.
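A minimal sketch of that kind of workaround, with invented names (note that it has to be a stored procedure rather than a function, since, as described below, T-SQL functions cannot modify data):

-- hypothetical counter table emulating an IB generator
CREATE TABLE gen_entry_id (last_id INT NOT NULL)
INSERT INTO gen_entry_id VALUES (0)
GO
CREATE PROCEDURE get_next_entry_id @next_id INT OUTPUT AS
  -- a single UPDATE increments the counter and reads it back atomically
  UPDATE gen_entry_id SET @next_id = last_id = last_id + 1
GO
-- usage: reserve one id, then insert the debit and credit records with it
DECLARE @id INT
EXEC get_next_entry_id @id OUTPUT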
Select from select

In 2002 I was very glad to see that T-SQL supported SELECT FROM SELECT, but now the latest Firebird versions support this too (Editor's note: derived tables in Firebird 2.0). Also, T-SQL supports UPDATE FROM – a useful operation that allows you to update one table with the data from another. It is useful for complex batch updates; I used it, for example, in a data warehouse project for batch updating of fact tables.
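A sketch of the statement, with invented names – every matching row of the fact table takes its amount from the staging table:

UPDATE f
SET amount = s.amount
FROM fact_sales f
JOIN staging_sales s ON s.sale_id = f.sale_id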
Stored procedures

A stored procedure in T-SQL cannot be used in a SELECT statement. The only thing you can do with a SP is to EXEC it. This does not mean that SPs cannot return data sets – they can, and more, they can return several different data sets. But they are returned in such a way that they cannot be used in a SELECT; they are just immediately passed to the calling client. The only way to return a data set from a SP to a T-SQL batch is to use temporary tables – see below.

Since a SP in T-SQL cannot be used in expressions and queries, T-SQL has a thing known as a "function" that is somewhat like a SP, but returns a single value and can be used in any expression. There is also the "table function", a function that returns a data set, or a "table", in an in-memory table variable. These functions can be used in a SELECT FROM clause. The main limitation of functions is that they cannot change data in the database or change its state in any way. That is to say, you can employ a function to do SELECTs and any calculations, i.e., retrieve data, but you cannot do an UPDATE, an INSERT or even an EXECUTE PROCEDURE.

Instead of the beautiful InterBase FOR SELECT statement, in T-SQL you must use more complex CURSOR processing. The usage is "more complex" in that you must declare a cursor, fetch the first record, do a fetch loop, close the cursor and deallocate the cursor – many lines of ugly code instead of the simple FOR SELECT, as the sketch below shows. However, the T-SQL CURSOR is a bit more flexible, which can be useful in some rare special cases. (Editor note: IB and Firebird also have a form of explicit cursor syntax in PSQL, never documented by Borland, which makes the FOR UPDATE OF CURRENT syntax available for very fast execution of DML on single or multiple tables. By comparison with the clumsiness of cursor processing in T-SQL, it is supremely elegant. This pearl of performance has been greatly enhanced in Firebird 2.0.)
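For comparison, a minimal T-SQL cursor loop over a hypothetical customers table looks like this:

DECLARE @name VARCHAR(50)
DECLARE cur CURSOR FOR SELECT name FROM customers
OPEN cur
FETCH NEXT FROM cur INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
  -- process the current row here
  FETCH NEXT FROM cur INTO @name
END
CLOSE cur
DEALLOCATE cur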
Temporary tables

Temporary tables are a nice feature, if you use them properly. A local temporary table can be created in the scope of the connection and in the scope of a SP. This means the temporary table is visible in that scope and is automatically dropped when it goes out of scope (the connection is closed or the SP finishes execution, respectively).

Temporary tables are created with the same CREATE TABLE statement as usual tables, except that the name of a temporary table must start with #. Temporary tables are useful when you need to simplify and/or optimize a complex query, dividing it into two or more simple and fast queries that use result sets returned by previous queries. They can also be used when you want to return a data set to the current batch from a SP. To illustrate this, you can create a temporary table with the necessary structure, then call a SP that fills that table, then process the data in the table as required and drop the table when done – all in a single batch, as the sketch below shows.

Temporary tables are stored in a special, separate database (named tempdb), so they don't affect the size and structure of the working database.
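A sketch of that pattern, with invented names – the hypothetical SP fill_report_data sees and fills the #result table created by the calling batch:

CREATE TABLE #result (id INT, amount MONEY)  -- visible to this batch and to SPs it calls
EXEC fill_report_data                        -- hypothetical SP that INSERTs into #result
SELECT id, SUM(amount) AS total FROM #result GROUP BY id
DROP TABLE #result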
My conclusion on temporary tables is that they are a useful feature, if used with caution. "Caution" here means that you use them only when there is no other way, since creating the table and filling it with data take significant time. I wish this feature existed in IB – not for returning datasets from a SP (IB has a better solution, SUSPEND, for that) but for complex query optimization.

In T-SQL there are also global temporary tables – the same as the local ones just described, except that, once created, they are visible to all connections. I have never used them and have no experience or idea of how to use them. Some dynamic data exchange between two different connections, may be?
Error handling

An awful thing is that there is no structured error handling in T-SQL. The InterBase WHEN .. DO statement might appear to have some disadvantages, but IB programmers just don't understand their luck! Whatever its warts, it is much better than nothing at all. The only thing you can do in T-SQL is to check the value of a returned error code by way of the @@ERROR system function. Moreover, if some error is raised somewhere in a SP or a trigger (e.g. a primary key violation), it doesn't cause code execution to be stopped (except for very serious, usually system-level, errors)! If you need to stop it, you must watch for @@ERROR values and stop execution by hand.
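In practice that means sprinkling the code with checks like this sketch (the table name is invented; note that @@ERROR is reset by every statement, so it must be tested immediately):

INSERT INTO orders (id) VALUES (@id)
IF @@ERROR <> 0
BEGIN
  RAISERROR ('insert failed, stopping', 16, 1)
  RETURN
END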
Transactions in SP and triggers

In T-SQL you can manage transactions from inside your SP or trigger code. That is to say, you can issue BEGIN TRANSACTION, COMMIT TRANSACTION or ROLLBACK TRANSACTION anywhere. For my part, I have never used this feature, perhaps because of my previous experience and habits with IB.

T-SQL also claims to support nested transactions. It looks like a very nice feature until you find that it does not work as one would expect. Here are a few quotations from SQL Server Books Online:

1. "Committing inner transactions is ignored by Microsoft® SQL Server™."

2. "It is not legal for the transaction_name parameter of a ROLLBACK TRANSACTION statement to refer to the inner transactions of a set of named nested transactions. transaction_name can refer only to the transaction name of the outermost transaction."

3. "If a ROLLBACK WORK or ROLLBACK TRANSACTION statement without a transaction_name parameter is executed at any level of a set of nested transactions, it rolls back all the nested transactions, including the outermost transaction."

This just means that T-SQL nested transactions are not nested transactions :) – at least not in the sense you might expect.

Triggers

Triggers in T-SQL can be only AFTER and INSTEAD OF. And this does not mean AFTER (or INSTEAD OF) the row insert, update or delete, but after the statement completes. In other words, unlike IB triggers, T-SQL triggers are statement-level triggers, not record-level.

Furthermore, in a statement-level trigger you deal with a set of changed (inserted, deleted) records, so there are no NEW and OLD context variables available in T-SQL triggers. Instead there are virtual tables called INSERTED and DELETED. The INSERTED table contains new records (inserted records as well as the new versions of updated ones); the DELETED table contains old records (deleted records and the old versions of updated ones).
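A sketch of a statement-level trigger built on these virtual tables (table names invented) – note that it is written to handle many rows at once:

CREATE TRIGGER tr_accounts_update ON accounts AFTER UPDATE AS
  -- runs once per statement; INSERTED/DELETED may hold many rows
  INSERT INTO audit_log (account_id, old_balance, new_balance)
  SELECT d.id, d.balance, i.balance
  FROM deleted d
  JOIN inserted i ON i.id = d.id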

What do I think about all this? Well, in some special conditions, if used properly and coded carefully, T-SQL set-oriented triggers can work faster than the record-oriented triggers in IB. What are these conditions? An example is where your software frequently makes updates or inserts that affect multiple records – say, hundreds or more. I refer to such conditions as "special" because I think that OLTP usually deals with single records, and when you need to do some massive batch update you would usually write a special procedure for it, switching triggers off before running the update and turning them back on afterwards. In my view, an OLTP solution that does frequent massive batch updates as a part of normal processing is bad practice. For OLAP or similar solutions that require massive batch updates, it is much easier and more common to create special procedures with no triggers at all and a minimum of constraints, to increase update speed.

At the same time, T-SQL set-oriented triggers are much more difficult to write, and their source code is usually anything but obvious.

My conclusion about triggers is that IB triggers are more developer-friendly and their usability is better than in MS SQL Server. As for speed, in common conditions the speed is the same.
Functions

T-SQL has a very wide set of ready-to-use functions, in contrast to IB, where it is supposed that the developer can implement anything he needs via UDFs. String, date and time, mathematical, security, metadata functions, even system configuration functions – all make your life easier, so you usually don't need to write UDFs. If you still need some function that does not exist in the standard library, you usually can write this function using T-SQL itself. For example, while in IB I used my own UDF for special string parsing, in MS SQL Server I could write this function in T-SQL.

SQL Server also comes with a lot of system stored procedures, mostly for administrative needs. Using these SPs, a DBA can configure most server and database parameters, configure replication, see different DB statistics and current state, etc. There are even a few procedures for sending email from a T-SQL batch, calling OLE automation objects and XML processing. It's my opinion that the email and OLE functions are obviously redundant features in SQL Server, but – why not, if it does not hurt? :)

In T-SQL you can deal with different databases within one batch. One uses so-called "fully qualified names" in the form database.owner.object_name. For example, you can issue a SELECT statement that joins tables from different databases. With the "Linked Server" feature (see below) or the OpenDataSource built-in function, you can even do heterogeneous queries with tables from different SQL Server instances on different hosts, or even from a different RDBMS – SQL Server and IB, or Oracle, or dbf, for example. From my point of view it is a very useful feature when you need to integrate different data sources.
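For example, a join across two databases on the same server might look like this (names invented):

SELECT o.id, c.name
FROM sales_db.dbo.orders o
JOIN crm_db.dbo.customers c ON c.id = o.customer_id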
Almost any administrative task in MS SQL Server can be done using T-SQL. In my opinion, this makes it so much more comfortable and "natural" than the IB Services API.

Overall impression of T-SQL: it seems more powerful and flexible than IB SQL, thanks to its built-in function library and its ability to mix DDL and DML. At the same time it looks more "dirty" because of its SET, @ and some other atavisms. The IB SQL dialect presents as more concise and clear, though less flexible.

Miscellaneous server features

The feature of MS SQL Server that I like most of all, and that is absent in IB, is heterogeneous (distributed) queries through the Linked Server feature or the OPENDATASOURCE built-in function. A Linked Server is any OLE DB or ODBC data source that is preconfigured (connection string specified) and registered in a SQL Server instance by the DBA. After a Linked Server is registered, its tables are available to T-SQL queries. The T-SQL query optimizer can even use some schema information presented by the Linked Server OLE DB driver to optimize heterogeneous joins.

Functionality similar to Linked Server can be obtained using the OPENDATASOURCE and OPENROWSET built-in functions, without registering a Linked Server. These functions can be useful for data sources that are used infrequently.

As for me – I often use heterogeneous queries in T-SQL to load data from external data sources. For example, recently we needed to load a few hundred thousand records from several Excel files. Direct data loading using Excel VBA worked very slowly, so we tried a T-SQL INSERT FROM SELECT statement using OPENDATASOURCE through the MS Jet OLE DB 4 driver for the FROM clause, and found that it worked about 100 times faster. Using this INSERT FROM SELECT, we could even filter out some unnecessary records by joining the Excel "table" with an SQL table to plug in the filter criteria.
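A sketch of that kind of statement (the file path, sheet and table names are invented; the provider string follows the usual Jet 4.0 pattern):

INSERT INTO imported_sales (customer, amount)
SELECT x.customer, x.amount
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
       'Data Source=C:\data\sales.xls;Extended Properties=Excel 8.0')...[Sheet1$] AS x
JOIN customers c ON c.name = x.customer  -- the join filters the Excel rows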
The feature of MS SQL Server that I hate most of all is the locking. It locks data whenever you read or write. It has no record versioning engine like IB. Almost all RDBMSes behave this way, but happy IB developers don't even think about how bad their lives could be if IB behaved like that. It is a really bad, very annoying feature that must be kept in mind from the very beginning, when you design and write your application.

Example – if a client runs some heavy query, say an annual report that takes a minute or two to generate, nobody can write to the pages where the report data is located until the transaction in which the report is running has completed.

How do people work around this? I was very interested in that, too. One time, I ran the SQL Profiler and watched how one well-known, modern and respected accounting software package running on MS SQL Server generated its reports. Easy – it just executed queries in dirty read mode. I think you understand well what that means. GIGO – "garbage in, garbage out". Usually everything is OK, but one day you find a Total line in your report or document that is not equal to the sum of its lines…

The next version of MS SQL Server – YUKON – is promised to have a record versioning engine. If it does – fine. But IB has had this engine for more than 20 years. I would be surprised if, at least for the next 2-3 years, MS is not fixing bugs every day in YUKON versioning. So IB is still far ahead.

Query execution speed

I don't present any test results here, since my tests were not standard and I didn't propose to publish them. The following is my private opinion based on my own assessment, which I am not going to argue about. My estimation is that, starting from millions of records in the main tables, MS SQL Server is faster by two or three times on simple queries, especially with GROUP BY and aggregations. This is an average and very rough estimation. In some conditions, IB ran faster.

Optimizer and complex queries

Again, no test results, for the same reason as before. Query optimizer quality is a very difficult thing to assess and compare at all. In my opinion, the SQL Server optimizer is "smarter", since it usually makes a rather good plan for complex queries which the IB optimizer cannot manage. But sometimes (not often) this "smartness" fails and it generates terrible plans for quite simple queries. I don't know why, since I don't know how it works internally. As far as I know, RDBMS query optimizer logic is still closer to art than to engineering…

If you meet this situation in SQL Server, you can prompt the optimizer towards a better plan using query hints right in the query text – how to make a join, how to order, etc. For example, for a join you can prompt it to use one of LOOP | HASH | MERGE | REMOTE as the join method. There is no opportunity to give a full execution plan to the SQL Server optimizer like in IB. Usually, though, query hints are enough and work fine.
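For instance, to force a hash join right in the query text (names invented):

SELECT o.id, c.name
FROM orders o
INNER HASH JOIN customers c ON c.id = o.customer_id
-- or, equivalently, append OPTION (HASH JOIN) to the query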
The SQL Server optimizer uses histogram index statistics, not just the single scalar value as IB does. It is likely that this helps it to make better decisions about the execution plan in some cases. However, if some table has a clustered index (another thing that is absent in IB), the optimizer will use it without paying attention to other indexes.
In SQL Server you can split your database into several files, as you can in IB. But what you can do in SQL Server (and not in IB) is specify particular files for specific tables and indices. This is done right in the CREATE TABLE statement. Sometimes it can be useful. For example, if you have one very fast but not large hard disk and another one that is large but slow, you can place your main table(s) on the fast disk and the other tables on the slow one.
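Strictly speaking, a table is placed on a filegroup, and the filegroup's files live on the chosen disk; a sketch with invented names:

-- FASTDISK is a filegroup whose file was created on the fast drive
CREATE TABLE main_facts (
  id INT NOT NULL,
  amount MONEY
) ON FASTDISK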
Backup and restore

They just work. At least I've never heard about an "unrestorable backup" on SQL Server, but that's something that sometimes happens on IB. I'm not sure why, but I think it's because a SQL Server backup file is completely different from an IB one. Internally, it is just a compressed database, not metadata and data exported in a generic format like in IB.

There is also "differential backup" in SQL Server – it creates not a complete database backup, but just the changes from some point. I haven't used this feature, but I think it must be useful for large databases, when you just don't have enough space for, say, complete "six revolver" backups. (Editor note: Firebird 2.0 has an incremental backup facility. Its name is Nbackup.)
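Both kinds of backup are one-liners in T-SQL (database name and paths invented):

BACKUP DATABASE mydb TO DISK = 'D:\backup\mydb_full.bak'
BACKUP DATABASE mydb TO DISK = 'D:\backup\mydb_diff.bak' WITH DIFFERENTIAL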

SQL Server also has a "shrink database" function to remove empty pages from database files. If you often do batch DELETEs, this function can decrease database size by up to two or three times. In IB you can achieve the same result with backup/restore.

SQL Server allows a client to cancel a long-running batch. It is a useful feature, not available in Firebird or older versions of IB.

SQL Server has Windows-based authentication along with its own internal authentication, useful if your application works in a LAN with a Windows Domain or Active Directory installed. It simplifies the lives of developers (they don't need to think much about login dialogs and authentication) and users (they don't need to enter a password to connect to SQL Server). The only person who would possibly be not too happy is the Domain admin :), because he will have a bit more work administering users and groups for your application.

Data Access

At the time of writing, the main data access library for SQL Server was MS ADO 2.5, working over the SQL Server OLE DB driver. ADO is built as a set of ActiveX components, allowing it to be used in any language supporting ActiveX (COM). In our applications we used it in Delphi, Office VBA, VBScript and JScript (MS ActiveScripting engines). In comparison with the Delphi IBExpress or FIBPlus data access libraries, this is an obvious advantage of ADO.

In ADO, the data access and data representation components are more separated than in the VCL and, in my opinion, this makes it better. ADO has a Command component that executes SQL statements on the server and a Recordset component which represents data sets returned by the server. In our multi-tier application this pattern was more comfortable for us than IBDataSet or FIBDataSet, which incorporate data access and data representation logic in a single object.

Next, the ADO Recordset component can marshal itself by value transparently for COM applications. This means that if your application server has a COM interface with a method returning an ADO Recordset and you call this method from a client application, local or remote, you can use the usual method call semantics.

Here's a small code sample in Delphi.

Server:

type
  TMyServerObject = class(TAutoObject, IMyServerObject)
    function GetSomeData: Recordset;
  end;

function TMyServerObject.GetSomeData: Recordset;
begin
  // CreateOleObject returns IDispatch, so cast it to the Recordset interface
  Result := CreateOleObject('ADODB.Recordset') as Recordset;
  // fill the recordset with data here
end;

Client:

var
  rs: Recordset;
  serverObject: IMyServerObject;
begin
  serverObject := CreateRemoteComObject('remote_host', MyServerObjectGUID) as IMyServerObject;
  rs := serverObject.GetSomeData();
  while not rs.Eof do
  begin
    // do anything here
    rs.MoveNext();
  end;
end;

If you want to use VCL data-aware controls with an ADO Recordset returned by an application server (or received by some other means), you can use the VCL ADODataSet and its Recordset property: just assign your recordset to ADODataSet.Recordset and it turns into a usual VCL DataSet that you can attach to a VCL DataSource and any TDataSource-compatible data-aware control.

As for speed, in general ADO works more slowly than IBExpress or FIBPlus on such typical tasks as looping through a dataset and doing something with its fields. How much slower depends on the actual task. The thing that seriously decreases speed for the ADO Recordset is that it gets and sets field values as the OleVariant data type. In our code there are two often-used low-level library functions (copying one recordset to another) where, in order to optimize performance, we had to process data at the OLE DB level, without any OleVariant conversions. That works two or three times faster. Other code works normally with ADO Recordsets.

And the last words about ADO: since there is an excellent InterBase OLE DB driver (IBProvider, http://www.ibprovider.com), you can use ADO to access InterBase databases in just the same way SQL Server programmers do it with SQL Server.

SQL-DMO – the SQL Distributed Management Objects library – is another thing that I would like to have in IB. SQL-DMO is a COM library which presents SQL Server database objects as a hierarchical set of collections of tables, views, SPs, roles, etc. A Table object in turn contains collections of fields, triggers, constraints, etc. That seems familiar, doesn't it? Right – this is just what most RDBMS administrative and development tools look like. So, if you want to create some administrative or development tool for SQL Server, you don't need to parse system tables as you do in IB: SQL-DMO does it for you.

In our applications we use SQL-DMO to create databases from our own XML schema and, even more importantly, to alter databases when versions of the software and database schema are changed. Why is the latter important? Because you can create a database using a simple SQL script but, to alter an existing database, you must determine the differences between old and new metadata. That's a rather complex task when using plain SQL scripts and a very simple one with SQL-DMO and structured schema files.

Unfortunately, unlike ADO, SQL-DMO does not work over OLE DB. It is a SQL Server specific library so, sadly, you cannot use it for IB :(.

Conclusion

Both IB and MS SQL Server are good, and of sufficiently high quality to use in real industrial applications. Both have advantages and disadvantages, which you should keep in mind when choosing an RDBMS and designing your application.

For my part, I use the following very rough recommendations for the choice (if there are no external reasons to choose one and not the other):

1. Estimated database size 10Gb and more – probably SQL Server, due to its better performance in query processing on large tables.

2. Estimated database size less than 10Gb – probably InterBase, due to its non-locking versioning engine and therefore simplified application design.

For 10Gb and larger databases you also have to pay serious attention to the type of application you need – OLTP or OLAP or a mixture. As I said earlier, SQL Server does not like mixed types, since writers block readers and vice versa (well, you are also able to work in dirty read mode if you aren't afraid of GIGO). So if you are going to create some mixed type of application, you should consider using separate databases for OLTP and OLAP query processing, or using full-blown OLAP software such as MS Analysis Services (an OLAP server), which is included in the MS SQL Server Enterprise version.

Incidentally, you can use Analysis Services as an OLAP solution for IB, too.

The conclusion is very simple and far from being something new – the developer should not be a fanatic follower of a single RDBMS, but should be open to choosing the one that best fits his abilities and the customer's requirements and problems.

TPC based tests for InterBase & Firebird


By Alexey Kovyazin
[email protected]

What is the TPC?

I expect most database developers know what TPC is. On their site (www.tpc.org), they state: "the TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry."

In plain language, these folks investigate the limits of database server + hardware configurations and provide everyone with the results. The results represent an independent and objective appraisal of a database's performance.

TPC methodology

How do they estimate performance? Well, it is a very interesting question with a very interesting answer. If you ever think about assessing the performance of any computer system, you'd probably agree we usually have grades like these: "Wow, it is fast!" and "Damn, it is too slow" (and several slightly different variations).

TPC's more sophisticated approach is related to business: they estimate the cost of business operations. They established a set of standard business operations for this purpose – "transactions" (yes, TPC uses the term "transaction" in a business context, as equivalent to "a business operation", not to be confused with database transactions).

Measurement techniques include databases, SQL scripts and tools to create/populate the databases and run the SQL scripts. The idea is to count how many transactions can be performed on a particular software + hardware configuration, and then divide the count of transactions by the cost of the hardware and software. (For example, if a configuration performs 10,000 transactions in the measured interval and costs $50,000, its price/performance score is $5 per transaction.)

Our TPC based tests

If you take a look at TPC results, you will observe that only large companies participate in them. The reason is that full-scale TPC testing is very expensive. Moreover, its target is high-productivity systems for the world's largest companies. (As an indication, the unit of measurement for the OLTP test TPC-C is "tpmC" – business transactions per minute.)

With Firebird and InterBase we tend to serve the low-end to middle segments of the database market, so it is reasonable to ask how TPC could benefit our databases.

First, TPC publishes all its test SQL scripts and the source code for its tools. They developed database structures and SQL scripts according to standards for enterprise-level DBMS. Examining these databases and scripts reveals a well-known structure – customers, orders, order lines and so on. Almost every database developer is familiar with such things.

So, we decided to get the TPC databases, scripts and tools, adapt them to InterBase and Firebird, and see just what this is all about and how it might be useful.

TPC-R

The TPC Benchmark™ R (TPC-R) is a decision support benchmark which allows additional optimizations based on advance knowledge of the queries. It consists of a suite of business-oriented queries and concurrent data modifications.

Now TPC-R is marked as obsolete at tpc.org, but we thought it still looked pretty useful for our goals. TPC-R contains the kinds of "heavy queries" which are hard for optimizers.

We plan to use TPC-R to test the optimizer workings and the performance of access methods in all new versions of InterBase and Firebird. The goal is to check that new versions are at least not worse than the old ones.

Also, TPC-R generates a high load on a system and can be considered useful as a stability test.

In fact, the test based on TPC-R for InterBase or Firebird contains a database creation script, a tool for populating the test database, and 22 query scripts. We measure the execution time of these queries and consider the query plans generated for them.

TPC-R is already showing very interesting results, which are available at http://ibdeveloper.com/tests/tpc-r/. The toolkit for TPC-R is also available there.

TPC-C

TPC-C is an on-line transaction processing benchmark. It is potentially much more useful because it enables us to test the new SMP capabilities in InterBase 7.x and the forthcoming Vulcan (and Firebird 3.0, of course). It simulates the work of a large number of clients inserting, updating and deleting records in several warehouses.

You can see some TPC-C based test results here: http://ibdeveloper.com/tests/tpc-c/. The toolkit for TPC-C is available there also.

The TPC based testing program

So, IBDeveloper magazine hereby announces the start of its TPC based test program. We will test each new release of InterBase and Firebird, and publish the results of the testing along with commentaries.

The toolkit for performing our tests is available for anyone interested in reproducing them. What's more, we are calling for testers – if you would like to help us by carrying out tests on your real-world hardware, we will assist you and publish the results on our test results page.

Thanks

The initial efforts in porting TPC-R and TPC-C to InterBase and Firebird were down to Aleksey Karyakin, developer of the mature ODBC driver Gemini (www.ibdatabase.com) and one of the former Yaffil developers. Currently we (the IBDeveloper team and Aleksey Karyakin) are continuing our work on the tests.


This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces, and maintenance.

