DB2 Text Search

DB2 Text Search provides full text search capabilities for data stored in DB2 tables. It allows indexing text columns and running searches on the indexed data using SQL or XQuery. The text indexes are stored separately from DB2 data on the file system. DB2 Text Search is installed as part of DB2 and supports features such as multiple languages, scoring, and incremental asynchronous index updates.


DB2 Text Search - In Action!

Somu Chakrabarty
Gamesys Limited
Session Code: E8
Tuesday, 17 November 2015 - Time 15:15 to 16:15
Platform: DB2 for LUW

Objectives
- Introduction to free-form text search options in DB2 for LUW
- Architecture, installation and configuration of DB2 Text Search
- Features and properties
- DB2 Text Search usage scenarios
- Comparison with NSE and migration from an existing NSE text index
- DB2 Text Search as used in IBM Content Manager
- Tips and troubleshooting for DB2 Text Search

Introduction

Introduction to full text search options in DB2

- Text Information Extender (TIE): the old option, now discontinued and removed completely.
- Net Search Extender (NSE): available from DB2 v7.2 onwards, changed in v9.7, deprecated since v10.1 and might be removed in a future release (still available as of v10.5 Fix Pack 6, though).
- DB2 Text Search (introduced in DB2 v9.5 FP1): the only strategic/recommended text search solution for DB2 at present. Available with all editions of DB2.
- DIY: an ad hoc programmatic solution, for example the word table method. Complicated, not easy to maintain and prone to errors.
- NSE and DB2 Text Search can coexist in the same database instance.

Introduction - Continued
Basics of DB2 Text Search

- Integrated with the DB2 server: bundled in the DB2 server install binaries and installed as an optional component through a custom install.
- Shares the level information from DB2.
- No additional license is required.
- Also available as a stand-alone server installation independent of the DB2 server (known as the ECM Text Search server), starting from v10.1.
- Supported with database partitioning, table range partitioning and MDC from v10.1.
- Provides extensive capabilities for searching large amounts of data stored in text columns of DB2 tables.

Introduction - Continued
Basics of DB2 Text Search - Continued

- Text indexes offer significant performance benefits compared to the SQL LIKE operator.
- DB2 Text Search searches using a text search index file. A text search index consists of significant terms/words that are extracted from the text documents.
- SCORE and CONTAINS functions.
- Documents can be unformatted plain text, rich text, formatted structured text such as XML or HTML, or proprietary document formats such as MS Office documents and PDF.
- SQL, SQL/XML and XQuery support.
- Linguistic processing in all supported languages, with optional synonym definitions.
- Text indexes are stored on a file system, separately from the DB2 data.

DB2 Text Search Server - the only strategic text search tool

[Diagram: integrated Text Search server and stand-alone Text Search server]

- Supported data types: CHAR, VARCHAR, LONG VARCHAR, CLOB, DBCLOB, BLOB, GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, XML.
- A UDF can convert an unsupported format into a supported format/type. This is useful for indexing documents that are stored in external, unsupported data stores, where the database table column contains only a pointer/reference.
- Each database instance can be associated with a single Text Search server.
- The language is specified with a supported locale code.

Architecture

DB2 Text Search Architecture

DB2 Text Search Architecture - Continued

- When a database is enabled for text search, catalog tables and views are created in the SYSIBMTS schema.
- Text indexes are managed as file system objects; the low-level representation of an index is known as a collection.
- After a text index is created on the user table, the staging table and log table are created automatically.
- Text indexes require that a primary key is defined on the corresponding table.
- Client applications can issue searches using SQL and XQuery statements.
The administration commands can be run on the


Indexing and Searching

db2ts START FOR TEXT
db2ts ENABLE DATABASE FOR TEXT
db2ts CREATE INDEX myschema.mytitleidx FOR TEXT ON books(title)
db2ts UPDATE INDEX myschema.mytitleidx FOR TEXT

SELECT author, year, SUBSTR(title,1,30)
FROM books
WHERE CONTAINS(title, 'mountain') = 1

SELECT title
FROM books
WHERE CONTAINS(title, 'mountain') = 1
ORDER BY SCORE(title, 'mountain') DESC

Installation and Configuration

Installation and Configuration of DB2 Text Search

- Configured per instance using the DB2 installer, db2icrt or the Text Search Configuration tool
- Text Search server port per instance
- Authentication token for security


Configuration of DB2 Text Search

Initial configuration or reconfiguration:
- Rerun the silent installation with a response file
- Rerun the GUI installation (for initial configuration only)
- Use configTool for manual configuration
- db2isetup (in GUI), db2iupdt -j, db2iupgrade -j, db2icrt -j
- configTool printAll displays the current configuration, if any
- DB2 UPDATE SQL on SYSIBMTS.TSSERVERS
- SYSTS_CONFIGURE applies certain text search server properties to the text search catalog

Features and Properties

Text Search Catalog and Administrative Views

Database-level views:
- SYSIBMTS.TSDEFAULTS
- SYSIBMTS.TSLOCKS
- SYSIBMTS.TSSERVERS

Text-index-level views:
- SYSIBMTS.TSINDEXES
- SYSIBMTS.TSCONFIGURATION
- SYSIBMTS.TSCOLLECTIONNAMES
- SYSIBMTS.TSEVENT_nnnnnn
- SYSIBMTS.TSSTAGING_nnnnnn

DB2 Text Search Index features and properties

- Ensure that your system has enough real memory available for the index update operation. Index updates require memory in addition to that required for any database buffer pools. If there is insufficient memory, the operating system uses paging space instead, which decreases search performance considerably.
- Index updates are asynchronous.
- Each text search index has a staging table, which is also known as a log table.
- Changes to the base table are captured in the staging table for the text search index.
- A trigger is created on each base text table (LOGTYPE BASIC).
- Index optimization: adminTool optimizeIndex.
- Partial integration with the DB2 server: some table operations are restricted when a text index is created and in an active state.
- Text indexes are buffered by the native OS file system cache, not in DB2 buffer pools.

DB2 Text Search Index features and properties

Text index size factors:
- the average size of the document
- the size of the document key (the primary key columns)
- the number of sortable fields
- the number and distribution of unique terms

Also allow for:
- Additional intermediate work space
- Log files
- Administrative objects (inside the database) for each text index:
  - A staging table
  - An optional auxiliary staging table
  - An event table
  - A trigger on the base text table

DB2 Text Search Index features and properties

Text Search index location

For an integrated Text Search server, configuration and collection metadata is stored in:
- instanceHome/sqllib/db2tss/config on UNIX
- instanceProfilePath\instance_name\db2tss\config on Windows

For stand-alone DB2 Text Search servers, the location of configuration files for collections is determined by the defaultDataDirectory parameter.

Configuration and collection directory:
- <ECMTS_HOME>\config\collections
- defaultDataDirectory\collection_name\data\text

Usage Scenarios

Creating and populating a Text Index

Create
db2 "CALL SYSPROC.SYSTS_CREATE('MYSCHEMA', 'MYTITLEIDX', 'myschema.books(title)', '', 'en_US', ?)"
db2ts "CREATE INDEX myschema.mytitleidx FOR TEXT ON myschema.books(title)"

Update
db2 "CALL SYSPROC.SYSTS_UPDATE('MYSCHEMA', 'MYTITLEIDX', '', 'en_US', ?)"
db2ts "UPDATE INDEX myschema.mytitleidx FOR TEXT"
db2 "CALL SYSPROC.SYSTS_ADMIN_CMD('update index myschema.mytitleidx for text', 'en_US', ?)"

DB2 Text Search Index Update

- Incremental and asynchronous index update.
- The text index update is asynchronous: it is not part of the same transaction that updated the data column in the base table.
- Text Search index update parameters:
  - USING UPDATE MINIMUM
  - ALLROWS
  - UPDATEAUTOCOMMIT
  - COMMITCYCLES
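
The asynchronous behaviour described above can be pictured with a small toy model in Python. This is not DB2 code: the class name ToyTextIndex, its methods and the "greater than or equal to" comparison against the update minimum are all assumptions made for illustration. It only shows the idea that changes land in a staging list first and become searchable after a separate update step.

```python
class ToyTextIndex:
    """Toy model of an asynchronously updated text index (illustrative, not DB2)."""

    def __init__(self, update_minimum=1):
        self.update_minimum = update_minimum  # assumed threshold semantics
        self.base_table = {}   # primary key -> text
        self.staging = []      # (operation, primary key), as a trigger would record
        self.index = {}        # term -> set of primary keys

    def insert(self, key, text):
        # The base-table change commits immediately; the index is not touched.
        self.base_table[key] = text
        self.staging.append(("insert", key))

    def delete(self, key):
        del self.base_table[key]
        self.staging.append(("delete", key))

    def update_index(self):
        """Apply staged changes; do nothing if too few changes are waiting."""
        if len(self.staging) < self.update_minimum:
            return 0
        processed = len(self.staging)
        for _, key in self.staging:
            # Drop old postings for this key, then re-index its current text.
            for keys in self.index.values():
                keys.discard(key)
            text = self.base_table.get(key)
            if text is not None:
                for term in text.lower().split():
                    self.index.setdefault(term, set()).add(key)
        self.staging.clear()
        return processed

    def contains(self, term):
        return sorted(self.index.get(term.lower(), set()))


idx = ToyTextIndex(update_minimum=2)
idx.insert(1, "Mountain walks")
print(idx.contains("mountain"))   # nothing indexed yet
print(idx.update_index())         # one staged change, below the minimum
idx.insert(2, "The mountain road")
print(idx.update_index())         # both staged changes are applied
print(idx.contains("mountain"))
```

The point of the sketch is the gap between insert and update_index: a row can exist in the base table and still be missing from search results until the next index update runs.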

Text Index Update Scheduling

- Specify an update frequency when creating or altering a text search index.
- To run every 30 minutes, on the hour and half hour:
  db2ts "create index indexname for text on tablecolumn update frequency D(*) H(*) M(0,30)"
- To change the schedule of an existing index:
  db2ts "alter index indexname for text update frequency D(*) H(*) M(10,30,50)"
- This automatically creates a scheduled task through the DB2 Administrative Task Scheduler.
- The Administrative Task Scheduler must be enabled on the DB2 instance: set the DB2_ATS_ENABLE registry variable to YES, TRUE, 1 or ON.
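
To make the D(...) H(...) M(...) notation concrete, here is a minimal Python sketch that parses such a specification and checks whether a given timestamp falls on the schedule. The function names parse_frequency and is_due are hypothetical, and the day numbering (0 = Sunday) is an assumption that should be checked against the db2ts CREATE INDEX documentation; this is not how DB2 itself evaluates the schedule.

```python
import re
from datetime import datetime


def parse_frequency(spec):
    """Parse 'D(...) H(...) M(...)' into sets of allowed values; '*' becomes None."""
    fields = {}
    for name, body in re.findall(r"([DHM])\s*\(([^)]*)\)", spec.upper()):
        body = body.strip()
        fields[name] = None if body == "*" else {int(v) for v in body.split(",")}
    missing = {"D", "H", "M"} - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return fields


def is_due(spec, when):
    """Return True if 'when' matches the day, hour and minute lists in spec."""
    f = parse_frequency(spec)
    day = (when.weekday() + 1) % 7  # assumption: 0 = Sunday, 6 = Saturday
    checks = ((f["D"], day), (f["H"], when.hour), (f["M"], when.minute))
    return all(allowed is None or value in allowed for allowed, value in checks)


print(is_due("D(*) H(*) M(0,30)", datetime(2015, 11, 17, 15, 30)))
print(is_due("D(*) H(*) M(0,30)", datetime(2015, 11, 17, 15, 15)))
```

With M(0,30) the first call matches (half past the hour) and the second does not (quarter past).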


Text Index Update Scheduling

- Update task status depends on the privileges of the user running the update command:
  - DBADMIN
  - Non-DBADMIN
  - SYSTS_MGR role
- Schedule task and task status:
  - SELECT message FROM sysibmts.<eventtable>
  - SELECT * FROM systools.admin_task_list
  - SELECT name, status FROM systools.admin_task_status
- Consider the initial update workload:
  - Create the text search index without a schedule (update frequency none)
  - Run the initial update manually with the update index command
  - Then add the schedule with the alter index command
- If you are doing massive inserts or updates, drop the NSE index before processing the updates if you can. Then recreate and update the index after the updates finish.

Text Index Backup and Restore considerations

- Text indexes (the actual text-search index collection content) are not included in DB2 backup and restore operations.
- A DB2 backup includes text-search index catalog data and text-search index administration data.

Text Index Backup and Restore considerations

- No online backup option for text index files.
- Backup and restore become complex when incremental scheduled updates are used.
- The following data must be synchronized for a backup:

Per index
- DB2
  - The database table that owns the text-search index
  - The staging table (and event tables) for the text-search index
- File system
  - Collection data for the text-search indexes
  - Text-search index metadata

Per database
- Catalog tables with metadata for DB2 Text Search

Per DB2 instance
- File system
  - Scheduler data
  - Text-search server metadata: configuration (including synonym dictionary and log)

DB2 Text Search in High-Availability setup

- Collection directory on a shared disk accessible from both primary and standby.
- Text Search index copy on the standby system.
- Standby system restore after failover.
- Identical file systems.
- The text-search server configuration on the standby system must match the configuration on the primary system, except for those configuration values that address system resource settings.
- Identify and treat interrupted text-search administrative operations, if any, after failover.

DB2 Text Search in a Hash Partitioned Database

- Multi-collection text index: a text index collection/partition for each database partition.
- One staging table per index, using the DBPARTITIONNUM function.
- A stand-alone remote text search server setup is recommended in partitioned database environments, to distribute the workload.
- To start the instance services, you must run the db2ts START FOR TEXT command on the integrated text search server host machine, which is the host of the lowest-numbered database partition server.
- For partition addition/deletion followed by REDISTRIBUTE DATABASE PARTITION GROUP, use the UPDATE INDEX FOR DATA REDISTRIBUTION option.
- Before inserting or deleting partition

Using DB2 Text Search

Text search with the CONTAINS function in DB2 SQL

- Basic search (with or without wildcards and synonyms):
  select title from books where contains(story, 'dragon wizard') = 1
  Optional QUERYLANGUAGE and RESULTLIMIT parameters in CONTAINS.
- Fuzzy search (with optional degree of similarity):
  select author, year, story from books where contains(story, 'cat~0.4') = 1
- Proximity search (with distance in number of words from each other):
  select author, year, substr(story,1,30) as title from books where contains(story, '"cat pigeon"~4') = 1
- Searching for special characters: escape the special character by adding a backslash before it.
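
When search arguments are assembled in application code, small helpers can keep the fuzzy, proximity and escaping syntax shown above consistent. The following Python sketch is illustrative only: the function names are hypothetical, and the SPECIAL_CHARS set is an assumed list that should be replaced with the characters the DB2 Text Search documentation actually names.

```python
# Assumed set of characters to escape; verify against the product documentation.
SPECIAL_CHARS = set('\\+-&|!(){}[]^"~*?:')


def escape_term(term):
    """Prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def fuzzy(term, similarity=None):
    """Build a fuzzy term such as cat~0.4 (the similarity is optional)."""
    base = escape_term(term) + "~"
    return base if similarity is None else f"{base}{similarity}"


def proximity(words, distance):
    """Build a proximity phrase such as "cat pigeon"~4."""
    phrase = " ".join(escape_term(w) for w in words)
    return f'"{phrase}"~{distance}'


def sql_literal(search_argument):
    """Wrap a search argument as an SQL string literal, doubling single quotes."""
    return "'" + search_argument.replace("'", "''") + "'"


print(fuzzy("cat", 0.4))
print(proximity(["cat", "pigeon"], 4))
print(sql_literal(proximity(["cat", "pigeon"], 4)))
print(escape_term("C++"))
```

The helpers only build strings; whether a given combination is accepted still depends on the Text Search server.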


Text Search usage and example - XML/XQuery

SQL XMLQuery
Find authors of all the books where the bookinfo field contains the word "range":
SELECT xmlquery('$bi//author' passing bookinfo as "bi") as author FROM books WHERE CONTAINS(bookinfo, 'range') = 1

XPath
Find author, year and title of all books containing the word "range" in the story field of bookinfo:
SELECT author, year, SUBSTR(title,1,30) as title FROM books WHERE CONTAINS(bookinfo, '@xpath:''/bookinfo/story[. contains("range")]''') = 1

XQuery
Find the author of all books containing the word "range" in the story field of bookinfo:
xquery db2-fn:xmlcolumn-contains('BOOKS.BOOKINFO', '@xpath:''/bookinfo/story[. contains("range")]''')/bookinfo/author

DB2 Text Search Administration

Five command-line tools are included with DB2 Text Search to facilitate its use:
- The Configuration tool
- The Administration tool
- The Synonym tool
- The Stop Word tool
- The Log Formatter tool

Administration tasks:
- Stop and start the DB2 Text Search service
- Enable or disable a database for Text Search, drop an index
- Cleanup for Text: delete orphaned DB2 Text Search collections
- Add or remove a synonym dictionary
- Clear Events for Index for Text
- Alter the update properties of a Text Search index (db2ts ALTER INDEX or SYSPROC.SYSTS_ALTER)
- Viewing Text Search Index status by adminTool or from

Text Search Stored Procedures


Enable a database - SYSPROC.SYSTS_ENABLE
Configure a database - SYSPROC.SYSTS_CONFIGURE
Disable a database - SYSPROC.SYSTS_DISABLE
Create a text index - SYSPROC.SYSTS_CREATE
Update a text index - SYSPROC.SYSTS_UPDATE
Alter a text index - SYSPROC.SYSTS_ALTER
Drop a text index - SYSPROC.SYSTS_DROP
Clear events for a text index - SYSPROC.SYSTS_CLEAR_EVENTS
Clear command locks - SYSPROC.SYSTS_CLEAR_COMMANDLOCKS
Reset pending status - SYSPROC.SYSTS_ADMIN_CMD
Cleanup inactive indexes - SYSPROC.SYSTS_CLEANUP

DB2 Text Search Security Model

- No need for database privileges for the instance owner.
- Not necessary for the fenced user to be in the same primary group as the instance owner.
- 3 new system roles (granted or revoked by SECADM):
  - SYSTS_ADM (to execute text search operations on database level)
  - SYSTS_MGR (to execute text search operations on text index level)
  - SYSTS_USR (access to text search catalog data)
- Roles are automatically granted to the database creator at database creation time.
- Example: to create a text index, the privileges needed are:
  - Base table privileges
  - SYSTS_MGR role

DB2 Text Search Security Model

User                      | Role      | Operation
Text Search Administrator | SYSTS_ADM | Enable, Disable, Clear command locks (all), Configure
Text Search Manager       | SYSTS_MGR | Create, Update, Alter, Drop, Clear Events, Clear command locks (per index), Reset Pending
Text Search User          | SYSTS_USR | Limited access to the text search SYSIBMTS catalog

Typical users of Text Search are:
- Text Search Server Administrator (configure, start, stop)
- Text Search Administrator (enable/disable database, clear command locks)
- Text Search Index Manager (create, update, alter, drop, clear events of an index)
- Database users performing text search queries through SQL and XQuery (if a user can issue a SELECT statement on a given table, the user can also perform a text search on that table)

DB2 Text Search 10.5 Enhancements Summary

Configuration capabilities
- Reduced impact of indexing on search.
- Stronger support for multilingual collections.
- Support for embedded documents, archive files and compressed files.
- Enhanced configuration capabilities.

Committing batches (also from v10.1 FP3)
- More options for finer control of update processing.
- The number of commit cycles that should be completed during one update session.
- Commit size is based on the number of rows or the time passed (in hours).

Setting manual command locks (also from v10.1 FP3)

Index configuration options (also from v10.1 FP3)
- Two new index configuration options, INITIALMODE and LOGTYPE.

NSE and Text Search Comparison

Comparisons between NSE and DB2 Text Search

- NSE is a plug-in application for DB2 and must be installed separately; it comes as a separate install binary.
- If a database is already enabled for Net Search Extender and you want to use Text Search in that database, you can use the index coexistence feature to query the database.
- Command executable: db2ts for Text Search and db2text for NSE.
- The main comparative features are given in the table below:

Function/Feature                     | NSE                | DB2 Text Search
Database Partitioning                | Yes                | Yes
Range-Partitioned table              | Yes                | Yes
Caching                              | No                 | No
Synonym dictionary                   | Yes                | Yes
Text, HTML, XML, INSO                | Yes                | Yes
Linguistic processing                | Yes (English only) | Yes
Contains/Score function, Resultlimit | Yes*               | Yes*
Free Text Search                     | Yes                | No
Document Model                       | Yes                | No
Number of matches                    | Yes                | No

Comparisons between NSE and DB2 Text Search

Function/Feature                                            | NSE       | DB2 Text Search
Highlights                                                  | No        | No
Fuzzy, Proximity, Precise and Boolean Search                | Yes       | Yes
Stop-word processing                                        | Yes       | Yes
Attribute Search                                            | Yes       | No
Start command                                               | START     | START FOR TEXT
Stop command                                                | STOP      | STOP FOR TEXT
Enable/Disable database for text                            | Common    | Common
Create Index for text                                       | Similar*  | Similar*
Drop Index for text                                         | Common    | Common
Alter Index for text                                        | Similar*  | Similar*
Boolean Operators                                           | &, |, NOT | AND (&&), OR (||), NOT (-)
Stored Procedure search and SQL Table-Valued function search | Yes       | No
Stored Procedure Admin commands                             | No        | Yes
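
The Boolean operator row is one of the places where existing NSE search strings need attention during migration. As a rough illustration only, the Python sketch below rewrites & and | into AND and OR while leaving quoted phrases untouched. The function name nse_to_text_search is hypothetical, it deliberately ignores every other syntax difference between the two products, and its output should be treated as a starting point for manual review rather than a converter.

```python
import re


def nse_to_text_search(query):
    """Naively map the NSE operators & and | to AND and OR outside quoted phrases."""
    # Split on double-quoted phrases; the capturing group keeps them in the result.
    parts = re.split(r'("[^"]*")', query)
    converted = []
    for part in parts:
        if part.startswith('"'):
            converted.append(part)  # quoted phrase: leave untouched
        else:
            part = re.sub(r"\s*&\s*", " AND ", part)
            part = re.sub(r"\s*\|\s*", " OR ", part)
            converted.append(part)
    return "".join(converted)


print(nse_to_text_search('"cat" & "dog" | "a&b"'))
```

Characters inside a quoted phrase, such as the & in "a&b", are not rewritten.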

Migration from NSE to DB2 Text Search

- Prepare for application migration
- Prepare for index migration
  - Start the IBM Text Search Server for DB2.
  - Enable the database for DB2 Text Search in addition to Net Search Extender.
- Switch the application version and activate the new DB2 Text Search index
- Test
- After validating that DB2 Text Search works properly as the primary text index, drop the Net Search Extender text indexes

DB2 Text Search in IBM Content Manager

- Content Manager attributes, resource items and documents are text-searchable.
- Text search is available from IBM Content Manager Enterprise Edition V8.5 onwards (earlier CM releases supported NSE only).
- Set up as part of Library Server configuration.
- Define text search options from the CM System Admin console:
  - Item Type
  - Attribute
- Can also be set up manually from the DB2 server using db2ts commands (enable database, create index, update index).
- The ICMADMIN.ICMSTTEXTINDEXCONF table contains text search information for each text-searchable column.

DB2 Text Search in IBM Content Manager

- The Update Index process invokes the ICMConstrRef user-defined function and the ICMDCTOR document constructor plug-in (equivalent to the NSE ICMFetchFilter).
- The document constructor plug-in retrieves and filters content directly from the resource manager application (instead of through DB2).
- Instead of indexing a column directly, the system uses a reference to the object's location on a resource manager.
- Text resource item type view (ICMPARTS) with attribute TIEREF:

SELECT T.* FROM
( SELECT DISTINCT DOC_CAT_08_1.ITEMID, DOC_CAT_08_1.COMPONENTID, DOC_CAT_08_1.VERSIONID,
1254 AS COMPONENTTYPEID, 1090 AS ITEMTYPEID
FROM ICMUT09954001 DOC_CAT_08_1, ICMUT09955001 ICMParts1090_5
WHERE (((((DOC_CAT_08_1.COMPONENTID = ICMParts1090_5.PARENTCOMPID) AND
(DOC_CAT_08_1.VERSIONID = ICMParts1090_5.VERSIONID)) AND
(contains(ICMParts1090_5.TIEREF, '"IMPULSE" ') = 1)) AND

Tips and Troubleshooting

Search Performance

- Try to avoid multiple CONTAINS clauses in the same SQL statement.
- Always use a CONTAINS clause when using a SCORE clause.
- Use the same search argument in CONTAINS that is used in SCORE.
- Use the RESULTLIMIT clause cautiously.
- Use the queryExpansionLimit configuration parameter to set the wildcard expansion limit.
- Use the VARCHAR data type when possible, because CLOB or LONG types can slow text indexing down.
- Check the access path of the SQL and investigate if it is slow:

SELECT T.* FROM
( SELECT DISTINCT DOC_CAT_08_1.ITEMID, DOC_CAT_08_1.COMPONENTID, DOC_CAT_08_1.VERSIONID,
1254 AS COMPONENTTYPEID, 1090 AS ITEMTYPEID
FROM ICMUT09954001 DOC_CAT_08_1, ICMUT09955001 ICMParts1090_5
WHERE (((((DOC_CAT_08_1.COMPONENTID = ICMParts1090_5.PARENTCOMPID) AND
(DOC_CAT_08_1.VERSIONID = ICMParts1090_5.VERSIONID)) AND
(contains(ICMParts1090_5.TIEREF, '"IMPULSE" ') = 1)) AND
(DOC_CAT_08_1.ATTR0000001052 <> 'Deleted')) AND.
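
One way to follow the tip about using the same search argument in CONTAINS and SCORE is to generate both from a single value. The Python sketch below is a hypothetical helper (the name ranked_search_sql is not part of any DB2 API) that builds the statement text only. It does not validate table or column names, so it assumes those come from trusted code rather than user input.

```python
def ranked_search_sql(table, column, search_argument, select_columns):
    """Build a ranked text search query that reuses one search argument."""
    # Double any single quotes so the argument is a valid SQL string literal.
    literal = "'" + search_argument.replace("'", "''") + "'"
    cols = ", ".join(select_columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE CONTAINS({column}, {literal}) = 1 "
        f"ORDER BY SCORE({column}, {literal}) DESC"
    )


print(ranked_search_sql("books", "title", "mountain", ["title"]))
```

For the books example used earlier in the deck, this produces the same CONTAINS and SCORE pair as the hand-written query.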

DB2 Text Search Troubleshooting

- If expected data is not available in text search results, check that the index update was executed:
  - Verify that the DB2_ATS_ENABLE registry variable is set and the database is active.
  - Check the event table for error messages.
  - For incremental updates, determine whether the number of entries in the staging tables exceeds the specified update minimum.
  - Verify the scheduled task in SYSTOOLS.ADMIN_TASK_LIST and the task status in SYSTOOLS.ADMIN_TASK_STATUS.
- Use the UDF trace file in the path defined by the CM Library Server UDFTRACEFILENAME parameter to view and correct indexing errors related to the ICMConstrRef UDF.
- Each index update failure is recorded in the DB2 Text Search event table of the index that is being updated automatically and incrementally.
- Use the document constructor log file (cmdctor.log) to find Content Manager messages that are related to text

DB2 Text Search Troubleshooting

- Insufficient memory error message while updating a large number of documents: decrease the documentqueueresultsize value in sysibmts.tsdefaults.
- To resolve performance issues, monitor the input and output queues by starting the DB2 Text Search server with the monitorQueues flag in the startup script.
- Monitoring information is stored in the InputQueueSizes.csv and OutputQueueSizes.csv files, located in the Text Search logs directory.
- adminTool version command to generate a diagnostics log.
- adminTool configureTrace -trace on command to enable tracing.
- DB2 trace facility: db2trc

Correcting Text Index failure in Content Manager

Re-indexing documents which failed to index earlier:
- Ensure that the RM and the Web server are available
- Find the component type id (xxxx) of those documents
- db2 update ICMUT0xxxx001 set TIEREF=TIEREF
- Update the text index manually

Summary

- DB2 Text Search is the only strategic/recommended text search solution at present, but NSE (deprecated) is still available as of the latest DB2 LUW version.
- Used for searching large amounts of data stored in text columns of DB2 tables and in IBM Content Manager documents.
- Different search techniques and methods are available.
- Stored procedures for Text Search management and administration.
- Most popular use of NSE and Text Search are in

Questions?
Thank You
