DB2 Text Search
Somu Chakrabarty
Gamesys Limited
Session Code: E8
Tuesday, 17 November 2015 - Time 15:15 - 16:15
Platform: DB2 for LUW
Objectives
Introduction to free-form text search options in DB2 for LUW
Architecture, installation and configuration of DB2 Text Search
Features and properties
DB2 Text Search usage scenarios
Comparison with NSE and migration from an existing NSE text index
DB2 Text Search used in IBM Content Manager
Tips and troubleshooting for DB2 Text Search
Introduction
Introduction - Continued
Basics of DB2 Text Search
Basics of DB2 Text Search - Continued
Text indexes offer significant performance benefits compared to the SQL LIKE operator.
DB2 Text Search searches using a text search index file. A text search index consists of significant terms/words that are extracted from the text documents.
SCORE and CONTAINS functions.
Documents can be unformatted plain text, rich text, formatted structured text (such as XML or HTML), or proprietary document formats (such as MS Office documents or PDF).
SQL, SQL/XML, and XQuery support.
Linguistic processing in all supported languages, with optional synonym definitions.
Text indexes are stored on a file system, not in a
Integrated Text Search server
Standalone Text Search server
Architecture
Installation and Configuration
Features and Properties
Database-level views:
SYSIBMTS.TSDEFAULTS
SYSIBMTS.TSLOCKS
SYSIBMTS.TSSERVERS
Text-index-level views:
SYSIBMTS.TSINDEXES
SYSIBMTS.TSCONFIGURATION
SYSIBMTS.TSCOLLECTIONNAMES
SYSIBMTS.TSEVENT_nnnnnn
SYSIBMTS.TSSTAGING_nnnnnn
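These administrative views can be queried like any other view. A minimal sketch, assuming the database is already enabled for text search; the index name is borrowed from the update example in this deck, and the INDSCHEMA and INDNAME column names are an assumption that should be checked against the view definition before use:

```sql
-- Database-level defaults that apply to newly created text indexes
SELECT * FROM SYSIBMTS.TSDEFAULTS;

-- One row per text index defined in this database
SELECT * FROM SYSIBMTS.TSINDEXES;

-- Narrow the listing to a single index
-- (INDSCHEMA / INDNAME are assumed column names; verify with DESCRIBE)
SELECT *
FROM SYSIBMTS.TSINDEXES
WHERE INDSCHEMA = 'MYSCHEMA'
  AND INDNAME = 'MYTITLEIDX';
```

The nnnnnn suffix of the event and staging objects is specific to each index, so it is left as a placeholder here.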
Objects created along with a text index:
A staging table
An optional auxiliary staging table
An event table
Triggers on the base text table
Usage Scenarios
Update
Three ways to run an index update:
db2 "CALL SYSPROC.SYSTS_UPDATE('MYSCHEMA', 'MYTITLEIDX', '', 'en_US', ?)"
db2ts UPDATE INDEX myschema.mytitleidx FOR TEXT
db2 "CALL SYSPROC.SYSTS_ADMIN_CMD('UPDATE INDEX myschema.mytitleidx FOR TEXT', 'en_US', ?)"
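An index update is one step in a longer lifecycle. The sketch below walks the stored-procedure route from enabling the database to the first update. It assumes a table MYSCHEMA.BOOKS with a TITLE column (pairing that column with this index name is a guess based on the name), and it uses the two-argument form of SYSTS_ENABLE; check the procedure signatures for your DB2 level before relying on it.

```sql
-- 1. Enable the current database for text search (done once per database)
CALL SYSPROC.SYSTS_ENABLE('en_US', ?);

-- 2. Create a text index on the TITLE column
--    (table and column are assumptions for illustration; empty options string)
CALL SYSPROC.SYSTS_CREATE('MYSCHEMA', 'MYTITLEIDX',
                          'MYSCHEMA.BOOKS (TITLE)', '', 'en_US', ?);

-- 3. Populate the index; until this runs, searches return nothing
CALL SYSPROC.SYSTS_UPDATE('MYSCHEMA', 'MYTITLEIDX', '', 'en_US', ?);
```

Run from the CLP with a statement terminator (for example db2 -t), the ? marker receives the output message of each procedure.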
DBADM
Non-DBADM
SYSTS_MGR role
DB2
The database table that owns the text-search index
The staging table (and event tables) for the text-search index
File system
Collection data for the text-search indices
Text-search index metadata
Per database
File system
Scheduler data
Text-search server metadata and configuration (including synonym dictionary and log)
XPath
Find the author, year and title of all books containing the word "range" in the story field of bookinfo:
SELECT author, year, SUBSTR(title,1,30) AS title FROM books WHERE
CONTAINS(bookinfo, '@xpath:''/bookinfo/story[. contains("range")]''') = 1
XQuery
Find the author of all books containing the word "range" in the story field of bookinfo:
xquery db2-fn:xmlcolumn-contains('BOOKS.BOOKINFO', '@xpath:''/bookinfo/story[. contains("range")]''')/bookinfo/author
Configuration tool
Administration tool
Synonym tool
Stop Word tool
Log Formatter tool
Role
Operation
SYSTS_ADM
SYSTS_MGR
SYSTS_USR
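The three roles are ordinary database roles, so they can be handed out with the standard GRANT ROLE statement. A small sketch, in which the user names APPUSER and REPORTUSER are hypothetical:

```sql
-- Let a non-DBADM user manage text indexes in this database
-- (APPUSER is a hypothetical authorization ID)
GRANT ROLE SYSTS_MGR TO USER APPUSER;

-- Give a second hypothetical user the SYSTS_USR role
GRANT ROLE SYSTS_USR TO USER REPORTUSER;
```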
NSE and Text Search Comparison
NSE is a plug-in application for DB2 and must be installed separately; it comes as a separate install binary.
If a database is already enabled for Net Search Extender and you want to use Text Search in that database, you can use the index coexistence feature to query the database.
Command executables: db2ts for Text Search, db2text for NSE.
The main features are compared in the table below:
Function/Feature | NSE | DB2 Text Search
Database Partitioning | Yes | Yes
Range-Partitioned table | Yes | Yes
Caching | No | No
Synonym dictionary | Yes | Yes
(row label missing) | Yes | Yes
Linguistic processing | Yes (English Only) | Yes
(row label missing) | Yes* | Yes*
(row label missing) | Yes | No
Document Model | Yes | No
Number of matches | Yes | No
NSE
DB2 Text Search
Highlights
No
No
Yes
Yes
Stop-word processing
Yes
Yes
Attribute Search
Yes
No
Start command
START
Stop command
STOP
Common
Common
Similar*
Similar*
Common
Common
Similar*
Similar*
Boolean Operators
&, |, NOT
Yes
No
No
Yes
Tips and Troubleshooting
Search Performance
Try to avoid multiple CONTAINS predicates in the same SQL statement.
Always use a CONTAINS predicate when using the SCORE function.
Use the same search argument in CONTAINS as in SCORE.
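The last two tips can be sketched in one query. This assumes the books table from the XPath example has a text index on its title column; the search term 'database' is made up for illustration:

```sql
-- SCORE ranks the rows; CONTAINS with the identical search argument
-- restricts the result to rows that actually match.
SELECT author,
       SUBSTR(title, 1, 30) AS title,
       SCORE(title, 'database') AS relevance
FROM books
WHERE CONTAINS(title, 'database') = 1
ORDER BY relevance DESC;
```

Keeping the column and search string identical in both functions keeps the ranking consistent with the filter, and sorting by the score in descending order puts the best matches first.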
Summary
DB2 Text Search is currently the only strategic/recommended text search solution; NSE is deprecated but still available in the latest DB2 LUW version.
Used for searching large amounts of data stored in text columns of DB2 database tables and in IBM Content Manager documents.
Different search techniques and methods are available.
Stored procedures for Text Search management and administration.
Most popular use of NSE and Text Search are in
Questions?
Thank You
Somu Chakrabarty
Gamesys Limited
[email protected]
DB2 Text Search - In Action!
Session E8