OpenText Documentum XPlore 1.6 Administration and Development Guide
xPlore
Version 1.6
Intended Audience
This guide contains information for xPlore administrators who configure xPlore and Java developers
who customize xPlore:
• Configuration is defined for support purposes as changing an XML file or an administration
setting in the UI.
• Customization is defined for support purposes as using xPlore APIs to customize indexing and
search. The xPlore SDK is a separate download that supports customization.
You must be familiar with the installation guide, which describes the initial configuration of the
xPlore environment. When Documentum functionality is discussed, this guide assumes familiarity
with Documentum Content Server administration.
Features
Documentum xPlore is a multi-instance, scalable, high-performance, full-text index server that can be
configured for high availability and disaster recovery.
The xPlore architecture is designed with the following principles:
• Uses standards as much as possible, like XQuery
• Uses open source tools and libraries, like Lucene
• Supports virtualization, with accompanying lower total cost of ownership.
• Enterprise readiness: High availability, backup and restore, analytics, performance tuning, reports,
diagnostics and troubleshooting, administration GUI, and configuration and customization points.
• Integration readiness: Full support for XML namespaces, including indexing, storing, and querying XML
documents with namespaces, configuring index paths that contain namespaces, and executing
XQueries with namespace mappings.
Indexing features
Collection topography: xPlore supports creating collections online, and collections can span multiple
file systems.
Transactional updates and purges: xPlore supports transactional updates and purges of indexes.
Multithreaded insertion into indexes: xPlore ingestion through multiple threads supports vertical
scaling on the same host.
Dynamic allocation and deallocation of capacity: For periods of high ingestion, you can add a CPS
instance deployed on a higher performance machine. You can also move a collection to another
node for better performance.
Temporary high query load: For high query load, like a legal investigation, add an xPlore instance for
the search service and bind collections to it in read-only mode.
Growing ingestion or query load: If your ingestion or query load increases due to growing business,
you can add instances as needed.
Extensible indexing pipeline using the open-source UIMA framework.
Configurable stop words and special characters.
Search features
Case sensitivity: xPlore queries are lower-cased (rendered case-insensitive).
Full-text queries: To query metadata, set up a specific index on the metadata.
Faceted search: Facets in xPlore are computed over the entire result set or over a configurable number
of results.
Security evaluation: When a user performs a search, permissions are evaluated for each result. Security
can be evaluated in the xPlore full-text engine before results are returned to Content Server, resulting
in faster query results. This feature is turned on by default and can be configured or turned off.
Native XQuery syntax: The xPlore full-text engine supports XQuery syntax.
Thesaurus search to expand query terms.
Fuzzy search finds misspelled words or letter reversals.
Boost specific metadata in search results.
Extensive testing and validation of search on supported languages.
Administration features
Multiple instance configuration and management.
Reports on ingestion metrics and errors, search performance and errors, and user activity.
Collections management: Creating, configuring, deleting, binding, routing, rebuilding, querying.
Command-line interface for automating data management (such as final merge), backup, and restore.
Limitations
ACLs and aspects are not searchable by default
ACLs and aspects are not searchable by default, to protect security. You can reverse the default by
editing indexserverconfig.xml. Set full-text-search to true in the subpath definition for acl_name and
r_aspect_name and then rebuild indexes of your data collections.
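You can also script this edit. The following minimal sketch uses the JDK DOM API to set full-text-search to true on matching subpath definitions. It assumes that each subpath is a sub-path element whose path attribute ends with the element name (for example, dmftmetadata//acl_name). Check these names against your own indexserverconfig.xml before you rely on it. The class and method names are hypothetical, and you still need to rebuild the data collections afterward.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: makes selected subpaths full-text searchable in indexserverconfig.xml content. */
class SubPathToggle {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration", e);
        }
    }

    /** Sets full-text-search="true" on every sub-path whose last path step is in elementNames. */
    static int enableFullText(Document doc, Set<String> elementNames) {
        // Assumption: subpaths look like <sub-path ... path="dmftmetadata//acl_name"/>.
        NodeList subPaths = doc.getElementsByTagName("sub-path");
        int changed = 0;
        for (int i = 0; i < subPaths.getLength(); i++) {
            Element subPath = (Element) subPaths.item(i);
            String path = subPath.getAttribute("path");
            String leaf = path.substring(path.lastIndexOf('/') + 1);
            if (elementNames.contains(leaf)) {
                subPath.setAttribute("full-text-search", "true");
                changed++;
            }
        }
        return changed;
    }

    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot write configuration", e);
        }
    }
}
```

Call enableFullText(doc, Set.of("acl_name", "r_aspect_name")). The DOM round trip can reorder attributes, so compare the result with the original before you replace the file.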
Batch failure
Indexing requests are processed in batches. If one request in a batch fails while the index is being
written to xDB, the entire batch fails.
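A toy model makes the all-or-nothing behavior concrete. This sketch is not xPlore code. The in-memory list stands in for the xDB index, and the writeSucceeds predicate is a hypothetical stand-in for whatever causes a write to fail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of all-or-nothing batch writes; not xPlore code. */
class BatchWriter {
    private final List<String> index = new ArrayList<>();

    /** Writes every request in the batch, or none of them if any single request fails. */
    boolean writeBatch(List<String> batch, Predicate<String> writeSucceeds) {
        List<String> staged = new ArrayList<>();
        for (String request : batch) {
            if (!writeSucceeds.test(request)) {
                return false; // one failure discards the staged writes for the whole batch
            }
            staged.add(request);
        }
        index.addAll(staged); // commit only after every request succeeded
        return true;
    }

    List<String> indexed() {
        return List.copyOf(index);
    }
}
```

In this model, the good documents in a failed batch are not indexed either, which is the limitation described above.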
Lemmatization
• xPlore supports lemmatization, but you cannot configure the parts of speech that are lemmatized.
• The part of speech for a word can be misidentified when there is not enough context. Workaround:
Enable alternative lemmatization if you have disabled it (see Configuring indexing lemmatization,
page 113).
• Punctuation at the end of a sentence is included in the lemmatization of the last word. For example,
the phrase "Mary likes swimming and dancing" is lemmatized differently depending on whether it ends
with a period. Without the period, dancing is identified as a verb with the lemma dance. With the
period, it is identified as a noun with the lemma dancing, so a search for the verb dance does not find
the document when the word is at the end of the sentence. The likelihood of errors in part-of-speech
(POS) tagging increases with sentence length. Workaround: Enable alternative lemmatization.
Multilingual support
By default, a document that contains multiple languages is indexed using only one language. In this
version of xPlore, you can enable a feature that supports multiple languages in one document. Use the
following configuration at the domain level to enable the feature:
<property value="true" name="split-text-in-mixed-lang"/><!-- Enable splitting text for content -->
<property value="true" name="index-metadata-in-batch"/>
After you enable the feature, different language units in the same document are analyzed according
to their own languages.
Phrase searches
Search fails for parts of common phrases. A common phrase like because of, a good many, or status
quo is tokenized as a phrase and not as individual words, so a search for a single word in the phrase,
such as because, fails. Use the following workaround:
In the linguistic_processor with name rlp, set force_tokenize_on_whitespace to true.
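A toy tokenizer shows why the workaround helps. It assumes a hard-coded phrase list taken from the examples above. The real phrase handling belongs to the rlp linguistic processor and is not shown here. With forceTokenizeOnWhitespace set to false, "because of" becomes one token and a search for because finds nothing. With it set to true, every word is its own token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of phrase tokenization; not the rlp linguistic processor. */
class PhraseTokenizer {
    // Hypothetical phrase list built from the examples in the text.
    private static final Set<String> COMMON_PHRASES = Set.of("because of", "a good many", "status quo");
    private static final int LONGEST_PHRASE = 3; // words in "a good many"

    static List<String> tokenize(String text, boolean forceTokenizeOnWhitespace) {
        List<String> words = Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int phraseLength = 0;
            if (!forceTokenizeOnWhitespace) {
                // Prefer the longest known phrase that starts at word i.
                for (int len = Math.min(LONGEST_PHRASE, words.size() - i); len >= 2; len--) {
                    if (COMMON_PHRASES.contains(String.join(" ", words.subList(i, i + len)))) {
                        phraseLength = len;
                        break;
                    }
                }
            }
            if (phraseLength > 0) {
                tokens.add(String.join(" ", words.subList(i, i + phraseLength)));
                i += phraseLength;
            } else {
                tokens.add(words.get(i));
                i++;
            }
        }
        return tokens;
    }

    /** A single-word search matches only if the word is a whole token. */
    static boolean matches(String text, String word, boolean forceTokenizeOnWhitespace) {
        return tokenize(text, forceTokenizeOnWhitespace).contains(word.toLowerCase(Locale.ROOT));
    }
}
```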
Chinese
Space in query causes incorrect tokenization
A space within a Chinese term is treated in DQL as white space, so the term is split into separate words
and a search for the string fails. For example, the term 中国 近代 is treated as 中国 AND 近代, and a search for 中国近代 fails.
Dictionary must be customized for Chinese name and place search
For Chinese documents, the names of persons and places cannot be searched. To be found, they must
be added to the Chinese dictionary in xPlore. See Adding dictionaries to CPS, page 130. You can
also use the following workaround:
In the linguistic_processor with name rlp, set generate_cjk_components to true and enable query
with components for CJK.
Administration differences
xPlore has an administration console. FAST does not. Many features in xPlore are configurable
through xPlore administrator. These features were not configurable for FAST. Additionally,
administrative tasks are exposed through Java APIs.
Ports required: During xPlore instance configuration, the installer prompts for the HTTP port for the
WildFly instance (base port). The installer validates that the next 100 consecutive ports are available.
During index agent configuration, the installer prompts for the HTTP port for the index agent WildFly
instance and validates that the next 20 consecutive ports are available. FAST used 4000 ports.
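Before you run the installer, you can check a port range yourself. This minimal sketch is not the installer's own validation. It reads "the next N consecutive ports" as the base port plus the N ports after it, which is an assumption, and tries to bind each port on the local host. A port that another process is already listening on fails to bind.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.OptionalInt;

/** Hypothetical pre-installation check; not the xPlore installer's own validation. */
class PortRangeCheck {
    /** Checks basePort and the next `following` ports; returns the first one that cannot be bound. */
    static OptionalInt firstBusyPort(int basePort, int following) {
        int last = basePort + following;
        if (basePort < 1 || following < 0 || last > 65535) {
            throw new IllegalArgumentException("port range out of bounds");
        }
        for (int port = basePort; port <= last; port++) {
            if (!isFree(port)) {
                return OptionalInt.of(port);
            }
        }
        return OptionalInt.empty();
    }

    /** True if a server socket can bind the port right now, that is, nothing else listens on it. */
    static boolean isFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, call firstBusyPort(basePort, 100) for an xPlore instance and firstBusyPort(agentPort, 20) for the index agent. The result is only a snapshot: a port can be taken after the check runs.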
High availability: xPlore supports N+1, active/passive with clusters, and active/active shared data
configurations. FAST supports only active/active. xPlore supports spare indexing instances that
are activated when another instance fails. The Documentum xPlore Installation Guide describes
high availability options for xPlore.
Disaster recovery: xPlore supports online backup. FAST supports only offline (cold) backup.
Storage technology: xPlore supports SAN and NAS. FAST supports SAN only.
Virtualization: xPlore runs in VMware environments. FAST does not.
64-bit address space: 64-bit systems are supported in xPlore but not in FAST.
xPlore requires less temporary disk space than FAST. In addition to the index itself, xPlore requires
free space equal to twice the index space used by all collections. This space is used for merges and
optimizations. FAST requires 3.5 times the space.
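As a worked example of this rule: if the indexes of all collections occupy 100 GB, xPlore needs another 2 × 100 GB = 200 GB of free space for merges and optimizations, so plan for 300 GB in total. The hypothetical helper below encodes the same arithmetic. The factor comes only from the guideline in this section.

```java
/** Disk-space estimate from the sizing rule above; a hypothetical helper. */
class IndexSpaceEstimate {
    static final double TEMP_SPACE_FACTOR = 2.0; // twice the index space of all collections

    /** Free space needed for merges and optimizations, in the same unit as indexSize. */
    static double temporarySpace(double indexSize) {
        if (indexSize < 0) throw new IllegalArgumentException("indexSize must be >= 0");
        return TEMP_SPACE_FACTOR * indexSize;
    }

    /** Index plus temporary space. */
    static double totalSpace(double indexSize) {
        return indexSize + temporarySpace(indexSize);
    }
}
```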
Indexing differences
Back up and restore: xPlore supports warm backups.
High availability: xPlore automatically restarts content processing after a CPS crash. After a VM
crash, the xPlore watchdog sends an email notification.
Transactional updates and purges: xPlore supports transactional updates and purges. FAST does not.
Collection topography: xPlore supports creating collections online, and collections can span multiple
file systems. FAST does not support these features.
Lemmatization: FAST supports configuration for which parts of speech are lemmatized. In xPlore,
lemmatization is enabled or disabled. You can configure lemmatization for specific Documentum
attribute values.
Search differences
One-box search: Searches from the Webtop client default to ANDed query terms in xPlore.
Query a specific collection: Targeted queries are supported in xPlore but not FAST.
Folder descend: Queries are optimized in xPlore but not in FAST.
Results ranking: FAST and xPlore use different ranking algorithms.
Excluding from index: xPlore allows you to configure non-indexed metadata to save disk space
and improve ingestion and search performance. With this configuration, the number of hits differs
between FAST and xPlore queries on the non-indexed content. For example, if xPlore does not index
docbase_id, a full-text search on "256" returns no hits in xPlore. In FAST, the same search returns all
indexed documents for the repository whose ID is 256.
Security evaluation: Security is evaluated by default in the xPlore full-text engine before results are
returned to Content Server, resulting in faster query results. FAST returns results to the Content Server,
resulting in many hits that the user is not able to view.
Underprivileged user queries: Optimized in xPlore but not in FAST.
Native XQuery syntax: Supported by xPlore.
Facets: Facets are limited to 350 hits in FAST, but xPlore supports many more hits.
Special characters: Special character lists are configurable. The default in xPlore differs from FAST
when terms such as email addresses or contractions are tokenized. For example, in FAST, an email
address is split up into separate tokens with the period and @ as boundaries. However, in xPlore,
only the @ serves as the boundary, since the period is considered a context character for part of
speech identification.
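The difference is easiest to see side by side. This sketch models only the boundary characters described above and ignores every other tokenization rule. The address is made up.

```java
import java.util.Arrays;
import java.util.List;

/** Compares the two email-address boundaries described above; not either product's tokenizer. */
class EmailTokens {
    /** FAST default as described: both '@' and '.' act as token boundaries. */
    static List<String> splitOnAtAndPeriod(String address) {
        return Arrays.asList(address.split("[@.]"));
    }

    /** xPlore default as described: only '@' is a boundary; periods stay inside tokens. */
    static List<String> splitOnAtOnly(String address) {
        return Arrays.asList(address.split("@"));
    }
}
```

For jane.doe@example.com, the first method yields jane, doe, example, com and the second yields jane.doe and example.com. In this model, a query for doe matches a FAST token but no xPlore token.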
Architectural overview
xPlore provides query and indexing services that can be integrated into external content sources such
as the Documentum content management system. External content source clients like Webtop or
CenterStage, or custom Documentum DFC clients, can send indexing requests to xPlore.
Each document source is configured as a domain in xPlore. You can set up domains using xPlore
administrator. For Documentum environments, the Documentum index agent creates a domain for
each repository and a default collection within that domain.
Documents are provided in an XML representation to xPlore for indexing through the indexing APIs.
In a Documentum environment, the Documentum index agent prepares an XML representation of
each document. The document is assigned to a category, and each category corresponds to one or
more collections as defined in xPlore.
xPlore instances are web application instances that reside on application servers. When an xPlore
instance receives an indexing request, it uses the document category to determine what is tokenized
and saved to the index. A local or remote instance of the content processing service (CPS) fetches
the content. CPS detects the format of a document, extracts indexable content from the document,
and then performs linguistic analysis to generate tokens to be saved in the index. The tokens are used
for building a full-text index.
xPlore manages the full-text index. An external Apache Lucene full-text engine is embedded into
the XML database (xDB). xPlore tracks indexing and update requests, recording the location of
indexed content in xDB. xDB provides transactional updates to the Lucene index. Indexes are still
searchable during updates.
When an instance receives a query request, the request is processed on all included collections, then
the assembled query results are returned.
xPlore provides a web-based administration console.
Physical architecture
The xPlore index service and search service are deployed as a WAR file to a WildFly application server
that is included in the xPlore installer. xPlore administrator and online help are installed as WAR files
in the same WildFly application server. The index is stored in the storage location that was selected
during configuration of xPlore.
• xPlore disk areas, page 18
• xPlore instances, page 19
xPlore instances
An xPlore instance is a web application instance (WAR file) that resides on an application server. You
can have multiple instances on the same host (vertical scaling), although it is more common to have
one xPlore instance per host (horizontal scaling). Make sure those instances are installed with unique
names. You create an instance by running the xPlore installer.
The first instance that you install is the primary instance. You can add secondary instances after
you have installed the primary instance. The primary instance must be running when you install
a secondary instance.
Adding or deleting an instance
To add an instance to the xPlore system, run the xPlore configurator script
(configDsearch.bat or configDsearch.sh in the setup folder). If an xPlore instance exists on the same host,
select a different port for the new instance, because the default port is already in use.
To delete an instance from the xPlore system, use the xPlore configurator script. Shut down the
instance before you delete it.
You manage instances in xPlore administrator. Click Instances in the left panel to see a list of
instances in the right content pane. You see the following instance information:
• Host information: Host, operating system, and CPU architecture.
• xPlore information: xDB version, instance version, instance type, and state.
• JVM information: JVM version, active thread count, loaded class count, JVM memory usage,
process ID, and thread dump.
To view JVM memory usage information, click Memory. A pie chart shows how much JVM memory
has been used.
Click Thread Dump to view the full Java thread dump in a pop-up window. You can click Save as in the
window to save the thread dump data for later analysis.
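The JVM figures on this page are standard JVM metrics. If you want comparable numbers from your own Java code, the java.lang.management API exposes them. This sketch reports on the JVM it runs in, not on a remote xPlore instance, and it is not how xPlore administrator gathers its data. It requires Java 9 or later for ProcessHandle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Collects JVM figures like those on the Instances page, for the JVM this code runs in. */
class JvmSnapshot {
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("jvmVersion", System.getProperty("java.version"));
        info.put("activeThreads", String.valueOf(ManagementFactory.getThreadMXBean().getThreadCount()));
        info.put("loadedClasses",
                String.valueOf(ManagementFactory.getClassLoadingMXBean().getLoadedClassCount()));
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        info.put("heapUsedBytes", String.valueOf(heap.getUsed()));
        info.put("heapMaxBytes", String.valueOf(heap.getMax())); // -1 when no maximum is defined
        info.put("pid", String.valueOf(ProcessHandle.current().pid())); // Java 9+
        return info;
    }

    /** Text thread dump; ThreadInfo.toString() prints the name, state, and top stack frames. */
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        for (ThreadInfo thread : threads.dumpAllThreads(false, false)) {
            dump.append(thread);
        }
        return dump.toString();
    }
}
```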
An instance can have one or more of the following features enabled:
• Content processing service (CPS)
• Indexing service
• Search service
• xPlore Administrator (includes analytics, instance, and data management services)
• Spare: A spare instance can be manually activated to take over for a disabled or stopped instance.
See Replacing a failed instance with a spare, page 34.
You manage an instance by selecting the instance in the left panel. Collections that are bound to the
instance are listed on the right. Click a collection to go to the Data Management view of the collection.
The application server instance name for each xPlore instance is recorded in indexserverconfig.xml. If
you change the name of the wildfly instance, change the value of the attribute appserver-instance-name
on the node element for that instance. This attribute is used for registering and unregistering instances.
Back up the xPlore federation after you change this file.
When documents are updated or deleted, changes to the index are propagated.
xPlore supplies XQuery expressions to xDB, which passes them to the Lucene index.
xDB manages parallel dispatching of queries to more than one Lucene index when parallel queries
are enabled. For example, if you have set up multiple collections on different storage locations, you
can query each collection in parallel.
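Conceptually, parallel dispatch is a fan-out followed by a merge. The sketch below is not the xDB API. queryOneCollection is a hypothetical callback that stands in for searching one collection, and the results are assembled in collection order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Fan-out/fan-in over collections; a conceptual sketch of parallel query dispatch, not xDB code. */
class ParallelQuery {
    static <R> List<R> queryAll(List<String> collections,
                                Function<String, List<R>> queryOneCollection,
                                int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> pending = new ArrayList<>();
            for (String collection : collections) {
                // Each collection (possibly on its own storage location) is queried concurrently.
                pending.add(pool.submit(() -> queryOneCollection.apply(collection)));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> result : pending) {
                merged.addAll(result.get()); // wait for each collection, then assemble the results
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("query interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed on a collection", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```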
An xDB library is stored on a data store. If you install more than one instance of xPlore, the storage
locations must be accessible by all instances. The xDB libraries can reside on separate data stores,
SAN or NAS. The locations are configurable in xPlore Administrator. If you do not have heavy
performance requirements, you can leave all libraries on the same data store. See Creating a storage
location, page 179.
Indexes
xDB has several possible index structures that are queried using XQuery. The Lucene index is
modeled as a multi-path index (a type of composite index) in xDB. The Lucene index services both
value-based and full-text probes of the index.
Covering indexes are also supported. When the query needs values, they are pulled from the index and
not from the data pages. Covering indexes are used for security evaluation and facet computation.
You can configure none, one, or multiple indexes on a collection. An explicit index is based on values
of XML elements, paths within the XML document, path-value combination, or full-text content. For
example, the following is an XQuery using a value indexed field:
/dmftdoc[dmftmetadata//object_name="foo"]
The following is an XQuery using a tokenized, full-text field:
/dmftdoc[dmftmetadata//object_name ftcontains 'foo']
Indexes are defined and configured in indexserverconfig.xml. For information on viewing and updating
this file, see Modifying indexserverconfig.xml, page 47.
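The two XQuery examples behave differently on the same metadata value. The value comparison requires the whole object_name to equal foo, while ftcontains matches any token. This simplified Java model assumes a tokenizer that splits on anything other than letters and digits and lower-cases the result (see Case sensitivity in Search features). Real CPS tokenization and lemmatization are richer.

```java
import java.util.Arrays;
import java.util.Locale;

/** Contrasts the two XQuery predicates above on one object_name value; a simplified model. */
class PredicateModel {
    /** Like object_name="foo": the whole stored value must equal the literal. */
    static boolean valueEquals(String objectName, String literal) {
        return objectName.equals(literal);
    }

    /** Like object_name ftcontains 'foo': the value is tokenized and any token may match. */
    static boolean ftContains(String objectName, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        // Simplification: split on runs of non-letters/non-digits after lower-casing.
        return Arrays.stream(objectName.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .anyMatch(token -> token.equals(wanted));
    }
}
```

For an object named "foo report", the value predicate fails and the full-text predicate succeeds.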
Logical architecture
A domain contains indexes for one or more categories of documents. A category is logically
represented as one or more collections. Each collection contains indexes on the content and metadata.
When a document is indexed, it is assigned to a category or class of documents and indexed into one
of the category collections.
• Documentum domains and categories, page 24
• Mapping of domains to xDB, page 25
Domains
A domain is a separate, independent, logical grouping of collections within an xPlore deployment. For
example, a domain could contain the indexed contents of a single Documentum content repository.
Domains are defined in xPlore administrator in the data management screen. A domain can have
multiple data collections in addition to the default data collection.
The Documentum index agent creates a domain for the repository to which it connects. This domain
receives indexing requests from the Documentum index agent.
Categories
A category defines how a class of documents is indexed. All documents submitted for ingestion
must be in XML format. (For example, the Documentum index agent prepares an XML version for
Documentum repository indexing.) The category is defined in indexserverconfig.xml and managed
by xPlore. A category definition specifies the processing and semantics that are applied to an ingested
XML document. You can specify the XML elements that are used for language identification. You
can specify the elements that have compression, text extraction, tokenization, and storage of tokens.
You also specify the indexes that are defined on the category and the XML elements that are not
indexed. A collection belongs to one category.
Collections
A collection is a logical group of XML documents that is physically stored in an xDB detachable
library. A collection represents the most granular data management unit within xPlore. All documents
submitted for indexing are assigned to a collection. A collection generally contains one category of
documents. In a basic deployment, all documents in a domain are assigned to a single default collection.
A collection is bound to a specific instance in read-write state (index and search, index only, or update
and search). A collection can be bound to multiple instances in read-only state (search-only). In the
following figure, three collections (two hot and one cold) with their corresponding instances are shown.
The metrics and audit systems store information in collections in a domain named SystemData. You
can view this domain and collections in xPlore administrator. One metrics database and one audit database are
defined. Each database has a subcollection for each xPlore instance.
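The binding rules can be captured in a few lines. The following hypothetical model, not an xPlore API, allows any number of read-only bindings for a collection but rejects a second instance in read-write state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of collection-to-instance bindings; not an xPlore API. */
class CollectionBindings {
    enum Mode { READ_WRITE, READ_ONLY }

    // collection name -> (instance name -> binding mode)
    private final Map<String, Map<String, Mode>> bindings = new HashMap<>();

    /** Binds a collection to an instance; at most one instance may hold it in read-write state. */
    void bind(String collection, String instance, Mode mode) {
        Map<String, Mode> byInstance = bindings.computeIfAbsent(collection, name -> new HashMap<>());
        if (mode == Mode.READ_WRITE) {
            for (Map.Entry<String, Mode> entry : byInstance.entrySet()) {
                if (entry.getValue() == Mode.READ_WRITE && !entry.getKey().equals(instance)) {
                    throw new IllegalStateException(
                            collection + " is already bound read-write to " + entry.getKey());
                }
            }
        }
        byInstance.put(instance, mode);
    }

    /** Instances the collection is bound to, in name order. */
    Set<String> instancesFor(String collection) {
        return new TreeSet<>(bindings.getOrDefault(collection, Map.of()).keySet());
    }
}
```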
Example
A document is submitted for indexing. The client indexing application (for example, the Documentum
index agent) has not specified the target collection for the document. By default, the index agent selects
one instance to process the indexing request according to the key of the request. When the instance
receives the request, it checks whether the object in the request has been indexed or not. If it has
already been indexed, the instance updates the object directly in the previous collection. Otherwise,
the predefined collection routing mechanism is applied when there are multiple collections. If you
provide a predefined routing class or the index agent provides collection hints, the default collection
routing mechanism is skipped.
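The decision order in this example can be written as a small function. This is a hypothetical model, not the index agent or xPlore routing API. indexedIn stands in for the record of where an object was indexed, routingClassOrHint for a predefined routing class or collection hint, and defaultRouting for the default mechanism. The selection of an instance by request key is not modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical model of how a target collection is chosen; not the xPlore routing API. */
class CollectionRouter {
    private final Map<String, String> indexedIn;           // object ID -> collection already holding it
    private final Function<String, String> defaultRouting; // stands in for the default mechanism

    CollectionRouter(Map<String, String> indexedIn, Function<String, String> defaultRouting) {
        this.indexedIn = indexedIn;
        this.defaultRouting = defaultRouting;
    }

    String route(String objectId, Optional<String> routingClassOrHint, List<String> collections) {
        String previous = indexedIn.get(objectId);
        if (previous != null) {
            return previous;                 // already indexed: update it in its previous collection
        }
        if (routingClassOrHint.isPresent()) {
            return routingClassOrHint.get(); // custom routing or a hint skips the default mechanism
        }
        if (collections.size() == 1) {
            return collections.get(0);       // only one collection: nothing to route
        }
        return defaultRouting.apply(objectId);
    }
}
```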
Documentum categories
A document category defines the characteristics of XML documents that belong to that category and
their processing. All documents are sent to a specific collection based on the document category. For
example, xPlore pre-defines a category called dftxml that defines the indexes for Documentum use
cases. All Documentum indexable content and metadata are sent to this category.
The following Documentum categories are defined within the domain element in indexserverconfig.xml.
For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 47.
• dftxml: XML representation of object metadata and content for full text indexing. To view the
dftxml representation using xPlore administrator, click the document in the collection view.
• acl: ACLs that are defined in the repository are indexed so that security can be evaluated in the full-text
engine. See About security, page 55 for more information.
• group: Groups defined in the repository are indexed to evaluate security in the full-text engine.
Note: xPlore allows only one acl and one group collection.
Events in dmi_registry for the user dm_fulltext_index_user* generate queue items for indexing. The
following events are registered for dm_fulltext_index_user* to generate indexing events by default:
• dm_sysobject: dm_save, dm_checkin, dm_destroy, dm_saveasnew, dm_move_content
• dm_acl: dm_save, dm_destroy, dm_saveasnew
• dm_group: dm_save, dm_destroy
Reindexing
The index agent does not recreate all the queue items for reindexing. Instead, it creates a watermark
queue item (type dm_ftwatermark) to indicate the progress of reindexing. It picks up all the objects for
indexing in batches by running a query. The index agent updates the watermark as it completes each
batch. When the reindexing is completed, the watermark queue item is updated to 'done' status.
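The watermark approach can be sketched as a batch loop that records progress once per batch instead of creating a queue item per object. The names here (Watermark, reindex, submitBatch) are hypothetical, and the list of object IDs stands in for the result of the index agent's query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of watermark-driven reindexing; names and types are hypothetical. */
class WatermarkReindexer {
    static final class Watermark {
        int processed;              // objects submitted so far
        String status = "running";
    }

    /** Submits objects in fixed-size batches and advances the watermark after each batch. */
    static Watermark reindex(List<String> objectIds, int batchSize, Consumer<List<String>> submitBatch) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        Watermark mark = new Watermark();
        for (int start = 0; start < objectIds.size(); start += batchSize) {
            List<String> batch = objectIds.subList(start, Math.min(start + batchSize, objectIds.size()));
            submitBatch.accept(new ArrayList<>(batch));
            mark.processed = start + batch.size(); // one watermark update per batch
        }
        mark.status = "done";
        return mark;
    }
}
```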
You can submit for reindexing one or all documents that failed indexing. In Documentum
Administrator, open Indexing Management > Index Queue. Choose Tools > Resubmit all failed
queue items, or select a queue item and choose Tools > Resubmit queue item.
• The Content Server queries the full-text indexes and returns query results to client applications.
• The xPlore server responds to full-text queries from Content Server.
If you want to manage another xPlore deployment and it is not available for selection in the login
dialog box, select Other to provide additional information and the xDB password.
Note: When you log in as a Content Server domain user, you must enter the domain in the login
dialog box.
2. In the navigation tree, click System Overview to get the status of each xPlore instance, and
click Global Configuration to configure system-wide settings. You can also navigate to the
Administration page to manage user access.
Role Access
ROLE_SUPERUSER All
ROLE_USER System Overview
ROLE_REPORT Diagnostic and Utilities/Reports
ROLE_THESAURUS Diagnostic and Utilities/Thesaurus
ROLE_DIAGNOSTIC Diagnostic and Utilities
ROLE_DM Data Management
ROLE_SERVICE Services, Instances
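The role table can be read as a lookup from role to administration areas. The following hypothetical check encodes the table. It assumes that access to an area such as Diagnostic and Utilities also covers its subareas, which the table suggests but does not state.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical check of the role table above; xPlore administrator enforces access itself. */
class RoleAccess {
    static final Map<String, List<String>> ACCESS = Map.of(
            "ROLE_USER", List.of("System Overview"),
            "ROLE_REPORT", List.of("Diagnostic and Utilities/Reports"),
            "ROLE_THESAURUS", List.of("Diagnostic and Utilities/Thesaurus"),
            "ROLE_DIAGNOSTIC", List.of("Diagnostic and Utilities"),
            "ROLE_DM", List.of("Data Management"),
            "ROLE_SERVICE", List.of("Services", "Instances"));

    /** Assumption: access to an area also grants its subareas, such as ".../Reports". */
    static boolean canAccess(String role, String area) {
        if (role.equals("ROLE_SUPERUSER")) {
            return true; // the table grants All
        }
        for (String granted : ACCESS.getOrDefault(role, List.of())) {
            if (area.equals(granted) || area.startsWith(granted + "/")) {
                return true;
            }
        }
        return false;
    }
}
```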
4. In the Users tab, click Create New User, enter a user name and select the password source, and
then select the group and deployment you created, or any others that provide appropriate access
permissions for the new user.
If you did not stop secondary instances, they report a failed connection to the primary instance when
you restart it.
• Index Service. See Document processing and indexing service configuration parameters,
page 360.
• Search Service. See Search service configuration parameters, page 364.
• Logging. See Configuring logging, page 317.
• Engine. Configure incremental backups. See Performing a native xDB backup in xPlore
Administrator, page 190.
• Auditing. See Auditing queries, page 264, Troubleshooting data management, page 183, and
Configuring the audit record, page 41.
Managing instances
Configuring an instance
You can configure the indexing service, search service, or content processing service for a secondary
instance. Select the instance in xPlore administrator and then click Stop Instance.
Requirements
All instances in an xPlore deployment must have their host clocks synchronized to the primary
xPlore instance host.
• xPlore_home/dsearch/xhive/admin/xdb.bat (Windows)
• xPlore_home/dsearch/xhive/admin/xdb (Linux)
• xPlore_home/dsearch/xhive/admin/deletedocs.properties
• xPlore_home/dsearch/xhive/admin/query.properties
• xPlore_home/setup/indexagent/tools/ftintegrity.bat (Windows)
• xPlore_home/setup/indexagent/tools/ftintegrity.sh (Linux)
• xPlore_home/setup/indexagent/tools/aclreplication.bat (Windows)
• xPlore_home/setup/indexagent/tools/aclreplication.sh (Linux)
• xPlore_home/wildfly9.0.1/server/DctmServer_${IndexAgent_name}/deployments/IndexAgent.war/WEB-INF/classes/indexagent.xml
• xPlore_home/watchdog/config/dsearch-watchdog-config.xml
• xPlore_home/wildfly9.0.1/server/DctmServer_${xPloreInstance_name}/deployments/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties
• xPlore_home/wildfly9.0.1/server/DctmServer_${xPloreInstance_name}/deployments/dsearch.war/WEB-INF/classes/xdb.properties
If the IP address is changed, change the value of IP in xPlore_home/config/XhiveDatabase.bootstrap.
2. Primary instance only: Use iAPI to change the parameters for the host name and port in the
dm_ftengine_config object. This change takes effect when you restart the repository.
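a. Run the following iAPI command:
retrieve,c,dm_ftengine_config
b. Use the object ID to get the parameters and values and their index positions. For example:
?,c,select param_name, param_value from dm_ftengine_config where
r_object_id='080a0d6880000d0d'
c. Enter your new xPlore port (dsearch_qrserver_port) at the SET command line. Use the index
position of dsearch_qrserver_port from step b (the first value is at index 0). For example, if the
port is at index 3: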
retrieve,c,dm_ftengine_config
set,c,l,param_value[3]
SET>new_port
save,c,l
d. Enter your new xPlore host name (dsearch_qrserver_host) at the SET command line. For
example:
retrieve,c,dm_ftengine_config
set,c,l,param_value[5]
SET>new_hostname
save,c,l
If the index agent is installed on the same host as xPlore, also update dsearch_config_host
and dsearch_config_port.
3. Restart the Content Server and xPlore instances.
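When you repeat this procedure on several repositories, generating the commands can reduce typing errors. The hypothetical helper below produces the command sequence from steps c and d for one parameter. It assumes that the rows returned by the query in step b are in index order, so the position of a name in that list is its param_value index. Verify the index before you save.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds the iAPI command sequence used above for one param_value change; a hypothetical helper. */
class FtEngineConfigScript {
    /**
     * @param paramNames param_name values in the order returned by the query in step b
     * @param name       parameter to change, such as dsearch_qrserver_port
     * @param newValue   replacement value
     */
    static List<String> setParam(List<String> paramNames, String name, String newValue) {
        int index = paramNames.indexOf(name); // repeating-attribute indexes start at 0
        if (index < 0) {
            throw new IllegalArgumentException(name + " not found");
        }
        List<String> commands = new ArrayList<>();
        commands.add("retrieve,c,dm_ftengine_config");
        commands.add("set,c,l,param_value[" + index + "]");
        commands.add(newValue); // the value entered at the SET> prompt
        commands.add("save,c,l");
        return commands;
    }
}
```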
You can install a spare instance using the xPlore installer. When you install a spare instance, the data,
index, and log directories must all be accessible to the primary instance. Use shared storage for the
spare. When you activate the spare to take over a failed instance, xPlore recovers failed data using the
transaction log.
You cannot change an active instance into a spare instance.
Use xPlore administrator to activate a spare to replace a failed or stopped secondary instance. If you
are replacing a primary instance, see Replacing a failed primary instance, page 35.
1. Stop the failed instance.
2. Open xPlore administrator and verify that the spare instance is running.
3. Select the spare instance. Click Activate Spare Instance.
4. Choose the instance to replace.
When xPlore administrator reports success, the spare instance is renamed in the UI with the replaced
instance name. When you activate a spare to replace another instance, the spare takes on the
identity of the old instance. For example, if you activated DSearchSpare to replace DSearchNode3,
the spare instance becomes DSearchNode3. The old instance can no longer be used for ingestion or
queries. The failed instance is renamed with Failed appended, for example, DSearchNode3Failed.
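The renaming can be summarized as a change to the list of instances. This model is illustrative only, because xPlore administrator performs the change. The state strings are hypothetical labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models the renaming that follows Activate Spare Instance; illustrative only. */
class SpareActivation {
    /**
     * @param states instance name -> state label, such as "active", "spare", or "stopped"
     * @return a new map in which the spare carries the replaced name and the old instance is marked failed
     */
    static Map<String, String> activate(Map<String, String> states, String spare, String replaced) {
        if (!"spare".equals(states.get(spare))) {
            throw new IllegalArgumentException(spare + " is not a spare instance");
        }
        if (!states.containsKey(replaced)) {
            throw new IllegalArgumentException(replaced + " is unknown");
        }
        Map<String, String> result = new LinkedHashMap<>(states);
        result.remove(spare);
        result.remove(replaced);
        result.put(replaced + "Failed", "failed"); // old instance: renamed, no ingestion or queries
        result.put(replaced, "active");            // spare takes over the replaced instance's identity
        return result;
    }
}
```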
The activated instance is not registered with the watchdog service. To register it for watchdog
notifications, edit the configuration file dsearch-watchdog-config.xml. This file is located in
xplore_home/watchdog/config.
1. Copy and paste an existing watchdog-config element for an active instance.
2. Edit the following properties for the activated spare in your copied watchdog-config element:
watchdog-config[@host-name]
watchdog-config/application-config[@instance-name]
watchdog-config/application-config/properties/property[@name="application_url" value]
watchdog-config/application-config/tasks/task[@category="process-control" id]
3. Restart the watchdog service (Windows) or run the watchdog script (Linux).
For information on changing a failed instance to spare, see Changing a failed instance into a spare,
page 37.
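Steps 1 and 2 of the watchdog registration can be scripted with the XPath locations listed above. This sketch assumes that the copied watchdog-config element has a parent element, so the copy can be appended beside it, and that each listed element occurs once in it. The class, method, and parameter names are hypothetical. Restart the watchdog service afterward, as in step 3.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper for registering an activated spare in dsearch-watchdog-config.xml. */
class WatchdogRegistration {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse watchdog configuration", e);
        }
    }

    /** Step 1: copy the first watchdog-config element. Step 2: edit the copy for the new instance. */
    static Element addInstance(Document doc, String hostName, String instanceName,
                               String applicationUrl, String taskId) {
        Element template = find(doc, "(//watchdog-config)[1]");
        Node parent = template.getParentNode();
        if (!(parent instanceof Element)) {
            throw new IllegalArgumentException("watchdog-config must have a parent element");
        }
        Element copy = (Element) template.cloneNode(true);
        parent.appendChild(copy);
        copy.setAttribute("host-name", hostName);
        find(copy, "application-config").setAttribute("instance-name", instanceName);
        find(copy, "application-config/properties/property[@name='application_url']")
                .setAttribute("value", applicationUrl);
        find(copy, "application-config/tasks/task[@category='process-control']").setAttribute("id", taskId);
        return copy;
    }

    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot write watchdog configuration", e);
        }
    }

    private static Element find(Node context, String expression) {
        try {
            Element found = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODE);
            if (found == null) {
                throw new IllegalArgumentException("missing: " + expression);
            }
            return found;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }
}
```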
d. Edit indexserver-bootstrap.properties in all other xPlore instances and change the value of
xhive-connection-string to connect to the new primary instance.
7. Edit xdb.properties in the directory WEB-INF/classes of the new primary instance.
a. Find the XHIVE_BOOTSTRAP entry and edit the URL to reflect the new primary instance
host name and port. (This bootstrap file is not the same as the indexserver bootstrap file.)
b. Change the host name to match your new primary instance host.
c. Change the port to match the port for the value of the attribute xdb-listener-port on the new
instance.
For example:
XHIVE_BOOTSTRAP=xhive://NewHost:9330
d. Edit xdb.properties in all other xPlore instances to reference the new primary instance.
8. Update xdb.bat in xplore_home/dsearch/xhive/admin. Your new values must match the values in
indexserverconfig.xml for the new primary instance.
• Change the path for XHIVE_HOME to the path to the new primary instance web application.
• Change ESS_HOST to the new host name.
• Change ESS_PORT to match the value of the port in the url attribute of the new primary
instance (in indexserverconfig.xml).
9. Start the xPlore primary instance, then start the secondary instances.
10. Update the index agent.
a. Shut down the index agent instance and modify indexagent.xml in
xplore_home/wildfly_version/server/DctmServer_Indexagent/deployments/IndexAgent.war/WEB-INF/classes.
b. Change parameter values for parameters that are defined in the element
indexer_plugin_config/generic_indexer/parameter_list/parameter.
• Change the parameter_value of the parameter dsearch_config_host to the new host name.
• Change the parameter_value of the parameter dsearch_config_port to the new port.
11. Update these properties of dm_ftengine_config on the Content Server: dsearch_qrserver_host,
dsearch_qrserver_port, dsearch_config_host, and dsearch_config_port. Use iAPI to change the
parameters for the host name and port in the dm_ftengine_config object. This change takes effect
when you restart the repository.
a. To find the port and host parameter index values for the next step, run the following iAPI
command:
retrieve,c,dm_ftengine_config
b. Use the object ID to get the parameters and values and their index positions. For example:
?,c,select param_name, param_value from dm_ftengine_config where
r_object_id='080a0d6880000d0d'
c. To set the port, enter your new port at the SET command line. Use the index position of the port
returned in step b (the first value is at index 0). For example, if the port is at index 3:
retrieve,c,dm_ftengine_config
set,c,l,param_value[3]
SET>new_port
save,c,l
d. To set the host name, enter your new host name at the SET command line:
retrieve,c,dm_ftengine_config
set,c,l,param_value[4]
SET>new_hostname
save,c,l
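If you repeat steps c and d across several environments, generating the iAPI lines can help you avoid typos. The following Java sketch only assembles the command text for a parameter position that you have already determined in step b. It does not connect to the repository. The class and method names are hypothetical. The sketch also assumes that in a batch iAPI script the value for set goes on its own line, which is where the interactive SET> prompt accepts it.

```java
/**
 * Hypothetical helper that assembles iAPI command text for changing one
 * dm_ftengine_config parameter value. It builds text only and never
 * contacts the repository.
 */
final class FtEngineConfigScript {

    private FtEngineConfigScript() { }

    /** Lines for steps c and d: retrieve the object, set param_value[index], save. */
    static String setParamValue(int index, String newValue) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative: " + index);
        }
        if (newValue == null || newValue.isEmpty()) {
            throw new IllegalArgumentException("new value must not be empty");
        }
        // Assumption: in a batch script, the value for set goes on the line after the set command.
        return String.join("\n",
                "retrieve,c,dm_ftengine_config",
                "set,c,l,param_value[" + index + "]",
                newValue,
                "save,c,l");
    }

    /** Same as setParamValue, but rejects values that cannot be TCP ports. */
    static String setPort(int index, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return setParamValue(index, Integer.toString(port));
    }
}
```

For example, setPort(3, 9300) returns the same four lines shown in step c, with 9300 in place of new_port. As described above, the change takes effect when you restart the repository.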
</task-info>
<timing-info>
<recurrence time-unit="minutes" frequency="1"/>
<start-date>2010-12-08T20:21:00.843Z</start-date>
<expiry-date>2299-12-08T20:21:00.843Z</expiry-date>
<max-response-timeout time-unit="seconds" wait-time="-1"/>
<max-retry-threshold>5</max-retry-threshold>
</timing-info>
</task-info>
</task>
</tasks>
</application-config>
You can configure the timing properties for the index agent just as for other components. In addition,
you can modify the following properties:
• docbase_password: Installation owner password in encrypted format. This value is set during the
index agent configuration process.
If you change the installation owner password, change this to the new encrypted password. To
encrypt the password, run xplore_home/watchdog/tools/encrypt-password.bat|sh.
• servlet_wait_time: Specify in milliseconds the amount of time the watchdog service waits before
checking the status of the index agent servlet again if the servlet is already in the process of shutting
down. Default: 3000.
• servlet_max_retry: The number of times the watchdog service waits and checks the status of the
index agent servlet while the servlet is shutting down. Default: 5.
• action_on_servlet_if_stopped: Specify what action the watchdog service takes when it detects
that the index agent servlet is not running:
– notify: Send a notification email to the registered email address. This is the default value.
– restart: Automatically start the index agent servlet.
– none: Do nothing.
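The following Java sketch models how these three settings combine. It is illustrative only and is not the watchdog's implementation. For simplicity, it reads the values from a java.util.Properties object rather than from the watchdog's own configuration file. The class name and the maxShutdownWaitMs calculation (one wait per retry) are assumptions based on the descriptions above.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Illustrative model of the watchdog settings for the index agent servlet.
 * Not the watchdog's implementation; values come from a Properties object.
 */
final class WatchdogSettings {

    enum Action { NOTIFY, RESTART, NONE }

    final long servletWaitTimeMs;
    final int servletMaxRetry;
    final Action actionIfStopped;

    WatchdogSettings(Properties p) {
        // Fall back to the documented defaults when a property is absent.
        servletWaitTimeMs = Long.parseLong(p.getProperty("servlet_wait_time", "3000").trim());
        servletMaxRetry = Integer.parseInt(p.getProperty("servlet_max_retry", "5").trim());
        actionIfStopped = Action.valueOf(
                p.getProperty("action_on_servlet_if_stopped", "notify").trim().toUpperCase(Locale.ROOT));
    }

    /** Approximate worst-case wait for a servlet that is shutting down: one wait per retry. */
    long maxShutdownWaitMs() {
        return servletWaitTimeMs * servletMaxRetry;
    }
}
```

With the defaults, this model waits at most 3000 ms times 5 retries, or about 15 seconds, before it applies the configured action.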
logged in audit records. If you still want to take advantage of the query warm-up feature, you need
to tune the following settings in query.properties:
– Remove or comment out this line:
exclude_users=unknown
– Set the number of logged queries you want to replay in the number_of_queries_per_user property.
• You cannot use user name as an effective filter criterion when viewing the following reports:
– User Activity Report
– Detailed Query Analysis
– QBS Activity Report by Id
– Query Counts by User
– Top N Slowest Queries
Connection refused
Indexing fails when one of the xPlore instances is down. The error in dsearch.log resembles the following:
CONNECTION_FAILED: Connect to server at 10.32.112.235:9330 failed,
Original message:
Connection refused
Check the following causes for this issue:
• Instance is stopped: A query that hits a document in a collection bound to the stopped instance
fails. For example, you create collection2 and upload a file to it. You set the mode for collection2
to search_only and bind the collection to instance2 and instance3. Stop instance2 and query for
a term in the uploaded file.
• The xPlore host name has changed: If you have to change the xPlore host name, do the following:
1. Update indexserverconfig.xml with the new value of the URL attribute on the node element. For
information on viewing and updating this file, see Modifying indexserverconfig.xml, page 47.
2. Change the WildFly startup (script or service) so that it starts correctly. If you run a stop script,
run as the same administrator user who started the instance.
3. Modify indexserver-bootstrap.properties under
xplore_home/wildfly_version/server/DctmServer_serverName/deployments/dsearch.war/WEB-INF/classes
to update xhive-connection-string with the new host name.
3. Edit indexserverconfig.xml to remove binding elements from the collection that has the issue.
For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 47.
4. Restart xPlore instances.
• I/O error indexing a large collection: Switch to the 64-bit version of xPlore and use at least 4 GB of
memory when a single collection has more than 5 million documents.
• I/O error during index merge: Documents are added to small Lucene indexes within a single
collection. These indexes are merged into a larger final index to help query response time. The final
merge stage can require large amounts of memory. If memory is insufficient, the merge process fails
and corrupts the index. Allocate 4 GB of memory or more to the JVM.
com.xhive.error.XhiveException: IO_ERROR:
Failure while merging external indexes, Original message:
Insufficient system resources exist to complete the requested service
To fix a corrupted index, see Repairing a corrupted index, page 195. To delete a corrupted domain, see
Delete a corrupted domain, page 169.
Error on startup
Non-ASCII characters in indexserverconfig.xml can cause startup to fail.
If you edit indexserverconfig.xml using a simple text editor like Notepad, non-ASCII characters such
as ü are saved in native (OS) encoding. For example, Windows uses ISO-8859-1. Because xPlore reads
the file as UTF-8, these characters cause unexpected text errors.
Use an XML editor to edit the file, and validate your changes using the xplore.bat (Windows) or
xplore.sh (Linux) script in xplore_home/dsearch/xhive/admin. Restart the xPlore instances.
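Before you run the validation script, you can quickly check whether an edited file still decodes as UTF-8. The following Java sketch uses only the standard library. It checks the byte encoding only and does not replace xPlore validation. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-check: does a file decode cleanly as UTF-8? */
final class Utf8Check {

    /** Returns true if the bytes form valid UTF-8; malformed sequences are reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Convenience overload that reads the whole file, for example indexserverconfig.xml. */
    static boolean isValidUtf8(Path file) throws IOException {
        return isValidUtf8(Files.readAllBytes(file));
    }
}
```

An ü saved as the single ISO-8859-1 byte 0xFC is not valid UTF-8, so this check returns false for a file that a native-encoding editor has saved that way.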
<subsystem xmlns="urn:wildfly:domain:undertow:2.0">
<buffer-cache name="default"/>
<server name="default-server">
<http-listener name="default" socket-binding="http" redirect-socket=
"https" max-post-size="0"/>
<host name="default-host" alias="localhost">
<location name="/" handler="welcome-content"/>
<filter-ref name="server-header"/>
<filter-ref name="x-powered-by-header"/>
</host>
</server>
<servlet-container name="default">
<jsp-config/>
<persistent-sessions path="d:/temp/session-dump"/>
<websockets/>
</servlet-container>
<handlers>
<file name="welcome-content" path="${wildfly.home.dir}/welcome-content"/>
</handlers>
<filters>
<response-header name="server-header" header-name="Server" header-value=
"WildFly/9"/>
<response-header name="x-powered-by-header" header-name="X-Powered-By"
header-value="Undertow/1"/>
</filters>
</subsystem>
Remove the persistent-sessions element.
3. Start the index agent.
Modifying indexserverconfig.xml
Some tasks are not available in xPlore administrator. These rarely needed tasks require manual editing
of indexserverconfig.xml. This file is located in xplore_home/config on the primary instance. It is
loaded into xPlore memory during the bootstrap process, and it is maintained in parallel as a versioned
file in xDB. All changes to the file are saved into the xDB file at xPlore startup.
On Windows 2008, file extensions are not shown, and by default the file is saved with a .txt extension,
so it is not saved under the same name. Be sure to replace indexserverconfig.xml with a file of the
same name and extension.
Note: Do not edit this file in xDB, because the changes are not synchronized with xPlore.
1. Stop all instances in the xPlore federation.
2. Make your changes to indexserverconfig.xml on the primary instance using an XML editor.
Changes must be encoded in UTF-8. Do not use a simple text editor such as Notepad, which can
insert characters using the native OS encoding and cause validation to fail.
3. Set the configuration change check interval, in milliseconds, as the value of config-check-interval
on the root index-server-configuration element. xPlore checks for configuration changes after this
interval.
4. Validate your changes using the validateConfigFile CLI. From the command line, type the
following. Substitute your path to indexserverconfig.xml, using forward slashes. Syntax:
xplore validateConfigFile path_to_config_file
For example, on Windows:
xplore "validateConfigFile 'C:/xPlore/config/indexserverconfig.xml'"
On Linux:
./xplore.sh "validateConfigFile '/root/xPlore/config/indexserverconfig.xml'"
5. Back up the xPlore federation after you change this file.
6. Restart the xPlore system to see changes. On startup, the configuration file on the file system is
compared to the one in the database, even if the revision number is the same.
When indexserverconfig.xml is malformed, xPlore cannot start.
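Because a malformed file prevents startup, a quick well-formedness check before restarting can save a failed start. The following Java sketch parses the XML with the JDK DOM parser and reads config-check-interval from the root element. It checks XML well-formedness only. It does not check the xPlore schema, so the validateConfigFile CLI remains the authoritative check. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical pre-check for indexserverconfig.xml: well-formedness only, not schema validation. */
final class ConfigPrecheck {

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Suppress the default handler's console output; fatal errors still throw.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** config-check-interval on the root element, in milliseconds, or -1 if the attribute is absent. */
    static long configCheckIntervalMs(String xml) {
        String value = parse(xml).getDocumentElement().getAttribute("config-check-interval");
        return value.isEmpty() ? -1L : Long.parseLong(value.trim());
    }
}
```

To check the real file, you can read it with Files.readString(path) (Java 11 or later). That call decodes as UTF-8 and throws an exception on invalid bytes, so it also catches the encoding problem described in Error on startup.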
Troubleshooting
You can view the content of each version using the xhadmin tool (see Debugging queries, page 277).
Drill down to the dsearchConfig library, click a version, and then click Text.
Customizations in indexserverconfig.xml
• Define and configure indexes for facets.
• Add and configure categories: Specify the XML elements that have text extraction, tokenization,
and storage of tokens. Specify the indexes that are defined on the category and the XML elements
that are not indexed. Change the collection for a category.
• Configure system, indexing, and search metrics.
• Specify a custom routing-class for user-defined domains.
• Change the xDB listener port and admin RMI port.
• Turn off lemmatization.
• Lemmatize specific categories or element content.
• Configure indexing depth (leaf node).
• Change the xPlore host name and URL.
• Set the security cache size.
Changes to the following configurations require an index rebuild to take effect.
• Add or change special characters for CPS processing:
<search-config>
<properties>
<property value="en" name="query-default-locale"/>
...
<property value="true" name="query-exact-phrase-match"/>
</properties>
</search-config>
Administration APIs
The xPlore Admin API supports all xPlore administrative functions. The Admin API provides you
with full control of xPlore and its components.
Note: Administration APIs are not officially supported. Contact the development team if you
encounter issues using them.
Each API is described in the javadocs. Index service APIs are available in the interface IFtAdminIndex
in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in
the SDK jar file dsearchadmin-api.jar.
System administration APIs are available in the interface IFtAdminSystem in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces in the SDK jar file dsearchadmin-api.jar.
Administration APIs are wrapped in a command-line interface tool (CLI). The syntax and CLIs are
described in the chapter “Automated Utilities (CLI)”.
Configuration APIs
Configuration APIs are available in the interface IFtAdminConfig in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.
• query-index-covering-values: The values specified in the path attribute are used for aggregate
queries. These values are pulled from the index and not from the data pages.
• query-facet-max-result-size: Sets the maximum number of results used to compute facet values. For
example, if query-facet-max-result-size=12, only 12 results are used to compute facets. If a query
has many facets, the number of results per facet is reduced accordingly. Default: 10000.
About security
xPlore does not have a security subsystem. Anyone with access to the xPlore host port can connect
to it. You must secure the xPlore environment using network security components such as a firewall
and restriction of network access. Secure the xPlore administrator port and open it only to specific
client hosts.
Passwords are encrypted with a FIPS PUB 140-2 compliant symmetric algorithm. Existing passwords
are encrypted with DES.
Documentum repository security is managed through individual and group permissions (ACLs). By
default, security is applied to results before they are returned to the Content Server (native xPlore
security), providing faster search results. xPlore security minimizes the result set that is returned
to the Content Server.
Content Server queues changes to ACLs and groups. The queue sometimes causes a delay between
changes in the Content Server and propagation of security to xPlore. If the index agent has not yet
processed a document for indexing or updated changes to a permission set, users cannot find the
document.
You can set up a separate index agent to handle changes to ACLs and groups. See Setting up index
agents for ACLs and groups, page 70.
4. Enter the following command to turn off xPlore native security. Note the lowercase L in the set and
save commands:
retrieve,c,dm_ftengine_config
set,c,l,ftsearch_security_mode
0
save,c,l
reinit,c
You can manually populate or update the ACL and group information in xPlore. A similar job in
Content Server 6.7 and higher allows you to selectively replicate ACLs and groups. The script
replicates all ACLs and groups. Use the job or script for the following use cases:
• You are testing Documentum indexing before migration.
• You use xPlore to index a repository that has no full-text system (no migration).
• Security in the index is out of sync with the repository, as shown by ftintegrity counts.
Note: To speed up security updates in the index, you can create a separate index agent for ACLs and
groups. See Setting up index agents for ACLs and groups, page 70.
1. Locate the script aclreplication_for_repositoryname.bat or .sh in
xplore_home/setup/indexagent/tools.
2. Edit the script before you run it. Locate the line beginning with “%JAVA_HOME%\bin\java”
(Windows) or “$JAVA_HOME/bin/java” (Linux). Set the repository name, repository user, password,
xPlore primary instance host, xPlore port, and xPlore domain (optional).
Check the console output for any errors or exceptions thrown. When you run the script, it prints the
status of each object it tried to replicate.
Alternatively, you can run the ACL replication job dm_FTACLReplication in Documentum
Administrator. (Do not confuse this job with the dm_ACLReplication job.) By default, the job
reports only the number of objects replicated. Setting the job argument verbose to true writes the
status of each object to the job report. You can selectively replicate only dm_acl or only dm_group
objects.
Note: The dm_FTACLReplication job is set to inactive after a successful run, which means that this
job cannot be scheduled to run periodically.
Argument Description
-acl_where_clause: DQL where clause to retrieve dm_acl objects.
-group_where_clause: DQL where clause to retrieve dm_group objects.
-max_object_count: Number of dm_acl and dm_group objects to be replicated. If not set, all objects
are replicated.
-replication_option: Valid values: dm_acl, dm_group, or both (default).
-verbose: Set to true to record the replication status of each object in the job report. Default: false.
-standby_ftengine_id: HA active/active mode with multiple xPlore instances only. Specify the ID of a
standby full-text engine as the value of the argument to retrieve xPlore information from the
dm_ftengine_config object whose r_object_id is the specified ID. After you set this argument, you
must restart the Documentum Java Method Server for the change to take effect.
Note: A non-standby full-text engine ID causes the job to fail.
-ftEngineStandby: Dual mode (FAST and xPlore on two Content Servers) only. Set this parameter to
true, for example:
-ftEngineStandby T
admin=new_admin_password
• The file PrimaryDsearch.properties in xplore_home/installinfo/instances/dsearch:
ess.instance.password.encrypted=new_adm_password
Repeat this step for all other xPlore instances, if any.
6. If you use CLI for backup and restore, edit the password in xplore.properties. This file is located in
xplore_home/dsearch/admin. Copy the encrypted password from indexserver-bootstrap.properties.
7. The xDB Administrator password is reset. You can now use the new password to log in to xPlore
Administrator.
If you change the index agent installation owner password, on Windows you only have to update the
service entry; on Linux, no change is needed.
By default, xPlore accepts encrypted passwords, which means that you can log in to xPlore
Administrator by copying the encrypted admin password from indexserver-bootstrap.properties.
To disable login with an encrypted password, edit the file indexserver-bootstrap.properties located in
xplore_home/wildfly_version/server/DctmServer_PrimaryDsearch/deployments/dsearch.war/WEB-INF/classes
and add the following:
accept-encrypted-password=false
4. If necessary, change the Groups-in cache cleanup interval by adding a property to the
security-filter-class properties. The default is 7200 sec (2 hours).
5. Validate your changes using the validation tool described in Modifying indexserverconfig.xml,
page 47.
Troubleshooting security
Viewing security in the log
Check dsearch.log using xPlore administrator. Choose an instance and click Logging. Click dsearch
log to view the following information:
• The XQuery expression. For example, search for the term default:
QueryID=PrimaryDsearch$f3087f7a-fb55-496a-bf0a-50fb1e688fa1,
query-locale=en,query-string=
declare option xhive:fts-analyzer-class '
com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer';
declare option xhive:ignore-empty-fulltext-clauses 'true';
declare option xhive:index-paths-values "dmftmetadata
//owner_name,dmftsecurity/acl_name,dmftsecurity/acl_domain";
let $libs := collection('/TechPubsGlobal/dsearch/Data')
let $results := for $dm_doc score $s
in $libs/dmftdoc[(dmftmetadata//a_is_hidden = "false") and
(dmftversions/iscurrent = "true") and
(. ftcontains "test" with stemming using stop words default)]
order by $s descending
return $dm_doc return (for $dm_doc in subsequence($results,1,351)
return <r>
{for $attr in $dm_doc/dmftmetadata//*[local-name()=('
object_name','r_modify_date','r_object_id','r_object_type','
r_lock_owner','owner_name','r_link_cnt','r_is_virtual_doc','
r_content_size','a_content_type','i_is_reference','r_assembled_from_id','
r_has_frzn_assembly','a_compound_architecture','i_is_replica','r_policy_id','
subject','title')] return <attr name='{local-name($attr)}' type='
{$attr/@dmfttype}'>{string($attr)}</attr>}
{xhive:highlight(($dm_doc/dmftcontents/dmftcontent/
dmftcontentref,$dm_doc/dmftcustom))}
<attr name='score' type='dmdouble'>{string(dsearch:get-score($dm_doc))}
</attr></r>) is running
• Security filter applied and security statistics. For example:
Total-matching-owner-group-probes=0, MACL_ENABLED=true,
USE_GLOBAL_ACE_CACHE=true, HANDLE_OWNER_TIME=0, Total-ACL-index-probes=0,
USE_GLOBAL_ACL_CACHE=true, GET_DOC_INFO_TIME=0, Minimum-permit-level=2,
Total-Group-index-probes=1, Total-ACL-cache-hits=0,
HANDLE_ACL_INDEX_TIME=0, Owner-group-cache-fill-time=0}
In the example, the query returned 2200 hits to filter. 2000 were filtered out, returning 200 results to
the client application.
When INFO is enabled for the security package, the following information is saved in dsearch.log:
• Minimum-permit-level. Returns the minimum permit level for results for the user. Levels: 0 = null |
1 = none | 2 = browse | 3 = read | 4 = relate | 5 = version | 6 = write | 7 = delete
• Filter-output: Total number of hits after security has filtered the results.
• Total-values-from-index-keys: Number of index hits on owner_name, acl_name and acl_domain
for the document.
• QueryID: Generated by xPlore to uniquely identify the query.
• Total-values-from-data-page: Number of hits on owner_name, acl_name and acl_domain for the
document retrieved from the data page.
• Filter-input: Number of results returned before security filtering.
• Total-group-index-probes: Keeps track of how many group indexes are probed. After cache is
populated for a user, if the same user executes another query, this value should be zero.
• Total-matching-group-probes: How many times the query added a group to the group-in cache.
• Total-ACL-index-probes: How many times the query added an ACL to the cache. If this value is
high, you can speed up queries by increasing the ACL cache size.
• Total-groups-in-cache-hits: Number of times the group-in cache contained a hit.
• Total-ACL-cache-hits: Number of times the ACL cache contained a hit.
• Group-cache-fill-time: Time in milliseconds taken to fill the group cache, either by retrieving
and iterating indexes from xDB or by running XQuery.
• Security-evaluation-time: How long in milliseconds it took to complete the whole security
evaluation.
• Total-owner-is-group-cache-hits: Number of hits for the ownerIsGroup cache.
• Total-owner-not-group-cache-hits: Number of hits for the ownerIsNotGroup cache.
• Total-owner-group-probes: Number of times XQuery is run.
• Owner-group-cache-fill-time: Time in milliseconds required to run XQuery and fill in the cache.
• Total-matching-owner-group-probes: Number of times the owner group query returns yes.
• Total-res-with-no-dmftdoc: Number of documents that are not dmftdoc.
• Total-GLOBAL-ACE-cache-hits: Number of hits for the global ACE cache.
• Security-double-check-input: Number of documents before the security double check.
• MACL_ENABLED: Whether MACL is enabled.
• USE_GLOBAL_ACE_CACHE: Whether the global ACE cache is used.
• USE_GLOBAL_ACL_CACHE: Whether the global ACL cache is used.
• GET_DOC_INFO_TIME: Time in milliseconds spent getting the document owner name and ACL
information.
• HANDLE_OWNER_TIME: Time in milliseconds spent checking whether the user is the document owner.
• HANDLE_ACL_CACHE_TIME: Time spent getting permission from the ACL cache.
• HANDLE_ACL_INDEX_TIME: Time spent calculating permission from the ACL index.
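When you compare many queries, it can help to pull these statistics out of dsearch.log programmatically. The following Java sketch assumes that the statistics appear as comma-separated Key=Value pairs, as in the excerpt above, and ignores any text up to the last opening brace. It also decodes Minimum-permit-level by using the level list above and computes how many hits security filtered out (Filter-input minus Filter-output). The class and method names are hypothetical. Confirm the exact log layout against your own dsearch.log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the security statistics that dsearch.log prints at INFO level. */
final class SecurityStats {

    // Array index = Minimum-permit-level value, as listed above.
    private static final String[] PERMIT_NAMES =
            {"null", "none", "browse", "read", "relate", "version", "write", "delete"};

    /** Parses comma-separated Key=Value pairs; text up to the last '{' and any '}' are ignored. */
    static Map<String, String> parse(String logText) {
        int open = logText.lastIndexOf('{');
        String body = (open >= 0 ? logText.substring(open + 1) : logText).replace("}", "");
        Map<String, String> stats = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // trim() also removes line breaks between wrapped log lines
                stats.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return stats;
    }

    /** Maps a Minimum-permit-level value (0 to 7) to its permit name. */
    static String permitName(int level) {
        if (level < 0 || level >= PERMIT_NAMES.length) {
            throw new IllegalArgumentException("unknown permit level: " + level);
        }
        return PERMIT_NAMES[level];
    }

    /** Hits removed by security filtering: Filter-input minus Filter-output. */
    static long filteredOut(long filterInput, long filterOutput) {
        return filterInput - filterOutput;
    }
}
```

For the example in the text, filteredOut(2200, 200) returns 2000. A Minimum-permit-level of 2 decodes to browse.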
Verify that the ACL IDs are registered for the events dm_save, dm_destroy, and dm_saveasnew. Verify
that the group IDs are registered for the events dm_save and dm_destroy, for example:
?,c,select registered_id,event from dmi_registry where user_name='dm_fulltext_index_user'
Note: Documentum Administrator reports all queue items, including those that are subsequently
filtered out by the index agent.
The xPlore installer includes the index agent and its configurator.
A dm_ftindex_agent_config object represents the index agent in normal mode. This object is
configured by the index agent configurator. For more information about the index agent config object,
refer to the Documentum System Object Reference Guide.
Properties of the format object determine which formats are indexable. If the value of the can_index
property is set to true, the content file is indexable. By default, the first content file in a format whose
can_index property is set to true is indexed. Other renditions of the object are not indexed. If the
primary content of an object is not in an indexable format, you can ensure indexing by creating a
rendition in an indexable format. Use Documentum Content Transformation Services or third-party
client applications to create the rendition. For a full list of supported formats, see Oracle Outside In
documentation.
Some formats are not represented in the repository by a format object. Only the properties of objects in
that format are indexed. The formats.cvs file, which is located in DM_HOME/install/tools, contains a
complete list of supported mime_types and the formats with which they are associated. If a supported
mime_type has no format object, create a format object in the repository and map the supported
mime_type to the format in formats.cvs.
Documents are selected for indexing in the Content Server based on the following criteria:
• If a_full_text attribute is false, the content is not indexed. Metadata is indexed.
• If a_full_text attribute is true, content is indexed based on the can_index and format_class attributes
on the dm_format associated with the document:
1. If an object has multiple renditions and none of the renditions have a format_class value of
ft_always or ft_preferred, each rendition is examined starting with the primary rendition. The
first rendition for which can_index is true is indexed, and no other renditions are indexed.
2. If an object has a rendition whose format_class value is ft_preferred, each ft_preferred rendition
is examined in turn starting with the primary rendition. The first ft_preferred rendition that is
found is indexed, and no other renditions are indexed.
3. If an object has renditions with a format_class value of ft_always, those renditions are always
indexed.
Note: Index agent filters can override the settings of a_full_text and can_index. See Configuring
index agent filters, page 74.
Sample DQL to determine these attribute values for the format bmp:
?,c,select can_index, format_class from dm_format where name = 'bmp'
To list every format and whether it is indexable, use the following command from iAPI:
?,c,select name,can_index from dm_format
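The following Java sketch models the rendition-selection rules listed above, which can help you reason about which rendition of an object is indexed. It is a model, not Content Server code, and the names are hypothetical. The source does not say how the rules interact in every case, so the sketch makes three assumptions. First, ft_always renditions are indexed in addition to the first ft_preferred rendition. Second, the first-indexable-rendition rule applies only when no rendition is ft_always or ft_preferred. Third, can_index is not checked for ft_always or ft_preferred renditions. Index agent filters, which can override these settings, are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the content-selection rules; not Content Server code. */
final class RenditionSelector {

    /** One rendition of an object; lists are ordered starting with the primary rendition. */
    static final class Rendition {
        final String format;
        final String formatClass; // for example "ft_always", "ft_preferred", or null
        final boolean canIndex;   // can_index of the associated dm_format

        Rendition(String format, String formatClass, boolean canIndex) {
            this.format = format;
            this.formatClass = formatClass;
            this.canIndex = canIndex;
        }
    }

    /** Returns the formats whose content would be indexed under this model. */
    static List<String> select(boolean aFullText, List<Rendition> renditions) {
        List<String> selected = new ArrayList<>();
        if (!aFullText) {
            return selected; // a_full_text is false: metadata only
        }
        boolean hasAlways = false;
        for (Rendition r : renditions) {      // rule 3: ft_always renditions are always indexed
            if ("ft_always".equals(r.formatClass)) {
                selected.add(r.format);
                hasAlways = true;
            }
        }
        for (Rendition r : renditions) {      // rule 2: first ft_preferred rendition only
            if ("ft_preferred".equals(r.formatClass)) {
                selected.add(r.format);
                return selected;
            }
        }
        if (!hasAlways) {                     // rule 1: first rendition whose format has can_index true
            for (Rendition r : renditions) {
                if (r.canIndex) {
                    selected.add(r.format);
                    break;
                }
            }
        }
        return selected;
    }
}
```

For example, an object with a primary rendition in bmp (can_index false) and a pdf rendition (can_index true) has only the pdf content indexed under this model.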
The dm_ftengine_config object has a repeating attribute ft_collection_id that references a collection
object of the type dm_fulltext_collection. Each ID points to a dm_fulltext_collection object. It is
reserved for use by Content Server client applications.
Syntax Description
FULLTEXT SUPPORT ADD ALL: Defines all properties of the aspect for indexing.
FULLTEXT SUPPORT ADD property_list: Defines for indexing only those aspect properties listed in
property_list.
FULLTEXT SUPPORT DROP ALL: Stops indexing of all properties of the aspect.
FULLTEXT SUPPORT DROP property_list: Stops indexing of those aspect properties listed in
property_list.
When you add or drop indexing for aspect properties, clean the DFC BOF cache for the changes to
take effect.
1. Stop the index agent.
2. On the index agent host, delete the directory for the DFC BOF cache. The directory is set by
dfc.data.dir in dfc.properties. For example:
xplore_home\wildfly_version\server\DctmServer_Indexagent\data\Indexagent\cache\
content_server_version\bof\repository_name
3. Start the index agent.
Only new objects are affected. The index is not updated to add or drop aspect property values for
aspects attached to existing objects.
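If you script step 2, the following Java sketch builds the cache path and deletes it recursively. It assumes that the layout below dfc.data.dir is cache/content_server_version/bof/repository_name, as in the example path above. Confirm this against your dfc.properties before you delete anything. Stop the index agent first, as in step 1. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper for clearing the DFC BOF cache of one repository. */
final class BofCacheCleaner {

    /** dfc.data.dir/cache/content_server_version/bof/repository_name (assumed layout). */
    static Path bofCacheDir(Path dfcDataDir, String contentServerVersion, String repositoryName) {
        return dfcDataDir.resolve("cache").resolve(contentServerVersion)
                .resolve("bof").resolve(repositoryName);
    }

    /** Deletes dir and its contents, deepest entries first. Returns false if dir does not exist. */
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

After the directory is removed, start the index agent (step 3). DFC then repopulates the cache as needed.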
• Linux
indexagent_home/wildfly_version/server/startIndexagent.sh.
2. Start the index agent UI. Use your browser to start the index agent servlet.
http://host:port/IndexAgent/login_dss.jsp
Every index agent URL has the same URL ending: IndexAgent/login_dss.jsp. Only the port
and host differ.
• host is the DNS name of the machine on which you installed the index agent.
• port is the index agent port number that you specified during configuration (default: 9200).
3. In the login page, enter the user name and password for a valid repository user and optional
xPlore domain name.
4. Choose one of the following:
• Start Index Agent in Normal Mode: The index agent indexes content that is added or
modified after you start it.
• Start new reindexing operation: All content in the repository is indexed (migration mode) or
reindexed. Filters and custom routing are applied. Proceed to the next step in this task.
• Continue: If you had started to index this repository but had stopped, start indexing. The date
and time you stopped is displayed.
Viewing index agent details
Start the index agent and click Details. You see accumulated statistics since last index agent restart and
objects in the indexing queue. To refresh statistics, return to the previous screen and click Refresh,
then view Details again.
• Content Server must run in SSL mode for the SSL handshake between Content Server and the
WildFly instance of the index agent to succeed.
• When anonymous SSL (the default, which uses no certificates) is used, an anonymous cipher (such
as TLS_DH_anon_WITH_AES_128_CBC_SHA for AES-128) must be configured in standalone.xml
of the index agent's WildFly instance.
index_name : Repo_ftindex_01
...
Now use the retrieve and dump commands to get the object_name attribute of the
dm_ftindex_agent_config object. You use this attribute value in the start or stop script. For example:
retrieve,c,dm_ftindex_agent_config
...
0800277e80000e42
API> dump,c,l
...
USER ATTRIBUTES
object_name : Config13668VM0_9200_IndexAgent
Use the apply command to start or stop (shutdown) the index agent, and to view its current status.
Syntax:
apply,c,,FTINDEX_AGENT_ADMIN,NAME,S,<index_name of dm_fulltext_index>,
AGENT_INSTANCE_NAME,S,<object_name of dm_ftindex_agent_config>,ACTION,
S,start|shutdown|status
The following example starts one index agent:
apply,c,NULL,FTINDEX_AGENT_ADMIN,NAME,S,LH1_ftindex_01,AGENT_INSTANCE_NAME,
S,Config13668VM0_9200_IndexAgent,ACTION,S,start
To start or stop all index agents, replace the index agent name with all. For example:
apply,c,NULL,FTINDEX_AGENT_ADMIN,NAME,S,LH1_ftindex_01,
AGENT_INSTANCE_NAME,S,all,ACTION,S,shutdown
Follow with these commands to get the results:
API> next,c,qNumber
...
OK
API> dump,c,qNumber
where Number is a counter that starts at 0 for the first command execution and increments by 1
with each subsequent execution.
Viewing the current index agent status returns one of the following:
• 0: The index agent is running.
• 100: The index agent has been shut down.
• 200: The index agent has a problem.
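The following Java sketch builds the apply command from the syntax above and maps the status values to their meanings. It only produces text for iAPI, with "all" accepted as the agent instance name. It does not run the command. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper for the FTINDEX_AGENT_ADMIN apply command. */
final class IndexAgentAdmin {

    enum Action { START, SHUTDOWN, STATUS }

    /** indexName: index_name of dm_fulltext_index; agentInstance: object_name of dm_ftindex_agent_config, or "all". */
    static String applyCommand(String indexName, String agentInstance, Action action) {
        return "apply,c,NULL,FTINDEX_AGENT_ADMIN,NAME,S," + indexName
                + ",AGENT_INSTANCE_NAME,S," + agentInstance
                + ",ACTION,S," + action.name().toLowerCase(Locale.ROOT);
    }

    /** Interprets the value returned for ACTION status. */
    static String describeStatus(int code) {
        switch (code) {
            case 0:   return "running";
            case 100: return "shut down";
            case 200: return "problem";
            default:  return "unknown status " + code;
        }
    }
}
```

With the names from the earlier example, applyCommand("LH1_ftindex_01", "Config13668VM0_9200_IndexAgent", Action.START) reproduces the start command shown above.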
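import com.documentum.fc.client.DfQuery;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;
// Shuts down all index agents for indexName (index_name of the dm_fulltext_index object).
// The session user must have SysAdmin privileges or higher.
public static void shutdownAllIndexAgents(IDfSession sess, String indexName) {
    //Query definition
    String query = "NULL,FTINDEX_AGENT_ADMIN,NAME,S," +
        indexName + ",AGENT_INSTANCE_NAME,S,all,ACTION,S,shutdown";
    IDfQuery q = new DfQuery();
    q.setDQL(query);
    IDfCollection col = null;
    try {
        col = q.execute(sess, IDfQuery.DF_APPLY);
    } catch (DfException e) {
        e.printStackTrace();
    } finally {
        if (col != null) {
            try { col.close(); } catch (DfException ignored) { } // release the collection
        }
    }
}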
For startup, replace shutdown with start in the query definition.
Note:
• The user must have SysAdmin privileges or higher.
• The index agent instance name must be found in the indexagent.xml configuration file or the
dm_ftindex_agent_config object.
<parameter_list>
...
<parameter>
<parameter_name>index_type_mode</parameter_name>
<parameter_value>aclgroup</parameter_value>
</parameter>
</parameter_list>
</generic_indexer>
</indexer_plugin_config>
4. In the indexagent.xml for sysobjects (the original index agent), add a similar parameter set. Set the
value of parameter_name to index_type_mode, and set the value of parameter_value to sysobject.
5. Restart both index agents. (Use the scripts in indexagent_home/wildfly_version/server or the
Windows services.)
Note: For multi-instance xPlore, the temporary staging area for the index agent must be accessible
from all xPlore instances.
disable.xploreclient.timeout=true
3. Restart the index agent to turn on the feature. The xPlore client does not resubmit the same
object for indexing until the previously submitted one has been processed.
c. Specify the repository superuser name as the value of the username attribute.
d. Specify the repository superuser password as the value of the password attribute.
For example:
<emc.install dar="C:\Downloads\tempIndexAgentDefaultFilters.dar"
docbase="DSS_LH1" username="Administrator" password="password" />
• Verify filter loading in the index agent log, which is located in the logs subdirectory of the
index agent WildFly deployment directory. In the following example, the FoldersToExclude
filter was loaded:
2010-06-09 10:49:14,693 INFO FileConfigReader [http-0.0.0.0-9820-1]
Filter FoldersToExclude Value:/Temp/Jobs,
/System/Sysadmin/Reports, /System/Sysadmin/Jobs,
6. Configure the filters in the index agent UI. See Configuring index agent filters, page 74.
Troubleshooting the index agent filters
To verify that the filters are installed, use the following iAPI command:
?,c,select primary_class from dmc_module where any a_interfaces = 'com.documentum.fc.indexagent.IDfCustomIndexFilter'
com.documentum.server.impl.fulltext.indexagent.filter.defaultFolderFilterAction
com.documentum.server.impl.fulltext.indexagent.filter.defaultTypeFilterAction
Open dfc.properties in the composerheadless package. This package is installed with Content
Server at DOCUMENTUM_HOME/product/version/install/composer/ComposerHeadless, where
version is the Content Server version. The file dfc.properties is located in the subdirectory
performance reasons, you can choose to share the content storage. With shared content storage, CPS
has direct read access to the content. No content is streamed. You map the path to the file store in
the index agent web application, which performs a getpath operation.
Note: The content storage area must be unencrypted and mountable as read-only by the Index Agent
and xPlore hosts.
1. On the index agent host, open indexagent.xml, which is located in indexagent_home/wildfly_version
/server/DctmServer_Indexagent/deployments/IndexAgent.war/WEB-INF/classes. If you installed
multiple index agents on this host, an integer is appended to the IndexAgent WAR file name, for
example, IndexAgent1.war.
2. Set the path in the exporter element:
• If the file system paths to content are the same on the Content Server host and xPlore host,
change the value of the child element all_filestores_local to true.
• If the file system paths are different, add a file store map within the exporter element.
Specify the store name and local mapping for each file store. In the following example,
Content Server is on the host Dandelion and filestore_01 is on the same host at the
directory /Dandelion/Documentum/data/repo1/content_storage_01. The index agent
and xPlore server are on a separate host with a map to the Content Server host:
/mappingtoDandelion/repo1/content_storage_01. The following map is added to the exporter
element:
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>/mappingtoDandelion/repo1/content_storage_01</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>
Note: Update the file_system_path attribute of the dm_location object in the repository to match
this local_mount value, and then restart the Content Server.
3. Save indexagent.xml and restart the index agent instance.
For better performance, you can mount the content storage to the xPlore index server host and set
all_filestores_local to true. Create a local file store map as shown in the following example:
<all_filestores_local>true</all_filestores_local>
<local_filestore_map>
<local_filestore>
<store_name>filestore_01</store_name>
<local_mount>\\192.168.195.129\DCTM\data\ftwinora\content_storage_01</local_mount>
</local_filestore>
<!-- similar entry for each file store -->
</local_filestore_map>
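The following Java sketch illustrates what the map expresses. It reads a local_filestore_map fragment with the JDK DOM parser and rewrites a file path under a Content Server store root into the same relative path under the local mount. The class and method names (LocalFileStoreMap, parse, translate) are hypothetical, and translating by prefix replacement is an assumption made for illustration; the index agent performs the real mapping internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not an xPlore or DFC API.
final class LocalFileStoreMap {

    /** Reads <local_filestore> entries into a map of store_name -> local_mount. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> mounts = new LinkedHashMap<>();
            NodeList stores = doc.getElementsByTagName("local_filestore");
            for (int i = 0; i < stores.getLength(); i++) {
                Element store = (Element) stores.item(i);
                mounts.put(text(store, "store_name"), text(store, "local_mount"));
            }
            return mounts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid local_filestore_map", e);
        }
    }

    // trim() drops surrounding whitespace, such as a line break before a closing tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** Assumption: swap the Content Server store root for the local mount of that store. */
    static String translate(String serverRoot, String localMount, String serverPath) {
        if (!serverPath.startsWith(serverRoot)) {
            throw new IllegalArgumentException(serverPath + " is not under " + serverRoot);
        }
        return localMount + serverPath.substring(serverRoot.length());
    }
}
```

With the Dandelion example, translate("/Dandelion/Documentum/data/repo1/content_storage_01", "/mappingtoDandelion/repo1/content_storage_01", path) keeps the relative part of a content path and moves it under the local mount.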
You can create multiple full-text collections for a repository for the following purposes:
• Partition data
• Scale indexes for performance
• Support storage-based routing
1. Open indexagent.xml, located in the index agent WAR file in the directory
xplore_home/wildfly_version/server/DctmServer_IndexAgent/deployments/IndexAgent.war/WEB-INF/classes.
2. Add partition_config and its child elements to the element
index-agent/indexer_plugin_config/indexer to map file stores to collections.
In the following example, filestore_01 maps to the collection coll01, and filestore_02 maps to coll02. The
rest of the repository is mapped to the default collection. Each repository has one default collection
named default.
<partition_config>
<default_partition>
<collection_name>default</collection_name>
</default_partition>
<partition>
<storage_name>filestore_01</storage_name>
<collection_name>coll01</collection_name>
</partition>
<partition>
<storage_name>filestore_02</storage_name>
<collection_name>coll02</collection_name>
</partition>
</partition_config>
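The routing rule that partition_config expresses can be sketched as a lookup with a fallback: a document in a listed file store goes to that store's collection, and everything else goes to the default collection. CollectionRouter and its methods are hypothetical names used for illustration, not part of the xPlore API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of storage-based routing: storage_name -> collection_name,
// with a default collection for any storage that has no <partition> entry.
final class CollectionRouter {
    private final String defaultCollection;
    private final Map<String, String> storageToCollection = new HashMap<>();

    CollectionRouter(String defaultCollection) {
        this.defaultCollection = defaultCollection;
    }

    /** Mirrors one <partition> element. Returns this router for chaining. */
    CollectionRouter addPartition(String storageName, String collectionName) {
        storageToCollection.put(storageName, collectionName);
        return this;
    }

    /** Returns the mapped collection, or the default collection if the store is not listed. */
    String collectionFor(String storageName) {
        return storageToCollection.getOrDefault(storageName, defaultCollection);
    }
}
```

With the configuration above, new CollectionRouter("default").addPartition("filestore_01", "coll01").addPartition("filestore_02", "coll02") routes any other store to default.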
Migrating documents
• Migrating content (reindexing), page 77
• Migrating documents by object type, page 77
• Migrating a limited set of documents, page 78
Note: The parameter_list element can contain only one parameter element.
3. Stop and restart the index agent using the scripts in indexagent_home/wildfly_version/server or
using the Windows services panel.
4. Log in to the index agent UI and choose Start new reindexing operation.
5. When indexing has completed (on the Details page, no more documents in the queue), click
Stop IA.
6. Run the aclreplication script to update permissions for users and groups in xPlore. See Manually
updating security, page 56.
7. Update the indexagent.xml file to index another type or change the parameter_value to
dm_document.
Using ftintegrity
• ftintegrity output, page 80
• ftintegrity result files, page 81
• Running the state of index job, page 82
• state of index and ftintegrity arguments, page 82
ftintegrity and the state of index job (in Content Server 6.7 or higher) are used to verify indexing after
migration or normal indexing. The utility verifies all types that are registered in the dmi_registry_table
with the user dm_fulltext_index_user. The utility compares the object ID and i_vstamp between the
repository and xPlore. You can also compare metadata values; this compares the object IDs and the
specified attributes.
Run ftintegrity as the same administrator user who started the instance.
Note: ftintegrity can be very slow, because it performs a full scan of the index and content. Do not run
ftintegrity when an index agent is migrating documents.
Run the ftintegrity index verification tool after migration or restoring a federation, domain,
or collection. The tool is a standalone Java program that checks index integrity against
repository documents. It verifies all types that are registered to dmi_registry_table with the user
dm_fulltext_index_user, comparing the object ID and i_vstamp between the repository and xPlore.
Use the option -checkType to check a specific object type. Use the option -checkMetadata to check
specific single-value attributes (requires -checkType).
1. Navigate to xplore_home/setup/indexagent/tools.
2. Open the script ftintegrity_for_repositoryname.bat (Windows) or ftintegrity_for_repositoryname.sh
(Linux) for editing. Substitute the repository instance owner password in the script (replace
<password> with your password). The tool automatically resolves all parameters except the
password.
3. Optional: Add the option -checkfile to the script. The value of this parameter is the full path to
a file that contains sysobject IDs, one on each line. This option compares the i_vstamp on the
ACL and any groups in the ACL that is attached to each object in a specified list. If this option
is used with the -checkUnmaterializeLWSO, -checkType, -StartDate, or -EndDate option, those
options are not executed.
For example:
... FTStateOfIndex DSS_LH1 Administrator mypassword Config8518VM0 9300 -checkfile ...
4. Optional: Add the option -checkType to compare a specific type in the Content Server and index.
You can run the script for one type at a time. The tool checks sysobject types or subtypes. It does
not check custom types that are not subtypes of dm_sysobject.
For example:
$JAVA_HOME/bin/java ... -checkType dm_document
5. Optional: Add the option -checkMetadata at the end of the script. This argument requires a path
to a metadata.txt file that contains a list of required single-valued (not repeating) metadata fields
to check, one attribute name per line. (Create this file if it does not exist.) This option applies
only to a specific type.
ftintegrity output
Output from the script is similar to the following:
Executing stateofindex
ftintegrity is completed.
Interpreting the output:
• objects from dm_acl and dm_group: Numbers fetched from repository (docbase) and xPlore.
• match ivstamp: Objects that have been synchronized between Content Server and xPlore.
• different ivstamp: Objects that have been updated in Content Server but not yet updated in the index.
• objects in DCTM only: These objects are in the repository but not xPlore for one or more of
the following reasons:
– Objects failed indexing.
– New objects not yet indexed.
– Objects filtered out by index agent filters.
• objects in Index Server only: Objects that were deleted from the repository, but whose deletion has
not yet propagated to the index.
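To show how these categories relate to each other, the following Java sketch compares two maps of object ID to i_vstamp, one for the repository and one for the index, and assigns each ID to a category. It is a hypothetical model of the comparison described above, not the ftintegrity implementation, and it does not cover the dm_acl and dm_group counts.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the ftintegrity categories. Inputs map object ID -> i_vstamp.
final class IntegrityCompare {
    enum Category { MATCH_IVSTAMP, DIFFERENT_IVSTAMP, DCTM_ONLY, INDEX_ONLY }

    static Map<String, Category> compare(Map<String, Integer> repository,
                                         Map<String, Integer> index) {
        Map<String, Category> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : repository.entrySet()) {
            Integer indexed = index.get(e.getKey());
            if (indexed == null) {
                // Failed, not yet indexed, or filtered out by index agent filters
                result.put(e.getKey(), Category.DCTM_ONLY);
            } else if (indexed.equals(e.getValue())) {
                // Repository and index are synchronized
                result.put(e.getKey(), Category.MATCH_IVSTAMP);
            } else {
                // Updated in the repository but not yet in the index
                result.put(e.getKey(), Category.DIFFERENT_IVSTAMP);
            }
        }
        for (String id : index.keySet()) {
            if (!repository.containsKey(id)) {
                // Deleted from the repository; deletion not yet propagated
                result.put(id, Category.INDEX_ONLY);
            }
        }
        return result;
    }
}
```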
In the example, the ACLs and groups totals were identical in the repository and xPlore, so security is
updated. There are 147 objects in the repository that are not in the xPlore index. They were filtered out
by index agent filters, or they are objects in the index agent queue that have not yet been indexed.
To eliminate filtered objects from the repository count, add the usefilter argument to ftintegrity
(slows performance).
• ObjectId-indexOnly.txt
This report contains the object IDs and i_vstamp values of objects in the index but not in the
repository.
These objects were removed from the repository during or after migration, before the deletion event
updated the index.
You can input the ObjectId-common-version-mismatch.txt file into the index agent UI to see errors
for those files. After you have started the index agent, select Object file. Browse to import the file
and then click Submit. Navigate to the Refeed Tasks page to display the Refeed Tasks List. You can
view the refeed task status, pause, resume or delete tasks, and export failed ID lists or an xml file
with error messages.
In addition, the job is installed with the -queueperson and -windowinterval arguments set. The
-queueperson and -windowinterval arguments are standard arguments for administration jobs and are
explained in the Documentum Content Server Administration and Configuration Guide.
Note:
• To get a list of objects that failed indexing, run ftintegrity or the state of index Content Server job.
See Using ftintegrity, page 79. Remove all data from the file ObjectId-common-version-mismatch.txt
except the object IDs.
• To copy an object ID list to the index agent server, create a text file that contains the object IDs.
Save it as ids.txt in the WEB-INF/classes directory of
xplore_home/wildfly_version/server/DctmServer_IndexAgent/deployments/IndexAgent.war/.
(Specify the actual path to your index agent web application.) For more details, see the Setting
startup with a list of file IDs topic in Silent index agent startup, page 67 of the Documentum xPlore
Installation Guide.
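You can script the preparation of an ID list from a report. The Java sketch below keeps the first object ID on each line, removes duplicates while preserving order, and writes one ID per line. It assumes that object IDs appear as 16 hexadecimal characters and that one-ID-per-line output is acceptable; check the layout of your report file before relying on it. ObjectIdExtractor is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: strips a report down to object IDs, one per line.
final class ObjectIdExtractor {
    // Assumption: an object ID is a standalone run of 16 hexadecimal characters.
    private static final Pattern OBJECT_ID = Pattern.compile("\\b[0-9a-fA-F]{16}\\b");

    static List<String> extract(List<String> lines) {
        Set<String> ids = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String line : lines) {
            Matcher m = OBJECT_ID.matcher(line);
            if (m.find()) {
                ids.add(m.group()); // only the first ID on each line is kept
            }
        }
        return new ArrayList<>(ids);
    }

    /** Reads a report file and writes the extracted IDs to the target file (for example, ids.txt). */
    static void writeIdFile(Path report, Path target) throws IOException {
        List<String> lines = Files.readAllLines(report, StandardCharsets.UTF_8);
        Files.write(target, extract(lines), StandardCharsets.UTF_8);
    }
}
```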
To submit a DQL or file of object IDs from the Index Agent UI, do the following:
1. Start the index agent in normal mode. You will see a page that allows you to input a selected
list of objects for indexing.
2. Select one of these options:
• Click DQL and select the type. You can include all versions and add a clause if necessary.
To preview the DQL result, click Preview.
• Click Object File and browse to import a file.
3. Click Submit.
4. Navigate to the Refeed Tasks page to display the Refeed Tasks List. You can view the refeed task
status, pause, resume or delete tasks, and export failed ID lists or an xml file with error messages.
Completed tasks older than 60 days are deleted.
You can view summary and detailed error and warning messages raised by document indexing failures
by running the following reports under Diagnostic and Utilities:
• IA Message Summary Per Hour: Displays the number of error messages and warning messages
raised by document indexing failures during each hour on a specified date.
• IA Message Summary Per Day: Displays the number of error messages and warning messages
raised by document indexing failures during each day in a specified month.
• IA Message Summary Per Month: Displays the number of error messages and warning messages
raised by document indexing failures during each month of a specified year.
• IA Message Record: Displays detailed error and/or warning messages raised by document indexing
failures during a specified period of time.
After you configure this setting, restart the index agent for the change to take effect:
After logging in to the Index Agent Administration UI (default: http://host:9200/IndexAgent), click
Stop IA; then select Start Index Agent in Normal Mode and click Submit.
<parameter_value>true</parameter_value>
</parameter>
</parameter_list >
...
</indexagent_instance>
After you configure this setting, restart the index agent for the change to take effect:
After logging in to the Index Agent Administration UI (default:
http://host:9200/IndexAgent/login_dss.jsp), click Stop IA; then select Start Index Agent in Normal
Mode and click Submit.
If you make this change after indexing, reindex objects to make the metadata non-searchable.
Documentum object types can be marked as non-indexed in Documentum Administrator. See Making
types non-indexable, page 88.
try
{
    domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
}
catch (ParserConfigurationException e)
{
    throw new DfException(e);
}
IDfCollection childRelations = getChildRelatives("dm_annotation");
try
{
    while (childRelations.next())
    {
        Element annotationNode = document.createElement("annotation");
        mediaAnnotations.appendChild(annotationNode);
        try
        {
            IDfId id = childRelations.getTypedObject().getId("child_id");
            // Get the dm_note object. IDfSysObject works for any sysobject type,
            // so the cast does not depend on dm_note being a dm_document subtype.
            IDfSysObject note = (IDfSysObject) getSession().getObject(id);
            ByteArrayInputStream xmlContent = note.getContent();
            Document doc = domBuilder.parse(xmlContent);
            // Populate annotationNode from doc here
        }
        catch (IOException | SAXException e)
        {
            // Log the error and continue with the next annotation
        }
    }
}
finally
{
    // Release the collection even if an exception is thrown
    childRelations.close();
}
}}
Generated dftxml
<dmftdoc>
...
<dmftcustom>
<mediaAnnotations>
<annotation>
<content>
This is my first note
</content>
<author>Marc</author>
</annotation>
<annotation>
<content>
This is my second note
</content>
<author>Marc</author>
</annotation>
</mediaAnnotations>
</dmftcustom>
</dmftdoc>
if ("dm_note".equals(objectTypeName))
{
return DfCustomIndexFilterAction.SKIP;
}
return DfCustomIndexFilterAction.INDEX;
}
}
Warning Metadata indexed but not content Metadata indexed but not content
retrieve,c,dm_ftindex_agent_config
set,c,l,runaway_item_timeout
<value>
save,c,l
Note: Increasing the timeout values can cause an out-of-memory error. It is recommended that you
test the system under load to make sure your changes are safe.
If task_state is done, the message is "Successful batch..." If the task_state is failed, the message is
"Incomplete batch..."
To resubmit one document for reindexing
Put the object ID into a temporary text file. Use the index agent UI to submit the upload: Select
the Object File option.
To remove queue items from reindexing
Navigate to the Refeed Tasks page in the index agent UI. For earlier xPlore versions, use the following
DQL. For username, specify the user who logged in to the index agent UI and started reindexing.
?,c,delete dmi_queue_item object where name='username' and event='FT re-index'
Indexing is slow
The bottleneck can be in the RDBMS, index agent, or xPlore.
• If a simple SQL query like count(*) from the sysobject table takes a long time, the RDBMS is slow.
Try a restart or run update statistics.
• If the index agent log shows CONNECTOR_PAUSED or EXPORTER_PAUSED, the problem
is in the index agent.
• If the index agent log shows INDEXER_PAUSED, the problem is in xPlore.
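You can automate the two log checks above with a simple scan. This Java sketch looks for the pause markers named above in index agent log lines. The marker strings come from this guide, but the log line layout and the returned labels are assumptions, and checking the RDBMS still requires timing a query.

```java
import java.util.List;

// Hypothetical triage helper: reports which component paused, based on log markers.
final class IndexingBottleneck {
    static String diagnose(List<String> logLines) {
        boolean indexAgent = false;
        boolean xplore = false;
        for (String line : logLines) {
            if (line.contains("CONNECTOR_PAUSED") || line.contains("EXPORTER_PAUSED")) {
                indexAgent = true; // problem is in the index agent
            }
            if (line.contains("INDEXER_PAUSED")) {
                xplore = true;     // problem is in xPlore
            }
        }
        if (indexAgent && xplore) {
            return "index agent and xPlore";
        }
        if (indexAgent) {
            return "index agent";
        }
        if (xplore) {
            return "xPlore";
        }
        return "no pause markers found";
    }
}
```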
error_code            Description
UNSUPPORTED_DOCUMENT  Unsupported format
XML_ERROR             XML parsing error for document content
DATA_NOT_AVAILABLE    No information available
PASSWORD_PROTECTED    Password protected or document encrypted
MISSING_DOCUMENT      RTS routing error
By default, if the xPlore server is down (CONNECTION FAILURE error), indexing and data ingestion
stop after the specified number of errors occurs within the specified time period. In this case, the
status of the index agent connector and indexer threads is displayed as “finished”. When the problem
is solved, use the index agent UI to stop and restart the index agent.
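The stop rule is a count of errors within a time period. A minimal Java sketch of such a rule follows. The names maxErrors and windowMillis are hypothetical and do not correspond to named index agent settings, and measuring the period as a sliding window is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical "N errors within a time window" rule, for illustration only.
final class ErrorThreshold {
    private final int maxErrors;
    private final long windowMillis;
    private final Deque<Long> errorTimes = new ArrayDeque<>();

    ErrorThreshold(int maxErrors, long windowMillis) {
        this.maxErrors = maxErrors;
        this.windowMillis = windowMillis;
    }

    /** Records an error at nowMillis and returns true if ingestion should stop. */
    boolean recordError(long nowMillis) {
        errorTimes.addLast(nowMillis);
        // Discard errors older than the window that ends at nowMillis.
        while (!errorTimes.isEmpty() && nowMillis - errorTimes.peekFirst() >= windowMillis) {
            errorTimes.removeFirst();
        }
        return errorTimes.size() >= maxErrors;
    }
}
```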
DM_SYSOBJECT_E_CANT_SAVE_NO_LINK ERROR
The error in the index agent log is “Cannot save xxx sysobject without any link.” Possible causes are:
• The index agent configurator failed to retrieve full-text repository objects.
• The index agent installation user does not have a default folder defined in the repository, or the
folder no longer exists.
To verify, dump the user with the following iAPI commands. Substitute the installation owner name.
retrieve,c,dm_user where user_name='installation_owner'
get,c,l,default_folder
progress, Awaiting indexing, Warning, and All. From the Indexing failed display, you can find the
object ID and type, and the type of failure. Some types of errors are the following:
• [DM_FULLTEXT_E_SEARCH_GET_ROW_FAIL...] Caused by incorrect query plugin
• [DM_FULLTEXT_E_QUERY_IS_NOT_FTDQL...] Caused by incorrect query plugin
• [DM_FULLTEXT_E_EXEC_XQUERY_FAIL...] There is nothing in the index.
To count queue items by task state when there is a large queue, use the following DQL command in
Documentum Administrator:
?,c,select count(*), task_state from dmi_queue_item where name like '%fulltext%' group by task_state
To check the indexing status of a single object, get the queue item ID for the document in the details
screen of the index agent UI. Use the following DQL to check the status of the queue item:
?,c,select task_name,item_id,task_state,message from dmi_queue_item where name='username' and event='FT re-index'
To check registered types and the full-text user name, use the following iAPI command.
?,c,select distinct t.name, t.r_object_id, i.user_name from dm_type t,
dmi_registry i where t.r_object_id = i.registered_id and i.user_name like
'%fulltext%'
Enable connections between the index agent host, the Content Server, and xPlore through the firewall.
Startup problems
Make sure that the index agent web application is running. On Windows, verify that the Documentum
Indexagent service is running. On Linux, verify that you have instantiated the index agent using the
start script in xplore_home/wildfly_version/server.
Make sure that the user who starts the index agent has permission in the repository to read all content
that is indexed.
If the repository name is reported as null, restart the repository and the connection broker and try again.
If you see a status 500 on the index agent UI, examine the stack trace for the index agent instance. If a
custom routing class cannot be resolved, this error appears in the browser:
org.apache.jasper.JasperException: An exception occurred processing JSP page
/action_dss.jsp at line 39
...
root cause
com.emc.documentum.core.fulltext.common.IndexServerRuntimeException:
com.emc.documentum.core.fulltext.client.index.FtFeederException:
Error while instantiating collection routing custom class...
If the index agent web application starts with port conflicts, stop the index agent with the script. If you
run a stop script, run as the same administrator user who started the instance. The index agent locks
several ports, and they are not released by closing the command window.
100 OpenText Documentum xPlore Version 1.6 Administration and Development Guide
Chapter 5
Document Processing (CPS)
About CPS
The content processing service (CPS) performs the following functions:
• Retrieves indexable content from content sources
• Determines the document format and primary language
• Parses the content into index tokens that xPlore can process into full-text indexes
If you test Documentum indexing before performing migration, first replicate security. See Manually
updating security, page 56.
For information on customizations to the CPS pipeline, see Custom content processing, page 132.
Language identification
Some languages have been tested in xPlore. Many other languages can be indexed. Some languages
are identified fully including parts of speech, and others require an exact match. For a list of languages
that CPS detects, see Basistech documentation. If a language is not listed as one of the tested languages
in the xPlore release notes, search must be for an exact match. For tested languages, linguistic features
and variations that are specific to these languages are identified, improving the quality of the search
experience.
White space
White space such as a space separator or line feed identifies word separation. Then, special characters
are substituted with white space. See Handling special characters, page 118.
For Asian languages, white space is not used. Entity recognition and logical fragments guide the
tokenization of content.
Case sensitivity
All characters are stored as lowercase in the index. For example, the phrase "I'm runNiNg iN THE
Rain" is lemmatized and tokenized as "i be run in the rain."
There is a limited effect of case on lemmatization. In some languages, a word can have different
meanings and thus different lemmas depending on the case.
Case sensitivity is not configurable.
6. In the Add Service window, select Remote and provide information about the remote CPS instance
you are adding.
a. Enter the URL to the remote instance using the following syntax:
http://hostname:port/cps/ContentProcessingService?wsdl
b. From the Instance list, select an instance you want to add the CPS to.
c. From the Usage list, specify whether the CPS instance processes indexing requests (the index
option), search requests (the search option), or both (the all option).
d. Click OK.
Note: The added CPS takes effect immediately.
7. Specify whether the CPS instance performs linguistic processing (lp) or text extraction (te). If a
value is not specified, TE and LP are sent to CPS as a single request.
a. In indexserverconfig.xml, locate the content-processing-services element. This element
identifies each CPS instance. The element is added when you install and configure a new
CPS instance.
b. Add or change the capacity attribute on this element. The capacity attribute determines whether
the CPS instance performs text extraction, linguistic processing, or all. In the following
example, a local CPS instance analyzes linguistics, and the remote CPS instance processes
text extraction.
<content-processing-services context-characters="!,.;?'&quot;" special-characters="@#$%^_~`*&amp;:()-+=&lt;>/\[]{}">
<content-processing-service capacity="lp" usage="all" url="local"/>
<content-processing-service capacity="te" usage="index" url="http://myhost:9700/cps/ContentProcessingService?wsdl"/>
</content-processing-services>
8. Wait to ensure the change has been applied to all xPlore instances.
9. Test the remote CPS service using the WSDL testing page, with the following syntax:
http://hostname:port/cps/ContentProcessingService?wsdl
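To confirm that a content-processing-services element is well formed and that its escaped characters (&quot;, &amp;, &lt;) decode as intended, you can parse it with the JDK DOM parser. The Java sketch below (CpsConfigReader is a hypothetical name) returns the URLs of the CPS instances that handle a given capacity. Following step 7, it treats a missing capacity as handling both TE and LP; treating the value all the same way is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader for the content-processing-services element; not an xPlore API.
final class CpsConfigReader {
    /** Parses the XML and returns its root element; fails on malformed XML. */
    static Element root(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    /** Returns the url of each CPS whose capacity is the requested one, missing, or "all". */
    static List<String> urlsFor(Element services, String capacity) {
        List<String> urls = new ArrayList<>();
        NodeList list = services.getElementsByTagName("content-processing-service");
        for (int i = 0; i < list.getLength(); i++) {
            Element cps = (Element) list.item(i);
            String cap = cps.getAttribute("capacity"); // "" when the attribute is absent
            if (cap.isEmpty() || cap.equals("all") || cap.equals(capacity)) {
                urls.add(cps.getAttribute("url"));
            }
        }
        return urls;
    }
}
```

Parsing the example element above, urlsFor(root, "te") returns only the remote URL, and urlsFor(root, "lp") returns local.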
After you install and register the remote instance, you see it in the Content Processing Service UI of
xPlore administrator. You can check the status and see version information and statistics.
Check the CPS daemon log file cps_daemon.log for processing event messages. For a local
process, the log is in xplore_home/wildfly_version/server/DctmServer_NodeInstanceName/logs.
For a remote CPS instance, cps_daemon.log is located in
cps_home/wildfly_version/server/DctmServer_RemoteCPSInstanceName/logs. If a CPS instance is
configured to process text only, TE is logged in the message. For linguistic processing, LP is logged.
Note: An xPlore instance must have at least one CPS configured for it. If an xPlore instance has only
one CPS, either local or remote, you cannot disable or remove it.
Administering CPS
Configuring, starting, and stopping CPS
You can configure CPS properties and stop and start CPS in xPlore administrator.
To operate on remote CPS, expand Services, expand Content Processing Service.
Remote CPS services are displayed under Content Processing Service, for example,
http://host:port/cps/ContentProcessingService?wsdl.
To operate on local CPS, expand Instances, expand an xPlore node, and then expand Content
Processing Service. The local service is displayed under Content Processing Service.
1. Configure CPS properties: Select a remote or local service and click Configuration in the upper-right
corner. The CPS Configuration window appears. Modify properties in this window.
2. Stop CPS: Select a remote or local service, click Stop CPS in the upper-right corner, and then click
Suspend.
3. Start CPS: Select a remote or local service, click Start CPS, and then click Resume.
Select an instance in the xPlore administrator tree, expand it, and choose Content Processing
Service. Click Start CPS and then click Resume.
If CPS crashes or malfunctions, the CPS manager tries to restart it to continue processing. If the restart
fails, check the corresponding CPS configuration file to ensure the configuration is appropriate.
• To avoid query timeouts when the indexing load is high, enable search on a dedicated CPS daemon:
1. Edit the CPS configuration file NodeInstanceName_local_configuration.xml or
RemoteCPSInstanceName_configuration.xml located in xplore_home/dsearch/cps/cps_daemon.
2. Edit the query_dedicated_daemon_count property and set the number of daemons dedicated to
searches. Default: 1.
are restricted to an exact match. For the list of identified languages and encodings for each language,
see the Basistech documentation.
<search-config><properties>
<property value="en" name="query-default-locale"/>...
The query locale can be overridden by setting the property dsearch_override_locale in
dm_ftengine_config.
4. Change the metadata that is used for language identification. For each attribute, add an
element-for-language-identification element and set its name attribute to the attribute name. For example:
<linguistic-process>
<element-for-language-identification name="object_name"/>
<element-for-language-identification name="title"/>
<element-for-language-identification name="subject"/>
<element-for-language-identification name="keywords"/>
</linguistic-process>
5. Validate your changes to indexserverconfig.xml. See Modifying indexserverconfig.xml, page 47.
6. (Optional) Check the identified language for a document: Use xPlore administrator to view the
dftxml of a document. Click the document in the collection view, under Data Management. The
language is specified in the lang attribute on the dmftcontentref element. For example:
query-locale=en
8. (Optional) Change the session locale of a query. The session_locale attribute on a Documentum
object is automatically set based on the OS environment. To search for documents in a different
language, change the locale per session in DFC or iAPI. The iAPI command to change the
session_locale:
set,c,sessionconfig,session_locale
The DFC command to set session locale on the session config object (IDfSession.getSessionConfig):
IDfTypedObject.setString("session_locale", locale)
<property value="language_code_1[weight_1],language_code_2[weight_2],…,language_code_n[weight_n]"
name="query-language-detection-boost-language-list"/>
The weight in [] is an integer indicating the likelihood of a language being chosen as the query
language. The higher the weight, the more likely the corresponding language is chosen. For example, in the
following setting, both Chinese and Japanese are boosted during query language detection and
Chinese is more likely to be chosen as the query language.
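The value format described above is a comma-separated list of language codes, each followed by an integer weight in square brackets. The following Java sketch shows one way to parse it. BoostList is a hypothetical name. Returning the single highest-weight language is a simplification, because in xPlore a higher weight only makes a language more likely to be chosen.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for values like "code1[weight1],code2[weight2]"; not xPlore's own parser.
final class BoostList {
    private static final Pattern ENTRY = Pattern.compile("\\s*([A-Za-z_-]+)\\[(\\d+)\\]\\s*");

    static Map<String, Integer> parse(String value) {
        Map<String, Integer> weights = new LinkedHashMap<>(); // keeps list order
        for (String part : value.split(",")) {
            Matcher m = ENTRY.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("Bad entry: " + part);
            }
            weights.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return weights;
    }

    /** Returns the language with the highest weight; on ties, the earlier entry wins. */
    static String strongest(Map<String, Integer> weights) {
        String best = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (best == null || e.getValue() > weights.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

For example, with the made-up value zh[3],ja[2], parse returns zh with weight 3 and ja with weight 2, and strongest returns zh.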
Handling apostrophes
Some languages have more apostrophes as part of a name or other part of speech. The default list
of special context characters includes the apostrophe. Apostrophes in words are treated as white
space. You can remove the apostrophe from the list if words are not correctly found on search. See
Handling special characters, page 118.
Indexable formats
Some formats are fully indexed. For some formats, only the metadata is indexed. For a full list of
supported formats, see Oracle Outside In documentation.
If a format cannot be identified, it is listed in the xPlore administrator report Document Processing
Error Detail. Choose Unsupported file formats to see the list.
Lemmatization
• About lemmatization, page 112
• Configuring indexing lemmatization, page 113
• Lemmatizing specific types or attributes, page 114
• Troubleshooting lemmatization, page 115
• Saving lemmatization tokens, page 116
About lemmatization
Lemmatization is a normalization process that reduces a word to its canonical form. For example, a
word like books is normalized into book by removing the plural marker. Am, are, and is are normalized
to "be." This behavior contrasts with stemming, a different normalization process in which stemmed
words are reduced to a string that sometimes is not a valid word. For example, ponies becomes poni.
xPlore uses an indexing analyzer that performs lemmatization. Studies have found that some form
of stemming or lemmatization is almost always helpful in search.
Lemmatization is applied to indexed documents and to queries. Lemmatization analyzes a word for
its context (part of speech), and the canonical form of a word (lemma) is indexed. The extracted
lemmas are actual words.
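A toy Java example can show the difference between the two approaches. Suffix stripping turns ponies into poni, while a dictionary lookup returns the actual word pony. The suffix rules and the five-entry lemma dictionary are invented for illustration. Unlike xPlore's lemmatizer, this toy does not analyze part of speech.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy contrast between stemming and lemmatization; rules and dictionary are illustrative only.
final class NormalizationDemo {
    private static final Map<String, String> LEMMAS = new HashMap<>();
    static {
        LEMMAS.put("books", "book");
        LEMMAS.put("ponies", "pony");
        LEMMAS.put("am", "be");
        LEMMAS.put("are", "be");
        LEMMAS.put("is", "be");
    }

    /** Naive suffix stripping: the result is not always a real word. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2);  // ponies -> poni
        }
        if (w.endsWith("s") && w.length() > 3) {
            return w.substring(0, w.length() - 1);  // books -> book
        }
        return w;
    }

    /** Dictionary lookup: returns the lemma, or the lowercased word if it is unknown. */
    static String lemmatize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(w, w);
    }
}
```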
Alternate lemmas
Alternative forms of a lemma are also saved. For example, swim is identified as a verb. The noun
lemma swimming is also saved. A document that contains swimming is found on a search for swim.
If you turn off alternate lemmas, you see variable results depending on the context of a word. For
example, saw is lemmatized to see or to saw depending on the context. See Configuring indexing
lemmatization, page 113.
Query lemmatization
Lemmatization of queries is more prone to error because less context is available in comparison to
indexing.
The following queries are lemmatized:
• IDfXQuery: The with stemming option is included.
• The query from the client application contains a wildcard.
• The query is built with the DFC search service.
• The DQL query has a search document contains (SDC) clause (except phrases). For example,
the query select r_object_id from dm_document search document contains ‘companies winning’
produces the following tokens: companies, company, winning, and win.
<search-config>
<properties>
<property value="en" name="query-default-locale"/>
...
<property value="true" name="query-exact-phrase-match"/>
</properties>
</search-config>
3. Restart the primary and secondary xPlore instances.
4. Rebuild the index using xPlore administrator.
Element              Description
element-with-name    The name attribute on this element specifies the name of an element that
                     contains lemmatizable content.
In the following example, the content of an element with the attribute dmfttype with a value of
dmstring is lemmatized. These elements are in a dftxml file that the index agent generates. For the
dftxml extensible DTD, see Extensible Documentum DTD, page 370.
If the extracted text is less than 262144 bytes (extract-text-size-less-than), the specified element is
processed. In the example, an element with the name dmftcustom is processed. Several elements are
specified for language identification.
<linguistic-process>
<element-with-attribute name="dmfttype" value="dmstring"/>
<element-with-name name="dmftcustom">
<save-tokens-for-summary-processing extract-text-size-less-than="262144" token-size="65536"/>
</element-with-name>
<element-for-language-identification name="object_name"/> ...
</linguistic-process>
Note: If you want to apply your lemmatization changes to the existing index, reindex your documents
from the index agent.
Troubleshooting lemmatization
If a query does not return expected results, examine the following:
• Test the query phrase or terms for lemmatization and compare the result to the lemmatization in the
context of the document. (You can test each sample using xPlore administrator Test Tokenization.)
• View the query tokens by setting the dsearch logger level to DEBUG using xPlore administrator.
Expand Services > Logging and click Configuration. Set the log level for dsearch-search. Tokens
are saved in dsearch.log.
• Check whether some parts of the input were not tokenized because they were excluded from
lemmatization, for example, because the text size exceeds the configured value of the extract-text-size-less-than attribute.
• Check whether a subpath excludes the dftxml element from search (the sub-path attribute
full-text-search is set to false).
• If you have configured a collection to save tokens, you can view them in the xDB admin tool. (See
Debugging queries, page 277.) Token files are generated under the Tokens library, located at the
same level as the Data library. If save-tokens-for-summary-processing is enabled for one element,
you can also view tokens in the stored dftxml using xPlore administrator. The number of tokens
stored in the dftxml depends on the configured amount of tokens to save. To see the dftxml, click
a document in a collection.
3. You can view the saved tokens in the xDB tokens database. Open the xDB admin tool in
xplore_home/dsearch/xhive/admin.
Rules format
Disambiguation rules are based on the concepts of original form and root form of terms. The original
form of a term can have variants, whereas the root form is the basic linguistic unit, or lemma. For
example, run, runs, ran, and running are variants of the root form run.
A disambiguation rule starts with a "+" or "-" rule sign, followed by one original form, and then by one
or more root forms. If the rule sign is "+", root forms (lemmas) are added to the original form. If it is
"-", root forms are removed from the original form.
The rule is applied to the original form, which is the first word after the rule sign. The subsequent words are the
root forms that you want to add or remove. In the rules file, a line starting with "#" is treated as
a comment line.
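To illustrate the rule format, the following sketch applies "+" and "-" rules to an in-memory map from original forms to root forms. It assumes that the rule sign, the original form, and the root forms are separated by white space. The DisambiguationRules class is hypothetical and does not reflect how CPS loads or applies the rules file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of disambiguation rules. The whitespace-separated
// layout of each rule line is an assumption, not a documented file format.
final class DisambiguationRules {
    // Applies rule lines to a map of original form -> root forms (lemmas).
    // The input map is copied, not modified.
    static Map<String, Set<String>> apply(Map<String, Set<String>> lemmas, List<String> ruleLines) {
        Map<String, Set<String>> result = new HashMap<>();
        lemmas.forEach((original, roots) -> result.put(original, new TreeSet<>(roots)));
        for (String raw : ruleLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment line
            }
            char sign = line.charAt(0);
            String[] words = line.substring(1).trim().split("\\s+");
            if ((sign != '+' && sign != '-') || words.length < 2) {
                throw new IllegalArgumentException("Malformed rule: " + raw);
            }
            String original = words[0]; // first word after the rule sign
            List<String> roots = Arrays.asList(words).subList(1, words.length);
            Set<String> current = result.computeIfAbsent(original, k -> new TreeSet<>());
            if (sign == '+') {
                current.addAll(roots);    // add root forms
            } else {
                current.removeAll(roots); // remove root forms
            }
        }
        return result;
    }
}
```

For example, suppose the original form saw starts with the root forms saw and see. The rule "- saw saw" then leaves only see, and the rule "+ ran run" adds run as a root form of ran.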
You can also index both forms, the original form and the normalized form, to allow wildcard
searches. In this case, normalize_form must be set to true.
To prevent diacritics from being normalized, do the following from xPlore administrator >
Services > Content Processing service > CPS plugin configuration:
1. Change diacritics removal in the CPS configuration file NodeInstanceName_local_configuration.xml
or RemoteCPSInstanceName_configuration.xml located in xplore_home/dsearch/cps/cps_daemon.
2. Locate the element linguistic_processing/properties/property and set the value of normalize_form
to false:
<property name="normalize_form">false</property>
For example, the following setting denotes that diacritics will not be normalized:
<linguistic_processor>
<name>RLP</name>
<type>native</type>
<lib_path>e:/xPlore/dsearch/cps/cps_daemon/bin/linguistic_processor.dll</lib_path>
<properties>
<property name="max_data_per_process">31457280</property>
<property name="config_file">e:/xPlore/dsearch/cps/cps_daemon/cps_context.xml</property>
<property name="normalize_form">false</property>
<property name="skip_components_lang_list">de</property>
<property name="force_tokenize_on_whitespace">true</property>
<property name="keep_accents_in_original_form">true</property>
</properties>
<languages>en,en_uc,zh,zh_sc,zh_tc,ja,ko,de,fr,ru,ar,nl,cs,hu,es,it,pt</languages>
</linguistic_processor>
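To make the effect of normalize_form concrete, the following sketch removes diacritics with the standard java.text.Normalizer class. It applies canonical decomposition (NFD) and then drops the combining marks. This technique is swapped in for illustration only and is not the algorithm used by the RLP linguistic processor. The indexTerms method models an assumption: false keeps only the original form, and true indexes both the original and the normalized form.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: java.text.Normalizer shows what diacritics normalization
// means. It is not the algorithm of the CPS linguistic processor.
final class DiacriticsDemo {
    // Decompose accented characters (NFD), then drop the combining marks.
    static String removeDiacritics(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Assumption: normalize_form=false keeps only the original form, and
    // normalize_form=true indexes the original and the normalized form.
    static Set<String> indexTerms(String token, boolean normalizeForm) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(token);
        if (normalizeForm) {
            terms.add(removeDiacritics(token)); // set avoids a duplicate if unchanged
        }
        return terms;
    }
}
```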
Note: The special characters list must contain only characters from the first 65,536 Unicode code points (U+0000 through U+FFFF).
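One way to check a special characters list against this restriction is to test every code point. The following helper is hypothetical and uses only the Java standard library. Java encodes a character outside the first 65,536 code points, such as an emoji, as a surrogate pair, so String.codePoints() reports it as a code point above 0xFFFF.

```java
// Hypothetical helper: true if every character of a special characters list is
// within the first 65,536 Unicode code points (U+0000 to U+FFFF).
final class SpecialCharsCheck {
    static boolean allInFirst65536(String specialChars) {
        // codePoints() joins surrogate pairs, so supplementary characters
        // appear as values above 0xFFFF and fail the check.
        return specialChars.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }
}
```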
For example, a phrase extract-text is tokenized as extract and text, and a search for either term finds the
document.
will filter stop words in the phrase by default. But if query-exact-phrase-match is enabled, the stop
words are kept. When stop words are filtered, their positions are preserved as a gap in the query,
which means that "the quick brown fox jumps over the lazy dog" is treated the same as "quick brown fox jump
xxx xxx lazy dog". Therefore, there must be exactly two positions between "jump" and "lazy".
A query with the constraint
(. ftcontains "quick brown fox jumps lazy dog" without stemming)
returns only documents containing the string “quick brown fox jumps lazy dog” and does not filter
stop words.
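The following sketch illustrates the position gap. It assumes a whitespace tokenizer and a stop-word list that contains only "the" and "over". Real CPS tokenization and stop-word lists are language-specific. Filtered words are dropped, but the remaining tokens keep their original positions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch of stop-word filtering that keeps positions. The whitespace tokenizer
// and the two-word stop list are assumptions for this example only.
final class StopWordPositions {
    static final Set<String> STOP_WORDS = Set.of("the", "over");

    // Returns each kept token with its original position (last occurrence wins).
    static Map<String, Integer> positions(String phrase) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        String[] tokens = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!STOP_WORDS.contains(tokens[pos])) {
                kept.put(tokens[pos], pos); // positions still count filtered words
            }
        }
        return kept;
    }
}
```

In this sketch, jumps is at position 4 and lazy is at position 7. The two positions between them belong to the filtered words "over" and "the", which correspond to the "xxx xxx" placeholders in the example above.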
are generated for this phrase. Every list is composed of two or more single characters as shown below:
This feature makes documents containing Chinese, Japanese, or Korean terms more likely to match.
Moreover, any adjacent characters are searchable regardless of whether the characters are located in the
same component list. For example, a query for 华人 also returns documents containing the
phrase 中华人民共和国.
We recommend that you configure a higher score for original forms by increasing the value of
query-original-term-weight, which makes documents containing original tokens rank higher.
Note: This feature requires more disk space to store index data as more components are generated.
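The effect of adjacent-character components can be approximated with overlapping character pairs. The following sketch is a simplification for illustration only and is not the CPS component decomposition algorithm. It generates every two-character component of a string, which is enough to show why a query for 华人 can match 中华人民共和国.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration: every pair of adjacent characters becomes a component.
// This is not the CPS component decomposition for Chinese, Japanese, or Korean.
final class CjkComponents {
    static List<String> adjacentPairs(String text) {
        int[] cps = text.codePoints().toArray(); // code points, not UTF-16 units
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            pairs.add(new String(cps, i, 2));
        }
        return pairs;
    }
}
```

For 中华人民共和国, the sketch yields 中华, 华人, 人民, 民共, 共和, and 和国. It also shows why more index data is stored: a string of n characters produces n - 1 extra components.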
Log levels
Log levels, in order of decreasing amount of information logged, are trace, debug, info, warn, and error. Set the log
level to INFO to troubleshoot CPS.
Log output
Each CPS request is logged with the prefix DAEMON#. The following library prefixes appear
in CPS logging:
• CPS daemon: CORE
• Text extraction: TE STELLENT
• HTTP content retrieval: CF_HTTP
• Language processing: LP
• Language identification: LI_RLI
The following is an example from cps.daemon.log. (The remote CPS log is named cps_manager.log.)
2012-03-28 10:28:41,828 INFO [Daemon0(4496)-TE-Stellent-(136)]
identify_file_normally configured: true, actual: true
...
2012-03-28 11:16:37,204 WARN [Daemon0(4496)-Core-(3448)] LP of lpReq 0
of sub-request 4 of req 32 of doc 080023a38000151f based on fallback lang en,
encoding utf-16le
2012-03-28 11:16:36,829 WARN [Daemon0(4496)-LI-RLI-(5464)] No language matched.
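When you scan large CPS logs, it can help to split each entry into timestamp, level, component, and message. The following sketch assumes the layout of the sample lines above, with thread names of the form DaemonN(pid)-Component-(tid). That layout may differ in other versions, so treat the regular expression as an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits a CPS log line into timestamp, level, component, and message.
// The pattern is inferred from the sample lines above and is an assumption.
final class CpsLogLine {
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+"
            + "\\[Daemon\\d+\\(\\d+\\)-(.+?)-\\(\\d+\\)\\]\\s*(.*)$");

    // Returns {timestamp, level, component, message}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }
}
```

For example, you can filter for level WARN and component LI-RLI to isolate language identification messages such as "No language matched."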
Testing tokenization
Test the tokenization of a word or phrase to see what is indexed. Expand Diagnostic and Utilities in
the xPlore administrator tree and then choose Test tokenization. Input the text and select the language.
Different tokenization rules are applied for each language. (Only languages that have been tested are
listed. See the release notes for supported languages. Other languages are not tokenized.)
Uppercase characters are rendered as lowercase. White space replaces special characters.
The results table displays the original input words. The root form is the token used for the index. The
Start and End offsets display the position in raw input. Components are displayed for languages that
support component decomposition, such as German.
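The following sketch approximates the visible behavior of Test tokenization for a language without special handling. Special characters act as separators, tokens are lowercased, and start and end offsets refer to the raw input. It is an assumption-based illustration, not the CPS tokenizer, which applies language-specific rules and produces root forms. Whether the End offset in xPlore administrator is exclusive is not stated here; the sketch uses an exclusive end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough approximation of Test tokenization output; the real CPS tokenizer
// is language-specific and also reports root forms and components.
final class TokenizationDemo {
    // Letters and digits form tokens; every other character acts as white space.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Each entry is "token[start,end)", with offsets into the raw input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT); // uppercase -> lowercase
            tokens.add(token + "[" + m.start() + "," + m.end() + ")");
        }
        return tokens;
    }
}
```

For the input Extract-Text, the sketch produces extract[0,7) and text[8,12).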
Results can differ from tokenization of a full document for the following reasons:
• The document language that is identified during indexing does not match the language that is
identified from the test.
• The context of the indexed document does not match the context of the test text.
Use the executable CASample in
xplore_home/dsearch/cps/cps_daemon/shared_libraries/stellent_text_extractor to test the
processing of a file. Syntax:
casample path_to_input_file
Another possible cause is an unsupported OS. Verify the supported OS for your version of xPlore.
If CPS fails to start, the CPS configuration may be invalid. Check to see whether you have changed the
file NodeInstanceName_local_configuration.xml or RemoteCPSInstanceName_configuration.xml in
xplore_home/dsearch/cps/cps_daemon.
This failure occurs because the index rebuilding module dumps large texts to the export_path folder of the first CPS
instance. Therefore, if multiple CPS instances can be accessed by the xPlore instance,
make sure that all of those CPS instances can access the export_path of the first CPS instance, or set up a shared folder as the value of
rebuild-index-temp-path in indexserverconfig.xml.
File corrupted
If there are processing errors for the file, they will be displayed after the processing statistics. A corrupt
file returns the following error. The XML element that contains the error is displayed:
*** Error: file is corrupt in element_name.
A file with bad content can also return the error message Served data invalid.
Check a non-XML file with the casample utility to see whether it is corrupted. See CPS troubleshooting
methods, page 122. Check the list of supported formats for the format utility.
• Is indexing enabled for the object type? Documents are not indexed if the document type is not
registered or is not a subtype of a registered type. Check whether indexing is enabled (the type
is a subtype of a registered type). You can check whether indexing is enabled in Documentum
Administrator by viewing the type properties. You can get a listing of all registered types using the
following iAPI command:
?,c,select distinct t.name, t.r_object_id from dm_type t, dmi_registry i
where t.r_object_id = i.registered_id
You see results like the following:
name r_object_id
--------------------------- ----------------
dm_group 0305401580000104
dm_acl 0305401580000101
dm_sysobject 0305401580000105
• You can register or unregister a type through Documentum Administrator. The type must be
dm_sysobject or a subtype of it. If a supertype is registered for indexing, the system displays the
Enable Indexing checkbox selected but disabled. You cannot clear the checkbox.
• Is the format indexable? Check the class attribute of the document format. See Documentum
attributes that control indexing, page 64 for more information.
• Is the document too large? See Maximum document and text size, page 105.
Insufficient CPU
Content extraction and text analysis are CPU-intensive. CPU is consumed for each document creation,
update, or change in metadata. Check CPU consumption during ingestion.
Suggested workarounds: For migration, add temporary CPU capacity. For day-forward (ongoing)
ingestion, add permanent CPU capacity or new remote CPS instances on other hosts. CPS instances are used in
round-robin order.
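Round-robin dispatch cycles through the available instances in a fixed order. The following minimal sketch shows the idea. The RoundRobinCps class and the instance names are hypothetical, and the sketch is not xPlore's internal implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of round-robin selection across CPS instances (names are hypothetical).
final class RoundRobinCps {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinCps(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one CPS instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    // Thread-safe; floorMod keeps the index valid even after the counter overflows.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}
```

Adding a remote CPS instance to the list spreads the same request stream over one more host. This is why the workaround adds remote instances for ongoing ingestion.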
Insufficient memory
When xPlore indexes large documents, it loads a large amount of content into memory, which can exhaust the Java heap. In
this case, you get an out-of-memory error in dsearch.log such as:
Internal server error. [Java heap space] java.lang.OutOfMemoryError
Suggested workarounds:
• Add more memory to xPlore.
• Limit the document and text size as described in Maximum document and text size, page 105.
Large documents
Large documents can tie up a slow network. These documents also contain more text to process. Use
xPlore administrator reports to see the average size of documents and indexing latency and throughput.
The average processing latency is the average number of seconds between the time the request is
created in the indexing client and the time xPlore receives the same request. The State of repository
report in Content Server also reports document size. For example, the Documents ingested per hour
report shows the number of documents and text bytes ingested. Divide bytes ingested by document count
to get the average number of bytes per document processed.
Several configuration properties affect the size of documents that are indexed and consequently the
ingestion performance. Maximum document and text size, page 105 describes these settings.
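The arithmetic behind these report-based estimates is straightforward. In the following sketch, the input values are made-up examples, not output from an xPlore report. Latency is computed for each request as the time xPlore receives it minus the time the indexing client created it.

```java
// Arithmetic behind the ingestion estimates; all input values are made-up examples.
final class IngestionStats {
    // Average bytes per document = bytes ingested / documents ingested.
    static double averageBytesPerDocument(long bytesIngested, long documentCount) {
        if (documentCount <= 0) {
            throw new IllegalArgumentException("documentCount must be positive");
        }
        return (double) bytesIngested / documentCount;
    }

    // Average latency in seconds between request creation in the indexing client
    // and receipt by xPlore, from paired timestamps in milliseconds.
    static double averageLatencySeconds(long[] createdMillis, long[] receivedMillis) {
        if (createdMillis.length == 0 || createdMillis.length != receivedMillis.length) {
            throw new IllegalArgumentException("need matching, non-empty timestamp arrays");
        }
        long totalMillis = 0;
        for (int i = 0; i < createdMillis.length; i++) {
            totalMillis += receivedMillis[i] - createdMillis[i];
        }
        return totalMillis / 1000.0 / createdMillis.length;
    }
}
```

For example, 52,428,800 text bytes ingested for 1,024 documents gives an average of 51,200 bytes per document.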
4. If other measures have not resolved the problem, change the underlying drives to solid state.
5. Use striped mapping instead of concatenated mapping so that all drives can be used to service
I/O.
Slow network
A slow network between the Documentum Content Server and xPlore results in low CPU consumption
on the xPlore host. Consumption is low even when the disk subsystem has a high capacity. File
transfers via FTP or network share are also slow, independent of xPlore operations.
Suggested workarounds: Verify that the network is not set to half duplex. Check for faulty hubs or
switches. Increase network capacity.
Chinese: xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\cma\source\samples\build_user_dict.sh
Japanese: xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\jma\source\samples\build_user_dict.sh
Korean: xplore_home\dsearch\cps\cps_daemon\shared_libraries\rlp\kma\source\samples\build_user_dict.sh
• On Linux:
1. Export the following variables:
export BT_ROOT=xplore_home/dsearch/cps/cps_daemon/shared_libraries
export BT_BUILD=variableDir
Where variableDir is a subdirectory under
xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/ and its name differs from
computer to computer. For example:
export BT_BUILD=amd64-glibc25-gcc42
export LD_LIBRARY_PATH=libbtutils.so.7.2_path
Where libbtutils.so.7.2_path is the path of the libbtutils.so.7.2 library. For example:
export LD_LIBRARY_PATH=/opt/dmadmin/xPlore1.5/dsearch/cps/cps_daemon/shared_libraries/rlp/lib/amd64-glibc25-gcc42
2. Use the chmod command to change the file permissions on the compilation script and some
other files:
chmod a+x build_user_dict.sh
chmod a+x build_cla_user_dictionary
chmod a+x cla_user_dictionary_util
chmod a+x t5build
chmod a+x t5sort
3. Run the compilation script build_user_dict.sh. Use the following as an example:
./build_user_dict.sh mydict.txt mydict.bin
• On Windows:
1. Download and install Cygwin from http://www.cygwin.com/.
2. Launch the Cygwin terminal.
3. Use the chmod command to change the file permissions on the compilation script:
chmod a+x build_user_dict.sh
4. Export the following variables:
export BT_ROOT=xplore_home/dsearch/cps/cps_daemon/shared_libraries
export BT_BUILD=variableDir
Where variableDir is a subdirectory under
xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/ and its name differs from
computer to computer. For example:
export BT_BUILD=amd64-w64-msvc100
5. Edit the build_user_dict.sh file to make sure that line endings are Unix-style LF (\n) rather than
Windows-style CRLF (\r\n).
6. In the Cygwin terminal, run the compilation script build_user_dict.sh specific to the
dictionary language; for example:
./build_user_dict.sh mydict.txt mydict.bin
3. Put the compiled dictionary into the directory specific to the dictionary language:
• Chinese: xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/cma/dicts
• Japanese: xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/jma/dicts
• Korean: xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/kma/dicts
4. Edit the CLA configuration file to include the user dictionary. Add a dictionarypath element to
cla-options.xml in xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/etc.
The following example adds a Chinese user dictionary named mydict.bin:
<claconfig>
...
...
<dictionarypath><env name="root"/>/cma/dicts/mydict.bin</dictionarypath>
</claconfig>
5. To prevent a word that is also listed in a system dictionary from being decomposed,
modify cps_context.xml in xplore_home/dsearch/cps/cps_daemon. Add the property
com.basistech.cla.favor_user_dictionary if it does not exist, and set it to true.
For example:
<contextconfig><properties>
<property name="com.basistech.cla.favor_user_dictionary"
value="true"/>...
• Entity extraction and XML annotation: For example, extracting the locations of people or places
from the full-text content.
You can customize the CPS document processing pipeline at the following stages:
1. Text extractor for certain content formats, both Java and C/C++ based. Supports custom file
decryption before extraction. Specified in CPS NodeInstanceName_local_configuration.xml or
RemoteCPSInstanceName_configuration.xml.
Custom text extractors are usually third-party modules that you configure as text extractors for
certain formats (mime types). You must create adaptor code based on proprietary, public xPlore
adaptor interfaces.
2. Annotators: Classify elements in the text, annotate metadata, and perform name indexing for
faster retrieval.
Custom annotators require software development of modules in the Apache UIMA framework.
Customization steps
To customize a plugin, do the following:
Configuration updates
To modify plugin configurations across all CPS instances, do the following:
1. In Services > Content Processing Service, click CPS Plugin Configuration.
2. Modify configuration settings and save changes.
The changes are applied to all CPS instances.
Note: The following tips will help prevent issues related to configuration changes:
• To avoid re-indexing the docbase or rebuilding the index, ensure there is no data in the system
when changing configurations for text extraction and linguistic processing.
• Do not change configurations when indexing and search are being executed concurrently.
• Changes to default configurations must be reapplied when a new CPS/node is added.
Text extraction
The text extraction phase of CPS can be customized at the following points:
• Pre-processing plugin
• Plugins for text extraction based on mime type, for example, the xPlore default extractor Oracle
Outside In (formerly Stellent) or Apache Tika.
• Post-processing plugin
For best reliability, deploy a custom text extractor on a separate CPS instance. For instructions on
configuring a remote CPS instance for text extraction, see Adding a remote CPS instance, page 102.
The following diagram shows three different mime types processed by different plugins.
Element Description
text_extractor_preprocessor Preprocessing specification, contains name, type,
lib_path, formats, properties
text_extractor Contains name, type, lib_path, formats, properties
text_extractor_postprocessor Postprocessing specification, contains name, type,
lib_path, formats, properties
name Used for logging
type Valid values: java or native (C/C++)
lib_path Fully qualified class name in the CPS host classpath.
formats Contains one or more format elements. Each format
element corresponds to a mime type.
properties Contains user-defined property elements.
<text_extraction>
<text_extractor>
<name>tika</name>
<type>java</type>
<lib_path>
com.emc.cma.cps.processor.textextractor.CPSTikaTextExtractor
</lib_path>
<properties>
<property name="return_attribute_name">false</property>
</properties>
<formats>
<format>application/pdf</format>
</formats>
</text_extractor> ...
</text_extraction>
...
</cps_pipeline>
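Conceptually, the configuration above routes each document through three stages: an optional preprocessor, an extractor selected by mime type, and an optional postprocessor. The following sketch models that flow with hypothetical Java types. It does not use the actual xPlore adaptor interfaces, which a real custom extractor must implement.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of the three extraction stages. These types are NOT the
// xPlore adaptor interfaces; a real custom extractor must implement those.
final class ExtractionPipeline {
    interface TextExtractor {
        String extract(byte[] content);
    }

    private final UnaryOperator<byte[]> preprocessor;    // e.g. custom decryption
    private final Map<String, TextExtractor> byMimeType; // mime type -> extractor
    private final UnaryOperator<String> postprocessor;   // e.g. cleanup of text

    ExtractionPipeline(UnaryOperator<byte[]> preprocessor,
                       Map<String, TextExtractor> byMimeType,
                       UnaryOperator<String> postprocessor) {
        this.preprocessor = preprocessor;
        this.byMimeType = Map.copyOf(byMimeType);
        this.postprocessor = postprocessor;
    }

    String process(String mimeType, byte[] content) {
        TextExtractor extractor = byMimeType.get(mimeType);
        if (extractor == null) {
            throw new IllegalStateException("No text extractor for " + mimeType);
        }
        return postprocessor.apply(extractor.extract(preprocessor.apply(content)));
    }
}
```

In this model, each format element in the configuration corresponds to one key in the byMimeType map.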
Annotation
Documents from a Content Server are submitted to CPS as dftxml files. The CPS annotation
framework analyzes the dftxml content. Text can be extracted to customer-defined categories, and
the metadata can be annotated with information.
Annotation employs the Apache UIMA framework. A UIMA annotator module extracts data from the
content, optionally adds information, and puts it into a consumable XML output.
A UIMA annotator module has the following content:
• An XML annotation descriptor file.
• A common analysis structure (CAS) that holds the input and output for each stage in the sequence.
The type system descriptor file, also known as a consumer descriptor, defines the type structure for
the CAS. The CAS tool generates supporting code from the type descriptor file.
• Annotation implementation code that extends JCasAnnotator_ImplBase in the UIMA framework.
• An analysis engine consisting of one or more annotators in a sequence.
Sample annotation module files are provided in the xPlore SDK:
• Sample annotator module (dsearch-uima.jar) in /samples/dist.
• Annotator source files in /samples/src/com/emc/documentum/core/fulltext/indexserver/uima/ae.
• Sample indexserverconfig.xml for the example in /samples/config.
Element Description
process-class The xPlore factory class. This class must
implement IFtProcessFactory. The default
class, com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory, provides for
most processing needs.
properties Contains property elements that define runtime
parameters.
The following example configures a UIMA module. You specify a descriptor XML file, a processing
class, and an optional name. The descriptor file and process class are hypothetical. The path to the
descriptor file is relative to the base of the UIMA module jar file.
<pipeline-config>
<pipeline descriptor="descriptors/PhoneNumberAnnotator.xml" process-class="com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory" name="phonenumber_pipeline"/>
</pipeline-config>
The xPlore UIMAProcessFactory class is provided with xPlore. It executes the pipeline based on the
definitions provided.
Element Description
name Attribute of pipeline-usage that references a pipeline
module defined in pipeline-config.
root-result-element Attribute of pipeline-usage that specifies an element
in the dftxml document that will be the root of the
annotated results.
mapper-class Optional attribute of pipeline-usage that maps the
annotation result to an XML sub-tree, which can then
be added to dftxml.
input-element One or more optional child elements of pipeline-usage
that specify the dftxml elements that are passed to the
UIMA analysis engine. Use this to annotate based
on a Document attribute. The name attribute must be
unique within the same pipeline. The element-path
attribute has an XPath value that is used to locate the
XML element in the dftxml document.
Element Description
type-mapping One or more elements that specify how to store the
analytic processing results in the original dftxml
document. The name attribute is a unique identifier
for logging. The element-name attribute specifies the
XML element that holds the result value from the
annotator. This element becomes a new child element
within dftxml. The type-name attribute specifies the
fully qualified class name for the class that renders the
result. This value should match the typeDescription
name in the descriptor file.
feature-mapping One or more optional child elements of type-mapping.
The required feature attribute corresponds to a data
member of the type. The required element-name
attribute corresponds to the XML element that the
feature is mapped to.
properties Child element of pipeline-usage. Specifies
concurrency and the elements that are passed to the
annotator.
property name="thread-count" Sets concurrency level in the underlying pipeline
engine. By default, it has the same value (100) as
CPS-threadpool-max-size in xPlore administrator
indexing configuration.
property name="send-content-text" Controls whether to pass the content text to the
annotator. Default: true. If the annotator does not
require content text, set to false.
property name="send-content-tokens" Controls whether to pass the content tokens to the
annotator. Default: false. If the annotator operates on
tokens, set to true.
The following example is based on the Apache UIMA room number example. The annotated content is placed
under dmftcustom in the dftxml file (root-result-element). The contents of the r_object_type and object_name
elements (specified by the element-path attribute of input-element) are passed to the UIMA analysis engine. (If you are annotating
content and not metadata, you do not need input-element.) For the object_name value, a room element
is generated by the RoomNumber class. Next, the building and room-number elements are generated
by a lookup of those features (data members) in the RoomNumber class.
<pipeline-usage root-result-element="dmftcustom" name="test_pipeline">
  <input-element element-path="/dmftdoc/dmftmetadata//r_object_type" name="object_type"/>
  <input-element element-path="/dmftdoc/dmftmetadata//object_name" name="object_name"/>
  <type-mapping element-name="room" type-name="com.emc.documentum.core.fulltext.indexserver.uima.ae.RoomNumber">
    <feature-mapping element-name="building" feature="building"/>
    <feature-mapping element-name="room-number" feature="roomnumber"/>
  </type-mapping>
  <type-mapping type-name="com.emc.documentum.core.fulltext.indexserver.uima.ae.DateTimeAnnot" element-name="datetime"/>
</pipeline-usage>
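To visualize what the type-mapping and feature-mapping elements produce, the following sketch serializes the features of one annotation into XML under the root-result-element. Two details are assumptions for illustration: the feature elements are nested inside the room element, and the sample values are invented. The actual output is produced by xPlore or by a custom mapper-class.

```java
import java.util.Map;

// Illustrative only: serializes one annotation's features under the
// root-result-element. The nesting shown here is an assumption.
final class TypeMappingSketch {
    // featureToElement: feature -> element-name (pass a LinkedHashMap to control order)
    // featureValues: feature -> value produced by the annotator
    static String toXml(String rootResultElement, String typeElement,
                        Map<String, String> featureToElement,
                        Map<String, String> featureValues) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootResultElement).append('>');
        sb.append('<').append(typeElement).append('>');
        for (Map.Entry<String, String> mapping : featureToElement.entrySet()) {
            String value = featureValues.get(mapping.getKey());
            if (value == null) {
                continue; // feature not set on this annotation
            }
            String element = mapping.getValue();
            sb.append('<').append(element).append('>').append(escape(value))
              .append("</").append(element).append('>');
        }
        sb.append("</").append(typeElement).append('>');
        sb.append("</").append(rootResultElement).append('>');
        return sb.toString();
    }

    // Minimal escaping of XML text content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```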
See the Apache UIMA documentation for more information on creating the annotator class and
descriptor files. See a simple annotator example, page 141 for xPlore.
UIMA example
The following example is from the UIMA software development kit. It is used to create a UIMA
module that normalizes phone numbers for fast identification of results in xPlore.
The outputs are the features referenced in the same type definition. In this descriptor, the class that
generates the output is referenced along with the feature. For example, the class FtPhoneNumber
generates the output for both the phoneNumber and normalizedForm features.
You specify this descriptor file when you configure UIMA in xPlore. Provide the filename as the value
of the descriptor attribute of the pipeline element.
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier"
xmlns:xi="http://www.w3.org/2001/XInclude">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>true</primitive>
<annotatorImplementationName>PhoneNumberAnnotator</annotatorImplementationName>
<analysisEngineMetaData>
<name>Phone Number Annotator</name>
<description>Searches for phone numbers in document content.</description>
<version>1.0</version>
<vendor>The Apache Software Foundation</vendor>
<configurationParameters>
<configurationParameter>
<name>Patterns</name>
<description>Phone number regular expression patterns.</description>
<type>String</type>
<multiValued>true</multiValued>
<mandatory>true</mandatory>
</configurationParameter>
</configurationParameters>
<configurationParameterSettings>
<nameValuePair>
<name>Patterns</name>
<value>
<array>
<string>\b+\d{3}-\d{3}-\d{4}</string>
<string>\(\d{3}\)\s*\d{3}-\d{4}</string>
</array>
</value>
</nameValuePair>
</configurationParameterSettings>
<typeSystemDescription>
<imports>
<import location="PhoneAnnotationTypeDef.xml"/>
</imports>
</typeSystemDescription>
<capabilities>
<capability>
<inputs></inputs>
<outputs>
<type>FtPhoneNumber</type>
<feature>FtPhoneNumber:phoneNumber</feature>
<feature>FtPhoneNumber:normalizedForm</feature>
</outputs>
<languagesSupported></languagesSupported>
</capability>
</capabilities>
<operationalProperties>
<modifiesCas>true</modifiesCas>
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
<outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
</analysisEngineMetaData>
</analysisEngineDescription>
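You can try the matching logic of such an annotator outside UIMA. The following plain-Java sketch makes two assumptions. First, normalizedForm is the digits-only form of the number. Second, the first descriptor pattern is simplified to \b\d{3}-\d{3}-\d{4}\b. In an actual annotator, each match would instead become an FtPhoneNumber annotation added to the CAS through the JCas class generated from the type system descriptor.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of the matching a phone number annotator might perform.
// The simplified first pattern and the digits-only normalized form are assumptions.
final class PhoneNumberSketch {
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b"),
            Pattern.compile("\\(\\d{3}\\)\\s*\\d{3}-\\d{4}"));

    // Returns phoneNumber -> normalizedForm for every match in the text.
    static Map<String, String> find(String text) {
        Map<String, String> found = new LinkedHashMap<>();
        for (Pattern pattern : PATTERNS) {
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                found.put(m.group(), m.group().replaceAll("\\D", "")); // keep digits only
            }
        }
        return found;
    }
}
```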
</pipeline-usage>
You can identify the origin of the CPS processing error in the files cps.log and dsearch.log:
• CPS log example: Failed to extract text from password-protected files:
2011-08-02 23:27:23,288 ERROR
[MANAGER-CPSLinguisticProcessingRequest-(CPSWorkerThread-1)]
Failed to extract text for req 0 of doc VPNwithPassword_zip1312352841145,
err-code 770, err-msg: Corrupt file (native error:TIKA-198:
Illegal IOException from org.apache.tika.parser.pkg.PackageParser@161022a6)
Chapter 6
Indexing
About indexing
The indexing service receives batches of requests to index from a custom indexing client like
the Documentum index agent. The index requests are passed to the content processing service,
which extracts tokens for indexing and returns them to the indexing service. You can configure all
indexing parameters by choosing Global Configuration from the System Overview panel in xPlore
administrator.
For information on creating Documentum indexes, see Creating custom indexes, page 156.
Modify indexes by editing indexserverconfig.xml. For information on viewing and updating this file,
see Modifying indexserverconfig.xml, page 47. By default, Documentum content and metadata are
indexed. You can tune the indexing configuration for specific needs. A full-text index can be created as
a path-value index with the FULL_TEXT option.
For information on scalability planning, see the Documentum xPlore Installation Guide.
Indexing depth: Only the leaf (last node) text values from subelements of an XML node are returned.
You can configure indexing to return all node values instead of the leaf node value. (This change
negatively affects performance.) Set the value of index-value-leaf-node-only in the index-plugin
element to false. Reindex your documents to see the other nodes in the index.
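The difference between leaf-only values and all node values can be illustrated with the standard DOM API. In the following sketch, which is not xPlore code, a leaf is an element with no child elements. The value of a non-leaf element is taken to be the concatenated text of its descendants (getTextContent).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration of leaf-only versus all-node values with the standard DOM API.
// This is not how xPlore builds its index.
final class LeafValues {
    static List<String> values(String xml, boolean leafOnly) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        List<String> out = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("*"); // all elements, document order
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            if (!leafOnly || !hasChildElement(element)) {
                out.add(element.getTextContent()); // concatenated descendant text
            }
        }
        return out;
    }

    private static boolean hasChildElement(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }
}
```

Consider an object_name element plus two author elements inside an authors element. Leaf-only values return just Report, Ann, and Bob. All node values also include the concatenated text of every parent element, which shows why indexing all nodes increases the amount of indexed data.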
The paths in the configuration file use XPath syntax and specify the path within the dftxml representation
of the object. (For information on dftxml, see Extensible Documentum DTD, page 370.) Specify
an XPath value to the element whose content requires text extraction for indexing as the value
of the path attribute.
Table 14 Extraction configuration options
do-text-extraction: Contains one or more for-element-with-name elements.
for-element-with-name: Specifies elements that define content or metadata that should be extracted for
indexing.
for-element-with-name/xml-content: When a document to be indexed contains embedded XML content, you
must specify how that content should be handled:
• Whether XML content can be tokenized (tokenize="true | false").
• Whether XML content can be stored within the input document (store="embed | none"). Note: When the
store attribute is set to none, XML documents are not tokenized even if the tokenize attribute is set to true.
• Whether XML attributes can be indexed and searched (include-attribute-value="true | false"). See XML
attribute support, page 155.
If your repository has many XML documents and a summary is requested with search results, these files
may not display the summary properly. In this case, set store="none" on the xml-content element.
for-element-with-name/save-tokens-for-summary-processing: Sets tokenization of content in specific
elements for summaries, for example, dmftcontentref (content of a Documentum document). Specify the
maximum size of documents in bytes as the value of the attribute extract-text-size-less-than. Set the
maximum size of index tokens for the element as the value of the attribute token-size.
xml-content on-embed-error: Specifies how to handle errors, such as syntax errors or external entity
access, that occur while parsing embedded XML content. Valid values: embed_as_cdata | ignore | fail.
The option embed_as_cdata stores the entire XML content as a CDATA sub-node of the specified node. The
ignore option does not store the XML content. With the fail option, the content is not searchable.
xml-content index-as-sub-path: Boolean parameter that specifies whether the path is stored with XML
content when the XML content is embedded (store="embed").
xml-content file-limit: Sets the maximum size of embedded XML content.
compress: Compresses the text value of specified elements to save storage space. Compressed content is
about 30% of the size of the submitted XML content. Compression may slow the ingestion rate by 10-20%.
compress/for-element: Using XPath notation, specifies the XML node of the input document that contains
text values to be compressed.
Configuring an index
Indexes are configured within an indexes element in the file indexserverconfig.xml. For information on
modifying this file, see Modifying indexserverconfig.xml, page 47. The path to the indexes element
is category-definitions/category/indexes. Four types of indexes can be configured: fulltext-index,
value-index, path-value index (sometimes called multi-path index, because it contains a list of sub-paths),
and multi-path.
By default, a multi-path index does not index all content. If an element does not match any subpath,
specific or general, it is not indexed. To index all element content in a multi-path index, add a
subpath element on //*. For example, to index all metadata content, use the path dmftmetadata//*.
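The following Java sketch uses the JDK XPath API (javax.xml.xpath) to show which elements a general subpath such as dmftmetadata//* selects compared with a specific subpath. The dftxml fragment is a simplified, hypothetical example. Because XPath.evaluate with an InputSource starts at the document node, the paths in the sketch are written from the dmftdoc root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SubpathMatchDemo {

    // Simplified, hypothetical dftxml fragment.
    static final String SAMPLE =
            "<dmftdoc><dmftmetadata><dm_document>"
          + "<object_name>Report</object_name>"
          + "<r_modify_date>2012-09-12</r_modify_date>"
          + "</dm_document></dmftmetadata><dmftcontents/></dmftdoc>";

    // Returns the names of the elements selected by the XPath, in document order.
    static List<String> selectedElements(String xml, String xpath) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xp.evaluate(
                    xpath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // General subpath: every element below dmftmetadata.
        System.out.println(selectedElements(SAMPLE, "/dmftdoc/dmftmetadata//*"));
        // Specific subpath: only object_name elements.
        System.out.println(selectedElements(SAMPLE, "/dmftdoc/dmftmetadata//object_name"));
    }
}
```

The general path selects dm_document, object_name, and r_modify_date (including the container element), while the specific path selects only object_name. The dmftcontents element lies outside dmftmetadata and is matched by neither.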
The following table displays the child elements of node/indexes/index that define an index.
Table 15 Index definition options
Subpaths
A subpath definition specifies the path to an element. The path information is saved with the indexed
value. For most Documentum applications, you do not need to modify the definitions of the subpath
indexes, except for the following use cases:
• Store facet values in the index. For example:
returning-contents="true" compress="true"/>
• Add a subpath for non-string metadata. By default, all metadata is indexed as string type. To
speed up searches for non-string attributes, add a subpath like the following. Valid types: string |
integer | double | date | datetime.
Sort support
You can configure sorting by Documentum attributes. Add a subpath in indexserverconfig.xml for
each attribute that is used for sorting. The subpath must have the attribute value-comparison="true".
Sort debugging
You can use the Query debug and Optimizer tools in xPlore administrator to troubleshoot sorting. When
an index is properly configured and used for the query, you see the following in the Query debug pane:
Found an index to support all order specs. No sort required.
If you do not see this confirmation, check the Optimizer pane. Find the following section:
<endingimplicitmatch indexname="dmftdoc">
In the following example, there are two order-by clauses in the query. The second fails because there
is no sub-path definition that matches it.
<endingimplicitmatch indexname="dmftdoc">
<LucenePlugin>
<ImplicitIndexOptimizer numconditionsall="1"
numorderbyclausesall="2">
<condition accepted="true" expr="child::dmftmetadata/
descendant-or-self::node()/child::object_name
[. contains text rule]"/>
<orderbyclause accepted="true" expr="child::dmftmetadata/
descendant-or-self::node()/child::object_name"/>
<orderbyclause accepted="false" reason="
No exact matching subpath configuration found that
matches order-by clause" expr="child::dmftmetadata/
descendant-or-self::node()/child::r_modify_data"/>
</ImplicitIndexOptimizer>
<numacceptedandrejected numconditionsaccepted="1"
numconditionsskipped="0" numorderbyclausessaccepted="1"
numorderbyclausesskipped="1"/>
</LucenePlugin>
<conditions numaccepted="1">
<pnodecondition>
child::dmftmetadata/descendant-or-self::
node()/child::object_name[. contains text rule]
</pnodecondition>
<externallyoptimizedcondition accepted="true">
child::dmftmetadata/descendant-or-self::node()/
child::object_name[. contains text rule]
</externallyoptimizedcondition>
</conditions>
<orderbyclauses numaccepted="1">
<orderbyclause accepted="true">child::dmftmetadata/
descendant-or-self::node()/child::object_name
</orderbyclause>
<orderbyclause accepted="false">child::dmftmetadata/
descendant-or-self::node()/child::r_modify_data
</orderbyclause>
</orderbyclauses>
</endingimplicitmatch>
This output shows that the second order-by clause is not accepted, so it does not take effect. The
probable cause is a typo: the clause references r_modify_data, which matches no sub-path definition in
indexserverconfig.xml. The correct sub-path is dmftmetadata//r_modify_date.
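When the Optimizer output is long, a small helper can list the rejected order-by clauses for you. The following Java sketch is hypothetical and not part of xPlore: it parses an endingimplicitmatch section like the one in the example with the JDK DOM API and returns each orderbyclause whose accepted attribute is false. It assumes that the section has been copied into a well-formed XML string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical troubleshooting helper; not part of xPlore.
final class RejectedOrderBy {

    // Returns the distinct expressions of order-by clauses with accepted="false".
    static List<String> rejectedClauses(String optimizerXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(optimizerXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Optimizer output is not well-formed XML", e);
        }
        Set<String> rejected = new LinkedHashSet<>();
        NodeList clauses = doc.getElementsByTagName("orderbyclause");
        for (int i = 0; i < clauses.getLength(); i++) {
            Element clause = (Element) clauses.item(i);
            if ("false".equals(clause.getAttribute("accepted"))) {
                // The expression is either in the expr attribute or in the element text.
                String expr = clause.hasAttribute("expr")
                        ? clause.getAttribute("expr") : clause.getTextContent();
                // Remove whitespace introduced by line wrapping; order-by path
                // expressions contain no meaningful spaces.
                rejected.add(expr.replaceAll("\\s+", ""));
            }
        }
        return new ArrayList<>(rejected);
    }

    // Last location step of the expression, that is, the attribute being sorted on.
    static String attributeName(String expr) {
        int idx = expr.lastIndexOf("child::");
        return idx < 0 ? expr : expr.substring(idx + "child::".length());
    }
}
```

For the example output, the helper returns a single expression ending in child::r_modify_data, and attributeName reduces it to r_modify_data, the name to compare against the sub-path definitions.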
Note: Attributes of descendant elements are not included, even if include-descendants is true. You
must explicitly add subpath definitions for descendant attributes to index them.
After the index is rebuilt so that the changes take effect, you can search for attribute values using
XQuery expressions; for example:
for $i score $s in /dmftdoc[dmftmetadata//r_object_id/@dmfttype
ftcontains 'dmid' with stemming] order by $s descending return
<d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name }
{ $i/dmftmetadata//r_modifier } </d>
Indexing and querying XML attribute values in XML content involves the following considerations:
• When you turn on XML zone search (index-as-sub-path = true), the XML structure is preserved along
with the content of the elements embedded in dftxml. XML attributes in the XML content are therefore
indexed in the same way as those in the metadata, and no additional steps are needed.
• When you turn off XML zone search (index-as-sub-path = false), set include-attribute-value to true
on the xml-content element so that xPlore extracts attribute values from the XML structure for
indexing.
• View indexing statistics: Expand an instance in the tree and choose Indexing Service. Statistics
are displayed in the right panel: tasks completed, with a breakdown by document properties, and
performance.
• Configure indexing across all instances: Expand Services > Indexing Service in the tree. Click
Configuration. You can configure the various options described in Document processing and
indexing service configuration parameters, page 360. The default values have been optimized for
most environments.
• Start or stop indexing: Select an instance in the tree and choose Indexing Service. Click Enable
or Disable.
• View the indexing queue: Expand an instance in the tree and choose Indexing Service. Click
Operations to display the queue. You can cancel any indexing batch requests in the queue.
Note: This queue is not the same as the index agent queue. You can view the index agent queue in
the index agent UI or in Documentum administrator.
Troubleshooting indexing
You can use reports to troubleshoot indexing and content processing issues. See Document processing
(CPS) reports, page 308 and Indexing reports, page 309 for more information on these reports.
If CPU, disk I/O, or memory is highly utilized, increase the capacity. Performance on a virtual
server is slower than on a dedicated host. For a comparison of performance on various storage
types, see Disk space and storage, page 329.
After you change a sub-path configuration, you must rebuild the index if the collection already
contains data. If the existing data conflicts with the new sub-path configuration, the rebuild fails.
To resolve the conflict, align the sub-path configuration with the attribute type, rebuild the index,
and then resume normal indexing in the index agent.
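Before rebuilding, it can help to check whether existing values actually conform to the new sub-path type. The following Java sketch is a hypothetical pre-check, not an xPlore tool: it tests string values against the sub-path types listed earlier (string, integer, double, date, datetime). The lexical formats are assumptions: it treats integer as a 64-bit whole number and expects ISO-8601 dates and date-times, which may not match exactly what xPlore accepts.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check; the accepted lexical forms below are assumptions.
final class SubpathTypeCheck {

    static boolean conforms(String value, String type) {
        String v = value.trim();
        try {
            switch (type) {
                case "string":
                    return true;
                case "integer":
                    Long.parseLong(v);          // assumption: 64-bit whole numbers
                    return true;
                case "double":
                    Double.parseDouble(v);
                    return true;
                case "date":
                    LocalDate.parse(v);         // assumption: ISO-8601, e.g. 2012-09-12
                    return true;
                case "datetime":
                    LocalDateTime.parse(v);     // assumption: ISO-8601, e.g. 2012-09-12T07:31:11
                    return true;
                default:
                    throw new IllegalArgumentException("Unknown sub-path type: " + type);
            }
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }

    // Returns the values that would conflict with the proposed type.
    static List<String> conflicts(List<String> values, String type) {
        List<String> bad = new ArrayList<>();
        for (String value : values) {
            if (!conforms(value, type)) {
                bad.add(value);
            }
        }
        return bad;
    }
}
```

A non-empty result suggests that the rebuild is likely to fail until either the sub-path type or the data is corrected; for example, a numeric timestamp such as 1347400000 conflicts with a date sub-path.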
Valid values:
• unit: collection or domain
• domain: domain name.
• collection: Collection name (null for domain consistency check)
• is_fix_trackDB: true or false. Set to false first and check the report. If indexing has not been
turned off, inconsistencies are reported.
• batch-size: Numeric value greater than or equal to 1000. Non-numeric, negative, or null values
default to 1000.
• report-directory: Path for consistency report. Report is created in a subdirectory
report-directory/time-stamp/domain_name|collection_name. Default base directory is the
current working directory.
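The documented defaults for batch-size and report-directory can be expressed as a small Java sketch, for example when wrapping the checker in a script. The class and method names are hypothetical. The guide does not say how values from 0 to 999 are handled; this sketch assumes they are raised to the 1000 minimum. The time-stamp value passed in is also an assumption, because its format is not documented here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper mirroring the documented parameter defaults.
final class ConsistencyCheckArgs {

    static final int MIN_BATCH_SIZE = 1000;

    // Documented: non-numeric, negative, or null values default to 1000.
    // Assumption: values from 0 to 999 are also raised to 1000.
    static int batchSize(String raw) {
        if (raw == null) {
            return MIN_BATCH_SIZE;
        }
        try {
            return Math.max(Integer.parseInt(raw.trim()), MIN_BATCH_SIZE);
        } catch (NumberFormatException e) {
            return MIN_BATCH_SIZE;
        }
    }

    // report-directory/time-stamp/domain_name|collection_name;
    // the base directory defaults to the current working directory.
    static Path reportDirectory(String base, String timeStamp, String domainOrCollection) {
        Path root = (base == null || base.isBlank())
                ? Paths.get(System.getProperty("user.dir"))
                : Paths.get(base);
        return root.resolve(timeStamp).resolve(domainOrCollection);
    }
}
```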
Windows example: Checks consistency of a default collection in defaultDomain and fixes the
trackingDB:
Indexing APIs
Access to indexing APIs is through the interface
com.emc.documentum.core.fulltext.client.IDSearchClient. Each API is described in the javadocs. The
following topics describe the use of indexing APIs.
3. Create your custom class. (See example.) Import IFtIndexRequest from the package
com.emc.documentum.core.fulltext.common.index. This interface encapsulates all aspects of an
indexing request:
public interface IFtIndexRequest
{
    String getDocId();
    long getRequestId();
    FtOperation getOperation();         // returns add, update, or delete
    String getDomain();
    String getCategory();
    String getCollection();
    IFtDocument getDocument();          // returns the document to be indexed
    String getClientId();
    void setCollection(String value);   // sets the target collection
    void setClientId(String id);
}
SimpleCollectionRouting example
This example routes a document to a specific collection based on Documentum version.
The sample Java class file in the SDK/samples directory assumes that the Documentum
index agent establishes a connection to the xPlore server. Place the compiled
class SimpleCollectionRouting in the Documentum index agent classpath, for example,
xplore_home/wildfly_version/server/DctmServer_Indexagent/deployments/IndexAgent.war/WEB-INF/classes.
This class parses the input DFTXML representation from the index agent. The class gets a metadata
value and tests it, then routes the document to a custom collection if it meets the criterion (new
document).
Imports
import com.emc.documentum.core.fulltext.client.index.custom.IFtIndexCollectionRouting;
import com.emc.documentum.core.fulltext.client.index.FtFeederException;
import com.emc.documentum.core.fulltext.client.common.IDSearchServerInfo;
import com.emc.documentum.core.fulltext.common.index.IFtIndexRequest;
import com.emc.documentum.core.fulltext.common.index.IFtDocument;
import java.util.List;
import javax.xml.xpath.*;
import org.w3c.dom.*;
IDSearchServerInfo m_serverInfo;
if (request.getOperation().toString().equals("add"))
{
setCustomCollection(request);
}
}
return m_updated;
}
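The routing decision that SimpleCollectionRouting makes can be sketched without the xPlore SDK. The following Java sketch is hypothetical and is not the SDK sample: it reads one metadata value from a DFTXML string with the JDK XPath API and returns a custom collection name only for add operations whose value matches. The XPath, the expected value, and the collection name are placeholders; in the SDK class, the operation comes from request.getOperation().

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of a routing decision; not the SDK sample class.
final class RoutingDecisionSketch {

    // Returns the string value of the first node selected by the XPath ("" if none).
    static String metadataValue(String dftxml, String xpath) {
        // XPath objects are not thread-safe, so create one per call.
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            return xp.evaluate(xpath, new InputSource(new StringReader(dftxml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or DFTXML", e);
        }
    }

    // Returns the custom collection for matching new documents, or null to keep default routing.
    static String chooseCollection(String operation, String dftxml,
                                   String xpath, String expected, String customCollection) {
        if (!"add".equals(operation)) {
            return null;   // only new documents are routed
        }
        return expected.equals(metadataValue(dftxml, xpath)) ? customCollection : null;
    }
}
```

Returning null here stands for "leave the collection unchanged"; how the real routing interface signals that is defined by IFtIndexCollectionRouting, described in the SDK javadocs.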
Chapter 7
Index Data: Domains, Categories, and Collections
This chapter contains the following topics:
• Domain and collection menu actions
• Managing domains
• Delete a corrupted domain
• Configuring categories
• Managing collections
• Rebuilding indexes
• Checking xDB statistics
• Troubleshooting data management
• xDB repair commands
Managing domains
A domain is a separate, independent, logical, or structural grouping of collections. Domains are
managed through the Data Management screen in xPlore administrator. The Documentum index agent
creates a domain for the repository to which it connects. This domain receives indexing requests
from the repository.
New domain
Select Data Management in the left panel and then click New Domain in the right panel. Choose a
default document category. (Categories are specified in indexserverconfig.xml.) Choose a storage
location from the dropdown list. To create a storage location, see Creating a storage location, page 179.
A Documentum index agent creates a domain for a repository and creates ACL and group collections
in that domain.
New Collection
Create a collection and configure it. See Changing collection properties, page 174.
Configuration
The document category and storage location are displayed (read-only). You can set the runtime mode
to normal (default) or maintenance (for a corrupt domain). The mode does not persist across xPlore
sessions; it reverts to normal when xPlore restarts.
Delete domain
If xPlore does not start and the log reports that the domain is corrupted, force recovery: add the
property force-restart-xdb=true to
xplore_home/wildfly_version/server/instance_name/deployments/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties.
Consider removing the domain if one or both of the following apply:
• The domain is no longer needed.
• The domain library is corrupted. This is quite rare, since no information is stored in the domain
library. If a collection is corrupted, consider deleting only that collection.
If you want to delete a domain with detached collections, off-line collections, or search-only
collections, and these collections are no longer needed, then delete them before deleting the domain.
To back up the domain before deletion, you need to back up the entire config folder, data folder,
and wal folder.
#ApplicationInfo#acl-0.XhiveDatabase.DB"/>
<binding-server name="primary"/>
</segment>
Configuring categories
A category defines a class of documents and their XML structure. The category definition specifies the
processing and semantics that are applied to the ingested XML document. You can specify the XML
elements that have text extraction, tokenization, and storage of tokens. You also specify the indexes
that are defined on the category and the XML elements that are not indexed. More than one collection
can map to a category. xPlore manages categories.
The default categories include dftxml (Documentum content), security (ACL and group), tracking
database, metrics database, audit database, and thesaurus database.
When you create a collection, choose a category from the categories defined in indexserverconfig.xml.
When you view the configuration of a collection, you see the assigned category. It cannot be changed
in xPlore administrator. To change the category, edit indexserverconfig.xml.
You can configure the indexes, text extraction settings, and compression setting for each category. The
paths in the configuration file are in XPath syntax and refer to the path within the XML representation
of the document. (All documents are submitted for ingestion in an XML representation.) Specify an
XPath value to the element whose content requires text extraction for indexing.
Table 17 Category configuration options
category-definitions: Contains one or more category elements.
category: Contains elements that govern category indexing.
properties/property track-location: Specifies whether to track the location (index name) of the content
in this category. For Documentum dftxml representations of documents, the location is tracked in the
tracking DB. Documentum ACLs and groups are not tracked because their index location is known.
...
</category-config>
If clients submit update and delete requests for the same document concurrently, xPlore processes only
the delete request and marks the update request as completed without processing it.
Managing collections
• About collections, page 171
• Planning collections for scalability, page 172
• Uses of subcollections, page 173
• Adding or deleting a collection, page 173
• Changing collection properties, page 174
• Routing documents to a specific collection, page 174
• Attaching and detaching a collection, page 177
• Moving a collection, page 177
• Creating a storage location, page 179
• Rebuilding indexes, page 179
• Deleting a collection and recreating indexes, page 181
• Querying a collection, page 179
About collections
A collection is a logical group of XML documents that is physically stored in an xDB detachable
library. A collection represents the most granular data management unit within xPlore. All documents
submitted for indexing are assigned to a collection. A collection generally contains one category of
documents. In a basic deployment, all documents in a domain are assigned to a single default collection.
You can create subcollections under each collection and route documents to user-defined collections.
A collection is bound to a specific instance in read-write state (index and search, index only, or update
and search). A collection can be bound to multiple instances in read-only state (search-only).
Viewing collections
To view the collections for a domain, choose Data Management and then choose the domain in the left
pane. In the right pane, you see each collection name, category, usage, state, and the instances that the
collection is bound to.
To delete a collection, click the red X next to it. For xPlore system collections, the X is grayed out,
and the collection cannot be deleted.
You can create additional collections for high ingestion rates and then move those collections to
a parent collection. A single top-level collection hierarchy provides better search performance than
multiple top-level collections.
Uses of subcollections
Create subcollections for the following uses:
• Create multiple top-level collections for migration to boost the ingestion rate. After ingestion
completes, move each temporary collection to a parent collection, where it becomes a subcollection.
Searching a parent and its subcollections is faster than searching multiple top-level collections.
• Create subcollections for data management. For example, you create a collection for 2011 data
with a subcollection to store November data.
The following restrictions are enforced for subcollections, including a collection that is moved
to become a subcollection:
• Subcollections cannot be detached or reattached. For example, a path-value index is defined with no
subpaths, such as the folder-list-index.
• Subcollections cannot be backed up or restored separately from the parent.
• Subcollections must be bound to the same instance as the parent.
• Subcollection state cannot contradict the state of the parent. For example, if the parent is
search_only, the subcollection cannot be index_only or index_and_search. If the parent is indexable,
the adopted collection cannot be search_only.
Exception: When the parent collection is in update_and_search or index_and_search state,
subcollections can be in any state.
Note: By default, no fulltext index is created on subcollections; all indexes are created on the parent
collection. When you query a subcollection directly, make sure that the appropriate index has been
created.
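The state rules for subcollections can be encoded as a small check, for example in a script that validates a planned move. The following Java sketch is hypothetical. It models the four collection states named in this guide and assumes that an "indexable" parent in the rule means index_only, because update_and_search and index_and_search parents are covered by the exception.

```java
// Hypothetical encoding of the documented subcollection state rules.
final class SubcollectionStateRules {

    enum State { INDEX_AND_SEARCH, INDEX_ONLY, UPDATE_AND_SEARCH, SEARCH_ONLY }

    static boolean allowed(State parent, State child) {
        switch (parent) {
            case UPDATE_AND_SEARCH:
            case INDEX_AND_SEARCH:
                // Documented exception: subcollections can be in any state.
                return true;
            case SEARCH_ONLY:
                // A search_only parent cannot have index_only or index_and_search children.
                return child != State.INDEX_ONLY && child != State.INDEX_AND_SEARCH;
            case INDEX_ONLY:
                // Assumption: this is the "indexable" parent; the child cannot be search_only.
                return child != State.SEARCH_ONLY;
            default:
                throw new IllegalArgumentException("Unknown state: " + parent);
        }
    }
}
```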
3. To create a subcollection, click the parent collection in the navigation pane and then click New
Subcollection. The storage location is the same as the location for the parent.
4. To delete a collection, choose a domain and then click X next to the collection you wish to delete.
Note: When a collection is deleted, the corresponding segment files are deleted and cannot be
recovered.
Routing on XPath
To route objects based on an XPath expression, use the FtCollectionRoutingOnXPath class. You can
construct the XPath to configure complex routing criteria, such as routing on multiple field values. The
XPath expression should return a boolean value.
In the following example, an object whose i_folder_id equals 3456789 is routed to collection1, and an
object whose i_folder_id equals 456789 is routed to collection2. An object whose i_folder_id equals
4567890 and whose object_name is test is routed to collection3.
<backend-collection-routing class-name="FtCollectionRoutingOnXPath"
default-collection="default" category="dftxml">
<properties>
</properties>
<collection-routing>
<rule condition="boolean(/dmftdoc//i_folder_id='3456789')"
collection="collection1"/>
<rule condition="boolean(/dmftdoc//i_folder_id='456789')"
collection="collection2"/>
<rule condition="boolean(/dmftdoc//i_folder_id='4567890' and
/dmftdoc//object_name='test')" collection="collection3"/>
</collection-routing>
</backend-collection-routing>
Rules are not mutually exclusive: if an object matches more than one rule, the first matching rule is
used. A rule that references an invalid collection is skipped during routing.
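To check a set of rules before deploying them, you can evaluate them locally. The following Java sketch is a hypothetical approximation of the behavior described for XPath routing, not FtCollectionRoutingOnXPath itself: it evaluates each rule's boolean XPath condition against a DFTXML document with the JDK XPath API, skips rules whose collection is not in a supplied set of valid collections, returns the first match, and otherwise falls back to the default collection.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical local approximation of XPath-based collection routing.
final class XPathRoutingSketch {

    static final class Rule {
        final String condition;
        final String collection;

        Rule(String condition, String collection) {
            this.condition = condition;
            this.collection = collection;
        }
    }

    static String route(String dftxml, List<Rule> rules,
                        Set<String> validCollections, String defaultCollection) {
        Document doc;
        try {
            // Parse once so that every rule is evaluated against the same DOM.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dftxml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("DFTXML is not well-formed", e);
        }
        XPath xp = XPathFactory.newInstance().newXPath();
        for (Rule rule : rules) {
            if (!validCollections.contains(rule.collection)) {
                continue;   // rules naming invalid collections are skipped
            }
            try {
                Boolean match = (Boolean) xp.evaluate(rule.condition, doc, XPathConstants.BOOLEAN);
                if (match) {
                    return rule.collection;   // the first matching rule wins
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Invalid condition: " + rule.condition, e);
            }
        }
        return defaultCollection;
    }
}
```

Because i_folder_id can hold several values, an object stored in more than one folder can match several rules; that is the case where rule order determines the target collection.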
Moving a collection
To move a collection to another xPlore instance, choose the collection, click Configuration, and
change the Binding instance.
You can create additional collections for faster ingestion and, after ingestion has completed, move them
to become subcollections of another collection. Moving them as subcollections improves search
performance.
If a collection meets the following requirements, you can move it to become a subcollection:
• Facet compression was disabled for all facets before the documents were indexed.
• The collection is not itself a subcollection (does not have a parent collection).
• The collection does not have subcollections.
• The collection has the same category and index definitions (in indexserverconfig.xml) as the new
parent.
• xPlore enforces additional restrictions after the collection has been moved. For information on
subcollection restrictions, see Uses of subcollections, page 173.
1. Choose the collection and click Move.
2. In the Move to dialog box, select a target collection. This collection can be a subcollection or a
top-level collection.
• XhiveDatabase.bootstrap. Update the path of each segment to the new path. For example:
<segment reserved="false" library-id="0" library-path=
"/Repository1/dsearch/Data/default" usable="true"
usage="detachable_root" state="read_write" version="1"
temp="false" id="Repository1#dsearch#Data#default">
<file id="12" path="C:\xPlore_1/data-new\Repository1\default\xhivedb-Repository1#dsearch#Data#default-0.XhiveDatabase.DB"/>
<binding-server name="primary"/>
</segment>
5. On all instances, change the path in indexserver-bootstrap.properties to match the new bootstrap
location. For example:
xhive-data-directory=C\:/xPlore_1/data-new
6. Delete the WildFly cache (/work folders) from the index agent and primary instance web
applications. Also delete WildFly cache folders for the secondary instances. The path of the /work
folder is xPlore_home/wildfly_version/server/DctmServer_InstanceName/work.
7. Start the xPlore instances.
8. Start the index agent.
9. Import a test document and search for it using xPlore administrator. Also, search for a document
that has already been indexed.
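Editing many segment entries in XhiveDatabase.bootstrap by hand is error-prone. The following Java sketch is a hypothetical helper, not an xPlore utility: it loads the bootstrap XML with the JDK DOM API, replaces an old data-directory prefix with a new one in the path attribute of every file element, and serializes the result. It assumes that segment file paths appear only in file/@path, as in the segment example in this procedure, and that prefixes match exactly, including case and slash direction. Back up the original file before writing the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes segment paths live only in <file path="..."> attributes.
final class BootstrapPathRewriter {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Bootstrap file is not well-formed XML", e);
        }
    }

    // Replaces oldPrefix with newPrefix in every file/@path that starts with oldPrefix.
    static String rewrite(String bootstrapXml, String oldPrefix, String newPrefix) {
        Document doc = parse(bootstrapXml);
        NodeList files = doc.getElementsByTagName("file");
        for (int i = 0; i < files.getLength(); i++) {
            Element file = (Element) files.item(i);
            String path = file.getAttribute("path");
            if (path.startsWith(oldPrefix)) {
                file.setAttribute("path", newPrefix + path.substring(oldPrefix.length()));
            }
        }
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot serialize bootstrap XML", e);
        }
    }

    // Lists all file/@path values, for example to review the result before saving it.
    static List<String> filePaths(String bootstrapXml) {
        NodeList files = parse(bootstrapXml).getElementsByTagName("file");
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < files.getLength(); i++) {
            paths.add(((Element) files.item(i)).getAttribute("path"));
        }
        return paths;
    }
}
```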
Querying a collection
1. Choose a collection in the Data Management tree.
2. In the right pane, click Execute XQuery.
3. Check Get query debug to debug your query. The query optimizer is for technical support use.
To route queries to a specific collection, see Routing a query to a specific collection, page 276.
Rebuilding indexes
You can rebuild indexes either by refeeding data from the original source or by performing an online
index rebuild, depending on your use case. Starting with xPlore 1.5, index rebuilding is performed on the
instances to which the collections are bound instead of on the primary node, as in earlier releases. This
change improves indexing scalability and reduces the load on the primary node.
• Data refeed
You use the data refeed approach to rebuild indexes in the following scenarios:
– Database corruption involves index, data pages, and redo log
– You need to recover from a major failure and have no backup
– System changes significantly affect text extraction
– The system can be out of service while you rebuild the index
– You changed the data model to add indexable attributes for an object type.
• Online index rebuild
You perform online index rebuild in the following scenarios:
– Database corruption involves only Lucene (multi-path index)
– There have been schema changes to the underlying index such as improved data typing
– Downtime is highly undesirable
– You added a stop words list.
– You added a new facet.
– You want to strip out accents from words in the index.
– You want to use index agent or DFC filters, and the objects have already been indexed.
– false: Final merges are disabled during online index rebuilds. When the online index rebuild is
complete, the system automatically triggers a final merge on the reindexed collection.
Because the final merge is I/O intensive and can noticeably degrade both indexing and query performance
while it runs, set this value to false to prevent final merges during online index rebuilds. The final
merge then runs immediately after the rebuild completes, which keeps query performance optimized.
Restart xPlore for the settings to take effect.
4. In the index agent UI, provide the path to the list of object IDs. See Indexing documents in normal
mode, page 85.
5. After the index is rebuilt, run ftintegrity or the State of Index job in Content Server. See Using
ftintegrity, page 79 or Running the state of index job, page 82.
For example, a Documentum object attribute named customer_date was originally typed as an integer (a
numeric timestamp) but was later retyped as a date. Because xPlore stored the values as integers, the
objects must be removed and reloaded. In such cases, create and refeed a new collection.
1. Remove the problematic collections:
a. Navigate to the domain in xPlore administrator left panel.
b. Click the X next to the collection name to delete it.
2. Modify the index definition, if necessary. After modifying indexserverconfig.xml, you must restart
xPlore for the changes to take effect.
3. Create a collection with the same name as the one you deleted.
4. Refeed the documents to launch a full reindexing: in the index agent UI, select Start new
reindexing operation.
If custom routing is defined, it is applied. Otherwise, default routing is applied.
The -d argument (xhivedb) is the same for all xPlore installations. The path argument is the path
within xDB to your collection, which you can check in the XHAdmin tool. The output is large for most
collections, so redirect it to a file using the -o option. (The file must exist before you run the
command.) For example:
statistics-ls -d xhivedb -o C:/temp/DefaultCollStats.txt
/TechPubsGlobal/dsearch/Data/default --lucene-params --details
The output gives the merge state (MergeState) of each index entry in the collection. For example:
LibPath=/PERFXCP1/dsearch/Data/default IndexName={dmftdoc}
ActiveSegment=EI-8b38b821-4e29-42b2-9fe0-8e6c82764a6b-211106232537097-luceneblobs-1
EntryName=LI-3bb3483d-38c9-4d14-8a90-5a13a9a19717
MergeState=NONE isFinalIndex=FINAL
LastMergeTime=12/09/2012-07:31:11 MinLSN=0 MinLSN=0
LibPath=/PERFXCP1/dsearch/Data/default IndexName={dmftdoc}
ActiveSegment=EI-8b38b821-4e29-42b2-9fe0-8e6c82764a6b-211106232537097-luceneblobs-1
EntryName=LI-2ea1578b-0d82-496a-9c81-ee15502b3cbe
MergeState=NONE isFinalIndex=NOT-FINAL
LastMergeTime=14/09/2012-10:15:48 MinLSN=786124
MinLSN=485357881901
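To spot entries that still need merging in a large statistics-ls output file, you can scan it programmatically. The following Java sketch is a hypothetical parser based only on the sample output above: it assumes whitespace-separated key=value tokens and that each entry begins with LibPath. It keeps the first value when a key repeats (as MinLSN does) and lists the EntryName of every entry whose isFinalIndex is not FINAL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for statistics-ls output, inferred from the sample output.
final class MergeStateReport {

    static List<Map<String, String>> parse(String output) {
        List<Map<String, String>> entries = new ArrayList<>();
        Map<String, String> current = null;
        for (String token : output.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue;   // ignore anything that is not key=value
            }
            String key = token.substring(0, eq);
            String value = token.substring(eq + 1);
            if (key.equals("LibPath")) {
                current = new LinkedHashMap<>();   // a new index entry starts here
                entries.add(current);
            }
            if (current != null) {
                current.putIfAbsent(key, value);   // keep the first value of repeated keys
            }
        }
        return entries;
    }

    // EntryName of every entry that is not yet a final index.
    static List<String> notFinal(String output) {
        List<String> names = new ArrayList<>();
        for (Map<String, String> entry : parse(output)) {
            if (!"FINAL".equals(entry.get("isFinalIndex"))) {
                names.add(entry.get("EntryName"));
            }
        }
        return names;
    }
}
```

For the sample output above, notFinal returns LI-2ea1578b-0d82-496a-9c81-ee15502b3cbe, the entry reported as NOT-FINAL.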
Other statistics
You can check other statistics such as returnable fields, size of index, and number of documents. The
statistics command has the same arguments as statistics-ls.
For example:
statistics --docs-num -d xhivedb /TechPubsGlobal/dsearch/Data/default dmftdoc
Statistics options:
• --lucene-sz: Size of Lucene fields (.fdt), Index to fields (.fdx), and total size of Lucene index in
bytes (all).
• --lucene-rf: Statistics of returnable fields (configured in indexserverconfig.xml), including the
total counts of path mappings and value mappings, and compression mapping consistency.
• --lucene-list: Shows whether each index is final.
• --lucene-params: Lists the xDB parameters set in xDB.properties.
• --docs-num: Displays the number of documents in the collection. This value should match the
number displayed for a collection in xPlore administrator.
Command arguments
To list all available commands, enter help at the xdb prompt. To get arguments for any of the
commands, enter <command> help, for example:
xdb>help repair-merge
The path_to_index argument in repair commands is a library path, not a file system path. For example:
repository_name/dsearch/Data/default. The index_name value is dmftdoc, the multi-path index for
Documentum repositories.
index entry : LI-a7f66aa5-a729-4774-8b97-0950b0703f3c has been merged
into entry: LI-abf1d2de-1783-4ce7-a013-6cdf9e871bed
processing index
C:\xPlore\data\LH1\default\lucene-index\dmftdoc\EI-d1071aae-a76c-46ee-825e-6955c313954b
processing index
C:\xPlore\data\LH1\default\lucene-index\dmftdoc\EI-d1071aae-a76c-46ee-825e-6955c313954b
Chapter 8
Backup and Restore
About backup
Back up a collection, domain, or xPlore federation after you make changes to the xPlore environment,
such as adding or deleting a collection or changing a collection binding. If you do not back up,
restoring the domain or xPlore federation puts the system in an inconsistent state. Make all your
anticipated configuration changes before you perform a full federation backup.
You can back up an xPlore federation, domain, or collection using xPlore administrator or use your
preferred volume-based or file-based backup technologies. The Documentum xPlore Installation
Guide describes high availability and disaster recovery planning.
You can use external automatic backup products such as EMC Networker. All backup and restore
commands are available through the command-line interface (CLI) for scripting. See the chapter
“Automated Utilities (CLI)”.
If the disk is full, set the collection state to update_and_search and create a collection in a new
storage location.
Note: If you remove segments from xDB, your backups cannot be restored.
segments, xDB page owners, and xDB DOM nodes. For a tool that checks consistency between the
index and the tracking database, see Running the standalone data consistency checker, page 160.
Backup Methods
xPlore supports the following backup approaches.
• Native xDB backups: These backups are performed through xPlore administrator. You can back up
hot (while indexing), warm (search only), or cold (offline). See Performing a native xDB backup
in xPlore Administrator, page 190.
• File-based backups: Back up the xPlore federation directories xplore_home/data and
xplore_home/config, and the /wal files. Backup is warm (search only) or cold (offline). Incremental
file-based backups are not recommended, because most files are touched when they are opened. In
addition, Windows file-based backup software requires exclusive access to a file during backup and
therefore requires a cold backup.
• Volume-based (snapshot) backups: Backup is warm (search only) or cold (offline). Can be
incremental or full backup of disk blocks. Volume-based backups require a third-party product such
as EMC Timefinder.
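As a rough illustration of a cold file-based backup, the following Python sketch copies xplore_home/data and xplore_home/config into a timestamped folder. It is a sketch under stated assumptions, not a supported xPlore tool: it assumes every xPlore instance is already stopped, the function name cold_file_backup is hypothetical, and because this section does not give the location of the /wal files, you must pass those directories yourself in extra_dirs.

```python
import shutil
import time
from pathlib import Path


def cold_file_backup(xplore_home, target_root, extra_dirs=()):
    """Copy xplore_home/data and xplore_home/config into a timestamped folder.

    Assumes all xPlore instances are already stopped (cold backup). Pass the
    directories that hold your /wal files in extra_dirs; their location is not
    assumed here.
    """
    home = Path(xplore_home)
    sources = [home / "data", home / "config"] + [Path(d) for d in extra_dirs]
    missing = [str(s) for s in sources if not s.is_dir()]
    if missing:
        raise FileNotFoundError(f"backup sources not found: {missing}")
    target = Path(target_root) / time.strftime("%Y-%m-%d-%H-%M-%S")
    for src in sources:
        # Full copy of each directory tree; incremental file copies are not recommended.
        shutil.copytree(src, target / src.name)
    return target
```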
You can use the CLI to script the backups described above. See Scripted backup, page 202.
A snapshot, which is a type of point-in-time backup, is a backup of the changes since the last snapshot.
A copy-on-write snapshot is a differential copy of the changes that are made every time new data is
written or existing data is updated. In a split-mirror snapshot, the mirroring process is stopped and
a full copy of the entire volume is created. A copy-on-write snapshot uses less disk space than a
split-mirror snapshot but requires more processing overhead.
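To make the space and overhead trade-off concrete, the following Python sketch models a volume as a list of blocks. It is a generic, hypothetical model of the two snapshot styles, not the behavior of any particular product such as EMC Timefinder: the split-mirror copy stores every block up front, while the copy-on-write snapshot stores a block's original contents only when that block is first overwritten.

```python
def split_mirror_snapshot(volume):
    """Split-mirror: stop mirroring and keep a full copy of every block."""
    return list(volume)


class CopyOnWriteSnapshot:
    """Copy-on-write: save a block's original contents only when it is first overwritten."""

    def __init__(self, volume):
        self.volume = volume   # live blocks, modified in place
        self.saved = {}        # block index -> contents at snapshot time

    def write(self, index, data):
        # Preserve the point-in-time contents the first time this block changes.
        if index not in self.saved:
            self.saved[index] = self.volume[index]
        self.volume[index] = data

    def image(self):
        """Rebuild the point-in-time image from live blocks plus saved originals."""
        return [self.saved.get(i, block) for i, block in enumerate(self.volume)]

    def blocks_stored(self):
        return len(self.saved)


volume = ["b0", "b1", "b2", "b3"]
mirror = split_mirror_snapshot(volume)   # stores all 4 blocks up front
cow = CopyOnWriteSnapshot(volume)        # stores nothing up front
cow.write(1, "b1-v2")
cow.write(1, "b1-v3")                    # second write to the same block copies nothing
print(len(mirror), cow.blocks_stored())  # 4 1
print(cow.image())                       # ['b0', 'b1', 'b2', 'b3']
```

The check and copy that copy-on-write performs on each write is the processing overhead it trades for lower disk usage.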
Backup combinations
Periodic full backups are recommended.
About restore
All restore operations are performed offline. If you performed a hot backup using xPlore administrator,
the backup file is restored to the point at which backup began.
Each xPlore instance owns the index for one or more domains or collections, and a transaction log. If
there are multiple instances, one instance can own part of the index for a domain or collection.
Note: You cannot restore the backup of an earlier version of xPlore, because the xDB version has
changed. Back up your installation immediately after upgrading xPlore.
Scripted restore
Use the CLI for scripted restore of a federation, domain, or collection. See the chapter “Automated
Utilities (CLI)”. xPlore supports offline restore only: xPlore instances must be shut down before you
restore a collection or an xPlore federation.
• Warm backup: Suspend the indexing service on each instance that the collection is bound to. To
do so, choose the service under the instance and click Operations; then click Disable to suspend
the service.
• Cold backup: Suspend both the indexing and search service on each instance that the collection
is bound to. To do so, choose the service under the instance and click Operations; then click
Disable to suspend the service.
In xPlore Administrator, you can back up the federation, a domain, or a collection. The following steps
Note: Backup is not available for a subcollection. Back up the parent and all its subcollections.
To perform a hot backup or warm backup, use the following procedure. It is not applicable for cold
backups, since xPlore instances are suspended.
1. Select the item you want to back up:
• Federation: Click Data Management.
• Domain or collection: Under Data Management, click the domain or collection you want
to back up.
2. Click Backup.
3. Specify a backup location or choose the default one.
For federation backup, choose Full backup.
4. When backup is complete, a message is displayed showing where backup files are located.
5. Back up your jars or DLLs for custom content processing plugins or annotators.
3. Domain only: Generate the orphaned segment list. Use the CLI listOrphanedSegments. See
Orphaned segments CLIs, page 205.
4. Stop all xPlore instances.
5. Run the restore CLI. See Scripted federation restore, page 202, Scripted domain restore, page
203, or Scripted collection restore, page 204.
6. Start all xPlore instances. No further steps are needed for federation restore. Do the following
steps for domain or collection restore.
7. Domain only: If orphaned segments are reported before restore, run the CLI
purgeOrphanedSegments. If an orphaned segment file is not specified, the orphaned segment IDs
are read from stdin.
8. Force-attach the domain or collection using xPlore administrator.
9. Perform a consistency check and test search. Select Data Management in xPlore Administrator
and then choose Check DB Consistency.
10. Run the ACL and group replication script to update any security changes made since the backup.
See Manually updating security, page 56.
11. Run ftintegrity. For the start date argument, use the date of the last backup, and for the end date use
the current date. See Using ftintegrity, page 79.
For automated (scripted) restore, see Scripted federation restore, page 202, Scripted domain restore,
page 203, or Scripted collection restore, page 204.
For remote restore, see Performing a remote restore, page 192.
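Step 11 uses the date of the last backup as the ftintegrity start date. The following Python sketch finds that date from the backup folder names. It assumes, based on the example path .../backup/federation/2011-03-23-16-02-02 elsewhere in this guide, that backup folders are named with a year-month-day-hour-minute-second timestamp. The function names are hypothetical, and converting the returned datetime values into the format ftintegrity expects is left to you (see Using ftintegrity, page 79).

```python
from datetime import datetime
from pathlib import Path

# Assumed folder-name format, taken from the example path
# .../backup/federation/2011-03-23-16-02-02 in this guide.
BACKUP_STAMP = "%Y-%m-%d-%H-%M-%S"


def latest_backup_time(backup_dir):
    """Return the newest timestamp among folder names in backup_dir."""
    stamps = []
    for child in Path(backup_dir).iterdir():
        if not child.is_dir():
            continue
        try:
            stamps.append(datetime.strptime(child.name, BACKUP_STAMP))
        except ValueError:
            pass  # ignore folders that are not named with a timestamp
    if not stamps:
        raise FileNotFoundError(f"no timestamped backup folders in {backup_dir}")
    return max(stamps)


def ftintegrity_window(backup_dir, now=None):
    """Start date = last backup, end date = now, as the restore procedure describes."""
    return latest_backup_time(backup_dir), (now or datetime.now())
```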
5. Set all backed-up domains to the reset state and then turn on indexing. (This state is not displayed
in xPlore administrator and is used only for the backup and restore utilities.) Use the script in
Collection and domain state CLIs, page 206.
CLI troubleshooting
If a CLI does not execute correctly, check the following:
• The output message may describe the source of the error.
• Check whether the host and port are set correctly in xplore_home/dsearch/admin/xplore.properties.
• Check the CLI syntax.
– Linux requires double quotes before the command name and after the entire command and
arguments.
– Separate each parameter by a comma.
– Do not put Boolean or null arguments in quotation marks. For example:
xplore "backupFederation null,false,null"
• Check whether the primary instance is running: http://instancename:port/dsearch
• Check whether the JMX MBean server is started. Open jconsole in
xplore_home/java64/java_version/bin and specify Remote Process with the value
service:jmx:rmi:///jndi/rmi://myhost:9331/dsearch. (9331 is the default JMX port. If you have
specified a base port for the primary instance that is not 9300, add 31 to your base port.) If
jconsole opens, the JMX layer is OK.
• Check the Admin web service. Open the following link in a browser:
http://instancename:port/dsearch/ESSAdminWebService?wsdl
If the XML schema is displayed, the web service layer is operative.
• Check dsearch.log in xplore_home/wildfly_version/server/instance_name/logs for a CLI-related
message.
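The endpoint checks above can be scripted. The following Python sketch derives the instance URL, the Admin web service WSDL URL, and the JMX service URL from the host, the HTTP port, and the base port (JMX port = base port + 31). The helper names are hypothetical, http is assumed unless you pass https, and the JMX URL uses the standard service:jmx:rmi:///jndi/rmi://host:port form. That URL is meant for jconsole; is_reachable checks only HTTP(S) URLs.

```python
import urllib.request


def diagnostic_urls(host, http_port, base_port=9300, protocol="http"):
    """Return the endpoints this checklist asks you to test."""
    jmx_port = base_port + 31  # 9331 for the default base port 9300
    base = f"{protocol}://{host}:{http_port}/dsearch"
    return {
        "instance": base,
        "admin_wsdl": f"{base}/ESSAdminWebService?wsdl",
        "jmx": f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}/dsearch",
    }


def is_reachable(url, timeout=5.0):
    """True if an HTTP(S) URL answers without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False
```

For example, is_reachable(diagnostic_urls("myhost", 9300)["admin_wsdl"]) returns False when the web service layer does not respond, which points you to dsearch.log.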
Dead objects
To resolve XhiveException: OBJECT_DEAD:
1. In xPlore administrator, make sure that all collections have a state of index_and_search.
2. Stop xPlore instances.
3. Start xDB in repair mode.
a. Change the memory in xdb.properties for all nodes. This file is located in
xplore_home/wildfly_version/
4. Enter the following xhcommand, specifying the domain, the collection name, and the options. The
repair command can take a long time if the index is large.
repair-blacklists -d xhivedb /%DOMAIN%/dsearch/Data/%COLLECTION% dmftdoc --check-dups --repair-index
To scan without removing dead objects, omit the --repair-index option.
If "Total potential impacted normal objects" is not 0, a file is generated with the following name
convention: %DOMAIN%#dsearch#Data#%COLLECTION%_objects_2012-03-12-21-02-06.
Resubmit this file using the index agent UI.
1. Log in to the index agent UI.
2. Choose Object File.
3. Browse to the file and choose Submit.
Chapter 9
Automated Utilities (CLI)
Property Description
host Primary xPlore instance host: fully qualified hostname or IP address
port Primary xPlore instance port. If you change to the HTTPS protocol, change this port.
password xPlore administrator password, set up when you installed xPlore (same as the xPlore
administrator login password)
verbose Prints all admin API calls to the console. Default: true. For batch scripts, set to false.
protocol Valid values: http or https
Examples:
xplore.bat resumeDiskWrites
./xplore.sh resumeDiskWrites
4. Run a CLI command with parameters using the following syntax appropriate for your environment
(Windows or Linux). The command is case-insensitive. Use double quotes around the command,
single quotes around parameters, and a forward slash for all paths:
xplore.bat "<command> [parameters]"
./xplore.sh "<command> [parameters]"
Examples:
xplore "backupFederation 'c:/xPlore/dsearch/backup', true, null"
./xplore.sh "dropIndex 'dftxml', 'folder-list-index' "
The command executes, prints a success or error message, and then exits.
5. (Optional) Run CLI commands from a file using the following syntax:
xplore.bat -f <filename>
Examples:
xplore.bat -f file.txt
./xplore.sh -f file.txt
Call the wrapper with no parameters to view a help message that lists all CLIs and their arguments.
To view help for a single CLI, add help and the CLI name. For example:
xplore help backupFederation
./xplore.sh help backupFederation
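The quoting rules above are easy to get wrong by hand. The following Python sketch builds the command text and an argument list that can be passed to subprocess.run. The helper names are hypothetical. Passing the command as one list element plays the role of the double quotes in the shell examples. Backslashes in every string argument are converted to forward slashes, and embedded single quotes are rejected, because this section does not describe how to escape them.

```python
def format_arg(value):
    """Format one CLI argument using the quoting rules in this section."""
    if value is None:
        return "null"                          # null is never quoted
    if isinstance(value, bool):
        return "true" if value else "false"    # Booleans are never quoted
    if isinstance(value, str):
        text = value.replace("\\", "/")        # paths must use forward slashes
        if "'" in text:
            raise ValueError("embedded single quotes are not handled by this sketch")
        return f"'{text}'"                     # other parameters take single quotes
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def cli_command(name, *args):
    """Return the command text, e.g. backupFederation 'c:/x', true, null."""
    if not args:
        return name
    return name + " " + ", ".join(format_arg(a) for a in args)


def cli_argv(name, *args, windows=False):
    """Argument list for subprocess.run(); the list form replaces the shell double quotes."""
    wrapper = "xplore.bat" if windows else "./xplore.sh"
    return [wrapper, cli_command(name, *args)]
```

For example, subprocess.run(cli_argv("resumeDiskWrites")), run from the directory that contains xplore.sh, is the scripted equivalent of ./xplore.sh resumeDiskWrites. Check the printed success or error message, because this section does not document the wrapper's exit status.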
For example, the following batch file sample.gvy suspends index writes and performs a federation
backup. Use a forward slash for paths.
suspendDiskWrites
folder='c:/folder'
isIncremental=false
backupFederation folder, isIncremental, null
println 'Done'
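If several scheduled jobs run the same backup against different folders, you can generate the batch file rather than edit it by hand. The following Python sketch renders a file with the same five lines as sample.gvy for a given folder, writes it, and returns the argument list for the -f option. The function names are hypothetical.

```python
from pathlib import Path


def federation_backup_script(folder, incremental=False):
    """Render a batch file like sample.gvy for a given backup folder."""
    folder = folder.replace("\\", "/")  # use forward slashes for paths
    if "'" in folder:
        raise ValueError("folder must not contain single quotes")
    lines = [
        "suspendDiskWrites",
        f"folder='{folder}'",
        f"isIncremental={'true' if incremental else 'false'}",
        "backupFederation folder, isIncremental, null",
        "println 'Done'",
    ]
    return "\n".join(lines) + "\n"


def write_batch_file(path, script, windows=False):
    """Write the script and return the argv that runs it with the -f option."""
    Path(path).write_text(script, encoding="utf-8")
    wrapper = "xplore.bat" if windows else "./xplore.sh"
    return [wrapper, "-f", str(path)]
```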
Scripted backup
The default backup location is specified in indexserverconfig.xml as the value of the path attribute
on the element admin-config/backup-location. Specify any path as the value of [backup_path].
For example:
xplore "restoreFederation 'C:/xPlore/dsearch/backup/federation/2011-03-23-16-02-02' "
For example:
3. Generate the orphaned segment list using the CLI listOrphanedSegments. If an orphaned segment
file is not specified, the IDs of orphaned segments are sent to stdout. See Orphaned segments
CLIs, page 205.
4. Stop all xPlore instances.
5. Run the restore CLI. If no bootstrap path is specified, the default location in the WEB-INF classes
directory of the xPlore primary instance is used.
"restoreCollection [backup_path], [bootstrap_path]"
Examples:
"detachDomain 'myDomain', true"
or
"detachCollection collection('myDomain','default'), true"
2. Attach syntax:
"attachDomain '[domain_name]', true"
or
"attachCollection collection('[domain_name]', '[collection_name]'), true"
Examples:
"attachDomain 'myDomain', true"
or
"attachCollection collection('myDomain','default'), true"
For example:
"listOrphanedSegments 'domain', 'backup/myDomain/2012-10', 'c:/temp/orphans.lst',
'C:/xplore/wildfly9.0.1/server/DctmServer_PrimaryDsearch/deployments/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties' "
or
"listOrphanedSegments 'collection', 'backup/myDomain/default/2012-10', null,
'C:/xplore/wildfly9.0.1/server/DctmServer_PrimaryDsearch/deployments/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties' "
2. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If
[orphan_file_path] is not specified, the segment IDs are read in from stdin. For file path, use
forward slashes. Syntax:
For example:
purgeOrphanedSegments 'c:/temp/orphans.lst'
or
purgeOrphanedSegments null
Arguments:
• [backup_path]: Path to your backup file. If not specified, the default backup location
in indexserverconfig.xml is used: the value of the path attribute on the element
admin-config/backup-location. Specify any path as the value of [backup_path].
• [bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary
instance.
Example:
"activateSpareNode 'node2', 'spare1' "
To remove the rebuild index property and clean up the temporary index folder, use the following CLI:
Syntax:
removeInConstructionIndexes [domain], [collection]
For example:
xplore "removeInConstructionIndexes 'myDocbase', 'myCollection' "
./xplore.sh "removeInConstructionIndexes 'myDocbase', 'myCollection' "
For example:
xplore "isFinalMergeOngoing 'myDocbase', 'myCollection' "
./xplore.sh "getFinalMergeStatus 'myDocbase', 'myCollection' "
You can also see merge status using xPlore administrator. A Merging icon is displayed while the
merge is in progress.
For example:
xplore "startFinalMerge 'myDocbase', 'myCollection' "
./xplore.sh "stopFinalMerge 'myDocbase', 'myCollection' "
Chapter 10
Search
About searching
Specific attributes of the dm_sysobject support full-text indexing. Use Documentum Administrator
to make object types and attributes searchable or not searchable and to set allowed search operators
and default search operator.
• Set the is_searchable attribute on an object type to allow or prevent searches for objects of that type
and its subtypes. Valid values: 0 (false) and 1 (true). The client application must read this attribute.
(The indexing process does not use it.) If is_searchable is false for a type or attribute, Webtop does
not display it in the search UI. Default: true.
• Set allowed_search_ops to set the allowed search operators and default_search_op to set the default
operator. Valid values for allowed_search_ops and default_search_op:
Value Operator
1 =
2 <>
3 >
4 <
5 >=
6 <=
7 begins with
8 contains
9 does not contain
10 ends with
11 in
12 not in
13 between
14 is null
15 is not null
16 not
• The default_search_arg attribute sets a default argument for the default operator. The client
must read these attributes; the indexing process does not use them. Webtop displays the allowed
operators and the default operator.
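A client that reads these attributes has to map the integer codes back to operators. The following Python sketch does that with the table above. How the attribute values are fetched from the repository (for example, through DFC or DQL) is not shown, and describe_search_ops is a hypothetical name.

```python
# Operator codes for allowed_search_ops and default_search_op, from the table above.
SEARCH_OPS = {
    1: "=", 2: "<>", 3: ">", 4: "<", 5: ">=", 6: "<=",
    7: "begins with", 8: "contains", 9: "does not contain", 10: "ends with",
    11: "in", 12: "not in", 13: "between", 14: "is null", 15: "is not null",
    16: "not",
}


def describe_search_ops(allowed_search_ops, default_search_op):
    """Translate attribute values into operator names, rejecting unknown codes."""
    codes = list(allowed_search_ops) + [default_search_op]
    unknown = [code for code in codes if code not in SEARCH_OPS]
    if unknown:
        raise ValueError(f"unknown operator codes: {unknown}")
    return [SEARCH_OPS[code] for code in allowed_search_ops], SEARCH_OPS[default_search_op]


allowed, default = describe_search_ops([1, 8, 9, 14], 8)
print(allowed, default)  # ['=', 'contains', 'does not contain', 'is null'] contains
```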
Content Server client applications issue queries through the DFC search service or through DQL. DFC
6.6 and higher translates queries directly to XQuery for xPlore. DQL queries are handled by the
Content Server query plugin, which translates DQL into XQuery unless XQuery generation is turned
off. Not all DQL operators are available through the DFC search service. In some cases, a DQL search
of the Server database returns different results than a DFC/xPlore search. For more information on
DQL and DFC search differences, see DQL, DFC, and DFS queries, page 243.
DFC generates XQuery expressions by default. If XQuery is turned off in DFC, FTDQL queries are
generated. The FTDQL queries are evaluated in the xPlore server. If all or part of the query does not
conform to FTDQL, that portion of the query is converted to DQL and evaluated in the Content Server
database. Results from the XQuery are combined with database results. For more information on
FTDQL and SDC criteria, see the Documentum Content Server DQL Reference.
xPlore search is case-insensitive and ignores white space or other special characters. Special characters
are configurable.
Related topics:
• Handling special characters, page 118
• Search reports, page 309
• Troubleshooting slow queries, page 266
• Changing search results security, page 55
Query operators
Operators in XQuery expressions, DFC, and DQL are interpreted in the following ways:
• DQL operators: All string attributes are searched with the ftcontains operator in XQuery. All other
attribute types use value operators (= != < >). In DQL, dates are automatically normalized to
UTC representation when translated to XQuery.
• DFC: When you use the DFC interface IDfXQuery, your application must specify dates in UTC to
match the format in dftxml.
• XQuery operators
– The value operators = != < > specify a value comparison search. Search terms are not tokenized.
Value operators can be used for exact-match or range searches on dates and IDs.
Any subpath that can be searched with a value operator must have the value-comparison attribute
set to true for the corresponding subpath configuration in indexserverconfig.xml. For example,
suppose the r_modify_date subpath is misconfigured with full-text-search set to true. A date of
'2010-04-01T06:55:29' is then tokenized into 5 tokens: '2010' '04' '01T06' '55' '29', and a search
for '04' returns any document modified in April. The user gets many non-relevant results. Therefore,
r_modify_date must have value-comparison set to true, so that the date attribute is indexed as one
token and a search for '04' no longer matches every document modified in April.
– The ftcontains operator (XQFT syntax) specifies that the search term is tokenized before it is
searched against the index.
If a subpath can be searched by ftcontains, set the full-text-search attribute to true in the
corresponding subpath configuration in indexserverconfig.xml.
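The following Python sketch illustrates why value-comparison matters for dates. naive_tokens is a deliberately simplified tokenizer, assumed for illustration, that splits on hyphens, colons, and white space; it reproduces the five tokens listed above but is not the xPlore tokenizer. The second date is a hypothetical November document that still matches '04' because its minute field is 04.

```python
import re


def naive_tokens(value):
    """Split on '-', ':' and white space, approximating the misconfigured case above."""
    return [token for token in re.split(r"[-:\s]+", value) if token]


def full_text_match(stored, term):
    # full-text-search=true: the stored value is tokenized and the term matches any token.
    return term.lower() in (token.lower() for token in naive_tokens(stored))


def value_match(stored, term):
    # value-comparison=true: the whole value is one token, so only an exact value matches.
    return stored == term


dates = ["2010-04-01T06:55:29", "2012-11-30T10:04:00"]
print(naive_tokens(dates[0]))                      # ['2010', '04', '01T06', '55', '29']
print([full_text_match(d, "04") for d in dates])   # [True, True]
print([value_match(d, "04") for d in dates])       # [False, False]
```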
Administering search
Common search service tasks
You can configure all search service parameters by choosing Global Configuration from the
System Overview panel in xPlore administrator. The default values have been optimized for most
environments.
Enabling search
Enable or disable search by choosing an instance of the search service in the left pane of the
administrator. Click Disable (or Enable).
Canceling running queries
Open an instance and choose Search Service. Click Operations. All running queries are displayed.
Click a query and delete it.
Viewing search statistics
Choose Search Service and click an instance:
• Accumulated number of executed queries
• Number of failed queries
• Number of pending queries
</warmup>
</performance>
Set the child element warmup status to on or off. You can set the warmup timeout in seconds. If the
warmup hangs, it is canceled after this timeout period.
Note: By default, the relative path to the warmup script QueryRunner.bat (Windows) or
QueryRunner.sh (Linux) in indexserverconfig.xml assumes that the configuration directory is
xplore_home/config. If not, you must modify the path to a corresponding relative path or an absolute
path for the warmup script to run correctly.
Configuring warmup
Configure warmup in query.properties. This file is in xplore_home/dsearch/xhive/admin. Restart all
xPlore instances for your changes to take effect.
Key Description
xplore_qrserver_host Primary xPlore instance host. Not needed for
file-based warmup.
xplore_qrserver_port Port for primary xPlore instance. Not needed for
file-based warmup.
xplore_protocol Communication protocol used for file-based warmup.
Valid values: http and https
xplore_domain Name of domain in xPlore (usually the same as
the Content Server name). You must change this
to a valid domain. This is used only for file-based
warmup. An incorrect domain name is recorded in
Querywarmup.log as FtSearchException: Invalid
domain.
security_eval Evaluate security for queries in warmup. Default:
false. Used only for file-based warmup.
user_name Name of user or superuser who has permission to run
warmup queries. Required if security_eval is set to
true. Used only for file-based warmup.
super_user Set to true if the user who runs warmup queries is a
Content Server superuser. Required if security_eval is
set to true.
query_file Specify a file name that contains warmup queries.
Query can be multi-line, but the file cannot contain
empty lines. If no name is specified, queries are read
from the audit record. (Query auditing is enabled by
default.)
batch_size Set number of results in one batch for warmup
queries. Default: 20.
timeout Set maximum milliseconds to try a warmup query.
Default: 60000.
max_retries Set maximum number of times to retry a failed query.
Default: 10.
print_result Set to true to print results to queryWarmer.log in
xplore_home/dsearch/xhive/admin/logs. Default:
false.
fetch_result_byte Maximum number of bytes to fetch in a warmup
query. Default: 4096000.
read_from_audit Set to true (default) to read queries from audit record.
Requires query auditing enabled (default). (See
Auditing queries, page 264.) Warmup is read first
from a file if query_file is specified. Default: true.
number_of_unique_users Number of users in audit log for whom to replay
queries. Default: 10
number_of_queries_per_user Number of queries for each user to replay. Default: 1
schedule_warmup Enables the warmup schedule. Default: true.
schedule_warmup_period How often the warmup occurs. For example, a value
of 1 in DAYS units results in daily warmup. Default:
1.
schedule_warmup_units Valid values: DAYS | HOURS | MINUTES |
SECONDS | MILLISECONDS | MICROSECONDS.
Default: DAYS.
initial_delay Start warmup after the specified initial delay,
and subsequently after schedule_warmup_period.
Default: 0. If any execution of the task encounters an
exception, subsequent executions are suppressed.
query_response_time Replay only audit-record queries whose response time (fetch time +
execution time), in milliseconds, is less than or equal to this value. Default:
60000.
exclude_users Exclude queries run by these users from replay. Set a comma-separated
list of users. Default: unknown.
check_node_status_interval Time in milliseconds that warmup waits before it checks the
status of nodes again when one or more nodes are not successfully started in a
multi-node environment. Warmup runs only after all the nodes
are started. Default: 60000.
check_node_status_max_retries Number of times to check the status of nodes when
one or more nodes are not successfully started in a
multi-node environment. Warmup runs only after all
the nodes are started. Set it to -1 to check indefinitely.
Default: 15.
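The following Python sketch merges a query.properties file over the defaults from the table above and checks two of the documented rules: schedule_warmup_units must be one of the listed units, and user_name and super_user are required when security_eval is true. It is a simplified, hypothetical reader (plain key=value lines only) for checking a file before you restart the instances; xPlore does not use this code.

```python
# Defaults documented in the table above (values kept as strings, as in the file).
WARMUP_DEFAULTS = {
    "security_eval": "false", "batch_size": "20", "timeout": "60000",
    "max_retries": "10", "print_result": "false", "fetch_result_byte": "4096000",
    "read_from_audit": "true", "number_of_unique_users": "10",
    "number_of_queries_per_user": "1", "schedule_warmup": "true",
    "schedule_warmup_period": "1", "schedule_warmup_units": "DAYS",
    "initial_delay": "0", "query_response_time": "60000",
    "check_node_status_interval": "60000", "check_node_status_max_retries": "15",
}

UNIT_SECONDS = {"DAYS": 86400, "HOURS": 3600, "MINUTES": 60, "SECONDS": 1,
                "MILLISECONDS": 0.001, "MICROSECONDS": 0.000001}


def load_warmup_config(text):
    """Merge simple key=value lines from query.properties over the documented defaults."""
    config = dict(WARMUP_DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    if config["schedule_warmup_units"].upper() not in UNIT_SECONDS:
        raise ValueError(f"invalid schedule_warmup_units: {config['schedule_warmup_units']}")
    if config["security_eval"].lower() == "true":
        for required in ("user_name", "super_user"):
            if not config.get(required):
                raise ValueError(f"{required} is required when security_eval is true")
    return config


def warmup_period_seconds(config):
    """Interval between scheduled warmups, in seconds."""
    units = config["schedule_warmup_units"].upper()
    return int(config["schedule_warmup_period"]) * UNIT_SECONDS[units]
```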
Warmup logging
Warmup activity is logged in the audit record and reported in the admin report Audit Records for
Search Component:
• Warmup is logged in the file queryWarmer.log, located in xplore_home/dsearch/xhive/admin/logs.
Use this log to verify when a collection was last warmed up.
• All the queries that are replayed for warmup from a file or the audit record are tagged as a
QUERY_WARMUP event in the audit records. The log includes the query to get warmup queries.
You can see this type in the admin report Top N Slowest Queries.
To view all warmup queries in the audit record, run the report Audit records for search component
(query type: Warmup Tool Query (QUERY_WARMUP)) in xPlore administrator.
3. By default the Documentum attribute r_modify_date is used to boost scores in results (freshness
boost). You can remove the freshness boost factor, change how much effect it has, or boost a
custom date attribute.
• To remove this boost, edit indexserverconfig.xml and set the property enable-freshness-score
to false on the parent category element. This change affects only query results and does not
require reindexing.
<category name='dftxml'><properties>
...
<property name="enable-freshness-score" value="false" />
</properties></category>
• Change the freshness boost factor. Changes do not require reindexing. Only documents that are
six years old or less have a freshness factor. The weight for freshness is equal to the weight for the
Lucene relevancy score. Set the value of the property freshness-weight in index-config/properties
to a decimal between 0 (no boost) and 1.0 (override the Lucene relevancy score). For example:
<index-config><properties>
...
<property name="enable-subcollection-ftindex" value="false"/>
<property name="freshness-weight" value="0.75" />
</properties></index-config>
• To boost a different date attribute, specify the path to the attribute in dftxml as the value of
a freshness-path property. This change requires reindexing. In the following example, the
r_creation_date attribute is boosted:
<index-config><properties>
...
<property name="enable-subcollection-ftindex" value="false"/>
<property name="freshness-weight" value="0.75" />
<property name="freshness-path" value="dmftmetadata/.*/r_creation_date" />
</properties></index-config>
4. Configure the weighting for the query term source original term. This change does not require
reindexing. Edit the following property in search-config/properties. The value ranges from 1 to 1000.
<property name="query-original-term-weight" value="100.0"/>
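To script the freshness-weight change in step 3, the following Python sketch uses xml.etree.ElementTree to add or update the property under index-config/properties. It assumes the structure shown in the examples above, and set_freshness_weight is a hypothetical name. ElementTree keeps comments here but does not preserve the original formatting or attribute quoting, so edit a copy of indexserverconfig.xml and compare it with the original before you replace it.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Comments have a non-string tag; namespaced tags look like {uri}name.
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def set_freshness_weight(xml_text, weight):
    """Set index-config/properties/property[@name='freshness-weight'] to weight."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("freshness-weight must be between 0 (no boost) and 1.0")
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    if root.tag.startswith("{"):
        # Serialize the default namespace without an ns0: prefix.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    index_config = next((e for e in root.iter() if _local(e.tag) == "index-config"), None)
    props = None
    if index_config is not None:
        props = next((e for e in index_config if _local(e.tag) == "properties"), None)
    if props is None:
        raise ValueError("index-config/properties not found")
    for prop in props:
        if _local(prop.tag) == "property" and prop.get("name") == "freshness-weight":
            prop.set("value", str(weight))
            break
    else:
        # Reuse the namespace of <properties> for the new <property> element.
        ns = props.tag[: props.tag.index("}") + 1] if props.tag.startswith("{") else ""
        ET.SubElement(props, ns + "property", {"name": "freshness-weight", "value": str(weight)})
    return ET.tostring(root, encoding="unicode")
```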
DTD URLs are resolved. External entities are expanded and added to the XML content. DTDs and
external entities can be placed on the xPlore host.
Note: When the size of the file exceeds the size that is specified as file-limit on the xml-content element
in indexserverconfig.xml, XML element values are embedded into the dmftcontentref element of the
dmftxml record. You can still do a full-text search on the file contents if it does not fail XML parsing.
Note: If your documents containing XML have already been indexed, they must be reindexed
to include the XML content.
Note: If the content exceeds the CPS file-limit in xml-content, XML content is not embedded.
The following illustration shows how XML content is processed depending on your configuration.
The table assumes that the document submitted for indexing does not exceed the size limit in index
agent configuration and the content limit in CPS configuration.
You can use IDfXQuery to generate the following query, which is much more specific and performs
better:
let $j := for $x in collection('/XMLTest')/dmftdoc
[dmftcontents/dmftcontent/dmftcontentref/company/staff
ftcontains 'John' with stemming]
• Perform boolean searches using DQL, XQuery, or the DFC interface IDfXQuery.
For a table of VQL examples and their equivalents in XQuery expression, see VQL and XQuery
Syntax Equivalents, page 377.
Adding a thesaurus
A thesaurus provides results with terms that are related to the search terms. For example, when a user
searches for car, a thesaurus expands the search to documents containing auto or vehicle. When
you provide a thesaurus, xPlore expands search terms in full-text expressions to similar terms. This
expansion takes place before the query is tokenized. Terms from the query and thesaurus expansion
are highlighted in search results summaries.
A thesaurus can have terms in multiple languages. Linguistic analysis of all the terms that are returned,
regardless of language, is based on the query locale.
The thesaurus must be in SKOS format, a W3C specification. FAST-based thesaurus dictionaries must
be converted to the SKOS format. Import your thesaurus to the file system on the primary instance host
using xPlore administrator. You can also provide a non-SKOS thesaurus by implementing a custom
class that defines thesaurus expansion behavior. See Adding custom access to a thesaurus, page 287.
SKOS format
The format starts with a concept (term) that includes a preferred label and a set of alternative labels.
The alternative labels expand the term (the related terms or synonyms). Here is an example of such an
entry in SKOS:
<skos:Concept rdf:about="http://www.my.com/#canals">
<skos:prefLabel>canals</skos:prefLabel>
<skos:altLabel>canal bends</skos:altLabel>
<skos:altLabel>canalized streams</skos:altLabel>
<skos:altLabel>ditch mouths</skos:altLabel>
<skos:altLabel>ditches</skos:altLabel>
<skos:altLabel>drainage canals</skos:altLabel>
<skos:altLabel>drainage ditches</skos:altLabel>
</skos:Concept>
xPlore supports SKOS two-way expansion, so when searching for a term defined by a preferred
label or alternative labels, all labels defined in that concept are returned. In this example, a search
on canals returns documents that contain words such as canals, canal bends, canalized streams,
ditches, and others. Similarly, a search on ditches returns documents containing ditches, canals, canal
bends, canalized streams, and so on.
xPlore also supports one-way expansion, but only when using an XQuery relationship option. With
XQuery, you can specify that relationship “USE” only expands a preferred term with alternative terms,
and “UF” only expands alternative terms with a preferred term. For example, the following XQuery
would expand canals to canal bends, canalized streams, ditches, and so on, but cannot expand ditches
to canals and other terms:
. contains text "canals" using thesaurus at "$yourthesaurusuri"
relationship "USE"
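The two-way and one-way rules above can be summarized in a small Java model. This is a sketch for reasoning about which terms a search should pick up, not xPlore code: the class SkosExpansion, its Concept type, and the case-insensitive label matching are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the SKOS expansion rules described above; not xPlore code.
class SkosExpansion {
    enum Relationship { TWO_WAY, USE, UF }

    // One SKOS concept: a preferred label plus its alternative labels.
    static class Concept {
        final String prefLabel;
        final List<String> altLabels;

        Concept(String prefLabel, String... altLabels) {
            this.prefLabel = prefLabel;
            this.altLabels = Arrays.asList(altLabels);
        }
    }

    // Returns the original term plus every label it expands to.
    static Set<String> expand(List<Concept> thesaurus, String term, Relationship rel) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(term);
        for (Concept c : thesaurus) {
            boolean isPref = c.prefLabel.equalsIgnoreCase(term);
            boolean isAlt = false;
            for (String alt : c.altLabels) {
                if (alt.equalsIgnoreCase(term)) {
                    isAlt = true;
                    break;
                }
            }
            if (rel == Relationship.TWO_WAY && (isPref || isAlt)) {
                // Two-way: any label pulls in every label of the concept.
                terms.add(c.prefLabel);
                terms.addAll(c.altLabels);
            } else if (rel == Relationship.USE && isPref) {
                // USE: a preferred term expands to its alternative terms only.
                terms.addAll(c.altLabels);
            } else if (rel == Relationship.UF && isAlt) {
                // UF: an alternative term expands to its preferred term only.
                terms.add(c.prefLabel);
            }
        }
        return terms;
    }
}
```

With the canals concept, a USE search on ditches stays {ditches}, while a two-way search on ditches also returns canals and the other alternative labels.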
OpenText Documentum xPlore Version 1.6 Administration and Development Guide 221
Search
Terms from multiple languages can be added like the following example:
<skos:Concept rdf:about="http://www.fao.org/aos/#canals">
<skos:prefLabel xml:lang="fr">canal</skos:prefLabel>
<skos:altLabel xml:lang="fr">coudes du canal</skos:altLabel>
<skos:altLabel xml:lang="fr">fossés</skos:altLabel>
…
<skos:prefLabel xml:lang="es">canal</skos:prefLabel>
<skos:altLabel xml:lang="es">curvas del canal</skos:altLabel>
<skos:altLabel xml:lang="es">zanjas</skos:altLabel>
…
</skos:Concept>
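Before importing a multilingual thesaurus, it can help to check which labels each language contributes. The following sketch uses the JDK DOM parser to group the labels of every skos:Concept by xml:lang. The class name SkosLabels is hypothetical, and the sketch assumes the concepts sit inside an rdf:RDF document that declares the rdf and skos namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: groups the labels of every skos:Concept by their xml:lang value.
class SkosLabels {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Returns lang -> labels; within each concept, preferred labels come first.
    static Map<String, List<String>> labelsByLanguage(String rdfXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(rdfXml)));
            Map<String, List<String>> byLang = new LinkedHashMap<>();
            NodeList concepts = doc.getElementsByTagNameNS(SKOS, "Concept");
            for (int i = 0; i < concepts.getLength(); i++) {
                Element concept = (Element) concepts.item(i);
                for (String kind : new String[] {"prefLabel", "altLabel"}) {
                    NodeList labels = concept.getElementsByTagNameNS(SKOS, kind);
                    for (int j = 0; j < labels.getLength(); j++) {
                        Element label = (Element) labels.item(j);
                        // A label without xml:lang is grouped under "".
                        String lang = label.getAttributeNS(XML_NS, "lang");
                        byLang.computeIfAbsent(lang, k -> new ArrayList<>())
                              .add(label.getTextContent().trim());
                    }
                }
            }
            return byLang;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse SKOS document", e);
        }
    }
}
```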
append,c,l,param_value
true
save,c,l
• Query plan for thesaurus XQuery execution. Provide the query plan to technical support if
you are not able to resolve an issue. For example:
thesaurus lookup execution plan:
query:6:1:Creating query
plan on node /testenv/dsearch/SystemInfo/ThesaurusDB
query:6:1:for expression ...[xhive:metadata(., "uri") = "http://search.opentext.com/testenv/skos.rdf"]/child::
{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/child::
{http://www.w3.org/2004/02/skos/core#}
Concept[child::{http://www.w3.org/2004/02/skos/core#}prefLabel
[. contains text terms@0]]/child::
{http://www.w3.org/2004/02/skos/core#}altLabel/child::text()
query:6:1:Using query plan:
query:6:1:index(Concept)
[parent::{http://www.w3.org/1999/02/22-rdf-syntax-ns#}
RDF[parent::document()[xhive:metadata(., "uri") = "http://search.opentext.com/testenv/skos.rdf"]]]/child::
{http://www.w3.org/2004/02/skos/core#}altLabel/child::text()
].
• Related terms that are returned from the thesaurus. For example:
[related terms from thesaurus lookup query
[Absence from work, Absenteeism, Annual leave, Employee vacations,
Holidays from work, Leave from work, Leave of absence, Maternity
leave, Sick leave]]
You can also inspect the final Lucene query. This query differs from the original query because
it contains the expanded terms (alternative labels) from the thesaurus. In xPlore Administrator, open
Services > Logging and expand xhive. Change the log level of com.xhive.index.multipath.query
to DEBUG. The query is recorded in the xDB log under “generated Lucene query clauses”. xdb.log is in
xplore_home/wildfly_version/server/DctmServer_PrimaryDsearch/logs. The tokens are marked as
tkn. For example:
generated Lucene query clauses(before optimization):
+(((<>/dmftmetadata<0>/dm_sysobject<0>/a_is_hidden<0>/ txt:false)^0.0)) +
(((<>/dmftversions<0>/iscurrent<0>/ txt:true)^0.0))
+(<>/ tkn:shme <>/dmftcontents<0>/ tkn:shme <>/dmftcontents<0>/dmftcontent<0>/
tkn:shme <>/dmftfolders<0>/ tkn:shme <>/dmftinternal<0>/
tkn:shme <>/dmftinternal<0>/r_object_id<0>/
tkn:shme <>/dmftinternal<0>/r_object_type<0>/
tkn:shme <>/dmftkey<0>/ tkn:shme <>/dmftmetadata<0>/
tkn:shme <>/dmftsecurity<0>/
tkn:shme <>/dmftsecurity<0>/ispublic<0>/ tkn:shme <>/dmftversions<0>/
tkn:shme <>/dmftvstamp<0>/
tkn:shme) _xhive_stored_payload_:_xhive_stored_payload_
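To compare expansions across queries, the tokens can be pulled out of a logged clause list. This sketch assumes the tkn: notation shown in the sample above, where a token ends at whitespace or a closing parenthesis; the class LuceneClauseTokens is hypothetical and takes one log entry as a string.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: lists the distinct tokens in one "generated Lucene query clauses" entry.
class LuceneClauseTokens {
    // In the sample log, each token follows "tkn:" and ends at whitespace or ')'.
    private static final Pattern TOKEN = Pattern.compile("tkn:([^\\s)]+)");

    static Set<String> tokens(String logEntry) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = TOKEN.matcher(logEntry);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

If thesaurus expansion took place, the alternative labels appear here next to the original search term.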
Troubleshooting a thesaurus
Ensure that your thesaurus is in xPlore. You can view a thesaurus and its properties in the xDB admin
tool. Navigate to the /xhivedb/root-library/<domain>/dsearch/SystemInfo/ThesaurusDB library. To
view the default and URI settings, click the Metadata tab.
Ensure that your thesaurus is used. Compare the thesaurus URI used by the XQuery, as recorded in
dsearch.log, with the URI associated with the dictionary. You can view the dictionary URI in the xDB
admin tool or in the thesaurus list in xPlore administrator.
For example:
for $i score $s in collection('/testenv/dsearch/Data')
/dmftdoc[. ftcontains 'food products' using thesaurus at
'http://www.opentext.com/skos'] order by $s descending return
$i/dmftinternal/r_object_id
If the default thesaurus on the file system is used, the log records a query like the following:
for $i score $s in collection('/testenv/dsearch/Data') /dmftdoc[.
ftcontains 'food products' using thesaurus default] order by $s
descending return $i/dmftinternal/r_object_id
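When many logged queries need checking, the URI comparison can be scripted. This hypothetical helper extracts either the at 'uri' value or the keyword default from a logged XQuery, assuming the using thesaurus syntax shown in the examples above; compare its result with the URI shown in the xDB admin tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pulls the thesaurus reference out of a logged XQuery.
class ThesaurusUriCheck {
    // Matches "using thesaurus at 'uri'" (or double quotes) or "using thesaurus default".
    private static final Pattern USING =
        Pattern.compile("using\\s+thesaurus\\s+(?:at\\s+['\"]([^'\"]+)['\"]|(default))");

    // Returns the URI, "default", or null if the query uses no thesaurus.
    static String thesaurusOf(String xquery) {
        Matcher m = USING.matcher(xquery);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }
}
```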
You can view thesaurus terms that were added to a query by inspecting the final query. In xPlore
administrator, set the log level of com.xhive.index.multipath.query to DEBUG, then search xdb.log
for “generated Lucene query clauses”.
You can configure search to match a compound term only as a whole, not by its components, for all
languages or for a specified set of languages.
Configure the search for compound terms by editing xplore_home/config/indexserverconfig.xml and
setting the property named query-components-from-compound-words-lang-list under the search-config
element to one of the following values:
• "*": Searching on a compound term returns any results that contain the compound term, any of its
components, or their alternate lemmas for all languages. The search returns more results, but they
are more ambiguous and some may be irrelevant. This is also the default behavior when the property
is not configured.
• Supported language code such as "de" (German), "zh" (Chinese), or "kr" (Korean): Searching on a
compound term returns any results that contain the compound term, any of its components, or their
alternate lemmas only for the specified language(s); for other languages, only results containing the
compound term proper are returned. Delimit multiple language codes with commas.
• "" (blank): Searching on a compound term only returns results that contain the compound term
proper for all languages. This yields a higher level of search accuracy at the cost of fewer returned
results.
For example, the following setting specifies that only for the Chinese and German languages, searching
on a compound term returns results that contain the compound term, any of its components, or their
alternate lemmas; for other languages, only results containing the compound term proper are returned:
<search-config>
<properties>
...
<property value="zh,de" name="query-components-from-compound-word-lang-list"/>
...
</properties>
</search-config>
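The three kinds of values can be read as a single decision per query language. This Java sketch is an illustrative model of the documented behavior, not xPlore code; the class and method names are hypothetical.

```java
// Hypothetical model of the compound-term property values described above; not xPlore code.
class CompoundComponents {
    // value: the configured property value, or null when the property is absent.
    // lang: language code of the query, for example "de".
    // Returns true if components and their alternate lemmas are also matched.
    static boolean matchesComponents(String value, String lang) {
        if (value == null || value.trim().equals("*")) {
            return true; // default and "*": all languages
        }
        for (String code : value.split(",")) {
            if (code.trim().equals(lang)) {
                return true; // language is in the comma-delimited list
            }
        }
        return false; // blank value or unlisted language: compound term proper only
    }
}
```

With the "zh,de" setting shown above, a German query also matches components, while a French query matches only the compound term proper.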
xPlore generates alternative forms for components in the root form list. This behavior may cause
some compound words to hit more documents than expected. To disable this behavior,
set com.basistech.ela.compounds_in_alt_lemmas to false in the file cps_context.xml located in
xplore_home/dsearch/cps/cps_daemon.
<property name="com.basistech.ela.compounds_in_alt_lemmas" value="false"/>
– The result must be within the first X rows defined by the max-dynamic-summary-threshold
parameter.
– The size of the extracted text must be less than the extract-text-size-less-than attribute.
– The query term must appear within the first X characters defined by the token-size attribute.
– With native xPlore security and the security_mode property of the dm_ftengine_config object set
to BROWSE, you must have at least READ permission to see dynamic summaries.
Dynamic summaries have a performance impact. After the summary is computed, it is reprocessed
for highlighting, which adds a second performance cost.
You can configure dynamic summary processing on the indexing side as well as on the search side.
You can also disable dynamic summaries for better performance.
• Static summary
Static summaries are computed when the summary conditions do not match the conditions
configured for dynamic summaries. Static summaries are much faster to compute but less specific
than dynamic summaries.
• Metadata summary
To obtain summaries with highlighted metadata, the metadata attributes must be in both the query
constraints and the metadata highlight attribute list.
The default is -1 (all documents). For faster summary calculation, set this value to a positive
value. Larger documents return a static summary.
b. Configure the number of characters at the beginning of the document in which the query
term must appear. If the query term is not found in this snippet, a static summary is
returned and term hits are not highlighted. Set the value of the token-size attribute on the
category-definitions/category/do-text-extraction/save-tokens-for-summary-processing element.
The default value is 65536 (64K). A value of -1 indicates no maximum content size, but this
value negatively impacts performance. For faster summary calculation, set this value lower.
c. Configure the maximum number of results that have a dynamic summary. Dynamic
summaries require much more computation time than static summaries. Set the value of
max-dynamic-summary-threshold (default: 50). Additional results have a static summary.
If most users do not go beyond the first page of results, set this value to the page size (for
example, 10) for better performance.
4. Configure dynamic summary query processing
In addition to configuring text extraction and lemmatization for dynamic summaries as a part of
the indexing process, you can further configure dynamic summary computation on the search
side without having to reindex the documents.
Edit indexserverconfig.xml and add the query-dynamic-summary-content-text-size property under
the search-config element. The value of this property approximately determines in characters
the amount of content text and corresponding tokens to be read from disk and considered for
computing the dynamic summary for a document.
For example:
<search-config>
...
<property value="256" name="query-dynamic-summary-content-text-size"/>
...
</search-config>
The setting above specifies that approximately 256 characters of the content from a document, and
the corresponding tokens, are read from disk and considered for computing the dynamic
summary.
Consider setting the value of query-dynamic-summary-content-text-size lower in the following
scenarios:
• Your disk suffers from poor performance and limited capacity in a multi-user environment and
you want to reduce the disk I/O and improve system performance.
• You want to reduce disk I/O without shrinking the size of the dynamic summaries displayed in the results.
Note: While improving system performance, this feature may affect the quality of summaries
produced for documents, especially in the case of large documents, where only a small portion of
the content text is considered for generating summaries. If a search term occurs at the end of a large
document, it may not be highlighted in the summary.
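The conditions that decide between a dynamic and a static summary can be modeled as a single check. This is a hypothetical Java sketch of the rules listed in this section, treating -1 as "no limit" for the text size and token-size settings as the steps above describe; it leaves out the BROWSE security condition, and the class and parameter names are assumptions.

```java
// Hypothetical model of when a result gets a dynamic rather than a static summary.
class SummaryChooser {
    static boolean isDynamic(int resultRow,                 // 1-based position in the results
                             long extractedTextSize,        // size of the extracted text
                             long firstHitOffset,           // chars before the first query term, -1 if absent
                             int maxDynamicSummaryThreshold,
                             long extractTextSizeLessThan,  // -1 means no limit
                             long tokenSize) {              // -1 means no limit
        if (resultRow > maxDynamicSummaryThreshold) {
            return false; // later results get a static summary
        }
        if (extractTextSizeLessThan != -1 && extractedTextSize >= extractTextSizeLessThan) {
            return false; // document too large
        }
        if (firstHitOffset < 0) {
            return false; // term not found: static summary, no highlighting
        }
        return tokenSize == -1 || firstHitOffset < tokenSize;
    }
}
```

With the defaults (threshold 50, token-size 65536), the 51st result or a first hit at character 70000 falls back to a static summary.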
Highlighting does not preserve query context such as phrase search, AND search, NOT search, fuzzy
search, or range search. Each search term in the original query is highlighted separately.
To enable summary highlighting for multiple elements, add the following property under search-config
in indexserverconfig.xml:
<property value="true" name="query-summary-process-all-highlight-nodes"/>
This is a sample query supported by this feature:
for $doc in doc('/defaultDomain/dsearch/Data/default')/dmftdoc[.
ftcontains 'relational database' with stemming] return <R>
<ID>{string($doc/dmftmetadata//r_object_id)}</ID>
<summary> {xhive:highlight(($doc/dmftcontents/*/dmftcontentref,
$doc/dmftcustom))}</summary></R>
feature of Lucene. Lucene generates a list of similar words from the index, so the index of the
documents serves as the dictionary in xPlore: words whose similarity to the search term exceeds the
specified similarity are returned.
By default, fuzzy search is not enabled. Fuzzy search is not applied when wildcards are present.
Additionally, fuzzy search is available only in XQuery, not in DQL queries. DFC clients must be at
least version 6.7 to enable fuzzy search.
Fuzzy search does not work if you enable exact phrase match in queries. For information about
configuring exact phrase match in queries, see About lemmatization, page 112.
When fuzzy search is enabled, a query runs with the search terms as well as a number of similar terms.
To reduce the performance impact of such a query, xPlore assumes that the first character of
the search term is correct and limits the number of similar terms used in the query. For example,
searching on qdministration would not return results with the term administration. Similarly, searching
on explore would not return results with the term xplore.
You can configure the number of leading characters to ignore, the number of similar terms to use in the
query, and the supported distance between terms.
Use the following procedure to enable and configure fuzzy search for all queries except DQL.
1. Check your current dm_ftengine_config parameters. Use iAPI, DQL, or DFC to check the
dm_ftengine_config object. First get the object ID, returned by this API command:
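retrieve,c,dm_ftengine_config
2. Use the object ID to get the dm_ftengine_config parameters and values. For example:
?,c,select param_name, param_value from dm_ftengine_config
where r_object_id='dm_ftengine_config_object_id'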
3. If the fuzzy_search_enable parameter does not exist, use iAPI, DQL, or DFC to modify the
dm_ftengine_config object. To add a parameter using iAPI in Documentum Administrator, use
append like the following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fuzzy_search_enable
append,c,l,param_value
true
save,c,l
4. To change the allowed similarity between a word and similar words, set the parameter
default_fuzzy_search_similarity in dm_ftengine_config. This default also applies to custom fuzzy
queries in DFC and DFS for full-text and properties. Set a value between 0 (terms can differ
by more than one letter) and 1. Default: 0.5.
To verify that your fuzzy search setting has been applied, view the query in dsearch.log. You
should see the following argument in the query with a similarity value that you have set:
5. Edit the xdb.properties file located in the directory WEB-INF/classes of the primary instance.
6. Set the xdb.lucene.fuzzyQueryPrefixLength property to the number of leading characters that must
match exactly and are therefore ignored when assessing similar terms. Default: 1.
For example, when the prefix value is set to 0, searching on explore returns xplore, but this setting has a
large performance impact. Set it to 0 only if matching misspelled first characters is critical to your business.
Setting the prefix to a high value improves the performance but similar terms can be omitted and
you lose the benefit of the feature.
7. Set the xdb.lucene.fuzzyTermsExpandedNumber property to the maximum number of similar terms
used in the query. The most similar terms are used. A smaller value improves query response
time. Default: 10.
8. Make the same changes in the xdb.properties file for all instances.
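The interaction between the prefix length, the similarity, and the maximum number of expanded terms can be illustrated with a simplified model. This Java sketch uses plain Levenshtein distance and defines similarity as 1 minus the distance divided by the longer term length; that formula is an assumption for illustration and is not necessarily the one Lucene or xPlore uses. The class FuzzySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified illustration of prefix length, similarity, and expansion limit.
// Assumed similarity: 1 - editDistance / length of the longer term.
class FuzzySketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longer;
    }

    // Returns up to maxExpansions index terms similar to queryTerm, most similar first.
    static List<String> expand(String queryTerm, List<String> indexTerms,
                               int prefixLength, double minSimilarity, int maxExpansions) {
        String prefix = queryTerm.substring(0, Math.min(prefixLength, queryTerm.length()));
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            // The leading prefixLength characters must match exactly.
            if (term.startsWith(prefix) && similarity(queryTerm, term) >= minSimilarity) {
                matches.add(term);
            }
        }
        matches.sort(Comparator.comparingDouble((String t) -> similarity(queryTerm, t)).reversed());
        return matches.subList(0, Math.min(maxExpansions, matches.size()));
    }
}
```

With a prefix of 1, qdministration cannot reach administration because the first character must match; with a prefix of 0, explore reaches xplore, at the cost of comparing the query term against every index term.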
Fuzzy search in DFC and DFS
You can enable fuzzy search on individual queries in DFC or DFS. Set fuzzy search in individual
full-text and property queries with APIs on IDfFulltextExpression and IDfSimpleAttrExpression. Use
the operators CONTAINS, and EQUALS for String object types:
• setFuzzySearchEnabled(Boolean fuzzySearchEnabled)
• setFuzzySearchSimilarity(Float similarity): Sets a similarity value between 0 and 1. Overrides the
value of the parameter default_fuzzy_search_similarity in dm_ftengine_config.
Note: xPlore does not support negative operators, such as DOES_NOT_CONTAIN, in fuzzy search.
To disable fuzzy search, set the property fuzzy_search_enable in the dm_ftengine_config object to false.
Index Type
string Yes
double Yes
When xdb.lucene.strictIndexTypeCheck is True, a stricter index type checking rule is enforced. This
may lower query performance if you did not specify index data types when creating subpath
definitions.
The following table shows compatible query and index data type pairs (indicated as Yes) when
xdb.lucene.strictIndexTypeCheck is set to True.
Index Type
Query string integer double date dateTime time float long
Type
string Yes
integer Yes Yes Yes Yes
double Yes
date Yes Yes
dateTime Yes
time Yes Yes
float Yes Yes
For example, the following elements in a Content Server document are declared as type integer and
dateTime respectively:
<owner_permit dmfttype="dmint">7</owner_permit>
<r_creation_date dmfttype="dmdate">2010-09-27T22:54:48</r_creation_date>
However, without an explicit subpath definition in indexserverconfig.xml, both element values are
indexed as string values in multi-path indexes.
Running the following queries returns different results depending on how you set the
xdb.lucene.strictIndexTypeCheck value:
/dmftdoc[dmftmetadata//owner_permit = xs:integer("7")]
/dmftdoc[dmftmetadata//r_creation_date = xs:dateTime("2010-09-27T22:54:48")]
• If xdb.lucene.strictIndexTypeCheck = True, the elements are not returned because the query data
types integer and dateTime do not match the index data type string.
• If xdb.lucene.strictIndexTypeCheck = False, the elements are returned because the query data types
integer and dateTime are considered compatible with the index data type string.
cause incorrect results when a negative operator is applied. Without term cutoff, performance issues
increase. The only exception is when the following conditions are met:
– xdb.lucene.prefixQueryCutoff=false. See Limiting wildcards and common terms in search
results, page 237.
– The query can be converted to a simple prefix query and is not in a phrase query.
• By default, a search that contains only wildcards causes an exception. In most cases, such a search is
meaningless and performs very poorly. If you need to run this type of search, make the
following changes:
1. In indexserverconfig.xml, add the following search-config property (default: true):
<property value="false"
name="query-discard-unselective-wildcard-query"/>
2. In xdb.properties, add the option xdb.lucene.noTokenFtcontainsToMatchAll=true (default: false).
• xdb.lucene.prefixQueryCutoff: Specifies whether to enable cutoff for prefix queries (for example,
'abc.*'). Disabling cutoff may result in slower performance if there is a very large number of
matching terms. Default: true.
To get cutoff messages through DFC, enable them with the following API:
IDfQueryBuilder.setCutoffMessageRetrieved(true)
Retrieve cutoff messages by calling the following DFC API. Each docbase source may return a cutoff
message. The return result is a map from docbase source names to cutoff messages.
IDfQueryProcessor.getCutoffMessages()
This example shows how to get cutoff messages from DFC:
IDfQueryBuilder queryBuilder = queryMgr.newQueryBuilder("dm_document");
queryBuilder.addSelectedSource(docbase);
queryBuilder.addResultAttribute("r_object_id");
queryBuilder.setCutoffMessageRetrieved(true);
2. Use the object ID to get the dm_ftengine_config parameters and values. In the following example,
the value of r_object_id returned in step 1 is used to get the parameters:
?,c,select param_name, param_value from dm_ftengine_config
where r_object_id='080a0d6880000d0d'
3. If the wildcards configuration parameters are not returned, configure them. Append a param_name
and param_value element and set its value. For example:
retrieve,c,dm_ftengine_config
append,c,l,param_name
ft_wildcards_mode
append,c,l,param_value
explicit
save,c,l
4. To change an existing parameter, locate the position of the parameter's param_name value, then set
the param_value at the same position. Use dump to list the attribute values and find the index i of
ft_wildcards_mode, then use that index in param_value[i]:
retrieve,c,dm_ftengine_config
dump,c,l
set,c,l,param_value[i]
implicit
save,c,l
For metadata searches, set the wildcard mode of the contains, starts with, ends with, and equals
operators separately with the following parameters:
• metadata_contains_wildcards_mode: controls metadata search for the contains operator.
Valid values: none | explicit (default) | implicit | trailing_implicit.
• metadata_startswith_wildcards_mode: controls metadata search for the starts with
operator. Valid values: none | explicit (default) | implicit. With implicit, the wildcard is added at
the end of the search term.
• metadata_endswith_wildcards_mode: controls metadata search for the ends with operator.
Valid values: none | explicit (default) | implicit. With implicit, the wildcard is added at the beginning
of the search term.
• metadata_equals_wildcards_mode: controls metadata search for the equals operator.
Valid values: none (default) | explicit.
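The placement rule for implicit mode can be expressed as follows. This is a hypothetical sketch that covers only the starts with and ends with parameters, whose implicit behavior is stated above; it does not model how none and explicit handle wildcards typed by the user, and the class name is an assumption.

```java
// Hypothetical sketch of where implicit mode adds the wildcard; not xPlore code.
class MetadataWildcards {
    enum Operator { STARTS_WITH, ENDS_WITH }

    // mode: value of metadata_startswith_wildcards_mode or metadata_endswith_wildcards_mode.
    static String applyMode(Operator op, String mode, String term) {
        if (!"implicit".equals(mode)) {
            return term; // none or explicit: no wildcard is added to the term
        }
        // implicit: starts with appends the wildcard, ends with prepends it.
        return op == Operator.STARTS_WITH ? term + "*" : "*" + term;
    }
}
```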
, page 240
• Query plugin configuration (dm_ftengine_config), page 240
• Making types and attributes searchable, page 241
• Running folder descend queries, page 242
• DQL, DFC, and DFS queries, page 243
• Routing a query to a specific collection, page 276
• Tracing Documentum queries, page 246
where r_object_id=dm_ftengine_config_object_id
1. Add a missing parameter using iAPI append like the following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
acl_check_db
append,c,l,param_value
T
save,c,l
Leading or trailing spaces in parameter names or values can cause unexpected errors. The
following DQL statement locates such spaces in the dm_ftengine_config object:
select param_name, param_value from dm_ftengine_config
where any param_name like '% ' or any param_name like ' %'
or any param_value like '% ' or any param_value like ' %' enable(row_based);
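When the parameters are read into a client program instead, the same check can be applied to the repeating param_name and param_value values. This hypothetical Java sketch assumes the values have already been fetched into two parallel lists; it reports the positions that need cleaning.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch mirroring the DQL check above; not part of DFC.
class StraySpaceCheck {
    // Same test as like ' %' (leading space) or like '% ' (trailing space).
    static boolean hasStraySpace(String s) {
        return s.startsWith(" ") || s.endsWith(" ");
    }

    // names and values hold the repeating param_name and param_value attributes
    // in parallel; returns the positions whose name or value needs cleaning.
    static List<Integer> badPositions(List<String> names, List<String> values) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            String value = i < values.size() ? values.get(i) : "";
            if (hasStraySpace(names.get(i)) || hasStraySpace(value)) {
                bad.add(i);
            }
        }
        return bad;
    }
}
```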
indexable format, you can ensure indexing by creating a rendition in an indexable format. For more
details, see Documentum Search Development Guide.
Allowing search
Set the is_searchable attribute on an object type to allow or prevent searches for objects of that type
and its subtypes (default: true). Valid values: 0 (false) and 1 (true). The client application must read
this attribute. If is_searchable is false for a type or attribute, Webtop does not display that type or
attribute in the search UI.
Lightweight sysobjects (LWSOs)
Lightweight sysobjects group the attribute values that are identical for a large set of objects. This
redundant information is shared among the LWSOs from the shared parent object. For LWSOs like
dm_message_archive, the client application must configure searchable attributes. Use CREATE
TYPE and ALTER TYPE FULLTEXT SUPPORT switches to specify searchable attributes. For more
information on this configuration, see Documentum Content Server DQL Reference. For information
on supporting extended search with LWSOs, see Documentum Search Development Guide.
Aspects
Properties associated with aspects are not indexed by default. If you wish to index them, use an
ALTER ASPECT statement to identify the aspects you want indexed. For more information on this
statement, see Documentum Content Server DQL Reference Manual.
Set the folder_cache_limit in the dm_ftengine_config object to the expected maximum number of
folders in the query (default = 2000). If the folder descend condition evaluates to fewer folders than the
folder_cache_limit value, the folder IDs are pushed into the index probe, which makes the query much
faster. If the number of folders exceeds the folder_cache_limit value, the folder constraint is evaluated
separately for each result.
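The folder_cache_limit decision can be summarized as follows. This is a hypothetical model, not xPlore code; the text does not say what happens when the folder count equals the limit, so the sketch treats that case like an exceeded limit.

```java
// Hypothetical model of the folder_cache_limit decision described above.
class FolderDescendPlan {
    enum Plan { INDEX_PROBE, PER_RESULT_FILTER }

    static Plan choose(int folderCount, int folderCacheLimit) {
        // Fewer folders than the limit: folder IDs go into the index probe (fast path).
        if (folderCount < folderCacheLimit) {
            return Plan.INDEX_PROBE;
        }
        // Otherwise the folder constraint is checked for each result (assumed for equality too).
        return Plan.PER_RESULT_FILTER;
    }
}
```

If a folder descend query routinely spans, say, 5000 folders, raising folder_cache_limit above that number keeps it on the index probe path.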
You can apply the FilterFoldersFunction performance improvements by upgrading to the latest xPlore
version and performing the following steps:
1. In indexserverconfig.xml, create the following subpath definition:
retrieve,c,dm_docbase_config
append,c,l,r_module_name
dm_ft_order_by_enabled
append,c,l,r_module_mode
save,c,l
reinit,c
DQL Processing
The DFC and DFS search services by default generate XQuery expressions, not DQL, for xPlore.
DQL hints in a hints file are not applied. You can turn off XQuery generation in dfc.properties so
that DQL is generated and hints are applied. Do not turn off XQuery generation if you want xPlore
capabilities like facets.
If query constraints conform to FTDQL, the query is evaluated in the full-text index. If all or part of
the query does not conform to FTDQL, only the SEARCH DOCUMENT CONTAINS (SDC) portion is evaluated
in the full-text index; all metadata constraints are evaluated in the Content Server database, and the
results are combined.
The following configurations turn off XQuery and render a query in DQL:
• dfc.search.xquery.generation.enable = false in dfc.properties
• ftsearch_security_mode is 0. See Changing search results security, page 55.
• acl_check_db is true. See Changing search results security, page 55.
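The three settings that turn off XQuery generation combine as a simple OR: any one of them is enough. This hypothetical sketch summarizes the list above; the parameter names mirror the settings, but the class and method are not part of DFC.

```java
// Hypothetical summary of the settings that switch DFC/DFS from XQuery to DQL.
class QueryLanguageChooser {
    static boolean generatesDql(boolean xqueryGenerationEnabled, // dfc.search.xquery.generation.enable
                                int ftsearchSecurityMode,        // ftsearch_security_mode
                                boolean aclCheckDb) {            // acl_check_db
        return !xqueryGenerationEnabled || ftsearchSecurityMode == 0 || aclCheckDb;
    }
}
```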
dfc.search.xquery.generation.enable=false
Unsupported DQL
xPlore does not support the DQL SEARCH TOPIC clause or pass-through DQL.
Overview
Query subscriptions is a feature that allows a user to:
• Run a particular saved search (full-text or metadata-only) automatically at specified
intervals (once an hour, day, week, or month) and return any new results.
The results can be discarded or saved. If the results are saved, they can be merged with or replace
the previous results.
• Unsubscribe from a query.
• Retrieve a list of their query subscriptions.
• Be notified of the results through a dmi_queue_item in the subscriber's Inbox and, optionally, by
email.
• Execute a workflow, for example, a business process defined in xCP.
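The three ways of handling new results can be sketched as follows. The enum names are hypothetical, and the merge is assumed to drop duplicate object IDs; the subscribe method of IQuerySubscriptionSBO exposes the same choices through its integer resultStrategy argument.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical model of how new subscription results affect the saved results.
class SubscriptionResults {
    enum Strategy { REPLACE, MERGE, DISCARD }

    // Returns the saved results after a run that found newResults.
    static List<String> apply(Strategy strategy, List<String> saved, List<String> newResults) {
        switch (strategy) {
            case REPLACE:
                return new ArrayList<>(newResults);
            case MERGE: {
                // Keep saved results, append new ones, drop duplicate IDs (assumed).
                LinkedHashSet<String> merged = new LinkedHashSet<>(saved);
                merged.addAll(newResults);
                return new ArrayList<>(merged);
            }
            case DISCARD:
            default:
                return new ArrayList<>(saved); // new results are not saved
        }
    }
}
```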
Query subscriptions run in Content Server 6.7 SP1 or later with DFC 6.7 SP1 or later. Support for
query subscriptions is installed with the Content Server. A DFC client like Webtop or CenterStage
must be customized using DFC 6.7 SP1 or later to present query subscriptions to the user.
Because automatically running queries at specified intervals can negatively affect xPlore performance,
tune and monitor query subscription performance.
whereas SubscriptionN executes once an hour. There is only one subscriber per subscription; that is,
the subscriber of Subscription1 is user1 and the subscriber for SubscriptionN is user2.
b. Specify the path to the file DarInstall.xml in a temporary working directory (excluding the file
name) as the value of BUILDFILE. For example:
set BUILDFILE="C:\DarInstall\temp"
c. Specify a workspace directory for the generated Composer files. For example:
set WORKSPACE="C:\DarInstall\work"
4. Launch DarInstall.bat (Windows) or DarInstall.sh (Linux) to install the query subscription SBO.
On Windows 2008, run the script as administrator.
Subscription reports
When you support query subscriptions, monitor the usage and query characteristics of the users with
subscription reports. If there are many frequent or poorly performing subscriptions, increase capacity.
Subscription logging
Subscribed queries are logged in dsearch.log with the event name QUERY_AUTO. The following
information is logged:
<event name="QUERY_AUTO" component="search" timestamp="2011-08-23T14:45:09-0700">
…..
<application_context>
<query_type>QUERY_AUTO</query_type>
<app_name>QBS</app_name>
<app_data>
<attr name="subscriptionID" value="0800020080009561"/>
<attr name="frequency" value="DAILY"/>
<attr name="range" value="1015"/>
<attr name="jobintervalinseconds" value="86400"/>
</app_data>
</application_context>
</event>
Key:
• subscriptionID is set by the QBS application
• frequency is the subscription frequency as set by the client. Values: HOURLY, DAILY, WEEKLY,
MONTHLY.
• range reports the time elapsed since the last query execution. For example, if the job runs hourly but the
frequency was set to 20 minutes, the range is between 0 and 40 minutes (2400 seconds). Not
recorded if the frequency is greater than one day.
• jobintervalinseconds is how often the subscription is set to run, in seconds. For example, a value of
86400 indicates a setting of one day in the client. Not recorded if the frequency is greater than
one day.
dm_ftquery_subscription
Represents a subscribed query.
Description
Supertype: SysObject
Subtypes: None
Internal name: dm_ftquery_subscription
Object type tag: 08
A dm_ftquery_subscription object represents subscription-specific information but not the saved query
itself, which is contained in a dm_smart_list object.
Properties
dm_qbs_relation object
Table 25 dm_qbs_relation properties
Each of these jobs executes all query subscriptions that are specified to run at the corresponding
interval:
Job Name Description
dm_FTQBS_HOURLY Executes all query subscriptions that are to be executed once an hour.
dm_FTQBS_DAILY Executes all query subscriptions that are to be executed once a day.
dm_FTQBS_WEEKLY Executes all query subscriptions that are to be executed once a week.
dm_FTQBS_MONTHLY Executes all query subscriptions that are to be executed once a month.
Each job executes its query subscriptions in ascending order of each subscription's last_exec_date
property value. A query subscription that is not executed during a job run is executed the next time
the job runs.
Note: A job is stopped gracefully just before it times out.
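The ordering and carry-over rule can be sketched as follows. This is a hypothetical model: a fixed budget of subscriptions per run stands in for the job timeout, and the class QbsJobRun and its fields are assumptions, not the job method's implementation.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one dm_FTQBS_* run; not the actual job method.
class QbsJobRun {
    static class Subscription {
        final String id;
        Instant lastExecDate; // stands in for last_exec_date

        Subscription(String id, Instant lastExecDate) {
            this.id = id;
            this.lastExecDate = lastExecDate;
        }
    }

    // Runs subscriptions oldest last_exec_date first until the budget is used up.
    // Skipped subscriptions keep their old date, so they sort first on the next run.
    static List<String> run(List<Subscription> due, int budget, Instant now) {
        List<Subscription> ordered = new ArrayList<>(due);
        ordered.sort(Comparator.comparing((Subscription s) -> s.lastExecDate));
        List<String> executed = new ArrayList<>();
        for (Subscription s : ordered) {
            if (executed.size() >= budget) {
                break; // models stopping gracefully before the timeout
            }
            s.lastExecDate = now;
            executed.add(s.id);
        }
        return executed;
    }
}
```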
Method arguments
Argument Description
Reports
Job reports are stored in:
DOCUMENTUM_HOME/dba/log/sessionID/sysadmin
dm_FTQBS_HOURLY FTQBS_HOURLYDoc.txt
dm_FTQBS_DAILY FTQBS_DAILYDoc.txt
dm_FTQBS_WEEKLY FTQBS_WEEKLYDoc.txt
dm_FTQBS_MONTHLY FTQBS_MONTHLYDoc.txt
Custom jobs
The job method -zone_value parameter partitions the execution of query subscriptions
among multiple custom jobs that run on the same interval. A custom job executes every
dm_ftquery_subscription that has the same zone_value and frequency attribute values as the custom
job. You must specify a -zone_value value for every custom job that runs on the same interval, and that
value must be unique among all those custom jobs. If a job does not specify a -zone_value value,
it executes all subscriptions on the same interval regardless of each subscription's zone_value value.
Note: Do not give any custom job the same interval as a pre-installed job. The pre-installed jobs do not
specify -zone_value, so they execute all subscriptions on their interval regardless of each
subscription's zone_value value.
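The matching rule for -zone_value can be summarized as a predicate. This sketch is a hypothetical model (the ZoneMatcher class and the use of null for "no -zone_value argument" are illustrative), not the code that the job method runs.

```java
final class ZoneMatcher {
    /**
     * Returns true if a job with the given frequency and optional zone value
     * executes a subscription with the given frequency and zone_value.
     * jobZone is null when the job has no -zone_value argument.
     */
    static boolean jobRunsSubscription(String jobFrequency, Integer jobZone,
                                       String subFrequency, int subZone) {
        if (!jobFrequency.equals(subFrequency)) {
            return false; // a job only runs subscriptions on its own interval
        }
        // Without -zone_value, the job runs every subscription on its interval.
        return jobZone == null || jobZone == subZone;
    }
}
```

A job whose jobZone is null returns true for every subscription on its interval, which is why a pre-installed job on the same interval as a custom job would also execute the custom job's subscriptions.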
Requirements
• Activities:
IQuerySubscriptionSBO
Provides the functionality to subscribe to, unsubscribe from, and list query subscriptions.
Interface name
com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionSBO
Imports
import com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionSBO;
import com.documentum.server.impl.fulltext.qbs.QuerySubscriptionInfo;
import com.documentum.server.impl.fulltext.qbs.impl.QuerySubscriptionException;
DAR
QBS.dar
Methods
• public IDfId subscribe (String docbaseName,IDfId smartListID,
String subscriber, String frequency, IDfId workFlowID, int
zoneValue, IDfTime lastExecDate, int resultStrategy) throws
DfException,QuerySubscriptionException
Validates the dm_smart_list object ID and the subscriber name in the specified repository, and
validates the frequency value against the -frequency job method argument of all query subscription
jobs. Creates the dm_ftquery_subscription and dm_relation objects and returns the object ID of the
dm_ftquery_subscription object.
Set workFlowID (the workflow template ID) to null if it is not applicable.
For zoneValue, specify -1 if not applicable.
For lastExecDate, specify DfTime.DF_NULLDATE if not applicable.
For resultStrategy, specify an integer that indicates whether the existing results saved in the
dm_smart_list are replaced with the new results (0, the default), merged with the new results (1), or
kept while the new results are discarded (2). Specify -1 if not applicable.
• public IDfId subscribe (String docbaseName,IDfId smartListID, String
subscriber, String frequency, IDfId workFlowID, int zoneValue,
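The resultStrategy codes described for subscribe() can be modeled as an enum. This is an illustrative sketch only: the ResultStrategy enum is hypothetical, MERGE is assumed to be an order-preserving union without duplicates, and treating -1 ("not applicable") as the default REPLACE behavior is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model of the resultStrategy codes accepted by subscribe(). */
enum ResultStrategy {
    REPLACE(0), MERGE(1), DISCARD_NEW(2);

    final int code;

    ResultStrategy(int code) {
        this.code = code;
    }

    /** Maps a code to a strategy; -1 is assumed to mean the default, REPLACE. */
    static ResultStrategy fromCode(int code) {
        for (ResultStrategy s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        if (code == -1) {
            return REPLACE; // assumption: "not applicable" falls back to the default
        }
        throw new IllegalArgumentException("Unknown resultStrategy: " + code);
    }

    /** Combines saved and new result IDs according to this strategy. */
    List<String> apply(List<String> saved, List<String> fresh) {
        switch (this) {
            case REPLACE:
                return new ArrayList<>(fresh);
            case DISCARD_NEW:
                return new ArrayList<>(saved);
            default:
                // MERGE: assumed order-preserving union without duplicates
                Set<String> union = new LinkedHashSet<>(saved);
                union.addAll(fresh);
                return new ArrayList<>(union);
        }
    }
}
```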
IQuerySubscriptionTBO
Interface name
com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionTBO
Imports
import com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionTBO;
import com.documentum.server.impl.fulltext.qbs.results.DfResultsSetSAXDeserializer;
DAR
QBS.DAR
Methods
Notes
Extending this TBO is not supported.
QuerySubscriptionAdminTool
Class name
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
Usage
Use com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool to:
• Subscribe to a query for a specific user (-subscribe flag)
• Unsubscribe from a query for a specific user (-unsubscribe flag)
• List all subscribed queries for a specific user (-listsubscription flag)
Note: All parameter values are passed as string values and must be enclosed in double quotes if
spaces are specified in the value.
To display the syntax, specify the -h flag. For example:
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath "C:\Documentum\config;
.\lib\qbs.jar;.\lib\qbsAdmin.jar;.\lib\dfc.jar;.\lib\log4j.jar;
.\lib\commons-lang-2.4.jar;.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool -h
Note: In xplore_home/setup/qbs/tool, qbsadmin.bat and qbsadmin.sh demonstrate how to call this
class. In qbsadmin.bat and qbsadmin.sh, modify the path to the dfc.properties file. You can also change
the -h flag to one of the other flags.
Required JARs
qbs.jar
qbsAdmin.jar
dfc.jar
log4j.jar
commons-lang-2.4.jar
aspectjrt.jar
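The -classpath value in the examples in this section lists the directory that contains dfc.properties first, followed by the required JARs. If you launch the tool from your own wrapper, a sketch like the following can assemble that value. The QbsAdminClasspath class, the lib directory location, and passing the path separator as a parameter are assumptions for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class QbsAdminClasspath {
    // The required JARs listed above
    static final String[] JARS = {
        "qbs.jar", "qbsAdmin.jar", "dfc.jar", "log4j.jar",
        "commons-lang-2.4.jar", "aspectjrt.jar"
    };

    /**
     * Builds a -classpath value: the directory that holds dfc.properties first,
     * then each required JAR in libDir, joined with pathSeparator
     * (";" on Windows, ":" on Linux).
     */
    static String build(String dfcConfigDir, String libDir, String pathSeparator) {
        List<String> entries = new ArrayList<>();
        entries.add(dfcConfigDir);
        for (String jar : JARS) {
            entries.add(libDir + File.separator + jar);
        }
        return String.join(pathSeparator, entries);
    }
}
```

Passing File.pathSeparator uses the separator of the platform on which the wrapper runs.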
-subscribe example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java"
-classpath "C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;
.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;
.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
-subscribe output
subscribed 080000f28002ef2c for user user1 succeeded
with subscription id 080000f28002f115
-unsubscribe example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath
"C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;
.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;
.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
-unsubscribe D65SP2M6DSS user1 password password1 080000f28002ef2c
-unsubscribe output
User user1 has no subscriptions on dm_smart_list object
(080000f28002ef2c)
-listsubscription example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath
"C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;
.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;
.\lib\aspectjrt.jar"
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool
-listsubscription D65SP2M6DSS user1 password password1
-listsubscription output
Subscriptions for user1 are:
smartList: 080000f28002ef2c frequency: DAILY workFlowID: 0000000000000000
smartList: 080000f28002ef2f frequency: 5 MINUTES workFlowID: 0000000000000000
Troubleshooting search
When you set the search service log level to WARN, queries are logged. Auditing queries, page 264
describes how to view or customize reports on queries.
Note: If the timeout(ms) option of the test search is not set, the test search uses the greater of
10 minutes and the query-default-timeout setting configured in indexserverconfig.xml.
Auditing queries
Auditing is enabled by default. Audit records are purged on a configurable schedule (default: 30 days).
To enable or disable query auditing, open System Overview in the xPlore administrator left pane.
Click Global Configuration and choose the Auditing tab. Click search to enable query auditing.
For information on configuring the audit record, see Configuring the audit record, page 41.
Audit records are saved in an xDB collection named AuditDB. You can view or create reports based on
the audit record. Query auditing provides the following information:
The Content Server query plugin properties of the dm_ftengine_config object are set during xPlore
configuration. If you have changed one of the properties, like the primary xPlore host, the plugin can
fail. Verify the plugin properties, especially the qrserverhost, with the following DQL:
1> select param_name, param_value from dm_ftengine_config
2> go
Look for temp table creation and inserts like the following:
Thu Feb 09 15:50:12 2012 790000: 6820[7756] 0100019f80023909
process_ftquery_to_temp --- will populate temp table in batch size 20000
Thu Feb 09 15:50:12 2012 790000: 6820[7756] 0100019f80023909
build_fulltext_temp --- begin: create the fulltext temporary table.
Thu Feb 09 15:50:13 2012 227000: 6820[7756] 0100019f80023909
BuildTempTbl --- temporary table dmft80023909004 was created successfully.
Thu Feb 09 15:50:13 2012 430000: 6820[7756] 0100019f80023909
Inserting row at index 0 into the table
dfc.search.xquery.option.parallel_execution.enable = true
• Use the ENABLE(fds_collection collectionname) hint or the IN COLLECTION clause in DQL. See
Routing a query to a specific collection, page 276.
full-text-search="false" value-comparison="true"/>
Note: If the metadata is used to compute facets, set returning-contents to true.
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name
3. Set the save-tokens option to true for the target collection and restart xPlore, then reindex the
document. Check the tokens in the Tokens library to see whether the search term was properly
indexed.
query-locale=en,...>
Search results differ when searching with different locales, especially for compound terms that have
associated components. For example, a search for Stollwerk returned many more results with the
German locale than with the English locale. Stollwerk is lemmatized as stollwerk in English but as stoll
and werk in German. You can turn off lemmatization. See Configuring indexing lemmatization, page 113.
Debugging queries
You can debug queries for the following problems:
• Query does not return expected results.
• Query is very slow (reported in Top N Slowest Queries report).
No results are returned, because the searched value is a Documentum integer attribute. When you
execute the query with the get query debug option, you see that the value is treated as a string:
query:1:20:for expression .../child::dmftdoc[. contains text 9001001]
You must stop the xPlore instances and add a subpath for the non-string attribute. In this example, the
following subpath was added to the dmftdoc category. Note that partial paths are supported, in case the
metadata value is found in more than one path:
<sub-path description="award number" returning-contents="true"
value-comparison="true" type="integer" path="dmftmetadata//award_no"/>
For the non-string value to be found, reindex the domain (or the specific collection, if known).
After reindexing, Test Search returns a different result:
query:1:99:Using query plan:
query:1:99:index(dmftdoc)
query:1:99:Looking up "(false, true, 030012a7800001dc, 1001)" in index "dmftdoc"
query:1:290:Found an index to support all order specs. No sort required.
Open the Data Management tree and choose the library that contains your index. For Documentum
environments, the library name is the same as the repository name. Click Execute XQuery and input
the query from your text editor. Choose the option Get query debug. Click Execute XQuery.
IDfXQuery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN, true)
retrieve:
IDfXQuery.getExecutionPlan(session)
To get an execution plan, do the following:
// xquery is an IDfXQuery, statement holds the XQuery text, and session is
// an open DFC session; all are assumed to be set up earlier.
xquery.setXQueryString(statement);
IDfXQueryTargets target = new DfFullTextXQueryTargets();
xquery.setIntegerOption(IDfXQuery.FtQueryOptions.BATCH_SIZE, 10);
// Save the execution plan so that it can be retrieved after execution
xquery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN, true);
xquery.setBooleanOption(IDfXQuery.FtQueryOptions.CACHING, true);
// Timeout in milliseconds (600000 ms = 10 minutes)
xquery.setIntegerOption(IDfXQuery.FtQueryOptions.TIMEOUT, 600000);
xquery.setBooleanOption(IDfXQuery.FtQueryOptions.RETURN_TEXT, true);
xquery.setBooleanOption(IDfXQuery.FtQueryOptions.RETURN_SUMMARY, true);
xquery.setStringOption(IDfXQuery.FtQueryOptions.APPLICATION_NAME, "DfXQuery");
xquery.execute(session, target);
// Retrieve the saved execution plan
try
{
    String plan = xquery.getExecutionPlan(session);
    System.out.println("xquery plan = " + plan);
}
catch (DfException e)
{
    System.out.println(e.getMessage());
}
• Using iAPI
save:
apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine
retrieve: The query execution plan is written to dsearch.log, which is located in the logs subdirectory
of the WildFly deployment directory.
• Using xPlore search API
save:
IDfXQuery.setSaveExecutionPlan(true)
retrieve:
IFtSearchSession.fetchExecutionPlan(requestId)
Search APIs and customization
r_policy_id')]
return <attr name='{local-name($attr)}' type='{$attr/@dmfttype}'>{
string($attr)}</attr>}{xhive:highlight(($dm_doc/dmftcontents/
dmftcontent/dmftcontentref,$dm_doc/dmftcustom))}
<attr name='score' type='dmdouble'>{string(dsearch:get-score($dm_doc))}
</attr></r>
)
Note: The XQuery portion of the query is almost identical to the query retrieved through xPlore
administrator. These queries were issued separately, which accounts for differences.
To debug the Webtop query, edit the query from View Source and enter it in the Execute XQuery
dialog box in xPlore administrator.
• Route all queries that meet specific criteria using a DQL hint in dfcdqlhints.xml:
enable(fds_query_collection_collectionname), where collectionname is the collection name. If
you use a DQL hint, you do not need to change the application or the DFC query builder. You must
turn off XQuery generation. See Turning off XQuery generation to support DQL, page 277.
For more information on the hints file, refer to Documentum Search Development Guide. For
example:
?,c,select r_object_id from dm_document search document contains 'benchmark'
enable(fds_query_collection_custom)
• Implement the DFC query builder API addPartitionScope.
• Implement the DFC IDfXQuery API collection()
• DFS PartitionScope object in a StructuredQuery implementation
Use DQL
You can route a DQL query to a specific collection in the following ways. By default, DFC does
not generate DQL, but you can turn off XQuery generation. (See Turning off XQuery generation
to support DQL, page 277.)
• Route an individual query using the DQL IN COLLECTION clause to specify the target of a SELECT
statement. Use one of the following two syntaxes:
– Collection names are separated by underscores:
select attr from type SDC where … enable(fds_query_collection_collection1_collection2__...)
• Route all queries that meet specific criteria using a DQL hint in dfcdqlhints.xml:
enable(fds_query_collection_collectionname), where collectionname is the collection name.
For more information on the hints file, refer to Documentum Search Development Guide. The
following hints route queries for a specific type to a known target collection, whose name is appended
to FDS_QUERY_COLLECTION_:
<RuleSet>
<Rule>
<Condition>
<From condition="any">
<Type>my_type</Type>
</From>
</Condition>
<DQLHint>ENABLE(FDS_QUERY_COLLECTION_MYTYPECOLLECTION)</DQLHint>
</Rule>
</RuleSet>
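When you build DQL in application code, the collection hint can also be appended programmatically. The following sketch is hypothetical (the CollectionHint class is not part of DFC); it joins collection names with underscores as in the syntax above and does not handle collection names that themselves contain underscores.

```java
import java.util.List;

final class CollectionHint {
    /**
     * Appends ENABLE(FDS_QUERY_COLLECTION_<names>) to a DQL statement,
     * joining the collection names with underscores.
     */
    static String withCollections(String dql, List<String> collections) {
        if (collections.isEmpty()) {
            throw new IllegalArgumentException("at least one collection is required");
        }
        return dql + " ENABLE(FDS_QUERY_COLLECTION_" + String.join("_", collections) + ")";
    }
}
```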
Debugging queries
You can debug queries by clicking a collection in xPlore administrator. Choose Execute XQuery for
the target collection or the top-level collection for the repository.
CAUTION: Do not use xhadmin to rebuild an index or change files that xPlore uses. If
you remove segments, your backups cannot be restored. This tool is not aware of xPlore
configuration settings in indexserverconfig.xml.
A structured query defines a query using an object‑oriented model. The query is constrained by a set
of criteria contained in an ExpressionSet object. An ordered list of RepositoryScope objects defines
the scope of the query (the sources against which it is run). PartitionScope objects target the query to
specific collections. ExpressionScope
structuredQuery.addRepository(m_docbase);
structuredQuery.setObjectType("dm_document");
// Set expression
ExpressionSet expressionSet2 = new ExpressionSet();
expressionSet2.addExpression(new PropertyExpression(
"object_name", Condition.CONTAINS, new SimpleValue("test")));
structuredQuery.setRootExpressionSet(expressionSet2);
return structuredQuery;
}
Options:
• Debugging:
– Get and set client application name for logging
– Get and set save execution plan to see how the query was executed
• Query execution:
– Get and set result batch size. For a single batch, set to 0.
– Get and set target collection for query
– Get and set query text locale
– Get and set parallel execution of queries
– Get and set timeout in ms
• Security:
– Get and set security filter fully qualified class name
– Get and set security options used by the security filter
– Get and set native security (false sets security evaluation in the Content Server)
• Results:
– Get and set results streaming
– Get and set results returned as XML nodes
– Get and set spooling to a file
– Get and set synchronization (wait for results)
– Get and set caching
• Summaries:
– Get and set return summary
– Get and set return of text for summary
– Get and set summary calculation
– Get dynamic summary maximum threshold
– Get and set length of summary fragments
– Get summary security mode
DFC
Use the IDfQueryProcessor method setApplicationContext(DfApplicationContext context).
DfApplicationContext can set the following context:
• setApplicationName(String name)
• setQueryType(String type)
• setApplicationAttributes(Map<String,String> attributesMap). Set user-defined attributes in a Map
object.
DFC example
The following example sets the query subscription application context and application name. This
information is used to report subscription queries.
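Instantiate a query processor from the search service, set the application name and query type, and add
your custom attributes to the application context object:
// m_searchService is the DFC search service and queryBuilder holds the query;
// both are assumed to be initialized earlier.
IDfQueryProcessor processor = m_searchService.newQueryProcessor(
queryBuilder, true);
// anApplicationContext is a DfApplicationContext on which setApplicationName()
// and setQueryType() have already been called (not shown here).
Map<String,String> aSetOfApplicationAttributes =
new HashMap<String,String>();
aSetOfApplicationAttributes.put("frequency","300");
aSetOfApplicationAttributes.put("range","320");
anApplicationContext.setApplicationAttributes(
aSetOfApplicationAttributes);
processor.setApplicationContext(anApplicationContext);
// Run the search and wait for the results (timeout: 60000 ms)
IDfResultsSet results = processor.blockingSearch(60000);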
The context is serialized to the audit record as follows:
<event name="AUTO_QUERY" component="search" timestamp="2011-07-26T14:00:18-0700">
<QUERY_ID>PrimaryDsearch$706f93fa-e382-499c-b41a-239ae800da96</QUERY_ID>
<QUERY>
<![CDATA[for $i in collection('/yttestenv/dsearch/Data')/dmftdoc[(
dmftmetadata//a_is_hidden = "false") and (dmftversions/iscurrent = "true")
and (. ftcontains "xplore" with stemming using stop words default)]
return string(<R>{$i//r_object_id}</R>)]]></QUERY>
<USER_NAME>unknown</USER_NAME>
<IS_SUPER_USER/>
<application_context>
<app_name>QBS</app_name>
<app_data>
<attr name="subscriptionid" value="080f444580029954"/>
<attr name="frequency" value="300"/>
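Each entry in the map passed to setApplicationAttributes() appears in the audit record as an attr element under app_data, as with the frequency attribute above. The following sketch only approximates that rendering, for example to anticipate what a report on the audit record will see; it is not the xPlore serializer, and the AppContextXmlSketch class is hypothetical.

```java
import java.util.Map;

final class AppContextXmlSketch {
    /** Escapes characters that are not allowed in a double-quoted XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Renders application attributes as <attr name="..." value="..."/> lines. */
    static String toAttrElements(Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append("<attr name=\"").append(escape(e.getKey()))
              .append("\" value=\"").append(escape(e.getValue())).append("\"/>\n");
        }
        return sb.toString();
    }
}
```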
IDfXQuery
Use the API FtQueryOptions in the package com.emc.documentum.core.fulltext.common.search.
Call setApplicationName(String applicationName) to log the name of the search client application,
for example, webtop.
Call setQueryType(FtQueryType queryType) with the FtQueryType enum.
dfc.search.xquery.option.parallel_execution.enable = true
You can also use one of the following APIs to execute a query across several collections in parallel:
You can access one thesaurus for full-text and one thesaurus for metadata. For example, you may have a
metadata thesaurus that lists various forms of company names. The following example uses the default
thesaurus to expand the full-text lookup and a metadata thesaurus to expand the metadata lookup:
IDfExpressionSet rootSet = queryBuilder.getRootExpressionSet();
String line;
while ((line = br.readLine()) != null)
{
    if (line.length() > 0)
    {
        // Expected line format: key=[term1,term2,...]
        String[] mapping = line.split("=", 2);
        // Skip lines that have no "=" separator
        if (mapping.length < 2)
            continue;
        String key = mapping[0];
        String related = mapping[1];
        // do some format checking
        if (key.length() < 1)
            continue;
        // related should have at least "[?]", where ? is any character
        if (related.length() < 3)
            continue;
        if (related.charAt(0) != '[' || related.charAt(related.length() - 1) != ']')
            continue;
        // Strip the brackets and split the comma-separated related terms
        related = related.substring(1, related.length() - 1);
        String[] relatedTerms = related.split(",");
        Collection<String> terms = new ArrayList<String>(relatedTerms.length);
        for (String term : relatedTerms)
        {
            terms.add(term);
        }
        s_thesaurus.put(key, terms);
    }
}
}
catch (FileNotFoundException e)
{
    System.out.println("FileNotFoundException while loading FAST Thesaurus: "
        + e.getMessage());
}
catch (IOException e)
{
    System.out.println("IOException while loading FAST Thesaurus: "
        + e.getMessage());
}
}
}
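The format checks in the loader above can be exercised without DFC or a thesaurus file. The following self-contained sketch applies the same rules (lines of the form key=[term1,term2,...], malformed lines skipped) to any Reader and returns the map instead of filling a static field. The ThesaurusFileParser class is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class ThesaurusFileParser {
    /** Parses lines of the form key=[term1,term2,...]; malformed lines are skipped. */
    static Map<String, Collection<String>> parse(Reader in) throws IOException {
        Map<String, Collection<String>> thesaurus = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] mapping = line.split("=", 2);
                if (mapping.length < 2 || mapping[0].isEmpty()) {
                    continue; // no "=" separator or empty key
                }
                String related = mapping[1];
                if (related.length() < 3 || related.charAt(0) != '['
                        || related.charAt(related.length() - 1) != ']') {
                    continue; // needs at least "[x]"
                }
                Collection<String> terms = new ArrayList<>();
                for (String term : related.substring(1, related.length() - 1).split(",")) {
                    terms.add(term);
                }
                thesaurus.put(mapping[0], terms);
            }
        }
        return thesaurus;
    }
}
```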
Chapter 11
Facets
About Facets
Faceted search, also called guided navigation, enables users to explore large datasets to locate items
of interest. You can define facets for the attributes that are used most commonly for search. Facets
are presented in a visual interface, removing the need to write explicit queries and avoiding queries
that do not return desired results. After facets are computed and the results of the initial query are
presented in facets, the user can drill down to areas of interest. At drilldown, the query is reissued
for the selected facets.
A facet represents one or more important characteristics of an object, represented by one or more
object attributes in the Documentum object model. Multiple attributes can be used to compute a facet,
for example, r_modifier or keywords. Faceted navigation permits the user to explore data in a large
dataset. It has several advantages over a keyword search or explicit query:
• The user can explore an unknown dataset by restricting values suggested by the search service.
• The data set is presented in a visual interface, so that the user can drill down rather than constructing
a query in a complicated UI.
• Faceted navigation prevents dead-end queries by limiting the restriction values to results that are
not empty.
Facets are computed on discrete values, for example, authors, categories, tags, and date or numeric
ranges. Facets are not computed on text fields such as content or object name. Facet results are not
localized; the client application must provide localization.
Before you create facets, create indexes on the facet attributes. See Configuring facets in xPlore, page
292. Some facets are already configured by default.
For very specific use cases, if the out-of-the-box facet handlers do not meet your needs, you can define
custom facet handlers for facet computation. For example, if a facet potentially includes many distinct
values, you can define ranges to group the values.
API overview
Your search client application can define a facet using the DFC query builder API or DFS search
service. For information on using the DFC query builder API, see Building a query with the DFC
search service, page 278. For information on using the DFS search service, see Building a query with
the DFS search service, page 278. Define custom facet handlers using the xPlore facet handler API,
see Defining a facet handler, page 299. In most cases, the out-of-the-box facet handlers are sufficient.
Facets are computed in the following process. The APIs that perform these operations are described
fully in the following topics. For facets javadocs, see the DFC or DFS javadocs.
1. DFC or DFS search service evaluates the constraints and returns an iterator over the results.
2. Search service reads through the results iterator until the number of results specified in
query-max-result-size has been read (default: 10000).
3. For each result, the search service gets the attribute values and increments the corresponding facet
values. Subpath indexes speed this lookup, because the values are found in the index, not in
the xDB pages.
4. The search service performs the following on the list of all facet values:
a. Orders the facet values.
b. Keeps only the top facet values according to setMax (DFC) or setMaxFacetValues (DFS).
Default: 10.
c. Returns the facet values and top results.
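The counting and ranking steps above can be sketched in plain Java. This is a simplified model, not the search service's implementation: each result is represented as the list of values of one facet attribute, duplicate values within a result are counted once (mirroring the default of removing duplicates), ties in count are broken alphabetically (an arbitrary choice), and FacetCountSketch is hypothetical. In this model, maxResults plays the role of query-max-result-size and maxValues the role of setMax or setMaxFacetValues.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class FacetCountSketch {
    /**
     * Counts facet values over at most maxResults results, orders them by
     * descending count (ties by value), and keeps the top maxValues entries.
     */
    static List<Map.Entry<String, Integer>> topValues(
            Iterator<List<String>> results, int maxResults, int maxValues) {
        Map<String, Integer> counts = new HashMap<>();
        int read = 0;
        while (results.hasNext() && read < maxResults) {
            // Count each distinct value of this result once
            for (String value : new LinkedHashSet<>(results.next())) {
                counts.merge(value, 1, Integer::sum);
            }
            read++;
        }
        List<Map.Entry<String, Integer>> ordered = new ArrayList<>(counts.entrySet());
        ordered.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(ordered.subList(0, Math.min(maxValues, ordered.size())));
    }
}
```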
Facet datatypes
Each facet datatype requires a different grouping strategy. You can set the following parameters for
each datatype in the specified DfFacetDefinition method (DFC) or FacetDefinition object (DFS). The
three main facet datatypes are supported: string, date, and numeric.
– keepDuplicateValues: Documents with repeating value attributes can have duplicate attribute
values. By default, duplicate entries are removed.
– alpharange: Group by range. Set a property named range that specifies the ranges, for example:
a:m,n:r,s:z. Specify ranges using ASCII characters. Ranges use Unicode order, not language-dependent
order. For example:
myFacetDefinition.setProperty("range", "a:m,n:r,s:z");
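The following sketch shows how a value can be assigned to a range such as a:m,n:r,s:z by comparing its first character in Unicode code point order. It is an illustration, not xPlore's string facet handler. AlphaRangeSketch is hypothetical, and because it does not fold case, an uppercase initial falls outside lowercase ranges in this model.

```java
final class AlphaRangeSketch {
    /**
     * Returns the first range in spec (for example "a:m,n:r,s:z") that contains
     * the first character of value, or null if none matches. Comparison uses
     * Unicode code point order, not locale-dependent collation.
     */
    static String rangeFor(String spec, String value) {
        if (value.isEmpty()) {
            return null;
        }
        int first = value.codePointAt(0);
        for (String range : spec.split(",")) {
            String[] bounds = range.split(":", 2);
            int low = bounds[0].codePointAt(0);
            int high = bounds[1].codePointAt(0);
            if (first >= low && first <= high) {
                return range;
            }
        }
        return null;
    }
}
```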
Facets are returned as IDfFacetValue. Following is an example of the XML representation of returned
facet string values:
<facet name='r_modifier'>
<element count='5' value='user2'/>
<element count='3' value='user1'/>
</facet>
</facet>
FacetValue
A FacetValue object groups results that have attribute values in common. The FacetValue has a label
and count for number of results in the group. For example, a facet on the attribute r_modifier could
have these values, with count in parentheses:
Tom Terrific (3)
Mighty Mouse (5)
A FacetValue object can also contain a list of subfacet values and a set of custom properties. For
example, a facet on the date attribute r_modify_date has a value of a month (November). The facet
has subfacet values of weeks in the specific month (Week from 11/01 to 11/08). xPlore computes
the facet, subfacet, and custom property values.
FacetDefinition
A FacetDefinition object contains the information used by xPlore to build facet values. The facet name
is required. If no attributes are specified, the name is used as the attribute. Facet definitions must be
specified when the query is first executed. A facet definition can hold a subfacet definition.
FacetSort is an enumeration that specifies the sort order for facet values. It is a field of the
FacetDefinition object. The possible sort orders include the following: FREQUENCY (default) |
VALUE_ASCENDING | VALUE_DESCENDING | NONE. A date facet must set the sort order
to NONE.
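The four FacetSort orders can be illustrated with a local stand-in. In this sketch, FacetSortSketch, its Sort enum, and the Value holder are hypothetical and are not the DFS classes; NONE simply keeps the order in which the values were supplied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FacetSortSketch {
    // Local stand-in mirroring the FacetSort constants named above
    enum Sort { FREQUENCY, VALUE_ASCENDING, VALUE_DESCENDING, NONE }

    /** Facet value label and its result count. */
    static final class Value {
        final String label;
        final int count;

        Value(String label, int count) {
            this.label = label;
            this.count = count;
        }
    }

    static List<Value> order(List<Value> values, Sort sort) {
        List<Value> out = new ArrayList<>(values);
        switch (sort) {
            case FREQUENCY:
                out.sort(Comparator.comparingInt((Value v) -> v.count).reversed());
                break;
            case VALUE_ASCENDING:
                out.sort(Comparator.comparing((Value v) -> v.label));
                break;
            case VALUE_DESCENDING:
                out.sort(Comparator.comparing((Value v) -> v.label).reversed());
                break;
            case NONE:
            default:
                break; // keep the order in which the values were supplied
        }
        return out;
    }
}
```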
Facet results
A Facet object holds a list of facet values that xPlore builds.
A QueryFacet object contains a list of facets that have been computed for a query as well as the query
ID and QueryStatus. This object is like a QueryResult object. A call to getFacets returns QueryFacet.
The getFacets method of the SearchService object calculates facets on the entire set of query results
for a specified Query. The method has the following signature:
public QueryFacet getFacets(
Query query, QueryExecution execution, OperationOptions options)
throws SearchServiceException
This method executes synchronously by default. The OperationOptions object contains an optional
SearchProfile object that specifies whether the call is blocking. For a query on several repositories that
support facets, the client application can retrieve facets asynchronously by specifying a SearchProfile
object as the OperationOptions parameter. Refer to Documentum Enterprise Content Services for more
information on Query, StructuredQuery, QueryExecution, and SearchProfile.
You can call this method after a call to execute, using the same Query and queryId. Paging information
in QueryExecution has no impact on the facets calculation.
<facet-handlers>
<facet-handler class-name="my.package.MyFacetFactory1"/>
<facet-handler class-name="my.package.MyFacetFactory2"/>
</facet-handlers>
</search-config>
b. If not already set, modify the subpath configuration for the facet as described in Configuring
your own facets, page 293.
Reindexing is only required if you modify the subpath.
6. To use it in your application, reference the custom handler in the grouping strategy (GroupBy
parameter) of the facet.
Start building the root expression set by adding the result attributes:
IDfExpressionSet exprSet = queryBuilder.getRootExpressionSet();
// Restrict results to documents modified between 1980-01-01 and 2010-01-01
exprSet.addSimpleAttrExpression("r_modify_date", IDfAttr.DM_TIME,
IDfSimpleAttrExpression.SEARCH_OP_GREATER_EQUAL, false, false,
"1980-01-01T00:00:00");
exprSet.addSimpleAttrExpression("r_modify_date", IDfAttr.DM_TIME,
IDfSimpleAttrExpression.SEARCH_OP_LESS_EQUAL, false, false,
"2010-01-01T00:00:00");
The previous code builds a query without facets. The following facet definition defines a facet for the
user who last modified the document:
DfFacetDefinition definitionModifier = new DfFacetDefinition("r_modifier");
queryBuilder.addFacetDefinition(definitionModifier);
Another facet definition adds the last modification date and sets some type-specific options for the date:
DfFacetDefinition definitionDate = new DfFacetDefinition("r_modify_date");
definitionDate.setMax(-1);
definitionDate.setGroupBy("year");
queryBuilder.addFacetDefinition(definitionDate);
Keywords facet:
DfFacetDefinition definitionKeywords = new DfFacetDefinition("keywords");
queryBuilder.addFacetDefinition(definitionKeywords);
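The setGroupBy("year") call on the date facet groups date values by year. The bucketing can be pictured with the following sketch; it is not xPlore's date facet handler, it ignores setMax(), and YearFacetSketch is hypothetical.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class YearFacetSketch {
    /** Counts documents per year of modification, in ascending year order. */
    static Map<Integer, Integer> countByYear(List<LocalDateTime> modifyDates) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (LocalDateTime d : modifyDates) {
            counts.merge(d.getYear(), 1, Integer::sum);
        }
        return counts;
    }
}
```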
To submit the query and process the results, instantiate IDfQueryProcessor, which is described in the
following topic.
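try
{
    // processor is the IDfQueryProcessor created for queryBuilder; SEARCH_TIMEOUT
    // is the search timeout in milliseconds, defined elsewhere in your code.
    processor.blockingSearch(SEARCH_TIMEOUT);
} catch (Exception e)
{
    e.printStackTrace();
}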
Tuning facets
Limiting the number of facets to save index space and
computation time
Every facet requires a special index, and every query that contains facets requires computation time
for the facet. As the number of facets increases, the disk space required for the index increases. Disk
space depends on how frequently the facet attributes are found in indexed documents. As the number
of facets in an individual query increases, the computation time increases, depending on whether
the indexes are spread out on disk.
Logging facets
To turn on logging for facets, use xPlore administrator and open the dsearch-search family. Set
com.emc.documentum.core.fulltext.indexserver.services.facets to DEBUG.
Output is like the following:
2009-08-05 14:37:18,953 DEBUG [pool-3-thread-10]
c.e.d.c.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor
- Begin facet computation
Troubleshooting facets
A query returns no facets
Check the security mode of the repository. Use the following IAPI commands:
retrieve,c,dm_ftengine_config
get,c,l,ftsearch_security_mode
For example:
API> retrieve,c,dm_ftengine_config
...
0800007580000916
API> get,c,l,ftsearch_security_mode
...
0
If the command returns 0, as in the example, set the security mode to evaluation in xPlore, not the
Content Server. Use the following IAPI commands:
retrieve,c,dm_ftengine_config
set,c,l,ftsearch_security_mode
1
save,c,l
reinit,c
Chapter 12
Using reports
• About reports
• Types of reports
• Document processing (CPS) reports
• Indexing reports
• Search reports
• Editing a report
• Report syntax
• Sample edited report
• Troubleshooting reports
About reports
Reports provide indexing and query statistics, and they are also a troubleshooting tool. See the
troubleshooting sections for CPS, indexing, and search for ways to use the reports.
Statistics on content processing and indexing are stored in the metrics database. Statistics on queries
and final merges are stored in the audit database. Use xPlore administrator to query these statistics.
Auditing supplies information to reports on administrative tasks or queries (enabled by default). For
information on enabling and configuring auditing, see Auditing collection operations, page 183.
To run reports, choose Diagnostic and Utilities and then click Reports. To generate Documentum
reports that compare a repository to the index, see Using ftintegrity, page 79.
Types of reports
The following types of reports are available in xPlore administrator.
• Audit records for search component: If search auditing is enabled in System > Global
configuration, you can view and create audit reports. Filter for query type: interactive, subscription,
warmup, test search, report, metrics, ftintegrity, consistency checker, or all.
• Audit records for admin component: If admin auditing is enabled in System > Global
configuration, you can view and create audit reports.
• Audit records for final merge: Displays the following detailed information about final merges:
domain, collection, instance, type, trigger time, start time, finish time, wait time, process time, total
time, and status.
• Average time of an object in each indexing stage per hour/hour original/day/month: Reports the
average bytes and time for the following processing stages: CPS Queue, CPS Executor, CPS
processing, Index Queue, Index Executor, Index processing, Status Queue, Status Executor Queue, and
Status Update. This value indicates how long a document sits in the status update queue after it has
completed indexing. The timing is affected by two CPS configuration parameters:
status-requests-batch-size and status-thread-wait-time.
• Content too large to index summary report by domain: Displays format, count, average size,
maximum size, and minimum size, summarized by domain (Documentum repository).
• Document processing error summary: Use first to determine the most common problems. Displays
error code, count, and error text.
• Document processing error detail: Drill down for error codes. The report for each code displays the
request ID, domain, date and time, format, and error text.
• Documents ingested per month/day/hour/minute: Totals for the corresponding period, including
document count, bytes ingested, average processing latency, and CPS error count.
• Get query text: Click the query ID from the report Top N slowest queries to get the XQuery
expression.
• QBS activity report by ID: Find the subscribed queries that take the longest processing time or are
run most frequently.
• QBS activity report by user: Find users whose subscribed queries take the longest processing time.
• Query counts by user: For each user, displays domain, number of queries, average response time,
maximum and minimum response times, and last result count (sortable columns). Filter for query
type: interactive, subscription, warmup, test search, report, metrics, ftintegrity, consistency checker, or
all. Number of users sets the last N users to display. To get the slowest queries for a user, run the
report Top N slowest queries.
• Top query terms: Displays the most common query terms, including number of queries and average
number of hits.
• Top N slowest queries: Displays the slowest queries. Select Number of results to display. Optionally,
specify a user to get the slowest queries for that user. Specify the date and time range. Sort by various
options, such as execution time, fetch time, total time, fetch count, and start time. Filter for query type:
interactive, subscription, warmup, test search, report, metrics (query of metrics database for reports or
indexing statistics), ftintegrity, consistency checker, or all.
• User activity report: Displays query and ingestion activity and errors for the specified user and time
period. Data can be exported to Microsoft Excel. Query links display the XQuery.
• IA message summary per hour/day/month: Displays the number of error messages and warning
messages raised by document indexing failures during each hour on a specified date, each day in a
specified month, or each month of a specified year.
of documents being indexed. For more information about indexing performance, see Indexing
performance, page 345.
Run the User activity report to see ingestion activity and error messages for ingestion by a specific
user and time period.
Indexing reports
To view the indexing rate, run the report Documents ingested per month/day/hour. The report shows
the average processing latency. The monthly report covers the current 12 months, the daily report
covers the current month, and the hourly report covers the current day. From the hourly report, you
can determine your period of highest usage. To find the average size of ingested content, divide the
bytes processed by the document count. For example, 2,822,469 bytes for 909 documents yields an
average size of 3105 bytes. This size does not include non-indexable content.
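The average-size calculation above is simple division; the following worked sketch reproduces the example figures from the report (bytes processed divided by document count, rounded to the nearest byte). The class and method names are hypothetical and only illustrate the arithmetic.

```java
/**
 * Worked example of the average ingested content size:
 * bytes processed divided by document count, rounded to the nearest byte.
 * Class and method names are hypothetical.
 */
public class AverageIngestedSize {

    /** Returns the average ingested content size in bytes. */
    public static long averageBytes(long bytesProcessed, long documentCount) {
        if (documentCount <= 0) {
            throw new IllegalArgumentException("documentCount must be positive");
        }
        return Math.round((double) bytesProcessed / documentCount);
    }

    public static void main(String[] args) {
        // Figures from the example: 2,822,469 bytes for 909 documents.
        System.out.println(averageBytes(2_822_469L, 909L) + " bytes"); // 3105 bytes
    }
}
```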
Search reports
Enable auditing in xPlore administrator to view query reports (enabled by default).
Find the slowest queries by selecting Top N slowest queries. To determine how many queries are
unselective, sort by Fetch Count. Results are limited by default in Webtop to 350.
Sort Top N slowest queries by Hits Filtered Out to see how many underprivileged users are
experiencing slow queries due to security filtering. For information on changing the security cache, see
Configuring the security cache, page 59.
To examine a slow or failed query by a user, get the query ID from Top N slowest queries and then
enter the query ID into Get query text. Examine the query text for possible problems. The following
example is a slow query response time. The user searched in Webtop for the string "xplore" (line
breaks added here):
declare option xhive:fts-analyzer-class
'com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer';
for $i score $s in collection('/DSS_LH1/dsearch/Data')
/dmftdoc[( ( ( (dmftmetadata//a_is_hidden = 'false') ) )
and ( (dmftinternal/i_all_types = '030a0d6880000105') )
and ( (dmftversions/iscurrent = 'true') ) )
and ( (. ftcontains ( (('xplore') with stemming) ) )) ]
order by $s descending return
<dmrow>{if ($i/dmftinternal/r_object_id) then $i/dmftinternal/r_object_id
else
Use Query counts by user to determine which users are experiencing the slowest query response times
or to see queries by a specific user. You can filter by date and domain.
Editing a report
You can edit any of the xPlore reports. Select a report in xPlore administrator and click Save as.
Specify a unique file name and title for the report. Alternatively, you can write a new copy of the report
and save it to the following location:
xplore_home/wildfly_version/server/DctmServer_PrimaryDsearch/deployments/dsearch.war/WEB-INF/classes/reports
To see the new report in xPlore administrator, click somewhere else in xPlore administrator and
then click Reports.
Reports are based on the W3C XForms standard. For a guide to the syntax in a typical report, see
Report syntax, page 311.
Adding a variable
Reports require certain variables. The XForms processor substitutes the input value for the variable in
the query.
1. Declare it.
2. Reference it within the body of the query.
3. Define the UI control and bind it to the data.
These steps are highlighted in the syntax description, Report syntax, page 311.
Report syntax
xPlore reports conform to the W3C XForms specification. The original report XForms are located
in the following directory:
xplore_home/wildfly_version/server/DctmServer_PrimaryDsearch/deployments/dsearch.war/WEB-INF/classes/reports
You can edit a report in xPlore administrator and save it with a new name. Alternatively, you can copy
the XForms file and edit it in an XML editor of your choice.
These are the key elements that you can change in a report:
Table 27 Report elements
Element Description
xhtml:head/input Contains an element for each input field
xhtml:head/query Contains the XQuery that returns report results
xforms:action Contains xforms:setvalue elements.
xforms:setvalue Sets a default value for an input field. The ref attribute
specifies a path within the current XForms document
to the input field.
xforms:bind Sets constraints for an input field. The nodeset
attribute specifies a path within the current XForms
document to the input field.
xhtml:body Contains the xhtml markup that is rendered in a
browser (the report UI)
The following example highlights the input field startTime in the report Query Counts By
User (rpt_QueryByUser.xml). The full report is line-numbered for reference in the example (some
lines deleted for readability):
...<xforms:model><xforms:instance><ess_report xmlns="">
<input>
<startTime/><endTime/>...
</input>
<query><![CDATA[ ...
let $u1 := distinct-values(collection('/SystemData/AuditDB')//
event[@component = "search"...
and START_TIME[ . >= $startTime]...
return <report ...>...<rowset>...
for $d in distinct-values(collection('/SystemData...
and START_TIME[ . >= $startTime] and START_TIME[ . <= $endRange]]...
return let $k := collection('AuditDB')...and
START_TIME[ . >= $startTime] and START_TIME[ . <= $endRange]
... return ...
} </rowset></report> ]]></query></ess_report></xforms:instance>
• 1 Specifies the XQuery for the report. The syntax conforms to the XQuery specification.
xhtml:head/xforms:model/xforms:instance/ess_report/query
• 4 return report/rowset: The return is an XQuery FLWOR expression that specifies what is returned
from the query. The transform plain_table.xsl, located in the same directory as the report, processes
the returned XML elements.
• 5 This expression iterates over the rows returned by the query. This particular expression evaluates
all results, although it could evaluate a subset of results.
• 6 This expression evaluates various computations such as average, maximum, and minimum
query response times.
• 7 The response times are returned as row elements (evaluated by the XSL transform).
<xforms:action ev:event="xforms-ready">
<xforms:setvalue ref="input/startTime" value="seconds-to-dateTime(
seconds-from-dateTime(local-dateTime()) - 24*3600)"/>...
</xforms:action>...
<xforms:bind nodeset="input/startTime" constraint="seconds-from-dateTime(.) &lt;= seconds-from-dateTime(../endTime)"/>
<xforms:bind nodeset="input/startTime" type="xsd:dateTime"/>...
</xforms:model>...</xhtml:head>
<xhtml:body>...<xhtml:tr class="">
<xhtml:td>Start from:</xhtml:td>
<xhtml:td><xforms:group>
<xforms:input ref="input/startTime" width="100px" ev:event="DOMActivate">
<xforms:message ev:event="xforms-invalid" level="ephemeral">
The "Start from" date should be no later than the "to" date.
</xforms:message>
<xforms:action ev:event="xforms-invalid">
<xforms:setvalue ref="../endTime" ev:event="xforms-invalid"
value="../startTime"/><xforms:rebuild/>
</xforms:action>
</xforms:input></xforms:group></xhtml:td></xhtml:tr>...
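The startTime rules in the snippet above amount to three pieces of logic: the xforms:setvalue default (24 hours before the current local time), the xforms:bind constraint (start no later than end), and the xforms-invalid action (endTime is reset to startTime). The Java version below only illustrates what those XForms rules do; the XForms processor evaluates the original expressions, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;

/**
 * Illustrative Java equivalent of the startTime rules in the report XForms.
 * Not part of xPlore; names are hypothetical.
 */
public class StartTimeRules {

    /** Mirrors xforms:setvalue: local-dateTime() minus 24*3600 seconds. */
    public static LocalDateTime defaultStartTime(LocalDateTime now) {
        return now.minusSeconds(24 * 3600);
    }

    /** Mirrors the xforms:bind constraint: startTime must not be later than endTime. */
    public static boolean isValid(LocalDateTime startTime, LocalDateTime endTime) {
        return !startTime.isAfter(endTime);
    }

    /** Mirrors the xforms-invalid action: an invalid range sets endTime to startTime. */
    public static LocalDateTime resolveEndTime(LocalDateTime startTime, LocalDateTime endTime) {
        return isValid(startTime, endTime) ? endTime : startTime;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        LocalDateTime start = defaultStartTime(now);
        System.out.println("Default start: " + start + ", valid: " + isValid(start, now));
    }
}
```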
5. This step finds failed queries. Locate the variable definition for successful queries (for $j ...let $k
...) and add your new query. Find the nodes in a QUERY element whose TOTAL_HITS value is
equal to zero to get the failed queries:
let $z := collection('AuditDB')//event[@component = "search" and
@name = "QUERY" and START_TIME[ . >= $startTime and . <= $endRange]
and USER_NAME = $j and TOTAL_HITS = 0]
6. Define a variable for the count of failed queries and add it after the variable for successful query
count (let $queryCnt...):
let $failedCnt := count($z)
7. Return the failed query count cell, after the query count cell (<cell> { $queryCnt } ...):
<cell> { $failedCnt } </cell>
8. Redefine the failed query variable to get a count for all users. Add this line after <rowset...>let $k...:
let $z := collection('AuditDB')//event[@component = "search" and @name = "QUERY"
and START_TIME[ . >= $startTime and . <= $endRange] and USER_NAME
and TOTAL_HITS = 0]
9. Add the total count cell to this second rowset, after <cell> { $queryCnt } </cell>:
<cell> { $failedCnt } </cell>
10. Save and run the report.
If your query has a syntax error, you get a stack trace that identifies the line number of the error. You
can copy the text of your report into an XML editor that displays line numbers, for debugging.
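Steps 5 through 9 add a failed-query count next to the existing query count. The counting logic can be modeled in plain Java: for each user, count the QUERY events whose START_TIME falls in the selected range, and separately count those whose TOTAL_HITS is 0. This is only an illustrative model; QueryEvent is a hypothetical stand-in for an audit event in AuditDB, and the report itself runs the XQuery shown in the steps.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of the per-user counts in the edited Query Counts By User report.
 * Not xPlore code; the report computes these values with XQuery over AuditDB.
 */
public class FailedQueryCounts {

    /** Hypothetical stand-in for a search QUERY audit event. */
    public static final class QueryEvent {
        final String userName;
        final LocalDateTime startTime;
        final int totalHits;

        public QueryEvent(String userName, LocalDateTime startTime, int totalHits) {
            this.userName = userName;
            this.startTime = startTime;
            this.totalHits = totalHits;
        }
    }

    /** Returns user name -> {query count, failed query count} for events in [start, end]. */
    public static Map<String, int[]> countByUser(List<QueryEvent> events,
                                                 LocalDateTime start, LocalDateTime end) {
        Map<String, int[]> counts = new TreeMap<>();
        for (QueryEvent e : events) {
            // Same range test as START_TIME[ . >= $startTime and . <= $endRange]
            if (e.startTime.isBefore(start) || e.startTime.isAfter(end)) {
                continue;
            }
            int[] c = counts.computeIfAbsent(e.userName, user -> new int[2]);
            c[0]++;                 // all queries by this user in the range
            if (e.totalHits == 0) {
                c[1]++;             // failed queries: TOTAL_HITS = 0
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.parse("2012-03-28T12:00:00");
        List<QueryEvent> events = List.of(
                new QueryEvent("alice", now.minusHours(2), 0),
                new QueryEvent("alice", now.minusHours(1), 12),
                new QueryEvent("bob", now.minusDays(3), 0)); // outside the range
        countByUser(events, now.minusDays(1), now).forEach((user, c) ->
                System.out.println(user + ": " + c[0] + " queries, " + c[1] + " failed"));
    }
}
```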
If the query runs slowly, it will time out after about one minute. You can run the same query in
the xDB admin tool.
Troubleshooting reports
If you update Internet Explorer or turn on enforced security, reports no longer contain content. Open
Tools > Internet Options and choose the Security tab. Click Trusted sites and then click Sites. Add
the xPlore administrator URL to the Trusted sites list. Set the security level for the Trusted sites zone
by clicking Custom level. Reset the level to Medium-Low.
Chapter 13
Logging
Configuring logging
Note: Logging can slow the system and consume disk space. In a production environment, run the
system with minimal logging.
Basic logging can be configured for each service in xPlore administrator. Log levels can be set for
indexing, search, CPS, xDB, and xPlore administrator. You can log individual packages within these
services, for example, the merging activity of xDB. Log levels are saved to indexserverconfig.xml
and are applied to all xPlore instances. xPlore uses logback and slf4j (Simple Logging Façade for
Java) to perform logging.
To set logging for a service, choose System Overview in the left panel. Choose Global Configuration
and then choose the Logging tab to configure logging for all instances. You can open one of the
logging families like xdb and set levels on individual packages.
To customize the instance-level log setting, edit the logback.xml file in each xPlore instance. The
logback.xml file is located in the WEB-INF/classes directory for each deployed instance war file, for
example, xplore_home/wildfly_version/server/DctmServer_PrimaryDsearch/deployments/dsearch.war.
Levels set in logback.xml have precedence over log levels in xPlore administrator. Changes to
logback.xml take up to two minutes to take effect.
Each logger logs a package in xPlore or your custom code. The logger has an appender that specifies
the log file name and location. DSEARCH is the default appender. Other defined appenders in the
primary instance logback configuration are XDB, CPS_DAEMON, and CPS.
You can add a logger and appender for a specific package in xPlore or your custom code. The
following example adds a logger and appender for the package com.mycompany.customindexing:
<appender name="CUSTOM"
    class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>C:/xPlore/wildfly9.0.1/server/DctmServer_PrimaryDsearch/logs/custom.log</file>
  <encoder>
    <pattern>%date %-5level %logger{20} [%thread] %msg%n</pattern>
    <charset>UTF-8</charset>
  </encoder>
  <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
    <maxIndex>100</maxIndex>
    <fileNamePattern>C:/xPlore/wildfly9.0.1/server/DctmServer_PrimaryDsearch/logs/custom.log.%i</fileNamePattern>
  </rollingPolicy>
  <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
    <maxFileSize>10MB</maxFileSize>
  </triggeringPolicy>
</appender>
<logger name="com.mycompany.customindexing" additivity="false" level="INFO">
  <appender-ref ref="CUSTOM"/>
</logger>
You can add your custom logger and appender to logback.xml. Add it to a logger family if you want
your log entries to go to one of the logs in xPlore administrator. This is an optional step – if you don't
add your custom logger to a logger family, it will still log to the file that you specify in your appender.
Logger families are defined in indexserverconfig.xml. They are used to group logs in xPlore
administrator. You can set the log level for the family, or expand the family to set levels on individual
loggers.
The following log levels are available. Levels are shown in increasing severity and decreasing amounts
of information, so that TRACE displays more than DEBUG, which displays more than INFO.
• TRACE
• DEBUG
• INFO
• WARN
• ERROR
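The ordering above determines which messages a logger writes: a logger set to a given level writes messages at that level and at every more severe level. The sketch below models that threshold rule with a plain Java enum; it mirrors how logback levels behave but is not logback code, and the names are hypothetical.

```java
/**
 * Hypothetical model of level filtering: TRACE < DEBUG < INFO < WARN < ERROR.
 * A logger set to a level emits messages at that level and above.
 */
public class LogLevelThreshold {

    /** Declared from least to most severe, so ordinal order matches severity. */
    public enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    /** Returns true if a message at messageLevel is written by a logger set to loggerLevel. */
    public static boolean isEnabled(Level loggerLevel, Level messageLevel) {
        return messageLevel.compareTo(loggerLevel) >= 0;
    }

    public static void main(String[] args) {
        for (Level message : Level.values()) {
            System.out.println(message + " written by an INFO logger: "
                    + isEnabled(Level.INFO, message));
        }
    }
}
```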
Troubleshooting the index agent, page 94 provides information about the logging configuration for
the index agent.
Enabling logging in a client application, page 324 provides information about logging for xPlore
client APIs.
Viewing logs
You can view indexing, search, CPS, and xDB logs in xPlore administrator. Choose an instance in the
tree and click Logging. Indexing and search messages are logged to dsearch. Click the tab for dsearch,
cps, cps_daemon, or xdb to view the last part of the log. Click Download to get links for each log file.
Query logging
The xPlore search service logs queries. When you turn on query auditing (default is true), additional
information is saved to the audit record and is available in reports. Auditing queries, page 264 provides
more information about query logging.
For each query, the search service logs the following information for all log levels:
To view a log, choose the instance and click Logging. The following examples from dsearch.log
show a query:
2012-03-28 12:19:02,664 INFO [RMI TCP Connection(9)-10.8.47.144]
c.e.d.c.fulltext.indexserver.search.SearchServer -
QueryID=PrimaryDsearch$6f35b53d-34b8-470d-b699-5b4364ef0815,
query-locale=en,query-string=let $j:= for $i score $s in /dmftdoc
[. ftcontains 'ASMAgentServer' with stemming]
order by $s descending return <d> {$i/dmftmetadata//r_object_id}
{ $i/dmftmetadata//object_name } { $i/dmftmetadata//r_modifier } </d>
return subsequence($j,1,200) is running
...
2012-03-28 12:19:05,117 INFO [pool-14-thread-10]
c.e.d.c.f.i.admin.mbean.ESSAdminSearchManagement -
QueryID=PrimaryDsearch$6f35b53d-34b8-470d-b699-5b4364ef0815,
Result count=1,bytes count=187
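The two dsearch.log entries above belong to the same query because they carry the same QueryID value, so you can collect all entries for one query by matching on it. The following sketch extracts that value with a regular expression. The pattern is an assumption based only on the QueryID=...,  format shown in the example lines, and the class name is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper that extracts the QueryID value from a dsearch.log line,
 * assuming the "QueryID=<instance>$<uuid>," format shown in the example.
 */
public class QueryIdExtractor {

    // Everything after "QueryID=" up to the next comma or whitespace.
    private static final Pattern QUERY_ID = Pattern.compile("QueryID=([^,\\s]+)");

    public static Optional<String> queryId(String logLine) {
        Matcher m = QUERY_ID.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String running = "QueryID=PrimaryDsearch$6f35b53d-34b8-470d-b699-5b4364ef0815,"
                + "query-locale=en,query-string=...";
        String result = "QueryID=PrimaryDsearch$6f35b53d-34b8-470d-b699-5b4364ef0815,"
                + "Result count=1,bytes count=187";
        System.out.println(queryId(running).equals(queryId(result))); // true
    }
}
```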
CPS logging
CPS uses the xPlore logback and slf4j logging framework. A CPS instance that is embedded in an
xPlore instance (installed with xPlore, not separately) uses the logback.xml file in WEB-INF/classes of
the dsearch web application. A standalone CPS instance uses logback.xml in the CPS web application,
in the WEB-INF/classes directory.
If you have installed more than one CPS instance on the same host, each instance has its own web
application and logback.xml file. To prevent one instance's log from overwriting another's, make sure
that each file appender in logback.xml points to a unique file path.
Chapter 14
Setting up a Customization Environment
Customization points
You can customize indexing and searching at several points in the xPlore stack. The following
information refers to customizations that are supported in a Documentum environment.
The following diagram shows indexing customization points.
1. Using DFC, create a BOF module that pre-filters content before indexing. See Custom content
filters, page 92.
2. Create a TBO that injects data from outside a Documentum repository, either metadata or content.
You can use a similar TBO to join two or more Documentum objects that are related. See Injecting
data and supporting joins, page 89.
3. Create a custom routing class that routes content to a specific collection based on your enterprise
criteria. See Creating a custom routing class, page 162.
The following diagram shows query customization points.
1. Using WDK, modify Webtop search and results UI. See Documentum Search Development Guide.
2. Using DFS, implement StructuredQuery, which generates an XQuery expression. xPlore processes
the expression directly. See Building a query with the DFS search service, page 278.
3. Using DFC or DFS, create NOFTDQL queries or apply DQL hints (not recommended except for
special cases).
DQL is evaluated in the Content Server. Implement the DFC interface IDfQuery and the DFS query
service. FTDQL queries are passed to xPlore. Queries with the NOFTDQL hint or which do not
conform to FTDQL criteria are not passed to xPlore. See DQL Processing, page 245.
4. Using DFC, modify Webtop queries. Implement the DFC search service, which generates XQuery
expressions. xPlore processes the expression directly. See Documentum Search Development
Guide.
5. Using DFC, create XQueries using IDfXQuery. See Building a DFC XQuery, page 279.
6. Create and customize facets to organize search results in xPlore. See the Facets chapter.
7. Target a specific collection in a query using DFC or DFS APIs. See Routing a query to a specific
collection, page 276.
8. Use xPlore APIs to create an XQuery for an XQuery client. See Building a query using xPlore
APIs, page 282.
4. Place your class in the indexagent.war WEB-INF/classes directory. Your subdirectory path under
WEB-INF/classes must match the fully qualified routing class name.
5. Restart the xPlore instances, starting with the primary instance.
Chapter 15
Performance and Disk Space
Multiple collections increase the throughput for ingestion. You can create a collection, ingest
documents to it, then move it to be a subcollection of a parent collection. (See Moving a collection,
page 177.) Fewer collections speed up search.
Use the rough guidelines in the following diagram to help you plan scaling of search. The order of
adding resources is the same as for ingestion scaling.
2. Perform a query to return 1000 documents in each format. Specify the average size range, that is,
r_full_content_size greater than (average less some value) and less than (average plus some value).
Make the plus/minus value a small percentage of the average size. For example:
?,c,select r_object_id,r_full_content_size from dm_sysobject
where r_full_content_size >(1792855 -1000) and
r_full_content_size <(1792855 +1000) and
a_content_type = 'zip' enable (return_top 1000)
3. Export these documents and index them into a new, clean xPlore installation.
4. Determine the size on disk of the dbfile and lucene-index directories in xplore_home/data.
5. Extrapolate to your production size.
For example, you have ten indexable formats with an average size of 270 KB from a repository
containing 50000 documents. The Content Server footprint is approximately 12 GB. You get a sample
of 1000 documents of each format in the range of 190 to 210 KB. After export and indexing, these
10000 documents have an indexed footprint of 286 MB. Your representative sample was 20% of the
indexable content. Thus your calculated index footprint is 5 x 286 MB = 1.43 GB (dbfile 873
MB, lucene-index 593 MB).
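The extrapolation step scales the sample footprint by the ratio of indexable documents to sampled documents. The following sketch reproduces the example figures (286 MB for a 10,000-document sample out of 50,000 documents). It models only the total footprint, not the dbfile and lucene-index breakdown, and the class and method names are hypothetical.

```java
/**
 * Hypothetical worked version of the index footprint extrapolation:
 * sample footprint x (indexable documents / sampled documents).
 */
public class IndexFootprintEstimate {

    public static double extrapolateMb(double sampleFootprintMb, long sampleDocs,
                                       long totalIndexableDocs) {
        if (sampleDocs <= 0) {
            throw new IllegalArgumentException("sampleDocs must be positive");
        }
        return sampleFootprintMb * ((double) totalIndexableDocs / sampleDocs);
    }

    public static void main(String[] args) {
        // 10 formats x 1000 documents = 10000 sampled out of 50000 (a 20% sample).
        double estimateMb = extrapolateMb(286.0, 10_000, 50_000);
        System.out.printf("Estimated index footprint: %.0f MB%n", estimateMb); // Estimated index footprint: 1430 MB
    }
}
```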
Adding storage
The data store locations for xDB libraries are configurable. The xDB data stores and indexes can
reside on a separate data store, SAN or NAS. Configure the storage location for a collection in xPlore
administrator. You can also add new storage locations through xPlore administrator.
A remote CPS instance does not perform as well as a CPS instance on an indexing instance.
The remote instance adds overhead for the xPlore system. To add CPS instances, run the xPlore
configuration script and choose Create Content Processing Service Only.
Sizing for search performance
You can size several components of an xPlore system for search performance requirements:
• CPU capacity
• Memory for query caches
Using xPlore administrator, change the value of query-result-cache-size in search service
configuration and restart the search service.
Memory consumption
Following are ballpark estimates for memory consumption by the various components in an xPlore
installation.
Table 30 Average memory consumption
Component RAM
Index agent 1+ GB (4+ GB on 64-bit host)
xPlore indexing and search services 4 GB
CPS daemon 2 GB
For best performance, add index agent processes and CPS on hosts separate from the xPlore host.
Measuring performance
The following metrics are recorded in the metrics database. View statistics in xPlore administrator
to help identify specific performance problems. Select an xPlore instance and then choose Indexing
Service or Search Service to see the metric. Some metrics are available through reports, such as
document processing errors, content too large, and ingestion rate.
Table 31 Metrics mapped to performance problems
To get a detailed message and count of errors, use the following XQuery in xPlore administrator:
for $i in collection('/SystemData/MetricsDB/PrimaryDsearch')
/metrics/record/Ingest[TypeOfRec='Ingest']/Errors/ErrorItem
return string(<R><Error>>{$i/Error}</Error><Space>" " </Space>
<Count>>{$i/ErrorCnt}</Count></R>)
To get the total number of errors, use the following XQuery in xPlore administrator:
sum(for $i in collection('/SystemData/MetricsDB/PrimaryDsearch')
/metrics/record/Ingest [TypeOfRec='Ingest']/Errors/ErrorItem/ErrorCnt return $i)
• xPlore caches
Temporary cache to buffer results. Using xPlore administrator, change the value of
query-result-cache-size in search service configuration and restart the search service.
Using compression
Indexes can be compressed to enhance performance. Compression uses more I/O memory. The
compress element in indexserverconfig.xml specifies which elements in the ingested document have
content compression to save storage space. Compressed content is about 30% of submitted XML
content. Compression can slow the ingestion rate by 10-20% when I/O capacity is constrained. See
Configuring text extraction, page 147.
Sample setting:
<compress>
<for-element name="dmftcontentref"></for-element>
</compress>
that contain one or more indexed documents. New Lucene segments are added as new documents are
being indexed.
To improve query performance, Lucene automatically merges smaller segments into larger ones
based on your settings. You can adjust the values of two xDB properties to fine tune the indexing
performance.
• mergeFactor
mergeFactor controls the number of Lucene segments in one sub-index; the default is 10. A higher
mergeFactor keeps more Lucene segments in one sub-index, which speeds up indexing. However, it
can slow down queries.
• maxMergeDocs
Use maxMergeDocs to specify the maximum number of documents in a segment created by merging.
While merging segments, Lucene does not create a segment with more than maxMergeDocs documents.
For example, if you set maxMergeDocs to 1000, when you add the 10,000th document, instead of
merging multiple segments into a single segment of size 10,000, Lucene will create a 10th segment
of size 1000, and keep adding segments of size 1000 for every 1000 documents added. A larger
maxMergeDocs is better suited for batch indexing.
Note: When optimizing the quantity of segments, a final merge ignores the maxMergeDocs setting.
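The maxMergeDocs example above can be reproduced with a simplified model of segment merging: new segments are flushed at a fixed size, and whenever mergeFactor segments of the same size accumulate they are merged into one, unless the merged segment would exceed maxMergeDocs. This is an assumption-level approximation of Lucene's log-style merging, not Lucene or xDB code; the flush size of 10 documents is a hypothetical parameter chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of Lucene-style segment merging with
 * mergeFactor and maxMergeDocs. Not Lucene or xDB code.
 */
public class MergeSimulation {

    /** Returns segment sizes after adding totalDocs (assumed a multiple of docsPerFlush). */
    public static List<Integer> simulate(int totalDocs, int docsPerFlush,
                                         int mergeFactor, int maxMergeDocs) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        for (int added = docsPerFlush; added <= totalDocs; added += docsPerFlush) {
            segments.add(docsPerFlush); // flush a new smallest segment
            // Cascade merges; the newest (smallest) segments are at the end of the list.
            while (segments.size() >= mergeFactor) {
                int n = segments.size();
                int size = segments.get(n - 1);
                boolean sameSize = true;
                for (int i = n - mergeFactor; i < n; i++) {
                    if (segments.get(i) != size) {
                        sameSize = false;
                        break;
                    }
                }
                long mergedSize = (long) size * mergeFactor;
                if (!sameSize || mergedSize > maxMergeDocs) {
                    break; // nothing to merge, or the result would exceed maxMergeDocs
                }
                segments.subList(n - mergeFactor, n).clear();
                segments.add((int) mergedSize);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(simulate(10_000, 10, 10, 1_000));             // ten segments of 1000 documents
        System.out.println(simulate(10_000, 10, 10, Integer.MAX_VALUE)); // [10000]
    }
}
```

With maxMergeDocs set to 1000, the ten 1000-document segments are never combined, which matches the behavior the example describes; without the limit, the same input ends as a single 10,000-document segment.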
Non-final merge
Non-final merge is an asynchronous process that merges smaller sub-indexes into larger ones, which
frees up memory and disk space and speeds up queries. A non-final merge is more lightweight than a
full final merge and has less impact on indexing and query performance, so you can configure it
to run more frequently. During a non-final merge, sub-indexes under a certain size (specified in the
nonFinalMaxMergeSize property in xdb.properties) are merged into a new index at regular
intervals (specified in the cleanMergeInterval property in xdb.properties).
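The selection rule for a non-final merge can be pictured with a simplified, hypothetical model of one merge pass: sub-indexes smaller than a threshold (the role the text gives to nonFinalMaxMergeSize) are combined into one new sub-index, and larger ones are left unchanged. This is not xDB code; sizes are unitless illustration values, and the scheduling controlled by cleanMergeInterval is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of one non-final merge pass. Not xDB code.
 */
public class NonFinalMergePass {

    /**
     * Returns the sub-index sizes after one pass: sizes at or above maxMergeSize are kept,
     * and all sizes below it are combined into one new sub-index appended at the end.
     */
    public static List<Long> mergeSmall(List<Long> subIndexSizes, long maxMergeSize) {
        List<Long> result = new ArrayList<>();
        long mergedSize = 0;
        boolean anySmall = false;
        for (long size : subIndexSizes) {
            if (size < maxMergeSize) {
                mergedSize += size;   // small sub-index: folded into the new one
                anySmall = true;
            } else {
                result.add(size);     // large sub-index: unchanged by this pass
            }
        }
        if (anySmall) {
            result.add(mergedSize);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mergeSmall(List.of(5L, 50L, 200L, 8L), 100L)); // [200, 63]
    }
}
```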
Final merge
Final merge is an asynchronous process that merges all eligible sub-indexes into a single, optimized
“final” index. Although the desired outcome of a final merge is a single index, in practice, this is often
not the case since during the final merge process, which takes quite a long time, new sub-indexes are
created as a result of the ongoing indexing process.
A final merge is very CPU-intensive and I/O-intensive, and can take a long time to complete. At any
point in time, only one final merge is allowed to run on a single node.
The final merge process also takes up a lot of disk space. During a final merge, the system shrinks
existing sub-indexes by moving and consolidating index entries into a new sub-index that resides in
an empty luceneblob. When no empty luceneblob is available, the system skips this opportunity
and waits for the next final merge to run. As a result, a final merge can require up to two times the size
of the final index to move entries around during the merging process. For example, disk space usage
can be 100G at one point but 300G at another point while a final merge is in progress. Under
such circumstances, you can manually launch a final merge to accelerate the merging process and
free more disk space.
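The disk-space arithmetic above can be turned into a quick capacity check. This is a rough sketch that assumes the worst case stated here: on top of the index itself, a final merge may need up to two times the final index size as working space. The function names are hypothetical.

```python
def peak_disk_during_final_merge(index_size_gb, overhead_factor=2.0):
    """Worst-case disk usage (GB) while a final merge is in progress.

    Assumes the merge needs up to overhead_factor times the final index
    size as extra space on top of the existing index.
    """
    return index_size_gb * (1 + overhead_factor)


def has_headroom(free_gb, index_size_gb, overhead_factor=2.0):
    """True if free disk space covers the worst-case extra space."""
    return free_gb >= index_size_gb * overhead_factor
```

For a 100 GB index, the worst case is 300 GB, which matches the 100G-to-300G swing described above; 150 GB of free space would not be enough headroom under this assumption.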
Final merge should not be run during performance-critical periods such as business hours, especially
for large indexes.
The final merge policy is configurable but it requires care because final merge has a direct impact on
the overall performance of the system. While it is desirable to keep the number of sub-indexes small,
sub-index merging can be time-consuming, CPU-intensive, and I/O-intensive. If done too often,
final merges can cause a noticeable drop in both indexing and query performance.
Therefore, you should closely monitor and carefully schedule final merges and avoid them during
performance-critical hours.
You can manage final merges in one of the following ways:
• Manually starting and stopping final merges
You can manually start a final merge on a collection directly from xPlore Administrator.
• Interval-based scheduling (instance-specific)
By default, a collection uses the Lucene internal final merge interval setting to run final merges at
specified intervals. You can set the interval by specifying the xdb.lucene.finalMergingInterval
value in xdb.properties; the setting applies to all collections in the xPlore instance that use this
scheduling option.
Once set, the interval-based scheduler takes effect immediately. It runs the initial final merge and
sets the current time as the starting point to schedule subsequent final merges based on the specified
interval. However, the starting point is reset and final merges are rescheduled whenever the instance
that the collection is bound to is restarted, which makes the final merge schedule subject to
change and hard to predict or manage.
For example, at 20:00 on the first day, you set final merges to run on a collection every 24 hours.
The first final merge runs immediately after the scheduling takes effect and the second final
merge runs at 20:00 the next day. On the third day, the xPlore instance to which the collection is
bound is restarted at 9:00. This resets the starting point of the scheduler for the collection to 9:00
and reschedules subsequent final merges to run at 9:00 every day. Do not use the interval-based
scheduler if you want to maintain a precise final merge schedule.
• Cron-style scheduling (collection-specific)
With this method, you can configure final merges to run at fixed times, dates, or intervals. Once
set, the Cron-style scheduler settings take effect immediately. Cron-style scheduling applies to
the current collection only.
While Cron-style scheduling gives you more flexibility and precise control over when to start final
merges, it requires a good understanding of your system usage. Use this scheduling
mode when you know exactly when the best time to run final merges is.
• Threshold-based scheduling
Through threshold-based scheduling, you can configure final merges to run periodically not only
based on time, but also based on necessity and priority, taking into account the current state of
sub-indexes. This advanced tuning tool allows you to balance the need to merge sub-indexes into
the final index (too many sub-indexes impact performance) and the need to avoid unnecessary
or too-frequent final merges (final merges also impact performance) to achieve optimal system
performance.
There is no user interface for configuring the threshold-based schedule. All the settings are
configured through properties in indexserverconfig.xml.
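The drift of the interval-based scheduler described above can be reproduced with a small model. It is a sketch under stated assumptions: a final merge runs when the interval is set and then every interval after the starting point; a restart of the instance resets the starting point to the restart time; and (an assumption, since the text does not say) no merge runs at the restart moment itself. The dates are hypothetical.

```python
from datetime import datetime, timedelta


def interval_merge_times(set_at, interval, restarts, until):
    """Return final-merge start times produced by the interval-based scheduler.

    set_at:   when the interval was configured; a final merge runs immediately.
    interval: timedelta between final merges.
    restarts: restart times of the instance the collection is bound to.
    until:    end of the simulated period (exclusive).
    Assumption: a restart resets the starting point but does not run a merge.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = [set_at]
    anchor = set_at
    pending = sorted(r for r in restarts if r > set_at)
    while True:
        next_run = anchor + interval
        if pending and pending[0] < next_run:
            # A restart before the next scheduled run resets the starting point.
            anchor = pending.pop(0)
            continue
        if next_run >= until:
            return runs
        runs.append(next_run)
        anchor = next_run


# The example from the text: set at 20:00 on day 1, restarted at 9:00 on day 3.
runs = interval_merge_times(
    set_at=datetime(2024, 1, 1, 20, 0),
    interval=timedelta(hours=24),
    restarts=[datetime(2024, 1, 3, 9, 0)],
    until=datetime(2024, 1, 6),
)
```

Merges run at 20:00 on days 1 and 2; after the restart, every later merge lands at 9:00, which is why this scheduler is a poor fit when a precise schedule matters.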
The final merge scheduling methods are mutually exclusive. Only one scheduler is effective at a time,
and you cannot combine multiple scheduler settings.
You can use the Audit Records for Final Merge report to view detailed final merge log data to quickly
identify performance issues associated with final merges.
Sub-indexing quantity
Too many sub-indexes, especially large ones, slow down indexing and query. The larger the number of
sub-indexes, the greater the performance impact, and the higher the priority to run a final merge. When
the number of sub-indexes reaches the must-merge threshold, it is important to run a final merge
immediately to optimize system performance. On the other hand, if there are too few sub-indexes
and the number falls below a certain point (the not-merge threshold), triggering a resource-intensive final
merge is inefficient and hurts performance.
You set the sub-index quantity thresholds through the following properties in indexserverconfig.xml:
• lmpi-finalmerge-threshold-cleanentry-mustmerge
• lmpi-finalmerge-threshold-cleanentry-notmerge
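As a sketch, the two sub-index thresholds split the sub-index count into three bands. The comparisons follow the wording of the property descriptions in the configuration steps (the count "reaches" the must-merge value or "falls below" the not-merge value), and the defaults 10 and 2 are the documented defaults. The function name is hypothetical, and how xPlore combines this criterion with others is not modeled.

```python
def subindex_merge_priority(sub_index_count, must_merge=10, not_merge=2):
    """Classify the need for a final merge from the sub-index count alone.

    must_merge mirrors lmpi-finalmerge-threshold-cleanentry-mustmerge;
    not_merge mirrors lmpi-finalmerge-threshold-cleanentry-notmerge.
    """
    if sub_index_count >= must_merge:
        return "must-merge"      # high priority: merge as soon as possible
    if sub_index_count < not_merge:
        return "do-not-merge"    # too few sub-indexes to justify the cost
    return "optional-merge"      # medium priority: wait for an allowed window
```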
...
<index-config>
<properties>
<property value="true" name="lmpi-finalmerge-threshold-enabled"/>
...
</properties>
...
</index-config>
The threshold-based scheduler is disabled by default. Once enabled, it overrides the interval-based
scheduler or the Cron-style scheduler. When the threshold-based scheduler is disabled, you need
to manually revert to the interval-based scheduler or the Cron-style scheduler through xPlore
Administrator.
3. Configure the criteria for prioritizing final merges through properties within the index-config
element in indexserverconfig.xml.
a. Set the sub-index quantity thresholds:
• lmpi-finalmerge-threshold-cleanentry-mustmerge
If the number of sub-indexes on the collection reaches this value, a final merge becomes high
priority (must-merge). Default: 10.
• lmpi-finalmerge-threshold-cleanentry-notmerge
If the number of sub-indexes on the collection falls below this value, a final merge becomes low
priority (do-not-merge). Default: 2.
Set these two properties in conjunction with the xdb.lucene.nonFinalMaxMergeSize property
in xdb.properties, which the scheduler uses to calculate the number of sub-indexes to
merge. The larger the value of xdb.lucene.nonFinalMaxMergeSize, the smaller the values of these
two thresholds should be, and vice versa.
Only sub-indexes with size above xdb.lucene.nonFinalMaxMergeSize are considered for
final merge by the threshold-based scheduler. Smaller sub-indexes are handled by the more
frequent non-final merge process.
b. Set the black node quantity thresholds:
• lmpi-finalmerge-threshold-bl-mustmerge
If the number of black nodes on a collection is greater than this value, a final merge becomes
high priority (must-merge). Default: 50000.
• lmpi-finalmerge-threshold-bl-notmerge
If the number of black nodes on the collection is smaller than this value, a final merge becomes
low priority (do-not-merge). Default: 5000.
For information about final merge priorities and prioritization rules, see Final merging
priorities and prioritization rules, page 339.
4. Set the schedule for final merges of different priority levels:
• lmpi-finalmerge-threshold-interval
Specify the time interval in seconds at which to run medium priority (optional-merge) final
merges. Default: 14400 (4 hours).
Note: This interval setting is only effective in threshold-based scheduling. Do not confuse this
with the interval settings in the interval-based and Cron-style schedulers.
• lmpi-finalmerge-oap-weekend
Specify weekend days (24 hours a day) on which to run medium priority (optional-merge)
final merges. Specify numbers that correspond to system weekdays (not calendar weekdays):
1=Sunday, 2=Monday, 3=Tuesday, 4=Wednesday, 5=Thursday, 6=Friday, 7=Saturday. Delimit
multiple days with comma (,); for example:
lmpi-finalmerge-oap-weekend=1,7
This setting overrides the time slot settings specified in the lmpi-finalmerge-oap-day property.
• lmpi-finalmerge-oap-day
Specify time slots (StartTime-EndTime) in hh:mm format within a day in which to run medium
priority (optional-merge) final merges. Multiple time slots cannot overlap; delimit them with
commas (,). For example:
lmpi-finalmerge-oap-day=20:15-8:15,12:01-13:30
• lmpi-finalmerge-parallel-execution-crossnode
Specify whether to allow multiple final merges on different nodes to run simultaneously. Set this
to false if multiple nodes share the same storage area, to prevent I/O bottlenecks. Default: true.
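Whether an optional-merge final merge may start at a given moment can be checked as in the following sketch. It assumes: weekend days use the 1=Sunday ... 7=Saturday numbering described above and allow optional merges all day; a slot whose end time is earlier than its start time (such as 20:15-8:15) wraps past midnight; and slot boundaries are inclusive. The helper functions are hypothetical, not part of xPlore, and the dates are arbitrary.

```python
from datetime import datetime, time


def system_weekday(dt):
    """Map a datetime to the 1=Sunday ... 7=Saturday day numbering."""
    return dt.isoweekday() % 7 + 1  # isoweekday(): Monday=1 ... Sunday=7


def parse_slots(value):
    """Parse 'hh:mm-hh:mm,hh:mm-hh:mm' into a list of (start, end) times."""
    slots = []
    for part in value.split(","):
        start, end = part.strip().split("-")
        start_h, start_m = (int(x) for x in start.split(":"))
        end_h, end_m = (int(x) for x in end.split(":"))
        slots.append((time(start_h, start_m), time(end_h, end_m)))
    return slots


def optional_merge_allowed(dt, weekend="", day_slots=""):
    """True if a medium-priority (optional-merge) final merge may start at dt."""
    weekend_days = {int(d) for d in weekend.split(",") if d.strip()}
    if system_weekday(dt) in weekend_days:
        return True  # weekend days allow optional merges 24 hours a day
    now = dt.time()
    for start, end in (parse_slots(day_slots) if day_slots else []):
        if start <= end:
            if start <= now <= end:
                return True
        elif now >= start or now <= end:  # assumed: slot wraps past midnight
            return True
    return False


WEEKEND = "1,7"
SLOTS = "20:15-8:15,12:01-13:30"
tuesday_2am = optional_merge_allowed(datetime(2024, 1, 2, 2, 0), WEEKEND, SLOTS)
```

Tuesday at 02:00 falls inside the overnight 20:15-8:15 slot, so an optional merge is allowed; Tuesday at 14:00 falls outside both slots, while any time on Saturday (day 7) is allowed.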
Indexing
Documentum index agent performance
Index agent settings
The parameters described in this section can affect index agent performance. Do not change these
values unless you are directed to change them by OpenText Global Technical Services.
In migration mode, set the parameters in indexagent.xml, which is located in
index_agent_WAR/WEB-INF/classes/.
In normal mode, also set the corresponding parameters in the dm_ftindex_agent_config object.
In normal mode, index agent configuration is loaded from indexagent.xml and from the
dm_ftindex_agent_config object. If there is a conflict, the settings in the config object override the
settings in indexagent.xml.
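The precedence rule can be pictured as a dictionary merge in which values from the dm_ftindex_agent_config object win. This is an illustrative sketch only: the parameter names and values are hypothetical placeholders, the real index agent reads XML and repository objects rather than Python dictionaries, and treating migration mode as "indexagent.xml only" is an assumption drawn from the paragraphs above.

```python
def effective_index_agent_config(indexagent_xml, config_object, mode="normal"):
    """Return the settings the index agent would use.

    indexagent_xml: settings read from indexagent.xml
    config_object:  settings read from the dm_ftindex_agent_config object
    mode:           "normal" or "migration"
    """
    merged = dict(indexagent_xml)
    if mode == "normal":
        # On a conflict, the config object overrides indexagent.xml.
        merged.update(config_object)
    return merged


# Hypothetical parameter names and values, for illustration only.
from_xml = {"param_a": "10", "param_b": "true"}
from_object = {"param_a": "20"}
normal = effective_index_agent_config(from_xml, from_object)
migration = effective_index_agent_config(from_xml, from_object, mode="migration")
```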
Indexing performance
Various factors affect the rate of indexing. You can tune some indexing and xDB parameters and
adjust allowable document size.
If the system crashes during a period of heavy ingestion, transactional recovery could take a long
time as it replays the log. The recovery process is single-threaded. Alternatively, you can set up an
active/active high availability system so that failure in a single system does not disrupt business.
Search performance
Changing the security cache sizes
Monitor the query audit record to determine security performance. The value of
<TOTAL_INPUT_HITS_TO_FILTER> records how many hits a query had before security filtering.
The value of <HITS_FILTERED_OUT> shows how many hits were discarded because the user did
not have permissions for the results. The number of hits that remain after filtering divided by the
total number of hits is the hit ratio. A low hit ratio indicates an underprivileged user, who often
has slower query response times than other users.
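As a minimal sketch, the two audit values can be reduced to the share of hits removed by security filtering and the share that survive; a user for whom most hits are filtered out (large first value, small second value) is the underprivileged case described above. The function name and the sample numbers are hypothetical.

```python
def security_filter_ratios(total_input_hits, hits_filtered_out):
    """Return (share of hits filtered out, share of hits kept) for one query.

    total_input_hits:  value of <TOTAL_INPUT_HITS_TO_FILTER>
    hits_filtered_out: value of <HITS_FILTERED_OUT>
    """
    if total_input_hits <= 0:
        return 0.0, 0.0  # nothing reached the security filter
    kept = total_input_hits - hits_filtered_out
    return hits_filtered_out / total_input_hits, kept / total_input_hits
```

For example, a query with 1000 input hits of which 950 are filtered out keeps only 5% of its hits, which points to an underprivileged user.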
You can improve security filter performance by adjusting values of security filter properties. For
information on how to change these configuration settings, see Configuring the security cache, page 59.
If you have many ACLs, increase the values of global-ace-cache-size, acl-cache-size, and
global-acl-cache-user-count.
are fetched. CenterStage limits results to 150. Paging is especially important to limit result sets
for underprivileged users.
• Number of collections
If queries are not run in parallel mode (across several collections at once), response time rises as the
number of collections rises. Queries can be targeted to specific collections to avoid this problem. If
you do not use targeted queries, try to limit the number of collections in your xPlore federation.
For information on targeted queries, see Routing a query to a specific collection, page 276. To set
parallel mode for DFC-based search applications, set the following property in dfc.properties to true:
dfc.search.xquery.option.parallel_execution.enable = true
• Caches empty on system startup
At startup, the query and security caches have not been filled, so response times are slower. Make
sure that you have allocated sufficient memory for the file system buffer cache and that the I/O
subsystem provides good response time.
• Response times slower during heavy ingestion
Slow queries during ingestion are usually an issue only during migration from FAST. If your
environment has large batch migrations once a month or quarterly, you can set the target collection
or domain to index-only during ingestion. Alternatively, you can schedule ingestion during an
off-peak time.
parameter. Default: false. If you turn on this feature, make sure the failure_document_id_file is
exclusively used by the current CPS instance and not shared by other CPS instances.
• connection_pool_size: Specifies how many connections CPS manager allocates to each daemon.
If CPU and memory allow, increase daemon_count, then increase connection_pool_size while keeping
memory use reasonable. Set this value to 3 or greater.
• cut_off_text: Set to true to cut off text of large documents that exceed max_text_threshold instead
of rejecting entire document. Default: false. Documents that are partially indexed are recorded
in cps_daemon.log: docxxxx is partially processed. The dftxml is annotated with the element
partiallyIndexed.
• daemon_count: Specifies the number of daemons that handle normal indexing and query requests
(not a dedicated query daemon). Set from 1 to 6. Default: 1. For information about adding CPS
daemons, see Adding CPS daemons for ingestion or query processing, page 125.
• daemon_restart_threshold: Specifies how many requests a CPS daemon can handle before it restarts.
Default: 1000.
• daemon_restart_memory_threshold: Set a value in bytes for maximum CPS memory consumption.
After this limit, CPS will restart. A maximum of 8 GB is recommended. Default: 4000000000.
• daemon_restart_consistently: Specifies whether CPS should restart regularly after it is idle for 5
minutes. Default: true.
• dump_context_if_exception: Specifies whether to dump a stack trace if an exception occurs. Default:
true.
• failure_document_id_file: The file that contains IDs of failed documents to
be skipped. You can edit this file. IDs of failed documents are added to it
automatically if add_failure_documents_automatically is set to true. Default:
xplore_home/cps/skip_failure_document.txt.
• io_block_unit: Logical block unit of the read/write target device. Default: 4096.
• io_chunk_size: Size for each read/write chunk. Default: 4096.
• linguistic_processing_time_out: Interval in seconds after which a CPS hang in linguistic processing
forces a restart. Valid values: 60 to 360. Default: 360.
• load_content_directly: For internal use only.
• query_dedicated_daemon_count: The number of CPS daemons dedicated to query processing.
Other CPS daemons handle ingestion when there is a dedicated query daemon. Valid values: 0
to 3. Default: 1.
• retry_failure_in_separate_daemon: Specifies whether to retry failed documents in a newly spawned
CPS daemon. Default: true. A retry daemon is not limited by the value of daemon_count.
• skip_failure_documents: Specifies whether CPS should skip documents that fail processing instead
of retrying them, to reduce CPS crashes. Default: true. Failed documents are retried once unless
this property is set to true.
• skip_failure_documents_upper_bound: Specifies the maximum number of failed documents that
CPS will record in the failure document. Valid values: integers. Default: -1 (no upper bound).
• text_extraction_time_out: Interval in seconds after which a CPS hang in text extraction forces a
restart. Valid values: 60 to 300. Default: 300.
• use_direct_io: Makes CPS read and write staging files directly to devices. Default: false. If
most incoming files are local, use the default caching. If most files are remote, use direct IO.
Note: For local CPS, the total daemon count (daemon_count plus query_dedicated_daemon_count)
should be less than 8. Even with daemon_count set to 6 and query_dedicated_daemon_count set to 1,
you may still face a CPS processing bottleneck. In that case, consider adding remote CPS instances
on other hosts. For remote CPS, there is no limit on the total daemon count.
See also: Tuning CPS and xDB for search, page 348.
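The daemon-count rules in the parameter list and the note above can be collected into a quick pre-flight check. This is a hypothetical helper, not an xPlore tool; the ranges are taken from the descriptions of daemon_count (1 to 6), query_dedicated_daemon_count (0 to 3), and the local-CPS total (less than 8).

```python
def check_cps_daemons(daemon_count, query_dedicated_daemon_count, local=True):
    """Return a list of problems with a CPS daemon configuration (empty if none).

    Remote CPS has no limit on the combined daemon total.
    """
    problems = []
    if not 1 <= daemon_count <= 6:
        problems.append("daemon_count must be between 1 and 6")
    if not 0 <= query_dedicated_daemon_count <= 3:
        problems.append("query_dedicated_daemon_count must be between 0 and 3")
    if local and daemon_count + query_dedicated_daemon_count >= 8:
        problems.append("local CPS: total daemon count should be less than 8")
    return problems
```

A local configuration of 6 plus 2 daemons is flagged, while the same counts on a remote CPS instance pass.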
1. Use the following DQL query to determine the number of documents modified and accessed in
the past two years (change DQL to meet your requirements):
Throttling indexing
Index speed is throttled when either a final merge is running or the number of search queries xPlore is
executing (also known as active search count) exceeds a certain threshold.
Index speed is defined as the content size that the index service processes per second. The content size
is the sum of the dftxml size and the value of the r_content_size attribute. For delete operations, as the
index request does not contain dftxml, the system uses a constant value of 4 KB as the content size.
The active search count stands for the number of search queries xPlore is executing at the current
time point, which increases by one each time a search request is submitted to xPlore and decreases
by one each time a search request completes.
By default, index throttling is enabled. To adjust the speed limit of index throttling, modify the
following property under node->properties in indexserverconfig.xml.
index-throttle-speed-kbps-limit: Specifies index speed in kilobytes per second. Default value: 400.
To turn off index throttling, set this value to -1.
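The definitions above can be combined in a short sketch that computes request content size and checks the speed limit. Assumptions: 1 KB is 1024 bytes, the "4 KB" delete constant is 4096 bytes, and the active-search threshold, which this guide does not give, is a hypothetical parameter. How xPlore actually slows down requests is not described, so the sketch only decides whether throttling conditions hold.

```python
DELETE_CONTENT_SIZE = 4 * 1024  # assumed byte value of the 4 KB delete constant


def request_content_size(dftxml_size=0, r_content_size=0, is_delete=False):
    """Content size of one index request, in bytes."""
    if is_delete:
        return DELETE_CONTENT_SIZE  # delete requests carry no dftxml
    return dftxml_size + r_content_size


def throttling_active(final_merge_running, active_search_count, search_threshold):
    """Throttling applies during a final merge or when searches exceed the threshold."""
    return final_merge_running or active_search_count > search_threshold


def over_speed_limit(processed_bytes, elapsed_seconds, limit_kbps=400):
    """True if the index speed exceeds index-throttle-speed-kbps-limit.

    A limit of -1 turns index throttling off.
    """
    if limit_kbps == -1 or elapsed_seconds <= 0:
        return False
    speed_kbps = processed_bytes / 1024 / elapsed_seconds
    return speed_kbps > limit_kbps
```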
Throttling searching
The search service is throttled by limiting the number of concurrent search query jobs when GC
busyness is higher than a certain value.
GC busyness is measured as a floating-point number in the range [0, 1]. The number represents the
portion of time the JVM spent on garbage collection in the last 60 seconds. For example, 0.5 indicates that
the JVM spent a total of 30 seconds on garbage collection in the last 60 seconds.
By default, search throttling is enabled. To adjust the search throttling behavior, modify the following
properties under node->properties in indexserverconfig.xml.
• throttle_gc_busyness_limit: The GC overhead threshold. If xPlore GC overhead exceeds this
threshold, the number of concurrent search query jobs is limited. Default value: 0.4.
• throttle_control_query_concurrency_limit: The maximum number of concurrently running search
query jobs allowed during search throttling. Default value: 2. To turn off search throttling, set
this value to -1.
Search throttling ends when the GC overhead decreases to a value lower than
throttle_gc_busyness_limit.
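A minimal sketch of the search-throttling decision, assuming the GC pause durations of the last 60 seconds are already available as hypothetical inputs (in a real JVM they would come from its garbage-collection accounting). The sketch is stateless: it throttles whenever busyness is above the limit and does not model any hysteresis xPlore may apply when throttling ends.

```python
def gc_busyness(gc_pause_seconds, window_seconds=60.0):
    """Fraction of the last window the JVM spent in garbage collection."""
    return min(1.0, sum(gc_pause_seconds) / window_seconds)


def allowed_query_concurrency(busyness, busyness_limit=0.4, concurrency_limit=2):
    """Maximum concurrent search query jobs, or None when no throttling limit applies.

    busyness_limit mirrors throttle_gc_busyness_limit and concurrency_limit
    mirrors throttle_control_query_concurrency_limit (-1 turns throttling off).
    """
    if concurrency_limit == -1 or busyness <= busyness_limit:
        return None
    return concurrency_limit
```

With 30 seconds of GC in the last minute, busyness is 0.5, which exceeds the default 0.4, so at most 2 search query jobs run concurrently.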
Appendix A
Index Agent, CPS, Indexing, and Search Parameters
∙ dm_ftengine_config
∙ Index agent configuration parameters
∙ Document processing and indexing service configuration parameters
∙ Search service configuration parameters
∙ API Reference
dm_ftengine_config
Attributes
Following are attributes specific to dm_ftengine_config. Some attribute values are set by the index
agent when it creates the dm_ftengine_config object.
Note: Not all attribute values are set at object creation. If you do not set values, the default
values are used. For instructions on changing the attribute values, see Query plugin configuration
(dm_ftengine_config), page 240.
For iAPI syntax to change attributes, see Query plugin configuration (dm_ftengine_config), page 240.
Attribute Description
acl_check_db: If ftsearch_security_mode is set to xPlore and this
value is set to true, xPlore filters the results based on
its ACL information, and then Content Server filters
again based on current database ACL information
from the filtered results set (double security check).
You must disable XQuery generation. See Changing
search results security, page 55.
acl_domain: Owner of dm_fulltext_admin_acl (user specified at
installation)
acl_name dm_fulltext_admin_acl
default_fuzzy_search_similarity Controls the degree of similarity between a search
term and an index term. Default: 0.5. See Configuring
fuzzy search, page 232.
dsearch_config_host Specifies the fully qualified host name or IP address
of the xPlore host that the index agent connects to.
dsearch_config_port Specifies the HTTP or HTTPS port that the index
agent connects to.
dsearch_domain Name of repository
dsearch_override_locale Overrides the locale of the query with the specified
locale.
dsearch_qrserver_host Specifies the fully qualified host name or IP address of
the xPlore host that the Content Server query plugin
connects to.
dsearch_qrserver_port Specifies the HTTP or HTTPS port of the xPlore host
that the Content Server query plugin connects to.
dsearch_qrserver_protocol Sets HTTPS or HTTP as connection protocol.
dsearch_qrserver_target xPlore index server servlet: Partial URL combined
with host, protocol and port to create full URL.
dsearch_qrygen_mode For internal use only.
dsearch_result_batch_size Sets the number of results fetched from xPlore in each
batch. Default: 200.
fast_wildcard_compatible (replaces fds_contain_fragment) Sets fragment search option. Default: false.
filter_config_id Most recent object ID of dm_filter_config type
folder_cache_limit Specifies the maximum number of folder
IDs included in the index probe. Default: 2000.
If the folder descend condition evaluates to fewer than
folder_cache_limit folder IDs, the folder IDs are pushed
into the index probe. Otherwise, the folder constraint is
evaluated separately for each result. Raise the value
if folder descend queries are slow or time out.
Lower the value if folder descend queries cause out-of-memory
or too-many-clauses errors. See Query plugin
configuration (dm_ftengine_config), page 240.
ft_collection_id Repeating attribute that references a collection object
of the type dm_fulltext_collection. Reserved for use
by Content Server client applications.
ft_wildcards_mode Specifies how wildcards are evaluated in full-text
clauses (Webtop simple search, Webtop advanced
search Contains field, and global search in xCP 2.0).
Valid values:
• none: Wildcard is treated as a literal *
character.
• explicit: Wildcard character must be entered to
search for fragments (default).
• implicit: Wildcards are added implicitly around
every search term (negative impact on
performance).
• trailing_implicit: A wildcard is added to the end
of every search term.
ftsearch_security_mode 0: Content Server. DFC search service will not use
IDfXQuery and instead will generate DQL.
1: xPlore (default)
fuzzy_search_enable Specifies whether fuzzy search is applied. Default:
false. See Configuring fuzzy search, page 232.
group_name dm_fulltext_admin
object_name Dsearch Fulltext Engine Configuration
FAST
query_plugin_mapping_file Path on Content Server host to mapping file. This file
maps attribute conditions to the XQuery subpaths.
security_mode Sets summary security mode. See Configuring
summary security, page 231.
thesaurus_search_enable Set to true to enable thesaurus search.
use_thesaurus_on_phrase Set to true to match entire phrases in the thesaurus.
parallel_summary_computing_enable Set to true to enable parallel summary calculation and
automatically convert non-parallel summary queries
into parallel summary queries in query generation.
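To make the ft_wildcards_mode values concrete, the following sketch shows how a single full-text term would be treated under each mode. It is an illustration only: the function name is hypothetical, and the output notation (* for a wildcard, \* for a literal asterisk) is invented here, not the form xPlore uses internally.

```python
def apply_wildcard_mode(term, mode="explicit"):
    """Show how a search term would be treated under ft_wildcards_mode.

    Notation (illustrative only): * is a wildcard, \\* is a literal asterisk.
    """
    if mode == "none":
        return term.replace("*", "\\*")    # * is matched as a literal character
    if mode == "explicit":
        return term                        # only user-typed * acts as a wildcard
    if mode == "implicit":
        return "*" + term.strip("*") + "*"  # wildcard around every search term
    if mode == "trailing_implicit":
        return term.rstrip("*") + "*"       # wildcard at the end of every term
    raise ValueError(f"unknown ft_wildcards_mode: {mode}")
```

The implicit mode turns every term into a fragment search, which is why the table notes its negative impact on performance.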
General index agent runtime parameters
Requests for indexing pass from the exporter queue to the indexer queue to the callback queue.
Parameter Description
partition_config You can add this element and its contents to map
partitions to specific collections. See Mapping Server
storage areas to collections, page 76.
Document processing and indexing service configuration parameters
You can configure the following CPS settings for each instance in xPlore Administrator. The values
are recorded in the file instance_name_local_configuration.xml, where instance_name is the name
of the xPlore instance you are configuring CPS settings for; for example, PrimaryDsearch. This
file is located in xplore_home/dsearch/cps/cps_daemon. The default values have been optimized
for most environments.
• Connection pool size: Maximum number of concurrent connections. Valid values: 1-100. Default:
4.
Increasing the number of connections consumes more memory. Decreasing can slow ingestion.
• Port number: Listener port for CPS daemon, used by the CPS manager. Default: 9322.
This value is set during xPlore configuration.
• Daemon path: Specifies the path to the installed CPS daemon (read-only).
This value is set during xPlore configuration.
• Keep intermediate temp file: Keep content in a temporary CPS folder for debugging.
Keeping temporary files has a large impact on performance. Disable (default) to remove temporary files
after the specified time in seconds. Time range in seconds: 1-604800 (1 week).
• Restart threshold: Select After processed... and specify the number of requests after which to
restart the CPS daemon.
Disable if you do not want the daemon restarted. Decreasing the number can affect performance.
• Heartbeat: Interval in seconds between the CPS manager and daemon.
Range: 1-600. Default: 60.
• Embedded return: Select Yes (default) to return embedded results to the buffer. Select No to return
results to a file, and specify the file path for export.
Embedded return increases communication time and impacts ingestion.
• Export file path: Valid URI at which to store CPS processing results, for example, file:///c:/.
If the results are larger than Result buffer threshold, they are saved in this path. This setting
does not apply to remote CPS instances, because the processing results are always embedded in
the return to xPlore.
• Result buffer size threshold: Number of bytes at which the result buffer returns results to file.
Valid values: 8 - 16 MB. Default: 1 MB (1048576 bytes). Larger values can accelerate the process
but can cause more instability.
• Processing buffer size threshold: Specifies the number of bytes of the internal memory chunk used
to process small documents.
If this threshold is exceeded, a temporary file is created for processing. Valid values: 100 KB-10
MB. Default: 2 MB (2097152 bytes). Increase the value to speed processing. This consumes
more memory.
• Load file to memory: Check to load the submitted file into memory for processing. Uncheck to pass
the file to a plug-in analyzer for processing (for example, the Documentum index agent).
• Batch in batch count: Average number of batch requests in a batch request.
Range: 1-100. Default: 5. CPS assigns the number of Connection pool threads for each
batch_in_batch count. For example, a batch_in_batch count of 5 and a connection_pool_size of 5
result in 25 threads.
• Thread pool size: Number of threads used to process a single incoming request such as text
extraction and linguistic processing.
Range: 1-100. Default: 8. A larger size can speed ingestion when the CPU is not under heavy load, but it
causes instability under heavy CPU load.
• System language: ISO 639-1 language code that specifies the language for CPS.
• Max text threshold: Sets the size limit, in bytes, for the text within documents. Range: 5MB - 2GB
in bytes. Default: 10485760 (10 MB). Maximum setting: 2 GB. Larger values can slow ingestion
rate and cause more instability. Above this size, only the document metadata is tokenized.
Includes expanded attachments. For example, if an email has a zip attachment, the zip file is
expanded to evaluate document size. If you increase this threshold, ingestion performance can
degrade under heavy load.
• Illegal char file: Specifies the URI of a file that defines illegal characters.
To create a token separator, xPlore replaces illegal characters with white space. This list is
configurable.
• Request time out: Number of seconds before a single request times out.
Range: 60-3600. Default: 600.
• Daemon standalone: Check to stop daemon if no manager connects to it. Default: false.
• IP version: Internet Protocol version of the host machine. Values: IPv4 or IPv6. Dual stack is not
supported. Note: For CPS to be started successfully, ensure that you set the correct IP version
before starting xPlore.
• Use express queue: This queue processes query requests. Queries are processed for language
identification, lemmatization, and tokenization. The express queue has priority over the regular
queue. Set the maximum number of requests in the queue. Default: 128.
• The regular queue processes indexing requests. Set the maximum number of requests in the queue.
Default: 1024.
• When the token count is zero and the extracted text is larger than the configured threshold, a
warning is logged.
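The Max text threshold and the thread arithmetic above can be sketched as two small functions. Assumptions: the size compared against the threshold is the total text of the document plus its expanded attachments, and the cut_off_text parameter (described with the other CPS parameters) switches the outcome from metadata-only indexing to truncated text. The function names and outcome labels are hypothetical, not xPlore status values.

```python
MAX_TEXT_THRESHOLD = 10_485_760  # default Max text threshold: 10 MB in bytes


def text_handling(text_sizes, max_text_threshold=MAX_TEXT_THRESHOLD, cut_off_text=False):
    """Classify how a document's text would be handled, given sizes in bytes.

    text_sizes: text size of the document and of each expanded attachment.
    Returns "full", "truncated" (cut_off_text=True), or "metadata-only".
    """
    if sum(text_sizes) <= max_text_threshold:
        return "full"
    return "truncated" if cut_off_text else "metadata-only"


def cps_thread_count(batch_in_batch=5, connection_pool_size=4):
    """Connection pool threads are assigned per batch_in_batch count."""
    return batch_in_batch * connection_pool_size
```

An 8 MB email with a 4 MB expanded zip attachment exceeds the 10 MB default, so only its metadata is tokenized unless cut_off_text is enabled.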
You can configure the following additional parameters in the CPS configuration file
PrimaryDsearch_local_configuration.xml, which is located in the CPS instance directory
xplore_home/dsearch/cps/cps_daemon. If these properties are not in the file, you can add them.
• detect_data_len: The number of bytes used for language identification. The bytes are analyzed
from the beginning of the file. A larger number slows the ingestion process. A smaller number
increases the risk of language misidentification. Default: 65536.
• max_batch_size: Limit for the number of requests in a batch. Valid values: 2 - 65535 (default:
65535).
Note: The index agent also has batch size parameters.
• max_data_per_process: The upper limit in bytes for a batch of documents in CPS processing.
Default: 30 MB. Maximum setting: 2 GB.
• normalize_form: Set to true to remove accents in the index, which allows search for the same
word without the accent.
• slim_buffer_size_threshold: Sets memory buffer for CPS temporary files. Increase to 16384 or
larger for CenterStage or other client applications that have a high volume of metadata.
• temp_directory: Directory for CPS temporary files. Default:
xplore_home/dsearch/cps/cps_daemon/temp.
• temp_file_folder: Directory for temporary format and language identification. Default:
xplore_home/dsearch/cps/cps_daemon/temp.
• keep_temp_file: Whether to keep CPS temporary files (true) or not (false).
• temp_file_retain_time: How long CPS temporary files are retained. This parameter is effective
only when keep_temp_file is set to false.
• daemon_restart_memory_threshold: Maximum memory consumption at which CPS is restarted.
• use_direct_io: Requires CPS to read and write to devices directly.
• io_block_unit: Logical block unit of the read/write target device.
• io_chunk_size: Size for each read/write chunk.
• cut_off_text: Set to true to cut off the text of large documents that exceed max_text_threshold
instead of rejecting the entire document. Changes to this setting require an index rebuild.
• Language: Comma-delimited list of languages supported by CPS. Changes to this setting require
an index rebuild.
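The range limits above can be checked before you edit PrimaryDsearch_local_configuration.xml. The following Java sketch is hypothetical (the class and method names are not part of xPlore): it encodes the documented max_batch_size range (2 to 65535) and the 2 GB ceiling for max_data_per_process, reading 2 GB as 2 × 1024³ bytes as an assumption, and models detect_data_len as the number of leading bytes handed to language identification.

```java
import java.util.Arrays;

/** Hypothetical sanity checks for a few CPS settings; not an xPlore API. */
final class CpsSettingsCheck {
    static final int MIN_BATCH_SIZE = 2;          // documented lower bound of max_batch_size
    static final int MAX_BATCH_SIZE = 65535;      // documented upper bound (also the default)
    static final int DEFAULT_DETECT_DATA_LEN = 65536;
    // Assumption: "2 GB" is read as 2 * 1024^3 bytes.
    static final long MAX_DATA_PER_PROCESS = 2L * 1024 * 1024 * 1024;

    static boolean isValidBatchSize(int size) {
        return size >= MIN_BATCH_SIZE && size <= MAX_BATCH_SIZE;
    }

    static boolean isValidDataPerProcess(long bytes) {
        return bytes > 0 && bytes <= MAX_DATA_PER_PROCESS;
    }

    /** Models detect_data_len: only the leading bytes of the content are analyzed. */
    static byte[] languageIdPrefix(byte[] content, int detectDataLen) {
        if (detectDataLen < 0) {
            throw new IllegalArgumentException("detect_data_len must be >= 0");
        }
        return Arrays.copyOf(content, Math.min(content.length, detectDataLen));
    }
}
```

A larger detect_data_len simply makes the prefix longer, which matches the trade-off described above: more bytes to analyze, lower risk of misidentification.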
Search service configuration parameters
ftcontains ".*"
ftcontains "@.*"
ftcontains " "
ftcontains ".."
• query-parallel-execution: Set this value to true to run all queries against multiple collections in
parallel. When this is set to true, a log message “Use Parallel Execution” will be recorded for each
query logged in the dsearch.log file at the INFO level.
When too many queries are running in parallel, a query may be forced to run in nonparallel mode
to prevent system overload. In this case, the following message is logged for the query at the
INFO level: “Current active parallel thread count too high to allow parallel query
execution. Running in nonparallel for the query”.
When the parallel search resource pool is exhausted, the system retries the query request up to
the number of times specified by the query-executor-retry-limit property in search-config. The
wait time between retry attempts is determined by the query-executor-retry-interval property.
• query-summary-fragment-size: Number of characters to return as a fragment. The maximum number
of fragments a query can return equals query-summary-display-length/query-summary-fragment-size.
Default is 64.
• query-parallel-summary-calculation: Set to true to enable parallel summary calculation. Default:
false.
• query-parallel-execution-thread-pool-size: The maximum thread pool size for parallel execution.
Default: 100.
• query-parallel-summary-calculation-thread-pool-size: The maximum thread pool size for parallel
summary calculation. Default: 100.
• query-summary-process-all-highlight-nodes: Set it to true to enable summary highlighting for
multiple elements. By default, this parameter is not provided in indexserverconfig.xml.
• query-highlight-based-on-phrase: Set to true to enable highlighting based on the exact phrase, or
sequence of terms, for a phrase query. Set to false to enable highlighting based on each term for a
phrase query or term query.
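The retry behavior described for query-parallel-execution can be pictured as a bounded-retry loop around a fixed-size resource pool. The sketch below is a hypothetical model, not xPlore code: a java.util.concurrent.Semaphore stands in for the parallel search resource pool, and the constructor arguments play the roles of query-executor-retry-limit and query-executor-retry-interval (milliseconds are assumed as the unit). What xPlore does after the last retry fails is not described here, so the sketch returns false and leaves that decision to the caller.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of bounded retries against an exhausted parallel pool. */
final class ParallelQueryGate {
    private final Semaphore pool;        // stands in for the parallel search resource pool
    private final int retryLimit;        // modeled on query-executor-retry-limit
    private final long retryIntervalMs;  // modeled on query-executor-retry-interval (unit assumed: ms)

    ParallelQueryGate(int poolSize, int retryLimit, long retryIntervalMs) {
        this.pool = new Semaphore(poolSize);
        this.retryLimit = retryLimit;
        this.retryIntervalMs = retryIntervalMs;
    }

    /** Returns true if a parallel slot was obtained; false once all retries are used up. */
    boolean tryAcquire() {
        if (pool.tryAcquire()) {
            return true;
        }
        for (int attempt = 1; attempt <= retryLimit; attempt++) {
            try {
                Thread.sleep(retryIntervalMs); // wait between retry attempts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
            if (pool.tryAcquire()) {
                return true;
            }
        }
        return false;
    }

    /** Returns a slot to the pool after the parallel query finishes. */
    void release() {
        pool.release();
    }
}
```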
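As a worked example of the query-summary-fragment-size relationship, assume, purely for illustration, a query-summary-display-length of 256 characters (not a documented default) together with the default fragment size of 64. Rounding the quotient down to a whole number of fragments is also an assumption.

```latex
n_{\max} = \left\lfloor \frac{\texttt{query-summary-display-length}}{\texttt{query-summary-fragment-size}} \right\rfloor
\qquad\text{e.g.}\qquad
n_{\max} = \left\lfloor \frac{256}{64} \right\rfloor = 4
```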
API Reference
CPS APIs
Content processing service APIs are available in the interface IFtAdminCPS in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.
To add a CPS instance with addCPS(String instanceName, URL url, String usage), set usage to one of
the following values: all, index, or search. If the instance is used only for CPS, use index. For
example:
addCPS("primary", new URL("http://myhost:9700/cps/ContentProcessingService?wsdl"), "index")
CPS configuration keys for setCPSConfig()
• CPS-requests-max-size: Maximum size of CPS queue
• CPS-requests-batch-size: Maximum number of CPS requests in a batch
• CPS-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
• CPS-executor-queue-size: Maximum size of CPS executor queue before spawning a new worker
thread
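CPS-requests-batch-size and CPS-thread-wait-time together describe a common batching pattern: collect requests until the batch is full or the wait time runs out, whichever comes first. The Java sketch below illustrates that pattern with a BlockingQueue; the class name is hypothetical, and this is not how setCPSConfig() or CPS is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of size-or-timeout batching (not the CPS implementation). */
final class BatchCollector {
    /**
     * Takes up to batchSize items from the queue, waiting at most waitMs in total.
     * batchSize plays the role of CPS-requests-batch-size, waitMs of CPS-thread-wait-time.
     */
    static <T> List<T> collectBatch(BlockingQueue<T> queue, int batchSize, long waitMs) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<T> batch = new ArrayList<>(batchSize);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMs);
        try {
            while (batch.size() < batchSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // wait time used up: ship whatever has accumulated
                }
                T request = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (request == null) {
                    break; // nothing arrived before the deadline
                }
                batch.add(request);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        }
        return batch;
    }
}
```

A larger batch size amortizes per-batch overhead, while a shorter wait time bounds the latency that a lone request can accumulate.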
Search APIs
Search service APIs are available in the following interfaces in the SDK jar file dsearchadmin-api.jar:
• IFtAdminSearch in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces
• IFtSearchSession in the package com.emc.documentum.core.fulltext.client.search
• IFtQueryOptions in the package com.emc.documentum.core.fulltext.common.search
Auditing APIs
The data management APIs are available in the interface IFtAdminDataManagement in the package
com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file
dsearchadmin-api.jar.
Appendix B
The dftxml Category
Extensible Documentum DTD
Documentum repository content is stored in XML format. Customer-defined elements and attributes
can be added to the DTD as children of dmftcustom. Each element specifies an attribute of the object
type. The object type is the element in the path dmftdoc/dmftmetadata/type_name, for example,
dmftdoc/dmftmetadata/dm_document.
To view the dftxml representation that is generated by the index agent, add the
following element as a child of the exporter element in indexagent.xml in
xplore_home/wildfly_version/server/DctmServer_Indexagent/deployments/IndexAgent.war/WEB-INF/classes.
<keep_dftxml>true</keep_dftxml>
To view the dftxml representation of a document that has been indexed, open xPlore administrator
and click the document in the collection view.
To find the path of a specific attribute in dftxml, use a Documentum client to look up the object ID of a
custom object. Using xPlore administrator, open the target collection and paste the object ID into the
Filter word box. Click the resulting document to see the dftxml representation.
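When you know the path of an attribute in dftxml, you can also check it programmatically with standard XPath. The sketch below uses only the JDK (javax.xml.xpath); the class name DftxmlPaths is hypothetical, and the sample documents used to exercise it are simplified. A real dftxml document, like the example later in this appendix, contains many more elements and may nest type elements differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an XPath such as /dmftdoc/dmftkey against a dftxml string. */
final class DftxmlPaths {
    static String valueAt(String dftxml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dftxml)));
            // XPath string value: text of the first matching node, or "" if nothing matches
            return XPathFactory.newInstance().newXPath().evaluate(path, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + path, e);
        }
    }
}
```

For example, valueAt(xml, "/dmftdoc/dmftkey") returns the object ID stored in dmftkey, and a path that matches nothing returns an empty string rather than failing.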
DTD
This DTD is subject to change. The following are the top-level elements under dmftdoc.
Table B.35 dftxml top-level elements
Element Description
dmftkey Contains the Documentum object ID (r_object_id).
dmftcustom Contains searchable information supplied by custom applications. Requires a TBO. See
Injecting data and supporting joins, page 89.
dmftdsearchinternals Contains tokens used by static and dynamic summaries.
Example dftxml of a custom object type
<i_branch_cnt dmfttype="dmint">0</i_branch_cnt>
<i_direct_dsc dmfttype="dmbool">false</i_direct_dsc>
<r_immutable_flag dmfttype="dmbool">false</r_immutable_flag>
<r_frozen_flag dmfttype="dmbool">false</r_frozen_flag>
<r_has_events dmfttype="dmbool">false</r_has_events>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<i_is_reference dmfttype="dmbool">false</i_is_reference>
<r_creator_name dmfttype="dmstring">Administrator</r_creator_name>
<r_is_public dmfttype="dmbool">true</r_is_public>
<r_policy_id dmfttype="dmid">0000000000000000</r_policy_id>
<r_resume_state dmfttype="dmint">0</r_resume_state>
<r_current_state dmfttype="dmint">0</r_current_state>
<r_alias_set_id dmfttype="dmid">0000000000000000</r_alias_set_id>
<a_is_template dmfttype="dmbool">false</a_is_template>
<r_full_content_size dmfttype="dmdouble">130524</r_full_content_size>
<a_is_signed dmfttype="dmbool">false</a_is_signed>
<a_last_review_date dmfttype="dmdate"/>
<i_retain_until dmfttype="dmdate"/>
<i_partition dmfttype="dmint">0</i_partition>
<i_is_replica dmfttype="dmbool">false</i_is_replica>
<i_vstamp dmfttype="dmint">0</i_vstamp>
<webpublish dmfttype="dmbool">false</webpublish>
</dm_sysobject>
</dmftmetadata>
<dmftvstamp>
<i_vstamp dmfttype="dmint">0</i_vstamp>
</dmftvstamp>
<dmftsecurity>
<acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
<acl_domain dmfttype="dmstring">Administrator</acl_domain>
<ispublic dmfttype="dmbool">true</ispublic>
</dmftsecurity>
<dmftinternal>
<docbase_id dmfttype="dmstring">658792</docbase_id>
<server_config_name dmfttype="dmstring">DSS_LH1</server_config_name>
<contentid dmfttype="dmid">060a0d688000ec61</contentid>
<r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
<r_object_type dmfttype="dmstring">techpubs</r_object_type>
<i_all_types dmfttype="dmid">030a0d68800001d7</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000129</i_all_types>
<i_all_types dmfttype="dmid">030a0d6880000105</i_all_types>
<i_dftxml_schema_version dmfttype="dmstring">5.3</i_dftxml_schema_version>
</dmftinternal>
<dmftversions>
<r_version_label dmfttype="dmstring">1.0</r_version_label>
<r_version_label dmfttype="dmstring">CURRENT</r_version_label>
<iscurrent dmfttype="dmbool">true</iscurrent>
</dmftversions>
<dmftfolders>
<i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
</dmftfolders>
<dmftcontents>
<dmftcontent>
<dmftcontentattrs>
<r_object_id dmfttype="dmid">060a0d688000ec61</r_object_id>
<page dmfttype="dmint">0</page>
<i_full_format dmfttype="dmstring">crtext</i_full_format>
</dmftcontentattrs>
<dmftcontentref content-type="text/plain" islocalcopy="true" lang="en"
encoding="US-ASCII" summary_tokens="dmftsummarytokens_0">
<![CDATA[...]]>
</dmftcontentref>
</dmftcontent>
</dmftcontents>
<dmftdsearchinternals dss_tokens="excluded">
<dmftstaticsummarytext dss_tokens="excluded"><![CDATA[mylog.txt ]]>
</dmftstaticsummarytext>
<dmftsummarytokens_0 dss_tokens="excluded"><![CDATA[1Tkns ...]]>
</dmftsummarytokens_0></dmftdsearchinternals></dmftdoc>
Note: The attribute islocalcopy indicates whether the content was indexed. If true, only the metadata
was indexed, and no copy of the content exists in the index.
Supporting XML namespaces
• Subpath definition
Include namespaces in subpath definitions to specify paths to elements with namespaces. For
example, to specify a path to the following element with a namespace:
<com:product xmlns:com="http://www.opentext.com"
dmfttype="dmstring">xPlore</com:product>
The search will only return matching results with the specified namespace defined in the subpath
definitions.
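The namespace rule above behaves like namespace-aware XPath: a prefixed step matches only elements whose namespace URI equals the URI the prefix is bound to. The following Java sketch (JDK only; the class name is hypothetical) counts matches for a path under a single prefix binding, which shows that com:product matches only when com is bound to http://www.opentext.com. It illustrates the matching rule and is not the xPlore subpath implementation.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: counts nodes matched by an XPath that uses one prefix binding. */
final class NamespacedPath {
    static int countMatches(String xml, String prefix, String uri, String path) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespace URIs on parsed elements
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String p) {
                    // Only the one configured prefix is bound; anything else maps to "no namespace".
                    return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String u) {
                    return uri.equals(u) ? prefix : null;
                }

                @Override
                public Iterator<String> getPrefixes(String u) {
                    if (uri.equals(u)) {
                        return Collections.singletonList(prefix).iterator();
                    }
                    return Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(path, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + path, e);
        }
    }
}
```

The setNamespaceAware(true) call matters: without it, the parsed elements carry no namespace URIs and the prefixed path does not match as expected.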
• Linguistic processing related configurations
When you configure linguistic processing, such as static summaries and language identification,
you can specify elements with namespaces so that CPS handles such documents.
For example, for the following element with a namespace in the dftxml document:
<ngis:subject xmlns:ngis="http://www.ngis.com"
dmfttype="dmstring">subject with namespace</ngis:subject>
To define the subject element as a static summary element, in indexserverconfig.xml, configure
the setting as follows:
...
<elements-for-static-summary max-size="65536">
<element-name name="{http://www.ngis.com}subject"/>
</elements-for-static-summary>
...
To configure the subject element to be used for language identification, in indexserverconfig.xml,
configure the setting as follows:
...
<linguistic-process>
<element-for-language-identification name="{http://www.ngis.com}subject"/>
</linguistic-process>
...
After this configuration, CPS processes only subject elements in the ngis namespace, not subject
elements with no namespace or a different namespace.
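The name attributes above write a namespaced element as {namespace-URI}local-name, a convention often called Clark notation. Java's javax.xml.namespace.QName uses the same form in toString() and valueOf(), so a sketch like the following (class name hypothetical) can generate these values and compare them with namespace-strict equality. Whether xPlore uses QName internally is not stated in this guide.

```java
import javax.xml.namespace.QName;

/** Hypothetical helpers for the {namespace-URI}local-name form used in indexserverconfig.xml. */
final class ClarkNames {
    static String format(String namespaceUri, String localName) {
        // Yields "{uri}local", or just "local" when the namespace URI is empty
        return new QName(namespaceUri, localName).toString();
    }

    static QName parse(String clarkName) {
        // Accepts "{uri}local" or a bare "local" (no namespace)
        return QName.valueOf(clarkName);
    }

    /** True only when namespace URI and local name both match; prefixes are ignored. */
    static boolean sameElement(String configuredName, QName actual) {
        return parse(configuredName).equals(actual);
    }
}
```

Because QName equality ignores the prefix, ngis:subject and any other prefix bound to http://www.ngis.com compare equal, while a subject element with no namespace does not.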
Appendix C
XQuery and VQL Reference
∙ Tracking XQueries
∙ VQL and XQuery Syntax Equivalents
Tracking XQueries
Object count from tracking DB
For example:
for $i in collection("dsearch/SystemInfo")
return count($i//trackinginfo/document)
Get object count in library
For example:
count(//trackinginfo/document[library-path="<LibraryPath>"])
Find documents
For example:
for $i in collection("dsearch/SystemInfo")
where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
return $i//trackinginfo/document/collection-name
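If you run the Find documents query for many IDs, building it from a template avoids quoting mistakes. The sketch below is a hypothetical Java helper that reproduces the query text above and escapes the ID as an XQuery string literal (a double quote is doubled and & becomes &amp;, following the XQuery string-literal rules). How you submit the resulting string to xPlore is outside the scope of the sketch.

```java
/** Hypothetical helper that fills in the "Find documents" tracking query for a document ID. */
final class TrackingQueries {
    /** Quotes a value as a double-quoted XQuery string literal. */
    static String stringLiteral(String value) {
        // Escape '&' first so the '&' introduced by "&amp;" is not escaped again
        return "\"" + value.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    /** Same query shape as the Find documents example, with the ID substituted safely. */
    static String findCollectionName(String documentId) {
        return "for $i in collection(\"dsearch/SystemInfo\")\n"
                + "where $i//trackinginfo/document[@id=" + stringLiteral(documentId) + "]\n"
                + "return $i//trackinginfo/document/collection-name";
    }
}
```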
VQL and XQuery Syntax Equivalents
DQL XQuery
IN
  for $i in collection('/XX/dsearch/Data')/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ('test1')) ]
NEAR/N
  for $i in collection('/XX/dsearch/Data')/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ('test1' ftand 'test2'
  distance exactly N words)) ]
ORDERED
  for $i in collection('/XX/dsearch/Data')/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ('test1' ftand 'test2')
  ordered) ]
ENDS
  let $result := ( for $i in collection('/XX/dsearch/Data')/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ('test1')) and
  (ends-with(dmftmetadata/dm_sysobject/object_name, 'test2'))]
STARTS
  for $i in collection('/XX/dsearch/Data')/dmftdoc[
  (dmftcontents/dmftcontent ftcontains ('test1')) and
  starts-with(dmftinternal/r_object_type, 'dm_docu')]
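The IN and NEAR/N rows follow a fixed pattern, so they can be generated from DQL-style terms. The Java sketch below is hypothetical (the class and method names are not an xPlore API). It keeps the collection path /XX/dsearch/Data exactly as the table shows it, escapes single quotes in terms by doubling them, and, as an assumption, appends return $i, which the table rows omit, so that each string is a complete FLWOR expression.

```java
/** Hypothetical builders for two rows of the DQL-to-XQuery table (IN and NEAR/N). */
final class DqlToXQuery {
    // Collection path copied from the table; "/XX" is left exactly as the table shows it.
    private static final String DATA = "collection('/XX/dsearch/Data')/dmftdoc";

    /** Quotes a term as a single-quoted XQuery string literal. */
    static String quote(String term) {
        return "'" + term.replace("&", "&amp;").replace("'", "''") + "'";
    }

    /** DQL IN: full-text match on document content. */
    static String in(String term) {
        return "for $i in " + DATA
                + "[(dmftcontents/dmftcontent ftcontains (" + quote(term) + "))] return $i";
    }

    /** DQL NEAR/N, using the distance filter shown in the table. */
    static String near(String first, String second, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("distance must be non-negative: " + n);
        }
        return "for $i in " + DATA
                + "[(dmftcontents/dmftcontent ftcontains (" + quote(first) + " ftand "
                + quote(second) + " distance exactly " + n + " words))] return $i";
    }
}
```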
xPlore Glossary
Term Description
category
A category defines a class of documents and
their XML structure.
collection
A collection is a logical group of XML
documents that is physically stored in an
xDB library. A collection represents the most
granular data management unit within xPlore.
content processing service (CPS)
The content processing service (CPS)
retrieves indexable content from content
sources and determines the document format
and primary language. CPS parses the content
into index tokens that xPlore can process into
full-text indexes.
domain
A domain is a separate, independent group of
collections within an xPlore deployment.
DQL
Documentum Query Language, used by many
Content Server clients.
FTDQL
Full-text Documentum Query Language
ftintegrity
A standalone Java program that checks index
integrity against Content Server repository
documents. The ftintegrity script calls the
state of index job in the Content Server.
full-text index
Index structure that tracks terms and their
occurrence in a document.
index agent
Documentum application that receives
indexing requests from the Content Server.
The agent prepares and submits an XML
representation of the document to xPlore for
indexing.
ingestion
Process in which xPlore receives an XML
representation of a document and processes
it into an index.
instance
An xPlore instance is one deployment of the
xPlore WAR file to an application server
container. You can have multiple instances on
the same host (vertical scaling), although it is
more common to have one xPlore instance
per host (horizontal scaling). The following
processes can run in an xPlore instance: CPS,
indexing, search, and xPlore administrator.
lemmatization
Lemmatization is a normalization process in
which the lemmatizer finds a canonical or
dictionary form for a word, called a lemma.
Content that is indexed is also lemmatized
unless lemmatization is turned off. Terms in
search queries are also lemmatized unless
lemmatization is turned off.
Lucene
Apache open-source, Java-based full-text
indexing and search engine.
node
In xPlore and xDB, node is sometimes used
to denote instance. It does not denote host.
persistence library
Saves CPS, indexing, and search metrics.
Configurable in indexserverconfig.xml.
state of index job
Content Server configuration installs the state
of index job dm_FTStateOfIndex. This job is
run from Documentum Administrator. The
ftintegrity script calls this job, which reports
on index completeness, status, and indexing
failures.
status library
A status library reports on indexing status for
a domain. There is one status library for each
domain.
stop words
Stop words are common words filtered out of
queries to improve query performance. Stop
words can be searched when used in a phrase.
text extraction
Identification of terms in a content file.
token
Piece of an input string defined by semantic
processing rules.
tracking library
An xDB library that records the object IDs
and location of content that has been indexed.
There is one tracking database for each
domain.
transactional support
Small in-memory indexes are created in
rapid transactional updates, then merged into
larger indexes. When an index is written
to disk, it is considered clean. Committed
and uncommitted data before the merge is
searchable along with the on-disk index.
watchdog service
Installed by the xPlore installer, the watchdog
service pings all xPlore instances and sends
an email notification to the administrator
when an instance does not respond. The
watchdog service can also be configured to
automatically restart the index agent when it
has stopped working.
xDB
xDB is a database that enables high-speed
storage and manipulation of many XML
documents. In xPlore, an xDB library stores
a collection as a Lucene index and manages
the indexes on the collection. The XML
content of indexed documents can optionally
be stored.
XQFT
W3C full-text XQuery and XPath extensions
described in XQuery and XPath Full Text 1.0.
Support for XQFT includes logical full-text
operators, wildcard option, anyall option,
positional filters, and score variables.
XQuery
W3C standard query language that is designed
to query XML data. xPlore receives XQuery
expressions that are compliant with the
XQuery standard and returns results.