OpenText Document Pipelines 16.2 - Programming Guide English
Programming Guide
AR160200-PDP-EN-1
OpenText™ Document Pipelines
Rev.: 23/Oct/2019
This documentation has been created for software version 16.2.
It is also valid for subsequent software versions as long as no new document version is shipped with the product or is
published at https://knowledge.opentext.com.
Tel: +1-519-888-7111
Toll Free Canada/USA: 1-800-499-6544 International: +800-4996-5440
Fax: +1-519-888-0677
Support: https://support.opentext.com
For more information, visit https://www.opentext.com
One or more patents may cover this product. For more information, please visit https://www.opentext.com/patents.
Disclaimer
Every effort has been made to ensure the accuracy of the features and techniques presented in this publication. However,
Open Text Corporation and its affiliates accept no responsibility and offer no warranty whether expressed or implied, for the
accuracy of this publication.
Table of Contents
Part 1 Introduction
9 perldtn
9.1 Command Line Options
9.2 Running perldtn as Standard Doctool
9.3 Functions for perldtn as Standard DocTool
9.3.1 doBeforeConnect
9.3.2 service
9.3.3 printObject
9.3.4 control
10 perldte
10.1 Command Line Options
10.2 Running perldte as Enqueueing Doctool
10.3 Functions for perldte as Standard DocTool
10.3.1 doBeforeConnect
10.3.2 service
12 Examples
12.1 Enqueueing Documents into a OpenText Document Pipeline with perldte
12.2 Using perldtn as DocTool Running in a Document Pipeline
12.2.1 Modules and Functions
12.2.2 Sample Script
12.2.3 Sample COMMANDS Files
GLS Glossary
This part of the Programming Guide provides you with general information about
this guide and how you get further documentation concerning OpenText Document
Pipelines as well as product information about OpenText products.
Before you start, you need to understand the basic concepts of OpenText Document
Pipelines. For a general introduction, read the guide OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).
Practical knowledge of Perl programming is required if you want to write your own
DocTools.
“Introduction” on page 5
This part describes the contents of this guide and the conventions used, and
gives an overview of the documentation.
“General Document Pipeline Process” on page 13
This part explains basic concepts for the OpenText Document Pipeline and
describes the Document Pipeline process on the server.
“Reconfiguring Document Pipelines” on page 29
This part describes the standard configuration files, how to insert DocTools in a
Document Pipeline and how to remove them. It also explains how to improve
the performance of DocTools.
“perldtn and perldte” on page 59
This part gives an introduction to the DocTools perldtn and perldte, helping you
to write your own DocTools. It also provides information on OpenText-specific
Perl modules and sample scripts.
1.2 Conventions
User interface
This format is used for elements in the graphical user interface (GUI), such as
buttons, names of icons, menu items, and fields.
Filenames, commands, and sample data
This format is used for file names, paths, URLs, and commands at the command
prompt. It is also used for example data, text to be entered in text boxes, and
other literals.
Note: If you copy command line examples from a PDF, be aware that PDFs
can contain hidden characters. OpenText recommends that you copy from
the HTML version of the document, if it is available.
KEY NAMES
Key names appear in ALL CAPS, for example:
Press CTRL+V.
<Variable name>
Angled brackets < > are used to denote a variable or placeholder. The user
replaces the brackets and the descriptive content with the appropriate value. For
example, <server_name> becomes serv01.
Internal cross-references
Click the cross-reference to go directly to the reference target in the current
document.
External cross-references
External cross-references are usually text references to other documents.
However, if a document is available in HTML format, for example, in the My
Support, external references may be active links to a specific section in the
referenced document.
Warnings, notes, and tips
Caution
Cautions help you avoid irreversible problems. Read this information
carefully and follow all instructions.
Important
Important notes help you avoid major problems.
Directories
The following variables for installation and configuration directories are used:
<ECM_DP_PERL_10_1_1>
Installation directory for Document Pipeline Perl, in this case Perl 10.1.1. You
can install different versions of Document Pipeline Perl in parallel.
Example for Windows:
C:\Program Files\Open Text\Document Pipeline Perl_10_1_1.
<ECM_DOCUMENT_PIPELINE_BASE>
Installation directory for Document Pipeline Base.
Example for Windows:
C:\Program Files\Open Text\Document Pipeline Base.
<ECM_DOCUMENT_PIPELINE_INFO>
Installation directory for Document Pipeline Info (DPInfo).
Example for Windows:
C:\Program Files\Open Text\Document Pipeline Info.
<ECM_LOG_DIR>
Location of log files.
Example for Windows:
C:\Documents and Settings\All Users\Application Data\Open Text\var\LogDir
<ECM_VAR_DIR>
Location of protocols for enqueueing.
Example for Windows:
C:\Documents and Settings\All Users\Application Data\Open Text\var
<ECM_DOCUMENT_PIPELINE_CONF>
Global OpenText Document Pipeline configuration directory.
Example for Windows:
C:\Documents and Settings\All Users\Application Data\Open Text\BASE Document Pipeline
2. Execute set_ECM.
Release Notes
The Release Notes describe in detail the software supported with the product
and important dependencies, as well as any last-minute changes regarding the
documentation that should be made known. The current version of the
OpenText Document Pipelines Release Notes is available via the OpenText My
Support at https://knowledge.opentext.com/knowledge/llisapi.dll/open/14711375.
Depending on the OpenText products you use, you may also need the
Release Notes of other products that are available in the My Support.
Installation Guides
The installation guides describe the standard installation of the components
required. In particular, the following guides contain installation instructions for
OpenText Document Pipelines:
Document Pipeline Base, SAP, DocuLink, and ELS
OpenText Document Pipeline - Installation and Upgrade Guide (AR-IDPDP), see
Document Pipeline Downloads in OpenText My Support
(https://knowledge.opentext.com/knowledge/llisapi.dll/open/14711031).
TCP Document Pipelines
OpenText Transactional Content Processing - Installation Guide (TCP-IGD), see
OpenText Transactional Content Processing in OpenText My Support
(https://knowledge.opentext.com/knowledge/llisapi.dll/open/14503093)
Document Pipeline for Content Server
Section 6.4.4 “Installing Document Pipeline for Content Server” in OpenText
Imaging Enterprise Scan - Installation Guide (CLES-IGD)
File System Archiving Document Pipeline
OpenText File System Archiving - Installation Guide (FA-IGD)
• Communities
• Knowledge Center
• Usage tips, help files, and best practices for customers and partners.
• Information on product releases.
• User groups and forums where you can ask questions to OpenText experts.
If you need additional assistance, you can find OpenText Corporate Support
Contacts at http://support.opentext.com/.
This part describes the concept of Document Pipelines and the components, files and
environment variables that are involved in the configuration and operation of
pipelines.
Conveyor belt analogy
A Document Pipeline is the basic component in almost all document processing
software and is used, for instance, to transfer documents to a storage system or
another application while performing certain additional tasks. Figuratively speaking,
a Document Pipeline is the conveyor belt that transfers the documents through the
software. Individual tools (called DocTools) along the way retrieve the documents
from the conveyor belt, process them one by one, and then return them to be
processed by the next tool. The last tool in the pipeline generally removes the
document from the conveyor belt. Depending on the configuration, Document
Pipelines can contain different DocTools to implement many kinds of
document processing, and further tools can be added as required.
Transactional
An important principle for all Document Pipelines is that processing is always
transactional. That means the processing status of the document is always defined –
either it has been processed by a specific DocTool or not – and no documents can get
lost. If for any reason the Document Pipeline is aborted or processing is cancelled at
any time, the document is considered to be unprocessed by the last active DocTool.
The current status is retained at all times. Therefore, when the Document Pipeline is
started again, processing can continue at precisely the same step the document was
at when the program was aborted.
Queues and statuses
Queues are present “between” the DocTools to hold the waiting documents. While
a document is in the Document Pipeline, it is always located in one distinct queue.
The documents are processed in the order they are enqueued into the Document
Pipeline. While a DocTool processes a document, the next documents wait in the
input queue of the DocTool until they can be processed. After successful processing,
the document is assigned to the target queue specified in the Document Pipeline
definition. This queue is the input queue of the next DocTool in the Document
Pipeline. If an error occurs during document processing, the document is assigned to
an error queue. This means that each DocTool has at least two queues assigned: the
input queue and the error queue. These queues do not exist physically.
Queue                                           Status
Source queue                                    Document is waiting to be processed
Target queue (≙ source queue of the next tool)  Document has been processed successfully by previous tool; waiting for next
Error queue                                     Document processing failed; error processing required
A document queue has a FIFO (First In – First Out) structure; that is, documents
entering a queue first will be assigned to a DocTool first.
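The queue model described above can be sketched in a few lines of Python. This is an illustration only, not the Document Pipeliner's implementation: the names `ToolQueues`, `run_tool`, and the `process` callback are hypothetical, and the real queues are internal lists kept by the DP.

```python
from collections import deque

class ToolQueues:
    """Hypothetical model of one DocTool's input (source) queue and error queue."""

    def __init__(self):
        self.source = deque()   # documents waiting to be processed (FIFO)
        self.error = deque()    # documents whose processing failed

def run_tool(queues, target, process):
    """Process all waiting documents in FIFO order.

    `process` is an assumed callback that returns True on success.
    Successful documents move to `target` (the next tool's source queue);
    failed documents move to this tool's error queue.
    """
    while queues.source:
        doc = queues.source.popleft()   # first in, first out
        if process(doc):
            target.append(doc)
        else:
            queues.error.append(doc)

tool1, tool2 = ToolQueues(), ToolQueues()
tool1.source.extend(["doc_a", "doc_b", "doc_c"])
run_tool(tool1, tool2.source, lambda doc: doc != "doc_b")
print(list(tool2.source))  # ['doc_a', 'doc_c']
print(list(tool1.error))   # ['doc_b']
```

At any time each document is in exactly one queue, which matches the statement that a document is always located in one distinct queue.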
General process
Figure 2-1 shows the general process in a Document Pipeline, which consists of the
following steps:
1. A document is taken from a defined exchange directory and placed in the initial
source queue by a special tool called the enqueue tool. The enqueue tool performs
these steps:
2. The Document Pipeliner reads the Document Pipeline configuration and calls
the DocTool with the parameter Document Directory.
3. The DocTool processes the document and returns an operation code for success
or error and the name of the Document Directory. This operation code is set as
status of the document. You can see the results in the OpenText Document
Pipeline Info (DPInfo) window. If the operation was successful, it is placed in
the target queue, which corresponds with the source queue of the next DocTool
in the processing chain.
4. If an error occurs during processing, the document is instead placed in the error
queue of the DocTool and proceeds as follows:
5. Steps 2 to 4 are repeated for all existing DocTools in the Document Pipeline.
[Figure 2-1: General process in a Document Pipeline. Elements shown: EXT_DIR, enqueue tool, DocTool 1, DocTool 2, docrm; source queues 1 to n, target queue 1, and error queues; step labels 1a, 1b, 2, 3, 5, 6.]
Note: No document is physically copied but there are internal lists, which
represent the various queues for the different DocTools in the pipeline.
Logging the status
Before moving a document to the next queue, a time stamp and the input queue
name of the first DocTool are entered in its status file (DPqStatus). The status file is
transferred together with the document and contains a record of all the processing
steps that the document has already gone through, together with the corresponding
time stamps. The last line always reflects the current queue name of the document.
This file is mainly used internally for recovery after a disturbance in the Document
Pipeline process, as it enables the pipeline to continue processing the document at
precisely the step at which it was stopped.
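The following sketch illustrates the idea of an append-only status log whose last line names the current queue. The file name DPqStatus comes from the text; the line layout (time stamp, tab, queue name) and the function names are assumptions made for this example, since the actual file format is not described here.

```python
import os
import tempfile
from datetime import datetime, timezone

STATUS_FILE = "DPqStatus"  # name from the text; the line format below is assumed

def log_queue_change(doc_dir, queue_name):
    """Append '<timestamp><TAB><queue>' to the status file (assumed layout)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(os.path.join(doc_dir, STATUS_FILE), "a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{queue_name}\n")

def current_queue(doc_dir):
    """Return the queue named on the last line, or None if nothing is logged."""
    path = os.path.join(doc_dir, STATUS_FILE)
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        lines = [line for line in fh.read().splitlines() if line]
    return lines[-1].split("\t", 1)[1] if lines else None

doc_dir = tempfile.mkdtemp()
log_queue_change(doc_dir, "queue_1")   # hypothetical queue names
log_queue_change(doc_dir, "queue_2")
print(current_queue(doc_dir))  # queue_2
```

Because entries are only ever appended, the file doubles as a processing history and as the recovery point.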
Monitoring the process
You can monitor each step in the Document Pipeline processing chain in a special
tool called Document Pipeline Info. Document Pipeline Info shows how many
documents are waiting to be processed by each DocTool. In case of errors, you can
determine which error queues contain documents. For details on the Document
Pipeline Info monitoring tool, see OpenText Document Pipelines - Overview and Import
Interfaces (AR-CDP).
The DP acts as a dispatcher and provides each DocTool with documents to process -
that is, with the path name for the document directories in the <DPDIR>. Once a
DocTool has finished processing a document, it notifies the DP that it is done by
sending it the path name of the document directory and an identifier for the
operation that has been performed. The DP tracks the fact that the document has
been processed by the DocTool in the DPqStatus file, together with the current time
stamp. Afterwards, the DP enters the document into the source queue of the next
DocTool.
The DPqStatus file is used for error recovery. For example, after a crash and
restart, the Document Pipeliner reads the entries for each document in the
DPqStatus file. With the information in these entries, the Document Pipeliner
assigns each document to the queue in which the document was before the crash.
For technical details on the DP, see “Deploying Document Pipelines” on page 21.
You can find a functional description of the most common DocTools in OpenText
Document Pipelines - Overview and Import Interfaces (AR-CDP).
dpconfig
Which DocTools are executed in which order is defined in a special configuration
file (dpconfig) for each Document Pipeline. Each DocTool retrieves the documents
from a specified source queue, carries out an operation, and - depending on the
result of the operation - sends the documents on to a specified target queue or an
error queue. For each possible processing step, an entry is made in the dpconfig file.
For details on the configuration file, see “dpconfig” on page 31.
DocTool types
If the same DocTool is to be executed several times, but with different queues as
input and output sources, you can define DocTool types. DocTool types can be
distinguished by their names. Defining DocTool types is helpful, for instance, if the
same DocTool is used more than once in the same Document Pipeline, allowing for
parallel processing in one pipeline step.
COMMANDS file
As the documents are transferred from one DocTool to another, a means of
communication between the tools is required. Therefore, a COMMANDS file is
created for each document. The COMMANDS file is stored with the document in its
document directory. It contains processing information for the document and can be
extended by any DocTool to include information or parameters for a subsequent
tool, for example a document ID after storing a document to an archive, or
formatting instructions for an XML file. Each entry in the COMMANDS file starts with a
special keyword. This allows each DocTool to scan the file for those keywords that
are relevant to it.
For more information on the COMMANDS file, see OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).
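As a rough illustration of keyword-based scanning, the sketch below filters a COMMANDS-like text for the entries a given tool cares about. It assumes one entry per line in the form `<KEYWORD> <value>`; the real COMMANDS syntax is documented in AR-CDP and may differ. The keywords and values shown are invented for the example.

```python
def scan_commands(text, relevant_keywords):
    """Return {keyword: [values]} for lines whose first word is relevant.

    Assumes one entry per line in the form '<KEYWORD> <value...>'.
    """
    found = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts or parts[0] not in relevant_keywords:
            continue  # blank line, or an entry meant for another DocTool
        value = parts[1] if len(parts) > 1 else ""
        found.setdefault(parts[0], []).append(value)
    return found

# Invented keywords and values, for illustration only.
sample = "EXAMPLE_ARCHIVE A1\nEXAMPLE_DOCID 12345\nEXAMPLE_OTHER x y\n"
result = scan_commands(sample, {"EXAMPLE_DOCID", "EXAMPLE_ARCHIVE"})
print(result)  # {'EXAMPLE_ARCHIVE': ['A1'], 'EXAMPLE_DOCID': ['12345']}
```

Entries with unknown keywords are skipped rather than rejected, so one tool can add parameters for a later tool without affecting the tools in between.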
Within the exchange document directory, the following files are required for the
document to be processed by the Document Pipeline:
• The data file(s) belonging to the document, for example, data, data1, data2, etc.
All files in a document must be in the same directory.
• One of the following files containing the attributes of the document:
– The IXATTR file. For more information, see Section 7.1.2 “Providing the
attributes in an IXATTR file” in OpenText Document Pipelines - Overview and
Import Interfaces (AR-CDP).
– An XML file together with an XSL style sheet. For more information, see Section
7.1.3 “Providing the attributes in XML” in OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).
• The COMMANDS file, containing the processing information and parameters for the
DocTools. For more information, see Section 7.3 “COMMANDS file” in OpenText
Document Pipelines - Overview and Import Interfaces (AR-CDP).
• The indicator file LOG that indicates that the directory is ready to be processed.
The document will be processed by the enqueue tool only if this file is available in
the document directory.
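A provider script might verify a document directory before writing the LOG indicator. The following sketch checks for the files listed above; the function name `missing_files` is hypothetical, and for simplicity it covers only the IXATTR variant of the attribute file (the XML plus XSL variant would need a different check).

```python
import os
import tempfile

def missing_files(doc_dir, attribute_file="IXATTR"):
    """Return the names of required files that are absent from doc_dir."""
    names = set(os.listdir(doc_dir))
    missing = []
    if not any(name.startswith("data") for name in names):
        missing.append("data")           # data, data1, data2, ...
    for required in (attribute_file, "COMMANDS", "LOG"):
        if required not in names:
            missing.append(required)
    return missing

doc_dir = tempfile.mkdtemp()
for name in ("data", "IXATTR", "COMMANDS"):
    open(os.path.join(doc_dir, name), "w").close()
before = missing_files(doc_dir)
open(os.path.join(doc_dir, "LOG"), "w").close()   # written last: marks the directory as ready
after = missing_files(doc_dir)
print(before, after)  # ['LOG'] []
```

Writing LOG only after all other files are complete keeps the enqueue tool from picking up a half-written directory.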
Note: Although the general concept for providing documents to the pipeline is
similar for all import scenarios, there are minor differences for Batch import
with attribute extraction. The IXATTR file containing the attributes is
created internally by the first DocTools in the pipeline, and the COMMANDS file is
provided centrally for all documents of the same type in a special configuration
file directory. For more information, see Section 7.2.3 “Providing the
documents for Batch import with attribute extraction” in OpenText Document
Pipelines - Overview and Import Interfaces (AR-CDP).
There are several ways to provide the document directories with the
required files listed above:
• Scanning and storing the files directly from an appropriately configured scan
application (described in the corresponding application documentation). For
more information about Scan pipelines, see Section 12.5.8 “Archiving with the
Document Pipeline for SAP” in OpenText Imaging Enterprise Scan - User and
Administration Guide (CLES-UGD) or Section 12.5.9 “Archiving with the
Document Pipeline for TCP” in OpenText Imaging Enterprise Scan - User and
Administration Guide (CLES-UGD).
• Using the Batch import with attributes provided in advance scenario. The
required attribute files are created beforehand. For more information, see
OpenText Document Pipelines - Overview and Import Interfaces (AR-CDP).
• Using the Batch import with attribute extraction scenario - primarily used
for processing document and print lists, also referred to as the COLD scenario:
the required index files are created automatically. For more information, see
Section 7.2 “Batch import with extraction of attributes (COLD)” in OpenText
Document Pipelines - Overview and Import Interfaces (AR-CDP).
Note: The scenario used to provide the documents is reflected by the name of
the Document Pipeline. The first two letters of the name have the following
meaning:
For example, the Document Pipeline named COR3 uses the COLD (attribute
extraction) scenario to transfer documents to an SAP R/3 (or any other SAP)
system.
From the exchange directory, the document files are transferred to the DP document
directory using a special (enqueue) tool.
Enqueue Tool
The client provides the documents for processing by executing an enqueue tool
(usually as a scheduled job). This tool performs the following steps:
Once processing has started, the DPqStatus file is also located in this directory.
Important
For standard Document Pipeline scenarios that have been fully installed and
configured, no further customizing is required by the user to deploy the
pipelines on the server. Simply ensure that the Spawner service is running
properly, provide the documents to be processed in the specified manner, and
start the pipeline (for instance by scheduling a job; see “Starting the Document
Pipelines Using Jobs” on page 27). Additional environment settings are
available to improve performance or adapt to specific requirements; these are
described in “Environment Variables” on page 24. For new or customer-
specific pipelines, additional configuration tasks are always required. For
information, ask your OpenText consultant.
Starting the DP
The Document Pipeliner (DP), which controls and administers the Document
Pipeline processes, is provided as a Spawner component and is included in the
Archive Center installation. The DP is started when the Spawner opens the
corresponding configuration file (30dp.servtab), and then waits for DocTools to
register with it. For details on the Spawner, see also Section 32.2 “Analyzing
processes with spawncmd” in OpenText Archive Center - Administration Guide (AR-ACN).
Registering the DocTools
The DocTools are configured in further .servtab files and must be started after the
DP in order to work properly. This is ensured by adhering to the naming
conventions for the servtab files; these files are processed in alphabetical order.
Once started, the DocTools try to register with the DP by sending their DocTool type
and the function that is to be called by the DP when a document is waiting to be
processed (service function). Different types of the same DocTool must be registered
individually; see also “DocTool types” on page 18.
When the DP receives a request for registration, it checks the DP configuration files
in the <ECM_DOCUMENT_PIPELINE_CONF>/dpconfig directory to determine which
DocTool types are employed in the Document Pipelines. If the DP finds an entry for
the DocTool, the registration is accepted; if not, it is rejected. Registration is
important because during document processing, documents are only sent to
DocTools that have been registered with the DP. If a DocTool is stopped, it signs off
from the DP first so that the DP no longer sends documents to that tool.
Tip: To find out which DocTools are registered and active, you can use the
dpctrl command line tool; see “dpctrl” on page 48.
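The registration handshake can be pictured as follows. This is a simplified model, not the DP's actual protocol: the class `Registrar`, its methods, and the type names `tool_a`, `tool_b`, and `tool_x` are hypothetical.

```python
class Registrar:
    """Hypothetical model of how the DP accepts or rejects DocTool registrations."""

    def __init__(self, configured_types):
        self.configured = set(configured_types)  # DocTool types found in dpconfig
        self.registered = {}                     # type -> service function

    def register(self, doctool_type, service):
        """Accept the registration only if the type appears in the configuration."""
        if doctool_type not in self.configured:
            return False
        self.registered[doctool_type] = service
        return True

    def sign_off(self, doctool_type):
        """Forget the tool so that it receives no further documents."""
        self.registered.pop(doctool_type, None)

    def dispatch(self, doctool_type, doc_path):
        """Call the service function of a registered tool, or return None."""
        service = self.registered.get(doctool_type)
        return service(doc_path) if service else None

dp = Registrar(configured_types={"tool_a", "tool_b"})
accepted = dp.register("tool_a", lambda path: f"processed {path}")
rejected = dp.register("tool_x", lambda path: None)
print(accepted, rejected)                 # True False
first = dp.dispatch("tool_a", "/dp/doc1")
dp.sign_off("tool_a")
second = dp.dispatch("tool_a", "/dp/doc2")
print(first, second)                      # processed /dp/doc1 None
```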
Log files
If registration fails, an entry is made in the DP log file, as well as in the
corresponding DocTool log file. The log files can be found here:
DP log files
Windows
<ECM_LOG_DIR>\DP.log
UNIX/Linux
<ECM_LOG_DIR>/DP.log
UNIX/Linux
<ECM_LOG_DIR>/<doctool_name>.log
When all DocTools have registered, the DP is ready to process documents, and runs
in the background.
Using enqueue tool
The client provides the documents for processing by executing an enqueue tool. This
tool is the first DocTool of most Document Pipelines and is usually started by
scheduling the execution of the corresponding pipeline as a regular job
(start<pipeline_name>, for example startEXR3) in the OpenText Administration
Client; see Section 7 “Configuring jobs and checking job protocol” in OpenText
Archive Center - Administration Guide (AR-ACN). This tool checks all directories in
the specified exchange directory until it finds one that contains an indicator file
named LOG. The LOG file indicates that the document directory is complete and ready
to be processed.
The enqueue tool then copies the documents from the specified exchange directory
to a defined document directory (by default, <DPDIR>/<providing_server_name>/
m) on the server the Document Pipeline is installed on (provided the client has write
access to the <DPDIR>). For each document, a subdirectory with a unique name is
created, which then uniquely identifies the document. The enqueue tool also sends
the document path to the DP, and informs the DP which pipeline is responsible for
processing.
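The copy step can be sketched as below. This is not the behavior of the shipped enqueue tools in detail: the function name is hypothetical, the unique directory name is generated with `uuid` as an assumption, and the sketch neither notifies a DP nor cleans up the exchange directory.

```python
import os
import shutil
import tempfile
import uuid

def enqueue_ready_documents(exchange_dir, dp_dir):
    """Copy every subdirectory that contains a LOG file into dp_dir.

    Each copy gets a unique directory name (uuid4 here, an assumption).
    Returns the list of new document paths.
    """
    enqueued = []
    for name in sorted(os.listdir(exchange_dir)):
        source = os.path.join(exchange_dir, name)
        if not os.path.isfile(os.path.join(source, "LOG")):
            continue                      # not a complete document directory yet
        target = os.path.join(dp_dir, uuid.uuid4().hex)
        shutil.copytree(source, target)
        enqueued.append(target)
    return enqueued

exchange = tempfile.mkdtemp()
dp_dir = tempfile.mkdtemp()
for name, ready in (("doc1", True), ("doc2", False)):
    os.makedirs(os.path.join(exchange, name))
    open(os.path.join(exchange, name, "data"), "w").close()
    if ready:
        open(os.path.join(exchange, name, "LOG"), "w").close()
paths = enqueue_ready_documents(exchange, dp_dir)
print(len(paths))  # 1
```

Only `doc1` is copied, because `doc2` has no LOG indicator yet.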
Document processing by the DP
The DP then sends the document path to the input queue of the first DocTool of the
specified pipeline, as defined in the corresponding dpconfig file. Thus, the DocTool
knows where the document components are located and can begin processing.
When processing is complete, the first DocTool notifies the DP about the status of
the operation (opcode) via the DPqStatus file. Depending on this opcode, the DP then
sends the document path to the next input queue. This continues until the document
has passed through the entire pipeline, at which time document processing is
completed.
Queue processing
Within a single queue, documents entering this queue first are assigned to the
DocTool first. By default, only one DocTool at a time can process documents from
the same queue. That means a queue is blocked as soon as a DocTool starts to access
it. The DP will not provide documents from that queue to a second DocTool.
However, you can specifically define a queue as a non-blocking queue in the
dpconfig file. In this case, several DocTools may read from the same queue
simultaneously. Only the document that is currently being processed is blocked,
thus avoiding a situation in which the same document is processed simultaneously
by several DocTools.
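The difference between blocking and non-blocking queues can be modeled roughly as follows. The class `PipelineQueue` is hypothetical, and "blocked" is approximated here as "at least one document from this queue is currently being processed".

```python
from collections import deque

class PipelineQueue:
    """Hypothetical model of a blocking or non-blocking queue."""

    def __init__(self, docs, blocking=True):
        self.docs = deque(docs)
        self.blocking = blocking
        self.in_progress = set()   # documents currently handed to a DocTool

    def request(self):
        """Hand out the next document, or None if the queue is blocked or empty."""
        if self.blocking and self.in_progress:
            return None            # another DocTool is already working on this queue
        if not self.docs:
            return None
        doc = self.docs.popleft()  # FIFO; the document leaves the waiting list,
        self.in_progress.add(doc)  # so no second DocTool can receive it
        return doc

    def done(self, doc):
        """Mark a document as finished, which unblocks a blocking queue."""
        self.in_progress.discard(doc)

blocking = PipelineQueue(["d1", "d2"])
b1, b2 = blocking.request(), blocking.request()
print(b1, b2)   # d1 None
shared = PipelineQueue(["d1", "d2"], blocking=False)
s1, s2 = shared.request(), shared.request()
print(s1, s2)   # d1 d2
```

In both variants a document is handed out at most once, so two DocTools never work on the same document at the same time.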
On the other hand, several queues may be assigned to the same DocTool. In this
case, there are different modes to handle the order in which the queues are
processed; see also queuetime, doctime on page 33. Using the standard
configuration, input is provided to the DocTool by the queue that has not been read
from for the longest time (queuetime mode). This mode ensures a balanced
processing of queues.
DP recovery
If for any reason the DP process is interrupted, processing is automatically resumed
after the DP is restarted. In this case, the defined <DPDIR> is searched to find any
directories containing a DPqStatus file. As this file contains the current status of the
document, the next processing step can be determined by the DP and is carried out.
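A recovery scan of this kind might look like the sketch below. It assumes that the last whitespace-separated field on the last line of each DPqStatus file is the current queue name; the real file layout is not specified here, and `recover_queues` and the queue names are hypothetical.

```python
import os
import tempfile
from collections import defaultdict

def recover_queues(dp_dir):
    """Rebuild {queue_name: [document directories]} from DPqStatus files.

    Assumes the current queue name is the last field of the last line.
    """
    queues = defaultdict(list)
    for root, _dirs, files in os.walk(dp_dir):
        if "DPqStatus" not in files:
            continue
        with open(os.path.join(root, "DPqStatus"), encoding="utf-8") as fh:
            lines = [line for line in fh.read().splitlines() if line.strip()]
        if lines:
            queues[lines[-1].split()[-1]].append(root)
    return dict(queues)

dp_dir = tempfile.mkdtemp()
for name, history in (("doc1", ["t1 queue_1", "t2 queue_2"]), ("doc2", ["t1 queue_1"])):
    os.makedirs(os.path.join(dp_dir, name))
    with open(os.path.join(dp_dir, name, "DPqStatus"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(history) + "\n")
recovered = recover_queues(dp_dir)
print(sorted(recovered))  # ['queue_1', 'queue_2']
```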
Errors during enqueuing
If the enqueue tool fails, an error file, which contains an error message, is
created in the source directory. Before the document can be enqueued again,
you must delete this error file. If the enqueuing was started by OpenText
Administration Client, the error message also appears in the job's messages.
Tip: For testing purposes, the enqueue tool has an option -test that does not
require a DP. When Enqueext is started with this parameter, it expects the
same directory structure as described above, and copies all files to the
subdirectory test in the source directory.
To see which documents were enqueued to the Document Pipeline while processing a job,
check the file
<ECM_VAR_DIR>/messages/job_start<pipeline_name>_<num>.log,
where start<pipeline_name> is the job for the corresponding pipeline and <num>
is a consecutive number that is incremented every time a job is started.
Windows
<ECM_DOCUMENT_PIPELINE_CONF>\config\setup\COMMON.Setup
UNIX/Linux
<ECM_DOCUMENT_PIPELINE_CONF>/config/setup/COMMON.Setup
These settings are used by all DocTools, and contain the connection information for
the storage system, for example, or the common Document Pipeline directory
(<DPDIR>). The most important common settings are described below. In addition,
there are some pipeline-specific settings stored in separate entries, which are only
used by specific Document Pipelines.
This variable is used by all DocTools that use the Archive API (for example, doctods), as well as by the
DP. To determine the contents of the ALHOST variable, you can use the dpctrl adms command.
This allows you to define a separate environment for the individual DocTools,
for example, to specify different ports for different tools. The FILING.Setup
file contains the common variables for all DocTools relevant for batch import
(for example Prepdoc, Enqueext, Enqueco). You must define the arguments for
the -env option in the corresponding servtab file that registers the DocTool.
For details on modifying the servtab files, see “servtab” on page 43.
Alternatively, you can specify an argument for the scheduled job; see “Starting
the Document Pipelines Using Jobs” on page 27.
For special requirements, additional arguments can be defined for the job.
4.1 dpconfig
The configuration file for the DP (dpconfig) defines which DocTools are executed in
which order for each Document Pipeline. Each DocTool retrieves the documents
from a specified source queue, carries out an operation, and - depending on the
result of the operation - sends the documents on to a specified target queue or an
error queue.
If the same DocTool is to be executed several times, but with different queues as
input and output sources, you can define DocTool types. DocTool types can be
distinguished by their names. The result of a DocTool operation is indicated by the
opcode in the DPqStatus file.
Thus, for each DocTool type and each possible opcode, the dpconfig file contains an
entry with the following syntax:
The special queue type nil is used for the beginning and end of the pipeline. nil as
a source indicates that the DocTool in question is one that creates document
directories. When nil is the target, the DP removes the processed document from its
administration.
The first entry in the dpconfig file has the following syntax:
nil.<doctooltype1>.done → <target_queue1>
The last DocTool entry in the dpconfig file has the following syntax:
Based on these entries, the Document Pipeliner (DP) can determine where to
continue document processing after an interruption by checking which queue the
document is currently in, which opcode is assigned to it, and then finding the correct
entry for that combination in the dpconfig file.
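The lookup described above amounts to a table keyed by source queue, DocTool type, and opcode. The sketch below parses entries of the form `<source>.<doctool_type>.<opcode> → <target>`, modeled on the first-entry syntax shown earlier; the sample queue names, tool names, and the `error` opcode are hypothetical, accepting `->` as an alternative arrow is an assumption, and names containing dots are not handled.

```python
def parse_entry(line):
    """Split '<source>.<doctool_type>.<opcode> → <target>' into its parts."""
    left, arrow, target = line.replace("->", "→").partition("→")
    if not arrow:
        raise ValueError(f"not a routing entry: {line!r}")
    source, doctool_type, opcode = left.strip().split(".")
    return (source, doctool_type, opcode), target.strip()

def load_routes(text):
    """Build {(source, doctool_type, opcode): target} from dpconfig-like text."""
    routes = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # '#' marks a comment
        if "→" in line or "->" in line:
            key, target = parse_entry(line)
            routes[key] = target
    return routes

sample = """
# hypothetical pipeline with two DocTools
nil.tool_a.done → queue_1
queue_1.tool_b.done → nil
queue_1.tool_b.error → queue_1_err
"""
routes = load_routes(sample)
print(routes[("queue_1", "tool_b", "done")])   # nil
```

After an interruption, the current queue and the last opcode of a document are enough to find the matching entry and therefore the next queue.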
Additional entries
<doctool_type>: <parameter>
Tip: You can find a sample dpconfig file in OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).
Comments (#)
Use the # character to mark comments in the dpconfig file that are to be ignored
by the DP.
<number>
A number defines how many instances of the specified DocTool type may run
simultaneously in the pipeline. If this number is exceeded when a DocTool tries
to sign on to the DP, the DP rejects the new instance of the DocTool.
Having several DocTool types run simultaneously is useful, for instance, if the
queues cannot be processed quickly enough. The stockist tools, for example,
which process error queues, may require some time before the errors can be
solved. Meanwhile the error queues may overflow if there is only one stockist
tool to do the job.
stopnull
Instructs a DocTool of this type to terminate when its queue is empty. By
default, a DocTool continues to wait for further documents when its queue is
empty. This setting is recommended for the stockist DocTool to keep
erroneous documents in the error queue (otherwise they would be permanently
moved from the error queue to the source queue and back).
runonly
Instructs the DP not to supply any other DocTool with documents while the
specified DocTool type is running. This mechanism can be used to let the
stockist run without any other DocTools disturbing its work (for example
cycling documents).
tellnowork
Instructs the DP to inform a DocTool of this type when it has no more
documents to process. It is up to the DocTool to react to this command (for
example by signing off from the DP).
disabled
Keeps the DocTool running, but not active; useful for test purposes.
(<sec>)
Sets a timeout for the DocTools of this type. If a DocTool of this type requires
more than <sec> seconds to process the document, the DP signs the DocTool off
and instructs it to terminate. This timeout enables the DP to recognize that a
DocTool has finished execution without signing off (for example when the
DocTool has been terminated by an external kill command).
The timeouts should be set to sufficiently large values. If a DocTool exceeds the
timeout under normal running conditions, the document is resubmitted to the
DocTool. This can result in a heavy load being placed on the machine on which
the DocTool runs. We do not recommend changing the timeouts for the
standard DocTools. If this is unavoidable, the new value should be carefully
tested.
• immediate mode
The DocTool receives documents from the queues in the order of their
definition in the dpconfig file. When a queue is empty, the DP switches to
the next queue.
The disadvantage of this mode is that documents in queues that are
configured last in the dpconfig file may have to wait a very long time for
processing, or in the worst case, are never processed at all.
The immediate mode is the default if no other mode is specified.
• doctime mode
In this mode, the DocTool receives input from the queue with the oldest
document (according to the time the document entered the queue). This
avoids individual documents being left unprocessed for longer periods;
however, it may cause other queues that have been filled more recently to
overflow.
If no mode is specified and there are no more documents from immediate
queues, the DP selects documents from the doctime queues.
• queuetime mode
In queuetime mode, input is provided by the queue that has not been read
from for the longest time.
This mode offers the most balanced processing of queues and is thus
preconfigured for most standard pipelines.
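The difference between the three modes is easiest to see side by side. The following Python sketch models each selection rule on a toy data structure. It is an interpretation of the descriptions above, not the DP's code: the `Queue` class and the `pick_*` functions are hypothetical, and restricting the queuetime choice to non-empty queues is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    doc_times: list = field(default_factory=list)  # times at which waiting documents entered
    last_read: float = 0.0                         # time the DocTool last read from this queue


def pick_immediate(queues):
    """First non-empty queue in definition order."""
    for q in queues:
        if q.doc_times:
            return q.name
    return None


def pick_doctime(queues):
    """Queue that holds the oldest waiting document."""
    candidates = [q for q in queues if q.doc_times]
    if not candidates:
        return None
    return min(candidates, key=lambda q: min(q.doc_times)).name


def pick_queuetime(queues):
    """Non-empty queue that has not been read from for the longest time."""
    candidates = [q for q in queues if q.doc_times]
    if not candidates:
        return None
    return min(candidates, key=lambda q: q.last_read).name


# Queues listed in the order of their definition (hypothetical values).
queues = [
    Queue("A", doc_times=[50.0, 60.0], last_read=90.0),
    Queue("B", doc_times=[10.0], last_read=95.0),
    Queue("C", doc_times=[70.0], last_read=20.0),
]
print(pick_immediate(queues), pick_doctime(queues), pick_queuetime(queues))
```

In this sample, each mode picks a different queue: immediate takes the first defined queue, doctime the queue with the oldest document, and queuetime the queue that was read least recently.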
exr3stock0: runonly
exr3stock1: runonly
...
exr3stock0: stopnull
exr3stock1: stopnull
...
exr3stock0: 1
exr3stock1: 1
ExR3start: 1
4.2 dpinfo
OpenText Document Pipeline Info (DPInfo) is a utility for monitoring OpenText
Document Pipelines. With it, you can monitor the pipeline processes, making sure
that documents have been correctly processed. If an error occurs, you can quickly
locate the problem. DPInfo shows how many documents are waiting to be processed
by each DocTool. In case of errors, you can determine which error queues contain
documents.
DocTools create a protocol file named DPprotocol for every document. This file
contains brief information about the document processing results. The DPprotocol
file is displayed by DPInfo.
For details on the DPInfo monitoring tool, see OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP). In the dpinfo file, you can configure how
the information about the DocTools is displayed in DPInfo.
All DocTool types used in a pipeline are included in the flow construct:
flow("Pipeline description")
queue( ...)
...
queue( ...)
For each DocTool type, the dpinfo file contains a queue entry with the following
syntax:
Parameter Description
DocTool description Arbitrary DocTool description that is displayed in the
DPInfo window.
DocTool Name (Type) DocTool to be executed.
source queue for documents The DocTool takes the documents from this source queue.
error queue If errors occur during processing, the affected documents
are moved to this error queue. This parameter is optional.
stockist DocTool DocTool that returns documents from the error queue to
the source queue.
To support different languages, you can provide a lang construct for each language
containing the translated description strings for the flow construct and the queue
entries. DPInfo currently supports Japanese (language code JPN) and German
(language code DEU). The syntax is as follows:
This example shows extracts from a dpinfo file and how this configuration
is visible in the screenshot below. For the flow construct and for each queue
entry, a line appears in the DPInfo window.
flow( "Import content and attributes into DocuLink (EXR3)" )
{
queue( "Parse document by XSL", "xsl_parser", "ExR3Xsl" )
{
stockist ("exr3_stock_xsl_parser")
}
queue( "Check document", "ExR3start", "ExR3Perldt" )
{
stockist ("exr3_stock_exr3start")
}
queue( "Copy document to document pipeline", "cpfile",
"ExR3Cpfile" )
{
stockist("exr3_stock_cpfile")
}
queue( "Convert TIFF to Multi-page TIFF", "Tiff2Mtiff",
"ExR3Tiff2Mtiff" )
{
stockist("exr3_stock_tiff2mtiff")
}
queue( "Select Archive ID from R/3","R3AidSel", "ExR3AidSel" )
{
stockist ("exr3_stock_r3aidsel")
}
queue( "Remove document from document pipeline", "docrm",
"ExR3Remove" )
{
stockist ("exr3_stock_docrm")
}
}
lang("JPN") {
"Import content and attributes into DocuLink (EXR3)"="COLD for
DocuLink: NCI \u6587\u66f8 (EXR3)"
"Parse document by XSL"="XSL \u30d7\u30ed\u30bb\u30c3\u30b5\u306b
\u3088\u308a\u6587\u66f8\u3092\u51e6\u7406\u3057\u307e\u3059"
"Check document"="\u6587\u66f8\u3092\u30c1\u30a7\u30c3\u30af
\u3057\u307e\u3059"
"Convert TIFF to Multi-page TIFF"="Multi-page TIFF
\u30d5\u30a1\u30a4\u30eb\u3092\u4f5c\u6210\u3057\u307e\u3059"
"Copy document to document pipeline"="\u6587\u66f8\u3092\u30d1\u30a4\u30d7\u30e9\u30a4\u30f3\u3078\u30b3\u30d4\u30fc\u3057\u307e\u3059"
"Select Archive ID from R/3"="\u30a2\u30fc\u30ab\u30a4\u30d6 ID
\u3092 R/3 \u30b7\u30b9\u30c6\u30e0\u304b\u3089\u9078\u629e\u3057\u307e
\u3059"
...
"Remove document from document pipeline"="\u6587\u66f8\u3092\u30d1\u30a4\u30d7\u30e9\u30a4\u30f3\u304b\u3089\u524a\u9664\u3057\u307e\u3059"
lang("DEU") {
4.3 monitor
Use the OpenText Archive Monitoring Web Client to check the activities of the
individual archive components and the free storage space in the pools, database, and
the OpenText Document Pipeline. For a detailed description, see Section 28 “Using
OpenText Archive Server Monitoring” in OpenText Archive Center - Administration
Guide (AR-ACN). The configuration of the Archive Monitoring Web Client is saved
in the monitor files that are located in the directory
<ECM_DOCUMENT_PIPELINE_CONF>/config/monitor.
# Comment
"Group" = {
group = nul {}
component_name = component_type {
parameter1 = …
parameter2 = …
}
}
The Archive Monitoring Web Client shows the status of the configured components,
i.e. DocTools.
You can define group and component_name arbitrarily. group must be defined in
each group. In most cases, the value nul{} is used, but any other component_type
can also be used.
The component types dpt and dpq_error are used in monitor files:
dpt
Description
dpt checks whether a DocTool is lazy, disabled or working. To check the results
of a dpt entry manually, use the command line call dpctrl tools <toolname>.
If a DocTool terminates unexpectedly (as opposed to being stopped regularly), this is
not visible in the Archive Monitoring Web Client.
Parameters
maxrun Defines the number of DocTools that have to be online (= number of lazy
DocTools plus number of working DocTools).
If the number of lazy DocTools plus the number of working DocTools is
greater than maxrun, the status changes to WARNING, otherwise the
status is OK. If maxrun is not defined, it is set to 0. This means that there
is no check whether all DocTools are running, i.e. the status is always OK.
Possible status
Registered 0
Warning 50
Not registered 100
dpq_error
Description
dpq_error checks whether a queue is online. To check the results of a
dpq_error entry manually, use the command line call dpctrl queues <queue>.
Parameters
Possible status
Empty 0
Can't call server 98
Can't connect to server 99
Not empty 100
The default values for the parameters of a component_type can also be specified by
environment variables. Environment variables define the default value of the
component type parameters for all component types, whereas the parameters in the
monitor configuration files only refer to special components.
This example shows the standard monitor file for the EXR3 pipeline. The
Extern group contains the components, in this case the DocTools, that are to
# DP Tools
#---------
"EXR3" = {
group = nul { }
xsl_parser = dpt { toolname= xsl_parser }
ExR3start = dpt { toolname= ExR3start }
cpfile = dpt { toolname= cpfile }
Tiff2Mtiff = dpt { toolname= Tiff2Mtiff }
R3AidSel = dpt { toolname= R3AidSel}
R3Formid = dpt { toolname= R3Formid}
Prepdoc = dpt { toolname= Prepdoc}
GenR3ins = dpt { toolname= GenR3ins}
page_idx = dpt { toolname= page_idx}
rendition = dpt { toolname= rendition }
doctods = dpt { toolname= doctods }
R3Insert = dpt { toolname= R3Insert}
docrm = dpt { toolname= docrm}
}
4.4 servtab
The servtab configuration files specify which processes the Spawner has to start.
These processes are the Document Pipeliner and the DocTools. This allows you to
define a separate environment for the individual DocTools, for example to specify
different ports for different tools.
• Definitions of variables that are valid in this servtab file by using the globenv
parameter. With these variables, you can shorten the servtab line entries.
• Servtab line entry for a DocTool which defines how to start the DocTool, see
“Syntax of an servtab Line Entry” on page 43.
• Comments
Use a separate line for each variable. To reference a variable, use the prefix $.
Name_of_the_entry;{once|wait|respawn|manual|stop|kill};{no|yes};
[local Environment];[working directory of the program];command with
parameters
The following table describes the options in detail. DocTools are referred to here
as processes, because not all processes are DocTools.
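To make the six semicolon-separated parts of a servtab line entry concrete, the following Python sketch splits an entry into named fields and checks the two enumerated options. It is a reading aid under assumptions, not the Spawner's parser: the names `ServtabEntry` and `parse_servtab_line` are hypothetical, the third field is simply called `flag` because its meaning is not restated here, and `globenv` lines and `$` variable expansion are not handled.

```python
from typing import NamedTuple

# Start options listed in the syntax description.
MODES = {"once", "wait", "respawn", "manual", "stop", "kill"}


class ServtabEntry(NamedTuple):
    name: str
    mode: str
    flag: str         # "no" or "yes"
    environment: str  # local environment, may be empty
    workdir: str      # working directory of the program, may be empty
    command: str      # command with parameters


def parse_servtab_line(line):
    """Split one servtab line entry into its six fields and validate the options."""
    # Split at most five times so semicolons inside the command would be kept.
    fields = [f.strip() for f in line.split(";", 5)]
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    entry = ServtabEntry(*fields)
    if entry.mode not in MODES:
        raise ValueError("unknown start option: %r" % entry.mode)
    if entry.flag not in ("no", "yes"):
        raise ValueError("third field must be 'no' or 'yes'")
    return entry


line = ("my_xsl_parser1;once;no;;$BINDIR;"
        "$BINDIR/xsl_parser -type xsl_parser_my -loglevel 9 -logfile my_xsl_parser_1.log")
print(parse_servtab_line(line))
```

The sample line leaves the local environment empty, which is why two semicolons appear in a row.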
Tip: Use the stop option to define a stop instruction for a DocTool (process) in
a way that the DP is aware of this stop. This prevents problems when
restarting a DocTool and updates the DPInfo window.
...
my_xsl_parser1;once;no;;$BINDIR;$BINDIR/xsl_parser -type xsl_parser_my -loglevel 9 -logfile my_xsl_parser_1.log
The lines starting with globenv define variables that are valid in this servtab file.
For example, the variable LOG has the value $ECM_LOG_DIR and is used in the last
line as working directory.
The last line of the file, which is the most important one, consists of the following
parts:
5.1 dtcrt
The dtcrt program is a DocTool that is used to register newly created documents
with the Document Pipeliner (DP). It is used in the interface to host machines and in
the archive interface for text documents. dtcrt signs on to the DP and passes it the
path names of new document directories. dtcrt does not check whether these path
names actually exist or whether they are correct. Usage:
5.2 dpctrl
The dpctrl command line tool enables the user to interact with the DP or DocTools,
and to obtain information relating to both. This is especially useful for
troubleshooting and determining which tools are registered and active.
where:
• -dphost <host> specifies the DP host for which you require information. If no
host is specified, the local host is used.
• <arguments> can be any of the arguments listed in the table below. If the
command refers to one or more DocTools, enter a list of the respective DocTool
types, separated by commas.
Argument Description
-h Help on the dpctrl tool
alive Return code for the dpctrl command, indicates
whether the DP is still running
list Causes the DP to write its internal lists to the DP log file
queues [<name>] Returns a list of all queues in the DP. The queue name,
the number of documents in this queue, the queue's
blocking status (- indicates a blocking queue; + indicates
a non-blocking queue), and the name of the DocTool
currently blocking the queue (- indicates that the queue
is not being blocked at present) are output. If a queue
name is given, this information is output for just this
queue.
tools [<name>] Generates a list of all DocTools known to the DP. The
timeout for each DocTool and the queue from which it is
currently reading are listed. If the DocTool is inactive,
that is it does not process a document, “lazy” is output
instead of the name of the queue. If a DocTool name is
given, a list of queues from which the DocTool wants to
read is also produced.
params [<name>] Generates a list of parameters that can be set in the
dpconfig file. The number of instances allowed for
each DocTool, the non-blocking queues, the DocTool
timeouts, etc. are output. If a parameter name is given,
the current setting for just this parameter is output.
rules [<name>] Lists the rules according to which a document can move
from one queue to another. If a DocTool name is given,
only the rules for the specified DocTool are output.
docdirs [<queue>|<directory>] Returns the relative path name of all documents
currently under DP administration. If a queue is
specified, documents are output only from that queue. If
a directory is specified, only that document is returned
with its queue information.
docdirscom [<queue>|<directory>] Returns the same output as the docdirs argument, but
appends the comment written by the last DocTool.
fssize [<directory>] Returns the free space (in KB) and the number of free
inodes on the partition containing the directory. The
output also contains a number identifying the partition
(which can be used to detect common partitions).
stop <DocTools> Signs off the specified DocTools after the current
document has been processed
force <DocTools> Signs off the specified DocTools immediately. This call is
used, for example, to notify the DP that a DocTool has
failed.
stopnull <DocTools> The specified DocTools are stopped if there are no jobs
waiting to be done. This call is equivalent to the entry
DocTool: stopnull in the dpconfig file.
pwd Outputs the current directory for the Document
Pipeliner
enable <DocTools> The specified DocTools are enabled, which means that
they are supplied with documents again.
disable[<n>] <DocTools> The specified DocTools are disabled for <n> seconds
(default is forever), which means that they are no longer
supplied with documents, but are not shut down.
msg <DocTools> <message> The specified message is sent to the DocTools. It is not
necessary to enclose the message in single quotation
marks, even if it is made up of several words.
Use this command to define the log level of an
individual DocTool dynamically. For more information,
see OpenText Document Pipelines - Overview and Import
Interfaces (AR-CDP).
shutdown The DP sends a stop command to all registered
DocTools. When all DocTools have stopped, the DP
stops itself. The DP stops within 150 seconds, even if one
or more DocTools have not yet signed off.
loglevel [<lev>] Enables the log level of the DP to be set dynamically (at
run time) for test and debug purposes. This function
should be used with care as the DP is significantly
slower when the log level is set high. You can define the
level of detail for the log files from 0 (only fatal errors) to
12 (very detailed debug information).
version Returns the version of the currently running DP
Important
All parameters and values of the dpctrl command line tool are case sensitive.
The following command sets the DEBUG log level of the R3Insert DocTool.
The following command sets the INFO log level of the R3Insert DocTool.
5.3 spawncmd
With the spawncmd utility, you can query the status of the individual archive
processes and control individual processes. With this utility, you can stop and
restart individual processes or all processes at once. After changing or creating a
servtab file, use the command spawncmd reread to force the Spawner to read the
new servtab file. For details on the spawncmd utility, see OpenText Archive Center -
Administration Guide (AR-ACN).
Important
All parameters and values are case sensitive.
1. Stop the Document Pipeliner and all DocTools with the command dpctrl
shutdown.
2. Create a new servtab file for each new DocTool.
3. Copy and adapt the dpconfig file.
4. Copy and adapt the dpinfo file.
5. Copy and adapt the monitor file.
6. Create new code for DocTool functionality with perldtn, javadt or a new binary
file. Please contact OpenText Professional Services if you want to use javadt.
7. If you want to transfer configuration files from Windows to UNIX (or vice versa)
via ftp, use the text mode.
8. Force the Spawner to read the new servtab file by entering the command
spawncmd reread.
9. Restart the Document Pipeliner and all DocTools with this command sequence:
1. dpctrl shutdown
2. spawncmd startall
Configuration details
Important
Never modify an installed servtab file. Modifications can result in problems
when upgrading.
globenv ; LOG=$ECM_LOG_DIR
Tiff2Mtiff_test;once;no;;$LOG;Tiff2Mtiff -type Tiff2Mtiff_test -env FILING
extstock1b: runonly
extstock1b: stopnull
Important
If you set a stockist DocTool to runonly, you should also set it to
stopnull.
c. If necessary, add a line to allow multiple instances of the specified DocTool
to run simultaneously in the pipeline:
Tiff2Mtiff_test: 3
3. Create a dpinfo file:
a. Add a queue parameter line for each DocTool to display the status in the
DPInfo window, for example:
lang("JPN")
{
...
"Convert TIFF to Multi-page TIFF"="Multi-page TIFF
\u30d5\u30a1\u30a4\u30eb\u3092\u4f5c\u6210\u3057\u307e\u3059"
...
}
lang("DEU")
{
...
"Convert TIFF to Multi-page TIFF"="TIFF nach Multipage-TIFF
konvertieren"
...
}
5. Modify the existing monitor file, or copy and modify it:
a. Add a line in the first group (containing the dpt components), for example:
You remove DocTools from the configuration files in the same way you added them.
Do this by modifying the following files (analogous to inserting a DocTool). You may
leave the servtab file for later use.
• dpconfig
• dpinfo
• monitor
Important
OpenText recommends not modifying the configuration files of the standard
Document Pipeline. This prevents problems when upgrading. To make
modifications, copy the configuration files and rename the pipeline as shown
in the following example.
...
myExR3GenR3ins.GenR3ins.done ->
myExR3Page_idx
myExR3GenR3ins.GenR3ins.error ->
myExR3GenR3ins_error
myExR3GenR3ins_error.exr3_stock_genr3ins.ok ->
myExR3GenR3ins
myExR3GenR3ins_error.stockist.ok ->
myExR3GenR3ins
myExR3GenR3ins + queuetime
myExR3Page_idx.page_idx.ok ->
myExR3Rendition
myExR3Page_idx.page_idx.error ->
myExR3Page_idx_error
myExR3Page_idx_error.exr3_stock_page_idx.ok ->
myExR3Page_idx
myExR3Page_idx_error.stockist.ok ->
myExR3Page_idx
myExR3Page_idx + queuetime
myExR3Rendition.rendition.ok ->
myExR3Doctods
...
...
myExR3GenR3ins.GenR3ins.done ->
myExR3Rendition
myExR3GenR3ins.GenR3ins.error ->
myExR3GenR3ins_error
myExR3GenR3ins_error.exr3_stock_genr3ins.ok ->
myExR3GenR3ins
myExR3GenR3ins_error.stockist.ok ->
myExR3GenR3ins
myExR3GenR3ins + queuetime
myExR3Page_idx.page_idx.ok ->
myExR3Rendition
myExR3Page_idx.page_idx.error ->
myExR3Page_idx_error
myExR3Page_idx_error.exr3_stock_page_idx.ok ->
myExR3Page_idx
myExR3Page_idx_error.stockist.ok ->
myExR3Page_idx
myExR3Page_idx + queuetime
...
myExR3Rendition.rendition.ok ->
myExR3Doctods
There are two ways to improve the performance of the DocTools in your Document
Pipelines, for example by reducing bottlenecks.
Assume that you want to enhance the throughput of the R3Insert DocTool
in your pipeline. To achieve this, enter the following lines at the end of
a .servtab file:
In the dpconfig file, increase the number of simultaneous runs for a DocTool at
which there is a bottleneck; and extend the .servtab file:
R3Insert_TIFF: 3
This part helps you to write your own DocTools. It describes the command line
options and functions of the DocTools perldtn and perldte as well as Perl modules
from OpenText that can be used by perldtn and perldte. It also provides sample
scripts for using perldtn and perldte.
perldtn uses these special OpenText modules to extend Perl for the Document
Pipeline, and makes scripting functionality available in the Document Pipeline for
customer-specific configuration.
With the Perl interpreter integrated, perldtn includes all features of Perl.
perldtn is derived from the doctool class and inherits the same features as all other
DocTools, for example dpctrl msg print. Additionally, perldtn contains
OpenText-specific Perl modules that enhance the functionality.
set SCRIPTDIR=%ECM_DOCUMENT_PIPELINE_BASE%\scripts\perl
set BINDIR=%ECM_DP_PERL_10_0_0%\bin
set PERLDIR=%ECM_DP_PERL_10_0_0%\perl-5.8.5\bin
set PATH="%BINDIR%;%PERLDIR%;%Path%;."
set Path=%PATH%
set PERL5LIB="%ECM_DP_PERL_10_0_0%\perl-5.8.5\lib;
%ECM_DP_PERL_10_0_0%\lib\perl-5.8.5;%ECM_DP_PERL_10_0_0%
\perl-5.8.5\site\lib"
%BINDIR%\perldtn -type rendition -env DT_RENDITION -script
"%SCRIPTDIR%\rendition.pl" -logfile rendition_1.log
Return values
Perl array with 2 elements:
9.3.2 service
Use this function to implement the functionality of the DocTool. The function is
called for every document that is passed to the DocTool.
Input parameters
1. Document directory
2. DocTool type
Return values
Perl array with 3 elements:
else
{return (0,"I’m sorry, my name is $dttype");}
}
9.3.3 printObject
This function writes the current settings of the DocTool into the log file of this
DocTool. The function has no parameter or return code.
9.3.4 control
This function serves for administrative purposes. It is called if the DocTool is called
from the command line with the dpctrl tool, for example
dpctrl msg <DocTool name> <Command string>.
Input parameters
1. DocTool name
2. Command string.
This is an arbitrary string that can contain for example commands to be sent
via dpctrl msg.
perldte
perldte is an enqueue DocTool. It is derived from the doctool class and inherits the
same features as all other DocTools, for example dpctrl msg print. The
enqueuing of a new document must be done in a Perl script by calling the
IXOS::DT::DPop() function. Additionally, there are OpenText-specific Perl modules
that enhance the functionality.
10.3.2 service
Use this function to implement the enqueuing functionality of the DocTool. The
function is called only once.
Input parameter
DocTool type
Return values
Perl array with 2 elements:
Log levels
There are 12 possible log levels (log level 12 contains all logging information):
Including IXOS::DTLogging
Add the following lines to the Perl script to include the IXOS::DTLogging
module.
use IXOS::DTLogging;
$ret = IXOS::DTLogging::logmsg( $loglevel , $logtext );
$ret = IXOS::DTLogging::init( $logfile, $loglevel );
IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),
"This message was written to log file, if LOG_INFO=on");
The module provides a standardized interface to open, close, read, and write
files in the working directory of the DP. The module consists of functions to work on
the COMMANDS file and other self-defined files. When a function writes to the
COMMANDS file, it also automatically creates a backup copy of the file.
Methods
IXOS::DTDocument2->new
First you have to call this function as follows:
my $obj = IXOS::DTDocument2->new(path, dtName).
The IXOS::DTDocument2->new method creates the DTDocument2 object. You
must use this method before you can work with the other methods of the
DTDocument2 module. The extension is based on the object-oriented feature
of Perl. For that reason you have to create the object instance by calling the
new method. Perl automatically manages the destruction of the object.
$obj->openCommands
This method opens the COMMANDS file if it has been closed. Since the
extension or perldtn always opens the COMMANDS file automatically, you do
not need this method for opening the COMMANDS file for the first time.
$obj->appendToCommands(value)
Appends the string value to the COMMANDS file. This method automatically
creates a backup copy of the COMMANDS file.
$obj->closeCommands()
This method closes the COMMANDS file. Normally you do not need this
method because the extension or perldtn closes the COMMANDS file
automatically. All changes to the COMMANDS file are written when the
file is closed.
$obj->deleteFromCommands(key, values)
This method deletes a key-value pair from the COMMANDS file, but not from
the internal list. So you can still find it with findKeyInCommands(key) after
deleting. The return value is the number of statements, or -1 in case of an error.
$obj->appendToProtocol(value)
This method appends the string value to the DpProtocol protocol file.
$obj->closeProtocol()
This method closes the DpProtocol file. Normally you do not need this
method because the extension or perldtn closes the DpProtocol file
automatically.
$obj->checkCOMPStmts(docpath, ignorefileslist)
Checks whether the files specified with COMP statements in the
COMMANDS file exist. You can specify a list of files to be ignored
(ignorefileslist) so that this method does not return an error for them.
Return values:
$obj->findKeyInCommands( key)
Returns the number of statements with the given key in the COMMANDS file.
$obj->getValueForFormCommands(key)
Returns an array of values that are attached to the key in the COMMANDS
file.
$obj->getAllValuesFromCommands(key)
This method returns the list of all entries in the COMMANDS file.
$obj->integrateChanges()
Integrates all append and delete changes for the COMMANDS file into the
internal list.
11.3 IXOS::DT
This module contains methods for the communication between a DocTool and the
Document Pipeliner (DP).
Methods
DPop
To be used in
perldte
Functionality
DPop is a function which is called after a DocTool has finished processing a
document. It is passed the following arguments:
The DP moves the document in question to the next queue (according to the
configuration defined in the DP's configuration file).
This method must only be used in perldte for enqueuing.
Synopsis
$status = IXOS::DT::DPop($docdir, $opcode, $comment);
Arguments
Argument Description
$docdir Document Directory
$opcode Operation code to be sent to DP
$comment Comment for DP
Return values
$status: status code from DP.
DPcrt
To be used in
perldtn
Functionality
DPcrt registers documents to the DP. A document can be enqueued by any
DocTool type.
Synopsis
$status = IXOS::DT::DPcrt($host, $type, $docdir, $opcode, $comment);
Arguments
Argument Description
$host Name of the host on which the target DP is running
$type Name of the DocTool type. This type must be known to the DP via
a dpconfig file.
$docdir Document Directory
$opcode Operation code to be handed over to DP.
Return values
$status: status code from DP.
The status code can have the following values:
Status Description
code
0 Success
–1 Error
1 DPOP_NOCONNECT: connection to DP impossible
2 DPOP_MISCERR: several errors possible, for example:
• Too many documents in the pipeline
• Insufficient disk space
• DPqStatus file cannot be written
3 DPOP_CONFIGERR: configuration error, for example:
• Wrong document path
• Unknown operation code
DPcrt2
To be used in
perldtn
Functionality
DPcrt2 registers documents to the DP. The method must only be used in
perldtn.
Synopsis
$status = IXOS::DT::DPcrt2($docdir, $opcode, $comment);
Arguments
Argument Description
$docdir Document Directory
$opcode Operation code to be handed over to DP
$comment Comment for the DP
Return values
$status: status code from DP.
getErrorText
To be used in
perldtn
Functionality
getErrorText returns message for specified status code.
Synopsis
$errmsg = getErrorText($status);
Arguments
Argument Description
$status Status code from DP
Return values
$errmsg: Text of the status code. There are the following status codes:
$status Text
0 DPOP_OK
1 DPOP_NOCONNECT
2 DPOP_MISCERR
3 DPOP_CONFIGERR
4 DPOP_SYNCERR
5 DPOP_NOHI
6 DP_NOSWITCH
Else Unknown error
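The table above amounts to a lookup with a fallback. The following Python sketch restates it as data, purely as an illustration of the mapping. It is not the Perl getErrorText function itself, and the name `error_text` is hypothetical. The texts are copied from the table as printed.

```python
# Status code -> text, as listed in the table above.
STATUS_TEXT = {
    0: "DPOP_OK",
    1: "DPOP_NOCONNECT",
    2: "DPOP_MISCERR",
    3: "DPOP_CONFIGERR",
    4: "DPOP_SYNCERR",
    5: "DPOP_NOHI",
    6: "DP_NOSWITCH",
}


def error_text(status):
    """Return the text for a DP status code, or 'Unknown error' for any other value."""
    return STATUS_TEXT.get(status, "Unknown error")


for code in (0, 3, 42):
    print(code, error_text(code))
```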
Including IXOS::DT
Add the following line to the service function to include the IXOS::DT module.
use IXOS::DT;
11.4 IXOS::DTUtil
This module contains methods for reading global settings from setup files.
Methods
initPkgConfig($PkgKey)
This method reads the variables of the specified setup file. You can access
the variables via the getenv method.
The argument $PkgKey must contain the file name prefix of the setup file:
$PkgKey = "COMMON".
The method returns 1 on success, else 0:
$ret = IXOS::DTUtil::initPkgConfig($PkgKey);
getenv($PkgKey)
This method fetches the content of the key in the setup file. It is necessary to
initialize using initPkgConfig() before using this method. The accessible
key depends on the initialization.
Including IXOS::DTUtil
Add the following lines to the service function to include the IXOS::DTUtil
module.
use IXOS::DTUtil;
use IXOS::DTUtil;
my $rc = IXOS::DTUtil::initPkgConfig("COMMON");
$DPDIR = IXOS::DTUtil::getenv("DPDIR");
if ($DPDIR eq "") {
return (0, "DPDIR is not set in the environment");
}
print "DPDIR = $DPDIR\n";
Examples
The script is called with the following command (here an example for enqueuing in
the EXR3 pipeline):
perldte -script D:/temp/enqueue_ext.pl -type EnquedocExR3 -env EXR3
Make sure that the correct -type parameter is set. The parameter must be the same
as in the first line of the pipeline configuration file after the nil statement, in this
case the exr3.dpconfig file (see first line of the exr3.dpconfig file below).
The first function doBeforeConnect is called once at the beginning of the procedure.
In this case, nothing is done.
The next function service is called once, too. This function does the following:
##############################################################
# function is called once before connection to DP
# parameters: $CFG = path of pipeline-root
# $quelldir = path of ext_dir
# $dir[$i] = entries found in ext_dir
#
# returns: name of created pipeline-dir
##############################################################
sub doBeforeConnect {
    ($dttype) = @_;
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "doBeforeConnect .....");
    return 1;
}
##############################################################
# function is called once for reading the ext_dir
# parameters: $dttype = type of pipeline (as in dpconfig !!!)
#
# global variables: @alldirs = entries found in ext_dir
#
# returns: name of created pipeline-dir
##############################################################
sub service {
    ($dttype) = @_;
    #------------------------------------
    # enqueueing of all EXT_DIR documents
    #------------------------------------
    $i = 0;
    foreach my $dir (@alldirs) {
        # build a unique document directory name from the current time and a counter
        $docdir = "$dphost/e".time."_$i";
        mkdir("$dpdir/$docdir");
        $i++;
        opendir (DIR2,"$extdir/$dir") or die "can't open directory $extdir/$dir";
        # skip the "." and ".." entries
        @alldirs2 = grep !/^\.\.?$/, readdir(DIR2);
        foreach my $dir2 (@alldirs2) {
            # copy() is provided by the File::Copy module
            copy("$extdir/$dir/$dir2","$dpdir/$docdir");
        }
        # hand the new document over to the DP with the opcode "done"
        IXOS::DT::DPop($docdir,"done","success");
        closedir(DIR2);
    }
    closedir(DIR);
    return ( 1 );
}
use IXOS::DTUtil;
use IXOS::DTLogging;
use IXOS::DTDocument2;
• The functions of the IXOS::DTDocument2 module are used for reading lines from
the commands file.
doBeforeConnect function
The doBeforeConnect function is called during the startup of the DocTool, before it
connects to the Document Pipeline. This function is used for initialization and for
fetching the environment variables DPHOST and DPDIR with the getenv function. The
commands for fetching the environment variables are just inserted for
demonstration purposes. The variables are not used in this DocTool.
The DocTool returns the value “1” (for successful operation) and the type of the
DocTool.
service function
5. Writing the values to the log file with the IXOS::DTLogging::logmsg() function.
6. The AppendCommandsStmt() function appends the string “Hello World” to the
COMMANDS file.
7. The service function returns “1” (success) as return code, a message text
(“document processed”) and the opcode “ok”. The opcode is evaluated in the
dpconfig file.
CheckCommandsStmt function
This function checks the COMMANDS file for the keyword given as input
parameter.
AppendCommandsStmt function
#======================================================================================
# FUNCTION doBeforeConnect
#======================================================================================
# DESCRIPTION: this perl script function will be called during the startup of the
# doctool, before the doctool connects
# to dp.
# PARAMETERS: doctool-type
# RETURNS: a perl array containing two elements:
# - returncode: 1 for success and 0 for an error
# - error text, which is written into the logfile
#======================================================================================
sub doBeforeConnect
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    ($dttype) = @_;
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "Try to initialize ...");
    # fetch DPHOST and DPDIR for demonstration purposes only
    if (IXOS::DTUtil::getenv("DPHOST")) {
        my $_dphost = IXOS::DTUtil::getenv("DPHOST");
        IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DPHOST=$_dphost");
    }
    if (IXOS::DTUtil::getenv("DPDIR")) {
        my $_dpdir = IXOS::DTUtil::getenv("DPDIR");
        IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DPDIR=$_dpdir");
    }
#=============================================================================
# FUNCTION service
#=============================================================================
# DESCRIPTION: this perl script function will be called for each document
# PARAMETERS:  document directory and doctool-type
# RETURNS:     a perl array containing three elements:
#              - returncode: 1 for success and 0 for an error (recommended),
#              - text (recommended)
#                The text is written into the log file and, as info, in the
#                protocol file if the returncode is 1.
#                The text is written into the log file and, as error, in the
#                protocol file if the returncode is 0.
#              - opcode, which is returned to the DP (optional). The default
#                depends on the returncode.
    # initialisation of variables
    my $archivid = "";
    my $doctype = "";
    my $docid = "";
    if ($rc[0] == "0") {
        return(0, $rc[1], "error");
    } else {
        $archivid = $rc[1];
    }
    if ($rc[0] == "0") {
        return(0, $rc[1], "error");
    } else {
        $docid = $rc[1];
    }
    if ($rc[0] == "0") {
        return(0, $rc[1], "error");
    } else {
        $doctype = $rc[1];
    }
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "ARCHIVID=$archivid");
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DOCID=$docid");
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DOCTYPE=$doctype");
##############################################################################
# Other functions used in doBeforeConnect and/or service                     #
##############################################################################
#=============================================================================
# FUNCTION CheckCommandsStmt
#=============================================================================
# DESCRIPTION: this perl script function checks the file 'COMMANDS' for the
#              keyword given as parameter
# PARAMETERS:  COMMANDS keyword (e.g. DOCID)
# RETURNS:     a perl array containing two elements:
#              - returncode: 1 for success and 0 for an error,
#              - textstring
#=============================================================================
sub CheckCommandsStmt
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    my($dtdocument2,$statement) = @_;
    my $value = "";
    $dtdocument2->openCommands();
    # number of lines in 'COMMANDS' that contain the keyword
    $ret = $dtdocument2->findKeyInCommands($statement);
    if ($ret == 0) {
        @rc = ("0", "Can't find '$statement' statement in file 'COMMANDS'!");
    } elsif ($ret > 1) {
        @rc = ("0", "Found more than one '$statement' statement in file 'COMMANDS'!");
    } elsif ($ret == 1) {
        $value = $dtdocument2->getValueForFromCommands($statement);
        if ($value eq '') {
            @rc = ("0", "Can't get value for '$statement' statement in file 'COMMANDS'!");
        } else {
            IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "Get value $value for '$statement' statement from file 'COMMANDS'.");
            @rc = ("1", $value);
        }
    }
    return @rc;
}
#=============================================================================
# FUNCTION AppendCommandsStmt
#=============================================================================
# DESCRIPTION: this perl script function appends a statement to the file
#              'COMMANDS'
# PARAMETERS:  statement string
# RETURNS:     - returncode: 1 for success and 0 for an error
#=============================================================================
sub AppendCommandsStmt
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    my($dtdocument2,$statement) = @_;
    $dtdocument2->openCommands();
    $dtdocument2->appendToCommands("\n" . $statement . "\n");
    $dtdocument2->closeCommands();
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "append statement '$statement' to file 'COMMANDS'.");
    return 1;
}
DOCTYPE FAX
COMP angebot.fax FAX angebot.fax
COMP im ASCII_NOTE im
R3_DESTINATION QM2
R3_CLIENT 800
--R3_OBJ_TYPE YWH1OWRT
--R3_DOC_TYPE YWH1WRT1
--R3_OBJ_ID 12345678901234567890
--R3BC_TEC_DOC_TYPE BCTECDOCTYPE
COMPUTERNAME brauneck
USERNAME Write_DP
ARCHIVID X1
USE_DOCID_FROM_COMMANDS on
DOCID 1.brauneck.X1.071129130115
RETENTION_PERIOD none
After running the perldtn DocTool, the COMMANDS file contains the additional line
“Hello World ...” at the end.
DOCTYPE FAX
COMP angebot.fax FAX angebot.fax
COMP im ASCII_NOTE im
R3_DESTINATION QM2
R3_CLIENT 800
--R3_OBJ_TYPE YWH1OWRT
--R3_DOC_TYPE YWH1WRT1
--R3_OBJ_ID 12345678901234567890
--R3BC_TEC_DOC_TYPE BCTECDOCTYPE
COMPUTERNAME brauneck
USERNAME Write_DP
ARCHIVID X1
USE_DOCID_FROM_COMMANDS on
DOCID 1.brauneck.X1.071129130115
RETENTION_PERIOD none
Hello World ...
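The effect shown above can be reproduced outside the Document Pipeline with plain Perl file I/O. This is only a sketch: the function name append_statement, the temporary directory and the shortened sample content are hypothetical, and the real DocTool appends through IXOS::DTDocument2 instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sketch only: appends one statement to a COMMANDS-style file and returns
# (1, text) on success or (0, error text) on failure.
sub append_statement {
    my ($path, $statement) = @_;
    open(my $fh, '>>', $path) or return (0, "can't open $path: $!");
    print {$fh} "$statement\n";
    close($fh) or return (0, "can't close $path: $!");
    return (1, "appended '$statement'");
}

# Demonstration with a shortened sample file in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/COMMANDS";
open(my $out, '>', $file) or die "can't create $file: $!";
print {$out} "DOCTYPE FAX\nRETENTION_PERIOD none\n";
close($out) or die "can't close $file: $!";

my ($rc, $text) = append_statement($file, "Hello World ...");
print "rc=$rc $text\n";
```

Opening the file in append mode leaves the existing statements untouched, so the new line ends up after RETENTION_PERIOD, as in the listing above.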
ADMS
See Administration Server.
AFP
See Advanced Function Presentation (AFP).
Archive Center
Archive Center (formerly Archive Server) provides a full set of services for content
and documents. Archive Center can either be used as an integral part of
OpenText Content Suite Platform or as standalone services in various scenarios.
These services include handling archiving needs and combining the Document
Service, archive databases, etc.
Archive ID
Web-based administration tool for monitoring the state of the processes, storage
areas, Document Pipelines and database space of Archive Center.
Archive Spawner
Service program that starts and terminates the processes of the archive system.
ArchiveLink
CMIS
See Content Management Interoperability Services (CMIS).
COLD
Computer Output to Laser Disk – automatically created document lists that are
stored on an optical storage medium.
COMMANDS file
A COMMANDS file is created for each document and transferred with it. This file
contains processing information for the document and can be extended by any
DocTool to include information or parameters for a subsequent tool; for example,
a document ID after storing a document to an archive, or formatting instructions
for an XML file.
DOCDIR
DocTool Types
If the same DocTool is to be executed several times, but with different queues as
input and output sources, you can define DocTool types. DocTool types can be
distinguished by their names.
DocTools
Document ID
Unique string assigned to each document with which the archive system can
identify it and trace its location.
Document lists
A document list is a single file that contains several individual documents. Each
of the documents can consist of several pages and has its own set of attributes. To
preserve the layout of the documents in a document list, it is possible to overlay
the individual documents with a form.
Document Pipeline
DP
See Document Pipeliner (DP).
dpconfig file
The configuration file for the Document Pipelines defines which DocTools are
executed in which order and contains at least one line per DocTool type. This line
specifies the source of the documents to be processed (that is, the queue for the
DocTool) and the destination for the documents once processed (that is, the queue
for the next DocTool).
DPDIR
The exchange directory to which the documents are copied by the enqueuing
tools and in which the DocTools perform their operations. Each document has its
own subdirectory, which holds all files belonging to that document.
DPqStatus file
This file contains a record of all the processing steps that the document has
already undergone together with the corresponding time stamps. The last line
always reflects the current status of the document. This file is mainly used
internally for recovery after a disturbance in the Document Pipeline process, as it
enables the pipeline to continue processing the document at precisely the step at
which it was stopped.
Enqueue tool
The DocTool that transfers the documents from the exchange directory to the
initial source queue of the Document Pipeline.
Exchange directory
The directory used for exchanging data to be retrieved or archived. This
directory is dedicated to the exchange between the leading application, the
Document Pipeline, and Archive Center.
Indexing
Definition of storage conditions, for example, the archive to which the document
is to be stored, by selecting the scenario, default settings and document type for
the document to be stored.
Indicator file
IXATTR file
The attribute values and various other items of document information must be
specified individually for each document and provided in the IXATTR file. The
structure of the IXATTR file is closely associated with the database layout of the
customer's leading application and must be created in accordance with the
customer's specific requirements; that is, certain data must be customized.
Jobs
Leading application
Log file
See Indicator file.
Logical archives
The storage area on Archive Center in which documents can be stored. Each
logical archive may be configured to represent a different archiving strategy
appropriate to the types of documents archived exclusively there. It may consist
of one or more pools.
Meta documents
Meta (MTA) documents are also known as document lists, that is, one
comprehensive file containing several individual documents of the same file
format. If indexing information is provided for the Meta document
(META_DOCUMENT_INDEX component), the individual documents can be searched
for and retrieved quickly and easily.
Opcode
Short form of operation code. The code that indicates the result of a DocTool
operation, for example ok or error. Depending on the opcode, the document is
transferred to the specified queue.
Print lists
Documents that are created by the leading application and consist of lists of data.
Queues
Waiting lists for multiple tasks of the same type to be executed successively.
Servtab files
Source queue
stockist
The special DocTool that processes documents in an error queue and returns the
document to the previous DocTool.