OpenText™ Document Pipelines

Programming Guide

This guide describes the OpenText Document Pipeline configuration and the DocTools
perldtn and perldte.

AR160200-PDP-EN-1
Rev.: 23/Oct/2019
This documentation has been created for software version 16.2.
It is also valid for subsequent software versions as long as no new document version is shipped with the product or is
published at https://knowledge.opentext.com.

Open Text Corporation

275 Frank Tompa Drive, Waterloo, Ontario, Canada, N2L 0A1

Tel: +1-519-888-7111
Toll Free Canada/USA: 1-800-499-6544 International: +800-4996-5440
Fax: +1-519-888-0677
Support: https://support.opentext.com
For more information, visit https://www.opentext.com

Copyright © 2019 Open Text. All Rights Reserved.


Trademarks owned by Open Text.

One or more patents may cover this product. For more information, please visit https://www.opentext.com/patents.

Disclaimer

No Warranties and Limitation of Liability

Every effort has been made to ensure the accuracy of the features and techniques presented in this publication. However,
Open Text Corporation and its affiliates accept no responsibility and offer no warranty whether expressed or implied, for the
accuracy of this publication.
Table of Contents

Part 1 Introduction

1 Guide Overview and General Information
1.1 About This Guide
1.2 Conventions
1.3 Further Information

Part 2 General Document Pipeline Process

2 Document Pipeline Concept
2.1 Document Pipeliner
2.2 DocTool Concept
2.3 Providing the Documents: Document Directories

3 Deploying Document Pipelines
3.1 Document Pipeline Process on the Server
3.2 Environment Variables
3.3 Starting the Document Pipelines Using Jobs

Part 3 Reconfiguring Document Pipelines

4 Standard Configuration Files
4.1 dpconfig
4.2 dpinfo
4.3 monitor
4.4 servtab
4.4.1 servtab File Structure
4.4.2 Syntax of an servtab Line Entry
4.4.3 servtab Example (For Windows)

5 Command Line Tools
5.1 dtcrt
5.2 dpctrl
5.3 spawncmd

6 Inserting DocTools
6.1 Configuration Steps

7 Removing DocTools

8 Optimizing DocTool Usage
8.1 Starting Separate DocTools for Each Pipeline
8.2 Running DocTools in Parallel

Part 4 perldtn and perldte

9 perldtn
9.1 Command Line Options
9.2 Running perldtn as Standard Doctool
9.3 Functions for perldtn as Standard DocTool
9.3.1 doBeforeConnect
9.3.2 service
9.3.3 printObject
9.3.4 control

10 perldte
10.1 Command Line Options
10.2 Running perldte as Enqueueing Doctool
10.3 Functions for perldte as Standard DocTool
10.3.1 doBeforeConnect
10.3.2 service

11 OpenText-specific Perl Modules
11.1 Module IXOS::DTLogging
11.2 Module IXOS::DTDocument2
11.3 IXOS::DT
11.4 IXOS::DTUtil

12 Examples
12.1 Enqueueing Documents into a OpenText Document Pipeline with perldte
12.2 Using perldtn as DocTool Running in a Document Pipeline
12.2.1 Modules and Functions
12.2.2 Sample Script
12.2.3 Sample COMMANDS Files

GLS Glossary
Part 1 Introduction

This part of the Programming Guide provides general information about this guide
and explains how to obtain further documentation on OpenText Document Pipelines
as well as product information about OpenText products.

Chapter 1
Guide Overview and General Information

1.1 About This Guide


This guide is intended for customizers of the OpenText Document Pipelines. It
assumes a basic knowledge of OpenText Document Pipelines and OpenText Archive
Center corresponding to the OpenText training course.

Before you start, you need to understand the basic concepts of OpenText Document
Pipelines. For a general introduction, read the guide OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).

Practical knowledge of Perl programming is required if you want to write your own
DocTools.

This guide contains the following information:

“Introduction” on page 5
This part describes the contents of this guide and the conventions used, and
gives an overview of the documentation.
“General Document Pipeline Process” on page 13
This part explains basic concepts for the OpenText Document Pipeline and
describes the Document Pipeline process on the server.
“Reconfiguring Document Pipelines” on page 29
This part describes the standard configuration files, how to insert DocTools in a
Document Pipeline, and how to remove them. It also provides information on
how to improve the performance of DocTools.
“perldtn and perldte” on page 59
This part gives an introduction to the DocTools perldtn and perldte to help you
write your own DocTools. It also provides information on OpenText-specific
Perl modules and sample scripts.


1.2 Conventions
User interface
This format is used for elements in the graphical user interface (GUI), such as
buttons, names of icons, menu items, and fields.
Filenames, commands, and sample data
This format is used for file names, paths, URLs, and commands at the command
prompt. It is also used for example data, text to be entered in text boxes, and
other literals.

Note: If you copy command line examples from a PDF, be aware that PDFs
can contain hidden characters. OpenText recommends that you copy from
the HTML version of the document, if it is available.
KEY NAMES
Key names appear in ALL CAPS, for example:
Press CTRL+V.
<Variable name>
Angled brackets < > are used to denote a variable or placeholder. The user
replaces the brackets and the descriptive content with the appropriate value. For
example, <server_name> becomes serv01.
Internal cross-references
Click the cross-reference to go directly to the reference target in the current
document.
External cross-references
External cross-references are usually text references to other documents.
However, if a document is available in HTML format, for example, in the My
Support, external references may be active links to a specific section in the
referenced document.
Warnings, notes, and tips

Caution
Cautions help you avoid irreversible problems. Read this information
carefully and follow all instructions.

Important
Important notes help you avoid major problems.

Note: Notes provide additional information about a task.

Tip: Tips offer you quicker or easier ways of performing a task.

Directories
The following variables for installation and configuration directories are used:


<ECM_DP_PERL_10_1_1>
Installation directory for Document Pipeline Perl, in this case Document Pipeline
Perl 10.1.1. You can install different versions of Document Pipeline Perl in parallel.
Example for Windows:
C:\Program Files\Open Text\Document Pipeline Perl_10_1_1.

<ECM_DOCUMENT_PIPELINE_BASE>
Installation directory for Document Pipeline Base.
Example for Windows:
C:\Program Files\Open Text\Document Pipeline Base.

<ECM_DOCUMENT_PIPELINE_INFO>
Installation directory for Document Pipeline Info (DPInfo).
Example for Windows:
C:\Program Files\Open Text\Document Pipeline Info.

<ECM_LOG_DIR>
Location of log files.
Example for Windows:
C:\Documents and Settings\All Users\Application Data\Open Text\var\LogDir

<ECM_VAR_DIR>
Location of protocols for enqueueing.
Example for Windows:
C:\Documents and Settings\All Users\Application Data\Open Text\var

<ECM_DOCUMENT_PIPELINE_CONF>
Global OpenText Document Pipeline configuration directory.
Example for Windows:
C:\Documents and Settings\All Users\Application Data\Open Text\BASE Document Pipeline

To display a variable in Windows

1. Open a command line.

2. Execute set_ECM.

To display a variable in UNIX

1. Log on as the root user.

2. Execute the file etc/opentext/conf_dirs/00SPAWNER.conf.

3. Set the environment of the archive user by executing the .$CONFIG/setup/profile file.

4. Enter the following command: echo $ <Name of the variable>, for example
echo $ <ECM_DOCUMENT_PIPELINE_CONF>.
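
The variables can also be listed from a script instead of being queried one by one. The following Python sketch is an illustration only and is not part of the product. It assumes that the variables are set in the environment of the calling process and that their names start with ECM_, as in the list above; the function name ecm_variables is hypothetical.

```python
import os


def ecm_variables(environ=None, prefix="ECM_"):
    """Return the variables whose names start with `prefix`, sorted by name."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in sorted(env) if name.startswith(prefix)}


if __name__ == "__main__":
    # Print one NAME=value line per matching variable.
    for name, value in ecm_variables().items():
        print(f"{name}={value}")
```

If the script prints nothing, the environment of the archive user has probably not been set up in the current session.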


1.3 Further Information


This manual
This manual is available in PDF and HTML format and can be downloaded via the
OpenText My Support (https://knowledge.opentext.com/docs).

OpenText recommends reading the following documentation in addition to this guide:

Release Notes
The Release Notes describe in detail the software supported with the product
and important dependencies, as well as any last-minute changes regarding the
documentation that should be made known. The current version of the
OpenText Document Pipelines Release Notes is available via the OpenText My
Support at https://knowledge.opentext.com/knowledge/llisapi.dll/open/14711375.
Depending on the OpenText products you use, you may also need the
Release Notes of other products that are available in the My Support.
Installation Guides
The installation guides describe the standard installation of the components
required. In particular, the following guides contain installation instructions for
OpenText Document Pipelines:
Document Pipeline Base, SAP, DocuLink, and ELS
OpenText Document Pipeline - Installation and Upgrade Guide (AR-IDPDP), see
Document Pipeline Downloads in OpenText My Support
(https://knowledge.opentext.com/knowledge/llisapi.dll/open/14711031).
TCP Document Pipelines
OpenText Transactional Content Processing - Installation Guide (TCP-IGD), see
OpenText Transactional Content Processing in OpenText My Support
(https://knowledge.opentext.com/knowledge/llisapi.dll/open/14503093)
Document Pipeline for Content Server
Section 6.4.4 “Installing Document Pipeline for Content Server” in OpenText
Imaging Enterprise Scan - Installation Guide (CLES-IGD)
File System Archiving Document Pipeline
OpenText File System Archiving - Installation Guide (FA-IGD)

Customizing and administration guides


The OpenText Archive Center - Administration Guide (AR-ACN) contains important
information on basic administration tasks that are also required for OpenText
Document Pipelines.

OpenText Online (http://online.opentext.com/) is a single point of access for the
product information provided by OpenText. You can access the following support
sources through OpenText Online:

• Communities
• Knowledge Center


OpenText Online Communities
(https://communities.opentext.com/communities/cs.dll/open/OpenTextOnlineCommunity)
provide the following resources:

• Usage tips, help files, and best practices for customers and partners.
• Information on product releases.
• User groups and forums where you can ask questions to OpenText experts.

The OpenText My Support (https://knowledge.opentext.com) is OpenText's
corporate extranet and primary site for technical support. The My Support is the
official source for the following:

• Product downloads, patches, and documentation including Release Notes.


• Discussion forums, Online Communities, and the Knowledge Base.
• OpenText Developer Network (OTDN), which includes developer
documentation and programming samples for OpenText products.

If you need additional assistance, you can find OpenText Corporate Support
Contacts at http://support.opentext.com/.

Part 2 General Document Pipeline Process

This part describes the concept of Document Pipelines and the components, files and
environment variables that are involved in the configuration and operation of
pipelines.

Chapter 2

Document Pipeline Concept

Note: This chapter provides general information on Document Pipeline
concepts. For a more technical description, see “Document Pipeline Process on
the Server” on page 21.

Conveyor belt analogy
A Document Pipeline is the basic component in almost all document processing
software and is used, for instance, to transfer documents to a storage system or
another application while performing certain additional tasks. Speaking figuratively,
a Document Pipeline is the conveyor belt that transfers the documents through the
software. Individual tools (called DocTools) along the way retrieve the documents
from the conveyor belt, process them one by one, and then return them to be
processed by the next tool. The last tool in the pipeline generally removes the
document from the conveyor belt. Depending on the configuration, Document
Pipelines can contain various DocTools to implement different kinds of
document processing, and further tools can be added as required.

Transactional
An important principle for all Document Pipelines is that processing is always
transactional. That means the processing status of the document is always defined –
either it has been processed by a specific DocTool or not – and no documents can get
lost. If for any reason the Document Pipeline is aborted or processing is cancelled at
any time, the document is considered to be unprocessed by the last active DocTool.
The current status is retained at all times. Therefore, when the Document Pipeline is
started again, processing can continue at precisely the same step the document was
at when the program was aborted.

DPDIR
Technically, the Document Pipeline is implemented as a special document directory
(DPDIR). The Document Pipeline directory is the physical workspace for the
DocTools. The DPDIR is created during the installation of a Document Pipeline host.
For every document that is enqueued into a Document Pipeline, a subdirectory
called Document Directory is created below the Document Pipeline directory.
All components (files) of the document are copied to this directory. This directory
is the identifier for the contained document when registering with the Document
Pipeliner. For more information, see “Document Pipeliner” on page 18.

Queues and statuses
Queues are present “between” the DocTools to hold the waiting documents. While
a document is in the Document Pipeline, it is always located in exactly one queue.
The documents are processed in the order in which they are enqueued into the
Document Pipeline. While a DocTool processes a document, the following documents
wait in the input queue of the DocTool until they can be processed. After successful
processing, the document is assigned to the target queue specified in the Document
Pipeline definition. This queue is the input queue of the next DocTool in the
Document Pipeline. If an error occurs during document processing, the document is
assigned to an error queue. This means that each DocTool has at least two queues
assigned: the input queue and the error queue. These queues do not exist physically.

Depending on the specific function of a queue, the documents it contains have a
certain status. The following types of queues and statuses are available:

Table 2-1: Document Pipeline statuses

Queue                                            Status
Source queue                                     Document is waiting to be processed
Target queue (≙ source queue of the next tool)   Document has been processed successfully by previous tool; waiting for next
Error queue                                      Document processing failed; error processing required

A document queue has a FIFO (First In – First Out) structure, that is, documents
that enter a queue first are assigned to a DocTool first.

General process
The general process in a Document Pipeline is shown in Figure 2-1 and described in
the following steps:

1. A document is taken from a defined exchange directory and placed in the initial
source queue by a special tool called the enqueue tool. The enqueue tool performs
these steps:

• Copying all document components to the Document Directory below DPDIR.
• Notifying the Document Pipeliner that a document is ready for processing.

For more information, see “Enqueue Tool” on page 20.

2. The Document Pipeliner reads the Document Pipeline configuration and calls
the DocTool with the parameter Document Directory.

3. The DocTool processes the document and returns an operation code for success
or error and the name of the Document Directory. This operation code is set as
the status of the document. You can see the results in the OpenText Document
Pipeline Info (DPInfo) window. If the operation was successful, the document is
placed in the target queue, which corresponds to the source queue of the next
DocTool in the processing chain.

4. If an error occurs during processing, the document is placed in the error
queue of the DocTool and is then handled as follows:

a. It awaits processing by the special stockist tool, which handles errors.
b. The stockist tool places the document in the source queue once again to be
reprocessed.

5. Steps 2 to 4 are repeated for all existing DocTools in the Document Pipeline.

6. The last tool in the pipeline (docrm) removes the document from the Document
Pipeline.

[Figure: EXT_DIR, enqueue tool, DocTool 1, DocTool 2, and docrm; source queues 1 to n
and error queues; Document Directory (DPDIR); stockist tools. The labels 1a, 1b, 2 to 6,
4a, and 4b mark the steps of the process described above.]

Figure 2-1: General process in a Document Pipeline

Note: No document is physically copied but there are internal lists, which
represent the various queues for the different DocTools in the pipeline.
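
The process described above can be modeled in a few lines. The following Python sketch is an illustration only and does not use any product API. The names run_pipeline and max_retries are hypothetical, and two simplifications are assumptions of the sketch: all documents pass one DocTool before the next one starts, and a document is retried only a limited number of times. As stated in the note above, the queues are internal lists of Document Directory names; no files are moved.

```python
from collections import deque


def run_pipeline(doc_dirs, doctools, max_retries=1):
    """Pass document directories through `doctools` in order.

    `doctools` is a list of (name, func) pairs; func(doc_dir) returns True
    on success and False on error. Returns (removed, failed).
    """
    source = [deque() for _ in doctools]  # source queue of each DocTool
    error = [deque() for _ in doctools]   # error queue of each DocTool
    attempts = {}                         # (tool index, doc_dir) -> retries so far
    removed, failed = [], []

    source[0].extend(doc_dirs)            # step 1: enqueue into the first source queue
    for i, (_name, func) in enumerate(doctools):
        while source[i] or error[i]:
            if not source[i]:
                # step 4: the stockist role moves a document back to the source queue
                source[i].append(error[i].popleft())
            doc = source[i].popleft()     # FIFO: the oldest document comes first
            if func(doc):                 # steps 2 and 3: call the DocTool, read the result
                if i + 1 < len(doctools):
                    source[i + 1].append(doc)  # target queue is the next source queue
                else:
                    removed.append(doc)   # step 6: the last tool removes the document
            elif attempts.get((i, doc), 0) < max_retries:
                attempts[(i, doc)] = attempts.get((i, doc), 0) + 1
                error[i].append(doc)
            else:
                failed.append(doc)        # retry limit reached (an assumption of this sketch)
    return removed, failed
```

A document that fails once and succeeds on the retry ends up behind the documents that were processed in the meantime, because it re-enters the source queue at the end.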

Logging the status
Before moving a document to the next queue, a time stamp and the input queue
name of the first DocTool are entered in its status file (DPqStatus). The status file is
transferred together with the document and contains a record of all the processing
steps that the document has already gone through, together with the corresponding
time stamps. The last line always reflects the current queue name of the document.
This file is mainly used internally for recovery after a disturbance in the Document
Pipeline process, as it enables the pipeline to continue processing the document at
precisely the step at which it was stopped.
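
The idea of an append-only status file whose last line gives the current queue can be sketched as follows. This Python example is an illustration only. The file name DPqStatus is taken from this guide, but the line format used here (an ISO time stamp, a space, and the queue name) is an assumption, and the functions log_queue and current_queue are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

STATUS_FILE = "DPqStatus"  # file name from the guide; the line format below is assumed


def log_queue(doc_dir, queue_name, now=None):
    """Append a time stamp and a queue name to the document's status file."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    with open(Path(doc_dir) / STATUS_FILE, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {queue_name}\n")


def current_queue(doc_dir):
    """Return the queue name from the last line, or None if there is no entry."""
    path = Path(doc_dir) / STATUS_FILE
    if not path.exists():
        return None
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return lines[-1].split(maxsplit=1)[1] if lines else None
```

After a restart, calling current_queue for every Document Directory would be enough to rebuild the internal queue lists in this model, because earlier lines are never rewritten.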

Monitoring the process
You can monitor each step in the Document Pipeline processing chain in a special
tool called Document Pipeline Info. Document Pipeline Info shows how many
documents are waiting to be processed by each DocTool. In case of errors, you can
determine which error queues contain documents. For details on the Document
Pipeline Info monitoring tool, see OpenText Document Pipelines - Overview and Import
Interfaces (AR-CDP).


2.1 Document Pipeliner


The Document Pipeliner (DP) controls the flow of documents and the DocTools
involved in a Document Pipeline. Each document to be processed by a Document
Pipeline registers with the Document Pipeliner, and each DocTool involved in the
process must be registered with the Document Pipeliner at startup.

The DP acts as a dispatcher and provides each DocTool with documents to process -
that is, with the path name for the document directories in the <DPDIR>. Once a
DocTool has finished processing a document, it notifies the DP that it is done by
sending it the path name of the document directory and an identifier for the
operation that has been performed. The DP tracks the fact that the document has
been processed by the DocTool in the DPqStatus file, together with the current time
stamp. Afterwards, the DP enters the document into the source queue of the next
DocTool.

The DPqStatus file is used for error recovery. For example, after a crash and
restart, the Document Pipeliner reads the entries for each document in the
DPqStatus file. With the information in these entries, the Document Pipeliner
assigns each document to the queue in which it was located before the crash.

For technical details on the DP, see “Deploying Document Pipelines” on page 21.

2.2 DocTool Concept


Standard DocTools
As described above, the DocTools are the tools that actually process the documents
within a Document Pipeline. Some standard tools are available in almost all
Document Pipelines, such as the enqueue tool, which provides the documents for the
initial source queue, or the docrm tool, which removes the documents from the
Document Pipeline at the very end of the process. Furthermore, any number of
additional DocTools can be executed, for example to store documents in an archive
(doctods), or to insert meta-data in a leading application (for example R3Insert for
insertion in SAP R/3).

You can find a functional description of the most common DocTools in OpenText
Document Pipelines - Overview and Import Interfaces (AR-CDP).

dpconfig
The DocTools to be executed and their order are defined in a special configuration
file (dpconfig) for each Document Pipeline. Each DocTool retrieves the documents
from a specified source queue, carries out an operation, and, depending on the
result of the operation, sends the documents on to a specified target queue or an
error queue. For each possible processing step, an entry is made in the dpconfig file.
For details on the configuration file, see “dpconfig” on page 31.
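
Conceptually, such a configuration is a routing table that maps a source queue and the result of an operation to the next queue. The following Python sketch illustrates only this idea. It does not reproduce the dpconfig syntax, and the queue names, the result values "ok" and "error", and the functions next_queue and tool_order are hypothetical.

```python
# Hypothetical routing table: (source queue, result) -> next queue.
ROUTES = {
    ("q_in", "ok"): "q_store",
    ("q_in", "error"): "q_in_err",
    ("q_store", "ok"): "q_remove",
    ("q_store", "error"): "q_store_err",
}


def next_queue(routes, source_queue, result):
    """Return the queue a document is assigned to after one processing step."""
    try:
        return routes[(source_queue, result)]
    except KeyError:
        raise ValueError(f"no entry for {source_queue!r} with result {result!r}") from None


def tool_order(routes, first_queue):
    """Follow the 'ok' entries to list the source queues in processing order."""
    order, queue, seen = [], first_queue, set()
    while (queue, "ok") in routes and queue not in seen:
        order.append(queue)
        seen.add(queue)
        queue = routes[(queue, "ok")]
    return order
```

Keeping one entry per possible step makes the order of the DocTools a property of the table rather than of the tools themselves, which is why tools can be inserted or removed by changing entries only.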

DocTool types
If the same DocTool is to be executed several times, but with different queues as
input and output sources, you can define DocTool types. DocTool types can be
distinguished by their names. Defining DocTool types is helpful, for instance, if the
same DocTool is used more than once in the same Document Pipeline, allowing for
parallel processing in one pipeline step.


COMMANDS file
As the documents are transferred from one DocTool to another, a means of
communication between the tools is required. Therefore, a COMMANDS file is
created for each document. The COMMANDS file is stored with the document in its
document directory. It contains processing information for the document and can be
extended by any DocTool to include information or parameters for a subsequent
tool, for example a document ID after storing a document to an archive, or
formatting instructions for an XML file. Each entry in the COMMANDS file starts with a
special keyword. This allows each DocTool to scan the file for those keywords that
are relevant to it.

For more information on the COMMANDS file, see OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).
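
Scanning a keyword-based file of this kind can be sketched as follows. The Python example is an illustration only: it assumes one entry per line with the keyword as the first whitespace-separated token, and the functions entries_for and add_entry as well as the keywords used with them are hypothetical. For the actual keywords and their syntax, refer to the guide named above.

```python
def entries_for(commands_text, keywords):
    """Return (keyword, rest) pairs for the lines that start with one of `keywords`."""
    wanted = set(keywords)
    result = []
    for line in commands_text.splitlines():
        parts = line.split(maxsplit=1)
        if parts and parts[0] in wanted:
            result.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return result


def add_entry(commands_text, keyword, value):
    """Return the text with one more entry appended for a subsequent tool."""
    if commands_text and not commands_text.endswith("\n"):
        commands_text += "\n"
    return f"{commands_text}{keyword} {value}\n"
```

Because each tool filters by keyword, entries written for other tools are ignored rather than treated as errors, which lets a tool extend the file without affecting the tools that follow.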

2.3 Providing the Documents: Document Directories


The documents that are to be processed by the Document Pipeline (except for Scan
pipelines, see below) are provided in a specified exchange directory, which is defined
during installation. For each document, a directory called the exchange document
directory is created two levels below the exchange directory.

Within the exchange document directory, the following files are required for the
document to be processed by the Document Pipeline:

• The data file(s) belonging to the document, for example, data, data1, data2, etc.
All files in a document must be in the same directory.
• One of the following files containing the attributes of the document:

– The IXATTR file. For more information, see Section 7.1.2 “Providing the
attributes in an IXATTR file” in OpenText Document Pipelines - Overview and
Import Interfaces (AR-CDP).
– An XML file together with an XSL style sheet. For more information, see Section
7.1.3 “Providing the attributes in XML” in OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).
• The COMMANDS file, containing the processing information and parameters for the
DocTools. For more information, see Section 7.3 “COMMANDS file” in OpenText
Document Pipelines - Overview and Import Interfaces (AR-CDP).
• The indicator file LOG that indicates that the directory is ready to be processed.
The document will be processed by the enqueue tool only if this file is available in
the document directory.
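
The list above can be turned into a simple completeness check. The following Python sketch is an illustration only and is not the check performed by the enqueue tool. The function name missing_items is hypothetical, recognizing the XML and XSL files by their file extensions is an assumption, and the check covers only the general case in which all files are provided in the exchange document directory.

```python
from pathlib import Path


def missing_items(doc_dir):
    """Return a list of the required items that `doc_dir` does not contain yet."""
    names = {p.name for p in Path(doc_dir).iterdir() if p.is_file()}
    missing = []
    if not any(name.startswith("data") for name in names):
        missing.append("data file")
    # Attributes: IXATTR, or an XML file plus an XSL style sheet.
    # Recognizing the latter by file extension is an assumption of this sketch.
    suffixes = {Path(name).suffix.lower() for name in names}
    if "IXATTR" not in names and not {".xml", ".xsl"} <= suffixes:
        missing.append("attribute file")
    for required in ("COMMANDS", "LOG"):
        if required not in names:
            missing.append(required)
    return missing
```

Writing the LOG file last, after all other files are complete, keeps a partially written directory from being picked up.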

Note: Although the general concept for providing documents to the pipeline is
similar for all import scenarios, there are minor differences for Batch import
with attribute extraction. The IXATTR file containing the attributes is
created internally by the first DocTools in the pipeline, and the COMMANDS file is
provided centrally for all documents of the same type in a special configuration
file directory. For more information, see Section 7.2.3 “Providing the
documents for Batch import with attribute extraction” in OpenText Document
Pipelines - Overview and Import Interfaces (AR-CDP).


There are several ways to provide the document directories with the required files
listed above:

• Scanning and storing the files directly from an appropriately configured scan
application (described in the corresponding application documentation). For
more information about Scan pipelines, see Section 12.5.8 “Archiving with the
Document Pipeline for SAP” in OpenText Imaging Enterprise Scan - User and
Administration Guide (CLES-UGD) or Section 12.5.9 “Archiving with the
Document Pipeline for TCP” in OpenText Imaging Enterprise Scan - User and
Administration Guide (CLES-UGD).
• Using the Batch import with attributes provided in advance scenario. The
required attribute files are created beforehand. For more information, see
OpenText Document Pipelines - Overview and Import Interfaces (AR-CDP).
• Using the Batch import with attribute extraction scenario, also referred to as
the COLD scenario, which is primarily used for processing document and print
lists. The required index files are created automatically. For more information, see
Section 7.2 “Batch import with extraction of attributes (COLD)” in OpenText
Document Pipelines - Overview and Import Interfaces (AR-CDP).

Note: The scenario used to provide the documents is reflected by the name of
the Document Pipeline. The first two letters of the name have the following
meaning:

• CO: COLD – attributes are extracted automatically


• EX: Batch import – attributes are provided in advance
• SC: Scanning – attributes are provided by scan application/pipeline

For example, the Document Pipeline named COR3 uses the COLD (attribute
extraction) scenario to transfer documents to an SAP R/3 (or any other SAP)
system.
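
The naming convention can be expressed as a small lookup. This Python sketch is an illustration only; the function name scenario_of is hypothetical, and only the three prefixes listed above are covered.

```python
SCENARIOS = {
    "CO": "COLD: attributes are extracted automatically",
    "EX": "Batch import: attributes are provided in advance",
    "SC": "Scanning: attributes are provided by scan application/pipeline",
}


def scenario_of(pipeline_name):
    """Return the scenario for the first two letters of the name, or None if unknown."""
    return SCENARIOS.get(pipeline_name[:2])
```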

From the exchange directory, the document files are transferred to the DP document
directory using a special (enqueue) tool.

Enqueue Tool
The client provides the documents for processing by executing an enqueue tool
(usually as a scheduled job). This tool performs the following steps:

• Creating a Document Directory (subdirectory) with a unique name below the
Document Pipeline Directory
• Copying all document components from the exchange directory to the Document
Directory on the server on which the Document Pipeline is installed
• Notifying the DP that a document is ready for processing and passing on the
Document Directory and the Document Pipeline to be used in the next step.

Once processing has started, the DPqStatus file is also located in this directory.
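
The three steps can be pictured with the following minimal Python sketch. It is not the actual enqueue tool: the function enqueue, the callback notify_dp, and the use of a random hexadecimal name for the Document Directory are assumptions chosen for illustration. How the real tool names directories and talks to the DP is not described here.

```python
import shutil
import tempfile
import uuid
from pathlib import Path
from typing import Callable


def enqueue(exchange_dir: Path, pipeline_dir: Path, pipeline: str,
            notify_dp: Callable[[Path, str], None]) -> Path:
    """Mimic the three enqueue steps for one document (illustrative only)."""
    # Step 1: create a uniquely named Document Directory (naming is assumed).
    doc_dir = pipeline_dir / uuid.uuid4().hex
    doc_dir.mkdir(parents=True)
    # Step 2: copy all document components from the exchange directory.
    for component in exchange_dir.iterdir():
        if component.is_file():
            shutil.copy2(component, doc_dir / component.name)
    # Step 3: pass the directory and pipeline name to the DP (hypothetical hook).
    notify_dp(doc_dir, pipeline)
    return doc_dir


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    exchange = Path(tmp) / "exchange"
    exchange.mkdir()
    (exchange / "document.tif").write_bytes(b"demo")
    notified = []
    created = enqueue(exchange, Path(tmp) / "dp", "EXR3",
                      lambda d, p: notified.append((d, p)))
    print([f.name for f in created.iterdir()], notified[0][1])
```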

Chapter 3
Deploying Document Pipelines

Document Pipelines can be installed on arbitrary servers, for example, Archive
Center, scan stations, TCP Business Object Layer, or standalone.

Important
For standard Document Pipeline scenarios that have been fully installed and
configured, no further customizing is required by the user to deploy the
pipelines on the server. Simply ensure that the Spawner service is running
properly, provide the documents to be processed in the specified manner, and
start the pipeline (for instance by scheduling a job; see “Starting the Document
Pipelines Using Jobs” on page 27). Additional environment settings are
available to improve performance or adapt to specific requirements; these are
described in “Environment Variables” on page 24. For new or customer-
specific pipelines, additional configuration tasks are always required. For
information, ask your OpenText consultant.

3.1 Document Pipeline Process on the Server


Tip: For general information about Document Pipelines, see “Document
Pipeline Concept” on page 15.

Starting the DP
The Document Pipeliner (DP), which controls and administers the Document
Pipeline processes, is provided as a Spawner component and is included in the
Archive Center installation. The DP is started when the Spawner opens the
corresponding configuration file (30dp.servtab), and then waits for DocTools to
register with it. For details on the Spawner, see also Section 32.2 “Analyzing
processes with spawncmd” in OpenText Archive Center - Administration Guide
(AR-ACN).

Registering the DocTools
The DocTools are configured in further .servtab files and must be started after the
DP in order to work properly. This is ensured by adhering to the naming
conventions for the servtab files; these files are processed in alphabetical order.
Once started, the DocTools try to register with the DP by sending their DocTool type
and the function that is to be called by the DP when a document is waiting to be
processed (service function). Different types of the same DocTool must be registered
individually; see also “DocTool types” on page 18.

When the DP receives a request for registration, it checks the DP configuration files
in the <ECM_DOCUMENT_PIPELINE_CONF>/dpconfig directory to determine which
DocTool types are employed in the Document Pipelines. If the DP finds an entry for
the DocTool, the registration is accepted; if not, it is rejected. Registration is
important because during document processing, documents are only sent to
DocTools that have been registered with the DP. If a DocTool is stopped, it signs off
from the DP first so that the DP no longer sends documents to that tool.


Note: DocTools can be configured to stop automatically when their input
queue is empty.

Tip: To find out which DocTools are registered and active, you can use the
dpctrl command line tool; see “dpctrl” on page 48.

Log files
If registration fails, an entry is made in the DP log file, as well as in the
corresponding DocTool log file. The log files can be found here:

DP log files
Windows
<ECM_LOG_DIR>\DP.log

UNIX/Linux
<ECM_LOG_DIR>/DP.log

DocTool log files


Windows
<ECM_LOG_DIR>\<doctool_name>.log

UNIX/Linux
<ECM_LOG_DIR>/<doctool_name>.log

When all DocTools have registered, the DP is ready to process documents, and runs
in the background.

Using enqueue tool
The client provides the documents for processing by executing an enqueue tool. This
tool is the first DocTool of most Document Pipelines and is usually started by
scheduling the execution of the corresponding pipeline as a regular job
(start<pipeline_name>, for example startEXR3) in the OpenText Administration
Client; see Section 7 “Configuring jobs and checking job protocol” in OpenText
Archive Center - Administration Guide (AR-ACN). This tool checks all directories in
the specified exchange directory until it finds one that contains an indicator file
named LOG. The LOG file indicates that the document directory is complete and ready
to be processed.
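
The check for the LOG indicator file could look roughly like the following Python sketch. The function name ready_directories is an assumption for illustration, and the sketch only inspects the direct subdirectories of the exchange directory; the real enqueue tool may traverse the directory structure differently.

```python
from pathlib import Path


def ready_directories(exchange_dir: Path) -> list[Path]:
    """Return subdirectories of exchange_dir that contain a LOG indicator file."""
    return sorted(
        entry for entry in exchange_dir.iterdir()
        # A directory counts as complete only if the LOG file is present.
        if entry.is_dir() and (entry / "LOG").is_file()
    )
```

A directory without a LOG file is skipped, which mirrors the idea that the provider writes LOG last to signal that all components are in place.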

The enqueue tool then copies the documents from the specified exchange directory
to a defined document directory (by default, <DPDIR>/<providing_server_name>/
m) on the server the Document Pipeline is installed on (provided the client has write
access to the <DPDIR>). For each document, a subdirectory with a unique name is
created, which then uniquely identifies the document. The enqueue tool also sends
the document path to the DP, and informs the DP which pipeline is responsible for
processing.


Document processing by the DP
The DP then sends the document path to the input queue of the first DocTool of the
specified pipeline, as defined in the corresponding dpconfig file. Thus, the DocTool
knows where the document components are located and can begin processing.
When processing is complete, the first DocTool notifies the DP about the status of
the operation (opcode) via the DPqStatus file. Depending on this opcode, the DP then
sends the document path to the next input queue. This continues until the document
has passed through the entire pipeline, at which time document processing is
completed.

Queue processing
Within a single queue, documents entering this queue first are assigned to the
DocTool first. By default, only one DocTool at a time can process documents from
the same queue. That means a queue is blocked as soon as a DocTool starts to access
it. The DP will not provide documents from that queue to a second DocTool.
However, you can specifically define a queue as a non-blocking queue in the
dpconfig file. In this case, several DocTools may read from the same queue
simultaneously. Only the document that is currently being processed is blocked,
thus avoiding a situation in which the same document is processed simultaneously
by several DocTools.

Tip: You should generally define queues as non-blocking to increase
performance. A blocking queue is only useful in rare cases, for example if the
processing order of the documents is relevant.
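
The difference between blocking and non-blocking queues can be modeled with a small Python sketch. The class DocQueue and its methods are hypothetical and only model the behavior described above; they do not reflect how the DP is implemented.

```python
from collections import deque


class DocQueue:
    """Toy model of a DP queue (hypothetical, for illustration only)."""

    def __init__(self, name, non_blocking=False):
        self.name = name
        self.non_blocking = non_blocking
        self.waiting = deque()    # FIFO: documents that entered first leave first
        self.in_progress = set()  # documents currently held by a DocTool

    def put(self, doc):
        self.waiting.append(doc)

    def request(self):
        """Return the next document for a DocTool, or None if none may be handed out."""
        if not self.waiting:
            return None
        if self.in_progress and not self.non_blocking:
            return None  # blocking queue: another DocTool is already working on it
        doc = self.waiting.popleft()
        self.in_progress.add(doc)  # this document is now blocked for other DocTools
        return doc

    def done(self, doc):
        self.in_progress.discard(doc)
```

In this model, a second request on a blocking queue returns nothing until the first document is finished, whereas a non-blocking queue hands out the next document immediately. Because several documents can then be in progress at once, they may finish in a different order than they entered.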

On the other hand, several queues may be assigned to the same DocTool. In this
case, there are different modes to handle the order in which the queues are
processed; see also queuetime, doctime on page 33. Using the standard
configuration, input is provided to the DocTool by the queue that has not been read
from for the longest time (queuetime mode). This mode ensures a balanced
processing of queues.

DP recovery
If the DP process is interrupted for any reason, document processing is resumed
automatically after the DP is restarted. In this case, the defined <DPDIR> is searched
for directories containing a DPqStatus file. Because this file contains the current
status of the document, the DP can determine the next processing step and carry it out.
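
The search for interrupted documents can be sketched as follows. The function name find_interrupted_documents is an assumption for illustration. The sketch only locates the directories; it does not read the DPqStatus file, because its format is not described here.

```python
import os
from pathlib import Path


def find_interrupted_documents(dpdir: Path) -> list[Path]:
    """Return every directory below dpdir that contains a DPqStatus file."""
    found = []
    for root, _dirs, files in os.walk(dpdir):
        if "DPqStatus" in files:
            found.append(Path(root))
    return sorted(found)
```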

Errors during enqueuing
If the enqueue tool fails, an error file containing an error message is created in
the source directory. You must delete this error file before the document can be
enqueued again. If the enqueuing was started by OpenText Administration Client,
the error message also appears in the job's messages.

Tip: For testing purposes, the enqueue tool has an option -test that does not
require a DP. When Enqueext is started with this parameter, it expects the
same directory structure as described above, and copies all files to the
subdirectory test in the source directory.

To see which documents were enqueued to the Document Pipeline while processing a
job, check the file
<ECM_VAR_DIR>/messages/job_start<pipeline_name>_<num>.log,
where start<pipeline_name> is the job for the corresponding pipeline and <num>
is a consecutive number that is incremented every time a job is started.


3.2 Environment Variables


Some general configuration settings for Document Pipelines are stored in
environment variables contained in the COMMON.Setup setup file:

Windows
<ECM_DOCUMENT_PIPELINE_CONF>\config\setup\COMMON.Setup

UNIX/Linux
<ECM_DOCUMENT_PIPELINE_CONF>/config/setup/COMMON.Setup

These settings are used by all DocTools, and contain the connection information for
the storage system, for example, or the common Document Pipeline directory
(<DPDIR>). The most important common settings are described below. In addition,
there are some pipeline-specific settings stored in separate entries, which are only
used by specific Document Pipelines.

A complete list of all available configuration parameters (including those for
Document Pipelines) on Archive Center is provided in Part VII “Configuration
parameter reference” in OpenText Archive Center - Administration Guide (AR-ACN).

Table 3-1: Environment variables for Document Pipelines

Variable Possible value(s) Description


General variables
DPHOST Document Pipeline host
ALPORT Port number for the ArchiveLink connection
ALHOST Host name for the ArchiveLink connection,
that is Archive Center connected via HTTP.[a]
DPPORT Port number of the Document Pipeline
(default: 4032)
DPDIR Common Document Pipeline directory for
documents
EXT_DIR External directory for batch import of
documents (content and attributes) into all EX..
pipelines (for example EXR3)
DATA_DIR Exchange directory for COLD documents
(content prepared for automatic extraction of
attributes); the root of the directory structure
for document/print lists
ECM_DOCUMENT_PIPELINE_CONF The root directory for the Document Pipeline
configuration files

CONFIG_DIR The root directory for the batchimport
configuration files
(<ECM_DOCUMENT_PIPELINE_CONF>/
config/batchimport)
DPCTO <sec>/<sec> DPCTO sets the timeouts for calls from the DP to a DocTool.
The first parameter is the “call timeout”: the overall maximum time
(in seconds) that the DP waits for an answer from a DocTool it called
(default: 5 sec.). After this period of time, the call is no longer
repeated and is marked as erroneous. Increase the value for slow systems.
The second parameter is the “retry timeout” (in seconds), which indicates
how long the DP waits for an answer from a DocTool before it repeats the
call. This parameter generally has a far smaller value than the first
one. In case of an error, the call is repeated as long as the “call
timeout” has not yet been reached.
Example: DPCTO=60/5. The DP repeats the call every 5 seconds if it does
not get an answer from a DocTool. After 60 seconds, the DP stops calling.
DTCTO <sec>/<sec> DTCTO sets the timeouts for calls from a DocTool to the DP.
The first parameter is the “call timeout”: the overall maximum time
(in seconds) that a DocTool waits for an answer from the DP
(default: 5 sec.). After this period of time, the call is no longer
repeated and is marked as erroneous. Increase the value for slow systems.
The second parameter is the “retry timeout” (in seconds), which indicates
how long a DocTool waits for an answer from the DP before it repeats the
call. This parameter generally has a far smaller value than the first
one. In case of an error, the call is repeated as long as the “call
timeout” has not yet been reached.
Example: DTCTO=60/5. A DocTool repeats the call every 5 seconds if it
does not get an answer from the DP. After 60 seconds, the DocTool
stops calling.
RFCSYS The SAP system ID (alternative to specifying
the R3_DESTINATION value in the COMMANDS
file; see OpenText Document Pipelines - Overview
and Import Interfaces (AR-CDP))
NUMBER_MEX_DIR Number of document directories to be created
below <DPDIR> (default: 1)
Log settings

LOG_DEBUG ON, OFF Log level debug; very detailed information
Default: OFF
LOG_ENTRY ON, OFF Log level entry; function calls
Default: OFF
LOG_ERROR ON Log level error; error messages
You cannot change this setting.
LOG_WARNING ON, OFF Log level warning; warnings
Default: OFF
LOG_REL ON, OFF Log level relative time; indicates how long
each logged action took
Default: OFF
LOG_INFO ON, OFF Log level info; informational message
Default: OFF
LOG_DSH ON, OFF Log level for HTTP communication
Default: OFF
MAXLOGSIZE Maximum size of a log file (in bytes)
Default: 500000
Special variables
NFS_HOST For COLD pipelines only:
Name of the UNIX host from which
documents are enqueued to a Document
Pipeline on a Windows system
FILES_TO_CONVERT regular expression For COLD pipelines only:
Files for which encoding must be converted,
for example for Unicode systems
COLD_RM ON, OFF, with directory For pipelines using enqueueco only:
Defines whether the files contained in the document directory should be
removed after the document is processed (remove: ON). If the directory
itself should be removed as well, use with directory.
INSERTS_PER_RFC For SAP pipelines only:
The number of insert statements to be sent to
the SAP system within one RFC call; see
OpenText Document Pipelines - Overview and
Import Interfaces (AR-CDP)
[a] Short for ArchiveLink host; specifies the Archive Center connected over HTTP. It replaces DSHOST.

This variable is used by all DocTools that use the Archive API (for example, doctods), as well as by the
DP. To determine the contents of the ALHOST variable, you can use the dpctrl adms command.
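
The <sec>/<sec> format of DPCTO and DTCTO can be illustrated with a short Python sketch. The names Timeouts, parse_timeouts, and retry_times are assumptions for this illustration. The retry schedule is a simplified model derived from the example above (repeat every retry interval until the call timeout is reached), not a statement about the exact timing used by the DP.

```python
from typing import NamedTuple


class Timeouts(NamedTuple):
    call_timeout: int   # overall maximum wait, in seconds
    retry_timeout: int  # wait before the call is repeated, in seconds


def parse_timeouts(value: str) -> Timeouts:
    """Parse a '<sec>/<sec>' value such as '60/5'."""
    call_s, retry_s = value.split("/")
    call, retry = int(call_s), int(retry_s)
    if call <= 0 or retry <= 0:
        raise ValueError("both timeouts must be positive")
    return Timeouts(call, retry)


def retry_times(t: Timeouts) -> list[int]:
    """Seconds after the first call at which the call is repeated (assumed model)."""
    return list(range(t.retry_timeout, t.call_timeout, t.retry_timeout))


timeouts = parse_timeouts("60/5")
print(timeouts, retry_times(timeouts))
```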

DocTool option env

There is a DocTool option -env <PKG1>,<PKG2>,... to read the setup
variables from each package specified, for example:

doctods -type doctods1 -env FILING

This allows you to define a separate environment for the individual DocTools,
for example, to specify different ports for different tools. The FILING.Setup
file contains the common variables for all DocTools relevant for batch import
(for example Prepdoc, Enqueext, Enqueco). You must define the arguments for
the -env option in the corresponding servtab file that registers the DocTool.
For details on modifying the servtab files, see “servtab” on page 43.

Alternatively, you can specify an argument for the scheduled job; see “Starting
the Document Pipelines Using Jobs” on page 27.

3.3 Starting the Document Pipelines Using Jobs


Document Pipelines for batch import (all pipelines whose names start with “CO” or
“EX”) are started by scheduling the execution of the corresponding pipeline as a job
in the Administration Server; see Section 7 “Configuring jobs and checking job
protocol” in OpenText Archive Center - Administration Guide (AR-ACN). For each
batch import standard pipeline, a predefined job named start<pipeline_name> is
created during installation, for example startEXR3. You can schedule the job to be
executed in regular intervals, so that the documents in the Document Pipeline
directories are processed periodically.

For special requirements, additional arguments can be defined for the job.

Table 3-2: Job arguments for Document Pipelines

Argument: -test
Description: The enqueue tool copies all the files to the subdirectory .test of the
source directory, for example
<ECM_DOCUMENT_PIPELINE_CONF>/config/batchimport/commands/<document_type>

Argument: -env <setup file name1 (without .setup extension)>,..,<setup file name[n]
(without .setup extension)>
Description: The setup variables from the package specified in the setup file are used.
Example: -env MYENV

Argument: -ext_dir
Description: The specified exchange directory is used instead of the default directory.
For Windows: if necessary, replace any “\” in the original path with “\\”, and any
“\\” in the original path with “\\\\”; alternatively, use “/”
Examples: E:/Scan_provider/EXT_DIR, \\\\host\\Scan_Provider\\EXT_DIR

Argument: -loglevel <0-12>
Description: Increases or decreases the default log level; useful for troubleshooting;
see OpenText Document Pipelines - Overview and Import Interfaces (AR-CDP)
Example: -loglevel 10

Argument: -nfshost <host>
Description: Documents are imported from a UNIX host to a pipeline on a Windows
system.
Example: -nfshost hpserver1

Argument: -inputenc <encoding>
Description: Each file defined in the variable FILES_TO_CONVERT is converted from
the specified encoding to the encoding used internally (UTF8N). Use mainly for the
IXATTR or data files; character conversion should not apply to any binary files and
other configuration files (for example FORM.INFO) that are comprised of ASCII
characters.
Example: -inputenc ISO8859_1

Argument: -outputenc <encoding>
Description: Converts the output files to the specified encoding, for example for
generated IXATTR files for non-unicode systems.
(default encoding: UTF8)
Example: -outputenc ISO8859_1
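
The backslash rule for -ext_dir on Windows amounts to doubling every backslash in the original path. A minimal Python sketch, with the function name escape_ext_dir chosen only for illustration:

```python
def escape_ext_dir(path: str) -> str:
    """Double every backslash, as described for -ext_dir on Windows."""
    return path.replace("\\", "\\\\")


# A UNC path: the leading "\\" becomes "\\\\", each single "\" becomes "\\".
print(escape_ext_dir(r"\\host\Scan_Provider\EXT_DIR"))
# Paths written with forward slashes need no change.
print(escape_ext_dir("E:/Scan_provider/EXT_DIR"))
```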

Part 3
Reconfiguring Document Pipelines

This part describes:

• Structure of configuration files to be created or modified for inserting or
removing DocTools.
• Command line tools for applying changes in Document Pipelines.
• How to insert and remove DocTools from Document Pipelines.
• How to optimize the performance of Document Pipelines.

We recommend that you create a new document pipeline based on a copy of an
existing one instead of modifying an existing standard pipeline.

Chapter 4
Standard Configuration Files

4.1 dpconfig
The configuration file for the DP (dpconfig) defines which DocTools are executed in
which order for each Document Pipeline. Each DocTool retrieves the documents
from a specified source queue, carries out an operation, and, depending on the
result of the operation, sends the documents on to a specified target queue or an
error queue.

If the same DocTool is to be executed several times, but with different queues as
input and output sources, you can define DocTool types. DocTool types can be
distinguished by their names. The result of a DocTool operation is indicated by the
opcode in the DPqStatus file.

Thus, for each DocTool type and each possible opcode, the dpconfig file contains an
entry with the following syntax:

<source queue1>.<doctooltype1>.<opcode1> → <target_queue1>


<source queue1>.<doctooltype1>.<error> → <error_queue1>

The special queue type nil is used for the beginning and end of the pipeline. nil as
a source indicates that the DocTool in question is one that creates document
directories. When nil is the target, the DP removes the processed document from its
administration.

The first entry in the dpconfig file has the following syntax:

nil.<doctooltype1>.done → <target_queue1>

The last DocTool entry in the dpconfig file has the following syntax:

<source queue_n>.<doctooltype_n>.ok → nil

Based on these entries, the Document Pipeliner (DP) can determine where to
continue document processing after an interruption by checking which queue the
document is currently in, which opcode is assigned to it, and then finding the correct
entry for that combination in the dpconfig file.
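
The lookup described above can be pictured as a table keyed by source queue, DocTool type, and opcode. The following Python sketch parses entries of this form and looks up the target queue. The names ENTRY, parse_routes, and SAMPLE are assumptions for illustration; the sample entries follow the syntax shown in this section, and the parser accepts both "->" and "→" as the arrow.

```python
import re

# <source queue>.<doctool type>.<opcode> -> <target queue>
ENTRY = re.compile(r"^\s*([^.\s]+)\.([^.\s]+)\.([^.\s]+)\s*(?:->|→)\s*(\S+)\s*$")


def parse_routes(text: str) -> dict[tuple[str, str, str], str]:
    """Map (source queue, DocTool type, opcode) to the target queue."""
    routes = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        match = ENTRY.match(line)
        if match:
            source, tool, opcode, target = match.groups()
            routes[(source, tool, opcode)] = target
    return routes


SAMPLE = """
nil.EnquedocExR3.done -> ExR3Xsl
ExR3Xsl.xsl_parser.ok -> ExR3perldtn
ExR3Xsl.xsl_parser.error -> ExR3Xsl_error
ExR3Remove.docrm.ok -> nil
"""

routes = parse_routes(SAMPLE)
# Where does a document go after xsl_parser reports "ok" on queue ExR3Xsl?
print(routes[("ExR3Xsl", "xsl_parser", "ok")])
```

Lines that are not opcode entries (for example parameter lines) simply do not match the pattern and are skipped.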

Additional entries

In addition to the DocTool opcode entries, other parameters are available to
configure processing by the DocTools. The syntax for all parameters is:

<doctool_type>: <parameter>

Tip: You can find a sample dpconfig file in OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP).


Comments (#)
Use the # character to mark comments in the dpconfig file that are to be ignored
by the DP.
<number>
A number defines how many DocTools of the specified type may run
simultaneously in the pipeline. If this number is exceeded when a DocTool tries
to sign on to the DP, the DP rejects the new instance of the DocTool.
Having several DocTools of the same type run simultaneously is useful, for
instance, if the queues cannot be processed quickly enough. The stockist tools,
for example, which process error queues, may require some time before the
errors can be solved. Meanwhile, the error queues may overflow if there is only
one stockist tool to do the job.
stopnull
Instructs a DocTool of this type to terminate when its queue is empty. By
default, a DocTool continues to wait for further documents when its queue is
empty. This setting is recommended for the stockist DocTool to keep
erroneous documents in the error queue (otherwise they would be permanently
moved from the error queue to the source queue and back).
runonly
Instructs the DP not to supply any other DocTool with documents while the
specified DocTool type is running. This mechanism can be used to let the
stockist run without any other DocTools disturbing its work (for example
cycling documents).
tellnowork
Instructs the DP to inform a DocTool of this type when it has no more
documents to process. It is up to the DocTool to react to this command (for
example by signing off from the DP).
disabled
Keeps the DocTool running, but not active; useful for test purposes.
(<sec>)
Sets a timeout for the DocTools of this type. If a DocTool of this type requires
more than <sec> seconds to process the document, the DP signs the DocTool off
and instructs it to terminate. This timeout enables the DP to recognize that a
DocTool has finished execution without signing off (for example when the
DocTool has been terminated by an external kill command).
The timeouts should be set to sufficiently large values. If a DocTool exceeds the
timeout under normal running conditions, the document is resubmitted to the
DocTool. This can result in a heavy load being placed on the machine on which
the DocTool runs. We do not recommend changing the timeouts for the
standard DocTools. If this is unavoidable, the new value should be carefully
tested.


Note: The Document Pipeline Info window is a special DocTool that
behaves somewhat differently. Since it is not possible to estimate how long
the Document Pipeline Info window will be active, it contacts the DP itself
at regular intervals to signal that it is still active. This time interval is set (in
seconds) using this parameter (for example DPinfo: (600)). If no timeout
is defined for the Document Pipeline Info window in the dpconfig file, the
lifetime of the window is not monitored by the DP.
non-blocking queues: <queue_name> +
By default, only one DocTool at a time can process documents from the same
queue as configured in the dpconfig file. That means a queue is blocked as soon
as a DocTool starts to access it. The DP will not provide documents from that
queue to a second DocTool.
However, you can specifically define a queue as a non-blocking queue, using the
“+” parameter. In this case, several DocTools may read from this queue
simultaneously. Only the document that is currently being processed is blocked,
thus avoiding a situation in which the same document is processed
simultaneously by several DocTools.
Note that using non-blocking queues may result in documents being processed
in the wrong order by the DocTools.
queuetime, doctime
As a rule, a document queue has a FIFO structure, which means documents
entering this queue first are assigned to the DocTool first. This strategy cannot
be changed. However, several queues may be assigned to the same DocTool. In
this case, there are different modes to handle the order in which the queues are
processed.

• immediate mode
The DocTool receives documents from the queues in the order of their
definition in the dpconfig file. When a queue is empty, the DP switches to
the next queue.
The disadvantage of this mode is that documents in queues that are
configured last in the dpconfig file may have to wait a very long time for
processing, or in the worst case, are never processed at all.
The immediate mode is the default if no other mode is specified.
• doctime mode
In this mode, the DocTool receives input from the queue with the oldest
document (according to the time the document entered the queue). This
avoids individual documents being left unprocessed for longer periods;
however, it may cause other queues that have been filled more recently to
overflow.
If no mode is specified and there are no more documents from immediate
queues, the DP selects documents from the doctime queues.


• queuetime mode
In queuetime mode, input is provided by the queue that has not been read
from for the longest time.
This mode offers the most balanced processing of queues and is thus
preconfigured for most standard pipelines.
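
The three modes differ only in how the next queue is chosen. The following Python sketch models that choice. The class QueueState and the function pick_queue are hypothetical names for this illustration, and the timestamps are plain numbers; the sketch does not model the fallback between immediate and doctime queues mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class QueueState:
    name: str
    # Entry times of the waiting documents, oldest first (FIFO).
    doc_entry_times: list = field(default_factory=list)
    last_read: float = 0.0  # time at which the queue was last read from


def pick_queue(queues, mode):
    """Pick the queue to read from next; queues are in dpconfig definition order."""
    candidates = [q for q in queues if q.doc_entry_times]
    if not candidates:
        return None
    if mode == "immediate":
        return candidates[0]  # first non-empty queue in definition order
    if mode == "doctime":
        return min(candidates, key=lambda q: q.doc_entry_times[0])  # oldest document
    if mode == "queuetime":
        return min(candidates, key=lambda q: q.last_read)  # longest unread queue
    raise ValueError(f"unknown mode: {mode}")


queues = [
    QueueState("A", [10.0], last_read=5.0),
    QueueState("B", [3.0], last_read=9.0),
    QueueState("C", [7.0], last_read=1.0),
]
for mode in ("immediate", "doctime", "queuetime"):
    print(mode, pick_queue(queues, mode).name)
```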

Figure 4-1 illustrates the different behavior.


Figure 4-1: Queue processing order for different modes

Example 4-1: Extract from a dpconfig file

Here is an extract from a sample dpconfig file:

nil.EnquedocExR3.done -> ExR3Xsl

ExR3Xsl.xsl_parser.ok -> ExR3perldtn


ExR3Xsl.xsl_parser.error -> ExR3Xsl_error
ExR3Xsl_error.exr3stock0.ok -> ExR3Xsl
ExR3Xsl_error.stockist.ok -> ExR3Xsl
ExR3Xsl + queuetime

ExR3perldtn.ExR3start.ok -> ExR3Tiff2Mtiff


ExR3perldtn.ExR3start.error -> ExR3perldtn_error
ExR3perldtn_error.exr3stock1.ok -> ExR3perldtn
ExR3perldtn_error.stockist.ok -> ExR3perldtn
ExR3Xsl + queuetime
...
ExR3Remove.docrm.ok -> nil
ExR3Remove.docrm.error -> ExR3Remove_error
ExR3Remove_error.exr3stock8.ok -> ExR3Remove
ExR3Remove_error.stockist.ok -> ExR3Remove
ExR3Xsl + queuetime

exr3stock0: runonly
exr3stock1: runonly

...
exr3stock0: stopnull
exr3stock1: stopnull
...
exr3stock0: 1
exr3stock1: 1

ExR3start: 1
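
The parameter lines at the end of the extract follow the <doctool_type>: <parameter> syntax and may occur several times per DocTool type. As a rough illustration, the following Python sketch collects them per type. The function name parse_parameters is an assumption, and the values are kept as plain strings without interpreting them.

```python
from collections import defaultdict


def parse_parameters(text: str) -> dict[str, list[str]]:
    """Collect '<doctool_type>: <parameter>' lines per DocTool type."""
    params = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line or "->" in line:
            continue  # not a parameter line
        tool, value = (part.strip() for part in line.split(":", 1))
        params[tool].append(value)
    return dict(params)


SAMPLE_PARAMETERS = """
exr3stock0: runonly
exr3stock0: stopnull
exr3stock0: 1
ExR3start: 1
"""

print(parse_parameters(SAMPLE_PARAMETERS))
```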

4.2 dpinfo
OpenText Document Pipeline Info (DPInfo) is a utility for monitoring OpenText
Document Pipelines. With it, you can monitor the pipeline processes, making sure
that documents have been correctly processed. If an error occurs, you can quickly
locate the problem. DPInfo shows how many documents are waiting to be processed
by each DocTool. In case of errors, you can determine which error queues contain
documents.

DocTools create a protocol file named DPprotocol for every document. This file
contains brief information about the document processing results. The DPprotocol
file is displayed by DPInfo.

For details on the DPInfo monitoring tool, see OpenText Document Pipelines -
Overview and Import Interfaces (AR-CDP). In the dpinfo file, you can configure how
the information about the DocTools is displayed in DPInfo.

All DocTool types used in a pipeline are included in the flow construct:
flow("Pipeline description")

queue( ...)

...
queue( ...)

The Pipeline description is displayed in the DPInfo window.

For each DocTool type, the dpinfo file contains a queue entry with the following
syntax:

queue("DocTool description", "DocTool Name (Type)", "source queue for documents" [,"error queue"] ) {
stockist("stockist DocTool")
}

Table 4-1: queue entry parameters in the dpinfo file

Parameter Description
DocTool description Arbitrary DocTool description that is displayed in the
DPInfo window.
DocTool Name (Type) DocTool to be executed.
source queue for documents The DocTool takes the documents from this source queue.
error queue If errors occur during processing, the affected documents
are moved to this error queue. This parameter is optional.
stockist DocTool DocTool that returns documents from the error queue to
the source queue.

To support different languages, you can provide a lang construct for each language
containing the translated description strings for the flow construct and the queue
entries. DPInfo currently supports Japanese (language code JPN) and German
(language code GER). The syntax is as follows:

lang("3 char language code") {
"Pipeline description in source language"="Pipeline description in target language"
"DocTool1 description in source language"="DocTool1 description in target language"
....
"DocTooln description in source language"="DocTooln description in target language"
}

Special characters have to be encoded in Java Unicode, starting with \u. We
recommend using a Java Unicode conversion tool to create or edit text with many
special characters.
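
If no conversion tool is at hand, the escapes can also be produced with a few lines of Python. This is a minimal sketch: the function name to_java_unicode is an assumption, and the sketch leaves ASCII characters unchanged and escapes everything else.

```python
def to_java_unicode(text: str) -> str:
    """Replace every non-ASCII character with its \\uXXXX escape."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")
        else:
            # Characters outside the Basic Multilingual Plane are written
            # as a UTF-16 surrogate pair.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


print(to_java_unicode("Dokument prüfen"))
```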


Example 4-2: Extracts from a dpinfo file

This example shows extracts from a dpinfo file and how this configuration
is visible in the screenshot below. For the flow construct and for each queue
entry, a line appears in the DPInfo window.
flow( "Import content and attributes into DocuLink (EXR3)" )
{
queue( "Parse document by XSL", "xsl_parser", "ExR3Xsl" )

{
stockist ("exr3_stock_xsl_parser")
}
queue( "Check document", "ExR3start", "ExR3Perldt" )
{
stockist ("exr3_stock_exr3start")
}
queue( "Copy document to document pipeline", "cpfile",
"ExR3Cpfile" )
{
stockist("exr3_stock_cpfile")
}
queue( "Convert TIFF to Multi-page TIFF", "Tiff2Mtiff",
"ExR3Tiff2Mtiff" )
{
stockist("exr3_stock_tiff2mtiff")
}
queue( "Select Archive ID from R/3","R3AidSel", "ExR3AidSel" )
{
stockist ("exr3_stock_r3aidsel")
}
queue( "Remove document from document pipeline", "docrm",
"ExR3Remove" )
{
stockist ("exr3_stock_docrm")
}
}
lang("JPN") {
"Import content and attributes into DocuLink (EXR3)"="COLD for DocuLink: NCI \u6587\u66f8 (EXR3)"
"Parse document by XSL"="XSL \u30d7\u30ed\u30bb\u30c3\u30b5\u306b\u3088\u308a\u6587\u66f8\u3092\u51e6\u7406\u3057\u307e\u3059"
"Check document"="\u6587\u66f8\u3092\u30c1\u30a7\u30c3\u30af\u3057\u307e\u3059"
"Convert TIFF to Multi-page TIFF"="Multi-page TIFF \u30d5\u30a1\u30a4\u30eb\u3092\u4f5c\u6210\u3057\u307e\u3059"
"Copy document to document pipeline"="\u6587\u66f8\u3092\u30d1\u30a4\u30d7\u30e9\u30a4\u30f3\u3078\u30b3\u30d4\u30fc\u3057\u307e\u3059"
"Select Archive ID from R/3"="\u30a2\u30fc\u30ab\u30a4\u30d6 ID \u3092 R/3 \u30b7\u30b9\u30c6\u30e0\u304b\u3089\u9078\u629e\u3057\u307e\u3059"
...
"Remove document from document pipeline"="\u6587\u66f8\u3092\u30d1\u30a4\u30d7\u30e9\u30a4\u30f3\u304b\u3089\u524a\u9664\u3057\u307e\u3059"
}

lang("DEU") {
"Import content and attributes into DocuLink (EXR3)"="COLD f\u00fcr DocuLink: Attribut\u00fcbergabe im Voraus (EXR3)"
"Parse document by XSL"="Dokument mit XSL parsen"
"Check document"="Dokument pr\u00fcfen"
"Convert TIFF to Multi-page TIFF"="TIFF nach Multipage-TIFF konvertieren"
"Copy document to document pipeline"="Dokument in die Document Pipeline kopieren"
"Select Archive ID from R/3"="Archiv-ID in R/3 ausw\u00e4hlen"
...
"Remove document from document pipeline"="Dokument aus der DocumentPipeline entfernen"
}

Figure 4-2: OpenText Document Pipeline Info (DPInfo) window


4.3 monitor
Use the OpenText Archive Monitoring Web Client to check the activities of the
individual archive components and the free storage space in the pools, database, and
the OpenText Document Pipeline. For a detailed description, see Section 28 “Using
OpenText Archive Server Monitoring” in OpenText Archive Center - Administration
Guide (AR-ACN). The configuration of the Archive Monitoring Web Client is saved
in the monitor files that are located in the directory
<ECM_DOCUMENT_PIPELINE_CONF>/config/monitor.

The monitor configuration files have the following structure:

# Comment
"Group" = {
group = nul {}
component_name = component_type {
parameter1 = …
parameter2 = …
}
}

The Archive Monitoring Web Client shows the status of the configured components,
i.e. DocTools.

You can choose the group name (“Group” in the structure above) and each
component_name freely. Every group must contain a group entry. In most cases,
its value is nul {}, but any other component_type can also be used.
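
As a minimal sketch of how the one-line component entries of a monitor file (component_name = component_type { parameters }) could be read by a script, the following Python function splits such a line into its parts. The function name parse_component is hypothetical, and the sketch assumes that an entry fits on one line and that parameter values contain no spaces.

```python
import re

# Matches lines such as:  xsl_parser = dpt { toolname= xsl_parser }
COMPONENT_RE = re.compile(
    r'^\s*"?(?P<name>[^"=\s]+)"?\s*=\s*(?P<type>\w+)\s*\{(?P<params>[^}]*)\}\s*$'
)


def parse_component(line):
    """Return (name, type, params) for a one-line component entry, else None."""
    match = COMPONENT_RE.match(line)
    if match is None:
        return None  # group header, closing brace, comment, or blank line
    # Parameters are 'key = value' pairs inside the braces.
    params = dict(re.findall(r"(\w+)\s*=\s*(\S+)", match.group("params")))
    return match.group("name"), match.group("type"), params
```

Group headers such as "Group" = { and closing braces return None, so a caller can simply skip them.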

The component types dpt and dpq_error are used in monitor files:

dpt

Description
dpt checks whether a DocTool is lazy, disabled or working. To check the results
of a dpt entry manually, use the command line call dpctrl tools <toolname>.
If a DocTool terminates unexpectedly (as opposed to being stopped regularly),
this is not visible in the Archive Monitoring Web Client.

Parameters

hostname   Machine name where the DP runs (default: localhost)
port       Port number (default: 4032)
versnum    Version number of the RPC procedure (default: 1)
prognum    Program number of the RPC procedure (default: 20231190)
protocol   Communication protocol (default: udp)
toolname   Defines the name of the DocTool
maxrun     Defines the number of DocTools that have to be online (= number of lazy
           DocTools plus number of working DocTools).
           If the number of lazy DocTools plus the number of working DocTools is
           less than maxrun, the status changes to WARNING, otherwise the
           status is OK. If maxrun is not defined, it is set to 0. This means that there
           is no check whether all DocTools are running, i.e. the status is always OK.

Possible status

Registered 0
Warning 50
Not registered 100

dpq_error

Description
dpq_error checks whether a queue is online. To check the results of a
dpq_error entry manually, use the command line call dpctrl queues <queue>.

Parameters

hostname   Machine name where the DP runs (default: localhost)
port       Port number (default: 4032)
versnum    Version number of the RPC procedure (default: 1)
prognum    Program number of the RPC procedure (default: 20231190)
protocol   Communication protocol (default: udp)
queuename  Name of the error queue

Possible status

Empty 0
Can't call server 98
Can't connect to server 99
Not empty 100

The default values for the parameters of a component_type can also be specified by
environment variables. Environment variables define the default value of a
parameter for all components of that type, whereas the parameters in the
monitor configuration files apply only to the individual component for which they are set.

Example 4-3: exr3.monitor

This example shows the standard monitor file for the EXR3 pipeline. The
EXR3 group contains the components, in this case the DocTools, that are to
be monitored (the component type dpt stands for Document Pipeline
Tool). The component names (for example, xsl_parser and ExR3start) at
the beginning of the lines are the names that are displayed in the Archive
Monitoring Web Client. The EXR3 Error Queues group contains the error
queues into which documents are put if an error occurs while the
corresponding DocTool of the EXR3 group processes them.

# DP Tools
#---------
"EXR3" = {
group = nul { }
xsl_parser = dpt { toolname= xsl_parser }
ExR3start = dpt { toolname= ExR3start }
cpfile = dpt { toolname= cpfile }
Tiff2Mtiff = dpt { toolname= Tiff2Mtiff }
R3AidSel = dpt { toolname= R3AidSel}
R3Formid = dpt { toolname= R3Formid}
Prepdoc = dpt { toolname= Prepdoc}
GenR3ins = dpt { toolname= GenR3ins}
page_idx = dpt { toolname= page_idx}
rendition = dpt { toolname= rendition }
doctods = dpt { toolname= doctods }
R3Insert = dpt { toolname= R3Insert}
docrm = dpt { toolname= docrm}
}

"EXR3 Error Queues" = {
group = nul { }
"xsl_parser"= dpq_error { queuename = ExR3Xsl_error }
"ExR3start" = dpq_error { queuename = ExR3Perldt_error }
"cpfile" = dpq_error { queuename = ExR3Cpfile_error }
"Tiff2Mtiff"= dpq_error { queuename = ExR3Tiff2Mtiff_error }
"R3AidSel" = dpq_error { queuename = ExR3AidSel_error }
"R3Formid" = dpq_error { queuename = ExR3Formid_error }
"Prepdoc" = dpq_error { queuename = ExR3Prepdoc_error }
"GenR3ins" = dpq_error { queuename = ExR3GenR3ins_error }
"page_idx" = dpq_error { queuename = ExR3Page_idx_error }
"rendition" = dpq_error { queuename = ExR3Rendition_error }
"doctods" = dpq_error { queuename = ExR3Doctods_error }
"R3Insert" = dpq_error { queuename = ExR3Insert_error }
"docrm" = dpq_error { queuename = ExR3Remove_error }
}


4.4 servtab
The servtab configuration files specify which processes the Spawner has to start.
These processes are the Document Pipeliner and the DocTools. This allows you to
define a separate environment for the individual DocTools, for example to specify
different ports for different tools.

4.4.1 servtab File Structure


The servtab file contains the following elements:

• Definitions of variables that are valid in this servtab file by using the globenv
parameter. With these variables, you can shorten the servtab line entries.

• Servtab line entry for a DocTool, which defines how to start the DocTool; see
Section 4.4.2.

• Comments

The syntax of the globenv variable definitions is as follows:

globenv ; <variable name>=<value>

Use a separate line for each variable. To reference a variable, use the prefix $.
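
The following Python sketch illustrates how globenv definitions and $ references fit together. It is not the Spawner's implementation: the function names parse_globenv and expand are hypothetical, the sketch handles only one NAME=value pair per line, and it assumes that a variable can reference variables defined on earlier lines. Names that are not defined in the file (for example, variables from the process environment) are left untouched.

```python
import re


def expand(text, variables):
    """Replace $NAME with its value; unknown names are left untouched."""
    return re.sub(
        r"\$(\w+)",
        lambda m: variables.get(m.group(1), m.group(0)),
        text,
    )


def parse_globenv(lines):
    """Collect 'globenv ; NAME=value' definitions from servtab lines."""
    variables = {}
    for line in lines:
        head, sep, rest = line.partition(";")
        if head.strip() != "globenv" or not sep:
            continue  # not a globenv line
        name, eq, value = rest.strip().partition("=")
        if not eq:
            continue  # no assignment on this line
        variables[name.strip()] = expand(value.strip(), variables)
    return variables
```

With the definitions collected this way, expand can then be applied to the fields of a servtab line entry to see the values that the shortened notation stands for.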

4.4.2 Syntax of a servtab Line Entry


Each servtab line contains six fields, which are separated by a “;”. Such a line
has the following syntax:

Name_of_the_entry;{once|wait|respawn|manual|stop|kill};{no|yes};[local Environment];[working directory of the program];command with parameters

The following table describes the fields in detail. DocTools are referred to here
as processes, because not all processes started by the Spawner are DocTools
(for example, the Document Pipeliner).

Table 4-2: servtab parameters

Parameter            Option   Description

Name of the entry    -        Process name used in the spawncmd commands. It is shown in
                              the “spawncmd status” output. Must be unique across all
                              servtab files.
Start/Stop mode      once     Start the process only once, and do not wait for its
                              termination. When it dies, do not restart the process.
                     wait     Start the process and wait for its completion before
                              continuing with the next servtab entries. This is used for
                              programs that must have completed before the next program
                              runs.
                     respawn  If the process does not exist, start the process. Do not wait
                              for its termination (continue scanning the servtab files).
                              Restart the process when it dies. If the process exists, do
                              nothing and continue scanning the servtab files.
                     manual   The process is not started during the Spawner startup; it is
                              started only manually with spawncmd start.
                     stop     The normal action, where the Spawner terminates the process,
                              is replaced by calling the stop command. This command is
                              executed when a spawncmd stop is issued, and the Spawner
                              does not stop the process itself. The name of the entry must
                              be the same as that for a normal line starting with
                              respawn/wait/once/manual.
                     kill     The normal action, where the Spawner kills the process, is
                              replaced by calling the kill command. This command is
                              executed when a spawncmd kill is issued, and the Spawner
                              does not kill the process itself. The name of the entry must
                              be the same as that for a normal line starting with
                              respawn/wait/once/manual.
Wait for             yes      Waits until the process is initialized. This is used to
Initialization                prevent programs from starting before a program that they
                              need is initialized. The program indicates to the Spawner
                              that it is initialized by sending the entry name to the named
                              pipe spawner.sync (Unix) or \\.\pipe\spawner.sync (Windows).
                     no       Do not wait.
Local environment    -        Settings for this process only. Parameters have the form
                              PARAM=VALUE and are separated by “.”.
Working directory    -        The directory that the process uses as the current directory.
Command with         -        Call of program, script etc. to be executed.
parameters
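
To make the field layout concrete, the following Python sketch splits a servtab line entry into the six semicolon-separated fields listed in the table. It is an illustration only, not the Spawner's parser: the names ServtabEntry and parse_entry are hypothetical, and the sketch assumes that semicolons occur, if at all, only in the last field (the command).

```python
from typing import NamedTuple


class ServtabEntry(NamedTuple):
    name: str
    mode: str
    wait_for_init: bool
    local_env: str
    workdir: str
    command: str


MODES = {"once", "wait", "respawn", "manual", "stop", "kill"}


def parse_entry(line):
    """Split one servtab line entry into its six fields."""
    # Split at most five times so that the command field stays intact.
    parts = [part.strip() for part in line.split(";", 5)]
    if len(parts) != 6:
        raise ValueError("expected six ';'-separated fields: %r" % line)
    name, mode, wait, env, workdir, command = parts
    if mode not in MODES:
        raise ValueError("unknown start/stop mode: %r" % mode)
    if wait not in ("yes", "no"):
        raise ValueError("wait for initialization must be yes or no: %r" % wait)
    return ServtabEntry(name, mode, wait == "yes", env, workdir, command)
```

An empty fourth field, as in name;once;no;;..., is returned as an empty string, which corresponds to "no local environment".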

Tip: Use the stop option to define a stop instruction for a DocTool (process) in
a way that the DP is aware of this stop. This prevents problems when
restarting a DocTool and updates the DPInfo window.

Example 4-4: servtab file with stop option

...
my_xsl_parser1;once;no;;$BINDIR;$BINDIR/xsl_parser -type xsl_parser_my -loglevel 9 -logfile my_xsl_parser_1.log
my_xsl_parser1;stop;no;;$BINDIR;$BINDIR/dpctrl force xsl_parser_my
...

Use the spawncmd stop my_xsl_parser1 command to stop the DocTool.


4.4.3 servtab Example (For Windows)


The 87exr3.servtab file contains the configuration of the ExR3start DocTool.

Example 4-5: 87exr3.servtab


globenv ; SCRIPTDIR=$ECM_DOCUMENT_PIPELINE_DL\scripts\perl
globenv ; BINDIR=$ECM_DP_PERL_10_0_0\bin
globenv ; PERLDIR=$ECM_DP_PERL_10_0_0\perl-5.8.5\bin
globenv ; PATH="$BINDIR;$PERLDIR;$Path;.";Path=$PATH
globenv ; LOG=$ECM_LOG_DIR
globenv ; PERL5LIB="$ECM_DP_PERL_10_0_0\perl-5.8.5\lib;$ECM_DP_PERL_10_0_0\lib\perl-5.8.5;$ECM_DP_PERL_10_0_0\perl-5.8.5\site\lib"

ExR3start;once;no;;$LOG;$BINDIR\perldtn -type ExR3start -script "$SCRIPTDIR\exr3.pl"

The lines starting with globenv define variables that are valid in this servtab file.
For example, the variable LOG has the value $ECM_LOG_DIR and is used in the last
line as working directory.

The last line of the file, which is the most important one, consists of the following
parts:

ExR3start;   Name of the entry, here the DocTool type.
once;        The DocTool is started only once.
no;          The next line is processed without waiting for the initialization of
             ExR3start.
;            No local environment is specified.
$LOG;        Working directory, given by the LOG variable.
$BINDIR\perldtn -type ExR3start -script "$SCRIPTDIR\exr3.pl"
             • DocTool binary: perldtn
             • DocTool type: ExR3start
             • perldtn script file to be executed, which contains the DocTool code:
               $SCRIPTDIR\exr3.pl

Chapter 5

Command Line Tools

5.1 dtcrt
The dtcrt program is a DocTool that is used to register newly created documents
with the Document Pipeliner (DP). It is used in the interface to host machines and in
the archive interface for text documents. dtcrt signs on to the DP and passes it the
path names of new document directories. dtcrt does not check whether these path
names actually exist or whether they are correct. Usage:

dtcrt [options] directory1 [directory2 ... directoryN]

The following options can be used as arguments of the dtcrt command:

Option               Default value  Description

-type <doctooltype>  dtcrt          DocTool type under which dtcrt signs on to the DP
                                    (required option).
-op <operation>      ok             Operation that dtcrt registers with the DP
                                    (required option).
-dphost <hostname>   localhost      Name of the computer on which the DP runs.
-comm <comment>      by dtcrt       Comment which DP writes to the DPqStatus file when it
                                    has received the corresponding document.
-npipe <named pipe>  -              File from which dtcrt reads the names of the
                                    documents that it passes to the DP.
-P <DP port>         4032           If two DPs run on the same computer, you must specify a
                                    port to select the DP where the documents have to be
                                    registered.
-subdir <dir>        -              Enqueue all documents from this directory.
-help                -              Outputs a usage message.

Example 5-1: Passing a document to the Document Pipeliner

dtcrt -type hostdoca -op create -dphost byzanz /ext01/11121651471002

In this example, dtcrt logs on to the DP as hostdoca and passes it a
document under the relative path name byzanz/ext01/11121651471002.
As is the case for all the DocTools, this path name is relative to the $DPDIR
directory.
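
Because dtcrt does not check the path names it receives, a calling script may want to derive the relative path itself. The following Python sketch builds the argument list for a dtcrt call from an absolute document directory and the $DPDIR directory. The function name dtcrt_command and the directory /dp used in the usage note are hypothetical, and writing the relative path with forward slashes is an assumption based on the example above.

```python
import os


def dtcrt_command(doc_dir, dpdir, doctool_type, operation, dphost=None, port=None):
    """Build the argument list for a dtcrt call without running it."""
    argv = ["dtcrt", "-type", doctool_type, "-op", operation]
    if dphost is not None:
        argv += ["-dphost", dphost]
    if port is not None:
        argv += ["-P", str(port)]
    # dtcrt expects the document directory relative to $DPDIR.
    relative = os.path.relpath(doc_dir, dpdir)
    if relative.startswith(os.pardir):
        raise ValueError("%s is not below %s" % (doc_dir, dpdir))
    argv.append(relative.replace(os.sep, "/"))
    return argv
```

With $DPDIR set to /dp, the document directory /dp/byzanz/ext01/11121651471002 results in the relative path byzanz/ext01/11121651471002 as the last argument. The returned list can be passed to subprocess.run on a machine where dtcrt is installed.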


5.2 dpctrl
The dpctrl command line tool enables the user to interact with the DP or DocTools,
and to obtain information relating to both. This is especially useful for
troubleshooting and determining which tools are registered and active.

The command syntax is:

dpctrl [-dphost <host>] <arguments>

where:

• -dphost <host> specifies the DP host for which you require information. If no
host is specified, the local host is used.

• <arguments> can be any of the arguments listed in the table below. If the
command refers to one or more DocTools, enter a list of the respective DocTool
types, separated by commas.

Argument                 Description

-h                       Help on the dpctrl tool
alive                    Return code for the dpctrl command, indicates whether the DP is
                         still running
list                     Causes the DP to write its internal lists to the DP log file
queues [<name>]          Returns a list of all queues in the DP. The queue name, the number
                         of documents in this queue, the queue's blocking status
                         (- indicates a blocking queue; + indicates a non-blocking queue),
                         and the name of the DocTool currently blocking the queue
                         (- indicates that the queue is not being blocked at present) are
                         output. If a queue name is given, this information is output for
                         just this queue.
tools [<name>]           Generates a list of all DocTools known to the DP. The timeout for
                         each DocTool and the queue from which it is currently reading are
                         listed. If the DocTool is inactive, that is, it does not process a
                         document, “lazy” is output instead of the name of the queue. If a
                         DocTool name is given, a list of queues from which the DocTool
                         wants to read is also produced.
params [<name>]          Generates a list of parameters that can be set in the dpconfig
                         file. The number of instances allowed for each DocTool, the
                         non-blocking queues, the DocTool timeouts, etc. are output. If a
                         parameter name is given, the current setting for just this
                         parameter is output.
rules [<name>]           Lists the rules according to which a document can move from one
                         queue to another. If a DocTool name is given, only the rules for
                         the specified DocTool are output.
docdirs [<queue>|<directory>]
                         Returns the relative path name of all documents currently under
                         DP administration. If a queue is specified, documents are output
                         only from that queue. If a directory is specified, only that
                         document is returned with its queue information.
docdirscom [<queue>|<directory>]
                         Returns the same output as the docdirs argument, but appends the
                         comment written by the last DocTool.
fssize [<directory>]     Returns the free space (in KB) and the number of free inodes on
                         the partition containing the directory. The output also contains
                         a number identifying the partition (which can be used to detect
                         common partitions).
stop <DocTools>          Signs off the specified DocTools after the current document has
                         been processed
force <DocTools>         Signs off the specified DocTools immediately. This call is used,
                         for example, to notify the DP that a DocTool has failed.
stopnull <DocTools>      The specified DocTools are stopped if there are no jobs waiting
                         to be done. This call is equivalent to the entry
                         DocTool: stopnull in the dpconfig file.
pwd                      Outputs the current directory for the Document Pipeliner
enable <DocTools>        The specified DocTools are enabled, which means that they are
                         supplied with documents again.
disable[<n>] <DocTools>  The specified DocTools are disabled for <n> seconds (default is
                         forever), which means that they are no longer supplied with
                         documents, but are not shut down.
msg <DocTools> <message>
                         The specified message is sent to the DocTools. It is not
                         necessary to enclose the message in single quotation marks, even
                         if it is made up of several words.
                         Use this command to define the log level of an individual DocTool
                         dynamically. For more information, see OpenText Document
                         Pipelines - Overview and Import Interfaces (AR-CDP).
shutdown                 The DP sends a stop command to all registered DocTools. When all
                         DocTools have stopped, the DP stops itself. The DP stops within
                         150 seconds, even if one or more DocTools have not yet signed off.
loglevel [<lev>]         Enables the log level of the DP to be set dynamically (at run
                         time) for test and debug purposes. This function should be used
                         with care as the DP is significantly slower when the log level is
                         set high. You can define the level of detail for the log files
                         from 0 (only fatal errors) to 12 (very detailed debug
                         information).
version                  Returns the version of the currently running DP
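
Scripts that call dpctrl repeatedly can assemble the command line in one place. The following Python sketch builds the argument list according to the syntax described above, including the comma-separated list of DocTool types. The function name dpctrl_command is hypothetical, and the sketch only assembles the arguments; it does not run dpctrl or validate the argument names.

```python
def dpctrl_command(argument, doctools=(), extra=(), dphost=None):
    """Build the argument list for a dpctrl call without running it."""
    argv = ["dpctrl"]
    if dphost is not None:
        argv += ["-dphost", dphost]
    argv.append(argument)
    if doctools:
        # Several DocTool types are passed as one comma-separated list.
        argv.append(",".join(doctools))
    argv += list(extra)
    return argv
```

For example, dpctrl_command("stop", ["R3Insert", "docrm"]) yields the arguments for dpctrl stop R3Insert,docrm. The list can be handed to subprocess.run on a machine where dpctrl is installed.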


Important
All parameters and values of the dpctrl command line tool are case sensitive.

Example 5-2: Increasing the log level

The following command switches on the DEBUG log level of the R3Insert DocTool.

dpctrl msg R3Insert setLogLevel DEBUG on

The following command switches on the INFO log level of the R3Insert DocTool.

dpctrl msg R3Insert setLogLevel INFO on

5.3 spawncmd
With the spawncmd utility, you can query the status of the individual archive
processes and control individual processes. With this utility, you can stop and
restart individual processes or all processes at once. After changing or creating a
servtab file, use the command spawncmd reread to force the Spawner to read the
new servtab file. For details on the spawncmd utility, see OpenText Archive Center -
Administration Guide (AR-ACN).

Chapter 6
Inserting DocTools

6.1 Configuration Steps


Perform the following steps to insert a new DocTool into a Document Pipeline.

Important
All parameters and values are case sensitive.

1. Stop the Document Pipeliner and all DocTools with the command dpctrl
shutdown.
2. Create a new servtab file for each new DocTool.
3. Copy and adapt the dpconfig file.
4. Copy and adapt the dpinfo file.
5. Copy and adapt the monitor file.
6. Create new code for DocTool functionality with perldtn, javadt or a new binary
file. Please contact OpenText Professional Services if you want to use javadt.
7. If you want to transfer configuration files from Windows to UNIX (or vice versa)
via ftp, use the text mode.
8. Force the Spawner to read the new servtab file by entering the command
spawncmd reread.
9. Restart the Document Pipeliner and all DocTools with this command sequence:

1. dpctrl shutdown

2. spawncmd startall

Alternatively, start individual DocTools with the command
spawncmd <DocTool type>.

Configuration details

This section provides information on editing the configuration files.

If the DocTool is new, create a new servtab file for it.

Important
Never modify an installed servtab file. Modifications can result in problems
when upgrading.

1. Create a servtab file with the following content:

globenv ; LOG=$ECM_LOG_DIR
Tiff2Mtiff_test;once;no;;$LOG;Tiff2Mtiff -type Tiff2Mtiff_test -env FILING


2. Modify the dpconfig file:


a. Insert a block with a line for each return code of the DocTool, like in the
following example:

ExtTiff2Mtiff.Tiff2Mtiff_test.ok -> ExtAidSel
ExtTiff2Mtiff.Tiff2Mtiff_test.error -> ExtTiff2Mtiff_error
ExtTiff2Mtiff_error.extstock1b.ok -> ExtTiff2Mtiff
ExtTiff2Mtiff_error.stockist.ok -> ExtTiff2Mtiff
b. If necessary, add instructions for the Document Pipeliner, for example:

extstock1b: runonly

extstock1b: stopnull

Important
If you set a stockist DocTool to runonly, you should also set it to
stopnull.
c. If necessary, add a line that allows several instances of the specified DocTool
type to run simultaneously in the pipeline:

Tiff2Mtiff_test: 3
3. Create a dpinfo file:
a. Add a queue parameter line for each DocTool to display the status in the
DPInfo window, for example:

queue( "Convert TIFF to Multi-page TIFF", "Tiff2Mtiff_test", "ExtTiff2Mtiff" ) {
stockist("extstock1b")
}
4. If necessary, add a line in every language section for every entry you added in
the flow section:

lang("JPN")
{
...
"Convert TIFF to Multi-page TIFF"="Multi-page TIFF \u30d5\u30a1\u30a4\u30eb\u3092\u4f5c\u6210\u3057\u307e\u3059"
...
}
lang("DEU")
{
...
"Convert TIFF to Multi-page TIFF"="TIFF nach Multipage-TIFF konvertieren"
...
}
5. Modify the existing monitor file, or copy and modify it:
a. Add a line in the first group (containing the dpt components), for example:

Tiff2Mtiff = dpt { toolname= Tiff2Mtiff_test }


b. Add a line in the second group (containing the dpq_error components), for
example:


"Tiff2Mtiff"= dpq_error { queuename = ExtTiff2Mtiff_error }

Chapter 7
Removing DocTools

You remove DocTools from the configuration files in the same way you added them.
Do this by modifying the following files (analogous to inserting a DocTool). You may
leave the servtab file for later use.

• dpconfig
• dpinfo
• monitor

Important
OpenText recommends not modifying the configuration files of the standard
Document Pipeline. This prevents problems when upgrading. To make
modifications, copy the configuration files and rename the pipeline as shown
in the following example.

Example 7-1: Removing the page_idx DocTool from the myEXR3 pipeline

1. Copy the EXR3 Document Pipeline configuration files.


2. Replace exr3 by myexr3 in every file name and string.
3. Modify the myexr3.dpconfig file:
a. In the myExR3GenR3ins.GenR3ins.done rule, replace myExR3Page_idx with
myExR3Rendition:

...
myExR3GenR3ins.GenR3ins.done -> myExR3Page_idx
myExR3GenR3ins.GenR3ins.error -> myExR3GenR3ins_error
myExR3GenR3ins_error.exr3_stock_genr3ins.ok -> myExR3GenR3ins
myExR3GenR3ins_error.stockist.ok -> myExR3GenR3ins
myExR3GenR3ins + queuetime

myExR3Page_idx.page_idx.ok -> myExR3Rendition
myExR3Page_idx.page_idx.error -> myExR3Page_idx_error
myExR3Page_idx_error.exr3_stock_page_idx.ok -> myExR3Page_idx
myExR3Page_idx_error.stockist.ok -> myExR3Page_idx
myExR3Page_idx + queuetime

myExR3Rendition.rendition.ok -> myExR3Doctods
...

b. Remove the five lines that concern myExR3Page_idx (the lines that start with
myExR3Page_idx):

...
myExR3GenR3ins.GenR3ins.done -> myExR3Rendition
myExR3GenR3ins.GenR3ins.error -> myExR3GenR3ins_error
myExR3GenR3ins_error.exr3_stock_genr3ins.ok -> myExR3GenR3ins
myExR3GenR3ins_error.stockist.ok -> myExR3GenR3ins
myExR3GenR3ins + queuetime

myExR3Page_idx.page_idx.ok -> myExR3Rendition
myExR3Page_idx.page_idx.error -> myExR3Page_idx_error
myExR3Page_idx_error.exr3_stock_page_idx.ok -> myExR3Page_idx
myExR3Page_idx_error.stockist.ok -> myExR3Page_idx
myExR3Page_idx + queuetime
...
myExR3Rendition.rendition.ok -> myExR3Doctods

c. Remove all further lines with settings for myExR3Page_idx, such as runonly
and stopnull, and the line that allows several instances of the specified
DocTool to run simultaneously.
4. Modify the myexr3.dpinfo file:
a. Remove the queue parameter line for myExR3Page_idx:
queue( "Count pages of document", "page_idx",
"myExR3Page_idx" ) {
stockist ("exr3_stock_page_idx")
}

b. If other language entries are available, remove the corresponding
language entries for the deleted queue parameter line, for example:
"Count pages of document"="\u6587\u66f8\u306e\u30da\u30fc\u30b8\u3092\u6570\u3048\u307e\u3059"

5. Modify the myexr3.monitor file:


a. Remove the page_idx line in the group containing the dpt components:
page_idx = dpt { toolname= page_idx}

b. Remove the page_idx line in the group containing the dpq_error components:
"page_idx" = dpq_error { queuename = myExR3Page_idx_error }

Chapter 8
Optimizing DocTool Usage

There are two ways to improve the performance of the DocTools in your Document
Pipelines, for example by reducing bottlenecks.

8.1 Starting Separate DocTools for Each Pipeline


Use multiple DocTool types to increase the performance for DocTools with high
load, for example by using multiple types (versions) of a DocTool with an attribute.
It is very important to separate standard and customer-specific pipelines. This
means that you use different DocTools (executing the same code) and log files so
that different pipelines have their own resources.

Example 8-1: Using two DocTool types of R3Insert

Assume that you want to enhance the throughput of the R3Insert DocTool
in your pipeline. To achieve this, enter the following lines at the end of
a .servtab file:

…,R3Insert -type R3Insert_TIFF …-logfile my_R3Insert_tiff.log


…,R3Insert -type R3Insert_ASC …-logfile my_R3Insert_asc.log

8.2 Running DocTools in Parallel


Start several instances of a DocTool (the best number is found by trial and error).
Running several instances of a DocTool type simultaneously is useful, for instance, if
the queues cannot be processed quickly enough.

In the dpconfig file, increase the number of simultaneous runs for the DocTool at
which there is a bottleneck, and extend the .servtab file:

1. Insert or modify a line in the dpconfig file like this:

R3Insert_TIFF: 3

2. Enter the following lines at the end of a .servtab file:

…,R3Insert -type R3Insert_TIFF …-logfile my_R3Insert_tiff1.log


…,R3Insert -type R3Insert_TIFF …-logfile my_R3Insert_tiff2.log

Part 4
perldtn and perldte

This part helps you to write your own DocTools. It describes the command line
options and functions of the DocTools perldtn and perldte as well as Perl modules
from OpenText that can be used by perldtn and perldte. It also provides sample
scripts for using perldtn and perldte.

perldtn uses these special OpenText modules to extend Perl for the Document
Pipeline, and makes scripting functionality available in the Document Pipeline for
customer-specific configuration.

The perldte DocTool is designed for implementing custom enqueue DocTools.

Chapter 9
perldtn

With the Perl interpreter integrated, perldtn includes all features of Perl.

perldtn is derived from the doctool class and, like all other DocTools, inherits its
features, for example dpctrl msg print. Additionally, perldtn contains
OpenText-specific Perl modules that enhance the functionality.

These modules cover the following aspects:

• Working with the Document Pipeliner (DP)
• Writing into log files in the standard OpenText logging directory
($ECM_SRV_LOG)
• Access to the setup files
• Reading/changing the COMMANDS file

The dedicated functionality of a certain perldtn DocTool must be implemented by
writing a Perl script.

9.1 Command Line Options


To call a Perl script from the command-line, you can set the following options:

Command Line Option      Description

-script <PerlScript.pl>  Perl script to be executed
-perl_args <args>        Arguments for the Perl parser, for example,
                         -perl_args "-w -I/path/path/path"
-test                    Runs the Perl script without the DP
-testpath <path>         Document path if the test option is set
-loglevel <1> (<12>)     Log level (1 = minimum logging information, 12 = maximum
                         logging information)
-help                    The perldtn interpreter shows a list of all call parameters
                         and dpctrl commands
-env <s1>,…,<sn>         Reads the setup variables from the following setup files:
                         $ECM_DOCUMENT_PIPELINE_CONF/config/setup/<s1>.setup
                         ...
                         $ECM_DOCUMENT_PIPELINE_CONF/config/setup/<sn>.setup
-type <name>             Set DocTool type (default = program name). The DocTool type
                         must be defined in the dpconfig file.
-logfile <log file>      Name of the log file.
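
To check which setup files a given -env value refers to, the naming scheme from the table can be applied in a few lines. The following Python sketch does this. The function name setup_files is hypothetical, and the sketch only builds the path names; it does not read the files or check that they exist.

```python
import os


def setup_files(env_option, conf_dir=None):
    """Return the setup files that a value of the -env option refers to."""
    if conf_dir is None:
        # Default: the directory named by ECM_DOCUMENT_PIPELINE_CONF.
        conf_dir = os.environ["ECM_DOCUMENT_PIPELINE_CONF"]
    names = [name.strip() for name in env_option.split(",") if name.strip()]
    return ["%s/config/setup/%s.setup" % (conf_dir, name) for name in names]
```

For example, the value DT_RENDITION,FILING refers to the files DT_RENDITION.setup and FILING.setup in the config/setup directory.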

Example: Sample call for perldtn (Windows):

set SCRIPTDIR=%ECM_DOCUMENT_PIPELINE_BASE%\scripts\perl
set BINDIR=%ECM_DP_PERL_10_0_0%\bin
set PERLDIR=%ECM_DP_PERL_10_0_0%\perl-5.8.5\bin
set PATH="%BINDIR%;%PERLDIR%;%Path%;."
set Path=%PATH%
set PERL5LIB="%ECM_DP_PERL_10_0_0%\perl-5.8.5\lib;%ECM_DP_PERL_10_0_0%\lib\perl-5.8.5;%ECM_DP_PERL_10_0_0%\perl-5.8.5\site\lib"
%BINDIR%\perldtn -type rendition -env DT_RENDITION -script "%SCRIPTDIR%\rendition.pl" -logfile rendition_1.log

9.2 Running perldtn as Standard DocTool


To use perldtn as a normal DocTool, you have to implement defined functions,
which are called from the perldtn C++ binary. OpenText recommends implementing
the functions doBeforeConnect and service. The functions printObject and
control offer additional functionality but do not require implementation. The
following picture shows the chronological sequence:


9.3 Functions for perldtn as Standard DocTool


9.3.1 doBeforeConnect
This function is called when the DocTool is started. After executing this function, the
DocTool connects to the DP.

Return values
Perl array with 2 elements:

• Return code: 1 = "ok", 0 = "error"
• Optional: error text, written into the log file

Example 9-1: doBeforeConnect


sub doBeforeConnect{
my $dttype = shift;
# do some initialisation tasks
# ...
return (1, "got Doctool-Type $dttype");
}

9.3.2 service
Use this function to implement the functionality of the DocTool. The function is
called for every document that is passed to the DocTool.

Input parameters
1. Document directory
2. DocTool type
Return values
Perl array with 3 elements:

• $1: Return code; 0 for error, 1 for OK.
• $2 (optional): Comment for DP.
• $3 (optional): Opcode. The default value is ok on success and error otherwise.

Example 9-2: service


sub service
{
my($dpdir,$dttype) = @_;
if ( $dttype eq "PerlDoctool")
{return (1,"oh fine, my name is $dttype","ok") ;}
else
{return (0,"I'm sorry, my name is $dttype");}
}
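
The return convention of service, with its two optional elements, can be modelled in a few lines. The following Python sketch is only a model of the documented contract, not the code of the perldtn binary. The function name normalize_service_result is hypothetical, and representing a missing comment as an empty string is an assumption.

```python
def normalize_service_result(result):
    """Fill in the optional elements of a (return code, comment, opcode) result."""
    values = list(result) + [None] * (3 - len(result))
    return_code, comment, opcode = values[:3]
    ok = return_code == 1
    if comment is None:
        comment = ""
    if opcode is None:
        # Default opcode: "ok" on success, "error" otherwise.
        opcode = "ok" if ok else "error"
    return ok, comment, opcode
```

In the example above, the else branch returns only two elements, so under this model its opcode defaults to error.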

9.3.3 printObject
This function writes the current settings of the DocTool into the log file of this
DocTool. The function has no parameter or return code.

9.3.4 control
This function serves administrative purposes. It is called when a message is sent
to the DocTool from the command line with the dpctrl tool, for example
dpctrl msg <DocTool name> <Command string>.

Input parameters
1. DocTool name
2. Command string.
This is an arbitrary string that can contain for example commands to be sent
via dpctrl msg.

Chapter 10

perldte

perldte is an enqueue DocTool. It is derived from the doctool class and, like all
other DocTools, inherits its features, for example dpctrl msg print. The
enqueuing of a new document must be done in a Perl script by calling the
IXOS::DT::DPop() function. Additionally, there are OpenText-specific Perl modules
that enhance the functionality.

These modules cover the following aspects:

• Working with the Document Pipeliner (DP)
• Writing into log files in the standard OpenText logging directory
($ECM_SRV_LOG)
• Access to the setup files
• Reading/changing the COMMANDS file

10.1 Command Line Options


To call a Perl script from the command-line, you can set the following options:

Command Line Option      Description

-test                    Runs the Perl script without the DP
-testpath <path>         Document path if the test option is set
-loglevel <1> (<12>)     Log level (1 = minimum logging information, 12 = maximum
                         logging information)
-help                    The interpreter shows a list of all call parameters and
                         dpctrl commands
-env <s1>,…,<sn>         Reads the setup variables from the following setup files:
                         $ECM_DOCUMENT_PIPELINE_CONF/config/setup/<s1>.setup
                         ...
                         $ECM_DOCUMENT_PIPELINE_CONF/config/setup/<sn>.setup
-type <name>             Set DocTool type (default = program name). The DocTool type
                         must be defined in the dpconfig file.
-logfile <log file>      Name of the log file.


10.2 Running perldte as Enqueueing DocTool

To use perldte as a normal DocTool that enqueues documents into a document
pipeline, you have to implement defined functions that are called from the Perldt
C++ binary. For an example of a Perl script used as an enqueueing DocTool, see
“Enqueueing Documents into an OpenText Document Pipeline with perldte”
on page 77. The functions are therefore described only briefly here. They are
called in the following sequence: doBeforeConnect when perldte is started, then
the connection to the DP is established, then service is called once.

10.3 Functions for perldte as Standard DocTool


10.3.1 doBeforeConnect
The DP calls doBeforeConnect when perldte is started. perldte then connects to
the DP. For details, see “doBeforeConnect” on page 63.

10.3.2 service
Use this function to implement the enqueuing functionality of the DocTool. The
function is called only once.

Input parameter
DocTool type
Return values
Perl array with 2 elements:

• $1: Return code; 0 for error, 1 for OK.


• $2 (optional): Comment for DP.

Example 10-1: service


sub service
{
    my $dttype = shift;
    # straight quotes are required here; typographic quotes break the script
    if ( $dttype eq "PerlDoctool")
        {return (1,"oh fine, my name is $dttype","ok") ;}
    else
        {return (0,"I'm sorry, my name is $dttype");}
}

Chapter 11
OpenText-specific Perl Modules

11.1 Module IXOS::DTLogging


The logmsg function writes a message into the log file if logging is switched on
for the given level. Logging is set via variables in the setup files, for example
LOG_ENTRY=off, LOG_DEBUG=on, LOG_INFO=on. You do not need the init function if
you use perldtn. If you use plain Perl, you must call init() before you call logmsg.

Log levels
There are 12 possible log levels (log level 12 contains all logging information):

Loglevel      Description                                   Level number
_FATAL()      Fatal system error                            1
_ERROR()      Non-fatal system error                        2
_IMPOT()      Important                                     3
_SECU()       Security problem                              4
_WARNING()    Warning                                       5
_RESULT()     Result                                        6
_UERROR()     User error                                    7
_INFO()       Info                                          8
_DBUG()       Debug info                                    9
_RPC          RPC info (logging for the librpc library)     10
_DB           Database info                                 11
_ENTRY        Procedure entry/exit                          12
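
The relationship between the level names and numbers in the table can be sketched in plain Perl. This is an illustration only and does not use IXOS::DTLogging: the names %LEVEL and should_log are hypothetical, and the sketch assumes that a message is written when its level number does not exceed the configured log level (in line with "log level 12 contains all logging information"). The actual module may additionally evaluate the LOG_* setup variables.

```perl
use strict;
use warnings;

# Level numbers copied from the table above; the hash itself is illustrative.
my %LEVEL = (
    FATAL  => 1, ERROR => 2, IMPOT => 3, SECU => 4,  WARNING => 5,  RESULT => 6,
    UERROR => 7, INFO  => 8, DBUG  => 9, RPC  => 10, DB      => 11, ENTRY  => 12,
);

# Assumption: a message is logged if its level does not exceed the
# configured level, so that level 12 includes everything.
sub should_log {
    my ($configured, $name) = @_;
    die "unknown log level '$name'\n" unless exists $LEVEL{$name};
    return $LEVEL{$name} <= $configured ? 1 : 0;
}

print should_log(8, 'WARNING') ? "logged\n" : "suppressed\n";
print should_log(8, 'ENTRY')   ? "logged\n" : "suppressed\n";
```

With a configured level of 8, a warning (level 5) would be written and a procedure entry message (level 12) would be suppressed under this assumption.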

Including IXOS::DTLogging
Add the following lines to the Perl script to include the IXOS::DTLogging
module.

use IXOS::DTLogging;
$ret = IXOS::DTLogging::logmsg( $loglevel , $logtext );
$ret = IXOS::DTLogging::init( $logfile, $loglevel );

Example: Writing to a log file:

IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),
"This message was written to log file, if LOG_INFO=on");


11.2 Module IXOS::DTDocument2


This module contains functions for working with the COMMANDS file and other files.

The module provides a standardized interface for opening, closing, reading and
writing files in the working directory of the DP. It consists of functions that work
on the COMMANDS file and on other self-defined files. When a function writes to the
COMMANDS file, it also automatically creates a backup copy of the file.

Methods
IXOS::DTDocument2->new
Call this method first, as follows:
my $obj = IXOS::DTDocument2->new(path, dtName);
The IXOS::DTDocument2->new method creates the DTDocument2 object. You
must call it before you can use the other methods of the
DTDocument2 module. The extension is based on the object-oriented features
of Perl, which is why you have to create the object instance by calling the
new method. Perl automatically manages the destruction of the object.

$obj->openCommands
This method opens the COMMANDS file if it has been closed. Since the
extension or perldtn always opens the COMMANDS file automatically, you do
not need this method to open the COMMANDS file for the first time.
$obj->appendToCommands(value)
Appends the string value to the COMMANDS file. This method automatically
creates a backup copy of the COMMANDS file.
$obj->closeCommands()
This method closes the COMMANDS file. Normally you do not need this
method because the extension or perldtn closes the COMMANDS file
automatically. All changes to the COMMANDS file are written when the file
is closed.
$obj->deleteFromCommands(key, values)
This method deletes a key-value pair from the COMMANDS file, but not from
the internal list. You can therefore still find it with findKeyInCommands(key)
after deleting it. The return value is the number of statements, or -1 in case of an error.
$obj->appendToProtocol(value)
This method appends the string value to the DpProtocol protocol file.
$obj->closeProtocol()
This method closes the DpProtocol file. Normally you do not need this
method because the extension or perldtn closes the DpProtocol file
automatically.
$obj->checkCOMPStmts(docpath, ignorefileslist)
Checks if files exist that are specified with COMP statements in the
COMMANDS file. You can specify a list of files to be ignored
(ignorefileslist) by this method to prevent that an error is returned.


Return values:

• -1: no COMP statement found in the COMMANDS file.

• 0: file specified in a COMP statement does not exist.

• 1: OK (specified file exists)

$obj->findKeyInCommands(key)
Returns the number of statements with this key in the COMMANDS file.

$obj->getValueForFromCommands(key)
Returns an array of values that are attached to the key in the COMMANDS
file.

$obj->getAllValuesFromCommands(key)
This method returns the list of all entries in the COMMANDS file.

$obj->integrateChanges()
Integrates all append and delete changes for the COMMANDS file into the
internal list.

11.3 IXOS::DT
This module contains methods for the communication between a DocTool and the
Document Pipeliner (DP).

Methods

DPop

To be used in
perldte

Functionality
DPop is a function which is called after a DocTool has finished processing a
document. It is passed the following arguments:

int DPop(path, operation, comment)
char *path;      /* path of the processed document */
char *operation; /* opcode */
char *comment;   /* comment which is to be added to the DPqStatus file */

The DP moves the document in question to the next queue (according to the
configuration defined in the DP's configuration file).
This method must only be used in perldte for enqueuing.

Synopsis
$status = IXOS::DT::DPop($docdir, $opcode, $comment);


Arguments

Argument Description
$docdir Document Directory
$opcode Operation code to be sent to DP
$comment Comment for DP

Return values
$: status code from DP.

DPcrt

To be used in
perldtn

Functionality
DPcrt registers documents to the DP. A document can be enqueued by any
DocTool type.

Synopsis
$status = IXOS::DT::DPcrt($host, $type, $docdir, $opcode, $comment);

Arguments

Argument    Description
$host       Name of the host on which the target DP is running
$type       Name of the DocTool type. This type must be known to the DP via a dpconfig file.
$docdir     Document Directory
$opcode     Operation code to be handed over to DP.
            Note: The operation code is part of the rules for the document flow that are
            defined in the dpconfig file. A rule looks as follows:
            <input queue>.<doctool-type>.<operation code> -> <output queue>
            Example: nil.enqueue-doctool.done -> queue1
            Here, the operation code is done.
$comment    Comment for the DP

Return values
$: status code from DP.
The status code can have the following values:


Status code   Description
0             Success
-1            Error
1             DPOP_NOCONNECT: connection to DP impossible
2             DPOP_MISCERR: several errors possible, for example:
              • Too many documents in the pipeline
              • Insufficient disk space
              • DPqStatus file cannot be written
3             DPOP_CONFIGERR: configuration error, for example:
              • Wrong document path
              • Unknown operation code

DPcrt2
To be used in
perldtn
Functionality
DPcrt2 registers documents to the DP. The method must only be used in
perldtn.
Synopsis
$status = IXOS::DT::DPcrt2($docdir, $opcode, $comment);
Arguments

Argument Description
$docdir Document Directory
$opcode Operation code to be handed over to DP
$comment Comment for the DP

Return values
$: status code from DP.

getErrorText
To be used in
perldtn
Functionality
getErrorText returns the message text for the specified status code.

Synopsis
$errmsg = getErrorText($status);


Arguments

Argument Description
$status Status code from DP

Return values
$: Text of the status code. There are the following status codes:

$status Text
0 DPOP_OK
1 DPOP_NOCONNECT
2 DPOP_MISCERR
3 DPOP_CONFIGERR
4 DPOP_SYNCERR
5 DPOP_NOHI
6 DP_NOSWITCH
Else Unknown error
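
The mapping in the table can be reproduced in plain Perl, for example to test error handling without a running DP. The following is a minimal sketch, not the IXOS::DT implementation: status_text and @STATUS_TEXT are hypothetical names, and only the texts listed in the table are used.

```perl
use strict;
use warnings;

# Status texts copied from the table above, indexed by status code.
my @STATUS_TEXT = qw(
    DPOP_OK DPOP_NOCONNECT DPOP_MISCERR DPOP_CONFIGERR
    DPOP_SYNCERR DPOP_NOHI DP_NOSWITCH
);

# Hypothetical helper: returns the text for a status code, or
# "Unknown error" for anything outside the range 0..6.
sub status_text {
    my ($status) = @_;
    return 'Unknown error'
        unless defined $status && $status =~ /^\d+$/ && $status < @STATUS_TEXT;
    return $STATUS_TEXT[$status];
}

print status_text(2), "\n";    # prints DPOP_MISCERR
print status_text(-1), "\n";   # prints Unknown error
```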

Including IXOS::DT

Add the following line to the service function to include the IXOS::DT module.

use IXOS::DT;

11.4 IXOS::DTUtil
This module contains methods for reading global settings from setup files.

Methods
initPkgConfig($PkgKey)
This method reads the variables of the specified setup file. You can access
the variables via the getenv method.
The argument $PkgKey must contain the file name prefix of the setup file, for example:
$PkgKey = "COMMON";
The method returns 1 on success, otherwise 0:
$ret = IXOS::DTUtil::initPkgConfig($PkgKey);

getenv($PkgKey)
This method fetches the content of the key in the setup file. It is necessary to
initialize using initPkgConfig() before using this method. The accessible
key depends on the initialization.


Including IXOS::DTUtil
Add the following lines to the service function to include the IXOS::DTUtil
module.

use IXOS::DTUtil;

Example for accessing the DPDIR variable of the COMMON setup file:

use IXOS::DTUtil;
my $rc = IXOS::DTUtil::initPkgConfig("COMMON");
$DPDIR = IXOS::DTUtil::getenv("DPDIR");
if ($DPDIR eq "") {
return (0, "DPDIR is not set in the environment");
}
print "DPDIR = $DPDIR\n";

Chapter 12

Examples

12.1 Enqueueing Documents into an OpenText Document Pipeline with perldte
The enqueue_ext.pl script below is an example of an enqueueing script. Most of
the structure is predefined; the specific function of this example is
makeDocDirAndCopy. In this case a search routine with one step is implemented (not
in two steps as in the “classic” ext_dir structure).

Calling the script

The script is called with the following command (here an example for enqueueing
into the EXR3 pipeline):
perldte -script D:/temp/enqueue_ext.pl -type EnquedocExR3 -env EXR3
Make sure that the correct -type parameter is set. The parameter must be the same
as in the first line of the pipeline configuration file after the nil statement, in this
case the exr3.dpconfig file (see the first line of the exr3.dpconfig file below).

Example 12-1: exr3.dpconfig

nil.EnquedocExR3.done -> ExR3Perldt
ExR3Perldt.ExR3start.ok -> ExR3Tiff2Mtiff
ExR3Perldt.ExR3start.error -> ExR3Perldt_error
ExR3Perldt_error.exr3stock1.ok -> ExR3Perldt
ExR3Perldt_error.stockist.ok -> ExR3Perldt
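
To make the rule format <input queue>.<doctool-type>.<operation code> -> <output queue> concrete, the following plain-Perl sketch parses such lines and looks up the output queue for a given combination. It only illustrates how the rules can be read; parse_dpconfig and next_queue are hypothetical helper names, and the sketch assumes that queue names, DocTool types and opcodes contain no dots.

```perl
use strict;
use warnings;

# Parse rules of the form
#   <input queue>.<doctool-type>.<operation code> -> <output queue>
# into a lookup table keyed by "queue.type.opcode".
sub parse_dpconfig {
    my ($text) = @_;
    my %rule;
    for my $line (split /\n/, $text) {
        # Lines that do not look like a rule (for example blank lines) are skipped.
        next unless $line =~ /^\s*([^.\s]+)\.([^.\s]+)\.([^.\s]+)\s*->\s*(\S+)\s*$/;
        $rule{"$1.$2.$3"} = $4;
    }
    return \%rule;
}

# Returns the output queue, or undef if no rule matches.
sub next_queue {
    my ($rules, $queue, $type, $opcode) = @_;
    return $rules->{"$queue.$type.$opcode"};
}

my $rules = parse_dpconfig(<<'END');
nil.EnquedocExR3.done -> ExR3Perldt
ExR3Perldt.ExR3start.ok -> ExR3Tiff2Mtiff
ExR3Perldt.ExR3start.error -> ExR3Perldt_error
END

print next_queue($rules, 'nil', 'EnquedocExR3', 'done'), "\n";   # prints ExR3Perldt
```

The lookup shows why the -type parameter has to match the dpconfig entry: with a different type there is no rule for the nil queue and the opcode done.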

Functions of the script

The enqueue_ext.pl script contains the functions doBeforeConnect and service.

The first function, doBeforeConnect, is called once at the beginning of the procedure.
In this case, it only writes a log message.

The next function, service, is also called once. This function does the following:

1. Parsing external directories for documents and/or files
2. Creating unique pipeline directories
3. Copying the files into the pipeline
4. Announcing each new document to the DP with the IXOS::DT::DPop()
function.
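
The four steps can be tried out without a DP by replacing the announcement with a callback. The following sketch makes several assumptions: enqueue_all is a hypothetical function, the $announce callback is a stand-in for IXOS::DT::DPop(), and temporary directories take the place of EXT_DIR and DPDIR. The directory naming follows the pattern used in the enqueue_ext.pl example.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Copies every subdirectory of $extdir into a new, uniquely named document
# directory below $dpdir/$dphost and announces it via $announce.
# $announce is a hypothetical stand-in for IXOS::DT::DPop().
sub enqueue_all {
    my ($extdir, $dpdir, $dphost, $announce) = @_;
    opendir(my $dh, $extdir) or die "can't open directory $extdir: $!";
    my @entries = sort grep { !/^\.\.?$/ && -d "$extdir/$_" } readdir($dh);
    closedir($dh);

    my ($i, @docdirs) = (0);
    for my $entry (@entries) {
        my $docdir = "$dphost/e" . time() . "_" . $i++;
        make_path("$dpdir/$docdir");
        opendir(my $sub, "$extdir/$entry") or die "can't open $extdir/$entry: $!";
        for my $file (grep { -f "$extdir/$entry/$_" } readdir($sub)) {
            copy("$extdir/$entry/$file", "$dpdir/$docdir/$file")
                or die "copy of $file failed: $!";
        }
        closedir($sub);
        $announce->($docdir, 'done', 'success');
        push @docdirs, $docdir;
    }
    return @docdirs;
}

# Demonstration with temporary directories and a callback that only records calls.
my $ext = tempdir(CLEANUP => 1);
my $dp  = tempdir(CLEANUP => 1);
make_path("$ext/invoice1");
open(my $fh, '>', "$ext/invoice1/COMMANDS") or die $!;
print {$fh} "DOCTYPE FAX\n";
close($fh);

my @announced;
my @docdirs = enqueue_all($ext, $dp, 'host1', sub { push @announced, [@_] });
print scalar(@docdirs), " document(s) announced\n";
```

Passing the announcement as a callback keeps the file handling testable on its own; in a real perldte script the callback would be replaced by the DPop call.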


Example 12-2: enqueue_ext.pl


# perl
use English;
use File::Copy;
use IXOS::DTLogging;
use IXOS::DT;

###################################################################
# function is called once before connection to DP
# parameters: $dttype = type of the DocTool (as in dpconfig)
#
# returns: 1
###################################################################
sub doBeforeConnect {
    ($dttype) = @_;

    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),"doBeforeConnect .....");
    return 1;
}

####################################################################
# function is called once for reading the ext_dir
# parameters: $dttype = type of pipeline (as in dpconfig !!!)
#
# global variables: @alldirs = entries found in ext_dir
#
# returns: 1
####################################################################
sub service {
    ($dttype) = @_;

    # reading path to ext_dir from the environment
    $extdir = $ENV{EXT_DIR};
    $dpdir = $ENV{DPDIR};
    $dphost = $ENV{DPHOST};

    # open handle for ext_dir
    opendir (DIR,$extdir ) or die "can't open directory $extdir";

    # getting all directories from ext_dir in global variable @alldirs
    @alldirs = grep !/^\.\.?$/, readdir(DIR);


    #------------------------------------
    # enqueueing of all EXT_DIR documents
    #------------------------------------
    $i = 0;
    foreach my $dir (@alldirs) {
        # unique document directory below DPDIR: <host>/e<timestamp>_<counter>
        $docdir = "$dphost/e".time()."_$i";
        mkdir("$dpdir/$docdir");
        $i++;
        opendir (DIR2,"$extdir/$dir") or die "can't open directory $extdir/$dir";
        @alldirs2 = grep !/^\.\.?$/, readdir(DIR2);
        foreach my $dir2 (@alldirs2) {
            copy("$extdir/$dir/$dir2","$dpdir/$docdir");
        }
        # announce the new document to the DP
        IXOS::DT::DPop($docdir,"done","success");
        closedir(DIR2);
    }
    closedir(DIR);
    return ( 1 );
}

12.2 Using perldtn as DocTool Running in a Document Pipeline
This is an example of a perldtn DocTool; see “Sample Script” on page 82 for the code.
This DocTool contains the two required functions doBeforeConnect and service,
as well as the two other functions CheckCommandsStmt and AppendCommandsStmt.
As a result, this DocTool appends the string “Hello World ...” to the COMMANDS
file, see “Sample COMMANDS Files” on page 85.

12.2.1 Modules and Functions


Used modules

The use statements include the required modules in the script.

use IXOS::DTUtil;
use IXOS::DTLogging;
use IXOS::DTDocument2;

• The IXOS::DTUtil module is used for fetching environment variables.
• With the IXOS::DTLogging module, the error messages or information messages
are written into the log file of the tool.
• The functions of the IXOS::DTDocument2 module are used for reading lines from
the COMMANDS file.


doBeforeConnect function

The doBeforeConnect function is called during the startup of the DocTool, before it
connects to the Document Pipeline. This function is used for initialization and for
fetching the environment variables DPHOST and DPDIR with the getenv function. The
commands for fetching the environment variables are just inserted for
demonstration purposes. The variables are not used in this DocTool.

The DocTool returns the value “1” (for successful operation) and the type of the
DocTool.

service function

The service function is executed for each document.

service performs the following tasks:

1. Initializing the variables.
2. Writing the text “Try to get values for archivid, docid and doctype from file
'COMMANDS' ...” to the log file.
3. Instantiating the $dtdocument2 object that references the COMMANDS file and the
protocol file.
4. Reading the values archive ID, document ID and document type from the
COMMANDS file using the functions of the DTDocument2 module.
The CheckCommandsStmt function checks for each value whether it is available in the
COMMANDS file.
The values of the array returned by CheckCommandsStmt are used in the
service function for logging.
5. Writing the values to the log file with the IXOS::DTLogging::logmsg() function.
6. Appending the string “Hello World ...” to the COMMANDS file with the
AppendCommandsStmt() function.
7. Returning “1” (success) as return code, a message text
(“document processed”) and the opcode “ok”. The opcode is evaluated in the
dpconfig file.
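
The header comment of the sample script states that the opcode is optional and defaults to 'done' for return code 1 and to 'error' for return code 0. The following plain-Perl sketch shows this convention; with_default_opcode is a hypothetical helper written for illustration and is not part of perldtn.

```perl
use strict;
use warnings;

# Hypothetical helper: completes a (returncode, text, opcode) array
# with the default opcode described for perldtn.
sub with_default_opcode {
    my ($rc, $text, $opcode) = @_;
    $text   = '' unless defined $text;
    $opcode = ($rc ? 'done' : 'error') unless defined $opcode;
    return ($rc, $text, $opcode);
}

my @result = with_default_opcode(1, 'document processed');
print "$result[2]\n";   # prints done
```

An explicitly returned opcode such as "ok" is left unchanged, which is why the sample script can route documents through rules like ExR3Perldt.ExR3start.ok.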

CheckCommandsStmt function

This function checks the COMMANDS file for the keyword given as input
parameter.

The findKeyInCommands function counts the statements with this keyword in the
COMMANDS file. If there is no statement or more than one statement with the same
keyword (for example “ARCHIVID”), the return code is set to “0” and an error
message is set. If there is exactly one hit, getValueForFromCommands returns the
value of the statement. Finally, CheckCommandsStmt checks that the value is not
empty, sets the return code to “1” and writes the value of the statement into the
second element of the return array.


AppendCommandsStmt function

This function appends a statement to the COMMANDS file.


12.2.2 Sample Script


# contained IXOS-Modules
use IXOS::DTLogging;
use IXOS::DTDocument2;
use IXOS::DTUtil;

#======================================================================
# FUNCTION doBeforeConnect
#======================================================================
# DESCRIPTION: this perl script function will be called during the startup
#              of the doctool, before the doctool connects to dp.
# PARAMETERS:  doctool-type
# RETURNS:     a perl array containing two elements:
#              - returncode: 1 for success and 0 for an error
#              - error text, which is written into the logfile
#======================================================================
sub doBeforeConnect
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    ($dttype) = @_;
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "Try to initialize ...");

    if (IXOS::DTUtil::getenv("DPHOST")) {
        my $_dphost = IXOS::DTUtil::getenv("DPHOST");
        IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DPHOST=$_dphost");
    }
    if (IXOS::DTUtil::getenv("DPDIR")) {
        my $_dpdir = IXOS::DTUtil::getenv("DPDIR");
        IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DPDIR=$_dpdir");
    }

    return (1, "got Doctool-Type $dttype.");
}

#======================================================================
# FUNCTION service
#======================================================================
# DESCRIPTION: this perl script function will be called for each document
# PARAMETERS:  document directory and doctool-type
# RETURNS:     a perl array containing three elements:
#              - returncode: 1 for success and 0 for an error (recommended),
#              - text (recommended)
#                The text is written into the log file and, as info, in the
#                protocol file if the returncode is 1.
#                The text is written into the log file and, as error, in the
#                protocol file if the returncode is 0.
#              - opcode, which is returned to the DP (optional). The default
#                depends on the returncode.


#                If the returncode is 1, the opcode default value is 'done'.
#                If the returncode is 0, the opcode default value is 'error'.
#======================================================================
sub service
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    my $docdir = $_[0];
    my $dttype = $_[1];

    # initialisation of variables
    my $archivid = "";
    my $doctype = "";
    my $docid = "";

    # write to log file
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),
        "Try to get values for archivid, docid and doctype from file 'COMMANDS' ...");

    # Instantiate the $dtdocument2 object that references the COMMANDS file
    # and the protocol file
    my $dtdocument2 = IXOS::DTDocument2->new($docdir, $dttype);

    # try to get archive id
    @rc = CheckCommandsStmt($dtdocument2,"ARCHIVID");
    if ($rc[0] == 0) {
        return(0, $rc[1], "error");
    } else {
        $archivid = $rc[1];
    }

    # try to get doc id
    @rc = CheckCommandsStmt($dtdocument2,"DOCID");
    if ($rc[0] == 0) {
        return(0, $rc[1], "error");
    } else {
        $docid = $rc[1];
    }

    # try to get doctype
    @rc = CheckCommandsStmt($dtdocument2,"DOCTYPE");
    if ($rc[0] == 0) {
        return(0, $rc[1], "error");
    } else {
        $doctype = $rc[1];
    }

    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "ARCHIVID=$archivid");
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DOCID=$docid");
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(), "DOCTYPE=$doctype");

    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),
        "... reading of file 'COMMANDS' successfully finished.");

    AppendCommandsStmt($dtdocument2,"Hello World ...");


    return (1, "document processed", "ok");
}

#######################################################################
# Other functions used in doBeforeConnect and/or service              #
#######################################################################

#======================================================================
# FUNCTION CheckCommandsStmt
#======================================================================
# DESCRIPTION: this perl script function checks the file 'COMMANDS' for the
#              keyword given as parameter
# PARAMETERS:  DTDocument2 object, COMMANDS keyword (e.g. DOCID)
# RETURNS:     a perl array containing two elements:
#              - returncode: 1 for success and 0 for an error,
#              - textstring
#======================================================================
sub CheckCommandsStmt
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    my($dtdocument2,$statement) = @_;

    my $value = "";
    # default result in case findKeyInCommands reports an error
    my @rc = ("0", "Can't evaluate '$statement' statement in file 'COMMANDS'!");

    $dtdocument2->openCommands();

    $ret = $dtdocument2->findKeyInCommands($statement);
    if ($ret == 0) {
        @rc = ("0", "Can't find '$statement' statement in file 'COMMANDS'!");
    } elsif ($ret > 1) {
        @rc = ("0", "Found more than one '$statement' statement in file 'COMMANDS'!");
    } elsif ($ret == 1) {
        $value = $dtdocument2->getValueForFromCommands($statement);
        if($value eq '') {
            @rc = ("0", "Can't get value for '$statement' statement in file 'COMMANDS'!");
        } else {
            IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),
                "Get value $value for '$statement' statement from file 'COMMANDS'.");
            @rc = ("1", $value);
        }
    }

    return @rc;
}

#======================================================================
# FUNCTION AppendCommandsStmt
#======================================================================
# DESCRIPTION: this perl script function appends a statement to the file


#              'COMMANDS'
# PARAMETERS:  DTDocument2 object, statement string
# RETURNS:     - returncode: 1 for success and 0 for an error
#======================================================================
sub AppendCommandsStmt
{
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_ENTRY());
    my($dtdocument2,$statement) = @_;
    $dtdocument2->openCommands();
    $dtdocument2->appendToCommands("\n" . $statement . "\n");
    $dtdocument2->closeCommands();
    IXOS::DTLogging::logmsg(IXOS::DTLogging::_INFO(),
        "append statement '$statement' to file 'COMMANDS'.");
    return 1;
}

12.2.3 Sample COMMANDS Files


This is the COMMANDS file before the perldtn DocTool runs:

DOCTYPE FAX
COMP angebot.fax FAX angebot.fax
COMP im ASCII_NOTE im
R3_DESTINATION QM2
R3_CLIENT 800
--R3_OBJ_TYPE YWH1OWRT
--R3_DOC_TYPE YWH1WRT1
--R3_OBJ_ID 12345678901234567890
--R3BC_TEC_DOC_TYPE BCTECDOCTYPE
COMPUTERNAME brauneck
USERNAME Write_DP
ARCHIVID X1
USE_DOCID_FROM_COMMANDS on
DOCID 1.brauneck.X1.071129130115
RETENTION_PERIOD none

After running the perldtn DocTool, the COMMANDS file contains the additional line
“Hello World ...” at the end.

DOCTYPE FAX
COMP angebot.fax FAX angebot.fax
COMP im ASCII_NOTE im
R3_DESTINATION QM2
R3_CLIENT 800
--R3_OBJ_TYPE YWH1OWRT
--R3_DOC_TYPE YWH1WRT1
--R3_OBJ_ID 12345678901234567890
--R3BC_TEC_DOC_TYPE BCTECDOCTYPE
COMPUTERNAME brauneck
USERNAME Write_DP
ARCHIVID X1
USE_DOCID_FROM_COMMANDS on
DOCID 1.brauneck.X1.071129130115
RETENTION_PERIOD none
Hello World ...
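
The checks performed by CheckCommandsStmt can be reproduced on such a file in plain Perl, without the DTDocument2 module. This is only a sketch under an assumption about the file format: each non-empty line is treated as a keyword followed by whitespace and the rest of the line. The functions parse_commands, count_key and values_for are hypothetical and are not the IXOS API.

```perl
use strict;
use warnings;

# Splits COMMANDS text into [key, value] pairs. Assumption: each non-empty
# line is a keyword followed by whitespace and the rest of the line.
sub parse_commands {
    my ($text) = @_;
    my @stmts;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(\S+)(?:\s+(.*?))?\s*$/;
        push @stmts, [ $1, defined $2 ? $2 : '' ];
    }
    return @stmts;
}

# Number of statements with the given keyword.
sub count_key {
    my ($key, @stmts) = @_;
    return scalar grep { $_->[0] eq $key } @stmts;
}

# Values of all statements with the given keyword.
sub values_for {
    my ($key, @stmts) = @_;
    return map { $_->[1] } grep { $_->[0] eq $key } @stmts;
}

my @stmts = parse_commands(<<'END');
DOCTYPE FAX
COMP angebot.fax FAX angebot.fax
COMP im ASCII_NOTE im
ARCHIVID X1
END

print count_key('COMP', @stmts), "\n";                    # prints 2
print join(',', values_for('ARCHIVID', @stmts)), "\n";    # prints X1
```

In this excerpt the keyword COMP occurs twice, so a check that requires exactly one statement would fail for COMP and succeed for ARCHIVID.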

Glossary
Administration Server

Administration Server provides the interface to the OpenText Administration
Client for Archive Center which helps the administrator to create and maintain
the environment of Archive Center, including logical archives, storage devices,
pools, etc.

ADMS
See Administration Server.

Advanced Function Presentation (AFP)

Coordinated set of document creation, viewing, archiving and printing hardware,
software and services. AFP defines native text, image, graphics and bar code
objects that can be combined to create page content and provides a means for
managing fonts, overlays and other resource objects.

AFP
See Advanced Function Presentation (AFP).

Archive Center

Archive Center (former Archive Server) provides a full set of services for content
and documents. Archive Center can either be used as an integral part of
OpenText Content Suite Platform or as standalone services in various scenarios.
These services include handling archiving needs and combining the Document
Service, archive databases, etc.

Archive ID

Name of the logical archive assigned to the archive mode.

Archive Monitoring Web Client

Web-based administration tool for monitoring the state of the processes, storage
areas, Document Pipelines and database space of Archive Center.

Archive Spawner

Service program that starts and terminates the processes of the archive system.

ArchiveLink

The interface between SAP and the archive system.

CMIS
See Content Management Interoperability Services (CMIS).


COLD

Computer Output to Laser Disk – automatically created document lists that are
stored on an optical storage medium.

COMMANDS file

A COMMANDS file is created for each document and transferred with it. This file
contains processing information for the document and can be extended by any
DocTool to include information or parameters for a subsequent tool; for example,
a document ID after storing a document to an archive, or formatting instructions
for an XML file.

Content Management Interoperability Services (CMIS)

An open standard that allows different content management systems to
interoperate over the Internet. CMIS defines an abstraction layer for controlling
diverse document management systems and repositories using web protocols.

DOCDIR

Subdirectory in DPDIR that contains all files belonging to a document (for
example the COMMANDS file).

DocTool Types

If the same DocTool is to be executed several times, but with different queues as
input and output sources, you can define DocTool types. DocTool types can be
distinguished by their names.

DocTools

Programs that perform individual, elementary actions with documents in the
Document Pipeline.

Document ID

Unique string assigned to each document with which the archive system can
identify it and trace its location.

Document lists

A document list is a single file that contains several individual documents. Each
of the documents can consist of several pages and has its own set of attributes. To
preserve the layout of the documents in a document list, it is possible to overlay
the individual documents with a form.

Document Pipeline Info (DPInfo)

Graphical user interface for monitoring Document Pipelines.

Document Pipeline Info
See Document Pipeline Info (DPInfo).


Document Pipeline

Short for OpenText Document Pipeline.
A Document Pipeline is the basic component in almost all document processing
software and is used, for instance, to transfer documents to a storage system or
another application while performing certain additional tasks.

Document Pipeliner (DP)

A Spawner component responsible for controlling and administering the flow of
documents and the DocTools involved in a Document Pipeline.

DP
See Document Pipeliner (DP).

dpconfig file

The configuration file for the Document Pipelines defines which DocTools are
executed in which order and contains at least one line per DocTool type. This line
specifies the source of the documents to be processed (that is the queue for the
DocTool) and the destination for the documents once processed (that is the queue
for the next DocTool).

DPDIR

The exchange directory to which the documents are copied from the enqueuing
tools and in which the DocTools perform their operations. Each document has its
own subdirectory in which all files are located concerning the same document.

DPqStatus file

This file contains a record of all the processing steps that the document has
already undergone together with the corresponding time stamps. The last line
always reflects the current status of the document. This file is mainly used
internally for recovery after a disturbance in the Document Pipeline process, as it
enables the pipeline to continue processing the document at precisely the step it
was stopped.

Enqueue tool

The DocTool that transfers the documents from the exchange directory to the
initial source queue of the Document Pipeline.

Exchange directory

The directory that is used for the exchange of data to be retrieved or archived. This
directory is dedicated to the exchange between the leading application, the
Document Pipeline, and Archive Center.


Indexing

Definition of storage conditions, for example the archive to which the document
is to be stored, by selecting the scenario, default settings and document type for
the document to be stored.

Indicator file

The indicator file (<docid>.log) indicates that the document directory is
complete and ready to be processed.

IXATTR file

The attribute values and various other items of document information must be
specified individually for each document and provided in the IXATTR file. The
structure of the IXATTR file is closely associated with the database layout of the
customer's leading application and must be created in accordance with the
customer's specific requirements, that is certain data must be customized.

Jobs

A job is an administrative task that you schedule to run automatically at regular
intervals. A job is assigned a unique name and is associated with a system
command that it executes along with any argument required by the system
command.

Leading application

A leading application is an application that generates the archived documents (for
example print lists in SAP) or an application with whose business objects the
archived documents are linked (for example inbound documents in SAP). SAP,
OpenText Content Server, Microsoft Exchange, Lotus Notes and Microsoft
SharePoint can be linked as leading applications.

Log file
See Indicator file.

Logical archives

The storage area on Archive Center in which documents can be stored. Each
logical archive may be configured to represent a different archiving strategy
appropriate to the types of documents archived exclusively there. It may consist
of one or more pools.

Meta documents

Meta (MTA) documents are also known as document lists, that is one
comprehensive file containing several individual documents of the same file
format. If indexing information is provided for the Meta document
(META_DOCUMENT_INDEX component), the individual documents can be searched
for and retrieved quickly and easily.


Opcode

Short form of operation code. The code that indicates the result of a DocTool
operation, for example ok or error. Depending on the opcode, the document is
transferred to the specified queue.

OpenText Administration Client

Administration tool for setup and maintenance of servers, logical archives,
devices, optical media, disk buffers, archive modes, users, and policies.
Front end interface for customizing and administering Archive Center.

Print lists

Documents that are created by the leading application and consist of lists of data.

Queues

Waiting lists for multiple tasks of the same type to be executed successively.

Servtab files

Configuration files of the Spawner that specify which processes to start.

Source queue

The queue (directory) that contains the documents awaiting processing by a
DocTool.

stockist

The special DocTool that processes documents in an error queue and returns the
document to the previous DocTool.
