IBM Content Manager OnDemand and FileNet-3


Content Manager OnDemand for Multiplatforms

The security user exit runs the ARSUSEC program when a user attempts to log on to the
system. A sample C program is provided in the EXITS directory. To implement your own
security user exit program, add your specific code to the sample that is provided (for example,
you can call another program from the ARSUSEC program). For more information about
functions, parameters, and return codes, see the ARSCSXIT.H file. Then, compile the ARSUSEC
program and move or copy the executable program to the BIN directory. Then, restart the
library server to use the security user exit program.

You can modify the permissions exit (arsuperm) in the same way; it must be placed in the
/opt/IBM/ondemand/V9.5/exits directory.

Content Manager OnDemand for i server


By default, the Content Manager OnDemand for i server activates the security exit and uses
IBM i security. If the security exit is not enabled, the Content Manager OnDemand user ID
and password have no relationship to the IBM i user ID and password, and all of the Content
Manager OnDemand system parameter settings are honored. You can enable or disable this
exit at an individual instance level.

User Security Exit (ARSUSEC on z/OS only)


On z/OS, the ARSUSEC exit invokes the ARSUSECZ security exit module. The security exit allows
communication with an external security manager, such as RACF, which then determines
whether the specific activity is allowed.

When you enable the exits to implement the required level or type of security, the user ID
must be defined for both TSO and Content Manager OnDemand.

Figure 6-12 is an overview of the security system exits interface.

[Diagram. User authentication (user logon; adding, deleting, or updating CMOD groups) goes through ARSUSEC (DLL, exported entry point SECURITY; C sample ARSUSEC.C, COBOL sample ARSUSECC; mappings ARSUSECH or ARSCSXIT.h), enabled by SRVR_FLAGS_SECURITY_EXIT=1. Resource authorization (access to folders, application groups, documents, and query SQL) goes through ARSUPERM (DLL, exported entry point PERMEXIT; C sample ARSUPERM.C, COBOL sample ARSUPERC, compile JCL ARSUPERJ), enabled by SRVR_FLAGS_FOLDER_APPLGRP_EXIT=1, SRVR_FLAGS_DOCUMENT_EXIT=1, and SRVR_FLAGS_SQL_QUERY_EXIT=1. Both call the external exit routine driver ARSUSECX (assembler; sample JCL ARSUSECJ assembles and binds ARSUSECX and ARSUSECZ), which drives the logical exit point ARS.SECURITY of the MVS Dynamic Exit Facility (defined with SETPROG EXIT). The security exit module ARSUSECZ (assembler, optionally modified) issues RACROUTE to SAF (RACF, ACF2, Top Secret) and the security database. ARSUSECA and ARSUSECH (or ARSUSECB) map the data structures that are passed to the exit routines.]
Figure 6-12 Security exits interface

Chapter 6. Security 153


With the ARSCSXIT_SECURITY_OKAY_BUT_VALIDATE_IN_OD return code, the exit can act on a
request and still let Content Manager OnDemand perform its standard security processing.
For example, in a change-password request the exit can refuse a new password that matches
the old password (the password must actually be changed), and otherwise return this code so
that Content Manager OnDemand still performs its own validation.

Table 6-2 lists the z/OS modules or executable files that ship with Content Manager
OnDemand.

Table 6-2 Security exit modules


Module      Description

ARSUPERM    This C module provides the interface between the Content Manager OnDemand
            system and the ARSUSECX module.

ARSUSEC     This C module provides the interface between the Content Manager OnDemand
            system and the ARSUSECX module.

ARSUSECA    Assembler mapping of the data structure that is presented to the exit routines
            that are associated with the exit point defined by ARSUSECX.

ARSUSECH    C mapping of the data structure that is presented to the exit routines that are
            associated with the exit point defined by ARSUSECX.

ARSUSECJ    Sample JCL stream for assembling and binding ARSUSECX and ARSUSECZ.

ARSUSECX    Interface module for the MVS Dynamic Exit Facility.

ARSUSECZ    Security exit module sample.

All modules are in the SARSINST library. The sequence of this exit, using the MVS Dynamic
Exit Facility, is different from the classical interface with exit modules or a security exit in an
IBM CICS environment. The kernel code was updated to allow external security. The Content
Manager OnDemand kernel code calls a dynamic link library (DLL) as an interface to the exit.
Modules ARSUSEC and ARSUPERM are provided as C source code modules and as
executable files. You do not need to change and recompile them.

The source is delivered mainly to help you understand the entire security system exit. If you
want to change the modules, they must be recompiled and bound as a C dynamic link library
(DLL). These modules communicate with the ARSUSECX module, which is an interface to the MVS
Dynamic Exit Facility. The security exit module ARSUSECZ is the delivered sample that
shows how to perform security checks through the System Authorization Facility (SAF)
interface. RACF is one security product that is called through SAF. ARSUSECH is a C header
that maps the data structure that is passed as input to every exit (such as ARSUSECZ).
ARSUSECA provides the same mapping in assembler language.

Note: More than one security exit can be defined to the MVS Dynamic Exit Facility. For
example, you can define a different security exit for each instance.

Tip: The only module that you need to change to meet your requirements is the provided
sample source ARSUSECZ. It must be assembled and linked into a library that is accessible
to the MVS Dynamic Exit Facility.

154 IBM Content Manager OnDemand Guide


6.7.2 Security systems other than SAF (z/OS only)
The sample that is provided with the Content Manager OnDemand installation is an SAF
sample. However, some installations use their own security system, or use it as an
enhancement together with the SAF environment. These systems can be accessed if they
provide an assembler-callable interface. The security exit sample code contains an example
for every function, and you can change or update these functions in the sample code.

For example, if your folder permissions are stored in an external security system without any
SAF interface, the folder permission part of the sample must be updated to call that external
security system.

Content Manager OnDemand SAF resource classes


You must define SAF resource classes ARS1FLDR and ARS1APGP for the folders and
application groups. For more information about the resource classes, see the section,
“OnDemand SAF resource classes”, in the IBM Content Manager OnDemand for z/OS -
Configuration Guide, SC19-3363.

Important: Even if the security exit checks the user ID and password against SAF or
other security systems, every user must be defined in Content Manager OnDemand in
every instance. You can use the ARSXML program to create users in batch mode: run it
from the UNIX System Services command line with an input file.

Activating the security and permission exits (ARS.INI)


Activation of the security exit is controlled by settings in the ARS.INI file. The settings and their
corresponding events are listed in Table 6-3.

Table 6-3 ARS settings and the corresponding enabled events


ARS.INI statement                   Enabled event

SRVR_FLAGS_SECURITY_EXIT=1          Logon, changing the password, and adding or deleting a
                                    user ID through the Content Manager OnDemand
                                    administrator interface.
                                    (This setting is the default for Content Manager
                                    OnDemand for i. If you do not want to use IBM i
                                    security for the new instance, change the setting to 0.)

SRVR_FLAGS_FOLDER_APPLGRP_EXIT=1    Activates the folder or application group permission exit.

SRVR_FLAGS_SQL_QUERY_EXIT=1         Activates the SQL query exit.

SRVR_FLAGS_DOCUMENT_EXIT=1          Activates the document permission exit.
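If you want a quick check of which exits a configuration turns on, you can read the mapping in Table 6-3 with a script. The following Python sketch is hypothetical and is not an IBM utility. It makes these assumptions:

• The file has simple KEY=VALUE lines.
• Comments and [section] headers are skipped, and per-instance settings are not tracked.
• A missing flag means disabled. This does not match the Content Manager OnDemand for i default that is described above.

```python
# Hypothetical helper: list the exits that an ars.ini-style file enables.
# Assumes KEY=VALUE lines; comments and [section] headers are skipped, and a
# flag that is missing is treated as disabled (not the IBM i default).

EXIT_FLAGS = {
    "SRVR_FLAGS_SECURITY_EXIT": "security exit (logon, password change, user add/delete)",
    "SRVR_FLAGS_FOLDER_APPLGRP_EXIT": "folder or application group permission exit",
    "SRVR_FLAGS_SQL_QUERY_EXIT": "SQL query exit",
    "SRVR_FLAGS_DOCUMENT_EXIT": "document permission exit",
}


def enabled_exits(config_text: str) -> list:
    """Return the flags from Table 6-3 that are set to 1, in table order."""
    values = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return [flag for flag in EXIT_FLAGS if values.get(flag) == "1"]


sample = """
SRVR_FLAGS_SECURITY_EXIT=1
SRVR_FLAGS_FOLDER_APPLGRP_EXIT=0
SRVR_FLAGS_DOCUMENT_EXIT=1
"""

for flag in enabled_exits(sample):
    print(f"{flag}: {EXIT_FLAGS[flag]}")
```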

Implementing the security exit in a z/OS environment


The module ARSUSECX interfaces with the MVS Dynamic Exit Facility:
• It defines the logical exit point name, ARS.SECURITY.
• It routes control to a set of associated exit routines and processes the results of their
execution.

Note: The sample processes the feedback of the exit one at a time, even if you are
running more than one exit.



To make an exit routine eligible for execution, you must associate it with the logical exit point
(ARS.SECURITY). The MVS Dynamic Exit Facility provides several methods to perform this
association. For example, you can use the EXIT statement in the PROGxx member of
SYS1.PARMLIB to define exits to the Dynamic Exit Facility at IPL time.

The following example shows the exit statement for PROGXX:


EXIT ADD EXITNAME(ARS.SECURITY) MODNAME(ARSUSECZ)

In addition, you can use the following operator command to add the exit:
SETPROG EXIT,ADD,EXITNAME=ARS.SECURITY,MODNAME=ARSUSECZ

Important: The load module must be in the link pack area (LPA) or in a LNKLST data set.
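Both forms associate the same module with the same exit point, and both use the MODNAME keyword. The following Python sketch is hypothetical and is not an IBM utility. It only builds the two strings so that you can review them before you add them to PROGxx or issue them from the console. It also checks the module name against the usual load module naming rule: 1 to 8 characters, and the first character is a letter or a national character.

```python
import re

# Load module names: 1-8 characters; the first is a letter or national (@ # $).
MODULE_NAME = re.compile(r"[A-Z@#$][A-Z0-9@#$]{0,7}")


def exit_statements(exit_name: str, module: str) -> tuple:
    """Return (PROGxx EXIT statement, equivalent SETPROG operator command)."""
    module = module.upper()
    if not MODULE_NAME.fullmatch(module):
        raise ValueError(f"not a valid load module name: {module!r}")
    progxx = f"EXIT ADD EXITNAME({exit_name}) MODNAME({module})"
    setprog = f"SETPROG EXIT,ADD,EXITNAME={exit_name},MODNAME={module}"
    return progxx, setprog


progxx, setprog = exit_statements("ARS.SECURITY", "arsusecz")
print(progxx)   # EXIT ADD EXITNAME(ARS.SECURITY) MODNAME(ARSUSECZ)
print(setprog)  # SETPROG EXIT,ADD,EXITNAME=ARS.SECURITY,MODNAME=ARSUSECZ
```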

6.7.3 Unified logon exit (ARSPTGN): z/OS only


With the Content Manager OnDemand unified logon exit (ARS.PTGN), you can run the Content
Manager OnDemand command-line utilities (such as ARSLOAD) without specifying a
user ID and password.

To log on without a user-specified password, this facility supplies a PassTicket as the password
on a RACROUTE REQUEST=VERIFY call. Figure 6-13 shows the unified logon exit.
CMOD in the figure stands for Content Manager OnDemand.

[Diagram. From the UNIX System Services (USS) command prompt, a command-line utility such as ARSLOAD that is run without a user ID and password calls the logical exit point ARS.PTGN through the MVS Dynamic Exit Facility (defined with SETPROG EXIT or a PROGxx EXIT statement). The ARSPTGN security exit module (assembler, optionally modified) obtains a generated PassTicket through SAF (RACF, ACF2, Top Secret) and the security database, and returns it. The utility logs on to the CMOD server, which verifies the PassTicket with RACROUTE REQUEST=VERIFY; the utility then performs its function and terminates.]

Figure 6-13 Unified logon exit

To enable PassTicket support in a security manager such as RACF, complete the following
steps:
1. Activate the PTKTDATA class.
2. Define a secured sign-on application key for each application.
3. Run SETROPTS RACLIST(PTKTDATA).



6.7.4 System log user exit
Content Manager OnDemand generates messages about the various actions that occur on
the system. For example, when a user logs on to the system, Content Manager OnDemand
generates a message that contains the date and time, the type of action, the user ID, and
other information. Unless you specify otherwise, certain messages are automatically saved in
the system logging facility. You can configure the system to save other messages in the
system logging facility.

The system log user exit allows access to all of these messages. The exit can then use these
messages for further processing. For example, an email can be generated when a load fails,
or when a user’s system access pattern is abnormal and requires attention. For more
information about the system log, see 11.4.1, “System log exit for Multiplatforms” on page 250
and 11.4.2, “System log exit for z/OS” on page 253.

6.8 Summary
Content Manager OnDemand provides a secure environment. Security features within
Content Manager OnDemand allow access control to the data and the APIs that access the
data. The data itself is controlled at rest and in motion (SSL). You can also create exits,
external to Content Manager OnDemand, that extend its internal security with customized
logic.



Part 2 Data indexing, loading, retrieval, and expiration

This part contains the following chapters:
• Chapter 7, “Indexing and loading” on page 161
• Chapter 8, “User clients” on page 185
• Chapter 9, “Data conversion” on page 207
• Chapter 10, “Migration and expiring data and indexes” on page 219
• Chapter 11, “Exits” on page 241

© Copyright IBM Corp. 2003, 2015. All rights reserved. 159



Chapter 7. Indexing and loading


In this chapter, we describe the various indexers that are available for IBM Content Manager
OnDemand (Content Manager OnDemand).

In this chapter, we cover the following topics:


• Introduction
• Getting started with PDF indexing
• Getting started with ACIF indexing
• OS/390 indexer on z/OS and AIX
• OS/400 indexer on Content Manager OnDemand on IBM i
• User exits
• Additional references



7.1 Introduction
Before documents can be loaded into Content Manager OnDemand, they must be indexed.
These indexes can be created during the load process (OS/390 indexer), immediately before
the load process (Advanced Function Presentation (AFP) Conversion and Indexing Facility
(ACIF), OS/400, XML, and Portable Document Format (PDF) indexers), or ahead of time,
outside the load process (Generic indexer, for which you create the index file yourself).
When the indexes are not created as part of the load process, they
are stored in an index file. The index file contains the index values that are associated with
the document and “pointers” to the documents. You cannot load documents into Content
Manager OnDemand without index values.

The index values are text strings that occur in the documents, for example, “John Doe”, or
“Account 1234”. One or more index values identify a unique document in Content Manager
OnDemand.

An indexer examines the documents and extracts the index values according to criteria that
you specify. Depending on the indexer that is used, the data and indexes are either loaded
directly into Content Manager OnDemand or written to a set of files that the load process
then reads to store the data in Content Manager OnDemand. An indexer that writes files
creates the following files:
• Output file (.out file extension), which contains the documents to load
• Index file (.ind file extension), which contains the index values for the documents

The indexer might also create a resource file with a .res extension, which contains the
resources that are extracted from the documents.

Operationally, the load program, arsload, calls the indexer that is specified on the Indexer
Information tab for the application. Depending on the indexer type, one of the following
occurs:
• The indexer creates a set of files that the arsload program then loads into the Content
Manager OnDemand system.
• The indexer passes the indexing and document information directly to the arsload
program, which loads them into the Content Manager OnDemand system.

On Content Manager OnDemand for i, arsload is embedded in the Add Report to Content
Manager OnDemand (ADDRPTOND) command. Therefore, run the ADDRPTOND command
instead of ARSLOAD.

It is possible for the indexing to complete successfully but for the load to fail. The most
common reasons for a load failure are:
• Insufficient system resources
• A connection to the wrong database
• The wrong index value being extracted from the document

For information about investigating and resolving common load failures, see 18.1.2, “Indexing
and loading issues” on page 379.



7.1.1 Loading and indexing files that were created on another system
Reports and documents are often created on a platform other than the platform on which the
Content Manager OnDemand Instance is installed. Two main ways exist to load these reports
and document files:
• Transfer the files from the remote system to the system that contains the Content Manager
OnDemand instance and then index and load the documents on that system.
Many applications are available for transferring files.
For example, if your reports are generated on a z/OS system and you want to load them
from a Microsoft Windows system, you can use these methods:
– On the z/OS side, use the “Download for z/OS” application to automatically download
the files from the z/OS system. “Download for z/OS” is a utility that is included as part
of the Print Services Facility for z/OS.
– On the receiving side (in this case Windows), you can use the Content Manager
OnDemand ARSJESD utility. The ARSJESD utility runs as a service on Windows, and
it runs as a daemon on other platforms.
For more information about ARSJESD, see the IBM Content Manager OnDemand for
Multiplatforms Administration Guide, SC19-3352.
• Run the indexing and load program on the remote system. In this case, the load program
sends the documents and indexes to the Content Manager OnDemand System through
the TCP/IP network. To run the index and load programs on the remote system, you must
copy the appropriate Content Manager OnDemand product code to that system.

You can choose to use either or both of these methods for your remote data loading.

7.1.2 Understanding input data types


It is important to know the data type of the documents that you load into Content Manager
OnDemand. By data types, we mean document formats, such as line data, SCS, AFP, or
PDF. If you are loading line data, you also need to know the following characteristics:
• Fixed-length or variable-length records
• If variable, whether the records are stream records or have a 2-byte length prefix
• If stream, the record delimiter
• Whether carriage controls are present
• Type of carriage control, American National Standards Institute (ANSI) or machine
• Whether Table Reference Character (TRC) codes are present
• Code page of the data

Run arsafpd to determine the input data type of your file. The input data type determines
which indexers you can use and also helps you determine several of the indexing
parameters that you need.

To run arsafpd from the command line, enter the following command:
arsafpd -s -i <input file>

Figure 7-1 on page 164 shows examples of running the arsafpd command and the output
that is produced.



arsafpd -s -i testfile1.txt
ARS7104I Document type: LINE
ARS7114I Records appear to be delimited by hex character(s): 0x0A
ARS7115I Codepage appears to be: ASCII
ARS7110I Carriage control type appears to be: NONE
ARS7111I Pages appear to be delimited with a formfeed (0x0C). The
asciinp and asciinpe userexit might be required if using ACIF.

arsafpd -s -i testfile.afp
ARS7104I Document type: AFP
ARS7107I Group TLE structured fields were encountered

arsafpd -s -i admin.pdf
ARS7104I Document type: PDF
Figure 7-1 Examples of running the arsafpd command and the output that is produced
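When you check many files, you might capture the arsafpd -s output and read it in a script. The following Python sketch is an assumption-based illustration and does not cover every message that arsafpd can issue. It relies only on the message layout that Figure 7-1 shows: an ARSnnnnX message ID followed by text, with wrapped text continuing on the next line.

```python
import re
from typing import Dict, Optional

# Each message starts with an ID such as ARS7104I (as in Figure 7-1).
MESSAGE = re.compile(r"^(ARS\d{4}[A-Z])\s+(.*)$")


def parse_arsafpd(output: str) -> Dict[str, str]:
    """Map message IDs to message text; wrapped lines are joined."""
    messages: Dict[str, str] = {}
    last_id = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = MESSAGE.match(line)
        if match:
            last_id = match.group(1)
            messages[last_id] = match.group(2)
        elif last_id is not None:
            messages[last_id] += " " + line  # continuation of the previous message
    return messages


def document_type(output: str) -> Optional[str]:
    """Return the value of the ARS7104I 'Document type:' message, if any."""
    text = parse_arsafpd(output).get("ARS7104I", "")
    prefix = "Document type:"
    return text[len(prefix):].strip() if text.startswith(prefix) else None


sample = """\
ARS7104I Document type: LINE
ARS7114I Records appear to be delimited by hex character(s): 0x0A
ARS7115I Codepage appears to be: ASCII
ARS7110I Carriage control type appears to be: NONE
ARS7111I Pages appear to be delimited with a formfeed (0x0C). The
asciinp and asciinpe userexit might be required if using ACIF.
"""

print(document_type(sample))  # LINE
```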

You can also run the arsafpd command to display the contents of an AFP document, index, or
resource file. For more information about ARSAFPD, see the Content Manager OnDemand
for Multiplatforms Administration Guide, SC19-3352.

7.1.3 Choosing an indexer


You choose the indexer to use based on multiple factors, including the data type of the
documents, the platform on which you are running the indexer, and other criteria. The main
factors are listed in Table 7-1. Other factors also matter, such as cross-platform compatibility,
advanced indexing functions, and your expertise.

Table 7-1 Indexers that are available for use with Content Manager OnDemand
Indexer   Input data type           Available platforms   Conversion    Resource collection   Large object support   Floating triggers
Generic   All                       All                   No            No                    No                     No
ACIF      Line, AFP                 All, except IBM i     Line to AFP   Yes                   Yes                    Yes
PDF       PDF                       All, except z/OS      No            Yes                   No                     Yes
OS/400    Line, AFP, SCS, SCS-Ext   IBM i                 SCS to AFP    Yes                   Yes                    Yes
OS/390    Line, AFP                 z/OS and AIX          No            Yes                   Yes                    Yes
XML       XML                       All                   No            Yes                   No                     No



Consider the following information about Table 7-1 on page 164:
• The Generic indexer requires you to create an index file in the generic index format
manually before you start the load process. The Generic indexer captures the documents,
index values, and resources that are identified to it. These documents, index values, and
resources are then loaded into the Content Manager OnDemand archive and stored in the
same manner as though they were loaded through any of the other indexers.
An existing resource file can be loaded with a generic index file.
For more information about the generic index format, see IBM Content Manager
OnDemand - Indexing Reference, SC19-3354.
• The ACIF, PDF, XML, and OS/400 indexers all generate intermediate files. These files are
then used to load the indexes and data into the Content Manager OnDemand system.
• The OS/390 indexer creates the index data while it loads the indexes and data into the
Content Manager OnDemand system.
• Conversion refers to a conversion by the indexer. Other products that integrate with
Content Manager OnDemand can also convert data.
• Because of the architecture of PDF documents, large object support for PDF documents is
not possible.
• Starting with V9.5, the PDF Indexer runs in the PASE environment on IBM i. PASE is a
prerequisite on IBM i for V9.5.
• Starting with V9.5, the PDF Indexer is no longer supported on z/OS.
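You can express Table 7-1 as a small lookup when you want to shortlist indexers for a data type and platform. The following Python sketch transcribes only the "Input data type" and "Available platforms" columns. The platform names are plain strings of our own choosing ("IBM i", "z/OS", "AIX", "Windows", and so on), and the other factors listed above still apply.

```python
ALL = None  # marker: the indexer accepts every data type or runs on every platform

# (input data types, platforms it runs on, platforms it does not run on)
TABLE_7_1 = {
    "Generic": (ALL, ALL, frozenset()),
    "ACIF": (frozenset({"LINE", "AFP"}), ALL, frozenset({"IBM i"})),
    "PDF": (frozenset({"PDF"}), ALL, frozenset({"z/OS"})),
    "OS/400": (frozenset({"LINE", "AFP", "SCS", "SCS-EXT"}), frozenset({"IBM i"}), frozenset()),
    "OS/390": (frozenset({"LINE", "AFP"}), frozenset({"z/OS", "AIX"}), frozenset()),
    "XML": (frozenset({"XML"}), ALL, frozenset()),
}


def candidate_indexers(data_type: str, platform: str) -> list:
    """Return indexers from Table 7-1 that accept data_type on platform."""
    wanted = data_type.upper()
    result = []
    for name, (types, platforms, excluded) in TABLE_7_1.items():
        if types is not ALL and wanted not in types:
            continue
        if platforms is not ALL and platform not in platforms:
            continue
        if platform in excluded:
            continue
        result.append(name)
    return result


print(candidate_indexers("Line", "IBM i"))  # ['Generic', 'OS/400']
```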

7.2 Getting started with PDF indexing


PDF is a standard that is specified by Adobe Systems, Incorporated, for the electronic
distribution of documents. PDF files are compact. They can be distributed globally through
email, the web, intranets, or CD-ROM, and viewed with Adobe Reader.

PDF is a file format that is independent of platform (hardware and operating system). A
PDF file contains a complete PDF document that is composed of text, graphics, and the
resources that are referenced by that document.

Two PDF file layouts are possible:
• Non-linear (not “optimized”)
This file layout is optimized for space savings. Storing a PDF file by using a non-linear
layout consumes less disk space than storing the same PDF file linearly. This layout is
slower to access or display because portions of the data that are required to assemble
pages of the document are scattered throughout the PDF file, so the whole PDF file must
be downloaded and accessed before the file can be displayed.
• Linear (“optimized” or “web optimized”)
In this file format, the PDF file is created in a linear (page order) fashion. This file format
allows the PDF viewer to start displaying pages of the document while the file is still
downloading, without waiting for the whole PDF file to be downloaded.
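To check which layout a PDF file uses before you load it, you can look for the linearization parameter dictionary. The PDF specification (ISO 32000-1, Annex F) requires this dictionary to lie entirely within the first 1024 bytes of a linearized file. The following Python sketch is a heuristic only, with two limits:

• It assumes that the file starts with the %PDF- header.
• A file that was incrementally updated after linearization can still carry the marker without being fully linear.

```python
WINDOW = 1024  # the linearization dictionary must sit within the first 1024 bytes


def header_looks_linearized(head: bytes) -> bool:
    """Heuristic: True if the first bytes contain a /Linearized dictionary key."""
    window = head[:WINDOW]
    return window.startswith(b"%PDF-") and b"/Linearized" in window


def file_looks_linearized(path: str) -> bool:
    """Read only the start of the file and apply the header heuristic."""
    with open(path, "rb") as pdf:
        return header_looks_linearized(pdf.read(WINDOW))
```

Because only the first kilobyte is read, the check stays cheap even for the large input files that are described in the next section.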



7.2.1 Limitations
The maximum input file size that is supported by PDF Indexer is 4 GB. The amount of data
that can be processed from an input file is also limited by the amount of memory that is
available on the server on which you are running the PDF Indexer. The maximum size of a
single document within the input file that can be loaded into Content Manager OnDemand is
2 GB; however, we suggest that the size of a single PDF document does not exceed 50 MB.
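You can check these limits before a load. The following Python sketch is hypothetical and rests on three assumptions:

• GB and MB mean binary units (2^30 and 2^20 bytes).
• You already know the size of each document that the indexer will produce.
• The memory limit on the server is not modeled.

```python
MB = 1024 ** 2
GB = 1024 ** 3

MAX_INPUT_FILE = 4 * GB        # largest input file that PDF Indexer supports
MAX_DOCUMENT = 2 * GB          # largest single document that can be loaded
SUGGESTED_DOCUMENT = 50 * MB   # suggested ceiling for a single PDF document


def check_pdf_sizes(input_size: int, document_sizes: list) -> list:
    """Return human-readable warnings; an empty list means no limit is hit."""
    warnings = []
    if input_size > MAX_INPUT_FILE:
        warnings.append("input file exceeds the 4 GB limit")
    for number, size in enumerate(document_sizes, start=1):
        if size > MAX_DOCUMENT:
            warnings.append(f"document {number} exceeds the 2 GB limit")
        elif size > SUGGESTED_DOCUMENT:
            warnings.append(f"document {number} exceeds the suggested 50 MB")
    return warnings


print(check_pdf_sizes(300 * MB, [10 * MB, 60 * MB]))
```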

Secure PDF documents are not supported. PDF Digital Signatures are not supported. If a
PDF document contains a digital signature, after indexing, the .out file does not contain the
digital signature. To load a file that contains a PDF Digital Signature, create a generic index
file for it, and load the file as one document.
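For a single signed PDF, the generic index file describes the whole file as one document. The following Python sketch is based on our understanding of the generic index format. It uses the CODEPAGE, GROUP_FIELD_NAME, GROUP_FIELD_VALUE, GROUP_OFFSET, GROUP_LENGTH, and GROUP_FILENAME keywords, and assumes that a length of 0 means the whole file. Verify the exact rules in IBM Content Manager OnDemand - Indexing Reference, SC19-3354. The field names, values, path, and the code page 819 default are hypothetical and must match your application group.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One document in a generic index file (names are illustrative)."""
    filename: str
    fields: dict = field(default_factory=dict)
    offset: int = 0
    length: int = 0  # assumption: 0 means "the whole file"


def generic_index(groups: list, codepage: int = 819) -> str:
    """Render groups in the KEYWORD:value layout of a generic index file."""
    lines = [f"CODEPAGE:{codepage}"]
    for group in groups:
        for name, value in group.fields.items():
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        lines.append(f"GROUP_OFFSET:{group.offset}")
        lines.append(f"GROUP_LENGTH:{group.length}")
        lines.append(f"GROUP_FILENAME:{group.filename}")
    return "\n".join(lines) + "\n"


signed = Group("/arstmp/contract_signed.pdf", {"account": "000123", "sdate": "2015-06-30"})
print(generic_index([signed]), end="")
```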

7.3 Performance considerations


The PDF Indexer performs best on the Windows platform. For performance best
practices, see 13.4.1, “PDF data” on page 308.

7.3.1 PDF fonts and output file size


The fonts that are used in a PDF document are one of the factors that determine the size of
the indexer's output file.

The base 14 Type 1 fonts


The base 14 Type 1 fonts are a core set of fonts that are always available to the Acrobat
program. Because they are available on the system, they are not embedded in the document.
Therefore, documents that are created with these fonts are more compact. The base 14 fonts
are:
• Courier
• Courier-Bold
• Courier-BoldOblique
• Courier-Oblique
• Helvetica
• Helvetica-Bold
• Helvetica-BoldOblique
• Helvetica-Oblique
• Times-Roman
• Times-Bold
• Times-Italic
• Times-BoldItalic
• Symbol
• ZapfDingbats

Fonts that are not members of the base 14 fonts might be embedded in the document, or they
might be stored in a font directory.

Images and bar code fonts are also embedded in the document.

The PDF Indexer collects resources, such as fonts and images, removes them from the
document, and places them in a resource file. The number of embedded fonts in the
document directly affects the size of the resource file.



We recommend that you use only the base 14 fonts when you create PDF documents.
Because these fonts are not embedded in the document, documents that are created with
these fonts are smaller, and the resource file is also smaller.
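When you review the font list of a document, a set difference against the base 14 names shows which fonts are embedded or must be found elsewhere. You can get the list from the Fonts tab that is described in "Listing fonts in a PDF file," or from a PDF library. The following Python sketch assumes that you already have the font names as strings. It does not normalize subset-embedded fonts that carry a tag prefix such as ABCDEF+.

```python
BASE_14_FONTS = frozenset({
    "Courier", "Courier-Bold", "Courier-BoldOblique", "Courier-Oblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-BoldOblique", "Helvetica-Oblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
})


def fonts_outside_base14(font_names: list) -> list:
    """Return the distinct font names (leading '/' removed) not in the base 14."""
    names = {name.lstrip("/") for name in font_names}
    return sorted(names - BASE_14_FONTS)


print(fonts_outside_base14(["/Helvetica", "Times-Bold", "ArialMT", "ArialMT"]))
```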

Accessing fonts
If a document references fonts that are neither embedded in the document nor available on
the system, the document does not display correctly in the report wizard, and the PDF Indexer
cannot index it. In the report wizard, the document might display as a series of dots instead of
letters; the PDF Indexer fails with the “Trigger not found” message.

If your documents contain Asian fonts, ensure that you install them when you install Adobe
Acrobat.

If the fonts are not embedded in the document, use the FONTLIB parameter to tell the PDF
Indexer the location of font files.

Listing fonts in a PDF file


If you want to know the fonts that are contained in a PDF document, a simple method within
the Adobe viewer is available to list the fonts in your data.

Follow these steps to list the fonts in a PDF (for example, for Adobe Reader XI, version
11.0.3):
1. Display your PDF document in the Adobe viewer (or reader).
2. Click File → Document Properties → Fonts. You will see a list of fonts for the document.

The path to see the fonts might differ, depending on your viewer version.

7.3.2 Reducing output file size with PDF documents


When you index PDF data, you might be surprised by the size of the output file that the PDF
Indexer creates after it indexes the data. In certain cases, the PDF file that is loaded into
Content Manager OnDemand is many times larger than the source PDF file.

When the input file is indexed, it is split into multiple PDF documents. Each PDF document
contains its own set of PDF structures that are required by the PDF architecture. For this
reason, the multiple PDF documents that are created by the indexing can be larger in total
than the original PDF document.

One way to reduce the size of the output file is to use the base 14 fonts.



In addition, the following PDF parameter settings can help reduce the size of the output file:
• RESTYPE=ALL
The PDF Indexer removes fonts and images from the input file and places them into a
separate resource file. Without this option, each PDF document that is created by the
indexing contains its own set of duplicate resources. Always use this parameter.
• BOOKMARKS=NO
If a PDF document contains bookmarks, each PDF document that is created by the
indexing process contains the complete set of bookmarks for the input file. Because the
input file is now split into separate documents, most of these bookmarks are invalid. This
option prevents the PDF Indexer from copying any bookmarks to the new PDF files.
• REMOVERES=YES
This option causes the PDF Indexer to remove unused resources and their supporting
structures from the input file before the indexing occurs. Otherwise, the PDF Indexer puts
unused resources (with those resources that are used) into the resource file.
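These three settings are easy to miss, so you might check parameter files before a load. The following Python sketch is hypothetical. It assumes one KEY=VALUE pair per line, compares values case-insensitively, and checks only the three settings above.

```python
RECOMMENDED = {"RESTYPE": "ALL", "BOOKMARKS": "NO", "REMOVERES": "YES"}


def size_settings_to_fix(parm_text: str) -> dict:
    """Return the recommended settings that are missing or set differently."""
    found = {}
    for raw in parm_text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if sep:
            found[key.strip().upper()] = value.strip().upper()
    return {key: value for key, value in RECOMMENDED.items() if found.get(key) != value}


print(size_settings_to_fix("RESTYPE=ALL\nBOOKMARKS=YES\n"))
```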

7.3.3 PDF indexing: Using PDF metadata


When the PDF file is created, the user or application must place the metadata (indexes) in the
PDF file. You can modify the metadata (indexes) in the document at any time and then reload
a new copy of the document into Content Manager OnDemand.

Setting INDEXMODE=METADATA (for the application) causes the PDF Indexer to extract fields from
the Document Information Dictionary that correspond to the specific metadata keywords (if
they exist) and place the extracted values into the .ind file to load into Content Manager
OnDemand. The metadata keywords are:
• Title
• Author
• Subject
• Creator
• Producer
• CreationDate
• ModDate
• Trapped

The main advantage of using metadata is the increased speed during the index process. The
main disadvantage of using this method is that each document needs to be loaded
individually; you cannot create large concatenated (multiple document) input data files.
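PDF libraries commonly expose the Document Information Dictionary as a mapping whose keys are PDF names such as /Title. The following Python sketch is an illustration, not the PDF Indexer's own code. It keeps only the keywords that are listed above and skips those that do not exist, which mirrors what INDEXMODE=METADATA is described as doing. How each value maps to an application field is configured in the application and is not shown.

```python
METADATA_KEYWORDS = ("Title", "Author", "Subject", "Creator",
                     "Producer", "CreationDate", "ModDate", "Trapped")


def metadata_values(info: dict) -> dict:
    """Return {keyword: value} for listed keywords present in the info dict."""
    normalized = {str(key).lstrip("/"): value for key, value in info.items()}
    return {key: normalized[key] for key in METADATA_KEYWORDS if key in normalized}


info = {"/Title": "Invoice 4711", "/Author": "Billing",
        "/CreationDate": "D:20150630120000", "/Custom": "x"}
print(metadata_values(info))
```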

For more information about using PDF metadata, see IBM Content Manager OnDemand -
Indexing Reference, SC19-3354.

7.3.4 PDF indexing: Using the report wizard (graphical indexer)


The graphical indexer, which is technically part of the report wizard, processes PDF input
files.

If you plan to use the report wizard, you must first install Adobe Acrobat on the Windows
workstation from which you plan to run the Administrator Client. You must purchase Adobe
Acrobat from Adobe.



Installation
Content Manager OnDemand provides the ARSPDF32.API file to enable PDF viewing from the
client.

If you install the client after you install Adobe Acrobat, the installation program copies the
application programming interface (API) file to the Acrobat plug-in directory.

If you install the client before you install Adobe Acrobat, you must copy the API file to the
Acrobat plug-in directory manually.

If you upgrade to a new version of Acrobat, you must copy the API file to the new Acrobat
plug-in directory.

The default location of the ARSPDF32.API file is:

C:\Program Files (x86)\IBM\OnDemand Clients\V9.5\PDF

The default Acrobat plug-in directory is C:\Program Files (x86)\Adobe\Acrobat
x.y\Acrobat\plug_ins. The variables x.y represent the version of Acrobat, for example,
C:\Program Files (x86)\Adobe\Acrobat 10.0\Acrobat\plug_ins.
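You might script the manual copy, for example after an Acrobat upgrade. The following Python sketch composes the default paths above. It assumes the default V9.5 client location and the default Acrobat directory layout. Run the copy on the Windows workstation with sufficient permissions. PureWindowsPath is used only so that the path logic can be tested on any system.

```python
import shutil
from pathlib import PureWindowsPath

API_FILE = PureWindowsPath(r"C:\Program Files (x86)\IBM\OnDemand Clients\V9.5\PDF\ARSPDF32.API")
ADOBE_ROOT = PureWindowsPath(r"C:\Program Files (x86)\Adobe")


def plugin_dir(acrobat_version: str) -> PureWindowsPath:
    """Default Acrobat plug-in directory for version x.y, for example '10.0'."""
    return ADOBE_ROOT / f"Acrobat {acrobat_version}" / "Acrobat" / "plug_ins"


def copy_api_file(acrobat_version: str) -> str:
    """Copy ARSPDF32.API into the plug-in directory (run this on Windows)."""
    target = plugin_dir(acrobat_version) / API_FILE.name
    shutil.copy2(str(API_FILE), str(target))
    return str(target)


print(plugin_dir("10.0"))
```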

Graphical indexer example


By using the graphical indexer, you can define triggers, fields, and indexes for PDF reports
within the application component of Content Manager OnDemand in a similar way to defining
them for line data. This section serves as an introduction to the PDF graphical indexer by
stepping through an example of indexing a PDF document.

The example describes how to use the graphical indexer from the report wizard to create
indexing information for an input file. The indexing information consists of a trigger that
uniquely identifies the beginning of a document in the input file and the fields and indexes for
each document. We elaborate on this example by clarifying several of the instructions, and
throughout each step, we add important hints, tips, and explanations.

The process consists of these steps:


1. Start the Administrator Client and log on to a server.
2. Start the report wizard. Click the report wizard icon on the toolbar.
3. In the Sample Data window, select PDF from the drop-down list of data types, and then
click Select Sample Data.
4. In the Open window, enter the name or full path name of your file in the space that is
provided or use the Browse option to locate your PDF file.
5. Click Open. The graphical indexer opens the input file in the report window.
If the PDF data fails to display, or an error message, such as the message that is shown in
Figure 7-2, is displayed, you must follow the steps in “Installation” on page 169 to verify
that the API file is in the correct Acrobat plug-in directory.

Figure 7-2 Error message if PDF does not display

Chapter 7. Indexing and loading 169


6. Press F1 to open the main help topic for the report window.
The main help topic contains general information about the report window and links to
other topics that describe how to add triggers, fields, and indexes. For example, to get help
to define a trigger, click Adding a trigger (PDF). You can also use the context help tool to
display information about the icons on the toolbar.
7. Close any open help topics and return to the report window.
8. To define a trigger, complete the following steps:
a. Find a text string that uniquely identifies the beginning of a document, for example, Account Number, Invoice Number, or Customer Name.

Note: To create trigger values in hexadecimal, select the Output Hexadecimal Strings check box in the Indexer Properties window before you define a trigger.

b. By using the mouse, draw a box around the text string. Start just outside of the
upper-left corner of the string. Click and then drag the mouse toward the lower-right
corner of the string. As you drag the mouse, the graphical indexer uses a dotted line to
draw a box. After you enclose the text string inside a box, release the mouse. The
graphical indexer highlights the text string inside the box. If the string is not highlighted,
try again and increase the box’s size.

Important: Size the box that you created around the text string, which you are trying
to collect, as large as possible to ensure that the field is collected at load time.

Figure 7-3 on page 171 shows an example of a box that is intended to capture the text
string Content. You can see that the box is much larger than the text string, and it
overlaps onto text that we do not want to collect. However, notice the Add a Trigger box
that is displayed; only the string Content is shown in the Value entry field, which means
that only the string Content is fully encapsulated in the box. Overlapping other text might seem unnecessary. However, when you capture data with the PDF graphical indexer, a generous box is an excellent way to ensure that all of the text string that you must capture is enclosed.

Figure 7-3 Capturing text with the PDF graphical indexer

c. Click the Define a Trigger icon on the toolbar.


d. In the Add a Trigger window (Figure 7-3), verify the attributes of the trigger by
confirming that the text string in the Value field for Trigger 1 is correct. For Trigger 1,
you cannot specify any options or values. For other triggers, click Help for assistance
with the other options and values. Click OK to define the trigger.
e. Follow these steps to verify that the trigger uniquely identifies the beginning of a
document:
i. On the toolbar, click the fourth icon from the right to place the report window in
display mode.
ii. Click the Select tool.
iii. In the Select window, under Triggers, double-click the trigger. The graphical indexer
highlights the text string in the current document.
Double-click the trigger again. The graphical indexer highlights the text string on the
first page of the next document.
iv. Use the Select window to move forward to the first page of each document and
return to the first document in the input file.
f. On the toolbar, click the fourth icon from the right to place the report window back into
add mode.
9. Define a field and an index:
a. Find a text string that can be used to identify the location of the field. The text string
needs to contain a sample index value. For example, if you want to extract account
number values from the input file, find where the account number is printed on the
page.
b. By using the mouse, draw a box around the text string. Start just outside of the
upper-left corner of the string. Click and then drag the mouse toward the lower-right
corner of the string. As you drag the mouse, the graphical indexer uses a dotted line to
draw a box. After you enclose the text string inside of a box, release the mouse. The
graphical indexer highlights the text string inside the box.

Important: Use the same principles for collecting fields as collecting the trigger text
string in step 8b on page 170. If the fields that must be collected are close together,
overlap them with adjacent fields to ensure that the box is as large as possible and
to ensure that the data is collected at load time.

c. Click the Define a Field icon on the toolbar.


d. In the Add a Field window, complete the following steps:
i. On the Field Information tab, verify the attributes of the index field. For example, the text string that you selected in the report window is displayed under Reference String, and the Trigger value identifies the trigger on which the field is based. Click Help for assistance with the options and values that you can specify.
ii. On the Database Field Attributes tab, verify the attributes of the database field. In
the Database Field Name field, enter the name of the application group field into
which you want Content Manager OnDemand to store the index value. In the Folder
Field Name field, enter the name of the folder field to display in the client search
window. Click Help for assistance with the other options and values that you can
specify.
iii. Click OK to define the field and index.
e. To verify the locations of the fields, complete the following steps:
i. Place the report window into display mode. Blue boxes are drawn around the fields.
ii. Click the Select tool.
iii. In the Select window, under Fields, double-click Field 1. The graphical indexer
highlights the text string in the current document. Double-click Field 1 again. The
graphical indexer moves to the next document and highlights the text string.
iv. Use the Select window to move forward to each document and display the field.
Then, return to the first document in the input file.
f. Place the report window back into add mode.
10. Click Create Indexer Parameters and Fields Report to create the indexer parameter report that the PDF Indexer uses to process the input files that you load into the application. At a minimum, you must have one trigger, one field, and one index. For more information about the indexing parameters, see IBM Content Manager OnDemand - Indexing Reference, SC19-3354.
11. After you define all of the triggers, fields, and indexes, press Esc to close the report window.

12. Click Yes to save the changes to the indexer parameters.
13. In the Sample Data window, click Next to continue with the report wizard.
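The box-sizing advice in steps 8b and 9b, and the trigger check in step 8e, can be reasoned about with a simplified model. The following Python sketch is not how the graphical indexer is implemented. It assumes, as the Add a Trigger example suggests, that a text string is collected only when its bounding box lies completely inside the box that you draw. The Box and Word types, the function names, and the coordinates are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rectangle in page coordinates: left/top is the upper-left corner."""
    left: float
    top: float
    right: float
    bottom: float

    def encloses(self, other: Box) -> bool:
        # True only if `other` lies completely inside this box.
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def captured_text(capture: Box, words: list[Word]) -> str:
    """Join the fully enclosed words; partially covered words are ignored."""
    return " ".join(w.text for w in words if capture.encloses(w.box))


def document_starts(pages: list[list[Word]], capture: Box, trigger: str) -> list[int]:
    """0-based indexes of the pages on which the trigger string is captured."""
    return [i for i, words in enumerate(pages)
            if captured_text(capture, words) == trigger]
```

In this model, a generous box that overlaps a neighboring word still captures only the fully enclosed word, while a box that clips even part of the string captures nothing, which is why the steps recommend drawing the box as large as possible.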

7.3.5 PDF indexing: Using internal indexes (Page Piece Dictionary)


When the PDF document is created, the user or application must insert indexes into the Page
Piece Dictionary. For Content Manager OnDemand, the Page Piece Dictionary must be
named “IBM-ODIndexes” to allow the PDF Indexer to find the Page Piece Dictionary and
collect the index values.

Setting INDEXMODE=INTERNAL (for the application) causes the PDF Indexer to segment the
input file into the individual documents, gather the various PDF resources (fonts, images, and
forms), and then load the PDF indexes, documents, and resources into Content Manager
OnDemand.

The use of internal indexes offers multiple advantages:
• Fast indexing: A single PDF file can contain many PDF documents. Extracting the indexes for these documents is fast because Content Manager OnDemand reads the index values directly from the Page Piece Dictionary instead of searching the document data for them.
• Different formats can exist in a single PDF input file: This flexibility is possible if the indexes are similar, because Content Manager OnDemand reads and processes only the indexes.
• The indexed PDFs can be either static or dynamic: Static PDF forms render once and are displayed on the client in Adobe Acrobat or Adobe Reader. Static PDF forms are not re-rendered in response to user interactions. Dynamic PDF forms render on the client in Adobe Reader and, depending on the user interactions, can re-render on the client several times. Re-rendering causes the content of the form (all objects, including text and images) to change.
Both static and dynamic PDFs can be indexed because the PDF Indexer looks only at the Page Piece Dictionary. The PDF document data is not examined or processed.

For more information about using internal indexes (Page Piece Dictionary), see IBM Content
Manager OnDemand - Indexing Reference, SC19-3354.

7.4 Getting started with ACIF indexing


The AFP Conversion and Indexing Facility (ACIF) consists of three separate but related
functions. ACIF can perform the following tasks:
• Convert line data to AFP.
• Index line or AFP data.
• Collect resources.

ACIF accepts either line data or AFP as input and can produce three output files:
• The output file, which is called the “out” file, is either line data or AFP.
• The index file, which is called the “ind” file, is an AFP file.
• The resource file, which is called the “res” file, is an AFP file.

Chapter 7. Indexing and loading 173


Three “modes” of running ACIF are available:
• Mode one: Line data input to ACIF creates line data output:
– Specify the ACIF parameter CONVERT=NO.
– ACIF does not create a resource file.
– Files produced: .out and .ind.
• Mode two: Line data input to ACIF creates AFP output:
– Specify the ACIF parameter CONVERT=YES.
– ACIF creates an AFP resource file.
– Files produced: .out, .ind, and .res.
• Mode three: AFP input to ACIF creates AFP output:
– Specify the ACIF parameter CONVERT=YES.
– ACIF creates an AFP resource file.
– Files produced: .out, .ind, and .res.

A subset of the second mode is mixed mode input (line data records mixed with AFP
records). In this case, ACIF creates AFP output:
• Specify the ACIF parameter CONVERT=YES.
• ACIF creates an AFP resource file.
• Files produced: .out, .ind, and .res.
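The modes above reduce to a small lookup that, for example, a load script could use to check which ACIF output files to expect. This Python sketch encodes only the rules that are listed in this section; the function name and the input-type labels are hypothetical, and combinations that the section does not describe are rejected rather than guessed.

```python
def acif_output_files(input_type: str, convert: str) -> set[str]:
    """Return the file extensions ACIF produces for the modes described above.

    input_type: "line", "afp", or "mixed" (line data records mixed with AFP).
    convert: the value of the ACIF CONVERT parameter, "YES" or "NO".
    """
    mode = (input_type.lower(), convert.upper())
    table = {
        ("line", "NO"): {".out", ".ind"},            # mode one: no resource file
        ("line", "YES"): {".out", ".ind", ".res"},   # mode two
        ("mixed", "YES"): {".out", ".ind", ".res"},  # subset of mode two
        ("afp", "YES"): {".out", ".ind", ".res"},    # mode three
    }
    if mode not in table:
        raise ValueError(f"combination not described in this section: {mode}")
    return table[mode]
```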

Types of ACIF parameters


Because ACIF has so much functionality, it has many parameters. Four logical sets of ACIF
parameters are available:
• ACIF parameters that describe the format of the data: CC, CCTYPE, TRC, FILEFORMAT, and CPGID
• ACIF parameters for line data to AFP conversion: CONVERT, MCF2REF (we recommend the CF value, for coded fonts, instead of the CPCS value, for code page and character set), IMAGEOUT (we recommend the ASIS value instead of the IOCA value, for Image Object Content Architecture), FORMDEF, and PAGEDEF
• ACIF parameters for indexing: TRIGGER, FIELD, INDEX, INDEXOBJ, and INDEXSTARTBY
• ACIF parameters for collecting resources: RESTYPE and EXTENSIONS=RESORDER

For a description of the parameters, see the section “ACIF reference” in IBM Content
Manager OnDemand for Multiplatforms Indexing Reference, SC19-3354, or “ACIF reference”
in IBM Content Manager OnDemand for z/OS Indexing Reference, SC19-3368.

Tools for working with ACIF


Consider the use of the following tools when you work with ACIF:
• The Administrator line data graphical indexer
• A hexadecimal editor to display the input file
• The arsafpd utility, run with the -d and -t options

The arsafpd utility can display the .out file (if it is AFP), .ind file, and .res file that are
created by ACIF.

7.4.1 Understanding the input data
On every platform except z/OS, the FILEFORMAT parameter is used to describe the format of
the input data. Before setting the FILEFORMAT parameter, it is important to understand the
difference between the carriage control and the delimiter:
• The delimiter separates the records. The most common delimiters are x'0A' and x'0D0A'.
• The carriage control, if it exists, is the first byte of each record. The carriage control follows the delimiter, except at the beginning of the file, where the carriage control is the first byte. (Therefore, to search in a hexadecimal editor for the beginning of the next page of a file that uses x'0A' as the delimiter, search for x'0AF1' if the carriage controls are EBCDIC or x'0A31' if they are ASCII.)
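The same search can be done programmatically. This Python sketch counts the pages in a delimited line data file by looking for the new-page carriage control ('1') at the start of the file and after every delimiter. It assumes ANSI carriage controls and line data without binary content; the function name and its parameters are hypothetical.

```python
def count_pages(data: bytes, delimiter: bytes = b"\x0a", ebcdic: bool = True) -> int:
    """Count new-page records in a delimited line-data file with ANSI carriage controls."""
    new_page = b"\xf1" if ebcdic else b"\x31"   # '1' in EBCDIC or in ASCII
    # The first record has no preceding delimiter, so check offset 0 separately.
    pages = 1 if data.startswith(new_page) else 0
    # Every other record starts right after a delimiter, e.g. x'0AF1' or x'0A31'.
    return pages + data.count(delimiter + new_page)
```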

FILEFORMAT parameter
For AFP data, the FILEFORMAT parameter is not needed, unless the file is AFP in record
format. For a description of record format, see “AFP Structured Fields” on page 176.

The FILEFORMAT parameter has the following values:
• record,n:
– For example: FILEFORMAT=record,100.
– Fixed-length line data.
– This type of file has no delimiter.
• stream:
– For example: FILEFORMAT=stream,(newline=X'0A') or FILEFORMAT=stream,(newline=X'0D0A').
– For variable-length record files that are created on UNIX platforms.
– Specify the delimiter in the FILEFORMAT parameter.
• record:
– For example: FILEFORMAT=record.
– Each record has a 2-byte prefix, which contains the length of the record. This length is exclusive, which means that it does not include the length of the 2-byte prefix itself. Download for z/OS adds this prefix when it downloads files.
– This type of file has no delimiter.
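The three FILEFORMAT values describe three ways of finding record boundaries. The following Python sketch splits raw bytes accordingly; it is meant for inspecting a file before you choose the parameter, not as a replacement for ACIF. The function names are hypothetical, and the big-endian byte order of the 2-byte prefix is an assumption that matches the record-format AFP example (00 11) later in this section.

```python
import struct


def records_fixed(data: bytes, n: int) -> list[bytes]:
    """FILEFORMAT=record,n: fixed-length records with no delimiter."""
    if n < 1 or len(data) % n:
        raise ValueError("file length is not a multiple of the record length")
    return [data[i:i + n] for i in range(0, len(data), n)]


def records_stream(data: bytes, newline: bytes = b"\x0a") -> list[bytes]:
    """FILEFORMAT=stream,(newline=...): records separated by a delimiter."""
    records = data.split(newline)
    if records and records[-1] == b"":
        records.pop()   # a delimiter at the end of the file ends the last record
    return records


def records_prefixed(data: bytes) -> list[bytes]:
    """FILEFORMAT=record: each record starts with a 2-byte length prefix.

    The length is exclusive (it does not count the prefix itself) and is read
    as a big-endian unsigned integer.
    """
    records, pos = [], 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError(f"truncated length prefix at offset {pos}")
        (length,) = struct.unpack_from(">H", data, pos)
        start, end = pos + 2, pos + 2 + length
        if end > len(data):
            raise ValueError(f"record at offset {pos} runs past the end of the file")
        records.append(data[start:end])
        pos = end
    return records
```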

Carriage controls
It is important to set the ACIF parameters CC and CCTYPE correctly. Table 7-2 describes the
ANSI carriage controls. The encoding columns show what you see if you look at the
document in a hexadecimal editor.

Table 7-2 ANSI carriage controls


Carriage control   Description         Encoding in ASCII   Encoding in EBCDIC
1                  New page            x'31'               x'F1'
<space>            Space one line      x'20'               x'40'
0                  Space two lines     x'30'               x'F0'
-                  Space three lines   x'2D'               x'60'
+                  Suppress space      x'2B'               x'4E'
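To double-check a carriage control byte that you find in a hexadecimal editor, you can decode it with a code page. This Python sketch assumes EBCDIC code page 037 (Python's cp037 codec) for EBCDIC data; other common EBCDIC code pages, such as 500, encode these five characters the same way. The function name is hypothetical.

```python
# ANSI carriage-control characters and their meanings (Table 7-2).
ANSI_CC = {
    "1": "new page",
    " ": "space one line",
    "0": "space two lines",
    "-": "space three lines",
    "+": "suppress space",
}


def describe_cc(byte: int, ebcdic: bool) -> str:
    """Return the action for an ANSI carriage-control byte."""
    # latin-1 decodes every byte value, so non-ASCII bytes simply fail the lookup.
    char = bytes([byte]).decode("cp037" if ebcdic else "latin-1")
    if char not in ANSI_CC:
        raise ValueError(f"x'{byte:02X}' is not an ANSI carriage control")
    return ANSI_CC[char]
```

Encoding the keys of ANSI_CC with cp037 and with ASCII reproduces the two encoding columns of the table.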

Machine carriage controls
Machine carriage controls are in data that is created on z/OS.

Because machine carriage controls are binary values, if a file contains them, it must always
be transferred as binary. Machine carriage controls cannot be converted to ASCII. For a list of
machine carriage control values, see the following website:
http://ibm.co/1M2ZtSG

AFP Structured Fields


AFP, which is also called Mixed Object Document Content Architecture (MO:DCA), is a printing architecture that was designed and created by IBM. The beginning of each AFP record is called the AFP Structured Field Introducer. The following sample shows an AFP Structured Field Introducer in hexadecimal, followed by a description of its bytes:
5A 00 10 D3 A8 A8 00 00 00
• The first byte is always x'5A'.
• The second and third bytes are the length (maximum length of 32767).
• The fourth byte is always x'D3'.
• The fourth, fifth, and sixth bytes are the Structured Field Identifier, for example, x'D3A8A8' or x'D3A8AF'.
• The seventh byte is the flag byte. The last two bytes are reserved and usually zeros.
• The information that follows the reserved bytes depends on the Structured Field.
• The length does not include the x'5A'.

For more information, see the Mixed Object Document Content Architecture (MO:DCA)
Reference, AFPC-0004-08, at the following website:
http://afpcinc.org/afp-publications/

The following two examples in hexadecimal of the AFP Structured Field Introducer show the
most common Structured Fields that you might see at the beginning of an AFP file:
5A 00 10 D3 A8 A8 00 00 00 Begin Document (BDT)
5A 00 5B D3 A8 C6 00 00 00 Begin Resource Group (BRG)

An AFP Structured Field can begin with the 2-byte length prefix (which is called record
format):
00 11 5A 00 10 D3 A8 A8 00 00 00

The length in the 2-byte prefix is one greater than the length in the Structured Field because
the 2-byte prefix includes the x'5A', but it does not include itself.

When you work with ACIF, it is important to know the format of the data. Use the arsafpd
utility or look at the input in a hex editor to be sure.
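Besides arsafpd and a hex editor, a short script can confirm the format. The following Python sketch walks the Structured Field Introducers in a byte string by applying the layout that is described above, and it detects record format by checking that the first 2-byte prefix is one greater than the Structured Field length. It is a diagnostic sketch, not a MO:DCA validator: only the identifiers that appear in this chapter are named, and every other identifier is shown in hexadecimal. The class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass

# Structured Field Identifiers that appear in this chapter's examples.
SF_NAMES = {
    bytes.fromhex("D3A8A8"): "BDT",  # Begin Document
    bytes.fromhex("D3A8C6"): "BRG",  # Begin Resource Group
    bytes.fromhex("D3A090"): "TLE",  # Tag Logical Element
    bytes.fromhex("D3B2A7"): "IEL",  # Index Element
}


@dataclass
class StructuredField:
    offset: int    # offset of the x'5A' byte
    length: int    # length from the introducer (excludes the x'5A')
    sf_id: bytes   # 3-byte Structured Field Identifier
    flags: int
    data: bytes    # bytes after x'5A' and the 8-byte introducer

    @property
    def name(self) -> str:
        return SF_NAMES.get(self.sf_id, self.sf_id.hex().upper())


def is_record_format(data: bytes) -> bool:
    """True if a 2-byte prefix that is one greater than the SF length comes first."""
    if len(data) < 5 or data[2] != 0x5A:
        return False
    prefix = struct.unpack_from(">H", data, 0)[0]
    sf_length = struct.unpack_from(">H", data, 3)[0]
    return prefix == sf_length + 1


def parse_afp(data: bytes) -> list[StructuredField]:
    record_format = is_record_format(data)
    fields, pos = [], 0
    while pos < len(data):
        prefix = None
        if record_format:
            if pos + 2 > len(data):
                raise ValueError(f"truncated record prefix at offset {pos}")
            (prefix,) = struct.unpack_from(">H", data, pos)
            pos += 2
        if pos + 9 > len(data) or data[pos] != 0x5A:
            raise ValueError(f"no Structured Field Introducer at offset {pos}")
        (length,) = struct.unpack_from(">H", data, pos + 1)
        sf_id, flags = data[pos + 3:pos + 6], data[pos + 6]
        if length < 8 or sf_id[0] != 0xD3:
            raise ValueError(f"invalid Structured Field at offset {pos}")
        if prefix is not None and prefix != length + 1:
            raise ValueError(f"record prefix does not match length at offset {pos}")
        end = pos + 1 + length            # the length does not count the x'5A'
        if end > len(data):
            raise ValueError(f"truncated Structured Field at offset {pos}")
        fields.append(StructuredField(pos, length, sf_id, flags, data[pos + 9:end]))
        pos = end
    return fields
```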

7.4.2 The index file
ACIF creates the index file. It contains the index values that are extracted from the document,
and also the offsets and lengths of the documents in the .out file.

The index values in the index file become the values that display in the Content Manager
OnDemand Search Results window. The indexes are used to retrieve the document, which is
why the index file is so important, and why no data can be loaded without indexes. Usually,
the index file is created and used to load the documents into Content Manager OnDemand
and you never see it. However, it might be useful to look at the index file. This section
describes the format and content of the index file.

Run arsafpd to display an index file. The first Structured Field in the index file is a Begin
Document Index (BDI), which contains the code page of the index names and values. Most of
the file consists of the two AFP Structured Fields: Index Element (IEL) and Tag Logical
Element (TLE). Two kinds of IELs exist: Page Group and Page. The index file must contain
Page Group IELs for arsload to load the data.

A Page Group IEL is identified by the text “Begin Page Group Reference” in the arsafpd
output. Each Page Group IEL indicates where the group starts and its length in bytes.
Example 7-1 shows part of a Page Group IEL.

Example 7-1 Part of a Page Group IEL


2 IEL Index Element 005D D3B2A7
IEL Object Byte Extent Triplet (57)
IEL Extent = 1614 (64E) <- LENGTH OF GROUP
IEL Object Byte Offset Triplet (2D)
IEL byte offset = 201 (C9) <- WHERE IT STARTS IN THE .OUT FILE
IEL Object Structured Field Extent Triplet (59)
IEL Extent = 18 (12)
IEL Object Structured Field Offset Triplet (58)
IEL Offset = 1 (1)
IEL Medium Map Page Number Triplet (56)
IEL sequence number of page = 1 (1)
IEL Fully Qualified Name Triplet (02)
IEL 0D Begin Page Group Reference <- PAGE GROUP IEL
IEL Name = 'Smith Cyclery Co 00000001'

If you look at offset 201 in the .out file, you find a BNG Structured Field (if the .out file is
AFP), which indicates the start of a document.
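Because a Page Group IEL records where each group starts and how long it is, the group can be sliced out of the .out file directly. The following Python sketch assumes that you already have the offset and extent values, for example, read from arsafpd output like Example 7-1. The GroupIndex type and the function name are hypothetical.

```python
from typing import NamedTuple


class GroupIndex(NamedTuple):
    name: str    # IEL Name, for example 'Smith Cyclery Co 00000001'
    offset: int  # IEL byte offset: where the group starts in the .out file
    extent: int  # IEL Extent: length of the group in bytes


def extract_group(out_data: bytes, group: GroupIndex, afp: bool = True) -> bytes:
    """Return the bytes of one group from the contents of a .out file."""
    end = group.offset + group.extent
    if group.offset < 0 or group.extent <= 0 or end > len(out_data):
        raise ValueError(f"group {group.name!r} lies outside the .out file")
    chunk = out_data[group.offset:end]
    # In AFP output, each group begins with a Structured Field (first byte x'5A').
    if afp and chunk[0] != 0x5A:
        raise ValueError(f"group {group.name!r} does not start with x'5A'")
    return chunk
```

With the values from Example 7-1, GroupIndex("Smith Cyclery Co 00000001", 201, 1614) selects the 1614 bytes that start at offset 201.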

You might see Page IELs in the index file. These Page IELs are created by setting the ACIF parameter INDEXOBJ=ALL. They are required only if the document is loaded as a large object. Example 7-2 shows part of a Page IEL.

Example 7-2 Part of a Page IEL


7 IEL Index Element 0044 D3B2A7
IEL Object Byte Extent Triplet (57)
IEL Extent = 1342 (53E) <- LENGTH OF PAGE
IEL Object Byte Offset Triplet (2D)
IEL byte offset = 456 (1C8) <- WHERE IT STARTS IN THE .OUT FILE
IEL Object Structured Field Extent Triplet (59)
IEL Extent = 11 (B)
IEL Object Structured Field Offset Triplet (58)
IEL Offset = 7 (7)
IEL Medium Map Page Number Triplet (56)
IEL sequence number of page = 1 (1)
IEL Fully Qualified Name Triplet (02)
IEL 87 Begin Page Reference <- PAGE IEL
IEL Name = '00000001'

Example 7-3 shows a Tag Logical Element (TLE) that contains index information.

Example 7-3 TLE that contains index information


3 TLE Tag Logical Element 0032 D3A090
TLE Fully Qualified Name Triplet (02)
TLE 0B Attribute Name
TLE Name = 'NAME'
TLE Attribute Value Triplet (36)
TLE Value = 'Smith Cyclery Co '
TLE Attribute Qualifier Triplet (80)
TLE sequence number = 0 (0)

Summary of index file information


The index file information is summarized:
• arsload uses the code page value in the BDI to convert the index names and values to the code page of the database. For example, the index names and values are in EBCDIC, but the database might be in ASCII.
• TLEs contain the index values that display in the Search Results window.
• Group IELs contain the offset of where the group starts in the .out file and the length of each group.
• All of this information is loaded into Content Manager OnDemand tables, and the index file is discarded.
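The code page conversion in the first item can be illustrated with Python's built-in codecs. This sketch assumes that the code page in the BDI is given as a numeric CPGID and maps only a few common values to Python codec names. The mapping and the function name are illustrative assumptions, not the conversion tables that arsload uses.

```python
# Illustrative CPGID-to-codec mapping (assumption: extend it for your data).
CODECS_BY_CPGID = {
    37: "cp037",     # EBCDIC US/Canada
    500: "cp500",    # EBCDIC International
    819: "latin-1",  # ISO 8859-1
    850: "cp850",    # PC Latin-1
    1208: "utf-8",
}


def convert_index_value(raw: bytes, cpgid: int, db_encoding: str = "utf-8") -> bytes:
    """Decode an index name or value from the BDI code page and re-encode it."""
    try:
        codec = CODECS_BY_CPGID[cpgid]
    except KeyError:
        raise ValueError(f"no codec configured for code page {cpgid}") from None
    return raw.decode(codec).encode(db_encoding)
```

For example, the EBCDIC bytes x'D5C1D4C5' in code page 500 become the ASCII-compatible bytes for NAME.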

7.4.3 Fully composed AFP input


ACIF can process an input file in AFP format that contains TLEs and BNG/ENG pairs. This
data is called fully composed AFP.

Example 7-4 shows a portion of the arsafpd output of a fully composed AFP file in the correct
format to load into Content Manager OnDemand.

Example 7-4 Portion of the arsafpd output of a fully composed AFP file
1 BDT Begin Document
2 BNG Begin Named Page Group 00000001
3 TLE Tag Logical Element
4 TLE Tag Logical Element
5 TLE Tag Logical Element
6 TLE Tag Logical Element
7 IMM Invoke Medium Map ABBB
8 BPG Begin Page 00000001
9 BAG Begin Active Environment Group
10 MCF2 Map Coded Font2
11 NOP No Operation
12 PGD Page Descriptor
13 PTD2 Presentation Text Desc2
14 EAG End Active Environment Group
15 BCT Begin Composed-Text Block
16 PTX Presentation Text Data
17 ECT End Composed-Text Block
18 EPG End Page
19 ENG End Named Group
20 BNG Begin Named Page Group 00000002
...
4590 ENG End Named Group
4591 EDT End Document

Each group is surrounded by BNG/ENG Structured Fields, and each group contains TLE
Structured Fields that occur after the BNG but before the BPG.

When an input file contains TLE Structured Fields, do not specify indexing parameters, such
as TRIGGER, FIELD, or INDEX. They are not needed because the file already contains index
information.

ACIF processes a file that contains TLE Structured Fields in the following way:
1. For every BNG in the input, ACIF creates a group IEL Structured Field in the index file.
2. ACIF makes a copy of the TLE Structured Fields from the input and places them into the
index file. The original TLE Structured Fields are also placed into the output file.

If the input file does not contain the correct number of TLEs in each group, ACIF might
complete, but arsload might fail with the following message:

“x fields submitted, n expected”

In this message, x is the number of fields that were submitted for the group, and n is the number of fields that are defined to Content Manager OnDemand.
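Before you run arsload, you can scan the arsafpd output of the input file for groups with the wrong number of TLEs. This Python sketch reads listing lines in the format of Example 7-4 (a sequence number, then the Structured Field abbreviation) and counts the TLEs between each BNG and the first BPG that follows it. It assumes one TLE per index field; the function names are hypothetical.

```python
def sf_names(listing: list[str]) -> list[str]:
    """Extract the abbreviation from lines such as '3 TLE Tag Logical Element'."""
    names = []
    for line in listing:
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.append(parts[1])
    return names


def tle_counts(names: list[str]) -> list[int]:
    """Number of TLEs after each BNG and before its first BPG."""
    counts, in_group_header = [], False
    for name in names:
        if name == "BNG":
            counts.append(0)
            in_group_header = True
        elif name in ("BPG", "ENG"):
            in_group_header = False
        elif name == "TLE" and in_group_header:
            counts[-1] += 1
    return counts


def bad_groups(names: list[str], expected: int) -> list[tuple[int, int]]:
    """(1-based group number, TLE count) for every group whose count differs."""
    return [(i, n) for i, n in enumerate(tle_counts(names), start=1) if n != expected]
```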

After ACIF processes an input AFP file, the output file might be larger than the input file. This happens because ACIF changes (“improves”) the AFP, which usually increases the file size. The following changes are made to the AFP:
• Creating or adding comments to the BDT Structured Field
• Creating or adding group names to the BNG - ENG Structured Fields
• Changing obsolete Structured Fields to current Structured Fields (for example, MCF1 to MCF2, or PTD1 to PTD2)

7.5 OS/390 indexer on z/OS and AIX


The OS/390 indexer is supported on both the z/OS and AIX implementations of Content
Manager OnDemand. The indexing parameters are the same for both implementations. If you
are migrating from z/OS to AIX, or from AIX to z/OS, you can continue to use the OS/390
indexer and not change your indexing parameters.

You can use the OS/390 indexer to extract index data from line data and AFP reports. In addition, other data types, such as TIFF images, can be captured by using the anystore exit (ANYEXIT), which is described in 11.3, “OS/390 indexer exits” on page 248.

The OS/390 indexer is a single pass indexer. (It does not create an intermediate file.) It
therefore provides better performance than ACIF. The COBOL Runtime Library is required on
AIX to run the OS/390 indexer, and it is included in the Content Manager OnDemand for Multiplatforms software.

The OS/390 indexer is enhanced to allow the storage of documents (or large object
segments) that exceed 2 GB. A report might contain multiple documents (or large object
segments), each of which exceeds 2 GB. This enhancement does not affect the limitations
that are imposed by other indexers. The limitations on the document size are based on the
available hardware and any other limitations that are placed on the operating environment.

For more information about the use of the OS/390 indexer, see IBM Content Manager
OnDemand - Indexing Reference, SC19-3354.

7.6 OS/400 indexer on Content Manager OnDemand on IBM i


The OS/400 indexer is a powerful tool to index the print data streams of IBM i application
programs. Supported data streams include SCS, AFP, and the less common SCS-Extended
and Line Data.

The OS/400 indexer provides three major functions:
• Print data stream processing: The OS/400 indexer processes the output print data streams of application programs, for example, SCS, AFP, and Line Data reports. The output can be viewed, printed, and archived by Content Manager OnDemand.
• Sophisticated indexing functions: The OS/400 indexer can logically divide reports into individual items, such as statements, policies, and bills. You can define up to 32 index fields for each item in a report if you are running a Content Manager OnDemand server version that is earlier than version 9.0.0.1. Beginning with server version 9.0.0.1, up to 128 index fields can be defined.
• AFP resource collection: For AFP spooled files, the OS/400 indexer determines the resources that are necessary to view, print, and archive the print data stream and collects the resources (except fonts, which are not stored but are mapped by the client during display). Resources allow users to view the report as it appeared in the original printed version, regardless of when or where the report was created.

The OS/400 indexer supports many advanced features:
• Multi-key indexes
• Spool File Archive compatibility
• Start Indexing on Page
• Translate Print Control
• AFP support with or without TLEs
• Large object support

The OS/400 indexer processes three input sources:
• Indexing parameters that specify how the data needs to be indexed. The indexing parameters are created when you define a Content Manager OnDemand application.
• AFP resources that are required to view and print the data if the application created an AFP print data stream.
• The print data stream, which can be in a spooled file (all data types) or in a physical file (Line Data or SCS data that was converted to Line Data with First Character Forms Control (FCFC) characters in column one of the data).

The output of the OS/400 indexer consists of an output file that contains the text of the
spooled file and an index file that contains the index values that are extracted from the
spooled file. Also, for AFP, the output of the OS/400 indexer contains a resource file that
contains the AFP resources that are used by the spooled file (except for fonts, which are not
stored but are mapped by Content Manager OnDemand Client during display). To create a
resource file, the OS/400 indexer must have access to the resources that are required by the
input data stream. Content Manager OnDemand stores the resources and then later retrieves
the resources that are associated with a specific document when a user selects the document
for viewing.

The OS/400 indexer indexes input data based on the organization of the data:
• Document organization. For reports that are made up of logical items, such as statements, policies, and invoices, the OS/400 indexer can generate index data for each logical item in the report.
• Report organization. For reports that contain lines of detail with sorted values on each page, such as a transaction log or general ledger, the OS/400 indexer can divide the report into sets of pages and generate index data for each set of pages.

Before you can index a report with the OS/400 indexer, you must create a set of indexing
parameters. The indexing parameters describe the physical characteristics of the input data,
identify where in the data stream the OS/400 indexer can locate index data, and provide other
directives to the OS/400 indexer.

Indexing parameters include information that allows the OS/400 indexer to identify key items
in the print data stream, tag these items, and create index elements that point to the tagged
items. The OS/400 indexer uses the tag and index data for efficient and structured search and
retrieval. You specify the index information that allows the OS/400 indexer to segment the
data stream into individual items called groups. A group is a collection of one or more pages.
You define the bounds of the collection, for example, a bank statement, insurance policy,
phone bill, or other logical segment of a report file. A group can also represent a specific
number of pages in a report. For example, you might decide to segment a 10,000 page report
into groups of 100 pages. The OS/400 indexer creates indexes for each group. Groups are
determined when the value of an index changes (for example, account number) or when the
maximum number of pages for a group is reached.
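The two group-break rules in the preceding paragraph can be modeled in a few lines. This Python sketch is a simplified model, not the OS/400 indexer itself. It assumes that you already know the index value (for example, the account number) on each page, and it starts a new group when that value changes or when the current group reaches a maximum page count. The function name is hypothetical.

```python
def segment_pages(page_keys: list[str], max_pages: int) -> list[tuple[str, int, int]]:
    """Group pages; return (index value, first page, last page) with 1-based pages.

    A new group starts when the index value changes or when the current group
    already holds max_pages pages.
    """
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    groups: list[tuple[str, int, int]] = []
    for page, key in enumerate(page_keys, start=1):
        if groups:
            last_key, first, last = groups[-1]
            if key == last_key and last - first + 1 < max_pages:
                groups[-1] = (key, first, page)   # extend the current group
                continue
        groups.append((key, page, page))          # start a new group
    return groups
```

For the 10,000-page example with a single index value and a 100-page limit, segment_pages(["X"] * 10000, 100) returns 100 groups.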

Figure 7-4 on page 182 illustrates the data indexing and flow control for OS/400 indexer. For
more information about the OS/400 Indexer, see IBM Content Manager OnDemand -
Indexing Reference, SC19-3354.

[Figure: Application program → spooled file → OS/400 indexer (with indexer parameters) → index object (.ind), data object (.out), and AFP resource object (.res) → database manager (OnDemand database) and storage manager (disk storage cache, archive storage manager, and archive media)]
Figure 7-4 Data indexing and flow control for the OS/400 indexer

7.7 Getting started with XML Indexing


The XML indexer enables the high-volume archiving of XML data in a scalable and extensible
manner.

The XML indexer was developed to support the growing need to efficiently and effectively store large quantities of XML data, for example:
• The European Union’s implementation of a Single Euro Payments Area (SEPA). SEPA replaced the existing domestic retail credit transfers and direct debits with standardized European payments that are based on Extensible Markup Language (XML) International Organization for Standardization (ISO) 20022 messages. ISO 20022 provides a more efficient way of developing and implementing messaging standards that financial institutions and clients use to exchange massive amounts of transactional information.
• Other XML standards exist and continue to be developed, such as ACORD (insurance industry), AgXML (agriculture), and Health Level Seven (health industry).
• XML document formats were developed, such as Office Open XML (OOXML) and OpenDocument (OASIS).

With XML indexing, you can automatically batch index and archive XML transactional
messages and statements into the Content Manager OnDemand repository. Documents are
identified and extracted during indexing. Resources are extracted, and, together with the
data, compressed and archived. Multiple stylesheets can be specified to meet device and
accessibility requirements.
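To illustrate the idea of identifying and extracting documents from an XML file, the following Python sketch uses the standard library's ElementTree: each occurrence of a chosen element becomes a document, and index values are read from child paths. It is not the Content Manager OnDemand XML indexer or its parameter syntax; the element names, the paths, and the function name are hypothetical. For namespaced data, such as ISO 20022 messages, use the {namespace-uri}tag form in doc_tag and in the paths.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text: str, doc_tag: str,
              index_paths: dict[str, str]) -> list[tuple[dict[str, str], str]]:
    """Return (index values, serialized document) for each doc_tag element."""
    root = ET.fromstring(xml_text)
    documents = []
    for doc in root.iter(doc_tag):
        indexes = {}
        for field, path in index_paths.items():
            value = doc.findtext(path)   # text of the first matching child, or None
            if value is None:
                raise ValueError(f"index {field!r} ({path}) missing in <{doc_tag}>")
            indexes[field] = value.strip()
        documents.append((indexes, ET.tostring(doc, encoding="unicode")))
    return documents
```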

XML stylesheet (resource) archiving is critical. Content Manager OnDemand optimizes the
storage of XML data by storing only a single version of a resource and then associating it with
all of the archived documents. Document resources can be automatically collected and
managed.

XML data is loaded into Content Manager OnDemand by using the arsload command. For
example, the following statement loads the bamboo.in file and its .res file (if found):
arsload -I localhost -u userName -p load.stach -g ci_stmts bamboo.in

The XML indexer uses the “Generic XML Index File Format” (GXIFF). The GXIFF format is
functionally similar to the Generic Index File Format in that it allows the loading of any type of
data into Content Manager OnDemand.

For more information about using the XML indexer, see IBM Content Manager OnDemand -
Indexing Reference, SC19-3354.

7.8 User exits


A user exit is a point during processing where control is handed from the indexer program to a
user-written program. After the user-written program finishes, the control is handed back to
the indexer program.

The ACIF indexer and the OS/390 indexer support multiple user exits. The OS/400, PDF,
XML, and Generic indexers do not support any user exits.
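The control flow of a user exit, which hands each record to user code and then resumes, can be sketched in Python. This is only a conceptual model: the real ACIF and OS/390 indexer exits are compiled programs with the interfaces that are described in 11.2 and 11.3, and the names RecordExit, run_indexer, and drop_comments are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical exit interface: return the (possibly changed) record, or None to drop it.
RecordExit = Callable[[bytes], Optional[bytes]]


def run_indexer(records: Iterable[bytes],
                input_exit: Optional[RecordExit] = None) -> Iterator[bytes]:
    """Pass each record to the exit, then continue with the exit's result."""
    for record in records:
        if input_exit is not None:
            result = input_exit(record)   # control passes to the user-written code
            if result is None:            # control returns; the record is dropped
                continue
            record = result
        yield record                      # a real indexer would now index the record


def drop_comments(record: bytes) -> Optional[bytes]:
    """Example exit: drop records that start with '*' and trim trailing blanks."""
    if record.startswith(b"*"):
        return None
    return record.rstrip(b" ")
```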

For a detailed description of the ACIF user exits, see 11.2, “ACIF exits” on page 242.

For a description of the OS/390 indexer user exits, see 11.3, “OS/390 indexer exits” on
page 248.

7.9 Additional references


For more information, see the following IBM developerWorks® articles:
• Creating PDF Indexing Parameters Using Floating Triggers:
http://ibm.co/1FHsXDq
• Understanding the ACIF Input Exit for DB2 Content Manager OnDemand:
http://ibm.co/1UUcCT0

Chapter 7. Indexing and loading 183



Chapter 8. User clients


In this chapter, we provide an overview of the clients that are available for IBM Content
Manager OnDemand (Content Manager OnDemand), including the various web client
offerings that are based on the Content Manager OnDemand Web Enablement Kit (ODWEK).
We describe the differences between web and Windows clients and their viewing options.

In the later sections, we focus on the integration and application programming interface (API)
client options of Content Manager OnDemand, such as the ODWEK API, the Content
Management Interoperability Services (CMIS) web services, the mid-server SAPI, and
integration with other IBM Enterprise Content Management products, such as IBM Information
Integrator and IBM FileNet P8. We describe how to use the existing API to build your own web
client interface for Content Manager OnDemand.

In this chapter, we cover the following topics:


• Choosing the correct client for your implementation
• Content Manager OnDemand Client options
• Client API overview

© Copyright IBM Corp. 2003, 2015. All rights reserved. 185


8.1 Choosing the correct client for your implementation
Customers are faced with challenges in choosing the interface to Content Manager
OnDemand that makes the most sense for their implementation. Content Manager
OnDemand has many different user interfaces. Many aspects come into play when you
consider the best design for access to Content Manager OnDemand to meet all of your
requirements in the most cost-effective manner. Licensing costs, hardware costs,
performance, and maintainability are just a few considerations, but the most important
requirement is meeting the business needs for many different user types.

The Content Manager OnDemand client choices enable the product to keep pace with the
ever-changing world of information technology and the way content is delivered. For example,
delivering documents that are stored in Content Manager OnDemand to a mobile device was
not relevant a few years ago, but it is an important consideration for enterprise content
delivery today. Technology drives change for current Content Manager OnDemand
customers, and IBM delivers options to meet current and future business requirements. Many
customers aim to use a single user interface for access to all of their Enterprise Content
Management content. IBM met that goal with the IBM Content Navigator user interface, but
IBM continues to retain multiple Content Manager OnDemand client interfaces to meet the
various needs of its customers.

When you choose the correct client for your implementation of Content Manager OnDemand,
two primary considerations are the client functionality and the client architecture.

Concerning the client functionality, the most powerful client is the Microsoft Windows client.
All other clients contain only a subset of the features of the Windows client. The most
prominent difference is the viewer capability.

Determine whether your users require functionality that is specific to the Windows client only.
If not, see the range of viewer options that are described in 8.1.1, “Viewer options” on
page 186, which compares the different viewers across the various client options.

8.1.1 Viewer options


Several options exist for viewing the data that is stored in a Content Manager OnDemand
system. The following general types of viewers are available:
• The viewing capabilities that are provided by the Windows client.
• The web viewers that are shipped with ODWEK.
• Generic web viewers, such as those in Content Navigator, or other third-party web
viewers. The built-in viewers of Content Navigator are described in 8.2.1, “IBM Content
Navigator” on page 193.
• Conversion and transformation services that are started by ODWEK.
• External applications that are opened according to their associated document types (for
example, Microsoft Word for .doc or .docx files).
• Special client applications, such as the CICS client, the Structured APIs, or Java API
access.

The content that is displayed by certain viewers can be changed by either transforms
(ODWEK) or exits. For more information about exits, see Chapter 11, “Exits” on page 241.



Windows client viewers
The Content Manager OnDemand Windows client contains native capabilities for viewing
typical archive data types:
• Line Data and SCS
• AFP
• Images

The Windows client provides the richest set of capabilities for viewing these data types.
Because it communicates directly with the Content Manager OnDemand server, we use the
Windows client as the reference for all features that relate to document display.

The Line Data viewer of the Windows client is the most sophisticated viewer that is available
for Content Manager OnDemand from the selection of readily available viewers.

The viewing of these primary data types happens within the same application. The Windows
client provides other features, such as thumbnails, and configurable and saveable views.

The Content Manager OnDemand Windows client also contains other capabilities for viewing
archive data types, such as Portable Document Format (PDF) and User-Defined.

Starting with Content Manager OnDemand version 9.5, for both DocType=PDF and
user-defined PDF, the Windows client attempts to view a PDF document with Adobe
Acrobat, if it is installed. If Adobe Acrobat is not installed, Adobe Acrobat Reader is used
instead to view DocType=PDF documents.

Before Content Manager OnDemand version 9.5, PDF documents could be viewed by the
Windows client in two ways:
• If they are configured in the application as data type “PDF”, the rich feature set of the AFP
and Line Data viewer applies, but Adobe Acrobat Professional is required.
• If the data type is configured as “User Defined” with “.pdf” as the extension, the
documents are opened externally. Therefore, you can view the documents with the
no-charge Adobe Acrobat Reader or any other installed PDF viewer.

Any data type can be specified as “User Defined”, for example, Word documents (.docx).
User-defined data is viewed by invoking its associated application.

Web-based viewing options


The web-based viewing options for Content Manager OnDemand are provided primarily by
ODWEK. ODWEK includes different viewers that are dedicated to Content Manager
OnDemand documents that can use Content Manager OnDemand functions, such as the
segment-wise retrieval of large objects or annotations. These viewers are used in web
applications, such as Content Navigator or any other custom-developed web client:
• Line Data applet
• Browser plug-in for image viewing
• AFP browser plug-in
• AFP Transforms
• Generic Transforms

Detailed information about ODWEK’s viewers and transforms is in IBM Content Manager
OnDemand Web Enablement Kit Java APIs: The Basics and Beyond, SG24-7646. Only a
brief overview is provided in this chapter.

Chapter 8. User clients 187


The line data applet is a Java applet that is provided by ODWEK. It is similar to the line data
viewing capabilities of the Windows client, but it does not contain all of the parallel
functionality for viewing line data within the Windows client. For example, the applet does not
support saving and selecting custom views.

The plug-ins for AFP and images are shipped as setup packages, which must be installed on
the user’s computer. The plug-ins integrate themselves with Mozilla Firefox browsers and
Microsoft Internet Explorer. The AFP plug-in provides similar viewing capabilities to the
Windows client.

The image plug-in can view image files, with the added benefit of displaying TIFF images
(which current web browsers usually cannot display).

Conversions and transforms


In addition to the viewers, ODWEK uses conversion or transformation engines, which convert
the document into another data type. ODWEK allows the integration of AFP Transform
components for converting AFP into HTML or PDF documents, and it provides a generic
transform interface, which can be used to plug in any conversion or transformation engine.
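The idea of a generic transform interface is that conversion engines are registered for a source and target format and selected when a document is retrieved. The following Python sketch illustrates that pattern only. It is not ODWEK's configuration format or API, and the registry, the function names, and the stand-in text-to-HTML engine are hypothetical.

```python
import html
from typing import Callable, Dict, Tuple

# Hypothetical registry: (source MIME type, target MIME type) -> engine.
Transform = Callable[[bytes], bytes]
_registry: Dict[Tuple[str, str], Transform] = {}


def register_transform(source: str, target: str, engine: Transform) -> None:
    """Plug in an engine for one source/target pair."""
    _registry[(source, target)] = engine


def transform(data: bytes, source: str, target: str) -> bytes:
    """Convert data with the registered engine; pass it through if no conversion is needed."""
    if source == target:
        return data
    engine = _registry.get((source, target))
    if engine is None:
        raise ValueError(f"no transform registered for {source} -> {target}")
    return engine(data)


def text_to_html(data: bytes) -> bytes:
    # Stand-in engine; a real deployment would call a vendor conversion tool.
    return ("<pre>" + html.escape(data.decode("utf-8")) + "</pre>").encode("utf-8")


register_transform("text/plain", "text/html", text_to_html)
print(transform(b"REPORT LINE", "text/plain", "text/html"))  # b'<pre>REPORT LINE</pre>'
```

Because callers only name the source and target formats, an engine can be swapped without changing the code that requests the conversion.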

The transforms apply only to documents that are served by ODWEK. They are available to
web clients that are based on ODWEK (such as Content Navigator) and to any other
application that is written by using the ODWEK Java API. They are not available on the
Windows client.

Web viewing considerations


When you choose a viewer strategy in web clients, it is important to know the differences
among the viewer architectures:
• Java applet viewers, such as the line data applet or Content Navigator’s generic applet
viewer, are downloaded automatically to the user’s computer and run within the browser.
No deployment is needed, but a Java installation must be present on the PC. They are
effectively cached on the user computers, and they can provide sophisticated functionality.
On the downside, each Java applet requires a Java virtual machine (JVM) to run. On
terminal servers that serve multiple users at once, this requirement might lead to higher
memory consumption.
• Plug-in viewers are native applications that must be installed through a setup routine on
the user’s computer. They integrate with the browser and provide their own viewing logic,
which can be sophisticated (for example, with the AFP plug-in).
• The generic and Ajax viewers that are provided by Content Navigator provide limited
rendering and viewing capabilities. They do not require any rollout or JVM.
• Transforms, such as the Ricoh AFP2PDF or other vendor-provided transforms, result in a
PDF document that is viewed in the Acrobat viewer. Although this viewer is deployed on
most user PCs, the rendering consumes processing power on the mid-tier system. Also,
large documents cannot be rendered into PDFs. Because the PDF is displayed by an
external application, it cannot communicate with the Content Manager OnDemand server
as the line data applet does.



Depending on the data that you are working with, consider these options:
• For Line Data:
– The line data applet supports annotations. It can work with large object (LOB) reports if
the large object functionality is employed at load time.
– The Ajax viewer and direct rendering capabilities of Content Navigator work only on
shorter reports. Additionally, they do not support viewing annotations or large object
documents.
• For AFP data:
– The AFP plug-in is the best choice, because it is almost identical to the Windows client.
However, it does not support annotations. It does support large object documents; the
only viewers that support large objects are the line data applet, the AFP plug-in
viewer, and the Content Manager OnDemand Windows client.
– AFP to PDF is a choice that does not require a plug-in rollout to the users’ computers if
the Acrobat plug-in is installed on their workstations. Font mappings must be
configured at a central location. The additional workload on a rendering system and
additional license costs must be considered. Large reports might not be rendered or
viewed successfully.
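The considerations above can be summarized as a small decision function. This Python sketch encodes only the rules stated in this section and ignores licensing, font mapping, and mid-tier workload. The function name, its parameters, and the returned labels are hypothetical.

```python
def suggest_viewer(data_type: str, needs_annotations: bool = False,
                   large_objects: bool = False,
                   plugin_rollout_ok: bool = True) -> str:
    """Encode the rules of thumb above; a starting point, not a product rule."""
    if data_type == "line":
        # On the web, only the line data applet handles annotations and LOB reports.
        if needs_annotations or large_objects:
            return "line data applet"
        return "Content Navigator Ajax viewer"
    if data_type == "afp":
        if needs_annotations:
            # Neither the AFP plug-in nor AFP to PDF supports annotations.
            return "Windows client"
        if plugin_rollout_ok:
            return "AFP plug-in"
        if large_objects:
            # Large reports might not render through AFP to PDF.
            return "Windows client"
        return "AFP to PDF transform"
    raise ValueError(f"unsupported data type: {data_type!r}")


print(suggest_viewer("line", large_objects=True))      # line data applet
print(suggest_viewer("afp", plugin_rollout_ok=False))  # AFP to PDF transform
```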

Note: The AFP viewer plug-in, which is available with ODWEK and Content
Manager OnDemand, is a version of the AFP viewer plug-in from the InfoPrint
Solutions Company. Although the standard InfoPrint viewer can be used for viewing
AFP, the ODWEK version uses direct communication with the Content Manager
OnDemand server, enabling segmented document transfer for LOB documents.

Annotations
Only the native ODWEK viewers and the Windows client support annotations, in the
following ways:
• Line data applet: Supports text annotations. Starting with version 9, the viewer can also
work with graphical annotations.
• Windows client: Supports the full range of annotation capabilities for all data types.
• Other viewers, for example, the AFP plug-in viewer: Do not support annotations and are not
aware of them.

Web clients, such as Content Navigator, and applications that use the ODWEK Java API can
work with annotations and provide access to them through the hit list. Graphical annotations
cannot be accessed that way because they are not exposed through the Java API.

Large object support


Large object (LOB) support is the methodology for working with large reports. For more
information about how LOB affects your reports, see “Large object” on page 52.

From a viewer’s perspective, if a large document is transferred, it generates high network
traffic, resource consumption, and long wait times for users. If the viewer supports LOB
documents, the viewer communicates with the server to transfer only the chunk of data that
the user is looking at (for example, a 200 page chunk out of a 10,000 page report). If the user
scrolls to a different chunk of pages, the viewer downloads only that relevant portion of the
document that the user scrolled to.
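The chunked retrieval described above comes down to simple arithmetic. The following Python sketch maps a page number to the chunk that contains it, using the 200-page chunks and 10,000-page report from the example. The fixed chunk size is an assumption for illustration (the actual size depends on the large object settings that are used at load time), and the function names are hypothetical.

```python
def chunk_for_page(page: int, chunk_size: int = 200) -> int:
    """Return the 0-based index of the chunk that holds a 1-based page number."""
    if page < 1 or chunk_size < 1:
        raise ValueError("page and chunk_size must be positive")
    return (page - 1) // chunk_size


def pages_in_chunk(chunk: int, total_pages: int, chunk_size: int = 200) -> range:
    """Return the 1-based page numbers in a chunk (the last chunk may be shorter)."""
    first = chunk * chunk_size + 1
    last = min(first + chunk_size - 1, total_pages)
    return range(first, last + 1)


total_pages = 10_000
chunk = chunk_for_page(4_321)                 # the user scrolls to page 4,321
pages = pages_in_chunk(chunk, total_pages)
print(chunk, pages[0], pages[-1])             # 21 4201 4400
print(-(-total_pages // 200))                 # 50 chunks in the whole report
```

Only the 200 pages of that one chunk travel over the network, instead of all 10,000 pages.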



The ODWEK Java API also supports large object (LOB) operations. For more information, see
IBM Content Manager OnDemand Web Enablement Kit Java APIs: The Basics and Beyond,
SG24-7646.

8.1.2 Client infrastructure options


Three basic architectural options are available: the Windows client, Content Navigator, or
API-based client integration into your line-of-business application.

Windows client
Consider the following items when you are planning a Windows client infrastructure:
• It is faster and more powerful than the web clients.
• It requires native installation on each user’s workstation or notebook. Server version
upgrades might also require a new client installation.
• It supports Citrix and Terminal Services environments.
• It does not support the Transforms interface for transforming and converting data formats,
because transforms are provided only by ODWEK.

Content Navigator
When you choose a ready-for-use web client, consider the IBM strategic client, IBM Content
Navigator, because it is the most complete, most recent web client.

Special use cases might require the development of a custom client application for Content
Manager OnDemand. For more information about development APIs, see 8.3, “Client API
overview” on page 202.

With Content Navigator, you can run a cross-repository search to search for content across
multiple types of repositories, including Content Manager OnDemand. For example, Content
Manager OnDemand search results can be included in the same hit list as search results
from other supported repositories to help provide a comprehensive view of content.

When you create a cross-repository search, you can specify the following information:
• The scope of the search on each repository. You can specify the search or the classes
that you want to include in the cross-repository search, including IBM Content Manager
OnDemand searches. On IBM FileNet Content Manager and IBM Content Manager, you
can also limit the search to a specific folder.
• How properties from each repository are related to each other.
• Any default search criteria that you want displayed when users open the search.
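Relating properties across repositories amounts to mapping each repository’s property names to shared names so that the results can be merged into one hit list. The following Python sketch shows that idea only. The repository keys, property names, and sort order are hypothetical, and Content Navigator performs this mapping through its search configuration rather than through code like this.

```python
from typing import Dict, List

# Hypothetical mapping from each repository's property names to shared names.
PROPERTY_MAP: Dict[str, Dict[str, str]] = {
    "ondemand": {"acct_num": "Account", "rdate": "Date"},
    "filenet": {"AccountNumber": "Account", "DateCreated": "Date"},
}


def normalize(repo: str, hit: Dict[str, str]) -> Dict[str, str]:
    """Rename mapped properties, drop unmapped ones, and record the source."""
    mapping = PROPERTY_MAP[repo]
    row = {mapping[name]: value for name, value in hit.items() if name in mapping}
    row["Repository"] = repo
    return row


def merged_hit_list(results: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Combine per-repository hits into one list sorted by the shared properties."""
    rows = [normalize(repo, hit) for repo, hits in results.items() for hit in hits]
    return sorted(rows, key=lambda r: (r.get("Account", ""), r.get("Date", "")))


results = {
    "ondemand": [{"acct_num": "1002", "rdate": "2015-03-01"}],
    "filenet": [{"AccountNumber": "1001", "DateCreated": "2015-02-11"}],
}
for row in merged_hit_list(results):
    print(row)
```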

For more information about how to configure a cross-repository search, see the IBM Content
Navigator Knowledge Center at the following web address:
http://www.ibm.com/support/knowledgecenter/SSEUEX_2.0.3/contentnavigator_2.0.3.htm



Consider the following items when you choose Content Navigator or other clients:
• The viewers that are provided by Content Navigator are limited compared to those of the
Windows client.
• A Content Manager OnDemand client that focuses only on Content Manager OnDemand
is probably the easiest to maintain.
• A general Enterprise Content Management client, such as Content Navigator, with setup
specifications that support the Content Manager OnDemand model and capabilities, might
increase the dependency footprint of the client tier while it provides access to other
systems through the same user interface.

Developing your own client


When you develop your own applications (web client), you can use the ODWEK Java APIs.
For more information about the ODWEK APIs, see 8.3.1, “Content Manager OnDemand Web
Enablement Kit” on page 202 and IBM Content Manager OnDemand Web Enablement Kit
Java APIs: The Basics and Beyond, SG24-7646.

If you are developing a Windows application, you optionally can use the Object Linking and
Embedding (OLE) (ActiveX Control) API, which is provided by the Windows client. This API
requires a Windows client installation.

Another option is to use an intermediate API that is based on the ODWEK Java API for the
Content Manager OnDemand access portion. Content Management Interoperability Services
(CMIS) or other web services can be used as the intermediate API. The web service
application uses ODWEK to access Content Manager OnDemand and relays this access
through its own web services to any other application. In this case, the Windows application
only needs to talk to the web service. For more information about CMIS and its limitations,
see 8.3.2, “Content Management Interoperability Services” on page 204.

The use of an intermediate API increases complexity and potentially decreases performance,
but it decouples the Windows application from Content Manager OnDemand in terms of API
versioning, and it removes the need for a Content Manager OnDemand installation on the
workstation.

8.1.3 Client compatibility


During the development history of Content Manager OnDemand, features were added and
internal API schemes were changed. Therefore, not every client level can work with every
server level. When you choose a client infrastructure for your Content Manager OnDemand
environment, you must consider version dependencies.

Client compatibility matrix


At the API level, all user clients share a common API core that is based on the Windows client
and ODWEK. Almost all other client and API implementations are based on these common
APIs. An up-to-date compatibility matrix that shows which client and ODWEK levels can work
with each server level is available at the following website:
http://www.ibm.com/support/docview.wss?uid=swg21392275

Determining version levels


Especially on IBM i and IBM z Systems, the server release level might not be obvious,
because it is set by program temporary fixes (PTFs). The most convenient way to determine
the server level of your Content Manager OnDemand system is to log on to either the
Administrator Client or the user Windows client. After you are logged on, the clients show the
server version in the status bar, as shown in Figure 8-1 on page 192.



Figure 8-1 Server version is displayed in the client

Every ars command on the server displays its current server software version, as well.

You can view the version of the Windows client by clicking Help → About.

To determine the version of ODWEK, you can either look for the readme file in the ODWEK
application directory or use a client. If you are running a web client (for example, Content
Navigator), open a line data report by using the line data applet viewer. Because this viewer is
provided by ODWEK directly, the viewer shows the current ODWEK version level in the About
dialog box under the Help menu.

Cross-server calls with server console commands


Several of the ars commands that are provided by the server software installation, for
example, the ARSDOC and ARSLOAD commands, can work with remote servers. This capability
applies to cross-platform calls, for example, loading data with the ARSLOAD command that is
running on Linux to a Content Manager OnDemand server that is running on the mainframe.

For more information, see “Server commands” on page 205.

Multiple versions at the same time


Before version 9.5, only one installation of the Content Manager OnDemand Windows client
(user and administrative) could exist on a workstation at a time. Multiple versions could not
coexist.

Starting with version 9.5, you can run multiple versions of the Content Manager
OnDemand Windows client (at the release level only, not the PTF level) on a single
workstation. The client code is now installed in the
c:\Program Files (x86)\IBM\OnDemand Clients\V9.5 directory.
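Because side-by-side installation works at the release level only, every PTF level of a release resolves to the same directory. The following Python sketch makes that explicit by reducing a version string to its release level before it builds the installation path. The dotted version-string format (for example, 9.5.0.4) and the function names are assumptions for illustration.

```python
from pathlib import PureWindowsPath

CLIENT_ROOT = PureWindowsPath(r"c:\Program Files (x86)\IBM\OnDemand Clients")


def release_level(version: str) -> str:
    """Reduce a full level such as '9.5.0.4' to its release level '9.5'."""
    parts = version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unexpected version string: {version!r}")
    return f"{parts[0]}.{parts[1]}"


def client_install_dir(version: str) -> PureWindowsPath:
    """Directory for a client version; all PTF levels of a release share it."""
    return CLIENT_ROOT / f"V{release_level(version)}"


print(client_install_dir("9.5.0.4"))  # c:\Program Files (x86)\IBM\OnDemand Clients\V9.5
```

Two PTF levels of 9.5 map to the same directory, so only one of them can be installed at a time, while a different release gets its own directory.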

You can also run multiple versions of ODWEK on a single system. Although this
capability might not be preferable from a maintenance point of view, it can be helpful
during upgrades and when you need to access existing systems. Each application that uses
the ODWEK API must point to the correct installation path and load the correct
corresponding libraries.

For more information, see the technote at the following website:


http://www.ibm.com/support/docview.wss?uid=swg27019696



8.2 Content Manager OnDemand Client options
In this section, we describe the common client options for Content Manager OnDemand,
including web and non-web clients.

8.2.1 IBM Content Navigator


Content Navigator is the strategic client for IBM Content Manager, IBM FileNet P8, and
Content Manager OnDemand. Access to Content Manager OnDemand servers is through the
ODWEK Java API. Content Navigator is a Web 2.0 web client and requires a web application
server, such as IBM WebSphere® Application Server.

Content Navigator can be used to access documents from multiple content repositories:
• IBM Content Manager Enterprise Edition repositories
• IBM Content Manager OnDemand repositories
• IBM FileNet P8 repositories
• Organization for the Advancement of Structured Information Standards (OASIS) CMIS
repositories

With Content Navigator, users can perform these tasks:


• Search documents from any of the content repositories
• View documents side-by-side
• Edit document properties
• Add annotations to documents
• Send documents and document links through email
• Print documents
• Download documents

You can use Content Navigator to build a customized user experience. It supports many
configuration options and includes a powerful API toolkit that you can use to extend the web
client and build custom applications.

Figure 8-2 shows Content Navigator browsing a folder in Content Manager OnDemand.

Figure 8-2 Searching a Content Manager OnDemand folder with Content Navigator



Content Navigator is a full-feature client for Content Manager OnDemand. Its interface follows
modern user interface styles, with a browser pane on the left that shows the available Content
Manager OnDemand folders, and a search and result pane on the right. All components and
data are dynamic, and they can be resized and changed.

Note: Content Navigator is a Web 2.0 Ajax-based client. These web applications rely on an
up-to-date JavaScript engine, which is only available in newer browsers. Older browsers,
such as Microsoft Internet Explorer Version 8, might not work correctly with Content
Navigator.

Content Navigator, version 2.0.2 and later, provides many additional Content Manager
OnDemand capabilities:
• AFP Viewer plug-in support
• External Data Services (EDS) support
• Favorites support for folders and documents
• Single and multiple AFP file download as PDF (with AFP2PDF enabled)
• Highlighted search result terms in full text searches
• Line2PDF conversion viewer
• XML viewer

Starting with Content Manager OnDemand V9.0, Content Navigator provides single sign-on
(SSO) token pass-through to the client side. Date validation is no longer required. Support is
provided for the ‘t’ date expression and for federated search across Content Manager
OnDemand, FileNet P8, and IBM Content Manager repositories. Content Navigator is also the
new CMIS packaging for Content Manager OnDemand.

Installing Content Navigator


Content Navigator must be installed natively with ODWEK and IBM WebSphere Application
Server (or any other applicable web application server). Typically, Content Navigator is
installed on a separate system in the web tier and not on the same system as the Content
Manager OnDemand server.

The following prerequisites exist for a Content Navigator installation for Content Manager
OnDemand:
• Native installation of the Content Navigator base software
• A database to store the Content Navigator configuration
• Web application server
• ODWEK
• Optional: AFP Transforms for AFP to PDF rendering
• Java Database Connectivity (JDBC) drivers (if not already present)

The Content Navigator database is relatively small, so a collocation with the Content Manager
OnDemand database might be possible in small deployments. The installation manual
provides SQL statements for creating the database and its table spaces.

After you install all of the components, run the Content Navigator Configuration and
Deployment Tool to create a preconfigured web application and deploy it to the web
application server.

The Configuration and Deployment Tool provides a wizard that leads you through the base
setup process. You must provide details about your web application server and connection
information to the configuration database. For the Content Manager OnDemand
configuration, you must provide the location of your ODWEK installation. Run the deployment
scripts at the end for deploying Content Navigator on your application server.



The installation is described in detail in the “Planning, installing, and configuring IBM Content
Navigator” section of the IBM Content Navigator Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSEUEX_2.0.3/com.ibm.installingeuc.doc/eucao000.htm

Accessing the native libraries


The ODWEK Java API uses native libraries. To run Content Navigator (or any other web client
that is based on the Java APIs), ensure that the web application server can access these
libraries. To do so, add the ODWEK directory to the PATH environment variable.
On Windows platforms, you also must add the lib64 subdirectory of ODWEK to the PATH.

The following example shows the path to the directory in Windows:


PATH=%PATH%;C:\Program Files\IBM\OnDemand\V9.5\www;C:\Program Files\IBM\OnDemand\V9.5\bin

On Linux and UNIX platforms, you must extend LD_LIBRARY_PATH (LIBPATH on
AIX) to include the ODWEK directory. Perform this step in the environment in which the
web application server runs by editing its start scripts.

For example, on Linux, you run this command:


export LD_LIBRARY_PATH="/opt/ibm/ondemand/V9.5/www:$LD_LIBRARY_PATH"
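If you generate or wrap the start scripts with tooling, the variable to extend differs by platform, as described above. The following Python sketch returns a copy of an environment with the ODWEK directory prepended to PATH, LD_LIBRARY_PATH, or LIBPATH, as in the Linux example, so that the ODWEK libraries are found first. The function name and the platform labels are hypothetical, and only Windows, Linux, and AIX are covered. On Windows, call it once for each directory that must be added.

```python
from typing import Dict

# Loader search-path variable for each platform, as described above.
_LIB_VAR = {"windows": "PATH", "linux": "LD_LIBRARY_PATH", "aix": "LIBPATH"}


def with_odwek_dir(env: Dict[str, str], odwek_dir: str, platform: str) -> Dict[str, str]:
    """Return a copy of env with odwek_dir prepended to the platform's library path."""
    var = _LIB_VAR[platform]
    sep = ";" if platform == "windows" else ":"
    new_env = dict(env)
    current = new_env.get(var, "")
    new_env[var] = odwek_dir + sep + current if current else odwek_dir
    return new_env


env = with_odwek_dir({"LD_LIBRARY_PATH": "/usr/local/lib"},
                     "/opt/ibm/ondemand/V9.5/www", "linux")
print(env["LD_LIBRARY_PATH"])  # /opt/ibm/ondemand/V9.5/www:/usr/local/lib
```

The result can be passed as the environment of the process that starts the application server, for example, through the env argument of Python's subprocess functions.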

The Content Navigator installer creates a shared native library in WebSphere Application
Server. You can review this library in the Integrated Solution Console in the Environment,
Shared libraries section. You need a library that has the class path set to the location of the
ODApi.jar (for example, /opt/ibm/ondemand/V9.5/www/api/ODApi.jar) and the Native Library
Path set to the ODWEK directory (for example, /opt/ibm/ondemand/V9.5/www). If you
encounter any errors, ensure that these paths are valid.
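A quick way to act on this advice is to check both settings before you restart the server. This Python sketch verifies that the class path names an existing ODApi.jar file and that the native library path is an existing directory. It checks only the file system, not the WebSphere Application Server configuration itself, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def check_odwek_shared_library(class_path: str, native_path: str) -> List[str]:
    """Return problems found with the shared-library paths (empty list if none)."""
    problems: List[str] = []
    jar = Path(class_path)
    if not jar.is_file():
        problems.append(f"class path does not point to an existing file: {jar}")
    elif jar.name != "ODApi.jar":
        problems.append(f"class path does not name ODApi.jar: {jar}")
    if not Path(native_path).is_dir():
        problems.append(f"native library path is not a directory: {native_path}")
    return problems


for issue in check_odwek_shared_library("/opt/ibm/ondemand/V9.5/www/api/ODApi.jar",
                                        "/opt/ibm/ondemand/V9.5/www"):
    print(issue)
```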

Note: If multiple applications reference the same native library, the library gets loaded
multiple times. But because the ODWEK library is a shared library, it can be loaded only
one time for each JVM. So, if you are running multiple ODWEK web applications in one
WebSphere Application Server, you must configure the shared library reference on the
Class Loader level of the server itself instead of on the application level. You can configure
this in the Integrated Solution Console, in the class loader settings of the application
server.

Administering Content Navigator


Content Navigator administration is performed in the admin desktop of the Content Navigator
web application. For more information, see the “Administering IBM Content Navigator” section
of the IBM Content Navigator Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSEUEX_2.0.3/com.ibm.installingeuc.doc/eucco037.htm

Adding a Content Manager OnDemand repository to Content Navigator


Multiple Content Manager OnDemand repositories can be added to a Content Navigator
installation, exposing each repository to a defined set of users through the configuration of
different desktops.



For the configuration of Content Manager OnDemand repositories, you need the following
parameters:
• Display name: The repository name that is displayed to the users.
• Server name: IP address or host name of the Content Manager OnDemand server.
• Port number: Instance port (the default is 1445).
• If you want an encrypted connection between ODWEK and the Content Manager
OnDemand server, enable Secure Sockets Layer (SSL) and provide an SSL key ring
database and stash file. Enabling SSL consumes additional resources on both systems
(Content Manager OnDemand and the web tier).

Note: This option does not affect the SSL security of the web application, for example,
between the web server and the browser. It only encrypts the API communication
between the web tier and the Content Manager OnDemand server.

• If you want to use AFP Transforms or another transform filter through generic transforms,
you must specify the path to the correct configuration files.
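The parameters above can be captured and sanity-checked before you enter them in the admin desktop. This Python sketch is a hypothetical container, not a Content Navigator or ODWEK class. The field names are ours; only the default port (1445) and the requirement for a key ring database and stash file when SSL is enabled come from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnDemandRepositoryConfig:
    """Hypothetical holder for the repository parameters listed above."""
    display_name: str
    server_name: str
    port: int = 1445                         # default instance port
    use_ssl: bool = False
    keyring_db: Optional[str] = None         # SSL key ring database
    stash_file: Optional[str] = None         # SSL stash file
    transform_config: Optional[str] = None   # path for AFP or generic transforms

    def validate(self) -> None:
        """Raise ValueError if a required setting is missing or out of range."""
        if not self.display_name or not self.server_name:
            raise ValueError("display name and server name are required")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")
        if self.use_ssl and not (self.keyring_db and self.stash_file):
            raise ValueError("SSL requires a key ring database and a stash file")


cfg = OnDemandRepositoryConfig("Statements", "od.example.com")
cfg.validate()
print(cfg.port)  # 1445
```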

You can specify additional configuration parameters, for example, in the ODConfig class in
the Java API. For more information, see the Javadoc of ODApi or IBM Content Manager
OnDemand Web Enablement Kit Java APIs: The Basics and Beyond, SG24-7646.

Content Navigator viewer options


For each Content Navigator Desktop, a different viewer map can be active. Within a viewer
map, for each content type, a different viewer can be configured. Several viewers are
available to Content Manager OnDemand repositories in Content Navigator:
• Content Navigator uses the viewers that ship with ODWEK, for example, the line data
applet. Repository-specific features can be handled only by ODWEK viewers.
• ODWEK performs conversions, for example, an AFP to PDF conversion.
• Built-in viewers for Content Navigator:
– Ajax viewer and a simple PDF and HTML conversion
– Web browser pass-through
– PDF-inline viewer for addressing the Adobe Acrobat viewer browser plug-in
– Generic Applet viewer
– IBM Daeja™ ViewONE Virtual viewer
For a full listing of the viewers, see the IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSEUEX_2.0.3/com.ibm.installingeuc.doc/eucco002.htm?lang=en
• Content Manager OnDemand plug-in viewers for Content Navigator:
– AFP Viewer plug-in
– FileNet Content Federation Services Viewer plug-in
– XML Viewer plug-in
• Third-party viewers can be integrated into Content Navigator. For IBM Production Imaging
Edition, for example, a third-party viewer is integrated with Content Navigator. You can
integrate your own viewer by using the Content Navigator plug-in architecture.



The generic applet viewer (“applet viewer”) is a Java applet, which can handle various types
of documents, such as PDF and Microsoft Office documents (which it renders), images, line
data, and AFP documents. The generic applet viewer might be an option if you work with
images that are stored in Content Manager OnDemand.

If you want to avoid the use of Java applets and your content is viewable by browsers (for
example, certain image types or textual data), try the browser pass-through viewer, which lets
the browser handle the data natively. If you work with AFP and must use the AFP browser
plug-in, register the Content Navigator plug-in, AFPViewerPlugin.jar, and configure the
viewer map that is assigned to your Content Navigator desktop to use the AFP viewer for the
application/afp MIME type. The AFPViewerPlugin.jar file ships with Content Navigator. You
must choose the web browser pass-through viewer.
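Conceptually, a viewer map is a lookup from MIME type to viewer. The following Python sketch models one desktop’s map and the AFP change described above. Content Navigator viewer maps are configured in the admin desktop rather than in code, and the fallback viewer and the other map entries are assumptions chosen for illustration.

```python
from typing import Dict

FALLBACK_VIEWER = "Ajax viewer"  # assumption: used when a MIME type is not mapped

# Hypothetical viewer map for one desktop: MIME type -> viewer.
viewer_map: Dict[str, str] = {
    "image/tiff": "Generic applet viewer",
    "text/plain": "Web browser pass-through",
    "application/pdf": "PDF-inline viewer",
}


def route_afp_to_plugin(vmap: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the map that sends application/afp to the AFP viewer."""
    updated = dict(vmap)
    updated["application/afp"] = "AFP viewer"
    return updated


def viewer_for(vmap: Dict[str, str], mime_type: str) -> str:
    """Look up the viewer for a MIME type, falling back to a default."""
    return vmap.get(mime_type, FALLBACK_VIEWER)


desktop_map = route_afp_to_plugin(viewer_map)
print(viewer_for(desktop_map, "application/afp"))  # AFP viewer
print(viewer_for(desktop_map, "image/png"))        # Ajax viewer
```

Because each desktop can have its own map, returning a modified copy keeps other desktops’ maps unchanged.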

The Ajax viewer is a Web 2.0 JavaScript application that provides basic document functions,
such as page-wise browsing, rotation, or zoom. It is not a Java applet.

The generic applet viewer, the built-in PDF and HTML conversion, and the Ajax viewer can all
work with various data types:
• Images (such as TIFF, JPEG, and DICOM)
• Office documents
• PDF
• Most line data documents
• Certain AFP data

However, they all use a rendering engine that converts Office, PDF, and AFP data into an
image for display. This rendering might work well with certain Office and PDF files, but it
fails on most AFP data streams that use more than basic features.

For more information, see 8.1.1, “Viewer options” on page 186.

Note: Content Navigator is a Web 2.0 client and relies on HTML 5 and JavaScript for its
core client functionality, especially for the Ajax viewers. Not all browsers run Content
Navigator quickly and efficiently; this is especially true of Microsoft Internet Explorer
versions earlier than 9. Test Content Navigator thoroughly with your users' browsers
before you consider a deployment.

Extending Content Navigator


Content Navigator is not designed as a client that is dedicated solely to Content Manager
OnDemand, so it requires more configuration than the simpler client options.
Content Navigator provides many configuration and customization options through its API
and plug-in methodology. For more information about the customization options of Content
Navigator, see Customizing and Extending IBM Content Navigator, SG24-8055.

8.2.2 Content Manager OnDemand Windows client


The Content Manager OnDemand Windows client is a full function, feature-rich client that
meets the needs of line-of-business application areas and customer service representatives.
The Windows client displays content in its native format and is considered a corporate
internal access client. Many technical aspects of the Windows client are described in 8.1.1,
“Viewer options” on page 186 and 8.1.2, “Client infrastructure options” on page 190.

Chapter 8. User clients 197


Figure 8-3 shows the results list that a user receives after opening a folder and running a
search. The list indicates whether a document has a note or a hold and shows the location of
the document. The load date and document size are displayed on the right side of the hit list.

Figure 8-3 Content Manager OnDemand results list in the Windows client

As the full function client for Content Manager OnDemand, the Windows client provides
various business functions and features that can be selected at the document level, as shown
in Figure 8-4 on page 199.



Figure 8-4 Windows client capabilities

You can also display the pages of a document or report as thumbnails, which give you a
visual overview of the report.

8.2.3 CICS Client


The CICS 3270-based interface was the original user interface for Content Manager
OnDemand for z/OS. It was the predecessor to the Windows and web technology clients that
most Content Manager OnDemand customers use today. Customers still request the CICS
Client to meet their production needs, typically while they migrate their user base (and
applications) from a host environment to a client/server or web architecture. The CICS
Client was developed to meet this need. It provides a functional subset of the Windows and
web clients and is available in English only. The CICS Client is included in the Content
Manager OnDemand maintenance, but it does not ship in the Content Manager OnDemand
package, so it must be downloaded and installed separately.

The CICS Client can be downloaded from the following website:


https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=db2cmto

Figure 8-5 on page 200 shows the Content Manager OnDemand CICS Client login panel,
which requires the standard login credentials.



Figure 8-5 Content Manager OnDemand CICS Client login panel

The CICS Client provides viewing capabilities for line data reports and a “best fit” model for
fully composed AFP documents. Viewing a standard line data report is shown in Figure 8-6.

Figure 8-6 Viewing a standard line data report



8.2.4 Integration with other Enterprise Content Manager products
Content Manager OnDemand provides integration points with other IBM Enterprise Content
Manager software on many different levels. Integration can occur at the client level (for
example, by using another product’s user interface (UI) as the client for Content Manager
OnDemand). Integration can also occur at the infrastructure level, where another product
accesses Content Manager OnDemand and the products exchange information at a lower
level.

For more information about the most common integrations, see Federated Content
Management: Accessing Content from Disparate Repositories with IBM Content Federation
Services and IBM Content Integrator, SG24-7742.

8.2.5 Federated search with IBM Information Integrator


Information Integrator is an IBM Enterprise Content Manager product that is available for all
Enterprise Content Manager customers. Although it has many functions, it is primarily a
federation system.

It can connect to various systems, such as Content Manager OnDemand, Content Manager,
FileNet P8, and content management systems from other vendors. You can create a virtual
archive that spans all connected systems and document models. Users search in one place,
and the search is propagated to multiple back-end repositories. Information
Integrator maps virtual fields to folder fields in Content Manager OnDemand (or respective
models in other systems) and delivers a consistent hit list of documents to the user.

Content Integrator might be an option for you if you use separate Content Manager
OnDemand systems (instances or physical systems) and must provide a cross-system
search (for example, for eDiscovery or legal inquiries). Another use case is to provide
repository-neutral services with access to multiple content management systems.

Note: Information Integrator is an abstraction layer. You lose Content Manager
OnDemand-specific functionality because the virtual archive provides only the common
functionality that all archives can implement. Always check your use case to verify that a
virtual archive meets your needs for functional compatibility and performance.

8.2.6 Integration with IBM FileNet P8


Integration exists between IBM FileNet P8 and Content Manager OnDemand through FileNet
Content Federation Services. Content Manager OnDemand documents can be federated into
FileNet P8, making them accessible like any other FileNet P8 documents for FileNet P8
users.

This federation differs from that of Information Integrator. In Content Federation Services, a
virtual document is created in FileNet P8 for each Content Manager OnDemand document
(resulting in database records in FileNet P8). From a FileNet P8 user’s perspective, these
documents therefore act as FileNet P8 documents. Information Integrator does not have its
own database and does not create virtual documents; instead, it calls Content Manager
OnDemand for searches and passes on the result list. A search in FileNet P8 never starts a
search in Content Manager OnDemand; it finds only those federated Content Manager
OnDemand documents that are cataloged in the FileNet P8 database.



If your environment includes a FileNet P8 system that serves as your primary content
management system, and reports must be available to users without their knowing that
those reports are in a different system, this integration might suit your needs. The same
situation applies to FileNet P8 Records Management, which can also be applied to Content
Manager OnDemand documents, bringing a level of federated records management capability
to your documents.

When you plan your integration with FileNet P8, remember this federation is active: Content
Manager OnDemand actively publishes document links into a FileNet P8 system. You must
consider both volumes (FileNet P8 systems usually are smaller than Content Manager
OnDemand systems) and the active federation process.

For more information about Content Manager OnDemand and FileNet P8 integration, see
IBM FileNet Content Federation Services for Content Manager OnDemand, SC19-2711.

8.3 Client API overview


With various client options, multiple API options are available to navigate through the system
and access Content Manager OnDemand documents. Although the Java API that is provided
by Content Manager ODWEK is the API that is used most by clients and the basis for most
development projects, other APIs are available and used for a limited range of scenarios.

The following list shows the APIs that are available for Content Manager OnDemand:
• Content Manager ODWEK: The Java API for Content Manager OnDemand
• SOAP and Representational State Transfer (REST) web services that follow the CMIS
standard
• Windows OLE (ActiveX control) that is provided by the Windows client
• XML administrative API through the ARSXML server command
• Structured APIs on z/OS environments
• The standard Content Manager OnDemand server commands that serve as a
console-based API to work with Content Manager OnDemand documents

8.3.1 Content Manager OnDemand Web Enablement Kit


ODWEK provides a Java API to access Content Manager OnDemand servers and their
documents. It is the strategic client API that provides the largest feature set of any Content
Manager OnDemand API. It is used by web clients, such as Content Navigator or WEBi, by
abstraction layers, such as Information Integrator, or by API components, such as CMIS.

The ODWEK Java API and its use to develop Content Manager OnDemand clients are
described in detail in IBM Content Manager OnDemand Web Enablement Kit Java APIs: The
Basics and Beyond, SG24-7646. This section covers only a basic overview and focuses on
client considerations about ODWEK. Developers are encouraged to read the referenced book
before they plan a client development that is based on ODWEK.

Scope
ODWEK is a Content Manager OnDemand component that can be used by all Content
Manager OnDemand customers. It is focused on typical client use cases, such as searching
for and accessing data that is stored in a Content Manager OnDemand archive. It also has
web viewers, such as the line data applet and Content Manager OnDemand AFP viewer.



For more information about ODWEK viewers and conversion support, see “Windows client
viewers” on page 187.

Before Content Manager OnDemand Web Enablement Kit (ODWEK) Java API V9.5, the only
API that allowed documents to be added to the Content Manager OnDemand archive was the
ODFolder.storeDocument API, which resulted in an archive request to the Content Manager
OnDemand server for each document. This API is suitable for low-volume ad hoc storage.

In ODWEK V9.5, new APIs were introduced to allow documents to be loaded in bulk, which
provides high-volume storage similar to the arsload command. To accomplish bulk loading by
using the ODWEK Java API, you perform these steps:
1. Call the ODServer.loadInit API to initiate the load process.
2. For each document to load, call the ODServer.loadAddDoc API, which passes the number
of pages, a hash table of index values to store, and the document data.
3. Call the ODServer.loadCommit API, which specifies the application group and application
and sends the load data and load request to the Content Manager OnDemand server.
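The three steps above can be sketched as follows. This is a minimal sketch of the call order, not the ODWEK API: BulkLoadTarget, LoadDoc, and BulkLoader are hypothetical names, and their parameters only mirror the description above (page count, index values, document data, application group, application). Take the real ODServer.loadInit, loadAddDoc, and loadCommit signatures from the ODWEK Javadoc before you adapt it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the three bulk-load calls; these are NOT the
// real ODServer signatures, which you should take from the ODWEK Javadoc.
interface BulkLoadTarget {
    void loadInit() throws Exception;                           // step 1: start the load
    void loadAddDoc(int pages, Map<String, String> indexValues,
                    byte[] data) throws Exception;              // step 2: add one document
    void loadCommit(String applGroup, String application) throws Exception; // step 3
}

// One document to load: page count, index values, and document data.
final class LoadDoc {
    final int pages;
    final Map<String, String> indexValues;
    final byte[] data;

    LoadDoc(int pages, Map<String, String> indexValues, byte[] data) {
        this.pages = pages;
        this.indexValues = indexValues;
        this.data = data;
    }
}

final class BulkLoader {
    private BulkLoader() {}

    /** Initializes once, adds every document, commits once; returns the number added. */
    static int loadAll(BulkLoadTarget target, List<LoadDoc> docs,
                       String applGroup, String application) throws Exception {
        if (docs.isEmpty()) {
            return 0; // nothing to load, so no load request is sent
        }
        target.loadInit();
        for (LoadDoc doc : docs) {
            target.loadAddDoc(doc.pages, doc.indexValues, doc.data);
        }
        target.loadCommit(applGroup, application);
        return docs.size();
    }
}
```

The design choice is to initialize and commit exactly once per batch, so that a single load request carries all documents; that is what distinguishes bulk loading from one ODFolder.storeDocument call per document.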

For special client needs, the Java API provides access to the object model (application group
and application) of Content Manager OnDemand and facilitates an ARSXML pass-through,
which can be used to perform administrative tasks.

Native library dependency


Because of the nature of the Content Manager OnDemand architecture, ODWEK requires the
use of native libraries.

The native libraries must not only be physically present on the system; Java applications
must also be able to locate them. The ODWEK native libraries are loaded as shared memory
objects and cannot be loaded more than once. If you run multiple ODWEK applications in one
web application server, consider this restriction.
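In Java, a given native library can be bound to only one class loader at a time; a second web application that tries to load it from its own class loader typically fails with UnsatisfiedLinkError. One common pattern is to route every load through a single guarded method in a class that all ODWEK applications share. The sketch below assumes that NativeLibraryGuard (a hypothetical name) is deployed in such a shared class loader; a copy in each application's own class loader would not prevent the conflict. Any library name you pass is a placeholder, not the real ODWEK library name.

```java
import java.util.function.Consumer;

// Loads a native library at most once through this class's loader.
// Assumption: this class lives in a class loader shared by all ODWEK applications.
final class NativeLibraryGuard {
    private static boolean loaded = false;

    private NativeLibraryGuard() {}

    /** Loads the library on the first call only; returns true if it was loaded now. */
    static synchronized boolean ensureLoaded(String libraryName, Consumer<String> loader) {
        if (loaded) {
            return false; // already loaded by this guard; nothing to do
        }
        loader.accept(libraryName); // may throw UnsatisfiedLinkError; loaded stays false
        loaded = true;
        return true;
    }

    /** Production variant that uses System.loadLibrary. */
    static boolean ensureLoaded(String libraryName) {
        return ensureLoaded(libraryName, System::loadLibrary);
    }
}
```

If loading fails, the flag stays false, so a later call can retry after the library path is corrected.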

For a description of how the native library reference is managed for the ODWEK client in IBM
Content Navigator in IBM WebSphere Application Server, see “Accessing the native libraries”
on page 195.

ODWEK web client design considerations


When you design a web client for Content Manager OnDemand that is based on ODWEK,
consider the following items:
• The dependency on a native shared library affects deployment and general options, such
as the message language, which can be set only for the whole environment.
• Be careful with multithreaded document access. Each session with the Content Manager
OnDemand server must be accessed in a single-threaded fashion: only one thread at a time
can access the objects of a given Content Manager OnDemand session.
• Every session that is established with a Content Manager OnDemand server consumes
memory on the ODWEK system. For high-usage applications that support many
concurrent users, for example, web clients that work with non-named users, we suggest
the use of connection pooling.
• Ensure that your application implements a timeout concept that matches the Content
Manager OnDemand user inactivity timeout. Sessions that never time out can lead to
memory leaks or high memory consumption on the Content Manager OnDemand and
ODWEK machines.
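The single-threaded rule for sessions can be enforced with one lock per session. In the following minimal sketch, SessionGuard is a hypothetical helper, and the type parameter S stands for whatever object represents your ODWEK session; the guard assumes nothing about that class.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Serializes all access to one session: only one thread at a time runs work on it.
final class SessionGuard<S> {
    private final S session;
    private final ReentrantLock lock = new ReentrantLock();

    SessionGuard(S session) {
        this.session = session;
    }

    /** Runs the work with exclusive access to the session and returns its result. */
    <R> R withSession(Function<S, R> work) {
        lock.lock();
        try {
            return work.apply(session);
        } finally {
            lock.unlock(); // always released, even if the work throws
        }
    }
}
```

Callers never touch the session directly; they pass the search or retrieval work to withSession, so two requests that share a session are executed one after the other.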



Note: Starting with version 9 of ODWEK, additional functions were added to reset the
inactivity timeout counter of a user session. This enhancement simplifies the design of
connection pooling and timeout scenarios.

For a connection pooling sample that covers the topics of thread safety, resource
consumption, and timeouts in detail, see Chapter 6, “Connection pooling and connection
handling”, in IBM Content Manager OnDemand Web Enablement Kit Java APIs: The Basics
and Beyond, SG24-7646.
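The pooling and timeout considerations can be combined in a small pool that closes sessions that stay idle too long. This is a simplified sketch with hypothetical names, not the connection pooling sample from the referenced book: opener and closer stand for your own logon and logoff code, and maxIdleMillis is assumed to be set below the server's inactivity timeout so that the pool discards a session before the server does. Because each borrowed session belongs to one caller until it is released, the pool also respects the single-threaded session rule.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal session pool that closes sessions that stay idle too long.
final class IdleTimeoutPool<S> {
    private static final class Entry<T> {
        final T session;
        final long lastUsedMillis;

        Entry(T session, long lastUsedMillis) {
            this.session = session;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final Deque<Entry<S>> idle = new ArrayDeque<>();
    private final Supplier<S> opener;   // your logon code
    private final Consumer<S> closer;   // your logoff code
    private final long maxIdleMillis;   // keep below the server inactivity timeout
    private final LongSupplier clock;   // for example, System::currentTimeMillis

    IdleTimeoutPool(Supplier<S> opener, Consumer<S> closer,
                    long maxIdleMillis, LongSupplier clock) {
        this.opener = opener;
        this.closer = closer;
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
    }

    /** Returns a pooled session that is still fresh; otherwise opens a new one. */
    S borrow() {
        synchronized (this) {
            evictIdle();
            Entry<S> fresh = idle.pollFirst(); // most recently released first
            if (fresh != null) {
                return fresh.session;
            }
        }
        return opener.get(); // log on outside the lock so other callers are not blocked
    }

    /** Returns a session to the pool and records when it was last used. */
    synchronized void release(S session) {
        idle.addFirst(new Entry<>(session, clock.getAsLong()));
    }

    /** Closes every pooled session that has been idle for maxIdleMillis or longer. */
    synchronized int evictIdle() {
        long now = clock.getAsLong();
        int closed = 0;
        for (Iterator<Entry<S>> it = idle.iterator(); it.hasNext(); ) {
            Entry<S> entry = it.next();
            if (now - entry.lastUsedMillis >= maxIdleMillis) {
                it.remove();
                closer.accept(entry.session);
                closed++;
            }
        }
        return closed;
    }
}
```

In a real application, evictIdle would also be called periodically (for example, from a scheduled task) so that idle sessions are closed even when no new requests arrive.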

8.3.2 Content Management Interoperability Services


CMIS is an open standard for accessing content management repositories. It is an OASIS
specification and it is supported by various applications from different vendors, including IBM
(with FileNet P8, Content Manager, and Content Manager OnDemand).

CMIS provides a common access interface for searching, retrieving, and in the case of
document management systems, modifying and deleting documents. It is a web services
interface that is implemented as both SOAP web services and REST (Atom) services.
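As an illustration of a CMIS search, the following sketch builds a CMIS query-language statement against the base cmis:document type and escapes backslashes and single quotes in string literals with a backslash, as the CMIS query language requires. Which object types and properties the Content Manager OnDemand CMIS implementation exposes for your folders is not covered here and must be checked in its documentation; CmisQueries is a hypothetical helper.

```java
// Builds CMIS query-language statements with safely escaped string literals.
final class CmisQueries {
    private CmisQueries() {}

    /** Escapes a value for use inside a single-quoted CMIS QL string literal. */
    static String escapeLiteral(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (char ch : value.toCharArray()) {
            if (ch == '\\' || ch == '\'') {
                sb.append('\\'); // backslash and quote are escaped with a backslash
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    /** Selects the ID and name of every document whose cmis:name equals the given value. */
    static String documentsNamed(String name) {
        return "SELECT cmis:objectId, cmis:name FROM cmis:document WHERE cmis:name = '"
                + escapeLiteral(name) + "'";
    }
}
```

With a CMIS client library such as Apache Chemistry OpenCMIS, a statement like this is typically passed to Session.query(statement, false).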

For more information about CMIS, see the CMIS page on the OASIS website, the CMIS
overview page at the IBM Enterprise Content Manager website, and the available technical
documentation:
• https://www.oasis-open.org/committees/cmis/
• http://www.ibm.com/software/ecm/cmis.html
• Implementing Web Applications with CM Information Integrator for Content and
OnDemand Web Enablement Kit, SG24-6338
• Content Management Interoperability Services for Content Manager OnDemand is
installed as part of the IBM Content Navigator installation. For more information, see
“Installing Content Navigator” on page 194.

When you consider implementing your own software on CMIS, remember that CMIS is designed
for accessing document management systems, not necessarily high-volume report archives
such as Content Manager OnDemand.

CMIS accesses documents through folders and subfolders that contain documents (as in a
file system). Content Manager OnDemand, with its different object model, only partially
emulates this structure. Consider CMIS an abstraction layer that might affect throughput
and feature exposure. Also, much of the CMIS API (such as the storage and deletion
functions) is not supported by Content Manager OnDemand.

8.3.3 Other client-based API options


Other client-based API options include Windows ActiveX API, structured API on z/OS, server
commands, and XML Administration interface (ARSXML).



Windows ActiveX API
The Windows client ships an ActiveX control that you can use in your own applications to
access Content Manager OnDemand servers and documents through the functions that the
Windows client provides. It is a development API that enables custom applications to use an
installed Windows client as the API provider. The ActiveX API covers only a basic subset of
operations.

For more information about the Windows client-based API, see Windows Client
Customization Guide, SC19-3357.

Structured API on z/OS


In z/OS environments, Content Manager OnDemand includes Structured APIs that provide
custom applications in CICS, IBM IMS™, TSO, or batch environments with the ability to
connect to Content Manager OnDemand servers. The Structured APIs support only the basic
read operations (log on, open folder, search, and retrieve documents and annotations).

Structured APIs are handled by a dedicated component of Content Manager OnDemand that
is called MidServer. MidServer relies on ODWEK and its API to access the Content Manager
OnDemand server.

Structured APIs are available only on z/OS, and they are called from COBOL or C
applications in the same manner as MVS calls. Because ODWEK is used as the access path
to the Content Manager OnDemand server, the Structured APIs can be used to access
non-z/OS Content Manager OnDemand servers, as well.

Server commands
In addition to the API options, which are exposed through Java, OLE, or Web Services,
Content Manager OnDemand provides console (command-line) applications that provide
specific functions, such as searching, retrieving, or deleting documents, and sophisticated
functions, such as placing holds and working with the full text engine. Most of this functionality
is exposed through the ARSDOC application.

Simpler custom applications, for example, shell scripts, can use these server console
applications to interact with Content Manager OnDemand systems. The applications are
available only as part of a Content Manager OnDemand server installation. Because most of
them (notably ARSDOC) communicate with the server through TCP/IP, you can use them to
connect to and interact with remote Content Manager OnDemand servers on other platforms.
When you call remote servers, ensure that the local installation that provides the ARS
applications and the remote Content Manager OnDemand server are at the same version level.
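Custom Java code can call these console applications in the same way that a shell script does. The following minimal sketch builds the argument list for ProcessBuilder so that each value (for example, a folder name that contains spaces) is passed as one argument without shell quoting. ArsCommand is a hypothetical helper; it does not know any ARSDOC options itself, and any option letters you pass (such as an instance or folder option) must be confirmed in the ARSDOC command reference for your version.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Builds and runs a server command directly, without going through a shell.
final class ArsCommand {
    private ArsCommand() {}

    /** Program, subcommand, then each option flag followed by its value, in map order. */
    static List<String> build(String program, String subcommand, Map<String, String> options) {
        List<String> args = new ArrayList<>();
        args.add(program);
        args.add(subcommand);
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add(option.getKey());
            args.add(option.getValue()); // one argument, even if it contains spaces
        }
        return Collections.unmodifiableList(args);
    }

    /** Runs the command, shows its output on this process's console, and returns the exit code. */
    static int run(List<String> args) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(args).inheritIO().start();
        return process.waitFor();
    }
}
```

Pass a LinkedHashMap so that the options keep the order in which you add them; check the exit code that run returns before you use the command output.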

For more information about the administrative commands, see the specific command
descriptions in the IBM Content Manager OnDemand Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSEPCD_9.0.0/com.ibm.ondemandtoc.doc/administering.htm

XML administration interface: ARSXML


In addition to the user client APIs, the ARSXML server command provides an interface for
administrative users and applications to access the Content Manager OnDemand data
model. By using ARSXML, you can export, create, delete, and modify folders, application
groups, applications, and users. ARSXML works with XML documents: an input XML document
describes the change, action, or selection criteria, and ARSXML produces a resulting output
XML document.
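ARSXML input files can be generated with the standard Java XML APIs instead of string concatenation, which also takes care of escaping special characters such as the ampersand. In the following sketch, the element and attribute names (onDemandXML, user, name) are our assumption of what a user export request looks like; validate them against the schema that ships with your ARSXML installation before use. ArsxmlRequest is a hypothetical helper.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Builds an ARSXML input document that lists the users to export.
// Element and attribute names are assumptions; check them against the schema.
final class ArsxmlRequest {
    private ArsxmlRequest() {}

    static String userExportRequest(String... userIds) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("onDemandXML"); // assumed root element
        doc.appendChild(root);
        for (String id : userIds) {
            Element user = doc.createElement("user");    // assumed element name
            user.setAttribute("name", id);               // assumed attribute name
            root.appendChild(user);
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        // The serializer escapes special characters, for example & becomes &amp;
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Write the returned string to a file and pass that file to the ARSXML export function; check the ARSXML command reference for the exact parameters.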



ARSXML is a console application that is available on the Content Manager OnDemand server. It
can work with remote servers if they are at the same release level.

XML documents can be passed to and from ARSXML through the ODWEK Java API, which enables
Java applications to call ARSXML programmatically and gain access to the administrative data
model functions.




Chapter 9. Data conversion


In this chapter, we provide information about data conversion for IBM Content Manager
OnDemand (Content Manager OnDemand). We describe the reasons for data conversion
and describe the interface that Content Manager OnDemand uses to convert data.

In this chapter, we cover the following topics:


• Overview of data conversion
• Generic Transform Interface

© Copyright IBM Corp. 2003, 2015. All rights reserved. 207


9.1 Overview of data conversion
Before you work with data conversion, determine which conversions are required and when
and how to convert the data. Plan in detail before you build your solution so that your design
remains efficient for many years.

In this section, we describe why you might need data conversion, when to convert the data
stream, and how to convert the data.

9.1.1 Why convert data streams


You might want to convert data streams for many reasons:
• Certain data streams, such as Hewlett-Packard (HP) Printer Command Language (PCL)
or Xerox metacode, are printer-specific and cannot be displayed. These data streams must
be transformed into a compatible format before the documents are archived or displayed.
• The archived data stream might need to comply with a company’s internal rules or
regulations. Therefore, the produced data streams must be transformed into the defined
and required final format before they are archived.
• The documents might need to be accessible by users outside of the company. In that case,
the documents must be displayable through standard tools that are available on all or most
clients, such as an Internet browser or Adobe Acrobat Reader.
• The documents might need to be manipulated so that only part of the document is
displayed in a personalized way.

9.1.2 When to convert data streams


The decision of when to convert data streams depends mainly on how the system is used.
Typically, converting data at load time increases the time that is needed to process the print
stream file, and converting data at retrieval time makes each user retrieval slightly slower.
The decision might depend on how many documents are retrieved, compared to how many
documents are loaded daily. It might also depend on legal requirements about the format of
stored data.
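The trade-off can be made concrete with a rough daily cost model. The sketch below is a hypothetical simplification: it assumes the same conversion cost per document at load time and at retrieval time, counts each retrieval as one conversion, and lets a legal requirement for the stored format override the comparison. When the costs are equal, it favors retrieval-time conversion because, as the AFP to PDF case illustrates, the native format is often stored more compactly.

```java
// Rough daily cost model for deciding when to convert documents (hypothetical).
final class ConversionTiming {
    enum Strategy { AT_LOAD, AT_RETRIEVAL }

    private ConversionTiming() {}

    /** CPU seconds per day when every loaded document is converted during loading. */
    static double loadTimeCost(long docsLoadedPerDay, double secondsPerConversion) {
        return docsLoadedPerDay * secondsPerConversion;
    }

    /** CPU seconds per day when documents are converted each time they are retrieved. */
    static double retrievalTimeCost(long retrievalsPerDay, double secondsPerConversion) {
        return retrievalsPerDay * secondsPerConversion;
    }

    static Strategy choose(long docsLoadedPerDay, long retrievalsPerDay,
                           double secondsPerConversion, boolean storedFormatMandated) {
        if (storedFormatMandated) {
            return Strategy.AT_LOAD; // the archive must hold the converted format
        }
        double atLoad = loadTimeCost(docsLoadedPerDay, secondsPerConversion);
        double atRetrieval = retrievalTimeCost(retrievalsPerDay, secondsPerConversion);
        // Ties go to retrieval time, which keeps the native format in the archive.
        return atLoad < atRetrieval ? Strategy.AT_LOAD : Strategy.AT_RETRIEVAL;
    }
}
```

For example, with 100,000 documents loaded and 2,000 retrieved per day at 0.5 seconds per conversion, load-time conversion costs 50,000 CPU seconds per day and retrieval-time conversion costs 1,000, so choose(100_000, 2_000, 0.5, false) returns AT_RETRIEVAL.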

AFP to PDF
If AFP documents must be presented in Portable Document Format (PDF) over the web, it is
more efficient, from a storage perspective, to store the documents in their native format
and convert them to PDF at retrieval time. AFP documents are stored more efficiently than
PDF documents.

The PDF print stream, when it is divided into separate customer statements, is larger than
AFP because each statement contains its own set of structures that are required by the PDF
architecture to define a document.

Elapsed time and processor time are also essential factors in the decision-making process.
The amount of time (elapsed and CPU) that is needed to convert the document depends on
how large the document is and how many resources or fonts are associated with the
document.



9.1.3 How to convert the data
Content Manager OnDemand uses the Generic Transform Interface to integrate Content
Manager OnDemand with third-party transform solutions.

Consider the following information about the target formats:

• PDF might be used as a way to make documents available through standard and
no-charge tools, such as Adobe Acrobat Reader. The transformed documents must be
displayable, saveable, and printable the same way regardless of the environment in which
the user works.
• HTML might be used with the same intent, but an HTML document is not always displayed
identically, depending on the web browser that is used. Additional testing against your
requirements and the environments that you encounter might be necessary for validation
before the implementation.
• XML is an intermediate, text-based data format that lets you manipulate documents
regardless of the source data stream and display them fully or partially in a personalized
way. The use of XML usually involves additional development, including scripts and
stylesheets.

9.2 Generic Transform Interface


Content Manager OnDemand uses the Generic Transform Interface to manage third-party
data transforms for the Content Manager OnDemand Web Enablement Kit (ODWEK)
application programming interface (API) set. This interface is used with the document retrieval
APIs.

The ODWEK Java API provides industry-standard Java classes that can be used by a
customer to write a custom web application that can access data that is stored on the Content
Manager OnDemand server. This custom application can, for example, permit the user to log
on to a Content Manager OnDemand server, get a list of folders, search a specific folder,
generate a hit list of matching documents, and retrieve those documents for viewing. Many
other APIs provide advanced functions.

For more information, see the following resources:


• IBM TechDoc Best practices for building Web Applications using IBM Content Manager
OnDemand Java APIs:
https://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101203
This document, which is prepared by the Content Manager OnDemand development
team, provides recommendations about how to use the ODWEK Java APIs. Use this
document to understand how the ODWEK Java APIs interface with the Java virtual
machine (JVM) and Content Manager OnDemand systems to avoid common coding
mistakes.
• IBM Content Manager OnDemand Web Enablement Kit Java APIs: The Basics and
Beyond, SG24-7646:
This publication provides basic and advanced information about how to use the ODWEK
Java APIs to develop custom applications.

Chapter 9. Data conversion 209


9.2.1 Overview
Before version 8.5.0.0, the ODWEK Java APIs provided a tight integration with only a few
specific transforms: AFP2PDF, AFP2HTML, and AFP2XML. These transform engines were
used by ODWEK clients to generate different document types for display purposes. Although
this capability provided invaluable functionality, it meant that new transform engines were not
readily integrated into ODWEK.

To address this limitation, a highly flexible interface was added to the ODWEK Java APIs that
allows a developer to easily integrate a third-party document transform solution.

This interface allows a client developer to use an external program to transform a document
in one of two ways:
• If the transform vendor provides a basic command-line executable file, you can integrate it
through an XML configuration, which supports the retrieval of all of the document details
that are stored in Content Manager OnDemand and also allows specific options to be passed
to the transform.
• For even more flexibility, the ODWEK Java APIs provide a Java interface that a client
developer can implement. The Java interface allows a client developer to get the document
byte stream from ODWEK and then use any method to convert the document, including calls
to web services that perform remote transformation. After the document is transformed, the
resulting data is returned to ODWEK, which passes it back to the client that made the request.

9.2.2 Configuration
To enable the Generic Transform Interface in ODWEK, an XML document must be created
and defined in the ODConfig.Properties object. This XML document is identified by the
<ODConfig.TransformXML> key name and must include the fully qualified path to the XML file
where the transforms are defined.

After you create the XML configuration for the Generic Transform Interface, as described
in 9.2.3, “Basic implementation: Executable interface” on page 211, you can enable this
functionality in your ODWEK environment, as shown in Example 9-1.

Example 9-1 Enabling Generic Transform Interface in the ODWEK environment


import java.util.Properties;
import com.ibm.edms.od.ODConfig;
import com.ibm.edms.od.ODConstant;

Properties props = new Properties();
// Fully qualified path to the XML file that contains the transform details
props.setProperty(ODConfig.TRANSFORMS_XML, "transform.xml");
ODConfig odConfig = new ODConfig(ODConstant.PLUGIN, // AfpViewer
        ODConstant.APPLET,  // LineViewer
        null,               // MetaViewer
        10,                 // MaxHits
        "",                 // AppletDir
        "ENU",              // Language
        null,               // TempDir
        "c:\\tracedir",     // TraceDir (backslash must be escaped in a Java string)
        4,                  // TraceLevel
        props);             // Additional properties



9.2.3 Basic implementation: Executable interface
The basic implementation of the Generic Transform Interface uses an XML configuration to
define to ODWEK a transform that runs a command-line (cmdline) executable. With this
configuration, you can request that details that Content Manager OnDemand stores for the
document be passed in the specified cmdline options, and you can also pass through
transform-specific options, as specified in the ODTransform.xml file.

Example 9-2 shows a sample of the ODTransform.xml file that can be used in this
implementation.

Example 9-2 ODTransform.xml sample


<Transforms>
  <transform>
    <TransformName>MyTXFRM_EXE</TransformName>
    <TransformDescription>Transform Cmdline Executable</TransformDescription>
    <OutputMimeType>application/pdf</OutputMimeType>
    <OutputExtension>pdf</OutputExtension>
    <CmdParms>
      <RECORDLENGTH>-lm</RECORDLENGTH>
      <CARRIAGECONTROL>-x</CARRIAGECONTROL>
      <CODEPAGE>-a</CODEPAGE>
      <OUTPUTFILE>-o</OUTPUTFILE>
    </CmdParms>
    <CmdLineExe>c:/opt/txfrm.exe</CmdLineExe>
    <Passthru>
      <!-- Use the Cmdlineparm tag to declare additional cmdline variables that the
           transform might require -->
      <Cmdlineparm>-r PDF</Cmdlineparm>
    </Passthru>
  </transform>
</Transforms>

In this example, you can see that we defined a transform that is named MyTXFRM_EXE, which
calls the transform command txfrm.exe, which is defined in the <CmdLineExe> tag.

The <TransformName> value is used as the viewer name when your application calls the ODWEK
retrieve APIs. From this configuration, ODWEK knows that the transform requires
RECORDLENGTH, CARRIAGECONTROL, CODEPAGE, and OUTPUTFILE information from Content Manager
OnDemand, and it sets each value on the cmdline by using the option in the related XML tag.

Also, txfrm.exe requires additional information on the cmdline. The -r PDF option that is
specified in the <Cmdlineparm> tag has no meaning to Content Manager OnDemand, so it is
passed through unchanged and set on the cmdline call to txfrm.exe.

In the custom Java code, the call to retrieve the data from ODWEK specifies the
<TransformName> that is defined in the XML, as in the following line (where odHit is the
ODHit object for the document):
byte[] transformedDocument = odHit.retrieve("MyTXFRM_EXE");

From this example definition, ODWEK calls the specified transform with the following
command line. ODWEK provides the details for the items within “< >” from the Content
Manager OnDemand data definitions:
c:/opt/txfrm.exe -lm <record len> -x <carriage control> -a <codepage> -o <output file name> -r PDF
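To make the mapping concrete, the following sketch assembles the same argument list from the Example 9-2 definition and a set of document values. It only mimics the behavior described above and is not ODWEK's implementation: TransformCommand is a hypothetical class, it handles only the basic layout from this section (no <ApplicationGroup> nodes), and the document values are supplied by the caller instead of by Content Manager OnDemand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration only: builds a transform command line from a basic <transform>
// definition and caller-supplied document values keyed by tag name.
final class TransformCommand {
    private TransformCommand() {}

    static List<String> build(String transformXml, String transformName,
                              Map<String, String> docValues) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(transformXml.getBytes(StandardCharsets.UTF_8)));
        NodeList transforms = doc.getElementsByTagName("transform");
        for (int i = 0; i < transforms.getLength(); i++) {
            Element transform = (Element) transforms.item(i);
            if (!transformName.equals(text(transform, "TransformName"))) {
                continue;
            }
            List<String> cmd = new ArrayList<>();
            cmd.add(text(transform, "CmdLineExe"));
            // Each child of <CmdParms> names a document detail; its text is the flag.
            Node parms = transform.getElementsByTagName("CmdParms").item(0);
            Node n = (parms == null) ? null : parms.getFirstChild();
            for (; n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                String value = docValues.get(n.getNodeName());
                if (value == null) {
                    throw new IllegalArgumentException("No value for " + n.getNodeName());
                }
                cmd.add(n.getTextContent().trim());
                cmd.add(value);
            }
            // Pass-through options are appended unchanged, one argument per token.
            NodeList extras = transform.getElementsByTagName("Cmdlineparm");
            for (int j = 0; j < extras.getLength(); j++) {
                for (String token : extras.item(j).getTextContent().trim().split("\\s+")) {
                    cmd.add(token);
                }
            }
            return cmd;
        }
        throw new IllegalArgumentException("Unknown transform: " + transformName);
    }

    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }
}
```

Called with the Example 9-2 XML, the name MyTXFRM_EXE, and the values 133, A, 500, and an output file name, the sketch yields the flags and values in the same order as the command line shown above.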



9.2.4 V9.5 enhancement: Customizing values that are returned from ODWEK
For certain transforms, values that are returned from ODWEK might not be consistent with
the command-line values that are expected by the transform. For example, a transform might
have a fixed set of options to specify a carriage control type. The values that are returned by
ODWEK when the <CARRIAGECONTROL> tag is included in the <CmdParms> are 'A' (ANSI), 'M'
(Machine), and 'N' (None). The following command is produced by the XML in Example 9-2
on page 211:
c:/opt/txfrm.exe -lm 133 -x A -a 500 -o <outputfilename> -r PDF <datafilename>

Because the <CARRIAGECONTROL> tag is present, ODWEK returns the document’s corresponding
value, "-x A", "-x M", or "-x N", depending on the carriage control type (CC Type) that is
defined in the document’s application definition. If the transform defines a different set of
acceptable values to specify the document’s carriage control, for example 2, 4, and 0, you can
map those values by substituting XML such as the XML that is shown in Figure 9-1.

Figure 9-1 Sample XML with custom options

Note: The <CARRIAGECONTROL> node was replaced by three nodes, one for each CC Type.
When the CC Type that is returned by ODWEK is ANSI, the command includes "-x 2" rather
than "-x A".

This type of substitution can be used to specify the RECFM (Record Format), PRMode, TRC,
and CC Type.
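As a hedged sketch of the mapping idea, the following Java class contrasts the default letters with the custom values from the example. Only the ANSI-to-"-x 2" mapping comes from the text; pairing 4 with Machine and 0 with None is an assumption, and the class is not part of ODWEK.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of the V9.5 carriage-control substitution. By default,
// ODWEK emits its own letter for the CC Type; with DOCUMENT_CC_ANSI,
// DOCUMENT_CC_MACHINE, and DOCUMENT_CC_NONE, each CC Type gets its own
// option string. The Machine and None values (4 and 0) are assumptions.
final class CarriageControlMapping {

    enum CcType { ANSI, MACHINE, NONE }

    // What the command contains when only <CARRIAGECONTROL> is used.
    static String defaultOption(CcType type) {
        switch (type) {
            case ANSI:
                return "-x A";
            case MACHINE:
                return "-x M";
            default:
                return "-x N";
        }
    }

    // Options for a transform that expects 2, 4, and 0 instead.
    static final Map<CcType, String> CUSTOM_OPTIONS;

    static {
        Map<CcType, String> m = new EnumMap<>(CcType.class);
        m.put(CcType.ANSI, "-x 2");      // DOCUMENT_CC_ANSI
        m.put(CcType.MACHINE, "-x 4");   // DOCUMENT_CC_MACHINE
        m.put(CcType.NONE, "-x 0");      // DOCUMENT_CC_NONE
        CUSTOM_OPTIONS = Collections.unmodifiableMap(m);
    }
}
```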

9.2.5 V9.5 enhancement: Application Group and Application-specific XML


Starting with version 9.5.0.2, ODWEK provides additional options under the <transform> node
that allow the transform command parameters to be generated based on an Application Group,
or an Application Group and Application pair.

Figure 9-2 on page 213 shows a sample transform.xml that can be used in this
implementation.

212 IBM Content Manager OnDemand Guide


Figure 9-2 Sample XML with <ApplicationGroup><Application> tags

Figure 9-3 shows the transform commands that are generated based on the sample XML and
Application Group and Application of the document that is retrieved.

Figure 9-3 Table of generated commands



Note: Inheritance is not supported. If an <ApplicationGroup> node is matched, only those
options within that node are used for the transform; no parameters that are identified for a
parent transform node are used. Similarly, if an <Application> node is matched within an
<ApplicationGroup> node, only those options are used for the transform; nothing from the
<ApplicationGroup> node is used.
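The selection rule can be modeled in a few lines of Java. This sketch is hypothetical (the class and method names are invented for illustration) and assumes that a document whose application group matches no <ApplicationGroup> node uses the options on the <transform> node itself. It shows the no-inheritance rule: the most specific match is returned as-is, with nothing merged from an enclosing node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how the options for one <transform> are picked for a
// document. Names are illustrative, not ODWEK API. Falling back to the
// <transform>-level options when no <ApplicationGroup> matches is an assumption.
final class TransformOptionSelector {

    private final String transformOptions;
    private final Map<String, String> appGroupOptions = new HashMap<>();
    private final Map<String, Map<String, String>> applicationOptions = new HashMap<>();

    TransformOptionSelector(String transformOptions) {
        this.transformOptions = transformOptions;
    }

    void addApplicationGroup(String appGroup, String options) {
        appGroupOptions.put(appGroup, options);
    }

    void addApplication(String appGroup, String application, String options) {
        applicationOptions.computeIfAbsent(appGroup, k -> new HashMap<>()).put(application, options);
    }

    String select(String appGroup, String application) {
        Map<String, String> apps = applicationOptions.get(appGroup);
        if (apps != null && apps.containsKey(application)) {
            return apps.get(application);          // <Application> match: nothing from its group
        }
        if (appGroupOptions.containsKey(appGroup)) {
            return appGroupOptions.get(appGroup);  // <ApplicationGroup> match: nothing from <transform>
        }
        return transformOptions;                   // no match: the <transform> node's own options
    }
}
```

For example, with options defined for an application group and for one application in it, a document from another application in that group gets only the group's options, and a document from an unlisted group gets the <transform> options.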

9.2.6 Advanced implementation: Custom Java interface


By using the advanced implementation of the Generic Transform Interface, client developers
can write a Java interface to ODWEK that can handle the transform requests in a
programmatic way, offering the most application flexibility. Developers can create a class and
implement the transformData() method to accept document data and details from Content
Manager OnDemand and transform the data in any way they choose.

Example 9-3 shows a sample ODTransform.xml file that can be used in this implementation.

Example 9-3 Sample ODTransform.xml file

<Transforms>
  <transform>
    <TransformName>MYTXFRM</TransformName>
    <TransformDescription>GENERIC Transform Engine.</TransformDescription>
    <ClientClass>com.companyA.corp.TransformClient</ClientClass>
    <OutputMimeType>application/pdf</OutputMimeType>
    <OutputExtension>pdf</OutputExtension>
    <CmdParms>
      <AG_NAME>agName</AG_NAME>
      <APPL_NAME>applName</APPL_NAME>
      <RECORDFORMAT>recfmt</RECORDFORMAT>
      <RECORDLENGTH>LineLength</RECORDLENGTH>
      <CARRIAGECONTROL>CC</CARRIAGECONTROL>
      <CODEPAGE>CodePage</CODEPAGE>
    </CmdParms>
  </transform>
</Transforms>

Similar to the basic implementation, the developer uses this XML stanza to set up the
required details for document transformation and how those details are passed to the Java
transform interface. Example 9-4 shows how the Java interface can be used with the XML
stanza to create a document transform request. The example is a code snippet of how the
client class that is named in the <ClientClass> tag of Example 9-3 might be written to
transform data. (The snippet names its class CustomTransform; the <ClientClass> value must
name the class that you actually deploy.)

Example 9-4 Client Class code snippet for transform data


//*******************************************************************
// Testcase: CustomTransform
//
// This class tests the ODWEK Generic Transform's Custom
// Java Interface by implementing the required transformData method.
//
// transformData is called by ODWEK when its corresponding custom
// viewer is called via ODHit.retrieve.
//*******************************************************************
import java.util.*;

import com.ibm.edms.od.*;

public class CustomTransform {

    public static HashMap transformData(HashMap odMap) throws Exception {
        System.out.println("Inside transformData method");
        // List this transform name from the XML file
        System.out.println(" Transform name: " +
            (String) odMap.get(ODTransform.TXFRM_REQ_NAME));

        // List, sorted by key, the property keys and values that ODWEK read
        // from the transform XML file and provided to this custom class
        System.out.println(" Transform properties:");
        Properties gtProps = (Properties) odMap.get(ODTransform.TXFRM_REQ_PROPS);
        Enumeration<?> enumeration = gtProps.keys();
        List<String> list = new ArrayList<String>();
        while (enumeration.hasMoreElements()) {
            list.add((String) enumeration.nextElement());
        }
        Collections.sort(list);
        for (String key : list)
            System.out.println(String.format("%25s = %-25s", key,
                gtProps.getProperty(key)));

        // Retrieve the native document from ODWEK
        byte[] inDoc = (byte[]) odMap.get(ODTransform.TXFRM_REQ_DATA);
        System.out.println(" Native document size: " +
            (inDoc == null ? null : inDoc.length));

        // Retrieve the document resources from ODWEK
        byte[] inRes = (byte[]) odMap.get(ODTransform.TXFRM_REQ_RES);
        System.out.println(" Native doc resource size: " +
            (inRes == null ? null : inRes.length));

        // Normally this is where you transform the byte data or do something
        // else with it. This sample just prepends the resources, if there
        // are any, to the document.
        byte[] transformedDoc;
        if (inRes != null && inDoc != null) {
            transformedDoc = new byte[inRes.length + inDoc.length];
            System.arraycopy(inRes, 0, transformedDoc, 0, inRes.length);
            System.arraycopy(inDoc, 0, transformedDoc, inRes.length,
                inDoc.length);
        }
        else
            transformedDoc = inDoc;
        // Guard against a null document so that the size report cannot fail
        System.out.println(" Concatenated resources to doc size: " +
            (transformedDoc == null ? null : transformedDoc.length));

        // Send the transformed data back to ODWEK
        HashMap rtnMap = new HashMap();
        rtnMap.put(ODTransform.TXFRM_RESP_DATA, transformedDoc);
        return rtnMap;
    }
}

Example 9-4 on page 214 shows how to set up the HashMap to pass document byte arrays in
and out of this custom interface, and how to define a custom Java class that contains the
transformData() method.



This code retrieves the raw document data from ODWEK, gathers all of the document details
that Content Manager OnDemand might store from loading the data, and then transforms the
document data. The transformed document data can be passed back through ODWEK to the
original client request.

Table 9-1 lists the XmlTagNames for the transformation specification.

Table 9-1 XmlTagNames for the transform specification


XmlTagname            ODConstant                    Description
TransformName         TransFormName                 Name of the transform. It is used as the viewer argument that is passed to ODWEK Retrieve APIs.
TransformDescription  TRANSFORM_DESC                Description of the transform.
ClientClass           TRANSFORM_CLIENTCLASS         The class name of the custom interface class.
CmdLineExe            TRANSFORM_CMDLINEEXE          Fully qualified name of the transform executable file.
OutputMimeType        TRANSFORM_MIMETYPE            The MIME type of the data as it is returned from the transform.
OutputExtension       TRANSFORM_OUTPUTEXT           The extension of the data that is returned from the transform.
CmdParms              TRANSFORM_PARMS               The mappings of OD values to custom variables. See the constant keywords that are shown in Table 9-2 on page 216.
Passthru              TRANSFORM_PASSTHRU            These values are passed through ODWEK directly to the transform.
Cmdlineparm           TRANSFORM_PASSTHRU_CMDLINE    These values are passed through ODWEK directly to the transform command line.

Table 9-2 provides information about the XMLTags. These XML tags are used to pass specific
values to the transform command line. These XML tags allow the mapping of the
command-line option where the specified value can be passed.

Table 9-2 XmlTags detailed information


XmlTagname            ODConstant                      Description
RECORDFORMAT          DOCUMENT_RECORD_FORMAT          The record format of the document as stored in Content Manager OnDemand.
RECORDLENGTH          DOCUMENT_RECORD_LENGTH          The record length of the document as stored in Content Manager OnDemand.
CARRIAGECONTROL       DOCUMENT_CARRIAGE_CONTROL       The carriage control of the document as stored in Content Manager OnDemand.
TRC_EXIST             DOCUMENT_TRC_EXIST              The TRC settings as stored in Content Manager OnDemand.
DOCROTATION           DOCUMENT_ROTATION               The rotation of the document as stored in Content Manager OnDemand.
AG_NAME               AGNAME                          The Content Manager OnDemand application group where the document is stored.
APPL_NAME             APPLNAME                        The Content Manager OnDemand application where the document is stored.
CODEPAGE              DOCUMENT_CODEPAGE               The code page of the document as stored in Content Manager OnDemand.
LINEDELIMITER         DOCUMENT_LINE_DELIMITER         The line delimiter of the document as stored in Content Manager OnDemand.
INPUTFILE             TXFRM_INPUT_FILE                The input file parameter to be used by the transform.
OUTPUTFILE            TXFRM_OUTPUT_FILE               The output file parameter that is used by the transform.

V9.5 enhancements

DOCUMENT_CC_ANSI      DOCUMENT_CC_ANSI                Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “ANSI” as stored in Content Manager OnDemand.
DOCUMENT_CC_MACHINE   DOCUMENT_CC_MACHINE             Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “Machine” as stored in Content Manager OnDemand.
DOCUMENT_CC_NONE      DOCUMENT_CC_NONE                Used instead of <CARRIAGECONTROL> to define the command-line option and value when the document’s CC Type is “None” as stored in Content Manager OnDemand.
RECORDFORMATFIXED     DOCUMENT_RECORDFORMAT_FIXED     Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Fixed” as stored in Content Manager OnDemand.
RECORDFORMATVARIABLE  DOCUMENT_RECORDFORMAT_VARIABLE  Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Variable” as stored in Content Manager OnDemand.
RECORDFORMATSTREAM    DOCUMENT_RECORDFORMAT_STREAM    Used instead of <RECORDFORMAT> to define the command-line option and value when the document’s RECFM is “Stream” as stored in Content Manager OnDemand.
PRMODENONE            DOCUMENT_PRMODENONE             Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “None” as stored in Content Manager OnDemand.
PRMODESOSI1           DOCUMENT_PRMODESOSI1            Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “SOSI1” as stored in Content Manager OnDemand.
PRMODESOSI2           DOCUMENT_PRMODESOSI2            Used instead of <PRMODE> to define the command-line option and value when the document’s PRMode is “SOSI2” as stored in Content Manager OnDemand.
TRC_YES               DOCUMENT_TRCYES                 Used instead of <TRC_EXIST> to define the command-line option and value when the document’s TRC is “Yes” as stored in Content Manager OnDemand.
TRC_NO                DOCUMENT_TRCNO                  Used instead of <TRC_EXIST> to define the command-line option and value when the document’s TRC is “No” as stored in Content Manager OnDemand.

Table 9-3 provides information about the OnDemand client HashMap keys that are used for
advanced Java implementation.

Table 9-3 OnDemand client hashmap key and descriptions


HashMap key         Description
TXFRM_RESP_DATA     This key is the HashMap key for the transformed data byte[] to be returned to ODWEK.
TXFRM_REQ_NAME      Name of the transform for this request.
TXFRM_REQ_METHOD    The method name that is used in the custom Java class. The transformData() method must exist in the client class.
TXFRM_REQ_DATA      The original Content Manager OnDemand document data that is contained in this request.
TXFRM_REQ_PROPS     The document details as specified or requested in the transform.xml file.
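Because the request and response are ordinary HashMaps keyed by the constants in Table 9-3 (plus TXFRM_REQ_RES, which Example 9-4 uses for resources), transform logic can be exercised without a running ODWEK. The following sketch assumes that approach; the key strings are placeholders standing in for the ODTransform constants, whose actual values are not shown here, so real code must use the constants themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness for trying transform logic without ODWEK. The key
// strings are placeholders for ODTransform.TXFRM_REQ_DATA,
// ODTransform.TXFRM_REQ_RES, and ODTransform.TXFRM_RESP_DATA.
final class TransformContractSketch {

    static final String REQ_DATA = "placeholder.req.data";
    static final String REQ_RES = "placeholder.req.res";
    static final String RESP_DATA = "placeholder.resp.data";

    // Same logic as Example 9-4: prepend the resources, if any, to the
    // document and return the result under the response key.
    static Map<String, Object> transformData(Map<String, Object> request) {
        byte[] doc = (byte[]) request.get(REQ_DATA);
        byte[] res = (byte[]) request.get(REQ_RES);
        byte[] out;
        if (doc != null && res != null) {
            out = new byte[res.length + doc.length];
            System.arraycopy(res, 0, out, 0, res.length);
            System.arraycopy(doc, 0, out, res.length, doc.length);
        } else {
            out = (doc != null) ? doc : res;   // nothing to concatenate
        }
        Map<String, Object> response = new HashMap<>();
        response.put(RESP_DATA, out);
        return response;
    }
}
```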



10

Chapter 10. Migration and expiring data and indexes
IBM Content Manager OnDemand (Content Manager OnDemand) provides multiple
methodologies for expiring report data (documents) and their indexes. In this chapter, we
describe the overall lifecycle of report data, including loading, storage, migration, and
expiration.

In this chapter, we cover the following topics:

• Introduction
• Loading and storing the data
• Configuring for migration and expiration
• Reloading data
• Expiration processing on Multiplatforms and z/OS
• Expiring data on OnDemand for i

© Copyright IBM Corp. 2003, 2015. All rights reserved. 219


10.1 Introduction
For this chapter, unless explicitly stated otherwise, the term “data” is used to refer to the
report data, the extracted documents or segments, and their related indexes and the
extracted resources.

A Content Manager OnDemand system logically stores data in application groups. An
application group is defined by the Content Manager OnDemand administrator. It consists of
data that has the same indexing, data storage, and expiration requirements. The application
group definition also specifies where the report and document data are stored, how long the
data is stored, and how the data expires. The method or methods that can be used to expire
the data are a function of the application group parameters that are defined before the data is
loaded into Content Manager OnDemand. In a Content Manager OnDemand system, data
typically goes through a lifecycle of loading, storing, migration, and an expiration process.

10.2 Loading and storing the data


The Content Manager OnDemand architecture allows the control and management of the
data throughout its lifecycle. The data lifecycle begins with running an efficient load process.
Each load process invocation ingests report data for a specified application group.

During a load process, Content Manager OnDemand stores report (document) data, its
resources, and index data, as shown in Figure 10-1.

[Figure: The load process writes report data (storage objects / documents) to a storage set
(cache directories 1 - n and/or storage nodes 1 - n under the storage manager) and writes
indexes to the application group data tables in the database (Segment-1 - Segment-n tables,
managed by the table manager).]
Figure 10-1 Data and index storage locations

The Content Manager OnDemand load process identifies, segments, and compresses
groups of documents into storage objects that are then stored in the Content Manager
OnDemand archive, as illustrated in Figure 10-1. To improve the efficiency of the storage
process, Content Manager OnDemand aggregates the stored documents (typically a few
kilobytes in size) into storage objects. This aggregation provides efficient, high-volume
storage, retrieval, and expiration performance.



The object size is defined by clicking Advanced on the Storage Manager tab of the
Application Group window. The object size is the size of a storage object in kilobytes (KB). By
default, Content Manager OnDemand segments and compresses report data into 10 MB
storage objects. For most use cases, the default value is appropriate. Valid values are
1 KB - 150 MB.

Object size value: Exercise caution when you change the object size value. Specifying
too large or too small a value can adversely affect performance when you load data.
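The documented limits can be restated in a small Java sketch that also estimates how many storage objects a load produces. It assumes 1 MB = 1024 KB, that the object size applies to the compressed data, and a simple ceiling division; the class is hypothetical, and actual object counts depend on how the load process segments documents.

```java
// Hypothetical restatement of the object size limits. Sizes are in KB, as in
// the Application Group window; 1 MB = 1024 KB is an assumption.
final class ObjectSizeSketch {

    static final long MIN_KB = 1;
    static final long MAX_KB = 150L * 1024;      // 150 MB
    static final long DEFAULT_KB = 10L * 1024;   // 10 MB

    static boolean isValid(long objectSizeKb) {
        return objectSizeKb >= MIN_KB && objectSizeKb <= MAX_KB;
    }

    // Rough number of storage objects for a given amount of compressed
    // document data (ceiling division); real loads can differ.
    static long storageObjects(long compressedKb, long objectSizeKb) {
        if (!isValid(objectSizeKb)) {
            throw new IllegalArgumentException("object size out of range: " + objectSizeKb);
        }
        return (compressedKb + objectSizeKb - 1) / objectSizeKb;
    }
}
```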

The storage objects are stored in storage sets. The storage sets contain one or more primary
storage nodes. The storage node points to the location where the data is stored, which can be
cache, the storage manager (Tivoli Storage Manager, object access method (OAM), or
Archive Storage Manager (ASM)), or a combination.

The primary storage nodes can be on one or more object servers. When the Load Type is
Local, Content Manager OnDemand loads data into the primary storage node that has the Load
Data property selected, on the object server where the data loading program runs. If the
Load Type is Local, and the storage set contains primary nodes on different object servers,
you must select the Load Data check box for one primary node on each object server.
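The Load Data rule for Local loads lends itself to a simple consistency check. The Java sketch below is hypothetical (OnDemand exposes no such class) and reads "one primary node on each object server" as exactly one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical check for Load Type = Local: every object server that hosts a
// primary node in the storage set needs exactly one Load Data node.
final class LoadDataCheck {

    static final class PrimaryNode {
        final String name;
        final String objectServer;
        final boolean loadData;

        PrimaryNode(String name, String objectServer, boolean loadData) {
            this.name = name;
            this.objectServer = objectServer;
            this.loadData = loadData;
        }
    }

    // Object servers without exactly one Load Data node; an empty list means
    // the storage set satisfies the rule.
    static List<String> serversInViolation(List<PrimaryNode> primaryNodes) {
        Map<String, Integer> loadDataNodes = new TreeMap<>();   // sorted for stable output
        for (PrimaryNode node : primaryNodes) {
            loadDataNodes.merge(node.objectServer, node.loadData ? 1 : 0, Integer::sum);
        }
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : loadDataNodes.entrySet()) {
            if (entry.getValue() != 1) {
                violations.add(entry.getKey());
            }
        }
        return violations;
    }
}
```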

The storage set must support the number of days that you plan to maintain reports in the
application group. For example, if you must maintain reports in archive storage for seven
years, the storage set must identify a storage node (or migration policy on an IBM i server)
that is maintained by ASM for seven years.

A detailed description of adding storage sets and storage nodes is in Chapter 5, “Storage
management” on page 89 and the related OnDemand Administrative Guide.

10.2.1 Storing the report (document) data


To improve efficiency and scalability, stored documents are embedded within storage objects.
The storage objects are then stored in cache or a storage manager (OAM, Tivoli Storage
Manager, or ASM). The storage objects are eventually expired from the system based on
values that are defined by the Content Manager OnDemand administrator. In this section, we
describe each scenario and how it is implemented. The parameters that are described in this
section are on the Storage Manager tab of the Application Group window unless otherwise
specified.

Three sets of data are stored when you load a report:

• Index data, which is extracted by the indexing program and used by the search process
• Resources, such as overlays and fonts, which are used to customize the viewed data
• Documents (or report segments) that will be viewed

Figure 10-2 on page 222 shows the datasets and illustrates four scenarios of their storage
and expiration.

Chapter 10. Migration and expiring data and indexes 221


[Figure: The load process stores indexes in the application group data tables in the CMOD
database and stores report data (documents and resources) in storage objects, in one of four
ways:
1. Storage object -> cache -> expire in 5 years
2. Storage object -> cache -> migrate in 90 days -> storage manager -> expire in 5 years
3. Storage object -> storage manager -> expire in 5 years
4. Storage object -> cache (expire after 90 days) and storage manager (expire in 5 years)]

Figure 10-2 Data, resource, and index storage and expiration scenarios

Scenario 1: Cache only, then expiration


In this scenario, the storage object is stored to cache only and it is expired from cache after a
predetermined period. Typically, this methodology is employed under the following
circumstances:
• The life of the data is short, and hierarchical storage management (HSM) is not necessary.
• The life of the data is long, and a backup process exists for the data in cache.
• The cache device is large enough to hold the total archived data, and the cache device is
reliable and performs well.

This method is enabled by selecting a cache-only storage set and entering a number in the
Cache Data for __ Days field.

When you select a cache-only storage set, Content Manager OnDemand automatically sets
Migrate Data from Cache to No and sets the Expire in __ Days field to the same value as the
Cache Data for __ Days field. (The default value is 90 days.)

Selecting a cache-only storage set requires the creation of backup and data management
systems that are external to the Content Manager OnDemand system.

Cache-only storage: If the storage set contains cache-only storage nodes, ensure that
the Cache Data value and the Life of Data and Indexes value are the same. Otherwise, the
add or update operation cannot be completed.
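A minimal sketch of the cache-only rules as a validation routine follows. It only restates the constraints above; the administrative client performs the real check, and the class, method, and parameter names are hypothetical.

```java
// Hypothetical restatement of the cache-only rules. Returns null when the
// settings are consistent, otherwise a message describing the problem.
final class CacheOnlyCheck {

    static String validate(boolean cacheOnlyStorageSet, int cacheDataDays,
                           boolean migrateDataFromCache, int lifeOfDataDays) {
        if (!cacheOnlyStorageSet) {
            return null;   // these rules apply only to cache-only storage sets
        }
        if (migrateDataFromCache) {
            return "Migrate Data from Cache must be No for a cache-only storage set";
        }
        if (cacheDataDays != lifeOfDataDays) {
            return "Cache Data (" + cacheDataDays + " days) must equal Life of Data and Indexes ("
                    + lifeOfDataDays + " days)";
        }
        return null;
    }
}
```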

Scenario 2: Cache, then migration to storage, and then expiration


In this scenario, the storage object is first stored to cache for a short period, after which it is
migrated to a storage manager for long-term storage.



Typically, this methodology is employed under the following circumstances:
• Most of the data access occurs during the initial period. After that period, the data is
infrequently accessed, if ever. So, after this initial period, the data is migrated to the
storage manager.
• Retrieving the data from cache can be faster than retrieving it from the storage manager.
This advantage can occur if the storage manager is on a device that is separate from the
Content Manager OnDemand object server, or if the storage manager is local but the storage
device is relatively slow, such as tape or an optical disk.

Migrating data from cache


This function, which can be accessed by clicking Advanced on the Storage Manager tab of
the Application Group window, determines how long the data is kept in cache before it is
migrated to archive storage (on a potentially slower archive storage device).

The data needs to be kept on a high-performance storage device for the period during which
it is retrieved frequently. The storage set must support the type of media that is required to
hold reports that are stored in the application group. For example, if you must maintain
reports in cache storage for 90 days and in archive storage for seven years, the storage set
must identify a storage node (or migration policy) that causes ASM to maintain the data for
seven years, and you must select Cache Data for __ Days and enter 90 in its field.

From a user’s perspective, no procedural difference exists in retrieving the data from either
cache or archive storage. The only user-perceivable difference is the response time. Various
archive storage mechanisms provide different performance profiles. For example, when you
use OAM and the data is stored in DB2 tables on disk, the response time is as fast as the
cache response time. The main difference in response time is based on the type of disk that is
used by either method. Conversely, if the OAM data is stored on optical disks or tape, the
response time increases dramatically. If you use a network-attached Tivoli Storage
Manager server, the retrieval rates (throughput and response times) are governed by the
Tivoli Storage Manager device and the TCP/IP connection to that device.

Typically in a z/OS environment, data is not stored in cache. Content Manager OnDemand for
z/OS customers typically use OAM as their storage manager. OAM supports storing the data
directly in DB2 where the storing and retrieval rates are exceptionally fast, which eliminates
the need to maintain and monitor cache file systems in the z/OS file system (zFS) or the
hierarchical file system (HFS).

Scenario 3: Storage manager only, then expiration


The storage object is stored directly to the storage manager. Typically, this methodology is
employed under the following circumstances:
• The performance of the storage manager equals the performance of the local file system,
which implies that the storage manager stores data to a relatively fast device, such as
local disk.
• Hierarchical storage management is beneficial. An example is z/OS systems where
storing directly to OAM is the most popular solution.

If you do not need to maintain reports in cache storage, select a storage set that identifies a
storage node (or migration policy) that is maintained by ASM and set Cache Data to No.
Content Manager OnDemand automatically sets Migrate Data from Cache to When Data is
Loaded.



Scenario 4: Both cache and storage manager, then expiration
The storage object is stored directly to both cache and the storage manager. After a short
period, the data is expired from cache. Then, after a much longer period, the data is expired
from the storage manager. Typically, this methodology is used under the following
circumstances:
• The cache file system allows more efficient data retrieval.
• The data needs to be kept for a longer period.
• The hierarchical storage management (or other features) of the storage manager is
required.

The Cache data field determines whether Content Manager OnDemand stores data in cache
storage. If the storage set is a cache-only storage set, Yes is the only selection. If the storage
set is an archive manager-controlled storage set (OAM, Tivoli Storage Manager, or ASM), you
can optionally also store the data in cache.

Note: Whether the data is stored in cache or in a storage manager, the main performance
differences are a result of the following items:
• The hardware speed (and I/O channels and interfaces) on which the data is stored.
• The location of the hardware device in relation to the object server.
If the hardware device connects over a TCP/IP link, that link can form a bottleneck,
depending on the link’s throughput and the required data retrieval rate.
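To compare the four scenarios side by side, the following Java sketch reports where a storage object resides a given number of days after it is loaded. It is a simplified, hypothetical model: it assumes that migration happens exactly when the cache period ends, treats the Life of Data and Indexes boundary day as expired, and ignores the fact that data is removed only when expiration processing actually runs.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the four storage and expiration scenarios.
final class StorageLifecycle {

    enum Location { CACHE, STORAGE_MANAGER }

    // cacheDays: days kept in cache (0 = not cached)
    // migrateFromCache: data moves from cache to the storage manager (scenario 2)
    // storeInStorageManagerAtLoad: stored in the storage manager at load time (scenarios 3 and 4)
    static Set<Location> locations(int daysSinceLoad, int cacheDays, boolean migrateFromCache,
                                   boolean storeInStorageManagerAtLoad, int lifeDays) {
        Set<Location> where = EnumSet.noneOf(Location.class);
        if (daysSinceLoad >= lifeDays) {
            return where;                    // expired everywhere
        }
        boolean inCachePeriod = daysSinceLoad < cacheDays;
        if (inCachePeriod) {
            where.add(Location.CACHE);
        }
        if (storeInStorageManagerAtLoad || (migrateFromCache && !inCachePeriod)) {
            where.add(Location.STORAGE_MANAGER);
        }
        return where;
    }
}
```

With the values from Figure 10-2 (90 days in cache and a 5-year life, taken here as 1825 days), scenario 2 is locations(day, 90, true, false, 1825) and scenario 4 is locations(day, 90, false, true, 1825); on day 100, both return only STORAGE_MANAGER.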

10.2.2 Storing the index data


The Content Manager OnDemand load process extracts document indexes from the report
data and stores the indexes in the Content Manager OnDemand database application group
data tables. With these indexes, users can efficiently locate, select, and retrieve documents.
Typically, indexes are expired when the document data is expired.

Each application group is segmented into multiple physical tables by using a date or a date
and time field. The size of each physical table is determined by the Max rows setting. Each
row in the table contains a set of user-defined and system-defined indexes that enable the
search for a report segment or a document. Index data is loaded into a table. When the Max
rows value is reached, the table is closed and a new table is created. The number of physical
tables that represent an application group might grow from 1 to n.
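The Max rows behavior can be illustrated with a short simulation. The sketch is simplified and hypothetical: it treats index rows as one stream and lets a load's rows continue into the next table, which might not match how Content Manager OnDemand handles a load that crosses the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of Max rows: rows fill the open table; when
// it reaches Max rows, it is closed and a new table is opened.
final class IndexTableSegments {

    // Returns the number of rows in each physical table after the given
    // loads (row counts) are processed in order.
    static List<Long> tableSizes(long maxRows, long... loadRowCounts) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        List<Long> tables = new ArrayList<>();
        long open = 0;                       // rows in the table that is open for loading
        for (long rows : loadRowCounts) {
            long remaining = rows;
            while (remaining > 0) {
                long take = Math.min(maxRows - open, remaining);
                open += take;
                remaining -= take;
                if (open == maxRows) {       // Max rows reached: close this table
                    tables.add(open);
                    open = 0;                // and start a new one
                }
            }
        }
        if (open > 0) {
            tables.add(open);                // last table, still open
        }
        return tables;
    }
}
```

For example, with Max rows set to 1,000,000, loads of 600,000, 600,000, and 900,000 rows produce two full tables and a third table with 100,000 rows that is still open for loading.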

10.2.3 Storing the resource data


If data caching is enabled, Content Manager OnDemand stores resources in the cache. Two
locations on the Storage Management tab affect how resources are stored:
• Resource Data
• Document Data



Resource Data
The following selections are possible for Resource Data:
• Always Maintain in Cache: The resource data stays in cache forever, and it does not
expire.
• Cache Resource Data for xxx Days: The resource data stays in cache for xxx number of
days before the data expires.
• Restore Resources to Cache: When resource data that is not in cache is requested, the
resources are restored to cache from the storage manager.

The ARSLOAD program saves one copy of a resource on each node for each application group.
The resource can be stored multiple times, depending on how the ARSLOAD program compares
the data. The ARSLOAD program compares the last 50 resources against the resource that is
generated by the load. If a match is not found, a new resource is stored.
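The comparison against recently stored resources can be sketched as a bounded window. This Java class is hypothetical, not ARSLOAD's implementation; in particular, comparing full byte contents is an assumption about what "match" means.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch: a new resource is compared with the last 50 stored
// resources and is stored only when no match is found.
final class ResourceDeduplicator {

    static final int WINDOW = 50;
    private final Deque<byte[]> recent = new ArrayDeque<>();   // newest first
    private int stored = 0;

    // Returns true if the resource had to be stored as a new copy.
    boolean offer(byte[] resource) {
        for (byte[] r : recent) {
            if (Arrays.equals(r, resource)) {
                return false;                // match found: reuse the existing copy
            }
        }
        stored++;
        recent.addFirst(resource.clone());
        if (recent.size() > WINDOW) {
            recent.removeLast();             // only the last 50 are compared
        }
        return true;
    }

    int storedCount() {
        return stored;
    }
}
```

Because only a fixed window is compared, a resource that reappears after 50 other resources were stored is stored again, which is one way the same resource can end up stored multiple times.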

When the ARSLOAD program processes a resource group file, it checks the resource identifier
to determine whether the resource is present on the system.

If the storage node identifies a client node in OAM or Virtual Storage Access Method (VSAM),
the storage manager copies the resources to archive storage.

Document Data
For Document Data, the following selections are valid:
• Yes for Cache Data: You can cache document data and resource data or only resource
data.
• No Cache: Document data is not stored in cache.
• Cache Document Data for xxx Days: Document data is stored in cache for xxx number of
days before the data expires.

10.3 Configuring for migration and expiration


Many customers choose to expire their document data and indexes somewhere in the range
of 5 - 10 years. At one extreme, document and index data might expire daily. At the other
extreme, document and index data might never expire.

Four typical lifecycle scenarios are common. The Content Manager OnDemand administrator
selects the scenario to implement through various parameters (as shown in this section),
which are on the Storage Management tab of the Application Group window. The four
scenarios are illustrated in Figure 10-2 on page 222.

10.3.1 Migrating index data


Index migration is the process by which Content Manager OnDemand moves index data from
the database to archive storage. Index migration optimizes database storage space. With
index migration, you can maintain index data for a long time. You typically migrate index data
only after users no longer need to access the reports. However, for legal or other
requirements, you often must maintain data for a number of months or years.



If a user queries the index data that was migrated, an administrator must act to import a copy
of the migrated table or tables by running ARSADMIN (Multiplatforms or z/OS) or Start Import
into Content Manager OnDemand (STRIMPOND) on IBM i. Content Manager OnDemand keeps the
imported index data in the database for the number of days that is specified in the Keep
Imported Migrated Indexes field, and then deletes it from the database.

Migration of indexes
This configuration is set up by clicking Advanced on the Storage Manager tab of the
Application Group window.

This field determines when Content Manager OnDemand migrates index data to archive
storage. Choose from No Migration or Migrate After __ Days. As a preferred practice, do not
migrate indexes to archive storage. Indexes that are migrated cannot be searched until after
they are imported by an administrator. Use this capability only under limited circumstances.

Closing index tables: Before you can migrate index data, the index tables must be closed.
The following Database Organization field options are valid:
• If the Database Organization field for the application group is set to Single Load per
table, this option is no longer supported.
• If the Database Organization field for the application group is set to Multiple Loads per
table, the index table is closed when the Maximum Rows value is reached.
• The Single table for all loads option is available for Content Manager OnDemand for
z/OS and Content Manager OnDemand for IBM i. Select the Single table for all loads
check box if you want to create one database table for each application group. This
option is most frequently used when you load a small amount of data. If you select this
option, the Maximum Rows field in this window is removed.

To close a table to loading before the Maximum Rows value is reached, run the ARSTBLSP
program with the -a1 parameter.

The index data must be migrated only after users no longer need to access the data. If a user
must access data in the migrated tables, the process of importing the data into the database
requires administrator intervention, and usually results in a significant delay in completing the
query. Additional space is required in the database and temporary storage areas to import the
data.

To enable the migration of index data, you must define a storage set that identifies a storage
node that is maintained by ASM and update the System Migration application group to use
the storage set.

10.3.2 Expiring data and indexes


In all four of the storage and expiration scenarios, the index data is stored in the Content
Manager OnDemand database in application group data tables. Typically, these indexes are
expired when the document data is expired from the system.

Life of Data and Indexes field


This field determines when Content Manager OnDemand deletes documents, resources, and
index data from the application group.



The following options are valid:
• Never Expires: Content Manager OnDemand maintains the application group data
indefinitely.
• Expires in __ Days: After the data reaches this threshold, Content Manager OnDemand
can delete data from the application group the next time that ARSMAINT (with Content
Manager OnDemand for Multiplatforms or z/OS) or Disk Storage Management (DSM)
(with IBM i) is run. The default value is 2555 days (seven years). The maximum value that
you can use is 99999 days (273 years).

Note: If you plan to maintain application group data in archive storage, the length of
time that ASM maintains the data must be equal to or exceed the value that you specify
for the Life of Data and Indexes field.

Life of Data and Indexes can be used only if ARSMAINT (with Multiplatforms or z/OS) or
Disk Storage Management (DSM) (with IBM i) handles the expiration.
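The limits and the archive-retention requirement can be restated as a pair of checks. In this hypothetical sketch, a null life value stands for Never Expires, and the lower bound of 1 day is an assumption; the upper bound and the default come from the text.

```java
// Hypothetical restatement of the Life of Data and Indexes limits and the
// rule that ASM must keep data at least that long. null = Never Expires.
final class RetentionCheck {

    static final int DEFAULT_LIFE_DAYS = 2555;   // 7 x 365
    static final int MAX_LIFE_DAYS = 99999;      // about 273 years

    static boolean isValidLife(Integer lifeDays) {
        // The minimum of 1 day is an assumption.
        return lifeDays == null || (lifeDays >= 1 && lifeDays <= MAX_LIFE_DAYS);
    }

    // True if the archive retention (null = never expires) covers the life.
    static boolean archiveRetentionCovers(Integer lifeDays, Integer asmRetentionDays) {
        if (asmRetentionDays == null) {
            return true;                 // archive never expires the data
        }
        if (lifeDays == null) {
            return false;                // OnDemand keeps forever, archive does not
        }
        return asmRetentionDays >= lifeDays;
    }
}
```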

10.3.3 Expiring document data


Document data expiration is affected by the document expiration type.

Expiration type
The document expiration type determines how data is deleted from the application group. The
expiration type option is on the Storage Management tab of the Application Group window.

Four expiration types are valid:

• Load
• Storage Manager
• Segment
• Document

Expiration type: Load


When the expiration type is set to Load, the system deletes an input file (a load) from the
application group. Load is the default expiration type. The latest date value from the input file
and the Life of Data and Indexes field determine when the data is eligible to be deleted.
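The eligibility date for a load can be computed with java.time. The sketch below is hypothetical and assumes that a load becomes eligible on the day its latest date plus Life of Data and Indexes is reached; the data is removed only the next time ARSMAINT (or DSM on IBM i) runs.

```java
import java.time.LocalDate;

// Hypothetical sketch of the Load expiration rule. Treating the boundary
// day itself as eligible is an assumption.
final class LoadExpiration {

    static LocalDate eligibleOn(LocalDate latestDateInLoad, int lifeDays) {
        return latestDateInLoad.plusDays(lifeDays);
    }

    static boolean isEligible(LocalDate latestDateInLoad, int lifeDays, LocalDate today) {
        return !today.isBefore(eligibleOn(latestDateInLoad, lifeDays));
    }
}
```

Because the life is counted in days rather than years, the default of 2555 days is slightly shorter than seven calendar years when leap days fall in the period: a load whose latest date is 2015-01-31 becomes eligible on 2022-01-29, not 2022-01-31.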

Note: The application group must have an expiration type of Load if any of the following
circumstances are true:
 You use or plan to use the Enhanced Retention Management feature.
 You use or plan to use the full text search feature.
 You integrate or plan to integrate with IBM FileNet P8.

For application groups with expiration types of Document, Segment, or Storage Manager,
utilities exist to convert these application groups to Load.

Consider engaging IBM Lab Services to provide these services.

With Content Manager OnDemand for Multiplatforms or z/OS, when the expiration type is set
to Load, if your object server is on z/OS and your storage manager is OAM, you can allow
OAM to handle the data expiration and Content Manager OnDemand to handle the index
expiration by using the ARSEXOAM program.

Chapter 10. Migration and expiring data and indexes 227


With Content Manager OnDemand for i, when the expiration type is set to Load, you can still
allow ASM to handle the data and index expiration by creating an expiration level in the
migration policy.

Expiration type: Storage Manager (z/OS)


The storage manager (OAM or VSAM) determines when data is deleted from the system.
Storage Manager expiration works with either the ARSEXPIR program or the ARSEXOAM program.

For more information about how to configure the system to use the ARSEXPIR and ARSEXOAM
programs, see the IBM Content Manager OnDemand for z/OS Administration Guide:
http://www.ibm.com/support/knowledgecenter/SSQHWE_9.0.0/com.ibm.ondemand.administeringzos.doc/aboutpub.htm?cp=SSQHWE_9.0.0%2F7-0

Storage Manager expiration is supported only on Content Manager OnDemand for z/OS
systems.

Expiration type: Segment


The system deletes a segment (table) of data from the application group. The system can
delete a segment of data only after the segment is closed and every record in the segment
reaches its expiration date.

With Multiple Loads per Database Table enabled, the system uses the maximum number of
rows to determine when to close a table. A segment likely contains the data from more than
one input file. If the Maximum Rows setting is too large, the segment is not expired until all of
the documents in the table reach their expiration dates. If the Maximum Rows setting is too
small, segments are created constantly and potentially deleted (based on the expiration
date). The resulting large number of tables degrades performance both when queries run and
when expiration runs.

The system derives the expiration date from the Segment field (or the date that the data was
loaded, if there is no Segment field) and the Life of Data and Indexes field. If the Segment
field contains a date in the MMYY format, data is eligible to be deleted on the first day of the
month (MM).
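The two conditions for dropping a segment (the table is closed, and every row has reached its expiration date) can be expressed as a small Python sketch. The `Segment` class and the function name are hypothetical. Each row's date stands for the Segment field value, or the load date when no Segment field exists, and the sketch assumes that eligibility is that date plus the Life of Data and Indexes value.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Segment:
    """Hypothetical model of one application group data table."""
    closed: bool
    row_dates: list[date] = field(default_factory=list)  # Segment field (or load date) per row


def segment_is_expirable(seg: Segment, life_days: int, today: date) -> bool:
    # An open table is never dropped, no matter how old its rows are.
    if not seg.closed or not seg.row_dates:
        return False
    # Every row must have expired, so the newest row decides.
    newest = max(seg.row_dates)
    return today >= newest + timedelta(days=life_days)


seg = Segment(closed=True, row_dates=[date(2020, 1, 1), date(2020, 6, 1)])
print(segment_is_expirable(seg, 100, date(2020, 9, 8)))  # False
print(segment_is_expirable(seg, 100, date(2020, 9, 9)))  # True
```

With a large Maximum Rows value, `row_dates` spans a wider range of dates, so the newest row keeps the whole table longer, which is the trade-off that is described for the Maximum Rows setting.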

To specify the Segment field, complete the following steps:


1. Click the Field Information tab.
2. Select a date or date and time field.
3. Select the Segment check box.

Expiration type: Document


When the expiration type is set to Document, the system deletes a document from the
application group. To determine when to delete a document, the system uses the value of the
Expire Date field and the Life of Data and Indexes field. If the Expire Date field contains only
the month and year (MMYY format), the system deletes documents on the first day of the
month (MM).
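The MMYY rule can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, mapping a two-digit year to 20YY is an assumption, and the sketch assumes that the Life of Data and Indexes value is added to the Expire Date value. Consult the product documentation for the exact calculation.

```python
from datetime import date, timedelta


def parse_expire_date(value: str) -> date:
    """Accept an ISO date (YYYY-MM-DD) or an MMYY value.
    An MMYY value maps to the first day of month MM."""
    if len(value) == 4 and value.isdigit():
        month, yy = int(value[:2]), int(value[2:])
        return date(2000 + yy, month, 1)  # assumption: YY means 20YY
    return date.fromisoformat(value)


def document_is_expirable(expire_field: str, life_days: int, today: date) -> bool:
    """Sketch: each document is checked on its own, unlike Load or Segment."""
    return today >= parse_expire_date(expire_field) + timedelta(days=life_days)


print(parse_expire_date("0321"))  # 2021-03-01
print(document_is_expirable("0321", 0, date(2021, 2, 28)))  # False
```

Because every document is evaluated individually, this type of check runs once per document, which is consistent with the performance note that follows.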

To specify the Expire Date field, complete the following steps:


1. Click the Field Information tab.
2. Select a date or date and time field.
3. Select the Expire Date check box.

Performance note: Individual document deletion is the most costly type of deletion in
terms of processor consumption and run time.

228 IBM Content Manager OnDemand Guide


10.3.4 Expiring annotations
Annotations for all application groups are kept in a single application group data table, which
allows the expiration of annotations to be controlled at a system-wide level. The Life of
Annotations field is on the General tab of the System Parameters. Annotations can be set to
never expire or to expire after N days. After N days pass and ARSMAINT is run, Content
Manager OnDemand removes the annotation.

10.4 Reloading data


If you are migrating data by unloading and then reloading the data, you need to determine
your future expiration policy.

Reloading to change the expiration type


For example, if your current expiration type is Storage Manager but you later want to place
holds on the data, change the expiration type from Storage Manager to Load during the
migration process (when you create the application group and before you load any data).

When you use the Enhanced Retention Management feature with Content Manager
OnDemand or IBM Enterprise Records (formerly IBM FileNet Records Manager), Content
Manager OnDemand must be in complete control of expiration processing. Therefore, if you
are using Tivoli Storage Manager or OAM, you must disable the ability for either of these
storage managers to expire data.

Also, you can use Enhanced Retention Management and Content Federation Services for
Content Manager OnDemand only with application groups with an expiration type of Load. For
those application groups with expiration types of Document, Segment, or Storage Manager,
utilities exist to convert these application groups to an expiration type of Load.

Consider engaging IBM Lab Services to provide these services.

Reloading ad hoc stored documents


If you do not take advantage of the ability of Content Manager OnDemand to aggregate
documents, and instead load documents ad hoc by using the storeDocument Java API, the
StoreDoc Object Linking and Embedding (OLE) API, or CommonStore, you must migrate the
data later.

Without aggregation into 10 MB storage objects, you might end up with millions of small
objects in your storage manager, which can cause performance problems when the storage
manager migrates these small objects to tape.

Note: Consider aggregating these smaller objects into larger objects for performance
reasons.

To aggregate these small objects into larger objects after they are stored individually, you
must retrieve them and reload them as larger objects. You might want to engage IBM Lab
Services to assist you with this task.
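To see why aggregation matters, the following Python sketch packs individually stored documents into storage objects of about 10 MB, starting a new object whenever the next document would overflow the current one. This greedy packing is an illustrative assumption, not the algorithm that Content Manager OnDemand uses, and 10 MB is interpreted here as 10 MiB.

```python
TARGET_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def aggregate(doc_sizes: list[int], target: int = TARGET_BYTES) -> list[list[int]]:
    """Greedy sequential packing of document sizes into storage objects."""
    objects: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for size in doc_sizes:
        # Close the current object if this document would push it past the target.
        if current and current_size + size > target:
            objects.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        objects.append(current)
    return objects


# 3,000 documents of 20 KiB each: 3,000 stored objects become 6.
print(len(aggregate([20 * 1024] * 3000)))  # 6
```

A document that is larger than the target still becomes a storage object on its own.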

Another option is to not migrate objects to tape, but to use another random access hardware
device instead.

Chapter 10. Migration and expiring data and indexes 229


10.5 Expiration processing on Multiplatforms and z/OS
This section describes the expiration process on Multiplatforms and z/OS in detail.

10.5.1 Content Manager OnDemand expiration: ARSMAINT


The ARSMAINT program manages application group data in the Content Manager OnDemand
database and in cache storage.

You typically run the ARSMAINT program on a regular schedule to perform the following tasks:
 Migrate files from cache storage to archive storage.
 Delete files from cache storage.
 Optionally, migrate index data from the database to archive storage.
 Delete index data from the database.

The ARSMAINT program manages the application group data and the data that you stored in
cache by using the storage management values from the application groups that are defined
to the system.

Here are the storage management field values that are used:
 Life of Data and Indexes
 Length of Time to Cache Data on Magnetic
 Length of Time Before Copying Cache to Archive Media
 Length of Time Before Migrating Indexes to Archive Media
 Length of Time to Maintain Imported Migrated Indexes
 Expiration Type

10.5.2 Expiring indexes


The ARSMAINT program uses the Expiration Type field value to determine how to delete index
data from an application group. The ARSMAINT program can expire a table of application group
data at a time, a load at a time, or individual documents. Ensure that the ARSMAINT program
runs periodically (for example, daily) so that Content Manager OnDemand deletes
indexes and cache data (and the storage manager deletes archive data, if applicable). By
running the ARSMAINT program regularly, you ensure that the expired documents can no
longer be retrieved.

Additionally, you can start manual expiration processing by running the ARSMAINT program
from the command line. For example, to run expiration processing, run the following
command at the command line:
arsmaint -d

When the ARSMAINT program removes indexes, it saves the following message in the system
log:
“128 ApplGrp Segment Expire (ApplGrp) (Segment)”

One message is saved in the system log for each table that was dropped during expiration
processing.
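To check how many tables each expiration run dropped, you might count these messages per application group. The following Python sketch assumes that message 128 appears in exported system log text exactly as shown above, with the application group and segment names in parentheses. The real layout of the system log (and how you export it) might differ, and the sample names are hypothetical.

```python
import re
from collections import Counter

# Assumed rendering of message 128; adjust to your actual system log export.
MSG_128 = re.compile(
    r"^128 ApplGrp Segment Expire \((?P<appl_grp>[^)]*)\) \((?P<segment>[^)]*)\)"
)


def dropped_tables_per_group(log_lines: list[str]) -> Counter:
    """Count message-128 lines (one per dropped table) by application group."""
    counts: Counter = Counter()
    for line in log_lines:
        match = MSG_128.match(line.strip())
        if match:
            counts[match.group("appl_grp")] += 1
    return counts


sample = [  # hypothetical log lines
    "128 ApplGrp Segment Expire (BANKSTMT) (BAA1)",
    "128 ApplGrp Segment Expire (BANKSTMT) (BAA2)",
    "124 Filesystem Statistics (/arsacif/acif1) (63) (ODSERVER)",
    "128 ApplGrp Segment Expire (PAYROLL) (CAA1)",
]
print(dropped_tables_per_group(sample))  # Counter({'BANKSTMT': 2, 'PAYROLL': 1})
```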

230 IBM Content Manager OnDemand Guide


When to run the maintenance processes: Run most maintenance processes when no other
applications are updating the database or need exclusive access to it, and when you are
sure that no one is retrieving documents from the system. For example, you must not
perform maintenance on the database while you are loading data into the system.

The relationship between ARSMAINT and ARSSOCKD processing is illustrated in Figure 10-3.
[Figure: Life of Data and Indexes settings (determine which indexes and objects need to be expired); ARSMAINT (1. Expires data from cache, 2. Expires indexes, 3. Expires annotations); ARSSOCKD]

Figure 10-3 Relationship between ARSMAINT and ARSSOCKD programs

Collecting statistics
Content Manager OnDemand provides two programs to collect statistics on database tables:
the ARSDB program and the ARSMAINT program.

When you run the ARSMAINT program to collect statistics, it collects statistics on all of the
tables in the database that changed since the last time that you collected statistics. You can
automate the collection of statistics by scheduling the ARSMAINT program to run with the
appropriate options.

You can use the ARSDB program to collect statistics on the Content Manager OnDemand
system tables. The Content Manager OnDemand system tables include the user table, the
group table, and the application group table. For most systems, the Content Manager
OnDemand system tables require little maintenance. You can probably schedule the ARSDB
program to collect statistics once a month (or less often).

The ARSDB program has the following syntax:


/opt/IBM/ondemand/V9.0/bin/arsdb <options>

The options are as follows:


-e Drop configuration indexes.
-r Create configuration indexes.
-s Collect statistics.

System log messages


When you run the ARSMAINT program, it saves messages about its activities in the system log.
The types of messages that are saved in the system log depend on the options that you
specify when you run the ARSMAINT program.

The number of messages that are saved in the system log each time that expiration
processing runs depends on the following factors:
 The options that you specify for the ARSMAINT program
 The number of application groups that are processed
 The number of segments of data that are processed
 The number of cache storage file systems that are defined on the server

Chapter 10. Migration and expiring data and indexes 231


Note: You see one set of messages for each object server on which you run the ARSMAINT
program.

For example, when expiration processing starts on a specified server, you might see the
following message:
“109 Cache Expiration (Date) (Min%) (Max%) (Server)”

Cache expiration processing uses the specified date (the default is “today” in internal format).
Expiration processing begins on each cache file system whose used space exceeds the Max%
(default 80%) and ends when the used space in the file system falls below the Min% (default
80%).

One deletion message is saved for each storage object that is deleted from cache storage. A
storage object is eligible to be deleted when its “Cache Document Data for n Days” or “Life of
Data” period passes (whichever occurs first).
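The threshold and eligibility rules can be modeled in a short Python sketch. It reads both percentages as the share of file system space that is used, and it assumes that the oldest eligible objects are removed first. The deletion order, the class, and the function names are assumptions for illustration, not ARSMAINT internals.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CachedObject:
    name: str
    size: int
    stored: date
    cache_days: int  # "Cache Document Data for n Days"
    life_days: int   # "Life of Data"


def eligible(obj: CachedObject, today: date) -> bool:
    """Eligible when either period passes, whichever occurs first."""
    return today >= obj.stored + timedelta(days=min(obj.cache_days, obj.life_days))


def expire_cache(objects: list[CachedObject], capacity: int, today: date,
                 min_pct: float = 80.0, max_pct: float = 80.0) -> list[str]:
    """Return the names of the objects that this sketch deletes from one cache file system."""
    used = sum(o.size for o in objects)
    deleted: list[str] = []
    if used * 100 / capacity <= max_pct:
        return deleted  # file system does not exceed Max%: nothing to do
    candidates = sorted((o for o in objects if eligible(o, today)), key=lambda o: o.stored)
    for obj in candidates:  # assumption: oldest eligible objects first
        if used * 100 / capacity < min_pct:
            break  # used space fell below Min%: stop
        used -= obj.size
        deleted.append(obj.name)
    return deleted


objs = [
    CachedObject("A", 30, date(2020, 1, 1), 10, 100),
    CachedObject("B", 30, date(2020, 1, 2), 10, 100),
    CachedObject("C", 30, date(2020, 3, 1), 10, 100),
]
print(expire_cache(objs, capacity=100, today=date(2020, 2, 1)))  # ['A']
```

In the example, object C is not yet eligible, and processing stops after A because removing it brings the used space from 90% down to 60%.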

A storage deletion message looks similar to the following message:


“196 Cache Migration (ApplGrp) (ObjName) (Server)”

Also, information-only messages report the percentage of space that is used in the file
system.

An information message looks similar to the following message:


“124 Filesystem Statistics (filesystem) (% full) (server)”

Load table (ARSLOAD)


The ARSLOAD table can be used to track loads for expiration. This table maintains a record
of all successful loads to application groups with the “expire by load” expiration type.

10.5.3 Removing documents from the Tivoli Storage Manager archive


Removing a document from archive storage means that the backup (if the primary document
copy is in cache) or long-term copy (if the primary document copy is in archive) of the
document is deleted from the system. You remove documents from archive storage when you
no longer have a business or legal requirement to keep them.

A management class contains an archive copy group that specifies the criteria that make a
document eligible for deletion. Documents become eligible for deletion under the following
conditions:
 Administrators delete documents from client nodes
 An archived document exceeds the time criteria in the archive copy group (how long
archived copies are kept)

ASM does not delete information about expired documents from its database until expiration
processing runs. You can run expiration processing either automatically or manually by
command. Ensure that expiration processing runs periodically to allow ASM to reuse storage
pool space that is occupied by expired documents.

When expiration processing runs, ASM deletes documents from its database. The storage
space that these documents used to occupy then becomes reclaimable. For more
information, see “Reclaiming space in storage pools” on page 233.

232 IBM Content Manager OnDemand Guide


You control automatic expiration processing by using the expiration processing interval
(EXPINTERVAL) in the server options file (dsmserv.opt). You can set the option by editing the
dsmserv.opt file. For more information, see the Content Manager OnDemand Installation and
Configuration Guide:
http://www.ibm.com/support/knowledgecenter/

You can obtain more information in the “Running expiration processing automatically” section
at the following website:
http://ibm.co/1iO9SdX

If you use the server option to control when expiration processing occurs, ASM processes
expirations each time that you start the server. Afterward, it runs expiration processing at the
interval that you specified with the option, which is measured from the start time of the server.
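The timing that this paragraph describes can be sketched in Python. The function is hypothetical. It assumes that EXPINTERVAL is expressed in hours and treats a value of 0 as turning off automatic expiration processing, which is the setting used when EXPIRE INVENTORY is scheduled instead.

```python
from datetime import datetime, timedelta


def expiration_runs(server_start: datetime, expinterval_hours: int,
                    until: datetime) -> list[datetime]:
    """Times at which automatic expiration processing would run (sketch).

    Processing runs when the server starts and then every
    `expinterval_hours`, measured from the server start time.
    A value of 0 is treated as 'no automatic expiration processing'.
    """
    if expinterval_hours == 0:
        return []
    runs: list[datetime] = []
    t = server_start
    while t <= until:
        runs.append(t)
        t += timedelta(hours=expinterval_hours)
    return runs


start = datetime(2024, 1, 1, 6, 0)
for run in expiration_runs(start, 24, datetime(2024, 1, 3, 12, 0)):
    print(run)
```

With a scheduled EXPIRE INVENTORY command, EXPINTERVAL is set to 0, so this function returns no automatic runs and the schedule alone controls when expiration processing occurs.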

You can manually start expiration processing by running the EXPIRE INVENTORY command.
Expiration processing then deletes information about expired files from the database. You can
schedule this command by running the DEFINE SCHEDULE command. If you schedule the
EXPIRE INVENTORY command, set the expiration interval to 0 (zero) in the server options so
that ASM does not run expiration processing when you start the server. You can control how
long the expiration process runs by using the DURATION parameter with the EXPIRE INVENTORY
command.

Reclaiming space in storage pools


Space on a storage pool volume becomes reclaimable as documents expire or as they are
deleted from the volume. For example, documents become obsolete because of aging.

ASM reclaims the space in storage pools based on a reclamation threshold that you can set
for each storage pool. When the percentage of space that can be reclaimed on a volume rises
above the reclamation threshold, ASM reclaims the volume. ASM rewrites documents on the
volume to other volumes in the storage pool, making the original volume available for new
documents.

ASM checks whether reclamation is needed at least once each hour and begins space
reclamation for eligible volumes. You can set a reclamation threshold for each storage pool
when you define or update the storage pool.

During reclamation, ASM copies the files to volumes in the same storage pool unless you
specified a reclamation storage pool. Use a reclamation storage pool to allow automatic
reclamation for a storage pool with only one drive. See your ASM documentation for details.

After ASM moves all documents to other volumes, one of the following actions occurs for the
reclaimed volume:
 If you explicitly defined the volume to the storage pool, the volume becomes available for
reuse by that storage pool.
 If the volume was acquired as a scratch volume, ASM deletes the volume from its
database.
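The reclamation rule and the two outcomes for a reclaimed volume can be summarized in a Python sketch. The reclaimable percentage is simplified here to the share of the volume that no longer holds valid data, and the class and function names are hypothetical. The storage manager performs the actual calculation and copies the documents to other volumes; neither is modeled.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    capacity_mb: float
    valid_mb: float      # space still occupied by unexpired documents
    from_scratch: bool   # True if acquired as a scratch volume


def reclaimable_pct(v: Volume) -> float:
    """Simplified: share of the volume that no longer holds valid data."""
    return 100.0 * (v.capacity_mb - v.valid_mb) / v.capacity_mb


def reclaim(volumes: list[Volume], threshold_pct: float) -> dict[str, str]:
    """Return the outcome for each volume whose reclaimable space rises above the threshold."""
    outcome: dict[str, str] = {}
    for v in volumes:
        if reclaimable_pct(v) > threshold_pct:
            # Valid documents would be rewritten to other volumes here (not modeled).
            outcome[v.name] = ("deleted from database" if v.from_scratch
                               else "available for reuse")
    return outcome


pool = [
    Volume("VOL001", 100, 30, from_scratch=True),
    Volume("VOL002", 100, 50, from_scratch=True),
    Volume("VOL003", 100, 10, from_scratch=False),
]
print(reclaim(pool, threshold_pct=60))
# {'VOL001': 'deleted from database', 'VOL003': 'available for reuse'}
```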

Important: For more information about reclamation processing, including choosing a
reclamation threshold, reclaiming volumes in a storage pool with one drive, reclaiming
Write Once Read Many (WORM) optical media, reclaiming for copy storage pools, and
reclaiming offsite volumes, see your Tivoli Storage Manager documentation.

Chapter 10. Migration and expiring data and indexes 233


Managing Tivoli Storage Manager storage
For each automated library, Tivoli Storage Manager tracks in its volume inventory for the
library whether a volume has scratch or private status:
 A scratch volume is a labeled volume that is empty or contains no valid data, and it can be
used to satisfy any request to mount a scratch volume. To support Content Manager
OnDemand, you define scratch volumes to Tivoli Storage Manager. Tivoli Storage
Manager uses scratch volumes as needed, and returns the volumes to scratch when they
become empty (for example, when all data on the volume expires).
 A private volume is a volume that is in use or owned by an application, and it might contain
valid data. Volumes that you define to Tivoli Storage Manager are private volumes. A
private volume is used to satisfy only a request to mount that volume by name. When
Tivoli Storage Manager uses a scratch volume, it changes the volume’s status to private.
Tivoli Storage Manager tracks whether defined volumes were originally scratch volumes.
Volumes that were originally scratch volumes return to scratch status when they become
empty.
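The status transitions that are described in this list can be modeled as a small state machine. This Python sketch is hypothetical (the class and method names are invented for illustration) and tracks only the rules that are stated above.

```python
class LibraryVolume:
    """Hypothetical model of scratch/private status tracking for one volume."""

    def __init__(self, label: str, defined: bool) -> None:
        self.label = label
        # Volumes defined to Tivoli Storage Manager are private; others start as scratch.
        self.status = "private" if defined else "scratch"
        self.originally_scratch = not defined

    def mount_scratch(self) -> None:
        """Use this volume to satisfy a scratch mount request."""
        if self.status != "scratch":
            raise RuntimeError("a private volume is mounted only by name")
        self.status = "private"  # a used scratch volume becomes private

    def became_empty(self) -> None:
        """Called when all data on the volume has expired."""
        if self.originally_scratch:
            self.status = "scratch"  # returns to scratch; defined volumes stay private


vol = LibraryVolume("VOL001", defined=False)
vol.mount_scratch()
print(vol.status)  # private
vol.became_empty()
print(vol.status)  # scratch
```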

Secondary storage of storage volumes


For instructions that describe how to handle physical storage volumes and remove them from
the library, see the documentation that is provided by the library manufacturer.

For instructions about documentation that you might need to complete when you remove
storage volumes from a library and where to store them for safekeeping, see your
organization’s media storage guide.

Protecting data with data retention protection


To avoid the accidental erasure or overwriting of critical data, Content Manager OnDemand
supports the Tivoli Storage Manager APIs that relate to data retention. Data retention
protection prohibits the explicit deletion of documents until their specified retention criterion is
met. Although documents can no longer be explicitly deleted, they can still expire.

Important notes:
 Data retention protection is permanent. After it is turned on, it cannot be turned off.
 Content Manager OnDemand does not support the Tivoli Storage Manager deletion hold
feature, which prevents held data from being deleted until the hold is released.

Tivoli Storage Manager supports two retention policies:


 In creation-based retention, the policy becomes active when the data is stored (created)
on the Tivoli Storage Manager server. This policy is the default retention policy method
and it is used with normal backup/archive clients.
 In event-based retention, the policy becomes active when the client sends a retention
event to the Tivoli Storage Manager server. The retention event can be sent to the server
any time after the data is stored on the server. Until the retention event is received, the
data is stored indefinitely on the Tivoli Storage Manager server. For Content Manager
OnDemand, the retention event is the call to delete the data. A load, unload, application
group delete, or expiration of data triggers the retention event.

If you decide to use these policies in Tivoli Storage Manager, the Content Manager
OnDemand scenarios that are described in the rest of this section are supported.
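The difference between the two policies is when the retention clock starts. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, `None` stands for the NOLIMIT retention setting, and retention is simplified to a number of days.

```python
from datetime import date, timedelta
from typing import Optional


def expiration_date(policy: str, stored: date, retain_days: Optional[int],
                    event: Optional[date] = None) -> Optional[date]:
    """Return when an archived object expires, or None if it is kept indefinitely.

    policy: "creation" or "event"; retain_days=None represents NOLIMIT.
    """
    if retain_days is None:
        return None                   # NOLIMIT: the object never expires
    if policy == "creation":
        start = stored                # clock starts when the data is stored
    elif policy == "event":
        if event is None:
            return None               # no retention event yet: stored indefinitely
        start = event                 # clock starts at the retention event (OnDemand delete)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return start + timedelta(days=retain_days)


print(expiration_date("creation", date(2020, 1, 1), 365))                   # 2020-12-31
print(expiration_date("event", date(2020, 1, 1), 0))                        # None
print(expiration_date("event", date(2020, 1, 1), 0, event=date(2021, 5, 5)))  # 2021-05-05
```

With an event-based copy group that retains data for 0 days, the object becomes eligible as soon as Content Manager OnDemand sends the retention event, so Content Manager OnDemand effectively decides when the data expires.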

234 IBM Content Manager OnDemand Guide


Turning off data retention protection
When you turn off data retention protection, the following descriptions explain what happens
when you use the creation-based object expiration policy and the event-based retention
object expiration policy:
 Creation-based object expiration policy: Content Manager OnDemand issues a delete
object command through the Tivoli Storage Manager API. Objects are deleted during the
next inventory expiration. If a Content Manager OnDemand application group is deleted, a
delete filespace command is issued instead, and the objects are immediately deleted
with the file space.
 Event-based retention object expiration policy: Content Manager OnDemand issues an
event trigger command through the Tivoli Storage Manager API. The status of the
objects that are affected changes from PENDING to STARTED, and the objects are expired by
Tivoli Storage Manager based on their retention parameters. If the retention parameters
are set to NOLIMIT, the objects never expire. If a Content Manager OnDemand application
group is deleted, a delete filespace command is issued instead, and the objects are
immediately deleted with the file space.

Turning on data retention protection


When you turn on data retention protection, the following descriptions explain what happens
when you use creation-based object expiration policy and event-based retention object
expiration policy:
 Creation-based object expiration policy: Content Manager OnDemand issues no
commands to Tivoli Storage Manager. The objects are effectively orphaned by Content
Manager OnDemand and are expired by Tivoli Storage Manager based on their retention
parameters. If the retention parameters are set to NOLIMIT, the objects never expire.
 Event-based retention object expiration policy: Content Manager OnDemand issues an
event trigger command through the Tivoli Storage Manager API. The event status of the
objects that are affected is changed from PENDING to STARTED, and the affected objects
are expired by Tivoli Storage Manager based on their retention parameters. If the retention
parameters are set to NOLIMIT, the objects never expire.
If a Content Manager OnDemand application group is deleted, a delete filespace
command cannot be used with data retention protection; the operation is treated the same
as though a delete is indicated. The status of all of the affected objects is changed from
PENDING to STARTED, and the affected objects are expired by Tivoli Storage Manager
based on their retention parameters. This action leaves the file space entries in Tivoli
Storage Manager, so you must manually delete these entries when the file space is empty
(even with data retention protection on).
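The scenarios above can be condensed into a lookup function. This Python sketch is a summary aid only. The function name and return strings are hypothetical, NOLIMIT handling is omitted, and it reflects one reading of the text in which the application group deletion case with protection turned on applies to the event-based policy.

```python
def ondemand_action(protection_on: bool, policy: str, app_group_deleted: bool) -> str:
    """Summarize what Content Manager OnDemand sends to Tivoli Storage Manager
    when data is deleted (one simplified reading of the scenarios)."""
    if policy not in ("creation", "event"):
        raise ValueError(f"unknown policy: {policy}")
    if not protection_on:
        if app_group_deleted:
            return "delete filespace: objects deleted immediately with the file space"
        if policy == "creation":
            return "delete object: objects deleted at the next inventory expiration"
        return "event trigger: PENDING -> STARTED, expired per retention parameters"
    # Data retention protection is on.
    if policy == "creation":
        return "no command: objects orphaned, expired per retention parameters"
    if app_group_deleted:
        return ("event trigger for all objects; file space entries remain and "
                "must be deleted manually when empty")
    return "event trigger: PENDING -> STARTED, expired per retention parameters"


print(ondemand_action(protection_on=True, policy="event", app_group_deleted=True))
```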

Recommendations
Consider the following preferred practices when you work with data retention protection:
 Set up the application groups to expire by load.
 Define the Tivoli Storage Manager archive copy groups to be event-based, and retain data
for 0 days.
 Run the Tivoli Storage Manager inventory expiration regularly to ensure that expired data
is removed.

Chapter 10. Migration and expiring data and indexes 235


The following devices are supported by Content Manager OnDemand:
 IBM DR450 and DR550
These devices are disk-based systems that contain a Tivoli Storage Manager that runs
data retention protection.
 EMC Centera
This device is a disk-based system that is treated as a device by Tivoli Storage Manager.
Tivoli Storage Manager must run data retention protection.

10.5.4 Storage Manager-based expiration (z/OS only)


The ARSEXOAM and ARSEXPIR programs are used for storage manager-based expiration.

ARSEXOAM
The ARSEXOAM program is used to process the rows in the ARSOAM_DELETE table that
indicate that Content Manager OnDemand OAM objects expired and to remove the
associated table entries for those objects. This program works for z/OS only.

Figure 10-4 shows how the ARSEXOAM program deletes the index entries for object stores in
OAM.

Figure 10-4 How ARSEXOAM deletes index entries for object stores in OAM

236 IBM Content Manager OnDemand Guide


Notes:
 If one object for a load ID is deleted, all of the index entries for that load ID are deleted.
 Index entries of all OAM objects that are recorded as being deleted by rows in the
ARSOAM_DELETE table are deleted regardless of the settings in the Life of Data and
Indexes section on the Storage Management tab of the application group.
 If you plan to use Storage Management expiration, ensure that you set the expiration
type of all application groups to Storage Manager.
 The recommended expiration type for Content Manager OnDemand is Load. Content
Manager OnDemand supports the expiration type of Load with the use of ARSEXOAM for
expiring the indexes in Content Manager OnDemand.
 Storage Manager expiration is incompatible with Enhanced Retention Management and
Content Federation Services for Content Manager OnDemand.

The following parameters relate to the ARSEXOAM program:


 COMMITCNT
Specifies the number of fetches from the ARSOAM_DELETE, ARSOD, and ARSODIND
tables that are performed between COMMITS.
If this parameter is not specified, 1000 is used. If 0 is specified, no commits are performed
while fetching. The ARSOD and ARSODIND tables are processed only if Content
Manager OnDemand for OS/390 Version 2 migrated index rows are being deleted.
 UNLOADMAX
Specifies how many objects to hold in memory at any time. The default is 100,000.
 REQLIMIT
Specifies the maximum number of objects to send to the server in each request. This
number defaults to the ARS_EXPIRE_REQLIMIT parameter in the ars.cfg, or 100 if
ARS_EXPIRE_REQLIMIT is not specified. Load IDs for the same application group can be
grouped up to the ARS_EXPIRE_REQLIMIT value. All load IDs in a single expiration request
must belong to the same application group. For example, adding
ARS_EXPIRE_REQLIMIT=100 allows up to 100 load IDs for an application group to be
processed at a time. The optimum value to use is a function of multiple variables, including
table size. Suboptimal values might lead to table scans. Running EXPLAIN on the types of SQL
statements that are involved helps determine whether an index scan or a table scan occurs.
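The REQLIMIT grouping rule and the COMMITCNT behavior can be illustrated with a Python sketch. Everything here is hypothetical (the function names, the input shape, and the idea of deriving application group and load ID pairs from ARSOAM_DELETE rows). The sketch only mirrors the stated rules: one application group per request, at most REQLIMIT load IDs per request (default 100), and a commit every COMMITCNT fetches (default 1000, none if 0).

```python
from collections import defaultdict


def build_requests(expired: list[tuple[str, str]],
                   reqlimit: int = 100) -> list[tuple[str, list[str]]]:
    """Group expired load IDs into expiration requests.

    `expired` holds (application_group, load_id) pairs. A load ID is listed
    once even if several of its objects were deleted, because deleting one
    object of a load removes the index entries for the whole load.
    """
    by_group: dict[str, dict[str, None]] = defaultdict(dict)
    for group, load_id in expired:
        by_group[group][load_id] = None  # dict keeps first-seen order, drops duplicates
    requests: list[tuple[str, list[str]]] = []
    for group, load_ids in by_group.items():
        ids = list(load_ids)
        for start in range(0, len(ids), reqlimit):
            requests.append((group, ids[start:start + reqlimit]))
    return requests


def commit_points(total_fetches: int, commitcnt: int = 1000) -> list[int]:
    """Fetch counts after which a COMMIT is issued; 0 means no commits while fetching."""
    if commitcnt == 0:
        return []
    return list(range(commitcnt, total_fetches + 1, commitcnt))


pairs = [("AG1", "L1"), ("AG1", "L1"), ("AG1", "L2"), ("AG2", "L9"), ("AG1", "L3")]
print(build_requests(pairs, reqlimit=2))
# [('AG1', ['L1', 'L2']), ('AG1', ['L3']), ('AG2', ['L9'])]
print(commit_points(2500))  # [1000, 2000]
```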

ARSEXPIR
The ARSEXPIR program can be used to process System Management Facility (SMF) records
that indicate that Content Manager OnDemand objects expired and to remove the associated
index entries for those objects.

Figure 10-5 on page 238 illustrates two methods that the ARSEXPIR program uses to expire
OAM and VSAM objects.

Chapter 10. Migration and expiring data and indexes 237


Figure 10-5 Two ways ARSEXPIR expires OAM and VSAM objects

The ARSEXPIR program uses SMF type 65 (for VSAM objects) or SMF type 85 (for OAM
objects). The installation must collect and install ARSSMFWR as the CBRHADUX OAM
auto-delete exit. For more information, see “Deleting OAM and VSAM Objects” in the IBM
Content Manager OnDemand for z/OS: Administration Guide, SC19-1213.

ARSSMFWR determines which objects were deleted. The ARSEXPIR program then instructs the
Content Manager OnDemand server to remove the index entries.

Notes:
 If one object for a load ID is deleted, all of the index entries for that load ID are deleted.
 Index entries of all objects that are recorded as being deleted by the SMF records are
deleted regardless of the settings in the Life of Data and Indexes section on the Storage
Management tab of the application group. If you want to use Storage Management
expiration, ensure that you set the expiration types of all application groups to Storage
Manager.

238 IBM Content Manager OnDemand Guide


Important keywords that affect the expiration performance are COMMITCNT, REQLIMIT,
UNLOADMAX, and USERSMF:
 COMMITCNT
This keyword specifies the number of fetches from the ARSOD and ARSODIND tables that
are to be performed between COMMITS. If this number is not specified, 1000 is used. If
this number is 0, no commits are performed while fetching. This parameter is used only if
Content Manager OnDemand for OS/390 Version 2 migrated index rows are being
deleted.
 REQLIMIT
This keyword specifies the maximum number of objects to send to the server in each
request. The REQLIMIT keyword defaults to the ARS_EXPIRE_REQLIMIT parameter in the
ars.cfg, or 100 if ARS_EXPIRE_REQLIMIT is not specified.
 UNLOADMAX
Specifies how many objects to hold in memory at any one time. The default is 100,000.
 USERSMF
This keyword specifies the SMF record type that is written by the ARSSMFWR exit (if used).
This parameter can be omitted if ARSSMFWR is omitted. For more information about the
ARSSMFWR exit, see IBM Content Manager OnDemand for z/OS Configuration Guide,
SC19-3363.

10.6 Expiring data on Content Manager OnDemand for i


In most circumstances, you must run Disk Storage Management (DSM) and Archived Storage
Management (ASM) to expire data from Content Manager OnDemand for i.

10.6.1 Content Manager OnDemand expiration


Disk Storage Management (DSM) is the process for performing Content Manager OnDemand-based
expiration. DSM performs the following functions:
 Controls the expiration of indexes and data from Content Manager OnDemand (if you do
not use storage manager-based expiration).
 Migrates data from cache to the storage manager (if the Migrate Data from Cache option
is not set to When data is loaded).
 Expires data from cache if Cache Data is set to Yes.
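The conditions in this list can be restated as a Python sketch. The function and its parameters are hypothetical. Only the "When data is loaded" value of the Migrate Data from Cache option comes from the text; any other value that is passed in is a placeholder.

```python
def dsm_actions(storage_manager_expiration: bool,
                migrate_data_from_cache: str,
                cache_data: bool) -> list[str]:
    """Hypothetical summary of which DSM functions apply to an application group."""
    actions: list[str] = []
    if not storage_manager_expiration:
        actions.append("expire indexes and data")
    if migrate_data_from_cache != "When data is loaded":
        actions.append("migrate data from cache to the storage manager")
    if cache_data:
        actions.append("expire data from cache")
    return actions


print(dsm_actions(False, "placeholder: later", True))
```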

If you do not run DSM, your disk storage requirements for Content Manager OnDemand
might be higher than expected. The number of objects that are stored in the integrated file
system (IFS) might also be higher than necessary, which results in longer save and restore
times.

Note: If you have never run DSM, the first execution of the Start Disk Storage Management
(STRDSMOND) command might last for an extended period.

If you want to configure Content Manager OnDemand so that DSM is not required in the
future, see the section “Eliminating the need to run Disk Storage Manager (DSM)” in the latest
Content Manager OnDemand for i Common Server Administration Guide, SC19-2792.

Chapter 10. Migration and expiring data and indexes 239


10.6.2 Storage Manager expiration
ASM is the process for performing Storage Manager-based expiration. ASM performs the
following functions:
 Controls the expiration of indexes and data from Content Manager OnDemand (if you use
Storage Manager-based expiration)
 Aggregates data before it migrates it to archive media (if you select the Aggregation option
in the migration policy)
 Moves data between storage levels of the migration policy

If you do not run ASM, your disk storage requirements for Content Manager OnDemand are
probably higher than expected. The number of objects that are stored in the IFS is also higher
than necessary, which results in longer save and restore times.

If you have never run ASM, the first execution of the Start Archived Storage Management
(STRASMOND) command or the Start Disk Storage Management (STRDSMOND) command with
the STRASMOND parameter set to YES might last for an extended period.

For more information about expiring archives by using ASM, see Expiration processing in
Common Server Archive Storage Manager (ASM):
http://www.ibm.com/support/docview.wss?uid=swg21317082

240 IBM Content Manager OnDemand Guide

You might also like