IBM Content Manager OnDemand and FileNet-3
The security user exit runs the ARSUSEC program when a user attempts to log on to the
system. A sample C program is provided in the EXITS directory. To implement your own
security user exit program, add your specific code to the sample that is provided (for example,
you can call another program from the ARSUSEC program). For more information about
functions, parameters, and return codes, see the ARSCSXIT.H file. Then, compile the ARSUSEC
program and move or copy the executable program to the BIN directory. Then, restart the
library server to use the security user exit program.
The arsuperm (permissions exit) can be modified in the same way and needs to be placed in
the /opt/IBM/ondemand/V9.5/exits directory.
When you enable the exits to implement the required level or type of security, the user ID
must be defined for both TSO and Content Manager OnDemand.
[Figure: Security exit components on z/OS. User authentication (user login, add, delete, or update) is handled by the ARSUSEC DLL (exported entry point SECURITY), which is enabled by SRVR_FLAGS_SECURITY_EXIT=1. Resource authorization (access to folders, application groups, documents, and query SQL) is handled by the ARSUPERM DLL (exported entry point PERMEXIT), which is enabled by SRVR_FLAGS_FOLDER_APPLGRP_EXIT=1, SRVR_FLAGS_DOCUMENT_EXIT=1, and SRVR_FLAGS_SQL_QUERY_EXIT=1. Both DLLs reach a set of exit routines (for example, ARSUSECZ) through the exit routine driver ARSUSECX and the MVS Dynamic Exit Facility. ARSUSECA and ARSUSECH (or ARSUSECB) provide mappings of the data structures that are presented as input parameters to the exit routines. ARSUSECJ is the sample JCL stream that is used to assemble and bind ARSUSECX and ARSUSECZ.]
Table 6-2 lists the z/OS modules or executable files that ship with Content Manager
OnDemand.
Module     Description
ARSUPERM   This C module provides the interface between the Content Manager OnDemand
           system and the ARSUSECX module.
ARSUSEC    This C module provides the interface between the Content Manager OnDemand
           system and the ARSUSECX module.
ARSUSECA   The mapping of the data structure that is presented to the exit routine is associated
           with the exit point that is defined by ARSUSEC in assembler.
ARSUSECH   The mapping of the data structure that is presented to the exit routine is associated
           with the exit point that is defined by ARSUSEC in C.
ARSUSECJ   This sample JCL stream is for assembling and binding ARSUSECX and ARSUSECZ.
ARSUSECX   This interface module is for the MVS Dynamic Exit Facility.
All modules are in the SARSINST library. The sequence of this exit, using the MVS Dynamic
Exit Facility, is different from the classical interface with exit modules or a security exit in an
IBM CICS environment. The kernel code was updated to allow external security. The Content
Manager OnDemand kernel code calls a dynamic link library (DLL) as an interface to the exit.
Modules ARSUSEC and ARSUPERM are provided as C source code modules and as
executable files. You do not need to change and recompile them.
The source is delivered mainly for understanding the entire security system exit. If you want to
change the modules, they must be recompiled and bound as a C dynamic link library (DLL).
These modules communicate with the ARSUSECX module, which is an interface to the MVS
Dynamic Exit Facility. The security exit module ARSUSECZ is the delivered sample that
shows how to perform security checks with a System Authorization Facility (SAF) interface.
RACF is a program that uses SAF. ARSUSECH is a C header that maps the data
structure that is passed as input to every exit (such as ARSUSECZ). ARSUSECA provides the
same mapping in assembler language.
Note: More than one security exit can be defined to the MVS Dynamic Exit Facility. For
example, you can define a different security exit for each instance.
Tip: The only module that you must change is the provided source code ARSUSECZ to
meet your requirements. It must be assembled and linked into a library that is accessible
for the MVS Dynamic Exit Facility.
For example, if your folder permissions are stored in an external security system without any
SAF interface, this part must be updated to call this external security system.
Important: Even if the security exit can check the user ID and password against SAF or
other security systems, every user must be defined in Content Manager OnDemand in
every instance. To create users in batch mode, you can run the ARSXML program as a
command from the UNIX System Services command line with a file as input.
SRVR_FLAGS_SECURITY_EXIT=1
(This setting is the default for Content Manager OnDemand for i. If you do not want to use
IBM i security for the new instance, change the security setting to 0.)
Logon.
Changing the password.
Adding or deleting a user ID through the Content Manager OnDemand administrator interface.
Note: The sample processes the feedback of the exit one at a time, even if you are
running more than one exit.
In addition, you can use the following operator command to add the exit:
SETPROG EXIT,ADD,EXITNAME=ARS.SECURITY,MODNAME=ARSUSECZ
Important: The load module must be in a link pack area (LPA) or an LNKLST data set.
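[Figure: PassTicket flow. ARSPTGN generates a PassTicket, which can optionally be modified and returned. The utility logs on to the Content Manager OnDemand server, performs its function, and terminates. On the server, the security exit module (assembler), which is called through the MVS Dynamic Exit Facility, issues RACROUTE REQUEST=VERIFY through SAF to the security manager (RACF, ACF2, or Top Secret) and its security database.]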
To enable PassTicket in a security manager, such as RACF, you must complete the following
steps:
1. Activate the PTKTDATA class.
2. Define a secured sign-on application key for each application.
3. Run SETROPTS RACLIST(PTKTDATA).
The system log user exit allows access to all of these messages. The exit can then use these
messages for further processing. For example, an email can be generated when a load fails,
or when a user’s system access pattern is abnormal and requires attention. For more
information about the system log, see 11.4.1, “System log exit for Multiplatforms” on page 250
and 11.4.2, “System log exit for z/OS” on page 253.
6.8 Summary
Content Manager OnDemand provides a secure environment. Security features within
Content Manager OnDemand allow access control to the data and the APIs that access the
data. The data itself is controlled at rest and in motion (SSL). Additional exits that are external
to Content Manager OnDemand can be created that allow the creation of customized
extensions to the Content Manager OnDemand internal security.
The index values are text strings that occur in the documents, for example, “John Doe”, or
“Account 1234”. One or more index values identify a unique document in Content Manager
OnDemand.
An indexer extracts the index values and optionally stores them in the index file by examining
the documents and copying the index values into the index file according to criteria that are
specified by the user. Depending on the indexer that is used, the data and indexes are either
directly loaded into Content Manager OnDemand or are stored in a set of files that are
then read by the load process to store the data to Content Manager OnDemand. The indexer
creates the following files:
Output file (.out file extension), which contains the documents to load
Index file (.ind file extension), which contains the index values for the documents
The indexer might also create a resource file with a .res extension, which contains the
resources that are extracted from the documents.
Operationally, the loading process arsload calls the indexer that is specified on the Indexer
Information tab for the specified application. Depending on the indexer type, the indexer
performs one of the following tasks:
Creates a set of files that is then loaded by the arsload program into the Content Manager
OnDemand System
Directly passes the indexing and document information to the arsload program so that
they can be loaded into the Content Manager OnDemand System
On Content Manager OnDemand for i, arsload is embedded within the ADDRPTOND user
interface. Therefore, run the Add Report to Content Manager OnDemand (ADDRPTOND)
command instead of ARSLOAD.
It is possible for the indexing to complete successfully but for the load to fail. The most
common reasons for a load failure are the following:
Using insufficient system resources
Connecting to the wrong database
Extracting the wrong index value from the document
For information about investigating and resolving common load failures, see 18.1.2, “Indexing
and loading issues” on page 379.
You can choose to use either or both of these methods for your remote data loading.
Run arsafpd to determine the input data type of your file. Knowing the input data type
determines the indexer that you can use and also helps you determine several of the indexing
parameters that you need.
To run arsafpd from the command line, enter the following command:
arsafpd -s -i <input file>
Figure 7-1 on page 164 shows examples of running the arsafpd command and the output
that is produced.
arsafpd -s -i testfile.afp
ARS7104I Document type: AFP
ARS7107I Group TLE structured fields were encountered
arsafpd -s -i admin.pdf
ARS7104I Document type: PDF
Figure 7-1 Examples of running the arsafpd command and the output that is produced
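As a rough illustration of how the output in Figure 7-1 might be used in a script, the following Python sketch extracts the document type from captured arsafpd output. It assumes only the two message formats that are shown in Figure 7-1 (ARS7104I and ARS7107I). The function names are hypothetical, and other arsafpd messages are not handled.

```python
import re
from typing import Optional

# Matches message lines such as "ARS7104I Document type: AFP" (format from Figure 7-1).
DOC_TYPE = re.compile(r"^ARS7104I\s+Document type:\s*(\S+)", re.MULTILINE)


def document_type(arsafpd_output: str) -> Optional[str]:
    """Return the document type reported by 'arsafpd -s', or None if absent."""
    match = DOC_TYPE.search(arsafpd_output)
    return match.group(1) if match else None


def has_group_tles(arsafpd_output: str) -> bool:
    """Return True if the output contains the ARS7107I (Group TLE) message."""
    return any(
        line.startswith("ARS7107I") for line in arsafpd_output.splitlines()
    )


# Sample text copied from Figure 7-1; in practice this would be captured
# from the command, for example with subprocess.run(..., capture_output=True).
sample = (
    "ARS7104I Document type: AFP\n"
    "ARS7107I Group TLE structured fields were encountered\n"
)
print(document_type(sample), has_group_tles(sample))
```

A script might use the returned type to decide which indexer to configure, for example the PDF Indexer for PDF input.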
You can also run the arsafpd command to display the contents of an AFP document, index, or
resource file. For more information about ARSAFPD, see the Content Manager OnDemand
for Multiplatforms Administration Guide, SC19-3352.
Table 7-1 Indexers that are available for use with Content Manager OnDemand
Indexer  Input data type              Available platforms  Conversion   Resource collection  Large object support  Floating triggers
ACIF     Line, AFP                    All, except IBM i    Line to AFP  Yes                  Yes                   Yes
OS/400   Line, AFP, SCS, and SCS-Ext  IBM i                SCS to AFP   Yes                  Yes                   Yes
PDF is a data type or file format that is platform (hardware, operating system)-independent. A
PDF file contains a complete PDF document that is composed of text, graphics, and the
resources that are referenced by that document.
Secure PDF documents are not supported. PDF Digital Signatures are not supported. If a
PDF document contains a digital signature, after indexing, the .out file does not contain the
digital signature. To load a file that contains a PDF Digital Signature, create a generic index
file for it, and load the file as one document.
Fonts that are not members of the base 14 fonts might be embedded in the document, or they
might be stored in a font directory.
Images and bar code fonts are also embedded in the document.
The PDF Indexer collects resources, such as fonts and images, removes them from the
document, and places them in a resource file. The number of embedded fonts in the
document directly affects the size of the resource file.
Accessing fonts
If a document references fonts that are neither embedded nor available on the system, the
document does not display correctly in the report wizard, and the PDF Indexer cannot index
it. In the report wizard, the document might display as a series of dots instead of letters, and
the PDF Indexer fails with the “Trigger not found” message.
If your documents contain Asian fonts, ensure that you install them when you install Adobe
Acrobat.
If the fonts are not embedded in the document, use the FONTLIB parameter to tell the PDF
Indexer the location of font files.
Follow these steps to list the fonts in a PDF (for example, for Adobe Reader XI, version
11.0.3):
1. Display your PDF document in the Adobe viewer (or reader).
2. Click File → Document Properties → Fonts. You will see a list of fonts for the document.
The path to see the fonts might differ, depending on your viewer version.
When the input file is indexed, it is split into multiple PDF documents. Each PDF document
contains its own set of PDF structures that are required by the PDF architecture. For this
reason, the multiple PDF documents that are created by the indexing can be larger in total
than the original PDF document.
One way to reduce the size of the output file is to use the base 14 fonts.
Setting INDEXMODE=METADATA (for the application) causes the PDF Indexer to extract fields from
the Document Information Dictionary that correspond to the specific metadata keywords (if
they exist) and place the extracted values into the .ind file to load into Content Manager
OnDemand. The metadata keywords are as follows:
Title
Author
Subject
Creator
Producer
CreationDate
ModDate
Trapped
The main advantage of using metadata is the increased speed during the index process. The
main disadvantage of using this method is that each document needs to be loaded
individually; you cannot create large concatenated (multiple document) input data files.
For more information about using PDF metadata, see IBM Content Manager OnDemand -
Indexing Reference, SC19-3354.
If you plan to use the report wizard, you must first install Adobe Acrobat on the Windows
workstation from which you plan to run the Administrator Client. You must purchase Adobe
Acrobat from Adobe.
If you install the client after you install Adobe Acrobat, the installation program copies the
application programming interface (API) file to the Acrobat plug-in directory.
If you install the client before you install Adobe Acrobat, you must copy the API file to the
Acrobat plug-in directory manually.
If you upgrade to a new version of Acrobat, you must copy the API file to the new Acrobat
plug-in directory.
The example describes how to use the graphical indexer from the report wizard to create
indexing information for an input file. The indexing information consists of a trigger that
uniquely identifies the beginning of a document in the input file and the fields and indexes for
each document. We elaborate on this example by clarifying several of the instructions, and
throughout each step, we add important hints, tips, and explanations.
b. By using the mouse, draw a box around the text string. Start just outside of the
upper-left corner of the string. Click and then drag the mouse toward the lower-right
corner of the string. As you drag the mouse, the graphical indexer uses a dotted line to
draw a box. After you enclose the text string inside a box, release the mouse. The
graphical indexer highlights the text string inside the box. If the string is not highlighted,
try again and increase the box’s size.
Important: Size the box that you created around the text string, which you are trying
to collect, as large as possible to ensure that the field is collected at load time.
Figure 7-3 on page 171 shows an example of a box that is intended to capture the text
string Content. You can see that the box is much larger than the text string, and it
overlaps onto text that we do not want to collect. However, notice the Add a Trigger box
that is displayed; only the string Content is shown in the Value entry field, which means
that only the string Content is fully encapsulated in the box. Overlapping other text
might seem like an unnecessary precaution. However, when we are capturing data with
the PDF graphical indexer, it is an excellent way to ensure that we encapsulated all of
the text string that we must capture.
Important: Use the same principles for collecting fields as collecting the trigger text
string in step 8b on page 170. If the fields that must be collected are close together,
overlap them with adjacent fields to ensure that the box is as large as possible and
to ensure that the data is collected at load time.
Setting INDEXMODE=INTERNAL (for the application) causes the PDF Indexer to segment the
input file into the individual documents, gather the various PDF resources (fonts, images, and
forms), and then load the PDF indexes, documents, and resources into Content Manager
OnDemand.
For more information about using internal indexes (Page Piece Dictionary), see IBM Content
Manager OnDemand - Indexing Reference, SC19-3354.
ACIF accepts either line data or AFP as input and can produce three output files:
The output file, which is called the “out” file, is either line data or AFP.
The index file, which is called the “ind” file, is an AFP file.
The resource file, which is called the “res” file, is an AFP file.
A subset of the second mode is mixed mode input (line data records mixed with AFP
records). In this case, ACIF creates AFP output:
Specify the ACIF parameter CONVERT=YES.
ACIF creates an AFP resource file.
Files produced: .out, .ind, and .res.
For a description of the parameters, see the section “ACIF reference” in IBM Content
Manager OnDemand for Multiplatforms Indexing Reference, SC19-3354, or “ACIF reference”
in IBM Content Manager OnDemand for z/OS Indexing Reference, SC19-3368.
The arsafpd utility can display the .out file (if it is AFP), .ind file, and .res file that are
created by ACIF.
FILEFORMAT parameter
For AFP data, the FILEFORMAT parameter is not needed, unless the file is AFP in record
format. For a description of record format, see “AFP Structured Fields” on page 176.
Carriage controls
It is important to set the ACIF parameters CC and CCTYPE correctly. Table 7-2 describes the
ANSI carriage controls. The encoding columns show what you see if you look at the
document in a hexadecimal editor.
Because machine carriage controls are binary values, if a file contains them, it must always
be transferred as binary. Machine carriage controls cannot be converted to ASCII. For a list of
machine carriage control values, see the following website:
http://ibm.co/1M2ZtSG
For more information, see the Mixed Object Document Content Architecture (MO:DCA)
Reference, AFPC-0004-08, at the following website:
http://afpcinc.org/afp-publications/
The following two examples in hexadecimal of the AFP Structured Field Introducer show the
most common Structured Fields that you might see at the beginning of an AFP file:
5A 00 10 D3 A8 A8 00 00 00 Begin Document (BDT)
5A 00 5B D3 A8 C6 00 00 00 Begin Resource Group (BRG)
An AFP Structured Field can begin with the 2-byte length prefix (which is called record
format):
00 11 5A 00 10 D3 A8 A8 00 00 00
The length in the 2-byte prefix is one greater than the length in the Structured Field because
the 2-byte prefix includes the x'5A', but it does not include itself.
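The layout that is described above can be checked programmatically. The following Python sketch is a minimal illustration, not part of any Content Manager OnDemand tool. It parses the first Structured Field Introducer in a byte string, detects whether the 2-byte record-format prefix is present, and verifies that the prefix is one greater than the Structured Field length. The function and class names are hypothetical, and only the two identifiers from the examples above are mapped to names.

```python
from typing import NamedTuple

# Structured Field identifiers taken from the two examples in the text.
SF_NAMES = {
    bytes.fromhex("D3A8A8"): "BDT",  # Begin Document
    bytes.fromhex("D3A8C6"): "BRG",  # Begin Resource Group
}


class Introducer(NamedTuple):
    record_format: bool  # True if a 2-byte length prefix precedes x'5A'
    length: int          # length stored in the Structured Field itself
    name: str            # short name, or the raw hex identifier if unknown


def parse_introducer(data: bytes) -> Introducer:
    """Parse the first Structured Field Introducer found at the start of data."""
    if data[:1] == b"\x5a":
        record_format, start = False, 0
    elif data[2:3] == b"\x5a":
        record_format, start = True, 2
    else:
        raise ValueError("no x'5A' where a Structured Field should begin")
    if len(data) < start + 6:
        raise ValueError("data is too short to hold an introducer")
    length = int.from_bytes(data[start + 1:start + 3], "big")
    ident = data[start + 3:start + 6]
    if record_format:
        prefix = int.from_bytes(data[:2], "big")
        # The prefix counts the x'5A' byte but not itself.
        if prefix != length + 1:
            raise ValueError("record-format prefix does not match the length")
    return Introducer(record_format, length, SF_NAMES.get(ident, ident.hex().upper()))


print(parse_introducer(bytes.fromhex("5A 00 10 D3 A8 A8 00 00 00")))
print(parse_introducer(bytes.fromhex("00 11 5A 00 10 D3 A8 A8 00 00 00")))
```

The sketch inspects only the first few bytes, so a record-format file whose prefix happens to begin with x'5A' would be misclassified. For real files, arsafpd or a hex editor remains the reliable check.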
When you work with ACIF, it is important to know the format of the data. Use the arsafpd
utility or look at the input in a hex editor to be sure.
The index values in the index file become the values that display in the Content Manager
OnDemand Search Results window. The indexes are used to retrieve the document, which is
why the index file is so important, and why no data can be loaded without indexes. Usually,
the index file is created and used to load the documents into Content Manager OnDemand
and you never see it. However, it might be useful to look at the index file. This section
describes the format and content of the index file.
Run arsafpd to display an index file. The first Structured Field in the index file is a Begin
Document Index (BDI), which contains the code page of the index names and values. Most of
the file consists of the two AFP Structured Fields: Index Element (IEL) and Tag Logical
Element (TLE). Two kinds of IELs exist: Page Group and Page. The index file must contain
Page Group IELs for arsload to load the data.
A Page Group IEL is identified by the text “Begin Page Group Reference” in the arsafpd
output. Each Page Group IEL indicates where the group starts and its length in bytes.
Example 7-1 shows part of a Page Group IEL.
If you look at offset 201 in the .out file, you find a BNG Structured Field (if the .out file is
AFP), which indicates the start of a document.
You might see Page IELs in the index file. These Page IELs are created by setting the ACIF
parameter INDEXOBJ=ALL. They are required only if the document is being
loaded as a large object. Example 7-2 shows part of a Page IEL.
Example 7-3 shows a Tag Logical Element (TLE) that contains index information.
Example 7-4 shows a portion of the arsafpd output of a fully composed AFP file in the correct
format to load into Content Manager OnDemand.
Example 7-4 Portion of the arsafpd output of a fully composed AFP file
1 BDT Begin Document
2 BNG Begin Named Page Group 00000001
3 TLE Tag Logical Element
4 TLE Tag Logical Element
5 TLE Tag Logical Element
6 TLE Tag Logical Element
7 IMM Invoke Medium Map ABBB
8 BPG Begin Page 00000001
9 BAG Begin Active Environment Group
10 MCF2 Map Coded Font2
11 NOP No Operation
12 PGD Page Descriptor
13 PTD2 Presentation Text Desc2
Each group is surrounded by BNG/ENG Structured Fields, and each group contains TLE
Structured Fields that occur after the BNG but before the BPG.
When an input file contains TLE Structured Fields, do not specify indexing parameters, such
as TRIGGER, FIELD, or INDEX. They are not needed because the file already contains index
information.
ACIF processes a file that contains TLE Structured Fields in the following way:
1. For every BNG in the input, ACIF creates a group IEL Structured Field in the index file.
2. ACIF makes a copy of the TLE Structured Fields from the input and places them into the
index file. The original TLE Structured Fields are also placed into the output file.
If the input file does not contain the correct number of TLEs in each group, ACIF might
complete, but arsload might fail with the following message:
The n is the number of fields that are defined to Content Manager OnDemand.
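One way to catch a TLE count mismatch before loading is to count the TLEs in each group. The following Python sketch is an illustration only. It assumes a text listing in the layout of Example 7-4, where the second whitespace-separated token on each line is the Structured Field abbreviation; actual arsafpd output might need different parsing. The function names and the expected field count are hypothetical.

```python
from typing import List


def tle_counts(listing: str) -> List[int]:
    """Count the TLEs that follow each BNG and precede the first BPG."""
    counts: List[int] = []
    in_group_header = False
    for line in listing.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        name = tokens[1]
        if name == "BNG":
            counts.append(0)
            in_group_header = True
        elif name == "BPG":
            in_group_header = False
        elif name == "TLE" and in_group_header:
            counts[-1] += 1
    return counts


def groups_with_wrong_tle_count(listing: str, expected: int) -> List[int]:
    """Return the 1-based numbers of groups whose TLE count differs from expected."""
    return [
        number
        for number, count in enumerate(tle_counts(listing), start=1)
        if count != expected
    ]


# First lines of Example 7-4.
example = """\
1 BDT Begin Document
2 BNG Begin Named Page Group 00000001
3 TLE Tag Logical Element
4 TLE Tag Logical Element
5 TLE Tag Logical Element
6 TLE Tag Logical Element
7 IMM Invoke Medium Map ABBB
8 BPG Begin Page 00000001
"""
print(tle_counts(example))
print(groups_with_wrong_tle_count(example, expected=4))
```

Here, expected stands for the number of fields that are defined to Content Manager OnDemand for the application group.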
After ACIF processes a file, the output file might be larger than the input file, even
if the input was an AFP file. The reason is that ACIF changes the AFP (“improves it”), which
usually increases the file size. The following changes are made to the AFP:
Creating or adding comments to the BDT Structured Field
Creating or adding group names to the BNG - ENG Structured Fields
Changing obsolete Structured Fields to current Structured Fields (for example, MCF1 to
MCF2, or PTD1 to PTD2)
You can use the OS/390 indexer to extract index data from line data and AFP reports. In
addition, other data types, such as TIFF images, can be captured by using the ANYSTORE
exit (ANYEXIT is described in 11.3, “OS/390 indexer exits” on page 248).
The OS/390 indexer is a single pass indexer. (It does not create an intermediate file.) It
therefore provides better performance than ACIF. The COBOL Runtime Library is required on
AIX to run the OS/390 indexer, and it is included in the Content Manager OnDemand
Multiplatform software.
For more information about the use of the OS/390 indexer, see IBM Content Manager
OnDemand - Indexing Reference, SC19-3354.
The OS/400 indexer indexes input data based on the organization of the data:
Document organization. For reports that are made up of logical items, such as statements,
policies, and invoices, the OS/400 indexer can generate index data for each logical item in
the report.
Report organization. For reports that contain lines of detail with sorted values on each
page, such as a transaction log or general ledger, the OS/400 indexer can divide the
report into sets of pages and generate index data for each set of pages.
Before you can index a report with the OS/400 indexer, you must create a set of indexing
parameters. The indexing parameters describe the physical characteristics of the input data,
identify where in the data stream the OS/400 indexer can locate index data, and provide other
directives to the OS/400 indexer.
Indexing parameters include information that allows the OS/400 indexer to identify key items
in the print data stream, tag these items, and create index elements that point to the tagged
items. The OS/400 indexer uses the tag and index data for efficient and structured search and
retrieval. You specify the index information that allows the OS/400 indexer to segment the
data stream into individual items called groups. A group is a collection of one or more pages.
You define the bounds of the collection, for example, a bank statement, insurance policy,
phone bill, or other logical segment of a report file. A group can also represent a specific
number of pages in a report. For example, you might decide to segment a 10,000 page report
into groups of 100 pages. The OS/400 indexer creates indexes for each group. Groups are
determined when the value of an index changes (for example, account number) or when the
maximum number of pages for a group is reached.
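The grouping rule in the preceding paragraph can be sketched in a few lines of Python. This is a simplified illustration of the rule as stated (a new group starts when the index value changes or when the maximum number of pages is reached), not the OS/400 indexer's actual implementation. The function name and the one-index-value-per-page input are assumptions.

```python
from typing import Iterable, List, Tuple


def segment_pages(index_values: Iterable[str], max_pages: int) -> List[Tuple[str, int]]:
    """Group consecutive pages; return (index value, page count) for each group.

    index_values holds one index value per page, in page order.
    """
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    groups: List[List] = []
    for value in index_values:
        same_value = bool(groups) and groups[-1][0] == value
        if same_value and groups[-1][1] < max_pages:
            groups[-1][1] += 1          # page joins the current group
        else:
            groups.append([value, 1])   # value changed or group is full
    return [(value, pages) for value, pages in groups]


# Three pages for account A and two for account B, with at most two pages per group.
print(segment_pages(["A", "A", "A", "B", "B"], max_pages=2))
```

With these inputs, account A is split into a two-page group and a one-page group because the page limit is reached before the index value changes.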
Figure 7-4 on page 182 illustrates the data indexing and flow control for OS/400 indexer. For
more information about the OS/400 Indexer, see IBM Content Manager OnDemand -
Indexing Reference, SC19-3354.
[Figure components: OnDemand, database manager, storage manager, disk storage, cache, and archive media]
Figure 7-4 Data indexing and flow control for the OS/400 indexer
The XML indexer was developed to support the growing need to efficiently and effectively
store large quantities of XML data, for example:
The European Union’s implementation of a Single Euro Payments Area (SEPA). SEPA
replaced the existing domestic retail credit transfers and direct debits with standardized
European payments that are based on Extensible Markup Language (XML) International
Organization for Standardization (ISO) 20022 messages. ISO 20022 provides a more
efficient way of developing and implementing messaging standards that financial
institutions and clients use to exchange massive amounts of transactional information.
Other XML standards exist and continue to be developed, such as ACORD (Insurance
industry), AgXML (Agriculture), and Health Level Seven (Health industry).
XML document formats were developed, such as Office Open XML (OOXML) and Open
Document (OASIS).
With XML indexing, you can automatically batch index and archive XML transactional
messages and statements into the Content Manager OnDemand repository. Documents are
identified and extracted during indexing. Resources are extracted, and, together with the
data, compressed and archived. Multiple stylesheets can be specified to meet device and
accessibility requirements.
XML stylesheet (resource) archiving is critical. Content Manager OnDemand optimizes the
storage of XML data by storing only a single version of a resource and then associating it with
all of the archived documents. Document resources can be automatically collected and
managed.
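The idea of storing a resource once and associating it with many documents can be illustrated with a small Python sketch. This is a conceptual model only. Identifying identical resources by a SHA-256 content hash is an assumption made for the illustration, and the text does not describe how Content Manager OnDemand recognizes identical resources internally. The class and method names are hypothetical.

```python
import hashlib
from typing import Dict


class ResourceStore:
    """Keeps one copy of each distinct resource and maps documents to it."""

    def __init__(self) -> None:
        self._resources: Dict[str, bytes] = {}   # digest -> resource content
        self._doc_resource: Dict[str, str] = {}  # document ID -> digest

    def archive(self, doc_id: str, resource: bytes) -> str:
        """Associate a document with a resource, storing the resource only if new."""
        digest = hashlib.sha256(resource).hexdigest()
        self._resources.setdefault(digest, resource)
        self._doc_resource[doc_id] = digest
        return digest

    def resource_for(self, doc_id: str) -> bytes:
        """Return the resource that is associated with a document."""
        return self._resources[self._doc_resource[doc_id]]

    def stored_resource_count(self) -> int:
        """Return the number of distinct resources that are stored."""
        return len(self._resources)


store = ResourceStore()
stylesheet = b"<xsl:stylesheet version='1.0'/>"
store.archive("statement-001", stylesheet)
store.archive("statement-002", stylesheet)   # same stylesheet, not stored again
print(store.stored_resource_count())
```

Both documents resolve to the same stored stylesheet, so the storage cost of the resource does not grow with the number of archived documents.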
The XML indexer uses the “Generic XML Index File Format” (GXIFF). The GXIFF format is
functionally similar to the Generic Index File Format in that it allows the loading of any type of
data into Content Manager OnDemand.
For more information about using the XML indexer, see IBM Content Manager OnDemand -
Indexing Reference, SC19-3354.
The ACIF indexer and the OS/390 indexer support multiple user exits. The OS/400, PDF,
XML, and Generic indexers do not support any user exits.
For a description of the ACIF user exits in detail, see 11.2, “ACIF exits” on page 242.
For a description of the OS/390 indexer user exits, see 11.3, “OS/390 indexer exits” on
page 248.
In the later sections, we focus on the integration and application programming interface (API)
client options of Content Manager OnDemand, such as the ODWEK API, the Content
Management Interoperability Services (CMIS) web services, the mid-server SAPI, and
integration with other IBM Enterprise Content Manager products, such as IBM Information
Integrator and IBM FileNet P8. We describe how to use the existing API to build your own web
client interface for Content Manager OnDemand.
The Content Manager OnDemand Client choices enable the product to meet the
ever-changing world of information technology and the way content is delivered. For example,
delivering documents that are stored in Content Manager OnDemand to a mobile device was
not relevant a few years ago. However, it is an important consideration for enterprise content
delivery today. Technology drives change with current Content Manager OnDemand
customers, and IBM delivers options to meet current and future business requirements. A
customer’s goal is to use a single user interface for access to all of its Enterprise Content
Management content. IBM met that goal with the IBM Content Navigator user interface, but
IBM continues to retain multiple Content Manager OnDemand Client interfaces to meet the
various needs of its customers.
When you choose the correct client for your implementation of Content Manager OnDemand,
two primary considerations are the client functionality and the client architecture.
Concerning the client functionality, the most powerful client is the Microsoft Windows client.
All other clients contain only a subset of the features of the Windows client. The most
prominent difference is the viewer capability.
Determine whether your users require functionality that is specific to the Windows client only.
If not, see the range of viewer options that are described in 8.1.1, “Viewer options” on
page 186, which compares the different viewers across the various client options.
The content that is displayed by certain viewers can be changed by either transforms
(ODWEK) or exits. For more information about exits, see Chapter 11, “Exits” on page 241.
The Windows client reflects the richest set of capabilities in terms of viewing these data types.
Because it directly communicates with the Content Manager OnDemand server, we reference
the Windows client for all of its features that relate to document display.
The Line Data viewer of the Windows client is the most sophisticated viewer that is available
for Content Manager OnDemand from the selection of readily available viewers.
The viewing of these primary data types happens within the same application. The Windows
client provides other features, such as thumbnails, and configurable and saveable views.
The Content Manager OnDemand Windows client also contains other capabilities for viewing
archive data types, such as Portable Document Format (PDF) and User-Defined.
Starting with Content Manager OnDemand version 9.5, for both DocType=PDF and
user-defined PDF, the Windows Client will attempt to view a PDF document with Adobe
Acrobat, if it is installed. If Adobe Acrobat is not installed, for DocType=PDF, Adobe Acrobat
Reader will be used instead when the PDF document is viewed.
Before Content Manager OnDemand version 9.5, PDF documents could be viewed by the
Windows client in two ways:
If they are configured in the application as data type “PDF”, the rich feature set of the AFP
and Line Data viewer applies, but Adobe Acrobat Professional is required.
If the data type is configured as “User Defined” and “.pdf” as the extension, the
documents are started externally. Therefore, you can view the documents with the
no-charge Adobe Acrobat viewer or any other installed PDF viewer.
Any data type can be specified as “User Defined”, for example, Word documents (.docx).
User-defined data is viewed by invoking its associated application.
Detailed information about ODWEK’s viewers and transforms is in IBM Content Manager
OnDemand Web Enablement Kit Java APIs: The Basics and Beyond, SG24-7646. Only a
brief overview is provided in this chapter.
The plug-ins for AFP and images are shipped as setup packages, which must be installed on
the user’s computer. The plug-ins integrate themselves with Mozilla Firefox browsers and
Microsoft Internet Explorer. The AFP plug-in provides similar viewing capabilities to the
Windows client.
The image plug-in can view image files, with the added benefit of displaying TIFF images
(which current web browsers usually cannot display).
The transforms apply only to documents that are served by ODWEK. They are available to
web clients that are based on ODWEK (such as Content Navigator) and to any other
application that is written by using the ODWEK Java API. They are not available on the
Windows client.
Note: The AFP viewer plug-in, which is available with ODWEK and Content
Manager OnDemand, is a version of the AFP viewer plug-in from the InfoPrint
Solutions Company. Although the standard InfoPrint viewer can be used for viewing
AFP, the ODWEK version uses direct communication with the Content Manager
OnDemand server, enabling segmented document transfer for LOB documents.
Annotations
Only the native ODWEK viewers and the Windows client support annotations. They support
annotations in the following ways:
Line data applet: Supports text annotations. Starting with version 9, the viewer can also
work with graphical annotations.
Windows Client: Supports maximum capabilities for all data types.
Other viewers, for example, the AFP plug-in viewer: Do not support and are not aware of
annotations.
Web clients, such as Content Navigator or the ODWEK Java API, can work with annotations
and provide access to them through the hit list. Graphical annotations cannot be accessed
that way because they are not exposed through the Java API.
Windows client
Consider the following items when you are planning a Windows client infrastructure:
It is faster than the web clients and more powerful.
It requires native installation on each user’s workstation or notebook. Server version
upgrades might also require a new client installation.
This client supports Citrix and Terminal services environments.
It does not support the Transforms interface for transforming and converting data formats
because the data formats are provided by ODWEK only.
Content Navigator
When you choose a ready-for-use web client, consider the IBM strategic client, IBM Content
Navigator, because it is the most complete, most recent web client.
Special use cases might require the development of a custom client application for Content
Manager OnDemand. For more information about development APIs, see 8.3, “Client API
overview” on page 202.
With Content Navigator, you can run a cross-repository search to search for content across
multiple types of repositories, including Content Manager OnDemand. For example, Content
Manager OnDemand search results can be included in the same hit list as search results
from other supported repositories to help provide a comprehensive view of content.
When you create a cross-repository search, you can specify the following information:
Specify the scope of the search on each repository. You can specify the search or the
classes that you want to include in the cross-repository search by using IBM Content
Manager OnDemand. On IBM FileNet Content Manager and IBM Content Manager, you
also can limit the search to a specific folder.
Specify how properties from each repository are related to each other.
Specify any default search criteria that you want displayed when users open the search.
For more information about how to configure a cross-repository search, see the IBM Content
Navigator Knowledge Center at the following web address:
http://www.ibm.com/support/knowledgecenter/SSEUEX_2.0.3/contentnavigator_2.0.3.htm
If you are developing a Windows application, you optionally can use the Object Linking and
Embedding (OLE) (ActiveX Control) API, which is provided by the Windows client. This API
requires a Windows client installation.
Another option is to use an intermediate API that is based on the ODWEK Java API for the
Content Manager OnDemand access portion. Content Management Interoperability Services
(CMIS) or other web services can be used as the intermediate API. The web service
application uses ODWEK to access Content Manager OnDemand and relays this access
through its own web services to any other application. In this case, the Windows application
only needs to talk to the web service. For more information about CMIS and its limitations,
see 8.3.2, “Content Management Interoperability Services” on page 204.
The use of an intermediate API increases complexity and potentially decreases performance,
but it decouples a Windows application and Content Manager OnDemand in terms of API
versioning and requiring a Content Manager OnDemand installation.
Every ars command on the server displays its current server software version, as well.
You can view the version of the Windows client by clicking Help → About.
To determine the version of ODWEK, you can either look for the readme file in the ODWEK
application directory or use a client. If you are running a web client (for example, Content
Navigator), open a line data report by using the line data applet viewer. Because this viewer is
provided by ODWEK directly, the viewer shows the current ODWEK version level in the About
dialog box under the Help menu.
Starting with version 9.5, you can run multiple versions of the Content Manager
OnDemand Windows client (at the release level only, not the PTF level) on a single
workstation. The client code is now installed in the c:\Program Files (x86)\IBM\OnDemand
Clients\V9.5 directory.
For ODWEK, you can run multiple versions of ODWEK on a single system. Although this
capability might not be a preferred scenario from a maintenance point of view, it can be
helpful during upgrades and existing system access scenarios. Each application that uses the
ODWEK API must point to the correct installation path and load the correct corresponding
libraries.
Content Navigator can be used to access documents from multiple content repositories:
IBM Content Manager Enterprise Edition repositories
IBM Content Manager OnDemand repositories
IBM FileNet P8 repositories
Organization for the Advancement of Structured Information Standards (OASIS) CMIS
repositories
You can use Content Navigator to build a customized user experience. It supports many
configuration options and includes a powerful API toolkit that you can use to extend the web
client and build custom applications.
Figure 8-2 shows Content Navigator browsing a folder in Content Manager OnDemand.
Figure 8-2 Searching a Content Manager OnDemand folder with Content Navigator
Note: Content Navigator is a Web 2.0 Ajax-based client. These web applications rely on an
up-to-date JavaScript engine, which is only available in newer browsers. Older browsers,
such as Microsoft Internet Explorer Version 8, might not work correctly with Content
Navigator.
Content Navigator, version 2.0.2 and later, provides many additional Content Manager
OnDemand capabilities:
AFP Viewer plug-in support
External Data Services (EDS) support
Favorites support for folders and documents
Single and multiple AFP file download as PDF (with AFP2PDF enabled)
Highlighted search result terms in full text searches
Line2PDF conversion viewer
XML viewer
Starting with Content Manager OnDemand V9.0, Content Navigator provides single sign-on
(SSO) token pass-through to the client side. Date validation is no longer required. Support is
provided for ‘t’ date expression and federated search across Content Manager OnDemand,
FileNet P8, and IBM Content Manager repositories. Content Navigator is also the new CMIS
packaging for Content Manager OnDemand.
The following prerequisites exist for a Content Navigator installation for Content Manager
OnDemand:
Native installation of the Content Navigator base software
A database to store the Content Navigator configuration
Web application server
ODWEK
Optional: AFP Transforms for AFP to PDF rendering
Java Database Connectivity (JDBC) drivers (if not already present)
The Content Navigator database is relatively small, so a collocation with the Content Manager
OnDemand database might be possible in small deployments. The installation manual
provides SQL statements for creating the database and its table spaces.
After you install all of the components, run the Content Navigator Configuration and
Deployment Tool to create a preconfigured web application and deploy it to the web
application server.
The Configuration and Deployment Tool provides a wizard that leads you through the base
setup process. You must provide details about your web application server and connection
information to the configuration database. For the Content Manager OnDemand
configuration, you must provide the location of your ODWEK installation. Run the deployment
scripts at the end for deploying Content Navigator on your application server.
The Content Navigator installer creates a shared native library in WebSphere Application
Server. You can review this library in the Integrated Solution Console in the Environment,
Shared libraries section. You need a library that has the class path set to the location of the
ODApi.jar (for example, /opt/ibm/ondemand/V9.5/www/api/ODApi.jar) and the Native Library
Path set to the ODWEK directory (for example, /opt/ibm/ondemand/V9.5/www). If you
encounter any errors, ensure that these paths are valid.
Note: If multiple applications reference the same native library, the library gets loaded
multiple times. But because the ODWEK library is a shared library, it can be loaded only
one time for each JVM. So, if you are running multiple ODWEK web applications in one
WebSphere Application Server, you must configure the shared library reference on the
Class Loader level of the server itself instead of on the application level. You can do this
task in the Integrated Solution Console, in the class loader settings of the application
server.
Note: This option does not affect the SSL security of the web application, for example,
between the web server and the browser. It only encrypts the API communication
between the web tier and the Content Manager OnDemand server.
If you want to use AFP Transforms or another transform filter through generic transforms,
you must specify the path to the correct configuration files.
You can specify additional configuration parameters, for example, in the ODConfig class in
the Java API. For more information, see the Javadoc of ODApi or IBM Content Manager
OnDemand Web Enablement Kit Java APIs: The Basics and Beyond, SG24-7646.
If you want to avoid the use of Java applets and your content is viewable by browsers (for
example, certain image types or textual data), try the browser pass-through viewer, which lets
the browser handle the data natively. If you work with AFP and must use the AFP browser
plug-in, register the Content Navigator plug-in, AFPViewerPlugin.jar, and configure the
viewer map that is assigned to your Content Navigator desktop to use the AFP viewer for the
application/afp MIME type. The AFPViewerPlugin.jar file ships with Content Navigator. You
must choose the web browser pass-through viewer.
The Ajax viewer is a Web 2.0 JavaScript application that provides basic document functions,
such as page-wise browsing, rotation, or zoom. It is not a Java applet.
The generic applet viewer, the built-in PDF and HTML conversion, and the Ajax viewer can all
work with various data types:
Images (such as TIFF, JPEG, and DICOM)
Office documents
PDF
Most line data documents
Certain AFP data
However, they all use a rendering engine to render Office, PDF, and AFP data into an image.
This rendering might work well with certain Office and PDF files, but it fails on most non-basic
AFP data streams.
Note: Content Navigator is a Web 2.0 client and relies on HTML 5 and JavaScript for its
core client functionality and especially for the Ajax viewers. Not all browsers are suitable for
running Content Navigator fast and efficiently, especially for Microsoft Internet Explorer
browsers before version 9. Test Content Navigator with your user browser thoroughly
before you consider a deployment.
Figure 8-3 Content Manager OnDemand results list in the Windows client
As the full function client for Content Manager OnDemand, the Windows client provides
various business functions and features that can be selected at the document level, as shown
in Figure 8-4 on page 199.
You also can show the pages within a document or report as thumbnails, which provide you
with a visual representation of the report.
Figure 8-5 on page 200 shows the Content Manager OnDemand CICS Client login panel,
which requires the standard login credentials.
The CICS Client provides viewing capabilities for line data reports and a “best fit” model for
fully composed AFP documents. Viewing a standard line data report is shown in Figure 8-6.
For more information about the most common integrations, see Federated Content
Management: Accessing Content from Disparate Repositories with IBM Content Federation
Services and IBM Content Integrator, SG24-7742.
It can connect to various systems, such as Content Manager OnDemand, Content Manager,
FileNet P8, and content management systems by other vendors. You can create a virtual
archive, spanning across all connected systems and document models. Users can search in
one system and the search is propagated to multiple back-end repositories. Information
Integrator maps virtual fields to folder fields in Content Manager OnDemand (or respective
models in other systems) and delivers a consistent hit list of documents to the user.
Content Integrator might be an option for you if you use separate Content Manager
OnDemand systems (instances or physical systems) and must provide a cross-system
search (for example, for eDiscovery or legal inquiries). Another use case is to provide
repository-neutral services with access to multiple content management systems.
Note: Information Integrator is an abstraction layer. You lose Content Manager OnDemand
specific functionality, because the virtual archive provides only the common functionality
that can be implemented by all archives. Always check your use case to verify that a virtual
archive meets your needs for functional compatibility and performance.
This federation differs from Information Integrator. In Content Federation Services, for
each Content Manager OnDemand document, a virtual document is created in FileNet P8
(resulting in database records in FileNet P8). So, these documents act as FileNet P8
documents from a FileNet P8 user’s perspective. Information Integrator does not have its own
database and does not create virtual documents, but it instead calls Content Manager
OnDemand for searches and passes on the result list. A search in FileNet P8 never starts a
search in Content Manager OnDemand, but it can find only federated Content Manager
OnDemand documents, which are cataloged in the FileNet P8 database.
When you plan your integration with FileNet P8, remember this federation is active: Content
Manager OnDemand actively publishes document links into a FileNet P8 system. You must
consider both volumes (FileNet P8 systems usually are smaller than Content Manager
OnDemand systems) and the active federation process.
For more information about Content Manager OnDemand and FileNet P8 integration, see
IBM FileNet Content Federation Services for Content Manager OnDemand, SC19-2711.
The following list shows the APIs that are available for Content Manager OnDemand:
Content Manager ODWEK: The Java API for Content Manager OnDemand
SOAP and Representational State Transfer (REST) web services that follow the CMIS
standard
Windows OLE (ActiveX control) that is provided by the Windows client
XML administrative API through the ARSXML server command
Structured APIs on z/OS environments
The standard Content Manager OnDemand server commands that serve as a
console-based API to work with Content Manager OnDemand documents
The ODWEK Java API and its use to develop Content Manager OnDemand clients are
described in detail in IBM Content Manager OnDemand Web Enablement Kit Java APIs: The
Basics and Beyond, SG24-7646. This section covers only a basic overview and focuses on
client considerations about ODWEK. Developers are encouraged to read the referenced book
before they plan a client development that is based on ODWEK.
Scope
ODWEK is a Content Manager OnDemand component that can be used by all Content
Manager OnDemand customers. It is focused on typical client use cases, such as searching
for and accessing data that is stored in a Content Manager OnDemand archive. It also has
web viewers, such as the line data applet and Content Manager OnDemand AFP viewer.
Before Content Manager OnDemand Web Enablement Kit (ODWEK) Java API V9.5, the only
API that allowed documents to be added to the Content Manager OnDemand archive was the
ODFolder.storeDocument API, which resulted in an archive request to the Content Manager
OnDemand server for each document. This API is suitable for low-volume ad hoc storage.
In ODWEK V9.5, new APIs were introduced to allow documents to be loaded in bulk, which
provides high-volume storage similar to the arsload command. To accomplish bulk loading by
using the ODWEK Java API, you perform these steps:
1. Call the ODServer.loadInit API to initiate the load process.
2. For each document to load, call the ODServer.loadAddDoc API, which passes the number
of pages, a hash table of index values to store, and the document data.
3. Call the ODServer.loadCommit API, which specifies the application group and application
to send the load data and load request to the Content Manager OnDemand server.
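The three steps above can be sketched as follows. This sketch is a minimal, self-contained illustration and not the ODWEK API itself: the BulkLoadSession interface, its method signatures, the in-memory RecordingSession class, and the sample application group, application, and index field names are all hypothetical stand-ins that only mirror the order of the loadInit, loadAddDoc, and loadCommit calls. Check the Javadoc of ODApi for the real signatures before you write production code.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Driver for the init / add-per-document / commit sequence.
final class BulkLoadSketch {
    // Runs the three steps in order and returns the number of committed documents.
    static int loadAll(BulkLoadSession session, List<byte[]> documents) {
        session.loadInit();
        for (byte[] doc : documents) {
            Hashtable<String, String> index = new Hashtable<>();
            index.put("docLength", String.valueOf(doc.length)); // hypothetical index field
            session.loadAddDoc(1, index, doc);                   // 1 = assumed page count
        }
        return session.loadCommit("SampleAppGroup", "SampleApp"); // hypothetical names
    }

    public static void main(String[] args) {
        List<byte[]> docs = new ArrayList<>();
        docs.add(new byte[] {1, 2, 3});
        docs.add(new byte[] {4, 5});
        System.out.println("Committed documents: " + loadAll(new RecordingSession(), docs));
    }
}

// Hypothetical stand-in for the three ODServer load calls; the real
// ODWEK signatures might differ.
interface BulkLoadSession {
    void loadInit();
    void loadAddDoc(int pageCount, Hashtable<String, String> indexValues, byte[] data);
    int loadCommit(String applicationGroup, String application);
}

// In-memory implementation that only records what would be sent to the server.
final class RecordingSession implements BulkLoadSession {
    private List<byte[]> pending; // null until loadInit is called

    @Override
    public void loadInit() {
        pending = new ArrayList<>();
    }

    @Override
    public void loadAddDoc(int pageCount, Hashtable<String, String> indexValues, byte[] data) {
        requireInit();
        pending.add(data.clone());
    }

    @Override
    public int loadCommit(String applicationGroup, String application) {
        requireInit();
        int count = pending.size();
        pending = null; // a new load needs a new loadInit call
        return count;
    }

    private void requireInit() {
        if (pending == null) {
            throw new IllegalStateException("loadInit must be called first");
        }
    }
}
```

The point of the sketch is the shape of the call sequence: one initialization, one add call per document, and a single commit that names the target application group and application, which is what distinguishes bulk loading from one archive request per document.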
For special client needs, the Java API provides access to the object model (application group
and application) of Content Manager OnDemand and facilitates an ARSXML pass-through,
which can be used to perform administrative tasks.
In addition to the physical presence on the system, Java applications must be aware of the
native libraries. The ODWEK native libraries are loaded as shared memory objects and
cannot be reloaded multiple times. If you run multiple ODWEK applications in one web
application server, consider this restriction.
For a description of how the native library reference is managed for the ODWEK client in IBM
Content Navigator in IBM WebSphere Application Server, see “Accessing the native libraries”
on page 195.
For a connection pooling sample that covers the topics of thread safety, resource
consumption, and timeouts in detail, see Chapter 6, “Connection pooling and connection
handling”, in IBM Content Manager OnDemand Web Enablement Kit Java APIs: The Basics
and Beyond, SG24-7646.
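As a rough illustration of the pooling concerns that are named above (thread safety, bounded resource consumption, and timeouts), the following sketch shows a generic bounded pool. It is not taken from the referenced book and it does not use any ODWEK classes: the BoundedPool class and its method names are hypothetical, and a real pool would hold logged-on server connections and handle logoff and stale connections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Generic fixed-size pool. The blocking queue provides the thread safety.
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    // Creates all pooled objects up front so that resource use is bounded by size.
    BoundedPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Waits up to timeoutMillis for a free object; returns null on timeout
    // or if the waiting thread is interrupted.
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Returns a previously borrowed object to the pool.
    void giveBack(T pooled) {
        idle.offer(pooled);
    }

    public static void main(String[] args) {
        BoundedPool<Object> pool = new BoundedPool<>(1, Object::new);
        Object first = pool.borrow(10);
        System.out.println("Second borrow timed out: " + (pool.borrow(10) == null));
        pool.giveBack(first);
        System.out.println("Object reused: " + (pool.borrow(10) == first));
    }
}
```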
CMIS provides a common access interface for searching, retrieving, and in the case of
document management systems, modifying and deleting documents. It is a web services
interface that is implemented as either SOAP web services or REST (Atom) services.
For more information about CMIS, see the CMIS page on the OASIS website, the CMIS
overview page at the IBM Enterprise Content Manager website, and the technical
documentation that is available:
https://www.oasis-open.org/committees/cmis/
http://www.ibm.com/software/ecm/cmis.html
Implementing Web Applications with CM Information Integrator for Content and
OnDemand Web Enablement Kit, SG24-6338
Content Management Interoperability Services for Content Manager OnDemand is
installed as part of the IBM Content Navigator installation. For more information, see
“Installing Content Navigator” on page 194.
When you consider implementing your own software on CMIS, remember CMIS is used for
accessing document management systems, but not necessarily high-volume report archives,
such as Content Manager OnDemand.
For more information about the Windows client-based API, see Windows Client
Customization Guide, SC19-3357.
Structured APIs are handled by a dedicated component of Content Manager OnDemand that
is called MidServer. MidServer relies on ODWEK and its API to access the Content Manager
OnDemand server.
Structured APIs are available only on z/OS, and they are called from COBOL or C
applications in the same manner as MVS calls. Because ODWEK is used as the access path
to the Content Manager OnDemand server, the Structured APIs can be used to access
non-z/OS Content Manager OnDemand servers, as well.
Server commands
In addition to the API options, which are exposed through Java, OLE, or Web Services,
Content Manager OnDemand provides console (command-line) applications that provide
specific functions, such as searching, retrieving, or deleting documents, and sophisticated
functions, such as placing holds and working with the full text engine. Most of this functionality
is exposed through the ARSDOC application.
Simpler custom applications, for example, shell scripts, can use these server console
applications to interact with Content Manager OnDemand systems. The applications are
available only as part of a Content Manager OnDemand server installation. Because most of
them (namely ARSDOC) communicate with the server through TCP/IP, you can connect and
interact with Content Manager OnDemand servers remotely on other platforms. When you
call remote servers, ensure that the local installation that provides the ARS applications and
the actual Content Manager OnDemand server are on the same version level.
For more information about the administrative commands, see the specific command
descriptions in the IBM Content Manager OnDemand Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSEPCD_9.0.0/com.ibm.ondemandtoc.doc/administering.htm
XMLs can be passed to and from ARSXML through the ODWEK Java API, which enables Java
applications to programmatically call ARSXML and obtain access to administrative data model
functions.
In this section, we describe why you might need data conversion, when to convert the data
stream, and how to convert the data.
AFP to PDF
If a requirement exists to present AFP documents in Portable Document Format (PDF)
over the web, from a storage perspective, it is more efficient to store the documents in
their native format and then convert them to PDF at retrieval time. AFP documents are stored
more efficiently than PDF documents.
The PDF print stream, when it is divided into separate customer statements, is larger than
AFP because each statement contains its own set of structures that are required by the PDF
architecture to define a document.
Elapsed time and processor time are also essential factors in the decision-making process.
The amount of time (elapsed and CPU) that is needed to convert the document depends on
how large the document is and how many resources or fonts are associated with the
document.
The ODWEK Java API provides industry-standard Java classes that can be used by a
customer to write a custom web application that can access data that is stored on the Content
Manager OnDemand server. This custom application can, for example, permit the user to log
on to a Content Manager OnDemand server, get a list of folders, search a specific folder,
generate a hit list of matching documents, and retrieve those documents for viewing. Many
APIs provide advanced functionalities.
To meet this requirement, a highly flexible interface was added to the ODWEK Java APIs that
allows a developer to easily implement a third-party document transform solution.
The new ODWEK Interface allows a client developer to implement an external program to
transform a document in one of two ways:
If the transform vendor provides a basic command-line executable file, it is implemented in
an XML interface, which supports the retrieval of all of the document details that are stored
in Content Manager OnDemand, and also allows specific options to be passed to the
transform.
The ODWEK Java APIs also provide a Java interface that a client developer can use to
add even more flexibility to their client solution. The Java interface allows a client
developer to get the document byte stream from ODWEK, then use any methods that they
want to convert the document. These methods can include calls to web services that allow
remote transformation. After the document is transformed, the resulting data can be
returned to ODWEK, where it is passed back to the client that made the request.
9.2.2 Configuration
To enable the Generic Transform Interface in ODWEK, an XML document must be created
and defined in the ODConfig.Properties object. This XML document is identified by the
<ODConfig.TransformXML> key name and must include the fully qualified path to the XML file
where the transforms are defined.
After you configure your XML configuration for the Generic Transform Interface, as described
in 9.2.3, “Basic implementation: Executable interface” on page 211, you can enable this
functionality in your ODWEK environment, as shown in Example 9-1.
Example 9-2 shows a sample of the ODTransform.xml file that can be used in this
implementation.
In this example, you can see that we defined a transform that is named MyTXFRM_EXE, which
calls the transform command txfrm.exe, which is defined in the <CmdLineExe> tag.
The <TransformName> is used as the viewer name when it calls the ODWEK Retrieve APIs.
From this configuration, ODWEK knows that the transform requires RECORDLENGTH,
CARRIAGECONTROL, CODEPAGE, and OUTPUTFILE information from Content Manager OnDemand,
and can set it on the cmdline by using the options that are specified in each related XML tag.
Also, the txfrm.exe requires additional information to be passed on the cmdline. The -r that
is specified in the <Cmdlineparm> tag has no meaning to Content Manager OnDemand, so it is
passed through and set on the cmdline call to the txfrm.exe.
In the custom Java code, the call to retrieve the data from ODWEK includes the
<TransformName> that is specified in the XML and looks like the following line:
byte[] transformedDocument = ODHit.retrieve("MyTXFRM_EXE");
From this example definition, ODWEK calls the specified transform with the following cmdline
executable file. Details for the items within “< >” are provided by ODWEK from the Content
Manager OnDemand data definitions:
"c:/opt/txfrm.exe -lm <record len> -x <carriage control> -a <codepage> -o <output
file name> -r PDF"
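To make the assembly of this command line concrete, the following sketch builds an equivalent argument list from option-to-value pairs plus pass-through parameters. It is only an illustration of the substitution idea and not ODWEK code: the TransformCommandSketch class, the build method, and the sample values (record length, carriage control, code page, and output file name) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransformCommandSketch {
    // Builds: <exe> <option> <value> ... <pass-through parameters>
    // The map keeps insertion order so that the options appear as configured.
    static List<String> build(String exe, Map<String, String> optionValues,
                              List<String> passThrough) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        for (Map.Entry<String, String> entry : optionValues.entrySet()) {
            cmd.add(entry.getKey());
            cmd.add(entry.getValue());
        }
        cmd.addAll(passThrough); // values with no meaning to the archive, such as -r PDF
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("-lm", "133");      // record length (sample value)
        options.put("-x", "A");         // carriage control (sample value)
        options.put("-a", "500");       // code page (sample value)
        options.put("-o", "out.pdf");   // output file name (sample value)
        List<String> passThrough = new ArrayList<>();
        passThrough.add("-r");
        passThrough.add("PDF");
        System.out.println(String.join(" ", build("c:/opt/txfrm.exe", options, passThrough)));
    }
}
```

Keeping the command as a list of separate arguments, rather than one concatenated string, avoids quoting problems if the list is later handed to a process launcher.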
Note: The <CARRIAGECONTROL> node was replaced by three values. When the CC
Type that is returned by ODWEK matches ANSI, rather than an 'A', the command includes
"-x 2".
This type of substitution can be used to specify the RECFM (Record Format), PRMode, TRC,
and CC Type.
Figure 9-2 on page 213 shows a sample transform.xml that can be used in this
implementation.
Figure 9-3 shows the transform commands that are generated based on the sample XML and
Application Group and Application of the document that is retrieved.
Example 9-3 shows a sample of the ODTransform.xml files that can be used in this
implementation.
<Transforms>
  <transform>
    <TransformName>MYTXFRM</TransformName>
    <TransformDescription>GENERIC Transform Engine.</TransformDescription>
    <ClientClass>com.companyA.corp.TransformClient</ClientClass>
    <OutputMimeType>application/pdf</OutputMimeType>
    <OutputExtension>pdf</OutputExtension>
    <CmdParms>
      <AG_NAME>agName</AG_NAME>
      <APPL_NAME>applName</APPL_NAME>
      <RECORDFORMAT>recfmt</RECORDFORMAT>
      <RECORDLENGTH>LineLength</RECORDLENGTH>
      <CARRIAGECONTROL>CC</CARRIAGECONTROL>
      <CODEPAGE>CodePage</CODEPAGE>
    </CmdParms>
  </transform>
</Transforms>
Similar to the basic implementation, the developer uses this XML stanza to set up the
required details for document transformation and how those details are passed to the Java
transform interface. Example 9-4 shows an example of how the Java interface can be used
with the XML stanza to create a document transform request. The example is a code snippet
of how the Client Class that is defined in Example 9-3 might be written to transform data.
// List the property keys and values that ODWEK read from the transform XML
// file and provided to this custom class.
// Needs java.util.ArrayList, Collections, Enumeration, List, and Properties.
System.out.println(" Transform properties:");
Properties gtProps = (Properties) odMap.get(ODTransform.TXFRM_REQ_PROPS);
Enumeration<?> enumeration = gtProps.keys();
List<String> list = new ArrayList<String>();
while (enumeration.hasMoreElements()) {
    list.add((String) enumeration.nextElement());
}
// Sort the keys so that the output order is stable
Collections.sort(list);
for (String key : list) {
    System.out.println(String.format("%25s = %-25s", key,
            gtProps.getProperty(key)));
}
Example 9-4 on page 214 shows how to set up the HashMap to pass document byte arrays in
and out of this custom interface, and how to define a custom Java class that contains the
transformData() method.
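The following sketch shows the general shape of such a custom class: it reads the document bytes from a HashMap, converts them, and returns the result under the response key. It is a sketch under stated assumptions and not the shipped example: the method signature, the REQ_DATA key name, and the local string constants (which stand in for the ODTransform key constants) are hypothetical, and the upper-casing step is only a placeholder for a real conversion.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;

final class TransformClientSketch {
    // Local stand-ins for the ODTransform key constants. The real constants
    // come from the ODWEK API; REQ_DATA in particular is an assumed name.
    static final String REQ_DATA = "TXFRM_REQ_DATA";
    static final String RESP_DATA = "TXFRM_RESP_DATA";

    // Assumed signature: takes the request map and returns a response map.
    public HashMap<String, Object> transformData(HashMap<String, Object> odMap) {
        byte[] input = (byte[]) odMap.get(REQ_DATA);

        // Placeholder conversion: upper-case the text. A real transform would
        // call a converter library or a remote web service at this point.
        String text = new String(input, StandardCharsets.UTF_8);
        byte[] output = text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);

        // Hand the transformed bytes back under the response key.
        HashMap<String, Object> response = new HashMap<>();
        response.put(RESP_DATA, output);
        return response;
    }

    public static void main(String[] args) {
        HashMap<String, Object> request = new HashMap<>();
        request.put(REQ_DATA, "sample line data".getBytes(StandardCharsets.UTF_8));
        byte[] result = (byte[]) new TransformClientSketch()
                .transformData(request).get(RESP_DATA);
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```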
Table 9-2 provides information about the XMLTags. These XML tags are used to pass specific
values to the transform command line. These XML tags allow the mapping of the
command-line option where the specified value can be passed.
V9.5 enhancements
Table 9-3 provides information about the OnDemand client HashMap keys that are used for
advanced Java implementation.
TXFRM_RESP_DATA: The HashMap key for the transformed data byte[] to be returned to
ODWEK.
TXFRM_REQ_METHOD: The method name that is used in the custom Java class. The
transformData() method must exist in the client class.
During a load process, Content Manager OnDemand stores report (document) data, its
resources, and index data, as shown in Figure 10-1.
[Figure 10-1: diagram of a storage set with cache (Cache dir 1 through Cache dir n)]
The Content Manager OnDemand load process identifies, segments, and compresses
groups of documents into storage objects that are then stored in the Content Manager
OnDemand archive, as illustrated in Figure 10-1. To improve the efficiency of the storage
process, Content Manager OnDemand aggregates the stored documents (typically a few
kilobytes in size) into storage objects. This aggregation provides efficient, high-volume
storage, retrieval, and expiration performance.
Object size value: Exercise caution when you change the object size value. Specifying
too large or too small a value can adversely affect performance when you load data.
The storage objects are stored in storage sets. The storage sets contain one or more primary
storage nodes. The storage node points to the location where the data is stored, which can be
cache, the storage manager (Tivoli Storage Manager, object access method (OAM), or
Archive Storage Manager (ASM)), or a combination.
The primary storage nodes can be on one or more object servers. When the Load Type is
Local, Content Manager OnDemand loads data on the server on which the data loading
program runs in the primary storage node with the Load Data property specified. If the Load
Type is Local, and the storage set contains primary nodes on different object servers, you
must select the Load Data check box for one primary node on each object server.
The storage set must support the number of days that you plan to maintain reports in the
application group. For example, if you must maintain reports in archive storage for seven
years, the storage set must identify a storage node (or migration policy on an IBM i server)
that is maintained by ASM for seven years.
A detailed description of adding storage sets and storage nodes is in Chapter 5, “Storage
management” on page 89 and the related OnDemand Administrative Guide.
Figure 10-2 on page 222 shows the datasets and illustrates four scenarios of their storage
and expiration.
Figure 10-2 Data, resource, and index storage and expiration scenarios
This method is enabled by selecting a cache-only storage set and entering a number in the
Cache Data for __ Days field.
When you select a cache-only storage set, Content Manager OnDemand automatically sets
Migrate Data from Cache to No and sets the Expire in __ Days field to the same value as the
Cache Data for __ Days field. (The default value is 90 days.)
Selecting a cache-only storage set requires the creation of backup and data management
systems that are external to the Content Manager OnDemand system.
Cache-only storage: If the storage set contains cache-only storage nodes, ensure that
the Cache Data value and the Life of Data and Indexes value are the same. Otherwise, the
add or update operation cannot be completed.
The data needs to be kept on a high-performance storage device for the period during which
it is retrieved frequently. The storage set must support the type of media that is required to
hold reports that are stored in the application group. For example, if you must maintain
reports in cache storage for 90 days and in archive storage for seven years, the storage set
must identify a storage node (or migration policy) that causes ASM to maintain the data for
seven years, and you must select Cache Data for __ Days and enter 90 in its field.
From a user’s perspective, no procedural difference exists in retrieving the data from either
cache or archive storage. The only user-perceivable difference is the response time. Various
archive storage mechanisms provide different performance profiles. For example, when you
use OAM and the data is stored in DB2 tables on disk, the response time is as fast as the
cache response time. The main difference in response time is based on the type of disk that is
used by either method. Conversely, if the OAM data is stored on optical disks or tape, the
response time is increased dramatically. If you use a network-attached Tivoli Storage
Manager server, the retrieval rates (throughput and response times) are governed by the
Tivoli Storage Manager device and the TCP/IP connection to that device.
Typically in a z/OS environment, data is not stored in cache. Content Manager OnDemand for
z/OS customers typically use OAM as their storage manager. OAM supports storing the data
directly in DB2 where the storing and retrieval rates are exceptionally fast, which eliminates
the need to maintain and monitor cache file systems in the z/OS file system (zFS) or the
hierarchical file system (HFS).
If you do not need to maintain reports in cache storage, select a storage set that identifies a
storage node (or migration policy) that is maintained by ASM and set Cache Data to No.
Content Manager OnDemand automatically sets Migrate Data from Cache to When Data is
Loaded.
The Cache Data field determines whether Content Manager OnDemand stores data in cache
storage. If the storage set is a cache-only storage set, Yes is the only selection. If the storage
set is an archive manager-controlled storage set (OAM, Tivoli Storage Manager, or ASM), you
can optionally store the data in cache as well.
Note: Whether the data is stored in cache or in a storage manager, the main performance
differences are a result of the following items:
The hardware speed (and I/O channels and interfaces) on which the data is stored.
The location of the hardware device in relation to the object server.
If the hardware device connects over a TCP/IP link, that link can form a bottleneck,
depending on the link’s throughput and the required data retrieval rate.
Each application group is segmented into multiple physical tables by using a date or a date
and time field. The size of each physical table is determined by the Max rows setting. Each
row in the table contains a set of user-defined and system-defined indexes that enable the
search for a report segment or a document. Index data is loaded into a table. When the Max
rows value is reached, the table is closed and a new table is created. The number of physical
tables that represent an application group might grow from 1 to n.
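The table rollover that is described above can be sketched in a few lines of Python. This is a toy model only, not the Content Manager OnDemand implementation: the class name SegmentedIndex and its members are hypothetical, and the sketch opens the next table lazily when the next row arrives.

```python
class SegmentedIndex:
    """Toy model of the index tables of one application group (hypothetical names)."""

    def __init__(self, max_rows):
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        self.max_rows = max_rows
        self.tables = [[]]  # the last table in the list is the one open for loading

    def add_row(self, row):
        # If the open table already holds max_rows rows, treat it as closed
        # and start a new table before storing the row.
        if len(self.tables[-1]) >= self.max_rows:
            self.tables.append([])
        self.tables[-1].append(row)


index = SegmentedIndex(max_rows=3)
for n in range(7):
    index.add_row({"doc": n})
table_sizes = [len(t) for t in index.tables]
```

With a Max rows value of 3, seven index rows end up in three tables, which mirrors how the number of physical tables grows from 1 to n.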
The ARSLOAD program saves one copy of a resource on each node for each application group.
The resource can be stored multiple times, depending on how the ARSLOAD program compares
the data. The ARSLOAD program compares the last 50 resources against the resource that is
generated by the load. If a match is not found, a new resource is stored.
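The following Python sketch illustrates why a resource can be stored more than once when only a limited window of recent resources is compared. It is a simplified model under stated assumptions: the class ResourceStore is hypothetical, resources are compared as raw bytes (the actual comparison that ARSLOAD performs is not described here), and the window is shortened to 2 in the example so that the effect is visible.

```python
from collections import deque


class ResourceStore:
    """Toy model: compare a new resource only against the most recent ones."""

    def __init__(self, window=50):
        self.recent = deque(maxlen=window)  # older entries fall out automatically
        self.stored = 0

    def load(self, resource):
        """Return True if a new copy of the resource was stored."""
        if resource in self.recent:
            return False  # match found in the comparison window
        self.recent.append(resource)
        self.stored += 1
        return True


store = ResourceStore(window=2)
results = [store.load(r) for r in (b"A", b"B", b"A", b"C", b"A")]
```

In this run, resource A is stored a second time because it was pushed out of the comparison window before it was loaded again.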
When the ARSLOAD program processes a resource group file, it checks the resource identifier
to determine whether the resource is present on the system.
If the storage node identifies a client node in OAM or Virtual Storage Access Method (VSAM),
the storage manager copies the resources to archive storage.
Document Data
For Document Data, the following selections are valid:
Yes for Cache Data: You can cache document data and resource data or only resource
data.
No Cache: Document data is not stored in cache.
Cache Document Data for xxx Days: Document data is stored in cache for xxx number of
days before the data expires.
Four typical lifecycle scenarios are common. The Content Manager OnDemand administrator
selects the scenario to implement through various parameters (as shown in this section),
which are on the Storage Management tab of the Application Group window. The four
scenarios are illustrated in Figure 10-2 on page 222.
Migration of indexes
This configuration is set up by clicking Advanced on the Storage Management tab of the
Application Group window.
This field determines when Content Manager OnDemand migrates index data to archive
storage. Choose from No Migration or Migrate After __ Days. As a preferred practice, do not
migrate indexes to archive storage. Indexes that are migrated cannot be searched until after
they are imported by an administrator. Use this capability only under limited circumstances.
Closing index tables: Before you can migrate index data, the index tables must be closed.
The following Database Organization field options are valid:
The Single Load per table option for the Database Organization field is no longer
supported.
If the Database Organization field for the application group is set to Multiple Loads per
table, the index table is closed when the Maximum Rows value is reached.
The Single table for all loads option is available for Content Manager OnDemand for
z/OS and Content Manager OnDemand for IBM i. Select the Single table for all loads
check box if you want to create one database table for each application group. This
option is most frequently used when you load a small amount of data. If you select this
option, the Maximum Rows field in this window is removed.
To close a table to loading before the Maximum Rows value is reached, run the ARSTBLSP
program with the -a1 parameter.
Migrate index data only after users no longer need to access the data. If a user
must access data in the migrated tables, the process of importing the data into the database
requires administrator intervention, and usually results in a significant delay in completing the
query. Additional space is required in the database and temporary storage areas to import the
data.
To enable the migration of index data, you must define a storage set that identifies a storage
node that is maintained by ASM and update the System Migration application group to use
the storage set.
Note: If you plan to maintain application group data in archive storage, the length of
time that ASM maintains the data must be equal to or exceed the value that you specify
for the Life of Data and Indexes field.
Life of Data and Indexes can be used only if ARSMAINT (with Multiplatforms or z/OS) or
Disk Storage Management (DSM) (with IBM i) handles the expiration.
Expiration type
The document expiration type determines how data is deleted from the application group. The
expiration type option is on the Storage Management tab of the Application Group window.
Note: The application group must have an expiration type of Load if any of the following
circumstances are true:
You use or plan to use the Enhanced Retention Management feature.
You use or plan to use the full text search feature.
You use or plan to integrate with FileNet P8.
For application groups with expiration types of Document, Segment, or Storage Manager,
utilities exist to convert these application groups to Load.
With Content Manager OnDemand for Multiplatforms or z/OS, when the expiration type is set
to Load, your object server is on z/OS, and your storage manager is OAM, you can allow
OAM to handle the data expiration and Content Manager OnDemand to handle the index
expiration by using the ARSEXOAM program.
For more information about how to configure the system to use the ARSEXPIR and ARSEXOAM
programs, see the IBM Content Manager OnDemand for z/OS Administration Guide:
http://www.ibm.com/support/knowledgecenter/SSQHWE_9.0.0/com.ibm.ondemand.administeringzos.doc/aboutpub.htm?cp=SSQHWE_9.0.0%2F7-0
Storage Manager expiration is supported only on Content Manager OnDemand for z/OS
systems.
With Multiple Loads per Database Table enabled, the system uses the maximum number of
rows to determine when to close a table. A segment likely contains the data from more than
one input file. If the Maximum Rows setting is too large, the segment is not expired until all of
the documents in the table reach their expiration dates. If the Maximum Rows setting is too
small, segments are created constantly and potentially deleted (based on the expiration
date). This large number of tables imposes a performance impact during the search query
time and expiration time.
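The effect of the Maximum Rows setting on segment expiration can be illustrated with a small Python sketch. It assumes only what the text states, namely that a table is dropped when every document in it has reached its expiration date. The function name and the sample dates are hypothetical.

```python
from datetime import date


def segment_expiration_date(document_expirations):
    """A segment (table) can be dropped only on the latest expiration date
    of the documents that it contains."""
    return max(document_expirations)


# A large Maximum Rows value packs loads from many months into one table.
large_table = [date(2024, 1, 31), date(2024, 6, 30), date(2024, 12, 31)]
# A small Maximum Rows value yields one table per load in this example.
small_tables = [[d] for d in large_table]

large_table_drop = segment_expiration_date(large_table)
small_table_drops = [segment_expiration_date(t) for t in small_tables]
```

The single large table holds the January data until the end of December, whereas the small tables expire one by one at the cost of more tables to search and maintain.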
The system derives the expiration date from the Segment field (or the date that the data was
loaded, if there is no Segment field) and the Life of Data and Indexes field. If the Segment
field contains a date in the MMYY format, data is eligible to be deleted on the first day of the
month (MM).
Performance note: Individual document deletion is the most costly type of deletion in
terms of processor consumption and run time.
When you use the Enhanced Retention Management feature with Content Manager
OnDemand or IBM Enterprise Records (formerly IBM FileNet Records Manager), Content
Manager OnDemand must be in complete control of expiration processing. Therefore, if you
are using Tivoli Storage Manager or OAM, you must disable the ability for either of these
storage managers to expire data.
Also, you can use Enhanced Retention Management and Content Federation Services for
Content Manager OnDemand only with application groups with an expiration type of Load. For
those application groups with expiration types of Document, Segment, or Storage Manager,
utilities exist to convert these application groups to an expiration type of Load.
If you choose not to take advantage of the ability of Content Manager OnDemand to
aggregate documents into 10 MB storage objects, this decision might result in millions of
small objects that are stored in your storage manager, which might cause the storage
manager to experience performance problems when it migrates these small objects to tape.
Note: Consider aggregating these smaller objects into larger objects for performance
reasons.
Aggregating these small objects into larger objects after they are stored individually
requires that you retrieve and reload them as larger objects. You might want to
engage IBM Lab Services to assist you with this task.
Another option is to not migrate objects to tape, but to use another random access hardware
device instead.
You typically run the ARSMAINT program on a regular schedule to perform the following tasks:
Migrate files from cache storage to archive storage.
Delete files from cache storage.
Optionally, migrate index data from the database to archive storage.
Delete index data from the database.
The ARSMAINT program manages both the application group data and the data that is stored
in cache. It does so by using the storage management values from the
application groups that are defined to the system.
Here are the storage management field values that are used:
Life of Data and Indexes
Length of Time to Cache Data on Magnetic
Length of Time Before Copying Cache to Archive Media
Length of Time Before Migrating Indexes to Archive Media
Length of Time to Maintain Imported Migrated Indexes
Expiration Type
Additionally, you can start manual expiration processing by running the ARSMAINT program
from the command line. For example, to run expiration processing, run the following
command at the command line:
arsmaint -d
When the ARSMAINT program removes indexes, it saves the following message in the system
log:
“128 ApplGrp Segment Expire (ApplGrp) (Segment)”
One message is saved in the system log for each table that was dropped during expiration
processing.
The relationship between ARSMAINT and ARSSOCKD processing is illustrated in Figure 10-3.
(Figure 10-3 labels: Life of Data and Indexes settings, ARSMAINT, ARSSOCKD.)
Collecting statistics
Content Manager OnDemand provides two programs to collect statistics on database tables:
the ARSDB program and the ARSMAINT program.
When you run the ARSMAINT program to collect statistics, it collects statistics on all of the
tables in the database that changed since the last time that you collected statistics. You can
automate the collection of statistics by scheduling the ARSMAINT program to run with the
appropriate options.
You can use the ARSDB program to collect statistics on the Content Manager OnDemand
system tables. The Content Manager OnDemand system tables include the user table, the
group table, and the application group table. For most systems, the Content Manager
OnDemand system tables require little maintenance. You can probably schedule the ARSDB
program to collect statistics once a month (or less often).
The number of messages that are saved in the system log each time that expiration
processing runs depends on the following factors:
The options that you specify for the ARSMAINT program
The number of application groups that is processed
The number of segments of data that is processed
The number of cache storage file systems that are defined on the server
For example, when expiration processing starts on a specified server, you might see the
following message:
“109 Cache Expiration (Date) (Min%) (Max%) (Server)”
Migration processing uses the specified date (the default is “today” in internal format).
Expiration processing begins on each cache file system that exceeds the Max% (default 80%)
and ends when the used space in the file system falls below the Min% (default
80%).
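The threshold behavior can be modeled with the following Python sketch. It is an illustration under assumptions, not the ARSMAINT logic itself: the function and parameter names are hypothetical, both Max% and Min% are assumed to be measured as the percentage of space that is used in the file system, and eligible objects are simply deleted in the order given.

```python
def run_cache_expiration(capacity, used, eligible_sizes, max_pct=80, min_pct=80):
    """Return (space_still_used, number_of_objects_deleted) for one file system."""
    deleted = 0
    if 100 * used / capacity <= max_pct:
        return used, deleted  # Max% not exceeded: expiration does not start
    for size in eligible_sizes:
        if 100 * used / capacity < min_pct:
            break  # usage fell below Min%: expiration ends
        used -= size
        deleted += 1
    return used, deleted


# File system at 90% usage with four eligible objects of 50 units each.
busy = run_cache_expiration(capacity=1000, used=900, eligible_sizes=[50, 50, 50, 50])
# File system at 70% usage: below the default Max%, so nothing is deleted.
idle = run_cache_expiration(capacity=1000, used=700, eligible_sizes=[50, 50])
```

In the first call, three objects are deleted before usage drops to 75%, which is below the default Min% of 80%.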
One of these messages shows for each storage object that is deleted from cache storage. A
storage object is eligible to be deleted when its “Cache Document Data for n Days” or “Life of
Data” period passes (whichever occurs first).
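The "whichever occurs first" rule amounts to taking the shorter of the two periods. A minimal Python sketch, assuming that both periods are counted in days from the load date (the function name and the sample values are hypothetical):

```python
from datetime import date, timedelta


def cache_delete_date(load_date, cache_days, life_days):
    """Date on which a storage object becomes eligible for deletion from cache."""
    return load_date + timedelta(days=min(cache_days, life_days))


# Cached for 90 days, kept for about seven years: the cache period ends first.
eligible_on = cache_delete_date(date(2024, 1, 1), cache_days=90, life_days=2555)
```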
Also, information-only messages report the percentage of space that is used in the file
system.
A management class contains an archive copy group that specifies the criteria that makes a
document eligible for deletion. Documents become eligible for deletion under the following
conditions:
Administrators delete documents from client nodes
An archived document exceeds the time criteria in the archive copy group (how long
archived copies are kept)
ASM does not delete information about expired documents from its database until expiration
processing runs. You can run expiration processing either automatically or manually by
command. Ensure that expiration processing runs periodically to allow ASM to reuse storage
pool space that is occupied by expired documents.
When expiration processing runs, ASM deletes documents from its database. The storage
space that these documents used to occupy then becomes reclaimable. For more
information, see “Reclaiming space in storage pools” on page 233.
You can obtain more information in the “Running expiration processing automatically” section
at the following website:
http://ibm.co/1iO9SdX
If you use the server option to control when expiration processing occurs, ASM processes
expirations each time that you start the server. Afterward, it runs expiration processing at the
interval that you specified with the option, which is measured from the start time of the server.
You can manually start expiration processing by running the EXPIRE INVENTORY command.
Expiration processing then deletes information about expired files from the database. You can
schedule this command by running the DEFINE SCHEDULE command. If you schedule the
EXPIRE INVENTORY command, set the expiration interval to 0 (zero) in the server options so
that ASM does not run expiration processing when you start the server. You can control how
long the expiration process runs by using the DURATION parameter with the EXPIRE INVENTORY
command.
ASM reclaims the space in storage pools based on a reclamation threshold that you can set
for each storage pool. When the percentage of space that can be reclaimed on a volume rises
above the reclamation threshold, ASM reclaims the volume. ASM rewrites documents on the
volume to other volumes in the storage pool, making the original volume available for new
documents.
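The selection of volumes for reclamation can be sketched as follows. This Python fragment models only the threshold comparison that is described above. The function name, the volume names, and the percentages are hypothetical, and the storage manager's real scheduling and data movement are not represented.

```python
def volumes_to_reclaim(reclaimable_pct_by_volume, threshold_pct):
    """Return the volumes whose reclaimable space has risen above the threshold."""
    return sorted(
        name
        for name, pct in reclaimable_pct_by_volume.items()
        if pct > threshold_pct
    )


pool = {"VOL001": 75, "VOL002": 60, "VOL003": 40}
selected = volumes_to_reclaim(pool, threshold_pct=60)
```

Only VOL001 is selected here, because a volume that sits exactly at the threshold has not yet risen above it.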
ASM checks whether reclamation is needed at least once each hour and begins space
reclamation for eligible volumes. You can set a reclamation threshold for each storage pool
when you define or update the storage pool.
During reclamation, ASM copies the files to volumes in the same storage pool unless you
specified a reclamation storage pool. Use a reclamation storage pool to allow automatic
reclamation for a storage pool with only one drive. See your ASM documentation for details.
After ASM moves all documents to other volumes, one of the following actions occurs for the
reclaimed volume:
If you explicitly defined the volume to the storage pool, the volume becomes available for
reuse by that storage pool.
If the volume was acquired as a scratch volume, ASM deletes the volume from its
database.
For instructions about documentation that you might need to complete when you remove
storage volumes from a library and where to store them for safekeeping, see your
organization’s media storage guide.
Important notes:
Data retention protection is permanent. After it is turned on, it cannot be turned off.
Content Manager OnDemand does not support the deletion hold feature, which
prevents held data from being deleted until the hold is released.
If you decide to use these policies in Tivoli Storage Manager, the Content Manager
OnDemand scenarios that are described in the rest of this section are supported.
Recommendations
Consider the following preferred practices when you work with data retention protection:
Set up the application groups to expire by load.
Define the Tivoli Storage Manager archive copy groups to be event-based, and retain data
for 0 days.
Run the Tivoli Storage Manager inventory expiration regularly to ensure that expired data
is removed.
ARSEXOAM
The ARSEXOAM program is used to process the rows in the ARSOAM_DELETE table that
indicate that Content Manager OnDemand OAM objects expired and to remove the
associated table entries for those objects. This program works for z/OS only.
Figure 10-4 shows how the ARSEXOAM program deletes the index entries for object stores in
OAM.
Figure 10-4 How ARSEXOAM deletes index entries for object stores in OAM
ARSEXPIR
The ARSEXPIR program can be used to process System Management Facility (SMF) records
that indicate that Content Manager OnDemand objects expired and to remove the associated
index entries for those objects.
Figure 10-5 on page 238 illustrates two methods that the ARSEXPIR program uses to expire
OAM and VSAM objects.
The ARSEXPIR program uses SMF type 65 (for VSAM objects) or SMF type 85 (for OAM
objects). The installation must collect and install ARSSMFWR as the CBRHADUX OAM
auto-delete exit. For more information, see “Deleting OAM and VSAM Objects” in the IBM
Content Manager OnDemand for z/OS: Administration Guide, SC19-1213.
ARSSMFWR determines which objects were deleted. The ARSEXPIR program then instructs the
Content Manager OnDemand server to remove the index entries.
Notes:
If one object for a load ID is deleted, all of the index entries for that load ID are deleted.
Index entries of all objects that are recorded as being deleted by the SMF records are
deleted regardless of the settings in the Life of Data and Indexes section on the Storage
Management tab of the application group. If you want to use Storage Management
expiration, ensure that you set the expiration types of all application groups to Storage
Manager.
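The first note has a consequence that is easy to overlook: one deleted object removes the index entries of the entire load. The following Python sketch models that rule with hypothetical data structures (tuples of load ID and object name). It does not reflect the actual ARSEXPIR or database implementation.

```python
def remove_expired_index_entries(index_rows, deleted_objects):
    """index_rows: list of (load_id, object_name) tuples.
    Drop every row whose load ID has at least one deleted object."""
    expired_loads = {
        load_id for load_id, obj in index_rows if obj in deleted_objects
    }
    return [row for row in index_rows if row[0] not in expired_loads]


rows = [("LOAD1", "OBJ_A"), ("LOAD1", "OBJ_B"), ("LOAD2", "OBJ_C")]
remaining = remove_expired_index_entries(rows, deleted_objects={"OBJ_A"})
```

Although only OBJ_A was reported as deleted, the entry for OBJ_B is removed as well because it belongs to the same load ID.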
If you do not run DSM, your disk storage requirements for Content Manager OnDemand
might be higher than expected. The number of objects that are stored in the integrated file
system (IFS) might also be higher than necessary, which results in longer save and restore
times.
Note: If you have never run DSM, the first execution of the Start Disk Storage Management
(STRDSMOND) command might last for an extended period.
If you want to configure Content Manager OnDemand so that DSM is not required in the
future, see the section “Eliminating the need to run Disk Storage Manager (DSM)” in the latest
Content Manager OnDemand for i Common Server Administration Guide, SC19-2792.
If you do not run ASM, your disk storage requirements for Content Manager OnDemand are
probably higher than expected. The number of objects that are stored in the IFS is also higher
than necessary, which results in longer save and restore times.
If you have never run ASM, the first execution of the Start Archived Storage Management
(STRASMOND) command or the Start Disk Storage Management (STRDSMOND) command with
the STRASMOND parameter set to YES might last for an extended period.
For more information about expiring archives by using ASM, see Expiration processing in
Common Server Archive Storage Manager (ASM):
http://www.ibm.com/support/docview.wss?uid=swg21317082