IBM FileNet Content Manager Implementation Best Practices and Recommendations
Wei-Dong Zhu
Bert Bukvarevic
Bill Carpenter
Axel Dreher
Chuck Fay
Ruth Hildebrand-Lund
Elizabeth Koumpan
Sridhar Satuloori
Michael Seaman
Dimitris Tzouvelis
ibm.com/redbooks
SG24-7547-01
Note: Before using this information and the product it supports, read the information in
Notices on page xiii.
Copyright International Business Machines Corporation 2008, 2013. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . xix
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Summary of changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
June 2013, Second Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi
Chapter 1. Introduction to IBM FileNet Content Manager . . . . . . . . . . . . . . 1
1.1 Industry challenges and IBM solutions benefits . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Industry challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Information lifecycle governance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Benefits of IBM ECM solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 IBM FileNet P8 Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Platform components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Enterprise capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 IBM FileNet Content Manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Basic capabilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 Enterprise foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 IBM FileNet P8 and related products . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.1 Content products. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.2 Ingestion products. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.3 Process products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.4 Compliance products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.5 Collaboration products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Chapter 2. Solution examples and design methodology. . . . . . . . . . . . . . 17
2.1 P8 Content Manager sample solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Policy document creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Processing insurance claims. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.3 Archiving SAP invoices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.4 Email capture for compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.5 Knowledge management through collaboration . . . . . . . . . . . . . . . . 27
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results
obtained in other operating environments may vary significantly. Some measurements may have been made
on development-level systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. These and other IBM trademarked
terms are marked on their first occurrence in this information with the appropriate symbol (® or ™),
indicating US registered or common law trademarks owned by IBM at the time this information was
published. Such trademarks may also be registered or common law trademarks in other countries. A current
list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
AIX
ClearCase
Cognos
DB2
developerWorks
Domino
Enterprise Storage Server
FileNet
Global Business Services
GPFS
IBM
InfoSphere
Lotus Notes
Lotus
Notes
OpenPower
Optim
PowerHA
pSeries
pureScale
Rational
Redbooks
Redbooks (logo)
System p
SystemMirror
Tivoli Enterprise Console
Tivoli
WebSphere
Preface
IBM FileNet Content Manager Version 5.2 provides full content lifecycle and
extensive document management capabilities for digital content. IBM FileNet
Content Manager is tightly integrated with the family of IBM FileNet products
based on IBM FileNet P8 Platform. IBM FileNet Content Manager serves as the
core content management, security management, and storage management
engine for the products.
This IBM Redbooks publication covers the implementation best practices and
recommendations for solutions that use IBM FileNet Content Manager. It
introduces the functions and features of IBM FileNet Content Manager, common
use cases of the product, and a design methodology that provides
implementation guidance from requirements analysis through production use of
the solution. We address administrative topics of an IBM FileNet Content
Manager solution, including deployment, system administration and
maintenance, and troubleshooting.
Implementation topics include system architecture design with various options for
scaling an IBM FileNet Content Manager system, capacity planning, repository
design and logical structure, security practices, and application design.
An important implementation topic is business continuity. We define business
continuity, high availability, and disaster recovery concepts and describe options
for those when implementing IBM FileNet Content Manager solutions.
Many solutions are essentially a combination of information input (ingestion),
storage, information processing, and presentation and delivery. We discuss
some solution building blocks that designers can combine to build an IBM FileNet
Content Manager solution.
This book is intended to be used in conjunction with product manuals and online
help to provide guidance to architects and designers about implementing IBM
FileNet Content Manager solutions.
Many of the features and practices described in the book also apply to previous
versions of IBM FileNet Content Manager.
Product name changes: New for Version 5.2, IBM FileNet Business Process
Manager has been renamed to IBM Case Foundation.
Authors
This book was produced by a team of specialists from around the world working
at the IBM Software Development Lab in Costa Mesa, California.
Wei-Dong Zhu (Jackie) is an Enterprise Content Management (ECM) Project
Leader with IBM in Los Angeles, California. She has more than 10 years of
software development experience in accounting, image workflow processing, and
digital media distribution. Jackie holds a Masters of Science degree in Computer
Science from the University of Southern California. Jackie joined IBM in
1996. She is a Certified Solution Designer for IBM Content Manager and has
managed and led the production of many Enterprise Content Management IBM
Redbooks publications.
Bert Bukvarevic is an IT Specialist for ECM with IBM in Germany. He has 10
years of experience with the ECM platform and has worked at IBM for six years.
His expertise is T-shaped: deep knowledge of the IBM FileNet P8 Content
Platform combined with the breadth to collaborate across different technologies.
He has written extensively about deployment, upgrades, and migration.
Bill Carpenter is an ECM Architect with IBM in the Seattle area. Bill has had
experience in ECM since 1998 as a developer, development manager, and as an
architect. He is the author of the book Getting Started with IBM FileNet P8
Content Manager. He is also co-author of the first edition of this book and
Developing Applications with IBM FileNet P8 APIs, a contributing author for IBM
developerWorks, and a frequent conference presenter. He has experience in
building large software systems at Fortune 50 companies and has also served as
the CTO of an Internet start-up. He has been a frequent mailing list and patch
contributor to several open source projects. Bill holds degrees in Mathematics
and Computer Science from Rensselaer Polytechnic Institute in Troy, New York.
Axel Dreher is a Managing Consultant working as an ECM Architect and Project
Leader with IBM in Villingen-Schwenningen, Germany. He has more than nine
years of experience in designing and implementing high-performance,
high-volume, and high-availability solutions around the IBM FileNet product suite
and has worked at IBM for five years. Axel studied media and computer science
and graduated in Computer Science with a Diplom-Informatiker degree from the
Fachhochschule Furtwangen, University of Applied Sciences in Germany. He is
an IBM Certified Specialist for IBM FileNet Content Manager and IBM Case
Foundation. Axel specializes in infrastructure implementation and
troubleshooting for IBM FileNet P8 solutions.
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about
this book or other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
http://www.ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Summary of changes
This section describes the technical changes made in this edition of the book and
in previous editions. This edition might also include minor corrections and
editorial changes that are not identified.
Summary of Changes
for SG24-7547-01
for IBM FileNet Content Manager Implementation Best Practices and
Recommendations
as created or updated on June 7, 2013.
Chapter 1.
Introduction to IBM FileNet Content Manager
[Figure: ECM capabilities by value of content, from basic content capture to sophisticated use: Core Content Services (capturing digitized content; electronic content search and retrieval; simple content lifecycle and records management), Business Process (content-centric workflow management; business process management; enterprise process automation), and Analytics and Optimization (synthesized business information data marts with unstructured content; BI content analytics; unstructured content semantic correlations; predictive analysis; enterprise process optimization), with Integration and Federation spanning all levels.]
These components address enterprise content management and are (except for
ECM Collaboration Services and FileNet Content Federation Services)
described in Chapter 3, System architecture on page 37. Some components
are included in the base platform offering and some require additional licensing.
All IBM P8 Platform capabilities are inherited by IBM ECM solutions and form
their foundation. Additional components can be added to a system to enable
additional capabilities.
The IBM P8 Platform capabilities can be leveraged for a wide range of
enterprise-scalable solutions, including case management, automated content
collection using IBM Content Collector and IBM Datacap, Image Manager,
IBM Enterprise Records, the Information Lifecycle Governance (ILG) portfolio
of products, and more. Figure 1-2 illustrates the interaction of some of the
ECM portfolio of products with P8 Content Manager.
[Figure 1-2: ECM portfolio products that interact with P8 Content Manager: Image Manager, Datacap, Content Federation Services, Content Collector for Email, Content Collector for File Shares, Content Classification Module, Case Manager, and Enterprise Records.]
Scalable architecture
Content Platform Engine achieves excellent performance rates with a scalable
architecture, offering both vertical and horizontal scalability solutions.
The Content Platform Engine can be farmed (scaled horizontally) or scaled
vertically, either by running multiple instances on a single server or by
configuring a single instance to use more of that server's resources.
Multiple servers can be added in load-balanced configurations to handle
increasing transaction loads. This architecture makes the Content Platform
Engine an ideal candidate for large corporations, government agencies, or any
client with large information management requirements.
A document lifecycle consists of a sequence of steps, with each step
representing an event or process that acts on the object content.
P8 Content Manager features event action scripts that can be triggered when
objects are created, modified, or deleted in the repository. Event actions can
launch workflows or execute Java applications. Events, and the actions that they
trigger, are the mechanism that enables active content.
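To make the event mechanism concrete, the following is a minimal, hedged sketch of a synchronous event handler written against the Content Engine Java API. The handler class and the audit logic are hypothetical illustrations; consult the API reference for the exact interfaces and for how to package and register handlers.

import com.filenet.api.engine.EventActionHandler;
import com.filenet.api.events.ObjectChangeEvent;
import com.filenet.api.exception.EngineRuntimeException;
import com.filenet.api.util.Id;

// Registered through an event action and a subscription on a class or an
// instance; the Content Platform Engine calls onEvent when a subscribed
// event (for example, creation, update, or deletion) fires.
public class AuditEventHandler implements EventActionHandler {
    public void onEvent(ObjectChangeEvent event, Id subscriptionId)
            throws EngineRuntimeException {
        // Identify the object that raised the event.
        Id sourceId = event.get_SourceObjectId();
        System.out.println("Event " + event.getClassName()
                + " fired for object " + sourceId
                + " (subscription " + subscriptionId + ")");
        // A real handler might update properties, launch a workflow, or
        // call an external service here.
    }
}

A handler such as this is packaged as a code module, associated with an event action, and bound to the target class or object through a subscription.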
When designing an ECM solution, document retention can be addressed and
implemented by using the retention management features, including the sweep
framework and event-based retention.
IBM Content Navigator is a ready-to-use, standards-based user interface that
delivers intelligence and control of collaborative content across the organization.
Invoice processing
A major challenge for the accounts payable process is the ability to process a
large number of paper-based invoices and place them into the system for
tracking and control of invoice processing.
The solution must enable the processing of invoice transactions using a
structured, automated process. This process should secure content associated
with each invoice, providing stakeholders with visibility into the process steps. A
content management platform integrated with an accounts payable process also
enables the continuous improvement of internal business operations over time by
managing each payment case and its associated content, and by reporting on the
efficiency of the business process.
The solution includes the ability to extract invoice data from paper invoice
documents during scanning and indexing, apply advanced lookups against
business applications and systems of record, and create an invoice in the
Accounts Payable application. A robust document capture solution, such as that
provided by Datacap, supports the required high volume ingestion rates. Within
the process, the invoice is used to support day-to-day activities, storing electronic
copies of the documents as records of evidence.
Users (whether internal or external) are able to perform work from within the data
processing application. The content associated with the data systems is stored in
the repository.
The solution provides the following capabilities:
Enables the ingestion of both scanned paper and digital content, whether
submitted by internal or external users, and the automatic classification of the
content.
There are many IBM products that integrate with P8 Content Manager. The
products can be grouped into the following categories:
Content products
Enabling companies to activate content with processes to add value and
transform their business.
Ingestion products
Used to add documents into a repository by copying, moving, or linking to
source documents, as well as index and classify the content.
Process products
Automate and optimize complex processes across the enterprise using
effective content and compliance.
Compliance products
Ensure content is kept as required and deleted when no longer needed.
Collaboration products
Provide an environment where people can work together to achieve business
goals and streamline processes.
There are many content products in the IBM ECM portfolio. It is beyond the
scope of this book to introduce all of them. However, to help you better
understand what these products can do for your corporation, we briefly introduce
several of them here.
IBM Content Collector
IBM Content Collector captures content from a variety of sources, including:
Email servers
File shares
Microsoft SharePoint servers
SAP
Based on the configuration of the connector to the ingestion source, IBM Content
Collector, in addition to adding the content to a P8 Content Manager repository,
also performs other tasks:
Creating stubs to the document in a manner transparent to existing client
applications
Declaring content as records
Deduplicating content, which is especially important for email attachments
IBM Datacap
IBM Datacap is a scanning and indexing solution that is integrated with Content
Platform Engine. This product features a complete suite of document indexing
capabilities, including automated document identification and text recognition.
Datacap automates the input of data from documents to reduce cost and
accelerate document process efficiencies.
To help you better understand what these products can do for your corporation,
we briefly introduce them here.
IBM Forms
IBM Forms provides an electronic alternative to paper forms. With an easy to use
visual design environment, the electronic forms support easy data entry, can be
used to perform calculations, and can be integrated into business processes.
To help you better understand what these products can do for your corporation,
we briefly introduce a few of them here.
IBM Enterprise Records utilizes unique Zero Click technology to reduce the
burden and costs associated with proper management of an organization's
records, and integrates directly with Content Platform Engine to allow repository
objects to be managed as records.
Use IBM Enterprise Records with the other compliance, process, and ingestion
tools to provide an end-to-end records management solution that covers the
capture, management, and destruction of records.
1.5 Conclusion
The unified content and process architecture of Content Platform Engine
provides inherent capabilities to integrate content with applications to provide
integrated support for data transactions and to coordinate business processes.
IBM core ECM provides the foundation for delivering robust solutions across the
enterprise.
In the next chapter, we provide detailed examples illustrating the use of P8
Content Manager as part of an ECM solution.
Chapter 2.
Solution examples and design methodology
workers must always have access to the current policy information. The following
flow describes the events of this use case:
1. The author creates a new document for revision and assigns a minor version
number to the document.
2. The reviewers and authors collaborate on the document, adding minor
versions as they revise it.
3. The approver approves the final revision of the document and checks in the
document as a major version.
4. The major version is now available to all users based on their permissions,
and security is applied to the document as the final approved version.
This P8 Content Manager solution is implemented with the following features:
Content versioning, including major and minor versions
Document access controlled by role-based permissions and dependency on
document lifecycle status
Check-in and check-out capability
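To make the check-in and check-out capability concrete, the following is a minimal, hedged sketch that uses the Content Engine Java API; the object store and document path are placeholders, and the exact method signatures are in the API reference.

import com.filenet.api.constants.AutoClassify;
import com.filenet.api.constants.CheckinType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.constants.ReservationType;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;

public class PolicyRevision {
    // Checks out a policy document and checks it back in as a new minor
    // (draft) version; checking in as a major version releases it.
    public static void reviseDocument(ObjectStore os, String docPath) {
        Document doc = Factory.Document.fetchInstance(os, docPath, null);

        // Create an exclusive reservation for editing.
        doc.checkout(ReservationType.EXCLUSIVE, null, null, null);
        doc.save(RefreshMode.REFRESH);

        // The reservation object receives the edited content and metadata.
        Document reservation = (Document) doc.get_Reservation();
        // ... update content elements or properties on the reservation ...

        // MINOR_VERSION keeps the document in the draft state; the
        // approver checks in with MAJOR_VERSION for the approved release.
        reservation.checkin(AutoClassify.DO_NOT_AUTO_CLASSIFY,
                CheckinType.MINOR_VERSION);
        reservation.save(RefreshMode.REFRESH);
    }
}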
Figure 2-1 illustrates the implemented document revision and approval process
using P8 Content Manager.
[Figure 2-1: The document revision and approval process. The author creates a document for revision; minor versions (0.1, 0.2) are revised in the repository until check-in produces major version 1.0. The new version supersedes the older; all versions are retained in the repository.]
Solution description
This solution uses P8 Content Manager features without additional programming.
In the design, policy documents are stored in the repository where they are
available to all users for reference.
Each policy document goes through a document lifecycle with multiple states. In
this implementation, the states are minor and major versions. A minor version is
a draft document; a major version is a completed document that has been
approved and released. A security policy is implemented to define the security
that applies to documents in the major version state and those in the minor
version state. Minor versions can only be viewed and modified by authors,
reviewers, and approvers. They are invisible to general users. All users can view
major versions, but only authors, reviewers, and managers can modify them.
For simplification and to reflect the majority of actual solutions, this sample
solution does not include document retention. When implementing a document
revision solution for your environment, you must address your document
retention requirements and include them in your solution implementation as
necessary.
Figure 2-2 illustrates the implemented insurance claim processing solution using
P8 Content Manager.
[Figure 2-2: The insurance claim processing solution. Insurance claim documents arrive through scanning, faxing, email, and uploads from external sources; users access the claim file and documents. Notifications trigger a new task in claim processing, the creation of a records folder, the closing of the claim folder, and an alert when the claim is ready for disposition. Documents, policy data, and records are spread across the FPOS and ROS object stores and are connected through CFS.]
Solution description
In this solution, the active content is the insurance claims. The content is moved
through a business process in a series of steps implemented in the LOB
application (claim processing application).
From the claim processing application, a records folder is opened in the ECM
system for each new claim. IBM Datacap converts the paper documents received
from the client to electronic documents and stores them into the ECM repository.
Datacap also extracts the metadata from the documents using ICR and
automatically indexes the documents without user interaction. A record is
declared for the documents that are stored in the ECM repository and the
retention period is set based on the business requirements. The LOB application
utilizes the features of the ECM system through the integration capabilities of the
P8 Content Manager. The documents can be retrieved and viewed from different
sources that are transparent to the business user.
The solution highlights the following components and capabilities:
IBM Datacap
IBM Content Collector for SAP
P8 Content Manager security
P8 Content Manager server farm
High performance search operation (load balancer)
Scalability
[Figure 2-3: Archiving SAP invoices. Scanners and scanner workstations capture documents; recognition and verification workstations (OCR/ICR/OMR) classify each document, extract and look up data, verify it, and create the invoice; the document is then released to the FileNet repository and a record is created. IBM Content Navigator is used to search for, retrieve, and manage the documents, and a native-format viewer is used to view and mark up a document.]
Solution description
This solution utilizes the following P8 Content Manager components and
capabilities: IBM Datacap Invoice Capture, IBM Content Collector for SAP, and
server farms.
Server farms
For applications with high-volume loads, P8 Content Manager can be configured
as a server farm. A server farm employs multiple servers to multiply processing
power. In this solution, three P8 Content Manager servers are deployed to
spread the document processing load across three separate P8 Content
Manager servers. A load balancer spreads the incoming load evenly so that even
a high ingestion rate, the load does not overload a single server. In a similar
fashion, searches and document retrieval requests are managed by a load
balancer on the call center side.
Server farms can also be configured in highly available configurations. Refer to
Chapter 3, System architecture on page 37 for more details.
[Figure 2-4: Email capture for compliance. Collection rules are applied to the email server inbox; the Email Manager server works with Content Manager and Records Manager to file captured messages into a records file plan. Effective email management involves declaring email content as business records.]
Solution description
This solution uses IBM Content Collector to monitor an Exchange Server Journal
(IBM Lotus Notes and other email systems are also supported). The journal
contains a copy of all incoming and outgoing messages. IBM Content Collector
for Email monitors the journal and searches for messages that meet a set of
conditions or rules. Common conditions include:
Messages that contain particular keywords
Messages to or from a particular set of addresses
Messages that pertain to compliance issues raised by the legal department
Messages that meet the set of conditions or rules are treated this way:
The message is captured and added to P8 Content Manager.
Duplicates of the message (if the message was sent to multiple recipients)
are identified. Only one copy is added to the repository.
Selected messages based on business rules (for example, emails to a
specific address or containing a combination of keywords on the subject or
mail body) are classified and declared as an official record subject to legal
retention rules.
In the user's mailbox, the message is replaced by a stub. When the user clicks
the stub, the message is retrieved from the repository and displayed in
Outlook as expected.
When the records retention period expires, the content is destroyed with no
ability to restore it.
[Figure 2-5: Knowledge management through collaboration. The author creates content and publishes it to the repository; the document is published on an IBM Connections community, through IBM Connections Files, for SME review. After review, minor version 0.2 becomes major version 1.0 and the document is published on an IBM Connections community for trainees; community members collaborate over the training material.]
Solution description
In this solution, the content that is managed is the training material of the
organization. Training material can include documents, presentations, video, and
audio. Content authors and SMEs need to collaborate during material
preparation. Trainees need to follow the training material, put in comments, and
view activity streams, recommendations, and the number of downloads.
In the first step, content is stored in Content Manager along with metadata and is
available through the IBM Connections community for review by the SMEs. Using
IBM Connections, a SME can add comments, put in recommendations, monitor
the activity stream over the content, and upload new versions in the Content
Manager repository. When the review process is completed, the finalized content
is available on the user community.
Using IBM Connections, a trainee can view new content, add tags, comments,
and recommendations, and follow the content. Content Manager provides activity
feeds, recommendation counts, download counts, and tag searching, enabling
users to locate training content and find relevant content in their social network.
The ECM strategy will establish a common picture of the ECM future that clarifies
the business units' directions, aligns actions to the correct directions, and
coordinates various initiatives. The organization needs to have a clear
understanding and establish an ECM vision to pursue core ECM elements and
any additional advanced capabilities.
In our experience, the following elements of the ECM strategy are key:
ECM standards
Security standards
Content taxonomy and classification
Content inventory
Change management plan for the adoption of the ECM system
ECM requires a change and people management approach that will support
every business group and requires a managed business process change and
large-scale adoption of the technical solutions. The fundamental challenge of
rolling out an ECM system is that it requires the active involvement and
participation of all users.
An effective ECM strategy eliminates information silos, enables the foundational
ECM elements, such as the enforcement of the information governance and
support for business requirements, and enables the integration and automation
of business processes.
Typically, requirements gathering is an iterative process. You will start with the
functional design, revisit the requirements, complete the requirements analysis,
and then revisit the functional design. This process continues until you feel
confident that all known requirements have been identified and addressed.
The enterprise content management illustration in Figure 2-6 helps in structuring
the discussion among the various people involved in understanding what
functionality the solution requires to solve specific business problems.
[Figure 2-6: A simple input and output diagram to assess functional requirements. Content ingestion tools (paper scanning, fax, applications, FTP, monitored file system, workflows/EAI) feed the ECM foundation components (repositories, business objects, versioning, classification, search, process management, auditing, APIs, social ECM, storage management, subscriptions, workflow definitions, document lifecycle management), which support the presentation features (display, publishing, browsing, printing).]
For example, based on Figure 2-6 on page 31, you can discuss the following
points to help you gather functional requirements:
How will content be ingested? Options might be paper scanning, fax, email,
other applications, FTP, monitored file system, or workflows.
What are the content and workflow management requirements?
Considerations can be indexing, validation, the addition of document
management, security, binding of documents, usage of entry templates, and
check-in and check-out features.
What are your presentation and delivery management requirements?
Consider content-based search, publishing, and browsing requirements;
printing needs; display needs; Simple Mail Transfer Protocol (SMTP) sends;
and the requirement for the usage of search templates.
What indexing terms will be used to identify and retrieve information?
What interfaces are required with existing systems to move information
between systems?
What is the lifecycle of the document and how does it fit as part of the
business process?
How will auditing requirements be satisfied?
How will security or information confidentiality requirements be met?
How will the information be searched across the enterprise?
What is the minimum metadata required to describe the information?
How long will the documents need to be stored in the system?
Who will be responsible for the destruction of the documents?
What type of storage will be required?
Working from the requirements analysis to a functional design is probably one of
the biggest challenges in your project. This activity requires extensive experience
and solid knowledge of the P8 Content Manager product. We address this topic
in more detail in Chapter 13, IBM FileNet Content Manager solutions on
page 427.
As you read through each chapter of this book, remember that each chapter
provides many of the best practices for a number of scenarios but in a
generalized way. Use these best practices and recommendations within the
context of the actual functional requirements for your solution; do not apply them
as is.
When planning the infrastructure for the solution, consider the following topics:
Server topology
Network (LAN and WAN) topology
Scalability and continuity
Virtualization
Shared infrastructure
Capacity planning
Performance
User acceptance tests verify that the system meets the functional requirements
described in the functional design document.
Regression tests verify that the functionality of the solution is not affected after
the implementation of new features or the resolution of a defect. It is important to
have a full set of regression tests for the ECM solutions and we suggest that
those tests are automated.
Performance and load tests verify the system responsiveness and stability under
a defined workload. It is a best practice to develop performance tests and
execute them before any major release is put on the production environment.
Backup and recovery tests verify that the system can be successfully recovered
in case of disaster. You need to perform at least one recovery test a year to verify
that your backup and recovery plan is still valid.
2.2.9 Deployment
Deployment is defined as the methodology to move a designed solution from
development to production. When planning for deployment, issues related to
release management, change management, testing, and the steps for the actual
move need to be considered. It is important to plan for deployment as early as
possible, especially at development time, to address many of the challenges that
might arise in this area.
For more details, refer to Chapter 9, Deployment on page 271. Also, refer to
Chapter 11, Upgrade and migration on page 371 for upgrade and migration
information.
2.3 Conclusion
In this chapter, we introduced five typical use cases where Content Manager can
be used. These use cases are simplified; actual use cases are much more
complicated. The purpose of these examples is to give a quick introduction to
Content Manager features and capabilities. The detailed solution
building blocks and how they are used for the implementation of those use cases
are described in Chapter 13, IBM FileNet Content Manager solutions on
page 427.
We also described the proposed methodology for the implementation of an ECM
system using Content Manager. That methodology is used as a structure for the
rest of the book.
Chapter 3.
System architecture
IBM FileNet Content Manager is a collection of tightly integrated components that
are bundled together as a common platform. The broad functionality provided by
these integrated components constitutes an enterprise content and process
management platform. Some of the key elements of this platform are a metadata
repository, a workflow system, an application that can be used by all clients to
access content and participate in processes, and a storage framework that
supports a wide range of storage devices and platforms.
In this chapter, we introduce the components of an IBM FileNet Content Manager
system, discuss the logical and physical layout options, and discuss how to scale
the system to meet both local and global business needs.
We discuss the following topics:
Basic components
Scalability
Virtualization
Shared infrastructure
Geographically distributed systems
For information about the internal system architecture of IBM FileNet Content
Manager and the P8 Platform, see IBM FileNet P8 Platform and Architecture,
SG24-7667.
[Figure 3-1: Basic components. Client applications such as FileNet Workplace XT communicate with the Content Platform Engine over HTTP/HTTPS and IIOP/T3; the Content Platform Engine uses JDBC to reach the database, LDAP to reach the directory server, and NFS/GPFS/CIFS or device APIs to reach storage.]
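To make these communication paths concrete, the following is a minimal, hedged sketch of how a client application connects to the Content Platform Engine with the Content Engine Java API; the endpoint URL, credentials, and object store name are placeholders for your environment.

import javax.security.auth.Subject;
import com.filenet.api.core.Connection;
import com.filenet.api.core.Domain;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.util.UserContext;

public class ConnectExample {
    public static void main(String[] args) {
        // Web services transport endpoint of the Content Platform Engine;
        // an EJB/IIOP URI can be used instead.
        Connection conn = Factory.Connection.getConnection(
                "http://cpe-server:9080/wsi/FNCEWS40MTOM/");

        // Authenticate against the directory server configured for the
        // P8 domain and push the subject onto the current thread.
        Subject subject = UserContext.createSubject(
                conn, "p8admin", "password", null);
        UserContext.get().pushSubject(subject);
        try {
            // Fetch the P8 domain (null selects the default domain) and
            // one of its object stores.
            Domain domain = Factory.Domain.fetchInstance(conn, null, null);
            ObjectStore os = Factory.ObjectStore.fetchInstance(
                    domain, "OS1", null);
            System.out.println("Connected to object store: "
                    + os.get_DisplayName());
        } finally {
            UserContext.get().popSubject();
        }
    }
}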
Figure 3-2 illustrates the logical relationship between a P8 domain, and its
associated configuration information, object stores, and workflow systems.
[Figure 3-2: A P8 domain contains the GCD and multiple object stores, each with an associated workflow system; the domain also references the LDAP directory and fixed storage devices.]
Figure 3-3 illustrates the relationship between the components that are used by
IBM FileNet Content Manager to instantiate a P8 domain and the software
components that comprise the P8 domain.
[Figure 3-3: A P8 domain comprising Content Platform Engine servers, marking sets, and storage devices.]
Retention management
Retention management defines how long content must be kept before it can be
deleted. With retention management, a date is set that defines the earliest that
content can be deleted. Retention management differs from records
management in that it is usually driven by corporate rather than legal
requirements.
In IBM FileNet Content Manager v5.1 and earlier releases, support was provided
for static retention periods. That is, when a document was added to an object
store, you were able to define the minimum period that the document must be
kept. Now, IBM FileNet Content Manager also supports dynamic retention.
With dynamic retention, the length of time a document must be kept can be
updated at any time by users with the appropriate level of permissions.
Defining retention periods helps with storage management only if you also
regularly delete content that has passed its retention requirement. Set up regular
jobs to delete content that is no longer needed as described in 4.11.2, Automatic
disposition on page 139.
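As a hedged sketch of dynamic retention, the following uses the Content Engine Java API to move a document's earliest-deletion date further into the future. The CmRetentionDate property name is an assumption based on the retention support in this release; verify the property name and required permissions in the product documentation.

import java.util.Calendar;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;

public class RetentionUpdate {
    // Extends the earliest-deletion date of a document by the given
    // number of years; the content cannot be deleted before this date.
    public static void extendRetention(ObjectStore os, String docPath,
            int years) {
        Document doc = Factory.Document.fetchInstance(os, docPath, null);

        Calendar retainUntil = Calendar.getInstance();
        retainUntil.add(Calendar.YEAR, years);

        // CmRetentionDate is assumed here as the retention property;
        // updating it requires the appropriate level of permissions.
        doc.getProperties().putValue("CmRetentionDate",
                retainUntil.getTime());
        doc.save(RefreshMode.REFRESH);
    }
}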
Deduplication
Preventing duplicate copies of the same content from being stored in an object
store can save a significant amount of space.
A common scenario where this feature is beneficial is when email is archived into
an object store via IBM Content Collector. Often, several clients will receive the
same email with the same attachments. If the content is stored in a storage area,
enabling deduplication will ensure that only one copy of the content is stored.
The IBM FileNet Content Manager software tracks the documents associated
with the content, and will not delete the content until all the associated
documents have been deleted.
Deduplication occurs at the content rather than block level. So, additional space
savings can be achieved by also using compression.
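Conceptually, content-level deduplication keys on a digest of each content element and keeps a reference count so that shared content is deleted only when the last referencing document is gone. The following sketch illustrates the idea only; it is not product code, and the product performs this tracking internally.

import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class DedupSketch {
    // Maps a content digest to the number of documents referencing it.
    private final Map<String, Integer> refCounts = new HashMap<String, Integer>();

    // Returns true if identical content was already stored (a duplicate).
    public boolean store(byte[] content) throws Exception {
        String key = digest(content);
        boolean duplicate = refCounts.containsKey(key);
        refCounts.merge(key, 1, Integer::sum);
        return duplicate;
    }

    // Returns true when the last reference is removed, which is when the
    // stored content can actually be deleted.
    public boolean delete(byte[] content) throws Exception {
        String key = digest(content);
        Integer count = refCounts.get(key);
        if (count == null) {
            return false;
        }
        if (count == 1) {
            refCounts.remove(key);
            return true;
        }
        refCounts.put(key, count - 1);
        return false;
    }

    private static String digest(byte[] content) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
        StringBuilder sb = new StringBuilder();
        for (byte b : hash) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}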
Compression
Two types of compression are available:
If you are using IBM DB2, consider enabling row compression on large
object store database tables to improve application response times.
You can also choose to compress storage areas. The compression is
block-based and, depending on the type of files being stored, might not
provide a significant savings. The overhead introduced with this capability can
affect both upload and download performance, so test this feature in your
environment to ensure that any performance impact is offset by the space
savings.
Securing content
When content is stored, you can encrypt the data prior to it being stored on the
underlying storage (file store, fixed content device, or database).
The requirement for data encryption is set at the storage container level and can
be enabled or disabled at any time. When the encryption is enabled, any data
added to the storage device will be encrypted, but existing content is not affected
by enabling or disabling the encryption capability.
The overhead introduced with this capability can affect both upload and
download performance.
Data encryption can be used with the other storage features, including data
compression and deduplication.
For example, prior to configuring the Content Platform Engine, use the tools
supplied by the database vendor to create the databases and tables required for
the GCD, at least one object store, and at least one workflow system. IBM
Content Navigator also requires a database in which to store configuration
information.
Recommendations: Although creating a workflow system is not required
when setting up an IBM FileNet Content Manager environment, consider
creating an initial workflow system and validating basic workflow functionality
when configuring the Content Platform Engine.
After the basic IBM FileNet Content Manager installation and configuration are
complete, the database tools will also be needed for tasks, such as backing up
the environment, and creating complex indexes to improve performance.
See the FileNet P8 Information Center for a full list of the tasks that need to be
completed before starting an IBM FileNet Content Manager installation:
http://publib.boulder.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8toc.doc%2Fplanning.htm
The following administrative tools are supplied with IBM FileNet Content
Manager:
Configuration Manager
Running the Content Platform Engine installation program lays down the
software but does not perform any configuration tasks. So, after you run the
installation program, use the Configuration Manager to identify the following
information:
The application server to host the Content Platform Engine application and
services.
The LDAP server that will provide the authentication and authorization
services required by IBM FileNet Content Manager.
ACCE
A web-based tool for defining P8 domain and object store configuration data.
ACCE replaces the FileNet Enterprise Manager tool that was provided with
previous IBM FileNet Content Manager releases.
Deployment Manager
A thick client tool used to move object store configuration data and content
between object stores in the same or different P8 domains. This tool is also
used to reassign object stores to different P8 domains.
Chapter 9, Deployment on page 271 in this Redbooks publication provides
detailed information about using the Deployment Manager.
Consistency Checker
Used to validate that pointers in the object store database that reference
content in a storage area are correct.
FileNet Enterprise Manager
A thick client tool used to administer P8 domains and object stores. This tool
is being replaced by ACCE.
Process Configuration Console, Process Administrator, Process Designer,
and Tracker
These are Java applets that are started from FileNet Workplace XT and that
are used to instantiate a process region, and to design and manage business
processes.
[Figure: External clients connect through an external firewall to HTTP servers and a load balancer, which front the internal clients and the internal Content Platform Engine instances.]
Figure 3-4 IBM FileNet Content Manager configuration based on security requirements
For more details about configuring the hardware layout based on security
requirements, see IBM FileNet P8 Platform and Architecture, SG24-7667.
A single-server installation is a quick way to get started with IBM FileNet Content
Manager. Use the Composite Platform Installation Tool (CPIT) supplied with IBM
FileNet Content Manager to install and configure this environment. CPIT
completes the following tasks:
Installs and configures:
Tivoli Directory Server as the LDAP
DB2 as the database server
IBM WebSphere Application Server as the application server
Content Platform Engine
FileNet Workplace XT
IBM Content Navigator
Creates:
The databases required for the GCD, one content store, one workflow
system, and the IBM Content Navigator configuration information
Administrative user accounts and groups in the LDAP
The GCD
One content store
One workflow system
After the CPIT installation completes, use the single-server installation to perform
these tasks:
Gain familiarity with IBM FileNet Content Manager in general, or the latest
new features.
Define requirements and specifications for new applications or updates to
existing applications.
Start building a new solution or refining an existing solution.
3.2 Scalability
When planning for an enterprise-wide system, there are a number of
environments to configure, and the workload and scaling requirements for each
are likely to be different. Consider configuring the following IBM FileNet Content
Manager environments:
Sandbox
Typically, a single-server installation that can be used by system architects
and project leads to learn about new software releases and for demonstration
purposes.
Development
The environment used by the development team to build custom applications
for an IBM FileNet Content Manager deployment. The size and configuration
of this environment is typically driven by the number of developers on the
project.
System test
The environment used by the test organization and representatives of the
client teams to verify that the application being built meets their needs in
terms of functionality, usability, and response times.
This environment is also used for integration testing to ensure that elements
of a single solution or that multiple enterprise-wide solutions complement
each other, can coexist, and interact as expected.
The size and configuration of this environment is usually a scaled-down
version of the expected production configuration.
Load and performance test
The environment used to ensure that the production system can cope with the
expected load and provide clients with response times that enable them to
complete their work. The system needs to be configured similarly to the
production system in terms of hardware and data (both the type of data and
the volume need to emulate what is available in the production environment).
Production fix
When introducing new functionality into a production environment, a best
practice is to introduce the changes into development, then promote the
changes first to system test, then to load and performance test, and ultimately
to the production environment. This path is recommended to help ensure that
the new functionality meets both the functional and performance needs of a
production application.
The IBM Content Capacity Planner is a tool that provides guidelines on how much
hardware capacity is needed given your business requirements. For more
information about this tool and capacity planning in general, refer to Chapter 8,
Capacity planning with IBM Content Capacity Planner on page 253.
When scaling an IBM FileNet Content Manager environment, you can choose to
scale:
Hardware using both horizontal and vertical techniques.
Java virtual machines (JVMs) using application cluster technology, such as
WebSphere Network Deployment, or by adding independent JVMs and load
balancers.
3.2.3 Clustering
Since Content Platform Engine is a Java EE application deployed on an
application server, the system can be scaled out further using the clustering
technology provided by the application server. For example, using WebSphere
Network Deployment, additional application servers can be configured. The
Content Platform Engine, FileNet Workplace XT, IBM Content Navigator, and
custom applications can be automatically deployed from the WebSphere
Network Deployment management node to these additional servers. The
advantage of this type of configuration is easier maintenance because software
is updated manually in only one location and the deployment to the managed
nodes is automatic.
Figure 3-6 on page 55 illustrates this type of clustering.
[Figure 3-6: A P8 domain with clustered client application JVMs deployed from a management node.]
[Figure 3-7: A load balancer distributing requests across client application JVMs, each paired with a Content Platform Engine instance, all within one P8 domain and sharing a database server.]
To distribute the load to the client applications, use an external HTTP server with
a load balancer, even when the client applications are installed using an
application server clustering technology.
3.3 Virtualization
Virtualization has become a major trend in the IT industry. The drivers for
virtualization are cost reduction and providing better management of hardware
resources. Virtualization can be applied over servers, storage, and applications.
In this section, we focus on server virtualization.
In general, multiple servers are consolidated into fewer servers and operate
inside of their own environment. An abstraction layer between the physical
resources and the running application is created. Physical resources are
encapsulated as logical resources, and the environment for the application is
moved into a virtual machine (VM). The shared resources usually are CPUs,
memory, network bandwidth, and hard disk storage.
The benefit of virtualization is better use of the current hardware, because the
number of physical boxes decreases, and a physical box becomes a virtual
machine. Instead of managing multiple systems, the resource optimization can
be concentrated at one point. It also opens new pathways for high availability and
disaster recovery, because you can copy entire systems to another location.
Depending on the virtualization technology that you use, the system
administrator can assign each virtual machine an individual amount of resources,
such as memory or a fraction of CPU resources at run time. This increases
system agility and ensures scaling on demand. An administrator can react
dynamically to changes in system utilization.
For example, if at the end of the month, usage of a certain virtualized application
increases sharply, it can be scaled on demand and assigned more system
resources. In that way, the system hardware is used more efficiently.
Another example is systems that are usually idle and have predictable peak
times. Given the fact that the peak times occur at different points in time, you can
benefit by moving applications from these systems onto one virtualized server.
A third example is systems that are used for training and support. Because
virtualization technology provides the option to clone an existing system, you can
clone a training system with preloaded data from another system. In the area of
client support environments with different operating systems, application version
and patch levels can be stored and started on demand. That increases flexibility
and speeds up problem determination, because no time-consuming installation tasks
are necessary.
Virtualization approaches differ in the degree of abstraction. In this book, we only
provide an overview. For more information about virtualization, see the
information provided by the virtualization solution providers.
When the VM wants to access resources that are managed in a system context,
the access is performed by a virtual machine monitor (VMM). The VMM analyzes
the code and provides a replacement function that safely accesses the
resources. Figure 3-8 illustrates virtual machines using VMM.
[Figure 3-8: Multiple virtual machines, each running an application on its own guest operating system, on top of a virtual machine monitor.]
In certain implementations, the host operating system and VMM are combined
into a single layer. Examples of this approach are VMware products or Microsoft
Virtual Server.
[Figure 3-9: Operating system-level virtualization. Applications run in separate partitions (VM 1, VM 2, VM 3) managed by the partition management layer of a single host operating system.]
In this scenario, the coupling between the host operating system and the VM is
much tighter. Because only one kernel is used, the overhead incurred with this
approach is small. However, the disadvantage of virtualization at the operating
system level is that it does not allow you to run different operating systems.
The isolation of the single partition is key, because the system operates in one
kernel. This is done in the partition management part of the operating system.
The resource management, which is where the physical resources such as
CPUs and memory are assigned, is also done in the partition management part
of the operating system.
This level of virtualization is popular for service providers who offer Internet
services or host special services. For this scenario, the low overhead and the
automation for replicating and horizontal scaling of virtual servers are key.
Figure 3-10 provides one possibility for deploying an IBM FileNet Content
Manager system in a virtualized environment. Multiple virtual machines are
involved. Each component is deployed in its own virtual machine, in a separate
partition. The figure includes the database server and the directory server,
although typically these components are virtualized only in demo or sandbox
environments.
[Figure 3-10: Each component (Workplace XT, Content Platform Engine, IBM Content Navigator, database server, and directory server) deployed in its own virtual machine and partition on a single host operating system.]
This architecture offers the highest flexibility and scalability because of the
number of virtual machines that you can have in the configuration.
[Figure 3-11: An alternative deployment with Workplace XT, Content Platform Engine, IBM Content Navigator, database server, and directory server all in a single virtual machine partition on the host operating system.]
System duplication
Using virtual technologies can make it easier to duplicate systems, but consider
uniqueness requirements when reusing images:
Host names must be unique and some applications, including WebSphere,
are sensitive to host name changes.
If using a multitiered application solution in a WebSphere Application Server
Network Deployment environment, ensure that the cell names are unique at
each layer in the tier. See the following article for more details on this issue:
http://pic.dhe.ibm.com/infocenter/wasinfo/v8r0/index.jsp?topic=%2Fcom.ibm.websphere.nd.multiplatform.doc%2Finfo%2Fae%2Fae%2Fuagt_rcell.html
When using static IP addresses, ensure that duplicate images are updated
with new values and that any load balancers are updated to include the new
addresses.
[Figure 3-12: Configuring access to an object store using IBM Content Navigator.]
Separate P8 domains
To maintain complete independence between applications, use separate P8
domains. Each P8 domain has its own set of hardware (that can be virtualized),
object stores, and workflow systems.
In deployments with a large amount of content, it is often necessary to roll to
new object stores regularly, and eventually, it might also be necessary to roll to
a new P8 domain. For instance, after 50 object stores have been created in a P8
domain, a new P8 domain can be created to house the next set of 50 object
stores. When the new P8 domain is configured, consider moving the most
current of the old object stores to the new P8 domain using the FileNet
Deployment Manager. If the object stores are connected with active workflows,
the associated workflow systems must also be moved.
A separate P8 domain can also be used to give a global view of corporate data
for reporting purposes. In this type of scenario, Content Federation Services for
IBM Content Integrator is used to federate content from various source P8
domains to a parent P8 domain.
[Figure: a distributed deployment with a main location and a satellite location connected over a WAN; each location has a load balancer, Content Platform Engine servers, a database server, and storage areas; the main location holds the GCD and object stores, and the satellite has its own object stores and workflow system]
Figure 3-15 illustrates a hierarchical view of the domain, sites, virtual server, and
servers as displayed in ACCE.
Figure 3-15 Hierarchical view of domain, sites, virtual servers, and servers
Caching is a building block for distributed systems. IBM FileNet Content Manager
includes caching at the Content Platform Engine level. It is deeply integrated into
the system. The benefits of caching are that it speeds up retrieval and that
it can be used by any client, regardless of who authored the client software.
Caching applies to content objects and can be used with all types of storage.
A document can reside in multiple caches. You can place each cache on the
Content Platform Engine server or on a network share. Cache servers can be
installed at sites where you do not need to perform a full Content Platform
Engine installation.
The Content Cache configuration can be customized in the following ways:
Threshold size
Defines how much space the cache can use before content is removed from
the cache.
Threshold elements
Defines the number of elements that can be added to the cache before
content is removed from the cache.
Amount to prune
Defines the percentage of content that needs to be removed when a threshold
is reached.
Preload content when created
Loads content into the local cache as the content is added to an object store.
This feature is especially useful when content is typically used at the same
site where the ingestion occurs but the database associated with the object
store is at a different location.
If the content is going to be frequently accessed from a site that is different
than the one at which the content was ingested, consider developing a
custom application to access the content during off-peak hours to load the
content into the cache at the remote site.
Content lifespan
Identifies how long content stays in the cache without being accessed.
When configuring the content cache, you need to be aware of the following
information:
The document content access patterns at each site. For example, will
documents need to be viewed within 24 hours of being ingested? Or, are the
documents being ingested primarily for archive purposes and are only
accessed by clients sporadically?
Who will be downloading or viewing the document content and where are they
located?
If the content is being accessed by clients who are geographically close to the
database and storage areas used to store the document information,
configuring a specific content cache for those users might not be necessary.
Any legal requirements regarding the location of content
For more information about using content cache areas, see the following topic in
the P8 Information Center:
http://publib.boulder.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8.ce.admin.tasks.doc%2Fp8pcc101.htm
Figure 3-16 on page 75 shows a system distributed over two locations, a main
location and a satellite location. Request forwarding is disabled, which is the
default setting.
[Figure 3-16: request forwarding disabled; application servers at the satellite location make multiple round trips over the WAN to the database server, GCD, object stores, and workflow systems at the main location]
[Figure: request forwarding enabled; Content Platform Engine servers at the satellite location make the multiple round trips over the local LAN to the satellite database server, directory server, and object stores]
When enabling request forwarding, you declare that each defined object store
has affinity with a specific site. There are two settings associated with request
forwarding: the ability to forward requests and the ability to receive forwarded
requests.
Again, the client (main) addresses IBM Content Navigator (main), which contacts
Content Platform Engine (main). Instead of directly contacting the database (sat),
Content Platform Engine (main) forwards the request to Content Platform Engine
(sat), which contacts the database (sat). Content Platform Engine (sat) gathers
all data and returns it to Content Platform Engine (main). Again, Content Platform
Engine (main) passes the result back to IBM Content Navigator (main) where it is
presented to the client.
In general, when request forwarding is configured, client requests to other sites
are forwarded to one or more virtual servers at the site associated with the object
store. This has the advantage of minimizing the impact of high network latency
because the cost-intensive database access is performed locally via the LAN
instead of through the WAN.
At the time that a Content Platform Engine server receives a request, it evaluates
the request to decide whether to forward it or not. For metadata requests, if all
actions in the client request are based on an object store at a different site,
Content Platform Engine will attempt to forward it. At the destination site, the
administrator enables one or more virtual servers to be able to receive the
incoming requests.
Request forwarding is across the Enterprise JavaBeans (EJB) transport layer
only and is only supported across homogeneous application servers.
Install IBM Content Navigator, Content Platform Engine, the object store
database, and the file store.
In this scenario, the P8 domain has more than one object store. Each satellite
location has a local object store but can still access data stored in other object
stores.
This architecture is useful in the following situations:
Users at the satellite location primarily access content in the local object
store.
An independent satellite must store its own data.
If retrieval from the satellite location to the main location is required, add a
content cache area to the main location.
In all the listed configurations, since FileNet Workplace XT is being used
primarily as an administrative tool, deploy it at one site only.
3.6 Conclusion
In this chapter, we described the IBM FileNet P8 Content Manager architecture.
In the next chapter, we provide best practices and recommendations for
designing object stores.
Chapter 4.
Repository design
In this chapter, we introduce the basic concepts and elements that comprise a
repository. Repositories encapsulate not only the content being managed but
also the various metadata elements and infrastructure that support the IBM
FileNet Content Manager functionality. In addition, we describe the repository
design elements and guidelines for using these elements.
We discuss the following topics:
This process typically starts not with SMEs, but with solution domain experts,
who understand the technology and architecture of enterprise content
management systems. The top-down approach works its way down to the level
where SMEs must be consulted for the final and essential details.
Designing from the top down involves understanding the global picture and
decomposing the various levels of the design through clear design goals or
specific design choices. It is also an iterative process, driving from the most
abstract toward the most concrete levels. By designing from the top down, the
specific order of design characteristics can be approached in the manner that
makes the most strategic sense for the organization.
The top-down approach has the advantage of developing a design that does not
include any artificial non-technical barriers, for example, historical organizational
structures. It produces a design that emphasizes the strategic requirements of
the solution. This often results in the most flexible and adaptable design for the
future.
The disadvantage of the top-down approach is the difficulty of mapping
existing content and processes into the new design. As the design iterations
approach the more concrete aspects and need to be mapped directly to concrete
business entities, the process can become conceptually and politically
difficult for knowledge workers, particularly in organizations with entrenched
historical structures.
Recommendations: Even if you start with a top-down approach, keep SMEs
informed and involved at an appropriate level. Let them know that they will be
vitally involved as you get to the more concrete layers. An information vacuum
can create genuine misunderstandings that can make everyone's job more
difficult.
The design team itself can consist of one or more architects with the specific
responsibility of producing the design. Regardless of the number of individuals in
the design team, there is a clear set of roles and responsibilities that must be
represented. These roles cover both the technical facets of the design as well as
the business facets. The team is usually led by a technical architect who has the
direct responsibility for the overall solution. The team is either populated with
architects and representatives from the following areas or it includes contacts in
the following areas who can provide feedback and direction as needed to the
team without being full-time team members:
P8 Content Manager architect technical role
This is the architect who has the ultimate responsibility for the overall
repository and solution design. This role must be filled by a full-time member
of the design staff who has expert knowledge of the P8 Content Manager
product.
Enterprise architect technical role
This is the architect who is responsible for overseeing the technical fit of the
solution into the existing solution portfolio. This role must be filled by someone
who has expert understanding of the current technology across the
enterprise.
Application architect technical role
This is the architect who has direct responsibility for the specific application or
applications being addressed at this phase of the design and who is
responsible for tracking the business requirements into the solution space.
Enterprise security technical role
This is someone who has expert understanding of the security environments
and models that are used in the enterprise infrastructure. The purpose of this
role is to assure that all existing security policies are followed and to provide
support as needed for security requirements outside of the P8 Content
Manager solution itself.
Legal business role
This role must be filled by someone who has expert knowledge of the legal
requirements of the business sphere in which the solution exists. They
provide guidance about requirements and restrictions on the system that are
imposed for legal, as opposed to business value, reasons.
Knowledge worker business roles
These roles represent the directly affected business workers whose content
and processes are being integrated into P8 Content Manager and who have
the inherent and implicit knowledge of the business that is not usually
captured in any other manner.
Check the product documentation for the latest list of reserved prefixes. You need
to be aware that additional prefixes might be used by third parties who provide
components for use with Content Platform Engine.
Recommendations: Use a unique prefix for symbolic names that you create.
The choice of prefix is yours. It is typical to use something short but
meaningful.
4.3.3 Uniqueness
Object names across the entire design generally have a requirement for
uniqueness. Unique naming tracks with appropriate naming, that is, when proper
consideration is given to naming objects, the uniqueness typically follows.
Problems can arise when overly abstract names are given to an object where the
same name more appropriately maps at a higher level in the hierarchy.
For example, naming an object Email implies that it sits high in the naming
hierarchy, whereas a name such as AgentCustomerEmail is a good choice at a
low level.
4.3.4 Taxonomy
Taxonomy is the establishment of categorization based on naming. Having a
specific pattern that is applied to names with well-understood definitions for each
name part facilitates an organized taxonomy. Giving initial thought to
taxonomy and developing it before the actual naming simplifies the naming task
and reinforces the self-descriptiveness of the names.
4.3.5 Consistency
Consistency is important: as the base of people who use the names broadens,
consistent naming leads to better understanding and less confusion as the
system grows in scope and age. Establishing good consistency standards pays
off in the end. Consistency is facilitated by the complete application of the
ideas already presented.
storage area is useful because the storage areas are accessed and applied
throughout the lifetime of the system.
For example, Company XYZ has three storage areas in use for the Company
XYZ repository. The first storage area is a file store hosted on the network
accessible protected storage segment of a storage area network (SAN) by a
Network File System (NFS) mount. The second storage area is a fixed storage
area that links to the company's image management system. The third storage
area is a file store on a nearby set of inexpensive disk drives, also through an
NFS mount. These three storage areas are named NFS-RAID, IMAGES, and
NFS-CHEAP.
A repository contains a single object store and potentially one or more storage
areas as shown in Figure 4-2 on page 93. An object store contains definitions,
configuration information, and metadata for the content that is stored in the
repository. The storage areas store the actual content.
[Figure 4-2: a repository consists of an object store plus content stored in the database and in file storage]
There are four major stages involved in the population of a repository: three
design stages and one production stage. The three design stages include
organizational design, described in 4.5, Repository organizational objects on
page 95, repository design, described in 4.7, Repository design objects on
page 98, and repository content design, described in 4.8, Repository content
objects on page 117. The final stage in repository population is the actual test or
production usage of the repository. The following sections describe the design
stages and their relationships.
During all of these design phases, there are certain commonalities that are
universally, or nearly universally, utilized in the objects of the design.
Here, we list several of the system properties that have potential application in
other places of the design:
Class description
The class description contains the description of the class from which this
object is instantiated.
Display name
This label is intended for display to the user.
Descriptive text
This text describes the purpose and meaning intended for this object.
Is hidden
This is a Boolean value that indicates whether the object is hidden in its
current context. This property can affect the user interface. Hidden objects or
classes are generally of interest to application developers for special
purposes but not of interest to users.
Symbolic name
This label is used for internal, programmatic references to the object.
ID
This immutable globally unique identifier (GUID) can be used to reference a
specific object throughout its lifetime. (In the P8 platform, GUIDs are only
guaranteed to be unique in certain contexts. In other contexts, where
uniqueness does not matter, GUIDs are created in a way that makes duplicates
unlikely. For example, it is possible and harmless for a folder and a document
to have the same ID value.)
Is content-based retrieval (CBR)-enabled
This is a Boolean value that indicates if content-based retrieval is enabled in
the current context of the object.
In addition to the set of properties just covered, which applies to all
objects in the system, there is a set of properties that appears in many of
the objects and is important to mention at this level. The following
properties are present in most objects:
Auditing enabled
This property indicates whether the object has its auditing enabled. This is a
switch that enables and disables all audit logging for this specific object and
its scope. Many events can be audited and controlled at a more granular level.
Permissions
This property contains the access control list (ACL) for the object. An ACL
consists of a number of access control entries (ACEs). A single ACE contains
either an individual or group from the directory and the authorizations that
entity has in relation to the object (See Chapter 5, Security on page 151 for
more details).
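To illustrate how these common properties surface programmatically, the following sketch uses the Content Engine Java API to fetch a document and read several of them. It is a minimal sketch: the connection URI, credentials, object store name, and document ID are illustrative assumptions, while the API calls themselves (the Factory fetch methods, get_Id, get_Name, and symbolic-name access through the Properties collection) are standard.

import javax.security.auth.Subject;
import com.filenet.api.core.Connection;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.util.Id;
import com.filenet.api.util.UserContext;

public class ReadCommonProperties {
    public static void main(String[] args) {
        // The URI, credentials, and object store name are illustrative.
        Connection conn = Factory.Connection.getConnection(
                "http://cpeserver:9080/wsi/FNCEWS40MTOM/");
        Subject subject = UserContext.createSubject(
                conn, "ceadmin", "password", null);
        UserContext.get().pushSubject(subject);
        try {
            ObjectStore os = Factory.ObjectStore.fetchInstance(
                    Factory.Domain.fetchInstance(conn, null, null), "OS1", null);

            // Fetch a document by its immutable ID (GUID); the ID is illustrative.
            Document doc = Factory.Document.fetchInstance(os,
                    new Id("{3F2504E0-4F89-11D3-9A0C-0305E82C3301}"), null);

            System.out.println("ID:    " + doc.get_Id());   // immutable GUID
            System.out.println("Name:  " + doc.get_Name()); // display label
            // Any property can also be read through its symbolic name.
            System.out.println("Title: "
                    + doc.getProperties().getStringValue("DocumentTitle"));
        } finally {
            UserContext.get().popSubject();
        }
    }
}

The later code fragments in this chapter assume the same authenticated connection and the os reference established here.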
[Figure: sites contain virtual servers, and virtual servers contain server instances]
Figure 4-3 Repository organizational objects
[Figure: a P8 domain contains the GCD, object stores with their data sources (DS/DSXA), subsystem configuration, trace logging configuration, and sites that contain virtual servers and their server instances]
[Figure: an object store contains class definitions, other definitions, workflows, choice lists, properties, events, security policies, and other objects]
An object store is conceptually an object, like all the entities that make up
a repository, and it has specific characteristics. Object stores are created
through administration tools. The best practice is to utilize the object store
creation wizard, which simplifies the interface and ensures that all settings
necessary at creation time are both set and synchronized where applicable.
Recommendations: If your design calls for more than a single object store,
create a metastore that can contain all of the design objects that are common
across all of the stores and replicate this as changes are made. If a meta
object store is used, do not roll this store out into production, because it is
strictly a development object store.
When creating an object store, always set the object store administrator to a
valid administrator logon and grant the administrator all permissions.
Content compression
Content that is uploaded to the storage area is compressed if content
compression is enabled for the storage area. Only the content that can be
compressed beyond the compression threshold will be compressed. Content
compression uses blocked-compression technology to divide the uploaded
content into distinct blocks, which are compressed in memory before being
written to disk. If encryption is also enabled, the block is first compressed and
then it is encrypted in memory before being written to disk. Enabling content
compression on a storage area will not compress the existing content.
Encryption of content
The content stored in the storage area can be encrypted using the storage area
configuration. Content encryption helps protect the confidentiality of the content if
it is accessed outside of Content Manager. A new key is generated every time
that the encryption is enabled on the storage area. The most recent key is used
to encrypt the new content. These encryption keys are stored in the object store
database in a secured way.
Enabling content encryption on a storage area will not encrypt the existing
content. In content replication, the external repository receives the
non-encrypted content. Also, the decrypted content will be submitted for indexing
purposes.
By moving content from one storage area to another, you can enforce content
encryption, re-encrypt with the latest key, or store non-encrypted content.
Database store
There is a single database store per object store, where the database store is
part of the same database as the Object Store itself. The database store can be
used to store content, but the content will be stored as a database binary large
object (BLOB). Depending on the size of the content, this is not an effective
use of the database and can have serious impacts on database performance,
especially with large content.
Recommendations: Using the database store can help with operational
efficiency in some cases, but it almost always represents a performance cost
over file storage.
File store
There can be multiple file stores per object store with each one a separate
directory structure on the server. The file store can be on local storage media or
can be a mount point for remote, or networked, storage media. This is the
typical location used for content, with different file stores on different
media types used for different content where appropriate.
Document classes are inherited from a common top-level document class that
contains all of the basic properties that the system needs. Although it is not
technically necessary, it can be useful to create an immediate child subclass of
document for enterprise-wide use. A top-level subclass for the enterprise can
contain all of the metadata items that are the same across all document objects
in the enterprise, either by requirement or policy.
The first level of document class design is concerned with the common
enterprise objects, as opposed to specific application objects. The result of this
first round of design is a hierarchical document class tree that contains all of the
common enterprise document classes that can be leveraged by specific
applications, because they are included in the Content Manager solution. A
reasonable number of properties need to be defined in each class. It is easier to
administer and expand a design where each document class is concerned with a
specific aspect of the design. The resultant tree is typically neither extremely
narrow, nor extremely wide. A narrow tree usually indicates that the class design
has focused too specifically on an aspect and has been too exclusive. A wide
tree usually indicates that there are too many aspects of the design encapsulated
at a level.
Another test that can be applied to the resultant design is to see how various
changes to the design can be made. If there are properties that have historically
changed somewhat frequently or there are any properties that are projected to
change, see what changes need to be made to the design to accommodate the
changes. The ideal is to address a change with a change in a single class. This is
a good indication that you have the proper level of design encapsulation. The
types of changes to consider are property redefinitions, property additions,
property deletions, class additions, class modifications, class deletions, security
updates, functional changes, and organizational changes.
Recommendations: Avoid making many subclasses for a custom document
class. Changes to a higher level custom document class will be propagated to
all its subclasses.
Adopting an enterprise perspective allows the document class designs to
facilitate greater information sharing and collaboration across the enterprise. In
addition to assisting in breaking down information silos, this makes the overall
design much more usable as well. You must always take usability into
consideration during all the design phases. The use of SMEs at this phase can
greatly assist you in meeting the unspoken requirements and usability goals of
users.
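As a concrete illustration of building on such a class hierarchy, the following fragment creates and checks in a document of a hypothetical enterprise subclass through the Content Engine Java API. The class name XYZEnterpriseDocument and the property XYZDepartment are illustrative assumptions; the fragment reuses the os reference from the earlier sketch.

import com.filenet.api.constants.AutoClassify;
import com.filenet.api.constants.CheckinType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;

// "XYZEnterpriseDocument" and "XYZDepartment" are hypothetical names.
Document doc = Factory.Document.createInstance(os, "XYZEnterpriseDocument");
doc.getProperties().putValue("DocumentTitle", "Q3 expense summary");
doc.getProperties().putValue("XYZDepartment", "Finance");
// Check in as the first major version; no content elements are attached
// in this sketch.
doc.checkin(AutoClassify.DO_NOT_AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
doc.save(RefreshMode.NO_REFRESH);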
Because document classes are key design objects in the system, many additional
components depend on them. Most of these dependencies are covered in the
specific sections for the dependent elements. Probably the most important
dependency is the usage of the property templates in the class designs. This
dependency underscores the need to be clear and concise in the property
template definitions and consistent with naming and topology across the entire
design.
Finally, try to avoid designing for the current organization without being modular
enough to accommodate change. Avoid carrying over limitations of the current
system that might have been design flaws in the current system or limitations of
the tools that are used to support it. Take into account any current or future
processes in which the content is utilized. That is, always consider business
process automation in the design. Remember that there will always be additional
applications and functional areas that the system will need to support that are not
currently identified or even identifiable.
There are three focus areas that the document class design typically follows:
design based on organization, design based on content, and design based on
function. Although these are the major design approaches that are used,
variations on these themes as well as modifications and combinations of these
approaches are also successfully used. The correct approach to use is highly
dependent on the specific details of your corporation and the application that is
supported by P8 Content Manager:
Have metadata
Are containable
Are versionable by both content and metadata
Hold content
A key design decision that needs to be made is whether the main access
mechanism for content follows the search paradigm (represented in Figure 4-6)
or follows the browse paradigm (represented in Figure 4-7 on page 108). Both of
these paradigms offer their own strengths and weaknesses, and this decision
directly affects how folder classes will be used and instantiated.
Search paradigm
The model for the search paradigm is represented in Figure 4-6 as a dialog box
requesting some information and returning a set of content that meets the criteria
specified in the dialog. The best analogy is accessing a database. Information is
retrieved from a database by formulating a query, which returns a set of data
elements that matches the criteria in the query.
[Figure 4-6: a search dialog box requesting property values, such as Name and Size]
The search paradigm is powerful, because it does not rely on the user needing to
know where the content is in the system or the name of the object that contains
the content. Searching also returns a set of objects as an atomic operation; the
maximum size of this set can be controlled as well. This can include objects that
are in diverse places in the repository. Effective use of the search paradigm
requires the selection of meaningful distinguishing properties for the objects that
have meaning to users. It also requires meaningful document classes that are
understood by users as well.
The search paradigm can be fronted with various methods of compiling the
search criteria and usually is best served by designing searches or through
custom interfaces. It is usually a faster and more reliable method of finding
content than is offered by the browse paradigm.
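The search paradigm maps directly onto the Content Engine query classes. The following fragment is a minimal sketch of a property-based search, reusing the os reference from the earlier sketch; the WHERE clause and page size are illustrative.

import java.util.Iterator;
import com.filenet.api.collection.IndependentObjectSet;
import com.filenet.api.core.Document;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

// Find documents whose title starts with "Invoice" (illustrative criteria).
SearchSQL sql = new SearchSQL(
        "SELECT Id, DocumentTitle FROM Document "
        + "WHERE DocumentTitle LIKE 'Invoice%'");
SearchScope scope = new SearchScope(os);
// Page size 50; Boolean.FALSE requests a single, non-continuable page.
IndependentObjectSet results = scope.fetchObjects(sql, 50, null, Boolean.FALSE);
for (Iterator it = results.iterator(); it.hasNext();) {
    Document d = (Document) it.next();
    System.out.println(d.getProperties().getStringValue("DocumentTitle"));
}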
Browse paradigm
The model for the browse paradigm is represented in Figure 4-7 as a typical file
system structure. There is some meaningful relationship between the sets of
folders that lead the user to sets of content in an understandable way. The best
analogy is a file system tree structure. Although the analogy presented to help
understand the browse paradigm is a file system structure, a file system folder is
not the same as a P8 Content Manager folder, which supports multiple filed
locations.
The browse paradigm relies on the users who add the content to be thoughtful
and knowledgeable in the manner in which the content is filed. This potentially
includes filing the same content object in multiple folders. There is also a
requirement that the name of the content object has a meaning in its context that
is understood by users.
The browse paradigm can increase the time that it takes for the system to search
for content, but it is well suited to users who all understand the basic concepts of
foldering and are used to using foldering for file system access. The browse
paradigm typically takes longer for users to find content than the search
paradigm, and it requires users to have inherent knowledge to be able to reliably
find content.
Recommendations: In most cases, the search paradigm offers a much better
model for performance and maintenance. Avoid too many layers of too many
folders (keep the total number to tens of folders, not hundreds), which can
impact retrieval performance. There needs to be a single, top-level folder class
that extends the base folder class and from which all other folder classes will
be derived.
All property templates and choice lists must be created prior to creating any
folder classes that utilize them. Create all the metadata in the metastore and
export and import the metadata from the metastore using the deployment tool.
When using a browse paradigm, make sure to file the documents in a
meaningful foldering hierarchy. Each folder class encapsulates a single design
aspect.
Have metadata
Are containable
Are not versionable
Are not content
Are containers
Have metadata
Are containable
Are not versionable
Hold no content
immediate subclasses that the tables are created. Any instances of subclasses
of the custom root classes will be saved in the same table as the custom root
class table:
Abstract persistable
It is similar to a custom object class, which is a collection of properties without
any content associated with it. The instances of this class cannot be filed into
any folder. The immediate subclasses of abstract persistable are each saved
in a different table.
Abstract queue entry
The subclasses of abstract queue entry are intended for the queues managed
by the sweep framework. This class has additional properties that are
required for the sweep-based queue operations. The abstract queue entry
classes will follow the security model for queue item and replication. Access is
defined by the default instance permissions. There is no owner or permission
property on these instances.
Abstract sequential
Abstract sequential is for the external applications queue and log processing.
It provides a single increasing sequence number property that can be used to
process the entries in the order in which the transactions were created.
Important: Two custom root classes are disjoint because each is stored in a
separate table. Deleting the class definition for a custom root class will
drop the associated tables.
There are two types of properties: the system properties that come preinstalled
in P8 Content Manager and custom properties that you create for your specific
installation. All of these properties can be utilized in any definitions as you think
appropriate. Typically, there is a rich set of system properties associated with the
base classes. The system properties that are, by default, associated with a class
must be examined to both prevent duplication of information and to understand
what is available to be leveraged by your class definitions.
Property templates must always have a data type associated with them. The data
type can have a cardinality of either single value or multi-value for all data types.
Recommendations: Property templates need to follow a standardized
naming scheme and topology established at the enterprise level. Property
templates need to be generic enough that they can be used in a number of
design classes, but not so generic that they cannot be given a meaningful
name.
Avoid the creation of property templates that are named in such a manner that
it might be confusing to know which template to use. Avoid the creation of
property templates that encapsulate the same informational data but have
distinct names.
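As an outline of how a property template is created programmatically, the following fragment uses the admin classes of the Content Engine Java API and the os reference from the earlier sketch. The symbolic name XYZCustomerId follows the prefix recommendation and is hypothetical; display-name handling can vary by release, so treat this as a sketch rather than a definitive recipe.

import com.filenet.api.admin.LocalizedString;
import com.filenet.api.admin.PropertyTemplateString;
import com.filenet.api.collection.LocalizedStringList;
import com.filenet.api.constants.Cardinality;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Factory;

// Create a single-valued string property template with a prefixed,
// hypothetical symbolic name.
PropertyTemplateString pt = Factory.PropertyTemplateString.createInstance(os);
pt.set_SymbolicName("XYZCustomerId");
pt.set_Cardinality(Cardinality.SINGLE);

// At least one display name is needed; the locale name is illustrative.
LocalizedString ls = Factory.LocalizedString.createInstance();
ls.set_LocalizedText("Customer ID");
ls.set_LocaleName("en-us");
LocalizedStringList names = Factory.LocalizedString.createList();
names.add(ls);
pt.set_DisplayNames(names);

pt.save(RefreshMode.NO_REFRESH);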
4.7.9 Annotations
Annotations allow users to link additional information or comments to other
objects, such as documents. These annotations can be in any format, such as
text, audio, video, image, highlight, and sticky note. An annotation's content
does not necessarily have to be in the same format as its parent document and
can be published separately. Document annotations are uniquely associated with
a single document version; they are not versioned or carried forward when
their document version is updated and a new version is created.
You can modify and delete annotations independently of their annotated object.
However, you cannot create versions of an annotation separately from the object
with which it is associated. By design, the annotation will be deleted whenever its
associated parent object is deleted. Annotations receive their default security
from both the annotations class and the parent object. You can apply security to
annotations that is different from the security applied to the parent.
The content of annotations is stored in a storage area, as defined by the default
area for the annotation class. The storage area used by the annotation class
needs to be appropriate for the type of content associated with the annotations.
That is, if large content is used for annotations, it must not be stored in
the database storage area.
Document lifecycles are contained in two design classes: the lifecycle policy
class and the lifecycle action class:
Lifecycle policy class
The definition of the document's states. The policy also identifies the
lifecycle action that executes in response to the state changes.
Lifecycle action class
Action that the system performs when a document moves from one state to
another.
Document types in the Content Platform Engine have default lifecycle policies.
You can also assign a default lifecycle policy to any new document class. When
you create a document using a class with an associated lifecycle policy, the
document uses it as a default lifecycle policy. This can be overridden at creation
time by assigning a different lifecycle policy to the document.
Recommendations: Assign lifecycle policies to a document class whenever
possible, instead of assigning them to individual documents. This practice
helps the operator select the correct policy by choosing the document class
associated with the desired lifecycle policy. This practice also prevents
problems that can occur if you need to delete a lifecycle policy.
A containing folder can contain multiple child folders, but each child folder is
directly contained within at most one parent folder. Custom objects and
documents are always referentially contained. For referentially contained objects,
their containment models a many-to-many relationship. A referentially contained
object can be contained within multiple folders, and can also be contained
multiple times in the same folder.
There are two types of referential containment relationships: dynamic and static.
A static referential containment relationship is a relationship between a folder
and a custom object, a specific document version in a version series, or a folder.
A dynamic referential containment relationship is a relationship between a folder
and the current version of a document. In this case, the current document
version is the released version if one exists; otherwise, it is the current
version or, failing that, the reservation version.
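The following fragment sketches how a referential containment relationship is created through the Content Engine Java API by filing an existing document into a folder. It reuses the os and doc references from the earlier sketches; the folder path and containment name are illustrative.

import com.filenet.api.constants.AutoUniqueName;
import com.filenet.api.constants.DefineSecurityParentage;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Factory;
import com.filenet.api.core.Folder;
import com.filenet.api.core.ReferentialContainmentRelationship;

Folder folder = Factory.Folder.fetchInstance(os, "/Invoices/2013", null);
// file() creates the referential containment relationship; the same
// document can be filed into any number of folders this way.
ReferentialContainmentRelationship rcr = folder.file(doc,
        AutoUniqueName.AUTO_UNIQUE, "Q3 expense summary",
        DefineSecurityParentage.DO_NOT_DEFINE_SECURITY_PARENTAGE);
rcr.save(RefreshMode.NO_REFRESH);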
[Figure: an unfiled repository as compared to a filed repository]
One of the primary benefits of filing into a folder is browsing. Browsing allows
users to traverse a folder structure and locate content inside a folder.
Ideally, all the content in any specific folder relates to a particular
activity or function.
Another advantage is that in a P8 Content Manager repository, content can be
filed in more than one folder at a time. There is one master copy of the content,
and references filed in multiple folders point back to the single master.
Remember that with P8 Content Manager, users can always search for and view
any content that meets search criteria whether or not the content is filed in a
folder. Folders are simply a convenience for users who want to browse for
repository content.
There are use cases where unfiled content makes sense. Table 4-1 is a decision
table for the filed option as compared to the unfiled folder option.
Table 4-1 Folder options and their impact
Folder option
Impact
Unfiled content (does not use folders)
Filed content (uses folders)
The metadata set that is collected for each content item must include all
properties necessary to identify and retrieve the content. This set must include
the usual properties, such as content title, content subject, and date collected, in
addition to application-specific properties, such as customer name, customer ID,
and account number.
Division
Department
Function
Activity
Document type
Record type
Example: By function
The next folder structure is based on function. This structure is appropriate for
records systems that are typically organized by the function of the document, the
activity with which it belongs, and the record category under which it needs to be
filed. In this scheme, as shown in Figure 4-12 on page 123, the folder levels are:
1. Function
2. Activity
3. Document type
[Figure: an example folder hierarchy; Level 1 is system-wide and may not be changed (Information Store), Level 2 holds departments (finance, HR, IT, marketing), and Level 3 and the sub-levels are group controlled and may be changed to meet group requirements (for example, accounting, ledger, tax, audit, projects, expenses, CA-File, ECM, Mary's Files)]
[Figure: Content Manager object stores comprise a catalog together with the file store, database store, and fixed store storage methods]
When choosing a storage method for your content, remember that each of these
storage methods can be configured on a per document class basis.
4.9.1 Catalog
The catalog is a relational database that is specified at installation time. The
catalog can be created on any supported relational database management
system (RDBMS). Refer to the product documentation for information about
supported brands and versions.
The catalog database stores all of the P8 Content Manager configuration
information. If you expand the object store view using Content Platform Engine
Administration tools, the object tree that displays is pulled from the catalog
database. The catalog stores the following information:
Configuration information
Object references
Object properties
Object security lists
Choice lists
Property values
Document (content) links
Search definitions
Note: The file names of the content written to this directory are based on the
globally unique identifiers (GUIDs) associated with the documents.
[Figure: the file storage area directory structure; a shared parent directory contains a root directory with content subdirectories in which committed content element files are stored]
During the creation of the storage area, you have the option of creating a small
(23 x 23 = 529 directories) or large (23 x 23 x 23 = 12,167 directories) file storage
area. The choice of one or the other is typically determined by the anticipated
growth and the need for physically grouping the documents for storage
management, backup, or disaster recovery purposes. A large file storage area is
more suited for storing a large number of small content elements that contain
single-page scanned documents or small emails. A small file storage area is
more suited for a smaller number of content elements with a larger average size,
such as content element files with embedded images, spreadsheets, and
graphics.
Documents are stored among the directories at the leaf level using a hashing
algorithm. We suggest that the best practice is to limit the number of content
element files in a leaf directory to fewer than 5,000. For a small directory
structure, the upper limit is then around 2,500,000 content element files. For
a large directory structure, the limit is about 60,000,000 content element
files (the arithmetic is shown after the following list). The number of
documents for the file store depends on several factors, such as the type of
content being stored, the size of the content, and the type of file store
being used. When these limits are approached, create multiple storage areas.
With larger file stores, the following issues can arise:
Larger file stores take more time to perform the weekly full backup. In that
case, consider differential and synthetic backups to avoid full backups, or
implement SAN/NAS-based snapshots and replication.
The consistency checker takes a long time to run.
P8 Content Manager does not have a hard limit on the number of files in the
file store, but the more documents you have, the larger the constraint placed
on the file system.
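The stated limits follow directly from the directory counts and the per-directory guideline:

529 leaf directories x 5,000 files per directory = 2,645,000 (about 2.5 million content element files)
12,167 leaf directories x 5,000 files per directory = 60,835,000 (about 60 million content element files)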
P8 Content Platform Engine administration tools offer the capability to set a limit
on the number of content elements and size for a certain file storage area. When
either limit is reached, the file storage area is closed and the new content is
directed to the next open file storage area in the storage policy. To help manage a
large storage space across multiple storage areas, Content Platform Engine
farmed or rolling storage areas can be implemented, as indicated earlier.
Both local SCSI disks and SAN are accessed at the block level, so SAN
typically looks like ordinary locally attached SCSI disk to the operating
system. Disk I/O is done at the block or sector level. A driver translates those
SCSI calls into calls through a host bus adapter over a Fibre Channel
network to the SAN device on the other end of the Fibre Channel. The Fibre
Channel network is a specialized optical network used just for connecting
SAN devices to one or many servers through the host bus adapters in those
servers. A host bus adapter is what is called a Fibre Channel card.
Network file shares, on the other hand, are accessed at the file level with
operating system file system calls. So, network shares look like a local file
system, with folders and files, not like local disk, to applications. Underneath
the operating system, a network protocol is used to extend local file system
calls over a standard LAN to the network file server. The act of binding a
remote network file system into the local file system folder hierarchy is called
a mount. The network protocols most commonly used for extending file system
calls over the network to the remote file server are CIFS for Windows-based
clients and NFS for UNIX-based clients. Network shares can be provided by
a network file server, or by specialized storage devices called NAS devices.
NAS devices plug directly to a LAN and are dedicated to providing network
shares to other computers on that LAN.
- IBM office of the CTO
A SAN Fibre Channel device, a logical unit number (LUN), can be physically
connected to multiple server nodes at a time. However, you need to have a
combined operating system/file system that supports concurrent access to a
shared LUN to be able to share the physical device without using a network file
system, such as NFS.
With standard (non-parallel) file systems (that is, 99% of them, such as UNIX file
systems (UFS) and journaled file systems (JFS)), the Linux or IBM AIX
operating system is written to control the disk directly, and as efficiently as
possible, assuming sole ownership of the device. It can do whatever it wants.
Think of a SCSI disk or a directly attached device where there is normally only
one SCSI master on a SCSI bus, and that is the computer/disk controller.
Therefore, the OS uses cached copies of data structures that are on disk.
Typically, the file system information (inodes, files, and directories) is not
necessarily constantly in synch with the state of the physical device as the
caching occurs. However, the operating system tracks all the moving parts and
ensures that everything stays consistent within the scope of what it controls. This
assumes that no other system is attempting to access the same blocks from a
different port. This leads to two issues:
The OS might make several updates to a structure that is in memory without
writing it out to disk, for efficiency purposes, for example, an inode. So,
another computer reading the disk is unaware of the latest updates.
Similarly, when the OS writes the updates out to disk, it simply writes out the
whole block. If another computer has updated that block since it was first read
in, the other computer's updates will be overwritten when this computer finally
does its write. This problem, in particular, rapidly results in a corruption of the
file system that most likely causes both computers to crash.
So, even though Content Platform Engine can handle and control concurrent
access at the file level, it cannot control all I/Os going to a certain device. And, it
has no access to a level lower than the file system, that is, to the physical disk
block level. Content Platform Engine works by using the API provided by the file
system. The fundamental problem is whether the file system underlying that API
supports concurrent access to the disk. So, this is why Content Platform Engine
needs to rely on a network file system, such as NFS, to go beyond the scope of
any one machine's operating system and file system to handle concurrent access to a
single physical device.
Thus, a SAN or network protocols simulating a SAN, such as iSCSI, cannot be
used for a file storage area if the Content Platform Engine servers run on
different host machines. A SAN cannot control concurrent write access to the
same directory if the requests come from different host machines. A SAN can
only be used as a file storage area if all the Content Platform Engine servers that
are writing to it are on the same host machine. In this case, the operating system
on the host machine can control the concurrent write access to a common file
store directory structure.
[Figure: document classes mapped through storage policies to the storage areas of a storage farm]
Recommendations: Use storage policies with the class definition to save the
content.
Farming
A storage area farm is a group of storage areas that acts as a single logical
target for content storage. Storage area farms increase the throughput by
distributing the I/O load across all the open storage areas. With storage area
activation, Content Platform Engine maintains the same number of open storage
areas for the policy by activating standby storage areas as needed, therefore
better distributing the I/O load across the storage areas.
provides load-balancing capabilities for content storage by transparently
spreading the content elements across multiple storage areas. Therefore, the
storage policy functions as both the mechanism for defining the membership of a
storage area farm and also the means for assigning documents to that farm.
Create separate file storage areas to ensure efficient document management.
For example, you can create a file storage area to group documents with the
same deletion or backup requirements. Map storage areas with documents by
modifying the storage policy property on document classes.
Recommendations: Use Content Platform Engine administration tools to
configure storage policies and storage area farms.
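To sketch how a storage policy can also be applied programmatically, the following fragment assigns a policy to a new document before checkin, overriding the class default. It reuses the os reference from the earlier sketches; the policy display name and the query-based lookup are illustrative assumptions.

import com.filenet.api.admin.StoragePolicy;
import com.filenet.api.collection.IndependentObjectSet;
import com.filenet.api.constants.AutoClassify;
import com.filenet.api.constants.CheckinType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

// Locate the storage policy by its display name (illustrative).
SearchSQL sql = new SearchSQL(
        "SELECT This FROM StoragePolicy WHERE DisplayName = 'Farm1Policy'");
IndependentObjectSet hits =
        new SearchScope(os).fetchObjects(sql, 1, null, Boolean.FALSE);
StoragePolicy policy = (StoragePolicy) hits.iterator().next();

Document doc = Factory.Document.createInstance(os, "Document");
doc.set_StoragePolicy(policy); // overrides the class default for this document
doc.checkin(AutoClassify.DO_NOT_AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
doc.save(RefreshMode.NO_REFRESH);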
Except under extreme conditions, size is not a factor in the decision to add
additional object stores.
Multiple object stores are warranted in the following situations:
An object store is subject to high ingestion rates or frequent update
procedures and needs to be segregated for performance reasons.
Content must be separated for security reasons.
User groups are separated by a large geographic distance.
[Figure: Email Manager ingesting into separate content and records object stores]
Use the Data Source sharing feature. For more information, see the Sharing
Data Sources and Creating a Database Connection topics in Administering
Content Platform Engine:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8.ce.admin.tasks.doc%2Fp8pcb027.htm
By geography
Many organizations have large offices in several countries. Wide area network
(WAN) links are expensive over large distances and typically have low bandwidth
and high latency. It is not always practical for offices in this situation to share the
same P8 Content Manager system. One solution to this situation is two separate
repositories managed by two separate P8 Content Manager systems as shown
in Figure 4-18.
[Figure 4-18: two locations, each with its own repository, connected by a WAN link]
By functional group
In organizations with several functional units, each functional unit, such as
Human Resources (HR), Legal, and Marketing, might want to have a separate
object store. This separation of data gives each unit the flexibility to
control its documents and to implement security and access rights for its
users. This separation also allows the line of business (LOB) applications to
use only the data pertaining to that business unit.
Class-level retention
The retention period you specify at the class level is applied to newly created
object instances and to document instances when the document is checked in.
Class-level retention can be set to the annotation, custom object, document, and
folder classes. Users need to have the modify retention permissions at the
object-store level to set the class-level retention.
Object-level retention
The objects of annotation, custom object, document, and folder can be set with
the retention date at the time of object creation. By default, objects inherit the
retention value from the parent class. To set or change the retention date for an
object, users need a special modify retention permission. These special
permissions are not required if the retention is defaulting from the class.
For static retention, users have to specify the retention date at the time of
creating the object or the exact value has to be specified in the class as the
retention date. Event-based retention is a scheme where you first set the
retention to Indefinite, which means it cannot be deleted, but the expiration
date is not defined. When a business event occurs, you initiate or trigger the
retention by changing the retention value from Indefinite to a specific date.
Another special value, Permanent, never allows the document to be deleted from
the object store. The retention value can only be set to a greater value than
the current value; Content Platform Engine does not allow you to reduce the
retention period.
Retention modes
There are two retention modes that are supported by storage areas in Content
Platform Engine: aligned and unaligned. Only annotations and documents can
have content in Content Manager. In aligned mode, the retention value is
reflected on the content stored on the fixed content device so that users cannot
directly delete the content from the fixed store and from the Content Platform
Engine. In unaligned mode, all the retention features are supported by the
Content Platform Engine, and the content in the fixed store is not stored under
retention. In unaligned mode, users can delete the content directly on the fixed
store, because the enforcement of retention happens in Content Manager, not by
the fixed store.
Note: Use aligned mode when you want strict enforcement of retention on the
content object in the document. Use unaligned when all you need is
enforcement by Content Platform Engine.
Search
Search can be customized in FileNet Workplace XT and Content Navigator by
individual users. Search appears when users log in to P8 Content Manager.
The FileNet Workplace XT or Content Navigator search is an ideal tool for
user-invoked, ad hoc searches for repository content. Users can search on any
properties and can add any system or custom property to the criteria display.
Note: When users modify their search criteria, the system remembers the
settings and will display them again on the next visit to the site.
Stored searches
FileNet Workplace XT and Content Navigator offer a tool for designing search
templates for more sophisticated content searches. Search Designer offers the
following enhanced features:
The Content Platform Engine uses the IBM Content Search Services (CSS)
server for indexing and searching the documents. Content-based searches can
be performed from all P8 Content Platform Engine client search tools. Content
Platform Engine has the capability to fail over during indexing and search, and
has supporting configurations with no single point of failure.
Indexing process
The indexing process begins at the Content Platform Engine when CBR-enabled
objects, such as documents, are created or updated. The Content Platform
Engine stores indexing data for the CBR-enabled objects in the indexes created
and managed by the CSS servers. Each index is associated with a distinct index
area in the object store. During an indexing process, the system can write to
multiple indexes across the index areas. When an indexs capacity is reached,
the index is automatically closed and a new index is created.
The Content Platform Engine queries the items from the index request table to
identify documents that are queued for indexing and then groups index requests
pertaining to the same target index into an index batch. The binary documents in
this batch are converted to text by the text extraction processes, then the entire
batch is submitted to a CSS server for indexing.
Text extraction happens in an external process running outside the Content
Platform Engine. Text extraction runs on the Content Platform Engine server by
default. All the Content Platform Engine servers in a site can dispatch the
requests for indexing. This allows the Content Platform Engine to share the text
extraction load among all the available servers because the text extraction
processes can be CPU-intensive and disk I/O-intensive. Text extraction
throughput can be configured by using the Content Platform Engine
administrative tools. The text extraction processes are also known as text filters.
The number of text filters for Content Platform Engine can be increased or
decreased based on the CPU utilization and the available system memory of the
server. During text extraction, the text filter process writes the intermediate
temporary data to the text filters temporary directory. Having this temporary
directory on a fast I/O device will increase the performance of the text filters.
Note: When IBM Content Collector is used with the Content Platform Engine,
text extraction for the binary documents occurs on the CSS server via the IBM
Content Collector plug-in.
After the batch of documents is processed by the text extraction processes, the
text file batch is submitted to the CSS server. After the CSS index server receives
the index batch, the preprocessing functions begin.
Search process
Users can submit CBR queries against the full-text index using criteria that
reference CBR-enabled properties or terms that exist in the content. Search
requests are
initiated through the Content Platform Engine administration tools or other client
applications using the Content Engine API and include a full-text expression that
is submitted to the CSS search server. The content-based search expression is
highlighted in the following query:
SELECT d.This FROM Document d INNER JOIN ContentSearch c ON d.This =
c.QueriedObject WHERE CONTAINS(d.*,'lion AND tiger')
The search server uses word stems, synonyms, and stop words to improve
search efficiency and accuracy. It searches for and identifies the stem for all word
terms included in a full-text search expression. A stop word is a word or phrase
that is ignored by the search server to avoid irrelevant search results caused by
common expressions. The search server uses these definitions on the index and
runs the full-text search. The results are returned to the Content Platform Engine
server, which then joins the results with other tables in the query and runs the
query. The stop words do not affect the indexing, and they appear in the indexes
created by the CSS server.
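A full-text query like the preceding one is submitted through the same query classes as property searches. The following fragment is a minimal sketch reusing the os reference from the earlier sketches; the search terms are illustrative.

import java.util.Iterator;
import com.filenet.api.collection.IndependentObjectSet;
import com.filenet.api.core.Document;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

SearchSQL sql = new SearchSQL(
        "SELECT d.This FROM Document d INNER JOIN ContentSearch c "
        + "ON d.This = c.QueriedObject WHERE CONTAINS(d.*, 'lion AND tiger')");
IndependentObjectSet hits =
        new SearchScope(os).fetchObjects(sql, 50, null, Boolean.FALSE);
for (Iterator it = hits.iterator(); it.hasNext();) {
    Document d = (Document) it.next();
    // Only d.This (the object identity) was selected, so fetch any
    // additional properties before reading them.
    System.out.println(d.get_Id());
}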
The Content Platform Engine server runs the searches concurrently. The search
server configuration on the domain allows full-text indexes to be searched in
parallel to satisfy user queries. With content-based searches, you can search the
content based on the words and phrases, string properties of a CBR-enabled
object, partitioned properties, and also by using the XML and XPath queries.
Recommendations: If the query criteria include partitionable properties,
consider using index partitions to reduce the number of indexes to search,
which increases the speed of the search. Index partitioning increases the
number of indexes created. If there are no partitionable properties in the
query criteria, we advise that you not use partitioning. Searching with order
by rank reduces performance, so use this option only when required. Ranking is
determined by the CSS server.
Content-based searches tend to be slower when searches are run concurrently
with indexing. Dedicate servers for content search, because I/O and memory are
the important factors in the search operation. We advise that you have up to
6 GB of memory for each CSS server. Content-based searches on properties
perform better than content and XML searches.
Index areas
An index area is a file system directory that contains CSS indexes. Each object
store can have multiple index areas and each index area can have multiple
indexes. The index contains the indexing information for the objects that belong
to the same indexable base class or subclasses of the base class. Index areas
can have different states: OPEN, CLOSED, STANDBY, or FULL. Regardless of the state of the index area, all the indexes in the index areas can be searched.
Important: Index area root directories need to be unique among index areas
even if the root directory path is on the local disk.
An affinity group is a group of CSS index servers and index areas. The servers
in a group access only those index areas in the same group. The servers that are
not in a group access only those index areas that are not in a group. Although the
configuration of affinity groups is optional, it is a good practice to assign multiple CSS servers to an affinity group and to keep the root directory local to the CSS servers that perform indexing. All the servers in the affinity group must have read/write access to the root directory.
Configure the number of index areas to be less than or equal to the number of CSS servers in indexing mode. During content ingestion, you might want an equal number of indexing servers and index areas to keep all the indexing servers busy. Also consider having additional indexing servers for failover purposes.
Important: The location specified for the index areas needs to be accessible for reading and writing by all the CSS servers. If an index area is assigned to an affinity group, it needs to be accessible to all the servers in the affinity group.
In Query Builder, there are two ways to construct searches: Simple View and SQL View. Select View from the toolbar to select a view style:
- Simple View offers a point-and-click interface where you can select tables, classes, and criteria from drop-down lists.
- SQL View translates anything that you create in Simple View. This is a one-way translation only; you cannot translate an SQL View into a Simple View. SQL View presents the query in an SQL text window, where you can edit the query directly or load any *.qry files that you saved on the network.
Both views construct a query that can be bundled with the other Query Builder
features: bulk operations, scripts, and security changes. Both views support
Search Mode and Template Designer Mode.
Tip: To aid administrators using SQL View, the P8 Content Manager help files contain the P8 Content Manager database view schema.
Multiselect operations
Multiselect (or bulk) operations perform an operation on all objects returned in
the search results from the query builder query. This feature is useful for object
store maintenance activities.
With multiselect operations, you can perform the following actions on multiple
files at the same time:
- Delete
- File to folder
- Unfile from folder
- Undo checkouts
- Change lifecycle states
- Add to security ACLs (you cannot delete existing entries)
- Run an event action script
For example, assume that several documents were checked out by someone who left your company. Using multiselect operations, you can search for all documents that were left checked out by that person and undo these checkouts in one operation. To do this, use Query Builder to construct a search that finds all documents currently checked out under the former employee's system login name.
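The same cleanup can also be scripted with the Content Engine Java API. A minimal sketch, assuming that the reservation objects are queryable by their IsReserved and Creator properties and that 'jdoe' is the former employee's login name:

// Locate the reservations created by the departed user and cancel each
// checkout; os is an ObjectStore fetched in an authenticated context.
SearchScope scope = new SearchScope(os);
SearchSQL sql = new SearchSQL(
    "SELECT This FROM Document WHERE IsReserved = TRUE AND Creator = 'jdoe'");
IndependentObjectSet reservations =
    scope.fetchObjects(sql, Integer.valueOf(50), null, Boolean.TRUE);
for (Iterator<?> it = reservations.iterator(); it.hasNext();) {
    Document reservation = (Document) it.next();
    reservation.cancelCheckout();             // queue the cancellation
    reservation.save(RefreshMode.NO_REFRESH); // execute it on the server
}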
4.13 Conclusion
In this chapter, you learned about the basic concepts and elements that comprise a repository and repository design. While designing the system, ensure that someone is responsible for the design of the repository. Use a prefix for the symbolic names in your repository that uniquely identifies your solution and does not conflict with Content Platform Engine symbolic names and naming conventions. Create a meta object store and import the metadata from the meta object store into other object stores. Ensure that you create a prototype to validate your design before implementing it. Consider the best practices and performance considerations before finalizing the design.
Chapter 5.
Security
This chapter describes the security mechanisms provided by the Content
Platform Engine to secure the resources under its management against
unauthorized access and to ensure that authorized users are given only sufficient
access to carry out the tasks assigned to them, referred to as access control. In
addition, it provides a series of recommendations for how best to employ those
mechanisms to achieve the desired access control goals.
We discuss the following topics in this chapter:
- Access control
- Authentication
- Authorization
- Security best practices
5.2 Authentication
All requests submitted to the Content Platform Engine are subject to
authentication, meaning that they must carry within them a verifiable (authentic)
identity of the entity making the request. The Content Platform Engine itself does
not define or implement authentication; rather, it delegates that to a standard
service of the application server environment in which it executes: the Java Authentication and Authorization Service (JAAS).
The outcome of JAAS authentication is required to be an identity that the Content
Platform Engine can resolve to a user in a directory service containing
configured users and groups, from which a security context can be built and
delivered into the authorization phase of access control. These elements of authentication are described in detail in the sections that follow.
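As a concrete illustration, the following Java API sketch performs the JAAS login through the UserContext helper before making its first request. The URI, credentials, and stanza name (FileNetP8WSI is the stanza commonly used with web services transport) are deployment-specific assumptions:

import javax.security.auth.Subject;
import com.filenet.api.core.Connection;
import com.filenet.api.core.Domain;
import com.filenet.api.core.Factory;
import com.filenet.api.util.UserContext;

Connection conn = Factory.Connection.getConnection(
        "http://ceserver:9080/wsi/FNCEWS40MTOM/");
// JAAS login; the resulting Subject carries the verifiable identity
Subject subject = UserContext.createSubject(
        conn, "username", "password", "FileNetP8WSI");
UserContext.get().pushSubject(subject);
try {
    // Calls on this thread now execute under the authenticated identity
    Domain domain = Factory.Domain.fetchInstance(conn, null, null);
    System.out.println("Connected to domain: " + domain.get_Name());
} finally {
    UserContext.get().popSubject();
}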
Supported directory services include:
- Novell eDirectory
- Oracle Internet Directory
- Oracle Directory Server Enterprise Edition (formerly known as Sun Java System Directory Server)
- CA Directory
- IBM Tivoli Directory Server
In addition, support is provided for IBM WebSphere Virtual Member Manager
(VMM), which is a directory service aggregator, capable of presenting a
collection of directory services of heterogeneous types as a single unified virtual
directory.
Alongside users, the directory service can (and usually will) define a number of
groups. A group is a container of users and possibly of other groups, although
not all directory services support nesting of groups. The contained users and
groups are said to be members of the containing group. A group provides a
convenient way to grant or deny access to the group members in a way that
adapts automatically to changes to membership in the group.
Users and groups are referred to collectively as security principals and the
authorization mechanisms described next are expressed exclusively in terms of
security principals, mostly without regard for whether a particular principal is a
user or group.
For each security principal, the directory service is required to provide an
immutable identifier, which unambiguously resolves to that principal. This
identifier is unique within that directory service and can be used to retrieve the
directory service object having that ID. The form this unique identifier takes
varies according to the directory service type and on configuration choices made
by the owning organization. No matter what form it has in the underlying directory
service, the Content Platform Engine transforms it to a universal format when it is
presented in the API. This universal format is referred to as a security identifier
(SID). SIDs are what the Content Platform Engine stores in its internal
authorization data structures.
A security principal also has a distinguished name, a principal name, and a short name. The distinguished name (DN) is unique; the principal name and short name can also be unique (particularly the former).
In addition to SIDs obtained from directory service principals, the Content
Platform Engine acknowledges two special SID values, which have particular
purposes described later. These SIDs have an associated predefined display
name and do not resolve to a directory service object. They must be treated
specially by API consumers, including administration tools.
5.3 Authorization
Authorization is the second phase of access control, and is the phase in which a
determination is made of the operations the caller is permitted to carry out on a
particular object.
Right                      Applies to
READ                       All types
WRITE                      All types
LINK                       Folder, Document, a few others
UNLINK                     Folder
MINOR_VERSION              Document
MAJOR_VERSION              Document
CREATE_INSTANCE            Class Definition
CREATE_CHILD               Domain, Folder, and Class Definition
CHANGE_STATE               Document, Task
PUBLISH                    Document
DELETE                     All types
READ_ACL                   All types
WRITE_ACL                  All types
WRITE_OWNER                All types
CONNECT                    Object Store
STORE_OBJECTS              Object Store
MODIFY_OBJECTS             Object Store
REMOVE_OBJECTS             Object Store
WRITE_ANY_OWNER            Object Store
ADD_MARKING                Marking
REMOVE_MARKING             Marking
USE_MARKING                Marking
PRIVILEGED_WRITE           Object Store
MODIFY_RETENTION           Object Store
VIEW_RECOVERABLE_OBJECTS   Object Store
Owner
The Owner field stores the identity (SID) of the security principal that is
considered to own the object and is granted certain rights that cannot be
revoked by the permissions list. The owner is usually a user and in the majority of
cases will be the user who created the object, but there is nothing (other than the
rules) which requires that to be the case. In particular, it is allowed for the owner
to be a group (in which case, any member of that group is considered to own the
object).
Every SD has a slot for the owner but it is not always populated. There are a
couple of reasons why this can be the case:
- For some objects, the concept of an owner is not considered meaningful, so no mechanism is provided to set or retrieve the Owner field and there is no default value initialized in it. This applies to all GCD objects and to a small subset of repository objects.
- The owner might have been explicitly set to null (or defaulted so) deliberately so that no one has the additional access rights conferred by ownership.
In cases where there is an exposed mechanism to modify the Owner field (which for the majority of repository objects there is, through the Owner property), doing so is subject to the following rules:
- If the user has WRITE_ANY_OWNER access to the object store in which the object resides, the Owner can be set to any valid real SID or to null. (It cannot be set to one of the special SIDs.)
- If the user has WRITE_OWNER access to the object in question (but not WRITE_ANY_OWNER access), the Owner can only be set to the user's own SID (which is known as taking ownership) or to null.
- Otherwise, the Owner cannot be modified.
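For example, a caller who holds WRITE_OWNER access on a document can take ownership with the Java API (a sketch; the identity shown is illustrative and must resolve to the caller's own SID unless WRITE_ANY_OWNER is held):

// doc is a previously fetched Document
doc.set_Owner("jdoe");            // take ownership (or pass null to clear it)
doc.save(RefreshMode.NO_REFRESH);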
Permission list
The permission list, referred to as the access control list (ACL), is a collection of permissions or access control entries (ACEs), each of which grants or denies a set of access rights to a security principal.
The order of the ACEs in the ACL has no particular significance, although when
the ACL is exposed in the API as a collection of permission objects, it is ordered
by source and type to match the order of precedence applied by the access
check. The ACEs in the ACL are generally a mix of automatically assigned
entries and entries applied manually through the API. See the Source field
described next.
Each ACE has the following fields:
Grantee
The SID of a security principal to which the ACE grants or denies access
rights, which can be either a real SID or one of the special SIDs. In the API,
the SID is exposed as the name of the security principal (GranteeName
property).
Type
Either Allow or Deny. An Allow ACE grants permissions, and a Deny ACE
revokes them.
Access Mask
A bitmask of access rights granted or denied by this ACE.
Inheritable Depth
An integer value determining the extent to which this ACE can be inherited by
security (grand)child objects.
Source
An automatically populated (read-only) enumerated type indicating the origin of the ACE:
- Direct: an ACE that either was added manually through the API or is a modified Default-sourced ACE.
- Default: an ACE that was added by default (described in 5.3.3, Default security descriptor on page 161) and has not yet been modified.
- Template: an ACE that was added as the result of applying a security template (described in 5.3.4, Security templates on page 161).
- Parent: an ACE that was inherited from a security parent (described in Inheritance proxies on page 165).
- Proxy: an ACE that originates from a full security proxy (described in Full proxies on page 164).
ACEs with a source of Template, Parent, or Proxy can neither be removed from
the ACL nor modified in any fashion through the API. ACEs with source Direct or
Default can be removed through the API and allow modifications to the Type,
Access Mask, and Inheritable Depth fields (Grantee cannot be changed). If an
ACE with source Default is modified, the source is updated to Direct. (Default is purely presentational; functionally, it is treated no differently from Direct.) ACEs
with source Parent do not form part of the stored security descriptor but will
appear in the API representation of the effective ACL of an object. ACEs with the
source Proxy are similar.
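To illustrate, the following Java API sketch adds a Direct Allow ACE to a document's ACL; the grantee name and rights are illustrative:

import com.filenet.api.collection.AccessPermissionList;
import com.filenet.api.constants.AccessRight;
import com.filenet.api.constants.AccessType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Factory;
import com.filenet.api.security.AccessPermission;

// Build an Allow ACE granting READ and READ_ACL, not inheritable
AccessPermission ace = Factory.AccessPermission.createInstance();
ace.set_GranteeName("Auditors@example.com");
ace.set_AccessType(AccessType.ALLOW);
ace.set_AccessMask(Integer.valueOf(
        AccessRight.READ_AS_INT | AccessRight.READ_ACL_AS_INT));
ace.set_InheritableDepth(Integer.valueOf(0));
// Manually added entries are stored with source Direct
AccessPermissionList acl = doc.get_Permissions();
acl.add(ace);
doc.save(RefreshMode.REFRESH);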
Object model
A security template is essentially just an ACL tied to an identifier that determines
the object state in which that template is applied.
Security templates come in three types with different packaging:
- VersioningSecurityTemplates and ApplicationSecurityTemplates are fully fledged dependent objects exposing a TemplatePermissions collection (accessing the underlying template ACL), a GUID property ApplyStateID, which identifies the state to which the template is applicable, and a Boolean property IsEnabled, which allows application of this template to be disabled. For a versioning template, the ApplyStateID property is constrained to one of four well-known values representing the four versioning states: Reservation, In Progress, Released, and Superseded. For an application template, it can have any (application-determined) value except those four well-known values.
- A document lifecycle template is not a separate type of object. The template ACL is simply part of the DocumentState definition object (again, exposed as the TemplatePermissions property), with the applicable state being implicitly that of the DocumentState object. A Boolean property ApplyTemplatePermissions determines whether the template ACL is applicable during the transition to this state.

A group of Versioning and Application security templates, representing all that are applicable to a particular object or set of objects, is collected in a SecurityPolicy object, which is associated with the relevant objects through their SecurityPolicy property (which is often defaulted). Folders, documents, and custom objects (collectively, Containables) have the SecurityPolicy property and participate in the template mechanism, although Versioning templates are relevant only for Documents.
DocumentState objects, with their attendant template ACLs, are collected into a DocumentLifecyclePolicy, which is associated with the relevant Document objects through their DocumentLifecyclePolicy property.
Both SecurityPolicy and DocumentLifecyclePolicy have an additional Boolean property, PreserveDirectPermissions, which affects the manner in which templates from that policy object are applied.
Template application
A template is applied to an object under the following circumstances and subject
to the noted constraints:
During a versioning state change (for example: Checkin, Promote, or
Demote), if the document has a non-null value for its SecurityPolicy property
5.3.5 Proxies
Proxying is a mechanism by which the security descriptor of one object (the
proxied object) can be replaced by or augmented by the security descriptor of
another object (the proxy object). Replacement is referred to as full proxying.
Augmentation is referred to as inheritance or partial proxying. In both cases, the relationship between the proxied object and a proxy is represented by a singleton object-valued proxy-defining property of the proxied object. Setting a value for the property establishes (activates) proxying, and removing the value severs the relationship.
Full proxies
Assigning a value to a proxy-defining property of type Full results in the effective
security descriptor (and active markings) of the proxied object being completely
replaced by that of the proxy. In this case, the Permissions collection presented in
the API for the proxied object will reflect exactly the ACEs in the effective security
descriptor for the proxy, but with the special source type Proxy. The Permissions
collection and the Owner property will both be read-only.
A class can define multiple proxy-defining properties of any mix of types, but
among those of type Full only one can have a value at a time. (An attempt to set
a value for a second full proxy-defining property will incur a constraint-violated
error.) This places some limits on the usefulness of the feature, since if two
applications want to full proxy a particular object in two ways, they can only do so
Proxied class                          Property defining full proxy   Proxy class
Referential Containment Relationship   Tail                           Folder
Component Relationship                 Parent Component               Document
Hold Relationship                      Hold                           Hold
Sweep Relationship                     Sweep                          Controlling Object
Sweep Job                              Controlling Object             Sweep Policy
Thumbnail                              Input Document                 Document
Inheritance proxies
Security inheritance, like its genetic counterpart, is the passing of inheritable traits (in this case, ACEs rather than genes) from a parent (the proxy) to a child (the proxied object). This analogy (and a certain amount of history) is the reason behind the use of the ACE source designation Parent. The analogy
becomes a bit stretched when considering that a proxied object can have any
number of parents, represented by values in an arbitrary number of inheritance
type proxy-defining properties.
The effective security descriptor for an object with active inheritance proxy
relationships is constructed by combining the stored SD with copies of the
inheritable ACEs from the effective security descriptor of each proxy. ACEs from
different proxies are combined on an equal footing and all are given source
Parent, regardless of the source for the ACE from which the copy was made.
Inheritability of an ACE is determined by its Inheritable Depth setting, with the following rules:
- A value of 0 means that the ACE is not inheritable and will not appear in the ACL of the proxied object.
- A value of 1 indicates that the ACE will be inherited through one level of proxying only (the administrative UI refers to this as Immediate children). When such an ACE is copied to an inheriting object, the copy has Inheritable Depth 0 as a reflection of not inheriting further.
- Other positive values similarly specify inheritance through a maximum number of levels, with the value being decremented as a copy is made for each inheritance step.
- A value of -1 allows the ACE to be inherited to arbitrary depth, through any number of transitive proxy relationships.
- Other negative values have a subtle effect, because these ACEs (which are referred to as inherit-only) are not considered when evaluating access for the object on which they appear. They can be inherited, with their inheritable depth being transformed as follows on the first inheritance step:
  - -2 becomes -1, thus meaning inherit-only to any depth
  - -3 becomes 0, thus meaning inherit-only to immediate children
  - -4 becomes 1, -5 becomes 2, and so on
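To make the depth arithmetic concrete, the following small function (our own model of the rules above, not part of the product API) computes the depth that an inherited copy of an ACE carries after one inheritance step:

/**
 * Model of the inheritable-depth rules described above. Given the depth
 * on the proxy (parent), returns the depth carried by the inherited copy,
 * or null if the ACE is not inherited at all.
 */
static Integer inheritedDepth(int parentDepth) {
    if (parentDepth == 0) return null;                 // not inheritable
    if (parentDepth == -1) return Integer.valueOf(-1); // inheritable to any depth
    if (parentDepth > 0) return Integer.valueOf(parentDepth - 1); // counts down
    return Integer.valueOf(-(parentDepth + 3)); // inherit-only: -2 -> -1, -3 -> 0, -4 -> 1
}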
As well as ACEs that are directly applied, those that appear in a default instance
ACL or template ACL can also be marked as inheritable. Thus, for example, a
security template can apply inheritable ACEs. A number of system classes have
built-in inheritance proxy-defining properties, as shown in Table 5-3.
Table 5-3 System classes and their properties
Proxied class      Property defining inheritance proxy   Proxy class
Folder             Parent*                               Folder
Document           Security Folder                       Folder
Custom Object      Security Folder                       Folder
Annotation         Annotated Object*                     Containable
Task               Coordinator*                          Containable
Class Definition   Superclass Definition*                Class Definition
Recovery Item      Recovery Bin*                         Recovery Bin
Of particular note are the first three. A folder can inherit from its parent folder, and
it from its parent, and so on, all the way to the root folder. Documents and custom
objects can inherit from a folder, which is usually a folder in which the object is
filed, although that is not required. Thus, by combining these two features, it is
possible to manage security for the entire folder tree or for a subtree and all its
contents from the root of that tree/subtree.
The properties marked with an asterisk (*) in Table 5-3 on page 166 all require a value. This makes it appear impossible to have, for example, a folder that does not inherit from its parent. (In contrast, the Security Folder property does not require a value, so that inheritance is always optional.) To overcome this deficiency, in three of the cases in Table 5-3 on page 166, a companion Boolean property is defined that disables this apparently required inheritance, as shown in Table 5-4.
Table 5-4 Proxied class and inheritance
Proxied class
  Folder
  Task
  Recovery Item
The default value for each of these properties is True, which means that
inheritance is enabled. Setting the value to False disables inheritance. The effect
applies solely to inheritance through the system property from the preceding
table; any additional custom proxy-defining properties are unaffected.
Note: To date, no use case disables inheritance for class definitions or
annotations.
5.3.6 Markings
The origin of this mechanism is the military and intelligence notion of document
classifications and clearance levels:
- Each person is assigned a clearance level indicating the maximum sensitivity of documents that they are allowed to see.
- Each document is labeled with a classification indicating its sensitivity.
- A person is allowed to see only documents that are classified at or below their clearance level. For example, someone with a Top Secret clearance level can see Top Secret, Secret, and Unclassified documents. Someone with only a Secret clearance level cannot see Top Secret documents.
- Furthermore, a person who is responsible for classifying new documents can do so only up to their own clearance level. Therefore, a person with only Secret clearance cannot classify a new document as Top Secret.
This approach is sometimes referred to as labeling or as mandatory access
control (in contrast to the features described earlier that can be categorized as
discretionary access control).
The Content Platform Engine manifestation of this type of access control is via
marking-controlled properties. A marking-controlled property is in every respect
an ordinary string property (single-valued or multi-valued), except that its
permitted values (called markings) are drawn from an administratively defined
set called a marking set. Its current values influence the access granted to the
object in a manner that is detailed next.
This approach generalizes the basic classification/clearance mechanism in several dimensions:
- An object can have multiple marking-controlled properties, drawn from multiple marking sets, which combine in their influence on the access granted. For example, there might be a Classification property drawing from the Clearance Levels marking set, and a Project property designating a project from the Projects marking set. The combined effect depends on the clearance levels assigned to different users and the projects on which they are permitted to work.
- A marking set can be defined as imposing a hierarchical order of precedence on the markings defined within it (as in the clearance levels example, where Top Secret > Secret > Unclassified), or it can define the markings as unrelated (likely the case in the projects example).
- In the classification/clearance scheme, failing to have a sufficiently high clearance with respect to a particular document acts as a blanket off switch: you are not allowed to see or in any way manipulate that document. In contrast, markings offer fine-grained control by way of an access mask that can revoke any or all of the rights granted to the object via its security descriptor.
- The right to apply and remove a particular marking can be assigned independently of the right to use objects to which that marking has been applied. (Compare this with the earlier statement regarding classifying new documents.) Therefore, someone might, in principle, be given the right to classify objects as Top Secret even though that person only has clearance to see Secret documents.
Object retrieval
This form of access check takes place during a GetObjects operation (or
equivalent) and during the refresh phase of an ExecuteChanges operation, with
the purpose of determining whether the caller is allowed to see the object and
whether it needs to form part of the response.
For GCD objects and ClassDescriptions, being allowed to see the object is
simply based on the presence of the READ access right in the effective access
for the object, evaluated from an ACL as described next.
For repository objects, the access check is in two parts, both of which must be satisfied:
- The effective access evaluated from the ACL of the object store in which the object resides must include the CONNECT right.
- The effective access evaluated from the object's effective security descriptor and markings must include either READ or WRITE_OWNER permission (or both). The rationale for including the latter is that it allows the owner of an object, or someone with WRITE_ANY_OWNER access to the object store, to retrieve the object regardless of the state of the ACL. It therefore provides an escape mechanism by which the ACL can be repaired if it has been put into a state that grants no access to anyone.
In the event that the access check fails, the intent is that the system behaves exactly as though the object did not exist. Therefore, the following actions occur:
- If the object is being retrieved directly, a not found error is returned.
- If the object forms part of a collection, it is simply dropped from that collection.
- If the object will be returned as the value of a singleton property, a null value is substituted.
If the access check succeeds, the object is returned with values for all the properties dictated by the property filter, except for these properties:
- Any values that are independently persistable objects reached through recursion are subject to an independent access check and handling as described earlier.
- The Permissions collection and the Owner property are returned as empty/null unless the effective access mask that was used for the primary access check also includes READ_ACL, WRITE_ACL, or WRITE_OWNER.
- If the property filter demands that content be returned, an additional check is made for VIEW_CONTENT access in the primary effective access mask.
ExecuteChanges
ExecuteChanges is the name of the underlying server request type through
which creation, updates, and deletions are performed.
effective access mask. This rule must be satisfied for every newly set or modified
property; otherwise, the whole operation will fail.
For most system properties, there is no explicitly defined MAR (modification access required); for those system properties, an implicit mask consisting of just the WRITE right is used. That is, for most properties, setting or modifying them simply requires WRITE permission on the object to which the properties belong. A number of system properties do, however, have an explicit MAR (Table 5-6).
Table 5-6 Properties that explicitly define modification access
Property                 Modification access required
Permissions              WRITE_ACL
Owner                    WRITE_OWNER
ReplicationGroup         WRITE_ACL
SecurityPolicy           WRITE_ACL
CmRetentionDate          WRITE_ACL
SecurityFolder           WRITE_ACL
DefaultRetentionPeriod   WRITE_ACL
the rights present in the TAR (target access required) are also present in the effective access. The default (implicit) TAR is READ, but many system properties have more stringent requirements.
For example, the TAR for the RCR.Tail is LINK. Therefore, it is necessary to
have LINK permission to a folder in order to file things into it.
As with the MAR, for custom properties, the TAR is controlled by the property
definition author.
There is no additional access check for unsetting (nullifying) a singleton OVP.
Only the MAR check applies.
Marking-controlled properties
Values placed into or removed from a marking-controlled property (MCP) are subject to access checks on the marking objects corresponding to those values:
- If setting a value of a singleton MCP where previously there was no value, the caller must have ADD_MARKING effective access to the corresponding marking object.
- If nullifying a singleton MCP that previously had a value, the caller must have REMOVE_MARKING access to the marking object.
- Changing the value of a singleton MCP from one non-null value to another non-null value is treated as a remove followed by an add. Therefore, it requires REMOVE_MARKING access to the initial value's marking and ADD_MARKING access to the new value's marking.
- For updates to a list MCP, the server computes the net deltas (additions and removals) and demands ADD_MARKING or REMOVE_MARKING access to the corresponding marking objects.
The manner in which the effective access to a marking is determined is
described in Markings check on page 177.
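In application code, writing a marking-controlled property looks like any other property update; the marking checks happen on the server. A sketch, where Classification is a hypothetical marking-controlled string property:

// Fails unless the caller has ADD_MARKING on the "Secret" marking object
// (and REMOVE_MARKING on the previous value's marking, if one was set)
doc.getProperties().putValue("Classification", "Secret");
doc.save(RefreshMode.NO_REFRESH);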
Search execution
Search applies an access check to any object contributing columns to a result
row. (In a joined query, each From class is considered as contributing to the row,
even if no properties of that class appear in the SELECT list.) This access check is the same as the check for object retrieval (READ or WRITE_OWNER rights are required), except that if the query includes a CBR clause, VIEW_CONTENT is also required. A failure is handled in the same way as for collection retrieval; that is, the row is simply dropped from the result set.
The access check is accomplished by adding columns to the SELECT list specified by the caller, as needed, to determine the security descriptor and the values of any proxy-defining and marking-controlled properties for each contributing object, from which the effective access to the object can be determined.
Effective access
The key calculation is the evaluation of effective access to an object, which is determined from the effective security descriptor and markings applicable to the object. The overall effective access is, in general, the result of combining three bitmasks, produced by evaluating the owner, the ACL, and the markings against the security context of the caller, in the following way:
effective-mask = (owner-mask | ACL-mask) & ~markings-constraint-mask
The owner mask and ACL mask are combined, and then any rights constrained by markings are subtracted. Note two key points:
- Denial ACEs in the ACL do not (and cannot) revoke rights granted through ownership.
- Markings do not grant additional rights; they only revoke rights, and that revocation is absolute (it cannot be overcome via the ACL or ownership).
For an object without markings, the markings check is skipped and zero is used for the markings-constraint-mask.
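Expressed as code, the combination is simple bit arithmetic (a model of the rule above, not product code):

static int effectiveMask(int ownerMask, int aclMask, int markingsConstraintMask) {
    // Owner and ACL grants combine; markings then revoke rights absolutely
    return (ownerMask | aclMask) & ~markingsConstraintMask;
}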
Owner check
Subject to the following caveat, the Owner check returns a bitmask of either all
zeros or of READ_ACL+WRITE_ACL+WRITE_OWNER, depending on whether
the caller is determined to be an owner of the object. That determination is made
by comparing the Owner SID from the security descriptor with the SIDs in the
security context. If there is a match with any of those SIDs (user or group), the
caller is an owner and receives the non-zero owner mask. If there is no match (or
if the Owner SID is null), the owner mask is returned as zero.
The caveat is that when evaluating access to a repository object, possession of
the WRITE_ANY_OWNER right to the object store in which that object resides
causes WRITE_OWNER right to be added to the owner mask. (This obviously
only has impact if zero is otherwise returned.)
ACL check
The ACL evaluation phase combines the access masks from the ACEs in the
ACL according to the following rules:
Only ACEs whose grantee SID is present among the SIDs in the security
context are considered, while all others are ignored. The only relevant ACEs
are those whose grantee is the calling user or a group of which that user is a
member.
ACL-mask = parent-allow-mask
ACL-mask = ACL-mask & ~parent-deny-mask
ACL-mask = ACL-mask | template-allow-mask
ACL-mask = ACL-mask & ~template-deny-mask
ACL-mask = ACL-mask | direct-allow-mask
ACL-mask = ACL-mask & ~direct-deny-mask
An important point is that denials (Deny ACEs) override grants (Allow ACEs) with
an equal or lower precedence source (for example: Template Deny overrides
Template or Parent Allow) but are themselves overridden by grants with a higher
precedence source (example: Direct).
Markings check
This phase returns a bitmask of rights that are unconditionally revoked from the caller (the constraint mask) as a result of insufficient access to markings applied to the object. This mask is determined by forming a list of the marking objects corresponding to all the values of all marking-controlled properties of the object, evaluating the caller's effective access to each of those markings, and ORing together the constraint masks of any for which the effective access fails to include the USE_MARKING right.
The key part is how effective access to a marking object is evaluated, which has
relevance both here and to the additional property access check for
marking-controlled properties described in Property access check on
page 173. This evaluation varies according to the type of marking set to which
the marking belongs.
List
In this type of marking set, the markings are independent of each other. The
evaluation of effective access simply relies upon the ACL of the individual
marking, which is evaluated according to the rules given in ACL check on
page 176 and simplified because a marking ACL never includes anything but
direct ACEs.
Hierarchical
In this case, the evaluation is more complex because it must take into account the order of precedence of the markings in the MarkingSet and the following rules:
- Granting rights to a higher precedence marking implicitly grants the same rights to all lower precedence markings. For example, someone given Top Secret clearance is inherently able to access Secret and Unclassified documents.
- Denying rights to a lower precedence marking must also deny those rights to all higher precedence markings. For example, if someone is not allowed to see Secret documents, clearly they also are not permitted to see Top Secret documents.
Therefore, evaluation of the effective access to a marking in a hierarchical
MarkingSet needs to include not only all the ACEs of the ACL on that marking
itself, but also the Allow ACEs from any higher precedence markings and the
Deny ACEs from any lower precedence markings. Although it is not implemented
this way, think of this logically as forming a composite ACL from those three
components and then performing a normal evaluation of that composite ACL.
5.3.8 Auditing
In some environments, the assurance that the Content Platform Engine will only
allow properly authorized operations might not be sufficient. It might be
necessary for legal or other reasons to have an actual record of attempts to carry
out unauthorized actions or even of permitted operations.
This is the purpose of the auditing feature, which allows an administrator to
enable recording of either failed operations, successful operations, or both. This
can be done with a considerable amount of selectivity:
- Auditing is enabled by class. For example, it can be turned on for some perhaps more sensitive classes but left disabled for others of lesser sensitivity.
- Within a class, auditing can be controlled by operation. For example, it can be enabled for deletions but not for anything else.
- For each enabled operation, the administrator can select whether a record is made of attempts to perform that operation that failed due to insufficient access, of successful operations, or of both.
- Finally, a filter expression can optionally be specified that is applied to an object that is a candidate for auditing based on the preceding settings. An audit record is written only if the object satisfies the filter condition. This allows auditing to be narrowed to objects that have certain property values or to cases where certain properties are modified.
There is no mechanism for determining auditing based on the identity of the user.
Operations to be enabled for auditing are expressed in terms of events triggered
by those operations, and audit records take the form of stored Event objects of
various classes. These are the same objects that participate in the Content
Platform Engine Events and Subscriptions feature. For a full description of the
Event classes and the corresponding operations, see the following document:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8.ce.admin.tasks.doc%2Fevents_reference.htm
In all cases, the audit record includes the date and time of the operation and the
identity of the user that initiated it. For everything apart from Query Event, the
record also includes the identity of the object on which the operation was
performed. (A query is of a class, not of any one object.) In addition, for all events
other than Query Event, Get Object Event, and Get Content Event, the
administrator can elect for the audit record to contain property values from the object at the time of the operation. The record can be either a complete snapshot of the object's state (before the update, after it, or both) or in the form of selected properties copied (audited as) from the source object into custom properties defined in the event class.
Audit records are independent of the object for which they record an operation
and, in particular, they are not automatically deleted when the source object is
deleted. An automatic disposal mechanism is however provided, through an
Audit Disposition Policy, which periodically scans the audit records deleting those
that satisfy a filter condition, often based on date.
Directory server                             Unique attribute
Microsoft Active Directory                   objectSid
Novell eDirectory                            guID
Oracle Internet Directory                    orclguid
Oracle Directory Server Enterprise Edition   nsuniqueid
CA Directory                                 cn
IBM Tivoli Directory Server                  ibm-entryuuid
Having fully developed the security model, the next task is to determine how that
can be expressed concretely in terms of security descriptors, templates, proxy
relationships, and markings. Determine how to define the procedures for
identifying the individuals to be given particular responsibilities and for assigning
them the rights required to carry out those responsibilities. Design how the
security model will be implemented.
Two elements of a successful implementation are important:
- Wherever possible, grant rights to groups rather than to individual users. To fully exploit this method, create a group corresponding to each area of responsibility or distinct set of access rights identified in the model, make the responsible individuals members of the group, and then assign the group as the grantee in ACEs.
- Make the maximum use of defaults. Some applications can set and update the security properties on individual objects, based on either their own logic or input from users. However, these updates are almost always adjustments to the defaults defined for the object class. By far, the most common case is that the class defaults are allowed to take effect without modification.
It is therefore essential that class defaults are defined appropriately. This means more than the default permissions and owner alone. If security templates are to be used through security policies or document lifecycles, defaults might need to be defined for the properties that reference those objects. Similarly, if markings are to be used, defining default values for marking-controlled properties can be desirable. It can even be appropriate to define default values for proxy-defining properties, although that is less likely.
In addition, it is advisable to keep the number of ACEs in any ACL reasonably
small (fewer than ten) since this results in more economical storage and makes it
easier for administrators to understand the overall effect of the ACL.
However, that is not always possible. It can be necessary in some cases for the
application to make calls to P8 under an identity that is distinct from that of any
client it is serving and that has sufficient access to objects in P8 to enable it to
carry out any operation on behalf of any of its users.
This is a somewhat tricky proposition, since it implies that the usual P8 access
controls must be relaxed for the identity under which the application makes calls,
and so further implies that the P8 administrator must trust the application to
prevent its users from obtaining access that is not allowed if making requests
directly to P8. This is not something to be taken lightly.
Assuming that such trust exists, the following approach for supporting such
applications is preferred:
- A separate directory service user needs to be defined exclusively for use by the one application. It is completely inappropriate for the application to call P8 using an identity that can also be used by direct users or by another trusted application.
- The application needs to be configured to perform a JAAS login as that user. The configuration settings containing the credentials for the login need to be stored securely where only the application can get at them.
- The user (directly, not via any group) needs to be granted the required access in the default permissions for any class of which instances will be acted upon by the application.
- If the need for the application ceases, the directory service user needs to be deleted, thus ensuring no possibility that the elevated rights granted to the application can be taken over by anyone or anything.
To see why not following this advice can cause immense problems, consider a
couple of examples:
Example 1
User X has sole responsibility for a particular task and is granted rights to
perform that task directly (not via a group membership) in ACEs applied to
various objects. Now, X leaves the organization and is replaced by Y.
In order to bring about this change, someone with sufficient privilege
(specifically, an object store administrator) must locate all the objects with
ACEs in which X appears and replace those ACEs with ones granting Y the
same rights.
There is no mechanism for querying for objects based on grantees in the
ACL. (The internal stored format of the security descriptor makes this
impossible.) In limited circumstances, it can be possible to identify the
relevant objects in some other way, based on queryable criteria. For example,
it can be known that all instances of a particular class are affected. However,
even if that is possible, the query can yield a large set of objects, making the
process of updating the permissions of each lengthy. And more typically, there
will be no such narrowing query and the administrator will have no option but
to scan through all documents and folders, examining the permissions of
each to locate those needing to be updated.
The most egregious case is where the task for which X is solely responsible is
that of object store administrator, for example, X was the sole principal given
in the administrators list when the object store was created. In this case, X
was placed in the default instance ACEs for every class defined in the object
store. Those ACEs likely have been copied into every instance created of all
those classes. Fixing that is, practically speaking, impossible.
Example 2
User A currently has sole responsibility for a particular task, but now user B
joins the department and is to share responsibility with A for that task. As in
the previous example, this requires locating and updating all the objects with
ACEs in which A appears, adding ACEs granting the same access to B.
This is largely the same as example 1, except that there is a slight possibility of it being made easier by the participation of A, who might (a) be aware of all the objects involved, and (b) be able to undertake the permissions updates without the intervention of an administrator, if granted WRITE_ACL permission by the ACEs on those objects.
However, it is likely equally as intractable as the first example.
Compare this with how the changes can be accommodated if groups had
been used to assign rights to X and A. In each case, it is a matter of adding Y
or B to the group (and in the first example, removing X).
The following technote provides a full explanation of various ways this can be
done:
http://www.ibm.com/support/docview.wss?uid=swg21425080
The basic concept of using an inheritance proxy, as described in the technote, is as follows:
- An object serves as the role. Inheritable ACEs are placed in the ACL of the object, one per member of the role, with the same access mask in each granting the access rights defined by the role.
- For each object class to which the role is applicable, an inheritance proxy-defining property is added, with its required class being that of the role object.
- Applying the role to an instance of the class is then simply a matter of setting the proxy-defining property to refer to the role object.
With this approach, adding and removing members for the role are achieved by
updating the ACL of the role object. Inheritance takes care of ensuring the effect
extends to all the objects to which the role has been applied.
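Applying the role is then a one-line property update. A sketch, assuming the class defines a custom inheritance proxy-defining property hypothetically named ProjectRole and the role object is a custom object filed at /Roles/ClaimsAdjusters:

import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.CustomObject;
import com.filenet.api.core.Factory;

CustomObject role = Factory.CustomObject.fetchInstance(os, "/Roles/ClaimsAdjusters", null);
// doc now inherits the role object's inheritable ACEs
doc.getProperties().putValue("ProjectRole", role);
doc.save(RefreshMode.NO_REFRESH);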
For each of these caches, the server maintains operational statistics that are
extremely useful as aids to tuning. These statistics can be viewed in the IBM
System Dashboard. Each Content Platform Engine server instance has separate
caches, so in a multiple server configuration, the statistics for each need to be
considered.
Again, the optimal size for the SD cache is around the number of security descriptors in active use: more leads to wasted memory, and less leads to reduced performance because of the processing time and I/O cost of retrieving security descriptors from the database.
frequency of proxy relationships among objects being accessed makes it difficult
to estimate what the optimal size will be. So, using the performance statistics
available in the Dashboard is the best approach to tuning. The size is
independently configurable for each object store since the patterns of access to
objects, and therefore to security descriptors, can differ between object stores.
There is no TTL for security descriptor cache entries since they are never
modified and can never become stale.
Object-security cache
The object-security cache retains security information for proxy objects in
memory, improving performance when the same proxy is accessed several times
over a short period of time. An example is when retrieving the subfolders of a
folder (each subfolder has the same proxy, namely its parent folder), so the
security information for that folder is fetched just once and is reused to evaluate
the access rights for every subfolder.
There is an independent object-security cache for each object store, again
reflecting the possibility of different patterns of access, in this case, to proxy
objects. For each object store, the maximum size and TTL for the cache are
configurable.
The information retained in an object-security cache entry consists of a reference
to the security descriptor for the proxy object (not the SD itself) plus identity
information for the objects referenced by any proxy-defining properties of the
proxy itself and the values of any marking-controlled properties of the proxy
object. This typically requires a smaller amount of memory than for entries in the
other caches, giving more scope for increasing the size without risk of running
out of memory. Like the security descriptor cache, estimating the optimal size of
the object-security cache is not straightforward and using the Dashboard
statistics is the best approach for tuning.
An object-security cache entry can become stale as a result of updates to security-related properties of the proxy object: changes to Permissions or Owner, or to proxy-defining or marking-controlled properties. The server that
receives the request to make these updates automatically flushes the now stale
entry from its cache, but other servers rely on a TTL as the means of overcoming
staleness. The default TTL for the object-security cache is relatively short (30
seconds). In a multiple server configuration, it should only be increased if
updates to any proxy objects are known to take place much less frequently than
that.
Chapter 6.
Application design
In this chapter, we discuss useful principles for designing IBM FileNet Content
Manager (P8 Content Manager) applications.
We discuss the following topics:
- IBM FileNet P8 applications
- Application technologies
- Principles for application design:
  - Available P8 Content Engine APIs
  - Transports available with the APIs
  - Minimizing round-trips
  - Creating a custom AddOn
  - Exploiting the active content event model
  - Logging
Note: Although the technical components that make up the server pieces are
called the Content Platform Engine, there are still many separate aspects,
including APIs, for content and process as of the writing of this book. For
clarity, we use the earlier term Content Engine when talking specifically
about content matters in this chapter.
navigating and searching for documents and folders. IBM Content Navigator can
connect to other IBM content repositories and is included as the standard client
in several IBM ECM products.
In addition to using IBM Content Navigator as a ready-to-use client application, you can easily customize its user interface elements or extend it with entirely new features. See IBM Content Navigator extensions on page 199.
Recommendations: You might have experience with using and extending
earlier generations of IBM FileNet Content Manager web clients, including
Application Engine, Workplace, or FileNet Workplace XT. For any new client
application development work, you need to strongly consider basing it on IBM
Content Navigator.
Although the Content Engine Java API uses EJBs internally to implement the EJB transport, those
EJBs are not exposed or available to application developers. They are accessible only indirectly
through the use of the Java API.
Java API
Content Manager provides a full-featured Java API. Any feature that is available
in the server is completely available to Java programmers. This access includes
routine operations, such as retrieving and updating Document objects, and specialized operations, such as adding a custom class or property to an object store's metadata definitions.
In simplified terms, an API object can be thought of as containing the following information:
- Something that identifies the object residing on the server. Typically, this is an object store reference and an object ID or path.
- Some number of locally cached properties. These might have been fetched from the server, or they might have been set locally. A property value that has been set or changed in the API object and not yet sent to the server is said to be dirty, because its value does not match what is persisted on the server.
- Some number of pending actions. When you call a method that implies a change to the object (including simple property value changes), the change is not made immediately. Instead, a representation of that change is added to the API object's list of pending actions. For example, if you call the method Document.checkin(), a Checkin pending action is added to the API object.
Dirty property values and pending actions are not sent to the server until an
explicit call is made to do so. If an API object is discarded without that call, the
changes are never made on the server. The most common method of sending
changes to the server is to call the save() method on an API object. There is also
a batching mechanism for sending updates to multiple objects in a single
round-trip over the network. Batching provides improved performance and
provides transactional atomicity for all of the changes in the batch.
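The following sketch shows both patterns with the standard Java API; doc, doc1, doc2, and domain are assumed to have been fetched earlier:

import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.UpdatingBatch;

// A dirty property value travels to the server only on save()
doc.getProperties().putValue("DocumentTitle", "Revised title");
doc.save(RefreshMode.REFRESH); // one round-trip; doc is refreshed from the server

// Batch several objects' changes into a single, atomic round-trip
UpdatingBatch batch = UpdatingBatch.createUpdatingBatchInstance(domain, RefreshMode.NO_REFRESH);
batch.add(doc1, null);
batch.add(doc2, null);
batch.updateBatch(); // all changes succeed or fail together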
Recommendations: Use only exposed and supported classes and interfaces
in the API. Do not use internal implementation classes; in particular, do not
make calls into anything in the com.filenet.apiimpl.* packages.
.NET API
Content Manager provides a full-featured .NET API, which you can use to write
programs in any .NET compatible language. With a couple of exceptions, any
feature that is available in the server is completely available to .NET
programmers. The exceptions are mainly custom code that must be executed
within the server, for example, EventActionHandler. Because the Content
Platform Engine server is a Java EE application, internally executed custom code
is limited to Java compatible technologies.
The principles behind the .NET API are the same as those behind the Java API
(see Java API on page 196), so we do not repeat that discussion here. One
significant feature available only with the .NET API is the use of Kerberos to
perform authentication via Microsoft Windows Integrated Login. This is only
possible when the client application is running on Microsoft Windows and the
Content Platform Engine is using Microsoft Active Directory. In practice, that
latter constraint usually means that the Content Platform Engine is also running
on Microsoft Windows.
Recommendations: Use only exposed and supported classes and interfaces
in the API. Do not use internal implementation classes; in particular, do not
make calls into anything in the FileNet.Apiimpl.* namespaces.
Web services
Modern, loosely coupled frameworks, such as a service-oriented architecture
(SOA), favor web services protocols for connecting components. Content
Manager provides Content Engine Web Services (CEWS) for accessing nearly
all features available in the Content Engine server.
Typically, if you as a programmer want to use a web services interface, you obtain
the interface description in the form of a Web Services Description Language
(WSDL) file. You run the WSDL file through a toolkit to generate programming
language objects for interacting with the web services interface. You then usually
build up a library of utilities to provide abstraction layers, caching, security
controls, and other conveniences. The Java and .NET APIs provided by Content
Manager are already exactly equivalent to that, and both APIs can use web
services as a transport (see 6.3.2, Transports available with the APIs on
page 200). Consequently, there is not as much motivation to use CEWS directly,
although there are still a few occasions where the direct use of CEWS might be
useful:
- You have an application that already uses CEWS, and no plans exist for immediately porting it to the Java or .NET API.
- You are building an application component as part of a framework in which the use of web services is the model for communicating with external systems.
- Although it is a rare occurrence, you might be using a language or technology that can make use of web services but is not compatible with the use of a Java or .NET API.
For these occasions, the direct use of CEWS is a good choice and is supported.
In theory, you can take the WSDL file for CEWS and use any current web
services toolkit to generate the interfaces that you will use on your end. In
practice, however, toolkits are still individualistic in their handling of various
WSDL features, and it is difficult to write a WSDL for a complex service that is
usable by a wide cross-section of web services toolkits. Check the latest
hardware and software support documentation corresponding to the product
version you are using, and use only a supported toolkit.
start with Content Navigator and then customize and extend it to meet their
custom application needs.
By customization, we mean altering the visual appearance or behavior of an existing Content Navigator component. By extension, we mean adding new features, large or small, to a Content Navigator environment.
Content Navigator is a browser-based web application. Its layered architecture consists of these components:
- A collection of visual widgets written using the Dojo JavaScript toolkit and the dijit component libraries
- A layout framework for arranging visual components into logical desktops and pages
- A browser-resident JavaScript model view controller (MVC) layer for orchestrating the flow of information
- Mid-tier server-based components for interfacing to repositories and providing other services
Each of those layers has available customization and extension points for
application developer use.
Recommendations: Use Content Navigator as your application framework for
content-centric applications that need a rich and modern user interface.
Extend Content Navigator with features you need that are not already part of
Content Navigator.
Content Navigator is not only a highly customizable and extensible application
framework; most of the visual widget components in its user interface layer can
also be easily adapted for use outside of the Content Navigator environment.
Reusing those widgets can represent considerable development time savings
even if you do not choose to use Content Navigator itself.
All of the customization and extension topics in this subsection are covered in
extensive detail in Customizing and Extending IBM Content Navigator,
SG24-8055.
In the Content Engine APIs, the framework mechanisms are called transports.
The APIs were designed so that all API operations are completely independent
of the transport used. (The few exceptions deal with the propagation of security
and transaction contexts.) A benefit of this independence is that applications can
be written without considering the transport. The selection of a transport is a
configuration decision when the application is deployed (the API finds out about it
through the URI used for the Connection object).
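For illustration, here is a minimal sketch in the Java API; the host names and
ports are hypothetical, and the iiop URI shown is the WebSphere style of EJB
provider URI:

import com.filenet.api.core.Connection;
import com.filenet.api.core.Factory;

public class TransportSketch {
    public static void main(String[] args) {
        // CEWS (web services) transport: the URI names a CEWS endpoint.
        Connection wsConn = Factory.Connection.getConnection(
                "http://cehost:9080/wsi/FNCEWS40MTOM/");

        // EJB transport (WebSphere-style URI): the URI names the EJB provider.
        Connection ejbConn = Factory.Connection.getConnection(
                "iiop://cehost:2809/FileNet/Engine");

        // Everything else in the application is identical for both transports.
        System.out.println(wsConn.getURI() + " / " + ejbConn.getURI());
    }
}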
There are two available transports: Content Engine Web Services (CEWS) and
Enterprise JavaBeans (EJB). EJB transport is available only for the Java API.
CEWS transport is available for both APIs. For most situations, the EJB transport
has slightly better performance, but the CEWS transport can be used in more
environments. In all cases, the transport is considered stateless, which means
that the APIs operate on the basis of a single request and response for each
interaction. No client state is maintained by the server after a request has been
serviced. There is one exception to the statelessness, which is that recent
releases of the Content Engine Java API can be configured to use a stateful
session bean when uploading multiple chunks of content over EJB transport.
EJB transport
The EJB transport internally uses EJB method calls. The method calls are made
on the client side and transported by the application server to the server side of
the network connection. Although many people think of EJBs using Java Remote
Method Invocation (RMI) as the remote communications mechanism, that is not
necessarily the case. Application server vendors are free to provide whatever
implementation they like as long as they meet the EJB requirements, and many
vendors use something other than RMI. In any case, the details of the
application server's implementation are transparent to the API, and the API
does not need facilities for controlling things such as clustering or server
affinity of the EJB, because those things are configured within the application
server.
CEWS transport
As its name implies, the CEWS transport uses web services protocols. In fact,
it uses the same Content Engine Web Services (CEWS) protocol that we
mentioned in Web services on page 198, which means XML over HTTP or
HTTPS. Because HTTP and HTTPS use only a
single port for the entire conversation and use a strict client/server interaction
model, it is generally easier to configure a firewall or reverse proxy through which
to allow CEWS transport requests to pass.
Web services attachments are used for carrying pieces of content between the
client and server sides. Attachment handling has undergone many changes over
the years, and different environments and tools support different standards:
When using either API, you must select the CEWS endpoint that supports
Message Transmission Optimization Mechanism (MTOM) attachments
(recognizable because it has MTOM in the endpoint name: FNCEWS40MTOM).
There is another endpoint (FNCEWS40SOAP) whose attachment format is less
efficient than MTOM in a couple of ways. Nonetheless, it is sometimes useful
to use the SOAP endpoint temporarily as a troubleshooting step if you suspect
problems at the transport layer. That is seldom actually the case, but it does
not hurt to rule it out.
Specific details of using Kerberos and WS-EAF are provided in the Web
Service Extensible Authentication Framework Developer's Guide section of
the online help files, IBM FileNet P8 Documentation.
CEWS transport, which is based on HTTP or HTTPS, uses just one or two
TCP/IP ports for all interactions. There are also commercially available
products for examining and validating web services traffic. Therefore, many
administrators find it easier and more secure to open their firewalls to CEWS
transport requests. In contrast, EJB transport might use a vendor-specific
binary protocol. Such protocols often employ a range of TCP/IP ports. These
factors typically lead to a greater willingness to allow CEWS transport to pass
through firewalls and a reluctance to do the same for EJB transport.
In cases where WS transport is using Username token authentication, the
credentials will appear on the wire unprotected unless you use Transport
Layer Security or Secure Sockets Layer (TLS/SSL), which we strongly advise.
With EJB transport, content is uploaded or downloaded in chunks. With
CEWS transport, the entire content is uploaded as part of a single HTTP
request. For download, however, CEWS transport also generally chunks
content.
Note: It used to be recommended to use CEWS transport for upload of
large content. However, recent releases have included some optimization
work using a stateful EJB call when uploading content chunks. That
translates directly to less work needed on the Content Engine server side
once the chunks have been uploaded. Although EJB transport still chunks
content on both upload and download, the performance overhead of the
chunking itself is typically quite small. Do not use the presence of large
content as your sole reason for selecting a particular transport.
Get or fetch
When many people think about interacting with an object from the server, they
first think about doing a round-trip to fetch the object. That is a necessity for
many things, but there are several cases where you do not need that initial fetch.
For example, if you are only going to use an object so you can set the value of an
object-valued property on another object, you really only need a reference. If you
somehow know that the object already exists, you can skip the round-trip to fetch
it.
(If it turns out that you were wrong and it did not already exist, the referential
integrity mechanisms in Content Engine throw an exception when you try to
save the referencing object.) The APIs have a mechanism for this called
fetchless instantiation. There are three types of Factory methods for creating
programming language objects that reference Content Engine objects, and you
can tell them apart by the word at the beginning of the method name (a brief
sketch follows the list):
create indicates that a new Content Engine object is to be created. No
round-trip is done as the result of this Factory method call. A save() call must
eventually be done.
fetch indicates that a round-trip is immediately made to the Content Platform
Engine to verify that the object exists and to return an initial set of properties.
Fine-tuning of the properties returned can be controlled via an optional
PropertyFilter. See Property filters on page 205.
get indicates that no round-trip will be made. This is a fetchless instantiation.
The API assumes that the object exists. There is no initial set of property
values available, so you need to request any property values that you need. If
you know that you always need some property values immediately, there is no
advantage to fetchless instantiation.
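A minimal sketch of the three method flavors in the Java API; the object store
and ID are assumed to have been obtained earlier, and Document is the system
class:

import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.util.Id;

public class InstantiationSketch {
    static void demonstrate(ObjectStore os, Id docId) {
        // fetch: an immediate round-trip that verifies existence and returns
        // an initial set of property values (null = default property filter).
        Document fetched = Factory.Document.fetchInstance(os, docId, null);

        // get: fetchless instantiation; no round-trip and no property values.
        // Useful when you only need a reference, for example to set an OVP.
        Document referenced = Factory.Document.getInstance(os, "Document", docId);

        // create: a new object; nothing happens on the server until save().
        Document created = Factory.Document.createInstance(os, "Document");
        created.save(RefreshMode.NO_REFRESH);
    }
}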
Property filters
Property filters are optional parameters to a number of methods that fetch
objects or properties from the Content Platform Engine. They allow highly
granular control of the objects or properties being returned.
It is easy to understand how returning fewer properties can improve
performance; less obviously, you can also improve performance by returning
more properties and objects. The savings come when you return multiple
objects in a single round-trip instead of making multiple round-trips to perform
the same work. A property filter can do just that. Over time, most application
developers learn what properties and objects they need, so this can be an
efficient way to perform most or all of your retrievals in just a few round-trips.
Most of the Content Engine API calls that can take a property filter also accept a
null value. In these cases, the API still works correctly, but it might make
additional round-trips in the background as your application progresses. It is
designed that way so that you can get your application working quickly and
optimize the performance later.
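As an example, a hypothetical filter like the following asks for a handful of
common system properties, and a maxRecursion of 1 also populates the
immediately referenced objects in the same round-trip:

import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.property.FilterElement;
import com.filenet.api.property.PropertyFilter;
import com.filenet.api.util.Id;

public class FilterSketch {
    static Document fetchWithFilter(ObjectStore os, Id docId) {
        PropertyFilter pf = new PropertyFilter();
        // Space-separated property names; maxRecursion 1 also brings back
        // the immediately referenced objects in the same round-trip.
        pf.addIncludeProperty(new FilterElement(Integer.valueOf(1), null, null,
                "DocumentTitle DateCreated Creator", null));
        return Factory.Document.fetchInstance(os, docId, pf);
    }
}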
Batching
The Content Engine APIs contain two separate but similar batching mechanisms:
A RetrievingBatch is used to fetch multiple, possibly unrelated, objects from
the Content Platform Engine in a single round-trip. Object references and
property filters are added to the batch, and retrieveBatch() is called to
trigger the round-trip.
An UpdatingBatch is used to group multiple updates in a single round-trip to
the Content Platform Engine. Instead of calling save() on individual objects,
the objects are added to the batch, and updateBatch() is called to trigger the
round-trip. Updates are performed as an atomic transaction.
Recommendations: Unless it leads to tortured application logic, it is a
good idea to accumulate multiple changes to objects before calling save(),
and it is also a good idea to batch updates to multiple objects in an
UpdatingBatch.
As a general rule, plan to carry no more than 50 - 100 items in a batch.
Somewhere in that range, the overhead associated with batching itself
tends to neutralize any performance benefits. Since specifics of application
workload can change for various reasons, consider making the batch sizes
configurable so that a code change is not needed for that adjustment.
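As an illustration, a minimal UpdatingBatch sketch in the Java API might look
like the following (the collection of already-modified documents is assumed to
come from elsewhere in the application):

import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Domain;
import com.filenet.api.core.UpdatingBatch;

public class BatchSketch {
    static void saveAll(Domain domain, Iterable<Document> changedDocs) {
        UpdatingBatch batch = UpdatingBatch.createUpdatingBatchInstance(
                domain, RefreshMode.NO_REFRESH);
        for (Document doc : changedDocs) {
            batch.add(doc, null); // null = no property filter for the refresh
        }
        // One round-trip; all of the updates commit or roll back atomically.
        batch.updateBatch();
    }
}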
There is a recurring application pattern that involves issuing a query for objects
matching particular criteria and then performing an action on each result
object. Issuing the query and accumulating the results are good jobs for
the coordinator thread or process. Disjoint sets of result objects can be handed
to worker threads or processes for action.
Alternatively, you might have an application that must process a large number of
objects, but your performance constraints require it to operate as a background
task. That is, you want the processing to move forward, but you do not want to
interfere with foreground work by placing an undue load on the server machines.
In that case, single-threaded processing might be a better match. In some cases,
you can easily distinguish the already processed objects from those still needing
processing. For example, your criteria might include some property value being
null, and your action might include setting that property to a non-null value. In
such cases, you can use a non-continuable query instead of a continuable,
paged query. Non-continuable queries have lower server overhead than
continuable queries. Just be sure to include a TOP qualifier in the SELECT clause,
for example, TOP 50. The number that you use can be chosen to match the batch
sizes that you plan to use for the actions.
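A sketch of that pattern with the Java API follows; MyClass and
MyMarkerProperty are hypothetical names standing in for your own class and
marker property:

import com.filenet.api.collection.IndependentObjectSet;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

public class SliceQuerySketch {
    static IndependentObjectSet nextSlice(ObjectStore os, int batchSize) {
        // TOP bounds the result size, and the WHERE clause excludes objects
        // that a previous pass has already stamped with a non-null value.
        SearchSQL sql = new SearchSQL("SELECT TOP " + batchSize
                + " Id FROM MyClass WHERE MyMarkerProperty IS NULL");
        SearchScope scope = new SearchScope(os);
        // Boolean.FALSE requests a non-continuable (single-page) query.
        return scope.fetchObjects(sql, null, null, Boolean.FALSE);
    }
}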
Recommendations: When the semantics of iterative processing allow it, use
a non-continuable query for best performance. This approach generally does
not work when multiple threads or processes perform the update actions in
parallel.
Content Manager follows the Java EE model for transactions, and Java EE in turn
follows industry standards for distributed transactions. In this context, the relevant
facts are that a transaction is started, operations performed by a transactional
resource (in this case, Content Engine) are tagged with the transaction identifier,
and the transaction is either committed or rolled back. All changes tagged with a
certain transaction identifier are committed or rolled back as an atomic unit.
Now that we have described the use of client-side transactions, here are a few
reasons to avoid them:
Client-side transactions tend to create or magnify performance problems. The
overall transaction times are longer simply due to network latency and other
factors inherent in the interaction between client and server. Longer
transaction times mean that resources all the way into the database are being
held for longer periods of time. This greatly increases the chances for
resource contention and slows overall system throughput.
Most of the tasks that applications want to do in a client-side transaction can
be done more efficiently with the API batching mechanism using an
UpdatingBatch object. A batch is performed as an atomic transaction, but the
transactional control is on the Content Platform Engine side.
API batches can be used with all APIs and transports, so it is a more flexible
mechanism than client-side transactions.
After some analysis, it almost always turns out to be the case that applications
using client-side transactions can be rewritten to use API batching. For the few
cases where client-side transactions are genuinely needed, they are supported
as described. The case where you might be forced into a client-side transaction
is when your application must include transactional resources outside of Content
Manager. For example, if you must include P8 Content Manager updates
atomically with updates to a stand-alone database, that is a motive for using a
client-side transaction. If you find yourself using a client-side transaction that you
cannot avoid, do your best to minimize the amount of time that the transaction is
active.
Recommendations: Avoid using client-side transactions. Instead, rely on the
inherent transactional behavior of the Content Engine server.
6.3.9 Logging
The Content Manager APIs and the server have built-in logging that focuses on
providing details of round-trips between the client and server. The reason for
that focus is that those details typically provide the interesting information for
resolving both performance and functional problems. The main purpose of the
logging is to have artifacts for diagnosing problems when hands-on debugging
is not possible. Those logs are intended to be examined by IBM Support and
development engineers. They are not documented in detail, but you can easily
develop an informal familiarity with them if you work with them.
When designing logging for your own applications, you are likely to have similar
goals. You might want to consider the following points:
Determine the interesting interactions in your application. Focus your logging
efforts on those interactions first. You can always add more logging as your
application evolves or as you become more familiar with the types of problems
that occur in production. Think of logging those interesting interactions as a
unit, whether they are all contained within a certain software module or not.
Do not log uninteresting details. Logs can become quite large, and many
details that are logged turn out to be distracting clutter when you are looking
at log files later. If something is likely to help solve a problem, log it.
If there is just a remote possibility that it will help, skip it.
Be careful about tying things to source code. It is fine to assume that the
people looking at the logs will have access to the source code to see what
entries mean, but only do that if that is actually true. Otherwise, log entries
must be reasonably self-explanatory so that you can teach someone what
they mean.
Log the impossible. In any application, there are conditions that are supposed
to be impossible. It is tempting to silently ignore those conditions in program
logic. If one of those conditions actually happens, it must be logged, because
the log entry might be your only evidence that a design assumption has been
broken, as the short sketch below illustrates.
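A trivial sketch of that advice in plain Java; java.util.logging is used here only
for illustration, so substitute whatever logging framework your application has
standardized on:

import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardSketch {
    private static final Logger LOG =
            Logger.getLogger(GuardSketch.class.getName());

    static void applyState(int state) {
        switch (state) {
            case 0:
                // normal handling
                break;
            case 1:
                // normal handling
                break;
            default:
                // "Impossible" by design. If it ever happens, this entry
                // may be the only evidence of the broken assumption.
                LOG.log(Level.SEVERE,
                        "Unexpected state {0}; check upstream validation",
                        state);
        }
    }
}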
Inheritance
Repository classes support a convenient inheritance model. You can define new
subclasses that add properties or change various characteristics of existing
properties for the subclass.
You can also add new properties to most system classes, although it usually
makes more sense to define a subclass just for that purpose and extend it (by
adding properties or further subclassing) for your application's needs.
Object-valued properties
One of the more powerful features of the data model is object-valued properties
(OVPs). When one object needs to reference another object, use OVPs instead
of storing the ID or path to the object. By using OVPs, you can directly navigate
from object to object. For an OVP, the metadata provides type safety by only
allowing you to point to objects of a certain class (or subclass), just like an object
reference in a programming language. The server provides features for
referential integrity and configurable cascading deletion (automatically controlling
the deletion of pointed-to objects or preventing the deletion of pointing-to
objects).
Reflective properties
A particularly useful form of OVP is the reflective property, also known as
an association property. More than one object can point to a particular other
object. When that happens, the reflective property mechanism is used to simplify
the bookkeeping and let Content Engine perform most of the work. The usual
examples have a parent and many children. Suppose you have an Invoice object
with many LineItem child objects. With the reflective property mechanism, define
an Invoice property on the LineItem class and a LineItems property on the
Invoice class. The naming is just a convention that works well in practice. Any
property names can be used. To affiliate a new LineItem with the Invoice, you
only need to populate the Invoice property on the LineItem object. Because it
was created as a reflective property, the LineItems property on the Invoice class
automatically reflects the new line item being added. When you access the
multi-valued property (the LineItems property in our example), the Content
Engine automatically performs a query for applicable objects with the appropriate
value in the single-valued property (the Invoice property in our example).
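In Java API terms, the one-sided update might look like the following sketch;
Invoice and LineItem are the hypothetical custom classes from the example,
and the reflective property pair is assumed to have been defined in the
metadata:

import com.filenet.api.collection.IndependentObjectSet;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.CustomObject;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;

public class ReflectiveSketch {
    static IndependentObjectSet addLineItem(ObjectStore os, CustomObject invoice) {
        CustomObject lineItem = Factory.CustomObject.createInstance(os, "LineItem");
        // Populate only the single-valued side of the relationship.
        lineItem.getProperties().putObjectValue("Invoice", invoice);
        lineItem.save(RefreshMode.NO_REFRESH);

        // The multi-valued side is computed by the server when accessed.
        return invoice.getProperties().getIndependentObjectSetValue("LineItems");
    }
}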
Many-to-many relationships
Especially because of reflective properties, it is easy to use OVPs to model
one-to-many and many-to-one relationships. You might find the need to model a
many-to-many relationship. The usual solution for that is to use an intermediate
object to express a single pair of relationships. The system class,
ReferentialContainmentRelationship (RCR), is an example of this solution for
the special case of containing objects in folders. A single object can be contained
in many folders, and a folder can contain many objects. The document class has
a reflective property, Containers, which identifies all the RCRs (and, therefore all
the containment relationships) that reference a specific document instance. The
folder class likewise has a Containees property.
You can see that this intermediate relationship object, combined with reflective
properties, is a powerful tool for simplifying your modeling of many-to-many
relationships. Not only does it express the relationship, but it can also have
properties specific to that particular relationship. For example, an RCR has a
property, ContainmentName, that gives a unique name to a contained object for
the purposes of path-based navigation. When you use an intermediate object for
a relationship, you can add whatever properties are appropriate to your business
needs. Both ReferentialContainmentRelationship and
DynamicReferentialContainmentRelationship classes are subclassable, and
you can use them for your own relationships if they happen to fit the folder
containment model. Other good choices for the intermediate object are
subclasses of the custom object and link system classes.
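For the folder containment case, the intermediate RCR object is created for
you when you file a document. A minimal Java API sketch, assuming the folder
and document were retrieved earlier and using an arbitrary containment name:

import com.filenet.api.constants.AutoUniqueName;
import com.filenet.api.constants.DefineSecurityParentage;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Folder;
import com.filenet.api.core.ReferentialContainmentRelationship;

public class ContainmentSketch {
    static void fileDocument(Folder folder, Document doc) {
        ReferentialContainmentRelationship rcr = folder.file(doc,
                AutoUniqueName.AUTO_UNIQUE,
                "Invoice-2013-0042.pdf", // becomes the ContainmentName
                DefineSecurityParentage.DO_NOT_DEFINE_SECURITY_PARENTAGE);
        rcr.save(RefreshMode.NO_REFRESH);
    }
}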
Custom objects
You will often find yourself with a need to hold a collection of related properties
for one reason or another. In a database programming environment, you might
create a new table with rows representing the collection of information. The
Content Manager solution for this is to create a subclass of the custom object
class. The custom object system class has only a few properties of its own, and it
exists specifically to be subclassed for this use. The invoice and line item
example used for reflective properties can also be modeled this way.
As part of the persistence architecture, the Content Engine stores all custom
objects, regardless of class, in a single database table. It sometimes happens
that different kinds of custom objects are used in significantly different ways by
applications. For example, an object store might have numerous custom objects
that represent business object entities, and it might also have custom objects that
represent configuration items. The latter custom objects are relatively few in
number and can get lost in the volume of business objects. That can result in
performance problems at the database level. Because of this occasional
database issue, the 5.2 release introduces custom root classes. A custom root
class has its own table in the database but otherwise is similar to a custom object
subclass.
Recommendations: When contemplating the use of custom objects in your
data model design, consider using a subclass of CmAbstractPersistable as a
custom root class. This is useful if your objects will not be typical business
objects.
Chapter 7.
Business continuity
In this chapter, we describe how to provide for business continuity with IBM
FileNet Content Manager (P8 Content Manager).
We discuss the following topics:
Business continuity concepts: planned uptime, high availability, and disaster recovery
Load-balanced server farms and active-passive server clusters
Data replication options for disaster recovery
High availability and disaster recovery best practices for P8 systems
Planned uptime is the time that the system administrators have agreed to keep
the system up and running for its users, frequently in the form of a service level
agreement (SLA) with the user organizations. The SLA might allow the system
administrators to take the system down nightly or weekly for backups and
maintenance, or, in an increasing number of applications, rarely if at all. Certain
mission-critical systems for around-the-clock operations now need to be available
24 hours a day, 365 days a year.
The concept of high availability roughly equates to the system and its data
being available for almost all of the planned uptime. Achieving high availability
means having the system up and running for a period of time that meets or
exceeds the SLA for system availability, measured as a percentage of the
planned uptime for the system.
Table 7-1 helps quantify and classify a range of availability targets for IT systems.
At the low end of the availability range, 95% availability is a fairly modest target
and therefore is termed basic availability. It can typically be achieved with
standard tape backup and restore facilities. The next level up, enhanced
availability, requires more robust features, such as a Redundant Array of
Independent Disks (RAID) storage system, which prevents data loss in the first
place, rather than the more basic mechanisms for recovering from data loss after
it occurs. Highly available systems will range from 99.9% to 99.999% availability
and require protection from both application loss and data loss. At the high end of
this continuum of availability is a fault tolerant system that is designed to avoid
any downtime ever, because the system is used in life and death situations.
Table 7-1 Range of availability

Availability percent    Annual downtime          Availability type
100%                    0 minutes                Fault tolerant
99.999%                 5.3 minutes              Continuous availability
99.99%                  53 minutes               High availability
99.9%                   8.8 hours                High availability
99%                     88 hours (3.7 days)      Enhanced availability
95%                     438 hours (18 days)      Basic availability
To make this more concrete, consider the maximum downtime that can be
absorbed in a year while still achieving 99.999% availability, also called five nines
availability. As Table 7-1 on page 219 indicates, five nines availability permits no
more than 5.3 minutes of unscheduled downtime per year, or even less if the
system is not scheduled for round-the-clock operation. This is near continuous
availability, but not strictly fault tolerant. For a three nines target of 99.9%, we can
allow 100 times more downtime, or 8.8 hours per year. An availability target of
99%, which still sounds high, can be achieved even if the system is down
88 hours per year, or over three and a half days. So the range of availability is
actually quite large.
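The downtime figures follow directly from the availability percentage; as a
quick check:

Annual downtime = (1 - availability) x 365 days x 24 hours
For 99.9%:  (1 - 0.999) x 8,760 hours = 8.76 hours, about 8.8 hours per year
For 99%:    (1 - 0.99) x 8,760 hours = 87.6 hours, about 88 hours per year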
You might be asking yourself: why not provide the highest level of availability
on all IT systems? The answer, as always, is cost. The cost of providing high
availability rises steeply as availability targets approach 99.9% and beyond.
Choosing an appropriate availability target involves analyzing the sources and
costs of downtime in order to justify the cost of the availability solution. Industry
experts estimate that less than half of system downtime can be attributed to
hardware, operating system, or environmental failures. The majority of downtime
is the result of people and process problems, which comes down to a mix of
operator errors and application errors.
This chapter focuses primarily on how to mitigate downtime due to hardware
outages, system and IBM FileNet software problems outside the control of an
IBM FileNet client, and environmental failures, such as loss of power, network
connectivity, or air conditioning. This covers less than half of the sources of
downtime; the majority of the sources require people or process changes.
Our advice is to determine what has caused the most downtime in the past for a
particular system and focus first on that. Frequently, we have found that stricter
change control and better load testing for new applications provide the greatest
benefit. Focus on the root causes of outages first and then address the
secondary and tertiary causes only after protecting against the root causes.
Here are several examples of best practices for avoiding downtime from people
and process problems:
System administrators need to be well-trained and dedicated full-time to their
systems so that they are least likely to commit pilot errors.
The applications running on the system must be designed with great care to
avoid possible application crashes or other failures.
Exception handling, both by administrators and application programs, must be
carefully thought-out so that problems are anticipated and handled efficiently
and effectively when they occur.
Java EE application server vendors, including IBM, use the term cluster for their
load-balancing software feature. Other Java EE application servers, such as
Oracle WebLogic and JBoss, provide similar load-balancing features.
Java method calls by clients of a clustered Java application, such as the P8
Content Platform Engine, are distributed across all the WebSphere Application
Server Network Deployment servers running Content Platform Engine by means
of the WebSphere Application Server Network Deployment Workload
Management (WLM) component. WLM consists of both a client-side component
and a server-side component.
The server side, in conjunction with the WebSphere Application Server Network
Deployment High Availability (HA) Manager component, keeps track of the health
of each instance of the Java application, and sends that information back to the
client-side WLM component on the return from every Java method call from the
client.
The client-side WLM component, which is part of the WebSphere Java Runtime
Environment (JRE) running on the client server, is responsible for distributing the
method calls from local Java applications, such as the IBM Content Navigator,
over the servers running the target Java application, such as the P8 Content
Platform Engine. When IBM Content Navigator makes a content-related or
process-related method call to the P8 Content Platform Engine, the local WLM
running on the IBM Content Navigator server will decide which currently active
server in the P8 Content Platform Engine cluster to use for that call, effectively
load balancing all the calls across the servers in the P8 Content Platform Engine
cluster.
Network hardware vendors, such as Cisco and f5 Networks, have implemented
load balancing for server farms in several of their network devices. f5 BIG-IP is a
popular hardware load-balancing device.
There are also many other vendors that have load balancer products. These
products are best for load balancing the HTTP network traffic from web browsers
to the web application tier in a P8 system, as well as the SOAP/HTTP network
traffic from P8 client applications that use the Web Services interfaces to the P8
Content Platform Engine. However, do not use hardware load balancers in
combination with WebSphere Application Server Network Deployment WLM
software load balancing for the native Java APIs to the P8 Content Platform
Engine, because the WebSphere Application Server Network Deployment WLM
load balancing is self-contained and complete on its own.
In the best case, the hardware load balancer affects only the initial Java
infrastructure call to locate the instances of the P8 Content Platform Engine in
the WebSphere Application Server Network Deployment cluster. After that, the
WLM component takes over the routing of all the Java method calls to the P8
Content Platform Engine. In the worst case, a hardware load balancer can
compete with and disrupt the software load balancing provided by WebSphere
Application Server Network Deployment and cause serious performance
problems.
In addition to WLM for Java method call load balancing, WebSphere Application
Server Network Deployment provides another load balancing feature for HTTP
load balancing. This is the WebSphere Application Server HTTP plug-in for all
the popular HTTP servers. The plug-in intercepts HTTP traffic flowing through
the HTTP server to P8 servers, and distributes that traffic over the P8 servers
configured for each HTTP function. For example, HTTP traffic between users'
web browsers and the IBM Content Navigator web application can be load
balanced by HTTP servers in front of the IBM Content Navigator server
instances, if the WebSphere Application Server HTTP plug-in is installed on the
HTTP server and configured for that traffic. Another example is traffic from clients
of the P8 Content Platform Engine that use the Web Services interface to the
content and process functions of Content Platform Engine, rather than the Java
API. The Web Services calls are made outside of the Java infrastructure over the
SOAP protocol running on HTTP. These calls can be load balanced by any HTTP
load balancer, including the WebSphere Application Server HTTP plug-in, or
hardware load balancers from the network hardware vendors, such as f5
Networks.
Figure 7-1 shows a logical diagram of a load-balanced server farm. This figure
shows a pair of hardware load balancers and multiple servers in the server farm.
Redundancy is essential to prevent the failure of one load balancer from taking
down the server farm.
Figure 7-1 A load-balanced server farm with redundant load balancers
This concept of no single point of failure is key to high availability. Every link in the
chain, that is, every element in the hardware and software, must have an
alternate element available to take over in case the first element fails. Software
load balancers, for example, are designed to avoid any single point of failure:
in configurations that use software instead of hardware for load balancing,
each server in the farm runs its own copy of the load-balancing software.
The software running on each server in a farm is functionally identical. As
changes are made to any server in the farm, you must replicate those changes to
all the servers in the farm. In this regard, a key benefit of WebSphere Application
Server Network Deployment is its facility for rolling out software changes across
all the nodes in a WebSphere Application Server Network Deployment cluster,
after one of the nodes in the cluster has been updated. So, it facilitates keeping
the software the same across all the nodes of a WebSphere Application Server
Network Deployment cluster. (Recall that a WebSphere Application Server
Network Deployment Java EE cluster is actually what we call an active-active
load-balanced server farm in this chapter. We describe next how that differs from
the concept of an active-passive server cluster.)
Load balancing offers a good solution: Any client calling into a load-balanced
server farm can be directed to any server in the farm. The load can be evenly
distributed across all the servers for the best possible response time and server
usage. However, load balancing can be a problem if the servers in the farm retain
any state between calls. For instance, if a user initiates a session by providing
logon credentials, it is beneficial for those credentials to be cached for reuse on
all subsequent calls to the server for that user session.
We cannot ask the user to log in over and over every time the application needs
to communicate with the server. Therefore, in one solution, the server keeps a
temporary copy of the user's validated credentials in its memory. This works fine
if there is only one server, but in a load-balanced server farm, the load balancer
can easily direct subsequent calls from the same user session to different
servers in the farm. Those other servers will not have the session state in their
memory.
Load balancers can be configured for session-based load balancing to solve this
session state problem. This is also known as sticky sessions, session affinity, or
stateful load balancing. The load balancer keeps track of which server it selected
at the beginning of a user session and directs all the traffic for that session to the
same physical server. Session-based load balancing is required for the
Application Engine, but not for the Content Platform Engine, because the
Application Engine caches session state, while the Content Platform Engine
does not.
Load-balanced server farms (or Java EE clusters) that manage persistent data
stored on disk need a way for all the servers in the farm to share the same set
of disks. For data stored in databases, such as DB2, the database vendors
provide interfaces with locking and transaction features that enable multiple
database clients in a load-balanced server farm to share read/write access to
the same database.
In addition to data housed in databases, the IBM FileNet Content Platform
Engine manages data stored in file systems, such as the file storage areas for
content objects like documents and annotations. So all the servers in a
Content Platform Engine farm have to be able to read and write to one or more
common file systems. The solution is a shared network file system, which
network-attached storage (NAS) devices provide natively over the Network File
System (NFS) or Common Internet File System (CIFS) protocols. NFS is
supported by the UNIX and Linux operating systems, and CIFS is supported by
Microsoft Windows. Another option for AIX and Linux based P8 Content Platform
Engine servers is the IBM General Parallel File System (GPFS), which can be
deployed with storage area network (SAN) storage devices to provide a shared
network file system for P8 servers.
Now, we turn to active-passive server clusters and explore how they differ from
active-active load-balanced server farms (or from load-balanced Java EE server
clusters, such as WebSphere Application Server Network Deployment clusters).
Some products are designed to allow multiple active
server instances to manage dynamic data sets safely. Those products can take
advantage of active-active load balancing, described previously. But other
products, notably IBM FileNet Image Services, do not have this capability, so
each data set must be managed by only one server.
Because of that single server architecture, a server farm with two or more active
servers does not fit well with servers that have not been designed for cooperative
data management. Yet a second server is still needed for continued availability, in
case the first server fails. The solution in this case is an active-passive server
cluster, where the second server stands by until the first server fails, before
stepping in to take over the data management.
The second server needs to have access to the data that was being managed by
the first server, either the same exact copy, or a copy of its own. The common
solution allows both servers to be connected to the same copy of data either via
a network file share or, more commonly, a SAN storage device that both servers
can access, but only one at a time. The active server owns the SAN storage, and
the passive server has no access.
Shared access to SAN storage in this way is an alternative to replicating the data
to a second storage device accessed by the second server. However,
maintaining a replica of the data, sometimes called a mirror, on a second local
storage device is a good practice, as protection against the failure of the primary
SAN storage device. Even highly available SAN storage devices, which have
internal protection against the loss of a disk drive through redundant copies of
the data, have been known to fail completely. Active-passive server clusters can
still be configured such that all servers can take over the primary storage in the
event of the active server failing, with the local mirror as a standby copy that is
used only if the primary storage device failed. The IBM DB2 High Availability
Disaster Recovery (HADR) product is an example of a product that provides both
active-passive server clustering as well as data replication so that the passive
server has its own separate copy of the data.
If there is no local mirror, recovering from the loss of a primary storage device
involves either time-consuming restoration from a previous backup, or declaring a
disaster and failing everything over to the recovery site, which is also
time-consuming. Data updates that have occurred in the time since a backup
was taken will necessarily be lost when a backup is restored. If the sources of
those updates are still available, the updates can be made a second time to avoid
data loss. In comparison to restoring from backup or switching over to a disaster
recovery site, switching over to a local replica by reconfiguring the server
managing the storage is faster, simpler, and avoids any data loss.
Figure 7-2 on page 228 shows two servers in a server cluster with access to the
same shared storage. Recall that some server farms typically do not have this
requirement for shared storage. DB2 pureScale, Oracle RAC, and the Content
Platform Engine are exceptions, in that they exhibit both server farm and server
cluster characteristics. They take advantage of load balancing, combined with
cooperative data management using storage that is simultaneously shared by all
the servers. In a load-balanced server farm with shared storage, all the servers
are active and thus need to access the storage in parallel, so a network file share
is required. An active-passive server cluster, however, is designed to allow only
the active server to access the storage, so the single-owner model of SAN
storage works well. The typical server cluster does not support load balancing,
but it does support shared storage via SAN. The storage is shared in a server
cluster in the sense that both servers are connected to the same storage, so they
share access to the same storage, but never concurrently in the case of SAN
storage.
Figure 7-2 An active-passive server cluster with shared storage
As with server farms, clients of a server cluster see one virtual server, even
though the physical server they interact with will change if the primary server
fails. If the primary server fails, a failover occurs, and the second server takes
over the data copy and starts the software to manage the stored data. It also
takes over the virtual network address, which is shared by the two servers,
making the failover transparent to the client of the server cluster.
Both triggering a failover and actually accomplishing the failover cleanly are the
responsibility of clustering software running on both servers. This software is
configured on the secondary server to monitor the health of the primary server
and initiate a failover if the primary server fails. The active server in an
active-passive cluster owns the storage resources, commonly called a resource
group or shared volume group. The resource group is visible from both cluster
nodes but only dedicated to the active node. If the active node fails, the clustering
software will move the resource group to the remaining passive node. The
passive node sees the resource group but does not write to it until the clustering
software ensures consistency. This is called a shared volume group for IBM
PowerHA clusters.
After the failed server is repaired and running again, a failback is initiated via the
clustering software to shift the responsibility back to the primary server and put
the secondary server in waiting mode again. This failback is necessary to get
back to a redundant state that can accommodate another server failure.
In certain cases, intentional failovers can be used to mask planned downtime for
software or hardware upgrades or other maintenance. You can upgrade and test
the secondary server offline. And then, you can trigger a failover and apply the
upgrade to the primary server while the secondary server is standing in for the
primary server.
This type of configuration, in which the second server is inactive or passive until it
is called to step in for the active server, is called an active-passive server cluster.
Several clustering software products also support an active-active cluster
configuration, which is similar to a server farm where all servers are active. An
active-active cluster configuration is useful for data managing servers that are
designed to share the management across more than one server.
However, IBM FileNet products that use active-passive clustering software for
high availability all require an active-passive configuration. IBM FileNet products
that work with an active-active configuration always use a server farm and load
balancing rather than clustering software. (Server farms are always
active-active.)
Server cluster software requires agents or scripts that are configured to manage
key server processes on a particular server. These agents or scripts allow the
cluster software to monitor the health of the application software, as well as start
and stop the application software on that server. Cluster software typically comes
with predefined agents or scripts for common server types, such as database
servers.
A failover in an active-passive server cluster is not instantaneous. It will typically
take ten or fifteen minutes or longer, depending on how long it takes the
clustering software to stop the failing server, shift the virtual IP address and the
storage to the passive server, and start the application software on the passive
server. Before the system is accessible again, additional internal steps can take
place, such as database transaction recovery. Depending on the state of a
database and the number of in-flight transactions at the time of a database
server failure, it can take substantially more than fifteen minutes to roll back
incomplete transactions before the database is once again online and available.
Some organizations operate twin data
centers within a single metropolitan area, typically less than 40 kilometers (25
miles) apart. The motivation behind twin data centers is to reduce the risk of
downtime from one data center being unavailable, because of planned or
unplanned downtime for the whole data center. By limiting the distance between
sites, the networking costs and risk of network failure are reduced, making this
approach more feasible.
Still, the simplicity of keeping server clusters and farms local to a single data
center is the best practice for high availability, because this minimizes the risk of
failure due to the complexity of running these clusters or farms across multiple
data centers, and the increased risk of network problems.
As we will see later in the disaster recovery discussion, the best practice solution
for the loss of the production data center is to fail over to a standby recovery data
center that is typically located hundreds of miles or more away from the
production data center.
Twin data centers in a metropolitan area are less attractive for true disaster
recovery, because both can be lost in a single local disaster. Even if one of the
nearby data centers survives a disaster, the IT staff living in the metropolitan
area surrounding the two data centers can effectively become a single point of
failure for the two data centers. If their access to the remaining data center is cut
off, or they are otherwise unable to work due to effects of the disaster, the
remaining data center can be effectively lost without suffering direct damage
itself from the disaster.
Some organizations have combined twin nearby data centers with a third
recovery center farther away, in order to have both a local disaster recovery
option as well as a remote disaster recovery option. That is the best practice
when twin nearby production data centers are a company standard. But a more
cost-effective and lower-risk solution is to have a single production data center
configured for full local high availability, and a remote standby disaster recovery
data center located at least a hundred miles away, preferably more.
Examples of active-passive server clustering products include IBM PowerHA
(for AIX), HP ServiceGuard (for HP-UX), and Sun Cluster (for Sun Solaris).
Feature                                         Farms    Clusters
All servers active (no idle servers)            Yes      No
Idle standby servers                            No       Yes, typically
Horizontal scalability (add servers)            Yes      No
Instantaneous failover                          Yes      Not usually
Requires hardware or software load balancer     Yes      No
Now that we have covered the differences between server farms and server
clusters, we explore the advantages of farms over clusters and the advantages of
clusters over farms. Server farms have no idle servers, by definition, because all
servers in a farm are active. Server clusters always have one or more idle
servers in a steady state. Even more importantly, you can expand server farms
by simply adding a server clone, thereby scaling out the farm to handle larger
workloads. This horizontal scalability is not possible with active-passive server
clusters. The last advantage of a farm over a cluster is faster recovery time.
Server cluster failovers are delayed by the time that it takes to start the software
on the passive server on a failover. All the servers in a server farm are active and
immediately available to accept work that has been redirected away from failed
servers.
There are also some advantages that clusters have over farms, but on balance
farms have the advantage. The chief advantage of a cluster over a farm is that
the passive server can be configured identically with the active server,
guaranteeing no performance drop-off in the event of a failover. With server
farms, even if the initial server sizing is done to allow one server in a two-server
farm to handle 100% of the workload, the workload can increase over time to the
point where a single server is unable to handle the full workload after a failure.
Careful capacity monitoring and periodic testing can prevent this problem from
occurring with farms, however.
High availability terminology varies by vendor; Microsoft, Symantec Veritas,
Oracle, and IBM each use their own names for the same farm and cluster
concepts.
The duration of time that passes before the systems can be made operational at
the recovery site is called the recovery time. The RTO is the business's time
requirement for getting the system back online; that is, how much downtime can
the business endure?
RPOs and RTOs for different businesses and industries range from seconds to
minutes or days, even to weeks, depending on business requirements.
7.5.1 Replication
Backing up to tape or other removable media is the minimum for copying data for
use after a disaster. You must ship the media off-site to a location outside of the
projected disaster impact zone. The greater the distance of the location from the
production site, the lower the risk that both production and recovery sites will be
affected by the same disaster. One general rule is that a backup tape vault and
recovery site must be at least 30 miles (about 48 km) away from the production
system, which in most cases is sufficient to avoid a flood or fire disabling both
sites. However, sites that are close together can still be in the same impact zone
for earthquakes, hurricanes, or power grid failures, so more cautious
organizations separate their production and recovery sites by hundreds, if not
thousands, of miles.
Companies usually perform backups once a day, which meets only a 24-hour
RPO. That means that as much as 24 hours of data can be lost. The recovery
time required for data restoration from tape can be days due to the need to
restore a series of tapes that represents a full backup and subsequent
incremental or differential backups. So, you measure both RPO and RTO in days
if the only DR provision is tape backup.
For a better RPO, that is, to reduce the potential data loss in a disaster, you need
to replicate the data periodically to a remote disk, because periodic replication
can be done more often than tape backup. This effectively reduces the window of
data loss. Continuous replication that is done in real time can avoid data loss
entirely.
Note: When you use continuous data replication products, point-in-time
backups, such as tape backup or periodic replication, are still required in order
to recover from data corruption or accidental deletion. Continuous replication
copies the corruption or deletion to the replica; therefore, you need to be able
to fall back on a point-in-time copy prior to when the corruption occurred.
There are several levels at which you can perform replication: the application
level, the host level, and the storage level. Database replication is the best
example of application-based replication. Host-based replication is beneath the
application level, but it still resides on the server host and typically runs at the file
system or operating system level. Storage-level replication is implemented by the
storage subsystem itself, frequently, a SAN device or a NAS device.
Application-based replication
Application-level software that understands the structure of data and
relationships between data elements can copy the data intelligently, so that the
structure and relationships are preserved in the replica. Database and
object-based replication are examples. Database replication ensures that the
replica database is always in a consistent state with respect to database
transactions. Object-based replication ensures that content objects that include
both content and properties are replicated as an atomic unit, so that the content
and properties are always consistent with each other in the replica.
Each database vendor has replication products that replicate just the database,
but not other data. Examples include IBM DB2 High Availability and Disaster
Recovery (HADR) and Oracle Data Guard. Database replication products are
typically based on shipping database logs to the recovery site to be applied to a
database copy there. The advantage of these products is that they keep the
database replica in a fully consistent state at all times, with no incomplete
transactions, which reduces the recovery time required when bringing up the
database after a disaster. The disadvantage of these products is that they have
no means to replicate anything other than databases. File systems that need to
be kept consistent with the database, for instance, have to be replicated by a
different replication mechanism, which introduces the possibility of inconsistency
between the database and file system replicas.
Host-based replication
In contrast to application-based replication, host-based replication has no
understanding of the data content, structure, or interrelationships. It detects
when a file or disk block has been modified and copies that file or block to the
replica. Symantec Veritas Volume Replicator and Double-Take Software
Double-Take are examples of host-based replication products. Unlike
application-based replication, they can be used to replicate all forms of data,
whether it is in a database, a file system, or even a raw disk partition. Several of
these products use the concept of consistency groups, which tie together data in
different volumes and allow all the data to be replicated together, therefore
maintaining consistency across related data sets, such as databases and file
systems. In contrast to application-based replication, however, the replica is not
guaranteed to be in a clean transactional state, because the replication
mechanism has no visibility into database or file system transactions. Recovery
can take longer, because incomplete transactions must be cleaned up prior to
making the data available again.
Storage-based replication
All of the storage vendors offer storage-based replication for their SAN and NAS
products. The storage products themselves provide storage-based replication
and do not use server host resources. Examples include IBM Metro Mirror
(PPRC) and Global Mirror (XRC), EMC SRDF and MirrorView, Hitachi Data
Systems TrueCopy, and Network Appliance SnapMirror.
NAS products replicate changes at the file level. SAN products replicate block by
block. In both NAS and SAN replication, as with host-based replication, there is
no knowledge of the structure or semantics of the stored data. So, databases
replicated in that way can be in any transient state with regard to database
transactions and therefore might require more database recovery time when the
replica is brought online. That increases the overall recovery time.
NAS replication covers any data in the file system, whereas SAN replication,
which is at the lower level of disk blocks, covers all data stored on the disk.
An emerging specialization of storage-based replication uses a SAN network
device to intercept disk writes to SAN storage devices and manage replication
independently of both the server host and the storage devices. IBM SAN Volume
Controller is an example of this type of product. It has the advantage of being
able to span heterogeneous SAN storage devices and replicate data for all those
devices in a consistent manner. You can think of the IBM SAN Volume Controller
as a new form of storage-based replication, because it resides in the Fibre
Channel infrastructure used to access SAN storage. Analysts have a new term
for this kind of replication: network-based replication.
With synchronous replication to a nearby second site, the
copy at Site 2 holds all the data up to the moment of the disaster. From there, the
data can be replicated asynchronously to Site 3, the actual recovery site,
therefore extending zero data loss all the way to Site 3. It works, but the added
replica and site can be expensive.
(Figure: Site 1 replicates synchronously to Site 2, less than 60 miles away;
Site 2 replicates asynchronously to Site 3, at any distance.)
Several vendors support an optimized version of the second site called a bunker
site where only the blocks not yet replicated are stored and no others. The list of
the blocks that have not yet been replicated is typically a small list, so a bunker
site can be configured with minimal storage space, which reduces the overall
cost of this solution. IBM Asynchronous Cascading Peer-to-Peer Remote Copy
(PPRC) is an example of this three-site zero data loss solution.
Recovery point                    Cost             Technologies
Minutes to an hour                $$$$$$$$$$$$$
1 - 6 hours                       $$$$$$$$$        Hot or warm standby site,
                                                   asynchronous replication, or
                                                   global clustering
6 - 12 hours                      $$$$$
12 - 24 hours (hours to days
of data lost)                     $$$              Warm or cold standby site, or
                                                   periodic backup to remote
                                                   storage
Days to weeks                                      Cold or no standby site, or
                                                   nightly tape backups shipped
                                                   off-site
The cost now starts to accelerate upward. As the name implies, continuous
replication is the process of replicating data to the recovery site as it changes,
that is, on a continuous basis. Near-continuous and continuous replication
greatly decrease the potential for data loss when compared to periodic
replication, bringing the RPO down to seconds' worth of data loss, or even zero
data loss with synchronous replication.
Disaster recovery time is similarly decreased with synchronous and asynchronous replication, because the data is kept continuously in sync, or close to it, at both sites. In the event of a disaster, no time is required to bring the data up-to-date, as is the case with restoring from backup, periodic replication, or log shipping. However, time might be required to configure and bring up a duplicate of the application environment on the replicated data; the RTO is in the range of hours in that case. If a complete application environment is maintained at all times at the recovery site, and global clustering is used to automate and speed site failover, the RTO can be in the range of just minutes.
[Figure 7-4 (diagram): high availability and disaster recovery best practices for P8 5.2. At the production site, users reach active-active HTTP and WAS or WAS ND clusters; databases run on DB2 pureScale (or DB2 HADR) with N series SAN storage; files reside on N series NAS and/or SnapLock protected storage; Content Search Services runs active-active (or active-passive), load-balanced by CPE, with its index on N series NAS; the Rendition Engine runs active-active with proprietary load balancing. SnapMirror asynchronous replication with protection groups copies databases, files, and indexes to the recovery site, where servers either mirror production with redundancy for HA or run a scaled-back configuration without HA redundancy; those servers can be used for development, QA, or staging, and then rebooted in production mode for DR.]
At the business logic tier, sometimes also called the services tier, the HA best
practices for the core P8 components shown in Figure 7-4 are all load-balanced
server farms. Only a few optional P8 components, not shown in Figure 7-4,
require active-passive server clustering because they do not support
active-active load balancing. Process Simulator is an example. (Many clients
choose not to make Process Simulator highly available, because it does not play
a runtime production role.)
Two or more P8 Content Platform Engine servers must be deployed in a
load-balanced server farm when high availability is required.1 The Content
Platform Engine has been qualified with both hardware and software load
balancers.
1 Prior to P8 4.0, the Content Engine supported both farming and clustering for its Object Store Services component, but only active-passive clustering for its File Store Services component. Starting with P8 4.0, these components were unified and have since supported farming across the board. Prior to P8 4.0, the Process Engine required active-passive server clustering for high availability, but it has also supported farming since P8 4.0. In P8 5.2, the Content Engine and Process Engine were merged into the Content Platform Engine.
Note: IBM Case Manager 5.1.1 only supports active-passive database clusters, due to a Business Space constraint.
[Figure 7-5 (diagram summary): users connect over HTML/HTTP through a hardware load balancer in a web server DMZ to an active-active P8 web client tier (IBM Content Navigator, IBM Case Manager, Application Engine, Workplace XT, or a custom web application, on WAS or WAS ND clusters). The web tier reaches the active-active P8 Content Platform Engine farm (a WAS ND cluster) over EJB/IIOP with WLM, or over SOAP/HTTP for CEWS or PEWS through a hardware load balancer. The CPE farm connects over JDBC to the RDBMS, over CIFS/NFS/GPFS/API to file storage on SAN, over CFS-IS/IS RPC to Image Services (active-passive, for example AIX/PowerHA, with MSAR/ISDS on SAN), over CIFS/NFS to P8 Content Search Services (active-active, load-balanced by CPE), and over RMI/IIOP to the P8 Rendition Engine (active-active, proprietary load balancing, with ADO and JDBC connections to its database on SAN).]
Figure 7-5 High availability best practices for P8 5.2 with protocol detail
DNS-based redirection allows reconnection to the recovery site without making any client computer changes. The DNS servers or DNS load balancers themselves must be redundant, of course, to avoid being a single point of failure.
The availability level required for high availability (in the range of 99.9% and higher) is not reachable when every local failure triggers a full site failover (and later a full site failback to return to a protected state).
How about using geographically dispersed farms and clusters, that is, with the
farms and clusters split between the two sites? If one server fails, the server at
the other site takes over, either coming up at the time of failure in an
active-passive server cluster or simply taking on redirected client requests in
server farms. Again, there is an availability trade-off because of the added risk of
communication problems between the two sites. We do not recommend
geographically dispersed farms and clusters as best practice because of the
added risk and higher networking costs.
So the best practice is to deploy local server farms and clusters for high
availability in order to provide for continuing service in the event of local
component failures and to deploy a second site with data replication and,
optionally, global clustering, to provide for rapid recovery from disasters. The best
practice is to locate the recovery site outside the disaster impact zone of the
production site.
See Image Services 4.1.2 High Availability Procedures and Guidelines. This document describes both high availability via clustering software (Microsoft Cluster Server or Veritas Cluster Server) and disaster recovery via data replication software (Veritas Volume Replicator; see Appendix C) for Image Services:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/isdoc/412x/HAC
luster.pdf
Chapter 8. Capacity planning
[Figure (flowchart): the IBM Content Capacity Planner sizing process. Input: select hardware and define the workload. Processing: automatic transformation. Output: examine utilization. If the result is not OK, adjust the hardware and refine the workload; if OK, document and present the results.]
IBM Content Capacity Planner uses at least two input sources. One is the
hardware configuration, and the other is the defined workload that consists of
one or multiple transactions. The output from IBM Content Capacity Planner
consists of performance charts. If the system utilization of all components is
below a threshold, the system is deemed adequate to meet the workload
requirements. The results are documented. If system utilization is at or above the
threshold, you need to change the hardware configuration.
When defining a workload in a presales situation, the details of a model might not
be obvious. Therefore, it might be easiest to develop your general model first and
refine it as you learn more details.
You might want to start with a moderate hardware configuration. When defining your workload, after each transaction, you can immediately see the result in the chart and scale the hardware with the transactions. This provides a better understanding of the cost per modeled transaction. In addition, a chart option shows utilization by transaction function, which gives you the explicit cost per modeled transaction function.
When modeling the workload, IBM Content Capacity Planner provides a walk-through wizard for a quick start that helps you configure the basic parameters of the components that you want to size. We found it useful to use the wizard and save the result to another file. The wizard helps you learn which transaction functions to add to your workload, but it creates a simplified model. Some of the lesser-used functions can be obtained only by manually adding them to your workload from the Transaction Templates in the tree view.
Every system sizing is individual. Avoid a one-size-fits-all approach after sizing your first IBM FileNet Content Manager environment. Each solution is built to fulfill defined functional requirements and therefore differs in its sizing of required hardware: number of CPUs, memory, disk capacity, and network bandwidth.
We concentrate on general sizing questions. The typical questions to ask the
client when preparing to size a system usually fit into the following categories:
Client environment
Content ingestion
User activities
Configuring records management
Business process management specifics
Client environment
The following list provides questions to ask during sizing that are related to the
client environment:
Does the client prefer specific hardware? If yes, which vendor?
Are there standard machine types that the client wants to use? If yes, what is
the standard server, which processor, and how many CPUs?
What application server will be used?
What database server will be used?
What are the default working hours? You can override this default value in each transaction if needed.
Content ingestion
The following list provides questions to ask during sizing that are related to content ingestion (a short worked example follows the list):
If content is ingested through scanning:
What are the scanning hours?
What is the average number of scanned documents during the scanning
hours?
What is the total number of documents usually scanned?
What is the average size (in KB) of a scanned document?
In how many batches are these scanned documents processed?
How many documents are in a batch?
If content is ingested through file import:
What are the importing hours?
What is the average number of documents imported during that time?
What is the total number of documents usually imported?
What is the average size (in KB) of an imported file?
If content is ingested as email through IBM Content Collector for Email:
Will original emails be archived?
What is the average email size (in KB)?
What is the average number of properties set per email?
What is the number of duplicate email pointers?
What is the number of original attachments?
What is the number of duplicate attachment pointers?
What is the average size of attachments (in KB)?
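As a simple worked example (the numbers are purely illustrative): if a client scans 5,000 documents per day at an average size of 50 KB, the daily ingestion volume is about 5,000 x 50 KB = 250 MB, which over 250 working days amounts to roughly 62 GB of raw content per year, before accounting for database rows, indexes, annotations, and version overhead. The answers to the questions above feed directly into this kind of arithmetic.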
User activities
After the content is ingested, corresponding actions are started. The content can be processed by IBM Case Foundation or simply stored for later retrieval. A user can work on the content by using a custom application or FileNet Workplace XT. How users work with the content can determine the sizing of the system.
You can see the Content Platform Engine load throughout the day. In the morning hours between 8:30 a.m. and 11:30 a.m., the system load is higher due to scanning activities. From 11:30 a.m. to 4:30 p.m., the activity level is lower, because only retrieval and processing activities occur. Between 3 a.m. and 4 a.m., prefetching takes place: documents that are needed for the next day are retrieved and loaded into the cache for better performance.
[Figure (flowchart): the sizing process for an existing system. Input: select hardware, collect and import utilization data, and define the workload. Processing: automatic transformation. Output: examine utilization. If the result is not OK, adjust and refine; if OK, document and present the results.]
As an example, consider an existing IBM FileNet P8 system that includes IBM FileNet Image Services. The client plans to roll out another application on Content Platform Engine that is expected to double its workload. In addition, a third-party application will be installed that adds about 20% load.
For modeling purposes, we import the current Content Platform Engine utilization
with a factor of two, import the Image Services utilization, and add an application
that accounts for an increased workload of 20%.
Figure 8-4 shows the utilization for the Image Services system.
The chart shows the workload summary after importing the three workload
profiles: one for Content Platform Engine, one for the Image Server, and one for
the additional third-party application. The various colors represent single
services that run simultaneously. The chart illustrates the imported workload
together with the new application workload.
The result is that, with the additional application, the Image Services server exceeds its threshold at 7:30 a.m. It needs to be scaled up with two additional CPUs.
Figure 8-5 shows an example in which the Content Platform Engine is under a
heavy load.
Figure 8-5 Content Platform Engine under heavy load (utilization is more than 90%)
We want to discover what transaction led to the workload. So, we switch to the
Transaction Functions view. Figure 8-6 on page 265 shows the result and the
transaction responsible for the workload.
Figure 8-6 Transaction Functions view (showing the transactions that cause the workload)
As shown in Figure 8-6, we see that the IBM FileNet P8 4.x Java Create
Documents transaction creates the most intense workload. When verifying with
the system, in this example, we realize a typographical error in the number of
input documents and correct it.
With the correction made, we see in Figure 8-7 on page 266 that the system
operates well under the threshold.
Figure 8-8 shows an extract of the spreadsheet that contains the input system
values and the output, which is the estimated disk space required for IBM FileNet
Content Platform Engine and additional components.
[Figure 8-8 (spreadsheet extract): P8 Disksizing Tool for the Content Engine and Process Engine, Version 2.3 Beta. Values are entered in the green sections and the computed disk sizes appear in the yellow sections. Inputs include global settings (for example, a headroom factor of 1.25), Search Engine settings (concurrent active collections, CSE maximum collection size, CSS maximum index size criteria, maximum size and objects per index, object stores, batch sizes, and threads for CM and ICC, and index-to-content ratios for various file types), and Content Manager settings (number of objects with content, number of custom objects, and versions per object).]
There are also several additional sizing spreadsheets provided by IBM for other
IBM FileNet P8 products.
8.4 Conclusion
This chapter offers a brief look at the concepts and process used to define and size hardware, network bandwidth, and storage by using IBM Content Capacity Planner. It also offers hints about the input information that is needed to size an environment with IBM FileNet Content Manager.
Note: The sizing with IBM Content Capacity Planner will only be as good as
the provided input information.
Now that you have a general understanding of how to plan and lay out an IBM
FileNet Content Manager environment, we explore the basic deployment
concepts of IBM FileNet Content Manager in Chapter 9, Deployment on
page 271.
Chapter 9.
Deployment
In this chapter, we describe the deployment for your IBM FileNet Content
Manager solution. We provide advice about how to automate deployment from
organizational and technical points of view. We describe the repository design
elements and the repository infrastructure components that are part of a FileNet
Content Manager solution.
When you read through this chapter, you will understand the deployment of a
FileNet Content Manager system.
The chapter will give you the following insights:
Overview
Deployment by using a formal methodology
Deployment approaches
Deployment based on cloning
Deployment by export, transform, and import
FileNet Content Manager deployment
Summary
271
9.1 Overview
Why is deployment important to you?
The most important reason is to ensure consistency in two areas:
Consistency of your deployment process
Consistency of the metadata model across environments and stages
Deployment has different technical meanings. For instance, if you talk to your
WebSphere Application Server administrator, the administrator will mostly
associate deployment with web or enterprise application deployment via IBM
WebSphere Network Deployment Manager Console. As another example, in the
IBM Case Manager product, deployment refers to the process of migrating and
installing an IBM Case Manager solution that was developed in one environment
into another environment.
In this chapter, we discuss the following aspects of deployment:
FileNet Content Manager data model deployment
FileNet Content Manager repository deployment
Technical and organizational dependencies of FileNet Content Manager deployments in general
In the following sections, deployment is defined as a collection of related assets used in an application. Ultimately, these assets are packaged together and delivered as an application solution. Therefore, we treat this package as a solution and talk about solution deployment.
Each deployment starts with the following questions and considerations:
Which objects need to be deployed?
What is the source and the destination?
This chapter describes deployment methods and approaches. It provides details
about the tools and features available in the IBM products that can be used in the
deployment process.
The deployment discussed in this chapter does not cover content migration,
upgrade scenarios, or switching to a different platform. Chapter 11, Upgrade and
migration on page 371 covers some topics regarding upgrading to the current
release of FileNet Content Manager as of this writing.
Note: This chapter assumes that you are performing the deployment.
The next section gives an overview of technical environments and their organizational role in our deployment strategy.
Another reason to have several object stores in each environment is to keep your repository design objects separate from your data object store when the application runs. In your design object store, you have no instantiated application or solution data, which is advantageous because there are no technical constraints when you delete a property or change repository design objects. The repository where the design objects are stored is also called a metastore. For more information about designing a metastore, see Chapter 4, Repository design on page 81. IBM Case Manager maintains a metastore for you automatically; it is part of the solution design process and is seamlessly integrated into the solution deployment model. In IBM Case Manager, the design object store acts as the metastore.
For more information about IBM Case Manager, see this website:
http://www.ibm.com/software/advanced-case-management/case-manager
Larger companies tend to add these additional environments to the basic three
environments identified earlier:
Performance testing
Training
Staging
You might add more environments for the following reasons:
A need to mitigate risks associated with multiple projects running at the same
time interfering with each other, while retaining the ability to reproduce errors
from the production system in a test environment
A need for multiple training environments so that many people can be
educated in a short period of time
The more environments that you have, the more important it is to maintain and
synchronize them correctly.
The segregation of environments by the FileNet Content Manager domain is a
best practice. The isolation achieved by this approach is optimal to allow people
to work simultaneously and independently on the same project but in different
phases without adversely affecting each other. In particular, giving each
environment its own FileNet Content Manager domain makes it easy to grant
domain-wide permissions in each environment to different groups. For example,
developers can be given full permission to configuration objects in the
development environment but no permission to configuration objects in the
production environment.
The next section provides guidance to set up a formal deployment process
before you start using the IBM FileNet Deployment Manager utility in later
sections.
[Figure 9-1 (diagram): three phases of the software development lifecycle. Development: release planning, gather requirements, design and development iterations 1..n, release backlog, build and regression testing 1..n, release builds 1..n. Testing: quality reviews 1..n, regression test, user acceptance test, release accepted, rollout planning, ready to implement release. Production: implement release, verification, regression test, rollout completed.]
Figure 9-1 on page 276 shows three phases of a software development lifecycle:
development, testing, and production. Each phase can correspond to one or
more individual environments. This lifecycle is the groundwork for our
deployment strategy.
Note: Developing or making changes directly in production without using the staging-based approach is bad practice and can cause loss of consistency.
In each phase, regression testing has a different focus. In development, automated regression tests run on your local resources; you cannot map these results to another environment, but they give you a statement about the functional verification test (FVT) in development. In testing, you can perform system verification testing (SVT) with load and performance tests to verify and qualify the business requirements. In production, a short functional verification test is typically performed by starting the application and creating test data; this test must have no impact on the production environment and must be reversible.
Recommendations: In each environment, always create test data in a
dedicated storage area that has no special requirements for retention or
security and that is separate from the production storage. You must sanitize
the production data that is used as test data for FVT/SVT in each environment.
The illustration in Figure 9-1 on page 276 also shows that a change management
process is indispensable.
How do you, as the deployer, document deployment requirements in a change request? As a best practice in typical client situations, you include at least one spreadsheet, with multiple worksheets, in your change management system. This deployment sheet allows you to track the changes that you made across the stages. It also provides a detailed history of the implementation schedules and results. It is absolutely crucial to have a good change management process in place to be able to track the same level of detail for the non-production environments as well.
In the next sections, we present more details about release, change, and
configuration management, as well as testing, before we dive into a discussion of
moving the applications from development to production. To understand how
deployment works, it is important that you understand how a software
development process works.
Project plan
Release notes
Test matrix, test plan, and test results
Installation scripts and documentation
Support documentation
User documentation
Training material
Operations documentation
The following documentation and information can help the release manager for a
FileNet Content Manager solution perform release management tasks:
Hardware and software compatibility matrices from all involved vendors of
your solution.
Release notes, technotes, and the latest fix packs with their descriptions.
Available export and import options to deploy the solution between
development and production environments.
Search and replace scripts used to prepare exported assets for use in the
target environment where object stores, users, or groups differ from the
source environment. Tools required for needed data transformations.
Deployment guidelines in the IBM FileNet P8 Information Center
Online help
In addition, the IBM Rational product line can be helpful in supporting release management, change management, and testing:
http://www.ibm.com/software/rational
Release management delegates several of the underlying support processes to
the change and configuration management that is discussed in 9.3.2, Change
management on page 281 and 9.3.3, Configuration management on
page 286.
A release can consist of multiple components in specific configurations. Release management handles the validation of combinations of application releases, commercial components, customized components, and others. Although a specific component is developed on the basis of a concrete version of its underlying commercial application programming interface (API), by the time it is deployed to production, this combination might have changed in the bigger context of the solution. Managing the combinations of versions of the involved components is a time-consuming activity and needs to be scheduled and planned carefully and early.
Another aspect of release management deals with objects that have been
created in production that affect the configuration of the solution and might affect
deployment. In FileNet Content Manager solutions, these types of objects include
folder structures, entry templates, search templates, and others. Release
management must have a strategy in place to handle or restrict bidirectional
deployment between multiple environments.
Recommendations: Have a strategy in place to handle or restrict changes to
the production environment that might affect the overall solution configuration
and future deployment. Have a policy that all application changes must be
made first in development and test environments, then deployed to production.
Typically, a bidirectional deployment is only defined between a development
environment and a testing environment.
We have described the release manager role and its organizational tasks in a FileNet Content Manager deployment process. The release manager works with the change coordinator, whom we describe in the next section of this chapter.
Table 9-1 (deployment sheet example):

Component          Name           Task description      Tool         Package                                  Notes
Property template  MyProperty1    Remove default value  FDM          004_SRC_TGT_A_Properties_YYYY_MM_DD     FDM import option: always update
Stored procedure   MyProcedureAB  Replace               DB2-CLI      004_SRC_TGT_B_MyProcedureAB_YYYY_MM_DD  Delete all previous versions
Web service        MyWebService3  Deploy                WAS console  004_SRC_TGT_C_MyWebService3_YYYY_MM_DD  Uninstall old web service first
If you inspect the data in Table 9-1 on page 281, you see that some areas outside
of FileNet Content Manager are affected but still belong to our FileNet Content
Manager solution.
Consider the following areas related to FileNet Content Manager when managing
the change process:
Commercial code and assets (versions of FileNet Content Manager, as well
as individual patches and levels of its add-ons, such as IBM Case Manager)
Custom code and assets (for example, the versions of the application
leveraging a commercial API, such as the FileNet Content Manager API, or
versioned assets, such as FileNet Content Manager code modules)
Recommendations: Deployment starts in the development phase.
Incorporate a defined build process that acknowledges changes to
commercial components and custom components in a controlled manner. A
build process can be established with commercial products, such as Rational,
or a manually defined set of process steps and shell scripts that build the
custom application. As the deployer, work together with your development
team to implement a common build process.
In software development companies, there is often a department called the build
team. Investigate whether you have a department that can assist you in creating
a build process.
For every deployment of a custom application, or FileNet Content Manager deployment by using IBM FileNet Deployment Manager, we advise you to handle commercial code and custom code separately. For the targeted solution release, everything must be assembled by an automated process, if possible, in observance of the dependencies.
Note: Always keep an integral point of view on the deployment process. If you
follow these guidelines, you will treat deployment as a solution.
Figure 9-2 on page 284 shows the areas for which you distinguish between
custom code (red area) and commercial code (green area):
IBM Content Navigator
FileNet Content Manager
[Figure 9-2 (diagram): a custom code plug-in deployed to IBM Content Navigator nodes (ICN node1, ICN node2, ...), which communicate over EJB/IIOP with Content Platform Engine nodes; a custom code module is stored in an object store on the Content Platform Engine side.]
Figure 9-2 Custom code at the level of object store and IBM Content Navigator
Figure 9-2 shows that a custom application depends on the commercial code but is treated separately. Part of the code works inside the application but depends on parts of the commercial code for a successful functional verification. Therefore, it makes sense to divide deployment into at least two or three layers with different technical and organizational responsibilities. To determine whether the custom code plug-in inside IBM Content Navigator has been deployed correctly, conduct a regression test by calling the plug-in inside IBM Content Navigator. The functional verification is done by the business department that belongs to the application layer, not by the implementer or deployer role.
A regression test of a custom code module in FileNet Content Manager determines whether a new version of the code module has been generated in FileNet Content Manager after running the related IBM FileNet Deployment Manager import package.
Figure 9-3 on page 285 shows the layer-based deployment approach.
[Figure 9-3 (diagram): the layer-based deployment approach. Presentation layer: IBM Content Navigator, IBM Workplace XT, IBM Enterprise Records, and custom applications. Service layer: database, application server, network, and LDAP. Content Platform Engine layer: workflow system, Component Integrator, and CBR.]
Note: From an architectural point of view, this diagram might differ from your
environment in distinguishing between the back end and the front end.
This three-layer approach shows the different deployment types and the different departments with their responsibilities for the solution. The solution contains the presentation layer (the application tier shown in Figure 9-2 on page 284) where IBM Content Navigator runs. IBM Content Navigator depends on a service layer where the underlying configuration database is deployed. The service layer also supports the FileNet Content Manager base layer where FileNet Content Manager runs; in our example, on IBM WebSphere Application Server Network Deployment. Each layer has its own deployment type and process.
For more information about the separation of custom code and commercial code
and the build process for IBM Content Navigator, see 9.7.5, Exporting and
importing other components on page 311 and Customizing and Extending IBM
Content Navigator, SG24-8055. For more information about the separation of
custom and commercial assets during repository design, see Chapter 4,
Repository design on page 81.
In summary, a layer-based approach is necessary to split responsibilities, reduce
risk, and improve the quality of the IBM FileNet Content Manager solution. These
layers represent different change requests based on deployment types that are
determined by the development team and documented in a common worksheet.
A layer-based approach also shows the complexity of deployments.
To speed up the change requests and reduce their complexity, it is essential to
automate. The next section describes tooling that can be used to achieve useful
configuration management.
These parameter values can be used by the build process for specific environments. It also makes sense to track the access control entries (ACEs) from the access control lists (ACLs) of FileNet Content Manager repository design objects.
Retain a zip/tar file of all release-specific data, including code, exported assets,
and documentation, in a central datastore. Typically, you maintain
release-specific data by using a code version control system. IBM clients can use
IBM Rational ClearCase, for example.
Recommendations: Implement a central datastore that tracks the parameters, such as GUIDs, object store names, and project names, that you need for the deployment. The datastore needs to be implemented for all target environments in one location that is accessible to every environment. The best way is to access this datastore through the FileNet Content Manager API so that changes and specifications are tracked automatically.
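As a minimal sketch of this recommendation (the object store name DeploymentStore, the class name DeploymentParameter, and its ParameterName and ParameterValue properties are hypothetical, as are the connection URI and credentials), the following Java fragment uses the Content Engine Java API to record a deployment parameter in such a central datastore:

    import javax.security.auth.Subject;
    import com.filenet.api.constants.RefreshMode;
    import com.filenet.api.core.Connection;
    import com.filenet.api.core.CustomObject;
    import com.filenet.api.core.Domain;
    import com.filenet.api.core.Factory;
    import com.filenet.api.core.ObjectStore;
    import com.filenet.api.util.UserContext;

    public class TrackDeploymentParameter {
        public static void main(String[] args) {
            // Connection URI and credentials are placeholders.
            Connection conn = Factory.Connection.getConnection(
                    "http://cpe-server:9080/wsi/FNCEWS40MTOM/");
            Subject subject = UserContext.createSubject(
                    conn, "deployAdmin", "password", null);
            UserContext.get().pushSubject(subject);
            try {
                Domain domain = Factory.Domain.fetchInstance(conn, null, null);
                ObjectStore os = Factory.ObjectStore.fetchInstance(
                        domain, "DeploymentStore", null);
                // DeploymentParameter is a hypothetical custom object class
                // with string properties ParameterName and ParameterValue.
                CustomObject param = Factory.CustomObject.createInstance(
                        os, "DeploymentParameter");
                param.getProperties().putValue("ParameterName",
                        "TargetObjectStoreGUID");
                param.getProperties().putValue("ParameterValue",
                        "{3A2B1C4D-0000-0000-0000-000000000000}");
                param.save(RefreshMode.REFRESH);
            } finally {
                UserContext.get().popSubject();
            }
        }
    }

Because the parameters live in an object store, every environment can read them through the same API, and changes can be tracked with the platform's own auditing features.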
9.3.4 Testing
There are multiple ways to address environments associated with testing. One
way is to split testing into two major phases that typically happen in different
environments:
Development environment
In this environment, the following tests are commonly conducted:
Unit testing verifies that the detailed design for a unit (component or
module) has been correctly implemented.
Integration testing verifies that the interfaces and interaction between the
integrated components (modules) work correctly and as expected.
System testing verifies that an integrated system meets all requirements.
Testing environment
In this environment, the following tests are commonly conducted:
System integration testing verifies that a system is integrated into the
external or third-party systems as defined in the system requirements.
User acceptance testing is conducted by the users, customer, or client to
validate whether they accept the system. This is typically a manual testing
process with documented expected behavior and the tested behavior.
Load and performance testing.
Regression tests
For all environments, automated regression testing must be implemented to be able to verify earlier test cases. In production, you mostly perform a smoke test, which means testing only the most critical functions of a system. The automated regression testing suite might include one test object for each relevant aspect, such as a test document class, a test search template, a test folder, and a test workflow. When deploying repository design objects, it is best practice to deploy them to another object store in the same environment by using the IBM FileNet Deployment Manager utility.
The regression test must be used after having modified software (either
commercial or custom code) for any change in functionality or any fix for defects.
A regression test reruns previously passed tests on the modified software to
ensure that the modifications do not unintentionally cause a regression of
previous functionality. Regression testing can be performed at any or all of the
previously mentioned test levels. The regression tests are often automated.
Automating the regression test can be an extremely powerful and efficient way to
ensure basic readiness. The implementation of automated regression tests is
time-consuming and the test cases must be adjusted every time a change occurs
in the business functionality.
Recommendations: Establish a small suite of automated regression tests in
each environment. The best synergies are achieved by having the deployment
of the test assets and the test script as automated as possible. One side effect
is that this automation of regression tests affects the repository design and
you must be able to revert changes.
Test automation
There are two areas to consider when automating tests:
Load and performance test
Regression test
While the load and performance test might be executed only on major version
changes (commercial or custom releases), the effort to maintain the code for the
automation might be substantial.
Hint: It is essential that you test with production data. Have a process in place to anonymize production data before you start testing.
If you use the datastore to track all repository design objects, you can revert
those changes that were made by your regression suite.
Recommendations: Ensure that your testing environment or performance
load environment is preloaded with test data. The data inserted by your
regression suite must simulate the amount of data generated by average
concurrent users to get a baseline performance matrix.
The regression test must be generic enough that the scripts are written once, perhaps updated for minor changes, but typically remain stable over time. You can achieve this by developing a regression test framework. We advise that you store the scripts together with the supporting version of the test application in one location. Typically, this location is your source code version control system.
Test automation tools are available from IBM and other vendors. For example,
see the IBM Rational products website:
http://www.ibm.com/software/rational
Recommendations: Distinguish load and performance tests from regression
tests. Each area has its own characteristics.
You can typically use the existing testing infrastructure for load and
performance tests. For regression testing, it makes no sense to use a
centralized large and complex infrastructure. It is more important that the tests
can be executed and quickly show simple results.
Test documentation
Before we move to a discussion of the actual deployment, we must discuss the
testing documentation and its importance.
In a FileNet Content Manager project, multiple departments with different skill sets are involved. It is difficult to perform user acceptance testing or integration testing without a clear concept of what needs to be tested, how it must be tested, what the expected behavior is, and how the tests must be conducted. This documentation is derived from the requirements analysis of the business needs.
Documenting the test cases with descriptions of the inputs and expected
behavior is useful. Test descriptions must have enough information to achieve
repeatability, which means that multiple testers can perform the same test (in an
identical environment) while working from the test documentation and get the
same results. After the execution of the tests, collect and document all of the
observed system behaviors. Using this information, the release manager can
decide to proceed with the new release or to delay the release if there are more
bugs to fix.
Some of the tests might fail. It is crucial to document not only the behavior but also the resolution. This knowledge base must be part of problem management. Combining test documentation with a searchable interface to find known problems is advantageous.
For FileNet Content Manager authorization, the different levels of security must be tested in detail by using an intersection of security principals. More information about security is in Chapter 5, Security on page 151.
Recommendations: Carefully document your tests with sufficient detail
before the tests are executed. Make the test documentation database
searchable to search for problems previously seen by users. Create a
knowledge database and publish it on a social media platform, for example, in
an internal wiki.
The social media platform is available from IBM and other vendors. For example,
try out IBM Connections. For IBM Connections information, see this website:
http://www.ibm.com/software/lotus/products/connections
There are at least three preferred practices to deploy (transport) changes from
one environment to another:
Cloning (AIX logical partition (LPAR)-based cloning or VMware based
cloning)
Exporting, converting, and importing using the IBM FileNet Deployment
Manager utility
Using scripted generation of all the necessary documents and structures
9.4.1 Cloning
You can deploy changes from one environment to another by cloning the source
environment and bringing it up as a new but identical instance of the source
environment.
Cloning is practical when you temporarily need a dedicated environment. It must
be an exact copy of what you have already in place, for example:
In a training class, you need to be able to quickly revert or go forward to a
well-known working environment (by a teacher over lunchtime) for the next
class or for the next part of the lectures. (A few students might not be able to
follow the exercises and then are unable to continue with the rest of the class
because their environment has not been set up correctly.)
Many parallel identical training environments are needed to educate more
people in a short period of time.
Development environments are needed to work in parallel.
Test environments are needed for specific tests.
You can use local VMware based images to clone a system. For a large system,
however, this might not be a workable solution. Large systems are often not as
flexible as small systems, or there is a lack of powerful machines that can be
made available in a timely manner for cloning. Sometimes, the security and
networking policies do not allow these virtual environments to connect to
back-end machines.
Note: IBM AIX LPAR-based cloning and IBM AIX workload partition
(WPAR)-based cloning are alternatives to VMware based cloning. The cloning
methods that you use depend on your infrastructure. For more information,
see IBM FileNet P8 Platform and Architecture, SG24-7667.
The next logical step is to use virtual farms that host applications at larger client
sites. This approach might not be practical for the following reasons:
From the corporate network, these farms cannot be accessed except through remote desktop applications; direct interaction is not possible because the same host names and IP addresses are used multiple times in the same network.
Single virtual images are typically not powerful enough for the full stack of components that are needed for a solution (the stack includes the directory server, database, application server, and other FileNet Content Manager components).
A good way to still rely on virtualization techniques is described in 3.3.1, A
virtualized IBM FileNet Content Manager system on page 60. The solution is
built on individual images hosting the various IBM FileNet P8 components,
including the database and directory server. The images are accessed by a
gateway, which shields the network topology of the FileNet Content Manager
solution from the corporate network by using network address translation (NAT)
and virtual private network (VPN) access.
You clone an environment by copying all files representing the storage of virtual
images. With this approach, you can clone an environment within hours with little
knowledge. Use this approach predominantly for development and training.
For cloning, consider the following topics:
Separation of commercial code from custom code and automation of the build
process mainly for Java EE based or .NET based applications
Adherence to the proposed guideline of stable GUIDs to reduce
dependencies
Implementation of a central datastore (database-based or file-based) in which
environment-specific information is stored and serves as the datastore for
scripting
Automation wherever possible
Deployments typically apply assets from the development environment to the
testing environment. The production environment typically receives deployments
from the testing and acceptance test environment.
There are cases for which you might consider the reverse:
Documents with a configuration character, such as search templates, entry templates, custom objects, and folders, that have been created in a production environment. (This raises the question of what a release needs to contain and restrict.)
Hot-fixing a serious production problem in the staging area.
Refreshing and populating a training environment or acceptance test system
with anonymized or obscured production data.
For example, you can use IBM InfoSphere Optim Data Lifecycle
Management Solutions to anonymize or obscure production data:
http://www.ibm.com/software/data/optim
The activity for transformation can take place as described in Deploying IBM
FileNet P8 applications in the IBM FileNet P8 Information Center, before or
just after import. Custom scripts can be called to make the necessary
transformation. The transformation can also be conducted on the exported
files before importing. The IBM FileNet Deployment Manager utility has a
script interface. This script interface allows you to run a code fragment on
each object before or after import. Furthermore, it allows you to run a script
once before or after import. You can combine all four of them. For more
information about the IBM FileNet Deployment Manager utility, see the IBM
FileNet P8 Information Center where you can find an example of creating
marking sets during import.
This approach has been proven to work, but the effort to maintain this type of script is huge, and every change must be put into the script code. All of the benefits of using a tool, such as IBM Administration Console for Content Platform Engine (ACCE) or IBM FileNet Enterprise Manager, are lost with this approach. There is little benefit in using this approach unless it is to overcome limitations where there is no alternative.
Nevertheless, this approach can be used to create basic structures, such as a P8 domain, generic object stores with their add-ons, marking sets, and folder structures, or to maintain application roles.
For object stores and their add-ons, you can customize the schema script that is
used to generate the object store tables over Java Database Connectivity
(JDBC). The advantage of customizing the script is to create custom indexes.
With a customized object store add-on, you can create specific metadata for your
custom FileNet application.
The script-based approach has a problem because every change within the
FileNet Content Manager master system needs to be synchronized with the
script source. You can develop a wrapper to perform this work but developing a
wrapper can be time-consuming. In the next sections, we describe other FileNet
Content Manager population techniques.
[Figure 9-5 (diagram): unidirectional synchronization to LDAP instances 1..3 with naming contexts namingcontext.root1.local, namingcontext.root2.local, and namingcontext.root3.local; each naming context is configured with a referral and contains an OU1 with FileNetUsers and FileNetGroups.]
Figure 9-5 shows IBM Tivoli Directory Integrator or Active Directory Lightweight Directory Services replicating from a Microsoft Active Directory. Additionally, you can enable referrals in each naming context so that a distinguished name (DN) that does not exist in the current naming context can still be resolved. Referrals are not problematic if you have UserPrincipalNames (UPN) enabled in FileNet Content Manager across the naming contexts, due to the unique short name requirement.
9.5.2 Topology
Figure 9-6 illustrates a clonable topology with three identical environments that
use VMware images. Every domain is formed by a collection of servers that are
part of multiple VMware images. All images of one domain are connected over a
private network to a special image that is called the router. The router
implements network address translation (NAT) and virtual private network (VPN)
gateway functionality by using tunneling over Secure Shell (SSH) or other
products. The other network link of the router is mapped to a network card that
has access outside, which can be to the corporate network.
To clone the environment, only the router image needs to be modified and the
public interface needs to be set up correctly. An IBM Content Navigator instance
resolves the Domain Name System (DNS) of the router image.
Note: This approach is useful for development environments only.
Figure 9-6 shows how an access to a FileNet Content Manager domain can be
established by using a separate gateway for each environment.
Cloning offers these advantages:
Automatic provisioning of environments by using a master clone
Less effort to create a FileNet Content Manager environment
There are multiple ways to figure out the differences between the two releases:
Manually
By strictly rolling forward changes from the source environment to the target
environment and preventing any changes to the target environment between
releases
Automated discovery of the differences
Manually detecting the differences between the source environment and the
target environment is time-consuming and error-prone. This option is only valid
for small deployments or if the difference is related to only one asset type, such
as an instance of the custom object class.
Clients typically choose the second option with the consideration that someone
has manually verified both environments. In a multi-stage environment, there is a
good chance that mistakes in this approach will be detected in the first
deployment step from the development environment to the test environment.
When errors are detected at this point, there is an opportunity to fix the
underlying problems and retry the same procedure. As soon as the deployment
to the test environment passes testing (and is documented), the future
deployment to production most likely works smoothly.
The third option is extremely difficult to achieve and potentially too expensive. There are many exceptions when simply comparing timestamps between the various environments. A development or source environment might include more objects than will be used for the target deployment. So, selective tagging of the objects that are part of a release is mandatory.
If you use IBM Forms, you must use stable GUIDs across all stages, especially for the version series IDs of workflow definitions.
Stable GUIDs across environments help you to:
Save time.
Reduce errors.
Reduce risks.
Ensure similarity among environments.
Reproduce problems.
Important: The use of IBM FileNet Enterprise Manager for performing FileNet
Content Manager deployments is deprecated. Use the IBM FileNet
Deployment Manager utility instead.
Tip: The IBM FileNet Deployment Manager utility has no customizable query
interface so you need to use FileNet Enterprise Manager to perform a custom
query and put the results into an export manifest. This export manifest, which
is generated by FileNet Enterprise Manager, can be used in IBM FileNet
Deployment Manager to perform the remaining export and import steps,
including conversion.
The IBM FileNet Deployment Manager utility offers a consistent way of performing deployments across different stages and environments. It can be used to deploy data between disconnected FileNet Content Manager systems, even if these systems use different LDAP providers and different security principals.
It offers a powerful mapping interface with automatic mapping if the symbolic name of the object store or the short name of a security principal is the same.
Next, we need to know which data can be exported.
There are three major types of objects to be exported:
Structure (such as document classes and folders)
Configuration documents (such as templates, subscriptions, events, custom
code modules, and workflow definitions)
Business documents (images)
Hierarchy of exports
Build a logical hierarchy of exports, which can help you to test the imports
sequentially.
Configuration documents, such as entry templates, include content elements and are stored as XML files in the repository. A document class definition is a database object with no content elements. Exporting a document class definition with its property definitions is a simple transaction, whereas exporting a configuration document involves exporting the content elements.
Certain objects, which include all FileNet Content Manager domain-level objects,
cannot be exported.
There are Application Engine-related objects that cannot be exported as well.
See 9.7.5, Exporting and importing other components on page 311.
There is a general export and import sequence:
1. Marking sets
2. Choice lists
3. Property templates
4. Security templates
Exporting content
Usually, content is versioned. For configuration documents, it does not make
sense to export all versions. The most recent version is the best choice, so as a
rule, discard earlier versions. If you import them to the target system, keep only
one version.
Type of asset         Transformation required   Remarks
Property Templates    Not required
Choice Lists          Not required
Document Classes      Not required
Workflow Definitions  Required
Folders               Not required
Search Templates      Required
Entry Templates       Required
Events                Not required              Dependencies to subscriptions
Code Modules          Not required              Dependencies to subscriptions
Subscriptions         Not required
Business documents    Not required
There are existing documents that you can reference when you use FileNet
Deployment Manager:
Proven Practice: IBM FileNet Deployment Manager 5.1 Data Migrations
http://www.ibm.com/support/docview.wss?uid=swg21609929
Migrating IBM Case Manager solutions using FileNet Deployment Manager and Case Manager Administration Client. (This document is also valid for deployments that use FileNet Content Manager only.)
http://www.ibm.com/support/docview.wss?uid=swg21612959
Impact of a FileNet Deployment Manager import using the Update If Newer
option without the Use Original Create/Update Timestamps option
http://www.ibm.com/support/docview.wss?uid=swg21455363
FileNet Deployment Manager fails to import the documents of a certain
version series if one of those documents references an object that appears
after the document in the deploy data set (explanation of object hierarchy)
http://www.ibm.com/support/docview.wss?uid=swg27020038
MustGather: FileNet Deployment Manager (FDM)
http://www.ibm.com/support/docview.wss?uid=swg21502186
Hints
When using the FileNet Deployment Manager, remember this list:
IBM FileNet Deployment Manager is able to transfer objects between different
FileNet Content Manager domains within a single compressed file containing
all the exported content.
Object store transplantation between different FileNet Content Manager
domains can occur by using the object store reassign function.
Increase the total transaction timeout for your connection pool to keep long-running import processes active.
Always inspect the FileNet Content Manager log and IBM FileNet Deployment
Manager log after you import. Create pattern matching search strings to
extract the data that you need.
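For example, a small utility along these lines can extract the interesting lines from an import log. The log path is passed as an argument, and the pattern is an assumption: it matches generic ERROR/WARN markers and FileNet-style message codes, so tune it to the messages that your logs actually contain.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Pattern;

    public class FdmLogScan {
        public static void main(String[] args) throws IOException {
            // Assumed pattern; adjust to your installation's log format.
            Pattern p = Pattern.compile("ERROR|WARN|Exception|FNRC\\w+E");
            try (BufferedReader r = Files.newBufferedReader(Paths.get(args[0]))) {
                String line;
                int lineNo = 0;
                while ((line = r.readLine()) != null) {
                    lineNo++;
                    if (p.matcher(line).find()) {
                        System.out.println(lineNo + ": " + line);
                    }
                }
            }
        }
    }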
WFAttachmentFinder.exe
Process Configuration Console and peObject_export.bat and
peObject_import.bat
Database
All changes to rows in the object store database are covered by exporting and
importing objects as previously explained.
In addition, you can consider propagating changes that have been applied at the
database level, such as adding additional indexes, changing server options, and
others. You can typically accomplish this function by extracting the index-related
information from the SQL-based scripts that were written to configure the
database in the source environment. Check whether the scripts depend on
infrastructural information, such as user ID, password, server name, IP
addresses, and database name.
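As a hedged illustration (the JDBC URL, credentials, and the index and column names are placeholders; object store column names are generated by the Content Platform Engine, so take the exact statement from the scripts of your source environment), index-related changes can be replayed against the target database over JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class ApplyCustomIndex {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details for a DB2 object store database.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:db2://dbhost:50000/OSDB", "dbuser", "dbpassword");
                 Statement st = con.createStatement()) {
                // Recreate a custom index from the source environment;
                // the index and column names are illustrative only.
                st.executeUpdate(
                    "CREATE INDEX I_CUSTNO ON DOCVERSION (U1E_CUSTOMERNUMBER)");
            }
        }
    }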
Full-text index
Content-based retrieval (CBR) is covered in 4.12.2, Content-based search on
page 141 and Chapter 11, Upgrade and migration on page 371.
Workflow System
Workflow System export and import are straightforward by using the Process
Configuration Console or the command-line interface, which supports both full
and incremental deployments.
The underlying workflow system APIs contain all the required methods to move
Queues, Rosters, and EventLogs, and to validate Workflow Definitions.
You can export the Workflow System configuration by a call to the Workflow
System Java API. The import into the Content Platform Engine works in a similar
way.
If you currently use other Workflow System features or services, see Deploying
P8 applications in the IBM FileNet P8 Information Center.
FileNet Workplace XT
Whenever FileNet Workplace XT applications have to be moved between
environments, there are business assets and application configuration assets to
be deployed.
FileNet Workplace XT stores various objects in an object store:
Site Preferences
User Preferences
Entry Templates
Stored Searches
Search Templates
Application Roles
We have already discussed the business assets under the export, transform, and
import process. We do not need to provide further explanations from a
methodology point of view.
Table 9-3 provides a short summary about these assets.
Table 9-3 Summary
312
Asset type
Automatic
transformation?
Remarks
Site Preference
No
User Preferences
No
Entry Templates
Yes
Asset type
Automatic
transformation?
Remarks
Search Templates
Yes
Application Roles
No export/import
possible
FileNet Workplace XT is a web application that is packaged as a single WAR file, which contains the relevant Java APIs to connect to FileNet Content Manager.
9.8 Conclusion
We learned that deployment is divided into two major parts. The first is the organizational part, where you ensure that deployment is comprehensive by using a compliant process derived from the software development process. As the owner of the deployer role, you are involved in each phase of development, but the majority of the work is change and configuration management. Deployment without a strategy in place causes inconsistency sooner or later. The second part is the procedural part of deployment.
This chapter also contained information about useful techniques and tools for
deployment. We included samples of different ways of deploying based on the
different kinds of data migrated between the environments. The first tool of
choice for FileNet Content Manager data is the IBM FileNet Deployment
Manager utility. This tool requires process-oriented procedures to ensure a
quality-driven deployment in your company.
Chapter 10, System administration and maintenance on page 315, provides
information about how to ensure the availability of your FileNet Content Manager
solution from an administrative point of view.
Chapter 10. System administration and maintenance
Security administrators
Ensure that the directory server meets the requirements for an IBM FileNet
Content Manager installation, provide the LDAP configuration information
required for an IBM FileNet Content Manager installation, and tune
performance.
Global configuration database (GCD) administrators
Members of specific LDAP groups, who are identified during the Content
Platform Engine installation, and who have rights to create object stores and
other P8 domain-level artifacts.
Object store administrators
Members of specific LDAP groups, who are identified during object store
creation, and who have rights to administer the object store.
Because so many people can be involved in managing a P8 environment, use
the following best practices:
Document the following information:
Users and LDAP groups assigned to each role
Local processes that must be followed when making changes
Keep the documentation current. Do not just create it for installation purposes
and then forget about it.
Always have at least two people in your organization who can fill a role.
Use email distribution lists to ensure that everyone is kept informed when
changes are being made to an environment.
During the installation of any IBM FileNet Content Manager component, you are
prompted for the location of the information center. If suitable access is available
from your site, use the URL of the IBM hosted information center for these
reasons:
The content is updated regularly.
There is no maintenance overhead.
An installable version of the information center is also provided with the product
software. If access to the Internet is limited at your site, install the shipped
version of the information center on a local application server and give the URL
for this local installation when configuring the IBM FileNet Content Manager
components.
When using any of the applications or administrative tools supplied with IBM
FileNet Content Manager, clicking Help displays the appropriate topic from the
information center. You can also navigate to the same information by accessing
the information center directly.
The following figures show the main menu for the IBM Content Navigator help.
Figure 10-1 on page 319 shows the display after clicking Help from within the
IBM Content Navigator application. Figure 10-2 on page 319 displays the same
information but it was accessed by navigating to the help from the information
center posted on the IBM website.
Figure 10-1 IBM Content Navigator help when accessed from within the application
Figure 10-2 IBM Content Navigator help when accessed from the Information Center
Performance
Functional issues
Capacity planning
Security
CPU usage
Memory usage
Disk usage
Response times
Number of objects in the object store
There also might be legal requirements that need to be monitored; for example,
knowing who accessed content, or when content was deleted.
Three useful web pages are available for getting a quick check on the status of
the Content Platform Engine:
The Ping Page (Content Engine Startup Context):
http://<Content Platform Engine server>:<port>/FileNet/Engine
Use this URL to check that your Content Platform Engine environment is
running and to gather other useful information about the environment, such
as the build number. Figure 10-4 on page 323 shows the information that is
provided by the ping page.
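You can also script this check. The following is a minimal sketch in Java; the host, port, and timeout values are placeholders:

import java.net.HttpURLConnection;
import java.net.URL;

public class PingCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host and port; substitute your Content Platform Engine server.
        URL url = new URL("http://cpeserver:9080/FileNet/Engine");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        // HTTP 200 means the ping page rendered; any other status, or an
        // exception, suggests that the engine is not available.
        System.out.println("Ping page returned HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}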
Keep the model as close as possible to the needs of the live production
system, so that the output from the Content Capacity Planning tool provides an
accurate assessment of future needs. Proper capacity monitoring gives you
advance notice that additional server resources need to be allocated to the
system. The initial model is an estimate of what is needed, based on the
numbers that you provide. If your estimated content count or size was too
small, you need to plan additional space.
The listeners are activated automatically; however, there are several operating
system-specific requirements. For more information, see the following page:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.
ibm.p8.sysmgr.admin.doc%2Foverview_activate.htm
10.4.2 Dashboard
The Dashboard generates detailed reports about performance. It displays the
details and can also save the information in various formats.
The Dashboard is a Java utility that can be installed and run on Windows or
UNIX/Linux clients. It is installed separately from the server installation. It can
also be installed and run on the IBM FileNet Content Manager servers. On
Windows machines, run the Dashboard utility. On UNIX, you must have an
XWindows display exported and run the P8Manager shell script. The Dashboard
installs a local copy of its online help that can be accessed from the Help menu
option.
When the Dashboard is first run, you need to define clusters of IBM FileNet
Content Manager components to monitor. (A cluster here is a logical construct
that is used by the Dashboard; it has no relation to an application server or
operating system cluster and is not used for high availability. It is simply a
user-defined collection of servers to monitor.) The cluster definition
contains the servers and the monitoring frequency. Select the Clusters tab and
click New. Enter a name for the cluster, which is typically the application
system name or location, and click OK. See Figure 10-7.
The Dashboard tool queries the System Dashboard listeners on the servers and
populates the details in the Dashboard tool's various windows. It finds all
listeners running on each server; individual servers need to be defined only
once. You can save the cluster details for future use or open existing details
from the File menu. The cluster file is an XML-formatted file that is saved on
the local computer. You can copy the cluster.xml file to other computers where
the Dashboard is installed for use on other workstations.
The Dashboard has a report mechanism that allows you to save reports in
comma-separated value (CSV) format, which is useful for generating
spreadsheet reports. There is also an export option that enables you to
generate data that can be used as input to IBM Content Capacity Planner. In
addition, you can save the report template for future use. For more
information, see the Dashboard's online help for reports.
Note: The Dashboard uses the name Scout to refer to the IBM Content
Capacity Planner.
Figure 10-10 on page 332 shows a sample report output.
Option        Description
-t hh:mm      Total amount of time in hours and minutes that the archiver process must run
-n hh:mm      The interval at which the current archived files must be closed and new ones opened
-i integer
-d file path  The path to the location at which to place the archive log files
FileName.xml  The complete path to the saved cluster file that specifies which machines to poll
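Putting the options together, an invocation might look like the following sketch. The times, directory, and cluster file name are placeholders, and you should confirm the jar name and options against the Dashboard's online help:

java -jar archiver.jar -t 08:00 -n 01:00 -d /perf/archive /perf/clusters/cluster.xml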
Usage Reporter
The Usage Reporter is provided with the System Dashboard and is used to
monitor the number of users accessing the Content Platform Engine. The tool
looks for individual user names. If access to the engine is via an application that
uses a service or guest account, the tool might not reflect the number of people
actually using the system.
For more details, see the following document:
http://publibfp.dhe.ibm.com/epubs/pdf/c1930850.pdf
The rapid fault isolation and corrective action database make System Monitor a
must-have for mission-critical systems. System Monitor reduces manual effort
in the daily administration of IBM FileNet Content Manager and helps to
increase system availability. System Monitor can help reduce your operational
costs and help you meet your service-level agreements (SLAs) more efficiently.
For more information about IBM ECM System Monitor, go to the following links:
http://www.ibm.com/software/products/us/en/ecmsystemmonitor
http://www.ibm.com/support/docview.wss?uid=swg27010374
10.5 Tracing
Tracing is primarily used for debugging. Tracing can be enabled for many
components. By default, tracing is disabled. Tracing can be enabled and disabled
without recycling the application server.
Note: Tracing all components can create enormous trace log files, even with
little system activity. Performance might also be affected. Enable the minimum
tracing necessary to collect the information that is required for the problem
that you are investigating.
Tracing is controlled via the IBM Administration Console for Content Platform
Engine (ACCE). Tracing can be enabled at the P8 domain level and at the site
level. Any setting at the site level takes precedence over the setting at the
P8 domain level, including the location of the trace logs. If you use
different settings at the P8 domain and site levels, ensure that you track the
various settings.
Figure 10-12 shows the trace control settings at the P8 domain level.
10.6 Auditing
Auditing is the recording of events that occur on objects. For each recorded
event, a row is added to the event table in the object store's database. From
the event object, you can get information about the audited event, including
the creation date, originating user, result status, and source object of the
event.
All out-of-the-box events, such as create and check-in, can be audited, and the
auditing capability can be extended to custom events. The IBM Enterprise
Records (IER) product, for example, takes advantage of this extensibility to
provide audit events that are specific to a records management environment.
However, auditing can affect both performance and the database space usage,
so it is important to configure auditing judiciously.
For more information about auditing concepts, see the following topic in the
information center:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.
ibm.p8.ce.dev.ce.doc%2Faudit_concepts.htm
In addition to storing information about the type of change that was made to
the object and who made the change, you can also choose to store copies of the
object from before and after the audited event. The
ObjectStateRecordingLevels property controls this behavior and takes the
following values:
ORIGINAL_AND_MODIFIED_OBJECTS
Records a copy of both the original, pre-event object and the modified,
post-event object
MODIFIED_OBJECT
Records a copy of the modified, post-event object
NONE
Does not store a copy of the object being audited
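As an illustration, audit entries can be listed with a query through the Content Engine Java API. The following is a minimal sketch; the connection URI, credentials, and object store name are placeholders, and it assumes that your object store permits querying the Event class directly:

import java.util.Iterator;
import javax.security.auth.Subject;
import com.filenet.api.core.Connection;
import com.filenet.api.core.Domain;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.query.RepositoryRow;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;
import com.filenet.api.util.UserContext;

public class AuditEventList {
    public static void main(String[] args) {
        // Placeholder URI, credentials, and object store name.
        Connection conn = Factory.Connection.getConnection(
                "http://cpeserver:9080/wsi/FNCEWS40MTOM/");
        Subject subject = UserContext.createSubject(conn, "ceadmin", "password", null);
        UserContext.get().pushSubject(subject);
        try {
            Domain domain = Factory.Domain.fetchInstance(conn, null, null);
            ObjectStore os = Factory.ObjectStore.fetchInstance(domain, "OS1", null);
            // List the most recent audit entries with their originating users.
            SearchSQL sql = new SearchSQL(
                    "SELECT Id, Creator, DateCreated FROM Event ORDER BY DateCreated DESC");
            Iterator it = new SearchScope(os).fetchRows(sql, Integer.valueOf(50),
                    null, Boolean.TRUE).iterator();
            while (it.hasNext()) {
                RepositoryRow row = (RepositoryRow) it.next();
                System.out.println(row.getProperties().getIdValue("Id") + " "
                        + row.getProperties().getStringValue("Creator"));
            }
        } finally {
            UserContext.get().popSubject();
        }
    }
}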
Use the Ping page to find the log file location. On the Ping page, search for Log
File Directory:
http://<Content Platform Engine server>:<port>/FileNet/Engine
These files are at the following default locations:
WebSphere Application Server:
install_root/profiles/profile_name/FileNet/server_instance_name
WebLogic Server:
bea/user_projects/domains/my_domain/FileNet/AdminServer
JBoss:
jboss_install/jboss-as/bin/FileNet/server_instance_name
In addition, you might need to review the application server message logs.
These files are at the following default locations:
For IBM WebSphere:
WAS_install_path/AppServer/profiles/profile_name/logs/server_name/SystemOut.log
WAS_install_path/AppServer/profiles/profile_name/logs/server_name/SystemErr.log
For Oracle WebLogic:
WLS_install_path/user_projects/domains/domain_name/servers/server_name/logs/server_name.log
For JBoss:
JBOSS_DIST/server/server_name/log/server_name.log
Tip: You can perform all the required configuration tasks, such as configuring
the directory server and creating JDBC data sources, manually by using the
administrative tools provided by the Java EE application server. However,
completing the tasks this way is error prone and likely to result in an
installation that is configured incorrectly. Instead, use the Configuration
Manager.
(Figure: the ACCE interface shows a top level of tabs, one for the domain and one for each object store to which you navigate, with controls for scrolling to additional tabs; if needed, an additional horizontal scroll bar displays.)
Domain-level settings
The following tabs are available at the domain level. Many settings can be
altered. However, unless you want to enable a capability that is disabled by
default, or you are trying to resolve a particular issue, we highly advise
that you leave the settings unchanged, especially because many of the settings
can be overridden at the object store level. The following settings are
domain-level settings:
General
Use to set the URL to the P8 Information Center to make it easier to get
information when you use ACCE.
Properties
Provides a concise listing of the domain-level artifacts, such as the number of
external repositories, object stores, and content cache areas.
Security
Lists the default security settings for the P8 domain. Whenever a new object
is created in the domain, these settings are applied by default. However, they
can be manually overridden if needed.
Directory configuration
Provides the details of the currently configured LDAP environment. The initial
set of values is defined when the Content Platform Engine software is
configured by using the Content Manager configuration tool. After the initial
installation and configuration are verified, use the Directory configuration tab
to add information about additional LDAP environments or to modify the
existing configuration.
Server cache subsystem
Use to define the refresh time for and maximum size of the different types of
cache, including the user cache, and also to set the default and maximum
number of objects that can be returned by a search.
Audit subsystem
Use to configure the pruning of the audit log.
Content subsystem
Use to maximize the throughput of content to clients and to configure
thumbnail generation.
Content cache subsystem
Use to define the size and location of the content cache.
Text search subsystem
Use to enable the text search capability that is provided by Content Search
Services (CSS) and to optimize the settings for the text extract, indexing, and
searching capabilities.
Tip: Enabling the text search capability does not cause text to be indexed.
It just makes the feature available. You must also navigate to the specific
object stores and document classes that have the content to be indexed
and enable text searching at those levels as appropriate.
Trace subsystem
Use to configure trace log options. The trace logs are usually required when
more information is needed on an error that has been logged in the content
error log or when determining the cause of slow performance. But, be
selective about the components to trace and the amount of time that tracing is
enabled because the tracing will affect system performance and generate a
large amount of log information.
Sweep subsystem
Use to configure how often to run the sweep processes and what resources to
allocate to them. The actions of the sweep runs are defined at the object
store level; they enable you to ensure that old content is removed from the
system in a timely fashion and to perform bulk operations that are related to
setting retention times and thumbnail generation.
Replication subsystem
Used with FileNet Content Federation Services for Image Services to define
the available resources for replicating information from Image Services to
Content Platform Engine, and from Content Platform Engine to Image
Services.
Use the replication subsystem options to stop the Content Platform Engine
from processing any federation requests. This capability can be useful when
you need to perform maintenance work on the Content Platform Engine or
Image Services server.
Publishing subsystem
Use to configure the rendition processing that is available with Rendition
Engine, an optional add-on to the IBM FileNet Content Manager suite.
Asynchronous processing
Use to enable and disable event processing, and to optimize event processing
by tuning the wait time, timeout setting, number of workers, and how often
failed events are tried again.
FileNet Content Federation Services import agent subsystem
Use to enable and disable the processing of FileNet Content Federation
Services for Image Services and FileNet Content Federation
Services-Content Manager OnDemand federation requests.
Workflow subsystem
Use to configure the available resources for workflow and case analyzer
processing.
Recommendations: Ensure that dispatchers that are not being used are
disabled. For example, disable the asynchronous processing if events are
not being processed. Each dispatcher issues regular queries to look for
work, so if there is no work to look for, you can save system resources by
disabling the dispatcher.
Site-level settings
Every P8 domain has at least one site. In a geographically distributed
environment in which you have configured multiple sites, it might be appropriate
to overwrite P8 domain-level configuration settings with site-specific settings.
Some site-level settings can also be further refined at the object store level.
To access the site-level information, in ACCE, navigate to Global
Configuration → Administration → Sites.
The following tabs are available at the site level:
General
Besides providing general information about the site, such as its name, use
this tab to specify whether requests can be forwarded to or from this site. This
option is not available at the domain level.
Properties
Server cache subsystem
Use to configure the cache for various objects, including user tokens, marking
sets, and the GCD.
Audit subsystem
Use to configure pruning of the audit logs.
Content subsystem
Along with the content cache subsystem, use this tab to optimize the upload
and download of content by clients.
Content cache subsystem
Use to define the location of the content cache areas and the number of
elements that can be stored in the cache.
Text search subsystem
Use to optimize the indexing of and the searching for content.
Trace subsystem
Use to configure trace logging. Trace logging is often needed when you are
troubleshooting issues with the environment and with custom applications.
Sweep subsystem
Use to enable the sweep capability and to build sweep schedules. The sweep
processes are defined at the object store level and can be used to perform
bulk updates, move content, and manage queues.
Replication subsystem
Use with FileNet Content Federation Services for Image Services to manage
the frequency with which updates to the Image Services repository are
replicated to the object stores, and vice versa.
Publishing subsystem
Use with Rendition Engine to manage publishing processes.
Asynchronous processing subsystem
Use to enable and disable asynchronous event processing and to optimize
the processing of the events. When events are generated, a row is entered
into the queueitem table. After an event is successfully processed, the row is
removed from the table. By default, if an event fails to process successfully, it
will be tried again up to seven times. Two columns, retry count and next retry
date, in the queueitem table track the number of retry attempts and the next
time an attempt will be made to retry processing an event that previously
failed.
Tip: Avoid large backlogs in the queueitem table. Set up regular queries
against the queueitem table to track the backlog, event processing
throughput, and event processing failure rate (see the sketch after this
list).
FileNet Content Federation Services import agent subsystem
Use with FileNet Content Federation Services for Image Services to manage
the initial federation of documents from Image Services to object stores.
Workflow subsystem
Use to manage the processing throughput of workflows and Case Analyzer.
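The backlog query mentioned in the tip earlier in this list can be as simple as a row count against the queueitem table. The following is a minimal JDBC sketch; the JDBC URL, the credentials, and the exact table name are placeholders to verify against your object store database schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueueBacklogCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL and credentials for the object store database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://dbserver:50000/OSDB", "dbuser", "dbpassword");
             Statement stmt = conn.createStatement();
             // Verify the exact table name in your object store schema.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM QueueItem")) {
            if (rs.next()) {
                System.out.println("Queued events awaiting processing: " + rs.getInt(1));
            }
        }
    }
}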
Database connections
A Java Naming and Directory Interface (JNDI) XA and non-XA data source
pair defines a JDBC connection to a database available to the IBM FileNet
Content Manager software. You define the data source pairs by using the
Configuration Manager.
In ACCE, you define database connections as labels to a data source pair.
And then, as you define object stores and workflow systems, you identify
which database connection (and therefore database) to use.
Object stores and workflow systems can share databases, which can simplify
the maintenance of the P8 environment. If you combine databases, ensure that
doing so does not adversely affect any of these areas:
Application requirements
Backup and restore schedules
Data independence requirements
External repositories
These repositories exist outside of the current P8 domain; their content can
be made available by using FileNet Content Federation Services.
For more information about FileNet Content Federation Services, see
Federated Content Management: Accessing Content from Disparate
Repositories with IBM Content Federation Services and IBM Content
Integrator, SG24-7742.
Fixed content devices
These storage devices, such as IBM Tivoli Storage Manager, EMC Centera,
and Network Appliance SnapLock, can be used to store object store content.
A full list of the supported devices is provided in the IBM FileNet P8 Hardware
and Software Requirements guide, which can be downloaded from the
following page:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27013654
Rendition Engine connections
Used to configure Darwin Information Typing Architecture (DITA) and P8
Rendition Engine connections.
Replication groups
A replication group is used to connect an external repository with an object
store.
Sites
A site is a logical grouping of P8 domain resources. This feature is used
primarily with geographically dispersed clients and multiple data centers. You
can use this feature to allocate local resources to clients and limit the amount
of WAN traffic.
Text search servers
These servers are for use with CSS. Each server can be used to perform text
indexing, text searching, or both.
Add-ons
Add-ons are modules that can be added to any object store to support
specific functionality. Several add-ons are supplied with the IBM FileNet
Content Manager product to support functionality provided by Content
Platform Engine and FileNet Workplace XT. Other products in the P8 Suite,
such as IBM Enterprise Records, as well as custom applications, can also
provide (and require) additional add-ons.
Tip: When creating object stores, use only the add-ons that you are sure
will be needed. Additional add-ons can be added later, but after they are
added to an object store, they cannot be removed.
Marking sets
Marking sets are special properties that help control access to objects.
If recovery bins and soft deletes are used in applications, work with the
designers of the applications to answer the following questions:
How many recovery bins are needed?
Who can restore content from recovery bins?
Who can empty recovery bins?
When is it appropriate for the object store administrator to empty and
remove recovery bins?
Defining sweep processes
Sweep processes can be defined for bulk updates, retention management, and
queue management. Build the default schedules for these processes at the
domain level, but build the sweep definitions, which specify what each
individual sweep process accomplishes, at the object store level.
Configuring connection points
Backups
Ensure that regular database backups are taken. As object store content is likely
stored in file storage or on fixed content devices, the database backups need to
be coordinated with backups of the data on the storage devices and also cover
any temporary storage areas. In addition, if content search indexes and workflow
systems are part of your environment, the backup strategy must include the
content that they generate, too. If data has to be restored, everything must be
restored to the same point in time.
You can use both hot (or online) and offline backups with IBM FileNet Content
Manager environments as long as you ensure that the backups of all the
components are synchronized and can be restored to the same point in time.
You must test restoring from backups regularly, both as part of general
environment maintenance and for disaster recovery.
For more information about backing up and restoring IBM FileNet Content
Manager environments, see 10.13, Backup and restore on page 364.
Tuning
When you first deploy a new P8 solution, tuning the databases appropriately is
a key element of achieving good application response times. Look for these
items:
Adequate number of available database connections
Appropriate indexes to improve search performance
Create indexes on individual object store properties via FEM. Follow these
steps to identify a property as an index item:
a. Navigate to the class that uses the property.
b. Display the properties of the class.
c. Click the Property Definitions tab.
d. Select the property from the list, and then click Edit.
e. On the General tab, use the set/remove index option.
Complex indexes must be created by using the database vendor tools (see the
sketch after this list).
Cache read ahead
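For example, a composite index can be created with the database vendor tools or through JDBC. The following is a minimal sketch; the JDBC URL, the credentials, and the index, table, and column names are illustrative and must be matched to your actual object store schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateCompositeIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL and credentials; the index, table, and column
        // names below are illustrative, not actual object store names.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://dbserver:50000/OSDB", "dbuser", "dbpassword");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE INDEX IDX_DOC_CLASS_DATE "
                       + "ON MyDocTable (class_id, create_date)");
        }
    }
}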
Maintenance
Monitor the database for these conditions:
Available space for items, such as tables, indexes, logs, and journal files
Structures that need reorganizing or that have space that needs reclaiming
If many objects, for example, documents or events, are regularly being added
and deleted, regularly reorganizing the appropriate tables or rebuilding
indexes can have a positive effect on performance and throughput.
With this information, you can design a storage plan that enables you to set up
the following rules:
Set retention rules when content is added to a repository
Update retention rules when requirements change
Develop sweep rules and schedules that enable you to perform these tasks:
Delete content that is no longer needed
Move content to lower-cost storage when it is accessed less frequently
When using this retention model, the deletion happens in two stages:
1. A delete request must be initiated from IBM FileNet Content Manager.
This request removes the document from the object store, and the
document is no longer visible to IBM FileNet Content Manager applications.
2. In the background, IBM FileNet Content Manager calls the fixed content
device to delete the content. Because IBM FileNet Content Manager previously
determined that the content is eligible for deletion, deleting the content
from the fixed storage device is allowed and succeeds.
Best Practice: Allow IBM FileNet Content Manager to control the retention on
fixed content devices. Do not define default retention periods directly on the
fixed content device.
The sweep framework also supports thumbnail generation and batch printing.
There are three forms of sweep:
Single sweep
Use this form of sweep to complete a one-time batch task, such as moving
documents that have been incorrectly filed.
Policy-controlled
Use this form of sweep to automate regular maintenance tasks, such as
deleting documents that have passed their required retention dates and
moving documents to lower-cost storage.
Queue sweep
A special form of sweep that is used by IBM FileNet Content Manager for
queue operations, such as thumbnail generation.
Storage statistics
The following information is available for file storage areas. Monitor this
information to ensure that the file storage is configured appropriately for
your environment. Figure 10-14 on page 357 illustrates the storage statistics
information that is provided in ACCE.
Cache subsystems
There are multiple cache subsystems that can be tuned. You can configure the
subsystems at the P8 domain level and at the site level.
Consider these best practices for cache subsystems:
Use the default settings and only make changes if specific use cases require
them.
Only make changes at the site level if there are specific characteristics of that
site that require different settings.
Track the changes you make, especially if you are making them at the site
level.
Figure 10-15 displays the ACCE page for modifying the site-level cache
subsystems. In addition to the subsystems shown, there are also subject and
metadata merged scope subsystems.
FileNet\ContentEngine\Scripts\Component Library\
3. When prompted to define security roles, you see two roles under Security
Role: Object Store Administrators and Object Store Users.
Click Add to add security participants for the selected role. The Select Users
and Groups dialog box opens. Click OK when you have added the
participants for that particular role. See Figure 10-16 on page 363, which
shows the Security Script Wizard.
4. Click Finish when you are done. The wizard generates a prompt informing
you where its log file will be located. The wizard proceeds to apply the
security permissions to the objects in the object store. This process can take
time, depending on the number of objects that need to be updated. The
wizard reports when the process of applying security is complete.
5. If you added groups to only one Security Role, a notice appears (see
Figure 10-17). Click OK to continue. This notice appears because no current
Security Roles will be deleted; only the new roles will be added by the wizard.
The Security Script Wizard sets permissions on the root folder in the object store,
but it does not directly modify the security assigned to individual documents and
custom objects. Depending on how inheritance has been configured, the
document and custom object permissions might be inherited from the root folder.
You can read more detailed information by selecting ecm_help → FileNet P8
Administration → Content Platform Engine Administration → Managing
Security → Security Script Wizard.
expect the system to be operational. In this example, allotting one hour
before and after gives you a five-hour total backup window to stop the
servers, perform the backup, and start the servers.
Typical installations store content in a file system, metadata in a database, work
items in a process database, and pointers to the content in external systems. The
amount of time that is necessary to back up the individual system components
can vary by minutes or hours.
The amount of time that is required for the longest component's backup must
fit within your backup window. Your content storage area usually consumes the
greatest amount of backup time.
There are a few steps that you can take to decrease backup time to fit your
window:
Use a combination of full and incremental backups. Incremental backups
simply capture information that has changed since the last backup. This can
greatly reduce time spent backing up data. During a restore, you must restore
from your last full backup and apply the incremental backups before starting
your system, which increases the amount of time necessary to restore your
system. A best practice is to perform full backups weekly when a larger
backup window is available and perform incremental backups during the week
when your backup window is smaller.
If you use tape as your backup media, a faster alternative is to back up your
data to disk files. When the backup to disk completes, transfer the backup
files to tape, which allows your IBM FileNet Content Manager system to run
while the transfer to tape occurs.
The next section describes potential methods to run online backups. Those
techniques can safely be used for offline backups. Simply stop your IBM
FileNet Content Manager servers, run the copy, and restart your system. This
approach provides the fastest possible offline backup.
If your backups cannot be completed within your backup window, you need to
look at the online backup methods discussed next.
Application servers
Content Platform Engine
File stores if used or configured
Other IBM FileNet Content Manager components
Any external systems with which your IBM FileNet Content Manager
application operates
Your IBM FileNet Content Manager system needs to be down during the restore
process. If you used incremental backups, restore all incremental backups before
you start your IBM FileNet Content Manager system. After all restores are
completed, start your IBM FileNet Content Manager system normally.
Recommendations: Consistency checks can run for a long time depending
on the amount of content in your system. Limit the amount of time that the
consistency check runs. Set the check to start a few hours before the major
event that requires its use.
Task                   Frequency       Comments
Monitor system         Daily
Back up system         Daily
Log maintenance        Weekly          See footnote 1
                       Weekly
Check performance      Weekly          See footnote 2
                       Monthly         See footnote 3
Database maintenance   Periodically
Backup software        Monthly
Apply patches          Semi-annually   See footnote 3
Test restore           Annually
1 Log maintenance must include all operating system, application server, and
IBM FileNet Content Manager product error and trace log files. Log
maintenance must also include the Content Platform Engine audit log and the
Content Platform Engine log database tables, if used. All log files can grow
quite large over time; on busy systems, you might need to increase the
maintenance frequency. Low-use systems might be able to reduce the frequency.
2 System Dashboard performance archiver on page 332 describes how to archive
performance logs. You can generate reports from the archived log files. If you use
IBM FileNet System Monitor, you can configure it to keep archived performance
data and generate reports, as well.
3 IBM FileNet fix packs are produced at regular intervals. Fix packs are available on
Fix Central: http://www.ibm.com/support/fixcentral/
10.15 Conclusion
This list summarizes our recommendations in this chapter:
Run the archiver.jar to capture performance data during peak hours of
activity.
Maintain message logs by renaming them and then deleting them after a
period of time.
Manage (clean up) audit and statistics logs weekly when used.
Keep auditing as minimal as possible.
Use security groups to secure content.
Store your backup media off-site.
Allot free time before and after the backup as part of a backup window.
If you use incremental backups, perform full backups weekly.
Run the Consistency Checker utility after you restore a system.
In Chapter 11, Upgrade and migration on page 371, we address upgrade and
migration topics. In Chapter 12, Troubleshooting on page 387, we discuss
troubleshooting techniques.
Chapter 11. Upgrade and migration
Terminology
Planning for updates
Upgrading to a new software release
Migration best practices
Special considerations for upgrade
11.1 Terminology
Specific terminology is used in the P8 Information Center and by the IBM FileNet
Support teams for updating a FileNet Content Manager environment.
Understanding the terminology enables you to make informed decisions about
your update process choices and the effort involved with each type of change.
11.1.1 Packages
There are several package types that you can apply to your FileNet Content
Manager environment:
Software release
A new software release introduces new features and functionality, adds and
removes support for infrastructure elements, resolves client-reported issues, and
adds and removes support for components that interact with FileNet Content
Manager.
Although new functionality might be added and existing functionality changed
or removed, a new software release maintains backward compatibility for
application programming interfaces (APIs). However, APIs can be marked as
deprecated, and deprecated APIs are eventually removed.
Mod release
A mod release, or service pack, provides a small set of new features, as well as
resolutions to client-reported issues. The mod release provides a roll-up of fixes
that are available in previous update packages, as well as fixes that are released
for the first time.
Fix pack
A fix pack provides a roll-up of authorized program analysis report (APAR)
resolutions that were previously provided as interim fixes, test fixes, or in a
previous fix pack, as well as fixes not previously released. Each fix pack contains
fixes from previous releases.
Interim fix
An interim fix provides the resolution to a few APARs, usually one, that are likely
to be needed by multiple clients.
Test fix
A test fix, also known as a limited availability fix, provides the resolution
to a small number of APARs, usually one, that are required by a specific
client. If you need this type of fix, IBM provides you with specific download
information and a password for extracting the fix package.
Interim fix and test fix packages are similar. The primary difference is the size of
the target audience for the package.
Software release
Software releases can be considered major or minor releases. The designation
of major or minor is arbitrary, but in general, it tries to convey the quantity of
changes introduced by the software release.
Examples:
5.0.0.0 indicates that the package is major release 5 of the software.
4.5.0.0 indicates that the package is minor release 4.5 of the software.
Mod release
Mod releases are identified by the third digit in the four-digit identifier.
Examples:
5.1.1.0 indicates that this package is the first mod release for the 5.1 software
level.
4.0.2.0 indicates that this package is the second mod release for the 4.0
software level.
Fix pack
Fix packs are identified by the fourth digit in the release level and the notation
FPxyz, where xyz identifies the number of the fix pack.
Examples:
5.1.0.2-P8CE-FP002 is the second fix pack on top of the Content Engine
5.1 release.
1.1.5.2-WPXT-FP012 is the 12th fix pack on top of the FileNet Workplace XT
1.1.5 release.
Interim fix
Interim fixes are identified by the notation IFxyz at the end of the package name,
where xyz identifies the number of the interim fix.
Examples:
5.1.0.0-P8CE-IF004 is the fourth interim fix on top of the Content Engine 5.1
release.
5.1.1.2-P8CE-IF001 is the first interim fix on top of Content Engine 5.1.1 fix
pack 2.
Test fixes are identified by the notation LAxyz at the end of the package name,
where xyz identifies the number of the test fix.
Examples:
4.0.2.4-P8eF-LA001 is the first test fix on top of fix pack 4 for the eForms
4.0.2 release.
1.1.4.6-WPXT-LA003 is the third test fix on top of fix pack 6 for the FileNet
Workplace XT 1.1.4 release.
Tip: The readme file might also direct you to the appropriate installation and
upgrade sections in the P8 Information Center for some of the procedural
information.
Major, minor, and mod releases contain full installers. Therefore, they can be
installed on a system that has no previous version of the software installed, as
well as on top of an earlier version of the software.
In some cases, such as FileNet Workplace XT, fix packs are also full installations.
In other cases, such as with IBM Content Navigator, the fix packs must be
installed on top of the product release that is indicated in the fix pack name. All fix
packs are cumulative, which means that you can skip fix pack levels.
Interim fixes and test fixes are not usually cumulative; instead, they contain files
that must be applied, often manually, to a specific level of software. To verify the
content of the package and to determine any prerequisites, see the readme file
that is supplied with the interim fix or test fix.
Caution: Before upgrading an environment that is updated with a test fix,
check the readme file of the newer software update package to ensure that the
package contains the fix that was originally provided by the test fix. If the test
fix is not explicitly listed, check with your IBM representative for information
about when the test fix will be made generally available. If necessary, request
that a new version of the test fix is generated that is compatible with the newer
software update package.
Software updates cannot be uninstalled independently or separately from
uninstalling the associated product component. The exception to this rule is
packages that are installed by manually copying files. These updates can be
uninstalled by copying the older versions of the files over the newer
versions, and by reversing any other steps that are described in the readme
file that is associated with the update package.
Upgrade
Refers to applying a new release to the environment, such as moving from
Content Engine 5.1 to Content Platform Engine 5.2. In an upgrade, there is no
change in the hardware that is used by the components.
Migration
Refers to applying a new release to the environment and redeploying the
software on new hardware or in a new software configuration. Software
configuration changes can include changing application server, moving
components to new operating systems, and moving to a new database
management system.
Updating, upgrading, and migrating FileNet Content Manager are standard
maintenance tasks and need to be scheduled on an on-going basis. These
changes need to be coordinated with other system maintenance tasks, such as
updating application server levels and applying operating system fixes.
Another term that is frequently used when discussing upgrade and migration is
staging. Staging can be used in two ways:
Breaking the upgrade or migration into multiple steps that will be performed at
different times so that, for example, the upgrade is completed over multiple
maintenance windows.
Using an interim set of servers on which to perform some of the upgrade and
migration steps. The staging servers are frequently used to test portions of an
upgrade or migration by using production data.
Figure 11-1 illustrates the process flow for an upgrade or migration: the
decision that an upgrade or migration is required, performing the upgrade or
migration, testing, going live, and decommissioning the retired components.
The order in which you update the environments can be different depending on
the update being installed. Table 11-1 illustrates the order in which you might
perform an update, assuming you have the following five environments:
Development
User acceptance test
Performance test
Preproduction
Production
Environment       Software release   Mod release   Fix pack   Interim fix   Test fix
Development
User Acceptance
Performance
Preproduction
Production
Because interim fixes and test fixes are often installed to address an issue
encountered in the production environment, it is important to get the change
installed into the production environment as quickly as possible, but also as
safely as possible. We advise that you install the change initially into an
environment running the exact same software as the production environment,
and then promote the change to production.
Practicing the update in this fashion has the following advantages:
Refines the installation process that is needed at your site.
Trains the resource staff who will ultimately perform the update in the
production environment.
Allows the project manager to develop a realistic schedule for the update.
Identifies issues that might occur during the installation and any associated
workarounds.
Provides an opportunity to build an appropriate test suite to ensure that your
applications work correctly after the update.
Recommendations: Test suites need to cover functional, stress, and
performance testing.
Whatever plan you choose to adopt, leave enough time in the plan for these
activities:
Backup and restore activities
At a minimum, take backups before you start the upgrade activity and after
the upgrade completes.
Testing
The test activities include functional, performance, and stress testing.
Tip: If you have a disaster recovery site, ensure that your plan includes
breaking any synchronization links at appropriate points in the upgrade
process, upgrading the software on the disaster recovery hardware, and
re-establishing the synchronization links.
Are there clear delineations in the resources that are used by your
applications?
If you are replacing all the hardware in your current production environment
and there is a clear delineation in the resources used by your applications,
you can choose to run with two production environments in parallel but on
different release levels. In this model, you copy over the object stores,
workflow systems, and file storage areas that are used by each application in
stages.
For more information about taking this type of approach, see the following
technote:
http://www.ibm.com/support/docview.wss?uid=swg21428407
If moving to a new LDAP server, ensure that all user and group references are
updated correctly.
Consider testing the change by setting up a new P8 domain (at the same
software level as the existing P8 domain) with the LDAP server and using the
FileNet Deployment Manager to map users and groups. FileNet Deployment
Manager attempts to map the users and groups automatically and flags any
users or groups that do not appear to have a match in the destination
environment.
Tips:
If moving to a new type of LDAP server, consider engaging IBM Lab
Services to help with the migration because they have expertise and
special tools to help with this type of migration.
If you perform the upgrade on new hardware, copy the Content Manager
configuration file from the existing production system to the new hardware,
and use it when running the upgrade of the Content Engine or Content
Platform Engine.
Content Platform Engine 5.2 supports CSS only. If content has not already
been reindexed by using CSS, clients will be unable to perform content-based
retrievals after you upgrade to Content Platform Engine 5.2 until the
reindexing is complete.
If there is a period of time where it is acceptable that content is not indexed,
upgrade directly to Content Platform Engine 5.2. Otherwise, upgrade to Content
Engine 5.1 before you upgrade to Content Platform Engine 5.2.
11.6 Conclusion
In this chapter, we discussed the update types and the best practices to follow to
ensure a timely and successful update.
Chapter 12. Troubleshooting
In this chapter, we discuss the methods that are used to troubleshoot IBM
FileNet P8 Content Manager issues. P8 Content Manager implementations range
from small departmental systems running one application on one or two servers
to large enterprise systems running many applications on many servers.
(Figure: the basic P8 Content Manager environment. Human interaction occurs
through a browser at the front end. The middle tier consists of application
servers hosting custom application code, IBM Content Navigator, or IBM
Workplace XT, and an application server hosting the IBM FileNet Content
Platform Engine. The back end consists of an LDAP directory server, a file
server hosting one or more file stores, and a database server with one or
more object stores.)
In the basic P8 Content Manager environment, the user points their browser or
desktop to the chosen front-end application running on a Java EE application
server:
1. The user receives a logon window and enters their user ID and password.
2. The credentials are validated against the directory server.
3. The Content Platform Engine caches user and group membership
information.
4. The chosen front-end GUI then displays the appropriate artifacts to the user.
At this point, the user can view, change, or create new content. The benefit of this
architecture is that as your user base or transaction activity grows, you can
quickly and easily increase the resources allocated to the environment. You can
scale vertically by adding more CPU and memory to your existing servers, or
horizontally by adding more servers. This approach allows your P8 Content
Manager system to grow to support hundreds of applications with thousands of
concurrent users working on an enormous amount of content. The N-Tier Java
EE architecture (server/client) enables P8 Content Manager to scale from small
systems to large enterprise systems with minimal effort.
You can simplify the logon sequence by looking at it from a client/server
perspective. The components used are essentially several client and several
server components working together:
The user's web browser is a client to the chosen front-end GUI.
The chosen front-end GUI is a client to both the LDAP server and the Content
Platform Engine.
The Content Platform Engine is a client to the database and file storage
areas.
The Java EE application server is the server on which the chosen GUI
application and Content Platform Engine run.
This perspective might be oversimplified, but as you approach a problem, think
about it in client/server terms. Finding the failing client/server section enables
you to quickly rule out what is working and focus on the component that is not
working.
In this chapter, we look at common problems and different types of
troubleshooting by breaking them down into a client/server approach.
Applications and tools that ship with P8 Content Manager, such as IBM
Administration Console for Content Platform Engine (ACCE), IBM FileNet
Enterprise Manager, IBM Content Navigator, and FileNet Workplace XT, are also
helpful for problem determination. They can be used to perform similar actions,
such as adding folders, creating documents, and viewing content. If none of the
applications can perform a certain function, the problem you are trying to
troubleshoot is likely caused by an infrastructure component, such as the
network, or the Content Platform Engine server. However, if these applications
function correctly but a custom application does not, the likely culprit is the
custom application.
If an issue arises just after a component in the environment is updated, start the
troubleshooting by ensuring all the component software levels are compatible
with P8 Content Manager and with each other.
For P8 Content Manager compatibility requirements, see the following
documents:
Hardware and software requirements for the P8 suite of products:
http://www.ibm.com/support/docview.wss?uid=swg27013654
FileNet P8 Fix Pack Compatibility Matrices:
http://www.ibm.com/support/docview.wss?uid=swg27014734
MustGather: Read first for the Content Platform Engine:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg21308231
The MustGather documents list the information that the IBM Support team
needs to start troubleshooting issues. Use this information as a starting
point for your own troubleshooting efforts. If your individual
troubleshooting efforts are unsuccessful, attach the information identified
in the MustGather documents to the problem management record (PMR) or
trouble ticket.
Often, the issues arise because the guidance provided in the following
documentation was not followed or validated prior to starting the installation:
Planning sections of the P8 Information Center
Third-party software-level requirements documented in the P8 Hardware and
Software Requirements guide
Information in the readme files or release notes
Operating system
Refer to the P8 Hardware and Software guide to determine the supported
operating systems and operating system prerequisites.
Database
Although new database servers can be installed specifically for use with
the P8 Content Manager software, you can also use existing database
servers if they meet the documented prerequisites. In addition, you need
to create the databases and tablespaces required by the P8 components
that will be installed. Some components, such as the Content Platform
Engine, have a minimum requirement of one database for the GCD and
one for an object store and a workflow system. However, components
might require more databases depending on your specific use cases.
Application server
Assuming an application server meets the documented prerequisite
requirements and has adequate capacity, the P8 Content Manager
components can be deployed to existing application servers. When
deciding to use existing application servers or to install new application
servers, consider the expected load on the system and the ability to
accommodate more load than initially expected.
Content Platform Engine
FileNet Workplace XT
Required for managing workflow systems.
IBM Content Navigator
Check each of the following components to ensure that they are functioning
correctly:
Network components:
The appropriate ports are open. For the ports used by P8 components,
see the following link:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topi
c=%2Fcom.ibm.p8.planprepare.doc%2Fp8pap057.htm
Operating system:
Domain name server (DNS) lookup and reverse lookup between the
servers works.
Directory service:
Application server:
Login is successful.
Navigate to the folder and document that were created by using ACCE
and open the document.
Add additional documents and then check that they can be opened by
using ACCE.
If you follow this best practice and issues occur, start Root Cause Analysis
(RCA) immediately. The localization of the root cause is much easier by using
this approach.
After you confirm that the Content Platform Engine is running, use the Content
Platform Engine System Health page to check the content-related items in the
environment:
http://<Content Platform Engine server>:<port>/P8CE/Health
This URL is an example:
http://hqdemo1:9080/P8CE/Health
If your Content Platform Engine is running, you see a window similar to
Figure 12-3.
If the information provided in the Content Platform Engine logs is not enough to
identify and resolve the issue, the information points you to the next logical step
in your troubleshooting efforts. Look at these files next:
Application logs
Java applications do not log error messages; they log exceptions. Java
messages and exceptions are written to message log files. P8 Content Manager
has two major engine components: IBM Content Navigator or FileNet Workplace
XT as the front end, and the Content Platform Engine as the back end. Content
Platform Engine writes four message logs:
p8_server_error.log
p8_server_trace.log
pesvr_system.log
pesvr_trace.log
If there are errors that seem related to the functional issues you are debugging,
resolve the issues, recycle the environment, clear the logs, and then retest the
P8 Content Manager application to see whether the problem is resolved.
Database logs
If an object store cannot be accessed, response time is slow, or there are poorly
performing queries, the next step is to ensure that the database is running as
expected. Work with the database administrator to ensure that the databases are
running as expected. Common issues that can occur include space allocation
issues for the temp, database, and log files, poor query plans resulting in slow
searches, and an insufficient number of available database sessions. Also, check
for permission errors and that all the database prerequisites documented in the
planning and prepare section of the P8 Information Center were implemented
correctly:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.
ibm.p8.planprepare.doc%2Fp8ppi084.htm
Performance-related issues can often be mitigated by reorganizing tables and
reclaiming free space, adding indexes, or reworking searches so that they use
existing indexes.
System logs
Check the relevant system logs. If you find errors, ensure that the errors relate to
the problem you are investigating. For example, there might be errors that relate
to outdated virus files or an improperly configured mail server that are not related
to the issue you are trying to debug.
If there are errors that seem related to the functional issues you are debugging,
resolve the issues and then retest the P8 Content Manager application to see
whether the problem is resolved.
Property                  Abbreviation   Description
API                       API
Asynchronous Processing   ASYN
Audit Disposition         AUDT
CBR                       CBR
CFS Daemon                CFSD
CFS Import Agent          CFSI
Code Module               CMOD
Content Cache             CCHE
Content Storage           CSTG
Database                  DB
EJB                       EJB
Engine                    ENG
Error                     ERR
Events                    EVNT
Fixed Content Provider    FCPV
GCD                       GCD
Handler                   HDLR
Metadata                  MCHE
Publish                   PUBL           Publishing operations.
Replication Subsystem     REPL
Search                    SRCH
Security                  SEC
SSI                       SSI
Sweep                     SWP            Sweep operations.
Thumbnail Generation      THMG
WSI                       WSI
Try to reproduce the issue and behavior of the production system in a test
environment¹. Then, analyze and try to fix the problem. If you are unable to fix
the problem on your own, contact the IBM Support team and ask for help. When
you receive a fix (software or a configuration change), install the fix into an
integration or test environment. Perform some general regression tests in
addition to validating that the main issue is fixed before moving the fix into the
production environment. Depending on the severity of the issue, you might also
choose to move the fix into all your other environments before moving it into
production, in much the same way as you handle a new application or an
enhancement to an existing application.
Follow these steps before you put the change into the production environment:
1. Back up any log files.
2. Clear the log files so any new errors are easy to detect.
Also, record the version levels of the following components in your
environment:
Operating systems
Databases
Application servers
Directory servers
Content Platform Engine
FileNet Workplace XT and IBM Content Navigator
When you troubleshoot performance issues, collect the following information
and watch for these symptoms:
Environmental information
Central processing unit counters
Disk counters
Network inbound/outbound counters
User counters and response times of operations
Hangs
Deadlocks
Resource contention
Bottlenecks in Java threads
Monitoring tools, such as the nmon tool for AIX and Linux, can record the
following system statistics:
CPU utilization
Memory use
Kernel statistics and run queue information
Disk I/O rates, transfers, and read/write ratios
Free space on file systems
Disk adapters
Network I/O rates, transfers, and read/write ratios
Paging space and paging rates
CPU and AIX specification
Top processes
IBM HTTP web cache
User-defined disk groups
Machine details and resources
Asynchronous I/O (AIX only)
Workload Manager (WLM) (AIX only)
IBM TotalStorage Enterprise Storage Server (ESS) disks (AIX only)
Network File System (NFS)
Dynamic logical partition (LPAR) (DLPAR) changes (only IBM pSeries p5
and IBM OpenPower for either AIX or Linux)
Production issues
In addition to the basic performance issues during the initial implementation of a
solution, there is always the possibility of running into performance issues during
production.
Implement regression test cases to validate your application after upgrading to
new software levels, fixing an issue, or introducing new features. One of the
regression tests needs to be a performance baseline test in which a known user
logs on to the system. If your clients are reporting slow logons, one of the first
troubleshooting steps is to run this baseline test. The results of the test help
identify whether there is an issue common to all users or whether the slow logon
affects only a subset of your clients.
If all users are affected, review the common infrastructure elements to determine
whether a failure occurred.
If the problem is limited to a subset of users, follow these steps:
Ask the user if the logon was quicker in the past. If yes, ask when they first
noticed the slowdown.
Examine the user's group membership to determine whether anything
changed recently:
Does this user belong to significantly more groups than other users?
If there are nested group memberships, determine the nesting depth.
The severity levels have the following definitions and examples:

Severity 1
Critical impact: A business-critical software component is inoperable,
causing critical business impact.
Severity 2
Severe impact: A software component is severely restricted in its use,
causing significant business impact.
Severity 3
Moderate impact: A non-critical software component is malfunctioning,
causing moderate business impact.
Severity 4
Minimal impact: A non-critical software component is malfunctioning,
causing minimal impact, or a non-technical request is being made. Examples
are incorrect documentation or a request for additional documentation.
When speaking with the software support specialist, also mention the following
items if they apply to your situation:
You are under business deadline pressure.
Your availability, or when you will be able to work with IBM software support.
You can be reached at more than one phone number.
You can designate a knowledgeable alternate contact with whom the IBM
Support representative can speak.
You have other open problems (PMRs) with IBM about this service request.
You are participating in an early adoption program.
You have researched this situation prior to calling IBM and have detailed
information or documentation to provide for the problem.
Fix pack
A fix pack provides a roll-up of APAR resolutions that were previously provided
as interim fixes, test fixes, or in a previous fix pack, as well as fixes not previously
released.
Interim fix
An interim fix provides the resolution to a few APARs, usually one, that are likely
to be needed by multiple customers.
Test fix
A test fix provides the resolution to a few APARs, usually one, that are required
by a specific customer. Test fixes are password protected.
12.11 Conclusion
In this chapter, we discussed the basics of troubleshooting P8 Content Manager
and associated third-party software.
In the next chapter, we provide an overview of building software solutions that
use P8 Content Manager.
13

Chapter 13. Solution building blocks

This chapter describes the solution building blocks of an ECM system:
Foundation components
Content ingestion tools
Process management
Presentation features

Figure 13-1 Major components of an ECM solution: content ingestion tools,
process management, presentation features, and foundation components
These major components form the building blocks of ECM solution design.
Solution building blocks are the features that ECM solution designers can
specify and combine to build out each of the components of an ECM solution:
content ingestion, content management, process, and presentation. The IBM
FileNet suite of products contains applications and tools that offer designers a
wide range of features and functions for the design of each of the major
components of an ECM implementation.
Figure 13-2 on page 429 shows several of the IBM tools that are available to
ECM designers and the place for these blocks within the four major design
phases of an ECM solution.
Figure 13-2 ECM solution building blocks, mapping features such as
repositories, business objects, versioning, classification, search, storage
management, auditing, APIs, social ECM, subscriptions, document lifecycle
management, and publishing to the foundation components; paper scanning, fax,
email, FTP, applications, and monitored file systems to the content ingestion
tools; workflow definitions and workflows/EAI to process management; and
display, browsing, and printing to the presentation features

The foundation components provide the following capabilities:
Repositories
Business object management
Classification
Versioning
Security
Auditing
Search
Content Platform Engine application programming interfaces (APIs)
Content storage and content caching
Lifecycle management and retention management
Social ECM capabilities
Repositories
Repositories are the basic components of FileNet P8 Content Manager. Their
main purpose is to store the business objects, for example, documents, images,
folders, and custom objects, with their respective metadata, and to provide a
centralized information library.
A single FileNet Content Platform Engine can serve several repositories, which
are also called object stores. An object store can store a variety of business
data, including structured and unstructured content, such as images, XML
documents, Microsoft Office documents, and web pages. It can be configured to
store the content in a database, a file system, a fixed content device, or any
combination of these options.
Business benefits: The FileNet content repository provides a standard
solution for document creation, versioning, check-out, and check-in. The
FileNet content repository can house all documents in a central repository
with accessibility for all authorized users.
Folders are special objects that are used to relate other types of objects, such
as documents and custom objects, and provide a way to browse through those
objects. Folders have the following characteristics:
Have system properties that the system manages automatically, such as Date
Created.
Can have custom properties for storing business-related metadata.
Are secured.
Can be hierarchical (a folder can have subfolders).
Can contain documents and custom objects.
Can generate server events when they are created, modified, or deleted.
These events are then used to customize behavior.
Can be annotated.
Custom objects contain only metadata without any content and are used to store
business information. Custom objects have the following characteristics:
Have system properties that the system manages automatically, such as Date
Created.
Can have custom properties for storing business-related metadata, such as
Account Number.
Are secured.
Can participate in business processes as workflow attachments.
Can generate server events when they are created, modified, or deleted.
These events are then used to customize behavior.
Recommendations: Use custom objects to store data that relates to your
business requirements, for example, a client can be represented as a custom
object.
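As an illustration, the following Content Engine Java API sketch creates a
custom object. The Client class and AccountNumber property are hypothetical
examples, and an authenticated ObjectStore reference (os) is assumed:

import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.CustomObject;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;

public class CreateClient {
    // Creates a custom object of the hypothetical Client class and sets a
    // hypothetical custom property on it.
    public static void createClient(ObjectStore os) {
        CustomObject client = Factory.CustomObject.createInstance(os, "Client");
        client.getProperties().putValue("AccountNumber", "100-200-300");
        client.save(RefreshMode.REFRESH);
    }
}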
Classification
Each document class defines the properties and the default security for all the
documents that are added under a specific document class. Document classes
support design based on organization content or function and can encapsulate a
single design aspect. Classification of the documents is performed by selecting a
document class and property values for each document. Classification can also
be performed by adding objects into folders that define classification taxonomies.
Classification can be performed in the following ways:
Manually by a user.
By an application that uses the P8 Content Engine API.
Automatically by using the content-based classification capability that is
provided in the P8 Content Platform Engine.
Business benefits: Document classes support transparent business
functionality and can automatically file the document in the correct folder and
apply the correct retention schedule.
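As an illustration of classification by an application, the following Content
Engine Java API sketch adds a metadata-only document under a specific document
class. The Invoice class is a hypothetical example, and an authenticated
ObjectStore reference is assumed:

import com.filenet.api.constants.AutoClassify;
import com.filenet.api.constants.CheckinType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;

public class AddInvoice {
    // Creates a document in the hypothetical Invoice class, which classifies
    // it, and checks it in as the first major version.
    public static void addInvoice(ObjectStore os) {
        Document doc = Factory.Document.createInstance(os, "Invoice");
        doc.getProperties().putValue("DocumentTitle", "Invoice 4711");
        doc.checkin(AutoClassify.DO_NOT_AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
        doc.save(RefreshMode.REFRESH);
    }
}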
Versioning
Versioning is a base document management capability that is used to maintain
the history of the changes of the document content. The set of versions for a
single document is called a version series. P8 Content Platform Engine supports
a minor and major version scheme; a minor version typically represents a
work-in-progress document, and a major version represents a completed document.
The system can be configured to apply security policies that in turn automatically
apply different access rights for major and minor versions, making it easy to
enforce a different viewing audience for in-progress documents.
In addition to version numbers, P8 Content Platform Engine maintains a state
property that indicates the current state of each document version. The states
are listed:
In Process: A work-in-progress version. Only one version of a version series
can be in process.
Reservation: A document currently checked out for modification. Only the
latest version of a version series can be reserved.
Released: The current released version, typically the most recent major
version.
Superseded: A version that was superseded by a newer version.
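As an illustration of the version scheme, the following Content Engine Java API
sketch checks a document out and checks it back in as a new minor version. It
assumes that doc is a Document instance that was already fetched from an object
store:

import com.filenet.api.constants.AutoClassify;
import com.filenet.api.constants.CheckinType;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.constants.ReservationType;
import com.filenet.api.core.Document;

public class NewMinorVersion {
    public static void addMinorVersion(Document doc) {
        // Check out: this creates a reservation object in the version series.
        doc.checkout(ReservationType.EXCLUSIVE, null, null, null);
        doc.save(RefreshMode.REFRESH);
        Document reservation = (Document) doc.get_Reservation();
        // Update content or properties on the reservation here, then check it
        // in as a minor (work-in-progress) version.
        reservation.checkin(AutoClassify.DO_NOT_AUTO_CLASSIFY,
                CheckinType.MINOR_VERSION);
        reservation.save(RefreshMode.REFRESH);
    }
}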
Auditing
Auditing is the recording of events that occur on business objects. All business
objects and almost all events can be audited. Audit definitions describe how to
audit an event. For example, you can configure an audit definition for a document
class so that audit entries are automatically created whenever documents of that
class are checked in.
Audit entries are stored in a table of the object store database. Those entries can
be viewed, exported for reporting reasons, and administered by users with the
correct authorization.
Business benefits: Auditing helps you monitor content and process
management for the following activities and regulatory compliance:
Object creation
Updates
Deletions
Recommendations: Only enable auditing for selected events and plan for
audit log cleanup. Audit logs are stored in a database table and a large audit
log table can create performance issues.
Security
In P8 Content Manager, you secure the business objects by defining a directory
service that controls who can log on to the Content Platform Engine and by
setting access rights for those users.
P8 Content Manager has a defined security context. Only those users, groups,
and machine accounts that are explicitly given access to the object store can
access the resources (business objects).
There are many ways to define the security of a business object:
Default instance security: The security of the object is defined in the object
class and inherited by all instances of that document class.
Version state: The security of the object is defined by the version level of the
document. For example, some users might have access to minor versions of
the document, and other users have access only to major versions of the
same document.
Document state: The security of the document is controlled by the document
state.
Marking sets: The security of the document is controlled by a property value
and by the code you implement that interprets the meaning of the property.
Directly applied security: The security of the object is assigned directly to the
object by a user or an application.
Inherited security: Security is placed on the object by a security parent or by
setting up a relationship with an object-valued property whose Security Proxy
Type is set to Inherited.
Business benefits: Using the model where document privileges are assigned
to functional groups from Directory Services, not to individual users, helps
reduce the cost of managing the security of the system by reducing security
access complexity and by handling the separation of duty requirements.
Marking sets allow access to a document to be controlled based on specific
property values to ensure that sensitive information is protected, for example,
Secret/Confidential. With marking sets, you ensure document security and
privacy control, limit access to sensitive data, and control access to
documents.
Search
P8 Content Manager supports property and content-based searches.
With property searches, a user or an application can search multiple object
stores for business objects that have a specific value in a property. Therefore,
users can search for documents, custom objects, or folders in different object
stores based on a property value.
Searches are defined in P8 Content Manager as SQL queries and support many
of the standard SQL operators, such as OR, AND, LIKE, UNION, and INTERSECTION.
Search definitions can be created and then stored in an object store, allowing
users easy access to common queries.
Content-based retrieval (CBR) supports searching within the content of a
document or the metadata. CBR provides capabilities to search for misspelled
words, typographical errors, word stems, synonym expansion, and wildcards.
CBR search results can be ranked by relevancy and can display a document
summary format.
Bulk operations can be performed on search results. Operations can be scripted
or selected from a set of predefined operations, such as delete, cancel checkout,
file, unfile, and change security.
Business benefits: Content Platform Engine search capabilities provide an
effective means of locating information, improving the ability to share
information across the organization, and enhancing record requests, thus
improving organizational efficiency.
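As an illustration, the following Content Engine Java API sketch combines a
property search with a CONTAINS clause for CBR. The property filter and search
text are examples only, and the sketch assumes that the Document class is
CBR-enabled and that os is an authenticated ObjectStore reference:

import java.util.Iterator;
import com.filenet.api.collection.RepositoryRowSet;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.query.RepositoryRow;
import com.filenet.api.query.SearchSQL;
import com.filenet.api.query.SearchScope;

public class PolicySearch {
    public static void search(ObjectStore os) {
        SearchScope scope = new SearchScope(os);
        SearchSQL sql = new SearchSQL(
            "SELECT Id, DocumentTitle FROM Document "
            + "WHERE DocumentTitle LIKE 'Policy%' AND CONTAINS(*, 'vacation')");
        RepositoryRowSet rows = scope.fetchRows(sql, null, null, null);
        // Iterate the result rows and print the title property of each hit.
        Iterator<?> it = rows.iterator();
        while (it.hasNext()) {
            RepositoryRow row = (RepositoryRow) it.next();
            System.out.println(row.getProperties().getStringValue("DocumentTitle"));
        }
    }
}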
APIs
Content Platform Engine has a collection of available development and
integration tools.
Depending on your enterprise strategy and architecture, you can use any of the
following APIs for application development:
Content Engine Java API: Provides access to the full content capabilities of
Content Platform Engine
Content Engine .NET API: Is the functional equivalent of Content Engine Java
API for .NET application development
Content Management Interoperability Services API: Is an open source OASIS
standard that enables applications to work with one or more content
management systems by defining a standard domain model and a standard
set of services
Process Java API: Provides classes for all workflow and business process
management features
Process Engine REST Service: Is used from custom applications to perform
fundamental business process management operations
Web Services: Provides access to most of the same functionality as the
Content Engine and Process Java APIs
Business benefits: The P8 Content Manager APIs provide a way to integrate
with other line-of-business applications, improve the user experience by
offering extended functionality, and access multiple systems transparently.
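As an illustration, the following sketch uses the Content Engine Java API to
establish a session and fetch an object store over the web services transport.
The URI, credentials, JAAS stanza, and object store name are placeholders that
you replace with values for your environment:

import javax.security.auth.Subject;
import com.filenet.api.core.Connection;
import com.filenet.api.core.Domain;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.util.UserContext;

public class Connect {
    public static ObjectStore connect() {
        Connection conn = Factory.Connection.getConnection(
            "http://<server>:<port>/wsi/FNCEWS40MTOM/");
        Subject subject = UserContext.createSubject(
            conn, "username", "password", "FileNetP8WSI");
        // The subject stays on the thread for subsequent API calls; call
        // UserContext.get().popSubject() when you are done with the session.
        UserContext.get().pushSubject(subject);
        Domain domain = Factory.Domain.fetchInstance(conn, null, null);
        return Factory.ObjectStore.fetchInstance(domain, "OS1", null);
    }
}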
Content storage
Database storage is useful when you only need to store a few small documents.
There are performance advantages to storing smaller documents (less than 10
MB) in a database storage area when compared to other storage area types.
Avoid storing any document that is over 100 MB in a database storage area. The
main benefit of database storage is that backups are much simpler because your
document content is backed up along with your normal object store database
backup.
File and fixed storage areas are the preferred medium when storing large
numbers of files with high ingestion rates.
File storage areas use a directory structure on the file system to store a
document's content. Documents are stored among the directories at the leaf
level by using a hashing algorithm to evenly distribute files among these leaf
directories.
Fixed storage areas are used to store documents in external repositories, such
as IBM Tivoli Storage Manager, EMC Centera, and Network Appliance
SnapLock. There are two scenarios for the integration of P8 Content Manager
with those external repositories. In the first scenario, the document content is
managed by Content Manager and the external device is used only as a storage
device. In the second scenario, fixed storage areas are used with federation
when content is stored in an external repository. In this scenario, the document
and its associated metadata can be accessed as native P8 documents, in
addition to their accessibility via the source repository.
P8 Content Manager also provides the following features for storage
management:
Bulk content move: With the bulk content move sweep job, you can move
content from one storage area to another. There is also a Move Content API
method, which can be used from other applications to move content from one
storage area to another. Content can be moved from any storage area type
(database, file system, or fixed content) as long as content is not under
device-level retention.
Content caching: Database, file storage area, and fixed storage area
content can be cached on a cache server. For frequently accessed content,
content caching provides a faster response time in content retrieval. Content
caching also benefits geographically distributed systems and systems with
hierarchical storage devices by storing copies of content local to where they
are accessed most often.
Content de-duplication: P8 Content Manager supports de-duplication of the
content that is ingested from various sources. This feature saves storage
costs for duplicate content that is saved in the repository.
Retention management
Retention management is an event-based retention infrastructure that can define
object-level retention policies. It is supported for documents, annotations, folders,
and custom objects.
A retention management automatic deletion and disposal policy defines the rules
for when objects are automatically deleted.
The policy has these characteristics:
Can apply to any searchable repository object.
Allows documents, folders, and custom root classes that have a retention date
in the past to be deleted.
Allows custom root classes that have a closure date in the past to be deleted.
Can delete queue items that have reached the maximum failure count more
than one month in the past.
Is based on values assigned to the CmRetentionDate system property.
Can also be based on system-defined or user-defined properties.
Business benefits: Implementing a retention management scheme helps
organizations meet organizational, business, regulatory, and legislative
requirements.
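As an illustration, the following Content Engine Java API sketch sets the
CmRetentionDate system property on a document. It assumes that doc was already
fetched and that the caller holds the rights that are required to modify
retention:

import java.util.Date;
import com.filenet.api.constants.RefreshMode;
import com.filenet.api.core.Document;

public class SetRetention {
    // Retains the document for approximately one year from now.
    public static void retainForOneYear(Document doc) {
        long oneYearMillis = 365L * 24 * 60 * 60 * 1000;
        doc.getProperties().putValue("CmRetentionDate",
                new Date(System.currentTimeMillis() + oneYearMillis));
        doc.save(RefreshMode.REFRESH);
    }
}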
Social ECM
P8 Content Manager can be used for the enablement of social collaboration,
social content management, and integration with IBM Connections.
P8 Content Manager supports the following social ECM features:
Ability for users to recommend a business object.
Ability for users to comment on managed objects. Comments can only be
created by authorized users.
Ability for users to follow updates to business objects.
Social tagging of managed objects.
Activity stream generation for business objects. Activity streams provide a
syndicated view of updates to the content, including notifications and
recommendations.
Tracking the number of downloads of a document.
Large content streaming.
Thumbnail generation and storing.
User-centric recycle bin for deletion and recovery of documents.
Foundational components provide strong building blocks for ECM enterprise
solutions for organizations across multiple industries.
Change preprocessors
Change preprocessors are action handlers that change new or updated objects
before they are saved to the Content Manager. Change preprocessor handlers
are associated with a class definition. When an object of that class is saved, the
action handler is triggered.
Change preprocessors allow object modifications that are difficult or impossible
to accomplish by using event action handlers. For example, a change
preprocessor can alter a modifiable-only-on-create property because those
properties cannot be altered after the object is saved.
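The following sketch outlines a handler that implements the ChangePreprocessor
interface of the Content Engine Java API. The AccountNumber property is a
hypothetical example; the boolean return value reports whether the handler
modified the object:

import com.filenet.api.core.IndependentlyPersistableObject;
import com.filenet.api.engine.ChangePreprocessor;
import com.filenet.api.exception.EngineRuntimeException;

public class AccountNumberPreprocessor implements ChangePreprocessor {
    // Normalizes a hypothetical AccountNumber property before the object is
    // saved.
    public boolean preprocessObjectChange(IndependentlyPersistableObject obj)
            throws EngineRuntimeException {
        if (obj.getProperties().isPropertyPresent("AccountNumber")) {
            String acct = obj.getProperties().getStringValue("AccountNumber");
            if (acct != null) {
                obj.getProperties().putValue("AccountNumber", acct.trim());
                return true; // the object was modified
            }
        }
        return false; // no changes were made
    }
}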
Workflow definitions
By creating a workflow definition, you define the activities and resources that are
needed to complete a business process. A workflow definition is a series of
steps connected by routes that define the sequence in which the steps are
executed. Workflow definitions can contain several maps and submaps that can
group related steps.
Steps in the workflow represent a business task or a system activity. Steps can be
executed by a user, a group of users, or by an automated application. Workflow
steps can run in parallel to facilitate efficient processes.
Routing defines the order in which the steps are executed. Routing can be based
on a specific rule or events. Except for the last step, every step has one or more
routes that lead to it. Routes can be defined so that they are always taken or
followed only if a condition is met.
You can use deadlines and timers to ensure that work is processed in a timely
manner. A deadline provides a time-based scheduling constraint, which requires
that a step or workflow is completed within a certain amount of time. The
deadline can be relative to the time that the step was routed to the participant or
to the time that the workflow was launched. A participant with a deadline can
receive a reminder of the pending deadline through an email message. When the
deadline is passed, a visual reminder displays in the participant's inbox, and an
email can be sent to a configurable list, such as one or more supervisors. The
distribution list can be specific to each work item. This automatic process
escalation has the additional benefit of effectively ensuring that certain functions
or processes are completed on time and without tying up resources to
continuously monitor system activities.
A timer indicates a time during which you want a specified series of steps to
process. If the timer expires before this processing completes, processing
proceeds to another workflow map that provides alternate processing of the
work.
Recommendations: For complex content-centric processes, use the process
management capabilities of Content Manager. Examples of content-centric
processes are loan origination processes and insurance claim processes.
Using Content Manager process management, you can activate the
organizational content and take control over processes that involve
documents.
Publishing
Publication documents have the following characteristics:
Can have a different file format than the source document; for example, the
source document can be a Microsoft Word document, and the publication
document can be an HTML document. Publishing options are defined by
individual templates.
Can originate as Microsoft Office (for example, Word, Excel, and PowerPoint)
documents and be rendered to PDF or HTML.
DITA documents
DITA is an XML-based open standard for developing, managing, structuring, and
publishing content. IBM originally developed DITA for more efficient reuse of
content in product documentation. IBM donated DITA to the Organization
for the Advancement of Structured Information Standards (OASIS) for further
development and public release.
The content can be composed based on the DITA model that allows content to
be linked to multiple topics. After the content is reviewed and approved, it is
published to allow business users to perform searches and to navigate around
the content.
The two central units of authoring in DITA are the topic and the map. The map
combines multiple topics into a structure. The same topic might appear in
different manuals and in multiple sections of the final document. Maps
are XML documents that consist of links to topics and metadata. Maps do not
have content themselves. DITA content (topics and maps) is rendered into PDF
and HTML.
Storing each piece of content in a separate file allows users to check out, revise,
check back in as a new version, and reuse the single source material in multiple
locations.
Next, we review the sample use cases from Chapter 2, Solution examples and
design methodology on page 17. We use the available components of Content
Manager and explain how those components are used to address business
requirements.
A figure illustrates the policy document lifecycle: the author creates a
document for revision, which is stored in the repository as minor versions (0.1
and 0.2), and approval produces a new major version (1.0).
Presentation features:
Native content format
Rendition Engine for PDF output
Solution details
As described in the requirements, authorized users must be able to check out
policy documents from the repository, change the content, and put them back in the
repository as a new version. For this requirement, we use P8 Content Manager
checkout, checkin, and versioning capabilities and Microsoft Office integration.
By adding a version to the repository, we use the content lifecycle management
capabilities of P8 Content Manager, and we assign the lifecycle state Pending
approval to the document. Approvers can check out the document, review the
content, and change it, creating a version of the document. They can return the
policy document to the previous state with comments for the author by adding a
new minor version, or they can approve the document and put it in the approved
state by adding a new major version.
In the requirements, there is the need for the user community to easily locate the
policy documents that are stored in an object store. For that purpose, we use the
object store foldering capabilities. We create logical folders for the policy
documents where users can put them according to their classification (for
example, Human Resources policies or procurement policies). We also provide
users with the ability to search for policy documents based on their properties.
The P8 Content Manager search capabilities allow you to search for documents
by using any combination of document properties (for example, Human
Resources policies that were published two years ago).
According to the requirements, general users must have access only to approved
documents. Authors and reviewers must also have access to draft documents.
We use object store security so that draft documents are visible only to the
users who create, review, and approve the policy documents.
Also, there is a requirement that an approved policy document must be published
to the company's intranet site as a PDF document. For this requirement, we use
IBM Rendition Engine for PDF generation and publishing.
Finally, there is the need to import all previously created policy documents into
Content Manager. For this step, we use the file import capabilities of IBM
Datacap Taskmaster.
A figure illustrates the insurance claim solution: claim documents arrive
through document scanning, faxing, email, and uploading from external sources;
a document notification triggers a new task in claim processing; notifications
create a records folder, close the claim folder, and signal that the claim is
ready for disposition. Documents are stored in a record-enabled object store
(ROS), records in a file plan object store (FPOS), and insurance policy data is
accessed through CFS.

This solution uses the following features:
Process management
Events for notification of content addition on the Content Manager repository
Presentation feature
Content display in native format in claim processing department
Solution details
This complex solution uses many features of P8 Content Manager.
As described in the requirements, after claim notification, a new claim must be
opened in the core Claim Management System. New claim registration on the
core system triggers the creation of a new folder in IBM Enterprise Records. In
that folder, all records objects will be created for the claim-related documents that
are stored in the Content Platform Engine repository. For that requirement, we
use Content Manager APIs for integration with the Claim Management System.
When a paper document arrives, we use IBM Datacap Taskmaster to scan the
document. By using optical character recognition (OCR)/intelligent character
recognition (ICR) capabilities, we retrieve the claim number from the paper
documents. Using that claim number, along with other indexing information,
such as the document type, the document
notification is sent to the Claim Management System. For the notification of the
Claim Management System, we use Content Platform Engine events to execute
certain code when a document is added to the ECM system.
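As an illustration, an event action handler can implement this kind of
notification. The following Content Engine Java API sketch is a skeleton only;
the call to the Claim Management System is a placeholder for your own
integration code:

import com.filenet.api.engine.EventActionHandler;
import com.filenet.api.events.ObjectChangeEvent;
import com.filenet.api.exception.EngineRuntimeException;
import com.filenet.api.util.Id;

public class ClaimNotificationHandler implements EventActionHandler {
    // Fires when the subscribed event (for example, document creation) occurs.
    public void onEvent(ObjectChangeEvent event, Id subscriptionId)
            throws EngineRuntimeException {
        Id sourceId = event.get_SourceObjectId();
        // Placeholder: notify the Claim Management System of the new document,
        // for example, over a web service or a message queue.
        System.out.println("Document added: " + sourceId);
    }
}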
During the claim lifecycle, the Claim Management System updates documents
and folders on the ECM system by using the Content Manager APIs with the
current claim status.
Due to sensitive personal information in the document, only authorized users
must have access to claim documents based on the claim status. For that
requirement, we use object store security and marking sets that control the
access to the document based on a property value (claim status).
According to the requirements, the Claim Management System users must have
access to insurance policy documents that are stored in a different Content
Manager repository. For that requirement, we use Content Federation Services
to integrate the claim repository with the insurance policy repository.
Users must be able to provide additional information about the documents, such
as comments, or highlight a specific portion of a document that contains critical
information. For that capability, we use Content Manager annotations on the
documents.
When the claim is closed in the Claim Management System, a notification needs
to be sent to trigger retention for the claim documents. For that requirement, we
use the IBM Enterprise Records features and APIs.
A figure illustrates the invoice processing solution: paper invoices are
scanned at scanner workstations with the imaging client, pass through
recognition and verification (OCR/ICR/OMR) and indexing at verify workstations,
and are released to the FileNet repository. During processing, each document is
classified; its data is extracted, looked up, and verified; an invoice is
created; the document is released to the repository; and a record is created.
Supporting documents that are added later are reconciled by using the same
property value (for example, the invoice number). Users search for and retrieve
the documents with IBM Content Navigator and view and mark up the documents in
the native format viewer.
Solution details
Based on requirements, all incoming invoices must be scanned on arrival and
data must be exported from the scanned images. For that requirement, we use
Datacap Taskmaster scanning and OCR/ICR capabilities.
Exported data must be validated by authorized users and the data accuracy must
be verified. For that requirement, we use Datacap Taskmaster validation
features.
The scanned images of the invoices must be stored in the Content Manager and
become available to authorized SAP users to link those documents to SAP
transactions. For that requirement, we use IBM Content Collector for SAP, which
provides the functionality for linking Content Manager images to SAP
transactions.
SAP users must be able to view the scanned image of the invoice on the related
SAP transaction. For that requirement, we use IBM Content Collector for SAP
and the native content format viewing presentation feature of Content Manager.
Authorized users must have access to scanned images outside of the SAP
system and must be able to search for those images based on their properties.
For that requirement, we use Content Manager security and search features.
A figure illustrates the email management solution: collection rules are
applied to monitored inboxes on the email server, and selected messages are
stored through the Email Manager server in Content Manager with Records
Manager, where they can be declared into the records file plan. Effective email
management involves declaring email content as business records.
Content ingestion
IBM Content Collector for Email
Solution details
According to the requirements, mail to specific accounts or mail that meets a rule
(for example, contains the word proposal) needs to be stored in an object store.
For that requirement, we use IBM Content Collector for Email. It monitors
mailboxes, retrieves email based on business rules, and stores it in an object
store.
Some of the emails that are considered special based on business rules are
declared as records by using IBM Enterprise Records. Emails are associated
with a retention period based on legislative requirements.
Users must be able to see the emails and their attachments in their mail clients,
but the content must be stored in an object store. For that requirement, we use
the stubbing capabilities of IBM Content Collector for Email. With that capability,
the original content of the email is removed from the email server and is replaced
by a stub that points to the object store where the content is stored.
For the emails that are not declared as records, content deduplication and
content compression are needed, specifically when an email with a large
attachment is sent to multiple recipients within the organization. For that
requirement, we use the content deduplication and content compression feature
of P8 Content Manager.
Authorized users must have access to those emails and attachments that are
declared as records outside the mail clients of the user. In order to locate specific
emails and attachments, searches within the content are required. These
searches are implemented by using the CBR capabilities of P8 Content Manager.
A figure illustrates the training material solution: (1) the author creates
content and publishes it to the repository; (2) the document is published to an
IBM Connections community (through IBM Connections Files) for subject matter
expert review; (3) comments and suggestions are gathered in the IBM Connections
community, and the final version of the content is created (version 1.0); (5)
the document is published to an IBM Connections community for trainees; and (6)
community members collaborate over the training material.
Solution details
Based on the requirements, users must be able to create different types of
training material from Microsoft Word and PDF to video and audio files and store
them in Content Manager. For that requirement, we use the versioning and
foldering capabilities.
13.3 Conclusion
In this chapter, we described the main solution building blocks of an ECM
system. We described features and characteristics of Content Manager and
add-ons that can be used for the implementation of a wide range of applications,
from small departmental applications to large Enterprise Content Management
applications that cross the boundaries of many departments. As a reference, we
used these solution building blocks for the implementation of the five use cases
that we introduced in Chapter 2, Solution examples and design methodology on
page 17.
Related publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about
the topic in this document. Note that some publications referenced in this list
might be available in softcopy only.
IBM FileNet P8 Platform and Architecture, SG24-7667
Introducing IBM FileNet Business Process Manager, SG24-7509 (the product
is currently known as IBM Case Foundation)
Advanced Case Management with IBM Case Manager, SG24-7929
IBM Content Analytics Version 2.2: Discovering Actionable Insight from Your
Content, SG24-7877
Disaster Recovery and Backup Solutions for IBM FileNet P8 Version 4.5.1
Systems, SG24-7744
Federated Content Management: Accessing Content from Disparate
Repositories with IBM Content Federation Services and IBM Content
Integrator, SG24-7742
IBM High Availability Solution for IBM FileNet P8 Systems, SG24-7700
Understanding IBM FileNet Records Manager, SG24-7623
You can search for, view, download or order these documents and other
Redbooks, Redpapers, Web Docs, draft and additional materials, at the following
website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
IBM FileNet Content Manager support website:
http://www.ibm.com/software/data/content-management/filenet-contentmanager/support.html
IBM FileNet P8 Version 5.2 Information Center:
http://publib.boulder.ibm.com/infocenter/p8docs/v5r2m0/index.jsp
URL links to Version 5.1 of the IBM FileNet P8 Information Center:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r1m0/index.jsp?topic=/com.ibm.p8toc.doc/ic-homepage.html
URL links to Version 5.2 of the IBM FileNet P8 Information Center:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=/com.ibm.p8toc.doc/ic-homepage.html
Product documentation for IBM FileNet P8 Platform:
http://www.ibm.com/support/docview.wss?rs=86&uid=swg27036917
IBM FileNet Hardware and Software Requirements guide:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27013654
FileNet P8 Fix Pack Compatibility Matrices:
http://www.ibm.com/support/docview.wss?rs=3278&uid=swg27014734
IBM FileNet Content Manager Fix Central - Provides available fixes:
http://www.ibm.com/support/fixcentral
Information Center - Installation:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/topic/com.ibm.p8.install.doc/p8pti000.htm
Information Center - Supported upgrade paths:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8.planprepare.doc%2Fp8ppu097.htm
Information Center - Database administration installation tasks:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8.planprepare.doc%2Fp8ppi084.htm
Information Center - Planning and preparation for upgrade and migration:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8toc.doc%2Fplanning.htm
The following IBM FileNet P8 technical notices provide additional information:
IBM FileNet Application Engine Files and Registry Keys Technical Notice
IBM FileNet P8 Asynchronous Rules Technical Notice
IBM FileNet Content Engine Component Security Technical Notice
IBM FileNet P8 Directory Service Migration Guide
IBM FileNet P8 Disaster Recovery Technical Notice
IBM FileNet P8 Extensible Authentication Guide
IBM FileNet P8 Process Task Manager Advanced Usage Technical Notice
IBM FileNet P8 Recommendations for Handling Large Numbers of Folders
and Objects Technical Notice
IBM FileNet P8 DB2 Large Object (LOB) Data Type Conversion Procedure
Technical Notice
Although several technical notices were written for the 3.5 version, much of
the content provided is useful for Version 4.0 as well.
Administering Content Platform Engine Sharing Data Sources and
Creating a Database Connection topics:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/index.jsp?topic=%2Fcom.ibm.p8.ce.admin.tasks.doc%2Fp8pcb027.htm
Content storage management and storage farming IBM developerWorks
article:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1003filenetstoragemanagement/index.html
Inheritance proxies and various ways of getting this done:
http://www.ibm.com/support/docview.wss?uid=swg21425080
IBM FileNet P8 Performance Tuning Guide - Provides information about
tuning parameters that can help improve the performance of your IBM FileNet
P8 system:
ftp://ftp.software.ibm.com/software/data/cm/filenet/docs/p8doc/50x/p850_performance_tuning.pdf
IBM FileNet P8 Performance Tuning - There are several pages that provide
additional information for improving the performance of IBM FileNet P8
components:
http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/topic/com.ibm.p8.performance.doc/p8ppt000.htm
Proven Practice: IBM FileNet Deployment Manager 5.1 Data Migrations:
http://www.ibm.com/support/docview.wss?uid=swg21609929
Back cover

IBM FileNet Content Manager Version 5.2 provides full content lifecycle and
extensive document management capabilities for digital content. IBM FileNet
Content Manager is tightly integrated with the family of IBM FileNet products
based on the IBM FileNet P8 technical platform. IBM FileNet Content Manager
serves as the core content management, security management, and storage
management engine for the products.

This IBM Redbooks publication covers the implementation best practices and
recommendations for solutions that use IBM FileNet Content Manager. It
introduces the functions and features of IBM FileNet Content Manager,
common use cases of the product, and a design methodology that provides
implementation guidance from requirements analysis through production use
of the solution. We address administrative topics of an IBM FileNet Content
Manager solution, including deployment, system administration and
maintenance, and troubleshooting.

Topics covered include system architecture, business continuity, and capacity
planning; repository, security, application design, and solution building; and
deployment, system administration, and maintenance.

SG24-7547-01
ISBN 073843812X
IBM Redbooks are developed by
the IBM International Technical
Support Organization. Experts
from IBM, Customers and
Partners from around the world
create timely technical
information based on realistic
scenarios. Specific
recommendations are provided
to help you implement IT
solutions more effectively in
your environment.