i2 Analyze 4.3.5
i2 Analyze
Welcome to the i2® Analyze documentation, where you can learn how to install, deploy, configure, and
upgrade i2 Analyze.
i2 Analyze is an integrated set of secure services and stores that provide search, analysis, and storage
of intelligence data to authorized users. It presents these functions through the web and desktop
applications that connect to it.
Support
The i2 Analyze support page contains links to the release notes and support articles.
i2 Analyze support
Understanding i2 Analyze
i2® Analyze is an integrated set of secure services and stores that provide search, analysis, and storage
of intelligence data to authorized users. It presents these functions through the web and desktop
applications that connect to it.
When deployed with all components active, i2 Analyze provides key features to its users:
• A structured store for intelligence data with services that enable bulk and targeted create, retrieve,
update, and delete operations.
• An extensible framework for connecting to and retrieving intelligence data from sources external to
the platform.
• A searchable store for Analyst's Notebook charts that analysts can use for remote storage, and for
sharing work with their peers.
• A pervasive data model that is optimized for exploring and visualizing the relationships between
records.
• A security model and architecture that together ensure the security of data in motion and at rest.
i2 Analyze enables these features through a range of technologies:
• The i2 Analyze services are deployed in an application on a WebSphere Liberty application server.
• Searching and returning results from intelligence data uses the Apache Solr search index platform.
• The stores for intelligence data and Analyst's Notebook charts are created in a database
management system.
The precise feature set of a particular deployment of i2 Analyze depends on the components that
you decide to include, which depends in turn on the i2 software that you use it with. The different
components of i2 Analyze are subject to licensing agreements.
Understanding
i2 Analyze: Data model on page 5
The i2 Analyze data model governs the structure of the data that i2 Analyze processes. That structure,
consisting of entities and links and the properties they contain, is fundamental to the analytical and
presentational functionality provided by other offerings that i2 Analyze is part of, for example i2 Analyst's
Notebook Premium and i2 Enterprise Insight Analysis.
i2 Analyze: Security model on page 18
All data in i2 Analyze can be secured so that only the users who are supposed to interact with it are able
to do so. Using the i2 Analyze security model, you can decide what access users have to records and
features, based on their membership of user groups.
i2 Analyze: Logging and auditing on page 25
i2 Analyze provides mechanisms for logging two types of information that the system generates during
normal execution. You can control what information is sent to the system logs, and audit the commands
that users invoke.
Configuration tasks
Configuring i2 Analyze on page 103
During implementation of a production deployment, you need to modify the original base deployment to
match the needs of your organization. When the i2 Analyze deployment is in use, you can make further
configuration changes to adjust to changing needs, and to administer the system.
Troubleshooting and support
i2 Analyze support page
i2 Support
Components
The data and security models that i2 Analyze provides are a constant in all deployments. Its other
features are due to components that you can choose to deploy according to your requirements and the
terms of your license.
i2 provides i2 Analyze as a part of other offerings, such as i2 Analyst's Notebook Premium and i2
Enterprise Insight Analysis. These offerings entitle you to deploy different components of i2 Analyze.
• For any of the offerings, you can deploy i2 Analyze with the Chart Store, so that users can store and
share their charts securely.
• You can also deploy i2 Analyze with the i2 Connect gateway, which allows you to develop and
deploy connectors to data sources outside your organization.
• For Enterprise Insight Analysis only, you can also deploy the Information Store, in which users can
retain, process, and analyze intelligence data.
Chart Store
The Chart Store has two purposes in a deployment of i2 Analyze. First, it provides Analyst's Notebook
Premium users with server-based, secure storage for their charts. Second, in a deployment that does
not also include the Information Store, it supports the operation of the i2 Notebook web client.
Users of Analyst's Notebook Premium can upload charts to a Chart Store as easily as they can save
them to their workstations. When they do so, they can choose to share charts with their colleagues, or
take advantage of i2 Analyze's features for organizing, indexing, and securing charts:
• i2 Analyze authenticates users at login and then uses authorization to control access to all stored
charts
• Charts in the Chart Store are subject to the i2 Analyze security model, enabling per-team or per-user
access
• i2 Analyze indexes both the contents and the metadata of uploaded Analyst's Notebook charts, so
that users can search and filter large numbers of charts quickly and easily
Note: For each chart in the store, the generated index contains information from i2 Analyze records
and Analyst's Notebook chart items. i2 Analyze does not index the contents of any documents that
are embedded in a chart.
• The chart metadata itself is configurable, so that the options for categorizing and searching charts
reflect the users' domain.
Users of the i2 Notebook web client can search for and visualize data from an Information Store and
any external sources that are connected through the i2 Connect gateway. However, not all deployments
of i2 Analyze include both of those components. In a deployment that does not have an Information
Store, the i2 Notebook web client requires the Chart Store, which provides short-term storage for the
web charts that users create.
For information on deploying i2 Analyze with the Chart Store, see the deployment documentation.
i2 Connect gateway
When you deploy i2 Analyze with the i2 Connect gateway, you gain the option to develop or purchase
connectors that can acquire intelligence data from external sources. When a user of Analyst's Notebook
Premium or the i2 Notebook web client makes a request to an external data source, the connector
converts results from their original format into the entities, links, and properties of i2 Analyze records.
When i2 Analyze accesses data through a connector, the source is not modified. After the data is vetted
and verified, and provided that the new or modified entities and links have compatible types, Analyst's
Notebook Premium users can upload the records to an Information Store.
For each external data source, an organization can choose between setting up a connector to access
the data in the external location, and ingesting the data into an Information Store. The choice depends
on a number of factors, including the following considerations:
Availability
A connector is typically useful where a data source is constantly available. If a data source is
not always available, ingestion to the Information Store provides users with constant access to a
snapshot of the data.
Currency
If an external data source is updated very frequently, and it is desirable to have the most current
data returned at each request, a connector can be used to achieve this. If regular refreshes are
appropriate for the data, but moment-to-moment currency is not required, the data source could
instead be ingested to the Information Store on a regular schedule.
Quantity
The Information Store is designed for high-volume data. However, an organization might not want
to create a duplicate copy of a very large external data source that can be made available using a
connector.
Terms of use
For reasons of security, privacy, or commercial interest, a third-party provider of data might not
permit the organization to store a copy of the data source in their own environment. In this situation,
a connector must be used to access the external data source.
For more information about how the i2 Connect gateway and connectors provide access to external
data, see Connecting to external data sources. For examples of creating and deploying connectors, see
the open-source project at https://github.com/IBM-i2/Analyze-Connect.
Information Store
The Information Store is a secure, structured store for intelligence data that analysts and other users
can search and visualize. i2 Analyze provides services that enable bulk and targeted create, retrieve,
update, and delete operations on Information Store data. In deployments that include it, the Information
Store also fulfills the functions of the Chart Store on page 3.
Records in the Information Store can be loaded automatically from data sources external to i2 Analyze,
or they can be imported or created by individual Analyst's Notebook Premium users. To cope with the
different requirements of bulk and targeted operations, the store distinguishes between system- and
analyst-governed data:
• System-governed data is loaded into the Information Store in bulk. i2 Analyze includes a toolkit that
accelerates the process of extracting and transforming data from external sources and loading it
into the Information Store. You can arrange for correlation to take place during this process, and for
subsequent loads from the same source to update or delete data that the Information Store has seen
before.
• Analyst-governed data is added to the Information Store through Analyst's Notebook Premium,
when users upload records that they've imported or created directly on their charts. To optimize that
process, i2 Analyze provides matching functionality, so that users are alerted when records that they
create match records that are already in the Information Store.
Subject to authorization, Analyst's Notebook Premium users can modify analyst-governed records in the
Information Store, but system-governed records are read-only. Apart from that distinction, all records in
the Information Store can be searched for, compared, and analyzed in exactly the same way, regardless of
governance.
Users of the i2 Investigate and i2 Notebook web clients cannot modify any records in the Information
Store, but they can search for and visualize records from the store using the tools that those
applications provide.
To understand the structure of data in the Information Store, see The i2 Analyze data model. To learn
how to deploy i2 Analyze with the Information Store, see Creating an example with an Information Store
and Creating a production deployment. For more information about using the ETL toolkit to populate the
Information Store with system-governed data, see Ingesting data into the Information Store.
Data model
The i2 Analyze data model governs the structure of the data that i2 Analyze processes. That structure,
consisting of entities and links and the properties they contain, is fundamental to the analytical and
presentational functionality provided by other offerings that i2 Analyze is part of, for example i2 Analyst's
Notebook Premium and i2 Enterprise Insight Analysis.
[Diagram: a simple relationship in which a Person entity ("Anna Harvey") is connected to a Car entity (a
Ford Mondeo) by a link labeled "Owns".]
Note: Because of the way that these relationships appear in visualizations, the structure is sometimes
called a dumbbell.
Some of the information that users see in a relationship like this one comes from the data itself:
• For the entity on the left, the data includes the property values "Anna", "Harvey", and "5/5/74".
• Similarly, for the entity on the right, the property values include "Ford", "Mondeo", and "2007".
• The data for the link includes a way of identifying the two entities that it connects.
The remainder of the information in the example comes from definitions in an i2 Analyze schema:
• The default icons for the entities, and the names ("First Name", "Manufacturer") and logical types of
their properties, are all defined in an i2 Analyze schema.
• The default label for the link ("Owns") is also defined in an i2 Analyze schema.
In practice, it can be best to make links lightweight and use intermediate entities to model the details
of complex associations. Among other things, this approach allows improved modeling of multi-way
associations, such as a conference call that has multiple participants. The following diagram shows the
difference:
[Diagram: a comparison between modeling an association as a single link between two entities, and
modeling it with an intermediate entity that is linked separately to each participating entity.]
To align your data with an i2 Analyze schema, you must resolve it into the component parts of ELP
relationships. If your source is a relational database, it is possible that your tables each correspond to
particular kinds of entities, while foreign key relationships form the basis for particular kinds of links. If
your source is a delimited text file, it is possible that rows in the file contain the data for one or more
entities and links.
In an i2 Analyze schema, entity types and link types have similar definitions. Among several common
features, entity types and link types both have:
• Identifiers, so that types can refer to each other in rules about their relationships
• Names that users see when they interact with entities and links of particular types
• Definitions of the properties that entities and links of particular types can contain
[Diagram: the definitions of an entity type and a link type. Both contain an identifier; an entity type also
contains an icon.]
As well as the common features, each entity type contains the icon that represents entities with that type
in visualizations. Link types do not contain icons, but they do contain lists of 'from' and 'to' entity type
identifiers. For a link that has a particular link type, these lists determine what entity types the entities
at each end of the link can have. For example, a link of type Calls might be valid in both directions between
two entities of type Person. An Owns link might be valid between a Person and a Car, but would not be
allowed in the opposite direction.
For an i2 Analyze schema to be valid, any identifiers that appear in the 'from' and 'to' lists of link types
must also appear as the identifiers of entity types.
Property types
In an i2 Analyze schema, entity types and link types both contain property types. For an entity or link
record that has a particular type, the property types specify the names and the logical types of the
properties that the entity or link can have.
[Diagram: the definition of a property type, which comprises an identifier, a display name, and a data
type.]
Note: This representation is simplified. A property type can also specify a list of possible values for
properties of that type. Furthermore, it can declare whether a property of that type is mandatory for an
entity or a link that has the containing type.
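To make these definitions concrete, the following fragment sketches how entity types, a link type, and
their property types might fit together. This is an illustrative sketch only: the element and attribute names
are simplified and are not the exact i2 Analyze schema format, so use i2 Analyze Schema Designer to
create and edit real schema files.
<!-- Illustrative sketch only: simplified names, not the exact i2 Analyze schema format -->
<EntityType Id="ET-PERSON" DisplayName="Person" Icon="Person">
  <PropertyType Id="PT-FIRST-NAME" DisplayName="First Name" LogicalType="SINGLE_LINE_STRING"/>
  <PropertyType Id="PT-DATE-OF-BIRTH" DisplayName="Date of Birth" LogicalType="DATE"/>
</EntityType>
<EntityType Id="ET-CAR" DisplayName="Car" Icon="Car">
  <PropertyType Id="PT-MANUFACTURER" DisplayName="Manufacturer" LogicalType="SINGLE_LINE_STRING"/>
</EntityType>
<LinkType Id="LT-OWNS" DisplayName="Owns">
  <!-- The 'from' and 'to' lists reference the identifiers of the entity types that are defined above -->
  <FromEntityTypes>ET-PERSON</FromEntityTypes>
  <ToEntityTypes>ET-CAR</ToEntityTypes>
</LinkType>
In this sketch, a link of type Owns is valid from a Person entity to a Car entity, but not in the opposite
direction, because only ET-PERSON appears in the 'from' list.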
i2 Analyze schemas
An i2 Analyze schema is a statement about the categories of data that a deployment can work with,
and about the relationships that can exist between data in different categories. As a result, i2 Analyze
schemas are responsible for what data looks like when it is visualized; for the types of analysis that
users can perform; and for the structures in which data is stored for retrieval.
Kinds of schema
All deployments of i2 Analyze must contain at least one schema. Some deployments can contain
several schemas. All schemas perform the same role, but which kinds of schema you need depends
partly on how you want your deployment to behave, and partly on which components of i2 Analyze it
uses.
Gateway schema
In a deployment that includes the i2 Connect gateway, each connector can define a schema of its own.
But sometimes, you might have several connectors whose data shares a set of common types. In this
situation, you can create a gateway schema and arrange for all the connectors to use the types that
the gateway schema defines.
Note: Because gateway schemas are not tied to the structure of data storage, they are relatively
easy to modify and redeploy. Developing and testing a gateway schema can be a convenient way to
create a schema that you plan eventually to use for the Information Store.
Connector schema
All data in an i2 Analyze deployment must use entity and link types from a schema. If a connector
does not or cannot use types from an Information Store schema or a gateway schema, then it must
provide its own definitions of the types that it uses.
A connector schema can be appropriate when you are prototyping a connector to a new data
source, or when you know that data from a particular source is subject to frequent changes.
Alternatively, you might be using or creating a connector that is designed for use in multiple i2
Analyze deployments. In that case, it can be helpful for the connector to come with definitions of its
own types.
An i2 Analyze deployment that includes the i2 Connect gateway and connectors to external data
sources is likely to involve several schemas. Client software can visualize data that uses types from
any schema, display its property values, and subject it to structural analysis. However, data can be
uploaded to an Information Store only if it uses types from that store's schema. And users can perform
comparative analysis only between data that uses the same types.
Type conversion
[Diagram: an i2 Analyze deployment in which several connectors reach the i2 Analyze application through
a web server and the i2 Connect gateway. Some connectors define their own connector schemas, others
share a gateway schema, and the Information Store has its own schema. Type conversion mappings
connect the types that the different schemas define.]
To enable i2 Analyze to treat data from all sources similarly, you can arrange for records to be
converted from one type to another as they move through the system. To enable this type conversion,
you provide one-to-one mappings between types from different schemas. For example, if you map
the types from a connector schema (or a gateway schema) to types in the Information Store schema,
then you can upload data from that connector to the Information Store. If you map types from several
connector schemas to the types in a gateway schema (or the types in one gateway schema with those
in another), it becomes possible to use data from one connector as seeds for searches of another.
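Conceptually, each mapping pairs a type identifier from one schema with a type identifier from another,
and pairs their property types in the same way. The fragment below is a hypothetical sketch of that
information only; it does not use the real i2 Analyze mapping configuration format, and all of the
identifiers in it are invented.
<!-- Hypothetical sketch only: not the real i2 Analyze mapping configuration format -->
<TypeMapping sourceSchema="connector-schema" sourceTypeId="ET-PERSON"
             targetSchema="information-store-schema" targetTypeId="ET-PERSON-IS">
  <!-- Property types are mapped one-to-one as well -->
  <PropertyMapping sourcePropertyTypeId="PT-FIRST-NAME" targetPropertyTypeId="PT-FIRST-NAME-IS"/>
</TypeMapping>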
i2 Analyze records
[Diagram: the structure of an i2 Analyze record, which comprises a record identifier, provenance,
metadata, security dimension values, and property values.]
• When an i2 Analyze record has more than one piece of provenance, it can contain all the data from
all the contributing sources. In that case, the property values that the record presents are derived
from the source data.
• Metadata includes the following information:
• Timestamps, which reflect when data in an i2 Analyze record was created or modified
• Source references, which describe the sources that the data in a record came from
• Notes, which users can write to add free-form text commentary to a record
For link records, the metadata also includes information about the strength and direction of the link.
As an example of how to represent a simple entity that contains data from a single source, consider the
following information about a person:
• First name: "Anna"
• Last name: "Harvey"
• Date of birth: 5/5/74
• Hair color: "Blonde"
• Eye color: "Blue"
The following diagram shows one way to represent this information as an i2 Analyze record:
[Diagram: an i2 Analyze record that has a record identifier; the type identifier of the "Person" entity type;
provenance that contains a source identifier; and property values ("Anna", "Harvey", 5/5/74, "Blonde",
"Blue") whose property types carry display names such as "First name", "Last name", "Date of birth", and
"Hair color".]
Note: An i2 Analyze entity record can contain properties that have any of the property types that the
entity type defines. However, one record can contain only one property of each defined type.
The diagram also shows how the property types in the schema only partially determine the contents of
an i2 Analyze record. Some of the other contents are due to the security schema, while others still are
about identification:
• All i2 Analyze records contain security dimension values, which i2 Analyze uses to determine the
access level to the record that a particular user has.
• When they enter the system (through ingestion to the Information Store, or through Analyst's
Notebook Premium), i2 Analyze records receive a universally unique record identifier. This identifier
is permanent for the lifetime of the record. If they have the necessary access level, any user of the
system can use the record identifier to refer to a particular record.
• i2 Analyze records that began life in an external data source contain one or more pieces of
provenance. Each piece has a source identifier that references the data for the record in its original
source. One record can have provenance from more than one source.
Note: For records in an Information Store that were loaded through ingestion, source identifiers
have the additional feature of being unique within the store. These source identifiers are known as
origin identifiers.
• All i2 Analyze records can contain timestamps in their metadata that specify when source data for
the record was created or edited.
• i2 Analyze link records contain an indication of their direction. i2 Analyze considers links to go 'from'
one entity 'to' another. The direction of a link can be with or against that flow, or it can run in both
directions or none.
When i2 Analyze records are stored in an Information Store, they contain a few extra pieces of data:
• All i2 Analyze records retain timestamps in their metadata for when they were first created or
uploaded to the Information Store, for when they were most recently uploaded, and for when they
were last updated.
• All i2 Analyze records can contain a correlation identifier. If two records have the same correlation
identifier, the platform considers that they represent the same real-world object and might merge
them together.
Your data sources are likely to contain some, but not all, of the data that i2 Analyze records require. To
enable an Information Store to ingest your data, or to develop a connector for the i2 Connect gateway,
or to write an import specification, you must provide the extra information to i2 Analyze.
Type identifiers
Every i2 Analyze record contains a type identifier, which is a reference to one of the entity types or link
types that a schema defines. When you create an ingestion mapping file, an import specification, or a
connector, you must arrange for each incoming record to receive the identifier of a type definition.
Every i2 Analyze link record contains two further type identifiers, which are references to the entity
types of the records at the ends of the link. You must arrange for incoming link records to receive these
identifiers as well.
This strong typing of records in i2 Analyze is key to the analytical functions that the platform provides.
It allows users to consider not only the existence of relationships between records, but also the nature
of those relationships. Schemas define exactly what relationships to allow between record types, and i2
Analyze enforces those rules during record creation.
Record identifiers
i2 Analyze records are created when you ingest data into the Information Store, or when a user creates
an item that contains an i2 Analyze record on the chart surface by:
• Importing data through an import specification
• Adding the results of an operation against an external source
• Using an i2 Analyze palette in Analyst's Notebook Premium
At creation, every i2 Analyze record automatically receives a universally unique record identifier that
is permanent for the lifetime of that record. Users and administrators of an i2 Analyze deployment can
use the record identifier as a convenient way to refer to a record in features such as text search and the
Investigate Add-On.
Source identifiers
The role of a source identifier is to refer to the data for a record reproducibly in its original source. If a
record represents data from several sources, then it contains several source identifiers. The nature of a
source identifier depends on the source and the record creation method, and sometimes on whether the
record is a link or an entity.
When you write ingestion mappings or develop connectors for the i2 Connect gateway, you are
responsible for providing the identifying information. For example, if the original source is a relational
database, then entity data is likely to have ready-made source identifiers: table names and primary
keys. Link data can also have ready-made source identifiers, but it might not, especially if the
relationship that the link represents exists only as a foreign key.
If the source of a record is a text file, then the file name might form part of the source identifier, along
with some reference to the data within the file.
Note: Source identifiers are not displayed to end users, but they are a part of the data that records
contain. Avoid including sensitive information such as passwords, or configuration detail such as IP
addresses. Assume that any information you use as part of a source identifier might be read by users of
the system.
Origin identifiers
In general, source identifiers are not certain to be unique within a deployment of i2 Analyze. Several
users might independently retrieve the same data from an external source, resulting in several records
with the same source identifier. However, when you ingest data into the Information Store, i2 Analyze
compares the incoming source identifier with existing records. If it finds a match, i2 Analyze updates a
record instead of creating one.
The source identifiers that records receive during ingestion therefore are unique within i2 Analyze, and
they have a special name in this context. They are called origin identifiers.
Correlation identifiers
The purpose of a correlation identifier is to indicate that the data in an i2 Analyze record pertains to
a particular real-world object. As a result, correlation identifiers are usually related to property values
rather than other identifiers. (For example, two Person records from different sources that contain the
same social security number are likely to contain data about the same real person.) When two records
have the same correlation identifier, they represent the same real-world object, and are candidates to be
merged.
When you ingest data into the Information Store, you can provide a correlation identifier for each
incoming record. For more information about correlation identifiers and how to create them, see
Correlation identifiers.
Security model
All data in i2 Analyze can be secured so that only the users who are supposed to interact with it are able
to do so. Using the i2 Analyze security model, you can decide what access users have to records and
features, based on their membership of user groups.
In i2 Analyze, all users are members of one or more groups. For example, there might be a group
of "administrator" users. There might be separate groups of users for each operational team in
your organization. There might be a group of users with higher security clearance than others. The
assignment of users to groups is handled at login.
Just as users of i2 Analyze are categorized, so too are records, according to a range of deployment-
specific criteria. For example, records might be categorized according to the nature of the information
they contain, or how sensitive that information is.
To make sure that users see only the records that they are allowed to see, every deployment of i2
Analyze has a security schema. The security schema defines the categories into which records must be
placed, and the relationships that determine what access the users in a particular group get to records
in a particular category.
In other words, the i2 Analyze security schema allows you to create rules that say things like, "Users
with low security clearance cannot see sensitive records," or "Users in Team A can see only records
whose source was signals intelligence." i2 Analyze then combines such rules predictably, on a per-
record and per-user basis.
Important: Orthogonal to this security model, i2 Analyze supports blanket controls over the visibility
of records with particular types. You can specify that only certain groups of users can see records of
a specific type, and that all records of that type are invisible to all other users, regardless of security
schema categories. For more information about this functionality, see Item type security.
As a result of the dimension and value definitions in a security schema, it is possible, for example, to mark
a record as containing confidential information derived from a human informant, to be available to users
in Team B.
In some dimensions (such as security classification), the possible values form a sequence from which
each record takes a single value. In these cases, the values act as levels, where each value supersedes
all the values after it. If a record is "Top Secret", it cannot be "Restricted" at the same time.
In dimensions such as operational team, where the values do not form a sequence, records can take
one or more values. A record might be available to users in Team B and Team C.
Every record in an i2 Analyze deployment must have at least one value from each of the security
dimensions in that deployment. There is no such thing as an "optional" dimension. For example:
Record X
• Security Classification: Restricted
• Intelligence Type: Human Informant
• Operational Team: C
Record Y
• Security Classification: Secret
• Intelligence Type: Open Source
• Operational Team: A, B
There are no restrictions on the numbers of dimensions or values that a security schema can define.
Keep in mind, though, that the more dimensions there are, the more complicated it becomes to maintain
the security schema.
i2 Analyze defines the following security access levels:
None
The user cannot see the record or any of its data.
Read only
The user has read-only access to the record and its data.
Update
The user can read, modify, and delete the record and its data.
In an i2 Analyze security schema, the set of security permissions for a user group defines mappings
from dimension values to access levels. Users receive the security access levels that their user group
indicates for the dimension values of a record.
For example, a dimension value might mark a record as containing open source information, and a
security permission might state that members of a certain user group have the "Update" access level on
records with that dimension value. In that case, a user in that group receives the "Update" access level
on that record.
In practice, when a user is a member of several user groups, or a record has multiple dimension values,
it is possible for a user to receive several security access levels from different security permissions. In
these circumstances, i2 Analyze computes a single security access level from all the contributors.
It is not compulsory for a set of permissions for a user group to provide a security access level for every
value of every dimension. Any dimension value that does not appear in a set of permissions receives a
default security access level, according to a set of rules:
• For an unordered dimension, a dimension value that does not appear in the permissions receives the
"None" level.
• For an ordered dimension:
• If the unspecified value comes after a dimension value that does appear, then the unspecified
value receives the same level as the specified value.
• If the unspecified value comes before a dimension value that does appear, then the unspecified
value receives the "None" level.
For example, if a particular set of permissions associates the "Read only" access level with
"Restricted" records (and makes no other setting), then the default access level for "Confidential"
records is "None". However, if the permissions associate the "Read only" access level with
"Confidential" records instead, then users in the same group receive that access level for
"Restricted" records as well.
An i2 Analyze system administrator must arrange the security schema so that all users can receive a
security access level that is not "None" for at least one value in every dimension.
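As an illustration of how dimensions, dimension values, and permissions fit together, the following
fragment sketches a security schema with one ordered and one unordered dimension, and a set of
permissions for a single user group. The element names are simplified for illustration and are not
guaranteed to match the real i2 Analyze security schema format; see The i2 Analyze Security schema for
the authoritative structure.
<!-- Illustrative sketch only: simplified names, not the authoritative security schema format -->
<SecurityDimensions>
  <!-- An ordered dimension: each value supersedes the values that come after it -->
  <Dimension Id="SD-SC" DisplayName="Security Classification" Ordered="true">
    <DimensionValue Id="TS" DisplayName="Top Secret"/>
    <DimensionValue Id="SE" DisplayName="Secret"/>
    <DimensionValue Id="CO" DisplayName="Confidential"/>
    <DimensionValue Id="RE" DisplayName="Restricted"/>
  </Dimension>
  <!-- An unordered dimension: records can take one or more values -->
  <Dimension Id="SD-OT" DisplayName="Operational Team" Ordered="false">
    <DimensionValue Id="OT-A" DisplayName="Team A"/>
    <DimensionValue Id="OT-B" DisplayName="Team B"/>
  </Dimension>
</SecurityDimensions>
<SecurityPermissions>
  <GroupPermissions UserGroup="Team A">
    <!-- 'Restricted' is the last value of the ordered dimension, so by the default rules the earlier,
         more sensitive values give this group the "None" access level -->
    <Permission Dimension="SD-SC" DimensionValue="RE" Level="UPDATE"/>
    <Permission Dimension="SD-OT" DimensionValue="OT-A" Level="UPDATE"/>
    <Permission Dimension="SD-OT" DimensionValue="OT-B" Level="READ_ONLY"/>
  </GroupPermissions>
</SecurityPermissions>
In this sketch, members of the "Team A" group can update a record that is marked "Restricted" and
"Team A", but a record that is marked "Secret" is invisible to them, because unspecified values that come
before "Restricted" default to the "None" access level.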
Consider again Record Y, which has these dimension values:
• Security Classification: Secret
• Intelligence Type: Open Source
• Operational Team: A, B
Then, consider a user in a group whose security permissions cover those dimension values. (It does not
matter whether the permissions are due to one user group or several.)
The following diagram then represents the process for determining the security access level of the user
for this record.
The record has two values in the "Operational Team" dimension that map to different access levels for
this user. At this stage in the calculation, the less restrictive access level ("Update") is taken. However,
the values from the "Security Classification" and "Intelligence Type" dimensions both map to the
"Read only" access level. The final part of the calculation takes the most restrictive level, and the user
therefore has the "Read only" access level on this record.
Security architecture
The i2 Analyze security architecture supports the behavior that the i2 Analyze security model requires.
Any part of the i2 Analyze application can interact with the security architecture to determine what rights
the current user has for the operation that they want to perform.
i2 Analyze authenticates users through a choice of technologies, and determines their access level
for every record that it manages. The i2 Analyze security model bases its behavior on the interaction
between the security dimension values that records have, and the security permissions that user groups
convey.
• Records in i2 Analyze are categorized by receiving values from security dimensions. The values that
a record has from different security dimensions affect whether users can view or edit that record.
• Security permissions apply to groups of users. On a per-group basis, they associate security access
levels with particular dimension values. User group membership is often decided by the team
membership or the security clearance of the users that they contain.
The components of an i2 Analyze deployment interact with the security architecture in the following
ways:
• At login, WebSphere Liberty requires clients to authenticate before they can interact with i2 Analyze.
On successful authentication, the client receives a Lightweight Third-Party Authentication (LTPA)
token in a cookie.
• During normal operation, the client passes the cookie back to the i2 Analyze application, which
enforces data access rights.
The following diagram shows how security works in a typical i2 Analyze deployment:
[Diagram: a typical deployment in which a web browser or desktop client connects through an HTTP
server to the i2 Analyze application, which runs in a servlet container on the WebSphere Liberty server. A
trust association interceptor handles authentication, and a user registry such as LDAP holds information
about users and the groups that they belong to.]
• A user registry (for example, LDAP) provides information about users and their group memberships. A
principal provider then maps that information to group permissions that are defined in the security
permissions section of the i2 Analyze security schema. This mapping is deployment-specific because
the security schema is deployment-specific.
• Code in the i2 Analyze application compares the permissions of the current user with the security
dimension values of records, to determine what access levels the user receives for each record.
The technologies in the diagram are not fixed. For example, it is possible to use any supported store for
the user registry. The requirements are as follows:
• The i2 Analyze application must be able to derive information about a user from the credentials they
present.
• A (potentially) deployment-specific module must map user information onto membership of the
groups that are named in the security permissions section of the i2 Analyze security schema.
If an implementation of the security architecture fulfills these requirements, then it is suitable for use in
an i2 Analyze deployment.
System logging
The components that make up the i2 Analyze server all contain instrumentation that sends information
about the health of the system to log files or the console. You can control the locations of the log files,
and the volume of information that the system sends, by editing the log4j.properties files in the
deployment toolkit.
The information that i2 Analyze can log through this mechanism includes detail about warnings
and errors that users see in their client software, and incremental status reports about long-running
processes such as ingestion.
For more information about system logging, see the deployment and configuration guides for i2 Analyze,
or the Apache Log4j website.
User activity logging
When a user runs an authenticated command against any of its services, i2 Analyze can record
information about the user who ran the command, and full details of the command that they ran.
For example, you might use this functionality to audit the frequency with which different users make
requests for the data that i2 Analyze manages, or to track searches with particular patterns. i2 Analyze
handles user activity logging for the i2 Connect gateway separately from the Information Store and the
Chart Store.
Note: Depending on the volume of data, enabling user activity logging might affect the performance of
i2 Analyze.
Information Store and Chart Store
i2 Analyze supports user activity logging for all of the main analysis operations against the
Information Store and (where relevant) the Chart Store. For example, you can configure separate
logging (or no logging at all) for search, expand, and find path operations. You can also arrange for
logging to occur when records and charts are created or modified.
To audit user activity, including activity due to Analyst's Notebook Premium and the Investigate
Add-On, you write a class that implements the IAuditLogger interface and specify it in the
ApolloServerSettingsMandatory.properties file in the deployment toolkit.
At startup, i2 Analyze calls IAuditLogger to discover what activities to log information about.
Later, it calls again with information such as the time of the activity, the name and security clearance
of the user, and the parameters that they supplied.
For more information and an example of how to implement IAuditLogger, see i2 Analyze
Developer Essentials.
i2 Connect gateway
To log operations against external sources through the i2 Connect gateway, i2 Analyze uses the
same IAuditLogger interface that it uses for the Information Store and the Chart Store. However, all
such operations are logged through a single method on the IAuditLogger interface.
Deploying i2 Analyze
Deployments of i2 Analyze differ in the functionality that they support, the components that they employ,
and how you intend them to be used. The process by which you deploy i2 Analyze changes significantly,
depending on whether your target is an example or a production environment.
Deployment information
Deployment types on page 27
Before you start a deployment of i2 Analyze, choose the type of deployment that you want to create.
Use the following information to ensure that you choose the correct type of deployment for your
requirements.
Deployment tasks
Creating an example deployment on page 27
To understand what i2 Analyze is, and to demonstrate the features of the system, you can create an
example deployment.
Creating a production deployment on page 37
The process of creating a production deployment is separated into a number of different activities, which
you complete in an iterative process. The suggested process involves the creation and retention of
several environments, each one focused on different aspects of a production deployment of i2 Analyze.
Troubleshooting and support
i2 Analyze support page
i2 Support
Deployment types
Before you start a deployment of i2 Analyze, choose the type of deployment that you want to create.
Use the following information to ensure that you choose the correct type of deployment for your
requirements.
Example deployment
You can use an example deployment to learn about i2 Analyze, demonstrate the features of the
system, and ensure that any software prerequisites are installed correctly on a single server.
When you create an example deployment, the deployment toolkit populates all of the mandatory
configuration settings with default values and deploys the system. The deployment uses an example
i2 Analyze schema, security schema, and data. Some configuration settings that are not mandatory
for deployment are also populated to demonstrate extra features of the system.
For more information about example deployments, see Creating an example deployment on page
27.
Production deployment
A production deployment is available to analysts to complete mission critical analysis on real-world
data. When you decide to create a production deployment of i2 Analyze, you must start from a clean
installation of i2 Analyze.
The process for creating a production deployment involves a number of different deployment and
configuration activities. As part of the process, you must develop an i2 Analyze schema and security
schema for your data.
For more information about production deployments, see Creating a production deployment on page
37.
setup -t deployExample
setup -t start
When you start i2® Analyze, the URI that you can use to connect to the deployment is displayed in the
console. For example:
Web application available (default_host): http://host_name:9082/opal.
Install Analyst's Notebook Premium and connect to your deployment. For more information, see
Connecting clients on page 96.
setup -t deployExample
setup -t start
7. Optional: To populate your Information Store with the provided example data for the law-
enforcement-schema.xml schema, run the following command:
setup -t ingestExampleData
When you start i2® Analyze, the URI that you can use to connect to the deployment is displayed in the
console. For example:
Web application available (default_host): http://host_name:9082/opal.
Install Analyst's Notebook Premium or open a web browser and connect to your deployment. For more
information, see Connecting clients on page 96.
setup -t deployExample
4. Install any dependencies and start the server that hosts the example connector.
Note: The example connector uses port number 3700. Ensure that no other processes are using
this port number before you start the connector.
a) In a command prompt, navigate to the toolkit\examples\connectors\example-
connector directory.
b) To install the dependencies that are required for the example connector, run the following
command:
npm install
c) To start the example connector, run the following command:
npm start
5. Start i2 Analyze.
a) Open a command prompt and navigate to the toolkit\scripts directory.
b) To start i2 Analyze, run the following command:
setup -t start
When you start i2 Analyze, the URI that you can use to connect to it from Analyst's Notebook Premium
is displayed in the console. For example:
Web application available (default_host): http://host_name:9082/opal.
Install Analyst's Notebook Premium and connect to your deployment. For more information, see
Connecting clients on page 96.
Production deployments of i2 Analyze use client-authenticated SSL communication between i2 Analyze
and any connectors. The example deployment does not use it, and so Analyst's Notebook Premium
displays a warning to that effect when you open the external searches window. For more information
about configuring client-authenticated SSL, see Client-authenticated Secure Sockets Layer with the i2
Connect gateway.
You can create your own connectors to use with the deployment of i2 Analyze. For more information,
see i2 Analyze and the i2 Connect gateway.
Creating an example with the Chart Store and the i2 Connect gateway
An installation of i2 Analyze includes example settings for deploying the server with the Chart Store and
support for the i2 Connect gateway. With these settings, Analyst's Notebook Premium users can upload
charts to the Chart Store, while both they and i2 Notebook web client users can search for and retrieve
data from an example external data source through the i2 Connect gateway.
Install i2 Analyze and any software prerequisites. For more information, see Installing i2 Analyze. To
deploy the Chart Store, you need either IBM® Db2® or Microsoft™ SQL Server 2019. You do not need
IBM HTTP Server.
If you are using SQL Server, download the Microsoft™ JDBC Driver 7.4 for SQL Server archive
from https://www.microsoft.com/en-us/download/details.aspx?id=58505. Extract the contents of the
download, and locate the sqljdbc_7.4\enu\mssql-jdbc-7.4.1.jre11.jar file.
Before you create the example deployment, you must download and install Node.js to host the example
connector. Download Node.js for your operating system from: https://nodejs.org/en/download/. You can
install Node.js with the default settings.
The following procedure describes how to create an example deployment of i2 Analyze with the Chart
Store and the i2 Connect gateway. To use any deployment of i2 Analyze with the i2 Connect gateway,
you must obtain or create a connector to the external data source that you want to search. The i2
Analyze toolkit contains an example configuration for the deployment, and an example connector with
example data. The deployExample task generates the default values for the mandatory settings and
deploys the platform.
The example deployment demonstrates a working i2 Analyze system with an example user so that you
can log in.
In the example deployment, i2 Analyze runs with the example security schema and matching Liberty
security groups and users. The example user has the following credentials:
• The user name is Jenny
• The password is Jenny
The example deployment uses the chart-storage-schema.xml schema file, with the associated
chart-storage-schema-charting-schemes.xml file as the charting scheme.
1. Create the configuration directory:
a) Navigate to the \toolkit\examples\configurations\chart-storage-daod directory.
This directory contains the preconfigured files that you need to deploy a system that uses the i2
Connect gateway to connect to an external data source, and the Chart Store to store Analyst's
Notebook charts and support the i2 Notebook web client.
b) Copy the configuration directory to the root of the toolkit.
For example, C:\IBM\i2analyze\toolkit\configuration.
If you are using SQL Server as your database management system, you must complete extra
configuration actions to deploy the example.
2. Copy the example topology.xml file for SQL Server from the toolkit\configuration
\examples\topology\sqlserver to the toolkit\configuration\environment directory.
Overwrite the existing topology.xml file in the destination directory.
3. Copy the mssql-jdbc-7.4.1.jre11.jar file that you downloaded to the toolkit
\configuration\environment\common\jdbc-drivers directory.
Regardless of your database management system, you must complete the following steps after you
create the configuration directory.
setup -t deployExample
7. Install any dependencies and start the server that hosts the example connector.
Note: The example connector uses port number 3700. Ensure that no other processes are using
this port number before you start the connector.
a) In a command prompt, navigate to the toolkit\examples\connectors\example-
connector directory.
b) To install the dependencies that are required for the example connector, run the following
command:
npm install
c) To start the example connector, run the following command:
npm start
8. Start i2 Analyze.
a) Open a command prompt and navigate to the toolkit\scripts directory.
b) To start i2 Analyze, run the following command:
setup -t start
When you start i2 Analyze, the URI that you can use to connect to the deployment is displayed in the
console. For example:
Web application available (default_host): http://host_name:9082/opal.
Install Analyst's Notebook Premium or open a web browser and connect to your deployment. For more
information, see Connecting clients on page 96.
Production deployments of i2 Analyze use client-authenticated SSL communication between i2 Analyze
and any connectors. The example deployment does not use it, and so Analyst's Notebook Premium
displays a warning to that effect when you open the external searches window. For more information
about configuring client-authenticated SSL, see Client-authenticated Secure Sockets Layer with the i2
Connect gateway.
You can create your own connectors to use with the deployment of i2 Analyze. For more information,
see i2 Analyze and the i2 Connect gateway.
This directory contains the preconfigured files that you need to deploy a system that uses the i2
Connect gateway to connect to an external data source, and the Information Store to store data
and Analyst's Notebook charts, and to support the i2 Notebook web client.
b) Copy the configuration directory to the root of the toolkit.
For example, C:\IBM\i2analyze\toolkit\configuration.
If you are using SQL Server as your database management system, you must complete extra
configuration actions to deploy the example.
2. Copy the example topology.xml file for SQL Server from the toolkit\configuration
\examples\topology\sqlserver to the toolkit\configuration\environment directory.
Overwrite the existing topology.xml file in the destination directory.
3. Copy the mssql-jdbc-7.4.1.jre11.jar file that you downloaded to the toolkit
\configuration\environment\common\jdbc-drivers directory.
Regardless of your database management system, you must complete the following steps after you
create the configuration directory.
4. Specify the credentials to use for the deployment.
a) Using a text editor, open the toolkit\configuration\environment
\credentials.properties file.
b) Enter the user name and password to use with the database.
c) Enter a user name and password to use for Solr.
d) Enter a password to use to encrypt LTPA tokens.
e) Save and close the credentials.properties file.
5. If you are using IBM Db2 11.5.6 Fix Pack 0 or later, you might need to update the port number that is
used in the topology.xml file.
By default, i2 Analyze is deployed to connect to Db2 using port 50000. In version 11.5.6 Fix Pack 0
and later, the default port that Db2 uses is changed to 25000. For more information about the ports
that Db2 uses, see Db2 server TCP/IP port numbers.
a) Using an XML editor, open the toolkit\configuration\environment\topology.xml file.
b) In the <database> element, set the value of the port-number attribute to 25000, as shown in the sketch after this step.
c) Save and close the topology.xml file.
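For example, after the change, the <database> element might look similar to the following sketch. The
element's other attributes are omitted here; leave them as they are in your topology.xml file.
<!-- Other attributes of the <database> element are omitted from this sketch -->
<database port-number="25000" />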
6. Run the setup script to create the example deployment.
a) Open a command prompt and navigate to the toolkit\scripts directory.
b) To deploy the example, run the following command:
setup -t deployExample
7. Install any dependencies and start the server that hosts the example connector.
Note: The example connector uses port number 3700. Ensure that no other processes are using
this port number before you start the connector.
a) In a command prompt, navigate to the toolkit\examples\connectors\example-
connector directory.
b) To install the dependencies that are required for the example connector, run the following
command:
npm install
c) To start the example connector, run the following command:
npm start
8. Start i2 Analyze.
a) Open a command prompt and navigate to the toolkit\scripts directory.
b) To start i2 Analyze, run the following command:
setup -t start
9. Optional: To populate your Information Store with the provided example data for the law-
enforcement-schema.xml schema, run the following command:
setup -t ingestExampleData
When you start i2 Analyze, the URI that you can use to connect to the deployment is displayed in the
console. For example:
Web application available (default_host): http://host_name:9082/opal.
Install Analyst's Notebook Premium or open a web browser and connect to your deployment. For more
information, see Connecting clients on page 96.
Production deployments of i2 Analyze use client-authenticated SSL communication between i2 Analyze
and any connectors. The example deployment does not use it, and so Analyst's Notebook Premium
displays a warning to that effect when you open the external searches window. For more information
about configuring client-authenticated SSL, see Client-authenticated Secure Sockets Layer with the i2
Connect gateway.
You can create your own connectors to use with the deployment of i2 Analyze. For more information,
see i2 Analyze and the i2 Connect gateway.
Before you can start to install and deploy i2 Analyze, you must first understand how i2 Analyze fits into
your organization:
• Understand what i2 Analyze is.
• Understand the requirements of the deployment, and align these to a deployment pattern.
• Understand the data and security models of the environment that i2 Analyze is deployed in.
For more information, see the Understanding section.
Deploying
After you identify the requirements of the deployment, you can start to create the production
deployment. The process of deploying i2 Analyze is completed in three phases that are explained in
Deployment phases and environments on page 38.
Note:
If your planned deployment of i2 Analyze is to a small workgroup and includes only the Chart Store,
an example deployment with the Chart Store and custom security settings might offer sufficient
performance. For more information about example deployments, see Creating an example with the
Chart Store on page 28.
[Diagram: the environments that you use during the process: the schema development environment, the
configuration development environment, the pre-production environment, the test environment, and the
production environment.]
There are three phases in the process to deploy i2 Analyze into production.
Development
The development phase is where you configure i2 Analyze to meet the requirements of the final
deployment. In this phase, you develop the configuration in an iterative process that involves a
number of configuration changes and deployments of the system. During this phase, the lifetime of
a deployment is short.
Test
The test phase is where you deploy i2 Analyze for testing. In the test phase, you deploy i2 Analyze
with the configuration from the development phase and perform comprehensive testing of the
system.
Production
The production phase is where you deploy i2 Analyze into production. i2 Analyze is deployed with
the configuration that you tested in the test phase. In production, the deployment is fully operational
and used by analysts to complete mission critical work.
To start creating your production deployment, complete the instructions in Schema development
environment on page 39.
[Diagram: the environments that you use during the process: the schema development environment, the
configuration development environment, the pre-production environment, the test environment, and the
production environment.]
In an environment that includes a database, significant changes to the schema or the security schema
can be time-consuming. Destructive changes to either schema require you to rebuild the database.
The purpose of the schema development environment is to enable rapid iteration. When you apply the
schemas that you create here to the Information Store in the configuration development environment,
they are less likely to need significant changes.
setup -t generateDefaults
The environment.properties and topology.xml files are modified by this toolkit task. For more
information about the default values that are provided, see Configuration files reference.
5. In i2 Analyze Schema Designer, either create a new schema or open one of the examples to modify.
For information about creating or modifying schema files, see Creating schemas and Charting
schemes.
Example schema files are located in subdirectories of the configuration\examples\ directory.
For more information, see Example schemas on page 60.
Note:
If your planned deployment does not include the Information Store and you do not intend to use
a gateway schema, your choice of example here is not important. You just need to set up a valid
development environment.
a) Save the initial version of your schema in the configuration\fragments\common\WEB-INF
\classes directory.
A charting scheme file is saved in the same location when you save the schema.
b) Keep the schema file open in Schema Designer so that you can make more modifications after
you deploy i2 Analyze.
6. Copy the example security schema to the configuration\fragments\common\WEB-INF
\classes directory.
The example security schema file is located in the configuration\examples\security-
schema directory. For more information, see Example schemas on page 60.
7. In the configuration\fragments\common\WEB-INF\classes
\ApolloServerSettingsMandatory.properties file, set the values of the following settings to
the file names of your schema, charting scheme, and security schema:
• Gateway.External.SchemaResource
• Gateway.External.ChartingSchemesResource
• DynamicSecuritySchemaResource
For example, Gateway.External.SchemaResource=custom-schema.xml
8. Deploy i2 Analyze:
setup -t deploy
9. Create an example user so that you can log in to the deployment:
setup -t ensureExampleUserRegistry
The user has the user name 'Jenny' and the password 'Jenny'.
10. Start i2 Analyze:
setup -t startLiberty
When you start i2 Analyze, the URI that you can use to connect to it from Analyst's Notebook
Premium is displayed in the console. For example:
Web application available (default_host): http://host_name:9082/opal.
11. Connect to your deployment by using Analyst's Notebook Premium. Log in with the example 'Jenny'
user.
For more information, see Connecting i2 Analyst's Notebook Premium to i2 Analyze.
12. In Analyst's Notebook Premium, create items on the chart to visualize the i2 Analyze schema by
using the Gateway palette.
After you deploy i2 Analyze with the initial schema files, you can develop them for your own data
requirements:
setup -t updateConnectorsConfiguration
3. Start i2 Analyze:
setup -t restartLiberty
4. Test the changes to the schema and charting scheme by connecting to your deployment in Analyst's
Notebook Premium and modeling representative data on the chart by using the Gateway palette.
Repeat this process until the schema and the charting scheme meet your requirements. When your
schema development is complete, store your schema and charting scheme files in a version control
system.
This is not the final chance to modify the schema, but it should now contain all the entity and link types
that you require, and most of the property types.
When you are satisfied with the schema, develop the security schema for the deployment. For more
information, see Updating the security schema on page 42.
1. In an XML editor, either create a new security schema or open the one that is deployed in the
schema development environment.
For information about creating or modifying a security schema file for i2 Analyze, see The i2 Analyze
Security schema.
a) Save your security schema in the toolkit\configuration\fragments\common\WEB-INF
\classes directory.
b) Ensure that your security schema file is specified in configuration\fragments\common
\WEB-INF\classes\ApolloServerSettingsMandatory.properties.
After you modify your security schema, update the deployment with your changes.
2. Update and redeploy the system:
setup -t updateSecuritySchema
setup -t deployLiberty
3. Start i2 Analyze:
setup -t restartLiberty
4. If you changed the names of the user groups in the security schema, update the basic user registry
to match the new names.
For more information, see Configuring the Liberty user registry.
5. Test the changes to the security schema by connecting to the deployment in Analyst's Notebook
Premium as different users and changing the security permissions on records that you create.
Repeat this process until your security schema meets your requirements. When your security schema
development is complete, store your security schema and Liberty user registry files in a version control
system.
This is not the final time that you can modify the security schema, but you should aim to have most of
the security dimensions and dimension values defined.
After you finish developing your schema files, you can move to the next environment. For more
information, see Configuration development environment on page 44.
After you create your schemas, you can use a configuration development environment to configure i2
Analyze in a single server environment. In the configuration development environment, you deploy i2
Analyze with the same data store as your intended production system.
In this deployment of i2 Analyze, you use any schema files that you previously created. If your intended
production system includes the Information Store and you have not created your schema files, complete
the instructions in Schema development environment on page 39.
setup -t generateDefaults
The environment.properties and topology.xml files are modified by this toolkit task. For more
information about the default values that are provided, see Configuration files reference.
8. Specify the i2 Analyze schema, charting scheme, and security schema that you previously prepared.
a) Copy your i2 Analyze schema, charting scheme, and security schema files to the
configuration\fragments\common\WEB-INF\classes directory.
b) In the configuration\fragments\common\WEB-INF\classes
\ApolloServerSettingsMandatory.properties file, set the file names of the schema,
charting scheme, and security schema.
• If your deployment includes the Information Store or the Chart Store, set SchemaResource
and ChartingSchemesResource.
• If your deployment includes the i2 Connect gateway and you have developed a gateway schema, set Gateway.External.SchemaResource and Gateway.External.ChartingSchemesResource.
• In both cases, set DynamicSecuritySchemaResource. A sketch of the combined settings follows this list.
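For example, in a deployment that includes the Information Store and a gateway schema, the combined settings might look like the following sketch, where the file names are placeholders for your own files:
SchemaResource=infostore-schema.xml
ChartingSchemesResource=infostore-schema-charting-schemes.xml
Gateway.External.SchemaResource=external-gateway-schema.xml
Gateway.External.ChartingSchemesResource=external-gateway-charting-schemes.xml
DynamicSecuritySchemaResource=security-schema.xml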
9. Deploy i2 Analyze:
setup -t deploy
10. If you created a schema development environment, copy the user registry file from that environment
to the deploy\wlp\usr\shared\config directory in the new environment.
If you did not create a schema development environment, execute the following command to create
an example user that you can use to log in:
setup -t ensureExampleUserRegistry
The user has the user name 'Jenny' and the password 'Jenny'.
11. Start i2 Analyze:
setup -t start
The URI that users must specify is displayed in the console. For example:
Web application available (default_host): http://host_name:9082/opal/
After you deploy i2 Analyze, you can begin Developing the i2 Analyze configuration on page 46.
1. To develop the process for ingesting data into the Information Store, refer to Ingesting data into the
Information Store.
2. To develop connections to external data sources, refer to Connecting to external data sources.
3. Ensure that analysts can import and create representative records in Analyst's Notebook Premium.
Then, if required, upload records to the Information Store. For more information, see Import data and
Create i2 Analyze chart items.
When you develop the process to get your data into i2 Analyze, you might realize that your schema
or security schema is not correct for your data. You can update the deployed schemas to better
represent your data and security model. Some changes require you to remove and recreate the
underlying database.
4. To update your deployed Information Store or Chart Store schema, refer to Changing the schema.
5. To update your deployed security schema, refer to Configuring the security schema.
After you develop the mechanisms for making data available to analysts, you can configure how
analysts interact with the data when they use the system. The list of things that you can configure
includes:
6. To configure which features or types of commands analysts can access, refer to Controlling access
to features.
7. To configure how analysts search for information, and the options that are available to them, refer to
Configuring search.
8. To configure how analysts can identify matching records, refer to Configuring matching.
9. To configure user security, refer to Configure user authentication and authorization.
For more information about the configuration changes that you can make, see Configuring i2
Analyze.
After you configure your deployment sufficiently in the single-server environment, you can move to
another environment that is more representative of the production deployment. Keep your configuration
development environment in place so that you can access the configuration directory in later
phases of the production process, and return to it to make further configuration changes.
Next, create the Pre-production environment on page 48.
Pre-production environment
Your target deployment topology is probably different from the single-server configuration development
environment. In the pre-production environment, deploy i2 Analyze in the deployment topology that
matches your intended production deployment.
After you develop the i2 Analyze configuration, you can use a more representative pre-production
environment for aspects of the configuration that rely on environment-specific variables. In this
environment, deploy i2 Analyze in the same physical deployment topology as the target production
deployment. For example, you might modify the configuration to deploy the components of i2 Analyze
on multiple servers or with high availability.
Test environment
The second phase of creating a production deployment focuses on testing. This phase is important,
because it enables you to identify any changes to the environment or configuration that must be
completed before you deploy into production.
You use a test environment to test the deployment to ensure that it meets the requirements of the
production deployment. The test environment should match the production environment as closely as
possible. In the test environment, perform comprehensive testing of the deployment against the final
requirements with a selection of the users and a sample data set.
Production environment
The third phase of the deployment process focuses on deploying into production. In this phase you
make your deployment of i2 Analyze available to users.
You use a production environment to host the deployment in production. When you have a configuration
of i2 Analyze that passed your test phase, you can deploy i2 Analyze with that configuration into your
production environment.
In the production environment, run the tests again to confirm that the deployment is working
successfully. If these tests are successful, you can make the deployment available to users. If the tests
are not successful, you must return to your test or development environment to make any necessary
changes. Then, complete the test phase once more.
If you are planning to deploy with high availability, complete the instructions in i2 Analyze with high
availability on page 77 rather than the steps on this page.
1. Install any prerequisite software to prepare your servers for the production environment.
For the production environment, use the same deployment topology as in your test environment. For
more information, see Deployment topologies on page 54.
2. Copy the toolkit\configuration directory from the test environment to the toolkit directory
at the root of the deployment toolkit on the Liberty server in the production environment.
3. Update the values for any configuration settings that are specific to the environment.
environment.properties, http-server.properties, and topology.xml contain settings
that you might need to update. For more information, see Configuration files reference.
4. If your deployment uses a proxy server or load balancer to route requests from clients, ensure that it
is configured for use with i2 Analyze. Specify the URI that clients use to connect to i2 Analyze.
For more information, see Deploying a proxy server for use with i2 Analyze on page 75.
5. Deploy and start i2 Analyze:
• In a single-server topology, or with a remote database only, see Deploying i2 Analyze on page
89.
• In a multiple-server topology, see Deploying i2 Analyze on multiple servers on page 90.
After you deploy i2 Analyze, you can replicate any configuration changes that are not stored in the
configuration of i2 Analyze.
6. Configure Liberty security for your environment. To do this, repeat any changes that you made to the
Liberty configuration in the previous environment.
This might involve copying the user registry file, or updating the server.xml file.
7. Complete any configuration changes in the Information Store database.
a) If you created any rules or schedules to delete records by rule, replicate the rules and schedules
in the current environment.
b) If you created any merged property values definition views for your ingestion process, replicate
the view definitions in the current environment.
Complete testing of the deployment to ensure that it is working in the production environment before you
make i2 Analyze available to users.
Deployment resources
The deployment resources section contains information that is referenced elsewhere in the deployment
section. The information is used in many places throughout the deployment process.
Deployment topologies
You can use the deployment toolkit to deploy i2 Analyze in a number of different physical topologies.
Throughout the process of creating a production deployment, you might use several of them depending
on your purpose at the time.
The following diagrams show the servers that are used in each deployment topology, the prerequisites
that are required on each server, and the components of i2 Analyze that are deployed.
You can deploy i2 Analyze in the following physical deployment topologies:
• i2 Connect gateway only (single server) on page 55
In the i2 Connect gateway only deployment, all of the components of i2 Analyze are on the same server.
No database management system is required.
On the i2 Analyze server, the prerequisites are the i2 Analyze toolkit and your configuration; the deployed components are Liberty (hosting the i2 Analyze application), a ZooKeeper host, and Solr nodes.
Single server
In the single-server topology, all of the components of i2 Analyze are on the same server.
On the i2 Analyze server, the prerequisites include the i2 Analyze toolkit and IBM HTTP Server; the deployed components include Liberty, a ZooKeeper host, and Solr nodes.
Multiple servers
In the multiple-server topology, Liberty (with a database management system client), Solr, ZooKeeper, and the Information Store (or the Chart Store) are each deployed on their own server.
You can also have a multiple-server deployment topology in which any number of the components
are located on the same server.
On the Liberty server, install i2 Analyze, the database management system client, and optionally the
HTTP server:
• Installing i2 Analyze
• Db2 client or SQL Server client
• HTTP Server
On your database management server, install your database management system:
• Installing Db2 or Installing SQL Server
On each Solr and ZooKeeper server, install i2 Analyze:
• Installing i2 Analyze
Directories
The deployment toolkit contains files in several directories. On most occasions, you need to interact with
only three of the directories:
• The examples directory includes the base configurations that you can use to create example
deployments, example data, and an example i2 Connect connector.
• The configuration directory contains files that you must update with information specific to your
deployment.
Note: When the deployment toolkit is first installed, this directory does not exist. At the start of
deployment, you use one of the base configurations to create this directory.
• The scripts directory contains the setup script that you use to deploy and configure i2 Analyze.
The setup script completes tasks that apply configuration and other changes to deployments of i2
Analyze. For a list of available tasks and other information, refer to Deployment toolkit tasks on page
60.
Alternatively, on the i2 Analyze server, open a command prompt and navigate to toolkit\scripts
and run one of the following commands:
setup -h
The -h argument displays the usage, common tasks, and examples of use for the setup script.
setup -a
The -a argument displays the same content as when you use -h, and a list of additional tasks.
Note: On Linux, whenever you run the setup script you must prefix its name with './'. For example,
./setup -t deploy.
Example schemas
The i2 Analyze deployment toolkit includes example schema files that you can use as a starting point for
your own schemas.
The deployment toolkit includes five pairs of example i2 Analyze schemas and associated charting
schemes in the toolkit\examples\schemas directory:
Law Enforcement
The law enforcement schema deals with criminal activity. It contains entity and link types that are
designed to track connections within criminal networks.
Commercial Insurance
The commercial insurance schema deals with fraud in a commercial setting. It contains entity
and link types that are designed to track financial transactions such as credit card payments and
insurance claims.
Military
The military schema helps with military intelligence tracking. It contains entity and link types that
target military operations.
Signals Intelligence
The signals intelligence schema focuses particularly on the cellphones and cell towers that are
involved in mobile telecommunications, and on the calls that take place between them.
Chart Storage
The chart storage schema is not an example of a particular domain, but rather the starting point for
any deployment of i2 Analyze that includes only the Chart Store.
The example security schema files specify a number of security dimensions and dimension values, and
security groups for users.
After you create the toolkit\configuration directory by copying it from the base configuration
of your choice, you can find an appropriate example security schema file in the toolkit
\configuration\examples\security-schema directory.
usage: setup [-h] [-a] [-s SERVER] [-w WAR | -id IDENTIFIERS | -hn HOST | --all] [--force] -t TASK
[--option-name...]
Examples of use:
• setup -t deployExample
• setup -t ingestExampleData
• setup -t deploy
• setup -t start
• setup -t configSummary
Examples of use:
• setup -t upgrade
• setup -t upgradeConfiguration
• setup -t upgradeSolr -hn "example.solr.hostname"
The "upgradeZookeeper", "upgradeSolr", "upgradeDatabases", and "upgradeSolrCollections" tasks
support an optional -hn argument that restricts their effect to a single host.
The "clearData" and "clearSearchIndex" tasks support an optional -co argument that restricts their effect
to a single Solr collection.
Example of use:
• setup -t clearData -co "solr.collection.id"
* The exact behavior of these tasks may change depending on the chosen database engine.
Examples of use:
• setup -t configureHttpServer
• setup -t replayFromTimestamp
In addition to the start, stop and restart tasks, the following tasks are available:
Examples of use:
• setup -t startSolrAndZk
• setup -t stopZkHosts -id "1,3"
• setup -t restartSolrNodes -id node1
• setup -t startSolrNodes -hn "example.solr.hostname"
The "SolrNodes" and "ZkHosts" tasks support an optional -id argument.
The comma-separated list of identifiers that you specify restricts the task to the nodes and hosts with
matching identifiers in the topology.
The "SolrNodes" and "ZkHosts" tasks support an optional -hn argument that restricts their effect to a
single host.
Examples of use:
• setup -t createSolrNodes -hn "example.solr.hostname"
• setup -t createSolrCollections -co "solr.collection.id"
These tasks support an optional -hn argument that restricts their effect to a single host.
The "createAndUploadSolrConfig", "createSolrCollections", and "deleteSolrCollections" tasks support an
optional -co argument that restricts their effect to a single Solr collection.
The following tasks manage main indexes and match indexes. ZooKeeper and the application
server must be running for these commands to succeed.
In the credentials.properties file, you specify credentials for the database, LTPA keys, and the Solr search platform.
After you specify the credentials, you might need to change the values in the
credentials.properties file from the values that are used in development or test deployments.
For example, your database management system might require a different user name and password in
a test or production system.
When you deploy i2 Analyze, the passwords in the credentials.properties file are encoded.
Database
For each database that is identified in topology.xml, you must specify a user name and a
password in the credentials.properties file. The setup script uses this information to
authenticate with the database.
Note: The user that you specify must have privileges to create and populate databases in the
database management system.
The database credentials are stored in the following format:
db.infostore.user-name=user name
db.infostore.password=password
For example:
db.infostore.user-name=admin
db.infostore.password=password
Note: The db.truststore.password credential is used only when you configure the connection
between the database and Liberty to use SSL. If you are not using SSL to secure this connection,
you do not need to specify a value for the db.truststore.password credential. For more
information about configuring SSL, see Configure Secure Sockets Layer with i2 Analyze.
LTPA keys
You must provide a value for the ltpakeys.password property. This value is used by the system
to encrypt LTPA tokens.
• For a stand-alone deployment of i2 Analyze, you can specify any value as the password.
• For a deployment of i2 Analyze that uses LTPA tokens to authenticate with other systems, you
must specify the same password that those systems use.
Solr search platform
The Solr search platform is used to search data in the Information Store. You must provide values
for the solr.user-name and solr.password properties. Any Solr indexes are created when
i2 Analyze is first deployed, and the values that you provide here become the Solr user name and
password.
If you have already deployed i2 Analyze, and you want to change the Solr password, new properties
must be created in the following format:
solr.user-name.new=value
solr.password.new=value
For example:
#Solr credentials
#The user name and password for solr to use once deployed
solr.user-name=admin
solr.password={enc}E3FGHjYUI2A\=
solr.user-name.new=admin1
solr.password.new=password
The value for node-name is the name of the node to create in the Db2 node directory. The value
of the node-name attribute must start with a letter, and have fewer than nine characters. For
more information about naming in Db2, see Naming conventions.
The value for os-type is the operating system of the remote Db2 server. The value of the os-
type attribute must be one of the following values: AIX, UNIX, or WIN.
Note: The value of the instance-name attribute must match the instance name of the remote Db2
instance.
You can use the db2level command to get the name of your remote instance. For more
information, see db2level - Show Db2 service level command.
2. Edit the configuration\environment\opal-server\environment.properties file, to
specify the details of your remote and local instance of Db2.
a) Ensure that the value of the db.installation.dir property is set for the local instance of Db2
or Data Server Client on the Liberty server.
If you are using a non-root installation, set the value for this property to the sqllib directory in
the installation user's home directory. For example, /home/db2admin/sqllib.
b) Set the value of the db.database.location.dir property for the remote instance of Db2 on
the database server.
3. Ensure that the users that are specified for your databases in the configuration\environment
\credentials.properties file are valid for your remote instance of Db2 on the database server.
4. Catalog the remote node:
setup -t catalogRemoteDB2Nodes
You can complete the steps that are performed by the catalogRemoteDB2Nodes task manually.
For example, if you are deploying a system that uses Transport Layer Security (TLS).
To catalog the remote nodes manually, you can run the CATALOG TCPIP NODE command instead of using
the setup -t catalogRemoteDB2Nodes command. For more information about the command, see
CATALOG TCPIP/TCPIP4/TCPIP6 NODE command.
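As an illustrative sketch only, where the node name, host name, port number, instance name, and operating system type are placeholders that you replace with the corresponding values from your topology.xml file, a manual catalog command might resemble:
db2 CATALOG TCPIP NODE istnode REMOTE db.server.example.com SERVER 50000 REMOTE_INSTANCE db2inst1 OSTYPE WIN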
The following table shows how the CATALOG command parameters map to the values in the
topology.xml file:
When i2 Analyze is deployed, the Information Store database is created or updated on the remote
database management system. A remote node is created with the name that is specified for the node-
name attribute, and the database is cataloged against that node.
To check that the remote nodes and databases are cataloged, you can use the
listDB2NodeDirectory and listDB2SystemDatabaseDirectory tasks after you deploy i2
Analyze:
• The listDB2NodeDirectory task lists the contents of the Db2 node directory.
• The listDB2SystemDatabaseDirectory task lists the contents of the local Db2 system
database directory.
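For example, run the following commands from the toolkit\scripts directory:
setup -t listDB2NodeDirectory
setup -t listDB2SystemDatabaseDirectory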
Where the value for os-type is the operating system of the remote database server. The value
of the os-type attribute must be one of the following values: UNIX or WIN.
For more information about the <database> element attributes, see Databases on page 297.
2. Edit the configuration\environment\opal-server\environment.properties file to
specify the details of your remote and local installations of SQL Server:
a) Ensure that the value of the db.installation.dir property is set for the local installation of
SQL Server or the Microsoft™ Command Line Utilities for SQL Server on the Liberty server.
b) Set the value of the db.database.location.dir property for the remote installation of SQL
Server on the database server.
3. Ensure that the user that is specified for your database in the configuration\environment
\credentials.properties file is valid for your remote installation of SQL Server on the
database server.
When i2 Analyze is deployed, the Information Store database is created or updated on the remote
database management system.
If the connection details for the remote database management system change, you can update the
topology.xml file and redeploy the system.
<solr-nodes>
...
<solr-node
memory="2g"
data-dir="C:/IBM/i2analyze/data/solr"
host-name="solr_server_host_name"
id="node2"
port-number="8984"
/>
</solr-nodes>
For example:
<zookeeper id="zoo">
<zkhosts>
<zkhost
host-name="zookeeper_server1_host_name" id="1"
port-number="9983" quorum-port-number="10483" leader-port-number="10983"
data-dir="C:/IBM/i2analyze/data/zookeeper"
/>
<zkhost
host-name="zookeeper_server2_host_name" id="2"
port-number="9983" quorum-port-number="10483" leader-port-number="10983"
data-dir="C:/IBM/i2analyze/data/zookeeper"
/>
<zkhost
host-name="zookeeper_server3_host_name" id="3"
port-number="9983" quorum-port-number="10483" leader-port-number="10983"
data-dir="C:/IBM/i2analyze/data/zookeeper"
/>
</zkhosts>
</zookeeper>
I2A_COMPONENTS_JAVA_HOME
An absolute path to a Java™ installation that is used by the Solr, ZooKeeper, and Liberty
components of i2® Analyze. By default, it is C:\IBM\i2analyze\deploy\java on Windows™ and
/opt/IBM/i2analyze/deploy/java on Linux®.
Note: When you deploy i2® Analyze, Adopt OpenJDK 11 HotSpot is installed in the specified
location if it does not contain an installation of Java.
The following variables are commented out in the file, but can be used to specify different Java™
installations for the components of i2® Analyze. If the file contains these variables, their values override
the value of I2A_COMPONENTS_JAVA_HOME for each component.
SOLR_JAVA_HOME
An absolute path to a Java™ installation that is used by Solr.
Apache recommends using Adopt OpenJDK 11 HotSpot with Solr.
ZK_JAVA_HOME
An absolute path to a Java™ installation that is used by ZooKeeper.
Apache recommends using Adopt OpenJDK 11 HotSpot with ZooKeeper.
LIBERTY_JAVA_HOME
An absolute path to a Java™ installation that is used by Liberty.
IBM® recommends using Adopt OpenJDK 11 OpenJ9 with Liberty.
ETL_TOOLKIT_JAVA_HOME
The path to a Java™ installation that is used by the ingestion commands in the ETL toolkit. The path
can be absolute or a relative path from the root of the ETL toolkit.
Note: When you create the ETL toolkit, Adopt OpenJDK 11 HotSpot is included as part of the
ETL toolkit. If you change the path, you must ensure that an installation of Java™ 11 exists in the
specified location.
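As an illustration only, where the paths are assumptions for a Linux server and the exact lines in your setup.in file might differ, the uncommented variables might look like this:
I2A_COMPONENTS_JAVA_HOME=/opt/IBM/i2analyze/deploy/java
SOLR_JAVA_HOME=/usr/lib/jvm/adoptopenjdk-11-hotspot
ZK_JAVA_HOME=/usr/lib/jvm/adoptopenjdk-11-hotspot
LIBERTY_JAVA_HOME=/usr/lib/jvm/adoptopenjdk-11-openj9
ETL_TOOLKIT_JAVA_HOME=java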
The setup.in file is part of the configuration, and is copied to each deployment toolkit in your
environment. It is recommended that you install Java™ in the same location on each server where
a particular component is located, so that this file remains the same on each copy of the
configuration.
1. In a text editor, open the setup.in script file for your operating system.
2. Uncomment and update the paths to specify the Java™ 11 installations that you want to use.
3. Save and close the file.
After you update the file, continue with the rest of the deployment process.
To allow clients to connect through a proxy server or load balancer, you can either populate the X-Forwarded headers or specify a single connection URI for the FrontEndURI setting in the i2 Analyze configuration.
To use a proxy server or load balancer with i2 Analyze, it must meet the following requirements:
• Handle WebSocket requests and keep the WebSocket connection alive for longer than 180 seconds.
• Pass through any security configurations (for example client certificate authentication) to the i2
Analyze server. You must enable this according to the documentation for your proxy server.
• If you want to use the X-Forwarded headers, the proxy server or load balancer must be able to
populate them. For more information about the headers, see X-Forwarded-Host and X-Forwarded-
Proto.
• If you are planning to deploy i2 Analyze with high availability, your load balancer must also provide
server persistence. For more information, see Deploying a load balancer on page 87.
To allow users to connect by using the URI of the proxy server or load balancer, you can populate
X-Forwarded-Host and X-Forwarded-Proto headers or specify the URI in the i2 Analyze
configuration. If you configure the connection URI for your deployment, users that connect to i2 Analyze
must use the URI that you specify.
If the http-server-host attribute is set to true in the topology.xml file, your proxy server does not
need to route requests to the port number that the application is listening on.
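As a sketch only, assuming that the attribute is set on the <application> element as the text above suggests and with the other attributes of that element omitted, the relevant fragment of topology.xml might resemble:
<application host-name="i2analyze.example.com" http-server-host="true">
    ...
</application>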
The following example shows the URIs that you might use. Clients connect to the proxy server or load balancer at http://proxy.server/i2, and requests are forwarded to the i2 Analyze application at http://example.host:9082/opal. When you use the X-Forwarded headers, the proxy server or load balancer populates them with values such as X-Forwarded-Proto: HTTP and X-Forwarded-Host: proxy.server:9082.
1. To use the X-Forwarded headers:
a) Configure your proxy server or load balancer to populate the X-Forwarded-Host and X-
Forwarded-Proto headers with the hostname and protocol that were used to connect to the
proxy server or load balancer.
For more information about the headers, see X-Forwarded-Host and X-Forwarded-Proto.
After you configure the X-Forwarded headers, you do not need to redeploy i2 Analyze. The URI that
is displayed in the console will remain the same.
2. To specify a single connection URI:
a) Using a text editor, open the toolkit\configuration\fragments\opal-services\WEB-
INF\classes\DiscoClientSettings.properties file.
b) Set the value of the FrontEndURI property to the URI that can be used to connect to your
deployment of i2 Analyze through the proxy server.
For example, FrontEndURI=http://proxy.server/i2.
c) Save and close the file.
After you update the connection URI, you can either modify other aspects of the deployment toolkit
or redeploy the system to update the deployment with your changes. After you deploy i2 Analyze and
start the server, the URI that can be used to access the system is displayed in the console.
Ensure that you can access i2 Analyze by using this URI from a client workstation that uses the proxy
server.
A deployment with high availability includes a load balancer, multiple Liberty servers, a Solr cluster, a ZooKeeper ensemble, and primary and standby database servers.
After you read the following information that explains how each component is used in a deployment with
high availability, follow the instructions to deploy i2 Analyze in this pattern. For more information about
how to deploy i2 Analyze, see Deploying i2 Analyze with high availability on page 81.
Load balancer
A load balancer is required to route requests from clients to the Liberty servers that host the i2
Analyze application. Additionally, the load balancer is used to monitor the status of the Liberty
servers in a deployment. The load balancer must route requests only to servers that report their
status as "live".
After you deploy i2 Analyze with a load balancer, you can make requests to i2 Analyze through the
load balancer only.
You can also use the load balancer to distribute requests from clients across the servers in your
deployment. The load balancer must provide server persistence for users.
Liberty
To provide high availability of Liberty, and the i2 Analyze application, you can deploy i2 Analyze
on multiple Liberty servers. Each Liberty server can process requests from clients to connect to i2
Analyze.
In a deployment with multiple Liberty servers, one is elected the leader. The leader can process
requests that require exclusive access to the database. The actions that are required to be
completed on the leader Liberty are described in Liberty leadership configuration.
At a minimum, two Liberty servers are required.
Database management system
To provide high availability of the Information Store database, use the functions provided by your
database management system to replicate the primary database to at least one standby instance.
i2 Analyze connects to one primary instance at a time, and the contents of the Information Store
database are replicated to the standby instances.
If the primary instance fails, one of the standby instances becomes the primary. When a standby
becomes the primary, it must be configured to be readable and writable. This means that i2 Analyze
can continue to function when the initial primary server fails.
• For more information about high availability in Db2, see High availability.
• For more information about high availability in SQL Server:
• For Enterprise edition, see What is an Always On availability group?
• For Standard edition, see Basic availability groups
• If you are deploying with SQL Server on Linux, see Always On availability groups on Linux
For Db2, at a minimum one primary server and one standby server are required.
For SQL Server, at a minimum one primary server and two standby servers are required. If you are
using basic availability groups, this is one standby server and one failover quorum witness.
Solr
i2 Analyze uses SolrCloud to deploy Solr with fault tolerance and high availability capabilities. To
provide fault tolerance and high availability of the Solr cluster, deploy the cluster across at least two
Solr nodes with each node on a separate server.
At a minimum, two Solr servers are required.
For more information about SolrCloud, see How SolrCloud Works.
ZooKeeper
To deploy ZooKeeper for high availability, deploy multiple ZooKeeper servers in a cluster known
as an ensemble. For a highly available solution, Apache ZooKeeper recommends that you use
an odd number of ZooKeeper servers in your ensemble. The ZooKeeper ensemble continues to
function while more than 50% of its members are online. This means that if you have three
ZooKeeper servers, you can continue operations when a single ZooKeeper server fails.
At a minimum, three ZooKeeper servers are required.
For more information about a multiple server ZooKeeper setup, see Clustered (Multi-Server) Setup.
Note: For Microsoft SQL Server, you must have a third database server to act as a witness. To
continue to provide availability when one of the zones fails, this server must be located outside of the
two availability zones. For more information, see Three synchronous replicas.
• Because ZooKeeper requires more than 50% of the servers in an ensemble to be available, one
server must be located outside of the availability zones. The ZooKeeper server does not require a
significant amount of resources so it can be located on any server that can communicate with the
availability zones.
• You can deploy multiple Liberty and Solr servers to provide more availability at a server level within
each zone.
• For the components that use an active/active pattern, servers in each availability zone are used to
process requests. To maintain the performance of the system, ensure that the network that is used
between availability zones is stable and provides enough bandwidth for system operation.
setup -t installLiberty
setup -t deployLiberty
setup -t startLiberty
After you deploy i2 Analyze, you can replicate any configuration changes that are not stored in the
configuration of i2 Analyze on each Liberty server.
The security configuration must be the same on each server.
8. Configure Liberty security for your environment. To do this, repeat any changes that you made to the
Liberty configuration in the previous environment.
This might involve copying the user registry file, or updating the server.xml file.
9. After you deploy i2 Analyze, configure your database management system to replicate the
Information Store database to your standby servers.
a) If you are using Db2, configure high availability for the Information Store database and replicate it
to any standby database instances.
For more information, see Replicate the Information Store in Db2 on page 88.
b) If you are using SQL Server, add the Information Store to your availability group.
For more information, see Replicate the Information Store in SQL Server on page 89.
10. Complete any configuration changes in the Information Store database on the primary server.
a) If you created any rules or schedules to delete records by rule, replicate the rules and schedules
that you created in the previous environment.
On SQL Server, you must update the automated job creation schedule on every database
instance. For more information, see Changing the automated job creation schedule.
b) If you created any merged property values definition views for your ingestion process, replicate
the view definition that you created in the previous environment.
After you deploy i2 Analyze with high availability, return to perform the rest of the instructions for
creating a deployment in your current environment:
• Creating the pre-production environment on page 48
• Creating the test environment on page 51
• Creating the production environment on page 53
For example:
<zookeeper id="zoo">
<zkhosts>
<zkhost
host-name="zookeeper_server1_host_name" id="1"
port-number="9983" quorum-port-number="10483" leader-port-number="10983"
data-dir="C:/IBM/i2analyze/data/zookeeper"
/>
<zkhost
host-name="zookeeper_server2_host_name" id="2"
port-number="9983" quorum-port-number="10483" leader-port-number="10983"
data-dir="C:/IBM/i2analyze/data/zookeeper"
/>
<zkhost
host-name="zookeeper_server3_host_name" id="3"
port-number="9983" quorum-port-number="10483" leader-port-number="10983"
data-dir="C:/IBM/i2analyze/data/zookeeper"
/>
</zkhosts>
</zookeeper>
For example:
<solr-nodes>
<solr-node
memory="2g"
data-dir="C:/IBM/i2analyze/data/solr"
host-name="solr_server1_host_name"
id="node1"
port-number="8983"
/>
<solr-node
memory="2g"
data-dir="C:/IBM/i2analyze/data/solr"
host-name="solr_server2_host_name"
id="node2"
port-number="8983"
/>
</solr-nodes>
<solr-collections>
<solr-collection
num-replicas="2"
min-replication-factor="1"
id="main_index"
type="main"
max-shards-per-node="4"
num-shards="1"
/>
...
</solr-collections>
{
"set-cluster-policy":[
{
"replica":"<2",
"shard":"#EACH",
"host":"#EACH"
}
]
}
• The cluster policy is set when you create the Solr cluster as part of the deployment steps.
Continue configuring the i2 Analyze configuration. For more information, see Deploying i2 Analyze with
high availability on page 81.
Call the live endpoint on each Liberty server in your deployment. The endpoint returns the status of
a single Liberty server, and not the others in the deployment.
3. Configure your load balancer to use cookie-based server persistence.
When a user connects without a cookie from the load balancer that indicates a previous connection
to one of the active Liberty servers, the load balancer provides one. When the user connects again,
the cookie ensures that their requests are routed to the same Liberty server.
If the server that a user has persistence with is offline, the load balancer must route the request to a
live Liberty server, and the user's cookie must be updated so that they gain persistence with the live
server that they connected to.
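For illustration, a minimal sketch of this behavior in HAProxy configuration syntax, where the host names, ports, and health-check path are assumptions rather than values mandated by i2 Analyze, might look like this:
backend i2analyze
    balance roundrobin
    # Route traffic only to servers that report themselves as live
    option httpchk GET /opal/api/v1/health/live
    # Cookie-based server persistence
    cookie SERVERID insert indirect nocache
    server liberty1 liberty1.example.com:9082 check cookie liberty1
    server liberty2 liberty2.example.com:9082 check cookie liberty2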
4. Before you can connect to i2 Analyze via the load balancer, you might need to specify the
connection URI that clients can use to connect to i2 Analyze.
a) In the configuration\fragments\opal-services\WEB-INF\classes
\DiscoClientSettings.properties file, set the value of the FrontEndURI setting to the
URI that can be used to connect to your deployment of i2 Analyze.
For more information, see Specifying the connection URI.
Some load balancers modify the HTTP origin header. The value that you specify for the
FrontEndURI must match the value of the HTTP origin header after it is modified by the load
balancer.
Continue configuring the i2 Analyze configuration. For more information, see Deploying i2 Analyze with
high availability on page 81.
Deploying i2 Analyze
To deploy i2 Analyze in a single-server deployment topology, you can run a script. After i2 Analyze is
successfully deployed, you can start the system.
You must have an i2 Analyze configuration that is set up for a single-server, remote database only, or i2
Connect gateway only deployment topology. For more information about creating a valid configuration,
see Creating the pre-production environment on page 48.
Run any toolkit commands from the toolkit\scripts directory in the deployment toolkit.
1. Deploy i2 Analyze:
setup -t deploy
2. Start i2 Analyze:
setup -t start
If an error message is displayed, refer to Troubleshooting the deployment process on page 92.
After you deploy and start i2 Analyze, return to perform the rest of the instructions for creating a
deployment in your current environment:
• Creating the pre-production environment on page 48
• Creating the test environment on page 51
• Creating the production environment on page 53
Installing components
Install the components of i2 Analyze on the servers that you have identified.
1. On the Liberty server, run the following commands:
setup -t installLiberty
setup -t installZookeeper
setup -t installSolr
Where zookeeper.hostname is the hostname of the ZooKeeper server where you are running
the command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
2. On the Liberty server, run the command to upload the Solr configuration to ZooKeeper:
Where liberty.hostname is the hostname of the Liberty server where you are running the
command, and matches the value for the host-name attribute of the <application> element in
the topology.xml file.
3. On each Solr server, create and start any Solr nodes:
Where solr.hostname is the hostname of the Solr server where you are running the
command, and matches the value for the host-name attribute of a <solr-node> element in the
topology.xml file.
On the Liberty server, run the commands to deploy and start a number of the components.
4. Create the Solr collections:
To test that the Solr Collection is created correctly, click Cloud in the Solr Web UI, or you can go
to http://solr.hostname:port-number/solr/#/~cloud. Log in with the user name and
password for Solr in the credentials.properties file.
A horizontal tree with the collection as the root is displayed. Here you can see the breakdown of the
shards, nodes, and replicas in any collections.
5. Create the Information Store database:
setup -t createDatabases
To check that the database is created correctly, connect to the database by using a database
management tool.
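For example, a quick connectivity check with the sqlcmd utility might look like the following sketch, where the server name, user name, password, and database name (ISTORE) are placeholders for the values in your topology.xml and credentials.properties files:
sqlcmd -S sql.server.example.com -U i2analyze -P password -d ISTORE -Q "SELECT @@VERSION"
For a Db2 deployment, you might instead connect with the db2 CONNECT TO command and the equivalent placeholder values.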
6. Deploy the i2 Analyze application:
setup -t deployLiberty
7. If you are using IBM HTTP Server, configure the HTTP Server:
setup -t configureHttpServer
8. Start i2 Analyze:
setup -t startLiberty
After you deploy and start i2 Analyze, return to perform the rest of the instructions for creating a
deployment in your current environment:
• Creating the pre-production environment on page 48
• Creating the test environment on page 51
• Creating the production environment on page 53
• Deploying i2 Analyze with high availability on page 81
Warning
When there is a warning, the validation process displays a brief configuration summary and a
WARNINGS section. The WARNINGS section identifies settings that might not be configured correctly,
but the deployment process continues. For example:
Here, the schema is not set, and so the default law enforcement schema is used.
Error
If an error occurs, the validation process displays a longer configuration summary, and an ERRORS
section. The ERRORS section identifies missing values that must be present. The deployment
process stops, and you must correct the errors before you attempt to deploy again. For example:
Here, the database location directory is not set, so the database cannot be configured.
:buildApplication
There are two reasons why a task might not run, but deployment can still proceed.
UP-TO-DATE
The task was performed earlier, or its output is already present. For example:
:installJDBCDrivers UP-TO-DATE
SKIPPED
The task is not required for this deployment. For example:
:importLTPAKey SKIPPED
If an error occurs, deployment stops in a controlled manner. i2 Analyze displays a stack trace that
contains the name of the task that failed, and information about the location of the error. For example:
:createDatabasesIfNecessary FAILED
* Where:
Script 'C:\IBM\i2analyze\toolkit\scripts\gradle\database.gradle' line: 173
The messages are displayed on screen and sent to the log files that are stored in the toolkit
\configuration\logs directory.
The IBM_Solr.log file contains messages that are directly from Solr and ZooKeeper.
Solr logs
By default, information that relates to Solr is logged in deploy\solr\server\logs\<node
port>.
The following message is displayed if there are any identifiers in your results configuration file that are
not present in your i2 Analyze schema when you start i2 Analyze.
To resolve this issue, ensure that all of the identifiers in your results configuration file are present in your
i2 Analyze schema. For more information, see Setting up search results filtering.
The security dimension values that are specified in the security schema for the
<DefaultSecurityDimensionValues> element are incorrect. For more information, see Setting
default dimension values.
Unable to determine the DB2 version as you do not have execute permission to
db2level
This message is displayed if the JDBC driver is not present or the installation path to Db2 is specified
incorrectly when you run the generateDefaults toolkit task.
There are two possible solutions to resolve this issue:
• Ensure that you provide the JDBC driver for your deployment. For more information, see Specifying
the JDBC driver on page 70.
• Ensure that the value for the db.installation.dir setting in the environment.properties
file is correct. For more information, see environment.properties
This message is displayed when trying to deploy with a highlight query configuration that contains an
invalid character for the Db2 code page. After you install Db2, you specify the code page that the Db2
instance uses. For more information about the value that you must use for the code page, see the Post-
install section of Installing IBM Db2 for i2 Analyze.
Connecting clients
After you deploy and start i2 Analyze, connect to your system by using one of the supported clients
and search for data. Depending on the deployed components, you can access an i2 Analyze system by
using either i2 Analyst's Notebook Premium or one of the web clients.
• Set up at least one user on the application server that has permission to access records in i2
Analyze.
• Ensure that i2 Analyze is started.
• Make a note of the URI for connections. When you start i2 Analyze, the URI that can be used to
connect is displayed in the console. For example:
Web application available (default_host): http://host_name:9082/opal
To use Analyst's Notebook to connect, you must also:
• Install Analyst's Notebook Premium. For more information, see the i2 Analyst's Notebook Premium
documentation.
To use a web client to connect, you must also:
• Ensure that you have the correct license agreements in place to use the i2 Investigate or the i2
Notebook web client.
• If you want to use the i2 Notebook web client, ensure that you have configured your deployment
to use SSL and given users the relevant command access permission. For more information, see
Enabling access to the i2 Notebook web client.
Note: When you create an example deployment, the example user already has the necessary
permission to use the i2 Notebook web client. Also, if you have access to the i2 Analyze server, you
can connect to the client through the localhost URL without configuring SSL.
• To use Analyst's Notebook Premium to connect, follow the instructions in the i2 Analyst's Notebook
Premium documentation.
• To use a web client to connect:
a) Open a web browser, and navigate to https://host_name/opal.
The web client displays a login dialog.
b) Enter the name and password of a user who is registered in the application server.
If that user has permission to use the i2 Notebook web client, you see that application's user
interface. If they do not have permission, you see the i2 Investigate web client user interface
instead.
c) Search for and visualize data to verify that the application is working correctly.
setup -t dropDatabases
A message is displayed when you run the toolkit task to confirm that you want to complete the
action. Enter Y to continue.
To ensure that the database is removed correctly, use a database management tool to try to
connect to the database.
3. Move or delete the deployment and data directories in every i2 Analyze deployment toolkit in your
environment.
By default, the directories are IBM\i2analyze\deploy and IBM\i2analyze\data:
• The deployment directories for the components of i2 Analyze and the data directory for
i2 Analyze are specified in the environment.properties file. For more information, see
environment.properties.
• The data directories for the components of i2 Analyze are specified in the topology.xml file.
For more information about this file, see topology.xml.
4. Move or delete the toolkit\configuration directory in every i2 Analyze deployment toolkit in
your environment.
The i2 Analyze deployment is returned to its just-installed state.
Best practices
When you create or manage a production deployment of i2 Analyze, there are a number of best
practices that you might follow.
Server host names
When you have multiple environments with i2 Analyze deployed, it can be useful to give the
servers and the application host names that identify their purpose. For example, in your development
environment you might use host names such as i2analyze.development.server.
Configuration management
Creating and maintaining a deployment of i2 Analyze is an iterative process, so it is important to
keep a record of changes to the i2 Analyze configuration. By using a source control system, you can
maintain a history of those changes.
After you populate the configuration directory in the pre-production environment, make a copy of the
directory in a location that is managed by a source control system.
Permanent environments
You can maintain your final development and test environments so that you can return to previous
deployment phases or use the environments to test any future upgrades of the system.
Production changes
Make any configuration changes in the lowest available deployment, and then promote them up to production.
Installing i2 Analyze
You can install i2 Analyze by extracting an archive file.
For details of the system requirements, see the Release notes.
To install i2 Analyze, you must have the i2 Analyze version 4.3.5 distribution. Choose one of the
following distributions to install i2 Analyze from:
• i2 Analyze V4.3.5 (Archive install) for Windows
• i2 Analyze V4.3.5 (Archive install) for Linux
Installing i2 Analyze by extracting an archive file is useful when you are installing and deploying i2
Analyze on multiple servers, or on a server without a graphical user interface.
i2 Analyze is provided in a .zip archive file for Windows, and a .tar.gz archive file for Linux. To
install i2 Analyze, extract the archive file and then accept the license agreement.
1. Download the i2 Analyze distribution file for your operating system, and extract the contents into one
of the following directories:
• On Windows, C:\IBM\i2analyze
• On Linux, /opt/IBM/i2analyze
The following files and directories are present in the IBM\i2analyze directory:
• license
• swidtag
• toolkit
• license_acknowledgment.txt
Before you can use i2 Analyze, you must read and accept the license agreement and copyright notices.
2. In a text editor, open the notices file and the license file for your language from the i2analyze
\license directory.
For example, the English license is in the LA_en file.
3. Accept the license and copyright notices.
a) Open the IBM\i2analyze\license_acknowledgment.txt file.
b) To accept the license and copyright notices, change the value of LIC_AGREEMENT to ACCEPT.
For example:
LIC_AGREEMENT = ACCEPT
Software prerequisites
The software prerequisites that you require depend on the deployment pattern and deployment topology
of i2® Analyze that you want to deploy.
The software prerequisites that you might require are:
• A supported database management system.
• IBM® Db2® Enterprise Server, Advanced Enterprise Server, Workgroup Server, or Advanced
Workgroup Server editions at version 10.5 Fix Pack 10 or later and version 11.1 Fix Pack 3 or
later, or Advanced and Standard editions at version 11.5.
IBM® Db2® Standard Edition - VPC Option - version 11.5 is included with i2 Analyze.
For more information about how to install Db2, see Installing IBM Db2 for i2 Analyze on page
99.
• Microsoft™ SQL Server Standard or Enterprise editions at version 14.0 (2017) or 15.0 (2019).
For more information about how to install SQL Server, see Installing Microsoft SQL Server for i2
Analyze on page 101.
• An HTTP server that supports a reverse proxy.
If you decide to use an HTTP server, it must be configured to handle WebSocket requests.
The deployment toolkit can automatically configure an IBM® HTTP Server instance on the server
where the i2 Analyze application is deployed to act as a reverse proxy. To support this approach,
you must install IBM® HTTP Server 9.0.0.7.
For more information about how to install IBM HTTP Server, see Installing IBM HTTP Server for i2
Analyze on page 102.
For more information about the system requirements and prerequisites, see Release Material.
If you are creating a production deployment, you can install Db2® in any location. When you install Db2®,
record the location of the installation directory because you must specify this location in the deployment
toolkit before you can deploy i2® Analyze.
If you are creating an example deployment, install Db2® in the following location:
• For Windows™: C:\Program Files\IBM\SQLLIB
• For Linux®: /opt/ibm/db2/Db2_version
In all deployments, you must ensure that the following features are installed:
• Spatial Extender server support
• Spatial Extender client
When you deploy i2® Analyze with the Information Store, you must install Db2® with the product
interface language set to English only. Additionally, if you install Db2® on Red Hat® Enterprise Linux®,
you must use an English version of Red Hat® Enterprise Linux®. For more information, see Changing
the Db2® interface language (Linux® and UNIX®) and Changing the Db2® product interface language
(Windows™).
For Linux® deployments, if you are deploying with a schema that contains non-English characters,
ensure that the operating system's LANG environment variable is set to the locale of the non-English
characters.
You can only deploy i2® Analyze with a Db2® instance where the DB2_WORKLOAD environment
variable is not set and the database is row-organized. If you have an existing Db2® instance where
DB2_WORKLOAD is set or the database is column-organized, you must create a Db2® instance with the
supported configuration and deploy i2® Analyze with it. For more information about the DB2_WORKLOAD
environment variable, see System environment. For more information about column-organized
databases, see Setting the default table organization.
Users
On Windows™, Db2® creates a Windows™ user account (db2admin), and two Windows™ groups
(DB2ADMNS, DB2USERS). To work successfully with Db2®, ensure that your Windows™ user account is a
member of the DB2ADMNS Windows™ group.
On Linux®, Db2® creates an Administration Server user (dasusr1) and group (dasadm1), an
instance-owning user (db2inst1) and group (db2iadm1), and a fenced user (db2fenc1) and group
(db2fadm1). To work successfully with Db2®, ensure that the user that runs the deployment script is a
member of the dasadm1 and db2iadm1 groups.
Make a note of any user names and passwords that are specified during the installation process.
Note: In all scenarios, the user that you use to run the deployment scripts must have permission to
create and modify the database.
Post-install
After you install Db2® for the Information Store, you must enable the administrative task scheduler and
set the code page on the Db2® installation:
1. On the command line, navigate to the SQLLIB\bin directory of your Db2® installation. On Linux®,
navigate to the db2inst1/sqllib/bin directory.
2. To enable the administrative task scheduler, run the following command:
db2set DB2_ATS_ENABLE=YES
3. To set the code page for UTF-8 encoding, run the following command:
db2set DB2CODEPAGE=1208
For more information about installing Db2®, see Installing Db2® database servers.
If you plan to deploy i2® Analyze with remote database storage, you must install Db2® on your database
server, and Db2® or IBM® Data Server Client on the application server. Install Db2® according to the
previous instructions; if you are using IBM® Data Server Client, also ensure that Spatial Extender client
support is installed. For more information about IBM® Data Server Client, see Installing IBM® Data
Server drivers and clients.
The instance of Db2® or IBM® Data Server Client on the application server must be the same version
level as the instance of Db2® on the database server. For example, if the instance of Db2® on your
database server is version 11.1, the instance of Db2® or IBM® Data Server Client on the application
server must also be version 11.1.
High availability
When you install Db2 in a deployment that uses HADR, the following must be true:
• The version of Db2 installed on the primary and standby servers must be the same.
• Db2 must be installed on all servers according to the previous information.
• Db2 must be installed with the same bit size (32 or 64 bit) for both the primary and standby servers.
• The primary and standby databases must have the same database name.
• The primary and standby databases must be in different instances.
• The amount of space allocated for log files must be the same on both the primary and standby
databases.
For more information about the Db2 requirements, see High availability disaster recovery (HADR)
support. For more information about the Db2 HADR feature, see High availability disaster recovery
(HADR).
If you are creating a production deployment, you can install SQL Server in any location. When you
install SQL Server, record the location of the installation directory because you must specify this location
in the deployment toolkit before you can deploy i2® Analyze.
If you are creating an example deployment, install SQL Server in the default location:
• For Windows™: C:\Program Files\Microsoft SQL Server
• For Linux®: /opt/mssql. Install the SQL Server tools in the default path: /opt/mssql-tools
Features
In all deployments, you must ensure that the following features are installed or enabled:
• Database Engine Services
• SQL Server Authentication
• TCP/IP protocol
In all deployments, you must install the ODBC Driver for SQL Server and sqlcmd utility on your
database server.
On Windows™:
• Microsoft™ ODBC Driver 17 for SQL Server.
• The sqlcmd utility https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-
server-2017#download-the-latest-version-of-sqlcmd-utility.
On Linux®:
• Microsoft™ ODBC Driver 17 for SQL Server.
• The SQL Server command-line tools, https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-
setup-tools?view=sql-server-2017
You can also install Microsoft™ SQL Server Management Studio to administer your SQL Server
installation. If you are using SQL Server on Linux®, you can install SQL Server Management Studio on a
Windows™ machine and connect to your SQL Server installation.
To create an example deployment on Windows™, the instance name that you use must be
MSSQLSERVER. Regardless of your operating system, the port number must be 1433.
Users
You must have an SQL Server Authentication Login that has the following permissions:
• Server Roles:
• dbcreator
• bulkadmin, to ingest the example data. The bulkadmin role is not supported on Linux®.
• User mappings for the msdb database:
• SQLAgentUserRole
• db_datareader
Note: In all scenarios, the user that you use to run the deployment scripts must have permission to
create and modify the database.
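For reference, a login with these permissions might be created with Transact-SQL statements similar
to the following sketch. The login name and password are placeholders, and the bulkadmin statement
applies only if you plan to ingest the example data on Windows™:
-- Server-level roles for the i2 Analyze login
CREATE LOGIN i2analyze WITH PASSWORD = 'example-password';
ALTER SERVER ROLE dbcreator ADD MEMBER i2analyze;
-- bulkadmin is needed only to ingest the example data (not supported on Linux)
ALTER SERVER ROLE bulkadmin ADD MEMBER i2analyze;
-- User mappings for the msdb database
USE msdb;
CREATE USER i2analyze FOR LOGIN i2analyze;
ALTER ROLE SQLAgentUserRole ADD MEMBER i2analyze;
ALTER ROLE db_datareader ADD MEMBER i2analyze;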
Post-install
If you plan to deploy i2® Analyze with remote database storage, you must install SQL Server on the
database server, and SQL Server or Microsoft™ Command Line Utilities 17 for SQL Server on the
application server. You can install SQL Server and the Command Line Utilities according to the previous
instructions.
High availability
When you install SQL Server for HADR, you must ensure that the following statements are true:
• All database servers are in the same IP address range.
• All database servers are members of the same domain for SQL Server Always On Availability
Groups DNS name resolution.
• Two static TCP/IP addresses are available: one for the Windows Failover Cluster and one for the
SQL Server Always On Availability Group. The IP addresses must be in the same range.
• The same version of Windows Server is installed on all database servers.
For information about installing SQL Server with SQL Server Always On Availability Groups, see
Prerequisites, Restrictions, and Recommendations for Always On availability groups.
Install IBM® HTTP Server from an archive file in the following location:
• For Windows™: C:\IBM\HTTPServer
• For Linux®: /opt/IBM/HTTPServer
For more information about installing, see Installing IBM® HTTP Server.
The Web Server Plug-ins for WebSphere® Application Server are included in the archive installer.
Post-install
• The bin/ikeyman and bin/gskcmd files use the Java™ runtime that your $JAVA_HOME
environment variable specifies. After you deploy i2® Analyze, you can set $JAVA_HOME to the
directory where the deployment toolkit installs Java™.
• Ensure that Microsoft™ Internet Information Server is either inactive or not present on the i2
Analyze server.
• On Linux®, the user that you use to run the deployment scripts must have write permissions on
the /opt/IBM/HTTPServer/conf/httpd.conf file.
Configuring i2 Analyze
During implementation of a production deployment, you need to modify the original base deployment to
match the needs of your organization. When the i2 Analyze deployment is in use, you can make further
configuration changes to adjust to changing needs, and to administer the system.
Configuration sections
Configuring the i2 Analyze application on page 104
To configure i2 Analyze for your organization's requirements, you complete various activities that modify
its behavior. These activities can affect aspects of the system such as authenticating and authorizing
users, controlling access to features, and providing appropriate search options.
Ingesting data into the Information Store on page 376
System administrators can add data to the Information Store in bulk by following an ETL (extract,
transform, load) process. To update data that was added to the Information Store earlier, they can use
the same process to add it again.
Connecting to external data sources on page 318
Subject to your licensing terms, a deployment of i2® Analyze can use the i2® Connect gateway to query
and retrieve data from external data sources. By implementing a connector to an external data source,
you enable i2® Analyze to create and display records that represent the data in that source.
i2 Analyze Schema Designer on page 350
Welcome to the i2 Analyze Schema Designer documentation, where you can find information about how
to use i2 Analyze Schema Designer.
Common tasks
Modifying a deployed i2 Analyze schema on page 106
i2 Analyze supports a limited set of small changes to the Information Store schema in a production
deployment. In general, if your changes only add to the existing schema, you can apply them without
the disruption that more significant changes can cause.
Configuring user security on page 129
In a deployment of i2 Analyze, you can configure how users are authenticated and authorized with the
system.
Connecting to external data sources on page 318
Subject to your licensing terms, a deployment of i2® Analyze can use the i2® Connect gateway to query
and retrieve data from external data sources. By implementing a connector to an external data source,
you enable i2® Analyze to create and display records that represent the data in that source.
Configuring Find Matching Records on page 213
Analysts use Find Matching Records to identify when multiple records on an Analyst's Notebook
Premium chart might represent the same real-world object or relationship.
Troubleshooting and support
i2 Enterprise Insight Analysis support page
i2 Support
Schema settings
When a deployment of i2 Analyze includes the i2 Connect gateway and connectors that define their
own schemas, the gateway interrogates the connectors for the locations of those schemas. For
schemas that are hosted on the server, the i2 Analyze application retrieves their locations from the
ApolloServerSettingsMandatory.properties file.
For a deployment of i2 Analyze that includes the Chart Store or the Information Store,
ApolloServerSettingsMandatory.properties must contain populated SchemaResource and
ChartingSchemesResource settings that provide the locations of the schema and charting scheme
files:
SchemaResource=
ChartingSchemesResource=
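For example, populated settings might look like the following, where the file names are illustrative and
must be replaced by the names of the schema and charting scheme files in your configuration:
SchemaResource=chart-store-schema.xml
ChartingSchemesResource=chart-store-schema-charting-schemes.xml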
Note: A single deployment of i2 Analyze cannot include both the Chart Store and the Information Store.
When necessary, the Information Store subsumes the Chart Store, and the Information Store schema
includes the elements that describe stored Analyst's Notebook charts.
The ApolloServerSettingsMandatory.properties files in the example
configurations that include the i2 Connect gateway contain extra settings that are named
Gateway.External.SchemaResource and Gateway.External.ChartingSchemesResource.
However, these names are not mandatory, and they are not the only settings for gateway schemas that
can appear in the file.
In general, a deployment of i2 Analyze that includes the i2 Connect gateway can have any number
of gateway schemas. Every pair of schema and charting scheme files must be identified in the
ApolloServerSettingsMandatory.properties file, using the following syntax:
Gateway.ShortName.SchemaResource=
Gateway.ShortName.ChartingSchemesResource=
You are free to specify the ShortName of each pair of settings as you wish. The short name that you
provide is displayed to Analyst's Notebook Premium users in the names of item types from the schema.
It is also displayed to system administrators in the user interface for creating type conversion mappings.
And i2 Analyze uses the short name to differentiate between any item types that have the same name in
separate gateway schemas.
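For example, a deployment with a single gateway schema that uses the short name ExternalData might
contain the following pair of settings. The short name and file names here are illustrative:
Gateway.ExternalData.SchemaResource=external-data-schema.xml
Gateway.ExternalData.ChartingSchemesResource=external-data-charting-schemes.xml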
The following steps describe how to add property types to the schema in a deployment of i2 Analyze
that includes the Chart Store.
1. Locate the XML file that contains the Chart Store schema for the i2 Analyze deployment, and load it
into Schema Designer.
2. Make your additions to the "Analyst's Notebook Chart" entity type, and then save the file.
Note: Schema Designer does not validate whether your changes are compatible with the deployed
schema. Validation takes place when you apply the changes to your deployment.
3. Run the following commands on the Liberty server to update the database and application to
conform to the updated schema.
setup -t stopLiberty
setup -t updateSchema
setup -t deployLiberty
setup -t startLiberty
The command recognizes that you modified the schema, determines whether the changes are
valid for the running Chart Store, and then applies them. If the changes are not valid, the command
displays messages to explain the problems.
By default, deployments of i2 Analyze that include the Chart Store do not specify a results configuration
file. Any property types that you add are automatically available as filters for search results. However, if
you add a property type that you do not intend to use for filtering, or if you have modified the "Analyst's
Notebook Chart" entity type in an Information Store schema, you must set up search results filtering to
include or exclude the new property types as required.
When you complete the procedure, reconnect to i2 Analyze from Analyst's Notebook Premium to
confirm that the changes are present and behaving correctly.
1. Locate the XML file that contains the Information Store schema for the i2 Analyze deployment, and
load it into Schema Designer.
2. Make your changes to the schema and its associated charting schemes, and then save the file.
Note: Schema Designer does not validate whether your changes are compatible with the deployed
schema. Validation takes place when you apply the changes to your deployment.
3. Run the following commands on the Liberty server to update the database and application to
conform to the updated schema.
setup -t stopLiberty
setup -t updateSchema
setup -t deployLiberty
setup -t startLiberty
The command recognizes that you modified the schema, determines whether the changes are valid
for the running Information Store, and then applies them. If the changes are not valid, the command
displays messages to explain the problems.
Note: If you customized the Information Store creation process by specifying
createdatabase="false" in the topology file and running the scripts yourself, this command
works in the same way. Execution stops so that you can customize the changes to the Information
Store. After you apply the changes, you can run the task again to complete the process.
Because you can make only additive changes to an Information Store schema that you modify through
this procedure, it is not mandatory to change other parts of your deployment. However, to take full
advantage of your additions, consider the following complementary changes.
• To enable the Information Store to ingest data for the new item types and property types, modify
your ingestion artifacts. See Information Store data ingestion.
• If your deployment includes definitions of the property values of merged i2 Analyze records, you
must update your merged property values definition views. See Define how property values of
merged records are calculated.
• If you want users to see the new types in quick search filters, edit the configuration file that controls
them. See Setting up search results filtering.
• If your deployment includes highlight queries, you can update them for the new item and property
types. For more information, see Deploying highlight queries on page 189.
Permitted changes
You can make the following changes to the Information Store schema of a live deployment:
• Add an item type.
Prevented changes
i2 Analyze prevents all of the following Information Store schema changes from taking place against a
live deployment:
• Change the schema identifier.
• Remove an item type.
• Remove an entity type from the permitted list for a link type.
• Remove a property type.
• Make a property type mandatory.
• Remove a default property value.
• Remove a property value from a selected-from or suggested-from list.
• Change the logical type of a property type, except for the permitted change described previously.
• Remove a grade type.
• Remove a link strength.
To protect your data when you redeploy with a modified Information Store schema, i2 Analyze carries
out validation checks to ensure that the changes you made do not result in data loss.
You can make the following changes to the Information Store schema of a live deployment. However,
they do not affect the store, which means that users still see items with hidden types when they run a
quick search, for example.
• Add a link constraint.
• Add a link strength.
• Disable an item type.
• Hide an item type.
The disable and hide functions change how item and property types behave in the deployment without
making any destructive changes. However, use these features with caution because they can affect the
behavior of visual query and import operations.
setup -t stopLiberty
4. To remove the database and Solr collections, navigate to the toolkit\scripts directory and run
the following command:
Here, liberty.hostname is the hostname of the Liberty server where you are running the
command. It matches the value for the host-name attribute of the <application> element in the
topology.xml file.
A message is displayed when you run each task to confirm that you want to complete the action.
Enter Y to continue. The database and Solr collections are removed from the system.
5. To re-create the Solr collections and databases, run the following commands:
setup -t deployLiberty
7. Start Liberty:
setup -t startLiberty
If you are using a results configuration file, configure the facets to match the item and property types
that you added or changed in the Information Store schema. See Setting up search results filtering on
page 204.
If you are defining the property values of merged i2 Analyze records, you must update your merged
property values definition views. See Define how property values of merged records are calculated.
Gateway.SourceOne.SchemaResource=
Gateway.SourceOne.ChartingSchemesResource=
Gateway.SourceTwo.SchemaResource=
Gateway.SourceTwo.ChartingSchemesResource=
For each gateway schema that you add to a deployment, you must add two entries to this list with a new
name in place of SourceOne or SourceTwo. If you remove a gateway schema, make sure to remove
both entries that were associated with that schema.
If you follow this procedure in a deployment that provides high availability, you must complete each step
and run each command on every Liberty server in your environment before you move to the next step or
command.
After you add or remove the settings for a particular gateway schema from
ApolloServerSettingsMandatory.properties, check other aspects of your system
configuration.
1. Before you reload the i2 Analyze server to reflect your changes, make sure that the results
configuration file matches your new set of gateway schemas.
a) Open the file indicated by the ResultsConfigurationResource property in
DiscoServerSettingsCommon.properties and follow the instructions in Setting up search
results filtering on page 204 to remove references to item types from removed gateway
schemas.
b) Add or edit elements in the file to enable result filtering on item and property types in any new
gateway schemas.
2. Follow the instructions in Modifying and testing a connector to reload the server and update its
information about the gateway schemas.
3. Follow the instructions in Configuring matching on page 209 to make any changes to the system
or Find Matching Records match rules files that are required as a result of your changes to the
configured schemas.
4. If your deployment uses them, update the set of type conversion mappings.
a) Open a web browser and navigate to the i2 Analyze server admin console, as described in
Modifying and testing a connector.
b) In the i2 Analyze type conversion app, add, edit, or remove type conversion mappings to reflect
the new set of gateway schemas.
c) Export the modified conversion mappings. Copy the resulting mapping-configuration.json
file to the deployment toolkit's configuration\fragments\common\WEB-INF\classes
directory.
d) Redeploy Liberty to update the type conversion mappings on the server.
On completion of the above steps, your i2 Analyze deployment is fully updated in response to the
changes that you introduced by adding, removing, or modifying a gateway schema.
If you add a dimension value to the bottom of the sequence in an ordered security dimension, you do
not need to complete a reindex.
To add a security dimension value to a security dimension, add a <DimensionValue> element as a
child of an existing <Dimension> element.
To modify the display name or description of a dimension or a dimension value, change the
DisplayName or Description attributes of an existing <Dimension> or <DimensionValue>
element. You must not change the value of the Id attribute.
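For example, to add a value to the bottom of the ordered dimension that appears in Security dimension
definitions, you might insert a <DimensionValue> element as follows. The new identifier and names are
illustrative, and the Id values of the existing elements are unchanged:
<Dimension Id="SD-SC"
           DisplayName="Security Classification"
           Description="The security classification of this information"
           Ordered="true">
  <DimensionValue Id="TOP" DisplayName="Top Secret" Description="Top Secret" />
  <DimensionValue Id="RES" DisplayName="Restricted" Description="Restricted" />
  <!-- New value added at the bottom of the ordered sequence, so no reindex is needed -->
  <DimensionValue Id="OFF" DisplayName="Official" Description="Official" />
</Dimension>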
If you follow this procedure in a deployment that provides high availability, you must complete each step
on every Liberty server in your environment before you move to the next step.
1. Using an XML editor, open the security schema for the deployment.
The security schema is in the toolkit\configuration\fragments
\common\WEB-INF\classes\ directory. The name of the security schema
is specified in the DynamicSecuritySchemaResource property of the
ApolloServerSettingsMandatory.properties file in the same directory.
2. Modify the security dimensions in the security schema according to your requirements.
3. Increment the version number that is stated in the Version attribute of the
<SecurityDimensions> element in the security schema.
4. Check your updated schema to ensure that it remains possible for all users to get an access level
that is not "none" for at least one value in every access dimension.
5. Save and close the file.
Redeploy i2 Analyze to update the application with your changes.
6. In a command prompt, navigate to the toolkit\scripts directory.
7. Stop Liberty:
setup -t stopLiberty
8. If you completed a change that requires a reindex, clear the search index:
In a deployment that provides high availability, you only need to run this command on one Liberty
server.
9. Update and redeploy the system:
setup -t updateSecuritySchema
setup -t deployLiberty
10.Start Liberty:
setup -t startLiberty
If the requirements for security groups change, you can modify the <GroupPermissions> element
and its children.
• To add a group, insert a complete <GroupPermissions> element. To use the new group, you
must ensure that the user repository contains a group that matches the value of the UserGroup
attribute.
• To modify the name that is associated with a group, change the value of the UserGroup attribute.
• To remove a group, remove the <GroupPermissions> element for that group.
If the requirements for the permissions of a security group change, you can add or remove
<Permissions> elements, and add, modify, and remove child <Permission> elements.
• To change the dimensions that a group has permissions for, you can add or remove
<Permissions> elements as follows:
• To add a dimension that the group has permissions for, insert a <Permissions> element where
the value of the Id attribute matches the value of the Id attribute of the dimension.
• To remove a dimension that the group has permissions for, remove the <Permissions>
element where the value of the Id attribute matches the value of the Id attribute of the
dimension.
• To change the security permissions that a group has within a dimension, you can add, modify, and
remove <Permission> elements as follows:
• To add a permission to a group, insert a <Permission> element. The DimensionValue
attribute must match a dimension value in the same dimension that is defined in the Dimension
attribute of the parent <Permissions> element.
• To modify the current permission that a group has in a dimension value, set the Level attribute
to a different value.
• To modify the dimension value that a permission is for, set the DimensionValue attribute to a
different value.
• To remove the current permission that a group has in a dimension value, remove the
<Permission> element in which the DimensionValue attribute matches that dimension value.
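For example, to give the "Clerk" group read-only access to records that have the "RES" value in the
"SD-SC" dimension, you might add a <Permission> element as follows. The group, dimension, and
dimension value identifiers are taken from the examples elsewhere in this documentation and are
illustrative:
<GroupPermissions UserGroup="Clerk">
  <Permissions Dimension="SD-SC">
    <!-- Members of "Clerk" receive READ_ONLY access to "Restricted" records -->
    <Permission DimensionValue="RES" Level="READ_ONLY" />
  </Permissions>
</GroupPermissions>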
If you follow this procedure in a deployment that provides high availability, you must complete each step
on every Liberty server in your environment before you move to the next step.
1. Using an XML editor, open the security schema for the deployment.
The security schema is in the toolkit\configuration\fragments
\common\WEB-INF\classes\ directory. The name of the security schema
is specified in the DynamicSecuritySchemaResource property of the
ApolloServerSettingsMandatory.properties file in the same directory.
2. Modify the security permissions in the security schema according to your requirements.
3. Increment the version number that is stated in the Version attribute of the
<SecurityDimensions> element in the security schema.
4. Check your updated schema to ensure that it remains possible for all users to get an access level
that is not "none" for at least one value in every access dimension.
5. Save and close the file.
Redeploy i2 Analyze to update the application with your changes.
6. In a command prompt, navigate to the toolkit\scripts directory.
7. Stop Liberty:
setup -t stopLiberty
setup -t updateSecuritySchema
setup -t deployLiberty
9. Start Liberty:
setup -t startLiberty
setup -t stopLiberty
setup -t updateSecuritySchema
setup -t deployLiberty
7. Start Liberty:
setup -t startLiberty
In a deployment that provides high availability, stop and start each Liberty server in your environment
but run deleteSolrCollections and createSolrCollections on one Liberty server only.
1. Modify or create the security schema that you want to update your deployment with.
For more information about creating the security schema, see Creating a security schema on page
117.
2. Update the configuration with your security schema.
a) Ensure that the security schema file is in the configuration\fragments\common\WEB-INF
\classes directory.
b) Ensure that your security schema file is specified in configuration\fragments\common
\WEB-INF\classes\ApolloServerSettingsMandatory.properties.
c) Ensure that the identifiers of the security dimension values that records receive by default are
valid in the <DefaultSecurityDimensionValues> element in your security schema.
For more information, see Setting default dimension values on page 114.
The following steps update your deployment with the new security schema.
3. Stop the deployment:
setup -t stopLiberty
4. To remove the database and Solr collections, navigate to the toolkit\scripts directory and run
the following command:
Here, liberty.hostname is the hostname of the Liberty server where you are running the
command. It matches the value for the host-name attribute of the <application> element in the
topology.xml file.
A message is displayed when you run each task to confirm that you want to complete the action.
Enter Y to continue. The database and Solr collections are removed from the system.
5. To re-create the Solr collections and databases, run the following commands:
setup -t updateSecuritySchema
setup -t deployLiberty
7. Start Liberty:
setup -t startLiberty
8. If you changed the names of the user groups in the security schema, update the basic user registry
to match the new names.
For more information, see Configuring the Liberty user registry.
Add some data to the system, and verify that users see the behavior that you intended. Iterate over the
process of modifying and replacing the schema as many times as you need.
Security schemas
An i2 Analyze security schema defines the security dimensions that exist in a deployment, and the
dimension values that can be assigned to items and records. A security schema also defines the
permissions that i2 Analyze users can receive.
Every deployment of i2 Analyze has a security schema whose contents reflect local requirements. It is
the responsibility of the deployer to ensure that the security schema is appropriate for the environment
where it is used. Often, the security dimensions map to security classifications that exist in the
organization.
Before you create a security schema, it is important to understand the relationship between the security
model and the security schema. For more information, see i2 Analyze security model .
Security dimensions
A security schema defines access security dimensions and the dimension values that they contain.
Security permissions
A security schema defines security permissions by user group, and then by dimension. For a particular
user group, the schema identifies one or more dimensions for which membership of that group affects
access rights. For each identified dimension, the schema contains a list of security permissions.
It is not necessary for the security schema to define permissions for every user group in the
organization. Similarly, it is not necessary for the permissions within any particular dimension or group
to set a security level for every possible dimension value. The completeness of the schema is judged at
run time when the security level of a particular user for a particular item or record is calculated.
<SecuritySchema>
<SecurityDimensions Id="" Version="">
<AccessSecurityDimensions>
<Dimension ...>
<DimensionValue ... />
...
</Dimension>
...
</AccessSecurityDimensions>
</SecurityDimensions>
<SecurityPermissions>
<GroupPermissions ...>
<Permissions ...>
<Permission ... />
...
</Permissions>
...
</GroupPermissions>
...
</SecurityPermissions>
</SecuritySchema>
The <SecurityDimensions> element has attributes for the Id and Version of the schema. If you
modify any part of the security schema, retain the identifier but increment the version number. In this
way, you ensure that all i2 data stores and services are informed of the changes.
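For example, if the deployed security schema declares Version="1", a modified copy of the same
schema retains its Id but declares a higher version number:
<SecurityDimensions Id="..." Version="2">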
In a valid security schema, the <AccessSecurityDimensions> element must be present, and there
must be at least one <GroupPermissions> element inside <SecurityPermissions>.
Security dimension definitions
Security dimensions are defined in an i2 Analyze security schema file as children of the mandatory
<AccessSecurityDimensions> element. A valid security schema defines at least one access
security dimension.
The following example shows a simple, complete <AccessSecurityDimensions> element:
<AccessSecurityDimensions>
<Dimension Id="SD-SC"
DisplayName="Security Classification"
Description="The security classification of this information"
Ordered="true">
<DimensionValue Id="TOP" DisplayName="Top Secret" Description="Top Secret" />
<DimensionValue Id="RES" DisplayName="Restricted" Description="Restricted" />
</Dimension>
</AccessSecurityDimensions>
The attributes of the <Dimension> element affect how the values in the security dimension are
interpreted.
Attribute Description
Id A unique identifier that is used to distinguish this security dimension
throughout the system.
DisplayName A name that identifies this dimension to the user in clients.
Description A more detailed description of this security dimension that provides more
information to the user. In Analyst's Notebook Premium, the description
is used as a tooltip.
Ordered Indicates whether the values in this dimension form a descending sequence
in which each value supersedes the values below it.
Marking this dimension as Ordered="true" means that a user who has
access rights to "Top Secret" data implicitly has the same access rights to
"Restricted" data as well. For a dimension in which Ordered="false",
there is no such implication, and access rights must be assigned explicitly
for each dimension value.
The Id, DisplayName, and Description attributes of <DimensionValue> elements have the
same purpose and meaning as the <Dimension> attributes with the same names. The identifiers of
dimension values must be unique within the dimension that defines them.
Important: After you deploy i2 Analyze, the changes that you can make to security dimensions are
limited. You cannot add or remove dimensions, or remove dimension values. You can only add values
to existing dimensions. For this reason, you must understand the requirements of your organization
before you deploy i2 Analyze in a production environment.
Security group permission definitions
In an i2 Analyze security schema, the mandatory <SecurityPermissions> element contains one or
more <GroupPermissions> elements. Each <GroupPermissions> element defines the security
levels that users in a particular group receive for items and records with particular dimension values in
an i2 Analyze deployment.
The syntax for defining the security permissions for user groups enables membership of one
group to convey permissions across several dimensions, and allows different groups to convey
different permissions for the same dimensions. The following example shows how to structure
<GroupPermissions> elements inside the <SecurityPermissions> element:
<SecurityPermissions>
<GroupPermissions UserGroup="Clerk">
<Permissions Dimension="SD-SC">
<Permission ... />
</Permissions>
...
</GroupPermissions>
<GroupPermissions UserGroup="Manager">
<Permissions Dimension="SD-SC">
<Permission ... />
...
</Permissions>
<Permissions Dimension="SD-IT">
<Permission ... />
...
</Permissions>
...
</GroupPermissions>
<GroupPermissions UserGroup="Security Controller">
<Permissions Dimension="SD-GA">
<Permission ... />
</Permissions>
</GroupPermissions>
</SecurityPermissions>
The value of the UserGroup attribute of each <GroupPermissions> element must match the name
of a group of i2 Analyze users.
The value of the Dimension attribute of each <Permissions> element must match the identifier of
one of the dimensions that is defined in the first part of the schema.
It is normal for <Permissions> elements for the same dimension to appear in more than one
<GroupPermissions> element:
• Users who are members of one group but not the other can receive different access levels on items
and records that have the same dimension values.
• When users are members of more than one group, <Permissions> elements for the same
dimension are combined before any access level calculation takes place.
Important: You can add and remove <GroupPermissions> elements from a deployed security
schema, if the resulting system continues to obey the rules of i2 Analyze. In particular, it must remain
possible for all users to get an access level that is not "none" for at least one value in every access
dimension.
Security permission definitions
The security permission definitions in an i2 Analyze security schema each associate a single dimension
value with a single security level. The definitions can be simple because of the additional context that
their location in the security schema file provides.
The <Permission> elements that define security permissions always appear inside <Permissions>
elements, which in turn always appear inside <GroupPermissions> elements.
<GroupPermissions UserGroup="Manager">
<Permissions Dimension="SD-SC">
<Permission DimensionValue="TOP" Level="UPDATE" />
<Permission DimensionValue="RES" Level="UPDATE" />
</Permissions>
<Permissions Dimension="SD-IT">
<Permission DimensionValue="HUMINT" Level="READ_ONLY" />
</Permissions>
</GroupPermissions>
It is possible, and often desirable, for identical <Permission> elements to appear in different locations
in an i2 Analyze security schema. The effect of a security permission definition depends entirely on its
position in the file.
Important: Like the <GroupPermissions> elements that contain them, you can add and remove
<Permissions> and <Permission> elements from a deployed security schema, if the resulting
system does not break the rules of i2 Analyze.
By default (and ignoring metadata), the type-access-configuration.xml file contains only its root
element:
<tns:TypePermissions>
</tns:TypePermissions>
If the file does not exist or remains in its default state, then all users have access to all records, subject
to the rules of the security schema.
Note: The configuration file is processed after item type conversion takes place. As such, the
restrictions you define apply only to the types that are present post-mapping.
Configuration example
Given a deployed schema that contains the entity types ET1, ET2, ET3, and LT1, consider the following
configuration:
<tns:TypePermissions DefaultSchemaShortName="...">
<ItemType Id="ET1" SchemaShortName="...">
<Allow>
<UserGroup Name="Analyst"/>
<UserGroup Name="Clerk"/>
</Allow>
</ItemType>
<ItemType Id="ET3">
<Allow></Allow>
</ItemType>
<ItemType Id="LT1"></ItemType>
</tns:TypePermissions>
Note: It is not possible to specify more than one <ItemType> element with the same Id and
SchemaShortName attributes.
Other considerations
Take note of the following considerations when you configure item type security in your deployment of i2
Analyze.
In this situation, users who cannot see records of type ET1 also cannot see records of type LT1.
On the other hand, if the definition of LT1 specifies other valid link end types that are not restricted for
the same users, as in this case, then records of type LT1 remain visible:
Seeds
Item type security restrictions can affect the behavior of seeded search services. The following rules
apply to services configured with seedConstraints:
• If the service accepts seeds of a restricted type, and does not specify that it must have at least one
seed of that type (that is, it is possible to use the service without a seed of the restricted type), then
that constraint is removed from the user's view.
• If the service accepts seeds of a restricted type, and it must have at least one seed of that type, then
the service is removed from the user's view.
Asynchronous queries
Asynchronous queries are validated when they are launched, to ensure that the seeds are visible to the
user.
If a user's permissions change after initiating the asynchronous query on the connector, the query runs
to completion, but any results no longer visible to the user are filtered out of the result set (which might
be empty as a result).
Procedure
Edit the configuration file:
1. If you do not already have one, obtain and configure an XML editor.
2. In the XML editor, open the toolkit/configuration/live/type-access-
configuration.xml file.
3. Using the information in The item type security configuration file on page 126 and Item type
security on page 122, modify the file to define the type access permissions that you need.
Update the deployment with your changes.
The following method deploys your changes without stopping the server, through a POST request to a
REST endpoint.
To redeploy your changes using only the deployment toolkit, see Redeploying Liberty. You must use
the deployment toolkit if you are in a deployment with high availability, or if you are deploying to your
production environment.
1. At the command line, navigate to the toolkit/scripts directory.
2. Update the server with your configuration file:
setup -t updateLiveConfiguration
3. Update the running application by using the reload endpoint. Make sure that you provide the
credentials of a user with administration rights:
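The exact request depends on your deployment. As an illustration only, the following curl sketch first
authenticates through the standard form-login action and then calls a reload endpoint; the host name,
context root, and endpoint path are placeholders rather than confirmed values:
curl -i --cookie-jar cookie.txt -d j_username=admin_user -d j_password=password http://host_name/context_root/j_security_check
curl -i --cookie cookie.txt -X POST http://host_name/context_root/api/v1/admin/config/reload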
Warning: reload updates the configuration without requiring a server restart, but any logged-in
users are logged out from i2 Analyze when you run it.
The server validates the item type configuration as it loads, and returns any errors in its response.
Test the new and updated item type permissions.
1. A good way to verify that your item type security configuration is loaded correctly is to call connector
or gateway schema endpoints and search the response body for an item type that the current user
should not be able to see.
Root element
<TypePermissions>
<TypePermissions> is the root element of the item type security configuration file. In the file that
deployments of i2 Analyze receive by default, the element is empty and its name is prefixed with the
tns namespace:
<tns:TypePermissions DefaultSchemaShortName="...">
...
</tns:TypePermissions>
<ItemType>
The <TypePermissions> root element supports any number of child <ItemType> elements
that specify the type security permissions. <ItemType> is the only permitted child of
<TypePermissions>:
<TypePermissions>
<ItemType Id="..." SchemaShortName="...">
...
</ItemType>
...
</TypePermissions>
The <ItemType> element has two attributes: Id, which is mandatory; and SchemaShortName, which
is optional:
• Id is the identifier of the item type, as defined in the schema that contains it.
• SchemaShortName is the short name of the schema that contains the item type. When this attribute
is set, it overrides DefaultSchemaShortName in the parent element.
Each item type for which the file contains permissions appears in exactly one <ItemType> element. If
an <ItemType> element is empty, however, it is as if that element does not exist.
<Allow>
The item type security model assumes that when you want to control access to a particular type, you
usually want only users in particular groups to be able to see records that have that type.
The <ItemType> element supports a single <Allow> child element. As soon as you add the element,
access to the type is denied to all groups that are not specifically mentioned:
<TypePermissions>
<ItemType Id="...">
<Allow>
...
</Allow>
</ItemType>
...
</TypePermissions>
The <Allow> element has no attributes. If an <ItemType> element has an empty <Allow> child
element, then only users who have the i2:Administrator command access permission can see
records of that type.
<UserGroup>
The <Allow> element supports any number of child <UserGroup> elements. Members of each
user group that you specify (as well as users who have the i2:Administrator command access
permission) are allowed to see records that have the parent item type:
<TypePermissions>
<ItemType Id="...">
<Allow>
<UserGroup Name="..."/>
...
</Allow>
</ItemType>
...
</TypePermissions>
The <UserGroup> element has a single, mandatory Name attribute. For each user group that
should have permission to see records of the specified type, the <Allow> element must contain a
<UserGroup> element whose Name attribute is set to the name of the user group.
IUserItemTypeAccessResolverProvider=com.example.ImplementationClassName
setup -t stopLiberty
setup -t deployLiberty
setup -t startLiberty
To illustrate these rules, consider that the example security schema defines the following dimensions
and groups:
To map to this security schema, the user group values in the table must match with the user groups in
the user repository.
Each user in this deployment must be in either the "Analyst" group or the "Clerk" group, and in either
the "Controlled" group or the "Unclassified" group.
Every deployment must contain an account that is associated with the administrator role. You
can create a group in the user registry named "Administrator", or you can change the value of the
security.administrator.group property to the name of an existing group in the repository. The
security.administrator.group property is in the environment-advanced.properties
file for each application, in the toolkit\configuration\environment\application directory.
When an i2 Analyze user is a member of this group, they can access administrative features.
The following process is an approach to security in Liberty that uses a basic user registry.
1. Create the users and groups in Liberty for each of the group permissions elements in the security
schema.
a) In an XML editor, open the user.registry.xml file. You can find this file in the C:\IBM
\i2analyze\deploy\wlp\usr\shared\config directory of your Liberty installation.
b) Use the following template to add your users and groups to the user.registry.xml file as the
first child of the <server> element:
• The <group> elements are populated by <member> elements. For a user to be a member of
a group, a <member> element's name attribute must match that user's name attribute.
If you are using the example deployment, the user Jenny is a member of each group.
In the following example user.registry.xml, the users Analyst1, and Clerk1 have been
added into a subset of the groups. If you use the following example, log in as these users to see
the different permission levels of each group:
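As a minimal sketch only, a user.registry.xml that follows this approach for the example groups might
contain a <basicRegistry> element similar to the following. The realm name is illustrative, and the
passwords are placeholders for encoded values:
<basicRegistry id="basic" realm="WebRealm">
  <!-- Jenny is a member of every group; Analyst1 and Clerk1 belong to a subset -->
  <user name="Jenny" password="{xor}..." />
  <user name="Analyst1" password="{xor}..." />
  <user name="Clerk1" password="{xor}..." />
  <group name="Analyst">
    <member name="Jenny" />
    <member name="Analyst1" />
  </group>
  <group name="Clerk">
    <member name="Jenny" />
    <member name="Clerk1" />
  </group>
  <group name="Controlled">
    <member name="Jenny" />
    <member name="Analyst1" />
  </group>
  <group name="Unclassified">
    <member name="Jenny" />
    <member name="Clerk1" />
  </group>
  <group name="Administrator">
    <member name="Jenny" />
  </group>
</basicRegistry>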
The encoded password is displayed in the command line. Record the encoded password,
including the {xor} prefix, and use the encoded password as the password in the
user.registry.xml file.
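For reference, you can generate an encoded password by running the securityUtility command from the
bin directory of the Liberty installation. The plain-text password in this sketch is a placeholder:
securityUtility encode myPassword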
For more information about using the security utility, see securityUtility command.
3. Save and close the file.
To test that your changes have worked, log in to i2 Analyze as one of the users that you added to the
user registry.
After you test your changes to the user registry, you can configure user access to features. To
access the REST endpoints, a user must be a member of a group that has the i2:Administrator
permission under command access control. For more information, see Configuring command access
control on page 148.
Intended audience
In the production deployment process, you might first configure SPNEGO single sign-on in the
configuration or pre-production environments. As you move to a production deployment, you must
replicate any configuration changes in any new deployments.
This section is intended for readers who are familiar with configuring and managing domain controllers,
Microsoft™ Active Directory, and have an understanding of SPNEGO single sign-on.
There are many different single sign-on technologies. This section defines a SPNEGO single sign-on
setup with workstations that are members of the same Microsoft Active Directory domain. i2 Analyze
uses the users and groups in Active Directory to determine the authorization of users.
The instructions assume that the following prerequisites are installed and accessible:
• A Microsoft Windows® Server running an Active Directory Domain Controller and associated
Kerberos Key Distribution Center (KDC).
• A Microsoft Windows® domain member (client) with a web browser that supports the SPNEGO
authentication mechanism.
• A working deployment of i2 Analyze that can be accessed by users in Active Directory.
For information on the prerequisites that are required, see the Before you begin section of Configuring
SPNEGO authentication in Liberty .
Attention: IBM® takes reasonable steps to verify the suitability of i2® Analyze for internet
deployment. However, it does not address lower-level issues such as guarding networks against
penetration, securing accounts, protecting against brute force attacks, configuring firewalls to
avoid DoS or DDoS attacks, and the like. For your deployment of i2® Analyze, follow industry-
standard practices and recommendations for protection of your systems. IBM® accepts no liability
for the consequences of such attacks on your systems. This information is not intended to
provide instructions for managing key databases or certificates.
SPNEGO single sign-on enables users to log in to a Microsoft domain controller, and be authenticated
within the single sign-on environment. In SPNEGO single sign-on, to change the user that is logged in to
i2 Analyze, the user must log out of the workstation, and a new user must log in to the workstation.
Authentication
When i2 Analyze is configured to use SPNEGO single sign-on, the authentication sequence between
the client and the platform matches the following steps and the associated diagram:
1. The client attempts to connect to WebSphere Application Server Liberty profile with an HTTP/Post/
Get request.
2. WebSphere Application Server Liberty profile returns HTTP 401 with a Negotiate header.
3. The client requests a SPNEGO token from the domain controller.
4. The domain controller returns a SPNEGO token to the client.
5. The client attempts to connect to WebSphere Application Server Liberty profile with an HTTP/Post/
Get request and the SPNEGO token.
6. On successful authentication, the client receives a Lightweight Third-Party Authentication (LTPA)
token in a cookie. During normal operation, the client passes the cookie back to i2 Analyze.
Authorization
After the user is authenticated, they are logged in to i2 Analyze. To define the data that the user has
access to, the user must be authorized by i2 Analyze.
For authorization, the i2 Analyze application communicates with Active Directory, through the
WebSphere Application Server Liberty profile user registry APIs to retrieve information about the current
user. The principal provider then maps the retrieved information to security dimension values in the i2
Analyze security schema.
The following diagram shows how authorization works in i2 Analyze:
Note: The security schema that the deployment uses is defined in the
ApolloServerSettingsMandatory.properties file. The security schema and properties files are
in the toolkit\configuration\fragments\common\WEB-INF\classes directory.
In a single sign-on setup, the following users must be present in Active Directory:
• A user for the server that hosts the i2 Analyze application, that is mapped to a Kerberos Service
Principal Name (SPN).
• The users that are used to log in to i2 Analyze.
To authorize users, the following groups must be present in Active Directory:
• A group for each of the group permission elements in the i2 Analyze security schema.
• A group for administrators.
1. Create the Microsoft™ Active Directory groups.
For more information, see How to Create a Group in Active Directory.
a) Open the Microsoft™ Active Directory groups controller.
b) Create groups whose names exactly match the value of the UserGroup attribute of each
<GroupPermissions> element in the i2 Analyze security schema file.
2. Create any Microsoft™ Active Directory users.
Create user accounts that can be used to log in to i2 Analyze.
For more information, see How to Create a Domain Account in Active Directory.
3. Make each user a member of the correct groups for your environment.
The groups that the user is a member of in Active Directory are used for authorization in i2 Analyze.
For more information, see Adding Users to an Active Directory Group.
The users that can access i2 Analyze are created, and are members of the groups that define their
access levels.
Note: Ensure that the host file on the Active Directory server uses the full host name, including
the domain name, for the i2 Analyze server. Remove any entries that use only the short name for
the i2 Analyze server. The value in the host file must match the value that is used for the SPN.
b) Configure the server that hosts WebSphere Application Server Liberty profile, and WebSphere
Application Server Liberty profile.
2. Configure WebSphere Application Server Liberty profile to use the Microsoft™ Active Directory
registry by using the instructions in Configuring LDAP user registries with Liberty as a reference.
a) Complete step 1 to add the features to the i2analyze\deploy\wlp\usr\servers\opal-
server\server.xml file.
b) Complete step 4 by using the Microsoft Active Directory Server example to populate the
<ldapRegistry> element.
Note: This information does not cover the configuration of Secure Sockets Layer (SSL) between
WebSphere Application Server Liberty profile and Active Directory. Do not include the <ssl> and
<keyStore> elements from the example, in your server.xml.
c) Ensure that the mapping between Active Directory and the i2 Analyze security schema is correct.
Add the following code after the <ldapRegistry> element in the server.xml file:
<federatedRepository>
<primaryRealm name="">
<participatingBaseEntry name=""/>
<groupSecurityNameMapping inputProperty="cn" outputProperty="cn"/>
</primaryRealm>
</federatedRepository>
Populate the empty name attribute values by using the following information:
• The <primaryRealm> element's name attribute has the same value as the realm attribute of
the <ldapRegistry> element.
• The <participatingBaseEntry> element's name attribute has the same value as the
baseDN attribute of the <ldapRegistry> element.
By default, all requests to access protected resources use SPNEGO authentication. If you previously
deployed i2 Analyze with basic authentication, you must ensure that the basic registry is not present in
the user.registry.xml file.
3. Using an XML editor, either remove or comment out the complete <basicRegistry> element in
the i2analyze\deploy\wlp\usr\shared\config\user.registry.xml file.
Redeploying i2 Analyze
Redeploy i2 Analyze to update the application with your configuration changes.
If you follow this procedure for a deployment that provides high availability, you must complete each
step on every Liberty server in your environment before you move to the next step.
1. In a command prompt, navigate to the toolkit\scripts directory.
2. Stop Liberty:
setup -t stopLiberty
setup -t deployLiberty
4. Start Liberty:
setup -t startLiberty
5. If you are using the IBM HTTP Server, start or restart it.
Intended audience
In the production deployment process, you might first configure client certificate authentication in the
configuration or pre-production environments. As you move to a production deployment, you must
replicate any configuration changes in any new deployments.
This information is intended for readers who are familiar with managing key databases and certificates,
user authentication mechanisms, and the i2 Analyze toolkit.
Prerequisites
The starting point for configuring client certificate authentication is a deployment of i2 Analyze that is
configured to use Secure Sockets Layer on connections to the HTTP Server, and between the HTTP
Server and Liberty. For more information about configuring Secure Sockets Layer on connections to the
HTTP Server, see Configuring Secure Sockets Layer with i2 Analyze.
Attention: IBM® takes reasonable steps to verify the suitability of i2® Analyze for internet
deployment. However, it does not address lower-level issues such as guarding networks against
penetration, securing accounts, protecting against brute force attacks, configuring firewalls to
avoid DoS or DDoS attacks, and the like. For your deployment of i2® Analyze, follow industry-
standard practices and recommendations for protection of your systems. IBM® accepts no liability
for the consequences of such attacks on your systems. This information is not intended to
provide instructions for managing key databases or certificates.
Client certificates
The client certificates that are used to authenticate users must be signed by a certificate authority that is
trusted by the i2 Analyze server.
The common name in a client certificate must match a user name in the i2 Analyze user registry. A user
that selects such a certificate logs in to i2 Analyze as the corresponding i2 Analyze user.
You can have as many client certificates as you require. Each certificate is associated with a single user
in the user registry. Each certificate can be installed on any number of workstations. Each workstation
can have any number of certificates installed.
To demonstrate a working configuration, you can use a self-signed client certificate. For more
information, see Creating a self-signed client certificate on page 140. However, in a production
deployment you must use certificates that are signed by a certificate authority that is trusted by the i2
Analyze server.
There are many methods for obtaining an X.509 certificate that is signed by a certificate authority.
When you receive a signed certificate, you also receive signer certificates so that you can trust the
client certificates that are signed by that certificate authority. If the certificate authority that signed your
certificates is not already trusted within the key database, you must add any signer certificates to the
key database so that the certificate authority is trusted.
For information about managing key databases, certificates, and trusted certificate authorities using the
IBM Key Management utility, see Managing keys with the IKEYMAN graphical interface.
2. Open the key database that is used for Secure Sockets Layer (SSL) connections. If you followed
the instructions to set up the SSL example, the key database file is IBM\i2analyze\i2-http-
keystore.kdb.
For more information about opening a key database, see Working with key databases.
3. Add the certificates to the key database, to ensure that the certificates received from the client are
trusted.
a) In the IBM® Key Management utility, with the i2 Analyze key database open, select Signer
Certificates from the list in the Key database content pane.
b) Click Add.
c) Click Browse, and locate your certificate.
Note: When you are using a self-signed client certificate, add the self-signed client certificate as a
signer certificate. For example, Jenny.der.
4. The Liberty truststore must contain the certificates to ensure that the certificates received from the
client are trusted.
a) Run the following command to import the required certificate into the truststore. If the truststore
does not exist, it is created:
Note: When you are using a self-signed client certificate, add the self-signed client certificate as
a signer certificate. For example, Jenny.der.
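As an illustration, you might import a certificate with the Java™ keytool utility. The truststore path,
alias, and password in this sketch are placeholders for the values in your deployment:
keytool -importcert -alias "jenny" -file "Jenny.der" -keystore "i2-liberty-truststore.jks" -storepass "password"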
The key database contains the signer certificates so that the client certificates can be trusted. The
truststore is populated so that Liberty can use it to trust the client certificates.
Configuring i2 Analyze
To enable a user to log in using a client certificate, you must modify some of the configuration files for i2
Analyze.
Add a rewrite rule that enables client authentication on the IBM HTTP Server to the i2 Analyze
configuration. Then, update the web.xml file for the application to enable client certificate
authentication.
If you follow this procedure for a deployment that provides high availability, you must complete each
step on every Liberty server in your environment before you move to the next step.
1. Using a text editor, open the configuration\environment\proxy\http-custom-
rewrite-rules.txt file. Add the following line between the !Start_After_Rules! and !
End_After_Rules! lines to enable client certificate authentication:
SSLClientAuth Optional
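For example, after you add the line, that part of the file looks like this:
!Start_After_Rules!
SSLClientAuth Optional
!End_After_Rules!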
b) Specify the truststore password in the credentials file. In a text editor, open the toolkit
\configuration\environment\credentials.properties file and enter a password for
the truststore that you specified in the topology.xml file.
ssl.truststore.password=password
<ssl clientAuthenticationSupported="true"
id="defaultSSLConfig"
keyStoreRef="defaultKeyStore"
trustStoreRef="defaultTrustStore"/>
<httpDispatcher enableWelcomePage="false"
trustedSensitiveHeaderOrigin="*"/>
<login-config>
<auth-method>FORM</auth-method>
<realm-name>Form-Based Authentication Area</realm-name>
<form-login-config>
<form-login-page>/login.html</form-login-page>
<form-error-page>/login.html?failed</form-error-page>
</form-login-config>
</login-config>
In the login configuration section, add the following lines to define the client certificate authentication
method:
<login-config>
<auth-method>CLIENT-CERT</auth-method>
<realm-name>WebRealm</realm-name>
</login-config>
Redeploying i2 Analyze
Redeploy i2 Analyze to update the application with your configuration changes.
If you follow this procedure for a deployment that provides high availability, you must complete each
step on every Liberty server in your environment before you move to the next step.
1. In a command prompt, navigate to the toolkit\scripts directory.
2. Stop Liberty:
setup -t stopLiberty
3. Redeploy i2 Analyze:
setup -t deployLiberty
4. Start Liberty:
setup -t startLiberty
5. If you are using the IBM HTTP Server, start or restart it.
-Dcom.sun.net.ssl.checkRevocation=true
-Djava.security.properties=C:/IBM/i2analyze/deploy/wlp/usr/servers/opal-server/
java-security-ocsp.properties
Where:
• checkRevocation is set to true to instruct Liberty to check whether certificates have been
revoked.
• java.security.properties is the path to a properties file that contains the settings to
configure OCSP.
2. In a text editor, create the i2analyze\deploy\wlp\usr\servers\opal-server\java-
security-ocsp.properties file and add the following lines:
ocsp.enable=true
ocsp.responderURL=
Where:
• ocsp.enable is set to true to enable OCSP.
• ocsp.responderURL is the URL of the OCSP service that is used to check the status of a
certificate. When this value is specified, it overrides the value in the Authority Information Access
extension on the certificate.
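For example, a completed properties file might look like the following lines, where the responder URL is a placeholder for the address of your own OCSP service:
ocsp.enable=true
ocsp.responderURL=http://ocsp.example.com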
3. Restart Liberty:
setup -t restartLiberty
Log in to your deployment to test that revoked certificates are identified successfully.
If a user attempts to log in with a revoked certificate, a message is displayed in the Liberty logs. For
example:
java.security.cert.CertPathValidatorException: Certificate has been revoked,
reason: UNSPECIFIED,
revocation date: Wed Jan 20 17:13:35 UTC 2021, authority: CN=ocsp, OU=i2,
O=IBM, ST=England, C=GB, extension OIDs: []
If your OCSP service is unavailable, a message is displayed in the Liberty logs. For example:
The extended error message from the SSL handshake exception is: PKIX path
validation failed: java.security.cert.CertPathValidatorException: Unable to
determine revocation status due to network error
CommandAccessControlResource=command-access-control.xml
5. Stop Liberty:
setup -t stopLiberty
6. Redeploy i2 Analyze:
setup -t deployLiberty
7. Start Liberty:
setup -t startLiberty
Connect to your deployment and test that members of each user group have the correct access to
features. Continue to change the configuration until you are satisfied with the access of each user
group.
After you set command access control, you can revert to the default state by ensuring that the
CommandAccessControlResource property in the toolkit\configuration\fragments\opal-
services\WEB-INF\classes\DiscoServerSettingsCommon.properties has no value.
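For example, to revert to the default state, the property looks like this:
CommandAccessControlResource=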
<CommandAccessPermissions UserGroup="Analyst">
<Permission Value="i2:RecordsUpload" />
<Permission Value="i2:RecordsDelete" />
<Permission Value="i2:RecordsExport" />
<Permission Value="i2:ChartsUpload" />
<Permission Value="i2:ChartsDelete" />
<Permission Value="i2:Notebook" />
</CommandAccessPermissions>
5. Stop Liberty:
setup -t stopLiberty
6. Redeploy i2 Analyze:
setup -t deployLiberty
7. Start Liberty:
setup -t startLiberty
File structure
<CommandAccessControl>
The <CommandAccessControl> element is the root of the configuration file.
It contains child <CommandAccessPermissions> elements.
<CommandAccessPermissions>
The <CommandAccessPermissions> element contains the access permissions for groups of
users.
The UserGroup attribute defines the user group that the access permissions apply to. The value
of the UserGroup attribute must match a user group from the user registry. To specify that the
permissions apply to all user groups, you can use the * wildcard.
It contains one or more child <Permission> elements.
<Permission>
The Value attribute of the <Permission> element defines a permission that members of the user
group that is specified in the parent <CommandAccessPermissions> element have access to.
For the list of values that you can specify for the Value attribute, see Command access
permissions on page 151.
The following example allows members of all user groups to upload records and charts, and additionally
allows members of the "Analyst" user group to delete records and charts:
<tns:CommandAccessControl
xmlns:tns="http://www.i2group.com/Schemas/2018-01-19/CommandAccessControl"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.i2group.com/Schemas/2018-01-19/
CommandAccessControl CommandAccessControl.xsd ">
<CommandAccessPermissions UserGroup="*">
<Permission Value="i2:RecordsUpload"/>
<Permission Value="i2:ChartsUpload"/>
<Permission Value="i2:ChartsRead"/>
<Permission Value="i2:Notes"/>
</CommandAccessPermissions>
<CommandAccessPermissions UserGroup="Analyst">
<Permission Value="i2:RecordsDelete"/>
<Permission Value="i2:ChartsDelete"/>
</CommandAccessPermissions>
</tns:CommandAccessControl>
• Record and chart permissions control access to commands for record and chart management.
• Web client permissions control access to features in the web client.
• Connector permissions control access to connectors when your deployment includes the i2 Connect
gateway.
• Administrator permissions control access to REST API endpoints, including the admin endpoint.
Record and chart permissions:
i2:RecordsUpload
Members of groups that have this permission can create and modify records and upload them to the
Information Store.
Without this permission, users can search for records and add them to charts, but cannot upload
changes to records.
i2:RecordsDelete
Members of groups that have this permission can delete records that were originally uploaded
through Analyst's Notebook Premium.
Without this permission, users can search for records and add them to charts, but cannot delete
records from the Information Store.
i2:RecordsExport
Members of groups that have this permission can export records that are returned in search results
to a CSV file.
Without this permission, users cannot export records that are returned in search results to a CSV
file.
i2:ChartsUpload
Members of groups that have this permission can create and modify Analyst's Notebook charts
and upload them to the Chart Store. Modifying a chart includes deleting versions of a chart, but not
deleting the chart itself.
Without this permission, users can save Analyst's Notebook charts locally, but cannot upload new
charts and modifications to existing charts.
i2:ChartsBulkUpload
Members of groups that have this permission receive access to the Upload from Folder feature in
i2 Analyst's Notebook Premium that enables users to upload charts from disk to the Chart Store in
bulk.
Note: This permission automatically includes the i2:ChartsUpload permission. You do not need
to give both permissions to the same user groups.
i2:ChartsDelete
Members of groups that have this permission can delete charts that were originally uploaded
through Analyst's Notebook Premium.
Without this permission, users cannot delete charts from the Chart Store.
i2:ChartsRead
Members of groups that have this permission can search for and retrieve charts from the Chart
Store.
Without this permission, users cannot search for or retrieve charts.
i2:Notes
Members of groups that have this permission can create and access notes on records and charts.
Without this permission, notes are not displayed in the Notes tab, and the contents of any notes are
not searchable.
Web client permission:
i2:Notebook
Members of groups that have this permission can access the i2 Notebook web client. Members of
groups without this permission see the i2 Investigate web client instead.
For more information, see Enabling access to the i2 Notebook web client on page 149.
Connector permissions:
i2:Connectors
If you are using the i2 Connect gateway, members of groups that have this permission can view all
i2 Connect connectors.
Without this permission, i2 Connect connectors are not visible unless individual connectors are
specified by using the i2:Connectors:connector-id permission.
i2:Connectors:connector-id
If you are using the i2 Connect gateway, members of groups that have this permission can view the
i2 Connect connector with the matching connector-id. For example, i2:Connectors:example-
connector.
Without this permission, the specified i2 Connect connector is not visible.
Administrator permissions:
i2:AlertsCreate
Members of groups that have this permission can access the REST API alerts endpoint to create
and send alerts to i2 Analyze users. For more information, see Managing data in i2 Analyze on page
376.
i2:Administrator
Members of groups that have this permission can access the REST API admin endpoints. For more
information, see Using the admin endpoints on page 316.
DataSchema
The data schema contains objects that store all the data that is available for analysis.
By default, the value is IS_Data
WebChartSchema
The web chart schema contains temporary objects used during manipulation of the web chart.
By default, the value is IS_WC
VisualQuerySchema
The visual query schema contains temporary objects used during visual query processing.
By default, the value is IS_Vq
FindPathSchema
The find path schema contains objects that support find path results.
By default, the value is IS_FP
StagingSchema
The staging schema contains temporary objects that support the ingestion process.
By default, the value is IS_Staging
PublicSchema
The public schema contains objects that represent a public API for the Information Store. It also
contains procedures, tables, and views related to the deletion by rule feature.
By default, the value is IS_Public
DeletionByRuleRoleName
The deletion-by-rule role name.
By default, the value is Deletion_By_Rule
Collation
The collation sequence used for the Information Store. You can only change this setting
before you create the Information Store database. Defaults to CLDR181_LEN_S1 for Db2 and
Latin1_General_100_CI_AI_SC for SQL Server.
The InfoStoreNamesDb2.properties file contains the following settings that are specific to
deployments that use IBM® Db2® to host the Information Store database:
SystemTempTableSpace
The system temporary table space.
By default, the value is IS_TEMP_TS
UserTemp16KTableSpace
The user temporary 16K table space to hold global temporary objects.
By default, the value is IS_16K_TS
BigTableSpace
The InfoStoreNamesSQLServer.properties file contains the following settings that are specific to
deployments that use Microsoft™ SQL Server to host the Information Store database.
For the Filename settings, you must specify a file name or absolute path. If you specify a file name,
it is located relative to the value that is specified in the environment.properties file for the
db.database.location.dir property. If you specify a file path, it is considered an absolute path.
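For example, for the PrimaryFilename setting that is described later in this topic, both of the following forms are valid. The absolute path is illustrative only:
PrimaryFilename=IStore-p1.mdf
PrimaryFilename=C:\Data\IStore-p1.mdf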
ISSchema
The Information Store schema contains internal configuration information about the Information
Store database.
By default, the value is IS_Core
DataSchema
The data schema contains objects that store all the data that is available for analysis.
By default, the value is IS_Data
WebChartSchema
The web chart schema contains temporary objects used during manipulation of the web chart.
By default, the value is IS_WC
VisualQuerySchema
The visual query schema contains temporary objects used during visual query processing.
By default, the value is IS_Vq
FindPathSchema
The find path schema contains objects that support find path results.
By default, the value is IS_FP
StagingSchema
The staging schema contains temporary objects that support the ingestion process.
By default, the value is IS_Staging
PublicSchema
The public schema contains objects that represent a public API for the Information Store. It also
contains procedures, tables, and views related to the deletion by rule feature.
By default, the value is IS_Public
DeletionByRuleRoleName
The deletion-by-rule role name.
By default, the value is Deletion_By_Rule
Collation
The collation sequence used for the Information Store in SQL Server. You can only change this
setting before you create the Information Store database.
By default, the value is Latin1_General_100_CI_AI_SC
PrimaryName
The SQL Server database primary data file contains the startup and configuration information for the
Information Store database. The logical name of the primary data file.
By default, the value is IStore_System_Data
PrimaryFilename
The filename or absolute path for the primary data file.
By default, the value is IStore-p1.mdf
UserTableName
The user table filegroup contains all the tables and data for the Information Store database. The
logical name of the user table in the user table filegroup.
IS_DATA.E_PERSON_FN_IX is the name of the index to create, IS_DATA.E_PERSON is the table for
the Person entity type, and P_FIRST_GIVEN_NAME is the column for the first given name property type.
To determine the syntax of the SQL statement that you must use to create the index, use the
documentation for your database management system. For more information about creating indexes in
a Db2 database, see CREATE INDEX statement and for SQL Server, see CREATE INDEX (Transact-
SQL).
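For example, in a Db2 database a statement of the following form creates the index that is described earlier. This is a sketch only; check the exact syntax and any options that you require against the documentation for your database management system:
CREATE INDEX IS_DATA.E_PERSON_FN_IX
ON IS_DATA.E_PERSON (P_FIRST_GIVEN_NAME);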
1. Identify the item types, and any of their property types, that you want to add indexes for.
2. Create the directory for the informationStoreModifications.sql file.
a) In the configuration\environment\opal-server directory, create the databases
directory.
b) In the databases directory that you created, create the infostore directory.
You can find the value to use for infostore in your topology.xml file. The value to use in your
deployment is the value of the id attribute of the <database> element for your Information Store
database.
For example, configuration\environment\opal-server\databases\infostore.
3. Using a text editor, create a file that is named informationStoreModifications.sql in the
configuration\environment\opal-server\databases\infostore directory.
4. Develop a script to create the indexes on the tables and columns that you identified in step 1 in the
informationStoreModifications.sql file.
5. Save and close the file.
6. If the Information Store database exists, run the modifyInformationStoreDatabase toolkit task
to run the script that you saved in step 4.
a) Open a command prompt and navigate to the toolkit\scripts directory.
b) Run the following command to run the informationStoreModifications.sql file:
setup -t modifyInformationStoreDatabase
Ensure that the indexes you expect are in the Information Store database. You can use IBM Data Studio
or SQL Server Management Studio to see the indexes in the Information Store database.
1. Change the host name, port, and operating system type attributes of the remote database:
a) Using an XML editor, open toolkit\configuration\environment\topology.xml.
b) Update the host-name and port-number attribute values of the <database> element for the
database.
c) Run the following command to recatalog the remote node:
setup -t recatalogRemoteDB2Nodes
setup -t uncatalogRemoteDB2Nodes
setup -t catalogRemoteDB2Nodes
The next time that you deploy i2 Analyze, or recreate the database, the database is created using the
new values that you provided in the topology.xml.
Configuring search
Depending on the requirements of the environment that you are deploying into, a number of
configurable options are available for searching and indexing. You can configure the behavior of search
features to match your requirements.
For example, to use Arabic uncomment the <Definition> and <SynonymsFile> file elements
in the ar_EG config section:
<Definition Analyzer="text_facet">
<AnalyzerChain>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"
preserveOriginal="false"/>
<filter class="solr.ArabicNormalizationFilterFactory"/>
<filter class="solr.ArabicStemFilterFactory"/>
</AnalyzerChain>
</Definition>
b) To specify a different synonyms file, place the file in the configuration\solr directory and
provide the file name in the <SynonymsFile> element.
For example:
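Assuming a custom synonyms file that is named synonyms-ar.txt (the file name is illustrative) in the configuration\solr directory:
<SynonymsFile Path="synonyms-ar.txt"/>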
For more information about creating a custom synonyms file, see Creating a synonyms file on
page 169.
3. To preserve the diacritics in facets, you must complete the previous step for your chosen template
and remove the following line from the text_facet analyzer:
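In the analyzer chain shown earlier, that line is the ASCII-folding filter, which removes diacritics:
<filter class="solr.ASCIIFoldingFilterFactory"
preserveOriginal="false"/>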
6. Redeploy i2 Analyze:
setup -t deployLiberty
Here, liberty.hostname is the hostname of the Liberty server where you are running the
command. It matches the value for the host-name attribute of the <application> element in the
topology.xml file.
8. Clear the search index:
Here, liberty.hostname is the hostname of the Liberty server where you are running the
command. It matches the value for the host-name attribute of the <application> element in the
topology.xml file.
9. Start i2 Analyze
a) If you are using a single server deployment, run setup -t start.
b) If you are using a multiple server deployment, complete the steps to start the components of i2
Analyze in Stopping and starting i2 Analyze on page 314.
Run a selection of queries against your deployment server to test the configuration.
Root element
<SolrSchemaTemplate>
<SolrSchemaTemplate> is the root element of the Solr template configuration file. It references
the SolrSchemaTemplate.xsd file and version number. Do not change the value for the
Version attribute.
<SolrSchemaTemplate
xmlns:tns="http://www.i2group.com/Schemas/2021-02-12/SolrSchemaTemplate"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.i2group.com/Schemas/2021-02-12/
SolrSchemaTemplate SolrSchemaTemplate.xsd "
Version="1">
</SolrSchemaTemplate>
i2 Analyze elements
<Definition>
In the Solr index, analyzers are used to examine the text that is stored in the index. Two of the
analyzers that i2 Analyze uses are the free_text and text_facet analyzers. In the template
file, the analyzer is specified in the Analyzer attribute of the <Definition> element.
<Definition Analyzer="free_text">
...
</Definition>
<Definition Analyzer="text_facet">
...
</Definition>
<AnalyzerChain>
The <AnalyzerChain> element is a container for the Solr elements on page 169 in the analyzer
chain. The <AnalyzerChain> element is a child of the <Definition> element.
<Definition Analyzer="free_text">
<AnalyzerChain>
...
</AnalyzerChain>
</Definition>
<PostSynonyms>
The <PostSynonyms> element is a container for the Solr elements on page 169 in the analyzer
chain that are applied after the synonym filter. The <PostSynonyms> element is a child of the
<Definition> element, specified after the <AnalyzerChain> element.
<Definition Analyzer="free_text">
<AnalyzerChain> ... </AnalyzerChain>
<PostSynonyms>
...
</PostSynonyms>
</Definition>
<SynonymsFile>
The Path attribute of the <SynonymsFile> element contains the file name of the synonyms file to
use. The <SynonymsFile> element is at the same level as the <Definition> element.
Solr elements
The <tokenizer> and <filter> are directly converted to Solr. For more information about the
elements, see Understanding Analyzers, Tokenizers, and Filters.
In the Solr template configuration, the <tokenizer> and <filter> elements can be child elements of
the <AnalyzerChain> and <PostSynonyms> elements.
Note: The <tokenizer> element can be specified as a child of the <Definition> element where
Analyzer="free_text".
• If the position of the characters is not known, the user might want to search for "*AB*", which is
invalid in the configuration that is described.
In addition to wildcard characters that are specifically entered as part of Visual Query, several conditions
provide implicit wildcard logic:
• 'Starts with' - Applies an asterisk to the end of the condition. For example, 'Starts with: Fred' is
equivalent to 'Fred*', which could match Fred, Frederick, or Freddie.
• 'Ends with' - Applies an asterisk to the start of the condition. For example, 'Ends with: Fred' is
equivalent to '*Fred', which could match Fred, Wilfred, or Alfred.
• 'Contains' - Applies an asterisk to both the start and the end of the condition. For example, 'Contains:
Fred' is equivalent to '*Fred*', which could match any of the previous terms, but would also include Alfredo.
These conditions follow the same limits as wildcard characters that are entered explicitly.
To change the minimum number of characters that must be included in a search query with a wildcard
character, edit properties in DiscoServerSettingsCommon.properties.
The properties that specify the minimum number of characters for Quick Search are:
WildcardMinCharsWithAsterisk
The minimum number of characters other than asterisks (*) that must be included in a wildcard
query that contains an asterisk.
WildcardMinCharsWithQuestionMark
The minimum number of characters other than question marks (?) and asterisks (*) that must be
included in a wildcard query that contains a question mark. This value should be less than or equal
to the value of the WildcardMinCharsWithAsterisk property.
The properties that specify the minimum number of characters for Visual Query are:
VisualQueryWildcardMinCharsWithAsterisk
The minimum number of characters other than asterisks (*) that must be included in a Visual Query
condition that contains or implies asterisks.
VisualQueryWildcardMinCharsWithQuestionMark
The minimum number of characters other than question marks (?) and asterisks (*) that must be
included in a wildcard query that contains a question mark. This value should be less than or equal
to the value of the VisualQueryWildcardMinCharsWithAsterisk property.
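For example, the following settings require at least three non-wildcard characters in queries that contain asterisks, and at least two in queries that contain question marks. The values shown are illustrative only:
WildcardMinCharsWithAsterisk=3
WildcardMinCharsWithQuestionMark=2
VisualQueryWildcardMinCharsWithAsterisk=3
VisualQueryWildcardMinCharsWithQuestionMark=2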
If you follow this procedure for a deployment that provides high availability, you must complete each
step on every Liberty server in your environment before you move to the next step.
1. Using a text editor, open the DiscoServerSettingsCommon.properties file. You can find this
file in the following location: toolkit\configuration\fragments\opal-services\WEB-INF
\classes.
2. For Quick Search, edit the values of the WildcardMinCharsWithAsterisk and
WildcardMinCharsWithQuestionMark properties.
3. For Visual Query, edit the values of the VisualQueryWildcardMinCharsWithAsterisk and
VisualQueryWildcardMinCharsWithQuestionMark properties.
4. Save and close the file.
Redeploy i2 Analyze to update the application with your changes.
5. In a command prompt, navigate to the toolkit\scripts directory.
6. Stop Liberty:
setup -t stopLiberty
7. Redeploy i2 Analyze:
setup -t deployLiberty
8. Start Liberty:
setup -t startLiberty
Run a selection of Quick Search and Visual Queries that use conditions including wildcard characters.
Continue to change the configuration until you are satisfied with the wildcard behavior.
• Rule Type - Whether the rule should allow or deny a specific type of operation
• Item Type - Which Item Type the rule should apply to
• Property Type - Which Property Types for the specified Item Type that the rule should cover
• Operator - Which Operator Types to apply the rule to
• Date and Time Aspects - Which types of temporal aspect to apply the rule to. For example, day
of the week, or time of the day.
Implicit 'all valid'
If one or more of the above rule components are not specified, the rule will be applied to all the
Visual Query conditions that would be valid.
Rule ordering
Visual Query rules are applied sequentially. If conflicting rules are added, they are handled in order,
so rules that appear later in the file override earlier rules. This ordering allows you to refine your
conditions, for example:
<Deny/>
<!--Allow 'Person' searches to include conditions for
'Date of birth' and 'Gender'
that are exactly or between a specified range -->
<Allow ItemTypeId="ET5" PropertyTypeIds="PER9, PER15"
Operators="EQUAL_TO, BETWEEN"/>
Using these rules, searches for people with specified dates of birth or genders are permitted, but all
other Visual Query conditions are prevented from running.
4. Stop Liberty:
setup -t stopLiberty
5. Redeploy i2 Analyze:
setup -t deployLiberty
6. Start Liberty:
setup -t startLiberty
Run a selection of Visual Queries that use conditions. Continue to change the configuration until you are
satisfied with the behavior of your Visual Query conditions.
Rule Type
There are two types of Visual Query condition rule. The type of restriction determines whether the
operators specified in the rule are allowed or denied.
The type of rule can be either:
• Deny - Specify a rule that prevents an operation
• Allow - Specify a rule that enables an operation
Note: For each rule that you add to the visual-query-configuration.xml, you must specify the
type of rule.
<EntityType Id="ET5"
SemanticTypeId="guid8A586959-9837-47DE-8DBF-BC7031F01545"
Description="Person details"
DisplayName="Person"
Icon="Person (Shaded Shirt)">
Note: You can only specify one type of item per rule.
For example, consider the following property types in the law enforcement schema:
<EntityType Id="ET5"
SemanticTypeId="guid8A586959-9837-47DE-8DBF-BC7031F01545"
Description="Person details"
DisplayName="Person"
Icon="Person (Shaded Shirt)">
...
<PropertyType Position="2"
Mandatory="false"
SemanticTypeId="guidFE45F1C4-B198-4111-8123-F42D2CD6419D"
DisplayName="Date of Birth"
Description=""
LogicalType="DATE"
Id="PER9">
<PossibleValues />
</PropertyType>
<PropertyType Position="3"
Mandatory="false"
SemanticTypeId="guid7548369B-BA9A-4C4B-AEAD-0CB442EAFA27"
DisplayName="Gender"
Description=""
LogicalType="SUGGESTED_FROM"
Id="PER15">
<PossibleValues>
<PossibleValue Description="" Value="<Unknown>"/>
<PossibleValue Description="Male" Value="Male"/>
<PossibleValue Description="Female" Value="Female"/>
</PossibleValues>
...
</EntityType>
Note: This example restriction allows conditions that search for exact ('equal to') values. It also
allows conditions that search between specified values. Because a range of values cannot be
determined for a 'suggested from' property type, conditions that search between a specified range are
enabled only for dates of birth.
DATE_AND_TIME
Apply the rule to conditions that include both a date and time. For example: Searches for people
spotted in a specific location at a specific time.
DATE
Apply the rule to conditions that focus on a date. For example: Searches for people born on a
particular day.
TIME
Apply the rule to conditions that focus on a time. For example: Searches for financial transactions
that regularly occur at a set time.
DAY_OF_MONTH
Apply the rule to conditions that focus on the day of the month. For example: Searches for people
paid on a specific day of the month.
MONTH
Apply the rule to conditions that focus on the month. For example: Searches for people born within a
specific month.
QUARTER
Apply the rule to conditions that focus on the quarter of the year. For example: Searches for
financial results.
YEAR
Apply the rule to conditions that focus on a specific year. For example: Searches for people born
within a specific year.
DAY_OF_WEEK
Apply the rule to conditions that focus on a specific day of the week. For example: Searches for
events that always occur on a Tuesday.
WEEK_OF_YEAR
Apply the rule to conditions that focus on the week of the year as calculated using the ISO week
date system. For example: Searches for weekly sales results.
Operators
You can use operators to specify the types of visual query condition to restrict. If you do not specify an
operator, the restriction applies to all valid operations.
STARTS_WITH
Applies an implicit asterisk to the end of the condition. For example, Starts with: Fred is
equivalent to 'Fred*', which might match Fred, Frederick, or Freddie.
Supported logical types:
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
Note: The server places a limit on the length of the strings that this operator considers. By default,
the limit is 256 characters.
For example:
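A sketch of a rule that uses this operator, reusing the item and property type identifiers from the law enforcement schema example earlier in this section:
<Allow ItemTypeId="ET5" PropertyTypeIds="PER15" Operators="STARTS_WITH"/>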
ENDS_WITH
Applies an implicit asterisk to the start of the condition. For example, Ends with: Fred is
equivalent to '*Fred', which might match Fred, Wilfred, or Alfred.
Supported logical types:
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
Note: The server places a limit on the length of the strings that this operator considers. By default,
the limit is 256 characters.
For example:
CONTAINS
Applies an implicit asterisk to both the start and the end of the condition. For example, Contains:
Fred is equivalent to '*Fred*', which might match any of the previous terms, but would also include Alfredo.
Supported logical types:
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
For example:
WILDCARD_PATTERN
An exact match to the specified term that includes wildcard characters. For example, Wildcard
pattern: Fr?d is equivalent to 'Fr?d', which matches Fred but not Alfredo.
Supported logical types:
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
For example:
NOT_WILDCARD_PATTERN
Excludes an exact match to the specified term that includes wildcard characters. For example, Not
wildcard pattern: Fr?d matches anything except Fr?d.
Supported logical types:
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
For example:
EQUAL_TO
An exact match to the specified term.
Supported logical types:
• BOOLEAN
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
Note: The server places a limit on the length of the strings that the operator considers. By default,
the limit is 256 characters.
For example:
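A sketch that reuses the identifiers from the law enforcement schema example earlier in this section:
<Allow ItemTypeId="ET5" PropertyTypeIds="PER9, PER15" Operators="EQUAL_TO"/>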
NOT_EQUAL_TO
Excludes exact matches to the specified term. For example, Not equal to: Fred matches anything
except Fred.
Supported logical types:
• BOOLEAN
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
Note: The server places a limit on the length of the strings that the operator considers. By default,
the limit is 256 characters.
For example:
GREATER_THAN
Matches values that are higher than a set value.
Supported logical types:
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
For example:
GREATER_THAN_OR_EQUAL_TO
Matches values that are higher than or equal to a set value.
Supported logical types:
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
For example:
LESS_THAN
Matches values that are less than a set value.
Supported logical types:
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
For example:
LESS_THAN_OR_EQUAL_TO
Matches values that are less than or equal to a set value.
Supported logical types:
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
For example:
BETWEEN
Matches values that are within a set range.
Supported logical types:
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
For example:
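A sketch that allows range conditions on the 'Date of Birth' property type from the law enforcement schema example earlier in this section:
<Allow ItemTypeId="ET5" PropertyTypeIds="PER9" Operators="BETWEEN"/>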
IS_SET
Matches properties with any populated value.
Supported logical types:
• BOOLEAN
• DECIMAL
• DATE
• DATE_AND_TIME
• DOUBLE
• INTEGER
• TIME
• SINGLE_LINE_STRING
• SUGGESTED_FROM
• SELECTED_FROM
For example:
IS_NOT_SET
Matches properties that have no populated value.
Procedure
1. Open a command prompt on the server, and navigate to the toolkit\scripts directory of the i2
Analyze toolkit.
2. Run the following command:
setup -t convertClobsToVarchars
3. Restart i2 Analyze:
setup -t restartLiberty
When Liberty starts, the process of updating the multi-line string columns is started. This process works
in the same way as the processing that occurs to the Information Store after an upgrade. For more
information about this processing, and how to monitor its progress, see: Information Store processing.
While the process is running, data cannot be ingested or uploaded into the Information Store.
Note: As part of the script generation process, each table that contains a CLOB column is checked to
ensure the data in that CLOB column will fit inside a VARCHAR(32672) column. This process alone
might take several hours depending on the number of rows in each table and the number of CLOB
columns. If you know that there are no CLOB columns that contain data that is too large, you can use
the -p skipSizeCheck parameter to skip this process.
When you run the command, the SQL scripts are generated in the toolkit\scripts\database
\db2\InfoStore\generated\convert-clobs-to-varchars directory. There is a script for each
item type, for example 0200-0300-clob-to-varchar_ET1.sql. The item type scripts can be run in
parallel.
Use the Db2 command line processor to run the generated SQL scripts.
Data size
When running the convertClobsToVarchars toolkit task, the following message is displayed if the
data in a CLOB column is more than 32,672 bytes (or octets):
The <table name> table contains a CLOB column that is too long to fit into a
VARCHAR column. Conversion of this table has been abandoned.
To remedy the issue, truncate the data in the specified CLOB columns, re-run the
convertClobsToVarchars toolkit task, and restart the server.
When running the SQL scripts, the following messages are displayed in the toolkit log file if the data in a
CLOB column is more than 32,672 bytes (or octets):
Column <column name> in table <table name> has a defined length of <n> which is
greater than the maximum permitted size of <n>. (Skipping table conversion).
Found: <n> values with length greater than 32,672 in table: <table name>, column:
<column name>. (Skipping table conversion).
To remedy the issue, truncate the data in the specified CLOB columns, re-run the
convertClobsToVarchars --scripts toolkit task, and run the scripts again.
Error handling
The process to convert columns from CLOB to VARCHAR is designed to be re-run in the event of a
failure.
When you start the Liberty server, monitor the console.log to ensure that the process completes
successfully. If an error does occur, the process will continue when you restart the server after you
resolve the cause of the error.
If you are running the scripts manually, errors are reported in the Db2 command line processor. You can
rerun the same script after you resolve the cause of the error.
The most common error is that the Db2 server runs out of disk space while populating the replica tables
that are used to retain data while the data type is updated.
HADR
If the i2 Analyze deployment is in HADR mode, you must update the database by using the following
procedure:
1. Back up the primary database
2. Generate the update scripts and run the scripts on the primary database
3. Back up the primary database
4. Disable HADR
5. Restore the primary database onto the secondary server
6. Re-enable HADR
5. Stop Liberty:
setup -t stopLiberty
6. Redeploy i2 Analyze:
setup -t deployLiberty
7. Start Liberty:
setup -t startLiberty
Run a selection of Visual Queries that use conditions that include lists. Continue to change the
configuration until you are satisfied with the behavior of lists in your Visual Query conditions.
6. Stop Liberty:
setup -t stopLiberty
7. Redeploy i2 Analyze:
setup -t deployLiberty
8. Start Liberty:
setup -t startLiberty
After you redeploy i2 Analyze, the Visual Queries that are saved with alerting enabled are run according
to the updated daily schedule.
6. Stop Liberty:
setup -t stopLiberty
7. Redeploy i2 Analyze:
setup -t deployLiberty
8. Start Liberty:
setup -t startLiberty
After you deploy i2 Analyze, the updated schedule is applied to run Visual Queries that are saved with
alerting enabled.
Highlight queries
In the i2 Enterprise Insight Analysis Investigate Add On, users can search for records in the Information
Store, flag records of interest, and find paths in the data between flagged records. However, the web
application does not support charting and expanding the records that users find. Instead, when users
select a record, they see its information together with lists of other records that are related to it. The
appearance and contents of these lists are driven by highlight queries.
The role of highlight queries is to navigate the data in the Information Store on behalf of users, and
to present them with answers to questions that they ask most frequently. For example:
• If the subject record is a car, then your users might be interested in its recent owners or its recent
sightings.
• If the subject is a cellphone, users might want to know which other phones it calls most often - or
who owns the phones that it calls most often.
• If the subject is a person, the key information might be their bank accounts; or the bank accounts
that the person's accounts transact with; or the owners of those transacting accounts.
Highlight queries run whenever the user opens or refreshes the page that contains the details of a
particular record. The first five results from each highlight query appear in highlight panes on the same
page as the subject record. Each highlight pane also contains a button that the user can click to show
more information and more results.
Technical background
When a deployment of i2 Analyze starts up for the first time, the server automatically generates a set
of highlight queries from the Information Store schema. This set contains one highlight query for each
entity type in the schema that appears in at least one list of valid link end types. Each highlight query
searches for entity-link-entity structures in which the types of the target link and entity are the first valid
types for such structures.
For example, imagine a simple schema that defines four entity types (Address, Car, Cellphone, and
Person) and four link types (Calls, Owns, Related, and Resides). According to the link type definitions:
• Cars can be linked to addresses, through Resides links
• Cellphones can be linked to other cellphones, through Calls links
• People can be linked to addresses (through Resides links), to cars and cellphones (through Owns
links), and to other people (through Related links)
In this scenario, with the automatically generated highlight queries in place, a user who selects a
cellphone record sees a highlight pane containing a list of cellphones that the first cellphone has called.
(Calls is the only valid link type for cellphones, and Cellphone is the only valid end type.)
If the user selects a person record, they see a highlight pane containing a list of the cars that the person
owns. (Owns is the first valid link type for people, and Car is the first valid end type for such links.)
The automatic mechanism for generating highlight queries is designed only to be a starting point.
When you write your own highlight queries, you can make them search for more complex relationships
between the subject record and others in the Information Store.
For example, for data that matches the same schema as before, you might enable users to select a
person record and see a list of people who reside at the same address (person-address-person), or to
whom the subject makes cellphone calls (person-cellphone-cellphone-person).
To create a set of custom highlight queries for your data and your users, you write an XML configuration
file that is valid according to an XSD file that is derived from the Information Store schema. Subsequent
topics describe how to start, edit, and publish a highlight queries configuration for your deployment of i2
Analyze.
Using these files as a guide, you can develop your own highlight queries, replace the default file, and
upload it to the i2 Analyze server.
Highlight queries are closely aligned with the Information Store schema. You create and edit the
highlight queries configuration file in your configuration development environment, alongside Visual
Queries and search results filters.
Note: As described in Example highlight queries on page 201, example deployments of i2 Analyze
receive an example set of highlight queries instead of the automatically generated set. To replace
the example set with an automatically generated set, replace the highlight queries configuration file in
toolkit\configuration\live with its equivalent from a configuration\live directory under
the toolkit\examples directory. Then, pick up the procedure below after Step 8.
The first part of the process is to fetch the XSD file that corresponds to the Information Store schema
from the server.
1. Open a web browser and connect to the i2 Analyze server at http://host_name/opal/doc to
view the REST API documentation.
If you are not logged in to the server, you are prompted to do so. The user must be a member of a
group that has the i2:Administrator permission under command access control.
Note: This requirement is in addition to the other requirements on i2 Analyze users, all of whom
must be members of groups that confer an access level for at least one dimension value in each
security dimension.
2. Open the section for the GET v1/admin/highlightqueryconfig/xsd method, and click Try it
out! to request the XSD file from the server.
The file is displayed in the Response Body field.
3. Save the contents of the field to a file named highlight-queries-configuration.xsd in the
toolkit\configuration\live directory.
Next, you need a highlight queries configuration file to edit. To start from scratch, you can use the
highlight-queries-configuration.xml file from the live directory. Alternatively, you can
download the file that defines the automatically generated highlight queries.
4. Open the section for the GET v1/admin/highlightqueryconfig/automatic method, and
click Try it out! to request the XML file from the server.
The file is displayed in the Response Body field.
5. Save the contents of the field to a file named highlight-queries-configuration.xml in the
toolkit\configuration\live directory, replacing the existing file.
Edit the configuration file.
6. If you do not already have one, obtain and configure an XSD-aware XML editor, as described in
Setting up your XSD aware XML editor on page 316.
7. In the XML editor, open the toolkit\configuration\live\highlight-queries-
configuration.xml file.
8. Using the reference and example information, modify the file to define the highlight queries that your
analysts require.
Update the deployment with your changes.
The following method deploys your changes without stopping the server by using a POST request to a
REST endpoint.
To redeploy your changes by using the deployment toolkit only, see Redeploying Liberty on page 316.
You must deploy your changes by using the deployment toolkit if you are in a deployment with high
availability or you are deploying in your production environment.
setup -t updateLiveConfiguration
Warning: The reload method updates the configuration without requiring a server restart,
but any logged-in users are logged out from i2 Analyze when you run it.
The server validates your highlight queries as it loads them and returns any errors in its response.
12. If the configuration is invalid, modify the highlight-queries-configuration.xml file, and
then repeat the process to update the deployment.
Test the new and updated highlight queries.
13. In your web browser, view the highlight panes for a record with a number of connections that test
your highlight queries.
A highlight query path runs from the subject record, through one or more segments, to the result records.
At a minimum, each segment in a highlight query specifies the link type and the entity type of the
records that match it. For example, a highlight query for finding people who live at the same address as
the subject person might have two segments:
• A link of type 'Lives At' and an entity of type 'Address'. If this segment was the last one in the path,
the results would be of type Address, and users might see a list of past and current addresses for the
subject.
• A link of type 'Lives At' and an entity of type 'Person'. The records found in the first segment are used
as inputs to the second segment, so the full query returns a list of all the people who are known to
have lived at the same addresses as the subject.
You can further constrain the results of a highlight query by placing conditions on the link and entity
records in the Information Store that each segment matches. Depending on the schema, you might
decide that you are only interested in addresses that the subject has 'Lived At' in the last three years, or
when the 'Address' is in a particular city or country. Similarly, you might only want to find people who are
female (or, alternatively, male).
As well as controlling which records you want to find with a highlight query, you can control how users
see the query results in highlight panes. Every query has a name and a description that users can read
to understand what the results in front of them mean. For the results themselves, you can specify how to
sort them, and what values to display in the limited space that highlight panes provide.
The values that users see in highlight panes do not have to come from the records that form the results
of the query. In each segment of a highlight query, you can export values from the link and entity
records that form part of a found path. For example, imagine a highlight query for bank account records
whose single segment finds other accounts that are connected through transaction links. You can export
the value of the largest transaction from the segment, and then set it as one of the outputs from the
path.
Finally, as well as exporting property values from the segments of a highlight query, you can export (and
subsequently output) the counts of the records that the highlight query finds at each point on the path.
The diagram for this example shows a subject record on the left connected to five result records on the right, with Count A, Count B, and Count C values annotated at points A, B, and C along each path.
In the diagram, the five records on the right are results of a highlight query whose subject was the
single record on the left. When you present the results to users, you can output or sort by the counts of
records along their paths. The meanings of the counts, and whether they are useful, vary according to
the specifics of the highlight query.
Path elements
• <segments>
• <EntityType>/<LinkType>
• <direction>
• <conditions>
• <exportFields>
• <outputs>
Condition elements
• <PropertyType>
Root element
<highlightQueryConfiguration>
<highlightQueryConfiguration> is the root element of the highlight query configuration file.
In the examples, and in the automatically generated file, it references the highlight-queries-
configuration.xsd file:
<highlightQueryConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-
instance"
xsi:noNamespaceSchemaLocation="highlight-queries-configuration.xsd">
...
</highlightQueryConfiguration>
Top-level elements
<settings>
As well as the highlight queries themselves, the configuration file enables control of some aspects
of highlight query behavior that can affect the performance of the i2 Analyze server. The optional
<settings> element supports the following child elements:
• <maxLinkResultsPerSegment> - The maximum number of link records that a single segment
can match in the Information Store, which defaults to 50000.
• <maxSeedsPerSegment> - The maximum number of entity records that any segment except
the last can match in the Information Store, which defaults to 1000.
• <maxResultsPerQuery> - The maximum number of entity records that the last segment can
match in the Information Store, which defaults to 1000.
• <expireResultsInMinutes> - The length of time for which the server caches the results of a
highlight query, which defaults to 30 minutes.
<highlightQueryConfiguration ...>
<settings>
<maxLinkResultsPerSegment>50000</maxLinkResultsPerSegment>
...
</settings>
...
</highlightQueryConfiguration>
If any of the "max" limits is breached, the query in question returns no results and the user sees an
error message. Other queries that do not breach the limits continue to run normally.
<highlightQueryGroups>
The only other element permitted as a direct child of <highlightQueryConfiguration> is
<highlightQueryGroups>, which contains all the highlight queries for the deployment, grouped
by the entity types of the subject records to which they apply.
<highlightQueryConfiguration ...>
<settings>...</settings>
<highlightQueryGroups>
...
</highlightQueryGroups>
</highlightQueryConfiguration>
<highlightQueryGroup>
Each highlight query group contains the highlight queries for subject records of a particular type.
The itemType attribute of the <highlightQueryGroup> element indicates which type the group
is for. The valid values for the itemType attribute are defined in the XSD file; they are based on the
display names of entity types in the Information Store schema.
<highlightQueryGroups>
<highlightQueryGroup itemType="Vehicle-I">
...
</highlightQueryGroup>
...
</highlightQueryGroups>
<highlightQuery>
A <highlightQueryGroup> element can have multiple child <highlightQuery> elements.
Each <highlightQuery> element defines a query to run against the Information Store, and
controls some aspects of the highlight pane that displays the query results in the user interface.
The <highlightQuery> element supports the title attribute whose value is displayed as the
title of the highlight pane. It also supports the automatic attribute that determines whether the
query runs as soon as a user views the subject record. You can specify that users must run a time-
consuming query manually (by clicking a button) by setting the value of this optional attribute to
false.
<description>
In the <description> child of a <highlightQuery> element, you can explain the behavior
of the query to users. The contents of this element are displayed to users when they click the
information icon ( i ) in a highlight pane heading. At a minimum, IBM recommends that you describe
the relationship of the results to the subject record, and the order that you chose to present them in.
<path>
The <path> child of a <highlightQuery> element contains <segments> and <outputs>
elements that together provide the logic for the query. Segments are responsible for the query
structure, while outputs determine how users see the query results in highlight panes.
<sortBy>
The last permitted child of a <highlightQuery> element is <sortBy>. This optional (but
important) element specifies how values that were discovered during execution of the query
can change the results that users see. For example, depending on your investigation, you might
sort a list of people who are associated with a vehicle by their family names or by their dates of
association.
The <field> children of <sortBy> are processed in order of appearance. In each one, the id
attribute contains the identifier of a field that was exported from the path. The order attribute
determines whether results appear in ascending (ASC) or descending (DESC) order of the values of
the identified field.
<highlightQueryGroup itemType="Vehicle-I">
<highlightQuery title="People with access" automatic="true">
<description>People who have access to this vehicle,
sorted by family name.</description>
<path>...</path>
<sortBy>
<field id="familyName" order="ASC"/>
</sortBy>
</highlightQuery>
...
</highlightQueryGroup>
Path elements
<segments>
<path> elements always have exactly two children: <segments> and <outputs>. <segments>
always contains at least one <segment> element that defines a part of the query structure.
<path>
<segments>
<segment>
...
</segment>
...
</segments>
<outputs>
...
</outputs>
</path>
<segment>
Each <segment> element also has exactly two children that specify the link and entity types that
records must have in order to match this part of the query. Like the legal values for the itemType
attribute of the <highlightQueryGroup> element, the names of these elements are not fixed but
depend upon the Information Store schema for the deployment. Furthermore, the link type must be
valid for the entity type that you specified in <highlightQueryGroup>, and the entity type must
be valid for that link type. For example, a segment for people who have access to vehicles:
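A minimal sketch of such a segment, using the element names that appear in the fuller example later in this topic:
<segment>
<Access_To-I/>
<Person-I/>
</segment>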
It is valid to use <AnyLinkType/> in place of a specific link type from the schema. It is also valid
to use <AnyEntityType/> in place of a specific entity type, in any segment except the last one in
the path.
<EntityType>/<LinkType>
The previous example contains a functionally complete segment definition. In a highlight query, the
segment finds all the people in the Information Store that are connected to the inputs by 'Access To'
links. If the segment is the first in the path, its single input is the subject record. Otherwise, its inputs
are the records that the previous segment found.
The "entity type" and "link type" elements that make up a segment support three child elements, all
of which are optional. To place further constraints on the records that the segment matches, you
can add a <direction> or a <conditions> element. If you want to use some values from the
records that you find at this stage of the query to provide context for the final results, you can add
an <exportFields> element.
<direction>
For link type elements only, you can constrain matching records according to whether their direction
is towards or away from the entity record that forms the input to the segment. The <direction>
child of a link type element always contains a single <value> child element whose text content is
either INCOMING or OUTGOING.
A constraint of this type is often useful in scenarios such as telephone call analysis or financial
investigations, where the direction of a link is key to determining its relevance.
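For example, a sketch that constrains a link type element to outgoing links only. The Access_To-I element name is reused from the example in this topic purely for illustration:
<Access_To-I>
<direction>
<value>OUTGOING</value>
</direction>
</Access_To-I>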
<conditions>
The <conditions> child of an entity type or link type element can contain any number of child
elements that correspond to property types of the entity type or link type in question. For each
property type, you can place a requirement on the values that matching records must have.
Note: At this version of i2 Analyze, highlight queries do not support conditions on property types
with geospatial or multi-line string logical types.
<exportFields>
You might want to export values from a segment of a highlight query for two reasons. First, to
display those values to users alongside the results in the highlight pane for the query. Second, to
use the values as criteria by which to sort the results of the query.
For a particular highlight query result, the values that you can export are the values of properties
from the segments that contributed to that result. For example, imagine a query that finds bank
account records that are connected through transaction links. You want to display the transaction
dates from those links alongside the found accounts. In this example, every account is connected to
the subject record through different links from every other account. If you export the date from the
link type, each result sees a date value that comes from its particular links.
<segments>
<segment>
<Access_To-I>
<direction>...</direction>
<conditions>...</conditions>
<exportFields>
...
</exportFields>
</Access_To-I>
<Person-I/>
</segment>
...
The <exportFields> element supports any number of child elements of two types:
<propertyField> and <countField>.
<propertyField>
The <propertyField> child of the <exportFields> element has three mandatory attributes:
• propertyType - The property type whose value you want to export from the segment. Valid
values for this attribute depend on the current entity or link type and are defined in the XSD file.
These values are based on the display names of property types in the Information Store schema.
• aggregatingFunction - For any particular result, it is possible for a segment to match
multiple records. The aggregating function specifies what property value to use when that
situation arises. Set the attribute value to MAX or MIN to select the highest or lowest property
value from such a set. Alternatively, set the value to SUM to add the property values from all the
matching records together.
• id - An identifier for the exported value that you can use to refer to it from elsewhere in the
highlight query structure.
<Access_To-I>
<direction>...</direction>
<conditions>...</conditions>
<exportFields>
<propertyField id="typeOfUse" propertyType="type_of_use-P"
aggregatingFunction="MAX"/>
...
</exportFields>
</Access_To-I>
<countField>
As well as using <propertyField> to export the highest, lowest, or total value of a property from
a set of records, you can use <countField> to export the number of records in that set. Returning
to the example of linked accounts, it might be useful to tell users how many transactions took place
between the subject and each result, or to sort the results by that number.
The <countField> element has a single mandatory attribute: id, which behaves exactly as it
does in <propertyField>.
<outputs>
After the <segments> in the path are the <outputs>, which control the values that users see
with the query results in a highlight pane. The <outputs> element supports any number of child
<field> elements, which have two attributes:
• id - The identifier of an export field, as previously specified in a <propertyField> or
<countField> element.
• label - A value that the user interface uses as a column header for the field.
<path>
<segments>
<segment>
...
</segment>
...
</segments>
<outputs>
<field id="typeOfUse" label="Type of use" />
...
</outputs>
</path>
Each highlight pane has space for up to two output fields. If you specify more than two <field>
elements, users see them only when they click Show more in the highlight pane to display more
result information.
Condition elements
<PropertyType>
Inside an <EntityType> or <LinkType> element, the <conditions> element can contain any
number of child elements whose names are based on property types from the Information Store
schema. The child elements restrict the records in the Information Store that match the current
segment. The permitted contents of the condition depend on the logical type of the property type.
<Access_To-I>
<conditions>
<type_of_use-P>
...
</type_of_use-P>
...
</conditions>
...
</Access_To-I>
Some logical types - geospatial values, multiple-line strings, documents, pictures, and XML data -
are not supported in highlight query conditions. According to the generated XSD file, property types
with unsupported logical types are not valid children of <conditions> elements. For property
types with supported logical types, this version of i2 Analyze supports conditions with four kinds of
operators:
• For all valid property types, you can use <isSet> and <isNotSet> elements to specify that
matching records must (or explicitly must not) have a value for the property with that type.
<conditions>
<type_of_use-P>
<isSet/>
</type_of_use-P>
...
</conditions>
• For property types whose properties have string values, you can use <equalTo> and
<notEqualTo> elements to specify that matching records must have a value for the property
that exactly matches (or explicitly does not match) values that you provide.
<conditions>
<type_of_use-P>
<notEqualTo>
<value>Passenger</value>
...
</notEqualTo>
</type_of_use-P>
...
</conditions>
• For property types whose properties have numeric values, you can use <greaterThan>,
<lessThan>, <greaterThanOrEqualTo>, and <lessThanOrEqualTo> elements to specify
that matching records must have a property value with the specified relationship to a value that
you provide.
<conditions>
...
<transaction_value-P>
<greaterThan>
<value>50000</value>
</greaterThan>
</transaction_value-P>
...
</conditions>
• For property types whose properties have date and time values, you can use an
<inThePreceding> element to specify that matching records must have a value for the
property that occurred within a certain period before the current date.
<conditions>
<type_of_use-P>
...
</type_of_use-P>
<end_date_and_time-P>
<inThePreceding unit="WEEKS">
<value>6</value>
</inThePreceding>
</end_date_and_time-P>
...
</conditions>
In <inThePreceding> elements, the <value> must be an integer, while the unit attribute
can be one of WEEKS, MONTHS, or YEARS.
If you specify multiple property types in the same <conditions> block, their effects are ANDed
together. For example, if your conditions are "given name equal to 'John'" and "family name equal to
'Doe'", then only records for which both conditions are true appear in the results.
Conversely, if you use multiple operators against the same property type, their effects are ORed
together. For example, you might decide that a property must either be not set or have one of a
handful of values. However, you cannot use the same operator more than once, and you cannot
use more than one of the numeric operators at the same time. Attempting to do either results in an
invalid configuration file.
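As a hedged sketch of that "either not set or one of a handful of values" case (the value 'Driver' is a placeholder, not necessarily a value from the example schema), the two operators sit side by side under the same property type element:
<conditions>
  <type_of_use-P>
    <isNotSet/>
    <equalTo>
      <value>Passenger</value>
      <value>Driver</value>
    </equalTo>
  </type_of_use-P>
</conditions>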
A worked example
The following listing is an abridged version of one of the more complicated highlight queries in the
toolkit example. The parts that are elided here are quoted in full in the explanation that follows:
<highlightQuery ... automatic="false">
  <path>
    <segments>
      ...
      <segment>
        <Transaction-I>
          <exportFields>
            <countField id="transactioncount2"/>
          </exportFields>
        </Transaction-I>
        <Account-I/>
      </segment>
      <segment>
        <AnyLinkType/>
        <Organization-I/>
      </segment>
    </segments>
    <outputs>
      ...
    </outputs>
  </path>
</highlightQuery>
This highlight query is for subject records of type 'Person'. The results that it finds are of type
'Organization', because that is the entity type in the final segment. There are four segments in all, which
has the potential to make the query resource-intensive. The enclosing <highlightQuery> element
has its automatic attribute set to false so that the query only runs when a user requests it.
The first segment in the query finds all the 'Account' records in the Information Store to which the
subject is connected through an 'Access To' link. There are no conditions on either part of the segment,
and no values are exported for use elsewhere.
The second segment takes all the accounts to which the subject has access, and finds accounts that
they have exchanged transactions with. However, it does not find all such accounts, because of the
condition on the link type:
<conditions>
<transaction_currency-P>
<equalTo>
<value>US Dollars</value>
</equalTo>
</transaction_currency-P>
</conditions>
The condition restricts the accounts that this segment finds to those where transactions have taken
place in US dollars. ("US Dollars" is configured in the Information Store schema as a possible value
for properties of this type.) The next part of the segment then exports information about these dollar
transactions for later use:
<exportFields>
<propertyField propertyType="transaction_value-P"
aggregatingFunction="MAX"
id="value"/>
<propertyField propertyType="date_and_time-P" aggregatingFunction="MAX"
id="dateTime"/>
<countField id="transactioncount" />
</exportFields>
For each of the eventual results of the highlight query, these lines record the value of the largest
transaction at this location in the path, and the date of the most recent transaction. These values might
come from different transactions if the count - which we also export here - is greater than one.
The inputs to the third segment, then, are all the accounts that have transacted in dollars with accounts
to which the subject has access. The third segment goes on to find any accounts in the Information
Store that have exchanged transactions with the inputs. It also exports the count of transactions at this
location in the path:
<exportFields>
<countField id="transactioncount2"/>
</exportFields>
The final segment takes these accounts that are twice-removed from accounts to which the subject has
access, and finds organizations in the Information Store that are connected to them in any way. To do
so, it makes use of an <AnyLinkType> element:
<segment>
<AnyLinkType/>
<Organization-I/>
</segment>
It is worth recounting what this means from an investigative point of view. For an organization to be
found by this query, it must be linked to an account that has transacted with an account that has also
exchanged dollar transactions with an account to which the subject has access. In simpler terms, it
might be that accounts belonging to the person and the found organizations are exchanging money
through third accounts. The query certainly is not conclusive, but the results might become targets for
further investigation.
When the results of the query are presented to users, they include the values that were exported from
the second and third segments:
<outputs>
<field label="# txns Per -- Inter" id="transactioncount"/>
<field label="# txns Inter -- Org" id="transactioncount2"/>
<field label="Largest txn value" id="value"/>
<field label="Most recent txn date" id="dateTime"/>
</outputs>
These lines show the challenges of displaying useful information in the relatively confined space of a
highlight pane. In fact, only the first two fields appear in the pane; the others are displayed when the
user clicks Show more to display more results (and more property values from those results). Ideally,
the labels work in harmony with the <description> of the query that you write to explain the results to
your users.
The final part of the highlight query specifies how the application should sort the results. In general, you
can request multiple sorts here that are applied in sequence. In this instance, the single criterion is the
highest transaction value from the second segment. Setting the order to DESC means that numbers
run from high to low, which is the more common requirement. The opposite is true when you sort on text
values, and setting order to ASC places the results in alphabetical sequence.
If a property type usually contains values that are unique to each record (license plates, social security numbers), then by default the user sees a separate, one-record facet for
every value in the search results. If you configure that property type not to appear, then the system has
fewer facets to calculate, and it can display more useful facets from different property types instead.
For records of each entity and link type, the results configuration file defines which property types and
metadata criteria to use for filtering in Quick Search, external search, and Visual Query results views. All
these views display the same set of facets.
The deployment toolkit contains a results configuration file for each of the example schemas in
the examples\schemas directory, apart from the Chart Store schema. Their names all end in
-results.configuration.xml. If your system uses a modified version of one of the example
schemas, you can modify the appropriate results configuration file.
If you decide to write your own file, the entity and link types that appear must correspond to entity and
link types from the deployed schemas. Examine the schemas and the data in your system, and decide
which property types to display with facets in the results view. You can also decide which metadata
criteria to use for the same purpose.
If you do not specify a results configuration file, all of the property types and metadata criteria that can
be displayed with facets, for records of all entity and link types, are displayed in the results view in
schema order.
If you follow this procedure in a deployment that provides high availability, you must complete each step
on every Liberty server in your environment before you move to the next step.
For more information about the results configuration file and the changes that you can make, see
Understanding the results configuration file on page 206.
1. Using an XML editor, open the results configuration file that you want to modify.
2. Add, modify, or remove any <ItemTypeFacet> elements and appropriate child
<PropertyTypeFacet> and <MetadataFacet> elements for your deployment.
Note: Property types that have the following logical types cannot be used to filter search results:
• GEOSPATIAL
• MULTIPLE_LINE_STRING
3. Save and close the file.
Note: Ensure that your modified file is stored in the toolkit\configuration\fragments
\common\WEB-INF\classes directory.
4. If you have created the file or changed its name, you must update the name of the file that the
deployment uses.
a) Using a text editor, open the DiscoServerSettingsCommon.properties file in the toolkit
\configuration\fragments\opal-services\WEB-INF\classes directory.
b) Ensure that the value of the ResultsConfigurationResource property is set to the name of
your results configuration file, and then save and close the file.
Tip: If you do not want to configure search result filtering, clear the value of the
ResultsConfigurationResource property.
Redeploy i2 Analyze to update the application with your changes.
5. In a command prompt, navigate to the toolkit\scripts directory.
6. Stop Liberty:
setup -t stopLiberty
7. Redeploy Liberty:
setup -t deployLiberty
8. Start Liberty:
setup -t startLiberty
Run a selection of queries against an i2 Analyze server. Continue to change the configuration until you
are satisfied with the types of filter that are available.
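For reference, a minimal facet definition looks like the following sketch; the TypeId value is a placeholder for an identifier from your own schema:
<ItemTypeFacet TypeId="ET5" Subfacets="All"/>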
Here, the <ItemTypeFacet> element is used to declare which item type you are defining property
types and metadata criteria for. The value of the TypeId attribute specifies the item type. This value
corresponds to the Id value of an <EntityType> or <LinkType> in the schema.
Important: For item types that are defined in gateway or connector schemas, you must augment the
item type identifier with the short name of its schema. If the type in the previous example was defined in
a gateway schema with the short name External, then the <ItemTypeFacet> element changes:
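A hedged sketch of that change, using the same placeholder identifier:
<ItemTypeFacet TypeId="ET5-external" Subfacets="All"/>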
In these augmented identifiers, the short name is always in lower-case letters, and any whitespace or
non-alphanumeric characters are converted to single hyphens. Furthermore, the short name is always
separated from the original identifier with a hyphen.
Property types are specified in child <PropertyTypeFacet> elements. The value of the TypeId
attribute specifies the property type. This value corresponds to the Id value of a <PropertyType> in
the i2 Analyze schema.
Metadata criteria are specified in child <MetadataFacet> elements. The value of the Criterion
attribute specifies the metadata criterion and can have the following values:
FirstUploaded
The date that the record was first uploaded to the Information Store.
FirstUploadedBy
The display name of the user who first uploaded the record.
LastUploaded
The date that the record was last uploaded to the Information Store.
LastUploadedBy
The display name of the user who last uploaded the record.
NotesCreatedBy
The display name of a user who added a note to the record.
SourceRefSourceName
The name of a source that appears in the source references for the record.
SourceRefSourceType
The type of a source that appears in the source references for the record.
IngestionDataSourceName
The data source name that the record was ingested from.
If the record was uploaded from Analyst's Notebook Premium, the IngestionDataSourceName is
automatically set to ANALYST.
StorageLocation
For a record that was found through an external search, the location where the record is stored. At
this version of the software, the values that users might see are Information Store and Other.
Note: Any child <MetadataFacet> elements must be specified after all of the child
<PropertyTypeFacet> elements within an <ItemTypeFacet> element.
In the <ItemTypeFacet> element, the value of the Subfacets attribute defines the method for
specifying the property types and metadata criteria.
The Subfacets attribute can have the following values:
All
All property types of this item type and metadata criteria are available as filterable options for the
results. This behavior is the default if an item type is not added to the results configuration file. For
example:
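A minimal sketch, assuming that the TypeId value is the identifier of the 'Person' entity type in your copy of the law enforcement schema:
<ItemTypeFacet TypeId="ET5" Subfacets="All"/>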
Declaring this fragment while working with the law enforcement schema will allow you to filter by
'Person' and by all the available properties and metadata.
Note: You must not specify any child elements when the value of the Subfacets attribute is All.
IncludeSpecific
Specific property types and metadata criteria for this item type are displayed in the filtering lists. For
example:
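A minimal sketch, assuming placeholder identifiers for the 'Person' entity type and the 'First (Given) Name' and 'Family Name' property types:
<ItemTypeFacet TypeId="ET5" Subfacets="IncludeSpecific">
  <PropertyTypeFacet TypeId="PER4"/>
  <PropertyTypeFacet TypeId="PER6"/>
  <MetadataFacet Criterion="NotesCreatedBy"/>
</ItemTypeFacet>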
Declaring this fragment while working with the law enforcement schema will allow you to filter by
'Person' and by 'First (Given) Name', 'Family Name', and 'NotesCreatedBy'.
Note: You must specify at least one child <PropertyTypeFacet> or <MetadataFacet>
element when the value of the Subfacets attribute is IncludeSpecific.
ExcludeSpecific
Specific property types and metadata criteria for this item type are excluded from the filtering lists.
For example:
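A minimal sketch, assuming placeholder identifiers for the 'Vehicle' entity type and the 'License Plate Number' property type:
<ItemTypeFacet TypeId="ET3" Subfacets="ExcludeSpecific">
  <PropertyTypeFacet TypeId="VEH2"/>
  <MetadataFacet Criterion="FirstUploaded"/>
</ItemTypeFacet>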
Declaring this fragment while working with the law enforcement schema will allow you to filter by
'Vehicle', but not by 'License Plate Number' or 'FirstUploaded'.
Note: You must specify at least one child <PropertyTypeFacet> or <MetadataFacet>
element when the value of the Subfacets attribute is ExcludeSpecific.
None
No property types of this item type and metadata criteria are available for filtering. For example:
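A minimal sketch, again assuming a placeholder identifier for the 'Person' entity type:
<ItemTypeFacet TypeId="ET5" Subfacets="None"/>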
Declaring this fragment while working with the law enforcement schema will allow you to filter by
'Person' but not by any of the properties of the Person type (such as eye color), nor any of the
metadata criteria.
Note: You must not specify any child elements when the value of the Subfacets attribute is None.
Additionally, you can disable all property type and metadata criteria for faceting by using the
PropertyTypeFacetsEnabled and MetadataFacetsEnabled attributes of the <Facets>
element. By default, both property type and metadata criteria are enabled for faceting. To disable, set
the attribute values to false.
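For illustration, a hedged sketch of the root element with both kinds of faceting disabled (the namespace declarations from the generated XSD are omitted here):
<Facets PropertyTypeFacetsEnabled="false" MetadataFacetsEnabled="false">
  ...
</Facets>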
ExpandMaxResultsSoftLimit=100
4. Stop Liberty:
setup -t stopLiberty
5. Redeploy Liberty:
setup -t deployLiberty
6. Start Liberty:
setup -t startLiberty
After you redeploy i2 Analyze, run a selection of Expand queries to test the behavior of the deployment.
Configuring matching
In a deployment of i2 Analyze that contains data that originates from multiple sources, the likelihood
of multiple records that represent the same real-world object or relationship increases. To enable
analysts to identify and reduce the number of matching records, you can activate the matching features
in Analyst's Notebook Premium by creating custom match rules.
i2 Analyze can be presented with data from multiple sources, through multiple routes. The mechanisms
that identify matching records depend on how the data is presented to i2 Analyze:
• For records that are on a chart, system matching can compare them with records in the Information
Store, and Find Matching Records can compare records on the chart with each other.
• When data is presented through Analyst's Notebook Premium import, or retrieved by connectors, or
created by analysts, system matching can compare incoming records with records on the chart and
in the Information Store.
• During the ingestion process, the incoming records are compared with the records in the Information
Store. For more information about identifying matches in this way, see Information Store data
correlation.
Important:
The scope of match rules is not the same for link records as it is for entity records. Rules for matching
entity records apply universally, but rules for matching link records apply only to links between the same
pair of ends. Link records that do not connect the same ends are never a match for each other.
This behavior also means that rules for matching link records behave differently in system matching and
Find Matching Records:
• In system matching, link records can match only when they are between the same pair of entity
records.
• In Find Matching Records, link records can match when they are between any two records in a pair
of entity items on the chart surface.
If the pair of entity items on the chart contain a large number of united records, some combinations
of those records are not searched for matching links. By default, the number of link end record
combinations that are searched is 100. You can change this value, but the link matching process might
take longer if more combinations are searched. For more information about changing the maximum
number of combinations that are searched, see The DiscoServerSettingsCommon.properties
file.
setup -t updateMatchRules
setup -t switchStandbyMatchIndexToLive
This toolkit task switches the standby match index, which contains your new rules, to become the live
index. The match index that was previously live becomes the standby match index.
Connect to the deployment again and test your system rules with representative data to ensure that they
meet your requirements. In Analyst's Notebook Premium, you must log out and log in again to see your
changes.
If you want to modify the system rules again without using Analyst's Notebook Premium, you can modify
the file in an XML editor. For more information about the structure of the match rules file, see Match
rules syntax on page 215.
setup -t updateLiveConfiguration
The updateLiveConfiguration toolkit task updates the server with every file in the toolkit
\configuration\live directory.
7. Update the running application by using the reload endpoint:
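As a hedged sketch only - the endpoint path and the use of basic authentication here are assumptions that depend on your deployment, which might require form-based login instead - a reload request for a user with the i2:Administrator permission might look like this:
curl -i -X POST -u admin_user:password "https://host_name/opal/api/v1/admin/config/reload"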
To reload the server like this, ensure that you can use the admin endpoints. For more information,
see Using the admin endpoints on page 316.
Warning: The reload method updates the configuration without requiring a server restart,
but any logged-in users are logged out from i2 Analyze when you run it.
Connect to the deployment again and test your system rules with representative data to ensure that they
meet your requirements. In Analyst's Notebook Premium, you must log out and log in again to see your
changes.
If you want to modify the system rules again without using Analyst's Notebook Premium, you can modify
the file in an XML editor. For more information about the structure of the match rules file, see Match
rules syntax on page 215.
Also, to reload the server through the reload endpoint, ensure that you have a command-line tool
such as postman or curl, and a user that has the i2:Administrator command access control
permission.
Note: For more information about using the admin endpoints, see Using the admin endpoints on page
316.
In your configuration development environment, use Analyst's Notebook Premium to create your match
rules, which are saved to an XML file that you can move to the i2 Analyze server. (The match rules files
that Analyst's Notebook Premium creates are compatible with both Find Matching Records and system
matching, but IBM recommends that you configure the features separately.)
Alternatively, you can create the server rules by editing the match rules XML file manually. If you create
the match rules this way, you must still complete the following steps that describe how to configure and
deploy the file on the i2 Analyze server.
1. Connect to your server in Analyst's Notebook Premium to load the i2 Analyze schema, and then
create your match rules. As you do so, test them with representative data on the chart.
For more information about how to create match rules, see Find matching records in the Analyst's
Notebook documentation.
After you create your rules, Analyst's Notebook Premium saves a file named local-fmr-match-
rules.xml to the %localappdata%\i2\Enterprise Insight Analysis\Match Rules
\<data_source_id> directory on the local workstation. The file is saved in the directory that was
modified most recently.
Next, you must place the match rules file on the i2 Analyze server and update the deployment with your
changes.
2. Move the local-fmr-match-rules.xml file to the toolkit\configuration\live directory
in the i2 Analyze deployment toolkit.
3. Delete the fmr-match-rules.xml file from the toolkit\configuration\live directory.
4. Rename local-fmr-match-rules.xml to fmr-match-rules.xml.
Update the deployment with your changes.
The following method deploys your changes without stopping the server by using a POST request to a
REST endpoint.
To redeploy your changes by using the deployment toolkit only, see Redeploying Liberty on page 316.
You must deploy your changes by using the deployment toolkit if you are in a deployment with high
availability or you are deploying in your production environment.
5. In a command prompt, navigate to the toolkit\scripts directory.
6. Update the configuration on the server:
setup -t updateLiveConfiguration
The updateLiveConfiguration toolkit task updates the server with every file in the toolkit
\configuration\live directory.
7. Update the running application by using the reload endpoint.
To reload the server like this, ensure that you can use the admin endpoints. For more information,
see Using the admin endpoints on page 316.
Warning: The reload method updates the configuration without requiring a server restart,
but any logged-in users are logged out from i2 Analyze when you run it.
Connect to the deployment again and test that your server rules are visible in the Find Matching
Records feature. In Analyst's Notebook Premium, you must log out and log in again to see your
changes.
Note: If you copied the local-fmr-match-rules.xml file instead of moving it, each rule is
duplicated in Analyst's Notebook Premium because the local and server rules are the same.
If you continue to develop your server rules by using this process, the cached-fmr-match-
rules.xml file in the Match Rules directory contains the rules that are currently deployed on the
server. Every time that Analyst's Notebook Premium connects, the cached rules file is overwritten with
the latest version from the server.
If you want to modify the server rules again without using Analyst's Notebook Premium, you can modify
the file in an XML editor. Any changes that you make are validated when you start i2 Analyze. For more
information about the structure of the match rules file, see Match rules syntax on page 215.
The root element of a match rules file is a <matchRules> element from the defined namespace. For
example:
<tns:matchRules
xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation=
"http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
version="2"
enableSourceIdentifierMatching="true">
...
</tns:matchRules>
enableSourceIdentifierMatching
For a deployment that contains the Information Store and the i2 Connect gateway, controls whether
system matching uses source identifiers to determine whether records match each other, regardless of
whether they have property matches.
When this attribute is false or absent, different users can cause duplication by retrieving the same
records from external sources, editing them, and uploading them to the Information Store. When it is
true, such records are detected by system matching.
version
The version of the match rules file, which must be 2 at this release.
matchRule
Inside the root element, each <matchRule> element defines a match rule for records of a particular
entity or link type. The <matchRule> element has the following attributes:
id
An identifier for the match rule, which must be unique within the match rules file.
itemTypeId
The identifier for the entity or link type to which the rule applies, as defined in the i2 Analyze schema.
Important: For item types that are defined in gateway or connector schemas, i2 Analyze appends the
schema short name to the item type identifier. For example, if the gateway schema defines an item type
with the identifier ET5, then the identifier to use here might be ET5-external.
In the modified type identifier, the short name is always in lower-case letters and separated from the
original identifier with a hyphen. Any whitespace or non-alphanumeric characters in the short name are
converted to single hyphens.
When you create or edit match rules through Analyst's Notebook Premium, the application handles
these modifications to item type identifiers for you. When you edit the XML file yourself, you are
responsible for specifying item type identifiers correctly.
displayName
The name of the rule, which is displayed to analysts in Analyst's Notebook Premium in Find Matching
Records.
description
The description of the rule, which is displayed to analysts in Analyst's Notebook Premium in Find
Matching Records.
active
Defines whether the rule is active. A value of true means that the rule is active; a value of false means
that the rule is not active.
linkDirectionOperator
Determines whether two links must have the same direction in order to match. Mandatory for link type
rules, where it must have the value EXACT_MATCH or ANY. Must be absent or null for entity type rules.
version
In earlier versions, the version attribute was mandatory on <matchRule> elements. At this release, the
per-rule version is optional, and any value is ignored.
<matchRule
id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
itemTypeId="ET3"
displayName="Match vehicles"
description="Match vehicles with the same license plate when
either the registered state or region are the same."
active="true">
...
</matchRule>
<matchRule
id="8aa5f6f4-a1a8-41de-b5c5-8701e44bcde7"
itemTypeId="LAS1"
displayName="Match duplicate links"
description="Match link records between the same pair of entity
records when the links are in the same direction."
active="true"
linkDirectionOperator="EXACT_MATCH">
...
</matchRule>
To specify the behavior of a match rule, you can use the following children of the <matchRule>
element:
matchAll
The <matchAll> element specifies that all of the conditions within it must be met.
matchAny
The <matchAny> element specifies that at least one of the conditions within it must be met.
A <matchRule> element must have both the <matchAll> and <matchAny> elements. For example:
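A minimal sketch of that structure, with the conditions elided:
<matchRule id="..." itemTypeId="..." displayName="..." active="true">
  <matchAll>
    ...
  </matchAll>
  <matchAny>
    ...
  </matchAny>
</matchRule>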
condition
All match rules must contain the <matchAll> and <matchAny> elements, although both can be empty
for link type rules. It is valid to create a rule that makes all link records of the same type between the
same pair of entity records match each other, regardless of any other considerations.
All entity type rules must contain at least one condition. Many link type rules contain conditions too.
Each condition defines a comparison that takes place between values in different records, and specifies
when those values are considered to match. Conditions can be refined by using operators, values, and
normalizations.
To specify the conditions of a match rule, you use the <condition> element that can be a child
of the <matchAll> and <matchAny> elements. Each <condition> element has a mandatory
propertyTypeId attribute, which is the identifier for the property type to which the condition applies,
as defined in the i2 Analyze schema.
For example:
<condition propertyTypeId="VEH2">
...
</condition>
All conditions contain an <operator> element, most of them contain a <value> element, and many
contain <normalizations>.
operator
The <operator> element defines the type of comparison between the property values in different
records, or between the property value and a static value specified within the rule. The possible
operators are:
EXACT_MATCH
The values that are compared must match each other exactly.
EXACT_MATCH_START
A specified number of characters at the start of string values must match each other exactly.
EXACT_MATCH_END
A specified number of characters at the end of string values must match each other exactly.
EQUAL_TO
The property values must match each other, and the specified <value>.
For example:
<condition propertyTypeId="VEH2">
<operator>EXACT_MATCH</operator>
...
</condition>
For more information about the operators that you can use, depending on the logical type of the
property, see Table 1: Operators and normalizations for each schema logical type on page 220.
value
The contents of the <value> element affect the behavior of the <operator> of the condition.
Different operators require different value types.
• If the operator is EXACT_MATCH_START or EXACT_MATCH_END, the value is the number of
characters to compare:
<operator>EXACT_MATCH_START</operator>
<value xsi:type="xsd:int">3</value>
• If the operator is EQUAL_TO, the value is a string to compare with the property value:
<operator>EQUAL_TO</operator>
<value xsi:type="xsd:string">red</value>
• If the operator is EXACT_MATCH, no <value> element is required:
<operator>EXACT_MATCH</operator>
normalizations
The <normalizations> element contains child <normalization> elements that define how
property values are compared with each other (and sometimes with the contents of the <value>
element). The possible values for the <normalization> element are:
IGNORE_CASE
Ignores case during the comparison ('a' matches 'A')
IGNORE_DIACRITICS
Ignores diacritic marks on characters ('Ã' matches 'A')
IGNORE_WHITESPACE_BETWEEN
Ignores whitespace between characters ('a a' matches 'aa')
IGNORE_WHITESPACE_AROUND
Ignores whitespace around a string (' a ' matches 'a')
IGNORE_NUMERIC
Ignores numeric characters ('a50' matches 'a')
IGNORE_ALPHABETIC
Ignores alphabetic characters ('a50' matches '50')
IGNORE_NONALPHANUMERIC
Ignores non-alphanumeric characters ('a-a' matches 'aa')
SIMPLIFY_LIGATURES
Simplifies ligatures ('æ' matches 'ae')
For example, you might have the following normalizations for an EXACT_MATCH operator:
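A sketch of such a fragment, assuming the normalizations that are needed to make the two values in the next sentence equivalent:
<operator>EXACT_MATCH</operator>
<normalizations>
  <normalization>IGNORE_CASE</normalization>
  <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
  <normalization>IGNORE_NONALPHANUMERIC</normalization>
</normalizations>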
In this example, the values "b m w xdrive" and "BMW x-drive" are considered a match.
The operators and normalizations that you can specify for a condition depend on the logical type of the
property type to which the condition applies. The following table shows the operators and normalizations
that you can use for each logical type:
Property types that have the following logical types cannot be used in match rules:
• GEOSPATIAL
• MULTIPLE_LINE_STRING
The following XML is an example of a match rules file that contains a single entity match rule. The rule
matches vehicle records that have the same values for the license plate property, and the same values
for either the state or region properties.
For example, two vehicle records with the license plates "1233 DC 33" and "1233DC33" from the
regions "Ile-de-France and "Île De France" are identified as a match for the following rule:
<tns:matchRules
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd "
version="2"
xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules">
<matchRule id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
itemTypeId="ET3"
displayName="Match vehicles"
description="Match vehicles with the same license plate,
when either the registered state or
region are the same."
active="true">
<matchAll>
<condition propertyTypeId="VEH2">
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
</normalizations>
</condition>
</matchAll>
<matchAny>
<condition propertyTypeId="VEH16">
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_CASE</normalization>
<normalization>IGNORE_DIACRITICS</normalization>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
<normalization>IGNORE_NONALPHANUMERIC</normalization>
</normalizations>
</condition>
<condition propertyTypeId="VEH15">
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_CASE</normalization>
<normalization>IGNORE_DIACRITICS</normalization>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
<normalization>IGNORE_NONALPHANUMERIC</normalization>
</normalizations>
</condition>
</matchAny>
</matchRule>
</tns:matchRules>
setup -t updateLiveConfiguration
To reload the server by using the reload endpoint, ensure that you can use the admin endpoints. For
more information about using the admin endpoints, see Using the admin endpoints on page 316.
Warning: The reload method updates the configuration without requiring a server restart,
but any logged-in users are logged out from i2 Analyze when you run it.
Your configuration is validated, and any errors are reported in the REST response and in the wlp\usr
\servers\opal-server\logs\deployed_war\IBM_i2_Update_Live_Configuration.log.
Where deployed_war can be:
• opal-services-is
• opal-services-daod
• opal-services-is-daod
Depending on the type of spatial data that you are ingesting into your Information Store, to increase
the performance of Visual Query operations, you might need to set up spatial database indexes on
IS_DATA tables that contain geographical data. For more information on creating indexes manually
in the Information Store, see Creating indexes in the Information Store database on page 162. In
addition, for database specific information about the use of spatial indexes:
• For Db2 databases, see the IBM Integrated Analytics System documentation: Using indexes and
views to access spatial data.
• For SQL Server databases, see the Microsoft SQL Server Spatial Indexing documentation: https://
docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-indexes-overview?view=sql-server-
ver15.
{
"mapConfig": {
"baseMaps": [
{
"id": "ESRI_WorldStreetMap",
"displayName": "ESRI World Street Map",
"url": "https://server.arcgisonline.com/ArcGIS/rest/services/
World_Street_Map/MapServer/tile/{z}/{y}/{x}",
"attribution": "Sources: Esri, HERE, Garmin, USGS, Intermap, INCREMENT
P, NRCan, Esri Japan, METI, Esri China (Hong Kong), Esri Korea, Esri (Thailand),
NGCC, © OpenStreetMap contributors, and the GIS User Community"
}
]
},
"coordinateSystems": [
{
"id": "EPSG:27700",
"displayName": "OSGB 1936 / British National Grid - United Kingdom Ordnance
Survey",
"projString": "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
+y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs",
"editorType": "EASTING_NORTHING",
"bounds": {
"north": 1300000,
"east": 700000,
"south": 0,
"west": 0
}
},
{
"id": "EPSG:26910",
"displayName": "NAD83 / UTM zone 10N",
"projString": "+proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0
+units=m +no_defs",
"editorType": "EASTING_NORTHING"
},
{
"id": "EPSG:3005",
"displayName": "NAD83 / BC Albers",
"projString": "+proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126
+x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs",
"editorType": "EASTING_NORTHING"
},
{
"id": "EPSG:32661",
"displayName": "WGS 84 / UPS North (N,E)",
...
}
]
}
zoom
The map zoom level when it is first opened. (Number; default: Undefined)
Examples:
"mapConfig": {
"center": {
"lat": 52.202468,
"lng": 0.142655
},
"zoom": 8,
"maxBounds": {
"north": 60.84,
"south": 49.96,
"east": 1.78,
"west": -7.56
},
"minZoom": 2,
"maxZoom": 10,
"wrap": false,
...
}
In addition, depending on your mapping requirements, it must contain one or more base maps and
can contain one or more overlays:
• baseMaps
• overlays
baseMaps
A base map is an image that forms a background over which other GIS items are overlaid. All
base maps must be hosted externally, and can be referenced by using a URL. If you would like to
reference more than one base map, you can set a default that is used when a map is requested.
maxZoom
The maximum amount the map can zoom (inclusive). (Number) If this option is not specified, the
minimum zoom level is calculated as the lowest minimum zoom option available in the layer.
tileBounds
The WGS84 bounds for the tiles, allowing you to constrain the tile loading area to a specific location.
(Lat/Long Bounds) For example:
tileBounds: {
west: 0.72235,
east: 0.79101,
south: 52.20424,
north: 52.35211
}
defaultBaseMap
If there are multiple base maps, this option identifies the map to load initially. (Boolean; default: False)
Example:
"baseMaps": [
{
"id": "Esri.DeLorme",
"displayName": "Esri Delorme",
"url": "https://server.arcgisonline.com/ArcGIS/rest/services/Specialty/
DeLorme_World_Base_Map/MapServer/tile/{z}/{y}/{x}",
"attribution": "Tiles © Esri — Copyright: © 2012 DeLorme"
}
]
overlays
An overlay is a type of map layer that provides additional information on top of the base map. Because
these images form a secondary layer over the base map, you can set the opacity of an overlay to allow
information present in the base map to remain visible.
maxZoom
The maximum amount the map can zoom (inclusive). (Number) If this option is not specified, the
minimum zoom level is calculated as the lowest minimum zoom option available in the layer.
tileBounds
The WGS84 bounds for the tiles, allowing you to constrain the tile loading area to a specific location.
(Lat/Long Bounds) For example:
tileBounds: {
west: 0.72235,
east: 0.79101,
south: 52.20424,
north: 52.35211
}
Example:
"overlays": [
{
"id": "OpenMapSurfer_AdminBounds",
"displayName": "OpenMapSurfer Admin Bounds",
"url": "https://maps.heigit.org/openmapsurfer/tiles/adminb/webmercator/{z}/
{x}/{y}.png",
"attribution": "Imagery from [hyperlink]http://giscience.uni-hd.de/
GIScience Research Group @ University of Heidelberg[/hyperlink] | Map data ©
[hyperlink]https://www.openstreetmap.org/copyright OpenStreetMap[/hyperlink]
contributors"
}
]
coordinateSystems
The coordinateSystems object defines the coordinate reference systems that are available.
bounds
The area that can be mapped by defining the coordinates of two diagonally opposite corners of a
rectangle. (Optional; Lat/Long Bounds)
Example:
"coordinateSystems": [
{
"id": "EPSG:3081",
"displayName": "NAD83 / Texas State Mapping System",
"projString": "+proj=lcc +lat_1=27.41666666666667 +lat_2=34.91666666666666
+lat_0=31.16666666666667 +lon_0=-100 +x_0=1000000 +y_0=1000000 +ellps=GRS80
+datum=NAD83 +units=m +no_defs",
"editorType": "X_Y"
}
]
Source references are created or provided in different parts of the system, and you configure the
possible values in different ways depending on how the source reference enters the system.
SourceReferenceSchemaResource=source-reference-schema.xml
a) Open your XSD aware XML editor. For more information see Setting up your XSD aware XML
editor on page 316.
The associated XSD file is: toolkit\scripts\xsd\SourceReferenceSchema.xsd.
b) In your XML editor, open your source-reference-schema.xml file.
c) Using the reference and example information, modify the file to define the values that analysts
can use.
3. Update the deployment with your changes. On the Liberty server, in a command prompt navigate to
the toolkit\scripts directory:
setup -t stopLiberty
setup -t deployLiberty
setup -t startLiberty
For example, to make a set for the source references associated with charts and the item type with
ID "ET5":
<sourceReferenceSchemaFragments>
<sourceReferenceSchemaFragment itemTypeIds="CHART,ET5">
...
</sourceReferenceSchemaFragment>
</sourceReferenceSchemaFragments>
Example
In this example, analysts can create source references for charts with the source name "Local Police
Department" or "Analyst Team 1" to describe the source of the chart.
For item type ET5, for the source name, analysts can use one of the three possible values from the
selected-from list. For the source type, they can use one of the three values in the suggested-from list,
or provide their own value.
For all other item types, analysts can use any values for the name and type.
<tns:sourceReferenceSchema
xmlns:tns="http://www.i2group.com/Schemas/2019-07-05/SourceReferenceSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.i2group.com/Schemas/2019-07-05/
SourceReferenceSchema SourceReferenceSchema.xsd">
<sourceReferenceSchemaFragments>
<sourceReferenceSchemaFragment>
<sourceName logicalType="SINGLE_LINE_STRING">
</sourceName>
<sourceType logicalType="SINGLE_LINE_STRING" mandatory="false">
</sourceType>
</sourceReferenceSchemaFragment>
</sourceReferenceSchemaFragments>
</tns:sourceReferenceSchema>
Securing i2 Analyze
Depending on the requirements of the deployment, you can configure your deployment to use additional
security mechanisms.
The following diagram shows the connections in a deployment that you can use SSL to secure:
[Diagram: the connections that you can secure with SSL, between the clients (Analyst's Notebook
Premium and the Investigate Add-on) and the server, and between the i2 Analyze application on Liberty
and Solr, ZooKeeper, and the Information Store.]
As part of the SSL handshake process, certificates are exchanged that are signed by a trusted
certificate authority to verify that a certificate is authentic. The environment where you are deploying i2
Analyze might already have a well-defined certificate process that you can use to obtain certificates and
populate the required key and truststores.
The examples in the following procedures use self-signed certificates to demonstrate working SSL
configurations. In a production deployment, you must use certificates that are signed by a trusted
certificate authority.
If all of the public key certificates are signed by the same certificate authority, then you can add the
certificate authority's certificate to each of your truststores. If you have a number of certificates to
authenticate trust, you might have to add multiple certificates to your truststores.
In the examples, the self-signed certificate is created in a keystore, exported, and imported to the
relevant truststore. When you configure SSL communication between components of i2 Analyze, you
must have the required certificates in the correct keystores. In the following procedures, example
commands are provided for creating, exporting, and importing self-signed certificates.
The environment in which you are deploying i2 Analyze might already contain files that are candidates
for use as keystores or truststores. If not, you must create the required files. The files that are required
in the sample scenario that is described in the instructions are summarized by component, in the
following list.
IBM HTTP Server
To enable SSL connections, the HTTP server requires a CMS key database (*.kdb). The
password for this key database must be saved to a stash file. A certificate in the key database
identifies the HTTP server and is used when clients connect to i2 Analyze so that they can
authenticate the HTTP server. This key database is also used as a truststore to authenticate the
certificate that it receives from Liberty.
The i2 Analyze client's truststore, usually located in the operating system or web browser, is used to
authenticate the certificate that it receives from the HTTP server.
WebSphere® Application Server Liberty
The Liberty server requires a keystore file (*.jks) and a truststore file (*.jks).
A certificate in the keystore identifies the Liberty server and is used to connect to the HTTP server.
The HTTP server key database authenticates this certificate that it receives from Liberty.
The certificate in the Liberty truststore is used to authenticate certificates that are received from the
database management system keystore and the Solr keystore.
Solr
Each Solr server requires a keystore file (*.jks) and a truststore file (*.jks). Solr requires that
all Solr certificates are available in the Solr truststores and the Liberty truststore, so that individual
nodes can trust one another, and Liberty can trust the Solr nodes. When you enable ZooKeeper to
use SSL, the Solr certificate must also be trusted by the ZooKeeper truststore.
The certificate in the Solr keystore identifies the Solr server and is used to connect to Liberty, and
ZooKeeper if it is configured for SSL.
The certificate in the Solr keystore also identifies each Solr node and is used to authenticate secure
connection within Solr itself, using the Solr truststore. The certificate in the Solr truststore is used to
authenticate the certificate that it received from the Solr keystore.
ZooKeeper
Each ZooKeeper server requires a keystore file (*.jks) and a truststore file (*.jks). ZooKeeper
requires that all ZooKeeper certificates are available in the ZooKeeper truststores and the Liberty
truststore. This enables the individual servers to trust one another, and Liberty can trust the
ZooKeeper servers. Additionally, the Solr certificate must be in the ZooKeeper truststore so that
ZooKeeper can trust Solr.
The certificate in the ZooKeeper keystore identifies the ZooKeeper server and is used to connect to
Liberty.
The certificate in the ZooKeeper keystore also identifies each ZooKeeper server and is used to
authenticate secure connections between the servers in the ZooKeeper quorum, by using the
ZooKeeper truststore. The certificate in the ZooKeeper truststore is used to authenticate the
certificate that is received from the ZooKeeper keystore.
Database management system
To enable SSL connections, the database management system requires a keystore to connect
to Liberty. The type of keystore depends on the type of database management system. For more
information about SSL in your database management system, see Configuring SSL for a Db2
instance on page 250 or Configuring SSL for Microsoft SQL Server on page 252.
The certificate in the database management system keystore identifies the database management
system server and is used to connect to Liberty.
In the following procedures, example commands are provided for creating the keystores, certificates,
and truststores to use with each component of i2 Analyze. The instructions contain details that are
based on a single-server deployment example.
The diagram shows the connection that you secure by completing the steps in the following procedures.
It also shows the key database that is required.
[Diagram: the connection between the clients (Analyst's Notebook Premium and the Investigate
Add-on, with their truststore) and the i2 Analyze application on Liberty.]
You must modify the configuration of both the i2 Analyze application and the HTTP server to use SSL to
secure the connection.
1. Create a key database for the HTTP server.
• Save a password for the key database to a stash file by using the -stash attribute.
• Set -db location to the directory that contains the toolkit directory in your deployment.
2. Create a self-signed certificate.
For example, run the following command:
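A hedged sketch, assuming the GSKCapiCmd tool that is provided with IBM HTTP Server and a key database named i2-http-keystore.kdb (check the GSKCapiCmd documentation for the exact syntax in your environment):
gskcapicmd -cert -create -db "i2-http-keystore.kdb" -stashed -label "httpKey" -dn "CN=host_name" -san_dnsname "host_name"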
Important: Set the value of CN and san_dnsname to the fully qualified domain name
of the server that hosts the HTTP server. The URL that you use to connect to i2 Analyze must use
the same value for the host name as the value of the CN. The password is the one that you saved to
the stash file in step 1.
3. Extract the certificate from the key database.
For example, run the following command:
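A hedged sketch, assuming the same key database and label; the target file name is an assumption:
gskcapicmd -cert -extract -db "i2-http-keystore.kdb" -stashed -label "httpKey" -target "i2-http-certificate.cer" -format ascii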
Set the location of the certificate to the same directory as the key database.
To enable SSL connections to i2® Analyze, the certificate that you added to, or created in, the key
database must be installed to be trusted on each client workstation.
setup -t deployLiberty
setup -t configureHttpServer
The httpd.conf and plugin-cfg.xml files are updated to use the SSL configuration.
5. Start Liberty.
To start Liberty, run the following command on each Liberty server:
setup -t startLiberty
Open Analyst's Notebook Premium and connect to i2 Analyze at the URL https://host_name/opal.
host_name is the fully qualified domain name or IP address of the HTTP server, and matches the
Common Name value of the certificate.
Note: You cannot connect by using the HTTP protocol http://host_name/opal.
When you connect to i2 Analyze, the connection is secure.
If you are using self-signed certificates, add the certificate that you exported from your Liberty keystore
to the HTTP server key database. For more information, see Adding the Liberty certificate to the HTTP
key database on page 247.
Note: The host-name attribute value must match the common name that is associated with the
certificate for the application server.
b) Add the <key-stores> element as a child of the <application> element. Then, add a child
<key-store> element.
For your keystore, specify the type as key-store and file as the full path to your keystore.
For example, add the attribute as highlighted in the following code:
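A hedged sketch of those elements; the keystore path is an assumption, so use the full path to your own keystore:
<application ...>
  ...
  <key-stores>
    <key-store type="key-store" file="C:\i2\i2analyze\i2-liberty-keystore.jks"/>
  </key-stores>
</application>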
ssl.keystore.password=password
3. Update the application with your configuration changes. For more information, see Redeploying
Liberty on page 316.
You must secure the connection between the HTTP Server and Liberty. For more information, see
Securing the connection between the HTTP server and Liberty on page 247.
In a production environment, import certificates into the HTTP key database to ensure that the
certificates that are received from Liberty are trusted.
Add the certificate that you exported from the Liberty keystore into the HTTP server key database.
For more information about exporting the certificate from the Liberty keystore, see Creating the Liberty
keystore and certificate on page 245.
For example, run the following command:
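A hedged sketch, assuming the key database name used earlier and a certificate file that was exported from the Liberty keystore with the label libertyKey:
gskcapicmd -cert -add -db "i2-http-keystore.kdb" -stashed -label "libertyKey" -file "i2-liberty-certificate.cer" -format ascii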
setup -t configureHttpServer
setup -t deployLiberty
The plugin-cfg.xml file is updated to enforce that a secure connection is available between the
HTTP server and Liberty.
6. Navigate to the IBM\HTTPServer\plugins\iap\config directory, and open the plugin-
cfg.xml file in an XML editor.
7. In each <ServerCluster> element, there is a child <Server> element. Ensure that each of these
<Server> elements has a child <Transport> element that uses the https protocol.
Update the <ServerCluster> element with the value "opal-server_cluster".
a) Add the following element to any of the child <Server> elements that do not have a child
<Transport> element that uses the HTTPS protocol.
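A hedged sketch of such an element, using the placeholder values that the next sentence describes:
<Transport Hostname="hostname" Port="portnumber" Protocol="https"/>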
hostname is the same as for the <Transport> element that uses the HTTP protocol, and
portnumber matches the value in the port definition properties for the application that you
are securing. You can find this value in C:\IBM\i2analyze\toolkit\configuration
\environment\opal-server.
b) Add the following <Property> elements as children of each <Transport> element that uses
the HTTPS protocol:
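A hedged sketch, with placeholder paths for the key database and stash file:
<Property Name="keyring" Value="C:\path\to\i2-http-keystore.kdb"/>
<Property Name="stashfile" Value="C:\path\to\i2-http-keystore.sth"/>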
Where the Value attributes contain the absolute paths to the keystore for the HTTP server and
the associated password stash file.
c) Save and close the plugin-cfg.xml file.
8. Restart the HTTP server.
To ensure that the configuration is correct, look in the IBM\HTTPServer\plugins\iap\logs
\plugin-cfg.log file.
If the <Property> elements for the keyring and stashfile are not present on each <Transport>
element in your plugin-cfg.xml, the following error message is displayed:
ERROR: ws_transport:
transportInitializeSecurity: Keyring was not set.
ERROR: ws_transport:
transportInitializeSecurity: No stashfile or keyring password given.
To resolve this issue, ensure that the <Property> elements for the keyring and stashfile are
present on each <Transport> element in your plugin-cfg.xml.
Note: Start the GSKCapiCmd tool by using the gskcapicmd command. Follow the details that
are provided in the Db2® documentation link for the path to the command, the required libraries,
and the command syntax.
b) Create a key database.
For example, to create the key file, run the following command:
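A hedged sketch of the key database and certificate creation, assuming a key database named i2-db2-keystore.kdb and the fully qualified domain name of the database server:
gskcapicmd -keydb -create -db "i2-db2-keystore.kdb" -pw "password" -stash
gskcapicmd -cert -create -db "i2-db2-keystore.kdb" -stashed -label "dbKey" -dn "CN=db2_host_name"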
Note: The command is a simplified version of the command in the Db2® documentation without
the O, OU, L, and C values that are not required for this example. Use a label of dbKey to align
with httpKey and libertyKey used in the HTTP server and Liberty keystores. Ensure that the
common name in the certificate matches the fully qualified domain name of the database instance
server.
3. Extract the certificate from the key database.
For example, to extract the certificate, run the following command:
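A hedged sketch, assuming the same key database and label; the target file name is an assumption:
gskcapicmd -cert -extract -db "i2-db2-keystore.kdb" -stashed -label "dbKey" -target "i2-db2-certificate.cer" -format ascii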
5. Navigate to the toolkit/scripts directory, and run the command to stop Liberty:
setup -t stopLiberty
Restart the Db2 instance:
db2stop
db2start
After you configure Db2®, you can check the db2diag.log file to ensure that there are no errors with
your SSL configuration.
Configuring SSL for Microsoft™ SQL Server
To secure the connection between the i2® Analyze application server and the database instance, you
must change the configuration of both. Microsoft™ SQL Server stores its associated certificates and you
must create or obtain certificates for the Microsoft™ SQL Server to use.
In i2® Analyze, SSL connections that involve SQL Server require i2® Analyze to trust the certificate that
it receives from SQL Server. SQL Server stores certificates in the operating system's certificate stores.
In a production deployment, you must use a certificate that is signed by a trusted certificate authority. To
demonstrate a working configuration, you can create and use a self-signed certificate.
Ensure that you understand the details that are provided in the SQL Server documentation to configure
SSL for your SQL Server. For more information, on Windows™ see Enable Encrypted Connections to
the Database Engine or Linux® see Server Initiated Encryption.
Create a self-signed certificate for SQL Server.
1. For example, on Windows™ you can use the New-SelfSignedCertificate command in
PowerShell. For information, see New-SelfSignedCertificate.
Run the following command to create a certificate:
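A hedged sketch; the -DnsName value must be the SQL Server host name (it also sets the subject CN), and the friendly name is an assumption:
New-SelfSignedCertificate -DnsName "sqlserver_host_name" -CertStoreLocation "Cert:\LocalMachine\My" -FriendlyName "i2 SQL Server certificate"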
Important: Ensure that you set the value of CN to the hostname of the server where SQL Server is
located.
2. For example, on Linux® you can run the following commands by using OpenSSL:
openssl req -x509 -nodes -newkey rsa:2048 -subj '/CN=hostname' -keyout sql-
server-key.key -out sql-server-certificate.pem -days 365
sudo chown mssql:mssql sql-server-certificate.pem sql-server-key.key
sudo chmod 600 sql-server-certificate.pem sql-server-key.key
sudo mv sql-server-certificate.pem /etc/ssl/certs/
sudo mv sql-server-key.key /etc/ssl/private/
Important: Ensure that you set the value of CN to the hostname of the server where SQL Server is
located.
Export the self-signed certificate.
3. On Windows™:
a) Use the Certificates snap-in in the Microsoft™ Management Console to export the certificate from
the Local Computer user's certificates.
b) Locate the self-signed certificate in the Personal certificate store.
c) Right-click the certificate, and click All Tasks > Export. Complete the Certificate Export
Wizard to export the certificate without the private key as a DER encoded binary X.509 file. Set
the file name to i2-sqlserver-certificate.cer.
For more information about exporting the certificate, see To export the server certificate.
4. On Linux®:
a) Extract the DER certificate from the PEM file by using OpenSSL:
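The example command is not reproduced in this extract. A hedged sketch that uses the file names from the earlier steps (the output file name is an assumption that matches the Windows™ export step) is:
openssl x509 -outform DER -in /etc/ssl/certs/sql-server-certificate.pem -out i2-sqlserver-certificate.cer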
After you create the CER file, ensure that the file permissions are the same as the original PEM
file.
Configure SQL Server to encrypt connections.
5. On Windows™:
a) In SQL Server Configuration Manager, expand SQL Server Network Configuration, and right-click
Protocols for <instance> and click Properties.
b) In the Properties window on the Certificate tab, select your certificate from the Certificate list and
click Apply.
c) On the Flags tab, select Yes from the Force Encryption list.
d) Click OK and restart the SQL Server instance.
Note: The service account that is used to start the SQL Server instance must have read permissions to your certificate. By default, the service account is NT Service/MSSQLSERVER on Windows. For more information about service accounts, see Service Configuration and Access Control.
For more information about encrypted connections, see To configure the server to force encrypted
connections.
6. On Linux®:
a) Run the following commands to specify your certificate and key, and configure SQL Server:
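The example commands are not reproduced in this extract. A hedged sketch that uses the mssql-conf TLS settings and the file locations from step 2 is:
sudo /opt/mssql/bin/mssql-conf set network.tlscert /etc/ssl/certs/sql-server-certificate.pem
sudo /opt/mssql/bin/mssql-conf set network.tlskey /etc/ssl/private/sql-server-key.key
sudo /opt/mssql/bin/mssql-conf set network.forceencryption 1
sudo systemctl restart mssql-server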
You can use the same truststore that you created for Liberty. For more information, see Creating the
Liberty keystore and certificate on page 245.
For Liberty to communicate with the secured database, in the topology database element you must
specify the secure connection attribute to be true and the name of the truststore that contains the
database certificate. Also, specify the correct port number, which corresponds to the SSL port for the
database. In the credentials.properties file, the correct password for the specified truststore
must be added.
1. Create the Liberty truststore and import into the truststore the certificate that you exported from the
database management system.
For example, run one of the following commands:
If you are using Db2®:
If you are using SQL Server:
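The example commands are not reproduced in this extract. The Db2® and SQL Server variants differ only in the certificate file that you import. A hedged keytool sketch, where the truststore name, alias, and certificate file name are assumptions, is:
keytool -importcert -alias dbKey -keystore C:\IBM\i2analyze\i2-database-truststore.jks -file C:\IBM\i2analyze\i2-database-certificate.cer
When prompted, set or enter the truststore password, and accept the certificate.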
d) In same element, ensure that the following attribute values are correct:
• The host-name attribute value must match the common name that is associated with the
certificate for the database.
• The port attribute value must match the value of the port number when you configured the
database management system for SSL.
4. Specify the truststore password in the credentials file:
a) In a text editor, open the toolkit\configuration\environment
\credentials.properties file.
b) Enter the password for the truststore that contains the certificate in the
db.infostore.truststore.password credential.
Update the application with your configuration changes. For more information, see Redeploying Liberty
on page 316.
Testing the deployment
To test the SSL connection between i2 Analyze and the database management system, connect to i2
Analyze. After you connect, ensure that you can create, browse, and search for data.
1. Connect to your data store by using one of the supported clients. For more information, see
Connecting clients.
2. Create, browse, and search for data to ensure that the database connection is working.
The certificate in the Solr keystore is also used for authentication within Solr itself, using the Solr
truststore. The certificate in the Solr truststore is used to authenticate the certificate that it received from
the Solr keystore.
The certificate in the ZooKeeper keystore is used to identify the ZooKeeper server. The certificate in the
Liberty truststore is used to authenticate certificates that are received from the ZooKeeper keystore.
The certificate in the ZooKeeper keystore is also used for authentication within a ZooKeeper quorum, by
using the ZooKeeper truststore. The certificate in the ZooKeeper truststore is used to authenticate the
certificate that it received from the ZooKeeper keystore.
The certificate in the ZooKeeper truststore is used to authenticate certificates that are received from the
Solr keystore.
The diagram shows the connection that you can secure by completing the following instructions. It also
includes the keystores and truststores that are required for a single server.
Solr stores its associated certificates in keystore and truststore files (.jks). The certificate in the Solr
keystore identifies the server that Solr is deployed on. This certificate is checked against the certificate
in the Liberty truststore when Liberty attempts to connect to Solr.
To ensure that communication between the i2 Analyze application server and the Solr index is secured,
create a keystore and truststore for Solr.
The following steps use a self-signed certificate. In a production environment, use or request a signed
certificate for Liberty from a certificate authority. Place this certificate in the Liberty keystore.
For Solr to work correctly, there must be a keystore for each server on which Solr is deployed. Each
keystore contains a certificate that identifies the server. Solr also requires that all Solr certificates are
available in the Solr truststore and the Liberty truststore, so that individual nodes can trust one another,
and Liberty can trust the Solr nodes.
Depending on the topology of your deployment, you might need to create a separate keystore and
truststore on each Solr server.
1. Create a keystore and export a certificate for Solr by using the Java keytool utility.
For more information, see keytool - Key and Certificate Management Tool.
a) Create a keystore and a certificate.
For example, run the following command:
Enter the password that you set for the keystore in the previous step.
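The example commands for this step, and for the certificate export that later steps refer to as step 1b, are not reproduced in this extract. A hedged keytool sketch, where the alias and file names are assumptions, is:
keytool -genkeypair -alias solrKey -keyalg RSA -dname "CN=solr.hostname" -keystore i2-solr-keystore.jks
keytool -exportcert -alias solrKey -keystore i2-solr-keystore.jks -file i2-solr-certificate.cer
Ensure that the common name (CN) matches the hostname of the Solr server.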
2. Create the Solr truststore and import the required certificates.
If you are using a self-signed certificate, import the certificate that you exported from the Solr
keystore in step 1b.
For example, run the following command:
Enter the password that you set for the keystore in the previous step.
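The example command is not reproduced in this extract. A hedged keytool sketch that imports the certificate exported in step 1b, with assumed file names, is:
keytool -importcert -alias solrKey -keystore i2-solr-truststore.jks -file i2-solr-certificate.cer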
2. Create the ZooKeeper truststore and import the required certificates.
You must complete this for each server on which ZooKeeper is deployed.
a) If you are using a self-signed certificate, import the certificate that you exported from the
ZooKeeper keystore in step 1b.
setup -t stopLiberty
Ensure that the Liberty instance is stopped; otherwise, an error occurs when you run the commands to complete the configuration changes.
b) Stop ZooKeeper.
To stop ZooKeeper, run the following command on every server where ZooKeeper is running:
Here, zookeeper.host-name is the hostname of the ZooKeeper server where you are running the
command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
Modify the configuration on the Liberty server.
2. Create the truststore and import the certificate that you exported from the Solr and ZooKeeper
keystores into the truststore.
This truststore can be the truststore that you created for the certificate that you exported from the database management system. Alternatively, if that file does not exist, create a new truststore.
For example, run the following commands:
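The example commands are not reproduced in this extract. A hedged keytool sketch, where the truststore name, aliases, and certificate file names are assumptions, is:
keytool -importcert -alias solrKey -keystore i2-liberty-truststore.jks -file i2-solr-certificate.cer
keytool -importcert -alias zookeeperKey -keystore i2-liberty-truststore.jks -file i2-zookeeper-certificate.cer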
c) Add the key-store and trust-store attributes to either the <solr-cluster> or the
<solr-node> element.
Add the attribute values as defined:
key-store
The path to the Solr keystore. For more information, see Creating Solr keystores and certificates on
page 256. Reference step 1a.
trust-store
The path to the Solr truststore. For more information, see Creating Solr keystores and certificates on
page 256. Reference step 2a.
For example, add the attribute as highlighted in the <solr-clusters> element:
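The example markup is not reproduced in this extract. A hedged sketch of a <solr-cluster> element with the key-store and trust-store attributes, where the id value and file paths are assumptions, is:
<solr-clusters>
   <solr-cluster id="is_cluster" ...
      key-store="C:/IBM/i2analyze/i2-solr-keystore.jks"
      trust-store="C:/IBM/i2analyze/i2-solr-truststore.jks">
      ...
   </solr-cluster>
</solr-clusters>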
Note: The host-name attribute value must match the common name that is associated with the
certificate for Solr. For more information, see Creating Solr keystores and certificates on page
256. Reference step 1a.
5. Modify the topology.xml file to specify SSL for its ZooKeeper connection.
a) In an XML editor, open the toolkit\configuration\environment\topology.xml file.
b) In the <zookeeper> element for the ZooKeeper host that you want to connect to with SSL, add
the secure-connection attribute with the value of true.
For example, add the attribute as highlighted in the following code:
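The example markup is not reproduced in this extract. A hedged sketch, where the id value is an assumption, is:
<zookeepers>
   <zookeeper id="zoo" ... secure-connection="true">
      ...
   </zookeeper>
</zookeepers>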
c) Add the key-store and trust-store attributes to either the <zookeeper> or to the
<zkhost> element.
Add the attribute values as defined:
key-store
The path to the ZooKeeper keystore. For more information, see Creating ZooKeeper keystores and
certificates on page 258. Reference step 1a.
trust-store
The path to the ZooKeeper truststore. For more information, see Creating ZooKeeper keystores and
certificates on page 258. Reference step 2a.
• For example, add the attribute as highlighted in the <zookeeper> element:
Note: The host-name attribute value must match the common name that is associated with
the certificate for ZooKeeper. For more information, see Creating ZooKeeper keystores and
certificates on page 258. Reference step 1a.
6. Modify the topology.xml file to add the Liberty truststore.
Add a child <key-store> element. For your keystore, specify the type as trust-store and
file as the full path to your truststore.
For example, add the element as highlighted in the following code:
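The example markup is not reproduced in this extract. A hedged sketch that follows the same <key-stores> format used elsewhere in this documentation, where the truststore path is an assumption, is:
<application http-server-host="true"
   name="opal-server" host-name="hostname">
   ...
   <key-stores>
      <key-store type="trust-store"
       file="C:/IBM/i2analyze/i2-liberty-truststore.jks"/>
   </key-stores>
   ...
</application>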
solr.truststore.password=password
solr.keystore.password=password
c) Enter the passwords for the ZooKeeper keystore and truststore that you specified in the topology
file.
zookeeper.truststore.password=password
zookeeper.keystore.password=password
d) Enter the password for the Liberty truststore that you specified in the topology file.
ssl.truststore.password=password
8. Copy the toolkit\configuration from the Liberty server, to the toolkit directory of the
deployment toolkit on each server in your environment.
Update the application with your configuration changes. Run the following commands from the
toolkit\scripts directory on the Liberty server.
9. Redeploy Liberty to update the application:
setup -t deployLiberty
10.Recreate the ZooKeeper host on each server where your ZooKeeper hosts are located:
Where zookeeper.host-name is the hostname of the ZooKeeper server where you are running
the command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
11.Start ZooKeeper.
To start ZooKeeper, run the following command on every server where your ZooKeeper hosts are
located:
Where zookeeper.host-name is the hostname of the ZooKeeper server where you are running
the command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
12.Upload the new Solr configuration to ZooKeeper:
Where liberty.hostname is the hostname of the Liberty server where you are running the
command, and matches the value for the host-name attribute of the <application> element in
the topology.xml file.
13.Restart the Solr nodes.
To restart the Solr nodes, run the following command on every server where Solr is running:
Where solr.host-name is the host name of the Solr server where you are running the command, and
matches the value for the host-name attribute of a <solr-node> element in the topology.xml
file.
14.Start Liberty.
setup -t startLiberty
Connect to your Information Store. For more information, see Connecting i2 Analyst's Notebook
Premium to i2 Analyze.
Search for data to ensure that the Solr connection is working.
The diagram shows the SSL connection between the clients, the Liberty server, and a connector, including the truststore and keystore that Liberty and the connector require.
The Liberty server requires a keystore file and a truststore file. Your connector can use any
implementation for its keystore and truststore. The certificates in each truststore must trust the
certificates received from the corresponding keystore.
The certificates that are required are as follows, where certificate authority (CA) X issues the certificates
to the connector (the server) and Liberty (the client):
The connector requires:
• In its keystore:
• The personal certificate issued to the connector by CA X
• The connector's private key
• In the truststore:
• The CA certificate for CA X
Liberty requires:
• In its keystore:
• The personal certificate issued to Liberty by CA X
<application http-server-host="true"
name="opal-server" host-name="hostname">
...
<key-stores>
<key-store type="key-store"
file="C:/IBM/i2analyze/i2-liberty-keystore.jks"/>
<key-store type="trust-store"
file="C:/IBM/i2analyze/i2-liberty-truststore.jks"/>
</key-stores>
...
</application>
b) Update the base-url attribute of any connectors using SSL to use the HTTPS protocol.
For example:
<connectors>
<connector id="example-connector" name="Example"
base-url="https://localhost:3700/" />
</connectors>
Note: Ensure that the hostname that is used in the base URL matches the common name on the
certificate of the connector.
2. Specify the keystore and truststore passwords in the credentials file.
a) In a text editor, open the toolkit\configuration\environment
\credentials.properties file.
b) Enter the password for the keystore and truststore that you specified in the topology.xml file.
ssl.keystore.password=password
ssl.truststore.password=password
setup -t stopLiberty
setup -t deployLiberty
6. Start Liberty:
setup -t startLiberty
You can create your own connectors to use with the deployment of i2 Analyze. When you create your own connector, you can implement security that conforms to the security required by the i2 Connect gateway. For more information about creating your own connectors, see i2 Analyze and i2 Connect.
When you use a connector configured for SSL communication, you should not see any warnings
displayed in Analyst's Notebook Premium.
Create a keystore for the connector that contains a signed certificate that is used to authenticate with
Liberty.
2. Create a keystore and certificate for the connector by using the IBM Key Management Utility
command-line interface.
For more information about the IBM Key Management Utility, see Key Management Utility command-
line interface (gskcmd) syntax.
a) In a command prompt, navigate to the bin directory of your IBM HTTP Server installation. For
example: IBM\HTTPServer\bin or IBM\HTTPServer\IHS\bin.
b) Create a keystore in PKCS12 format. To create the keystore, run the following command:
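The example command is not reproduced in this extract. A hedged sketch using the IBM Key Management Utility command-line interface, where the keystore file name and label are assumptions and connectorKeyPassword matches the password that is used later in this procedure, is:
gskcmd -keydb -create -db "C:\IBM\i2analyze\example-connector-keystore.p12" -pw "connectorKeyPassword" -type pkcs12
gskcmd -cert -create -db "C:\IBM\i2analyze\example-connector-keystore.p12" -pw "connectorKeyPassword" -label "exampleConnectorKey" -dn "CN=localhost"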
Important: Set the value of CN to the hostname of the server that hosts the connector, as
defined in the topology.xml file. By default in the daod-opal example configuration, the value
is localhost.
3. The example connector implementation requires the private key in the PEM format, and the
certificate for the connector.
a) Export the personal certificate and private key into a PKCS12 file by running the following
command:
b) Convert the private key from the PKCS12 format to the PEM format. You can use OpenSSL to do
this. For more information about OpenSSL see https://www.openssl.org/source/. If you are using
OpenSSL, you can run the following command:
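A hedged sketch of the conversion, using the file names that this procedure refers to, is:
openssl pkcs12 -in example-connector-key.p12 -nocerts -nodes -out example-connector-key.pem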
When you are prompted for a password, provide the password that you specified when
you created the example-connector-key.p12 PKCS12 file. In this example, enter
connectorKeyPassword.
c) Extract the certificate for the connector by running the following command:
Create a truststore for Liberty that contains the certificate that is used to trust the certificate that it
receives from the connector.
4. Create the Liberty truststore and populate it with the connector's certificate by using the Java keytool
by running the following command:
5. Populate the connector keystore with Liberty's certificate by using the IBM Key Management Utility
by running the following command:
6. The example connector implementation requires the certificate that is used to trust Liberty to also
be present in the example-connector directory. Copy the C:\IBM\i2analyze\i2-liberty-
certificate.cer file to the C:\IBM\i2analyze\toolkit\examples\connectors
\example-connector directory.
7. If you are using the IBM HTTP Server, start or restart it.
The certificates are now in place to enable client authentication SSL between Liberty and the connector.
8. Configure the example connector to reference the certificates that you created, and the hostname of
the gateway. Using a text editor, modify the security-config.json file with the following values.
https
Set to true to use the HTTPS protocol to connect to the connector.
keyFileName
The file name for the private key of the connector in PEM format. For this example, example-
connector-key.pem.
keyPassphrase
The password that is required to access the key file specified in keyFileName. For this example,
connectorKeyPassword.
certificateFileName
The file name for the certificate of the connector. For this example, example-connector-
certificate.cer.
certificateAuthorityFileName
The file name of the certificate that enables trust of the certificate that is received from Liberty. For
this example, i2-liberty-certificate.cer.
gatewayCN
The common name of the gateway. This must be the value of the common name in the certificate
the connector receives from Liberty. You specified the value of the CN in the certificate in step 1.
Configure Liberty to use the keystore and truststore that you created.
9. In an XML editor, open the toolkit\configuration\environment\topology.xml file.
a) Add the <key-stores> element as a child of the <application> element. Then, add child
<key-store> elements.
For your keystore, specify the type as key-store, and file as the full path to your keystore.
For your truststore, specify the type as trust-store, and file as the full path to your
truststore.
<application http-server-host="true"
name="opal-server" host-name="hostname">
...
<key-stores>
<key-store type="key-store"
file="C:/IBM/i2analyze/i2-liberty-keystore.jks"/>
<key-store type="trust-store"
file="C:/IBM/i2analyze/i2-liberty-truststore.jks"/>
</key-stores>
...
</application>
b) Update the base-url attribute of any connectors using SSL to use the HTTPS protocol.
For example:
<connectors>
<connector id="example-connector" name="Example"
base-url="https://localhost:3700/" />
</connectors>
Note: Ensure that the hostname that is used in the base URL matches the common name on the
certificate of the connector.
10.Specify the keystore and truststore passwords in the credentials file.
a) In a text editor, open the toolkit\configuration\environment
\credentials.properties file.
b) Enter the password for the keystore and truststore that you specified in the topology.xml file.
ssl.truststore.password=libertyTrustStorePassword
ssl.keystore.password=libertyKeyStorePassword
setup -t deployLiberty
npm start
Note: The example connector uses port number 3700. Ensure that no other processes are using
this port number before you start the connector.
14.Start Liberty:
setup -t startLiberty
15.If you are using the IBM HTTP Server, start or restart it.
Use Analyst's Notebook Premium to connect to your deployment. For more information, see Connecting
i2 Analyst's Notebook Premium to i2 Analyze.
Now that the example connector is configured for SSL, no warnings are displayed in Analyst's Notebook
Premium.
setup -t addI2Connect
setup -t deploy
4. Start i2 Analyze.
setup -t start
5. Start, or restart, the HTTP server that hosts the reverse proxy.
When you start i2 Analyze, the URI that users must specify in Analyst's Notebook Premium is
displayed in the console. For example, This application is configured for access on
http://host_name/opal.
Install Analyst's Notebook Premium and connect to your deployment. For more information, see
Installing i2 Analyst's Notebook Premium and Connecting i2 Analyst's Notebook Premium to i2 Analyze.
Note: The example connector does not use client authenticated SSL communication between i2
Analyze and any connectors, so a warning is displayed in Analyst's Notebook Premium. For more
information about configuring client-authenticated SSL, see Client authenticated Secure Sockets Layer
with the i2 Connect gateway.
Backing up a deployment
In a production deployment of i2 Analyze, use a tested backup and restore procedure to ensure that
you can recover from failures. The various components of i2 Analyze must be backed up for complete
backup coverage.
You must back up the i2 Analyze configuration, the Solr search index, and the database. Regardless of the mechanism that you use to back up the components, you must take the
Solr backup before the database backup.
The configuration and Solr backups can be completed while the deployment is running. For more
information, see:
• Back up and restore Solr on page 280
• Back up and restore the configuration on page 279
When you use the toolkit task to back up the database, you must stop the deployment first. To perform
an online backup of the database, see the documentation for your database management system.
For more information about backing up the database using the toolkit, see: Back up and restore the
database on page 281.
In a deployment that contains the i2 Connect gateway only, you need to back up only the i2 Analyze and Liberty configuration.
Run each toolkit command from a Liberty server in your deployment.
Procedure
1. In the environment.properties file, specify the locations where the toolkit creates the backup
files.
backup.config.location.dir
The location where the i2 Analyze configuration backups are created and restored from. This location must exist on the Liberty server where you run the backupConfiguration command.
backup.solr.location.dir
The location where the Solr index and ZooKeeper configuration backups are created and restored from. This location must be accessible by every Solr node in your deployment.
backup.db.location.dir
The location where the database backups are created and restored from. The user that is specified in the credentials.properties file for your database management system must have write permissions to this location. This location must be accessible on the database server.
For example:
backup.config.location.dir=C:/backup/configuration
backup.solr.location.dir=D:/network-drive/solr/backup
backup.db.location.dir=D:/backup/database
2. To back up the i2 Analyze and Liberty configuration, run the following command from a Liberty server. In a deployment with high availability, run this command on each Liberty server.
setup -t backupConfiguration
3. To back up the Solr search index, run the following command from a Liberty server:
setup -t backupSolr
4. To back up the database in your deployment by using the toolkit task, you must stop the application
first, and ensure that there are no other connections to the database, then run the backup command.
In a deployment that contains multiple Liberty servers, you must run the stopLiberty command on
every Liberty server.
setup -t stopLiberty
setup -t backupDatabases
setup -t startLiberty
Procedure
1. Use the validateBackups task to ensure that the timestamps of the backups that you are restoring are compatible. This task compares the timestamps of the backups created by the toolkit to ensure that the configuration backup is the earliest, and that the Solr backup precedes the database backup.
2. Before you restore the deployment, stop Liberty. In a deployment that contains multiple Liberty
servers, you must run the stopLiberty command on every Liberty server.
setup -t stopLiberty
3. To restore the database in your deployment, run the following command from a Liberty server:
The database backup is identified in the location from the backup.db.location.dir setting by
using the timestamp that was specified in the command. The timestamp value in the command must
match the timestamp of a backup in the location.
For example: setup -t restoreDatabases -p timestamp=20210420145332.
4. To restore the Solr search index, run the following command from a Liberty server:
Where liberty.hostname is the hostname of the Liberty server where you are running the
command, and matches the value for the host-name attribute of the <application> element in the
topology.xml file.
The Solr backup is identified in the location from the backup.solr.location.dir setting by
using the timestamp that was specified in the command. The timestamp value in the command must
match the timestamp of a backup directory in the location.
For example: setup -t restoreSolr -p timestamp=20210420145332 --hostname
<liberty.hostname>.
5. To restore the i2 Analyze and Liberty configuration, run the following command from a Liberty server:
setup -t startLiberty
Result
Your deployment of i2 Analyze is restored to the point in time of the backups that you specified.
Procedure
1. Create the environment to restore your deployment in to. If your servers were lost due to a disaster,
you might need to recreate them with the required prerequisites. For more information about the
servers that are used in a deployment of i2 Analyze, see Deployment topologies.
2. Use the validateBackups task to ensure that the timestamps of the backups that you are restoring are compatible. This task compares the timestamps of the backups created by the toolkit to ensure that the configuration backup is the earliest, and that the Solr backup precedes the database backup.
5. Before you restore the deployment, stop Liberty. In a deployment that contains multiple Liberty
servers, you must run the stopLiberty command on every Liberty server.
setup -t stopLiberty
6. To restore the database in your deployment, run the following command from a Liberty server:
The database backup is identified in the location from the backup.db.location.dir setting by
using the timestamp that was specified in the command. The timestamp value in the command must
match the timestamp of a backup in the location.
For example: setup -t restoreDatabases -p timestamp=20210420145332.
7. To restore the Solr search index, run the following command from a Liberty server:
Where liberty.hostname is the hostname of the Liberty server where you are running the
command, and matches the value for the host-name attribute of the <application> element in the
topology.xml file.
The Solr backup is identified in the location from the backup.solr.location.dir setting by
using the timestamp that was specified in the command. The timestamp value in the command must
match the timestamp of a backup directory in the location.
For example: setup -t restoreSolr -p timestamp=20210420145332 --hostname
<liberty.hostname> .
8. Deploy to the new Liberty servers.
a. To restore the i2 Analyze and Liberty configuration, run the following command from a Liberty
server:
setup -t startLiberty
Result
Your deployment of i2 Analyze is restored to new servers after a disaster.
Intro
• The configuration directory and some files from the Liberty deployment are backed up. These
files enable you to recreate a deployment from the backup files in the same configuration.
• The system can be running when you back up the configuration. The system must be stopped when
you restore the configuration.
• In a distributed deployment of i2 Analyze, the configuration directory should be identical on each
server. Therefore, you only need to back up the configuration from the Liberty server.
Backup location
In the environment.properties file, the backup.config.location.dir setting is the location
where the configuration backups are created and restored from when you use the toolkit tasks. This
location must exist on the Liberty server where you run the toolkit tasks.
Toolkit tasks
• backupConfiguration
The backupConfiguration toolkit task backs up the i2 Analyze and Liberty configuration to the backup.config.location.dir location. For example:
setup -t backupConfiguration
• restoreConfiguration
The restoreConfiguration toolkit task restores the i2 Analyze and Liberty configuration from a backup. The restoreConfiguration task also deploys Liberty. You must pass the timestamp parameter to the toolkit task with a timestamp for a backup in the backup.config.location.dir location. For example:
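A hedged sketch of the command, following the same parameter pattern as the other restore tasks in this section (the exact syntax might differ), is:
setup -t restoreConfiguration -p timestamp=20210420145332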
If you are recovering from a disaster scenario where you must create new Liberty servers, you must
copy the i2 Analyze deployment toolkit configuration from the backup to your new Liberty servers
before you run the restoreConfiguration task.
Backup files
When the backupConfiguration task is run, a directory is created named i2a-config-
<timestamp> where <timestamp> is the timestamp of when the toolkit task was run. The directory
contains the files of the i2 Analyze and Liberty configuration and a file that shows the version of i2
Analyze. For example:
- i2a-config-20210420145332
- version.txt
- liberty-app-config
- liberty-shared-config
- toolkit-config
Intro
• The non-transient Solr collections are backed up and restored by using the provided toolkit tasks.
The following collection types are non-transient:
• main_index
• match_index
• chart_index
• You must take the Solr backup before a database backup, and you must restore Solr after you
restore the database.
Backup location
In the environment.properties file, the backup.solr.location.dir setting is the location
where the Solr index and ZooKeeper configuration backups are created and restored from. This location
must be accessible by every Solr node in your deployment.
Toolkit tasks
• backupSolr
The backupSolr toolkit task backs up the Solr index and ZooKeeper configuration to the backup.solr.location.dir location. For example:
setup -t backupSolr
• restoreSolr
The restoreSolr toolkit task restores the Solr index and ZooKeeper configuration from a backup. You must pass the timestamp parameter to the toolkit task with a value for a backup in the backup.solr.location.dir location. You must also specify the hostname parameter, with the hostname of the Liberty server where you are running the command, which matches the value for the host-name attribute of the <application> element in the topology.xml file. For example:
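For example, using the same command form that is shown in the restore procedures earlier in this section:
setup -t restoreSolr -p timestamp=20210420145332 --hostname liberty.hostname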
Backup files
When the backupSolr task is run, a directory is created named i2a-solr-<collection_name>-<timestamp>, where <timestamp> is the timestamp of when the toolkit task was run and <collection_name> is the name of the Solr collection that was backed up. A directory is created for each non-transient Solr collection, and each directory contains a backup of that collection. For example:
- i2a-solr-chart_index-20210420145332
- backup.properties
- snapshot.shard1
- snapshot.shard2
- zk_backup
- i2a-solr-main_index-20210420145332
- backup.properties
- snapshot.shard1
- snapshot.shard2
- zk_backup
- i2a-solr-match_index1-20210420145332
- backup.properties
- snapshot.shard1
- snapshot.shard2
- zk_backup
- i2a-solr-match_index2-20210420145332
- backup.properties
- snapshot.shard1
- snapshot.shard2
- zk_backup
Intro
• The backupDatabases toolkit task completes a full backup of the database. The application must
be stopped before you can back up the database and there must be no active connections to the
database.
• Refer to your database management system documentation to learn how to ensure that there are
no connections to your database before you create the backup.
• The toolkit task is aimed at smaller deployments where the system can be offline while the database backup is completed. If you have a larger deployment, or do not want the system to be offline at all, you can back up the database manually. To back up and restore the database manually, refer to the documentation for your database management system:
Backup location
In the environment.properties file, the backup.db.location.dir setting is the location where the database backups are created and restored from. This location must be accessible from the database server. The user that is specified in the credentials.properties file for your database management system must have write permissions to this location.
Toolkit tasks
• backupDatabases
The backupDatabases toolkit task backs up the database to the backup.db.location.dir location. For example:
setup -t backupDatabases
Backup files
A backup file is created in the location specified in backup.db.location.dir. The file name
includes a timestamp of when the backup is created. For example:
• On Db2: ISTORE.0.db2inst1.DBPART000.20210604144506.001
• On SQL Server: ISTORE.20210604152638.bak
You can verify that the backup file was created correctly by running the following commands on your database server:
• On Db2: db2ckbkp <backup file>
• On SQL Server:
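The verification command is not reproduced in this extract. A hedged T-SQL sketch, where the file path is an assumption based on the earlier examples, is:
RESTORE VERIFYONLY FROM DISK = N'D:\backup\database\ISTORE.20210604152638.bak';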
operation includes, but is not limited to, components of the application continuing to function after a
subset of their resources is taken offline.
In summary, when a server, or a number of servers, fail in your environment, you must ensure that the system continues to function. The process involves:
• Detecting server failure
• Ensuring that only the functional servers are used to process requests
• Recovering the failed servers
• Configuring the environment to use the recovered servers again
The implementation of this process is different for each component of i2 Analyze, the software
prerequisites that are used, and the operating system that the component is deployed on. For some
components of i2 Analyze, the process is automatic but you should understand the process and what
state the environment is in.
The following information outlines the process for each component in a deployment of i2 Analyze.
Liberty
In a deployment that provides high availability, multiple Liberty servers are used to provide active/active availability. The active/active pattern enables multiple Liberty servers to accept connections from clients and to continue to function when some of the Liberty servers fail.
To detect whether a Liberty server has failed, you can use the logs in your load balancer. In the load balancer, monitor the responses from the health/live endpoint that you use to determine which Liberty servers to route clients to. If the response from the health/live endpoint for a particular Liberty server is 503, that Liberty server might have failed.
For more information about the endpoint and its responses, see The health/live endpoint on page 285.
The Liberty server can return a 503 if it is not started, is still in the startup process, or has temporarily lost its connection to another component. Liberty can restart as a non-leader if it temporarily loses the connection to a component and loses leadership. If this is the reason for the 503, the following message is displayed in the IBM_i2_Component_Availability.log file:
For more information about the log file, see Monitor the system availability on page 289.
Automatic failover
If a Liberty server fails, the other Liberty servers continue to function as usual. If a client was connected
to the failed Liberty server, the analyst might have to log out from the client and log in again for the load
balancer to route the request to one of the live Liberty servers.
If it was the leader Liberty server that failed, one of the remaining Liberty servers is elected leader. For
more information about the leadership process, see Liberty leadership configuration on page 284.
There are a number of reasons why a server might fail. Use the logs from the failed server to diagnose
and solve the issue. For example, you might need to restart the server, increase the hardware
specification, or replace hardware components.
• The Liberty logs are in the deploy\wlp\usr\servers\opal-server\logs directory.
• The deployment toolkit logs are in the toolkit\configuration\logs directory.
For more information about the different log files and their contents, see Deployment log files.
On the recovered Liberty server, run setup -t startLiberty to restart the server and i2 Analyze
application.
You can use the load balancer logs to ensure that the Liberty server is now returning 200 from the
health/live endpoint.
The leader Liberty can complete actions that require exclusive access to the Information Store database. These actions include:
• Initiating Solr indexing
• Initiating online upgrade tasks
• Initiating alerts for saved Visual Queries
• Initiating deletion-by-rule jobs
Leadership state
ZooKeeper is used to determine and manage the leadership state of the Liberty servers in a deployment. Each Liberty server polls ZooKeeper for the current leadership state. If there is no leader, each Liberty server attempts to become the leader, and exactly one succeeds.
Leadership poll
To become the leader, and to maintain leadership, Liberty must be able to connect to ZooKeeper, Solr, and the database management system. The connection to each component is checked whenever the leadership state is polled. If any of the components cannot be reached, that Liberty server either releases its leadership or, if there is currently no leader, does not stand for leadership.
The leadership poll schedule is independent on each Liberty server. If the current leader releases its
leadership, there is no leader until the leadership poll runs on another Liberty that can then assume
leadership status.
The leadership poll is completed on a configurable schedule. By default, the
leadership state is checked every five minutes. To change the interval, update
the value for the ServerLeadershipReconnectIntervalMinutes setting in
DiscoServerSettingsCommon.properties.
For example:
ServerLeadershipReconnectIntervalMinutes=5
Error thresholds
The leadership state can change if a certain number of errors are encountered during indexing
operations within a specified time span. These types of operations are run only on the leader Liberty.
The maximum number of errors that can occur, and the time span in which they can occur, are
configurable.
The MaxPermittedErrorsBeforeLeadershipRelease setting defines
the maximum number of errors that can occur in the time specified by the
TimeSpanForMaxPermittedErrorsBeforeLeadershipReleaseInSeconds setting. By default,
up to five errors can occur within 30 seconds before the Liberty relinquishes its leadership.
For example:
MaxPermittedErrorsBeforeLeadershipRelease=5
TimeSpanForMaxPermittedErrorsBeforeLeadershipReleaseInSeconds=30
Restarting limits
If the leader Liberty releases leadership, it attempts to restart in non-leader mode. If Liberty cannot connect to every component, it fails to start. By default, it tries to start three times before failing. To start Liberty on this server again, resolve the issue that is preventing Liberty from connecting to all of the components, and then run setup -t startLiberty.
You can change the maximum number of times that Liberty attempts to start before it fails. To change
the number of attempts, update the value of the ServerLeadershipMaxFailedStarts setting in
DiscoServerSettingsCommon.properties.
ServerLeadershipMaxFailedStarts=3
There are a number of different ways to detect if there has been a database failure.
• The IBM_i2_Component_Availability.log contains messages that report the connection
status for the Information Store database. For more information about the messages that are specific
to the database, see Database management system messages on page 291.
There are also a number of different tools that are specific to your database management system.
• Db2:
• Db2 fault monitor on Linux
• Heartbeat monitoring in clustered environments
• Monitoring Db2 High Availability Disaster Recovery (HADR) databases
For more information about detecting failures for Db2, see Detecting an unplanned outage.
• SQL Server
• To understand when SQL Server initiates failover, see How Automatic Failover Works.
When the primary server fails, the system must fail over and use the remaining servers to complete
operations.
• Db2
• If you are using an automated-cluster controller, failover to a standby instance is automatic. For
more information, see .
• If you are using client-side automatic client rerouting, then you must manually force a standby
instance to become the new primary. For more information about initiating a takeover, see
Performing an HADR failover operation.
• SQL Server
• When SQL Server is configured for high availability in an availability group with three servers,
failover is automatic. For more information about failover, see Automatic Failover.
There are a number of reasons why a server might fail. You can use the logs from the database server
to diagnose and solve the issue. For example, you might need to restart the server, increase the
hardware specification, or replace hardware components that caused the issue.
When the server is back online and functional, you can recover it to become the primary server again.
Alternatively, you can recover it to become the standby for the new primary that you failed over to.
This might include recovering a back up of the Information Store database from before the server failed.
• Db2
• Some toolkit tasks only work on the original primary database server when you are not using
an automated-cluster controller such as TSAMP. To use those toolkit tasks, you must revert to
using the original primary database server after a failure or redeploy your system to use the new
primary database server.
• For more information about the process to make the recovered database the primary again,
see Reintegrating a database after a takeover operation.
• To redeploy with the new primary, update the topology.xml file on each Liberty server to reference the new server in the host-name and port-number attributes, and then redeploy each Liberty server.
• SQL Server
• On SQL Server, you are not required to return to the previous primary; however, you might choose to do so. You can initiate a planned manual failover to return to the initial primary server. For more information, see Planned Manual Failover (Without Data Loss).
Solr
In a deployment that provides high availability, i2 Analyze uses SolrCloud to deploy Solr with fault
tolerance and high availability capabilities. Solr uses an active/active pattern that allows all Solr servers
to respond to requests.
To detect if there has been a Solr failure, you can use the following mechanisms:
• The IBM_i2_Component_Availability.log contains messages that report the status of the
Solr cluster. For more information about the messages that are specific to Solr, see Solr messages
on page 289.
• The Solr Web UI displays the status of any collections in a Solr cluster. Navigate to the following URL in a web browser to access the UI: http://localhost:8983/solr/#/~cloud?view=graph, where localhost and 8983 are replaced with the hostname and port number of one of your Solr nodes.
Automatic failover
Solr continues to update the index and provide search results while there is at least one replica
available for each shard.
There are a number of reasons why a server might fail. Use the logs from the server to diagnose and
solve the issue.
Solr logs:
• The Solr logs are located in the i2analyze\deploy\solr\server\logs\8983 directory on
each Solr server. Where 8983 is the port number of the Solr node on the server.
• The application logs are in the deploy\wlp\usr\servers\opal-server\logs directory,
specifically the IBM_i2_Solr.log file.
For more information about the different log files and their contents, see Deployment log files.
To recover the failed server you might need to restart the server, increase the hardware specification, or
replace hardware components.
On the recovered Solr server, run setup -t startSolrNodes -hn solr.hostname to start the
Solr nodes on the server.
You can use the IBM_i2_Component_Availability.log and the Solr Web UI to ensure that the nodes start correctly and that the cluster returns to its previous state.
ZooKeeper
In a deployment that provides high availability, ZooKeeper is used to manage the Solr cluster and
maintain Liberty leadership information. ZooKeeper uses an active/active pattern that allows all
ZooKeeper servers to respond to requests.
The ZooKeeper quorum continues to function while more than half of the total number of ZooKeeper
hosts in the quorum are available. For example, if you have three ZooKeeper servers, your system can
sustain one ZooKeeper server failure.
There are a number of reasons why a server might fail. Use the logs from the server to diagnose and
solve the issue.
ZooKeeper logs:
• The ZooKeeper logs are located in the i2analyze\data\zookeeper\8\ensembles\zoo
\zkhosts\1\logs directory on the failed ZooKeeper server. Where 1 is the identifier of the
ZooKeeper host on the server.
• The application logs are in the deploy\wlp\usr\servers\opal-server\logs directory.
For more information about the different log files and their contents, see Deployment log files.
To recover the failed server you might need to restart the server, increase the hardware specification, or
replace hardware components.
•
INFO ApplicationStateHandler - I2ANALYZE_STATUS:0065 - Application is
entering leader mode
If the Liberty is not elected leader, the following messages are displayed:
•
INFO ServerLeadershipMonitor - We are not the Liberty leader
•
INFO ApplicationStateHandler - I2ANALYZE_STATUS:0063 - Application is
entering non-leader mode to serve requests
Component messages
If all of the components are available to the Liberty server, the following message is displayed:
•
INFO ComponentAvailabilityCheck - All components are available
If at least one of the components is not available to the Liberty server, the following message is displayed:
•
WARN ComponentAvailabilityCheck - Not all components are available
When the Liberty server continues to check the availability of each component, the following
message is displayed:
•
INFO ComponentAvailabilityCheck - I2ANALYZE_STATUS:0068 - The
application is waiting for all components to be available
Solr messages
If the Liberty server cannot connect to the Solr cluster, the following message is displayed:
•
WARN ComponentAvailabilityCheck - Unable to connect to Solr cluster
WARN ComponentAvailabilityCheck - The Solr cluster is unavailable
The following messages describe the state of the Solr cluster in the deployment:
• ALL_REPLICAS_ACTIVE - The named Solr collection is healthy. All replicas in the collection are
active.
For example:
• DEGRADED - The named Solr collection is degraded. The minimum replication factor can be
achieved, but at least one replica is down or failed to recover.
For example:
When the status is DEGRADED, data can still be written to the Solr index. When the status returns
to ALL_REPLICAS_ACTIVE, the data is synchronized in the Solr index as described in Data
synchronization in i2 Analyze on page 291.
The deployment can still be used while the collection is in a degraded state; however, the deployment can then sustain fewer Solr server failures. If a degraded state occurs frequently, or lasts for an extended time, investigate the Solr logs to improve the stability of the system.
• RECOVERING - The named Solr collection is recovering. The minimum replication factor cannot
be achieved, because too many replicas are currently in recovery mode.
For example:
If all of the replicas recover, the status changes to ALL_REPLICAS_ACTIVE. If the replicas fail to
recover, the status changes to DEGRADED or DOWN.
• DOWN - The named Solr collection is down. The minimum replication factor cannot be achieved
because too many replicas are down or have failed to recover.
For example:
When the status is DOWN, data cannot be written to the Solr index. You should attempt to resolve
the issue. For more information, see Solr on page 287.
• UNAVAILABLE - The named Solr collection is unavailable. The application cannot connect to the
collection.
For example:
When the status is UNAVAILABLE, data cannot be written to the Solr index. You should attempt
to resolve the issue. For more information, see Solr on page 287.
ZooKeeper status messages
If the connection to ZooKeeper is lost, the following messages are displayed:
When the connection to ZooKeeper is restored, the all components are available message is displayed.
Database management system messages
If the connection to the database is lost, the following messages are displayed:
When the connection to the database is restored, the all components are available message is displayed.
If at least one replica for a shard is unavailable, the collection is marked as unhealthy. If there are data changes in the database during this time, the Solr index is updated on the replicas that are available, provided that the minimum replication factor is achieved.
When all of the replicas are available and the Solr collection is marked as healthy again, all of the
data changes that occurred when it was in the unhealthy state are reindexed. This means that all data
changes are indexed on every replica.
Implementation details
The values that i2 Analyze uses to record the state of the deployment are called watermarks. Whenever
data is changed in the database, the data change is then indexed in Solr. When Solr has completed
indexing the changed data, the watermarks are updated. The watermarks are stored in the database
and ZooKeeper. Two watermarks are maintained, a high watermark and a low watermark.
High watermark
The high watermark is updated in both locations when Solr achieves the minimum replication
factor. That is, when the index is updated on a specified number of replicas for a shard. When
you configure Solr for high availability, you specify the minimum replication factor for each Solr
collection.
Low watermark
The low watermark is updated in both locations when Solr achieves the maximum replication factor. That is, when the index is updated on all of the replicas for a shard. The low watermark is only updated after the high watermark; it cannot have a higher value than the high watermark.
Configuration resources
The configuration resources section contains information that is referenced elsewhere in the
configuration section. The section also includes the reference documentation for the configuration files
that are in the i2 Analyze deployment toolkit.
The user names and passwords that you provide allow the system to set up and administer components of i2 Analyze; they are not used to access i2 Analyze.
Three types of credentials are stored in credentials.properties:
• Database
• LTPA keys
• Solr search platform
After you specify the credentials, you might need to change the values in the
credentials.properties file from the values that are used in development or test deployments.
For example, your database management system might require a different user name and password in
a test or production system.
When you deploy i2 Analyze, the passwords in the credentials.properties file are encoded.
Database
For each database that is identified in topology.xml, you must specify a user name and a
password in the credentials.properties file. The setup script uses this information to
authenticate with the database.
Note: The user that you specify must have privileges to create and populate databases in the
database management system.
The database credentials are stored in the following format:
db.infostore.user-name=user name
db.infostore.password=password
For example:
db.infostore.user-name=admin
db.infostore.password=password
Note: The db.truststore.password credential is used only when you configure the connection
between the database and Liberty to use SSL. If you are not using SSL to secure this connection,
you do not need to specify a value for the db.truststore.password credential. For more
information about configuring SSL, see Configure Secure Sockets Layer with i2 Analyze.
LTPA keys
You must provide a value for the ltpakeys.password property. This value is used by the system
to encrypt LTPA tokens.
• For a stand-alone deployment of i2 Analyze, you can specify any value as the password.
• For a deployment of i2 Analyze that uses LTPA tokens to authenticate with other systems, you
must specify the same password that those systems use.
Solr search platform
The Solr search platform is used to search data in the Information Store. You must provide values
for the solr.user-name and solr.password properties. Any Solr indexes are created when
i2 Analyze is first deployed, and the values that you provide here become the Solr user name and
password.
If you have already deployed i2 Analyze, and you want to change the Solr password, new properties
must be created in the following format:
solr.user-name.new=value
solr.password.new=value
For example:
#Solr credentials
#The user name and password for solr to use once deployed
solr.user-name=admin
solr.password={enc}E3FGHjYUI2A\=
solr.user-name.new=admin1
solr.password.new=password
solr.home.dir
The installation path for Apache Solr. (Required for the Opal deployment pattern)
For example, C:/IBM/i2analyze/deploy/solr on Windows or /opt/IBM/i2analyze/deploy/solr on Linux
By default, the value is /opt/IBM/i2analyze/deploy/solr
zookeeper.home.dir
The installation path for Apache ZooKeeper. (Required for the Opal deployment pattern)
For example, C:/IBM/i2analyze/deploy/zookeeper on Windows or /opt/IBM/i2analyze/deploy/
zookeeper on Linux
By default, the value is /opt/IBM/i2analyze/deploy/zookeeper
apollo.data
The path to a directory where i2 Analyze can store files. By default, the directory is also used to
store the Solr index files.
For example, C:/IBM/i2analyze/data on Windows, or /opt/IBM/i2analyze/data on Linux
By default, the value is /opt/IBM/i2analyze/data
backup.config.location.dir
The location where the i2 Analyze configuration backups are created and restored from. This location must exist on the Liberty server where you run the backupConfiguration command.
backup.solr.location.dir
The location where the Solr index and ZooKeeper configuration backups are created and restored
from. This location must be accessible by every Solr node in your deployment.
backup.db.location.dir
The location where the database backups are created and restored from. The user that is specified in the credentials.properties file for your database management system must have write permissions to this location.
applications
The applications that comprise this deployment of i2 Analyze, and the locations of the application
servers on which they are to be installed. For more information, see Applications on page 298.
zookeepers and solr-clusters
The ZooKeeper hosts and Solr clusters that are used in this deployment. For more information, see
Solr and ZooKeeper on page 302.
connectors
The connectors that are used with the i2 Connect gateway in this deployment. For more information,
see Connectors on page 307.
After you update the topology file, you can either modify other aspects of the deployment toolkit
or redeploy the system to update the deployment with any changes. For more information about
redeploying your system, see Redeploying Liberty on page 316.
Data sources
The topology files that are supplied with each example in the deployment toolkit contain a preconfigured <i2-data-source> element.
The <i2-data-sources> element contains a single <i2-data-source> element for the data source
that the deployment connects to. For example:
<i2-data-sources>
<i2-data-source default="false" id="infostore">
<DataSource Version="0" Id="">
<Name>Information Store</Name>
</DataSource>
</i2-data-source>
</i2-data-sources>
Where:
Attribute Description
id A unique identifier that is used to distinguish this data source
throughout the system.
default An optional attribute. The value must be false in this version of i2
Analyze.
Each <i2-data-source> contains a single <DataSource> element that has two standard attributes
and two child elements.
Attribute Description
Id Reserved for future use. The value must be empty in this version of
i2 Analyze.
Version Reserved for future use. The value must be 0 in this version of i2
Analyze.
Element Description
Name The name of this data source, which is presented to users.
Databases
The <databases> element defines the database that i2 Analyze uses to store data. The <database>
element contains attributes that define information about the database, and the mechanism that is used
to connect to it.
For example:
<database database-type="InfoStore"
dialect="db2" database-name="ISTORE" instance-name="DB2"
xa="false" host-name="host" id="infostore"
port-number="50000" />
Where:
Attribute Description
database-type The value must be InfoStore in this version of i2 Analyze.
dialect Specifies the type of database engine. This attribute can be set to one of the
following values:
• db2
• sqlserver
If your database is on a remote server from the i2 Analyze application, you must use the following
attributes in the topology.xml file:
Attribute Description
os-type Identifies the operating system of the server where the database is located.
This can be one of the following values:
• WIN
• UNIX
If you are using Db2, you can also specify AIX.
node-name Identifies the node to create in the Db2 node directory.
The value of the node-name attribute must start with a letter, and have
fewer than 8 characters. For more information about naming in Db2, see
Naming conventions.
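For example, an Information Store database that is hosted in SQL Server on a remote Windows server
might be declared as follows. This is a sketch rather than a supplied example: the host name is a
placeholder, 1433 is the default SQL Server port, and you should adjust all of the values to match your
environment.
<database database-type="InfoStore"
    dialect="sqlserver" database-name="ISTORE"
    xa="false" host-name="sqlserver.example.com" id="infostore"
    port-number="1433" os-type="WIN" />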
Applications
The <applications> element and its child <application> elements within the topology file define the
indexes, file stores, and WAR files for the application.
An <application> element has the following attributes:
Attribute Description
name The value must be opal-server in this version of i2 Analyze.
host-name The hostname of the server where the application is located.
http-server-host Determines whether the HTTP Server is configured by the deployment toolkit.
Set to true to configure the HTTP Server, or false not to configure the HTTP
Server.
The following sections explain how to define the indexes, file stores, and WAR file for an application.
File stores
The <application> element contains a child <file-stores> element that defines the file stores
that are used by the application. The location attribute specifies the file path to use for each file store.
The other attributes must be left with their default values.
WAR files
The <application> element contains a child <wars> element. The <wars> element contains child
<war> elements that define the contents of the i2 Analyze WAR files that are installed on the application
server.
Each <war> element has the following attributes:
Attribute Description
target The type of WAR file to create. The following types are available:
• opal-services-is
• opal-services-daod
• opal-services-is-daod
database-id Identifies the database to use. This value must match the id value that is
specified in a <database> element.
For the opal-services-is and opal-services-is-daod WARs, the
<data-source> element must reference a database of type InfoStore.
solr-collection-ids
Identifies the Solr collections that are used by this WAR in child <solr-collection-id>
elements. Each Solr collection is identified by the collection-id attribute, the value of which
must match the id value that is specified in a <solr-collection> element. For more information
about the collections that you must reference, see <solr-collections>.
A Solr collection belongs to a Solr cluster. Each Solr cluster is identified by the cluster-id
attribute, the value of which must match an id value that is specified in a <solr-cluster>
element.
For more information about the ZooKeeper and Solr-related elements in the topology.xml file,
see Solr and ZooKeeper on page 302.
file-store-ids
Identifies the file stores that are used by this WAR. Each file store is identified by the value
attribute, the value of which must match an id value that is specified in a <file-store> element.
connector-ids
Identifies the connectors that are available in the opal-services-is-daod and opal-
services-daod WARs in a child <connector-id> element. Each connector is identified by the
value attribute, the value of which must match an id value that is specified in a <connector>
element.
For more information about connector-related elements in the topology.xml file, see Connectors
on page 307.
fragments
Specifies the fragments that are combined to create the WAR.
Note: All WARs must contain the common fragment.
The example topology.xml files for the information-store-opal and daod-opal example
deployments each contain an opal-server application definition that follows the pattern described above.
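As an illustrative sketch only (not a copy of either supplied file), an opal-server application that serves
the opal-services-is WAR might be defined as follows. Only the attributes and child elements that are
described above are shown; the host name, file store identifier, and fragment list are placeholders, and
the exact form of the <fragments> content in the supplied files might differ:
<application name="opal-server" host-name="liberty.example.com" http-server-host="true">
  <wars>
    <war target="opal-services-is" database-id="infostore">
      <file-store-ids>
        <file-store-id value="chart-store"/>
      </file-store-ids>
      <solr-collection-ids>
        <solr-collection-id collection-id="main_index" cluster-id="is_cluster"/>
        <!-- Additional <solr-collection-id> elements reference the match, highlight, and chart collections -->
      </solr-collection-ids>
      <fragments>
        <fragment name="common"/>
      </fragments>
    </war>
  </wars>
</application>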
<solr-clusters>
In the supplied topology.xml file that includes the opal-server application with the opal-
services-is WAR, the <solr-clusters> definition is:
<solr-clusters>
<solr-cluster id="is_cluster" zookeeper-id="zoo">
<solr-collections>
<solr-collection
id="main_index" type="main"
lucene-match-version=""
max-shards-per-node="4" num-shards="4" num-replicas="1"
/>
<solr-collection
id="match_index1" type="match"
lucene-match-version=""
max-shards-per-node="4" num-shards="4" num-replicas="1"
/>
<solr-collection
id="match_index2" type="match"
lucene-match-version=""
max-shards-per-node="4" num-shards="4" num-replicas="1"
/>
<solr-collection
id="highlight_index" type="highlight"
lucene-match-version=""
max-shards-per-node="4" num-shards="4" num-replicas="1"
/>
<solr-collection
id="chart_index" type="chart"
lucene-match-version=""
max-shards-per-node="4" num-shards="4" num-replicas="1"
/>
</solr-collections>
<solr-nodes>
<solr-node
memory="2g"
id="node1"
host-name=""
data-dir=""
port-number="8983"
/>
</solr-nodes>
</solr-cluster>
</solr-clusters>
The <solr-clusters> element includes a child <solr-cluster> element. The id attribute of the
<solr-cluster> element is a unique identifier for the Solr cluster. To associate the Solr cluster with
the ZooKeeper instance, the value of the zookeeper-id attribute must match the value of the id
attribute of the <zookeeper> element.
<solr-collections>
Attribute Description
id An identifier that is used to identify the Solr collection.
type The type of the collection. The possible values are:
• main
• daod
• match
• highlight
• chart
lucene-match-version The Lucene version that is used for the collection. At this release, the
value is populated when you deploy i2 Analyze.
num-shards The number of logical shards that are created as part of the Solr collection.
num-replicas The number of physical replicas that are created for each logical shard in
the Solr collection.
max-shards-per-node The maximum number of shards that are allowed on each Solr node. This
value is the result of num-shards multiplied by num-replicas.
min-replication-factor The minimum number of replicas that an update must be replicated to for
the operation to succeed. This value must be greater than 0 and less than
or equal to the value of num-replicas.
This attribute is optional.
num-csv-write-threads The number of threads that are used to read from the database and write
to the temporary .csv file when indexing data in the Information Store.
This attribute is optional, and applies to Solr collections of type main and
match only.
The total of num-csv-write-threads and num-csv-read-threads
must be less than the number of cores available on the Liberty server.
num-csv-read-threads The number of threads that are used to read from the temporary .csv file
and write to the index when indexing data in the Information Store.
This attribute is optional, and applies to Solr collections of type main and
match only.
This value must be less than the value of num-shards.
The total of num-csv-write-threads and num-csv-read-threads
must be less than the number of cores available on the Liberty server.
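For example, to tune Information Store indexing you might extend the main collection definition from the
earlier example with the optional attributes, as in the following sketch. The values are illustrative and
must respect the limits described above, including the number of cores on the Liberty server:
<solr-collection
    id="main_index" type="main"
    lucene-match-version=""
    max-shards-per-node="4" num-shards="4" num-replicas="1"
    min-replication-factor="1"
    num-csv-write-threads="2" num-csv-read-threads="2"
/>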
<solr-nodes>
The <solr-nodes> element is a child of the <solr-cluster> element. The <solr-nodes>
element can have one or more child <solr-node> elements. Each <solr-node> element has the
following attributes:
Attribute Description
id A unique identifier that is used to identify the Solr node.
memory The amount of memory that can be used by the Solr node.
host-name The hostname of the Solr node.
data-dir The location where Solr stores the index.
port-number The port number of the Solr node.
<zookeepers>
In the supplied topology.xml file that includes the opal-server application, the <zookeepers>
definition is:
<zookeepers>
<zookeeper id="zoo">
<zkhosts>
<zkhost
id="1"
host-name=""
data-dir=""
port-number="9983"
quorum-port-number=""
leader-port-number=""
/>
</zkhosts>
</zookeeper>
</zookeepers>
The <zookeepers> element includes a child <zookeeper> element. The id attribute of the
<zookeeper> element is a unique identifier for the ZooKeeper instance. To associate the ZooKeeper
instance with the Solr cluster, the value of the id attribute must match the value of the zookeeper-id
attribute of the <solr-cluster> element.
The <zkhosts> element is a child of the <zookeeper> element. The <zkhosts> element can have
one or more child <zkhost> elements. Each <zkhost> element has the following attributes:
Attribute Description
id A unique identifier that is used to identify the ZooKeeper host. This value must be an
integer in the range 1 - 255.
host-name The hostname of the ZooKeeper host.
data-dir The location that ZooKeeper uses to store data.
port-number The port number of the ZooKeeper host.
quorum-port-number The port number that is used for ZooKeeper quorum communication. By
default, the value is 10483.
leader-port-number The port number that is used by ZooKeeper for leader election
communication. By default, the value is 10983.
Connectors
An i2 Analyze deployment that includes the i2 Connect gateway enables you to connect to external data
sources.
The topology.xml file for a deployment that includes the opal-server application with the opal-
services-is-daod or opal-services-daod WARs includes the <connectors> element. The
<connectors> element defines the connectors that are used in a deployment.
<connectors>
The <connectors> element contains one or more child <connector> elements. The <connector>
element can have the following attributes:
Attribute Description
id A unique identifier that is used to identify the connector.
name The name of the connector, which is presented to users.
base-url The URL of the connector, made up of the host name and port number. For
example, https://host_name:port_number.
You can use the HTTP or HTTPS protocol.
configuration-url The URL to any configuration that is required for the connector. The default
value for the configuration-url attribute is /config.
The presence of the attribute in the topology.xml file is optional. If it is not
present, the default value is used.
gateway-schema The short name of the gateway schema whose item types the connector
can use, overriding any setting in the connector configuration. This attribute
is optional; when present its value must match one of the settings in
ApolloServerSettingsMandatory.properties.
schema-short-name The short name by which the connector schema for the connector is known
in the rest of the deployment, overriding any setting in the connector
configuration.
Use this optional attribute to avoid naming collisions, or to change the name
that users see in client software.
<connectors>
  <connector id="example-connector" name="Example"
             base-url="http://localhost:3700/" />
</connectors>
The DiscoServerSettingsCommon.properties file
The following properties are in the DiscoServerSettingsCommon.properties file:
IdleLogoutMinutes
The idle time in minutes after which the end user is logged out.
By default, the value is 15
AlertBeforeIdleLogoutMinutes
The time in minutes before an inactivity logout at which the end user is alerted.
By default, the value is 2
AlwaysAllowLogout
Forces logout to always be available for all authentication methods.
By default, the value is false
DeletedChartTimeoutSeconds
The time in seconds until a deleted chart is permanently removed.
By default, the value is 7200
ResultsConfigurationResource
The file that specifies what options are available to users when they view and filter results.
CommandAccessControlResource
The file that specifies which commands particular user groups can access.
EnableSolrIndexScheduler
Turns the scheduling of indexing on or off. Setting this option to false disables the scheduler, which
you should do when you ingest large amounts of data.
By default, the value is true
SolrHealthCheckIntervalInSeconds
The interval time in seconds between the checks of the Solr cluster status.
By default, the value is 60
QueryTimeoutSeconds
The time in seconds after which the server can cancel a search, resulting in an error. This is not an
absolute limit. A search might continue to run for several seconds after the limit is reached, but it
should terminate within a reasonable time. A zero setting disables search timeout.
By default, the value is 60
WildcardMinCharsWithAsterisk
The minimum number of characters (not counting asterisks) that must be provided in a wildcard text
search query that contains asterisks.
By default, the value is 0
WildcardMinCharsWithQuestionMark
The minimum number of characters (not counting question marks or asterisks) that must be
provided in a wildcard text search query that contains question marks.
By default, the value is 0
SearchTermMaxLevenshteinDistance
The maximum Levenshtein distance for spelled-like text searches. The allowed values are 2, 1, and
0. Set the value to 0 to turn off spelled-like text searches.
By default, the value is 2
VisualQueryWildcardMinCharsWithAsterisk
The minimum number of characters (not counting asterisks) that must be provided in a visual query
condition that contains or implies asterisks. (Matches wildcard pattern; Starts with; Ends with;
Contains)
By default, the value is 0
VisualQueryWildcardMinCharsWithQuestionMark
The minimum number of characters (not counting question marks or asterisks) that must be
provided in a visual query condition that contains question marks. (Matches wildcard pattern)
By default, the value is 0
VisualQueryConfigurationResource
The file that specifies what operators are valid in visual query conditions that involve particular
property types of particular item types.
VisualQueryMaxValuesInList
The maximum number of values in the value list of a visual query condition.
By default, the value is 10000
RecordMaxNotes
The maximum number of notes that can be added to a record.
By default, the value is 50
MaxRecordsPerDeleteRequest
The maximum number of records that can be deleted in one request.
By default, the value is 500
MaxSourceIdentifiersPerRecord
The maximum number of source identifiers that can be present on a record.
By default, the value is 50
MaxSeedsForDaodServices
The maximum number of records to send as seeds to any i2 Connect service in a "DAOD" or
"combined" deployment.
By default, the value is 500
AlertScheduleTimeOfDay
The time of day to run the Visual Queries that are saved with alerting enabled. The format is
HH:mm, where HH is the hour of the day in 24-hour format (00-23), and mm is the number of
minutes past the hour (00-59).
By default, the value is 00:00
ExpandMaxResultsSoftLimit
The soft limit for the maximum number of results that can be returned for an Expand operation.
By default, the value is 0
SourceReferenceSchemaResource
The file that restricts the source names and types that users can specify in source references when
creating and importing records.
LinkMatchEndRecordCombinationLimit
The maximum number of link end combinations that are searched for when performing link
matching.
By default, the value is 100
ChartUnsavedChangesLifespanDays
The length of time for which the server retains unsaved changes to a chart across user sessions.
Any change to an unsaved chart resets the timer.
By default, the value is 7
ChartUnsavedChangesCleanUpScheduleExpression
The schedule on which to clean up unsaved changes to charts when
ChartUnsavedChangesLifespanDays is exceeded. The format is a Unix cron expression.
By default, the value is 0 0 * * 0
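For example, a deployment might tighten wildcard searching, lengthen the query timeout, and move
scheduled work to off-peak hours with entries such as the following in
DiscoServerSettingsCommon.properties. The values are illustrative only:
WildcardMinCharsWithAsterisk=3
WildcardMinCharsWithQuestionMark=3
QueryTimeoutSeconds=120
AlertScheduleTimeOfDay=02:30
ChartUnsavedChangesCleanUpScheduleExpression=0 1 * * *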
setup -t stopLiberty
3. To clear data and the search index, run the following command:
• Where liberty.hostname is the hostname of the Liberty server where you are running the
command, and matches the value for the host-name attribute of the <application> element
in the topology.xml file.
• A message is displayed when you run the task to confirm that you want to complete the action.
Enter Y to continue. The data and the search index are removed from the system.
If you run the clearData task with the --scripts argument, the scripts for clearing the
Information Store database are generated in the toolkit\scripts\database\<database
dialect>\InfoStore\generated\clearData directory, but not run. You can then inspect the
scripts before they are run, or run the scripts yourself. To run the scripts yourself, run them in their
numbered order to clear the data from your system.
To generate the scripts for the task, run:
After you run the scripts manually, clear the search index:
setup -t startLiberty
setup -t stopLiberty
3. To remove the database and Solr collections, navigate to the toolkit\scripts directory and run
the following command:
Here, liberty.hostname is the hostname of the Liberty server where you are running the
command. It matches the value for the host-name attribute of the <application> element in the
topology.xml file.
A message is displayed when you run each task to confirm that you want to complete the action.
Enter Y to continue. The database and Solr collections are removed from the system.
4. To re-create the Solr collections and databases, run the following commands:
setup -t startLiberty
Here, zookeeper.host-name is the hostname of the ZooKeeper server where you are running the
command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
b) Stop Solr.
To stop Solr, run the following command on every server where Solr is running:
Here, solr.host-name is the hostname of the Solr server where you are running the command,
and matches the value for the host-name attribute of a <solr-node> element in the
topology.xml file.
c) Stop Liberty and the i2 Analyze application.
To stop Liberty, run the following command on each Liberty server:
setup -t stopLiberty
Here, zookeeper.host-name is the hostname of the ZooKeeper server where you are running the
command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
b) Start Solr.
To start Solr, run the following command on every server where your Solr nodes are located:
Here, solr.host-name is the hostname of the Solr server where you are running the command,
and matches the value for the host-name attribute of a <solr-node> element in the
topology.xml file.
c) Start Liberty and the i2 Analyze application.
To start Liberty, run the following command on each Liberty server:
setup -t startLiberty
Redeploying Liberty
Redeploy Liberty to update the i2 Analyze application with your configuration changes.
In a multiple-server environment, run all toolkit tasks from the Liberty server.
If you follow this procedure in a deployment that provides high availability, you must complete each step
on every Liberty server in your environment before you move to the next step.
1. In a command prompt, navigate to the toolkit\scripts directory.
2. Stop Liberty:
setup -t stopLiberty
3. Update the i2 Analyze application:
setup -t deployLiberty
4. Start Liberty:
setup -t startLiberty
5. If you are using the IBM HTTP Server, start or restart it.
You can find the REST API reference documentation at the following URL on the i2 Analyze server:
http://host_name/context_root/doc. For example, http://localhost/opal/doc.
1. If the curl utility is not available on your server, download it from the project website at https://
curl.haxx.se/download.html.
2. Open a command prompt and use curl to log in to the i2 Analyze server and retrieve an
authorization cookie:
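The exact command depends on the context root and the authentication mechanism of your
deployment. Assuming the default form-based login, a request of the following shape retrieves the
cookie; j_security_check is the standard Liberty form-login servlet, and the host name, context root
(opal), user name, and password shown here are placeholders:
curl -i --cookie-jar cookie.txt -d j_username=user_name -d j_password=password http://host_name/opal/j_security_check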
This command connects to the specified i2 Analyze server as the specified user, retrieves the
authentication cookie, and saves it to a local file named cookie.txt. The LTPA token in the cookie
is valid for 2 hours.
When the command completes, the response from the request is displayed. If the retrieval is
successful, the first line of the response is:
HTTP/1.1 200 OK
After you retrieve and save the cookie, you can POST to the admin endpoints. When you POST to the
admin endpoints, include the cookie file that you received when you logged in.
3. The following command is an example of a POST to the admin endpoint to reload the live
configuration:
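The following sketch shows the shape of such a request. The path to the admin reload endpoint is an
assumption here; check the REST API reference documentation on your server (the /doc page described
earlier) for the exact paths that your version exposes:
curl -i --cookie cookie.txt -X POST http://host_name/opal/api/v1/admin/config/reload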
When the command completes, the response from the request is displayed. If the retrieval is
successful, the first line of the response is:
HTTP/1.1 200 OK
If you see the HTTP/1.1 403 Forbidden response code, the user for which you retrieved the
token does not have the command access control administrator permission, or the token has
expired:
• If your token has expired, you can get a new one by running the command in step 2 again.
• If your user does not have the i2:Administrator command access control permission, see
Configuring command access control on page 148 to provide the user with the permission.
Warning: The reload method updates the configuration without requiring a server restart,
but any logged-in users are logged out from i2 Analyze when you run it.
[Diagram: a client connects to the i2 Analyze application, whose i2 Connect gateway communicates with
one or more connectors. Each connector provides services that retrieve data from an external source.]
Each connector contains the code that interacts with an external data source, the definitions of the
services that it supports, and the descriptions of how clients might present its services to users. For
example, clients need to know the following information:
• The name and description of each service a user has access to
• Whether a service requires users to authenticate before they can use it
• Whether a service supports parameters that users can specify when they run it
• The input controls to display, and the validation to perform, for each parameter
• Whether a service behaves differently as a result of the current chart selection (any selected records
are then seeds for the operation that the service performs)
One of the roles of the gateway is to retrieve this information from all the connectors that i2® Analyze
knows about. It can then help to ensure that requests to and responses from services are formatted
correctly.
In solutions like these, a connector is an implementation of a REST interface that enables the gateway
to perform its role. More specifically, a connector must support the following tasks:
• Respond to requests from the gateway for information about its services
• Validate and run queries against an external data source, including any queries that require
parameters or seeds
• Convert query results to a shape that the gateway can process into i2® Analyze records
Writing a connector requires you to develop an implementation of the REST interface that the gateway
expects, and to write queries that retrieve data from an external source. You must also understand the
i2® Analyze data model, and be able to convert the retrieved data so that it conforms to an i2 Analyze
schema.
The examples that i2 provides in the open-source repository at https://github.com/ibm-i2/analyze-
connect demonstrate meeting several of these requirements in fully functional connectors. The
remainder of this section explains how to meet them in more detail.
Note: For information on securing the communication between the i2 Connect gateway and its
connectors, see Client authenticated Secure Sockets Layer and the i2 Connect gateway.
System interactions
The following diagram represents the deployment that is described in the introduction, and the
interactions that take place between each part.
[Diagram: Analyst's Notebook Premium connects through the web server to the i2 Analyze application on
the Liberty server. The i2 Connect gateway fetches the configuration, schema, and charting scheme from
the connector, and can reload that configuration on demand (1); clients retrieve the cached information
(2); and queries pass through the gateway to the connector's validate and acquire services (3).]
The diagram also identifies five of the REST endpoints that connectors can implement and use:
Configuration endpoint
The gateway sends a request to the mandatory configuration endpoint to gather information about
the services that the connector supports. All connectors have at least one service, and it is a service
that implements the validate and acquire endpoints.
Note: For connectors whose services present modified behavior to different users, the information
is split between two endpoints: this one, and the user-specific configuration endpoint.
Schema endpoint
The schema endpoint is optional, and so is the location where you implement it. If a connector returns
results whose types are not in the Information Store schema or a gateway schema, you can provide
a connector schema that defines those types. The configuration specifies whether and where a
connector schema is available.
Interaction sequence
The interaction between the i2® Connect gateway and the other parts of an i2® Analyze deployment
takes place in three distinct phases, labeled 1, 2, and 3 in the diagram.
1. When i2® Analyze starts up, the i2® Connect gateway sends a request to the configuration endpoint
of every connector that is listed in the topology. It caches the information that it receives in response.
If the configuration for any of the connectors specifies schema and charting scheme endpoints,
the gateway sends requests to retrieve the connector schemas and charting schemes from those
endpoints.
Note: You can also force the gateway to repeat this process while i2 Analyze is running. The
gateway implements the reload endpoint, whose POST method you can call after you change the
configuration of a connector.
2. When a client connects to i2® Analyze, the latter gets the cached information from the gateway and
returns it to the client. The client can then present the queries that the services implement to users,
and help users to provide valid parameters and seeds to those queries where appropriate.
Note: For connectors that implement the user-specific configuration endpoint, i2® Analyze sends a
request to it with information about the connecting user at this stage, to retrieve the remainder of the
configuration information.
3. When a user runs a synchronous query, i2® Analyze passes it to the gateway for processing. The
gateway packages the query into a request to the acquire endpoint on the service in question,
preceded by a request to the validate endpoint if the service supports it.
On receiving the response from the acquire endpoint, the gateway converts the data that it contains
into results that i2® Analyze can ultimately return to the client.
For example, if the purpose of your connector is to provide data from an external source to an existing
deployment, it might be that some or all of the data in the source is similar to data that the deployment
already models.
Alternatively, if you are writing a connector for use in many deployments, there are advantages to
providing a description of the data that the connector returns alongside the connector itself.
During connector development, you can decide whether the definition for each type that you want to
return is found in the Information Store schema, a gateway schema, or the connector schema. In the
first two cases, you can also decide whether to use an item type that already exists or to extend the
schema with a new type.
1. Consider the type of the source and how you can query it.
For spreadsheets and text files, it is likely that you have complete access to the data they contain.
For databases and web services, you might be restricted to a set of predefined queries.
2. Identify the types of data that you can retrieve from the source. If your aim is to integrate with an
existing deployment, compare the source's types with the types in the Information Store or gateway
schemas.
For each type in the source, you can reach one of three conclusions:
• The type and its data are a match for (or a subset of) one of the entity or link types you already
have
• The type is a match for one of the types you already have, but its data implies new property types
• The type is not currently represented in the existing schemas
3. If the Information Store schema or a gateway schema already describes all the data in the external
source, then you can start the process of writing queries that retrieve data from that source.
Note: In a deployment that contains more than one gateway schema, any single connector can use
the types from only one gateway schema.
4. If the data in the external source contains property types that the existing schemas do not describe,
then you can ignore that data or edit a schema to add those property types.
Additive changes to schemas are always permitted.
5. If the data in the external source contains item types that the existing schemas do not describe, then
you can use one of the techniques for adding them to a schema, or create a connector schema to
describe them.
No fixed set of steps for retrieving data from external data sources exists, but some of your choices
can make later tasks easier. Whatever the source, your initial aim is to generate a small set of
representative data.
• If the source allows it, begin by writing a tightly bound query. Or, if your source comprises one or
more text files, try to abbreviate them.
At the start of the process, large volumes of data can be distracting.
• If you can, make the query retrieve data for entities and links.
Creating the data structures that represent linked entities is an important part of connector
development.
• Try to generate a data set that has most of the types that the external source contains.
No matter how complex your queries eventually become, the code to process data into the right
shape for i2® Analyze is unlikely to change.
When you implement an acquire endpoint, you can call the query that you developed here directly from
that code. Alternatively, during development, you might decide to run the query now, and save the
results in a file.
Note: If you do not indicate whether these identifiers are persistent here or in the services
array, the gateway assumes that the identifiers are not persistent.
• The services array.
The services array must contain information for at least one synchronous or asynchronous
service. The information must include a unique identifier and a name for the service. It must also
specify whether the service requires a client UI, and the URL of the acquire or queriesResource
endpoint.
First, construct the defaultValues object:
1. Determine the time zone that is most likely to apply to any temporal data in the source that the
services in this connector query.
2. Find the identifier of the time zone in the IANA Time Zone Database, and arrange for the response
from the endpoint to include the identifier in its defaultValues object.
For example:
{
"defaultValues": {
"timeZoneId": "Europe/London",
...
},
"services": [
...
]
}
Note: You can also retrieve a list of supported time zones from the GET method on the i2 Analyze
server's /api/v1/core/temporal/timezones REST endpoint.
3. If the source contains only a few types, and if you intend the connector eventually to have many
services that run different queries, then consider adding an entityTypeId and a linkTypeId to
the defaultValues object.
It is more common to specify what types of record a query might retrieve on a per-service basis than
on a connector-wide basis. However, if you do not supply default types here, then every service must
supply types individually.
The gateway searches for the types that you supply here in the connector schema, the gateway
schema, and the Information Store schema. To specify exactly where the types are defined, you can
add entityTypeLocation and linkTypeLocation properties.
4. If you know that the source will always attach the same identifier to the same piece of retrieved
information, add the resultIdsPersistent field to the defaultValues object and set its value
to true.
The effect of the field depends on your deployment of i2 Analyze:
• If the deployment contains only the i2 Connect gateway, then setting
"resultIdsPersistent": true prevents duplication when the same record is returned twice
from the same connector. Each time the record is sent to a chart, it replaces the existing version.
• In other scenarios (if the deployment also contains the Information Store, or if you set
"resultIdsPersistent": false), records can be duplicated on the chart unless you
configure source identifier matching in the match rules.
Note: The resultIdsPersistent field is also valid at the per-service level, where its value
overrides any connector-wide setting.
Then, add an object to the services array:
5. To the services array, add a service object that has an id and a name. It is common to
include a description and populate the resultItemTypeIds array as well, although neither is
mandatory.
For example:
{
"defaultValues": {
"timeZoneId": "Europe/London",
"resultIdsPersistent": true
},
"services": [
{
"id": "nypd-service",
"name": "NYPD Connector: Get All",
"description": "A service that retrieves all data",
"resultItemTypeIds": {
"INFOSTORE": ["ET1", "ET2", "ET3", "LT1", "LT2", "LT3"]
},
...
}
]
}
6. For a service whose query does not allow callers to provide parameters, you can set the
clientConfigType to NONE.
{
"defaultValues": {
"timeZoneId": "Europe/London",
"resultIdsPersistent": true
},
"services": [
{
"id": "nypd-service",
"name": "NYPD Connector: Get All",
"description": "A service that retrieves all data",
"resultItemTypeIds": {
"INFOSTORE": ["ET1", "ET2", "ET3", "LT1", "LT2", "LT3"]
},
"clientConfigType": "NONE",
...
}
]
}
If you later add parameters to the query, you can allow users to specify them by changing the value
to FORM and providing the identifier of a client configuration in clientConfigId.
If you later add support for seeds, you must add a seedConstraints object to the service.
7. Finally, for a synchronous service, set the acquireUrl of the service to the URL where you intend
to host the acquire endpoint.
{
"defaultValues": {
"timeZoneId": "Europe/London",
"resultIdsPersistent": true
},
"services": [
{
"id": "nypd-service",
"name": "NYPD Connector: Get All",
"description": "A service that retrieves all data",
"resultItemTypeIds": {
"INFOSTORE": ["ET1", "ET2", "ET3", "LT1", "LT2", "LT3"]
},
"clientConfigType": "NONE",
"acquireUrl": "/all"
}
]
}
This JSON structure represents (almost) the simplest response that you can send from the configuration
endpoint of a connector. It contains the definition of one simple service. In order for the i2 Connect
gateway to retrieve that definition, you must add details of the connector to the i2 Analyze deployment
topology.
<connector-ids>
<connector-id value="ConnectorId"/>
</connector-ids>
Here, ConnectorId must be unique within the topology file. Its purpose is to reference the
configuration for the connector in the <connectors> element.
4. Add a <connector> element to the <connectors> element, which is a child of the root
<topology> element:
<connectors>
<connector id="ConnectorId"
name="ConnectorName"
base-url="Protocol://HostName:PortNumber"
configuration-url="Protocol://HostName:PortNumber/Path"/>
</connectors>
Here, ConnectorName is likely to be displayed to users in the list of queries that they can run against
external sources.
The i2® Connect gateway uses the value that you assign to the base-url attribute as the stem
for the URLs that you specify in the configuration (such as /exampleSearch/acquire).
configuration-url is the only optional attribute.
By default, the gateway attempts to retrieve the configuration from <base-url>/config. You can
change this behavior by specifying a different URL as the value of configuration-url.
5. Save the file, and then update the deployed connectors configuration and restart the i2® Analyze
server.
a) On the command line, navigate to toolkit\scripts.
b) Run the following commands in sequence:
setup -t stopLiberty
setup -t updateConnectorsConfiguration
setup -t startLiberty
When the server restarts, the i2® Connect gateway makes its request to the configuration endpoint. If
that endpoint (or any of the endpoints in the retrieved configuration) is not available, the result is not
an unrecoverable error. Rather, the i2® Analyze server logs the problem, and users see messages
about unavailable services in the client application.
To preview what users see, and to discover the status of the configured connectors at any time, you can
use the server admin console.
6. Open a web browser and connect to the i2 Analyze server admin console at http://host_name/
opal/admin.
The admin console displays a list of connectors with their statuses. To see what the logged-in user
would see in their client application, click Preview services.
This command connects to the specified i2 Analyze server as the specified user, retrieves the
authentication cookie, and saves it to a local file named cookie.txt.
c) Use curl a second time to call the POST method on the reload endpoint:
This command causes the i2 Connect gateway to reload configuration information from all
the connectors that the topology included when it was last deployed. Clients receive updated
information about the connectors when they next connect to i2 Analyze.
2. After you reload the connectors (or restart the server), click Preview services in the server admin
console to check that users will see your changes in their lists of queries.
If you do not see your changes, different symptoms imply different causes:
• If a query does not appear in the list at all, then the problem might lie with either the
implementation of the configuration endpoint, or the <connector> element in the
topology.xml file.
If the connector supports user-specific configuration, there might be a problem with the user
configuration endpoint, or you might be logged in as a user whose configuration does not list the
query in question.
• If a query is in the list but marked as unavailable, then the cause of the problem is either the
implementation of the acquire endpoint, or the specification of that endpoint in the response from
the configuration endpoint.
• If the displayed information about a query is faulty or incomplete, look again at the definition of the
corresponding service in the response from the configuration or user configuration endpoint.
3. Provided that the queries from new or modified connectors appear correctly, you can open a client,
run them, and view the results that they return.
After you test that your connector meets your requirements, you might choose to secure the connector
in your deployment. You can control user access to the connector, and secure the connection between
the connector and gateway. For more information, see System security with the i2 Connect gateway on
page 321.
When you return source reference information from a service, only the name is mandatory. However,
you must also state whether users can edit or delete the source references that you add to your records.
To add source references to the record data that your service returns to clients, you must arrange to
include a sourceReference object as a peer of the properties object in your response from the
acquire endpoint.
1. In your code that generates responses from the acquire endpoint, add at least the following extra
content:
{
...
"properties": {
...
},
"sourceReference": {
"source": {
"name": "source_name"
}
},
...
}
2. To give users the best experience, add a type and a description for your source that align with the
definitions in the source reference schema for the deployment:
...
"sourceReference": {
"source": {
"name": "source_name",
"type": "source_type",
"description": "source_description"
}
},
...
3. If the data in the external source has associated images, you can arrange for users to see an image
for a record when they view it by adding an image field to the source object:
"source": {
...
"image": "image_url"
}
You can similarly include the location of the source in the source reference by adding a location
field and setting it to either a URL or a text description.
4. By default, when users add records from an external data source to a chart, they cannot edit or
delete the source references that you add. To change that, add a userModifiable field to the
sourceReference object:
...
"sourceReference": {
"source": {
...
},
"userModifiable": true
},
...
Completing these steps in your service code means that source references are present in the records
that users retrieve and view. Source references are optional, so you can control whether to include them
for all records, or just for records of particular types. Finally, source references have the same features
in entity and link records, so you do not need to write different code for those two cases.
{
"gatewaySchema": "Example",
"defaultValues": {
...
},
"services": [
...
]
}
Provided that a gateway schema with the short name EXAMPLE is configured in the
ApolloServerSettingsMandatory.properties file for the deployment, the types in that
schema are available for use in the connector.
To enable a connector schema, you must add endpoints for retrieving it to any connector that uses it,
and then specify the locations of those endpoints in the response from the configuration endpoint.
2. Make your connector schema and its charting scheme available through two simple HTTP GET
methods that do not require parameters.
It is easiest, but not mandatory, to implement these endpoints on the server that hosts the
connectors that use them.
3. Add schemaUrl, chartingSchemesUrl, and schemaShortName settings to the response from
your implementation of the configuration endpoint:
{
"schemaUrl": "/schema",
"chartingSchemesUrl": "/chartingSchemes",
"schemaShortName": "Another-Example"
"defaultValues": {
...
},
"services": [
...
]
}
In this form, i2 Analyze appends the URLs that you provide to the base-url that you specified
for the connector in the topology file. By definition, these endpoints are on the same server as the
connector. To use endpoints that are implemented elsewhere, you must provide the full URLs here.
Making changes like these affects the connector but not the topology, so you do not need to redeploy i2
Analyze. Restarting the server or using the reload endpoint on the i2 Connect gateway is enough.
4. Follow the steps in Modifying and testing a connector on page 331 to restart the server or force the
gateway to reload your connector configuration.
The new types become available for use.
With the above changes in place, and provided that there are no identifier clashes, the connector can
use item types from the connector, gateway, or Information Store schema just by using their identifiers.
i2 Analyze searches for types in all three schemas, in that order. It is also possible to specify which
schema a particular item type comes from:
• When you are specifying the default return types for the connector as a whole, you can set the
entityTypeLocation and linkTypeLocation properties.
• When you are specifying the possible return types from a service, you can provide locations in the
resultItemTypeIds structure.
• In each record that you return from a service, you can set the typeLocation property.
When you deploy a connector, you can use the <connector> element in the topology file to override
some of the schema settings in the connector. You can instruct the connector to use a gateway schema
with a different short name, which can be useful if you were unable to deploy the schema with the name
that the connector expected. You can also change the short name of the connector schema, which is
useful if you have two connectors that advertise the same name.
5. To specify a gateway schema with a different short name, add the gateway-schema attribute to the
<connector> element:
<connectors>
<connector id="ConnectorId"
name="ConnectorName"
base-url="Protocol://HostName:PortNumber"
gateway-schema="ShortName"/>
</connectors>
6. To provide a different short name for the connector schema, add the schema-short-name attribute
to the <connector> element:
<connectors>
<connector id="ConnectorId"
name="ConnectorName"
base-url="Protocol://HostName:PortNumber"
schema-short-name="NewShortName"/>
</connectors>
After you make changes to the topology of an i2 Analyze deployment, you must update the deployed
connectors configuration and restart the i2 Analyze server, as described in Adding a connector to the
topology on page 328.
Type conversion
[Diagram: an i2 Analyze deployment in which connectors, connector schemas, gateway schemas, and
the Information Store schema all contribute item types. As data flows from the connectors through the
i2 Connect gateway and web server to the i2 Analyze application and the Information Store, type
conversion maps records from one item type to another.]
Type conversion mappings describe how the property values of source records that have a particular
item type become the property values of target records that have a different item type.
For example, a gateway schema might define an entity type named "Person", with property types
including "Forename" and "Surname". In the same deployment, the Information Store might also define
an entity type named "Person", with property types including "First Name" and "Family Name". Provided
that the property types in question have compatible logical types, you can create a mapping to change
the type of any record with the gateway type so that it becomes a record with the Information Store type.
The effect of this type conversion mapping is that any record that would otherwise appear in search
results or on the chart surface with the gateway type instead appears with the Information Store type.
As a result, it can be compared directly with existing records in the Information Store, and can itself be
uploaded to the Information Store if the user requires it.
Note: It is not mandatory to create mappings between item types. By default, records from connectors
are copied to users' charts with their assigned entity or link types, without modification.
To create type conversion mappings, you use the i2 Analyze Server Admin Console.
1. In a web browser, navigate to the URI of an i2 Analyze development environment, but append /
admin.
For example: http://host_name:9082/opal/admin.
When you log in as a user with administrator privileges, the i2 Analyze Server Admin Console
appears.
2. On the left of the user interface, select the i2 Analyze Type Conversion app.
In the main panel, the app displays all the item types from all the connector and gateway schemas in
this deployment of i2 Analyze. (Item types from an Information Store schema cannot be source types
in type conversion mappings.) For each type in the list, you can create a mapping to a different type.
3. Select a source item type, and click Create mapping to display the Convert records dialog.
This dialog displays all the item types from all the gateway schemas and any Information Store
schema in this deployment of i2 Analyze. (Item types from connector schemas cannot be target
types in type conversion mappings.)
4. Select a target type for your mapping, and click Create mapping to display the dialog with the same
name.
The Create mapping dialog displays a list of the property types of the target item type. For each
property type, you can elect to provide no mapping, or to say that a property with this type is to be set
from a source property with a specified type, or that a property with this type is always to be set to a
fixed value.
5. Configure the property type mappings as you see fit, and then click OK to create the item type
mapping.
If there are problems with any of the mappings that you create, the app displays inline warning and
error messages.
As you develop your type conversion mappings, you can preview their effects on the results that users
see when they query external data sources by temporarily applying them to the development server.
6. Click Apply to apply all the current type conversion mappings to the server.
7. Click Preview services to display the same Services dialog that users see when they query
external data sources.
Note: If any of your connectors support user-specific configuration, you might need to log in with
different credentials in order to see the effect of your mappings on all services.
8. Select and run some of the queries.
The descriptions and results of the queries now reflect the mapping that you created. Any references
to the source types in your mappings are replaced with their target types.
This approach to previewing the effect of your mappings is not suitable for a production environment,
where you must instead deploy a type conversion mapping configuration file to your i2 Analyze servers.
9. When you are satisfied with the mappings, click Export to create a configuration file named
mapping-configuration.json.
setup -t stopLiberty
setup -t deployLiberty
setup -t startLiberty
12. Finally, verify that the changes that you made in the development environment behave correctly in
the production environment.
more control over the constraints, you can populate the object. The REST SPI documentation for the
configuration endpoint describes its structure:
"seedConstraints": {
"connectorIds": [""],
"min": 0,
"max": 0,
"seedTypes": {
"allowedTypes": "",
"itemTypes": [
{
"id": "",
"min": 0,
"max": 0,
"typeLocation": ""
}
]
}
}
A service can specify which records it accepts as seeds by restricting the connectors they came from.
It can also set boundaries on the number of seed records it accepts, regardless of their type. A service
can specify whether entity or link records are allowed as seeds (entity records are the default), and then
restrict the range to a subset of that group. It can also set boundaries on the number of seeds, on a per-
type basis.
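For example, an "expand"-style service that accepts between 1 and 10 entity seeds of a single
Information Store type might declare constraints like the following sketch. The item type identifier is a
placeholder, and the enumeration values shown for allowedTypes and typeLocation (ENTITY and
INFOSTORE) are assumptions; use the values that the REST SPI documentation defines:
"seedConstraints": {
  "min": 1,
  "max": 10,
  "seedTypes": {
    "allowedTypes": "ENTITY",
    "itemTypes": [
      {
        "id": "ET1",
        "min": 1,
        "max": 10,
        "typeLocation": "INFOSTORE"
      }
    ]
  }
}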
If you configure the constraints so that requests must contain at least one seed of each permitted item
type, then the service requires seeds. Users cannot run the service without an appropriate selection.
Otherwise, your implementation of the acquire endpoint must support requests that might or might not
contain seed information.
When the i2 Connect gateway calls your acquire endpoint to perform a seeded query, it includes a
payload that contains identifiers and property values for all seeds. For information about the structure of
the payload, see the REST SPI documentation for the request parameter of the acquire endpoint.
In outline, the procedure for supporting seeded queries in one of your services is to start by adding seed
constraints to its configuration. Clients then interpret the configuration and present the queries to users.
When users run a query, your implementation of the acquire endpoint receives seeds that you can use
to guide your response.
1. Decide what kind of seeded operation you want to perform, and its implications for the service.
For an "expand" query, for example, you might accept a fairly large number of entities of any type as
seeds. For "find like this", the seed is more likely to be a single record of a specific type.
2. Add a seedConstraints object that reflects the requirements of the query, to the response from
the configuration (or user configuration) endpoint.
3. Implement the acquire endpoint for your service.
If at least one of the constraints does not specify a minimum record count, the endpoint might be
called with or without seeds. The REST SPI documentation for the acquire endpoint describes the
structure of the seed data.
Important: For "get latest" and "expand" queries, it is common for seed records also to appear in
the results. In that case, you must ensure that the id of the outgoing record matches the seedId of
the incoming record. In an "expand" query, for example, you can identify a seed as being one end of
a link in the response by setting the fromEndId of the link to the seedId.
4. Restart the i2 Analyze server or instruct the i2 Connect gateway to reload the configuration. Then,
connect to the server with a client, log in as a user who can see the query, and ensure that the
service is working properly.
Supporting parameters
In nearly all real situations, users want to be able to customize the queries that they can run against an
external data source. To make that possible, you must configure the service to present parameters to its
users. In your implementation of the acquire endpoint, you can act on the values that they specify.
As you read the following information, also look at the config.json files in the example projects at
https://github.com/IBM-i2/Analyze-Connect, some of which demonstrate the results of following this
procedure.
To support parameters in a service, you add a set of conditions for which users can supply values. In
a connector that supports user-specific configuration, you can even arrange for the same service to
support different parameters, depending on who is using it.
For example, in a query for people, you might allow users to search for particular names or
characteristics. In a "find path" seeded query, you might allow users to specify how long the path can
be, and you might vary the limit according to the user's group membership.
Note: A significant difference between visual queries and external searches is that in the latter,
conditions are not bound to property types. A condition in a service can be anything that you can
process usefully in your implementation.
To add parameters to a service, you add a client configuration to its definition that describes how each
condition is presented to users. The REST SPI for the configuration endpoint describes the structure of
clientConfig objects, which look like this outline:
{
"id": "",
"config": {
"sections": [
{
"title": "",
"conditions": [
{
"id": "",
"label": "",
"description": "",
"mandatory": false,
"logicalType": ""
}
]
}
]
}
}
The client configuration is responsible for the appearance of the conditions that users see, and the
restrictions on the values they can provide. As well as controlling the types of conditions, and specifying
whether supplying a value is mandatory, you can add further validation that is appropriate to their
type. For more information about this kind of validation, see the REST SPI documentation for the
configuration endpoint.
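For example, a client configuration for a simple person search might look like the following sketch. The
identifiers and labels are illustrative, and the logicalType values shown here (SINGLE_LINE_STRING and
BOOLEAN) are assumptions; use the logical types that the REST SPI documentation defines:
{
  "id": "personSearchForm",
  "config": {
    "sections": [
      {
        "title": "Person details",
        "conditions": [
          {
            "id": "surname",
            "label": "Surname",
            "description": "Full or partial surname to search for",
            "mandatory": true,
            "logicalType": "SINGLE_LINE_STRING"
          },
          {
            "id": "includeAliases",
            "label": "Include aliases",
            "description": "Also search known aliases",
            "mandatory": false,
            "logicalType": "BOOLEAN"
          }
        ]
      }
    ]
  }
}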
When a user opens a query that supports parameters, they see a form that displays your conditions.
When they provide values and run the query, your implementation of the acquire endpoint receives a
payload that contains those values. You can write the implementation to act on those values in any way
that makes sense for your data source.
Like many other aspects of developing a connector, supporting parameters means changing your
implementations of the configuration (or user configuration) and acquire endpoints.
1. In your response from the configuration endpoint, write or modify the service definition so that its
clientConfigType is "FORM", and add a clientConfigId.
This identifier links to a client configuration elsewhere in the response. In some circumstances, it
might be appropriate to use the same client configuration for more than one service.
2. Add a clientConfigs array to the response, and within it a client configuration object whose id
matches the identifier that you specified in Step 1.
3. Add your conditions to the client configuration.
To begin with, you might consider leaving out validation checks in the interests of getting a working
implementation quickly.
4. Add code to your implementation of the acquire endpoint that unpacks the condition values from the
payload.
You can then use those values to affect the query that you perform against the external source.
5. Restart the i2 Analyze server or instruct the i2 Connect gateway to reload the configuration.
You can now test the code and make sure that it does what you expect.
6. Return to the response from the configuration endpoint, edit the condition descriptions, and add
client-side validation to improve the user experience.
7. Restart or reload again, and ensure that the validation has your intended effect.
The validation that you can specify in the response from the configuration endpoint is performed by the
client, and applies to values in isolation. If your service (and your users) might benefit from some more
complex validation, consider adding a validate endpoint.
Supporting validation
The support in the i2 Connect gateway for queries that take parameters includes the ability to perform
relatively simple, client-side validation on the values that users supply. In cases where running a query
might fail due to a set of values that are mutually incompatible, you can write an endpoint to perform
server-side validation.
When you write a service definition that includes conditions, you have the option (and sometimes the
duty) to include logic that validates the values that users supply for those conditions. For example, you
might insist that a string is shorter than a certain length, or that its contents match a particular pattern.
If you ask the user to select from a discrete set of values, then you must provide the values for them to
select from.
However, there are other kinds of validation that the mechanism for defining conditions does not
support. In particular, you cannot use it to validate values that are reasonable in isolation, but faulty in
combination. (For example, dates of birth and death might both contain reasonable values, but it would
make no sense to search for individuals where the latter is earlier than the former.)
If you have an implementation of the acquire endpoint for which it might be useful to perform this kind of
validation, you can write a validate endpoint. When your configuration specifies a validate endpoint for
a service, the gateway uses it before the acquire endpoint, and passes the same payload. If you decide
that validation fails, the request to the acquire endpoint does not happen, and the user sees an error
message that you specify.
To add a validate endpoint to a service:
1. In your response from the configuration (or user configuration) endpoint, add the validateUrl
setting alongside the existing acquireUrl setting. Set its value to the location of the
implementation.
2. Implement the rules for the validate endpoint in a way that is consistent with its definition in the
REST SPI documentation.
The payload that the endpoint receives in the request contains all the same seed and parameter
information as the acquire endpoint receives.
3. If validation succeeds according to your rules, return an empty response. If it fails, set the response
to a simple object that contains an errorMessage:
{
"errorMessage": ""
}
When the i2 Connect gateway receives a response that is not empty, it does not then send a request
to the acquire endpoint.
4. Restart the i2 Analyze server or instruct the i2 Connect gateway to reload the configuration. Test that
your new validation has the correct behavior.
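For example, a validate implementation that checks dates of birth and death against each other might return a response like the following when the supplied values are incompatible. The message text is illustrative; you choose the wording that your users see.
{
  "errorMessage": "The date of death cannot be earlier than the date of birth."
}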
Supporting authentication
Many of the data sources that users might want to query through a connector require authentication.
For example, some sources require a username-password combination, while others need the caller
to provide an API key. The i2 Connect gateway supports asking users for credentials on behalf of an i2
Connect service, and management of authenticated connections during a user session.
To enable authentication, a connector must specify in its configuration which of its services require
authentication. If a user makes a request for an authenticated service without previously providing
credentials, the gateway's response prompts the client to display a login dialog. When the user provides
valid credentials, the gateway caches a token in memory that allows further requests to succeed.
[Figure: the authentication flow. The client fetches and loads the connector configuration, and a login attempt then either succeeds or fails.]
For example, an authentication configuration that asks for a username and a password might look like this:
{
"id": "authConfig1",
"loginUrl": "/login/userpass",
"form": {
"description": "This service requires a username and a password.",
"fields": [
{
"id": "username",
"label": "Username",
"type": "text"
},
{
"id": "password",
"label": "Password",
"type": "password"
}
]
}
}
3. Create the endpoint at the loginUrl that is defined in the authentication configuration.
The syntax of this POST method is straightforward: it receives the credentials that the user provides,
and (on success) responds with a token for subsequent requests to use. The detail of generating the
token depends on the nature of the service. For example, you might be passing the credentials to an
external provider, or implementing the authentication yourself in the service code.
If authentication fails, the endpoint must instead respond with an object that complies with RFC
7807. For an example of doing so, see the file named ExternalConnectorDataService.java
in the connector/auth sample project at https://github.com/ibm-i2/analyze-connect.
4. Adapt any endpoint that requires authentication (for example, the acquire, delete, or results
endpoints) to use the token mechanism.
The implementations should expect requests to include an authorization header containing a token
that was originally returned by the authentication endpoint, in the form Authorization: Bearer
token.
If the token is valid, processing of the request should proceed normally. If the token is not valid, the
endpoint must respond in the same way as the authentication endpoint. The user is then required to
re-authenticate with the service.
5. Restart the i2 Analyze server or instruct the i2 Connect gateway to reload the configuration. Connect
to the server with a client that supports external searches, and verify that the connector and its
services behave in the way you expect.
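For reference, the following sketch shows what an RFC 7807 response from the login endpoint in Step 3 (or from a token check in Step 4) might look like. The field names are defined by RFC 7807, which also specifies the application/problem+json content type; the values here are illustrative only.
{
  "type": "about:blank",
  "title": "Authentication failed",
  "status": 401,
  "detail": "The supplied credentials were not accepted by the external data source."
}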
Source identifiers
In all versions of i2 Analyze, the ETL pipeline requires you to provide origin identifiers for incoming
records through the ingestion mappings that you write. Starting from i2 Analyze 4.3.5, developers of
connectors for i2 Connect and plug-ins for i2 Analyst's Notebook Premium can explicitly provide source
identifiers for the records that they create.
Note: For high-level descriptions of source identifiers and origin identifiers (and the differences between
them), see Identifiers in i2 Analyze records.
When you attach recognizable identifiers to the records that you create in connector code, you enable
services to perform operations based on those identifiers. You also enable i2 Analyze clients and
servers to perform matching on records that have a shared source.
The rules that govern the structure and contents of source identifiers are similar to, but not the same as, the rules for origin identifiers. In line with their definitions, the rules for source identifiers are sometimes less restrictive than those for origin identifiers.
type
The type of a source identifier allows the services in an i2 Analyze deployment to determine whether the source identifier is "known" to them; that is, whether they can understand the key.
The value of the type element does not have to be meaningful, but it should be unique to your services so that you avoid clashes with any third-party services that you might use.
• The length of the source identifier's type must not exceed 200 bytes, which is equivalent to 100 2-
byte Unicode characters.
• The following types are reserved and must not be used (case-insensitive):
• OI.IS
• OI.DAOD
• OI.ANB
• Anything starting with i2.
key
The key of a source identifier is an array containing the information necessary to reference the data in
its source. The pieces of information that you use to make up the key differ depending on the source of
the data.
• The total length of the source identifier's key must not exceed 692 bytes, which is equivalent to 346
2-byte Unicode characters.
• The key is stored as a serialized JSON array, so additional characters appear alongside the actual
key elements: two characters are required for the array brackets, and two quotes are required for each
element, with commas as separators between elements.
In other words, a key with N elements requires 3N + 1 characters of overhead. For example, the
total length of ["a","bc","defg"] is 17 characters, while ["a,bc,defg"] is 13 characters.
Also, if special characters are present in a key element, they are escaped for storage. For example, " becomes \", which further increases the size of the key.
Limitations
There is a limit on the number of unique source identifiers that you can add to a
record in the Information Store. The default limit is 50, but you can modify it by adding
MaxSourceIdentifiersPerRecord=N (where N is a positive integer) to the
DiscoServerSettingsCommon.properties file.
Origin identifiers are not stored in chart records. As a result, they are not present as source identifiers
on seeds that are sent to connectors or Analyst's Notebook Premium plug-ins. Only those source
identifiers that have been added to records through the same mechanisms are present.
Connectors that specify source identifiers are API-compatible with older versions of the i2 Connect
gateway and Analyst's Notebook Premium, but those products will not recognize new source identifiers
as such. Rather, they will be treated as ordinary identifiers or ignored. To use source identifiers to their
full potential, upgrade all products to the latest release at your earliest opportunity.
MULTIPLE_LINE_STRING
Multiple-line strings can be up to 32K bytes long, and can be further limited by the schema.
DATE
Date values must be in the ISO 8601 format "YYYY-MM-DD". Also, values must be in the range from
1753-01-01 to 9999-12-30.
"2021-11-30"
TIME
Time values must be in the ISO 8601 format "hh:mm:ss".
"23:59:59"
DATE_AND_TIME
There are two ways to return values with the DATE_AND_TIME logical type. You can use an ISO 8601
string without a time zone, or a date-and-time JSON object that explicitly defines the time zone.
When you use the ISO 8601 format, the time zone is taken from the timeZoneId that's defined in
defaultValues in the connector configuration. If no default is present, the time zone is set to UTC.
"2021-12-30T23:59:59"
or
"2021-12-30T23:59:59.999"
or
{
"localDateAndTime": "2021-12-30T23:59:59.999",
"isDST": false,
"timeZoneId": "Europe/London"
}
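If you return the ISO 8601 string form, the default time zone comes from the connector configuration. The following is a minimal sketch, assuming that timeZoneId sits inside a defaultValues object in the configuration response; check the REST SPI documentation for the exact placement.
{
  "defaultValues": {
    "timeZoneId": "Europe/London"
  }
}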
BOOLEAN
true
or
false
INTEGER
Integer values must be in the range from -2147483648 to 2147483647.
2147483647
DOUBLE
Double values must be in the range from 4.94065645841246544e-324d to 1.79769313486231570e+308d.
4.94065645841246544e-324d
DECIMAL
Decimal values can contain up to 18 digits before the decimal separator, and up to 4 digits after it. There
can be a leading minus sign, but no exponent (e) notation.
-123456789012345678.1234
SELECTED_FROM
Selected-from string values must match a permitted value that the schema or a form condition defines.
SUGGESTED_FROM
Suggested-from string values must match a permitted value that a form condition defines when you use
them as default values, but are otherwise unrestricted.
GEOSPATIAL
Geospatial values must be formatted as GeoJSON points, as described at https://datatracker.ietf.org/
doc/html/rfc7946#section-3.1.2.
The first element in the coordinates array is longitude, and must be a decimal between -180.0 and
180.0. The second element is latitude and must be a decimal between -90.0 and 90.0.
{
"type": "Point",
"coordinates": [1.0, 2.0]
}
GEOSPATIAL_AREA
Geospatial area values must be formatted as GeoJSON feature collections, as described at https://
datatracker.ietf.org/doc/html/rfc7946#section-3.3. Foreign members are allowed.
The geometry must be of type Polygon or MultiPolygon, both of which must contain at least four
coordinates, where the first coordinate matches the last.
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[
[
[1.0, 2.0],
[3.0, 4.0],
[5.0, 6.0],
[1.0, 2.0]
]
]
]
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[1.0, 2.0],
[3.0, 4.0],
[5.0, 6.0],
[1.0, 2.0]
]
]
}
}
]
}
A schema defines the structure of data that will be stored within your system. A schema must be designed in a way that effectively maps the data that you have to the type of analysis that is going to be carried out.
Charting schemes on page 361
Charting schemes determine the mapping between items in i2 Analyze and the same items when
displayed on a chart in Analyst's Notebook Premium. You can transform data in different ways by
creating multiple charting schemes.
Common tasks
Creating entity types on page 355
Entity types define the structure for data that will be stored in i2 Analyze. By creating entity types, you
can classify your data.
Creating link types on page 357
You can create link types to define the structure for describing the relationships between entities. Link
types are similar to entity types in that they contain property types.
Changing the grading scheme on page 366
You can grade your properties in i2 Analyze to indicate the level of confidence you have in property
values. You can create and configure grade types within a grading scheme.
Troubleshooting and support
i2 Support
Schemas
A schema defines the structure of data that will be stored within your system. A schema must be designed in a way that effectively maps the data that you have to the type of analysis that is going to be carried out.
Structuring your data in a way that is easy to understand is important because correctly organizing data
increases the likelihood of finding key information that relates to your investigation.
Data that will be stored within your system must be defined by using an entity, link, and property-based
model. Structuring your data in this format to fit the specifications of your investigation is called data
modeling.
You can use the Schema Designer to classify your data to resemble your data model. Data
classification involves creating components within your schema for the data that will be stored in terms
of entity and link types, with their property types.
Creating an effective schema involves several cycles of reviewing and testing before it is uploaded into
a production environment. This process reduces the need for changes during production that can affect
the data within your system.
Related tasks
Creating schemas on page 354
When the data that you want to analyze is modeled in terms of entities, links, and properties, you can
create a schema with schema components. The example schemas that come with i2 Analyze can be
edited if you do not want to start from a blank schema.
Data modeling
Before you create your schema, you must analyze the data that is likely to be available for analysis and
understand how that data is used during an investigation. Based on your findings, you can create a data
model that defines your data in terms of entities, links, and properties.
It is important to first establish the aim of your investigation when you are creating a data model
because doing so can help you organize data effectively. You can analyze your data and consider how it fits the
basic concepts of data modeling.
The basic concepts of data modeling are formed by entities, links, and properties (ELP). Real world
objects in your data can be expressed as entities, the association between those entities as links, and
the values that characterize the entities and links as properties. Both entities and links are types of
items; items contain properties and property groups, and properties can be placed in items or property
groups.
The Schema Designer can be used to create a schema that is based on the data model's
specifications.
Important: If your deployment is using data connectors with other sources of data, the shape of the
data from those data connectors must also be modeled in your schema.
Schema structure
You can determine the structure of your schema by analyzing your data in terms of item and property
types. The schema structure is based on your data model.
Item types form the core of your schema and are created to classify the entities and links that you identified in your data model. The following list describes the item types that you can create:
Entity type
You can create entity types to classify your entities that represent your real world objects. For example, you might create a Person entity type to classify entities that represent specific people in an investigation. When you are adding information about these people, details about a specific person are added to a separate entity of type person within the system. You can assign a semantic type to an entity type to provide it with meaning by identifying the nature of the data.
Link type
You can create link types to classify the links you create between your entities. For example, a Car entity type and a Person entity type might be associated with an Owner link type. When you are adding information about the relationship between a car and a person, it is added as a link of type owner within the system. A link type also dictates what types of entities can be at the ends of links of that type. You can assign a semantic type to a link type to provide it with meaning by identifying the nature of the data.
When you create item types, you can also create property and property group types that define information about that item type. The following list describes how property and property group types are used to classify the property values that are found in your data:
Property type
You can create property types to identify features in entity and link types that might be useful in analysis. Property types are important because they make comparison of data more effective by declaring which values represent the same kind of information. For example, you can create an Eye Color property type to classify properties in your data that represent different eye colors. The logical type of a property type establishes what type of data can be stored in that property type.
Property group type
Property types can be grouped in property group types to organize your data in a way that might help analysis. For example, if you had a Person entity type, you might assign facial features such as Hair Color or Eye Color as property types, and these types could be grouped in a Facial Features property group type.
In addition to item, property, and property group types, your schema can define other behavior, as the following list describes:
Labeling scheme
Labeling schemes define the property types that are used to identify items.
Grade types
You can add grading information to indicate the level of confidence you have in your property values for items. You can create a grading system with appropriate grade types to match the rating system that your organization uses.
Link strengths
Link strengths that are set in the schema can be used on all link types. Only one link strength can be set as default. The default link strengths describe the links as being either confirmed, tentative, or unconfirmed, but you can configure the labels and their appearance in the Schema Designer.
Schema validation
To ensure the schema functions and is safe to use in your production environment, there are restrictions
your schema must comply with when you create and develop it. The number and types of restrictions
that are imposed on your schema vary depending on what stage of the development it is in.
Validation checks prevent you from implementing any changes that involve removing a unique ID from
your schema. Your schema must retain these unique IDs because without them you cannot find the
data that they are assigned to.
Your schema must pass three types of validation checks before it can be uploaded:
System type validation
When you create a new schema, it contains system entity types. The system entity types must
exist for the schema to function. You can configure the Name and Description sections of these system entity types because doing so does not change the function of the components. You can assign other
components to the system entity types such as property and property group types. You are unable
to remove them from the schema.
Validation as you edit your schema
While you edit your schema, it must pass the Schema Designer validation checks. These checks
do not prevent you from editing your schema, but make you aware of any invalid changes that are
made. These validation checks ensure that your schema contains all the necessary components for
it to function, such as link end constraints. Any issues that are found in the schema are listed at the bottom of the Schema Designer window for you to fix.
Validation when you upload your schema
To protect your data, i2 Analyze runs checks to ensure that any changes you might make do not result in
data loss. Any changes that can cause data loss are called destructive changes. If you try to upload
a destructively changed schema, the validation checks fail, and an error message is displayed.
Related tasks
Creating link types on page 357
You can create link types to define the structure for describing the relationships between entities. Link
types are similar to entity types in that they contain property types.
Changing link strengths on page 367
You can create link strengths to visually indicate how much confidence there is in the information that is
represented by a particular link. Link strengths that you create in the schema can be applied to all link
types.
Creating schemas
When the data that you want to analyze is modeled in terms of entities, links, and properties, you can
create a schema with schema components. The example schemas that come with i2 Analyze can be
edited if you do not want to start from a blank schema.
When you create a schema, it contains default system entity types. These components hold necessary
information for the system and exist in order for it to work.
You can create schema components to classify and structure your data when it is stored in i2 Analyze.
You can classify your data in terms of item types with their property types.
Related concepts
Schemas on page 351
A schema defines the structure of data that will be stored within your system. A schema must be designed in a way that effectively maps the data that you have to the type of analysis that is going to be carried out.
Name
The name is used to identify your property type in the schema and Schema Designer window. It has no size or content limitations; however, you might want to keep it short because it can be used as a column header in results grids.
Description
This field is used to store the hover help associated with the property type.
Semantic Type
Indicates the semantic type of the property.
Logical Type
The type of data that is stored in the property type. The logical type must be defined.
Is Mandatory
Select this option to ensure that an item includes at least one value for this property.
2. Optional: Click Is Mandatory if you want to make the property type mandatory.
Your property type is created. The property type can be integrated into a labeling scheme, graded, and
also mapped to chart item property types in a charting scheme.
Name
The name is used to identify your property group type in the schema and Schema Designer window. It has no size or content limitations; however, you might want to keep it short because it can be used as a column header in results grids.
Description
This field is used to store a description of the property group type that might be useful to schema maintainers.
Is Mandatory
Select this option to ensure that at least one property within the property group is populated for an item.
2. Optional: Click Is Mandatory if you want to make the property group type mandatory.
Your property group type is created.
1. In the tab of your chosen property type, select a logical type from the Logical type list. The default
logical type is Single Line String.
2. If you select either Selected From List or Suggested From List, add entries to the Selection List.
You can input each entry manually by clicking Add, or import entries from an existing list file by clicking Import.
Note: When you import entries from a list file:
• They must be present in a text document (.txt).
• They must be presented as a list with the following structure:
Logical types
The logical types that you can specify in your schema depend on the data store that your deployment
uses.
If you want your labeling scheme to be used with your chart, you must map the labeling scheme in
the charting scheme editor.
4. Optional: Rearrange the Label Parts by clicking the up and down arrows as needed.
Your labeling scheme is created and applied to your item types based on your configuration. Your
labeling scheme is composed of any property types you specified to be labeled.
Your item or property type is assigned a semantic type. You can remove this semantic type by
clicking Remove.
Related tasks
Creating link types on page 357
You can create link types to define the structure for describing the relationships between entities. Link
types are similar to entity types in that they contain property types.
9. Optional: You can load a library of custom semantic types by clicking File > Custom Semantic
Types > Load Library.
Note: If you choose to save to a file name that exists, the content of the library is overwritten. This
library can be merged into other schemas to use the custom semantic types it contains.
Your custom semantic type is added to your library.
Charting schemes
Charting schemes determine the mapping between items in i2 Analyze and the same items when
displayed on a chart in Analyst's Notebook Premium. You can transform data in different ways by
creating multiple charting schemes.
Property types from your schema can be mapped to chart item property types and attribute classes.
Mapping your property types to chart item property types correctly is important because they define how
property values from i2 Analyze are represented on a chart.
The effectiveness of a charting scheme depends on how accurately defined your schema is. It is
important to ensure that your schema is configured and detailed to meet the requirements of your
organization before you create a charting scheme that will match it.
You can configure the default entry for entity and link types. To apply your default entry to the item
types in your schema, they all must have property types with the same name. The default configurations
are used unless a type-specific mapping exists. For example, if your Default entry is configured and a specific Person type mapping is inserted, then an Account type (which has no mapping of its own) uses the Default mapping, and the Person type uses the Person mapping.
You can choose to create data records on the mapped chart items. The data records contain the
information in the i2 Analyze item, regardless of whether it is mapped or not. You can also choose to
include property gradings in the data record.
To create or edit a charting scheme:
1. In the Schema Designer window, click File > Edit Charting Schemes.
If your current schema has no associated charting schemes, a window opens with four default
charting schemes in a navigation tree. If your schema has charting schemes that are associated with
it, a window opens with the charting schemes in the navigation tree.
In the Edit Charting Scheme window you can create, delete, and edit charting schemes.
2. When you are finished creating, deleting, or editing charting schemes, you can click OK to save your
changes. Alternatively, you can click Cancel if you do not want to save any creations or changes.
Related concepts
Charting scheme validation on page 361
When you create or edit any charting scheme in Schema Designer, it is validated upon creation or
when edited. Charting schemes that are not created within Schema Designer might contain invalid
information that does not coincide with the configuration of your schema.
Labeling schemes for chart item property types are created to label the data that is represented on
a chart for analysis. To map from the label of an item in i2 Analyze, you must specify which labeling
scheme you want to use.
To map a property type to a chart item property type:
1. Right-click on the Properties entry of the relevant property type, and select Insert Chart Item
Property Type.
The Add Chart Item Property Type window is shown.
2. From the Available chart item property type list, select a property and click OK. The Available
chart item property type list contains any chart item property type that is not mapped.
The appropriate property type page is shown, for example Label.
3. Configure the property type's specifications to how you want them to be represented on the chart.
The property type setup depends on the property type that is selected. You can define a label from
as many property types as you want, and include spaces or new lines.
For example, given name followed by a space followed by surname.
Your property type is successfully mapped to a chart item property.
Multiple value settings
It is possible for an item in i2 Analyze to contain properties with multiple values. Charting schemes
contain configurable specifications that define how multiple property values are charted.
For example, an item might have several properties of type Middle name, all of which have different
values. Items on a chart do not support this notion, so you must create a charting scheme and configure
it to accommodate the multiple property values.
You can configure the property value specifications when you insert property or attribute mappings.
The configuration options for determining the value that is charted depend on the underlying data type
of the property. For example, you can chart a Date & Time property value by using the earliest or the
most recent value from the set of multiple values.
Note: The default configuration is set to Use any single value. The default configuration has no deterministic rule that specifies which value in the set of values is charted.
When links are summarized (Single, Directed, or Flow), each link item that is being summarized might
have multiple values for a property. When you are creating link type property mappings, the following
extra options are available for populating the set of values from which the charted value is selected:
• All values from all links
• Primary property from each link
CAUTION: The Information Store and Analyst's Notebook Premium do not support the concept that a single property type can possess multiple property values. Data loss can occur if you create and upload a charting scheme that accommodates multiple property values, and then place items with selected property values onto a chart.
Related tasks
Configuring link summarizations on page 365
If there are multiple links between the same two entities on an Analyst's Notebook Premium chart, you
can choose how to represent those links to compact information. Your links can summarize information
such as transaction links with the Flow option selected, and can show the transactions between two
bank accounts with all of the transaction amounts.
Single Link
All the links of the selected link type between the same two entities are combined into a single link. This option is useful if you are producing a summary chart and do not want to show all of the details.
Directed
All the links of the selected link type in the same direction between the same two entities are combined. This option is useful when charting information such as telephone calls or transactions.
Multiple
Each link of the selected link type between the same two entities is charted separately. This option is useful if you want to show all of the detail, but if there are many links, it might make the chart cluttered or unreadable.
Flow
All the links of the selected link type are combined into a single link. The direction (or flow) is determined from a property that you specify. For example, if there are several financial transactions between two bank accounts, this option is useful to determine the direction that the aggregate amount of the money is flowing. Select a property from the Property used to calculate flow list. The flow value is calculated and is available for you to map in an attribute instance or chart item property mapping.
Note: The Flow option is available only when the selected link type has numeric data properties.
4. When you combine multiple links between the same two entities, you can choose whether to retain
the strengths by keeping links of different strengths separate. Alternatively, you can choose to use
the strength of the weakest link. Select the appropriate options on the right side of the page.
Copying mappings
At any level in the charting scheme tree structure, you can copy the specifications of a mapping or
collection of mappings to another charting scheme. You can duplicate a whole charting scheme but
there might be situations when you want to copy only certain definitions in a charting scheme.
If you want to copy an element from one charting scheme to another:
1. Select the element in the tree view that you want to copy.
2. Click Copy to.
The Copy To window is shown.
3. Select the check boxes for the charting schemes that you want to copy the element to and click OK.
Note: When you copy an element, the Copy to window indicates whether a definition exists in the
target scheme for the entry you want to copy. Any existing definition in the selected charting scheme
is overwritten.
If you want to duplicate a charting scheme:
4. Select the charting scheme that you want to duplicate, and click Duplicate.
Your charting scheme is successfully duplicated and appears in the navigation tree with the prefix
'Copy'.
The definitions that are contained in the selected entry in the tree structure and all its child entries are
copied to the selected charting scheme. For example, if you select Entity Types in the tree structure and
click Copy to, all of the property and attribute mappings are also copied.
• 4 - Cannot be judged
• 5 - Suspected to be false or malicious
Handling code
• 1 - Can be disseminated to other law enforcement and prosecuting agencies, including law
enforcement agencies within the EEA, and EU compatible (no special conditions)
• 2 - Can be disseminated to UK non-prosecuting parties (authorization and records needed)
• 3 - Can be disseminated to non-EEA law enforcement agencies (special conditions apply)
• 4 - Can be disseminated within the originating agency only
• 5 - No further dissemination: refer to the originator. Special handling requirements that are
imposed by the officer who authorized collection
To change the grading scheme:
1. After you select Grade Types in the navigation tree in the Schema Designer application window,
click Insert > New Grade Type.
The new grade type is added to the Grade Types list on the left.
2. Enter a Name and a Description to identify the grade type.
3. For each grade value, enter a Grade Value and click Add.
4. Optional: Use the arrow buttons to change the order in which the grade values are displayed.
5. Optional: With a grade value selected, click Remove to delete the value.
6. Optional: Select the Mandatory check box to force users to select a value for this grade type when
you enter data.
Your grading scheme is successfully modified.
Glossary
This glossary provides terms and definitions for the i2 Analyze software and products.
The following cross-references are used in this glossary:
• See refers you from a nonpreferred term to the preferred term or from an abbreviation to the spelled-
out form.
• See also refers you to a related or contrasting term.
A
abstract semantic type
A semantic type that only serves as the parent of other semantic types. Abstract semantic types
categorize their child semantic types, but are never associated with real data.
access level
A measure of the rights that a user has to view or edit an item. Access levels are calculated
separately for every user and every item. See also grant level, security dimension.
administrator
A person responsible for administrative tasks such as access authorization and content
management. Administrators can also grant levels of authority to users.
alert
A message or other indication that signals an event or an impending event that meets a set of
specified criteria.
alert definition
The statement of criteria that trigger an alert.
aligned value
A value that is used to interpret equivalent native values from different data sources. For example,
the value Male can be used to align the native values M or Ma.
analysis attribute
A characteristic or trait pertaining to a chart item. Analysis attributes are never displayed on charts.
association chart
A chart that highlights the relationships between entities, rather than a chronology of events, by
arranging data in a manner that emphasizes particular associations.
attribute
A piece of information that is associated with a chart item, such as a date of birth or an account
number. An attribute is represented by a symbol, or a value, or both, that is displayed with the chart
item.
attribute class
A descriptor of the characteristics of an attribute, including the type of its values, how its values are
displayed, and the treatment of its values when they are merged or pasted on a chart.
attribute entry
An attribute with a preset value that can be associated with a chart item.
attribute instance
A single use of an attribute on a chart item.
audit
To record information about database or instance activity by applications or individuals.
authority
A measure of how well-connected an entity is, based on its inbound links. Authority is one of two
eigenvector centrality measures used in social network analysis. See also centrality, eigenvector.
automatic attribute
An attribute that is created automatically by the application and added to a chart item.
B
betweenness
A measure of how important an entity is, based on the number of paths that pass through it on an
association chart. Betweenness is one of the centrality measures used in social network analysis.
See also centrality, gatekeeper.
binding strength
A measure of the strength of a relationship between two entities that are directly or indirectly linked.
See also common neighbor.
box
An entity representation that can indicate an organization or group on a chart. A box is often used to
enclose other entities. See also circle, representation.
C
card
A record of information attached to an item. An item can have multiple cards.
centrality
The relative importance of one entity compared to other entities in social network analysis, as
determined by its relationships. See also authority, betweenness, closeness, degree, eigenvector,
hub, social network analysis.
chart
A visual representation of real-world objects, such as organizations, people, events, or locations,
and the relationships between them.
chart fragment
A view of a chart that highlights particular items of interest.
charting scheme
A definition that describes how item data behaves when it is visualized on a chart. For example,
how data is copied into chart item properties, the chart template and labeling scheme to use, and
whether to display attributes and pictures. See also chart template.
chart property
A characteristic of a chart, such as its summary description, time zone, grid size, background color,
or merge and paste rules. Chart properties are saved with the chart. See also chart template.
chart template
An object that is used for chart creation that contains preconfigured chart properties, and lists of
permitted entity types and link types. See also chart property, charting scheme.
child
In a generalization relationship, the specialization of another element, the parent. See also parent.
circle
An entity representation that can indicate an organization or a group on a chart. A circle is often
used to enclose other entities. See also box, representation.
circular layout
A layout in which entities are arranged by type around the circumference of a circle. See also layout.
cloaked item
An item whose existence is known to the user but whose information is hidden from the user. See
also item, placeholder, signpost message.
closeness
A measure of how quickly an entity can use links to get access to other entities on an association
chart. Closeness is one of the centrality measures used in social network analysis. See also
centrality.
cluster
A group of entities that have more connections to each other than to entities outside the group.
common neighbor
An entity that is directly connected to at least two other entities. For example, if C is connected to A
and B, then C is a common neighbor of A and B. See also binding strength, connection.
compact peacock layout
A layout in which complex groups of linked entities are arranged to highlight the structure of
associations. It is most suitable for charts with many linked entities. See also layout.
condition
A specified property, a value, and an operator that defines a comparison relationship between them.
One or more conditions can be used to create a query or a conditional formatting specification. See
also parameterized query.
conditional formatting
The process of defining and applying rules to change the appearance of chart items automatically,
based on their properties. See also conditional formatting specification.
conditional formatting specification
A collection of conditional formatting rules. See also conditional formatting.
connection
A direct relationship between a pair of entities on a chart, represented by one or more links. See
also common neighbor, connection multiplicity, directed connection.
connection multiplicity
A setting that controls whether multiple links between the same items are displayed as a single line,
as directed lines, or as multiple lines. See also connection.
controlling item
A chart item whose position on the chart is defined by its date and time, and whose position affects
the positions of other timed items. See also free item, ordered item.
cover sheet
A page on which the user can view and edit the summary and custom properties of a chart.
D
degree
A measure of how many direct relationships an entity has with other entities on an association chart.
Degree is one of the centrality measures used in social network analysis. See also centrality, root
entity.
directed connection
A connection between entities in which links that are in the same direction are represented as a
single link on a chart. See also connection.
E
eigenvector
A measure of how well-connected an entity is, based on its inbound and outbound links.
Eigenvector is one of the centrality measures used in social network analysis. See also authority,
centrality, hub.
end
An entity that is attached to a link. See also end constraint.
end constraint
A constraint on the types of entities that can be the end of a particular link. See also end, valid end
type.
entity
A set of details that are held about a real-world object such as a person, location, or bank account.
An entity is a kind of item.
entity semantic type
A semantic type that can be assigned only to an entity or an entity type. See also semantic type.
entity type
A descriptor of the characteristics of an entity, including the properties it can contain and its
appearance in visualizations.
event frame
An entity representation that emphasizes date and time information. An event frame is often used in
conjunction with theme lines. See also diverted theme line, representation.
excluded word list
A list of words that are ignored when they are entered as search terms.
expansion
A process that searches for entities within a data source that are directly related to some selected
entities.
F
free item
A chart item that is not ordered. Free items can be moved anywhere on the chart. See also
controlling item, ordered item.
G
gatekeeper
An entity with a high measure of betweenness that may control the flow of information among other
entities on an association chart. See also betweenness.
grade
A rating that indicates the accuracy of a piece of information or the reliability of an intelligence
source.
grading system
A rating scale that is used to classify information in a data store or on a chart. A grading system is a
measure of reliability and accuracy.
grant level
A measure of the rights that a user has to change the security permissions of an item. Grant levels
are calculated separately for every user and every item. See also access level, security dimension.
grouped layout
A layout in which entities are arranged to show groups of interconnected entities. See also layout.
H
heat map
A graphical representation of data values in a two-dimensional table format, in which higher values
are represented by darker colors and lower values by lighter ones.
hierarchical layout
A layout in which entities are arranged to show organizational structures. See also layout.
histogram
A graphical display of the distribution of values for a numeric field, in the form of a vertical bar chart
in which taller bars indicate higher values. See also histogram filter.
histogram filter
A filter that changes the appearance of a chart. When a histogram bar is selected, items that
match the conditions defined by that bar are selected, while items that do not are hidden. See also
histogram.
hub
A measure of how well-connected an entity is, based on its outbound links. Hub is one of two
eigenvector centrality measures used in social network analysis. See also centrality, eigenvector.
I
icon
An entity representation that consists of a stylized image and an optional label. See also
representation.
import design
A specification of how data from an external source will be transformed into chart items during an
import procedure.
item
An entity or a link. Items are characterized by the values of their properties. See also cloaked item,
merged item, ordered item.
L
labeling scheme
A specification for combining property values to be displayed on screen, or as chart item labels.
layout
The arrangement of items on a chart. See also circular layout, compact peacock layout, grouped
layout, hierarchical layout, minimize crossed links layout, peacock layout.
line strength
An indication of the confidence in the information that a link represents.
M
match
The part of a result that met a condition during a search operation. A search can yield a perfect
match or a partial match.
merged item
An item that is created by merging the information held in two or more items. See also item.
minimize crossed links layout
A layout in which entities are arranged in a configuration where the fewest number of links overlap.
See also layout.
multiplicity
See connection multiplicity.
N
network chart
See association chart.
O
ordered item
A chart item whose position is maintained within a sequence. The movement of an ordered item is
restricted such that it cannot be dragged beyond neighboring ordered items. See also controlling
item, free item, item.
P
parameterized query
A query with conditions in which one or more parameters are defined. The parameter values are set
by the user. See also condition.
parent
In a hierarchy or auto-level hierarchy, a member that has one or more child members at the level
immediately below.
path
A route on a chart between two entities. A path may include intermediate entities.
peacock layout
A layout where complex groups of linked entities are arranged to show the structure of associations.
It is most suitable for charts with many linked entities. See also layout.
pick list
A data category that has a limited number of permissible values, which are often presented in a
drop-down list in the user interface.
placeholder
A redacted version of an item that is displayed to the user in situations where displaying the full item
is not possible or not permitted. See also cloaked item, signpost message.
property
A container for a single piece of information about an item.
property group
A piece of information about an item that comprises related properties. For example, a
distinguishing feature of a person comprises information about the type, appearance, and location
of the distinguishing feature.
property semantic type
A semantic type that can be assigned to a property type, a property in a data record, or an attribute
class. See also semantic type.
property type
A descriptor of the characteristics of a property, including the type of information it can contain.
proportional
Pertaining to an area of a chart in which the horizontal distances between items have a linear
relationship with the time differences between them.
R
representation
The form in which an entity is represented on a chart. See also box, circle, event frame, icon, theme
line.
root entity
An entity in a grouped layout that has the highest degree centrality in its group. Depending on the
data, there can be more than one root entity. See also degree.
S
schema
A complete description of all the entity types, link types, and their associated property types that are
available for items within a system.
security dimension
A collection of related values that can be used to label a user according to their role or security
clearance, with the aim of affecting their access to information. See also access level, grant level.
semantic type
A category that defines the real-world meaning of data, and therefore how applications should
interpret that data. For example, Person is a semantic type that could be assigned to entity types
such as Male, Victim, and Witness. See also entity semantic type, link semantic type, property
semantic type.
signpost message
A piece of text that is stored and displayed with an item or a placeholder. The signpost message
explains how to obtain more control over, or more information about, the item. See also cloaked
item, placeholder.
snapshot
A stored version of a chart that preserves its contents and layout at a particular stage of its
development.
social network analysis
A method of analyzing the structure of social relationships that uses mathematical metrics to make
claims about social organization and social dynamics. See also centrality, weight.
source reference
An identifier that indicates the source of information, for example, a document reference number.
style segment
A section of a theme line between adjacent items to which color and strength can be applied.
T
theme line
An entity representation that shows the interactions of an entity over time. A theme line can be used
with event frames. See also diverted theme line, representation.
theme line extent
The distance between the beginning and end of a theme line.
theme line wiring
The manner in which a theme line diverts from a horizontal trajectory in order to pass through and
travel between event frames.
timeline chart
A chart or a portion of a chart that shows a chronology of events. For example, a series of meetings
that occur over several days, or a set of transactions that occur over a period of time.
V
valid end type
An entity type that conforms to the end constraints of a particular link. See also end constraint.
W
weight
A value that is added to a link on an association chart, to represent its importance relative to other
links. Weight can influence the centrality measures used in social network analysis. See also social
network analysis.
weightings file
A file that contains information that can apply weighting values to links on a chart.
wiring segment
The section of a theme line between adjacent diverting event frames.
• During data ingestion, i2 Analyze uses ingestion mappings that describe how to create records in
the Information Store from the data in the staging tables. You can define these mappings in one or
more files.
There are two scenarios for ingesting data from an external source. The first time that you load data
of a particular item type from an external source, you perform an initial load. In an initial load, you are
presenting the data to the Information Store for the first time. After you perform the initial load, you might
want to perform periodic loads that update the Information Store as data is added to the external source
or the data is changed. The same staging tables and mapping files can be used in both scenarios.
The following diagram shows the types of data that can be included in each type of load that you might
perform. The data in blue is presented to the Information Store for the first time, and the data in red
contains updates to data that exists in the Information Store.
The i2® Analyze deployment toolkit provides two import modes that you can use to ingest data:
Intended audience
This guide is intended for users who understand both the i2 Analyze data model and the i2 Analyze
schema that defines the data structure of their deployment. Populating the Information Store staging
tables also requires you to be familiar with your database management system.
The ingestion mappings that you use are slightly different for each external data source that you ingest
data from. It is recommended that you keep any ingestion mapping files in a source control system to
track changes and so that you can use them in later development and production environments.
In your pre-production environment or test environment, use the process that you developed to
ingest larger quantities of data and note the time that is taken to ingest the data. You can complete
representative initial and periodic loads from each of your external data sources. The time that each
ingestion takes allows you to plan when to ingest data in your production environment more accurately.
In your production environment, complete your initial loads by using the developed and tested process.
According to your schedule, perform your periodic loads on an ongoing basis.
Your deployment architecture can impact the ingestion process in general, and any logic or tools used
for transformation of data. For more information, see Understanding the architecture on page 397.
The instructions that describe the ingestion process are separated into preparing for ingestion and
adding data to the Information Store.
[Figure: the ingestion process. Data from the external data source is transformed by ETL logic and tools (optional) and loaded into staging tables in the staging area. Ingestion mappings, which reference the i2 Analyze schema, are created in an ingestion mapping file, and the ingest step loads the data from the staging tables into the Information Store data tables. The numbered labels in the figure correspond to the steps that follow.]
1. Decide which entity types and link types in the active i2 Analyze schema best represent the data that
you want the Information Store to ingest.
2. Create staging tables in the database for the types that you identified. For some link types, you might need to create more than one staging table.
3. Use external tools, or any other appropriate technique, to transform your data and load the staging
tables with the data for ingestion.
4. Add information about your data source to the list of ingestion sources that the Information Store
maintains.
5. Write the ingestion mappings that govern the ingestion process and provide additional information
that the Information Store requires.
6. Run the ingestion command separately for each of the ingestion mappings that you wrote.
setup -t createInformationStoreStagingTable
-p schemaTypeId=type_identifier
-p databaseSchemaName=staging_schema
-p tableName=staging_table_name
createInformationStoreStagingTable
-stid type_identifier
-sn staging_schema
-tn staging_table_name
In both cases, type_identifier is the identifier of one of the entity types or link types from the i2 Analyze
schema that is represented in your data source. staging_schema is the name of the database schema
to contain the staging tables. (If you are using Db2, the command creates the database schema if it
does not exist. If you are using SQL Server, the schema must exist.) staging_table_name is the name of
the staging table itself, which must be unique, and must not exceed 21 characters in length.
Important: Many of the commands that are associated with the ingestion process modify the database
that hosts the Information Store. By default, the commands use the database credentials that you
specified during deployment in the credentials.properties file.
To use different credentials in the deployment toolkit, add importName and importPassword
parameters to the list that you pass to the command. To use different credentials in the ETL toolkit,
modify the DBUsername and DBPassword settings in the Connection.properties file.
1. If you are using the deployment toolkit, open a command prompt and navigate to the toolkit
\scripts directory. If you are using the ETL toolkit, navigate to the etltoolkit directory.
2. For each entity type or link type that you identified for ingestion, run the
createInformationStoreStagingTable command.
For example:
setup -t createInformationStoreStagingTable
-p schemaTypeId=ET5 -p databaseSchemaName=IS_Staging
-p tableName=E_Person
By convention, you create all of the staging tables for the same source in the same database
schema, which has the name IS_Staging in this example. It is also conventional to name the
staging table itself similarly to the display name of the entity type or link type to which the table
corresponds. In this case, the staging table is for the Person entity type.
Note: When the i2 Analyze schema allows the same link type between several different entity types,
create separate staging tables for each combination:
setup -t createInformationStoreStagingTable
-p schemaTypeId=LAC1 -p databaseSchemaName=IS_Staging
-p tableName=L_Access_To_Per_Acc
setup -t createInformationStoreStagingTable
-p schemaTypeId=LAC1 -p databaseSchemaName=IS_Staging
-p tableName=L_Access_To_Per_Veh
This example illustrates an Access To link type (with identifier LAC1) that can make connections from
Person entities to Account entities, or from Person entities to Vehicle entities. The commands create
staging tables with different names based on the same link type.
At the end of this procedure, you have a set of staging tables that are ready to receive your data before
ingestion takes place. The next task is to make your data ready to populate the staging tables.
• All staging tables contain columns for each of the access dimensions that the security schema
defines. If your external source includes security information, then you can map that information to
the security schema of your target deployment, and populate the staging table columns accordingly.
Alternatively, you can leave the security columns blank, and provide security dimension values on a
mapping- or source-wide basis later in the ingestion process.
• All staging tables contain correlation_id_type and correlation_id_key columns. To
correlate data that is ingested into the Information Store, use these columns to store the values that
comprise the correlation identifier for each row of data. If you do not want to use correlation, leave
the columns blank.
If you specify values for a correlation identifier, then also specify a value for the
source_last_updated column, which is used during the correlation process.
For more information about correlation, correlation identifiers, and the impact of the
source_last_updated value, see Overview of correlation.
• The columns named source_ref_source_type, source_ref_source_location, and
source_ref_source_image_url are used to populate the source reference that is generated
when the data is ingested.
For more information about implementing source references in your deployment, see Configuring
source references.
• The staging tables for link types contain a column for the direction of the link.
The Information Store considers links to go "from" one entity "to" another. The direction of a link can
be WITH or AGAINST that flow, or it can run in BOTH directions, or NONE.
• If your link data includes direction information, then you can add it to the staging table during the
population process, and then refer to it from the mapping file.
• If your link data does not include direction information, then you can specify a value in the
mapping file directly.
By default, if you have no direction information and you do nothing in the mapping file, the
Information Store sets the direction of an ingested link to NONE.
Important: The Information Store places limits on the ranges of values that properties with different
logical types can contain. If you attempt to use values outside these ranges, failures can occur during or
after ingestion. For more information, see Information Store property value ranges on page 404.
Db2 provides the import, ingest, and load utilities:
• If your data is in comma-separated value (CSV) files, then you can use the IMPORT or INGEST
commands.
• If your data is in the tables or views of another database, then you can use the IMPORT, INGEST, or
LOAD commands.
SQL Server provides the bulk insert and insert utilities:
• If your data is in comma-separated value (CSV) files, then you can use the BULK INSERT
command.
Note: To use the BULK INSERT command, the user that you run the command as must be a
member of the bulkadmin server role.
• If your data is in the tables or views of another database, then you can use the INSERT command.
• You can use SQL Server Integration Services as a tool to extract and transform data from various
sources, and then load it into the staging tables.
Alternatively, regardless of your database management system, you can use IBM InfoSphere
DataStage as a tool for transforming your data and loading it into the staging tables. You can specify the
database schema that contains the staging tables as the target location for the ETL output.
The subdirectories of the examples\data directory in the deployment toolkit all contain a db2 and a
sqlserver directory.
If you are using Db2, inspect the LoadCSVDataCommands.db2 file in the db2 directory. In each case,
this file is a Db2 script that populates the example staging tables from the prepared CSV files. The script
calls the IMPORT command repeatedly to do its work. In most instances, the command just takes data
from columns in a CSV file and adds it to a staging table in a Db2 database schema.
If you are using SQL Server, inspect the LoadCSVDataCommands.sql file in the sqlserver
directory. In each case, this file is an SQL script that populates the example tables from the prepared
CSV files. The script calls the BULK INSERT command repeatedly to do its work. The BULK INSERT
command uses .fmt format files, which are also in the sqlserver directory, to instruct SQL Server
how to process the CSV files into the staging tables. For more information about format files, see
Non-XML Format Files.
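For illustration only (the file paths and the format file here are placeholder values, not files from the
deployment toolkit), a single BULK INSERT call in such a script might resemble the following:
BULK INSERT IS_Staging.E_Person
FROM 'C:\i2\data\person.csv'
WITH (FORMATFILE = 'C:\i2\data\person.fmt', FIRSTROW = 2);
A Db2 IMPORT call takes a similar form, for example:
IMPORT FROM 'person.csv' OF DEL INSERT INTO IS_Staging.E_Person;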
setup -t addInformationStoreIngestionSource
-p ingestionSourceName=src_name
-p ingestionSourceDescription=src_display_name
addInformationStoreIngestionSource
-n src_name
-d src_display_name
In both cases, src_name is a unique name for the ingestion source, which also appears in the mapping
file. src_display_name is a friendlier name for the ingestion source that might appear in the user
interface of applications that display records from the Information Store.
Important: The value that you provide for src_name must be 30 characters or fewer in length. Also, do
not use the word ANALYST as the name of your ingestion source. That name is reserved for records that
analysts create in the Information Store through a user interface.
1. If you are using the deployment toolkit, open a command prompt and navigate to the toolkit
\scripts directory. If you are using the ETL toolkit, navigate to the etltoolkit directory.
2. Run the addInformationStoreIngestionSource command, specifying the short and display
names of your ingestion source.
For example:
setup -t addInformationStoreIngestionSource
-p ingestionSourceName=EXAMPLE
-p ingestionSourceDescription="Example data source"
If the Information Store already contains information about an ingestion source with the name
EXAMPLE, this command has no effect.
After you complete this task, you have performed all the necessary actions, and gathered all the
necessary information, to be able to write ingestion mapping files. The next task is to create the
ingestion mapping file for your ingestion source.
If you prefer to start from an existing file, look at mapping.xml in the
examples\data\law-enforcement-data-set-1 directory of the deployment toolkit.
3. Run the ingestion command to validate the mapping.
If you are unhappy with the outcome, edit the ingestion mapping and run the command again.
4. Repeat all of the preceding steps for all the other staging tables that you populated.
When you specify importMode and set it to VALIDATE, the command checks the validity of
the specified mapping, but no ingestion takes place. For more information about running the
command and any arguments, see The ingestInformationStoreRecords task on page 416.
The output to the console indicates whether the mapping you identified is valid, provides guidance
when it is not valid, and gives a full list of column mappings. The command sends the same
information to a log file that you can find at toolkit\configuration\logs\importer
\IBM_i2_Importer.log.
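For example, a sketch of a validation run, in which mapping.xml and Person are placeholder values for
your own mapping file and mapping identifier:
setup -t ingestInformationStoreRecords
-p importMappingsFile=mapping.xml
-p importMappingId=Person
-p importMode=VALIDATE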
3. Complete this validation step for each staging table and ingestion mapping that you plan to use.
The ingestion process for links verifies that the entities at each end of the link are already ingested.
If it fails to find them, the process fails. When you are developing your ingestion process, ingest a
small amount of entity data before you validate your links.
Correct any problems in the ingestion mappings file (or any ingestion properties file that you specified)
before you proceed to Adding data to the Information Store on page 387.
The Information Store keeps a log of all such instructions that you can review to determine the success or
failure of each one.
The commands in the i2 Analyze deployment and ETL toolkits make it possible for you to create,
update, and delete records in the Information Store. All three operation types are controlled by the data
in the staging tables and the mappings in the ingestion mapping files.
After any operation that uses toolkit commands to change the contents of the Information Store, you can
examine ingestion reports to determine how successful the operation was.
As described in Information Store data ingestion on page 376, there are two different import modes
that you can use to ingest your data. Before you run the ingestion commands, ensure that you use the
correct import mode for the data that you want to ingest. Remember that this might differ depending on
the item type that you are ingesting or ingestion mapping that you are using.
setup -t ingestInformationStoreRecords
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
-p importLabel=ingestion_label
-p importConfigFile=ingestion_settings_file
-p importMode=BULK
For more information about running the command and any arguments, see The
ingestInformationStoreRecords task on page 416.
3. If any errors occur, refer to Troubleshooting the ingestion process on page 422.
4. If you need to re-create the database indexes, do so now.
For entity operations, calculate the required lock list size from the lock record size for your deployment.
For example, if the lock record size is 256 bytes, you need a lock list size of 75,000.
For link operations, a larger lock list is required because more tables in the Information Store are
written to. If the lock record size for your deployment is 256 bytes, you need a lock list size of 225,000.
For information about lock lists, see locklist - Maximum storage for lock list configuration parameter.
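For example, a sketch of setting the lock list size for the Information Store database from the Db2
command line, where ISTORE is a placeholder for the name of your database:
db2 UPDATE DB CFG FOR ISTORE USING LOCKLIST 225000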
The situations where you might want to drop and re-create the database indexes include:
• Completing an initial load for an item type or ingesting data into an empty Information Store
• When ingesting large amounts of data (for example, more than 20 million rows in the staging table)
The situations where you do not want to drop and re-create the database indexes include:
• When the Information Store contains a large amount of data for the item type you are ingesting,
the time that it takes to re-create the indexes can make the total ingestion time longer. If you are
ingesting into an Information Store that already contains about 1 billion records, do not drop and re-
create the database indexes.
• If you are ingesting links between entities of the same type and some of those entities might be
correlated, do not drop and re-create the database indexes.
By default, the indexes are kept in place and not dropped during data ingestion.
To help determine what is best for your data, you can test some ingestions that drop and re-create
indexes and some that do not. When you complete the test ingestions, record the time that each one
takes to complete. You can use these times to decide whether to manage the database indexes during
ingestion or not. As the amount of data in the Information Store increases, the time it takes to create the
indexes also increases.
If you decide to drop and re-create the database indexes during large ingestions that use multiple
staging tables across both entity and link types, the high-level process consists of the following steps:
1. Stop Liberty and Solr
2. Drop the database indexes for the entity types that you are ingesting data for
3. Ingest all entity data
4. Create the database indexes for the entity types that you ingested data for
5. Drop the database indexes for the link types that you are ingesting data for
6. Ingest all link data
7. Create the database indexes for the link types that you ingested data for
8. Start Liberty and Solr
You can use two methods to drop and create indexes:
• Use the i2 Analyze deployment toolkit to generate scripts that you run against the database
manually.
• Use the import configuration properties file to specify whether the indexes must be dropped or
created during a bulk import mode ingestion.
If you are ingesting data for multiple link types or you want to review the database scripts and run them
manually, it is best to use the generated scripts to drop and re-create the database indexes. If you
do not want to, or cannot, run scripts against the Information Store database, you can use the import
configuration file and allow the ingestion process to drop and re-create the database indexes.
If you are completing an ingestion that spans multiple ingestion periods where the system is in use
between ingestion periods, ensure that all of the indexes are created at the end of an ingestion period
and that you start Liberty and Solr before analysts use the system.
For information about creating the import configuration file, see References and system properties.
1. In your import configuration file, set both the dropIndexes and createIndexes settings to
FALSE.
For example:
dropIndexes=FALSE
createIndexes=FALSE
2. Generate the create and drop index scripts for each item type that you are ingesting data for.
For example, to generate the scripts for the item type with identifier ET5, run the following
commands:
3. Stop Liberty and Solr.
On the Liberty server, run:
setup -t stopLiberty
Where solr.host-name is the host name of the Solr server where you are running the command, and
matches the value for the host-name attribute of a <solr-node> element in the topology.xml
file.
4. Run the scripts that you generated in step 2 to drop the indexes for each entity type that you plan to
ingest data for. For example:
6. Run the scripts that you generated in step 2 to create the indexes for each entity type that you
ingested. For example:
7. Run the scripts that you generated in step 2 to drop the indexes for each link type that you plan to
ingest data for. For example:
Where solr.host-name is the host name of the Solr server where you are running the command, and
matches the value for the host-name attribute of a <solr-node> element in the topology.xml
file.
On the Liberty server, run:
setup -t startLiberty
When you use the import configuration settings to drop and re-create the database indexes, the toolkit
creates and drops the indexes for you as part of the ingestion process, so you do not need to run scripts
against the database manually. However, you must modify the import configuration file a number of times
throughout the ingestion of multiple item types. You must still stop Liberty and Solr before you run the
ingestion command, and start them again after the indexes are created.
If all of the data for a single item type is ingested with a single ingestion command, in your import
configuration file set the dropIndexes and createIndexes settings as follows:
dropIndexes=TRUE
createIndexes=TRUE
If the data for a single item type must be ingested by using multiple ingestion commands, you need to
modify the import configuration file before the first ingestion command, for the intermediate commands,
and before the final command for each item type. This is the case, for example, when your data for a
single entity type is spread across more than one staging table.
1. The first time you call the ingestion command, set the dropIndexes and createIndexes settings
as follows:
dropIndexes=TRUE
createIndexes=FALSE
2. For the intermediate times that you call the ingestion command, set the dropIndexes and
createIndexes settings as follows:
dropIndexes=FALSE
createIndexes=FALSE
3. The final time you call the ingestion command, set the dropIndexes and createIndexes settings
as follows:
dropIndexes=FALSE
createIndexes=TRUE
After you ingest all the entity data, repeat the process for the link data.
setup -t ingestInformationStoreRecords
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
-p importLabel=ingestion_label
-p importConfigFile=ingestion_settings_file
-p importMode=STANDARD
For more information about running the command and any arguments, see The
ingestInformationStoreRecords task on page 416.
Note:
setup -t previewDeleteProvenance
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
setup -t deleteProvenance
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
-p importLabel=ingestion_label
-p logConnectedLinks
-p importMode=BULK_DELETE
In the ETL toolkit, you reuse the ingestInformationStoreRecords command. For more
information about running the command from the ETL toolkit, see ETL toolkit on page 417.
For more information about running the commands and any arguments, see The
previewDeleteProvenance and deleteProvenance tasks on page 418.
Bulk delete mode can be used for improved performance when you are removing provenance from the
Information Store that does not contribute to correlated records. If you try to delete any provenance
that contributes to correlated records, that provenance is not removed from the Information Store and
is recorded in a table in the IS_Public database schema. The table name is displayed in the console
when the delete process finishes. For example, IS_Public.D22200707130930400326011ET5.
Before you use bulk delete mode, ensure that your database is configured correctly. For more
information, see Database configuration for IBM Db2 on page 389.
The procedure for updating the Information Store in this way starts with a staging table that contains
information about the data that you no longer want to represent in the Information Store.
1. Ensure that the application server that hosts i2 Analyze is running.
2. Run the previewDeleteProvenance command to discover what the effect of running
deleteProvenance is.
For example:
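(An illustrative sketch; mapping.xml and Person are placeholder values for your own mapping file and
mapping identifier.)
setup -t previewDeleteProvenance
-p importMappingsFile=mapping.xml
-p importMappingId=Person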
The output to the console window describes the outcome of a delete operation with these
settings. High counts or a long list of types might indicate that the operation is going to delete
more records than you expected. Previewing the delete operation does not create an entry in the
Ingestion_Deletion_Reports view; the output is displayed only in the console.
Note: When you run the command for entity records, the output can exaggerate the impact of the
operation. If the staging table identifies the entities at both ends of a link, the preview might count the
link record twice in its report.
3. Correct any reported problems, and verify that the statistics are in line with your expectations for the
operation. If they are not, change the contents of the staging table, and run the preview command
again.
4. Run the deleteProvenance command with the same parameters to update the Information Store.
For example:
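(An illustrative sketch; the mapping values match the preview command, and DeleteExample is a
placeholder label.)
setup -t deleteProvenance
-p importMappingsFile=mapping.xml
-p importMappingId=Person
-p importLabel=DeleteExample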
Note: Do not run multiple deleteProvenance commands at the same time, or while data is being
ingested into the Information Store.
5. Repeat the steps for the types of any other records that you want to process.
At the end of this procedure, the Information Store no longer contains the provenance (or any connected
link provenance) for the data that you identified through the mapping files and staging tables. Any
records that lose all of their provenance, and any connected link records, are deleted as a result.
Deleting data is permanent, and the only way to restore it to the Information Store is to add it again
through the ingestInformationStoreRecords command.
Ingestion resources
The ingestion resources section contains information that is referenced elsewhere in the ingestion
section.
The following diagram shows some of the permutations. The examples in the upper-left and upper-right
quadrants represent deployments in which the ETL logic (implemented by a tool like IBM DataStage, for
example) is co-hosted with the i2 Analyze application. The database can be on the same or a separate
server; the solid arrows show data flow between the components during data load and ingestion.
[Diagram: four deployment permutations that show the i2 Analyze application, the Information Store,
and the ingestion commands hosted on combinations of shared and separate servers.]
The examples in the lower-left and lower-right quadrants represent deployments in which the ETL logic
is on a separate server from the i2 Analyze application. (Typically, the ETL logic is hosted alongside the
database or the external data source.) To enable the architecture, those deployments include the ETL
toolkit, which is a cut-down version of the main deployment toolkit that targets only data ingestion.
When you need the ETL toolkit, you can generate it on the i2 Analyze server, and copy it to the server
that hosts the ETL logic. When the ETL toolkit is properly configured, your ETL logic can run toolkit
commands without reference to the rest of the deployment. If you decide to use the ETL toolkit, the next
step is to deploy it. If not, you can move on to creating staging tables in the database.
As the diagrams show, the ETL toolkit is most likely to be useful in deployments where the i2 Analyze
application and the ETL logic are on separate servers. As you plan your approach to ingestion, consider
the following:
• If the ETL logic is relatively simple and data volumes are low, there are benefits to colocating as
many components as you can, especially in a new deployment.
• If your deployment requires separate servers from the start, or as it evolves over time, determine
where the bottlenecks are: is the deployment limited by server speed or by network speed?
• If the ETL logic is taxing a server that hosts other components, consider moving the logic, but be
aware of the increase in network traffic.
• If the volume of data is taxing the network, consider colocating components when you are able. (You
might not have permission to deploy components to some servers, for example.)
By acting as a proxy for the i2 Analyze deployment toolkit, the ETL toolkit provides for more flexibility in
your choice of architecture. In some circumstances you can separate the database, the ETL logic, and
the i2 Analyze application without incurring a networking penalty.
This command creates the ETL toolkit in a directory that is named etltoolkit in the output path
that you specify.
3. Copy the ETL toolkit to the server that hosts the ETL logic.
If the ETL logic and toolkit are on the same server as the database management system that hosts the
Information Store, you do not need to modify the connection configuration. If the database management
system is on a different server, then you must ensure that the ETL toolkit can communicate with the
remote database.
4. Depending on your database management system, install either Db2® client software or Microsoft™
Command Line Utilities for SQL Server on the server that hosts the ETL toolkit.
For more information, see Software Prerequisites.
5. Navigate to the classes directory of the ETL toolkit and open the Connection.properties file
in a text editor.
6. Ensure that the value for the db.installation.dir setting is correct for the path to the Db2
client or Microsoft™ Command Line Utilities for SQL Server on the server that hosts this ETL toolkit.
For example:
db.installation.dir=C:/Program Files/IBM/SQLLIB
7. If you are using Db2® to host the Information Store, you must catalog the remote Db2® database.
Run the following commands to enable the ETL toolkit to communicate with the Information Store:
Here, host-name, port-number, and instance-name are the values that are specified in the
topology.xml file. node-name can be any value that you choose, but you must use the same
value in both commands.
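As a sketch only (the exact commands depend on your environment, and ISTORE is a placeholder for the
name of the Information Store database), cataloging a remote Db2 database from the Db2 command line
typically looks like this:
db2 catalog tcpip node node-name remote host-name server port-number remote_instance instance-name
db2 catalog database ISTORE at node node-name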
If the database management system that hosts the Information Store is not using SSL, then the process
is complete.
If the database management system is configured to use SSL, you must also enable the ETL toolkit to
communicate by using SSL.
8. Register the i2-database_management_system-certificate.cer certificate that you
exported from the database management system when you configured SSL on the server that hosts
the ETL toolkit.
• On Windows, import the certificate into the Trusted Root Certification Authorities store for the
current user.
• On Linux, copy the certificate to the /etc/pki/ca-trust/source/anchors directory and use
update-ca-trust to enable it as a system CA certificate.
9. Create a truststore and import into the truststore the certificate that you exported from the database
management system when you configured SSL.
For example, run the following command:
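(A sketch that uses the Java keytool utility; the alias and password are placeholder values, and the file
names match those used elsewhere in this procedure.)
keytool -importcert -alias dbCertificate -keystore i2-etl-truststore.jks
-file i2-database_management_system-certificate.cer -storepass password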
Then, in the Connection.properties file of the ETL toolkit, set the location and password of the
truststore. For example:
DBTrustStoreLocation=C:/IBM/etltoolkit/i2-etl-truststore.jks
DBTrustStorePassword=password
12. You can use the Liberty profile securityUtility command to encode the password for the
truststore.
a) Navigate to the bin directory of the WebSphere® Application Server Liberty profile deployment
that was configured by the deployment toolkit.
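For example, a sketch of encoding a placeholder password with the securityUtility command; the
output is an encoded value that you can use in place of the plain-text truststore password:
securityUtility encode password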
This SQL statement is then the definition of a corresponding staging table in Db2:
And this SQL statement is then the definition of a corresponding staging table in SQL Server:
Note: Additionally, staging tables for link types contain a column for the direction of the link, and further
columns for the information that uniquely identifies the link end data in the source.
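The exact definition depends on the property types in your i2 Analyze schema. As an illustrative sketch
only (the property columns, their data types, and the security dimension column names below are
assumptions for a simple Person type, not output from the toolkit), a staging table in SQL Server might
resemble the following:
CREATE TABLE IS_Staging.E_Person (
    source_id nvarchar(50),
    origin_id_type nvarchar(100),
    origin_id_keys nvarchar(400),
    source_created datetime2,
    source_last_updated datetime2,
    correlation_id_type nvarchar(100),
    correlation_id_key nvarchar(200),
    p_first_given_name nvarchar(250),
    p_date_of_birth0 datetime2,       -- local date and time as recorded
    p_date_of_birth1 nvarchar(250),   -- IANA time zone, for example Europe/London
    p_date_of_birth2 smallint,        -- Daylight Saving Time indicator (1 or 0)
    p_date_of_birth3 datetime2,       -- the same instant in UTC
    security_level nvarchar(50),      -- one column per security dimension
    security_compartment nvarchar(50),
    source_ref_source_type nvarchar(200),
    source_ref_source_location nvarchar(500),
    source_ref_source_image_url nvarchar(500)
);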
The statements create the staging table in a separate schema from the Information Store data tables.
Many of the columns in the staging table have names that are derived from the display names of the
property types in the i2 Analyze schema. In most cases, the relationship between the schema and the
staging table is obvious, but there are a number of extra columns and differences:
• The source_id, origin_id_type, origin_id_keys columns of the staging table can be used
to store values that reference the rest of the data in its original source and can be used to make up
the origin identifier of the resulting record.
Note: If the staging table definition was for a link type, it would also contain from_ and to_
variations of each of the columns.
For more information about generating origin identifiers during ingestion, see Origin identifiers on
page 413.
• The next two columns of the staging table are source_created and source_last_updated.
You can use these columns to store information about when the data to be ingested was created and
modified in its source.
• The next two columns of the staging table are correlation_id_type and
correlation_id_key. If you want to correlate data during ingestion into the Information Store,
you can use these columns to store values that i2 Analyze uses to generate correlation identifiers.
For more information, see Overview of correlation.
Note: Although populating the correlation identifier columns is not mandatory, doing so acts like a
switch. The presence of correlation identifier values in any row of a staging table causes i2 Analyze
to perform correlation for all the rows in that table.
• Any property type in the i2 Analyze schema that has the logical type DATE_AND_TIME occupies four
columns in the staging table. These columns always appear in the same order:
• The "P0" column is for the local date and time as originally recorded, as a DATE_AND_TIME.
• The "P1" column is for the time zone of the local date and time, as listed in the IANA database.
For example, Europe/London.
• The "P2" column is for an indicator of whether Daylight Saving Time is (1) or is not (0) in effect.
Note: i2 Analyze considers this value only when the time is ambiguous because it occurs during
the hour that is "repeated" when Daylight Saving Time ends.
• The "P3" column is for the date and time as expressed in Coordinated Universal Time (UTC), as
another DATE_AND_TIME.
For more information about the permitted values for DATE_AND_TIME columns in your database
management system, see Information Store property value ranges on page 404.
• The next columns derive from the security schema rather than the i2 Analyze schema. One column
exists for each security dimension that the security schema defines. You can use these columns if
you want to give different dimension values to each i2 Analyze record that is created or updated as a
result of ingestion.
• In link tables, there is also a direction column to store the direction of links.
• The final three columns are named source_ref_source_type,
source_ref_source_location, and source_ref_source_image_url. These columns are
used to populate the source reference that is generated when the data is ingested.
For more information about implementing source references in your system, see Configuring source
references.
The staging tables contain some, but never all, of the data for i2 Analyze records. They do not contain
the type identifiers that Information Store records must have, and it is not mandatory to populate the
columns for timestamps, security dimension values, or correlation identifiers. You can supply the
remainder of the information in an ingestion mapping.
* The value for SINGLE_LINE_STRING depends on the value that is specified for each property with
that type in the i2 Analyze schema.
** If a Db2 database underlies the Information Store you can load time values that represent midnight as
24:00:00. When it stores such values, the database converts them to fit the ranges in the table.
Data that includes non-printing or control characters might not be indexed and can cause errors.
In addition to the values in the table, you can set the value of any non-mandatory property to null. In the
staging table for an item type that has a DATE_AND_TIME property type, all four columns that the value
is spread across must be null in that case.
The root element of an ingestion mapping file is an <ingestionMappings> element from the defined
namespace. For example:
<ns2:ingestionMappings
xmlns:ns2="http://www.i2group.com/Schemas/2016-08-12/IngestionMappings"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
</ns2:ingestionMappings>
Within the ingestion mapping file, you use an <ingestionMapping> element to define a mapping for
a particular entity type or link type. Each <ingestionMapping> element has a mandatory id attribute
that must be unique within the mapping file. You use the value to identify the mapping when you start
ingestion. For example:
<ingestionMapping id="Person">
...
</ingestionMapping>
Note: For examples of complete ingestion mapping files, search for files with the name mapping.xml
in the i2 Analyze deployment toolkit. All of those files contain definitions that are similar to the definitions
here.
When the mapping is for an entity type, the <ingestionMapping> element has the following children:
stagingArea
The <stagingArea> element specifies where the mapping gets its staged data from. In this
version of i2 Analyze, the staged data is always in a staging table, and <stagingArea> always
has a <tableName> child.
tableName
The value of <tableName> is the name of the staging table that contains the data to be ingested.
For example:
...
<stagingArea xsi:type="ns2:databaseIngestionSource">
<tableName>IS_Staging.E_Person</tableName>
</stagingArea>
...
itemTypeId
The value of the <itemTypeId> element is the identifier of the entity type (or the link type) to which
the mapping applies, as defined in the i2 Analyze schema.
For example:
...
<itemTypeId>ET5</itemTypeId>
...
originId
The <originId> element contains a template for creating the origin identifier of each ingested row.
<originId> has two mandatory child elements: <type> and <keys>.
For example:
...
<originId>
<type>$(origin_id_type)</type>
<keys>
<key>$(origin_id_keys)</key>
</keys>
</originId>
...
dataSourceName
The value of the <dataSourceName> element is the name of the ingestion source that provides the
data. The name must match an ingestion source that you added with the
addInformationStoreIngestionSource command.
For example:
...
<dataSourceName>EXAMPLE</dataSourceName>
...
createdSource and lastUpdatedSource
You can use the non-mandatory <createdSource> and <lastUpdatedSource> elements to provide
values for when the data was created and last updated in its source.
For example:
...
<createdSource>2002-10-04 09:21:33</createdSource>
<lastUpdatedSource>2002-10-05 09:34:45</lastUpdatedSource>
...
securityDimensionValues
Every row that the Information Store ingests must have at least one security dimension value from
each dimension in the security schema. The Information Store staging tables contain a column for
each access security dimension that the security schema defines.
In your ingestion process, you can use the staging table columns to store dimension values on
a per-row basis. Alternatively, you can specify that all the data that the Information Store ingests
through the same mapping get the same security dimension values.
In the ingestion mapping file, the <securityDimensionValues> element has
<securityDimensionValue> children. For per-row security, use the value of each
<securityDimensionValue> element to reference a security dimension column.
For example:
...
<securityDimensionValues>
<securityDimensionValue>$(security_level)</securityDimensionValue>
<securityDimensionValue>$(security_compartment)</securityDimensionValue>
</securityDimensionValues>
...
In the staging table, the referenced columns can contain either a single dimension value, or a
comma-separated list of dimension values.
For per-mapping security, set the value of each <securityDimensionValue> element to a
security dimension value.
For example:
...
<securityDimensionValues>
<securityDimensionValue>HI</securityDimensionValue>
<securityDimensionValue>UC</securityDimensionValue>
<securityDimensionValue>OSI</securityDimensionValue>
</securityDimensionValues>
...
In either approach, the values that you specify must be present in the i2 Analyze security schema.
When the ingestion mapping is for a link type, the <ingestionMapping> element has the same
children that entity types require, plus the following ones:
fromItemTypeId
The value of the <fromItemTypeId> element is the type identifier of entity records that the
schema permits at the "from" end of the link type to which this mapping applies.
For example:
...
<fromEntityTypeId>ET5</fromEntityTypeId>
...
fromOriginId
The <fromOriginId> element contains a template for creating the origin identifier of the entity
record at the "from" end of each ingested link row. Its syntax is identical to the <originId>
element.
The origin identifiers that result from <fromOriginId> must match the origin identifiers that result
from the <originId> element for the entity type in question. The ingestion process uses this
information to verify that the Information Store already ingested an entity record that has this origin
identifier.
For example:
...
<fromOriginId>
<type>$(from_origin_id_type)</type>
<keys>
<key>$(from_origin_id_keys)</key>
</keys>
</fromOriginId>
...
For more information about generating origin identifiers during ingestion, see Origin identifiers on
page 413.
toItemTypeId
The value of the <toItemTypeId> element is the type identifier of entity records that the schema
permits at the "to" end of the link type to which this mapping applies.
For example:
...
<toEntityTypeId>ET10</toEntityTypeId>
...
toOriginId
The <toOriginId> element behaves identically to the <fromOriginId> element, except that it
applies to the entity record at the "to" end of each ingested link row.
For example:
...
<toOriginId>
<type>$(to_origin_id_type)</type>
<keys>
<key>$(to_origin_id_keys)</key>
</keys>
</toOriginId>
...
For more information about generating origin identifiers during ingestion, see Origin identifiers on
page 413.
linkDirection
The <linkDirection> element is a non-mandatory child of the <ingestionMapping> element.
When you include a <linkDirection> element in an ingestion mapping, you can either provide
the same value for all links, or refer to the direction column of the staging table. Legal values for
the element or the column are WITH, AGAINST, BOTH, and NONE.
For example, to use a fixed value:
...
<linkDirection>WITH</linkDirection>
...
...
<linkDirection>$(direction)</linkDirection>
...
If an ingestion mapping for a link type does not contain a <linkDirection> element, then any
links that the Information Store ingests through the mapping have no direction.
For example, a settings file for the ingestion process might contain custom property values and a
system property setting like these:
SEC_LEVEL_VALUE=UC
SEC_COMPARTMENT_VALUE=HI,OSI
IngestionFailureMode=MAPPING
When you run the ingestion command, you reference your settings file as follows:
setup -t ingestInformationStoreRecords
...
-p importConfigFile=ingestion_settings.properties
System properties
As well as providing values for ingestion mappings, you can use the settings file to configure the
behavior of the ingestion process. The file supports a handful of system properties that you can set in
the same way as you create and set custom properties.
IngestionFailureMode [ RECORD | MAPPING ]
When the Information Store encounters a problem with a record during ingestion, its default
behavior is to log the error and move on to the next record. Failure is record-based. Instead, you
can specify that a problem with one record causes the Information Store not to ingest any of the
records from that staging table. Failure then is mapping-based.
In the settings file, the possible values for the IngestionFailureMode setting are RECORD or
MAPPING. The default value is RECORD.
For example, to change the failure mode to mapping, add the following line to your settings file:
IngestionFailureMode=MAPPING
RecordFailureThreshold
During the ingestion process, if the number of errors that occur is greater than the value of
the RecordFailureThreshold property, the process stops and no data is ingested into the
Information Store. By default, the value for this property is 1000.
For example:
RecordFailureThreshold=500
IngestionRunstats=FALSE
The following command generates the RUNSTATS statements to run manually after an ingestion.
The command produces query output and saves it to a file named ISStatisticsCollection.sql
that you can run.
db2 -x
"SELECT
'RUNSTATS ON TABLE ' || TRIM(TABSCHEMA) ||'.'|| TRIM(TABNAME) || '
WITH DISTRIBUTION ON ALL COLUMNS
AND INDEXES ALL
ALLOW WRITE ACCESS
TABLESAMPLE BERNOULLI(10)
INDEXSAMPLE BERNOULLI(10);'
FROM SYSCAT.TABLES
WHERE TYPE = 'T' AND TABSCHEMA = 'IS_DATA'"
> ISStatisticsCollection.SQL
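When the file is generated, you can run the statements that it contains against the Information Store
database. For example, a sketch that assumes a local Db2 connection:
db2 -tvf ISStatisticsCollection.SQL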
ImportBatchSize
The ImportBatchSize property controls the number of rows from a staging table that are ingested per
batch. By default, ImportBatchSize is set to 100,000.
For example:
ImportBatchSize=100000
If you ingest data while analysts are using the system, some analysis operations might be blocked
from returning results while a batch of data is being ingested. During the ingestion process, up to
100,000 rows (or the value of your batch size) can be locked in the database during each batch of
the import. While these locks are held, the following analytical operations might stop returning
results until the batch is complete: Find Path, Expand, and Visual Query.
To determine how long each batch takes to ingest, inspect the IS_Data.Ingestion_Batches
table. The time that the batch takes to complete is the time that analytical operations might be
blocked from returning results. If the time is too long for your requirements, you can reduce the
value for ImportBatchSize to reduce the batch size and the time that it takes to complete.
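For example, a minimal sketch of a query to inspect the batch information, run against the Information
Store database:
SELECT * FROM IS_Data.Ingestion_Batches;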
For SQL Server, a batch size greater than 1,700 can cause table locks during link ingestion.
References
Many of the pieces of information that you provide in an ingestion mapping are fixed for that mapping.
Item types, end types, and some parts of the origin identifier do not change between the i2 Analyze
records that one mapping is responsible for. The most appropriate way to specify this kind of information
is to use constant values on a per-mapping basis.
For other kinds of information, you can use references instead. The two main reasons for preferring
references to constant values lie at opposite ends of the spectrum:
• To give different values for the same field in records that are ingested through the same mapping,
you can refer to a staging table column. This approach is appropriate for many non-property values
that change from one record to the next.
• To use the same values across multiple ingestion mappings, refer to a property in a settings file. This
approach might be appropriate when you want all the data from a source to get the same security
dimension values. You can refer to the same property from every mapping that you write.
A settings file that defines properties for the ingestion process is just a text file that contains a set of
name=value pairs, with one pair on each line:
SEC_LEVEL_VALUE=UC
SEC_COMPARTMENT_VALUE=HI,OSI
When you run one of the ingestion commands, you can supply it with the name of the properties file
whose values you want to use.
To use a value by reference in an ingestion mapping, you use the $(name) syntax. name is the name
of either a column in the staging table or a property in a settings file. For example, $(SOURCE_ID) and
$(DIRECTION) refer to staging table columns, while in the previous example $(SEC_LEVEL_VALUE)
and $(SEC_COMPARTMENT_VALUE) refer to properties.
Note: Because references to columns and properties use the same syntax, a clash can occur if a column
and a property have the same name. In that case, the value of the property takes precedence.
Origin identifiers
The role of a source identifier is to reference the data for a record reproducibly in its original source. The
source identifiers that records receive during ingestion are unique within i2 Analyze, and they have a
special name in this context. They are called origin identifiers.
The nature of an origin identifier depends on the source and the creation method, and sometimes on
whether the record is a link or an entity. When you ingest data into the Information Store, i2 Analyze
compares the incoming origin identifier with existing records. If it finds a match, i2 Analyze updates a
record instead of creating one.
After you develop your process for creating origin identifiers, you must continue to use that process.
If you change the way that your origin identifiers are created and ingest the same data again, the
Information Store creates new records for the data instead of updating the existing records. To ensure
that changes to data are processed as updates, you must create your origin identifiers consistently.
For more information about different identifiers in i2 Analyze, see Identifiers in i2 Analyze records.
During the ingestion process, you specify the data for your identifiers in the staging table and ingestion
mapping file. An origin identifier is constructed of a "type" and "keys".
type
The "type" of an origin identifier allows the services in an i2 Analyze deployment to determine
quickly whether they are interested in (or how to process) a particular row of data. The value of
the type element does not have to be meaningful, but data from different sources generally have
different values.
For a deployment that uses a Db2 database, the length of the origin identifier type must not exceed
100 bytes, which is equivalent to 50 two-byte Unicode characters. For a SQL Server database, the
limit is 200 bytes, or 100 two-byte Unicode characters.
keys
The "keys" of an origin identifier contain the information necessary to reference the data in its
original source. The pieces of information that you use to make up the keys differ depending on the
source of the data. For data that originates in relational sources, you might use keys whose values
include the source name, the table name, and the unique identifier of the data within that table.
The length of the origin identifier keys must not exceed the following sizes:
• On Db2: 1000 bytes. This is equivalent to 500 Unicode characters.
• On SQL Server: 692 bytes. This is equivalent to 346 Unicode characters.
• It is recommended that your origin identifiers are as short as possible, and that any common values
are at the end of the key.
• Do not use non-printing or control characters in your origin identifiers, because they might not be
indexed correctly and can cause your origin identifiers to differ from your intended values.
There are two mechanisms for specifying the data for your origin identifiers. You can populate
the staging table with all the information required to create the origin identifiers, or you can use a
combination of information in the staging table and the ingestion mapping.
When you can provide all the information in the staging tables, there is less processing of the data
during ingestion, which can improve ingestion performance.
All information in the staging table
If you can populate the staging table with all the information for your origin identifiers, you can use
the origin_id_type and origin_id_keys columns to store this information. Populate the
origin_id_type column with the type of your origin identifier. Populate the origin_id_keys
column with a unique value that is already a composite of key values including the unique identifier
from the source. When you use these columns, you must specify them in the ingestion mapping file
that you use.
To ingest links, you must specify the origin identifiers at the end of the link. You specify the "to" end
of the link in the to_origin_id_type and to_origin_id_keys columns, and the "from" end in
from_origin_id_type and from_origin_id_keys.
When you specify all the information in the staging table, the origin identifier section of your
ingestion mapping is simpler. For example:
...
<originId>
<type>$(origin_id_type)</type>
<keys>
<key>$(origin_id_keys)</key>
</keys>
</originId>
...
...
<fromOriginId>
<type>$(from_origin_id_type)</type>
<keys>
<key>$(from_origin_id_keys)</key>
</keys>
</fromOriginId>
<toOriginId>
<type>$(to_origin_id_type)</type>
<keys>
<key>$(to_origin_id_keys)</key>
</keys>
</toOriginId>
...
Information in the staging table and the ingestion mapping
If you construct origin identifiers from a combination of sources, you can specify a fixed type and
fixed key values in the ingestion mapping, and reference only the source identifier from the staging
table. For example, for a Person entity type:
...
<originId>
<type>OI.EXAMPLE</type>
<keys>
<key>$(source_id)</key>
<key>PERSON</key>
</keys>
</originId>
...
To specify the origin identifiers at the link ends when the "to" end is an Account entity type:
...
<fromOriginId>
<type>OI.EXAMPLE</type>
<keys>
<key>$(from_source_id)</key>
<key>PERSON</key>
</keys>
</fromOriginId>
<toOriginId>
<type>OI.EXAMPLE</type>
<keys>
<key>$(to_source_id)</key>
<key>ACCOUNT</key>
</keys>
</toOriginId>
...
The deployment toolkit command for ingesting records looks like this:
setup -t ingestInformationStoreRecords
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
-p importLabel=ingestion_label
-p importConfigFile=ingestion_settings_file
-p importMode=STANDARD|VALIDATE|BULK
Here, ingestion_mapping_file is the path to the XML file that contains the mapping that you want to use,
and ingestion_mapping_id is the identifier of the mapping within that file. The latter is mandatory unless
the file contains only one mapping.
The importLabel, importConfigFile, and importMode parameters are optional:
• When you specify importLabel, ingestion_label is a name that identifies a particular use of the
command in the Information Store's IS_Public.Ingestion_Deletion_Reports view.
• When you specify importConfigFile, ingestion_settings_file is the path to a settings file that
contains name=value pairs. You can refer to names in the settings file from references in the
ingestion mapping file to use their values when you run the ingestInformationStoreRecords
command.
• When you specify importMode, you can set it to STANDARD, VALIDATE, or BULK.
• STANDARD mode can be used to ingest new and updated records, with or without correlation. If
you do not specify importMode, STANDARD mode is used.
• VALIDATE mode checks the validity of the specified mapping, but no ingestion takes place.
• BULK mode can be used to ingest new records without correlation.
ETL toolkit
The equivalent ETL toolkit command looks like this. The ETL toolkit command can also delete
provenance from the Information Store.
ingestInformationStoreRecords
-imf ingestion_mapping_file
-imid ingestion_mapping_id
-il ingestion_label
-icf ingestion_settings_file
-lcl true|false
-im STANDARD|VALIDATE|BULK|DELETE_PREVIEW|DELETE|BULK_DELETE
Here, ingestion_mapping_file is the path to the XML file that contains the mapping that you want to use,
and ingestion_mapping_id is the identifier of the mapping within that file. The latter is mandatory unless
the file contains only one mapping.
The other parameters are optional:
• When you specify il, ingestion_label is a name that identifies a particular use of the command in the
Information Store's IS_Public.Ingestion_Deletion_Reports view.
• When you specify icf, ingestion_settings_file is the path to a settings file that contains
name=value pairs. You can refer to names in the settings file from references in the ingestion
mapping file to use their values when you run the ingestInformationStoreRecords command.
• When you specify lcl as true, any links that are removed as part of an entity delete operation are
logged in IS_Public.D<import identifier><entity type id>_<link type id>_Links
tables. For example, IS_Public.D20180803090624143563ET5_LAC1_Links.
You can specify lcl with the DELETE import mode only.
• When you specify im you can set it to one of the following modes:
• STANDARD mode can be used to ingest new and updated records, with or without correlation. If
you do not specify im, STANDARD mode is used.
• VALIDATE mode checks the validity of the specified mapping, but no ingestion takes place.
• BULK mode can be used to ingest new records without correlation.
• DELETE_PREVIEW mode can be used to preview the effect of running a delete.
• DELETE mode can be used to delete provenance from the Information Store.
• BULK_DELETE mode can be used to delete provenance that does not contribute to a correlated
record.
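For example, a sketch of a standard ingestion run from the ETL toolkit, in which mapping.xml and
Person are placeholder values:
ingestInformationStoreRecords -imf mapping.xml -imid Person -im STANDARD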
setup -t previewDeleteProvenance
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
setup -t deleteProvenance
-p importMappingsFile=ingestion_mapping_file
-p importMappingId=ingestion_mapping_id
-p importLabel=ingestion_label
-p importConfigFile=ingestion_settings_file
-p logConnectedLinks
-p importMode=BULK_DELETE
Here, ingestion_mapping_file is the path to the XML file that contains the mapping that you want to use,
and ingestion_mapping_id is the identifier of the mapping within that file. The latter is mandatory unless
the file contains only one mapping.
The other parameters are optional:
• When you specify importLabel, ingestion_label is a name that identifies a particular use of the
command in the Information Store's IS_Public.Ingestion_Deletion_Reports view.
• When you specify importConfigFile, ingestion_settings_file is the path to a settings file that
contains name=value pairs. You can refer to names in the settings file from references in the
ingestion mapping file to use their values when you run the ingestInformationStoreRecords
command.
• When you specify logConnectedLinks, any links that are removed as
part of an entity delete operation are logged in IS_Public.D<import
identifier><entity type id>_<link type id>_Links tables. For example,
IS_Public.D20180803090624143563ET5_LAC1_Links.
You can specify logConnectedLinks only when you do not specify importMode.
• When you specify importMode, you can set it to BULK_DELETE. BULK_DELETE mode can be
used to delete provenance that does not contribute to a correlated record. If you do not specify
importMode, the standard delete process is used.
You can use the contents of this view to track the history of all such operations, and to examine the impact of a particular
operation.
Each time that you run a command that might change the contents of the Information Store, you create
a job in the database. Each job acts on one or more batches of i2 Analyze records. There is always one
batch per item type that the command affects, but there can also be several batches for the same type if
the number of affected records is large.
For example, consider a command that processes updates for deleted Person entity data. The first
batch in the resulting job is for Person records, and there might be more such batches if there are many
records to be deleted. If the Person data has links, then the job has further batches for each type of link
that might get deleted as a result of the entity deletion.
The IS_Public.Ingestion_Deletion_Reports view contains information about every batch from
every toolkit operation to create or update data in the Information Store. When you query the view,
include ORDER BY job_id to group entries for the same job.
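For example, a minimal query:
SELECT * FROM IS_Public.Ingestion_Deletion_Reports ORDER BY job_id;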
Note: Deletion-by-rule operations also result in job and batch creation, and view population, according
to the same rules. For more information, see the Deletion Guide.
The first few columns in the view have the same value for all batches within a job:
The remaining columns can have different values for different batches of records:
Ingest example
The (abbreviated) report for successful ingestion operations might look like this:
job_id                 1                        2
ingestion_mode         Standard                 Standard
primary_item_type      ET10                     ET4
primary_record_count   62                       8
batch_item_type        ET10                     ET4
batch_start_time       2017-11-30 15:27:06.76   2017-11-30 15:27:09.45
batch_end_time         2017-11-30 15:27:09.87   2017-11-30 15:27:09.63
insert_count           57                       7
update_count           0                        0
merge_count            5                        1
unmerge_count          0                        6
delete_count           0                        0
delete_record_count    0                        0
reject_count           0                        0
status                 Succeeded                Succeeded
In this example, several commands to ingest entity records resulted in the creation of several jobs.
Each job demonstrates different behavior that is possible during ingestion, including correlation
operations:
JOB_ID 1
This job demonstrates what the ingestion report can look like when data in the staging table causes
merge operations. In this example, five merge operations are completed on the incoming rows of
data, as shown in the merge_count column. This results in 57 i2 Analyze records created from the
62 rows of data, as shown in the insert_count and primary_record_count columns. This
includes merging five rows of data with existing i2 Analyze records in the Information Store.
JOB_ID 2
This job demonstrates what the ingestion report can look like when the data in the staging table
causes unmerge and merge operations. In this example, six unmerge operations are completed
on the incoming rows of data, as shown in the unmerge_count column. One merge operation
is completed on the incoming rows, as shown in the merge_count column. This results in
7 i2 Analyze records created from the 8 rows of data, as shown in the insert_count and
primary_record_count columns. The primary_record_count value does not include the
unmerge_count.
Delete example
The (abbreviated) report for a successful delete operation might look like this:
job_id 26 26 26 26 26
ingestion_mode Delete Delete Delete Delete Delete
primary_item_type ET5 ET5 ET5 ET5 ET5
primary_record_count 324 324 324 324 324
batch_item_type ET5 LAC1 LAS1 LEM1 LIN1
batch_start_time 2017-11-30 2017-11-30 2017-11-30 2017-11-30 2017-11-30
15:27:06.76 15:27:08.60 15:27:08.60 15:27:09.43 15:27:09.45
batch_end_time 2017-11-30 2017-11-30 2017-11-30 2017-11-30 2017-11-30
15:27:09.87 15:27:09.30 15:27:09.29 15:27:09.62 15:27:09.63
insert_count 0 0 0 0 0
update_count 0 0 0 0 0
merge_count 0 0 0 0 0
unmerge_count 0 0 0 0 0
delete_count 324 187 27 54 33
In this example, a command to update the Information Store for deleted entity data (with
item type ET5) resulted in the creation of a job with five batches. The first few columns of the
Ingestion_Deletion_Reports view contain the same values for all batches in the same job.
Later columns reveal how deleting entity records results in the deletion of connected link records
(with item types LAC1, LAS1, LEM1, LIN1).
In one case, the delete_record_count value is less than the delete_count value. This is
because some of the provenance to be deleted was associated with an i2 Analyze record that had
more than one piece of provenance. An i2 Analyze record is deleted only when the last associated
provenance is deleted.
Partial success
If you ran the command in record-based failure mode, and it processed some of the rows in the
staging table without error, then it reports partial success like this example:
The records in the Information Store reflect the rows from the staging table that the command
successfully processed. The report includes the name of a database view that you can examine to
discover what went wrong with each failed row.
Failure
If you ran the command in mapping-based failure mode, then any error you see is the first one that it
encountered, and the report is of failure:
When the process fails in this fashion, the next lines of output describe the error in more detail. In
this event, the command does not change the contents of the Information Store.
Note: If a serious error occurs, it is possible for the ingestion command not to run to completion. When
that happens, it is harder to be certain of the state of the Information Store. The ingestion process uses
batching, and the records in the store reflect the most recently completed batch. If you are using the
bulk import mode, see Bulk import mode error on page 425 for more information about recovering
from errors at this stage.
If the command reports partial success, you might be able to clean up the staging table by removing
the rows that were ingested and fixing the rows that failed. However, the main benefit of record-based
failure is that you can find out about multiple problems at the same time.
The most consistent approach to addressing failures of all types is to fix up the problems in the staging
table and run the ingestion command again. The following sections describe how to react to some of the
more common failures.
When the Information Store ingests link data, you might see the following error message in the console
output:
Link data in the staging table refers to missing entity records
This message is displayed if the entity record at either end of a link is not present in the Information
Store. To resolve the error:
• Examine the console output for your earlier operations to check that the Information Store ingested
all the entity records properly.
• Ensure that the link end origin identifiers are constructed correctly, and exist for each row in the
staging table.
• Ensure that the link type and the entity types at the ends of the links are valid according to the i2
Analyze schema.
Then, rerun the ingestion command.
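As a starting point for the second check, you might run a query like the following against the link staging table. The schema, table, and column names here are illustrative only; substitute the names that your own staging table uses for the link end origin identifiers:
-- Illustrative names: IS_Staging.L_Access_To and the *_origin_id_keys columns are assumptions
SELECT *
FROM IS_Staging.L_Access_To
WHERE from_origin_id_keys IS NULL
   OR to_origin_id_keys IS NULL;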
During any ingestion procedure, but especially when a staging table is large, you might see the following
error message in the console output:
Rows in the staging table have duplicate origin identifiers
This message is displayed when several rows in a staging table generate the same origin identifier. For
example, more than one row might have the same value in the source_id column.
If more than one row in the staging table contains the same provenance information, you must resolve
the issue and repopulate the staging table. Alternatively, you can separate the rows so that they are not
in the same staging table at the same time.
This problem is most likely to occur during an update to the Information Store that attempts to change
the same record (with the same provenance) twice in the same batch. It might be appropriate to
combine the changes, or to process only the last change. After you resolve the problem, repopulate the
staging table and rerun the ingestion command.
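One way to locate the offending rows before you repopulate the staging table is to group by the column that contributes to the origin identifier. In this sketch, the staging table name is an assumption, and the grouping column is the source_id column mentioned above:
SELECT source_id, COUNT(*) AS row_count
FROM IS_Staging.E_Person      -- illustrative staging table name
GROUP BY source_id
HAVING COUNT(*) > 1;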
During an ingestion procedure that contains geospatial data, you might see the following error
messages in the console output:
On Db2:
SQLERRMC=GSEGEOMFROMWKT;;GSE3052N Unknown type "FOO(33.3" in WKT.
On SQL Server:
System.FormatException: 24114: The label FOO(33.3 44.0) in the input well-
known text (WKT) is not valid.
This message is displayed when data in a geospatial property column is not in the correct format.
Data in geospatial property columns must be in the POINT(longitude latitude) format. For more
information, see Information Store property value ranges on page 404.
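For reference, the following sketch shows a staging table update that replaces a malformed value with one in the expected form. The table and column names are illustrative; only the POINT(longitude latitude) format itself comes from this documentation:
-- Illustrative names: IS_Staging.E_Address and p_coordinates are assumptions
UPDATE IS_Staging.E_Address
SET p_coordinates = 'POINT(-0.1276 51.5072)'
WHERE p_coordinates = 'FOO(33.3 44.0)';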
During an ingestion procedure with correlated data, you might see the following error message in the
console output:
An error occurred during a correlation operation. There might be some data
in an unusable state.
This message is displayed if the connection to the database or Solr is interrupted during a correlation
operation.
To resolve the problem, you must repair the connection that caused the error, and then run the
syncInformationStoreCorrelation toolkit task. This task synchronizes the data in the
Information Store with the data in the Solr index so that the data returns to a usable state.
After you run the syncInformationStoreCorrelation task, reingest the data that you were
ingesting when the failure occurred. Any attempt to run an ingestion or a deletion command before you
run syncInformationStoreCorrelation will fail.
During an ingestion procedure, you might see the following error message in the console output:
You cannot ingest data because an ingestion with correlated data is still in
progress,
or because an error occurred during a correlation operation in a previous
ingestion.
If another ingestion is still in progress, you must wait until it finishes. If a previous ingestion failed during
a correlation operation, you must run the syncInformationStoreCorrelation toolkit task.
For more information about running the syncInformationStoreCorrelation toolkit task, see Error
occurred during a correlation operation on page 425.
During an ingestion procedure, you might see the following error message in the console output:
You cannot ingest data for item type <ET5> because an ingestion is still in
progress.
You must wait until the process is finished before you can start another
ingestion for this item type.
If another ingestion of the same item type is still in progress, you must wait until it finishes.
If you are sure that no ingestion is actually in progress, you can remove the file that is blocking
the ingestion. To indicate that an ingestion is in progress, i2 Analyze creates a file in the temporary
directory on the server where the ingestion command was run, for example AppData\Local\Temp.
The file name is INGESTION_IN_PROGRESS_<item type ID>. After you remove the file, you can
run the ingestion command again.
The symptoms of this type of failure are a stack trace and failure message in the console and importer
log. To recover from a failure at this time:
1. Identify the cause of the failure. You must use the SQL error codes to determine the cause of the
failure.
You might see error messages about the following issues:
Intended audience
This documentation is intended for users who want to correlate data as it is being ingested into the
Information Store.
Users must understand how to ingest data into the Information Store. For more information, see
Information Store data ingestion.
Users must understand the i2 Analyze data model, and i2 Analyze record structure. For more
information, see Data in i2 Analyze records.
Overview of correlation
Correlation is the process of associating multiple pieces of data with each other based on strong
identifiers. For the process of ingesting data into the Information Store, i2 Analyze can use correlation
identifiers that you provide to determine how to process and represent data in i2 Analyze records.
Correlation in i2 Analyze
During the ingestion process for the Information Store, correlation can be used to determine when data
that is ingested is associated with existing records, and must be represented by a single i2 Analyze
record. Conversely, if data no longer represents the same object, the data can be represented by
multiple i2 Analyze records. In i2 Analyze, these operations are known as merge and unmerge.
During the correlation process, an identifier is used to determine how each row of data is associated.
You present the identifier to the Information Store with the other staging data during ingestion.
You can limit the use of correlation to data from a specific source, of certain item types, or per row of
data ingested into the Information Store. You do not have to provide a correlation identifier for all the
data that you ingest into the Information Store.
Correlation uses
You might want to use correlation when you are ingesting data that originates from disparate sources
that have common properties or the potential to represent the same real-world objects; for example,
two data sources that both contain information about people.
Another scenario where you might use correlation is when the data that you are ingesting is in the form
of event-driven models, such as crime or complaint reports, where the same actors (people, locations,
phones, and vehicles) might be referred to frequently in the same source.
Correlation can be used in these scenarios to combine multiple source records into single i2 Analyze
records for link analysis.
Correlation method
In i2 Analyze, correlation identifiers and implicit discriminators are used to determine how the
Information Store processes data during ingestion.
When you ingest data into the Information Store, you can provide a correlation identifier type and key
value that are used to construct the correlation identifier for each row of data in the staging table. The
type and key values that you provide are used to process data that is determined to represent the
same real world object. Implicit discriminators are formed from parts of the i2 Analyze data model in
the Information Store. Even if correlation identifiers match, if values for elements of the i2 Analyze data
model are not compatible, that data cannot be represented by the same i2 Analyze record. For more
information about correlation identifiers and implicit discriminators, see Correlation identifiers on page
427.
During the ingestion process, i2 Analyze compares the correlation identifiers of the data to be ingested
and existing data in the Information Store. The value of the correlation identifiers determines the
operations that occur. For more information about the correlation operations that can occur, see
Correlation operations on page 429.
The example data sets demonstrate the correlation behavior in this release of i2 Analyze. For more
information, see Ingesting example correlation data on page 449.
Correlation identifiers
The role of a correlation identifier is to indicate that data is about a specific real-world object. If multiple
pieces of data are about the same specific real-world object, they have the same correlation identifier.
At ingestion time, the correlation identifier of incoming data informs the Information Store how to
process that data. Depending on the current state of the i2 Analyze record that is associated with the
incoming data, a match with the correlation identifier on an inbound row of data determines the outcome
of the association.
You specify the values for the correlation identifier in the staging table that you are ingesting the data
from. The correlation identifier is made up of two parts, the correlation identifier type and the correlation
identifier key.
type
The type of a correlation identifier specifies the type of correlation key that you are using as part of
the correlation identifier. If you are generating correlation keys by using different methods, you might
want to distinguish them by specifying the name of the method as the correlation identifier type. If
your correlation keys are consistent regardless of how they are created, you might want to use a
constant value for the correlation identifier type.
When you specify the identifier type, consider that this value might be seen by analysts.
The length of the value for the type must not exceed 100 bytes. This value is equivalent to 50
Unicode characters.
key
The key of a correlation identifier contains the information necessary to identify whether multiple
pieces of data represent the same real-world object. If multiple pieces of data represent the same
real-world object, they have the same correlation identifier key.
The length of the correlation identifier key must not exceed the following sizes:
• On Db2: 1000 bytes. This is equivalent to 500 Unicode characters.
• On SQL Server: 692 bytes. This is equivalent to 346 Unicode characters.
To prepare your data for correlation by i2 Analyze, you might choose to use a matching engine or
context computing platform. Matching engines and context computing platforms can support the
identification of matches that enable you to identify when data that is stored in multiple sources
represents a single entity. You can provide these values to the Information Store at ingestion time. An
example of such a tool is IBM InfoSphere Identity Insight. InfoSphere Identity Insight provides resolved
entities with an entity identifier. If you are using such a platform, you can populate the correlation
identifier type to record this, for example identityInsight. You might populate the correlation
identifier key with the entity identifier, for example 1234. This generates a correlation identifier of
identityInsight.1234. For more information about IBM InfoSphere Identity Insight, see Overview
of IBM InfoSphere Identity Insight.
Alternatively, as part of the data processing to add data to the staging tables, you might populate
the correlation identifier with values from property fields that distinguish entities. For example, to
distinguish People entities you might combine the values for their date of birth and an identification
number, and you might specify the type as manual. This generates a correlation identifier of
manual.1991-02-11123456.
The complete correlation identifier is used for comparison. Only data with correlation identifiers of the
same type is correlated.
For more information about specifying a correlation identifier during the ingestion process, see
Information Store staging tables.
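To make the two approaches concrete, the following sketch populates the correlation identifier columns of a hypothetical Person staging table. It assumes that the staging table exposes the same correlation_id_type and correlation_id_key column names as the merge contributors view described later; the staging table name and the source columns that are used to build the key are also assumptions:
-- Approach 1: use the resolved entity identifier from a matching engine
UPDATE IS_Staging.E_Person
SET correlation_id_type = 'identityInsight',
    correlation_id_key  = resolved_entity_id      -- illustrative column name
WHERE resolved_entity_id IS NOT NULL;

-- Approach 2: build the key from distinguishing property values (Db2-style '||' concatenation)
UPDATE IS_Staging.E_Person
SET correlation_id_type = 'manual',
    correlation_id_key  = VARCHAR(p_date_of_birth) || p_id_number
WHERE p_date_of_birth IS NOT NULL
  AND p_id_number IS NOT NULL;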
Implicit discriminators
In addition to the correlation identifier that is created from the type and key values that you provide,
implicit discriminators are also used during the matching process. In addition to the correlation identifier,
the following implicit discriminators are also compared. The implicit discriminators must be compatible to
enable correlation to occur.
Item type
The item type of the data that you are ingesting must be the same as the item type of the existing i2
Analyze record that is matched by the correlation identifier. If the item types are not the same, then
no correlation operations occur.
Correlation operations
When the Information Store receives a correlation identifier, the way the system responds depends on
the value of the correlation identifier, and the state of any records that are associated with it.
The following section explains the merge and unmerge operations that the system can respond with
when it receives a correlation identifier. This response is in addition to the insert and update operations
that are part of the standard ingestion process.
Merge
When one or more pieces of data are determined to represent the same real-world object, the data
is merged into a single i2 Analyze record. For the Information Store to merge data, the correlation
identifiers must match, the implicit discriminators must be compatible, and the origin identifiers must be
different.
During ingestion, a merge operation can occur in the following scenarios:
• New data in the staging table contains the same correlation identifier as an existing record in the
Information Store. The new data has an origin identifier that is not associated with the existing
record.
• An update to an existing record with a single piece of provenance in the Information Store causes
the correlation identifier of that record to change. The new correlation identifier matches with another
record in the Information Store.
• Multiple rows of data in the staging table contain the same correlation identifier. The Information
Store ingests the data as a new i2 Analyze record, or the record merges with an existing record in
the Information Store.
After a merge operation, the following statements are true for the merged record:
• The record has a piece of provenance for all of the source information that contributed to the merged
record.
• By default, the property values for the merged i2 Analyze record are taken from the
source information associated with the provenance that has the most recent value for the
source_last_updated column.
If only one piece of provenance for a record has a value for the source_last_updated column,
the property values from the source information that is associated with that provenance are
used. Otherwise, the property values to use are determined by the ascending order of the origin
identifier keys that are associated with the record. The piece of provenance that is last in the
order is chosen. To ensure data consistency, update your existing records with a value for the
source_last_updated column before you start to use correlation, and continue to update the
value.
If the default behavior does not match the requirements of your deployment, you can change the
method for defining property values for merged records. For more information, see Define how
property values of merged records are calculated on page 438.
• If an existing record to be merged contained any notes, the notes are moved to the merged record.
• If an existing record to be merged was an entity record at the end of any links, the links are updated
to reference the merged record.
Note: Any links that were created through Analyst's Notebook Premium are also updated to
reference the merged record.
During ingestion, the number of merge operations that occur is reported in the merge_count column of
the ingestion report.
The following diagrams demonstrate the merge operation.
In the first example of a merge operation, data in the staging table is merged into an existing i2 Analyze
entity record because the correlation identifiers match and the origin identifiers are different.
Figure 1: Incoming staging data merges with an existing i2 Analyze record. The matching correlation
identifier in this example is IIS.1234.
In the diagram, the correlation identifiers of data in the staging table and the existing i2 Analyze record
(a) match, which causes a merge operation. The existing i2 Analyze record (a) is not associated with the
origin identifier of the incoming data. In this example, it is assumed that the staging table data is more
recent than the existing data. As part of the merge, the property values from the data in the staging table
row are used. This results in a change to the value for the first name property from "John" to "Jon". The
merged i2 Analyze record (a) now contains provenance for the origin identifier OI.12 and one for the
new data, OI.22.
In the second example of a merge operation, data in the staging table causes an update to an existing
record (a) that changes the correlation identifier to match another record (b) in the Information Store,
causing a merge.
Figure 2: Incoming data updates an existing record, which causes the existing record to merge
with another existing record in the Information Store. One of the existing records now has no
provenance, and it is removed. The matching correlation identifier in this example is IIS.9012.
In the diagram, the data in the staging table has a different correlation identifier to the record (a) that it
is currently associated with by its origin identifier, and the same correlation identifier as another existing
record (b). This causes a merge operation. The first existing record (a) no longer has any provenance
associated with it, and is removed. In this example, it is assumed that the staging table data is more
recent than the existing data. As part of the merge, the property values from the staging table row are
used. This results in a change to the value for the first name property from "Jon" to "John". The merged
i2 Analyze record (b) contains multiple pieces of provenance, one for the origin identifier OI.32 and one
for the new data, OI.12.
Unmerge
If the data for a merged i2 Analyze record is determined to no longer represent the same real-world
object, the i2 Analyze record can be unmerged into two i2 Analyze records. For the Information Store
to unmerge records, the correlation identifier or implicit discriminators of the data associated with that
record must be changed.
Assuming that the implicit discriminators are compatible, the unmerge operation occurs when the
correlation identifier of a row in the staging table is different from the correlation identifier on the merged
record that it is currently associated with by its origin identifier.
After the unmerge operation, the following statements are true for the existing i2 Analyze record:
• The piece of provenance for the source information that caused the operation is unmerged from the
record.
• The property values for the record are taken from the source information associated with the
provenance that has the most recent value for source_last_updated.
If only one piece of provenance for a record has a value for the source_last_updated column,
the property values from the source information that is associated with that provenance are
used. Otherwise, the property values to use are determined by the ascending order of the origin
identifier keys that are associated with the record. The piece of provenance that is last in the
order is chosen. To ensure data consistency, update your existing records with a value for the
source_last_updated column before you start to use correlation, and continue to update the
value.
If the default behavior does not match the requirements of your deployment, you can change the
method for defining property values for merged records. For more information, see Define how
property values of merged records are calculated on page 438.
• All notes remain on the record.
• The last updated time of the record is updated to the time that the unmerge operation occurred.
• If the existing record was an entity record at the end of any links, any links to the unmerged piece of
provenance are updated to reference the record that now contains the provenance.
After the unmerge operation, depending on the change in correlation identifier that is presented to
the Information Store, either a new i2 Analyze record is inserted or a merge operation is completed.
During ingestion, this process is reported in the unmerge_count, insert_count, and merge_count
columns of the ingestion report.
The following diagrams demonstrate the unmerge operation. In each diagram, the data that is ingested
from the staging table contains a different correlation identifier to the one on the record that it is
associated with by its origin identifier. One unmerge operation results in a new record, and one results
in a merge operation.
In the first example of an unmerge, the correlation identifier of the provenance that is unmerged from
an existing record (a) does not match with another correlation identifier in the staging table or the
Information Store. A new i2 Analyze record (b) is created with the property values from the staging
table.
Figure 3: Incoming staging table data causes an unmerge operation. After the unmerge, a new
record is inserted.
In the diagram, the data in the staging table has a different correlation identifier to the record (a) that
it is currently associated with. This causes an unmerge operation. The provenance is unmerged from
the existing record (a). The existing record (a) only contains the provenance for the origin identifier
OI.12 and the property values from the source information that is associated with that provenance. The
correlation identifier of the staging table data does not match with any others in the Information Store, so
a new record (b) is inserted.
In the second example of an unmerge, if the correlation identifier of the provenance that is unmerged
from an existing record (a) now matches the correlation identifier of another record (b), a merge
operation is performed.
Figure 4: Incoming staging table data causes an unmerge operation. After the unmerge, a merge
operation occurs.
In the diagram, the data in the staging table that is ingested has a different correlation identifier to the
record (a) that it is currently associated with. This causes the unmerge operation. The origin identifier
and provenance are unmerged from the existing record (a). The existing record (a) now only contains
the provenance for the origin identifier OI.12 and the property values from the source information that
is associated with that provenance.
The correlation identifier of the staging table data now matches with another record (b) in the
Information Store, so a merge operation occurs. In this example, it is assumed that the staging table
data is more recent than the existing data. As part of the merge, the property values from the staging
table row are used. This results in a change to the value for the first name property from "Jon" to "John".
The merged i2 Analyze record (b) now contains two pieces of provenance, one for the origin identifier
OI.32 and one for the new data, OI.22.
For more information about the behavior of a merge operation, see Merge on page 429.
Intended audience
The information about defining the property values of merged records is intended for users who are
database administrators and experienced in SQL. To define the property values, you must write
complex SQL view definition statements.
Important: You must write and test the SQL view statements for defining the property values of
merged records in a non-production deployment of i2 Analyze. If you create an incorrect view, you might
have to clear all the data from the system. Before you implement your view definitions in a production
system, you must complete extensive testing in your development and test environments.
When a record contains more than one piece of provenance, it is a merged record. The properties for
an i2 Analyze record are calculated when a merge or unmerge operation occurs, or when provenance is
removed from a record.
By default, all of the property values for a record come from the source data that contributed to the
record with the most recent source-last-updated-time. If no source data has a source-last-updated-time,
the property values to use are determined by the ascending order of the origin identifier keys that are
associated with the record. The source data that is last in the order is chosen.
If the default source-last-updated-time behavior does not match the requirements of your deployment,
you can define how the property values are calculated for merged i2 Analyze records. You might define
your own rules when multiple data sources contain values for different properties of an item type or one
data source is more reliable for a particular item or property type.
To demonstrate when it is useful to define how to calculate the property values for merged records,
imagine that the following two pieces of source data contributed to an i2 Analyze record of type Person:
In the default behavior, the property values are used from the source data with the most recent value
for the Source last updated column. The property values from the row with the ingestion source name of
PNC are used for the merged i2 Analyze record, and the record gets the value of Jon for the first given
name property.
If you know that data from the DVLA ingestion source is more reliable for this item type, you can define
that source data with the value of DVLA for the ingestion source name takes precedence. By using this
definition, the i2 Analyze record gets the value of John for the first given name property.
After you define this rule for the Person entity type, all future updates to the records of this item type
take the property values from the DVLA ingestion source if it is present in any of the source data that
contributed to a merged i2 Analyze record.
For more information about how to enable this function, and create your own rules, see Defining the
property values of merged i2 Analyze records on page 439.
For more information about the merge contributors view, see The merge contributors view on page
441.
• The view whose name is suffixed with _MPVDV is where you can define the rules for calculating
the property values of merged records. This is the merged property values definition view. You can
modify this view to define how the merged property values are calculated from the data in the merge
contributors view.
When you generate the merged property values definition view, the default source-last-updated-time
behavior is implemented in the view.
For more information about modifying the definition, and some examples, see The merged property
values definition view on page 443.
For example, the Person entity type from the example law enforcement schema produces the
IS_Public.E_Person_MCV and IS_Public.E_Person_MPVDV views.
1. Identify the item types that you want to define the property values for.
For more information about why you might want to do this, see Define how property values of
merged records are calculated on page 438.
2. Run the enableMergedPropertyValues toolkit task for each of the item types that you identified
in step 1.
a) Open a command prompt and navigate to the toolkit/scripts directory.
b) Run the enableMergedPropertyValues toolkit task:
For example, to create the views for the Person entity type from the example law enforcement
schema, run the following command:
You can run the toolkit task with schemaTypeId=ItemTypeId or without any arguments to create
the views for all item types in the i2 Analyze schema. You can run the task multiple times with
different item type identifiers.
3. Inspect the merge contributors view (_MCV) that is created in the IS_Public schema and identify
any rules to create for calculating the property values of merged records.
For more information about the merge contributors view, see The merge contributors view on page
441.
4. Modify the SQL create statement for the merged property values definition view (_MPVDV) that is
created in the IS_Public schema to create a view that implements the rules you identified in step
3.
For more information about modifying the definition, and some examples, see The merged property
values definition view on page 443.
After you ingest data that causes correlation operations, ensure that the property values for the merged
i2 Analyze records match your expectations by searching for them. If you can ingest data successfully,
and the property values are correct in your testing, the merged property values definition view is correct.
If the property values are not correct in all cases, you must continue to modify the merged property
values definition view and test the results until your requirements are met.
It is recommended that you keep a backup of your complete merged property values definition view.
During the testing of your deployment, you might need to clear any test data from the system to ensure
that your view is behaving correctly. For more information, see Clearing data from the system.
You must maintain the merged property values definition view for the lifetime of your deployment. If
you change the i2 Analyze schema file, you must update the SQL statement for your view to match
any changes. For example, if you add a property type to an item type, the new property must be added
to the SQL statement for your merged property values definition view. You do not need to update the
merge contributors view. The merge contributors view is updated when the schema is updated so that
you can inspect the column name for the new property. For more information, see How to maintain your
merged property values definition view on page 448.
After you test the view definition in your non-production database, you can implement the view on your
production database.
For example, values for the First Given Name property type for a Person entity type from the
example law enforcement schema are stored in the p_first_given_name column.
correlation_id_type and correlation_id_key
The correlation identifier is used to indicate that data is about a specific real-world object. For
multiple pieces of data to contribute to the same i2 Analyze record, they must have the same
correlation identifier.
In the merge contributors view, the correlation identifier is stored in the correlation_id_type
and correlation_id_key columns.
ingestion_source_name
When data is ingested into the Information Store, you must provide a data source name that
identifies the data source from which it originated. This is stored as the ingestion source name in the
Information Store.
In the merge contributors view, the ingestion source name is stored in the
ingestion_source_name column.
In most scenarios, the origin_id_type, origin_id_keys, source_last_updated, and
ingestion_source_name values can be used for choosing the source data to provide the property
values.
The following table shows an example merge contributors view for a Person entity type where two
pieces of source data contribute to the same i2 Analyze record:
record_id               2R1HCAGm2FnmSkdgRbRjpTuW   2R1HCAGm2FnmSkdgRbRjpTuW
item_id                 1                          1
origin_id_type          DVLA                       PNC
origin_id_keys          1234                       5678
source_last_updated     12:20:22 09/10/2018        14:10:43 09/10/2018
p_first_given_name      John                       Jon
p_middle_name           James
p_...                   ...                        ...
correlation_id_type     II                         II
correlation_id_key      1                          1
ingestion_source_name   DVLA                       PNC
In this example merge contributors view, you can see the following details:
• The data originates from two ingestion sources; DVLA and PNC
• The data from the PNC ingestion source was updated more recently than the data from the DVLA
ingestion source
• The data from the PNC ingestion source is missing a value for the middle name property
You might already know some of these details about the data in your system, or you might be able to
determine these differences from inspecting the contents of the view. If you decide that you want to
change the default behavior for calculating property values, you must customize the merged property
values definition view. For more information about how to customize the merged property values
definition view, see The merged property values definition view on page 443.
The merged property values definition view is over the merge contributors view. To produce a single
value for each property type, the view must contain one row for each item_id and populate each
column with a value. You can include NULL values for property types that have no value.
After you customize the merged property values definition view, it must conform to the following
requirements:
• The view must have columns for the item identifier and every property type column in the merge
contributors view. The column names in both views must be the same.
The merged property values definition view must not contain the origin_id_type,
origin_id_keys, source_last_updated, correlation_id_type, correlation_id_key,
or ingestion_source_name columns.
• Each item identifier must be unique.
Important: You cannot ingest data into the Information Store if the view contains multiple rows with
the same item identifier.
Default behavior
By default, the merged property values definition view uses the default source-last-updated-time
behavior to define which source data the property values come from. You can see the SQL statement
for the view in IBM Data Studio or SQL Server Management Studio. Taking the generated view for the
Person entity type from the example law enforcement schema as an example, the SQL statement for the
default view works in the following way (a sketch of its general shape appears after this list):
• To ensure that the view contains the correct columns, the first SELECT statement returns the
item_id and all property type columns. For example, p_title.
• To split the rows into groups that contain only the data that contributed to a single merged record,
the OVER and PARTITION BY clauses are used on the item_id column.
• To define the value for each property, the ROW_NUMBER function and ORDER BY clauses are
used. For the default behavior, the ORDER BY clause is used on the source_last_updated,
origin_id_keys, and origin_id_type columns. Each column is in descending order. By using
NULLS LAST, you ensure that if the source_last_updated column does not contain a value, it is
listed last.
The ROW_NUMBER function returns the number of each row in the partition. After the rows are
ordered, the first row is number 1. The initial SELECT statement uses a WHERE clause to select the
values for each row with a value of 1 for partition_row_number. Because only one row has a
value of 1, the view contains only one row for each item identifier.
• When you enable merged property values by using the enableMergedPropertyValues toolkit task,
a table alias for an internal staging table is also created. For example, IS_Public.E_Person_STP
is the table alias for the staging table that is created during a Person item type ingestion.
During the ingestion process, the internal staging table contains the item identifiers of the records to
be inserted or updated as part of the current ingestion in the item_id column. When the property
values for records are calculated by using the _MPVDV view, it uses the information in the _STP
alias to restrict the number of rows that are processed. The following statement from the previous
example ensures that the _MPVDV view uses the alias for a specific item type:
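The statement that the final bullet refers to is the join to the _STP alias, which appears near the end of the following sketch. The sketch is not the statement that i2 Analyze generates; it is a rough outline, in Db2-style syntax, of the default behavior that this list describes, using the Person views as an example:
SELECT item_id,
       p_title,
       p_first_given_name
       -- ... the remaining property type columns ...
FROM (
    SELECT MCV.*,
           ROW_NUMBER() OVER (
               PARTITION BY MCV.item_id
               ORDER BY source_last_updated DESC NULLS LAST,
                        origin_id_keys DESC,
                        origin_id_type DESC
           ) AS partition_row_number
    FROM IS_Public.E_Person_MCV AS MCV
    INNER JOIN IS_Public.E_Person_STP AS STP
        ON MCV.item_id = STP.item_id
) AS ordered_rows
WHERE partition_row_number = 1;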
Examples
You can modify the following example views to match the data in your i2 Analyze schema or to meet
your property value requirements:
Ingestion Source Name precedence
In this example, the data is ordered so that a specified ingestion source takes precedence. For
example, the DVLA ingestion source takes precedence over the PNC one.
If an ingestion source name does not match DVLA or PNC, it is ordered last. If multiple contributing
pieces of data have the same ingestion source name, the behavior is non-deterministic. In a
production system, you must include another clause that provides deterministic ordering in all
scenarios.
To order the ingestion sources, the ORDER BY CASE clause is used. The rows are ordered by the
value that they are assigned in the CASE expression. For more information about the ORDER BY
CASE statement, for Db2 see ORDER BY clause and CASE expression or for SQL Server see Using
CASE in an ORDER BY clause.
The ROW_NUMBER function and the OVER and PARTITION BY clauses are used in the same way as
the view for the default behavior.
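The following rough sketch, again using the Person views and Db2-style syntax, shows one way that such a precedence rule could be expressed. It is not the exact statement from the product, and in a production system you would extend the ORDER BY clause (for example with source_last_updated) to make the ordering deterministic:
SELECT item_id,
       p_title,
       p_first_given_name
       -- ... the remaining property type columns ...
FROM (
    SELECT MCV.*,
           ROW_NUMBER() OVER (
               PARTITION BY MCV.item_id
               ORDER BY CASE ingestion_source_name
                            WHEN 'DVLA' THEN 1
                            WHEN 'PNC'  THEN 2
                            ELSE 3
                        END
           ) AS partition_row_number
    FROM IS_Public.E_Person_MCV AS MCV
    INNER JOIN IS_Public.E_Person_STP AS STP
        ON MCV.item_id = STP.item_id
) AS ordered_rows
WHERE partition_row_number = 1;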
Source last updated and non-NULL
In this example, the data is ordered by the source last updated time. However if there is a NULL
value for a particular property type, the value is taken from the data that has the next most recent
time. This process is completed until a value for the property is found, or no data is left for that
record. The definitions are slightly different depending on the database management system that
hosts the Information Store database.
Db2
SELECT DISTINCT
MCV.item_id,
FIRST_VALUE(p_first_given_name,
'IGNORE NULLS')
OVER ( PARTITION BY MCV.item_id
ORDER BY source_last_updated DESC NULLS LAST
RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS p_first_given_name,
FIRST_VALUE(CAST(p_additional_informatio AS VARCHAR(1000)),
'IGNORE NULLS')
OVER ( PARTITION BY MCV.item_id
ORDER BY source_last_updated DESC NULLS LAST
RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS p_additional_informatio
FROM IS_Public.E_Person_MCV AS MCV
INNER JOIN IS_Public.E_Person_STP AS STP
ON MCV.item_id = STP.item_id
SQL Server
SELECT DISTINCT
P.item_id,
P1.p_first_given_name,
P2.p_additional_informatio,
... (all property type columns) ...
FROM "IS_Public".E_Person_MCV AS P
LEFT OUTER JOIN
( SELECT DISTINCT
MCV.item_id,
FIRST_VALUE(p_first_given_name)
OVER ( PARTITION BY MCV.item_id
ORDER BY CASE
WHEN source_last_updated IS NULL THEN
CAST('0001-01-01' AS date)
ELSE
source_last_updated
END DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS p_first_given_name
FROM "IS_Public".E_Person_MCV AS MCV
INNER JOIN IS_Public.E_Person_STP AS STP
ON MCV.item_id = STP.item_id
WHERE p_first_given_name IS NOT NULL
) AS P1
ON P1.item_id = P.item_id
... (LEFT OUTER JOIN and FIRST_VALUE function for all property types) ...
... (LEFT OUTER JOIN and FIRST_VALUE function for all property types) ...
This example uses the FIRST_VALUE function to identify which data has the first non-NULL value
for a property type. The FIRST_VALUE function returns the first value in an ordered set of values.
You must specify a scalar expression, and PARTITION BY, ORDER BY, and RANGE clauses. In the
example, the scalar expressions are the property type columns. The PARTITION BY and ORDER
BY clauses are similar to the other examples, where they act on the item identifier and the source
last updated time.
In Db2, the FIRST_VALUE function allows you to ignore null values by specifying 'IGNORE
NULLS' in the expression. This functionality is not available in SQL Server. To achieve the same
result, you must perform a series of LEFT OUTER JOINs for each FIRST_VALUE function. In the
ORDER BY CASE clause for each FIRST_VALUE function, when the value for the column in the
FIRST_VALUE function is NULL you can set the source last updated time to 0001-01-01. Setting
the last updated time to this value and ordering the partition this way ensures that the row with a
value for the property and the most recent source last updated time is returned.
• For more information about the FIRST_VALUE function in Db2, see OLAP specification -
FIRST_VALUE.
• For more information about the FIRST_VALUE function in SQL Server see FIRST_VALUE
(Transact-SQL).
To use the FIRST_VALUE function, you must define a function for each property type separately.
In the previous examples, there are two examples of the FIRST_VALUE function for two property
types:
• The first acts on the p_first_given_name column.
• The second acts on the p_additional_informatio column. The same process is
completed, however in the merge contributors view the p_additional_informatio column is
of data type CLOB(32M). This data type is not supported by the FIRST_VALUE function. To use
this column, you must cast it into a supported data type. For example, VARCHAR(1000). You
must ensure that if a column is cast, that you do not lose any data.
The merged property values definition views include all of the property types for an item type as they
were when the view was created. If you change the i2 Analyze schema to add a property type to an item
type, the merged property values definition view is now incorrect. The merged property values definition
view does not contain a column for the new property type, and therefore cannot populate an i2 Analyze
record with a value for that property. When you add property types to the i2 Analyze schema for an item
type, you must update any merged property values definition view to include any new property types.
i2 Analyze updates the merge contributors view when the schema is updated, so you can use the
updated merge contributors view to see the name of the new column.
Note: If you do not update your merged property values definition view, you cannot ingest data into the
Information Store.
To stop controlling the property values of merged records for an item type, you can inform
i2 Analyze that you no longer intend to use this function. Inform i2 Analyze by running the
disableMergedPropertyValues toolkit task. You can specify schemaTypeId=ItemTypeId
to remove the views for one item type only, or without any arguments to remove the views for every
item type in the i2 Analyze schema. After you run the toolkit task, the default source-last-updated-time
behavior is used.
You can modify the merged property values definition view or run the
disableMergedPropertyValues toolkit task at any time. The property values of i2 Analyze records
that are already in the Information Store are not updated. If you ingest any data that updates an existing
i2 Analyze record, the property values for that record are recalculated with the new merged property
values definition view.
with some of the existing data in the database, but the origin identifiers are different. These
matches cause a number of merge operations to occur during the ingestion.
For example, you can see on line 2 of the person.csv file the correlation identifier key is
person_0, which matches the correlation identifier of the record for person "Julia Yochum". This
match causes a merge between the existing record and the incoming row of data. As part of the
merge, the property values from the incoming row of data are used for the record, which results in
the full name changing to "Julie Yocham". This is an example of the scenario that is described in
Figure 1: Incoming staging data merges with an existing i2 Analyze record. on page 431
In the ingestion reports, you can see the number and type of correlation operations that occurred
during the ingestion. For more information about understanding the ingestion reports, see
Understanding ingestion reports.
b) In Analyst's Notebook Premium, select the chart item that represents Julia Yochum and click Get
changes.
You can see that the name changes due to the merge operation described previously.
4. Ingest the law-enforcement-data-set-2-unmerge data to demonstrate unmerge operations.
a) Run the following command to ingest the third example data set:
individually, selectively, or in bulk; and it describes the circumstances in which each approach is
possible or desirable.
This guide is intended for system implementers and database administrators who want to delete data
directly from the Information Store. It applies to data added by ingestion from one or more external data
sources, and to data added by users who share information through i2 Analyst's Notebook Premium.
Note: For more information about adding data by ingestion, see Information Store data ingestion. For
more information about adding data through i2 Analyst's Notebook Premium, see Uploading records to
the Information Store.
The tasks that the following topics describe variously require familiarity with the i2 Analyze data model,
your database management systems, SQL queries, and IBM Data Studio or SQL Server Management
Studio. Some of the approaches also require specialized authorization with the Information Store.
Overview
To comply with data protection regulations, storage restrictions, or other requirements, you might
need to delete data from the Information Store. i2 Analyze provides mechanisms for doing so. The
mechanism that you choose depends on how the records were created in the Information Store and
your reason for deleting them.
At version 4.3.5 of i2 Analyze, there are two ways to delete records from the Information Store:
• Some of the records in the Information Store are created and uploaded through Analyst's Notebook
Premium. If a user deletes a record that they or a colleague uploaded, it becomes unreachable for
all users, but remains in the Information Store. You can arrange to delete these soft-deleted records
permanently, either automatically or on demand.
• For all records in the Information Store, you can write identifying rules that target them for deletion
individually or as a group. This approach deletes records no matter how they were originally created
or what their current state is.
It is common to all record deletion that deleting an entity record forces i2 Analyze to delete the link
records that are attached to that entity. A link must always have two ends. If one end is deleted, the link
is automatically deleted as well.
It is also common to both of these ways for deleting records from the Information Store that the
procedure is permanent. If you might need to recover the deleted data in future, ensure that you have a
backup plan in place before you begin.
Note: The demands of synchronizing with the contents of an external source can also require you
to delete data from the Information Store. In that situation, you can reflect deleted data in an external
source by using the same ingestion pipeline that you use to reflect creation and modification.
For more information about updating the Information Store in this manner, see Updating the Information
Store for deleted data.
However, by default, the data for the record remains in the database until you decide to delete it
permanently. The Information Store has a mechanism for removing these "soft-deleted" records that you
can start on an automatic or a manual basis.
The mechanism for purging (that is, permanently deleting) soft-deleted records uses three stored
procedures from the IS_Public schema of the Information Store database.
To do its work, the purging mechanism uses the same infrastructure as the mechanism for deleting
records by rule. Each manual or automated request to purge soft-deleted records from the Information
Store causes a set of deletion jobs to be created. There is one job for each entity type and link type in
the i2® Analyze schema, and the Db2® task scheduler or SQL Server Agent runs them at the earliest
opportunity.
i2® Analyze keeps logs of manual and automated purge operations alongside its logs of deletion-by-rule
operations, in the Deletion_By_Rule_Log view on page 462. In that view, the rule name for all jobs
that are associated with purging soft-deleted records is PURGE_SOFT_DELETE.
When users delete records from the Information Store through Analyst's Notebook, the data remains
in the Information Store (but is inaccessible to users) unless you do something to change that. The
following procedures describe how to purge soft-deleted records manually, how to automate that
process, and how to understand the effect of a purge operation.
To perform a one-off, manual purge of all soft-deleted records from the Information Store:
• Run the IS_Public.Purge_Soft_Deleted_Records stored procedure.
• For example, for Db2:
CALL IS_Public.Purge_Soft_Deleted_Records;
• For example, for SQL Server:
EXECUTE "IS_Public".Purge_Soft_Deleted_Records;
You can use this procedure regardless of whether you also configure automatic purging.
It immediately creates jobs for purging soft-deleted records of every type.
To set up the Information Store so that soft-deleted records are purged automatically, on a schedule:
• Run the IS_Public.Set_Purge_Soft_Delete_Schedule stored procedure with the schedule
that you want to use.
• For example, for Db2:
For more information about the format of the schedule, see UNIX cron format.
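A minimal sketch of the Db2 call might look like the following. It assumes that the procedure accepts a single schedule string in UNIX cron format, and the cron string shown is an assumption that matches the daily-at-midnight schedule described in this example:
-- Assumed form: a single cron-format schedule argument ('0 0 * * *' = daily at midnight)
CALL IS_Public.Set_Purge_Soft_Delete_Schedule('0 0 * * *');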
• For example, for SQL Server:
EXECUTE "IS_Public".Set_Purge_Soft_Delete_Schedule
@freq_type=4,@freq_interval=1,
@freq_subday_type=1,@active_start_time = 00000;
For more information about the arguments and values for the schedule, see sp_add_jobschedule.
Note: You must not specify values for the @job_id, @job_name, @name, @enabled
arguments. i2 Analyze provides the values for these arguments.
In this example, the Information Store is configured to create the jobs that purge any soft-deleted
records every day, at midnight.
To turn off automatic purging, call the IS_Public.Remove_Purge_Soft_Delete_Schedule
stored procedure.
Regardless of whether a purge was started manually or automatically, the effect is always the same: a
set of jobs is created to remove soft-deleted records from the Information Store. To inspect the status
and outcome of those jobs, you can use a view that the Information Store provides:
• In IBM Data Studio or SQL Server Management Studio, open the
IS_Public.Deletion_By_Rule_Log view and look for blocks of jobs where the rule_name is
PURGE_SOFT_DELETE.
For more information about the contents of the IS_Public.Deletion_By_Rule_Log view, see
Deletion_By_Rule_Log view on page 462.
By default, deletion-by-rule privileges are granted to the Information Store database administration user.
If necessary, the privileges can be granted to other users by giving them the Deletion_By_Rule
role. For more information, see Authorization to delete by rule. The privileges give you access to all the
deletion-by-rule views and procedures, as summarized in the following outline of the main tasks.
CAUTION: Grant the Deletion_By_Rule role only to users with sufficient knowledge and
authority. Use caution when you complete deletion-by-rule tasks as they constitute a powerful
mechanism for deletion of data that might be difficult to recover. Ensure that there is a reliable
backup in place before you delete by rule.
1. Identify the data to be deleted by composing and running an SQL query to select the data from the
deletion view.
You can use a different view to see details of the deletion views for your database.
2. Check the SQL results to confirm that the selected data is as you expected.
You can use the condition in subsequent steps to identify the particular records in the Information
Store that you want to be deleted.
3. Create a deletion rule based on the SQL query you verified at step 2 by using the supplied stored
procedure.
By default when a rule is created, automated job creation is switched off. You can use another view
to see a list of rules and their automation status.
4. Create a deletion job manually by using the supplied stored procedure.
The procedure creates a deletion job that contains the rule, which is queued to run.
5. Check the status of the job, and that deletion occurred as you expected, by consulting the
IS_Public.Deletion_By_Rule_Log view.
For more information, see Verifying deletion by rule.
6. Optional: Automate deletion by rule by using the supplied stored procedure to include the rule in
automated job creation.
• For Db2:
SELECT *
FROM IS_Public.E_Event_DV
WHERE source_last_updated < CURRENT_DATE - 3 YEARS
• For SQL Server:
SELECT *
FROM "IS_Public".E_Event_DV
WHERE source_last_updated < DATEADD(yyyy, -3, GETDATE())
4. Verify that the data that is returned from your SQL query is the data that you want to delete.
Create a deletion rule based on your query. For more information, see Creating a deletion rule.
Running this procedure does not run the rule. However, the rule is validated to inform you of any
errors. If the rule has no conditions, you receive a warning message to ensure that you are aware of
the consequences when you do create a deletion job based on the rule.
CAUTION: When a job is created from a rule that has no conditions, it deletes every record
in the view.
To review the rule, browse the data in the IS_Public.Deletion_Rules view. For example, see the
following rule details.
By default, the rule is set up so that you can use it to create jobs manually. For more information,
see Creating a deletion job on page 457. You can also configure the rule so that jobs are created
automatically, according to a schedule. For more information, see Automating deletion by rule.
You can update a stored rule by a procedure similar to rule creation, or use a procedure to delete a rule.
For more information, see Stored procedures on page 461.
Troubleshooting
When a deletion by rule job is completed, information that indicates success or failure is
sent to the IS_Public.Deletion_By_Rule_Log view. More details are recorded in the
IS_Public.Ingestion_Deletion_Reports view.
You can see the status of all deletion-by-rule jobs in the IS_Public.Deletion_By_Rule_Log view.
For more information, see Verifying deletion by rule on page 457.
If the value of status is not Succeeded, the type of error that is recorded informs your diagnosis and
troubleshooting steps. The possible status values, explanations, and steps to take for each value are
described as follows.
2. Check whether the server is running. You might need to restart the server. For more information, see
Deploying i2® Analyze.
3. Check whether the server is connected to the database. There might be a network issue. You can
find the Opal server logs here: IBM\i2analyze\deploy\wlp\usr\servers\opal-server
\logs
If you are applying deletion by rule to a remote Information Store on Db2, the remote database must be
cataloged by using the local Db2® client. If the configuration files are not already updated, complete the
update by using the toolkit. For more information, see Configuring remote IBM® Db2® database storage.
If you are using IBM Db2 and none of the previous steps resolve the issue, there might be a
problem with the Administrative Task Scheduler or Db2. For more information, see Troubleshooting
administrative task scheduler.
Server error
1. To assess the extent of the problem, in the IS_Public.Deletion_By_Rule_Log view, check the
reject_count value to see the number of items that are rejected.
2. Access the IS_Public.Ingestion_Deletion_Reports view for more information. A specific
deletion job can be identified by the job_id value, which is identical to the job_id value in the
IS_Public.Deletion_By_Rule_Log view.
Note: Deletion-by-rule automatically populates the label column in the
IS_Public.Ingestion_Deletion_Reports view with the value Deletion rule:<Deletion Rule
Name>.
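Because the job_id values match between the two views, a query like the following sketch returns the detailed report rows for the jobs that a particular rule created:
SELECT R.*
FROM IS_Public.Ingestion_Deletion_Reports AS R
INNER JOIN IS_Public.Deletion_By_Rule_Log AS L
    ON R.job_id = L.job_id
WHERE L.rule_name = 'PURGE_SOFT_DELETE';   -- or the name of your own rule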
There is a deletion view for each record type. Each of these views has a suffix of _DV. Prefixes of E_
and L_ are used for entities and links. In IBM Data Studio or SQL Server Management Studio, you can
expand a view to see its columns.
Any columns with a prefix of P_ contain values of properties that are defined in the i2 Analyze schema.
The other columns contain metadata values that it might be useful to base rules on. Each view has
columns for property values, which vary by item type, and columns for metadata values that are
common to all records. The following table contains some examples of property columns.
Table 8: Sample extract of deletion views and properties from an IS_Public schema
L_Access_To_DV
p_unique_reference ADC153
p_type_of_use Account Holder
... ...
Metadata columns can be useful for creating deletion rule conditions. All deletion views have a common
set of metadata columns as described in the following table.
Note:
• In the Information Store, there is a one-to-one mapping between record identifiers and item
identifiers. For more information about the identifiers that are used in i2 Analyze, see Identifiers in i2
Analyze records.
• Records that are created through i2 Analyst's Notebook have a value of ANALYST in the
source_names column and null values for source_created and source_last_updated.
Views that represent link types have a few extra metadata columns as described in the following table.
Each deletion view has a security dimension column, which contains the security dimension values for
each record. For more information, see Security model. It can be useful to delete records based on their
security dimension values.
Note: The deletion of data is not subject to restriction based on the i2 Analyze security dimension
values, but you can filter data for deletion based on these values.
Stored procedures
The i2 Analyze deletion-by-rule functions depend upon a set of standard stored procedures. You can
use these procedures to manage deletion rules and create deletion jobs.
The procedures that are available are described in the following table. An asterisk indicates a
mandatory parameter.
Deletion_By_Rule_Log view
The IS_Public.Deletion_By_Rule_Log view provides information on the status of the deletion
jobs in the queue in relation to each rule. The view also contains details on the results of each deletion
job that is run.
You can access the view to check the status of a deletion-by-rule job. For more information, see
Verifying deletion by rule. After the deletion job is run, the IS_Public.Deletion_By_Rule_Log view
contains one or more entries per job that describe the result for each item type. For more information,
see Sample use cases on page 465.
The information that is contained in the IS_Public.Deletion_By_Rule_Log view is described in
the following table.
item_type       The item type of the records that were deleted. There can be more than one entry in
                this view for the same job. For example, deleting entity records can also cause link
                records to be deleted.
delete_count    The number of records that were deleted.
reject_count    The number of records that were rejected.
See the following tables for an example of how the IS_Public.Deletion_By_Rule_Log view might
look with completed job details.
In this example, deleting three Account entity records caused the deletion in the same job of three
Transaction link records that were attached to the Accounts.
Note: The Db2 administrative task scheduler checks for new or updated tasks at 5-minute intervals. If
you change the deletion schedule, there might be a delay of up to 5 minutes before the change takes
effect. You can still create a job from a rule manually if you need to.
If you are using SQL Server, you control the schedule by specifying arguments for the
sp_add_jobschedule stored procedure. For example, run IS_Public.Update_Deletion_Schedule with the
following values:
EXECUTE "IS_Public".Update_Deletion_Schedule
@freq_type=4,@freq_interval=1,
@freq_subday_type=8,@freq_subday_interval=24;
Note: If you are using Db2 and you follow this procedure in a deployment that provides high availability,
you must run the stored procedure on the primary and any standby database instances.
CAUTION: Assign the Deletion_By_Rule role only to users with sufficient knowledge
and authority. Exercise caution when you create deletion rules because they provide a powerful
mechanism for deleting data that cannot be recovered afterward.
Select records of all persons who are below a specific age. The rule matches records of persons who
are younger than 18 years. This example is one that you might use to delete data that does not suit the
specific purpose of the database.
• For Db2:
SELECT *
FROM IS_Public.E_Person_DV
WHERE p_date_of_birth > CURRENT_DATE - 18 years
• For SQL Server:
SELECT *
FROM "IS_Public".E_Person_DV
WHERE p_date_of_birth > DATEADD(yyyy, -18, GETDATE())
Select all records that are sourced from the "DMV" data source.
SELECT *
FROM IS_Public.E_Person_DV
WHERE source_names LIKE '%DMV%'
Use SQL to select any record with a creation date that is not in the current calendar year.
• For Db2:
SELECT *
FROM IS_Public.E_Person_DV
WHERE YEAR(create_time) < YEAR(CURRENT_DATE)
• For SQL Server:
SELECT *
FROM IS_Public.E_Person_DV
WHERE create_time < DATEFROMPARTS(DATEPART(yyyy, GETDATE()), 1, 1)
Select link records with a creation date that is more than three months before the current date.
• For Db2:
SELECT *
FROM IS_Public.L_Communication_DV
WHERE create_time < CURRENT_DATE - 3 MONTHS
• For SQL Server:
SELECT *
FROM IS_Public.L_Communication_DV
WHERE create_time < DATEADD(mm, -3, GETDATE())
This case is an example where you require deletion of entities that are not linked to other entities. You
might want this type of deletion as a general database cleanup, or to target entities that lack links of a
specific type, for example links that grant access to a building or an account. Select records of people
who do not have any access links.
The rule matches records of people when the item_id is not contained in a from_item_id column in
the link deletion view. When your rule joins to another deletion view, it is recommended that you identify
records by using the item_id columns.
SELECT *
FROM IS_Public.E_Person_DV
WHERE item_id NOT IN (
SELECT from_item_id
FROM IS_Public.L_Access_To_DV
)
Select records of any person who is associated with a person named 'Michael Wilson'.
When your rule joins to another deletion view, it is recommended that you identify records by using the
item_id columns.
SELECT *
FROM IS_Public.E_Person_DV
WHERE item_id IN (
SELECT A.from_item_id
FROM IS_Public.L_Associate_DV AS A
INNER JOIN IS_Public.E_Person_DV AS P
ON P.item_id = A.to_item_id
WHERE P.p_first_given_name = 'Michael' AND
P.p_family_name = 'Wilson'
)
This case is an example of the kind of deletion that might be used to clean up a database or reduce
storage requirements. Select records from a specific source based on when the last update occurred.
The rule matches records of people with the named data source and an update date that is older than
three years from the current date.
• For Db2:
SELECT *
FROM IS_Public.E_Person_DV
WHERE source_names LIKE '%LawEnforcementDB%' AND
source_last_updated < CURRENT_DATE - 3 YEARS
• For SQL Server:
SELECT *
FROM IS_Public.E_Person_DV
WHERE source_names LIKE '%LawEnforcementDB%' AND
source_last_updated < DATEADD(yyyy, -3, GETDATE())
Note: In this context, "accessed" refers to when the chart was uploaded, downloaded, or had its
details (that is, its history) requested by a user.
With this knowledge, you might implement a policy such that charts that have not been accessed for a
certain period of time are automatically deleted, but only after you send alerts (see Sending alerts to i2
Analyze users on page 469) to all the users who uploaded each chart, to warn them.
For this purpose, the last_seen_time column is pertinent. To determine which charts have not been
accessed for approximately 12 months, you might execute the following (Db2) SQL statement:
SELECT item_id
FROM IS_Public.E_Analyst_S_Notebook_Ch_DV
WHERE last_seen_time < NOW() - 12 MONTHS;
You can then use the list of identifiers that this statement returns to find out which users need to be
notified.
The IS_Public.E_Analyst_S_Notebook_Ch_Upload_Users view records which users have uploaded
each chart. For example, to determine which users have uploaded (any version of) a chart that has an
item_id of 1234, you might execute the following SQL statement:
SELECT user_principal_name
FROM IS_Public.E_Analyst_S_Notebook_Ch_Upload_Users
WHERE item_id = 1234;
Note: Even if the same user has uploaded multiple versions of the same chart, their principal name
is only returned once for each chart.
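If you prefer, you can combine the two steps into a single query that returns the users to notify directly.
The following sketch joins the two views that were named earlier (Db2 date arithmetic shown):
SELECT DISTINCT u.user_principal_name
FROM IS_Public.E_Analyst_S_Notebook_Ch_Upload_Users AS u
INNER JOIN IS_Public.E_Analyst_S_Notebook_Ch_DV AS c
ON c.item_id = u.item_id
WHERE c.last_seen_time < NOW() - 12 MONTHS;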
With this list of user names in hand, you might decide to use one of the i2 Analyze APIs that are
described in Sending alerts to i2 Analyze users on page 469.
Important: Alerting is always enabled in deployments that include the Information Store,
but by default it is not enabled in deployments that include only the Chart Store. To
enable alerting in a Chart Store deployment, you must set EnableAlerting=true in
DiscoServerSettingsCommon.properties, and then restart the i2 Analyze server.
Creating alerts
The alerts that you create contain an icon, a title, and an optional message. In addition, they can contain
either a group of records or a link to an external resource. They can be created through three different
APIs:
• Database stored procedures
• A Java API
• A REST API
Each API provides the same control over the contents of alerts. The Java and REST APIs provide
additional functionality for specifying the recipients of alerts, based on group membership or command
access control permissions. (The stored procedures require recipients to be specified explicitly, one at a
time.)
Note: Alerts can only be sent to users who have previously logged in to i2 Analyze. An authorized
user who has never logged in will not see alerts that were created and sent before they do so.
Alert structure
The structure of an alert is the same no matter which API creates it. Alerts can contain the following
elements:
• title
The title is mandatory and should contain a short string that will be displayed for the alert.
• message
The message is optional. If supplied, it can contain a longer string with more information than the
title.
• icon
The icon is optional and determines the image that is shown next to the alert in the user interface. If
supplied, it must be one of the following values. If not supplied, it defaults to INFORMATION.
• SUCCESS
• INFORMATION
• WARNING
• ERROR
• PLUS
• MINUS
• record identifiers
Record identifiers are optional. If they are supplied, the alert links to the specified records or charts
and displays them in a results grid (in the same way that Visual Query alert results are displayed).
Record identifiers cannot be supplied if an href is supplied.
• href
The href is optional. If supplied, it should contain a URL for a resource that provides more
information or context for the alert. The URL is displayed as a hyperlink in the user interface.
An href cannot be supplied if record identifiers are supplied.
Database API
The Information Store (or Chart Store, when alerting is enabled) database contains two
public stored procedures for creating alerts. To create an alert with an optional hyperlink, use
IS_Public.Create_Alert, which takes the following parameters:
For example:
[Db2]
CALL IS_Public.Create_Alert('test-user', 'For your information', 'Take a look at
this link', 'https://example.domain/news/123', 'INFORMATION')
[SQL-Server]
EXEC IS_Public.Create_Alert 'test-user', 'For your information', 'Take a look at
this link', 'https://example.domain/news/123', 'INFORMATION'
Note: To send the same alert to multiple users, you must execute the stored procedure multiple
times, changing the user_principal_name as required.
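For example, to send the same information to two users (the principal names here are hypothetical),
call the procedure once per user. Db2 syntax is shown:
CALL IS_Public.Create_Alert('analyst-1', 'For your information', 'Take a look at
this link', 'https://example.domain/news/123', 'INFORMATION')
CALL IS_Public.Create_Alert('analyst-2', 'For your information', 'Take a look at
this link', 'https://example.domain/news/123', 'INFORMATION')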
To create an alert that links to records, use IS_Public.Create_Records_Alert. For example:
[Db2]
CALL IS_Public.Create_Records_Alert('test-user', 'Records added!',
'h6cBtBz6CuYEU3YwzqUuZEmrhJ,TaUjsCpKUnmY85YWgBQbeNqou2', 'New records added from
example source', 'PLUS')
[SQL-Server]
EXEC IS_Public.Create_Records_Alert 'test-user', 'Records added!',
'h6cBtBz6CuYEU3YwzqUuZEmrhJ,TaUjsCpKUnmY85YWgBQbeNqou2', 'New records added from
example source', 'PLUS'
Note: Alerts that you create through this stored procedure can contain at most 242 records,
because of the sizes of record identifiers and the alert_record_ids_csv parameter. The Java
and REST APIs call the stored procedure, so they have the same limit.
Java API
The Java API for creating alerts is available through the API for Scheduling custom server tasks on
page 475.
The API provides the ability to identify the recipients of an alert by specifying lists of users, groups, and
command access control permissions. The criteria are combined inclusively (ORed together), so a user
receives an alert if they match any one criterion. It is also possible to specify that an alert should be
sent to all system users.
Note: For example, you can send an alert to all users that are in the Analyst group OR the Clerk
group. However, you cannot specify that an alert should be sent only to users that are members of
both the Analyst group AND the Clerk group.
For example:
package com.example.tasks;

import com.i2group.disco.alerts.AlertIcon;
import com.i2group.disco.alerts.IAlertManager;
import com.i2group.disco.alerts.ISendTo;
import com.i2group.disco.alerts.SendTo;
import com.i2group.disco.task.spi.CustomTaskFailedException;
import com.i2group.disco.task.spi.IScheduledTask;
import com.i2group.tracing.ITracer;
import java.sql.SQLException;
import java.util.List;

/* The class declaration, field, and exception handling below are a sketch that wraps the
 * documented onStartup() and run() bodies so that the example is complete. */
public class ScheduledTask implements IScheduledTask {

  private IScheduledTaskObjects scheduledTaskObjects;

  @Override
  public void onStartup(IScheduledTaskObjects scheduledTaskObjects) {
    this.scheduledTaskObjects = scheduledTaskObjects;
  }

  @Override
  public void run() {
    final IAlertManager alertManager = scheduledTaskObjects.getAlertManager();
    try {
      // Create an alert with a hyperlink
      final ISendTo sendToTestUser = SendTo.createWithUsers("test-user");
      alertManager.createAlert(
          sendToTestUser, "For your information",
          AlertIcon.INFORMATION, "Take a look at this link",
          "https://example.domain/news/123");
    } catch (SQLException e) {
      // Illustrative handling only: construct a CustomTaskFailedException instead if you
      // want to control whether and when the server re-runs the task.
      throw new RuntimeException(e);
    }
  }
}
REST API
The REST API for creating alerts maps to the Java API and provides the same functionality. You can
view the documentation for the REST API on a running i2 Analyze server at /opal/doc/#!/alerts.
Important: In order to use the REST API for creating alerts, you must be logged in to the i2
Analyze server as a user with the i2:AlertsCreate command access control permission.
For example, the following POST request creates an alert that contains a link, and sends it to the user
named test-user:
POST http://<server-name>:9082/opal/api/v1/alerts
{
"title": "For your information",
"sendTo": {
"users": ["test-user"]
},
"icon": "INFORMATION",
"message": "Take a look at this link",
"href": "https://example.domain/news/123"
}
And the next POST request creates an alert that contains records, and sends it to all the users in the
Analyst group:
POST http://<server-name>:9082/opal/api/v1/alerts
{
"title": "Records added!",
"sendTo": {
"groups": ["Analyst"]
},
"icon": "PLUS",
"message": "New records added from example source",
"recordIdentifiers": [
"h6cBtBz6CuYEU3YwzqUuZEmrhJ",
"TaUjsCpKUnmY85YWgBQbeNqou2"
]
}
A successful request returns a response like the following:
{
"sentToCount" : 1
}
The sentToCount field indicates how many users the alert was sent to. If the count is
different from what you expected - too high or too low - you can enable DEBUG logging on the
com.i2group.disco.alerts.SendToResolver class, and send the request again to see an
explanation in the server logs of who the alert was sent to.
Deleting alerts
To delete an alert programmatically, use the database view named IS_Public.Alerts_DV. Alerts
can be deleted through this view with standard SQL expressions.
You can delete all alerts with a single DELETE statement, or use a WHERE clause to delete specific
alerts, for example all alerts that were sent more than 30 days ago.
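The following statements are a sketch of those two operations on Db2. The name of the timestamp
column (create_time here) is an assumption; check the columns of IS_Public.Alerts_DV in your
deployment before you use it.
DELETE FROM IS_Public.Alerts_DV;

DELETE FROM IS_Public.Alerts_DV
WHERE create_time < CURRENT_DATE - 30 DAYS;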
Note: A user who has received an alert can delete it for themselves through the user interface in i2
Analyst's Notebook Premium.
Note: Writing implementations of the onStartup() and close() methods is optional. They have
default implementations that do nothing.
If the custom task fails for any reason, you can control the subsequent behavior by throwing a
CustomTaskFailedException from the run() method. Depending on how the exception is
constructed, the server might re-run the task immediately, or when it is next scheduled, or never.
Note: A new fragment specifically for custom tasks is also an option, provided that it is also added
to the topology.xml file for the deployment.
Configuration
In order for the i2 Analyze server to discover the custom task, entries that identify it must be present
in one of the settings files. A typical location for these entries is the
toolkit/configuration/fragments/opal-services/WEB-INF/classes/DiscoServerSettingsCommon.properties
file.
The new entries have two parts, in the following format:
CustomTaskScheduler.<TaskName>.Class=<ImplementationClass>
CustomTaskScheduler.<TaskName>.Expression=<CronExpression>
For example:
CustomTaskScheduler.AlertUsersBeforeDeletingCharts.Class=com.example.alert.ScheduledTask
CustomTaskScheduler.AlertUsersBeforeDeletingCharts.Expression=0 0 * * *
CustomTaskScheduler.TableCleaner.Class=com.example.cleaner.CleanupTask
CustomTaskScheduler.TableCleaner.Expression=0 */4 * * *
Deployment
When the .jar file is copied into the toolkit, and the DiscoServerSettingsCommon.properties
file is updated with the custom task class name and the cron schedule, the task is ready for deployment
to the i2 Analyze server.
To deploy the custom task to the i2 Analyze server, run setup -t deployLiberty to update the
configuration with your changes. Then, to set up the task to run according to its defined schedule, run
setup -t startLiberty.
Upgrading i2 Analyze
Different deployments of i2 Analyze can contain different combinations of its component parts. To
upgrade a particular deployment of i2 Analyze, you must upgrade the deployment toolkit and then
configure and upgrade the deployment itself.
This information relates to the latest release of i2 Analyze.
Troubleshooting and support
i2 Support
Upgrade paths
To upgrade a deployment of i2® Analyze, you must first upgrade the deployment toolkit that you are
using. You then use this upgraded toolkit to upgrade your deployment. The version of your current
deployment determines exactly which path to follow.
• If you are upgrading from i2 Analyze 4.3.1.0 or earlier, you must first upgrade your deployment to
a supported version before upgrading to this release. The earliest supported version from which to
upgrade to this release is 4.3.1.1.
For further information on changes to supported versions, see Configuration and database changes.
• If you are upgrading from i2® Analyze version 4.3.1.1 or later, complete the instructions in Upgrading
to i2 Analyze 4.3.5 on page 478.
Note: From i2® Analyze version 4.3.3, the Analysis Repository, Intelligence Portal, and Onyx services
are no longer supported. Deployments that contain an Analysis Repository should migrate data into
another repository.
Software prerequisites
If you are upgrading a deployment of i2 Analyze 4.3.3 or earlier that uses SQL Server for the
Information Store database, you must install a later version of the Microsoft ODBC Driver for SQL
Server and the sqlcmd utility. For more information, see Software prerequisites.
As part of the i2 Analyze upgrade process, WebSphere® Liberty, Solr, ZooKeeper, and Java™ are
updated. You do not need to download and update these prerequisites before you upgrade an existing
deployment.
6. If you are upgrading a deployment of i2 Analyze 4.3.3 or earlier that uses SQL Server for the
Information Store database, update the JDBC driver in your deployment configuration to the Java 11
version.
For more information about which driver to install, see Specifying the JDBC driver.
After you upgrade the deployment toolkit, you can use it to upgrade the deployment to version 4.3.5:
7. Upgrade and start i2 Analyze according to the instructions in Upgrading an i2 Analyze deployment.
8. If you are using the IBM HTTP Server, restart it.
If your deployment includes the ETL toolkit, you must upgrade the ETL toolkit to version 4.3.5 after you
upgrade the rest of the deployment. For more information, see Upgrading the ETL toolkit on page 479.
After you upgrade, you might need to update the configuration of your deployment for any new or
modified configuration settings. For more information about new and modified configuration settings,
see Configuration and database changes.
When you start the server after you upgrade, extra processing of the data in the Information Store
takes place. During this processing, you might not be able to ingest, update, or delete data in the
Information Store. For more information, see Information Store processing after you upgrade i2 Analyze
on page 486.
Note: If your previous deployment of i2 Analyze contained an Analysis Repository, the connection
to that repository is no longer available after the upgrade. If you had a Local Analysis Repository, the
database is retained to allow you to export the data.
Upgrade resources
Depending on the configuration of your deployment, you might need to complete extra tasks to upgrade
your system. The extra tasks might need to be completed before or after you upgrade the system.
Procedure
1. Open a command prompt on the server, and navigate to the toolkit\scripts directory of the i2
Analyze toolkit.
2. To upgrade the deployment, run the following command:
setup -t upgrade
If the create-databases attribute in the topology.xml file is set to false, you must run the
scripts to update the database manually.
The scripts are created in the toolkit\scripts\database\db2\InfoStore\generated\upgrade
directory. Run the scripts in ascending alphanumeric order. The user that you run the scripts as must
have permission to drop and create database tables and views.
3. To complete this part of the upgrade and start the application, run the following command:
setup -t start
What to do next
After you upgrade and restart i2 Analyze, return to Upgrading to i2 Analyze 4.3.5 on page 478 and
complete the rest of the instructions.
Procedure
1. Upgrade and copy the i2 Analyze configuration.
a. Upgrade the i2 Analyze configuration:
setup -t upgradeConfiguration
b. Provided that all servers have the same configuration, copy the upgraded toolkit\configuration
directory to the toolkit directory on each Solr and ZooKeeper server in your environment. Accept
any file overwrites.
2. Upgrade the ZooKeeper and Solr components of i2 Analyze.
a. On each ZooKeeper server, run the following command:
Where zookeeper.hostname is the hostname of the ZooKeeper server where you are running
the command, and matches the value for the host-name attribute of a <zkhost> element in the
topology.xml file.
b. On each Solr server, run the following command:
Where solr.hostname is the hostname of the Solr server where you are running the
command, and matches the value for the host-name attribute of a <solr-node> element in the
topology.xml file.
3. Upgrade the Information Store database.
On the Liberty server, run the following command:
setup -t upgradeDatabases
If the create-databases attribute in the topology.xml file is set to false, you must run the
scripts to update the database manually.
The scripts are created in the toolkit\scripts\database\db2\InfoStore\generated\upgrade
directory. Run the scripts in ascending alphanumeric order. The user that you run the scripts as must
have permission to drop and create database tables and views.
4. Start the ZooKeeper component of i2 Analyze.
On each ZooKeeper server, start the ZooKeeper hosts:
What to do next
After you upgrade and start i2 Analyze, return to Upgrading to i2 Analyze 4.3.5 on page 478 and
complete the rest of the instructions.
Version 4.3.5
The following changes are introduced at i2 Analyze version 4.3.5:
Version 4.3.4
The following changes are introduced at i2 Analyze version 4.3.4:
Version 4.3.3
The following changes are introduced at i2 Analyze version 4.3.3:
Record matching
If your existing deployment of i2 Analyze included the i2 Connect gateway but
not the Information Store, then the upgrade process modifies the contents of
ApolloServerSettingsMandatory.properties to reclassify your schema as a gateway schema.
The effect of this change is to modify the identifiers of the item types in the schema.
If your deployment includes match rules files on the server, the upgrade process automatically updates
those files to contain the correct item type identifiers. However, if your users have developed local rules
files for Find Matching Records, you must edit those files before they reconnect to the server after an
upgrade.
On each workstation, follow the instructions in Deploying Find Matching Records match rules on page
213, and use the information in Match rules syntax on page 215 to update the item type identifiers in
the file.
Version 4.3.2
No changes are introduced at i2 Analyze version 4.3.2 that affect upgrade.
Chart storage
For the Information Store database to store charts, the Information Store schema is updated to include
the chart item type. After you upgrade the Information Store database, the following tables are added for
chart storage:
• E_ANALYST_S_NOTEBOOK_CH - the data table, includes the chart properties
• E_ANALYST_S_NOTEBOOK_CH_BIN - the binary data table, includes the binary representation of the
chart
• E_ANALYST_S_NOTEBOOK_CH_TXT - the text data table, includes the text from the chart that can be
searched for
Version 4.3.1
The following changes are introduced at i2 Analyze version 4.3.1:
Version 4.3.0
The following changes are introduced at i2 Analyze version 4.3.0:
Version 4.2.1
The following changes are introduced at i2 Analyze version 4.2.1:
Configuration fragments
You can deploy the Information Store and i2 Connect in the same deployment of i2 Analyze. To
configure a deployment with this topology, some properties files and settings are in a different location:
• If your deployment contained the opal-services-daod fragment, it is renamed to opal-services. The
settings in the OpalServerSettingsDaodMandatory.properties file are now in the
DiscoServerSettingsCommon.properties file.
• If your deployment contained the opal-services-is fragment, the opal-services fragment is
added and the following files are moved to the opal-services fragment:
• DiscoClientSettings.properties
• DiscoServerSettingsCommon.properties
• Results configuration file
• visual-query-configuration.xml
By default, all of the values for the settings in the files are unchanged.
Upgrade processes
There are a number of different processes that can occur after you upgrade i2 Analyze. The processes
that are started depend on the version that you are upgrading to or from. Each process places some
restrictions on the system until it is complete.
The following processes are completed the first time that you upgrade a deployment with a SQL Server
database to version 4.3.4 or later:
• Updates the data type of multi-line property columns from CLOB to VARCHAR
• Blocks analysts from uploading to the Information Store
• Blocks ingestion and deletion
The following processes are completed the first time that you upgrade to version 4.3.1 or later:
The IS_Public.Upgrade_Status view in the Information Store database shows the list of processes
that are complete and pending:
Additionally, the IS_Public.Upgrade_Progress view shows the progress of each process by item
type:
You can use this view to check how many item types are completed and how many are pending. The
IS_Public.Upgrade_Progress view is populated with the item type status only when the process is
in the Pending state in the IS_Public.Upgrade_Status view.
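For example, you might inspect both views with simple queries like these:
SELECT *
FROM IS_Public.Upgrade_Status;

SELECT *
FROM IS_Public.Upgrade_Progress;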
You can generate the Information Store upgrade scripts by running the following toolkit task:
setup -t generateInfoStoreUpgradeScripts
Download PDFs
i2 Analyze documentation is also available as PDF documents.
• Full i2 Analyze documentation