Hazelcast Manual PDF
1 Reference Manual
Hazelcast IMDG Reference Manual
Preface
Licensing
Trademarks
Customer Support
Release Notes
Partners
2. Getting Started
2.1. Installation
JMX
REST
Logs
3. Hazelcast Overview
3.5.3. Repartitioning
3.6. Resources
4. Understanding Configuration
4.10.1. EncryptionReplacer
4.10.2. PropertyReplacer
5. Setting Up Clusters
5.1.1. TCP
5.1.2. Multicast
Important to Know
Performance Considerations
HOST_AWARE
CUSTOM
PER_MEMBER
ZONE_AWARE
SPI
multicast element
tcp-ip element
aws element
discovery-strategies element
5.10.7. Interfaces
6.1. Terminology
Examples
Troubleshooting
18.4.2. Checking the Status of the Cluster for REST Client
18.4.3. Checking the Name of the Instance for REST Client
JMX
Diagnostics
BuildInfo
SystemProperties
ConfigProperties
Metrics
SlowOperations
Invocations
HazelcastInstance
SystemLog
StoreLatency
OperationHeartbeats
MemberHeartbeats
OperationThreadSamples
WanDiagnostics
Configuration
Credentials
24.1.14. Tuning WAN Replication For Lower Latencies and Higher Throughput
Glossary
Hazelcast IMDG Reference Manual
Version 4.0.1
Preface
Welcome to the Hazelcast IMDG (In-Memory Data Grid) Reference Manual. This manual includes
concepts, instructions and examples to guide you on how to use Hazelcast and build Hazelcast
IMDG applications.
This reference manual mostly covers the Hazelcast member and clients in the Java language.
Although the core of Hazelcast IMDG is based on the Java programming language, it has the
following clients and programming language APIs.
• Java
• .NET
• C++
• Node.js
• Python
• Go
• Scala
We recommend that you first learn the basics of Hazelcast IMDG using this manual. You can then
find the client-related resources and links in the Clients chapter.
• Hazelcast or Hazelcast IMDG refers to the open source edition of Hazelcast in-memory data
grid middleware. Hazelcast is also the name of the company (Hazelcast, Inc.) providing the
Hazelcast product.
Hazelcast IMDG Architecture
You can see the features for all Hazelcast IMDG editions in the following architecture diagram.
You can see small "HD" boxes for some features in the above diagram. Those
features can use High-Density (HD) Memory Store when it is available. It means if
you have Hazelcast IMDG Enterprise HD, you can use those features with HD
Memory Store.
For more information on Hazelcast IMDG’s Architecture, see the white paper An Architect’s View of
Hazelcast.
Licensing
Hazelcast IMDG and the Hazelcast Reference Manual are free and provided under the Apache License,
Version 2.0. Hazelcast IMDG Enterprise and Hazelcast IMDG Enterprise HD are commercially
licensed by Hazelcast, Inc.
For more detailed information on licensing, see the License Questions appendix.
Trademarks
Hazelcast is a registered trademark of Hazelcast, Inc. All other trademarks in this manual are held
by their respective owners.
Customer Support
Support for Hazelcast is provided via GitHub, Mail Group and StackOverflow.
For information on the commercial support for Hazelcast IMDG and Hazelcast IMDG Enterprise, see
hazelcast.com.
Release Notes
See the Release Notes document for the new features, enhancements and fixes performed for each
Hazelcast IMDG release.
• Developing with Git: Document that explains the branch mechanism of Hazelcast and how to
request changes.
• Hazelcast Contributor Agreement form: Form that each contributing developer needs to fill and
send back to Hazelcast.
• Hazelcast on GitHub: Hazelcast repository where the code is developed, issues and pull requests
are managed.
Partners
Hazelcast partners with leading hardware and software technologies, system integrators, resellers
and OEMs including Amazon Web Services, Vert.x, Azul Systems, C2B2. See the Partners page for
the full list of and information on our partners.
See the Release Notes for the new features, enhancements and fixes performed for
each Hazelcast release.
Chapter Description
System Properties Added the descriptions for the following new system properties:
• ``
2. Getting Started
This chapter explains how to install Hazelcast and start a Hazelcast member and client. It describes
the executable files in the download package and also provides the fundamentals for configuring
Hazelcast and its deployment options.
2.1. Installation
The following sections explain the installation of Hazelcast IMDG and Hazelcast IMDG Enterprise.
They also include notes and changes to consider when upgrading Hazelcast.
You can find Hazelcast in standard Maven repositories. If your project uses Maven, you do not need
to add additional repositories to your pom.xml or add the hazelcast-4.0.1.jar file to your classpath
(Maven does that for you). Just add the following lines to your pom.xml:
<dependencies>
    <dependency>
        <groupId>com.hazelcast</groupId>
        <artifactId>hazelcast</artifactId>
        <version>4.0.1</version>
    </dependency>
</dependencies>
As an alternative, you can download and install Hazelcast IMDG yourself. You only need to:
2.1.2. Installing Hazelcast IMDG Enterprise
There are two Maven repositories defined for Hazelcast IMDG Enterprise:
<repository>
    <id>Hazelcast Private Snapshot Repository</id>
    <url>https://repository.hazelcast.com/snapshot/</url>
</repository>
<repository>
    <id>Hazelcast Private Release Repository</id>
    <url>https://repository.hazelcast.com/release/</url>
</repository>
Hazelcast IMDG Enterprise customers may also define dependencies. See the following example:
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-enterprise</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-enterprise-all</artifactId>
    <version>4.0.1</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-enterprise-all</artifactId>
    <version>4.0.1</version>
    <classifier>javadoc</classifier>
</dependency>
Hazelcast IMDG Enterprise offers you two types of licenses: Enterprise and Enterprise HD. The
supported features differ in your Hazelcast setup according to the license type you own.
• Enterprise license: In addition to the open source edition of Hazelcast, Enterprise features are
the following:
◦ Security
◦ WAN Replication
◦ Clustered REST
◦ Clustered JMX
◦ Rolling Upgrades
• Enterprise HD license: In addition to the Enterprise features, Enterprise HD features are the
following:
To use Hazelcast IMDG Enterprise, you need to set the provided license key using one of the
configuration methods shown below.
Hazelcast IMDG Enterprise license keys are required only for members. You do not
need to set a license key for your Java clients for which you want to use IMDG
Enterprise features.
Declarative Configuration:
Add the line below anywhere inside the <hazelcast> element of the hazelcast.xml file. This XML file
offers a declarative way to configure Hazelcast and is included in the Hazelcast download package.
When you extract the downloaded package, you can find hazelcast.xml under the /bin
directory.
<hazelcast>
    ...
    <license-key>Your Enterprise License Key</license-key>
    ...
</hazelcast>
Programmatic Configuration:
Alternatively, you can set your license key programmatically as shown below.
If you are using Spring with Hazelcast, then you can set the license key using the Spring XML
schema, as shown below.
<hz:config>
    ...
    <hz:license-key>Your Enterprise License Key</hz:license-key>
    ...
</hz:config>
As another option, you can set your license key using the below command (the "-D" command line
option).
The string before the <License key> is the human-readable part. You can use your license key with
or without this human-readable part, so both of the following example license keys are valid:
HazelcastEnterpriseHD#2Nodes#1q2w3e4r5t
1q2w3e4r5t
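The two accepted key forms can be handled uniformly with a small helper. The following is only a sketch built on assumptions: LicenseKeys is a hypothetical class, not part of the Hazelcast API; the human-readable part is assumed to end at the last '#' character; and the property name hazelcast.enterprise.license.key used for the "-D" option is assumed here and should be checked against the System Properties appendix.

```java
// Hypothetical helper for illustration; not part of the Hazelcast API.
final class LicenseKeys {

    // Assumed name of the system property used with the "-D" option.
    static final String PROPERTY = "hazelcast.enterprise.license.key";

    // Drops the human-readable prefix, assumed to end at the last '#'.
    static String stripReadablePart(String key) {
        String trimmed = key.trim();
        int separator = trimmed.lastIndexOf('#');
        return separator < 0 ? trimmed : trimmed.substring(separator + 1);
    }

    // Builds the JVM argument that passes the key on the command line.
    static String toJvmArgument(String key) {
        return "-D" + PROPERTY + "=" + key.trim();
    }

    public static void main(String[] args) {
        System.out.println(stripReadablePart("HazelcastEnterpriseHD#2Nodes#1q2w3e4r5t"));
        System.out.println(toJvmArgument("1q2w3e4r5t"));
    }
}
```

Since both forms are valid, stripping the prefix is optional; the helper is mainly useful when keys from different sources need to be compared.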
JMX
The MBean HazelcastInstance.LicenseInfo holds all the relevant license details and can be accessed
through Hazelcast’s JMX port (if enabled). The following parameters represent these details:
• maxNodeCountAllowed: Maximum members allowed to form a cluster under the current license.
REST
You can access the license details by issuing a GET request through the REST API (if enabled; see the
Using the REST Endpoint Groups section) on the /license resource, as shown below.
curl -v http://localhost:5701/hazelcast/rest/license
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 5701 (#0)
> GET /hazelcast/rest/license HTTP/1.1
> Host: localhost:5701
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 165
<
{"licenseInfo":{"expiryDate":4090168799999,"maxNodeCount":99,"type":3,"companyName":null,"ownerEmail":null,"keyHash":"OsLh4O6vqDuKEq8lOANQuuAaRnmDfJfRPrFSEhA7T3Y="}}
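The same GET request can be issued from Java. The sketch below uses the JDK's java.net.http client (Java 11 or newer) and assumes a member listening on localhost:5701 with the REST API enabled, as in the curl example; LicenseInfoRequest is a hypothetical class name used only for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical wrapper around the /license REST resource.
final class LicenseInfoRequest {

    // Builds the GET request for the /license resource of one member.
    static HttpRequest build(String host, int port) {
        URI uri = URI.create("http://" + host + ":" + port + "/hazelcast/rest/license");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a running member with the REST API enabled.
        HttpRequest request = build("localhost", 5701);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The response body is the licenseInfo JSON document shown in the curl output above; parsing it is left to whichever JSON library your application already uses.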
To update the license of a running cluster, you can issue a POST request through the REST API (if
enabled; see the Using the REST Endpoint Groups section) on the /license as shown below:
The above command updates the license on all running Hazelcast members of the cluster. If
successful, the response looks as follows:
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 5001 (#0)
> POST /hazelcast/rest/license HTTP/1.1
> Host: 127.0.0.1:5001
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 164
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 164 out of 164 bytes
< HTTP/1.1 200 OK
< Content-Type: application/javascript
< Content-Length: 364
<
* Connection #0 to host 127.0.0.1 left intact
{"status":"success","licenseInfo":{"expiryDate":1560380399161,"maxNodeCount":10,
"type":-1,"companyName":"ExampleCompany","ownerEmail":"[email protected]",
"keyHash":"ml/u6waTNQ+T4EWxnDRykJpwBmaV9uj+skZzv0SzDhs="},
"message":"License updated at run time - please make sure to update the license
in the persistent configuration to avoid losing the changes on restart."}
As the message in the above example indicates, the license is updated only at runtime. The
persistent configuration of each member needs to be updated manually to ensure that the license
change is not lost on restart. The same message is logged as a warning in each member’s log.
It is only possible to update to a license that expires at the same time or after the current license.
The new license must allow the exact same list of features and the same number of members.
If, for any reason, updating the license fails on some members (member does not respond, license is
not compatible, etc.), the whole operation fails, leaving the cluster in a potentially inconsistent state
(some members have been switched to the new license while some have not). It is up to you to
resolve this situation manually.
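The compatibility conditions for a runtime license update can be summarized as a simple predicate. This is a minimal sketch under stated assumptions: the License type and its fields are hypothetical and only model the three conditions listed above (expiry, feature list, member count); Hazelcast's internal check is not shown in this manual and may differ.

```java
import java.time.Instant;
import java.util.Set;

// Hypothetical model for illustration; Hazelcast does not expose these types.
final class LicenseUpdateRule {

    static final class License {
        final Instant expiry;
        final Set<String> features;
        final int maxMembers;

        License(Instant expiry, Set<String> features, int maxMembers) {
            this.expiry = expiry;
            this.features = features;
            this.maxMembers = maxMembers;
        }
    }

    // True when the candidate expires no earlier than the current license and
    // allows exactly the same features and the same number of members.
    static boolean canReplace(License current, License candidate) {
        boolean expiresNoEarlier = !candidate.expiry.isBefore(current.expiry);
        boolean sameFeatures = candidate.features.equals(current.features);
        boolean sameMemberCount = candidate.maxMembers == current.maxMembers;
        return expiresNoEarlier && sameFeatures && sameMemberCount;
    }

    public static void main(String[] args) {
        License current = new License(Instant.parse("2025-01-01T00:00:00Z"), Set.of("Security"), 10);
        License renewed = new License(Instant.parse("2026-01-01T00:00:00Z"), Set.of("Security"), 10);
        System.out.println(canReplace(current, renewed));
    }
}
```

Running such a check on your side before issuing the POST request can reduce the chance of leaving the cluster in the partially updated state described above.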
Logs
Besides the above approaches (JMX and REST) to access the license details, Hazelcast also logs a
license information banner into the log files when the license expiration is approaching.
During the last two months prior to the expiration, this license information banner is logged daily,
as a reminder to renew your license to avoid any interruptions. Once the expiration is due in a
month, the banner is logged hourly (instead of daily). Lastly, when the expiration is due in a
week, the banner is logged every 30 minutes.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ WARNING @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
HAZELCAST LICENSE WILL EXPIRE IN 29 DAYS.
Your Hazelcast cluster will stop working after this time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Please pay attention to the license warnings to prevent any possible interruptions
in the operation of your Hazelcast applications.
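The banner schedule described in this section can be written down as a lookup from the remaining days to a logging interval. The sketch below is illustrative only: LicenseBannerSchedule is a hypothetical class, and the exact boundaries (60 days for "two months", 30 days for "a month", 7 days for "a week", all inclusive) are assumptions, since the manual does not state them precisely.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical model of the banner schedule; boundaries are assumptions.
final class LicenseBannerSchedule {

    // Returns how often the banner is logged, or empty when it is not logged yet.
    static Optional<Duration> interval(long daysUntilExpiry) {
        if (daysUntilExpiry <= 7) {
            return Optional.of(Duration.ofMinutes(30)); // last week
        }
        if (daysUntilExpiry <= 30) {
            return Optional.of(Duration.ofHours(1));    // last month
        }
        if (daysUntilExpiry <= 60) {
            return Optional.of(Duration.ofDays(1));     // last two months
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The sample banner above reports 29 days, which falls in the hourly range.
        System.out.println(interval(29));
    }
}
```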
Hazelcast IMDG Version | JDK Version | Oracle JDK | IBM SDK, Java Technology Edition | Azul Zing JDK | OpenJDK
Up to 3.11 | 6
Hazelcast IMDG 3.10 and older releases are not fully tested on JDK 9 and newer, so
there may be some features that are not working properly.
See the following sections for the details of Hazelcast IMDG supporting JDK 9 and
newer:
• Running in Modular Java: Talks about the new module system present in Java 9
and newer and how you can run a Hazelcast application on it.
• TLS/SSL for Hazelcast Members: Lists TLSv1.3, which comes with Java 11, as a
supported TLS version.
The Java Module System comes with stricter visibility rules. This affects Hazelcast, which uses
internal Java APIs to reach the best performance results.
Hazelcast needs the java.se module and access to the following Java packages to work properly:
• java.base/jdk.internal.ref
• java.base/java.nio (reflective access)
You can provide access to the above-mentioned packages by using the --add-exports and
--add-opens (for reflective access) Java arguments.
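To make the shape of these arguments concrete, the following sketch assembles them from a list of packages, for example to pass to a ProcessBuilder that launches a member. It rests on assumptions: ModularJavaFlags is a hypothetical helper, the target module ALL-UNNAMED assumes Hazelcast is placed on the classpath rather than the module path, and only the two packages listed above are used in the usage line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles JVM arguments for modular Java.
final class ModularJavaFlags {

    // exported: packages needing --add-exports; opened: packages needing
    // --add-opens (reflective access). Each entry has the form "module/package".
    static List<String> build(List<String> exported, List<String> opened) {
        List<String> flags = new ArrayList<>();
        flags.add("--add-modules");
        flags.add("java.se");
        for (String pkg : exported) {
            flags.add("--add-exports");
            flags.add(pkg + "=ALL-UNNAMED"); // assumption: Hazelcast is on the classpath
        }
        for (String pkg : opened) {
            flags.add("--add-opens");
            flags.add(pkg + "=ALL-UNNAMED");
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> flags = build(
                List.of("java.base/jdk.internal.ref"),
                List.of("java.base/java.nio"));
        System.out.println(String.join(" ", flags));
    }
}
```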
1. Create a simple Java application using the Hazelcast distributed map and queue.
2. Run our application twice to have a cluster with two members (JVMs).
3. Connect to our cluster from another Java application by using the Hazelcast Native Java Client
API.
Let’s begin.
• The following code starts the first Hazelcast member and creates and uses the customers map
and queue.
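// Start a member with the default configuration.
Config cfg = new Config();
HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
// Obtain the distributed map named "customers" and add three entries.
Map<Integer, String> mapCustomers = instance.getMap("customers");
mapCustomers.put(1, "Joe");
mapCustomers.put(2, "Ali");
mapCustomers.put(3, "Avi");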
• Run this GettingStarted class a second time to get the second member started. The members
form a cluster and the output is similar to the following.
Here, you can see the size of your cluster (size) and member list version (ver). The member list
version is incremented when changes happen to the cluster, e.g., a member leaving or joining the
cluster.
The above member list format was introduced with Hazelcast 3.9. You can enable the legacy
member list format, which was used for the releases before Hazelcast 3.9, using the system
property hazelcast.legacy.memberlist.format.enabled. See the System Properties appendix. The
following is an example of the legacy member list format:
Members [2] {
Member [127.0.0.1]:5701 - c1ccc8d4-a549-4bff-bf46-9213e14a9fd2 this
Member [127.0.0.1]:5702 - 33a82dbf-85d6-4780-b9cf-e47d42fb89d4
}
• Now, add the hazelcast-client-4.0.1.jar library to your classpath. This is required to use a
Hazelcast client.
• The following code starts a Hazelcast Client, connects to our cluster, and prints the size of the
customers map.
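import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class GettingStartedClient {
    public static void main(String[] args) {
        // Connect to the cluster using the default client configuration.
        ClientConfig clientConfig = new ClientConfig();
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        // Look up the "customers" map populated by the members.
        IMap<Integer, String> map = client.getMap("customers");
        System.out.println("Map Size:" + map.size());
    }
}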
• When you run it, you see the client properly connecting to the cluster and printing the map size
as 3.
Hazelcast also offers a tool, Management Center, that enables you to monitor your cluster. It is
included in your Hazelcast IMDG download package and can also be downloaded from the
Hazelcast website’s download page. You can use this tool to monitor your cluster, cluster members,
clients, data structures and WAN replications. See the documentation for details on Hazelcast
Management Center.
By default, Hazelcast uses multicast to discover other members that can form a cluster. If you are
working with other Hazelcast developers on the same network, you may find yourself joining their
clusters under the default settings. Hazelcast provides a way to segregate clusters within the same
network when using multicast. See the Creating Clusters section for more information.
Alternatively, if you do not wish to use the default multicast mechanism, you can provide a fixed
list of IP addresses that are allowed to join. See the Join configuration section for more information.
• start.sh / start.bat: Starts a Hazelcast member with default configuration in the working
directory.
• stop.sh / stop.bat: Stops the Hazelcast member that was started in the current working
directory.
The start.sh / start.bat scripts let you start one Hazelcast instance per folder. To
start a new instance, unzip the Hazelcast ZIP or TAR.GZ package in a new
folder.
You can also use the start scripts to deploy your own library to a Hazelcast
member. See the Adding User Library to CLASSPATH section.
• cluster.sh: Provides basic functionalities for cluster management, such as getting and changing
the cluster state, shutting down the cluster or forcing the cluster to clean its persisted data and
make a fresh start. See the Using the Script cluster.sh section to learn the usage of this script.
• cp-subsystem.sh: Provides access to the CP subsystem management APIs using the REST
interface. See the CP Subsystem Management APIs section.
• healthcheck.sh: Provides basic information about your clusters such as the state and size. See
the Using the healthcheck.sh Script section.
Hazelcast Cloud uses Docker and Kubernetes, and is powered by Hazelcast IMDG Enterprise HD. It
is initially available on Amazon Web Services (AWS), to be followed by Microsoft Azure and Google
Cloud Platform (GCP). Since it is based on Hazelcast IMDG Enterprise HD, it features advanced
functionality such as TLS, multi-region, persistence, and high availability.
See the Hazelcast AWS cloud discovery module’s documentation to learn more about this module.
As an alternative to the discovery module mentioned above, if you are a Vagrant and/or Chef user,
you can also check our sample project that uses these third-party tools to deploy a Hazelcast
cluster on EC2.
For information on how to automatically deploy your cluster onto Azure, see the Deployment
section of the Hazelcast Azure plugin repository.
• Hazelcast IMDG
• Hazelcast OpenShift
After you pull an image from the Docker registry, you can run your image to start the Management
Center or a Hazelcast instance with Hazelcast’s default configuration. All repositories provide the
latest stable releases but you can pull a specific release, too. You can also specify environment
variables when running the image.
If you want to start a customized Hazelcast instance, you can extend the Hazelcast image by
providing your own configuration file.
This feature is provided as a Hazelcast plugin. See its own GitHub repo at Hazelcast Docker for
details on configurations and usages.
3. Hazelcast Overview
Hazelcast is an open source In-Memory Data Grid (IMDG). It provides elastically scalable
distributed In-Memory computing, widely recognized as the fastest and most scalable approach to
application performance. Hazelcast does this in open source. More importantly, Hazelcast makes
distributed computing simple by offering distributed implementations of many developer-friendly
interfaces from Java such as Map, Queue, ExecutorService, Lock and JCache. For example, the Map
interface provides an In-Memory Key Value store which confers many of the advantages of NoSQL
in terms of developer friendliness and developer productivity.
In addition to distributing data In-Memory, Hazelcast provides a convenient set of APIs to access
the CPUs in your cluster for maximum processing speed. Hazelcast is designed to be lightweight
and easy to use. Since Hazelcast is delivered as a compact library (JAR) and since it has no external
dependencies other than Java, it easily plugs into your software solution and provides distributed
data structures and distributed computing utilities.
Hazelcast is highly scalable and available. Distributed applications can use Hazelcast for distributed
caching, synchronization, clustering, processing, pub/sub messaging, etc. Hazelcast is implemented
in Java and has clients for Java, C/C++, .NET, REST, Python, Go and Node.js. Hazelcast also speaks
Memcached protocol. It plugs into Hibernate and can easily be used with any existing database
system.
If you are looking for in-memory speed, elastic scalability and the developer friendliness of NoSQL,
Hazelcast is a great choice.
Hazelcast is Simple
Hazelcast is written in Java with no other dependencies. It exposes the same interfaces as the
familiar java.util package. Just add hazelcast.jar to your classpath and you can quickly enjoy
JVM clustering and start building scalable applications.
Hazelcast is Peer-to-Peer
Unlike many NoSQL solutions, Hazelcast is peer-to-peer. There is no master and slave; there is no
single point of failure. All members store equal amounts of data and do equal amounts of
processing. You can embed Hazelcast in your existing application or use it in client and server
mode where your application is a client to Hazelcast members.
Hazelcast is Scalable
Hazelcast is designed to scale up to hundreds and thousands of members. Simply add new
members; they automatically discover the cluster and linearly increase both the memory and
processing capacity. The members maintain a TCP connection between each other and all
communication is performed through this layer.
Hazelcast is Fast
Hazelcast stores everything in-memory. It is designed to perform very fast reads and updates.
Hazelcast is Redundant
Hazelcast keeps the backup of each data entry on multiple members. On a member failure, the data
is restored from the backup and the cluster continues to operate without downtime.
Data is at the core of software systems. In conventional architectures, a relational database persists
and provides access to data. Applications talk directly to a database that has another machine as
its backup. To increase performance, tuning or a faster machine is required, which can cost a
large amount of money or effort.
There is also the idea of keeping copies of data next to the database, which is performed using
technologies like external key-value stores or second level caching that help offload the database.
However, when the database is saturated or the applications perform mostly "put" operations
(writes), this approach is of no use because it insulates the database only from the "get" loads
(reads). Even if the applications are read-intensive there can be consistency problems - when data
changes, what happens to the cache and how are the changes handled? This is when concepts like
time-to-live (TTL) or write-through come in.
In the case of TTL, if an entry is accessed less frequently than its TTL, every access results in a
cache miss. On the other hand, in the case of write-through caches, having more than one of these
caches in a cluster leads to consistency issues. These can be avoided by having the members
communicate with each other so that entry invalidations can be propagated.
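The TTL behavior is easy to see in a toy cache. The following is a minimal single-JVM sketch, not a Hazelcast class: TtlCache is a hypothetical name, and the clock is injected as a LongSupplier purely so that the example can control time. An entry read before its TTL elapses is a hit; an entry read only after the TTL has elapsed is always a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Illustrative single-JVM cache; not a Hazelcast class.
final class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so the example can control time
    private int misses;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached value, or null (a miss) when absent or expired.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null || clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);
            misses++;
            return null;
        }
        return entry.value;
    }

    int misses() {
        return misses;
    }
}
```

With a TTL of 1000 ms, a value put at time 0 is returned at time 500 but is reported as a miss at time 1500, which is the situation described above for entries that are read less often than their TTL.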
We can conclude that an ideal cache would combine TTL and write-through features. There are
several cache servers and in-memory database solutions in this field. However, these are stand-
alone single instances with a distribution mechanism that is provided by other technologies to an
extent. So, we are back to square one; we experience saturation or capacity issues if the product is a
single instance or if consistency is not provided by the distribution.
Hazelcast, a brand new approach to data, is designed around the concept of distribution. Hazelcast
shares data around the cluster for flexibility and performance. It is an in-memory data grid for
clustering and highly scalable data distribution.
One of the main features of Hazelcast is that it does not have a master member. Each cluster
member is configured to be the same in terms of functionality. The oldest member (the first
member created in the cluster) automatically performs the data assignment to cluster members. If
the oldest member dies, the second oldest member takes over.
You may come across the term "master" or "master member" in some sections
of this manual. It is used for contextual clarification; please
remember that it refers to the "oldest member" explained in the above
paragraph.
Another main feature of Hazelcast is that the data is held entirely in-memory. This is fast. In the
case of a failure, such as a member crash, no data is lost since Hazelcast distributes copies of the
data across all the cluster members.
As shown in the feature list in the Distributed Data Structures chapter, Hazelcast supports a
number of distributed data structures and computing utilities. These provide powerful ways of
accessing distributed clustered memory and accessing CPUs for true distributed computing.
• Hazelcast provides out-of-the-box distributed data structures, such as Map, Queue, MultiMap,
Topic, Lock and Executor.
• There is no "master," meaning no single point of failure in a Hazelcast cluster; each member in
the cluster is configured to be functionally the same.
• When the size of your memory and compute requirements increase, new members can be
dynamically joined to the Hazelcast cluster to scale elastically.
• Data is resilient to member failure. Data backups are distributed across the cluster. This is a big
benefit when a member in the cluster crashes as data is not lost.
• Members are always aware of each other unlike in traditional key-value caching solutions.
• You can build your own custom-distributed data structures using the Service Programming
Interface (SPI) if you are not happy with the data structures provided.
Finally, Hazelcast has a vibrant open source community enabling it to be continuously developed.
Common use cases for Hazelcast include the following:
• a cache, particularly an open source JCache provider with elastic distributed scalability
• a primary data store for applications with utmost performance, scalability and low-latency
requirements
• to cluster highly changing data with event notifications, e.g., user based events, and to queue
and distribute background tasks
• to share information (user information, queues, maps, etc.) on the fly with multiple members in
different installations under OSGI environments
• to share thousands of keys in a cluster where there is a web service interface on an application
server and some validation
• as a distributed topic (publish/subscribe server) to build scalable chat servers for smartphones
• to distribute user object states across the cluster, to pass messages between objects and to share
system data structures (static initialization state, mirrored objects, object identity generators)
• to share datasets, e.g., table-like data structure, to be used by applications
• to distribute the load and collect status from Amazon EC2 servers where the front-end is
developed using, for example, Spring framework
• as storage for session data in web applications (enables horizontal scalability of the web
application).
If your application focuses mainly on asynchronous or high performance computing
and runs many task executions, then Embedded deployment is the preferred way. In Embedded
deployment, members include both the application and Hazelcast data and services. The advantage
of the Embedded deployment is low-latency data access.
In the Client/Server deployment, Hazelcast data and services are centralized in one or more server
members and they are accessed by the application through clients. You can have a cluster of server
members that can be independently created and scaled. Your clients communicate with these
members to reach the Hazelcast data and services on them.
Hazelcast provides native clients (Java, .NET and C++), Memcache and REST clients, Scala, Python
and Node.js client implementations.
Client/Server deployment has advantages including more predictable and reliable Hazelcast
performance, easier identification of problem causes and, most importantly, better scalability.
When you need to scale in this deployment type, just add more Hazelcast server members. You can
address client and server scalability concerns separately.
Note that Hazelcast member libraries are available only in Java. Therefore, embedding a member
in a business service is only possible with Java. Applications written in other languages (.NET,
C++, Node.js, etc.) can use Hazelcast client libraries to access the cluster. See the Hazelcast Clients
chapter for information on the clients and other language implementations.
If you want low-latency data access, as in the Embedded deployment, and you also want the
scalability advantages of the Client/Server deployment, you can consider defining Near Caches for
your clients. This enables the frequently used data to be kept in the client’s local memory. See the
Configuring Client Near Cache section.
See the Data Partitioning section for more information on how Hazelcast
partitions your data.
3.5. Data Partitioning
As you read in the Sharding in Hazelcast section, Hazelcast shards are called Partitions. Partitions
are memory segments that can contain hundreds or thousands of data entries each, depending on
the memory capacity of your system. Each Hazelcast partition can have multiple replicas, which are
distributed among the cluster members. One of the replicas becomes the primary and the other replicas
are called backups. The cluster member that owns the primary replica of a partition is called the partition
owner. When you read or write a particular data entry, you transparently talk to the owner of the
partition that contains the data entry.
By default, Hazelcast offers 271 partitions. When you start a cluster with a single member, it owns
all 271 partitions (i.e., it keeps the primary replicas for all 271 partitions). The following illustration
shows the partitions in a Hazelcast cluster with a single member.
When you start a second member on that cluster (creating a Hazelcast cluster with two members),
the partition replicas are distributed as shown in the illustration here.
Partition distributions in the below illustrations are shown for the sake of
simplicity and for descriptive purposes. Normally, the partitions are not
distributed in any order, as they are shown in these illustrations, but are
distributed randomly (they do not have to be sequentially distributed to each
member). The important point here is that Hazelcast equally distributes the
partition primaries and their backup replicas among the members.
In the illustration, the partition replicas with black text are primaries and the partition replicas
with blue text are backups. The first member has the primary replicas of 135 partitions (black) and
each of these partitions is backed up in the second member (i.e., the second member owns the
backup replicas, shown in blue). At the same time, the first member also has the backup replicas of the
second member’s primary partition replicas.
As you add more members, Hazelcast moves some of the primary and backup partition replicas to
the new members one by one, making all members equal and redundant. Thanks to the consistent
hashing algorithm, only the minimum number of partitions is moved to scale out Hazelcast. The
following is an illustration of the partition replica distributions in a Hazelcast cluster with four
members.
Hazelcast distributes partitions' primary and backup replicas equally among the members of the
cluster. Backup replicas of the partitions are maintained for redundancy.
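As a rough illustration of what an equal distribution of the 271 default partitions means in numbers, the following sketch computes how many primary replicas each member would hold under an even split. It is only arithmetic: the class name is hypothetical, and which member receives the extra partitions is an arbitrary choice made for this sketch, not a description of Hazelcast's actual assignment.

```java
import java.util.Arrays;

/** Arithmetic illustration only; not Hazelcast's partition assignment algorithm. */
final class PartitionSplitSketch {

    /** Returns the number of primary replicas per member under an even split. */
    static int[] evenSplit(int partitionCount, int memberCount) {
        int[] counts = new int[memberCount];
        int base = partitionCount / memberCount;
        int remainder = partitionCount % memberCount;
        for (int i = 0; i < memberCount; i++) {
            // The last `remainder` members take one extra partition (arbitrary choice).
            counts[i] = base + (i >= memberCount - remainder ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int members : new int[] {1, 2, 4}) {
            System.out.println(members + " member(s): "
                    + Arrays.toString(evenSplit(271, members)));
        }
    }
}
```

With two members, one member holds 135 primaries and the other 136, which is consistent with the 135 primaries mentioned for the first member above.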
Your data can have multiple copies on partition primaries and backups, depending
on your backup count. See the Backing Up Maps section.
Hazelcast also offers lite members. These members do not own any partition. Lite members are
intended for use in computationally-heavy task executions and listener registrations. Although they
do not own any partitions, they can access partitions that are owned by other members in the
cluster.
Hazelcast distributes data entries into the partitions using a hashing algorithm. Given an object key
(for example, for a map) or an object name (for example, for a topic or list), the key or name is
serialized into a byte array, this byte array is hashed, and the hash result is taken modulo the
number of partitions.
The result of this modulo - MOD(hash result, partition count) - is the partition in which the data
will be stored, that is, the partition ID. For all members in your cluster, the partition ID
for a given key is always the same.
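The MOD(hash result, partition count) step can be sketched as follows. This is a minimal illustration, not Hazelcast's implementation: the UTF-8 encoding and Arrays.hashCode are stand-ins chosen for this sketch (Hazelcast uses its own serialization and hash function), and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative partition ID computation; serialization and hash are stand-ins. */
final class PartitionIdSketch {
    static final int DEFAULT_PARTITION_COUNT = 271;

    static int partitionId(String key, int partitionCount) {
        // Stand-in for serialization: turn the key into a byte array.
        byte[] serialized = key.getBytes(StandardCharsets.UTF_8);
        // Stand-in hash function over the serialized bytes.
        int hash = Arrays.hashCode(serialized);
        // floorMod keeps the result in [0, partitionCount) even for negative hashes.
        return Math.floorMod(hash, partitionCount);
    }

    public static void main(String[] args) {
        String key = "customer-42";
        System.out.println(key + " -> partition "
                + partitionId(key, DEFAULT_PARTITION_COUNT));
    }
}
```

Because the computation depends only on the key and the partition count, every member that runs it arrives at the same partition ID for a given key.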
When you start a member, a partition table is created within it. This table stores the partition IDs
and the cluster members to which they belong. The purpose of this table is to make all members
(including lite members) in the cluster aware of this information, making sure that each member
knows where the data is.
The oldest member in the cluster (the one that started first) periodically sends the partition table to
all members. In this way each member in the cluster is informed about any changes to partition
ownership. The ownerships may be changed when, for example, a new member joins the cluster, or
when a member leaves the cluster.
If the oldest member of the cluster goes down, the next oldest member sends the
partition table information to the other ones.
You can configure how often the member sends the partition table
information by using the hazelcast.partition.table.send.interval system property. The property is
set to 15 seconds by default.
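One way to set this property is through a JVM system property before the member starts. The sketch below uses only the standard System.setProperty call; the value of 30 seconds is an arbitrary example, the class name is hypothetical, and it is assumed here that the member reads the property at startup. The same effect can be obtained with a -D argument on the java command line.

```java
/** Sets the partition table send interval as a JVM system property (sketch). */
final class PartitionTableIntervalSketch {
    static final String PROPERTY = "hazelcast.partition.table.send.interval";

    /** Must be called before the Hazelcast member is created (assumption of this sketch). */
    static void configure(int seconds) {
        System.setProperty(PROPERTY, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        configure(30); // arbitrary example value, in seconds
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```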
3.5.3. Repartitioning
In these cases, the partition table in the oldest member is updated with the new partition
ownerships. Note that if a lite member joins or leaves a cluster, repartitioning is not triggered since
lite members do not own any partitions.
3.6. Resources
• Hazelcast source code can be found at GitHub/Hazelcast.
4. Understanding Configuration
This chapter describes the options to configure your Hazelcast applications and explains the
utilities which you can make use of while configuring. You can configure Hazelcast using one or
a mix of the following options:
• Declarative way
• Programmatic way
• hazelcast.xml: Default declarative XML configuration file for Hazelcast. The configuration for
the distributed data structures in this XML file should be fine for most of the Hazelcast users. If
not, you can tailor this XML file according to your needs by adding/removing/modifying
properties. Also see the Setting Up Clusters chapter for the network related configurations.
<hazelcast>
...
<cluster-name>dev</cluster-name>
<management-center scripting-enabled="false" />
<network>
<port auto-increment="true" port-count="100">5701</port>
<outbound-ports>
<!--
Allowed port range when connecting to other members.
0 or * means the port provided by the system.
-->
<ports>0</ports>
</outbound-ports>
<join>
<multicast enabled="true">
<multicast-group>224.2.2.3</multicast-group>
<multicast-port>54327</multicast-port>
</multicast>
<tcp-ip enabled="false">
<interface>127.0.0.1</interface>
<member-list>
<member>127.0.0.1</member>
</member-list>
</tcp-ip>
</join>
</network>
<map name="default">
<time-to-live-seconds>0</time-to-live-seconds>
</map>
...
</hazelcast>
The equivalent part of the configuration, extracted from hazelcast.yaml, is shown below.
hazelcast:
  ...
  cluster-name: dev
  management-center:
    scripting-enabled: false
  network:
    port:
      auto-increment: true
      port-count: 100
      port: 5701
    outbound-ports:
      # Allowed port range when connecting to other nodes.
      # 0 or * means use system provided port.
      - 0
    join:
      multicast:
        enabled: true
        multicast-group: 224.2.2.3
        multicast-port: 54327
      tcp-ip:
        enabled: false
        interface: 127.0.0.1
        member-list:
          - 127.0.0.1
  map:
    default:
      time-to-live-seconds: 0
  ...
You can compose the declarative configuration of your Hazelcast member or Hazelcast client from
multiple declarative configuration snippets. In order to compose a declarative configuration, you
can import different declarative configuration files. Composing configuration files is supported both
in XML and YAML configurations with the limitation that only configuration files written in the
same language can be composed.
Let’s say you want to compose the declarative configuration for Hazelcast out of two XML
configurations: development-cluster-config.xml and development-network-config.xml. These two
configurations are shown below.
development-cluster-config.xml:
<hazelcast>
<cluster-name>dev</cluster-name>
</hazelcast>
development-network-config.xml:
<hazelcast>
<network>
<port auto-increment="true" port-count="100">5701</port>
<join>
<multicast enabled="true">
<multicast-group>224.2.2.3</multicast-group>
<multicast-port>54327</multicast-port>
</multicast>
</join>
</network>
</hazelcast>
To get your example Hazelcast declarative configuration out of the above two, use the <import/>
element as shown below.
<hazelcast>
    <import resource="development-cluster-config.xml"/>
    <import resource="development-network-config.xml"/>
</hazelcast>
The above example using the YAML configuration files looks like the following:
development-cluster-config.yaml:
hazelcast:
  cluster-name: dev
development-network-config.yaml:
hazelcast:
  network:
    port:
      auto-increment: true
      port-count: 100
      port: 5701
    join:
      multicast:
        enabled: true
        multicast-group: 224.2.2.3
        multicast-port: 54327
Composing the above two YAML configuration files needs them to be imported as shown below.
hazelcast:
  import:
    - development-cluster-config.yaml
    - development-network-config.yaml
This feature also applies to the declarative configuration of Hazelcast client. See the following
examples.
client-cluster-config.xml:
<hazelcast-client>
<cluster-name>dev</cluster-name>
</hazelcast-client>
client-network-config.xml:
<hazelcast-client>
<network>
<cluster-members>
<address>127.0.0.1:7000</address>
</cluster-members>
</network>
</hazelcast-client>
To get a Hazelcast client declarative configuration from the above two examples, use the <import/>
element as shown below.
<hazelcast-client>
<import resource="client-cluster-config.xml"/>
<import resource="client-network-config.xml"/>
</hazelcast-client>
The same client configuration using the YAML language is shown below.
client-cluster-config.yaml:
hazelcast-client:
  cluster-name: dev
client-network-config.yaml:
hazelcast-client:
  network:
    cluster-members:
      - 127.0.0.1:7000
Composing a Hazelcast client declarative configuration by importing the above two examples is
shown below.
hazelcast-client:
  import:
    - client-cluster-config.yaml
    - client-network-config.yaml
Resources from the classpath and file system may also be used to compose a declarative
configuration:
<hazelcast>
    <!-- loaded from filesystem -->
    <import resource="file:///etc/hazelcast/development-cluster-config.xml"/>
    <!-- loaded from classpath -->
    <import resource="classpath:development-network-config.xml"/>
</hazelcast>
hazelcast:
  import:
    # loaded from filesystem
    - file:///etc/hazelcast/development-cluster-config.yaml
    # loaded from classpath
    - classpath:development-network-config.yaml
Importing resources with variables in their names is also supported. See the following example
snippets:
<hazelcast>
<import resource="${environment}-cluster-config.xml"/>
<import resource="${environment}-network-config.xml"/>
</hazelcast>
hazelcast:
  import:
    - ${environment}-cluster-config.yaml
    - ${environment}-network-config.yaml
See the Using Variables section to learn how you can set the configuration
elements with variables.
You can configure the Hazelcast members and Java clients declaratively with YAML configuration
files in installations of Hazelcast running on Java runtime version 8 or above.
The structure of the YAML configuration follows the structure of the XML configuration. Therefore,
you can rewrite the existing XML configurations in YAML easily. There are some differences
between the XML and YAML languages that cause the two declarative configurations to differ
slightly, as the following examples show.
In the YAML declarative configuration, mappings are used in which the name of the mapping node
needs to be unique within its enclosing mapping. See the following example with configuring two
maps in the same configuration file.
In the XML configuration files, we have two <map> elements as shown below.
<hazelcast>
...
<map name="map1">
<!-- map1 configuration -->
</map>
<map name="map2">
<!-- map2 configuration -->
</map>
...
</hazelcast>
In the YAML configuration, the map can be configured under a mapping map as shown in the
following example.
hazelcast:
  ...
  map:
    map1:
      # map1 configuration
    map2:
      # map2 configuration
  ...
The XML and YAML configurations above define the same maps map1 and map2. Please note that in
the YAML configuration file there is no name node, instead, the name of the map is used as the name
of the mapping for configuring the given map.
There are other configuration entries that have no unique names and are listed in the same
enclosing entry. Examples of this kind of configuration are the lists of member addresses and
interfaces in the networking configurations, and the listener definitions. The following example
configures listeners to illustrate this.
<hazelcast>
...
<listeners>
<listener>com.hazelcast.examples.MembershipListener</listener>
<listener>com.hazelcast.examples.MigrationListener</listener>
<listener>com.hazelcast.examples.PartitionLostListener</listener>
</listeners>
...
</hazelcast>
hazelcast:
  ...
  listeners:
    - com.hazelcast.examples.MembershipListener
    - com.hazelcast.examples.MigrationListener
    - com.hazelcast.examples.PartitionLostListener
  ...
Another notable difference between XML and YAML is the lack of the attributes in the case of
YAML. Everything that can be configured with an attribute in the XML configuration is a scalar
node in the YAML configuration with the same name. See the following example.
<hazelcast>
...
<network>
<join>
<multicast enabled="true">
<multicast-group>1.2.3.4</multicast-group>
<!-- other multicast configuration options -->
</multicast>
</join>
</network>
...
</hazelcast>
In the equivalent YAML configuration, the enabled attribute of the XML configuration is a scalar node
on the same level as the other items of the multicast configuration.
hazelcast:
  ...
  network:
    join:
      multicast:
        enabled: true
        multicast-group: 1.2.3.4
        # other multicast configuration options
  ...
You can refer to the full example YAML configuration files placed in the /bin folder of the
downloadable hazelcast-4.0.1.zip after unzipping it. Please see the complete list of the full
example YAML configurations here.
To create a Hazelcast member with the above example configuration, pass the configuration object
as shown below:
The Config must not be modified after the Hazelcast instance is started. In other
words, all configuration must be completed before creating the HazelcastInstance.
Certain additional configuration elements can be added at runtime as described in
the Dynamically Adding Data Structure Configuration on a Cluster section.
You can also create a named Hazelcast member. In this case, you should set instanceName of Config
object as shown below:
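Config config = new Config();
config.setInstanceName( "my-instance" );
// Create the named member.
Hazelcast.newHazelcastInstance( config );
// Retrieve an existing member by its name.
Hazelcast.getHazelcastInstanceByName( "my-instance" );
// Retrieve all existing members.
Hazelcast.getAllHazelcastInstances();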
If you want to specify your own configuration file to create Config, Hazelcast supports several ways
including filesystem, classpath, InputStream and URL.
Declarative Configuration:
<hazelcast>
...
<properties>
<property name="hazelcast.property.foo">value</property>
</properties>
...
</hazelcast>
hazelcast:
  ...
  properties:
    hazelcast.property.foo: value
  ...
Programmatic Configuration:
or
java -Dhazelcast.property.foo=value
You will see Hazelcast system properties mentioned throughout this Reference Manual as required
in some of the chapters and sections. All Hazelcast system properties are listed in the System
Properties appendix with their descriptions, default values and property types as a reference for
you.
<hz:hazelcast id="instance">
<hz:config>
<hz:cluster name="dev"/>
<hz:network port="5701" port-auto-increment="false">
<hz:join>
<hz:multicast enabled="false"/>
<hz:tcp-ip enabled="true">
<hz:members>10.10.1.2, 10.10.1.3</hz:members>
</hz:tcp-ip>
</hz:join>
</hz:network>
</hz:config>
</hz:hazelcast>
See the Integration with Spring section for more information on Hazelcast-Spring integration.
It is possible to dynamically add configuration for certain data structures at runtime; these can be
added by invoking one of the Config.add*Config methods on the Config object obtained from a
running member’s HazelcastInstance.getConfig() method. For example:
Dynamic configuration elements must be fully configured before the invocation of add*Config
method: at that point, the configuration object is delivered to every member of the cluster and
added to each member’s dynamic configuration, so mutating the configuration object after the
add*Config invocation does not have an effect.
As dynamically added data structure configuration is propagated across all cluster members,
failures may occur due to conditions such as timeouts and network partitions. The configuration
propagation mechanism internally retries adding the configuration whenever a membership
change is detected. However, if an exception is thrown from the add*Config method, the configuration
may have been only partially propagated to some cluster members, and you should retry adding
the configuration.
Adding a new dynamic configuration is supported for all add*Config methods except the following:
• JobTracker: It has been deprecated since Hazelcast 3.8.
• ListenerConfig: Listeners can instead be added at runtime via other APIs such as
HazelcastInstance.getCluster().addMembershipListener and
HazelcastInstance.getPartitionService().addMigrationListener.
Keep in mind that this feature also works for Hazelcast Java clients. See the following example:
If your cluster has data structures with configurations added during runtime,
those configurations are lost when the cluster restarts for any reason, since
they are not persisted. This will be improved in future Hazelcast IMDG
releases.
Attempting to add a dynamic configuration, when a static configuration for the same element
already exists, throws ConfigurationException. For example, assuming we start a member with the
following fragment in hazelcast.xml configuration:
<hazelcast>
...
<map name="sessions">
...
</map>
...
</hazelcast>
Then adding a dynamic configuration for a map with the name sessions throws a
ConfigurationException:
HazelcastInstance instance = Hazelcast.newHazelcastInstance();
If you attempt to add a dynamic configuration for an element for which a dynamic configuration
has already been added, and a configuration conflict is detected, a ConfigurationException is
thrown. For example:
Dynamically added data structure configuration may reference user customizations, such as a
user-provided MapLoader implementation referenced by a MapConfig. User customizations can usually be
configured using either of the following:
• by specifying a class or factory class name, e.g., MapStoreConfig.setClassName, and letting the
Hazelcast members instantiate the object
When dynamically adding a new data structure configuration with user customizations, take the
following considerations into account:
• For the user customizations submitted as a class name or factory class name, the referenced
classes are resolved lazily. Therefore, they should be either already on each member’s local
classpath or resolvable via user code deployment.
• When the user customizations are submitted as instances (or similarly factory instances), the
instances themselves have to be serializable. This is because the entire configuration needs to
be sent over the network to all cluster members, and their classes have to be available on each
member’s local classpath.
4.6. Checking Configuration
When you start a Hazelcast member without passing a Config object, as explained in the
Configuring Programmatically section, Hazelcast checks the member’s configuration as follows:
• First, it looks for the hazelcast.config system property. If it is set, its value is used as the path.
This is useful if you want to be able to change your Hazelcast configuration; you can do this
because it is not embedded within the application. You can set the config option with the
following command:
The path can be a regular one or a classpath reference with the prefix classpath:.
• If the above system property is not set, Hazelcast then checks whether there is a hazelcast.xml
file in the working directory.
• If none of the above works, Hazelcast loads the default configuration (hazelcast.xml) that comes
with your Hazelcast package.
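The lookup order listed above can be sketched in plain Java as follows. This is not Hazelcast's own resolution code; the class and method names are hypothetical, and the sketch covers only the three steps stated in the list.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Illustrates the configuration lookup order; names are hypothetical. */
final class ConfigLookupSketch {
    static final String DEFAULT = "default hazelcast.xml from the Hazelcast package";

    static String locate(Properties systemProperties, Path workingDir) {
        // 1. An explicit path (or classpath: reference) given via hazelcast.config wins.
        String explicit = systemProperties.getProperty("hazelcast.config");
        if (explicit != null) {
            return explicit;
        }
        // 2. Otherwise, look for hazelcast.xml in the working directory.
        Path local = workingDir.resolve("hazelcast.xml");
        if (Files.isRegularFile(local)) {
            return local.toString();
        }
        // 3. Otherwise, fall back to the default configuration.
        return DEFAULT;
    }

    public static void main(String[] args) {
        Path workingDir = Paths.get("").toAbsolutePath();
        System.out.println(locate(System.getProperties(), workingDir));
    }
}
```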
Before configuring Hazelcast, please try to work with the default configuration to see if it works for
you. This default configuration should be fine for most users. If not, consider
modifying the configuration to make it more suitable for your environment.
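import com.hazelcast.config.InvalidConfigurationException;
import com.hazelcast.config.matcher.MatchingPointConfigPatternMatcher;

class ExampleConfigPatternMatcher extends MatchingPointConfigPatternMatcher {
    @Override
    public String matches(Iterable<String> configPatterns, String itemName)
            throws InvalidConfigurationException {
        String matches = super.matches(configPatterns, itemName);
        // Fail instead of returning null when no configuration pattern matches.
        if (matches == null) {
            throw new InvalidConfigurationException("No config found for " + itemName);
        }
        return matches;
    }
}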
A single asterisk (*) can be placed anywhere inside the configuration name.
For instance, a map named com.hazelcast.test.mymap can be configured using one of the following
configurations:
<hazelcast>
...
<map name="com.hazelcast.test.*">
...
</map>
<!-- OR -->
<map name="com.hazel*">
...
</map>
<!-- OR -->
<map name="*.test.mymap">
...
</map>
<!-- OR -->
<map name="com.*test.mymap">
...
</map>
...
</hazelcast>
<hazelcast>
...
<queue name="*hazelcast.test.myqueue">
...
</queue>
<!-- OR -->
<queue name="com.hazelcast.*.myqueue">
...
</queue>
...
</hazelcast>
• You can use only a single asterisk as a wildcard for each data structure
configuration.
• If you have matching wildcard configurations for a data structure, the most
specific (longest) one is used when configuring it. Let’s say you have a map
named mymap.customer.name and you have map configurations mymap.* and
mymap.customer.*. Hazelcast uses mymap.customer.* to configure this map.
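The two rules above (a single asterisk per name, and the most specific match winning) can be illustrated with a small stand-alone sketch. This is not Hazelcast's matcher implementation; the class and method names are hypothetical, and "most specific" is approximated here as the longest combined prefix and suffix around the asterisk.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative single-asterisk wildcard matching; not Hazelcast's implementation. */
final class WildcardMatchSketch {

    /** Returns the number of literally matched characters, or -1 if there is no match. */
    static int matchingPoint(String pattern, String name) {
        int star = pattern.indexOf('*');
        if (star < 0) {
            return pattern.equals(name) ? name.length() : -1;
        }
        String prefix = pattern.substring(0, star);
        String suffix = pattern.substring(star + 1);
        if (name.length() < prefix.length() + suffix.length()) {
            return -1; // prefix and suffix would have to overlap
        }
        if (!name.startsWith(prefix) || !name.endsWith(suffix)) {
            return -1;
        }
        return prefix.length() + suffix.length();
    }

    /** Returns the most specific matching pattern, or null if none matches. */
    static String bestMatch(List<String> patterns, String name) {
        String best = null;
        int bestPoint = -1;
        for (String pattern : patterns) {
            int point = matchingPoint(pattern, name);
            if (point > bestPoint) {
                bestPoint = point;
                best = pattern;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("mymap.*", "mymap.customer.*");
        System.out.println(bestMatch(patterns, "mymap.customer.name"));
    }
}
```

For the map named mymap.customer.name, both patterns match, and the sketch selects mymap.customer.* because it matches more characters literally.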
For example, see the following command that sets a system property.
-Dcluster.name=dev
Let’s get the value of this system property in the declarative configuration of Hazelcast, as
shown below.
<hazelcast>
<cluster-name>${cluster.name}</cluster-name>
</hazelcast>
hazelcast:
  cluster-name: ${cluster.name}
This also applies to the declarative configuration of Hazelcast Java Client, as shown below.
<hazelcast-client>
<cluster-name>${cluster.name}</cluster-name>
</hazelcast-client>
hazelcast-client:
  cluster-name: ${cluster.name}
If you do not want to rely on the system properties, you can use the XmlConfigBuilder or
YamlConfigBuilder and explicitly set a Properties instance, as shown below.
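To show what replacing ${...} variables from an explicit Properties instance amounts to, here is a stand-alone sketch of the substitution itself. It does not use the Hazelcast builder classes; the class and method names are hypothetical, and leaving unknown variables untouched is a choice made for this sketch.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative ${name} substitution from a Properties instance; not Hazelcast code. */
final class VariableSubstitutionSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String text, Properties properties) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = properties.getProperty(matcher.group(1));
            // Unknown variables are left as they are (sketch choice).
            String replacement = value != null ? value : matcher.group(0);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("cluster.name", "dev");
        System.out.println(
                resolve("<cluster-name>${cluster.name}</cluster-name>", properties));
    }
}
```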
Variable replacers are configured within the element <config-replacers> under <hazelcast>, as
shown below.
<hazelcast>
...
<config-replacers fail-if-value-missing="false">
<replacer class-name="com.acme.MyReplacer">
<properties>
<property name="propName">value</property>
...
</properties>
</replacer>
<replacer class-name="example.AnotherReplacer"/>
</config-replacers>
...
</hazelcast>
hazelcast:
  ...
  config-replacers:
    fail-if-value-missing: false
    replacers:
      - class-name: com.acme.MyReplacer
        properties:
          propName: value
          ...
      - class-name: example.AnotherReplacer
  ...
As you can see, <config-replacers> is the parent element for your replacers, which are declared
using the <replacer> sub-elements. You can define multiple replacers under the <config-replacers>.
Here are the descriptions of elements and attributes used for the replacer configuration:
• <properties>: Contains names and values of the properties used to configure a replacer. Each
property is defined using the <property> sub-element. All of the properties are explained in the
upcoming sections.
The following replacer classes are provided by Hazelcast as example implementations of the
ConfigReplacer interface. Note that you can also implement your own replacers.
• EncryptionReplacer
• PropertyReplacer
There is also an ExecReplacer which runs an external command and uses its
standard output as the value for the variable. See its code sample.
4.10.1. EncryptionReplacer
This example EncryptionReplacer replaces encrypted variables with their plain form. The secret key for
encryption/decryption is generated from a password which can be a value in a file and/or
environment specific values, such as MAC address and actual user data.
• cipherAlgorithm: Cipher algorithm used for the encryption/decryption. Its default value is AES.
• keyLengthBits: Length of the secret key to be generated in bits. Its default value is 128 bits.
• passwordFile: Path to a file whose content should be used as a part of the encryption password.
When the property is not provided no file is used as a part of the password. Its default value is
null.
• passwordUserProperties: Specifies whether the current user properties (user.name and user.home)
should be used as a part of the encryption password. Its default value is true.
• saltLengthBytes: Length of a random password salt in bytes. Its default value is 8 bytes.
• secretKeyAlgorithm: Name of the secret-key algorithm to be associated with the generated secret
key. Its default value is AES.
• secretKeyFactoryAlgorithm: Algorithm used to generate a secret key from a password. Its default
value is PBKDF2WithHmacSHA256.
• securityProvider: Name of a Java Security Provider to be used for retrieving the configured
secret key factory and the cipher. Its default value is null.
Older Java versions may not support all the algorithms used as defaults. Please use
the property values supported by your Java version.
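To make the default property values more tangible, the following sketch derives a 128-bit AES key from a password with PBKDF2WithHmacSHA256 and an 8-byte random salt, then encrypts and decrypts a value using only the standard Java cryptography APIs. It is not the EncryptionReplacer implementation: the class name, the iteration count and the way the password is supplied are assumptions made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Password-based AES round trip using the defaults listed above (sketch only). */
final class PasswordCipherSketch {
    static final int KEY_LENGTH_BITS = 128;   // keyLengthBits default
    static final int SALT_LENGTH_BYTES = 8;   // saltLengthBytes default
    static final int ITERATIONS = 10_000;     // arbitrary value chosen for this sketch

    static SecretKeySpec deriveKey(char[] password, byte[] salt) {
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
            return new SecretKeySpec(factory.generateSecret(spec).getEncoded(), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key derivation failed", e);
        }
    }

    static byte[] apply(int cipherMode, SecretKeySpec key, byte[] input) {
        try {
            // "AES" selects the provider's default mode and padding.
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(cipherMode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cipher operation failed", e);
        }
    }

    /** Encrypts and then decrypts the text with a key derived from the password. */
    static String roundTrip(String plainText, String password) {
        byte[] salt = new byte[SALT_LENGTH_BYTES];
        new SecureRandom().nextBytes(salt);
        SecretKeySpec key = deriveKey(password.toCharArray(), salt);
        byte[] encrypted = apply(Cipher.ENCRYPT_MODE, key,
                plainText.getBytes(StandardCharsets.UTF_8));
        byte[] decrypted = apply(Cipher.DECRYPT_MODE, key, encrypted);
        return new String(decrypted, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("aCluster", "example-password"));
    }
}
```

Decryption needs the same password, salt and iteration count as encryption, so the salt and iteration count must be kept together with the encrypted value.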
As a usage example, let’s create a password file and generate the encrypted string out of this file as
instructed below:
3. Configure the replacer and put the encrypted variables into the configuration:
<hazelcast>
<config-replacers>
<replacer class-name="com.hazelcast.config.replacer.EncryptionReplacer">
<properties>
<property name="passwordFile">/opt/master-password</property>
<property name="passwordUserProperties">false</property>
</properties>
</replacer>
</config-replacers>
<cluster-name>$ENC{Gw45stIlan0=:531:yVN9/xQpJ/Ww3EYkAPvHdA==}</cluster-name>
</hazelcast>
4. Check if the decryption works:
As you can see in the logs, the correctly decrypted cluster name value ("aCluster") is used.
4.10.2. PropertyReplacer
The PropertyReplacer replaces variables by properties with the given name. Usually the system
properties are used, e.g., ${user.name}. There is no need to define it in the declarative configuration
files.
You can also provide your own replacer implementations. All replacers have to implement the
interface com.hazelcast.config.replacer.spi.ConfigReplacer. A simple snippet is shown below.
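The following stand-alone sketch shows the general shape of a custom replacer. So that it compiles without Hazelcast on the classpath, it declares a local SimpleReplacer interface as a hypothetical stand-in; it is assumed here that ConfigReplacer has the same three methods (init, getPrefix and getReplacement), so check the interface in your Hazelcast version before adapting it. The LOOKUP prefix and the class names are made up for illustration.

```java
import java.util.Properties;

/** Sketch of a custom replacer; SimpleReplacer is a hypothetical stand-in interface. */
final class LookupReplacerSketch {

    /** Assumed to mirror the methods of ConfigReplacer (an assumption of this sketch). */
    interface SimpleReplacer {
        void init(Properties properties);

        String getPrefix();

        String getReplacement(String maskedValue);
    }

    /** Resolves variables of the form $LOOKUP{key} from the replacer's own properties. */
    static final class LookupReplacer implements SimpleReplacer {
        private Properties properties = new Properties();

        @Override
        public void init(Properties properties) {
            // Receives the properties configured for this replacer.
            this.properties = properties;
        }

        @Override
        public String getPrefix() {
            return "LOOKUP";
        }

        @Override
        public String getReplacement(String maskedValue) {
            // Returns null when no value is known for the masked value.
            return properties.getProperty(maskedValue);
        }
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("clusterName", "dev");
        LookupReplacer replacer = new LookupReplacer();
        replacer.init(properties);
        System.out.println("$" + replacer.getPrefix() + "{clusterName} -> "
                + replacer.getReplacement("clusterName"));
    }
}
```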
5. Setting Up Clusters
This chapter describes Hazelcast clusters and the methods cluster members and native clients use
to form a Hazelcast cluster.
Please note that, after a cluster is formed, communication between cluster members is always via
TCP/IP, regardless of the discovery mechanism used.
See the Hazelcast IMDG Deployment and Operations Guide for advice on the best
discovery mechanism to use.
5.1.1. TCP
You can configure Hazelcast to be a full TCP/IP cluster. See the Discovering Members by TCP section
for configuration details.
5.1.2. Multicast
The multicast mechanism is not recommended for production since UDP is often blocked in production
environments and other discovery mechanisms are more reliable.
With this mechanism, Hazelcast allows cluster members to find each other using multicast
communication. See the Discovering Members by Multicast section.
Hazelcast supports EC2 auto-discovery. It is useful when you do not want to provide or you cannot
provide the list of possible IP addresses. This discovery feature is provided as a Hazelcast plugin.
See its documentation for information on configuring and using it.
Hazelcast supports discovering members in the GCP Compute Engine environment. You can easily
configure Hazelcast member discovery, WAN replication, and Hazelcast Client to work seamlessly
on the native GCP VM Instances. This discovery feature is provided as a Hazelcast plugin. See its
documentation for information on configuring and using it.
Hazelcast members and native clients support jclouds® for discovery. This mechanism allows
applications to be deployed in various cloud infrastructure ecosystems in an infrastructure-
agnostic way. This discovery feature is provided as a Hazelcast plugin. See its documentation for
information on configuring and using it.
Hazelcast offers a discovery strategy for your Hazelcast applications running on Azure. This
strategy provides all of your Hazelcast instances by returning the virtual machines within your
Azure resource group that are tagged with a specified value. This discovery feature is provided as a
Hazelcast plugin. See its documentation for information on configuring and using it.
This discovery mechanism provides a service based discovery strategy by using Apache Curator to
communicate with your Zookeeper server. You can use this plugin with Discovery SPI enabled
applications. This is provided as a Hazelcast plugin. See its documentation for information on
configuring and using it.
Consul is a highly available and distributed service discovery and key-value store designed with
support for the modern data center to make distributed systems and configuration easy. This
mechanism provides a Consul based discovery strategy for Hazelcast enabled applications and
enables Hazelcast members to dynamically discover one another via Consul. This discovery feature
is provided as a Hazelcast plugin. See its documentation for information on configuring and using
it.
This mechanism provides an etcd based discovery strategy for Hazelcast enabled applications. This
is an easy to configure plug-and-play Hazelcast discovery strategy that optionally registers each of
your Hazelcast members with etcd and enables Hazelcast members to dynamically discover one
another via etcd. This discovery feature is provided as a Hazelcast plugin. See its documentation for
information on configuring and using it.
Using a clickable Hazelcast Tile for Pivotal Cloud Foundry (PCF), you can deploy your Hazelcast
cluster on PCF. This feature is provided as a Hazelcast plugin. See its documentation on how to
install, configure and use the plugin Hazelcast for PCF.
Hazelcast can run inside OpenShift benefiting from its cluster management software Kubernetes
for discovery of members. Using Hazelcast Docker images, templates and default configuration
files, you can deploy Hazelcast IMDG, Hazelcast IMDG Enterprise and Management Center onto
OpenShift. See the following related documentation:
• Management Center
See also the Hazelcast for OpenShift guide, which presents how to set up the local OpenShift
environment, start a Hazelcast cluster, configure the Management Center and finally run a sample
client application.
Eureka is a REST based service that is primarily used in the AWS cloud for locating services for the
purpose of load balancing and failover of middle-tier servers. Hazelcast supports Eureka V1
discovery; Hazelcast members within EC2 Virtual Private Cloud can discover each other using this
mechanism. This discovery feature is provided as a Hazelcast plugin. See its documentation.
5.1.13. Heroku Cloud Discovery
Heroku is a platform as a service (PaaS) with which you can build, run and operate applications
entirely in the cloud. It is a cloud platform based on a managed container system, with integrated
data services and a powerful ecosystem. Hazelcast offers a discovery plugin that looks for IP
addresses of other members by resolving service names against the Heroku DNS Discovery in
Heroku Private Spaces. This discovery feature is provided as a Hazelcast plugin. See its
documentation.
Kubernetes is an open source system for automating deployment, scaling and management of
containerized applications. Hazelcast provides a Kubernetes discovery mechanism that looks for IP
addresses of other members by resolving the requests against a Kubernetes Service Discovery
system. It supports two different options of resolving against the discovery registry: (i) a request to
the REST API, (ii) DNS Lookup against a given DNS service name. This discovery feature is provided
as a Hazelcast plugin. See its documentation for information on configuring and using it.
To configure Hazelcast as a full TCP/IP cluster, set the following configuration elements. See
the tcp-ip element section for the full descriptions of the TCP/IP discovery configuration elements.
<hazelcast>
    ...
    <network>
        <join>
            <multicast enabled="false">
            </multicast>
            <tcp-ip enabled="true">
                <member>machine1</member>
                <member>machine2</member>
                <member>machine3:5799</member>
                <member>192.168.1.0-7</member>
                <member>192.168.1.21</member>
            </tcp-ip>
        </join>
    </network>
    ...
</hazelcast>
As shown above, you can provide IP addresses or host names for member elements. You can also give
a range of IP addresses, such as 192.168.1.0-7.
Instead of providing members line-by-line as shown above, you also have the option to use the
members element and write comma-separated IP addresses, as shown below.
<members>192.168.1.0-7,192.168.1.21</members>
If you do not provide ports for the members, Hazelcast automatically tries the ports 5701, 5702 and
so on.
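The range notation used in the member and members elements can be read as in the following minimal sketch. It is not Hazelcast's own parser; the class name MemberRangeSketch is hypothetical, and the sketch assumes that a range appears only in the last octet of an IPv4 address, as in 192.168.1.0-7.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: expands member entries such as "192.168.1.0-7".
class MemberRangeSketch {

    // Expands "192.168.1.0-7" into 192.168.1.0 ... 192.168.1.7.
    // Entries without a range after the last dot are returned unchanged.
    static List<String> expand(String member) {
        List<String> addresses = new ArrayList<>();
        int lastDot = member.lastIndexOf('.');
        int dash = member.indexOf('-', lastDot + 1);
        if (lastDot < 0 || dash < 0) {
            addresses.add(member);
            return addresses;
        }
        String prefix = member.substring(0, lastDot + 1);
        int start = Integer.parseInt(member.substring(lastDot + 1, dash));
        int end = Integer.parseInt(member.substring(dash + 1));
        for (int octet = start; octet <= end; octet++) {
            addresses.add(prefix + octet);
        }
        return addresses;
    }

    // Splits a comma-separated <members> value and expands every entry.
    static List<String> expandAll(String members) {
        List<String> addresses = new ArrayList<>();
        for (String member : members.split(",")) {
            addresses.addAll(expand(member.trim()));
        }
        return addresses;
    }
}
```

With this reading, the members value 192.168.1.0-7,192.168.1.21 stands for nine candidate addresses.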
By default, Hazelcast binds to all local network interfaces to accept incoming traffic. You can
change this behavior using the system property hazelcast.socket.bind.any. If you set this property
to false, Hazelcast uses the interfaces specified in the interfaces element (see the Interfaces
Configuration section). If no interfaces are provided, then it tries to resolve one interface to bind
from the member elements.
To use multicast auto-discovery, set the following configuration elements. See the
multicast element section for the full description of the multicast discovery configuration elements.
• Set the enabled attribute of both tcp-ip and aws elements to "false".
The following is an example declarative configuration.
<hazelcast>
    ...
    <network>
        <join>
            <multicast enabled="true">
                <multicast-group>224.2.2.3</multicast-group>
                <multicast-port>54327</multicast-port>
                <multicast-time-to-live>32</multicast-time-to-live>
                <multicast-timeout-seconds>2</multicast-timeout-seconds>
                <trusted-interfaces>
                    <interface>192.168.1.102</interface>
                </trusted-interfaces>
            </multicast>
            <tcp-ip enabled="false">
            </tcp-ip>
            <aws enabled="false">
            </aws>
        </join>
    </network>
    ...
</hazelcast>
To configure your cluster to use the multicast discovery plugin, follow these steps:
• Disable the multicast and TCP/IP join mechanisms. To do this, set the enabled attributes of the
multicast and tcp-ip elements to false in your hazelcast.xml configuration file.
• Add multicast discovery strategy configuration to your XML file, i.e., <discovery-strategies>
element.
<hazelcast>
    ...
    <properties>
        <property name="hazelcast.discovery.enabled">true</property>
    </properties>
    <network>
        <join>
            <multicast enabled="false">
            </multicast>
            <tcp-ip enabled="false">
            </tcp-ip>
            <discovery-strategies>
                <discovery-strategy
                        class="com.hazelcast.spi.discovery.multicast.MulticastDiscoveryStrategy"
                        enabled="true">
                    <properties>
                        <property name="group">224.2.2.3</property>
                        <property name="port">54327</property>
                    </properties>
                </discovery-strategy>
            </discovery-strategies>
        </join>
    </network>
    ...
</hazelcast>
The following are the multicast discovery plugin configuration properties with their descriptions:
• group: String value that is used to set the multicast group, so that you can isolate your clusters.
You can separate and group your clusters in a simple way by specifying cluster names. Example
groupings can be by development, production, test, app, etc. The following is an example
declarative configuration.
<hazelcast>
    <cluster-name>production</cluster-name>
</hazelcast>
You can also define the cluster configuration programmatically. A JVM can host multiple Hazelcast
instances. Each Hazelcast instance can participate in only one group. Each Hazelcast instance
joins only its own group and does not interact with other groups. The following code example creates
three separate Hazelcast instances: h1 belongs to the production cluster, while h2 and h3 belong to
the development cluster.
If you have a Hazelcast release older than 3.8.2, you also need to provide a group password along
with the group name. The following is a configuration example with the password element:
<hazelcast>
    <group>
        <name>production</name>
        <password>prod-pass</password>
    </group>
</hazelcast>
Starting with 3.8.2, members no longer perform a password check during the
cluster join process. Starting with 3.11, members no longer perform a password
check when a client connects to the cluster.
members. For this purpose Hazelcast offers a distributed dynamic class loader.
The following briefly describes how the User Code Deployment feature works:
2. It first checks locally available classes, i.e. the member’s classpath. If the class is found, it is
used.
3. Then it checks the cache of classes loaded from remote members or clients (if caching is enabled
on your local member, see the Configuring User Code Deployment section). If your class is found
there, it is used.
4. Finally, the dynamic class loader checks configured remote members, one by one. If some
member returns the class, it will be used. It can also put this class into the local class cache as
mentioned in the previous step.
6. The dynamic class loader is released after the operation is handled. The next operation will load
the class from the cache or re-fetch it.
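The lookup order in the steps above (local classpath, then the class cache, then the configured remote members) can be modelled with the following minimal sketch. It only illustrates the order of the checks and does not load any classes; ClassLookupSketch and its fields are hypothetical names, and class definitions are represented by plain strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: models the order in which class sources are consulted.
class ClassLookupSketch {

    private final Set<String> localClasspath;
    private final Map<String, String> cache = new HashMap<>();
    private final List<Map<String, String>> remoteMembers;
    private final boolean cachingEnabled;

    ClassLookupSketch(Set<String> localClasspath,
                      List<Map<String, String>> remoteMembers,
                      boolean cachingEnabled) {
        this.localClasspath = localClasspath;
        this.remoteMembers = remoteMembers;
        this.cachingEnabled = cachingEnabled;
    }

    // Returns where the class was found: "local", "cache", "remote" or "not found".
    String locate(String className) {
        if (localClasspath.contains(className)) {
            return "local";
        }
        if (cache.containsKey(className)) {
            return "cache";
        }
        // Ask the configured remote members one by one.
        for (Map<String, String> member : remoteMembers) {
            String definition = member.get(className);
            if (definition != null) {
                if (cachingEnabled) {
                    cache.put(className, definition);
                }
                return "remote";
            }
        }
        return "not found";
    }
}
```

With caching enabled, the first request for a remotely available class returns "remote" and a later request for the same class returns "cache".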
The User Code Deployment feature is not enabled by default. You can control local caching of the classes
loaded from other members, control classes to be provided to other members and create blacklists
and whitelists of classes and packages.
Declarative Configuration:
<hazelcast>
    ...
    <user-code-deployment enabled="true">
        <class-cache-mode>ETERNAL</class-cache-mode>
        <provider-mode>LOCAL_AND_CACHED_CLASSES</provider-mode>
        <blacklist-prefixes>com.foo,com.bar</blacklist-prefixes>
        <whitelist-prefixes>com.bar.MyClass</whitelist-prefixes>
        <provider-filter>HAS_ATTRIBUTE:lite</provider-filter>
    </user-code-deployment>
    ...
</hazelcast>
Programmatic Configuration:
Config config = new Config();
UserCodeDeploymentConfig distCLConfig = config.getUserCodeDeploymentConfig();
distCLConfig.setEnabled( true )
    .setClassCacheMode( UserCodeDeploymentConfig.ClassCacheMode.ETERNAL )
    .setProviderMode( UserCodeDeploymentConfig.ProviderMode.LOCAL_AND_CACHED_CLASSES )
    .setBlacklistedPrefixes( "com.foo,com.bar" )
    .setWhitelistedPrefixes( "com.bar.MyClass" )
    .setProviderFilter( "HAS_ATTRIBUTE:lite" );
• enabled: Specifies whether dynamic class loading is enabled or not. Its default value is "false"
and it is a mandatory attribute. If the feature is disabled, the member never loads classes from
other members or clients.
• <class-cache-mode>: Controls the local caching behavior for the classes loaded from remote
members (classes loaded from clients are always cached). Available values are:
◦ ETERNAL: Cache the loaded classes locally. This is the default value and suitable when you
load long-living objects, such as domain objects stored in a map.
◦ OFF: Do not cache the loaded classes locally. It is suitable for loading runnables, callables,
entry processors, etc.
• <provider-mode>: Controls which classes are served to other cluster members. Available values
are:
◦ LOCAL_AND_CACHED_CLASSES: Serve classes loaded from both local classpath and from other
members. This is the default value.
◦ LOCAL_CLASSES_ONLY: Serve classes from the local classpath only. Classes loaded from other
members are used locally, but they are not served to other members.
• <blacklist-prefixes>: Comma separated class/package name prefixes that the member will
never attempt to load from other members and that the client won’t be allowed to upload. For
example, if you set it to "com.foo", remote loading of all classes from the "com.foo" package is
prevented, including the classes from all its sub-packages. If you set it to "com.foo.Class", then
"Class" and all classes starting with "Class" in the "com.foo" package are blacklisted. There are
built-in prefixes which are always blacklisted. These are as follows:
◦ javax.
◦ java.
◦ sun.
◦ com.hazelcast.
• <whitelist-prefixes>: Comma-separated name prefixes of the classes/packages from which
classes are allowed to be loaded. It lets you quickly configure remote loading only for classes
from selected packages. It can be used together with blacklisting. For example, you can whitelist
the prefix "com.foo" and blacklist the prefix "com.foo.secret". If the list is empty, all classes are
allowed.
• <provider-filter>: Filter to constrain members that can be used for a class loading request
when a class is not available locally. The value is in the format "HAS_ATTRIBUTE:foo". When it is
set to "HAS_ATTRIBUTE:foo", the class loading request is only sent to the members which have
"foo" as a member attribute. Setting this to null allows loading of classes from all members. See
an example in the next section.
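The interplay of the blacklist and whitelist prefixes described above can be sketched as follows. This is an illustration, not Hazelcast's filter implementation: the class name PrefixFilterSketch is hypothetical, and the sketch assumes plain String.startsWith matching with the blacklist taking precedence over the whitelist.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: decides whether a class name passes the prefix filters.
class PrefixFilterSketch {

    // Prefixes that the text above lists as always blacklisted.
    private static final List<String> BUILT_IN_BLACKLIST =
            Arrays.asList("javax.", "java.", "sun.", "com.hazelcast.");

    private final List<String> blacklist;
    private final List<String> whitelist;

    PrefixFilterSketch(List<String> blacklist, List<String> whitelist) {
        this.blacklist = blacklist;
        this.whitelist = whitelist;
    }

    boolean isRemoteLoadingAllowed(String className) {
        // A blacklisted prefix always wins (assumption of this sketch).
        if (startsWithAny(className, BUILT_IN_BLACKLIST) || startsWithAny(className, blacklist)) {
            return false;
        }
        // An empty whitelist allows every class that is not blacklisted.
        return whitelist.isEmpty() || startsWithAny(className, whitelist);
    }

    private static boolean startsWithAny(String className, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the whitelist prefix "com.foo" and the blacklist prefix "com.foo.secret", a class such as com.foo.Bar passes the filter while com.foo.secret.Key does not.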
As described above, the configuration element provider-filter is used to limit members that can be
used to load classes. The attribute required in the provider-filter must be set as a member
attribute on the members from which the classes are to be loaded. See the following examples
provided as programmatic configurations.
The example configuration below allows the Hazelcast member to load classes only from members
with the class-provider attribute set. It prevents the member from asking any other member to
provide a locally unavailable class:
The example configuration below sets the attribute class-provider for a member. Therefore the
above member will be able to load classes from this member:
1. You have objects that run on the cluster via the clients such as Runnable, Callable and
EntryProcessor.
2. You have new user domain objects which need to be deployed into the cluster.
When this feature is enabled on the client, the client will deploy the classes to the members when
connecting. This way, when a client adds a new class, the members do not require a restart to
include it in their classpath.
You can also use the client permission policy to specify which clients are permitted to use User Code
Deployment. See the Permissions section.
5.7.1. Configuring Client User Code Deployment
The Client User Code Deployment feature is not enabled by default. You can configure this feature
declaratively or programmatically. The following are example configuration snippets:
Declarative Configuration:
In your hazelcast-client.xml:
<hazelcast>
    ...
    <user-code-deployment enabled="true">
        <jarPaths>
            <jarPath>/User/example/example.jar</jarPath>
            <jarPath>example.jar</jarPath> <!--from class path -->
            <jarPath>https://com.example.com/example.jar</jarPath>
            <jarPath>file://Users/example/example.jar</jarPath>
        </jarPaths>
        <classNames>
            <!-- for classes available in client's class path -->
            <className>example.ClassName</className>
            <className>example.ClassName2</className>
        </classNames>
    </user-code-deployment>
    ...
</hazelcast>
Programmatic Configuration:
ClientConfig clientConfig = new ClientConfig();
ClientUserCodeDeploymentConfig clientUserCodeDeploymentConfig =
    new ClientUserCodeDeploymentConfig();
clientUserCodeDeploymentConfig.addJar("/User/example/example.jar");
clientUserCodeDeploymentConfig.addJar("https://com.example.com/example.jar");
clientUserCodeDeploymentConfig.addClass("example.ClassName");
clientUserCodeDeploymentConfig.addClass("example.ClassName2");
clientUserCodeDeploymentConfig.setEnabled(true);
clientConfig.setUserCodeDeploymentConfig(clientUserCodeDeploymentConfig);
Important to Know
The members have to be configured in a specific way for the feature to work correctly:
• User Code Deployment must be enabled on the members. Otherwise, the classes from the client
will be ignored. Also blacklisted and non-whitelisted classes will be ignored.
members.
The client uploads the classes only to one member. If the members don’t load classes from each
other, other members won’t see the class.
Here’s a programmatic configuration of the members that will work with client user code
deployment:
See the Member User Code Deployment section for more information on enabling it on the member
side and the configuration properties.
Classes deployed from clients are always cached on the members, no matter whether ETERNAL or OFF
is configured on the members.
Performance Considerations
The client always uploads all added classes and jars to one of the members, whether it has them or
not. So avoid adding large jar files for each connection - if configured properly, the member will
have the class the next time the client connects.
If the client uploads a class and the member already has that class, an exception is thrown if the
byte code is different. If the byte code is the same, the upload is ignored. Therefore, classes
uploaded from the client can’t be updated with a new version.
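The rule above (identical byte code is ignored, different byte code is rejected) can be modelled with the following minimal sketch. It does not reflect Hazelcast's internal classes; ClassUploadSketch is a hypothetical name and the exception type is chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: models how a member treats repeated uploads of a class.
class ClassUploadSketch {

    private final Map<String, byte[]> deployed = new HashMap<>();

    // Returns true if the class was newly stored and false if an identical
    // copy was already present. Different byte code for a known class is rejected.
    boolean upload(String className, byte[] byteCode) {
        byte[] existing = deployed.get(className);
        if (existing == null) {
            deployed.put(className, byteCode.clone());
            return true;
        }
        if (Arrays.equals(existing, byteCode)) {
            return false;
        }
        throw new IllegalStateException(
                "Class " + className + " is already deployed with different byte code");
    }
}
```

This is why a changed class needs a new name, or a restart of the members, before it can be deployed again.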
When you want to use a Hazelcast feature in a non-Java client, you need to make sure that the
Hazelcast member recognizes it. For this, you can use the /user-lib directory that comes with the
Hazelcast package and deploy your own library to the member. Let’s say you use Hazelcast Node.js
client and want to use an entry processor. This processor should be IdentifiedDataSerializable or
Portable in the Node.js client. You need to implement the Java equivalents of the processor and its
factory on the member side, and put these compiled class or JAR files into the /user-lib directory.
Then you can run the start.sh script which adds them to the classpath.
The following is example code that can serve as the Java equivalent of the entry processor in the
Node.js client:
public class IdentifiedEntryProcessor
        implements EntryProcessor<String, String, String>, IdentifiedDataSerializable {

    static final int CLASS_ID = 1;
    private String value;

    public IdentifiedEntryProcessor() {
    }

    @Override
    public int getFactoryId() {
        return IdentifiedFactory.FACTORY_ID;
    }

    @Override
    public int getClassId() {
        return CLASS_ID;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeUTF(value);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        value = in.readUTF();
    }

    @Override
    public String process(Map.Entry<String, String> entry) {
        entry.setValue(value);
        return value;
    }
}
<hazelcast>
    <serialization>
        <data-serializable-factories>
            <data-serializable-factory factory-id="5">
                IdentifiedFactory
            </data-serializable-factory>
        </data-serializable-factories>
    </serialization>
</hazelcast>
Then, you can start your Hazelcast member by using the start scripts (start.sh or start.bat) in the
/bin directory. The start scripts automatically add your class and JAR files to the classpath.
The Hazelcast member that owns the primary replica of a partition is called the partition owner.
Other replicas are called backups. Based on the configuration, a key object can be kept in multiple
replicas of a partition. A member can hold at most one replica of a partition (ownership or backup).
By default, Hazelcast distributes partition replicas randomly and equally among the cluster
members, assuming all members in the cluster are identical. But what if some members share the
same JVM or physical machine or chassis and you want backups of these members to be assigned to
members in another machine or chassis? What if processing or memory capacities of some
members are different and you do not want an equal number of partitions to be assigned to all
members?
To deal with such scenarios, you can group members in the same JVM (or physical machine) or
members located in the same chassis. Or you can group members to create identical capacity. We
call these groups partition groups. Partitions are assigned to those partition groups instead of
individual members. Backup replicas of a partition which is owned by a partition group are located
in other partition groups.
When you enable partition grouping, Hazelcast presents the following choices for you to configure
partition groups.
HOST_AWARE
You can group members automatically using the IP addresses of members, so members sharing the
same network interface are grouped together. All members on the same host (IP address or domain
name) form a single partition group. This helps to avoid data loss when a physical server crashes,
because multiple replicas of the same partition are not stored on the same host. But if there are
multiple network interfaces or domain names per physical machine, this assumption is invalid.
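The idea behind HOST_AWARE grouping can be illustrated with a small sketch that groups member addresses by host. It is not Hazelcast's grouping code; HostAwareGroupingSketch is a hypothetical name, and member addresses are assumed to be given as host:port strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: groups "host:port" member addresses by host.
class HostAwareGroupingSketch {

    static Map<String, List<String>> groupByHost(List<String> memberAddresses) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String address : memberAddresses) {
            int colon = address.lastIndexOf(':');
            // Addresses without a port are treated as a bare host.
            String host = (colon < 0) ? address : address.substring(0, colon);
            groups.computeIfAbsent(host, h -> new ArrayList<>()).add(address);
        }
        return groups;
    }
}
```

Two members on 10.0.0.1 and one member on 10.0.0.2 therefore form two partition groups, so a replica and its backup do not end up on the same host.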
The following are declarative and programmatic configuration snippets that show how to enable
HOST_AWARE grouping:
CUSTOM
You can do custom grouping using Hazelcast’s interface matching configuration. This way, you can
add different and multiple interfaces to a group. You can also use wildcards in the interface
addresses. For example, the users can create rack-aware or data warehouse partition groups using
custom partition grouping.
The following are declarative and programmatic configuration examples that show how to enable
and use CUSTOM grouping:
<hazelcast>
    ...
    <partition-group enabled="true" group-type="CUSTOM">
        <member-group>
            <interface>10.10.0.*</interface>
            <interface>10.10.3.*</interface>
            <interface>10.10.5.*</interface>
        </member-group>
        <member-group>
            <interface>10.10.10.10-100</interface>
            <interface>10.10.1.*</interface>
            <interface>10.10.2.*</interface>
        </member-group>
    </partition-group>
    ...
</hazelcast>
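Config config = new Config();
PartitionGroupConfig partitionGroupConfig = config.getPartitionGroupConfig();
partitionGroupConfig.setEnabled( true )
    .setGroupType( PartitionGroupConfig.MemberGroupType.CUSTOM );

// The member groups mirror the declarative example above.
MemberGroupConfig memberGroupConfig = new MemberGroupConfig();
memberGroupConfig.addInterface( "10.10.0.*" )
    .addInterface( "10.10.3.*" ).addInterface( "10.10.5.*" );
MemberGroupConfig memberGroupConfig2 = new MemberGroupConfig();
memberGroupConfig2.addInterface( "10.10.10.10-100" )
    .addInterface( "10.10.1.*" ).addInterface( "10.10.2.*" );

partitionGroupConfig.addMemberGroupConfig( memberGroupConfig );
partitionGroupConfig.addMemberGroupConfig( memberGroupConfig2 );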
While your cluster was forming, if you configured your members to discover each
other by their IP addresses, you should use the IP addresses for the <interface>
element. If your members discovered each other by their host names, you should
use host names.
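The wildcard and range notation used in the interface elements of the CUSTOM example can be read as in the following minimal sketch. It is not Hazelcast's interface matcher; InterfacePatternSketch is a hypothetical name, and the sketch handles only IPv4 addresses whose octets are a number, a * wildcard or a low-high range.

```java
// Illustrative only: matches an IPv4 address against an interface pattern.
class InterfacePatternSketch {

    static boolean matches(String pattern, String address) {
        String[] patternParts = pattern.split("\\.");
        String[] addressParts = address.split("\\.");
        if (patternParts.length != 4 || addressParts.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!octetMatches(patternParts[i], Integer.parseInt(addressParts[i]))) {
                return false;
            }
        }
        return true;
    }

    private static boolean octetMatches(String patternPart, int octet) {
        if (patternPart.equals("*")) {
            return true;
        }
        int dash = patternPart.indexOf('-');
        if (dash >= 0) {
            // Range such as 10-100, inclusive on both ends.
            int low = Integer.parseInt(patternPart.substring(0, dash));
            int high = Integer.parseInt(patternPart.substring(dash + 1));
            return low <= octet && octet <= high;
        }
        return Integer.parseInt(patternPart) == octet;
    }
}
```

Under this reading, a member at 10.10.0.15 belongs to the first member group of the example and a member at 10.10.10.50 belongs to the second.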
PER_MEMBER
You can give every member its own group. Each member is a group of its own and primary and
backup partitions are distributed randomly (not on the same physical member). This gives the least
amount of protection and is the default configuration for a Hazelcast cluster. This grouping type
provides good redundancy when Hazelcast members are on separate hosts. However, if multiple
instances run on the same host, this type is not a good option.
The following are declarative and programmatic configuration snippets that show how to enable
PER_MEMBER grouping:
ZONE_AWARE
You can use ZONE_AWARE configuration with Hazelcast Kubernetes, Hazelcast AWS, Hazelcast GCP,
Hazelcast jclouds or Hazelcast Azure Discovery Service plugins.
As discovery services, these plugins put zone information to the Hazelcast member attributes map
during the discovery process. When ZONE_AWARE is configured as partition group type, Hazelcast
creates the partition groups with respect to member attributes map entries that include zone
information. That means backups are created in the other zones and each zone is accepted as one
partition group.
The following is the list of supported attributes that the Discovery Service plugins set during
Hazelcast member start-up:
The following are declarative and programmatic configuration snippets that show how to enable
ZONE_AWARE grouping:
SPI
You can provide your own partition group implementation using the SPI configuration. To create
your partition group implementation, you need to first extend the DiscoveryStrategy class of the
discovery service plugin, override the method public PartitionGroupStrategy
getPartitionGroupStrategy() and return the PartitionGroupStrategy configuration in that
overridden method.
The following code covers the implementation steps mentioned in the above paragraph:
public class CustomDiscovery extends AbstractDiscoveryStrategy {

    @Override
    public Iterable<DiscoveryNode> discoverNodes() {
        Iterable<DiscoveryNode> iterable = //your implementation
        return iterable;
    }

    @Override
    public PartitionGroupStrategy getPartitionGroupStrategy() {
        return new CustomPartitionGroupStrategy();
    }
}
To use the built-in adapters, set the hazelcast.logging.type property to one of the predefined types
below:
• log4j: Log4j
• log4j2: Log4j2
• slf4j: Slf4j
If you choose to use log4j, log4j2, or slf4j, you should include the proper
dependencies in the classpath.
Declarative Configuration:
<hazelcast>
    ...
    <properties>
        <property name="hazelcast.logging.type">log4j</property>
    </properties>
    ...
</hazelcast>
Programmatic Configuration
System Property
If the provided logging mechanisms are not satisfactory, you can implement your own using the
custom logging feature. To use it, implement the com.hazelcast.logging.LoggerFactory and
com.hazelcast.logging.ILogger interfaces and set the system property hazelcast.logging.class to
your custom LoggerFactory class name.
-Dhazelcast.logging.class=foo.bar.MyLoggingFactory
You can also listen to logging events generated by Hazelcast runtime by registering LogListeners to
LoggingService.
Through the LoggingService, you can get the currently used ILogger implementation and log your
own messages too.
If you do not configure logging on the command line, be careful about when Hazelcast
classes are loaded. They may default to JDK logging before the newly configured logging
type is read. Once a logging mechanism is selected, it does not change.
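If the logging type is set from code rather than with a -D argument, one way to respect the note above is to set the system property at the very start of the application, before any Hazelcast class is used. The following is a minimal sketch under that assumption; LoggingTypeSketch is a hypothetical helper, not part of Hazelcast.

```java
// Illustrative only: sets the logging type before Hazelcast is started.
class LoggingTypeSketch {

    // Call this first in main(), before creating any Hazelcast instance.
    static void selectLoggingType(String type) {
        System.setProperty("hazelcast.logging.type", type);
    }
}
```

Calling selectLoggingType("log4j2") has the same effect on the system properties as starting the JVM with -Dhazelcast.logging.type=log4j2.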
Below are example configurations for Log4j2 and Log4j. Note that Hazelcast does not recommend
any specific logging library; these examples are provided only to demonstrate how to configure the
logging. You can use your custom logging as explained above.
Specify the logging type as Log4j2 and a separate logging configuration file as shown below.
-Dhazelcast.logging.type=log4j2
-Dlog4j.configurationFile=/path/to/properties/log4j2.properties
<hazelcast>
    ...
    <properties>
        <property name="hazelcast.logging.type">log4j2</property>
        <property name="log4j2.configuration">/path/to/properties/log4j2.properties</property>
    </properties>
    ...
</hazelcast>
rootLogger=file
rootLogger.level=info
property.filepath=/path/to/log/files
property.filename=hazelcast
appender.file.type=RollingFile
appender.file.name=RollingFile
appender.file.fileName=${filepath}/${filename}.log
appender.file.filePattern=${filepath}/${filename}-%d{yyyy-MM-dd}-%i.log.gz
appender.file.layout.type=PatternLayout
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
appender.file.policies.type=Policies
appender.file.policies.time.type=TimeBasedTriggeringPolicy
appender.file.policies.time.interval=1
appender.file.policies.time.modulate=true
appender.file.policies.size.type=SizeBasedTriggeringPolicy
appender.file.policies.size.size=50MB
appender.file.strategy.type=DefaultRolloverStrategy
appender.file.strategy.max=100
rootLogger.appenderRefs=file
rootLogger.appenderRef.file.ref=RollingFile
#log4j.logger.com.hazelcast=debug
#log4j.logger.com.hazelcast.cluster=debug
#log4j.logger.com.hazelcast.partition=debug
#log4j.logger.com.hazelcast.partition.InternalPartitionService=debug
#log4j.logger.com.hazelcast.nio=debug
#log4j.logger.com.hazelcast.hibernate=debug
To enable the debug logs for all Hazelcast operations uncomment the below line in the above
configuration file:
log4j.logger.com.hazelcast=debug
If you do not need detailed logs, the default settings are enough. Using the Hazelcast-specific lines in
the above configuration file, you can choose to see specific logs (cluster, partition, hibernate, etc.) at
the desired levels.
The Log4j configuration is similar to that of Log4j2. Below is the JVM argument way of specifying the
logging type and configuration file:
-Dhazelcast.logging.type=log4j
-Dlog4j.configuration=file:/path/to/properties/log4j.properties
log4j.rootLogger=INFO,file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/path/to/log/files/hazelcast.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p [%c{1}] - %m%n
log4j.appender.file.maxFileSize=50MB
log4j.appender.file.maxBackupIndex=100
log4j.appender.file.threshold=DEBUG
#log4j.logger.com.hazelcast=debug
#log4j.logger.com.hazelcast.cluster=debug
#log4j.logger.com.hazelcast.partition=debug
#log4j.logger.com.hazelcast.partition.InternalPartitionService=debug
#log4j.logger.com.hazelcast.nio=debug
#log4j.logger.com.hazelcast.hibernate=debug
public-address overrides the public address of a member. By default, a member selects its socket
address as its public address. But behind a network address translation (NAT), two endpoints
(members) may not be able to see/access each other. If both members set their public addresses to
their defined addresses on NAT, they can communicate with each other. In this case,
their public addresses are not addresses of a local network interface but virtual addresses defined
by NAT. Setting this element is optional; it is useful when you have a private cloud. Note that the
value for this element should be given in the format host IP address:port number. See the following
examples.
Declarative Configuration:
<hazelcast>
    ...
    <network>
        <public-address>11.22.33.44:5555</public-address>
    </network>
    ...
</hazelcast>
Programmatic Configuration:
5.10.2. Port
You can specify the ports that Hazelcast uses to communicate between cluster members. Its default
value is 5701. The following are example configurations.
Declarative Configuration:
<hazelcast>
    ...
    <network>
        <port port-count="20" auto-increment="true">5701</port>
    </network>
    ...
</hazelcast>
Programmatic Configuration:
According to the above example, Hazelcast tries to find free ports between 5701 and 5720.
• port-count: By default, Hazelcast tries 100 ports to bind. This means that if you set the value of
port to 5701, Hazelcast tries to find a free port between 5701 and 5800 as members join the
cluster. You may want to change the port count, for example when you run a large number of
instances on a single machine or when you want only a few ports to be assigned. Use the
port-count parameter for this purpose; its default value is 100.
• auto-increment: In some cases you may want to choose to use only one port. In that case, you can
disable the auto-increment feature of port by setting auto-increment to false. The port-count
attribute is not used when auto-increment feature is disabled.
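The relationship between port, port-count and auto-increment can be summarized with the following minimal sketch, which only computes the candidate ports and does not open any sockets. PortCandidatesSketch is a hypothetical name used for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the ports that would be tried for a given configuration.
class PortCandidatesSketch {

    static List<Integer> candidatePorts(int port, int portCount, boolean autoIncrement) {
        List<Integer> ports = new ArrayList<>();
        // Without auto-increment only the configured port is tried,
        // and port-count is not used.
        int attempts = autoIncrement ? portCount : 1;
        for (int i = 0; i < attempts; i++) {
            ports.add(port + i);
        }
        return ports;
    }
}
```

For the declarative example above (port 5701, port-count 20, auto-increment enabled), the candidates are the 20 ports from 5701 to 5720.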
By default, Hazelcast lets the system pick an ephemeral port during the socket bind operation. But
security policies/firewalls may require you to restrict the outbound ports used by
Hazelcast-enabled applications. To fulfill this requirement, you can configure Hazelcast to use only
defined outbound ports. The following are example configurations.
Declarative Configuration:
<hazelcast>
    ...
    <network>
        <outbound-ports>
            <!-- ports between 33000 and 35000 -->
            <ports>33000-35000</ports>
            <!-- comma separated ports -->
            <ports>37000,37001,37002,37003</ports>
            <ports>38000,38500-38600</ports>
        </outbound-ports>
    </network>
    ...
</hazelcast>
Programmatic Configuration:
...
NetworkConfig networkConfig = config.getNetworkConfig();
// ports between 35000 and 35100
networkConfig.addOutboundPortDefinition("35000-35100");
// comma separated ports
networkConfig.addOutboundPortDefinition("36001, 36002, 36003");
networkConfig.addOutboundPort(37000);
networkConfig.addOutboundPort(37001);
...
As shown in the programmatic configuration, you use the method addOutboundPort to add only one
port. If you need to add a group of ports, then use the method addOutboundPortDefinition.
In the declarative configuration, the element ports can be used for both single and multiple port
definitions. When you set this element to 0 or *, your operating system (not Hazelcast) selects a free
port from the ephemeral range.
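The port definitions shown above (single ports, comma-separated lists and ranges) can be read as in the following minimal sketch. It is not Hazelcast's parser; OutboundPortsSketch is a hypothetical name, and the sketch represents the 0 and * values, which leave the choice to the operating system, as an empty result.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: expands an outbound port definition into individual ports.
class OutboundPortsSketch {

    static Set<Integer> parse(String definition) {
        Set<Integer> ports = new TreeSet<>();
        for (String token : definition.split(",")) {
            String part = token.trim();
            if (part.equals("*") || part.equals("0")) {
                // No restriction: the operating system picks an ephemeral port.
                continue;
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                ports.add(Integer.parseInt(part));
            } else {
                int start = Integer.parseInt(part.substring(0, dash).trim());
                int end = Integer.parseInt(part.substring(dash + 1).trim());
                for (int port = start; port <= end; port++) {
                    ports.add(port);
                }
            }
        }
        return ports;
    }
}
```

For example, the definition 38000,38500-38600 covers port 38000 plus the 101 ports from 38500 to 38600.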
5.10.4. Reuse Address
When you shut down a cluster member, the server socket port goes into the TIME_WAIT state for the
next couple of minutes. If you restart the member right after shutting it down, you may not be able
to bind it to the same port because the port is still in the TIME_WAIT state. If you set the
reuse-address element to true, the TIME_WAIT state is ignored and you can bind the member to the
same port again.
Declarative Configuration:
<hazelcast>
...
<network>
<reuse-address>true</reuse-address>
</network>
...
</hazelcast>
Programmatic Configuration:
...
NetworkConfig networkConfig = config.getNetworkConfig();
networkConfig.setReuseAddress( true );
...
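At the JDK level, this kind of behavior corresponds to the SO_REUSEADDR socket option. The sketch below shows how the option is set on a plain ServerSocket before binding. The class ReuseAddressDemo is a hypothetical illustration and makes no claim about how Hazelcast applies the setting internally.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical illustration of SO_REUSEADDR on a plain JDK server socket. */
final class ReuseAddressDemo {
    /** Opens an unbound server socket, sets SO_REUSEADDR, then binds it to the given port. */
    static ServerSocket open(int port, boolean reuseAddress) throws IOException {
        ServerSocket socket = new ServerSocket(); // unbound, so the option can be set before bind
        socket.setReuseAddress(reuseAddress);
        socket.bind(new InetSocketAddress(port));
        return socket;
    }
}
```

The option must be set before the socket is bound; setting it afterwards has no defined effect according to the JDK documentation.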
5.10.5. Join
The join configuration element is used to discover Hazelcast members and enable them to form a
cluster. Hazelcast provides multicast, TCP/IP, EC2 and jclouds® discovery mechanisms. These
mechanisms are explained in the Discovery Mechanisms section. This section describes all the
sub-elements and attributes of the join element. The following are example configurations.
Declarative Configuration:
<hazelcast>
...
<network>
<join>
<multicast enabled="true">
<multicast-group>224.2.2.3</multicast-group>
<multicast-port>54327</multicast-port>
<multicast-time-to-live>32</multicast-time-to-live>
<multicast-timeout-seconds>2</multicast-timeout-seconds>
<trusted-interfaces>
<interface>192.168.1.102</interface>
</trusted-interfaces>
</multicast>
<tcp-ip enabled="false">
<required-member>192.168.1.104</required-member>
<member>192.168.1.104</member>
<members>192.168.1.105,192.168.1.106</members>
</tcp-ip>
<aws enabled="false">
<access-key>my-access-key</access-key>
<secret-key>my-secret-key</secret-key>
<region>us-west-1</region>
<host-header>ec2.amazonaws.com</host-header>
<security-group-name>hazelcast-sg</security-group-name>
<tag-key>type</tag-key>
<tag-value>hz-members</tag-value>
</aws>
<discovery-strategies>
<discovery-strategy ... />
</discovery-strategies>
</join>
</network>
...
</hazelcast>
Programmatic Configuration:
multicast element
The multicast element includes parameters to fine tune the multicast join mechanism.
• enabled: Specifies whether the multicast discovery is enabled or not, true or false.
• multicast-group: The multicast group IP address. Specify it when you want to create clusters
within the same network. Values can be between 224.0.0.0 and 239.255.255.255. Its default value
is 224.2.2.3.
• multicast-port: The multicast socket port that the Hazelcast member listens to and sends
discovery messages through. Its default value is 54327.
• multicast-time-to-live: Time-to-live value for multicast packets sent out to control the scope of
multicasts. See more information here.
• multicast-timeout-seconds: Only when the members are starting up, this timeout (in seconds)
specifies the period during which a member waits for a multicast response from another
member. For example, if you set it as 60 seconds, each member waits for 60 seconds until a
leader member is selected. Its default value is 2 seconds.
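The valid range for multicast-group (224.0.0.0 to 239.255.255.255) is the IPv4 multicast range, so a candidate value can be checked with the JDK alone. The helper MulticastGroupCheck below is a hypothetical name used only for this sketch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: checks whether a literal IP address is a multicast group address. */
final class MulticastGroupCheck {
    static boolean isValidGroup(String literalIp) {
        try {
            // For a literal IP address, no name service lookup is performed.
            return InetAddress.getByName(literalIp).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

For instance, the default group 224.2.2.3 passes this check, whereas a unicast address such as 192.168.1.102 does not.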
tcp-ip element
The tcp-ip element includes parameters to fine tune the TCP/IP join mechanism.
• enabled: Specifies whether the TCP/IP discovery is enabled or not. Values can be true or false.
• required-member: IP address of the required member. Cluster is only formed if the member with
this IP address is found.
• member: IP address(es) of one or more well known members. Once members are connected to
these well known ones, all member addresses are communicated with each other. You can also
give comma separated IP addresses using the members element.
The tcp-ip element also accepts the interface parameter. See the Interfaces
element description.
aws element
The aws element includes parameters to allow the members to form a cluster on the Amazon EC2
environment.
• enabled: Specifies whether the EC2 discovery is enabled or not, true or false.
• region: The region where your members are running. Its default value is us-east-1. You need to
specify this if the region is other than the default one.
• host-header: The URL that is the entry point for a web service. It is optional.
• security-group-name: Name of the security group you specified at the EC2 management console.
It is used to narrow the Hazelcast members to be within this group. It is optional.
• tag-key, tag-value: To narrow the members in the cloud down to only Hazelcast members, you
can set these parameters as the ones you specified in the EC2 console. They are optional.
If you are using a cloud provider other than AWS, you can use the programmatic configuration to
specify a TCP/IP cluster. The members need to be retrieved from that provider, e.g., jclouds.
discovery-strategies element
The discovery-strategies element configures internal or external discovery strategies based on the
Hazelcast Discovery SPI. For further information, see the Discovery SPI section and the vendor
documentation of the used discovery strategy.
To make sure EC2 instances are found correctly, you can use the AWSClient class. It determines the
private IP addresses of EC2 instances to be connected. Give the AWSClient class the values for the
parameters that you specified in the aws element, as shown below. You will see whether your EC2
instances are found.
public static void main( String[] args ) throws Exception {
    AwsConfig config = new AwsConfig();
    // use the same values that you specified in the aws element
    config.setAccessKey( ... );
    config.setSecretKey( ... );
    config.setRegion( ... );
    config.setSecurityGroupName( ... );
    config.setTagKey( ... );
    config.setTagValue( ... );
    config.setEnabled( true );
    AWSClient client = new AWSClient( config );
    // private IP addresses of the EC2 instances that match the configuration
    Collection<String> ipAddresses = client.getPrivateIpAddresses();
    System.out.println( "addresses found:" + ipAddresses );
    for ( String ip : ipAddresses ) {
        System.out.println( ip );
    }
}
5.10.7. Interfaces
You can specify which network interfaces Hazelcast should use. Servers mostly have more than
one network interface, so you may want to list the valid IPs. Range characters (* and -) can be used
for simplicity. For instance, 10.3.10.* refers to IPs between 10.3.10.0 and 10.3.10.255. Interface
10.3.10.4-18 refers to IPs between 10.3.10.4 and 10.3.10.18 (4 and 18 included). If network interface
configuration is enabled (it is disabled by default) and Hazelcast cannot find a matching interface,
it prints a message on the console and does not start on that member.
Declarative Configuration:
<hazelcast>
...
<network>
<interfaces enabled="true">
<interface>10.3.16.*</interface>
<interface>10.3.10.4-18</interface>
<interface>192.168.1.3</interface>
</interfaces>
</network>
...
</hazelcast>
Programmatic Configuration:
Config config = new Config();
NetworkConfig network = config.getNetworkConfig();
InterfacesConfig interfaceConfig = network.getInterfaces();
interfaceConfig.setEnabled( true )
.addInterface( "192.168.1.3" );
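The meaning of the range characters can be illustrated with a small matcher. This is a sketch under stated assumptions: InterfacePattern is a hypothetical class, it covers IPv4 addresses only, and Hazelcast's own matching logic may differ in details.

```java
/** Hypothetical IPv4-only matcher for interface patterns such as "10.3.10.*" or "10.3.10.4-18". */
final class InterfacePattern {
    static boolean matches(String pattern, String address) {
        String[] p = pattern.split("\\.");
        String[] a = address.split("\\.");
        if (p.length != 4 || a.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(a[i]);
            if (p[i].equals("*")) {
                continue; // wildcard: any value from 0 to 255
            }
            int dash = p[i].indexOf('-');
            if (dash >= 0) {
                // range: both ends are included
                int low = Integer.parseInt(p[i].substring(0, dash));
                int high = Integer.parseInt(p[i].substring(dash + 1));
                if (octet < low || octet > high) {
                    return false;
                }
            } else if (octet != Integer.parseInt(p[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With this matcher, 10.3.10.18 matches the pattern 10.3.10.4-18, while 10.3.10.19 does not.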
Hazelcast supports IPv6 addresses seamlessly. (This support is switched off by default; see the
note at the end of this section.)
All you need to do is define IPv6 addresses or interfaces in the network configuration. The only
current limitation is that you cannot define wildcard IPv6 addresses in the TCP/IP join
configuration (tcp-ip element). The interfaces configuration does not have this limitation: you can
configure wildcard IPv6 interfaces in the same way as IPv4 interfaces.
<hazelcast>
...
<network>
<port auto-increment="true">5701</port>
<join>
<multicast enabled="false">
<multicast-group>FF02:0:0:0:0:0:0:1</multicast-group>
<multicast-port>54327</multicast-port>
</multicast>
<tcp-ip enabled="true">
<member>[fe80::223:6cff:fe93:7c7e]:5701</member>
<interface>192.168.1.0-7</interface>
<interface>192.168.1.*</interface>
<interface>fe80:0:0:0:45c5:47ee:fe15:493a</interface>
</tcp-ip>
</join>
<interfaces enabled="true">
<interface>10.3.16.*</interface>
<interface>10.3.10.4-18</interface>
<interface>fe80:0:0:0:45c5:47ee:fe15:*</interface>
<interface>fe80::223:6cff:fe93:0-5555</interface>
</interfaces>
</network>
...
</hazelcast>
The JVM has two system properties for setting the preferred protocol stack (IPv4 or IPv6) and the
preferred address family types (inet4 or inet6). On a dual stack machine, the IPv6 stack is preferred
by default; you can change this through the java.net.preferIPv4Stack=<true|false> system property.
When querying name services, the JVM prefers IPv4 addresses over IPv6 addresses and returns an IPv4
address if possible. You can change this through the java.net.preferIPv6Addresses=<true|false>
system property.
See also additional details on IPv6 support in Java.
IPv6 support is switched off by default, since some platforms have issues
using the IPv6 stack. Some other platforms, such as Amazon AWS, have no support
at all. To enable IPv6 support, set the hazelcast.prefer.ipv4.stack property
to false. See the System Properties appendix for the property details.
This SPI is not intended to provide addresses of other cluster members with which
the Hazelcast instance forms a cluster. To do that, see the previous sections above.
By default, Hazelcast chooses the public and bind addresses. You can influence the choice by
defining a public-address in the configuration or by using the other properties mentioned above. In
some cases, though, these properties are not enough and the default address picking strategy
chooses the wrong addresses. This may be the case when deploying Hazelcast in some cloud
environments such as AWS, when using Docker, or when the instance is deployed behind a NAT
and the public-address property is not enough (see the Public Address section).
In these cases, it is possible to configure the bind and public address in a more advanced way. You
can provide an implementation of the com.hazelcast.spi.MemberAddressProvider interface which
provides the bind and public address. The implementation may then choose these addresses in any
way - it may read from a system property or file or even invoke a web service to retrieve the public
and private address.
The details of the implementation depend heavily on the environment in which Hazelcast is
deployed. As such, we now demonstrate how to configure Hazelcast to use a simplified custom
member address provider SPI implementation. An example implementation is shown below:
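public static final class SimpleMemberAddressProvider implements MemberAddressProvider {
    @Override
    public InetSocketAddress getBindAddress() {
        // determine the address using some configuration, calling an API, ...
        return new InetSocketAddress(hostname, port);
    }
    @Override
    public InetSocketAddress getPublicAddress() {
        // determine the address using some configuration, calling an API, ...
        return new InetSocketAddress(hostname, port);
    }
}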
Note that if the bind address port is 0, Hazelcast uses a port as configured in the Hazelcast network
configuration (see the Port section). If the public address port is set to 0, it broadcasts the same
port that it is bound to. If you wish to bind to any local interface, you may return new
InetSocketAddress((InetAddress) null, port) from the getBindAddress() method.
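The effect of passing a null InetAddress can be verified with the JDK alone: the resulting socket address carries the wildcard address, which stands for any local interface. BindAddressExamples is a hypothetical class name used only for this sketch.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

/** Hypothetical illustration of the wildcard bind address mentioned above. */
final class BindAddressExamples {
    /** A null InetAddress yields the wildcard address, i.e., "any local interface". */
    static InetSocketAddress anyLocalInterface(int port) {
        return new InetSocketAddress((InetAddress) null, port);
    }
}
```

Calling BindAddressExamples.anyLocalInterface(5701).getAddress().isAnyLocalAddress() returns true, which confirms that the address is the wildcard address rather than a specific interface.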
The following configuration examples contain properties that are provided to the constructor of the
provider class. If you do not provide any properties, the class may have either a no-arg constructor
or a constructor accepting a single java.util.Properties instance. On the other hand, if you do
provide properties in the configuration, the class must have a constructor accepting a single
java.util.Properties instance.
Declarative Configuration:
<hazelcast>
...
<network>
<member-address-provider enabled="true">
<class-name>SimpleMemberAddressProvider</class-name>
<properties>
<property name="prop1">prop1-value</property>
<property name="prop2">prop2-value</property>
</properties>
</member-address-provider>
<!-- other network configurations -->
</network>
...
</hazelcast>
Programmatic Configuration:
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
Hazelcast.newHazelcastInstance(config);
crashed member from a slow one in an asynchronous system. A workaround to this limitation is to
use unreliable failure detectors. An unreliable failure detector allows a member to suspect that
others have failed, usually based on liveness criteria but it can make mistakes to a certain degree.
Hazelcast has the following built-in failure detectors: Deadline Failure Detector and Phi Accrual
Failure Detector.
There is also a Ping Failure Detector, that, if enabled, works in parallel with the above ones, but
identifies the failures on OSI Layer 3 (Network Layer). This detector is by default disabled.
Note that, Hazelcast also offers failure detectors for its Java client. See the Client Failure Detectors
section for more information.
Deadline Failure Detector uses an absolute timeout for missing/lost heartbeats. After timeout, a
member is considered as crashed/unavailable and marked as suspected.
<hazelcast>
...
<properties>
<property name="hazelcast.heartbeat.failuredetector.type">deadline</property>
<property name="hazelcast.heartbeat.interval.seconds">5</property>
<property name="hazelcast.max.no.heartbeat.seconds">120</property>
</properties>
...
</hazelcast>
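The absolute timeout logic of a deadline detector can be sketched in a few lines. This is a simplified illustration, not Hazelcast's implementation: the class DeadlineDetectorSketch is hypothetical, and timestamps are passed in explicitly so that the behavior is easy to follow.

```java
/** Hypothetical, simplified deadline failure detector for a single remote member. */
final class DeadlineDetectorSketch {
    private final long maxNoHeartbeatMillis;
    private long lastHeartbeatMillis;

    DeadlineDetectorSketch(long maxNoHeartbeatMillis, long nowMillis) {
        this.maxNoHeartbeatMillis = maxNoHeartbeatMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Records a heartbeat; older timestamps never move the last heartbeat backwards. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = Math.max(lastHeartbeatMillis, nowMillis);
    }

    /** The member is suspected once no heartbeat has arrived for longer than the timeout. */
    boolean isSuspected(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > maxNoHeartbeatMillis;
    }
}
```

With a timeout of 120 seconds, as in the configuration above, a member whose last heartbeat arrived at t = 5 s is still considered alive at t = 125 s and becomes suspected right after that.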
5.11.2. Phi Accrual Failure Detector
This failure detector is based on "The Phi Accrual Failure Detector" by Hayashibara et al.
The Phi Accrual Failure Detector keeps track of the intervals between heartbeats in a sliding window
of time, measures the mean and variance of these samples, and calculates a suspicion level (phi).
The value of phi increases as the period since the last heartbeat gets longer. If the network
becomes slow or unreliable, the resulting mean and variance increase, so a longer period without
heartbeats is needed before the member is suspected.
In addition to the above two properties, Phi Accrual Failure Detector has the following configuration
properties:
A phi of 1 means that the likelihood of a mistaken suspicion is about 10%. The likelihood is about
1% with phi = 2, 0.1% with phi = 3, and so on. The default phi threshold is 10.
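The relation between phi and the probability of a mistake is phi = -log10(P), where P is the probability that a heartbeat would still arrive after the elapsed time. The sketch below computes phi under the assumption that heartbeat intervals are roughly normally distributed and uses a logistic approximation of the normal distribution, as some open-source implementations of the paper do. The class PhiSketch and its methods are hypothetical, and Hazelcast's internal calculation may differ.

```java
/** Hypothetical sketch of the phi calculation; not Hazelcast's implementation. */
final class PhiSketch {
    /**
     * Suspicion level for the time elapsed since the last heartbeat, given the mean and
     * standard deviation of the observed heartbeat intervals (all in milliseconds).
     */
    static double phi(double millisSinceLastHeartbeat, double meanMillis, double stdDevMillis) {
        double y = (millisSinceLastHeartbeat - meanMillis) / stdDevMillis;
        // logistic approximation: 1 / (1 + e) approximates the normal CDF at y
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        if (millisSinceLastHeartbeat > meanMillis) {
            return -Math.log10(e / (1.0 + e));
        }
        // same quantity, written so that it stays accurate below the mean
        return -Math.log10(1.0 - 1.0 / (1.0 + e));
    }

    /** Probability that suspecting a member at this phi is a mistake: 10^(-phi). */
    static double mistakeProbability(double phi) {
        return Math.pow(10.0, -phi);
    }
}
```

At exactly the mean interval, the chance that a heartbeat is still pending is one half, so phi is -log10(0.5), roughly 0.3. As the silence grows, phi rises quickly, and the member is suspected once phi exceeds the configured threshold.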
<hazelcast>
    ...
    <properties>
        <property name="hazelcast.heartbeat.failuredetector.type">phi-accrual</property>
        <property name="hazelcast.heartbeat.interval.seconds">1</property>
        <property name="hazelcast.max.no.heartbeat.seconds">60</property>
        <property name="hazelcast.heartbeat.phiaccrual.failuredetector.threshold">10</property>
        <property name="hazelcast.heartbeat.phiaccrual.failuredetector.sample.size">200</property>
        <property name="hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis">100</property>
    </properties>
    ...
</hazelcast>
The Ping Failure Detector may be configured in addition to either the Deadline or the Phi Accrual
Failure Detector. It operates at Layer 3 of the OSI model and provides much quicker and more
deterministic detection of hardware and other lower level events. This detector may be configured
to perform an extra check after a member is suspected by one of the other detectors, or it can work
in parallel, which is the default. This way, hardware and network level issues are detected more
quickly.
This failure detector is based on InetAddress.isReachable(). When the JVM process has enough
permissions to create RAW sockets, the implementation relies on ICMP Echo requests. This
is preferred.
If the permissions are insufficient, the detector can be configured to fall back on attempting a
TCP Echo on port 7. In that case, both a successful connection and an explicit rejection are treated
as "Host is Reachable". Alternatively, it can be forced to use only RAW sockets. The TCP Echo
fallback is not preferred, as each call creates a heavyweight socket and, moreover, the Echo service
is typically disabled.
For the Ping Failure Detector to rely only on ICMP Echo requests, some criteria need
to be met.
Requirements and Linux/Unix Configuration
• Supported OS: as of Java 1.8, only Linux/Unix environments are supported. This detector
relies on ICMP, i.e., the protocol behind the ping command. It issues ping attempts
periodically, and their responses are used to determine the reachability of the remote member.
However, you cannot simply create an ICMP Echo Request, because these types of packets do not
rely on any of the preexisting transport protocols such as TCP. To create such a request,
you must have the privileges to create RAW sockets (see https://linux.die.net/man/7/raw). Most
operating systems allow this for root users; Unix based ones, however, are more flexible and
allow the use of custom privileges per process instead of requiring root access. Therefore, this
detector is supported only on Linux.
• The Java executable must have the cap_net_raw capability. As described in the above
requirement, on Linux you can grant extra capabilities to a single process,
which allows the process to interact with RAW sockets. This interaction is achieved via
the capability cap_net_raw (see https://linux.die.net/man/7/capabilities). To enable this capability
run the following command:
• When running with custom capabilities, the dynamic linker on Linux rejects loading the
libs from untrusted paths. Since the process now has cap_net_raw as a custom capability,
it becomes suspicious to the dynamic linker, which throws an error: java: error while
loading shared libraries: libjli.so: cannot open shared object file: No such file or
directory
◦ To overcome this rejection, the <JDK_HOME>/jre/lib/amd64/jli/ path needs to be added to the
ld.conf. Run the following command to do this: echo "<JDK_HOME>/jre/lib/amd64/jli/" >>
/etc/ld.so.conf.d/java.conf && sudo ldconfig
• ICMP Echo Requests must not be blocked by the receiving hosts, i.e.,
/proc/sys/net/ipv4/icmp_echo_ignore_all must be set to 0. Run the following command:
If any of the above criteria is not met, isReachable always falls back on TCP Echo attempts
on port 7.
To be able to use the Ping Failure Detector, you can configure it using the icmp element in your
Hazelcast IMDG declarative configuration file, e.g., hazelcast.xml. An example is shown below:
<hazelcast>
<network>
...
<failure-detector>
<icmp enabled="true">
<timeout-milliseconds>1000</timeout-milliseconds>
<fail-fast-on-startup>true</fail-fast-on-startup>
<interval-milliseconds>1000</interval-milliseconds>
<max-attempts>3</max-attempts>
<parallel-mode>true</parallel-mode>
<ttl>0</ttl>
</icmp>
</failure-detector>
</network>
...
</hazelcast>
• enabled: Specifies whether the legacy ICMP detection mode is enabled. This mode works
cooperatively with the existing failure detector and only kicks in after a pre-defined period
has passed with no heartbeats from a member. Its default value is false.
• parallel-mode: Specifies whether the parallel ping detector is enabled; works separately from
the other detectors. Its default value is true.
• max-attempts: Maximum number of ping attempts before the member/node gets suspected by
the detector. Its default value is 2.
• ttl: Maximum number of hops the packets should go through. Its default value is 0.
• fail-fast-on-startup: Specifies whether the cluster member fails to start if it is unable to
perform an ICMP ping when ICMP is enabled. Failure is usually due to OS level restrictions.
In the above example configuration, the Ping detector attempts 3 pings, one every second and waits
up to 1 second for each to complete. If after 3 seconds, there was no successful ping, the member
gets suspected.
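The attempt counting described above can be sketched as a small loop. This is a simplified illustration, not Hazelcast's implementation: PingDetectorSketch is a hypothetical class, the probe is passed in as a BooleanSupplier, and the per-attempt timeout is assumed to be handled inside the probe.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the max-attempts logic of a ping based detector. */
final class PingDetectorSketch {
    /**
     * Runs up to maxAttempts probes, waiting intervalMillis between them.
     * Returns true (suspect the member) only if every attempt fails.
     */
    static boolean shouldSuspect(BooleanSupplier ping, int maxAttempts, long intervalMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ping.getAsBoolean()) {
                return false; // a successful ping clears the suspicion
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false; // interrupted: do not suspect on incomplete evidence
                }
            }
        }
        return true;
    }
}
```

In a real probe, the supplier would typically wrap a call to InetAddress.isReachable with the configured timeout and treat an IOException as a failed attempt.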
Until Hazelcast IMDG 3.10, the following system properties (corresponding to the
above configuration elements) were used to configure the Ping Failure Detector.
They have been deprecated since Hazelcast IMDG 3.10:
<hazelcast>
...
<properties>
<property name="hazelcast.icmp.enabled">true</property>
<property name="hazelcast.icmp.parallel.mode">true</property>
<property name="hazelcast.icmp.timeout">1000</property>
<property name="hazelcast.icmp.max.attempts">3</property>
<property name="hazelcast.icmp.interval">1000</property>
<property name="hazelcast.icmp.ttl">0</property>
</properties>
...
</hazelcast>
Below is a summary of all possible configuration combinations of the ping failure detector.
ICMP: true, Parallel: true, Fail-Fast: false
• Description: Parallel ping detector, works in parallel with the configured failure detector.
Checks periodically if members are live (OSI Layer 3) and suspects them immediately, regardless of
the other detectors.
• Linux: Supported. ICMP Echo if available - Falls back on TCP Echo on port 7.
• Windows: Supported. TCP Echo on port 7.
• macOS: Supported. ICMP Echo if available - Falls back on TCP Echo on port 7.
ICMP: true, Parallel: true, Fail-Fast: true
• Description: Parallel ping detector, works in parallel with the configured failure detector.
Checks periodically if members are live (OSI Layer 3) and suspects them immediately, regardless of
the other detectors.
• Linux: Supported - Requires OS Configuration. Enforcing ICMP Echo if available - No start up if
not available.
• Windows: Not Supported.
• macOS: Not Supported - Requires root privileges.
Starting with Hazelcast 3.12, it is possible to configure the Hazelcast members with separate server
sockets using a different network configuration for different protocols. This configuration scheme
allows more flexibility when deploying Hazelcast as described in the following cases:
• For security, it is possible to bind the member protocol server socket on a protected internal
network interface, while the client connections can be established on another network interface
accessible by the Hazelcast clients.
• Different kinds of network connections can be established with different socket options. For
example, you can vary the send/receive window size to optimize the network usage, or use TLS for
connections over WAN while member-to-member connections remain unencrypted.
In the following example we introduce the advanced network configuration for a member to listen
for member-to-member connections on the default port 5701 while listening for client connections
on the port 9090:
Running this example prints something similar to the following output, indicating that the member
listens for the specified protocols on the respective configured ports:
{EndpointQualifier{type='CLIENT'}=[10.212.134.156]:9090, EndpointQualifier{type=
'MEMBER'}=[10.212.134.156]:5701}
<hazelcast>
...
<advanced-network enabled="true">
<member-server-socket-endpoint-config>
<port>5701</port>
</member-server-socket-endpoint-config>
<client-server-socket-endpoint-config>
<port>9090</port>
</client-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
Advanced network configuration and single-socket network configuration are mutually exclusive:
either an enabled AdvancedNetworkConfig or the NetworkConfig object is used to configure a
member’s networking, including the joiner, discovery, failure detectors, etc. as described in the
previous sections of this chapter.
You cannot define both elements in the declarative configuration, i.e., the network and advanced-
network elements cannot be configured at the same time. In the programmatic configuration, an
enabled AdvancedNetworkConfig takes precedence over the NetworkConfig. AdvancedNetworkConfig is
disabled by default, therefore the unisocket member configuration under NetworkConfig is used in
the default case.
When using the advanced network configuration, the following configurations are defined
member-wide:
• MemberAddressProvider configuration
In addition to the above, the advanced network configuration allows the configuration of multiple
endpoints: each endpoint configuration applies for a specific protocol, e.g., MEMBER and CLIENT. An
additional optional identifier can be configured to separate the configuration of multiple WAN
protocol endpoints.
• MEMBER: A member server socket is required for Hazelcast to operate. The default advanced
network configuration defines a member endpoint configuration listening on port 5701 (same
as the single-socket Hazelcast member configuration).
• CLIENT: A single server socket handling the Hazelcast Open Binary Client Protocol can be
optionally configured. If no such endpoint is configured, then the clients will not be able to
connect to the Hazelcast member.
• MEMCACHE: When accessing a Hazelcast cluster over the Memcache text protocol, an endpoint
listening to MEMCACHE protocol must be defined.
• WAN: Multiple WAN endpoint configurations can be defined to determine the network settings of
outgoing connections (from the members of a source cluster to the target WAN cluster
members) or to establish server sockets on which a target WAN member can listen for the
incoming connections from the source cluster.
The server socket endpoint configuration is common for all protocols. The elements comprising a
server socket endpoint configuration are identical to their single-socket network configuration
counterparts.
The following declarative configuration example includes all the common server socket endpoint
elements:
<hazelcast>
...
<advanced-network enabled="true">
<member-server-socket-endpoint-config>
<port auto-increment="true" port-count="100">5701</port>
<outbound-ports>
<ports>33000-35000</ports>
<ports>37000,37001,37002,37003</ports>
<ports>38000,38500-38600</ports>
</outbound-ports>
<interfaces enabled="true">
<interface>10.10.1.*</interface>
</interfaces>
<ssl enabled="true">
<factory-class-name>
com.hazelcast.examples.MySSLContextFactory</factory-class-name>
<properties>
<property name="foo">bar</property>
</properties>
</ssl>
<symmetric-encryption>
<algorithm>ALGO</algorithm>
<salt>SALT</salt>
<password>PASS</password>
<iteration-count>10000</iteration-count>
</symmetric-encryption>
<socket-interceptor enabled="true">
<class-name>com.hazelcast.examples.MySocketInterceptor</class-name>
<properties>
<property name="foo">bar</property>
</properties>
</socket-interceptor>
<socket-options>
<buffer-direct>true</buffer-direct>
<tcp-no-delay>true</tcp-no-delay>
<keep-alive>true</keep-alive>
<connect-timeout-seconds>64</connect-timeout-seconds>
<send-buffer-size-kb>25</send-buffer-size-kb>
<receive-buffer-size-kb>33</receive-buffer-size-kb>
<linger-seconds>99</linger-seconds>
</socket-options>
<public-address>dummy</public-address>
<reuse-address>true</reuse-address>
</member-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
When using the declarative configuration, specific element names introduce the server socket
endpoint configuration for each protocol:
• member-server-socket-endpoint-config for MEMBER protocol
When using the programmatic configuration, corresponding methods set the respective server
socket endpoint configuration:
config.getAdvancedNetworkConfig().setMemberEndpointConfig(
new ServerSocketEndpointConfig()
.setPort(5701)
.setPortAutoIncrement(false)
.setSSLConfig(new SSLConfig())
.setReuseAddress(true)
.setSocketTcpNoDelay(true)
);
In addition to the common server socket configuration described above, the REST endpoint
configuration includes certain additional elements which are used to enable/disable the REST
functionality groups.
config.getAdvancedNetworkConfig().setRestEndpointConfig(
new RestServerEndpointConfig()
.setPort(8080)
.setPortAutoIncrement(false)
.enableGroups(WAN, CLUSTER_READ, HEALTH_CHECK)
);
<hazelcast>
...
<advanced-network enabled="true">
<rest-server-socket-endpoint-config>
<port auto-increment="false">8080</port>
<endpoint-groups>
<endpoint-group name="WAN" enabled="true"/>
<endpoint-group name="CLUSTER_READ" enabled="true"/>
<endpoint-group name="HEALTH_CHECK" enabled="true"/>
</endpoint-groups>
</rest-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
Multiple WAN endpoint configurations can be defined to configure the outgoing connections and
server sockets, depending on the role of the member in the WAN replication. The configuration
examples are provided in the following sections for both active and passive side of the WAN
replication.
The members on the active cluster initiate connections to the target cluster members, so there is no
need to create a server socket. A plain EndpointConfig is created that supplies the configuration for
the client side of connections that the active members will create:
config.getAdvancedNetworkConfig().addWanEndpointConfig(
new EndpointConfig().setName("tokyo")
.setSSLConfig(new SSLConfig()
.setEnabled(true)
.setFactoryClassName(
"com.hazelcast.examples.MySSLContextFactory")
.setProperty("foo", "bar"))
);
WanReplicationConfig wanReplicationConfig = new WanReplicationConfig();
WanBatchPublisherConfig publisherConfig = new WanBatchPublisherConfig()
.setEndpoint("tokyo")
.setTargetEndpoints("tokyo.hazelcast.com:8765");
wanReplicationConfig.addBatchReplicationPublisherConfig(publisherConfig);
config.addWanReplicationConfig(wanReplicationConfig);
config.getMapConfig("customers").setWanReplicationRef(
new WanReplicationRef("replicate-to-tokyo", "com.company.MergePolicy",
emptyList(), false)
);
The following is the equivalent declarative configuration:
<hazelcast>
...
<advanced-network enabled="true">
<wan-endpoint-config name="tokyo">
<ssl enabled="true">
<factory-class-name>
com.hazelcast.examples.MySSLContextFactory</factory-class-name>
<properties>
<property name="endpoints">tokyo.example.com:11010</property>
</properties>
</ssl>
</wan-endpoint-config>
</advanced-network>
...
<wan-replication name="replicate-to-tokyo">
<batch-publisher>
<cluster-name>clusterB</cluster-name>
<target-endpoints>...</target-endpoints>
</batch-publisher>
</wan-replication>
...
<map name="customers">
<wan-replication-ref name="replicate-to-tokyo">
<merge-policy>...</merge-policy>
</wan-replication-ref>
</map>
...
</hazelcast>
On the passive cluster, a server socket is configured on the members to listen for the incoming WAN
connections, matching the network configuration (SSL configuration, etc.) configured on the active
side of the WAN replication.
config.getAdvancedNetworkConfig().addWanEndpointConfig(
new ServerSocketEndpointConfig()
.setName("tokyo")
.setPort(11010)
.setPortAutoIncrement(false)
.setSSLConfig(new SSLConfig()
.setEnabled(true)
.setFactoryClassName(
"com.hazelcast.examples.MySSLContextFactory")
.setProperty("foo", "bar")
));
<hazelcast>
...
<advanced-network enabled="true">
<wan-server-socket-endpoint-config name="tokyo">
<port auto-increment="false">11010</port>
<ssl enabled="true">
<factory-class-name>
com.hazelcast.examples.MySSLContextFactory</factory-class-name>
<properties>
<property name="foo">bar</property>
</properties>
</ssl>
</wan-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
Can I multiplex protocols on a single advanced network endpoint? For example, can I use a
single server socket to listen for MEMBER and CLIENT protocols?
No, each endpoint configuration that defines a server socket must bind to a different socket
address.
Can I mix unisocket and advanced network members in the same cluster?
No, the results will be undefined.
Can I configure multiple server socket endpoints for the same protocol?
You can only configure multiple server socket endpoints for WAN protocol. For other protocols
(MEMBER, CLIENT, REST, MEMCACHE), a single server socket can be configured.
6. Rolling Member Upgrades
Hazelcast IMDG Enterprise
This chapter explains the procedure of upgrading the version of Hazelcast members in a running
cluster without interrupting the operation of the cluster.
6.1. Terminology
• Minor version: A version change after the decimal point, e.g., 3.12 and 3.13.
• Patch version: A version change after the second decimal point, e.g., 3.12.1 and 3.12.2.
• Member codebase version: The major.minor.patch version of the Hazelcast binary on which
the member executes. For example, when running on hazelcast-3.12.jar, your member’s
codebase version is 3.12.0.
• Cluster version: The major.minor version at which the cluster operates. This ensures that
cluster members are able to communicate using the same cluster protocol and determines the
feature set exposed by the cluster.
Also, each minor version is compatible with the previous one (back until Hazelcast IMDG 3.8). For
example, it is possible to perform a rolling upgrade on a cluster running Hazelcast IMDG Enterprise
3.11 to Hazelcast IMDG Enterprise 3.12. Rolling upgrades across minor versions are a Hazelcast IMDG
Enterprise feature.
The compatibility guarantees described above are given in the context of rolling member upgrades
and only apply to GA (general availability) releases. It is never advisable to run a cluster with
members running on different patch or minor versions for prolonged periods of time.
The version numbers used in the paragraph below are only used as an example.
Let’s assume a cluster with four members running on codebase version 3.12.0 with cluster version
3.12, that should be upgraded to codebase version 3.13.0 and cluster version 3.13. The rolling
upgrade process for this cluster, i.e., replacing existing 3.12.0 members one by one with an
upgraded one at version 3.13.0, includes the following steps which should be repeated for each
member:
• Wait until all partition migrations are completed; during migrations, membership changes
(member joins or removals) are not allowed.
• Start the member and wait until it joins the cluster. You should see something like the following
in your logs:
...
INFO: [192.168.2.2]:5701 [cluster] [3.13] Hazelcast 3.9 (20170630 - a67dc3a)
starting at [192.168.2.2]:5701
...
INFO: [192.168.2.2]:5701 [cluster] [3.13] Cluster version set to 3.12
The version in brackets ([3.13]) still denotes the member’s codebase version (running on the
hypothetical hazelcast-3.13.jar binary). Once the member locates the existing cluster members, it
sends its join request to the master. The master validates that the new member is allowed to join
the cluster and lets the new member know that the cluster is currently operating at 3.12 cluster
version. The new member sets 3.12 as its cluster version and starts operating normally.
At this point all members of the cluster have been upgraded to codebase version 3.13.0 but the
cluster still operates at cluster version 3.12. In order to use 3.13 features the cluster version must be
changed to 3.13.
Rolling upgrade can be used for one version at a time, e.g., 3.n to 3.n+1. You cannot
upgrade your members, for example, from 3.13 to 3.15 in a single rolling upgrade
session.
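The one-version-at-a-time rule can be expressed as a small check. The following is a minimal plain-Java sketch and not part of the Hazelcast API: the class and method names (RollingUpgradeCheck, isSingleStep) are hypothetical, and it assumes version strings that start with major.minor.

```java
// Hypothetical helper, not part of Hazelcast: checks whether an upgrade moves
// exactly one minor version forward within the same major version.
final class RollingUpgradeCheck {

    // Parses a version such as "3.12" or "3.12.0" into {major, minor}.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // True only for 3.n -> 3.n+1 style upgrades.
    static boolean isSingleStep(String from, String to) {
        int[] f = parse(from);
        int[] t = parse(to);
        return f[0] == t[0] && t[1] == f[1] + 1;
    }

    public static void main(String[] args) {
        System.out.println(isSingleStep("3.12", "3.13")); // true
        System.out.println(isSingleStep("3.13", "3.15")); // false
    }
}
```

A check like this could run in a deployment script before the first member is replaced, so that a skipped minor version is caught early.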
Note that you need to enable the REST API to use either of the above methods to upgrade your
cluster version. For this, enable the CLUSTER_WRITE REST endpoint group (its default is disabled). See
the Using the REST Endpoint Groups section on how to enable them.
Also note that you need to upgrade your Management Center version before upgrading the
member version if you want to change cluster version using Management Center. Management
Center is compatible with the previous minor version of Hazelcast, starting with version 3.9. For
example, Management Center 3.12 works with both Hazelcast IMDG 3.11 and 3.12. To change your
cluster version to 3.12, you need Management Center 3.12.
Upgrading Cluster Version From IMDG 3.11 to 3.12
For the IMDG versions before 3.12, REST API could be enabled by using the
hazelcast.rest.enabled system property, which is deprecated for 3.12.x versions.
IMDG 3.12 and newer versions introduce the rest-api configuration element along
with REST endpoint groups. Therefore, a configuration change is needed
specifically when performing a rolling member upgrade from IMDG 3.11 to 3.12.
So, the steps listed in the above Rolling Upgrade Procedure section should be as
follows:
For the 4th step ("Update the configuration"), the configuration should be updated
as follows:
<hazelcast>
    ...
    <rest-api enabled="true">
        <endpoint-group name="CLUSTER_WRITE" enabled="true"/>
    </rest-api>
    ...
</hazelcast>
See the Using the REST Endpoint Groups section for more information.
There is one tricky detail here: as you are shutting down and upgrading the members one by one,
when you shut down the last one, all the members in the remaining cluster have the newer version,
but you don’t want the auto-upgrade to kick in before you have successfully upgraded the last
member as well. To avoid this, you can use the
hazelcast.cluster.version.auto.upgrade.min.cluster.size system property. You should set it to the
size of your cluster, and then Hazelcast will wait for the last member to join before it can proceed
with the auto-upgrade.
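As a sketch, the property can be passed as a -D JVM argument or set programmatically before the member starts. The example below assumes that the property is read when the member starts in the same JVM; the class name is hypothetical and the cluster size of 4 is taken from the four-member example earlier in this chapter.

```java
// Hypothetical helper: sets the system property named above so that the
// auto-upgrade waits until the cluster has reached the expected size.
final class AutoUpgradeGuard {

    static final String MIN_CLUSTER_SIZE_PROPERTY =
            "hazelcast.cluster.version.auto.upgrade.min.cluster.size";

    // Call this before the Hazelcast member is started in this JVM.
    static void requireClusterSize(int expectedMembers) {
        if (expectedMembers < 1) {
            throw new IllegalArgumentException("expectedMembers must be positive");
        }
        System.setProperty(MIN_CLUSTER_SIZE_PROPERTY, Integer.toString(expectedMembers));
    }

    public static void main(String[] args) {
        requireClusterSize(4); // assumed cluster size
        System.out.println(System.getProperty(MIN_CLUSTER_SIZE_PROPERTY));
    }
}
```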
6.6. Network Partitions and Rolling Upgrades
In the event of network partitions which split your cluster into two subclusters, split-brain handling
works as explained in the Network Partitioning chapter, with the additional constraint that two
subclusters only merge as long as they operate on the same cluster version. This is a requirement to
ensure that all members participating in each one of the subclusters are able to operate as
members of the merged cluster at the same cluster version.
With regard to rolling upgrades, the above constraint implies that if a network partition occurs
while a change of cluster version is in progress, then with some unlucky timing, one subcluster may
be upgraded to the new cluster version and another subcluster may have upgraded members but
still operate at the old cluster version.
In order for the two subclusters to merge, it is necessary to change the cluster version of the
subcluster that still operates on the old cluster version, so that both subclusters will be operating at
the same, upgraded cluster version and able to merge as soon as the network partition is fixed.
When a new member starts, it is not yet joined to a cluster; therefore its cluster version is still
undetermined. In order for the cluster version to be set, one of the following must happen:
• the member cannot locate any members of the cluster to join or is configured without a joiner:
in this case, the member appoints itself as the master of a new single-member cluster and its
cluster version is set to the major.minor version of its own codebase version. So a standalone
member running on codebase version 3.12.0 sets its own cluster version to 3.12.
• the member that is starting locates members of the cluster and identifies which is the master: in
this case, the master validates that the joining member’s codebase version is compatible with
the current cluster version. If it is found to be compatible, then the member joins and the
master sends the cluster version, which is set on the joining member. Otherwise, the starting
member fails to join and shuts down.
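The two cases above can be modeled in a few lines. This is a plain-Java sketch under stated assumptions, not Hazelcast's join code: the class and method names are hypothetical, and the compatibility rule (same major version, codebase minor version equal to or exactly one ahead of the cluster minor version) is an interpretation of the compatibility guarantee described earlier in this chapter.

```java
// Hypothetical model of how a starting member ends up with a cluster version.
final class ClusterVersionModel {

    // Drops the patch part: "3.12.0" -> "3.12".
    static String majorMinor(String codebaseVersion) {
        String[] p = codebaseVersion.split("\\.");
        return p[0] + "." + p[1];
    }

    // Assumed rule: same major version, and the codebase minor version equals
    // the cluster minor version or is exactly one ahead of it.
    static boolean isCompatible(String codebaseVersion, String clusterVersion) {
        String[] code = codebaseVersion.split("\\.");
        String[] cluster = clusterVersion.split("\\.");
        int codeMinor = Integer.parseInt(code[1]);
        int clusterMinor = Integer.parseInt(cluster[1]);
        return code[0].equals(cluster[0])
                && (codeMinor == clusterMinor || codeMinor == clusterMinor + 1);
    }

    // masterClusterVersion is null when no cluster could be located.
    // Returns the resulting cluster version, or null if the join is rejected.
    static String resolve(String codebaseVersion, String masterClusterVersion) {
        if (masterClusterVersion == null) {
            return majorMinor(codebaseVersion); // standalone member becomes master
        }
        return isCompatible(codebaseVersion, masterClusterVersion)
                ? masterClusterVersion
                : null;
    }
}
```

For example, resolve("3.13.0", "3.12") yields 3.12, which matches the log excerpt shown in the rolling upgrade procedure.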
The version numbers used in the paragraph below are only used as an example.
On startup, as answered in the above question (How is the cluster version set?), the cluster version
is not yet known to a member that has not joined any cluster. By default the newly started member
uses the cluster protocol that corresponds to its codebase version until this member joins a cluster
(so for codebase 3.12.0 this means implicitly assuming cluster version 3.12). If, hypothetically,
major changes in discovery & join operations have been introduced which do not allow the
member to join a 3.11 cluster, then the member should be explicitly configured to start assuming a
3.11 cluster version.
Clients which implement the Open Binary Client Protocol are compatible with Hazelcast version 3.6
and newer minor versions. Thus older client versions are compatible with later minor versions.
Newer clients connected to a cluster are limited to the capabilities of the lower cluster version until
all members are upgraded and the cluster version upgrade occurs.
Can I stop and start multiple members at once during a rolling member upgrade?
It is not recommended due to potential network partitions. It is advised to always stop and start one
member in each upgrade step.
Can I upgrade my business app together with Hazelcast while doing a rolling member
upgrade?
Yes, but make sure to make the new version of your app compatible with the old one since there
will be a timespan when both versions interoperate. Checking if two versions of your app are
compatible includes verifying binary and algorithmic compatibility and some other steps.
It is worth mentioning that a business app upgrade is orthogonal to a rolling member upgrade. A
rolling business app upgrade may be done without upgrading the members.
◦ Map is the distributed implementation of java.util.Map. It lets you read from and write to a
Hazelcast map with methods such as get and put.
◦ Set is the distributed and concurrent implementation of java.util.Set. It does not allow
duplicate elements and does not preserve their order.
◦ List is similar to Hazelcast Set. The only difference is that it allows duplicate elements and
preserves their order.
◦ Multimap is a specialized Hazelcast map. It is a distributed data structure where you can
store multiple values for a single key.
◦ Replicated Map does not partition data. It does not spread data to different cluster members.
Instead, it replicates the data to all members.
◦ Cardinality Estimator is a data structure which implements Flajolet’s HyperLogLog
algorithm.
• Topic is the distributed mechanism for publishing messages that are delivered to multiple
subscribers. It is also known as the publish/subscribe (pub/sub) messaging model. See the Topic
section for more information. Hazelcast also has a structure called Reliable Topic which uses
the same interface as Hazelcast Topic. The difference is that it is backed by the Ringbuffer
data structure. See the Reliable Topic section.
• Concurrency utilities
◦ PN counter is a distributed data structure where each Hazelcast instance can increment and
decrement the counter value and these updates are propagated to all replicas.
• Event Journal is a distributed data structure that stores the history of mutation actions on map
or cache.
1. Data structures where each partition stores a part of the instance, namely partitioned data
structures.
2. Data structures where a single partition stores the whole instance, namely non-partitioned data
structures.
The following are the partitioned Hazelcast data structures:
• Map
• MultiMap
• Event Journal
The following are the non-partitioned Hazelcast data structures:
• Queue
• Set
• List
• Ringbuffer
• FencedLock
• ISemaphore
• IAtomicLong
• IAtomicReference
• FlakeIdGenerator
• ICountdownLatch
• Cardinality Estimator
• PN Counter
Besides these, Hazelcast also offers the Replicated Map structure as explained in the above
Standard utility collections list.
Hazelcast offers a get method for most of its distributed objects. To load an object, first create a
Hazelcast instance and then use the related get method on this instance. For example, after creating
an instance with Hazelcast.newHazelcastInstance(), calling getMap("customers") on it returns a map
named customers.
As to the configuration of distributed objects, Hazelcast uses the default settings from the file
hazelcast.xml that comes with your Hazelcast download. Of course, you can provide an explicit
configuration in this XML or programmatically according to your needs. See the Understanding
Configuration section.
Note that most of Hazelcast’s distributed objects are created lazily, i.e., a distributed object is
created once the first operation accesses it.
If you want to use an object you loaded in other places, you can safely reload it using its reference
without creating a new Hazelcast instance (customers in the above example).
To destroy a Hazelcast distributed object, you can use the method destroy. This method clears and
releases all resources of the object. Therefore, you must use it with care since a reload with the
same object reference after the object is destroyed creates a new data structure without an error.
See the following example code where one of the queues is destroyed and the other one is
accessed.
HazelcastInstance hz1 = Hazelcast.newHazelcastInstance();
HazelcastInstance hz2 = Hazelcast.newHazelcastInstance();
IQueue<String> q1 = hz1.getQueue("q");
IQueue<String> q2 = hz2.getQueue("q");
q1.add("foo");
System.out.println("q1.size: " + q1.size() + " q2.size:" + q2.size());
q1.destroy();
System.out.println("q1.size: " + q1.size() + " q2.size:" + q2.size());
q1.size: 1 q2.size:1
q1.size: 0 q2.size:0
Hazelcast is designed to create any distributed data structure whenever it is accessed, i.e.,
whenever a call is made to the data structure. Therefore, keep in mind that a data structure is
recreated when you perform an operation on it even after you have destroyed it.
Hazelcast uses the name of a distributed object to determine the partition in which it is placed. For
example, assume that you load two queues named q1 and q2 using the getQueue method.
Since these queues have different names, they will be placed into different partitions. If you want
to put these two into the same partition, append the @ symbol and a common partition key to their
names, e.g., q1@foo and q2@foo.
Now, these two queues will be put into the same partition whose partition key is foo. Note that you
can use the method getPartitionKey to learn the partition key of a distributed object, e.g.,
q1.getPartitionKey(). It may be useful when you want to create an object in the same partition as
an existing object.
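The naming rule can be illustrated without a running cluster. The following plain-Java sketch assumes that the text after the first @ is used as the partition key and that a name without @ acts as its own key; the helper is hypothetical and is not the Hazelcast implementation.

```java
// Hypothetical helper that mimics how a partition key is derived from a name.
final class PartitionKeys {

    // Returns the part after the first '@', or the whole name if there is no '@'.
    static String partitionKeyOf(String objectName) {
        int at = objectName.indexOf('@');
        return at == -1 ? objectName : objectName.substring(at + 1);
    }

    public static void main(String[] args) {
        System.out.println(partitionKeyOf("q1@foo")); // foo
        System.out.println(partitionKeyOf("q2@foo")); // foo
        System.out.println(partitionKeyOf("q1"));     // q1
    }
}
```

Because both q1@foo and q2@foo resolve to the key foo in this model, they would land in the same partition.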
7.1.3. Common Features of all Hazelcast Data Structures
• If a member goes down, its backup replica (which holds the same data) dynamically
redistributes the data, including the ownership and locks on them, to the remaining live
members. As a result, there will not be any data loss.
• There is no single cluster master that can be a single point of failure. Every member in the
cluster has equal rights and responsibilities. No single member is superior. There is no
dependency on an external 'server' or 'master'.
Here is an example of how you can retrieve existing data structure instances (map, queue, set,
topic, etc.) and how you can listen for instance events, such as an instance being created or
destroyed.
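import com.hazelcast.core.DistributedObject;
import com.hazelcast.core.DistributedObjectEvent;
import com.hazelcast.core.DistributedObjectListener;

// Logs every distributed object that is created or destroyed in the cluster.
// The class name ExampleListener is illustrative. Register an instance with
// HazelcastInstance.addDistributedObjectListener() to receive the events.
public class ExampleListener implements DistributedObjectListener {
    @Override
    public void distributedObjectCreated(DistributedObjectEvent event) {
        DistributedObject instance = event.getDistributedObject();
        System.out.println("Created " + instance.getName());
    }

    @Override
    public void distributedObjectDestroyed(DistributedObjectEvent event) {
        DistributedObject instance = event.getDistributedObject();
        System.out.println("Destroyed " + instance.getName());
    }
}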
7.2. Map
Hazelcast Map (IMap) extends the interface java.util.concurrent.ConcurrentMap and hence
java.util.Map. It is the distributed implementation of Java map. You can perform operations like
reading and writing from/to a Hazelcast map with the well known get and put methods.
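Since IMap extends ConcurrentMap, code written against the ConcurrentMap interface can be handed either a local map or a Hazelcast map. The sketch below uses only the JDK and a local ConcurrentHashMap; the class name VisitCounter is hypothetical. Passing the result of getMap instead of the local map is the intended use, which is not exercised here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical counter that relies only on ConcurrentMap operations.
final class VisitCounter {

    // Atomically increments the counter stored under key and returns the new value.
    static int increment(ConcurrentMap<String, Integer> map, String key) {
        while (true) {
            Integer current = map.putIfAbsent(key, 1);
            if (current == null) {
                return 1; // first visit, the value 1 was stored
            }
            if (map.replace(key, current, current + 1)) {
                return current + 1;
            }
            // Another thread changed the value in between; retry.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> visits = new ConcurrentHashMap<>();
        increment(visits, "home");
        increment(visits, "home");
        System.out.println(visits.get("home")); // 2
    }
}
```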
IMap data structure can also be used by Hazelcast Jet for Real-Time Stream
Processing (by enabling the Event Journal on your map) and Fast Batch Processing.
Hazelcast Jet uses IMap as a source (reads data from IMap) and as a sink (writes
data to IMap). See the Fast Batch Processing and Real-Time Stream Processing use
cases for Hazelcast Jet. See also here in the Hazelcast Jet Reference Manual to learn
how Jet uses IMap, i.e., how it can read from and write to IMap.
Hazelcast partitions your map entries and their backups, and distributes them almost evenly onto all
Hazelcast members. Each member carries approximately "number of map entries * 2 * 1/n" entries,
where n is the number of members in the cluster. For example, if you have a member with 1000
objects to be stored in the cluster and then you start a second member, each member will both
store 500 objects and back up the 500 objects in the other member.
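The arithmetic behind this estimate can be written out as follows. This is a plain-Java sketch with hypothetical names; it assumes a perfectly even distribution and that a backup is never held on the member that owns the entry, so a single member holds no backups.

```java
// Hypothetical calculator for the approximate number of entries per member.
final class MapDistribution {

    // Entries a member owns (primary copies): total / n.
    static long ownedPerMember(long totalEntries, int members) {
        return totalEntries / members;
    }

    // Entries a member holds including backups: total * (1 + backups) / n.
    // The number of backups cannot exceed the number of other members.
    static long heldPerMember(long totalEntries, int members, int backupCount) {
        int effectiveBackups = Math.min(backupCount, members - 1);
        return totalEntries * (1 + effectiveBackups) / members;
    }

    public static void main(String[] args) {
        System.out.println(ownedPerMember(1000, 2));   // 500
        System.out.println(heldPerMember(1000, 2, 1)); // 1000
    }
}
```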
Let’s create a Hazelcast instance and fill a map named Capitals with key-value pairs using the
following code. Use the HazelcastInstance getMap method to get the map, then use the map put
method to put an entry into the map.
When you run this code, a cluster member is created with a map whose entries are distributed
across the members' partitions. See the below illustration. For now, this is a single member cluster.
Please note that some of the partitions do not contain any data entries since we
only have 120 objects and the partition count is 271 by default. This count is
configurable and can be changed using the system property
hazelcast.partition.count. See the System Properties appendix.
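The following sketch shows why some partitions stay empty when there are fewer entries than partitions. It is plain Java with hypothetical names, and it uses the key's hashCode modulo the partition count as a stand-in for Hazelcast's actual key-to-partition mapping, which is an assumption made only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of spreading keys over a fixed number of partitions.
final class PartitionSpread {

    static final int DEFAULT_PARTITION_COUNT = 271;

    // Stand-in mapping (assumption): hash of the key modulo the partition count.
    static int partitionOf(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Counts the partitions that receive no entry at all.
    static int emptyPartitions(int entryCount, int partitionCount) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < entryCount; i++) {
            used.add(partitionOf("key-" + i, partitionCount));
        }
        return partitionCount - used.size();
    }

    public static void main(String[] args) {
        // With 120 entries at most 120 partitions can be occupied,
        // so at least 151 of the 271 partitions remain empty.
        System.out.println(emptyPartitions(120, DEFAULT_PARTITION_COUNT));
    }
}
```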
Now let’s create a second member by running the above code again. This creates a cluster with two
members. This is also where backups of entries are created - remember the backup partitions
mentioned in the Hazelcast Overview section. The following illustration shows two members and
how the data and its backups are distributed.
As you see, when a new member joins the cluster, it takes ownership and loads some of the data in
the cluster. Eventually, it will carry almost "(1/n * total-data) + backups" of the data, reducing the
load on other members.
All ConcurrentMap operations such as put and remove might wait if the key is locked by another
thread in the local or remote JVM. But, they will eventually return with success. ConcurrentMap
operations never throw a java.util.ConcurrentModificationException.
Hazelcast distributes map entries onto multiple cluster members (JVMs). Each member holds some
portion of the data.
Distributed maps have one backup by default. If a member goes down, your data is recovered using
the backups in the cluster. There are two types of backups as described below: sync and async.
Creating Sync Backups
To provide data safety, Hazelcast allows you to specify the number of backup copies you want to
have. That way, data on a cluster member is copied onto other member(s).
To create synchronous backups, select the number of backup copies using the backup-count
property.
<hazelcast>
    ...
    <map name="default">
        <backup-count>1</backup-count>
    </map>
    ...
</hazelcast>
When this count is 1, a map entry will have its backup on one other member in the cluster. If you
set it to 2, then a map entry will have its backup on two other members. You can set it to 0 if you do
not want your entries to be backed up, e.g., if performance is more important than backing up. The
maximum value for the backup count is 6.
Hazelcast supports both synchronous and asynchronous backups. By default, backup operations
are synchronous and configured with backup-count. In this case, backup operations block
operations until backups are successfully copied to backup members (or deleted from backup
members in case of remove) and acknowledgements are received. Therefore, backups are updated
before a put operation is completed, provided that the cluster is stable. Sync backup operations
have a blocking cost which may lead to latency issues.
Asynchronous backups, on the other hand, do not block operations. They are fire & forget and do
not require acknowledgements; the backup operations are performed at some point in time.
To create asynchronous backups, select the number of async backups with the async-backup-count
property. An example is shown below.
<hazelcast>
    ...
    <map name="default">
        <backup-count>0</backup-count>
        <async-backup-count>1</async-backup-count>
    </map>
    ...
</hazelcast>
Backups increase memory usage since they are also kept in memory.
A map can have both sync and async backups at the same time.
By default, Hazelcast has one sync backup copy. If backup-count is set to more than 1, then each
member will carry both owned entries and backup copies of other members. So for the
map.get(key) call, it is possible that the calling member has a backup copy of that key. By default,
map.get(key) always reads the value from the actual owner of the key for consistency.
To enable backup reads (read local backup entries), set the value of the read-backup-data property
to true. Its default value is false for consistency. Enabling backup reads can improve performance,
but it can also cause stale reads while still preserving the monotonic-reads property.
<hazelcast>
    ...
    <map name="default">
        <backup-count>0</backup-count>
        <async-backup-count>1</async-backup-count>
        <read-backup-data>true</read-backup-data>
    </map>
    ...
</hazelcast>
This feature is available when there is at least one sync or async backup.
Please note that if you are performing a read from a backup, you should take into account that your
hits to the keys in the backups are not reflected as hits to the original keys on the primary
members. This has an impact on IMap’s maximum idle seconds or time-to-live seconds expiration.
Therefore, even though there is a hit on a key in backups, your original key on the primary member
may expire.
Starting with Hazelcast 3.7, Hazelcast Map uses a new eviction mechanism which
is based on the sampling of entries. See the Eviction Algorithm section for details.
Unless you delete the map entries manually or use an eviction policy, they will remain in the map.
Hazelcast supports policy-based eviction for distributed maps. Currently supported policies are LRU
(Least Recently Used) and LFU (Least Frequently Used).
Hazelcast Map performs eviction based on partitions. For example, when you specify a size using
the PER_NODE attribute for max-size (see the Configuring Map Eviction section), Hazelcast internally
calculates the maximum size for every partition. Hazelcast uses the following equation to calculate
the maximum size of a partition:
partition-maximum-size = max-size * member-count / partition-count
The eviction process starts according to this calculated partition maximum size when you try to put
an entry. When entry count in that partition exceeds partition maximum size, eviction starts on
that partition.
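The equation and the trigger condition can be sketched as follows. This is plain Java with hypothetical names and is not Hazelcast's eviction code. The figures in main (a max-size of 20000, one member, 200 partitions holding 100 entries each) are assumed for illustration, and integer division is an assumption about rounding.

```java
// Hypothetical calculator for the per-partition size limit under PER_NODE.
final class PartitionMaxSize {

    // partition-maximum-size = max-size * member-count / partition-count
    static int partitionMaxSize(int maxSizePerNode, int memberCount, int partitionCount) {
        return (int) ((long) maxSizePerNode * memberCount / partitionCount);
    }

    // Eviction starts when a put would push the partition past its maximum size.
    static boolean evictionNeededOnPut(int entriesInPartition, int partitionMaxSize) {
        return entriesInPartition + 1 > partitionMaxSize;
    }

    public static void main(String[] args) {
        int limit = partitionMaxSize(20000, 1, 200);
        System.out.println(limit);                           // 100
        System.out.println(evictionNeededOnPut(100, limit)); // true
    }
}
```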
The total number of entries here is 20000 (partition count * entry count for each partition). This
means you are at the eviction threshold since you set the max-size to 20000. When you try to put an
entry:
As a result of this eviction process, when you check the size of your map, it is 19999. After this
eviction, subsequent put operations do not trigger the next eviction until the map size is again close
to the max-size.
The above scenario is simply an example that describes how the eviction process
works. Hazelcast finds the optimal number of entries to be evicted
according to your cluster size and selected policy.
<hazelcast>
    ...
    <map name="default">
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <eviction eviction-policy="LRU" max-size-policy="PER_NODE" size="5000"/>
    </map>
    ...
</hazelcast>
The following are the configuration element descriptions:
• time-to-live-seconds: Maximum time in seconds for each entry to stay in the map (TTL). It
limits the lifetime of the entries relative to the time of the last write access performed on them.
If it is not 0, the entries whose lifetime exceeds this period (without any write access performed
on them during this period) are expired and evicted automatically. An individual entry may
have its own lifetime limit by using one of the methods accepting a TTL; see Evicting Specific
Entries section. If there is no TTL value provided for the individual entry, it inherits the value
set for this element. Valid values are integers between 0 and Integer.MAX_VALUE. Its default value
is 0, which means infinite (no expiration and eviction). If it is not 0, entries are evicted
regardless of the set eviction-policy described below.
• max-idle-seconds: Maximum time in seconds for each entry to stay idle in the map. It limits the
lifetime of the entries relative to the time of the last read or write access performed on them.
The entries whose idle period exceeds this limit are expired and evicted automatically. An entry
is idle if no get, put, EntryProcessor.process or containsKey is called on it. Valid values are
integers between 0 and Integer.MAX_VALUE. Its default value is 0, which means infinite.
Setting this property to 1 second expires the entry after 1 second, regardless of
the operations done on that entry in-between, due to the loss of millisecond
resolution on the entry timestamps. Assume that you create a record at time =
1 second (1000 milliseconds) and access it at wall clock time 1100 milliseconds
and then again at 1400 milliseconds. In this case, the entry is deemed as not
touched. So, setting this property to 1 second is not supported.
• eviction-policy: Eviction policy to be applied when the size of map grows larger than the value
specified by the size element described below. Valid values are:
◦ LRU: Least Recently Used.
◦ LFU: Least Frequently Used.
◦ NONE: Default policy. If set, no items are evicted and the property size described below is
ignored. You still can combine it with time-to-live-seconds and max-idle-seconds.
Apart from the above values, you can also develop and use your own eviction policy. See the
Custom Eviction Policy section.
• size: Maximum size of the map. When maximum size is reached, the map is evicted based on
the policy defined. Valid values are integers between 0 and Integer.MAX_VALUE. Its default value
is 0, which means infinite. If you want size to work, set the eviction-policy property to a value
other than NONE. How size is interpreted depends on the max-size-policy attribute, whose values
are described below.
◦ PER_NODE: Maximum number of map entries in each cluster member. This is the default
policy.
◦ PER_PARTITION: Maximum number of map entries within each partition. Storage size depends
on the partition count in a cluster member. This policy should be used sparingly. For
instance, avoid using it with a small cluster. In a small cluster, each member hosts more
partitions, and therefore more map entries, than a member of a larger cluster. Thus, for a small
cluster, eviction of the entries decreases performance (the number of entries is large).
◦ USED_HEAP_SIZE: Maximum used heap size in megabytes per map for each Hazelcast instance.
Please note that this policy does not work when in-memory format is set to OBJECT, since the
memory footprint cannot be determined when data is put as OBJECT.
◦ USED_HEAP_PERCENTAGE: Maximum used heap size percentage per map for each Hazelcast
instance. If, for example, a JVM is configured to have 1000 MB and this value is 10, then the
map entries will be evicted when used heap size exceeds 100 MB. Please note that this policy
does not work when in-memory format is set to OBJECT, since the memory footprint cannot
be determined when data is put as OBJECT.
◦ FREE_HEAP_PERCENTAGE: Minimum free heap size percentage for each JVM. If, for example, a
JVM is configured to have 1000 MB and this value is 10, then the map entries will be evicted
when free heap size is below 100 MB.
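The two percentage-based thresholds from the examples above can be checked with simple arithmetic. The sketch below is plain Java with hypothetical names; it only restates the 1000 MB and 10 percent examples and does not measure real heap usage.

```java
// Hypothetical threshold checks for the heap percentage policies.
final class HeapThresholds {

    // USED_HEAP_PERCENTAGE: evict when the heap used by the map exceeds
    // the given percentage of the configured heap.
    static boolean exceedsUsedHeapPercentage(long usedByMapMb, long maxHeapMb, int percentage) {
        return usedByMapMb * 100 > maxHeapMb * percentage;
    }

    // FREE_HEAP_PERCENTAGE: evict when the free heap falls below
    // the given percentage of the configured heap.
    static boolean belowFreeHeapPercentage(long freeHeapMb, long maxHeapMb, int percentage) {
        return freeHeapMb * 100 < maxHeapMb * percentage;
    }

    public static void main(String[] args) {
        System.out.println(exceedsUsedHeapPercentage(101, 1000, 10)); // true
        System.out.println(belowFreeHeapPercentage(99, 1000, 10));    // true
    }
}
```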
To put it briefly, Hazelcast maps have no restrictions on the size and may grow arbitrarily large, by
default. When it comes to reducing the size of a map, there are two concepts: expiration and
eviction.
Expiration puts a limit on the maximum lifetime of an entry stored inside the map. When the entry
expires it cannot be retrieved from the map any longer and at some point in time it will be cleaned
out from the map to free up the memory. Expiration, and hence the eviction based on the
expiration, can be configured using the elements time-to-live-seconds and max-idle-seconds as
described above.
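The note on max-idle-seconds earlier in this section says that a 1-second idle limit is not supported because entry timestamps lose their millisecond part. A minimal sketch of that effect follows, assuming that timestamps are truncated to whole seconds; the names are hypothetical and the model is not taken from Hazelcast's source.

```java
// Hypothetical model of timestamps kept with whole-second resolution.
final class IdleResolution {

    // Drops the millisecond part of a timestamp.
    static long toSeconds(long millis) {
        return millis / 1000;
    }

    // An access changes the recorded timestamp only if it falls into a later second.
    static boolean accessIsVisible(long recordedMillis, long accessMillis) {
        return toSeconds(accessMillis) > toSeconds(recordedMillis);
    }

    public static void main(String[] args) {
        // Created at 1000 ms, accessed at 1100 ms and 1400 ms: same second.
        System.out.println(accessIsVisible(1000, 1100)); // false
        System.out.println(accessIsVisible(1000, 1400)); // false
    }
}
```

In this model both accesses fall into the same second as the creation time, so the entry looks untouched, which is the situation the note describes.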
Eviction puts a limit on the maximum size of the map. If the size of the map grows larger than the
maximum allowed size, an eviction policy decides which item to evict from the map to reduce its
size. The maximum allowed size can be configured using the element size and the eviction policy
can be configured using the element eviction-policy as described above.
Eviction and expiration can be used together. In this case, the expiration configurations
(time-to-live-seconds and max-idle-seconds) continue to work as usual, cleaning out the expired
entries regardless of the map size. Note that locked map entries are not subject to eviction or
expiration.
<hazelcast>
    ...
    <map name="documents">
        <eviction eviction-policy="LRU" max-size-policy="PER_NODE" size="10000"/>
        <max-idle-seconds>60</max-idle-seconds>
    </map>
    ...
</hazelcast>
In the above example, documents map starts to evict its entries from a member when the map size
exceeds 10000 in that member. Then the entries least recently used will be evicted. The entries not
used for more than 60 seconds will be evicted as well.
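To make the LRU idea concrete, the following sketch builds a strict LRU map on top of java.util.LinkedHashMap. It is a stand-in for illustration only: the class name is hypothetical, it is local rather than distributed, and it evicts the exact least recently used entry, whereas Hazelcast's eviction is based on sampling of entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical local map that evicts the least recently used entry
// once it holds more than maxSize entries.
class LruSketch<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    LruSketch(int maxSize) {
        super(16, 0.75f, true); // true = keep entries in access order
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // drop the least recently used entry
    }

    public static void main(String[] args) {
        LruSketch<String, String> documents = new LruSketch<>(2);
        documents.put("a", "1");
        documents.put("b", "2");
        documents.get("a");      // "a" becomes the most recently used entry
        documents.put("c", "3"); // exceeds the limit, so "b" is evicted
        System.out.println(documents.keySet()); // [a, c]
    }
}
```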
And the following is an example eviction configuration for a map having NATIVE as the in-memory
format:
<hazelcast>
    ...
    <map name="nativeMap*">
        <in-memory-format>NATIVE</in-memory-format>
        <eviction max-size-policy="USED_NATIVE_MEMORY_PERCENTAGE" eviction-policy="LFU" size="99"/>
    </map>
    ...
</hazelcast>
Evicting Specific Entries
The eviction policies and configurations explained above apply to all the entries of a map. The
entries that meet the specified eviction conditions are evicted.
If you want to evict some specific map entries, you can use the ttl and ttlUnit parameters of the
method map.put(), e.g., myMap.put("1", value, 50, TimeUnit.SECONDS).
The map entry with the key "1" will be evicted 50 seconds after it is put into myMap.
You may also use the map.setTTL method to alter the time-to-live value of an existing entry.
In addition to the ttl, you may also specify a maximum idle timeout for specific map entries using
the maxIdle and maxIdleUnit parameters, e.g.,
myMap.put("1", value, 50, TimeUnit.SECONDS, 40, TimeUnit.SECONDS).
Here ttl is set as 50 seconds and maxIdle is set as 40 seconds. The entry is evicted as soon as
at least one of these policies marks it as expired. If you want to specify only the maxIdle parameter,
you need to set ttl as 0 seconds.
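The combined rule can be sketched as a small predicate. This is a plain-Java model with hypothetical names, not Hazelcast's expiration code. It assumes that ttl counts from the last write, that maxIdle counts from the last read or write, and that a limit of 0 means no limit.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of per-entry expiration with ttl and maxIdle.
final class EntryExpiry {

    // lastAccessMs is the time of the last read or write, whichever is later.
    // A limit of 0 disables the corresponding check.
    static boolean isExpired(long nowMs, long lastWriteMs, long lastAccessMs,
                             long ttlMs, long maxIdleMs) {
        boolean ttlExpired = ttlMs > 0 && nowMs - lastWriteMs >= ttlMs;
        boolean idleExpired = maxIdleMs > 0 && nowMs - lastAccessMs >= maxIdleMs;
        return ttlExpired || idleExpired;
    }

    public static void main(String[] args) {
        long ttl = TimeUnit.SECONDS.toMillis(50);
        long maxIdle = TimeUnit.SECONDS.toMillis(40);
        // Written at 0 s and never read again: idle limit is hit at 40 s.
        System.out.println(isExpired(45_000, 0, 0, ttl, maxIdle));      // true
        // Read at 30 s: still alive at 45 s, expired by ttl at 50 s.
        System.out.println(isExpired(45_000, 0, 30_000, ttl, maxIdle)); // false
        System.out.println(isExpired(50_000, 0, 30_000, ttl, maxIdle)); // true
    }
}
```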
To evict all keys from the map except the locked ones, use the method evictAll(). If a MapStore is
defined for the map, deleteAll is not called by evictAll. If you want to call the method deleteAll,
use clear().
final int numberOfKeysToLock = 4;
final int numberOfEntriesToAdd = 1000;
Forced Eviction
Hazelcast may use forced eviction in the cases when the eviction explained in Understanding Map
Eviction is not enough to free up your memory. Note that this is valid if you are using Hazelcast
IMDG Enterprise and you set your in-memory format to NATIVE.
The forced eviction mechanism is explained below as steps in the given order:
• When the normal eviction is not enough, forced eviction is triggered and first it tries to evict
approx. 20% of the entries from the current partition. It retries this five times.
• If the result of above step is still not enough, forced eviction applies the above step to all maps.
This time it might perform eviction from some other partitions too, provided that they are
owned by the same thread.
• If that is still not enough to free up your memory, it evicts not the 20% but all the entries from
the current partition.
• If that is not enough, it will evict all the entries from the other data structures; from the
partitions owned by the local thread.
Finally, when all the above steps are not enough, Hazelcast throws a native OutOfMemoryException.
When you have an evictable cache/map, you should be able to put entries into it safely without
facing any memory shortages. Forced eviction helps to achieve this. Regular eviction removes one
entry at a time, while forced eviction can remove multiple entries, which may even be owned by
other caches/maps.
Apart from the policies such as LRU and LFU, which Hazelcast provides out-of-the-box, you can
develop and use your own eviction policy.
instance.shutdown();
}
    /**
     * Odd evictor tries to evict odd keys first.
     */
    private static class OddEvictor
            implements MapEvictionPolicyComparator<Integer, Integer> {
        @Override
        public int compare(EntryView<Integer, Integer> e1,
                           EntryView<Integer, Integer> e2) {
            return 0;
        }
    }
}
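The ordering that an "odd keys first" policy aims for can be shown with a plain java.util.Comparator over the keys. This sketch is an assumption-laden illustration: it compares bare Integer keys instead of EntryView objects, the names are hypothetical, and it assumes that the element that sorts first is the preferred eviction candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator that prefers odd keys as eviction candidates.
final class OddFirst {

    // Orders odd keys before even keys; keys of the same parity compare as equal.
    static final Comparator<Integer> ODD_KEYS_FIRST = (k1, k2) -> {
        boolean odd1 = k1 % 2 != 0;
        boolean odd2 = k2 % 2 != 0;
        if (odd1 == odd2) {
            return 0;
        }
        return odd1 ? -1 : 1;
    };

    // Returns the keys in the order in which they would be considered for eviction.
    static List<Integer> evictionOrder(List<Integer> keys) {
        List<Integer> sorted = new ArrayList<>(keys);
        sorted.sort(ODD_KEYS_FIRST); // stable sort keeps the original order within a parity
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(evictionOrder(Arrays.asList(1, 2, 3, 4))); // [1, 3, 2, 4]
    }
}
```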
Then you can enable your policy by setting it via the method
MapConfig.getEvictionConfig().setComparatorClassName() programmatically or via XML
declaratively. The following is an example declarative configuration for the eviction policy
OddEvictor implemented above:
<hazelcast>
    ...
    <map name="test">
        ...
        <eviction comparator-class-name="com.mycompany.OddEvictor"/>
        ...
    </map>
</hazelcast>
If you use Hazelcast with Spring, you can enable your policy as shown below.
<hz:map name="test">
    <hz:map-eviction comparator-class-name="com.mycompany.OddEvictor"/>
</hz:map>
7.2.5. Setting In-Memory Format
IMap (and a few other Hazelcast data structures, such as ICache) has an in-memory-format
configuration option. By default, Hazelcast stores data into memory in binary (serialized) format.
Sometimes it can be efficient to store the entries in their object form, especially in cases of local
processing, such as entry processor and queries.
Specify the in-memory-format element in the configuration to set how the data will be stored in the
memory. You have the following format options:
• BINARY (default): The data (both the key and value) is stored in serialized binary format. You can
use this option if you mostly perform regular map operations, such as put and get.
• OBJECT: The data is stored in deserialized form. This configuration is good for maps where entry
processing and queries form the majority of all operations and the objects are complex, making
the serialization cost comparatively high. By storing objects, entry processing does not contain
the deserialization cost. Note that when you use OBJECT as the in-memory format, the key is still
stored in binary format and the value is stored in object format.
• NATIVE: (Hazelcast IMDG Enterprise HD) This format behaves the same as BINARY, however,
instead of heap memory, key and value are stored in the off-heap memory.
Regular operations like get rely on the object instance. When the OBJECT format is used and a get is
performed, the map does not return the stored instance, but creates a clone. Therefore, this whole
get operation first includes a serialization on the member owning the instance and then a
deserialization on the member calling the instance. When the BINARY format is used, only a
deserialization is required; BINARY is faster.
Similarly, a put operation is faster when the BINARY format is used. If the format was OBJECT, the map
would create a clone of the instance, and there would first be a serialization and then a
deserialization. When BINARY is used, only a deserialization is needed.
If a value is stored in OBJECT format, a change on a returned value does not affect
the stored instance. In this case, the returned instance is not the actual one but a
clone. Therefore, changes made on an object after it is returned will not reflect on
the actual stored data. Similarly, when a value is written to a map and the value is
stored in OBJECT format, it will be a copy of the put value. Therefore, changes made
on the object after it is stored will not reflect on the stored data.
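The copy behavior described in this note can be illustrated with a minimal plain-Java sketch. It does not use the Hazelcast API: the class CloneSemanticsSketch, its Person type and the copyOf helper are hypothetical names, and a serialization round trip stands in for the copy that the map makes on put and get.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of copy-on-put and copy-on-get; this is not the Hazelcast API.
final class CloneSemanticsSketch {

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;

        Person(String name) {
            this.name = name;
        }
    }

    private final Map<String, Person> store = new HashMap<>();

    // Serializes and deserializes the value to obtain an independent copy.
    static Person copyOf(Person value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The stored instance is a copy of the value passed in.
    void put(String key, Person value) {
        store.put(key, copyOf(value));
    }

    // The returned instance is a copy of the stored value.
    Person get(String key) {
        Person stored = store.get(key);
        return stored == null ? null : copyOf(stored);
    }

    public static void main(String[] args) {
        CloneSemanticsSketch map = new CloneSemanticsSketch();
        Person original = new Person("Alice");
        map.put("1", original);
        original.name = "changed after put";       // does not reach the stored copy
        map.get("1").name = "changed after get";   // modifies a returned copy only
        System.out.println(map.get("1").name);     // prints "Alice"
    }
}
```

In this model, the only way to change the stored data is to put the modified value again.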
Hazelcast instances are Java programs. In case of BINARY and OBJECT in-memory formats, Hazelcast
stores your distributed data into the heap of its server instances. Java heap is subject to garbage
collection (GC). In case of larger heaps, garbage collection might cause your application to pause for
tens of seconds (even minutes for really large heaps), badly affecting your application performance
and response times.
As the data gets bigger, you either run the application with a larger heap, which results in
longer GC pauses, or run multiple instances with smaller heaps, which can turn into an operational
nightmare if the number of such instances becomes very high.
To overcome this challenge, Hazelcast offers High-Density Memory Store for your maps. You can
configure your map to use High-Density Memory Store by setting the in-memory format to NATIVE.
The following snippet is the declarative configuration example.
<hazelcast>
...
<map name="nativeMap*">
<in-memory-format>NATIVE</in-memory-format>
</map>
...
</hazelcast>
Keep in mind that you should have already enabled the High-Density Memory Store usage for your
cluster. See the Configuring High-Density Memory Store section.
You can also benefit from the persistent memory technologies such as Intel® Optane™ DC to be
used by the High-Density Memory Store. See the Using Persistent Memory section.
Note that the eviction mechanism is different for NATIVE in-memory format. The new eviction
algorithm for map with High-Density Memory Store is similar to that of JCache with High-Density
Memory Store and is described here.
<hazelcast>
...
<map name="nativeMap*">
<in-memory-format>NATIVE</in-memory-format>
<eviction-percentage>25</eviction-percentage> <!-- NO IMPACT with NATIVE -->
</map>
...
</hazelcast>
• These IMap eviction policies for size cannot be used: FREE_HEAP_PERCENTAGE, FREE_HEAP_SIZE,
USED_HEAP_PERCENTAGE, USED_HEAP_SIZE.
Hazelcast IMap offers automatic preprocessing of various data types at update time to make
queries faster. It is currently supported only for the HazelcastJsonValue type. When metadata
creation is on, IMap creates additional metadata about the objects of supported types and uses this
metadata during querying. It does not affect the latency or throughput of operations on objects of
unsupported types.
This feature is on by default. You can configure it using the metadata-policy configuration element.
Declarative Configuration:
<hazelcast>
...
<map name="map-a">
<!--
valid values for metadata-policy are:
- OFF
- CREATE_ON_UPDATE (default)
-->
<metadata-policy>OFF</metadata-policy>
</map>
...
</hazelcast>
Programmatic Configuration:
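A minimal sketch of the programmatic equivalent, assuming the Hazelcast IMDG 4.x configuration API (MapConfig.setMetadataPolicy() and the MetadataPolicy enum). The wrapper class MetadataPolicyConfigSketch and its build() method are hypothetical names used only for illustration:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MetadataPolicy;

final class MetadataPolicyConfigSketch {

    static Config build() {
        Config config = new Config();
        MapConfig mapConfig = config.getMapConfig("map-a");
        // Turns off metadata creation for this map; CREATE_ON_UPDATE is the default.
        mapConfig.setMetadataPolicy(MetadataPolicy.OFF);
        return config;
    }
}
```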
Hazelcast allows you to load and store the distributed map entries from/to a persistent data store
such as a relational database. To do this, you can use Hazelcast’s MapStore and MapLoader interfaces.
When you provide a MapLoader implementation and request an entry (IMap.get()) that does not exist
in memory, MapLoader's load method loads that entry from the data store. This loaded entry is placed
into the map and will stay there until it is removed or evicted.
When a MapStore implementation is provided, an entry is also put into a user-defined data store.
The data store needs to be a centralized system that is accessible from all Hazelcast
members. Persistence to a local file system is not supported.
Also note that the MapStore interface extends the MapLoader interface, as you can see
in the interface code.
Starting with Hazelcast IMDG 3.11, you can listen to all loads via
EntryLoadedListener.
Following is a MapStore example.
public PersonMapStore() {
try {
con = DriverManager.getConnection("jdbc:hsqldb:mydatabase", "SA", "");
con.createStatement().executeUpdate(
"create table if not exists person (id bigint not null, name "
+ "varchar(45), primary key (id))");
allKeysStatement = con.prepareStatement("select id from person");
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
public synchronized Person load(Long key) {
try {
ResultSet resultSet = con.createStatement().executeQuery(
format("select name from person where id =%s", key));
try {
if (!resultSet.next()) {
return null;
}
String name = resultSet.getString(1);
return new Person(key, name);
} finally {
resultSet.close();
}
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
During the initial loading process, MapStore uses a thread different from the
partition threads that are used by the ExecutorService. After the initialization is
completed, the map.get method looks up any nonexistent value from the database
in a partition thread, or the map.put method looks up the database to return the
previously associated value for a key also in a partition thread.
Entries loaded by MapLoader do not have a set time-to-live property. Therefore, they live until evicted
or explicitly removed. It is possible to enforce time-to-live on the entries by using EntryLoader.
EntryLoader allows you to set time-to-live values per key before handing the values to Hazelcast.
Therefore, you can store and load key specific time-to-live values in the external storage.
Similar to EntryLoader, in order to store custom expiration times associated with the entries, you
may use EntryStore. EntryStore allows you to retrieve associated expiration date for each entry. The
expiration date is an offset from an epoch in milliseconds. Epoch is January 1, 1970 UTC which is
used by System.currentTimeMillis().
Although the expiration date is expressed in milliseconds, IMap has second
granularity when it comes to expiration. Therefore, the expiration date is rounded
to the nearest lower whole second.
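As a worked example of this rounding, the following sketch truncates an epoch-millisecond timestamp to the start of its second. It only illustrates the arithmetic and is not Hazelcast's internal implementation; the class and method names are hypothetical.

```java
final class ExpirationRounding {

    // Rounds an epoch-millisecond timestamp down to the nearest whole second.
    static long floorToSecond(long epochMillis) {
        return Math.floorDiv(epochMillis, 1000L) * 1000L;
    }

    public static void main(String[] args) {
        // 999 ms into the second is rounded down, not up.
        System.out.println(floorToSecond(1_600_000_000_999L)); // prints 1600000000000
    }
}
```

An expiration date that is already on a second boundary is left unchanged.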
EntryLoader and EntryStore extend from MapLoader and MapStore, respectively. Therefore, all features
and configuration parameters of MapLoader and MapStore apply to them, too.
public PersonEntryStore() {
try {
con = DriverManager.getConnection("jdbc:hsqldb:mydatabase", "SA", "");
// A hyphen is not valid in an unquoted SQL identifier, so the column is named expiration_date.
con.createStatement().executeUpdate(
"create table if not exists person (id bigint not null, name "
+ "varchar(45), expiration_date bigint, primary key (id))");
allKeysStatement = con.prepareStatement("select id from person");
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
@Override
public synchronized void delete(Long key) {
System.out.println("Delete:" + key);
try {
con.createStatement().executeUpdate(
format("delete from person where id = %s", key));
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
@Override
public synchronized void store(Long key, MetadataAwareValue<Person> value) {
try {
// Persists the expiration time together with the value.
con.createStatement().executeUpdate(
format("insert into person values(%s,'%s', %d)", key, value
.getValue().getName(), value.getExpirationTime()));
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
@Override
public void storeAll(Map<Long, MetadataAwareValue<Person>> map) {
for (Map.Entry<Long, MetadataAwareValue<Person>> entry : map.entrySet()) {
store(entry.getKey(), entry.getValue());
}
}
@Override
public synchronized void deleteAll(Collection<Long> keys) {
for (Long key : keys) {
delete(key);
}
}
@Override
public synchronized MetadataAwareValue<Person> load(Long key) {
try {
ResultSet resultSet = con.createStatement().executeQuery(
format("select name,expiration_date from person where id =%s",
key));
try {
if (!resultSet.next()) {
return null;
}
String name = resultSet.getString(1);
Long expirationDate = resultSet.getLong(2);
// Hands the stored expiration time back to Hazelcast along with the value.
return new MetadataAwareValue<>(new Person(key, name), expirationDate
);
} finally {
resultSet.close();
}
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
@Override
public synchronized Map<Long, MetadataAwareValue<Person>> loadAll(Collection<Long>
keys) {
Map<Long, MetadataAwareValue<Person>> result = new HashMap<>();
for (Long key : keys) {
result.put(key, load(key));
}
return result;
}
Hazelcast supports read-through, write-through and write-behind persistence modes, which are
explained in the subsections below.
If an entry does not exist in memory when an application asks for it, Hazelcast asks the loader
implementation to load that entry from the data store. If the entry exists there, the loader
implementation gets it, hands it to Hazelcast, and Hazelcast puts it into memory. This is read-
through persistence mode.
As described in the introduction of this section, the IMap.get() method triggers the
load() method in your MapLoader implementation if an entry does not exist in memory. In this
case, note that the IMap.get() method does not create backup copies for such entries, when the
mode is read-through persistence: there is no need for backups for these entries since if the
primary entry is lost, then a read for the key triggers the load() method and loads the entry from
the persistence layer.
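The read-through flow can be modeled in plain Java as follows. This is a sketch that does not use the Hazelcast API: ReadThroughSketch, loadCalls and evict are hypothetical names, and the loader function stands in for MapLoader.load(key).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of read-through; this is not the Hazelcast implementation.
final class ReadThroughSketch<K, V> {

    private final Map<K, V> memory = new HashMap<>();
    private final Function<K, V> loader; // stands in for MapLoader.load(key)
    int loadCalls;                       // how often the data store was consulted

    ReadThroughSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            // Not in memory: ask the loader and keep the result in memory.
            loadCalls++;
            value = loader.apply(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Simulates losing the in-memory copy of an entry.
    void evict(K key) {
        memory.remove(key);
    }

    public static void main(String[] args) {
        Map<Long, String> dataStore = new HashMap<>();
        dataStore.put(1L, "Alice");
        ReadThroughSketch<Long, String> map = new ReadThroughSketch<>(dataStore::get);
        map.get(1L);                       // loaded from the data store
        map.get(1L);                       // served from memory
        map.evict(1L);
        map.get(1L);                       // loaded again after the copy was lost
        System.out.println(map.loadCalls); // prints 2
    }
}
```

The second load after evict() mirrors why no backup is needed for such entries: a lost entry is simply loaded again on the next read.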
• In-Memory backup copies are successfully created on other cluster members (if backup-count is
greater than 0).
If MapStore throws an exception then the exception is propagated to the original put or remove call in
the form of RuntimeException.
You can configure MapStore as write-behind by setting the write-delay-seconds property to a value
bigger than 0. This means the modified entries will be put to the data store asynchronously after a
configured delay.
When you set MapStoreConfig.setWriteCoalescing() to false and the per-node maximum
write-behind queue capacity is reached, subsequent put operations fail
with ReachedMaxSizeException. This exception is thrown to prevent uncontrolled
growth of the write-behind queues. You can set the per-node maximum capacity using the
system property hazelcast.map.write.behind.queue.capacity. See the System
Properties appendix for information on this property and how to set the system
properties.
• in-memory backup copies are successfully created on the other cluster members (if backup-
count is greater than 0)
• the entry is marked as dirty so that after write-delay-seconds, it can be persisted with
MapStore.store(key,value) call
• and for fault tolerance, dirty entries are stored in a queue on the primary member and also on a
backup member.
The same behavior applies to map.remove(key); the only difference is that MapStore.delete(key) is
called when the entry is deleted.
If MapStore throws an exception, then Hazelcast tries to store the entry again. If the entry still
cannot be stored, a log message is printed and the entry is re-queued.
For batch write operations, which are only allowed in write-behind mode, Hazelcast calls the
MapStore.storeAll(map) and MapStore.deleteAll(collection) methods to do all writes in a single call.
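The effect of write coalescing on the write-behind queue can be sketched in plain Java. This is a simplified model, not the Hazelcast implementation: WriteBehindSketch and its members are hypothetical names, flush() stands in for the delayed asynchronous write, and IllegalStateException stands in for ReachedMaxSizeException. It assumes that, with coalescing enabled, only the last pending update for a key is written.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a write-behind queue; this is not the Hazelcast implementation.
final class WriteBehindSketch {

    private final boolean coalescing;
    private final int capacity;                              // hypothetical per-member limit
    private final List<String[]> dirty = new ArrayList<>();  // queued {key, value} pairs
    final List<String> stored = new ArrayList<>();           // writes that reached the data store

    WriteBehindSketch(boolean coalescing, int capacity) {
        this.coalescing = coalescing;
        this.capacity = capacity;
    }

    void put(String key, String value) {
        if (coalescing) {
            // Keep only the latest pending update for this key.
            dirty.removeIf(entry -> entry[0].equals(key));
        } else if (dirty.size() >= capacity) {
            // Stands in for ReachedMaxSizeException.
            throw new IllegalStateException("write-behind queue is full");
        }
        dirty.add(new String[] {key, value});
    }

    // Stands in for the asynchronous write that happens after write-delay-seconds.
    void flush() {
        for (String[] entry : dirty) {
            stored.add(entry[0] + "=" + entry[1]);
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        WriteBehindSketch coalesced = new WriteBehindSketch(true, 10);
        coalesced.put("a", "1");
        coalesced.put("a", "2");
        coalesced.flush();
        System.out.println(coalesced.stored); // prints [a=2]
    }
}
```

With coalescing disabled, both updates to the key would be written, and the queue would reject new entries once it holds as many pending writes as its capacity.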
<hazelcast>
...
<map name="default">
<map-store enabled="true" initial-mode="LAZY">
<class-name>com.hazelcast.examples.DummyStore</class-name>
<write-delay-seconds>60</write-delay-seconds>
<write-batch-size>1000</write-batch-size>
<write-coalescing>true</write-coalescing>
</map-store>
</map>
...
</hazelcast>
The following are the descriptions of MapStore configuration elements and attributes:
• write-batch-size: Used to create batch chunks when writing to the map store. In default mode,
Hazelcast tries to write all map entries in one go. To create batch chunks, the minimum meaningful
value for write-batch-size is 2. For values smaller than 2, it works as in default mode.
• enabled: True to enable this map-store, false to disable. Its default value is true.
• initial-mode: Sets the initial load mode. LAZY is the default load mode, where load is
asynchronous. EAGER means load is blocked till all partitions are loaded. See the Initializing
Map on Startup section for more details.
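The chunking rule given for write-batch-size above can be illustrated with a short plain-Java sketch. It models only the splitting of pending entries into chunks; WriteBatchChunks and chunk() are hypothetical names and not part of the Hazelcast API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WriteBatchChunks {

    // Splits the pending entries into chunks of at most writeBatchSize elements.
    // Values smaller than 2 fall back to a single chunk with all entries.
    static <T> List<List<T>> chunk(List<T> entries, int writeBatchSize) {
        List<List<T>> chunks = new ArrayList<>();
        if (entries.isEmpty()) {
            return chunks;
        }
        if (writeBatchSize < 2) {
            chunks.add(new ArrayList<>(entries));
            return chunks;
        }
        for (int from = 0; from < entries.size(); from += writeBatchSize) {
            int to = Math.min(from + writeBatchSize, entries.size());
            chunks.add(new ArrayList<>(entries.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```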
A configuration can be applied to more than one map using wildcards (see Using Wildcards),
meaning that the configuration is shared among the maps. But MapStore does not know which
entries to store when there is one configuration applied to multiple maps.
To store entries when there is one configuration applied to multiple maps, use Hazelcast’s
MapStoreFactory interface. Using the MapStoreFactory interface, MapStores for each map can be
created when a wildcard configuration is used. Example code is shown below.
Config config = new Config();
MapConfig mapConfig = config.getMapConfig( "*" );
MapStoreConfig mapStoreConfig = mapConfig.getMapStoreConfig();
mapStoreConfig.setFactoryImplementation( new MapStoreFactory<Object, Object>() {
@Override
public MapLoader<Object, Object> newMapStore( String mapName, Properties
properties ) {
return null;
}
});
To initialize the MapLoader implementation with the given map name, configuration properties and
the Hazelcast instance, implement the MapLoaderLifecycleSupport interface. This interface has the
methods init() and destroy().
The method init() initializes the MapLoader implementation. Hazelcast calls this method when the
map is first used on the Hazelcast instance. The MapLoader implementation can initialize the
required resources for implementing MapLoader such as reading a configuration file or creating a
database connection.
Hazelcast calls the method destroy() before shutting down. You can override this method to
clean up the resources held by this MapLoader implementation, such as closing the database
connections.
To pre-populate the in-memory map when the map is first touched/used, use the
MapLoader.loadAllKeys API.
The InitialLoadMode configuration parameter in the class MapStoreConfig has two values: LAZY and
EAGER. If InitialLoadMode is set to LAZY, data is not loaded during the map creation. If it is set to EAGER,
all the data is loaded while the map is created and everything becomes ready to use. Also, if you
add indices to your map with the IndexConfig class or the addIndex method, then InitialLoadMode is
overridden and MapStoreConfig behaves as if EAGER mode is on.
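A minimal sketch of setting the load mode programmatically, assuming the Hazelcast IMDG configuration API (MapStoreConfig.setInitialLoadMode() and the nested MapStoreConfig.InitialLoadMode enum). The wrapper class EagerLoadConfigSketch and the store class name com.hazelcast.examples.DummyStore are placeholders used only for illustration:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MapStoreConfig;

final class EagerLoadConfigSketch {

    static Config build() {
        Config config = new Config();
        MapConfig mapConfig = config.getMapConfig("default");
        MapStoreConfig mapStoreConfig = mapConfig.getMapStoreConfig();
        mapStoreConfig.setEnabled(true);
        // Placeholder class name; replace it with your own MapStore or MapLoader.
        mapStoreConfig.setClassName("com.hazelcast.examples.DummyStore");
        // Load all the data while the map is created instead of on first touch.
        mapStoreConfig.setInitialLoadMode(MapStoreConfig.InitialLoadMode.EAGER);
        return config;
    }
}
```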
1. When getMap() is first called from any member, initialization starts depending on the value of
InitialLoadMode. If it is set to EAGER, initialization starts on all partitions as soon as the map is
touched, i.e., all partitions are loaded when getMap is called. If it is set to LAZY, data is loaded
partition by partition, i.e., each partition is loaded with its first touch.
2. Hazelcast calls MapLoader.loadAllKeys() to get all your keys on one of the members.
3. That member distributes keys to all other members in batches.
4. Each member loads values of all its owned keys by calling MapLoader.loadAll(keys).
5. Each member puts its owned entries into the map by calling IMap.putTransient(key,value).
If the load mode is LAZY and the clear() method is called (which triggers
MapStore.deleteAll()), Hazelcast removes ONLY the loaded entries from your map
and datastore. Since all the data is not loaded in this case (LAZY mode), please note
that there may still be entries in your datastore.
If you do not want the MapStore to start loading as soon as the first cluster member
starts, you can use the system property hazelcast.initial.min.cluster.size. For
example, if you set its value to 3, the loading process is blocked until all three
members are completely up.
The return type of loadAllKeys() is changed from Set to Iterable with the release
of Hazelcast 3.5. MapLoader implementations from previous releases are also
supported and do not need to be adapted.
If the number of keys to load is large, it is more efficient to load them incrementally rather than
loading them all at once. To support incremental loading, the MapLoader.loadAllKeys() method
returns an Iterable which can be lazily populated with the results of a database query.
Hazelcast iterates over the Iterable and, while doing so, sends out the keys to their respective
owner members. The Iterator obtained from MapLoader.loadAllKeys() may also implement the
Closeable interface, in which case Iterator is closed once the iteration is over. This is intended for
releasing resources such as closing a JDBC result set.
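The following plain-Java sketch shows the shape of such an Iterable: keys are produced on demand, and the Iterator also implements Closeable so that the consumer can release resources when the iteration is over. LazyKeys, KeyIterator and consume() are hypothetical names; a counter stands in for a database cursor, and consume() stands in for the side that iterates over the keys.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Lazily produces keys 0..count-1; a counter stands in for a database cursor.
final class LazyKeys implements Iterable<Long> {

    private final long count;
    boolean closed; // set when the iterator releases its "resource"

    LazyKeys(long count) {
        this.count = count;
    }

    @Override
    public Iterator<Long> iterator() {
        return new KeyIterator();
    }

    final class KeyIterator implements Iterator<Long>, Closeable {
        private long cursor;

        @Override
        public boolean hasNext() {
            return cursor < count;
        }

        @Override
        public Long next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return cursor++; // each key is produced only when it is requested
        }

        @Override
        public void close() {
            closed = true; // e.g. close a JDBC result set here
        }
    }

    // Models the consumer: iterate over all keys, then close the iterator if it is Closeable.
    static long consume(Iterable<Long> keys) {
        Iterator<Long> iterator = keys.iterator();
        long seen = 0;
        try {
            while (iterator.hasNext()) {
                iterator.next();
                seen++;
            }
        } finally {
            if (iterator instanceof Closeable) {
                try {
                    ((Closeable) iterator).close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        LazyKeys keys = new LazyKeys(3);
        System.out.println(consume(keys) + " keys, closed=" + keys.closed); // prints "3 keys, closed=true"
    }
}
```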
The method loadAll loads some or all keys from the map store into the map in order to optimize
multiple load operations. The method has two signatures, i.e., the same method can take two
different parameter lists. One signature loads the given keys and the other loads all keys. See the
example code below.
final int numberOfEntriesToAdd = 1000;
final String mapName = LoadAll.class.getCanonicalName();
final Config config = createNewConfig(mapName);
final HazelcastInstance node = Hazelcast.newHazelcastInstance(config);
final IMap<Integer, Integer> map = node.getMap(mapName);
populateMap(map, numberOfEntriesToAdd);
System.out.printf("# Map store has %d elements\n", numberOfEntriesToAdd);
map.evictAll();
System.out.printf("# After evictAll map size\t: %d\n", map.size());
map.loadAll(true);
System.out.printf("# After loadAll map size\t: %d\n", map.size());
In some scenarios, you may need to modify the object after storing it into the map store. For
example, you can get an ID or version auto-generated by your database and then need to modify
your object stored in the distributed map, but not to break the synchronization between the
database and the data grid.
To post-process an object in the map store, implement the PostProcessingMapStore interface to put
the modified object into the distributed map. This triggers an extra step of Serialization, so use it
only when needed. (This is only valid when using the write-through map store configuration.)
Please note that if you are using a post-processing map store in combination with
the entry processors, post-processed values will not be carried to backups.
You can prepare your own MapLoader to access a database such as Cassandra and MongoDB. For this,
you can first declaratively specify the database properties in your hazelcast.xml configuration file
and then implement the MapLoaderLifecycleSupport interface to pass those properties.
You can define the database properties, such as its URL and name, using the properties
configuration element. The following is a configuration example for MongoDB:
<hazelcast>
...
<map name="supplements">
<map-store enabled="true" initial-mode="LAZY">
<class-name>com.hazelcast.loader.YourMapStoreImplementation</class-name>
<properties>
<property name="mongo.url">mongodb://localhost:27017</property>
<property name="mongo.db">mydb</property>
<property name="mongo.collection">supplements</property>
</properties>
</map-store>
</map>
...
</hazelcast>
After you specify the database properties in your configuration, you need to implement the
MapLoaderLifecycleSupport interface and read those properties in the init() method, as shown
below:
public YourMapStoreImplementation() {
}
@Override
public void init(HazelcastInstance hazelcastInstance, Properties properties,
String mapName) {
String mongoUrl = (String) properties.get("mongo.url");
String dbName = (String) properties.get("mongo.db");
String collectionName = (String) properties.get("mongo.collection");
this.mongoClient = new MongoClient(new MongoClientURI(mongoUrl));
this.collection = mongoClient.getDatabase(dbName).getCollection(
collectionName);
}
As explained in the sections above, you can configure Hazelcast maps to be backed by a map
store to persist the entries. In this case many of the IMap methods call
MapLoader or MapStore methods to load, store or remove data. This section summarizes these
methods. Here are the Hazelcast IMap operations that may trigger the MapStore or MapLoader
methods:
IMap Method: Impact on the MapStore/MapLoader
• flush(): If the map has a MapStore, this method flushes all the local dirty entries. It calls
the MapStore.storeAll(Map) or MapStore.deleteAll(Collection) methods with the elements marked as
dirty.
• put(), putAll(), putAsync(), tryPut(), putIfAbsent(): These methods are used to put entries to the
map. They call the MapLoader.load(Object) method for each entry not found in the memory to load
the value from the map store backing the map. They also call the MapStore.store(Object, Object)
method for each entry, if write-through persistence mode is configured before the entry is added
into the memory.
• set(), setAsync(): These methods put an entry into the map without returning the old value. They
call the MapStore.store(Object, Object) method if write-through persistence mode is configured
before the entry is added into the memory, to write the value into the map store.
• remove(): Removes the mapping for a key from the map if it is present. It calls the
MapLoader.load(Object) method if no value is found with key in the memory, to load the value from
the map store backing the map. It also calls the MapStore.delete(Object) method if write-through
persistence mode is configured before the value is removed from the memory, to remove the value
from the map store.
• removeAll(), delete(), removeAsync(), tryRemove(): These methods are used to remove entries from
the map for various conditions. They call the MapStore.delete(Object) method if write-through
persistence mode is configured before the value is removed from the memory, to remove the value
from the map store.
• clear(): It clears the map and deletes the items from the backing map store. It calls the
MapStore.deleteAll(Collection) method on each partition with the keys that the given partition
stores.
• replace(): It replaces the entry for a key only if currently mapped to a given value. It calls
the MapStore.store(Object, Object) method if write-through persistence mode is configured before
the value is stored in the memory, to write the value into the map store.
• executeOnKey(), executeOnKeys(), submitToKey(), executeOnAllEntries(): These methods apply the
user defined entry processors to the entry or entries. They call the MapLoader.load(Object) method
if the value with key is not found in the memory, to load the value from the map store backing the
map. If the entry processor updates the entry and write-through persistence mode is configured,
before the value is stored in memory, they call the MapStore.store(Object, Object) method to write
the value into the map store. If the entry processor updates the entry’s value to null value and
write-through persistence mode is configured, before the value is removed from the memory, they
call the MapStore.delete(Object) method to delete the value from the map store.
The Hazelcast distributed map supports a local Near Cache for remotely stored entries to increase
the performance of local read operations. See the Near Cache section for a detailed explanation of
the Near Cache feature and its configuration.
Hazelcast Distributed Map (IMap) is thread-safe to meet your thread safety requirements. When
these requirements increase or you want to have more control on the concurrency, consider the
Hazelcast solutions described here.
If the above code is run by more than one cluster member simultaneously, a race condition is likely.
You can solve this condition with Hazelcast using either pessimistic or optimistic locking.
Pessimistic Locking
One way to solve the race issue is by using pessimistic locking - lock the map entry until you are
finished with it.
To perform pessimistic locking, use the lock mechanism provided by the Hazelcast distributed map,
i.e., the map.lock and map.unlock methods. See the below example code.
The IMap lock will automatically be collected by the garbage collector when the lock is released and
no other waiting conditions exist on the lock.
In some cases, a client application connected to your cluster may cause the entries
in a map to remain locked after the application has been restarted (entries which were
already locked before the restart). This can be due to reasons such as
incomplete/incorrect client implementations. In these cases, you can unlock the
entries either from the thread which locked them, using the IMap.unlock() method,
or by checking whether the entry is locked using the IMap.isLocked() method and then calling
IMap.forceUnlock().
For the above case, as a workaround, you can also kill all the applications
connected to the cluster and use the Management Center’s scripting functionality
to clear the map and release the locks (instead of using IMap.forceUnlock()). Keep
in mind that the scripting functionality is limited to working with maps that have
primitive key types, e.g., string keys and limited to relaying only a single string of
output per member to the result panel in the Management Center.
Another way to solve the race issue is by acquiring a predictable Lock object from Hazelcast. This
way, every value in the map can be given a lock, or you can create a stripe of locks.
Optimistic Locking
In Hazelcast, you can apply the optimistic locking strategy with the map’s replace method. This
method compares values in object or data forms depending on the in-memory format
configuration. If the values are equal, it replaces the old value with the new one. If you want to use
your defined equals method, in-memory-format should be OBJECT. Otherwise, Hazelcast serializes
objects to BINARY forms and compares them.
public class OptimisticMember {
public static void main( String[] args ) throws Exception {
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
IMap<String, Value> map = hz.getMap( "map" );
String key = "1";
map.put( key, new Value() );
System.out.println( "Starting" );
for ( int k = 0; k < 1000; k++ ) {
if ( k % 10 == 0 ) System.out.println( "At: " + k );
for (; ; ) {
Value oldValue = map.get( key );
Value newValue = new Value( oldValue );
Thread.sleep( 10 );
newValue.amount++;
if ( map.replace( key, oldValue, newValue ) )
break;
}
}
System.out.println( "Finished! Result = " + map.get( key ).amount );
}
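static class Value implements java.io.Serializable {
public int amount;
public Value() {
}
// Copy constructor used above to derive the new value from the old one.
public Value( Value that ) {
this.amount = that.amount;
}
// Used by replace() when the in-memory format is OBJECT.
@Override
public boolean equals( Object o ) {
return o instanceof Value && ( (Value) o ).amount == amount;
}
@Override
public int hashCode() {
return amount;
}
}
}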
Optimistic locking is better for mostly read-only systems. It has a performance boost over
pessimistic locking.
Pessimistic locking is good if there are lots of updates on the same key. It is more robust than
optimistic locking from the perspective of data consistency.
In Hazelcast, use IExecutorService to submit a task to a key owner, or to a member or members.
This is the recommended way to perform task executions, rather than using pessimistic or
optimistic locking techniques. IExecutorService has fewer network hops and less data over wire,
and tasks are executed very near to the data. See the Data Affinity section.
The ABA problem occurs in environments when a shared resource is open to change by multiple
threads. Even if one thread sees the same value for a particular key in consecutive reads, it does not
mean that nothing has changed between the reads. Another thread may change the value, do work
and change the value back, while the first thread thinks that nothing has changed.
To prevent these kinds of problems, you can assign a version number and check it before any write
to be sure that nothing has changed between consecutive reads. Although all the other fields are
equal, the version field will prevent objects from being seen as equal. This is the optimistic locking
strategy; it is used in environments that do not expect intensive concurrent changes on a specific
key.
In Hazelcast, you can apply the optimistic locking strategy with the map replace method.
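The role of the version field can be shown with a plain-Java sketch. It uses java.util.concurrent.ConcurrentHashMap as a stand-in for the distributed map, relying only on the replace(key, oldValue, newValue) contract; VersionedValueSketch, Value and staleReplaceSucceeds() are hypothetical names, and the comparison here always uses equals(), whereas Hazelcast compares according to the in-memory format.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Plain-Java model of version checking; a ConcurrentHashMap stands in for the distributed map.
final class VersionedValueSketch {

    static final class Value {
        final int amount;
        final long version;

        Value(int amount, long version) {
            this.amount = amount;
            this.version = version;
        }

        // Every write produces a value with the next version number.
        Value next(int newAmount) {
            return new Value(newAmount, version + 1);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Value)) {
                return false;
            }
            Value that = (Value) o;
            return amount == that.amount && version == that.version;
        }

        @Override
        public int hashCode() {
            return Objects.hash(amount, version);
        }
    }

    // Returns whether a writer holding a stale read can still replace the value.
    static boolean staleReplaceSucceeds() {
        ConcurrentMap<String, Value> map = new ConcurrentHashMap<>();
        map.put("1", new Value(100, 0));

        Value seen = map.get("1");             // first writer reads amount 100

        Value changed = seen.next(50);         // second writer changes the amount ...
        map.replace("1", seen, changed);
        Value changedBack = changed.next(100); // ... and changes it back to 100
        map.replace("1", changed, changedBack);

        // Same amount as before, but the version differs, so the values are not equal.
        return map.replace("1", seen, seen.next(101));
    }

    public static void main(String[] args) {
        System.out.println(staleReplaceSucceeds()); // prints false
    }
}
```

Without the version field, the stale value and the current value would be equal and the last replace would succeed unnoticed.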
Locks can be configured to check the number of currently present members before applying a
locking operation. If the check fails, the lock operation fails with a SplitBrainProtectionException
(see the Split-Brain Protection section). As pessimistic locking uses lock operations internally, it also
uses the configured lock split-brain protection. This means that you can configure a lock split-brain
protection with the same name or a pattern that matches the map name. Note that the split-brain
protection for IMap locking actions can be different from the split-brain protection for other IMap
actions.
The following actions check for lock split-brain protection before being applied:
• IMap.isLocked()
• IMap.tryLock(K), IMap.tryLock(K, long, java.util.concurrent.TimeUnit) and IMap.tryLock(K,
long, java.util.concurrent.TimeUnit, long, java.util.concurrent.TimeUnit)
• IMap.unlock()
• IMap.forceUnlock()
• MultiMap.lock(K) and MultiMap.lock(K, long, java.util.concurrent.TimeUnit)
• MultiMap.isLocked()
• MultiMap.tryLock(K), MultiMap.tryLock(K, long, java.util.concurrent.TimeUnit) and
MultiMap.tryLock(K, long, java.util.concurrent.TimeUnit, long,
java.util.concurrent.TimeUnit)
• MultiMap.unlock()
• MultiMap.forceUnlock()
<hazelcast>
...
<map name="myMap">
<split-brain-protection-ref>map-actions-split-brain-protection</split-brain-protection-ref>
</map>
<lock name="myMap">
<split-brain-protection-ref>map-lock-actions-split-brain-protection</split-brain-protection-ref>
</lock>
...
</hazelcast>
Here the configured map uses the map-lock-actions-split-brain-protection for map lock actions
and the map-actions-split-brain-protection for other map actions.
You can retrieve the statistics of the map in your Hazelcast IMDG member using the
getLocalMapStats() method, which is the programmatic approach. It returns information such as
primary and backup entry count, last update time and locked entry count. If you need the cluster-
wide map statistics, you can get the local map statistics from all members of the cluster and
combine them. Alternatively, you can see the map statistics on the Hazelcast Management Center.
To be able to retrieve the map statistics, the statistics-enabled element under the map
configuration should be set to true, which is the default value:
<hazelcast>
...
<map name="myMap">
<statistics-enabled>true</statistics-enabled>
</map>
...
</hazelcast>
When this element is set to false, the statistics are not gathered for the map and cannot be seen on
the Hazelcast Management Center, nor retrieved by the getLocalMapStats() method.
Hazelcast also keeps statistics about each map entry, such as creation time, last update time, last
access time, number of hits, and version. To access the map entry statistics, use an
IMap.getEntryView(key) call. Here is an example.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
EntryView entry = hz.getMap( "quotes" ).getEntryView( "1" );
System.out.println ( "size in memory : " + entry.getCost() );
System.out.println ( "creationTime : " + entry.getCreationTime() );
System.out.println ( "expirationTime : " + entry.getExpirationTime() );
System.out.println ( "number of hits : " + entry.getHits() );
System.out.println ( "lastAccessedTime: " + entry.getLastAccessTime() );
System.out.println ( "lastUpdateTime : " + entry.getLastUpdateTime() );
System.out.println ( "version : " + entry.getVersion() );
System.out.println ( "key : " + entry.getKey() );
System.out.println ( "value : " + entry.getValue() );
You can listen to the modifications performed on specific map entries. You can think of it as an
entry listener with predicates. See the Listening for Map Events section for information on how to
add entry listeners to a map.
Since version 3.7, when you set the property hazelcast.map.entry.filtering.natural.event.types
to true on Hazelcast members, an update to an entry is treated as the value entering, staying in
or leaving the predicate value space. The following table compares how a listener is notified about
an update to a map entry value under the default backwards-compatible Hazelcast behavior (when
the property hazelcast.map.entry.filtering.natural.event.types is not set or is set to false)
versus when it is set to true:
Value transition | Default | hazelcast.map.entry.filtering.natural.event.types = true
Old value matches predicate, new value does not match predicate | No event is delivered to entry listener | REMOVED event is delivered to entry listener
Old value matches predicate, new value matches predicate | UPDATED event is delivered to entry listener | UPDATED event is delivered to entry listener
Old value does not match predicate, new value does not match predicate | No event is delivered to entry listener | No event is delivered to entry listener
Old value does not match predicate, new value matches predicate | UPDATED event is delivered to entry listener | ADDED event is delivered to entry listener
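The four cases in the table above can be restated as a small decision function. The sketch below only encodes the table; the class, enum and method names are hypothetical and it does not show how Hazelcast implements event filtering internally.

```java
// Encodes the listener-notification table for an update to a map entry value.
class FilteredEventTypes {
    enum Event { NONE, ADDED, UPDATED, REMOVED }

    // oldMatches / newMatches: whether the old and new values match the predicate.
    // naturalEventTypes: value of hazelcast.map.entry.filtering.natural.event.types.
    static Event onUpdate(boolean oldMatches, boolean newMatches, boolean naturalEventTypes) {
        if (oldMatches && newMatches) {
            return Event.UPDATED;
        }
        if (!oldMatches && !newMatches) {
            return Event.NONE;
        }
        if (oldMatches) {
            // The value left the predicate value space.
            return naturalEventTypes ? Event.REMOVED : Event.NONE;
        }
        // The value entered the predicate value space.
        return naturalEventTypes ? Event.ADDED : Event.UPDATED;
    }
}
```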
As an example, let’s listen to the changes made on an employee with the surname "Smith". First,
let’s create the Employee class.
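public class Employee implements java.io.Serializable {
    private final String surname;
    public Employee(String surname) {
        this.surname = surname;
    }
    @Override
    public String toString() {
        return "Employee{" +
                "surname='" + surname + '\'' +
                '}';
    }
}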
Then, let’s create a listener with predicate by adding a listener that tracks ADDED, UPDATED and REMOVED
entry events with the surname predicate.
public class ListenerWithPredicate {
@Override
public void entryRemoved(EntryEvent<String, String> event) {
System.out.println("Entry Removed:" + event);
}
@Override
public void entryUpdated(EntryEvent<String, String> event) {
System.out.println("Entry Updated:" + event);
}
}
}
And now, let’s play with the employee "smith" and see how that employee is listened to.
public class Modify {
When you first run the class ListenerWithPredicate and then run Modify, an output similar to the
one below appears.
You can remove all map entries that match your predicate. For this, Hazelcast offers the method
removeAll(). Its syntax is as follows:
Normally, the map entries matching the predicate are found with a full scan of the map. If the
entries are indexed, Hazelcast uses the index to find them, which is typically faster than a full
scan.
When removeAll() is called, ALL entries in the caller member’s Near Cache are also
removed.
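The effect of a predicate-based removal can be illustrated with a plain java.util.Map. This is only a local model of the full-scan case: the helper name removeAll mirrors the method discussed above, but the code has no Hazelcast dependency and uses java.util.function.Predicate rather than Hazelcast's own predicate type.

```java
import java.util.Map;
import java.util.function.Predicate;

class RemoveAllSketch {
    // Full scan: every entry is tested and the ones the predicate accepts are dropped.
    static <K, V> void removeAll(Map<K, V> map, Predicate<Map.Entry<K, V>> predicate) {
        map.entrySet().removeIf(predicate);
    }
}
```

For example, calling removeAll(ages, e -> e.getValue() > 30) on a map of names to ages leaves only the entries whose value is 30 or less.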
You can intercept map operations and execute your own business logic synchronously, blocking the
operations while it runs. You can change the returned value from a get operation, change the value
in put, or cancel operations by throwing an exception.
Interceptors are different from listeners. With listeners, you take an action after the operation has
been completed. Interceptor actions are synchronous and you can alter the behavior of the operation,
change its values, or totally cancel it.
Map interceptors are chained, so adding the same interceptor multiple times to the same map can
result in duplicate effects. This can easily happen when the interceptor is added to the map at
member initialization, so that each member adds the same interceptor. When you add the
interceptor in this way, be sure to implement the hashCode() method to return the same value for
every instance of the interceptor. It is not strictly necessary, but it is a good idea to also implement
equals() as this ensures that the map interceptor can be removed reliably.
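A minimal sketch of the hashCode()/equals() advice follows. SuffixInterceptor is a hypothetical, Hazelcast-free stand-in for a MapInterceptor implementation; it only shows how two separately created instances with the same configuration can be made equal, so that code comparing interceptors sees them as the same one.

```java
// Plain stand-in for a MapInterceptor implementation; no Hazelcast dependency.
class SuffixInterceptor {
    private final String suffix;

    SuffixInterceptor(String suffix) {
        this.suffix = suffix;
    }

    // Mirrors the idea of interceptGet: alter the value handed back to the caller.
    String interceptGet(String value) {
        return value + suffix;
    }

    // Two interceptors configured the same way are considered the same interceptor.
    @Override
    public boolean equals(Object other) {
        return other instanceof SuffixInterceptor
                && ((SuffixInterceptor) other).suffix.equals(suffix);
    }

    // Same value for every instance with the same configuration.
    @Override
    public int hashCode() {
        return suffix.hashCode();
    }
}
```

With these two methods in place, adding new SuffixInterceptor("-foo") twice to a java.util.HashSet leaves a single element, which is the deduplication effect the paragraph above aims for.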
The IMap API has two methods for adding and removing an interceptor to the map: addInterceptor
and removeInterceptor. See also the MapInterceptor interface to learn about the methods used to
intercept the changes in a map.
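public class MapInterceptorMember {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = hz.getMap("themap");
        // Register the interceptor before using the map.
        map.addInterceptor(new MyMapInterceptor());
        map.put("1", "1");
        System.out.println(map.get("1"));
    }

    private static class MyMapInterceptor implements MapInterceptor {
        // Changes the value returned to the caller of get().
        @Override
        public Object interceptGet(Object value) {
            return value + "-foo";
        }
        @Override
        public void afterGet(Object value) {
        }
        // Returning null means the put proceeds with newValue unchanged.
        @Override
        public Object interceptPut(Object oldValue, Object newValue) {
            return null;
        }
        @Override
        public void afterPut(Object value) {
        }
        // Returning null means the remove proceeds without altering its result.
        @Override
        public Object interceptRemove(Object removedValue) {
            return null;
        }
        @Override
        public void afterRemove(Object value) {
        }
    }
}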
It is very easy to trigger an out of memory exception (OOME) with query-based map methods,
especially with large clusters or heap sizes. For example, on a cluster with five members having 10
GB of data and 25 GB heap size per member, a single call of IMap.entrySet() fetches 50 GB of data
and crashes the calling instance.
A call of IMap.values() may return too much data for a single member. This can also happen with a
real query and an unlucky choice of predicates, especially when the parameters are chosen by a
user of your application.
To prevent this, you can configure a maximum result size limit for query-based operations. This is
not a limit like SELECT * FROM map LIMIT 100, which you can achieve with a Paging Predicate. A
maximum result size limit for query-based operations is meant to be a last line of defense to
prevent your members from retrieving more data than they can handle.
If the QueryResultSizeLimiter is activated, it calculates a result size limit per partition. Each
QueryOperation runs on all partitions of a member, so it collects result entries as long as the member
limit is not exceeded. If the limit is exceeded, a QueryResultSizeExceededException is thrown and
propagated to the calling instance.
This feature depends on an equal distribution of the data on the cluster members to calculate the
result size limit per member. Therefore, there is a minimum value defined in
QueryResultSizeLimiter.MINIMUM_MAX_RESULT_LIMIT. Configured values below the minimum will be
increased to the minimum.
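The two rules described above, collecting results until a limit is exceeded and raising too-small configured values to a minimum, can be modeled as follows. This is a simplified sketch and not Hazelcast's implementation: the class, the exception type and the way the limit is passed in are all hypothetical, and the per-partition calculation is left out.

```java
import java.util.ArrayList;
import java.util.List;

class ResultSizeLimitSketch {
    // Hypothetical stand-in for QueryResultSizeExceededException.
    static class ResultSizeExceeded extends RuntimeException {
        ResultSizeExceeded(int limit) {
            super("query result exceeds the limit of " + limit + " entries");
        }
    }

    // Configured values below the minimum are raised to the minimum.
    static int effectiveLimit(int configured, int minimum) {
        return Math.max(configured, minimum);
    }

    // Collects entries while the limit is respected and fails as soon as
    // one more entry would exceed it.
    static <T> List<T> collect(Iterable<T> entries, int limit) {
        List<T> result = new ArrayList<>();
        for (T entry : entries) {
            if (result.size() >= limit) {
                throw new ResultSizeExceeded(limit);
            }
            result.add(entry);
        }
        return result;
    }
}
```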
Local Pre-check
In addition to the distributed result size check in the QueryOperations, there is a local pre-check on
the calling instance. If you call the method from a client, the pre-check is executed on the member
that invokes the QueryOperations.
Since the local pre-check can increase the latency of a QueryOperation, you can configure how many
local partitions should be considered for the pre-check, or you can deactivate the feature
completely.
Besides the designated query operations, there are other operations that use predicates internally.
Those method calls throw the QueryResultSizeExceededException as well. See the following matrix
for the methods that are covered by the query result size limit.
Configuring Query Result Size
The query result size limit is configured via the following system properties.
• hazelcast.query.result.size.limit: Result size limit for query operations on maps. This value
defines the maximum number of returned elements for a single query result. If a query exceeds
this number of elements, a QueryResultSizeExceededException is thrown.
See the System Properties appendix to see the full descriptions of these properties and how to set
them.
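As a rough sketch, the property can be supplied as a JVM system property, either with -Dhazelcast.query.result.size.limit=50000 on the command line or in code as below. The value 50000 is an arbitrary example, the helper class is hypothetical, and the code assumes the property is set before the Hazelcast member is created.

```java
class QueryLimitProperty {
    static final String KEY = "hazelcast.query.result.size.limit";

    // Sets the JVM system property; assumed to run before the member starts.
    static void configure(int limit) {
        System.setProperty(KEY, Integer.toString(limit));
    }

    // Reads the property back, falling back to the given value when it is not set.
    static int configured(int fallback) {
        return Integer.getInteger(KEY, fallback);
    }
}
```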
7.3. Queue
Hazelcast distributed queue is an implementation of java.util.concurrent.BlockingQueue. Being
distributed, Hazelcast distributed queue enables all cluster members to interact with it. Using
Hazelcast distributed queue, you can add an item in one cluster member and remove it from
another one.
Use the Hazelcast instance’s getQueue method to get the queue, then use the queue’s put method to
put items into the queue.
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
BlockingQueue<MyTask> queue = hazelcastInstance.getQueue( "tasks" );
queue.put( new MyTask() );
MyTask task = queue.take();
FIFO ordering applies to all queue operations across the cluster. The user objects (such as MyTask in
the example above) that are enqueued or dequeued have to be Serializable.
Hazelcast distributed queue performs no batching while iterating over the queue. All items are
copied locally and iteration occurs locally.
Hazelcast distributed queue uses ItemListener to listen to the events that occur when items are
added to and removed from the queue. See the Listening for Item Events section for information on
how to create an item listener class and register it.
The following example code illustrates a distributed queue that connects a producer and consumer.
Let’s put one integer on the queue every second, 100 integers total.
Producer puts a -1 on the queue to show that the puts are finished.
Taking Items off the Queue
Now, let’s create a Consumer class to take a message from this queue, as shown below.
As seen in the above example code, Consumer waits five seconds before it consumes the next
message. It stops once it receives -1. Also note that Consumer puts -1 back on the queue before the
loop is ended.
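The producer and consumer roles described above can be sketched with a local java.util.concurrent queue. This is an assumption-laden stand-in: LinkedBlockingQueue replaces the distributed queue (which also implements BlockingQueue), both roles run in one JVM, the one-second and five-second pauses are left out, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProducerConsumerSketch {
    static final int DONE = -1;

    // Puts 1..count on the queue, then -1 to show that the puts are finished.
    static void produce(BlockingQueue<Integer> queue, int count) throws InterruptedException {
        for (int k = 1; k <= count; k++) {
            queue.put(k);
        }
        queue.put(DONE);
    }

    // Takes items until -1 arrives, then puts -1 back so other consumers stop too.
    static List<Integer> consume(BlockingQueue<Integer> queue) throws InterruptedException {
        List<Integer> consumed = new ArrayList<>();
        while (true) {
            int item = queue.take();
            if (item == DONE) {
                queue.put(DONE);
                return consumed;
            }
            consumed.add(item);
        }
    }

    // Runs one producer thread and consumes on the calling thread.
    static List<Integer> run(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                produce(queue, count);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            List<Integer> consumed = consume(queue);
            producer.join();
            return consumed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```

Putting -1 back on the queue matters once several consumers share the queue: each of them eventually sees the marker and ends its loop.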
When you first start Producer and then start Consumer, items produced on the queue will be
consumed from the same queue.
From the above example code, you can see that an item is produced every second and consumed
every five seconds. Therefore, the queue keeps growing. To balance the produce/consume
operation, let’s start another consumer. This way, consumption is distributed to these two
consumers, as seen in the example outputs below.
The second consumer is started. After a while, here is the first consumer output:
...
Consumed 13
Consumed 15
Consumed 17
...
Here is the second consumer output:
...
Consumed 14
Consumed 16
Consumed 18
...
In the case of a lot of producers and consumers for the queue, using a list of queues may solve the
queue bottlenecks. In this case, be aware that the order of the messages sent to different queues is
not guaranteed. Since in most cases strict ordering is not important, a list of queues is a good
solution.
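One way to picture a list of queues is to route each message to one of several queues by a key. The sketch below is a local model under stated assumptions: StripedQueues is a hypothetical helper, LinkedBlockingQueue stands in for the distributed queues (which would be obtained under different names), and routing by hash code is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class StripedQueues<T> {
    private final List<BlockingQueue<T>> stripes = new ArrayList<>();

    StripedQueues(int count) {
        for (int i = 0; i < count; i++) {
            stripes.add(new LinkedBlockingQueue<>());
        }
    }

    // The same key always maps to the same queue, so order is kept per key
    // but not between messages that land on different queues.
    BlockingQueue<T> queueFor(Object key) {
        return stripes.get(Math.floorMod(key.hashCode(), stripes.size()));
    }

    boolean offer(Object key, T item) {
        return queueFor(key).offer(item);
    }
}
```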
The items are taken from the queue in the same order they were put on the queue.
However, if there is more than one consumer, this order is not guaranteed.
Hazelcast gives an itemId for each item you offer, which is an incrementing sequence identification
for the queue items. You should consider the following to understand the itemId assignment
behavior:
• When a Hazelcast member has a queue and that queue is configured to have at least one
backup, and that member is restarted, the itemId assignment resumes from the last known
highest itemId before the restart; itemId assignment does not start from the beginning for the
new items.
• When the whole cluster is restarted, the same behavior explained in the above consideration
applies if your queue has a persistent data store (QueueStore). If the queue has a QueueStore, the
itemIds for the new items are assigned starting from the highest itemId found in the IDs returned by
the method loadAllKeys. If the method loadAllKeys does not return anything, the itemIds start
from the beginning after a cluster restart.
• The above two considerations mean there are no duplicated itemIds in the memory or in the
persistent data store.
A bounded queue is a queue with a limited capacity. When the bounded queue is full, no more
items can be put into the queue until some items are taken out.
To turn a Hazelcast distributed queue into a bounded queue, set the capacity limit with the max-size
property. You can set the max-size property in the configuration, as shown below. The max-size
element specifies the maximum size of the queue. Once the queue size reaches this value, put
operations are blocked until the queue size goes below max-size, which happens when a consumer
removes items from the queue.
Let’s set 10 as the maximum size of our example queue in Creating an Example Queue.
<hazelcast>
...
<queue name="queue">
<max-size>10</max-size>
</queue>
...
</hazelcast>
When the producer is started, ten items are put into the queue and then the queue will not allow
more put operations. When the consumer is started, it will remove items from the queue. This
means that the producer can put more items into the queue until there are ten items in the queue
again, at which point the put operation again becomes blocked.
In this example code, the producer is five times faster than the consumer. It will effectively always
be waiting for the consumer to remove items before it can put more on the queue. For this example
code, if maximum throughput is the goal, it would be a good option to start multiple consumers to
prevent the queue from filling up.
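The bounded behavior can be tried locally with a java.util.concurrent.ArrayBlockingQueue of capacity 10, used here as a stand-in for a distributed queue with max-size set to 10. The sketch uses the non-blocking offer instead of put so that it can run on a single thread; the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueSketch {
    // Offers `attempts` items to an empty queue with the given capacity
    // and returns how many of them were accepted.
    static int acceptedOffers(int capacity, int attempts) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    // Fills the queue, removes one item, and reports whether the next offer fits.
    static boolean offerSucceedsAfterPoll(int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer(i);
        }
        queue.poll();
        return queue.offer(capacity);
    }
}
```

With put instead of offer, the eleventh call would block until a consumer removes an item, which is the behavior described for the producer above.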
Hazelcast allows you to load and store the distributed queue items from/to a persistent datastore
using the interface QueueStore. If queue store is enabled, each item added to the queue is also stored
at the configured queue store. When the number of items in the queue exceeds the memory limit,
the subsequent items are persisted only in the queue store; they are not stored in the queue memory.
The QueueStore interface enables you to store, load and delete queue items with methods like store,
storeAll, load and delete. The following example class includes all of the QueueStore methods.
public class TheQueueStore implements QueueStore<Item> {
@Override
public void delete(Long key) {
System.out.println("delete");
}
@Override
public void store(Long key, Item value) {
System.out.println("store");
}
@Override
public void storeAll(Map<Long, Item> map) {
System.out.println("store all");
}
@Override
public void deleteAll(Collection<Long> keys) {
System.out.println("deleteAll");
}
@Override
public Item load(Long key) {
System.out.println("load");
return null;
}
@Override
public Map<Long, Item> loadAll(Collection<Long> keys) {
System.out.println("loadAll");
return null;
}
@Override
public Set<Long> loadAllKeys() {
System.out.println("loadAllKeys");
return null;
}
}
<hazelcast>
...
<queue name="queue">
<max-size>10</max-size>
<queue-store>
<class-name>com.hazelcast.QueueStoreImpl</class-name>
<properties>
<property name="binary">false</property>
<property name="memory-limit">1000</property>
<property name="bulk-load">500</property>
</properties>
</queue-store>
</queue>
...
</hazelcast>
The following are the descriptions for each queue store property:
• Binary: By default, Hazelcast stores the queue items in serialized form, and before it inserts the
queue items into the queue store, it deserializes them. If you are not reaching the queue store
from an external application, you might prefer that the items be inserted in binary form. Do this
by setting the binary property to true: then you can get rid of the deserialization step, which is a
performance optimization. The binary property is false by default.
• Memory Limit: This is the number of items after which Hazelcast stores items only to the
datastore. For example, if the memory limit is 1000, then the 1001st item is put only to the
datastore. This feature is useful when you want to avoid out-of-memory conditions. If you want
to always use memory, you can set it to Integer.MAX_VALUE. The default number for memory-limit
is 1000.
• Bulk Load: When the queue is initialized, items are loaded from QueueStore in bulks. Bulk load
is the size of these bulks. The default value of bulk-load is 250.
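The memory-limit rule can be modeled in a few lines. This is a deliberately simplified sketch, not Hazelcast's implementation: MemoryLimitSketch is hypothetical, a LinkedHashMap stands in for the QueueStore, the starting itemId of 0 is an assumption, and taking items or loading them back from the store is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

class MemoryLimitSketch {
    final Deque<String> inMemory = new ArrayDeque<>();
    final Map<Long, String> store = new LinkedHashMap<>(); // stand-in for a QueueStore
    private final int memoryLimit;
    private long nextItemId = 0; // starting value is an assumption

    MemoryLimitSketch(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    void add(String item) {
        // Every item is written to the store.
        store.put(nextItemId++, item);
        // Only the first memoryLimit items are also kept in memory.
        if (inMemory.size() < memoryLimit) {
            inMemory.addLast(item);
        }
    }
}
```

With a memory limit of 2, adding three items leaves two of them in memory and all three in the store.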
Queues can be configured to check for a minimum number of available members before applying
queue operations (see the Split-Brain Protection section). This is a check to avoid performing
successful queue operations on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the protection types, that support split-brain
protection checks:
• WRITE, READ_WRITE
◦ Collection.addAll()
◦ Collection.removeAll(), Collection.retainAll()
◦ BlockingQueue.drainTo()
◦ IQueue.poll(), Queue.remove(), IQueue.take()
◦ BlockingQueue.remove()
• READ, READ_WRITE
◦ Collection.clear()
◦ Collection.containsAll(), BlockingQueue.contains()
◦ Collection.isEmpty()
◦ Collection.iterator(), Collection.toArray()
◦ Queue.peek(), Queue.element()
◦ Collection.size()
◦ BlockingQueue.remainingCapacity()
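The grouping in the list above (write methods under WRITE and READ_WRITE, read methods under READ and READ_WRITE) can be expressed as a small check. The sketch is illustrative only: the enums and methods are hypothetical and the member counts are passed in by hand rather than taken from a cluster.

```java
class SplitBrainProtectionSketch {
    enum ProtectionType { READ, WRITE, READ_WRITE }
    enum OperationKind { READ, WRITE }

    // A write operation is checked under WRITE and READ_WRITE,
    // a read operation under READ and READ_WRITE.
    static boolean isChecked(ProtectionType type, OperationKind operation) {
        return type == ProtectionType.READ_WRITE
                || (type == ProtectionType.WRITE && operation == OperationKind.WRITE)
                || (type == ProtectionType.READ && operation == OperationKind.READ);
    }

    // A checked operation is allowed only if enough members are present.
    static boolean isAllowed(ProtectionType type, OperationKind operation,
                             int presentMembers, int minimumMembers) {
        return !isChecked(type, operation) || presentMembers >= minimumMembers;
    }
}
```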
The following are examples of queue configurations. They include the QueueStore configuration, which
is explained in the Queueing with Persistent Datastore section.
Declarative Configuration:
<hazelcast>
...
<queue name="default">
<max-size>0</max-size>
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<empty-queue-ttl>-1</empty-queue-ttl>
<item-listeners>
<item-listener>com.hazelcast.examples.ItemListener</item-listener>
</item-listeners>
<statistics-enabled>true</statistics-enabled>
<queue-store>
<class-name>com.hazelcast.QueueStoreImpl</class-name>
<properties>
<property name="binary">false</property>
<property name="memory-limit">10000</property>
<property name="bulk-load">500</property>
</properties>
</queue-store>
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</queue>
...
</hazelcast>
Programmatic Configuration:
Config config = new Config();
QueueConfig queueConfig = config.getQueueConfig("default");
queueConfig.setName("MyQueue")
.setBackupCount(1)
.setMaxSize(0)
.setStatisticsEnabled(true)
.setSplitBrainProtectionName("splitbrainprotectionname");
queueConfig.getQueueStoreConfig()
.setEnabled(true)
.setClassName("com.hazelcast.QueueStoreImpl")
.setProperty("binary", "false");
config.addQueueConfig(queueConfig);
Hazelcast distributed queue has one synchronous backup by default. By having this backup, when a
cluster member with a queue goes down, another member having the backup of that queue will
continue. Therefore, no items are lost. You can define the number of synchronous backups for a
queue using the backup-count element in the declarative configuration. A queue can also have
asynchronous backups: you can define the number of asynchronous backups using the async-
backup-count element.
To set the maximum size of the queue, use the max-size element. To purge unused or empty queues
after a period of time, use the empty-queue-ttl element. If you define a value (time in seconds) for
the empty-queue-ttl element, then your queue will be destroyed if it stays empty or unused for the
time in seconds that you give.
The following is the full list of queue configuration elements with their descriptions:
• max-size: Maximum number of items in the queue. It is used to set an upper bound for the
queue. You will not be able to put more items when the queue reaches this maximum size,
whether you have a queue store configured or not.
• empty-queue-ttl: Used to purge unused or empty queues. If you define a value (time in seconds)
for this element, then your queue will be destroyed if it stays empty or unused for that time.
• item-listeners: Adds listeners (listener classes) for the queue items. You can also set the
attribute include-value to true if you want the item event to contain the item values. You can set
local to true if you want to listen to the items on the local member.
• queue-store: Includes the queue store factory class name and the properties binary, memory
limit and bulk load. See the Queueing with Persistent Datastore section.
• statistics-enabled: Specifies whether the statistics gathering is enabled for your queue. If set
to false, you cannot collect statistics in your implementation (using getLocalQueueStats()) and
also Hazelcast Management Center will not show them. Its default value is true.
• split-brain-protection-ref : Name of the split-brain protection configuration that you want this
queue to use.
7.4. MultiMap
Hazelcast MultiMap is a specialized map where you can store multiple values under a single key. Just
like any other distributed data structure implementation in Hazelcast, MultiMap is distributed and
thread-safe.
The following example creates a MultiMap and puts items into it:
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
MultiMap<String, String> map = hz.getMultiMap("map");
map.put("a", "1");
map.put("a", "2");
map.put("b", "3");
System.out.printf("PutMember:Done");
We use the getMultiMap method to create the MultiMap and then use the put method to put an entry
into it.
Now let’s print the entries in this MultiMap using the following code:
After you run ExampleMultiMap, run PrintMember. You will see the key a has two values, as shown
below:
b → [3]
a → [2, 1]
Hazelcast MultiMap uses EntryListener to listen to events which occur when entries are added to,
updated in or removed from the MultiMap. See the Listening for MultiMap Events section for
information on how to create an entry listener class and register it.
When using MultiMap, the collection type of the values can be either Set or List. Configure the
collection type with the valueCollectionType parameter. If you choose Set, duplicate and null values
are not allowed in your collection and ordering is irrelevant. If you choose List, ordering is
relevant and your collection can include duplicate and null values.
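The difference between the two collection types can be sketched with a local multimap built on java.util collections. LocalMultiMap is a hypothetical, Hazelcast-free model: it only illustrates how duplicates are handled, not the null-value rule, distribution or configuration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.function.Supplier;

class LocalMultiMap<K, V> {
    private final Map<K, Collection<V>> data = new HashMap<>();
    private final Supplier<Collection<V>> factory;

    private LocalMultiMap(Supplier<Collection<V>> factory) {
        this.factory = factory;
    }

    // Values of a key form a set: duplicates are dropped, order is irrelevant.
    static <K, V> LocalMultiMap<K, V> withSetValues() {
        return new LocalMultiMap<K, V>(() -> new HashSet<V>());
    }

    // Values of a key form a list: duplicates are kept in insertion order.
    static <K, V> LocalMultiMap<K, V> withListValues() {
        return new LocalMultiMap<K, V>(() -> new ArrayList<V>());
    }

    boolean put(K key, V value) {
        return data.computeIfAbsent(key, k -> factory.get()).add(value);
    }

    Collection<V> get(K key) {
        Collection<V> values = data.get(key);
        return values == null ? Collections.<V>emptyList() : values;
    }

    int valueCount(K key) {
        return get(key).size();
    }
}
```

Putting the value "1" twice under the key "a" yields a value count of 1 with set values and 2 with list values.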
You can also enable statistics for your MultiMap with the statisticsEnabled parameter. If you
enable statisticsEnabled, statistics can be retrieved with the getLocalMultiMapStats() method.
Declarative Configuration:
<hazelcast>
...
<multimap name="default">
<backup-count>0</backup-count>
<async-backup-count>1</async-backup-count>
<value-collection-type>SET</value-collection-type>
<entry-listeners>
<entry-listener include-value="false" local="false">com.hazelcast.examples.EntryListener</entry-listener>
</entry-listeners>
<split-brain-protection-ref>split-brain-protection-name</split-brain-protection-ref>
</multimap>
...
</hazelcast>
Programmatic Configuration:
• backup-count: Defines the number of synchronous backups. For example, if it is set to 1, backup
of a partition will be placed on one other member. If it is 2, it will be placed on two other
members.
• async-backup-count: The number of asynchronous backups. Behavior is the same as that of the
backup-count element.
• statistics-enabled: Specifies whether the statistics gathering is enabled for your MultiMap. If
set to false, you cannot collect statistics in your implementation (using getLocalMultiMapStats())
and also Hazelcast Management Center will not show them. Its default value is true.
• entry-listeners: Lets you add listeners (listener classes) for the map entries. You can also set the
attribute include-value to true if you want the item event to contain the entry values. You can
set local to true if you want to listen to the entries on the local member.
• split-brain-protection-ref: Name of the split-brain protection configuration that you want this
MultiMap to use. See the Split-Brain Protection for MultiMap and TransactionalMultiMap
section.
The following is a list of methods that support split-brain protection checks. The list is grouped by
the protection types.
MultiMap:
• WRITE, READ_WRITE:
◦ clear
◦ forceUnlock
◦ lock
◦ put
◦ remove
◦ tryLock
◦ unlock
• READ, READ_WRITE:
◦ containsEntry
◦ containsKey
◦ containsValue
◦ entrySet
◦ get
◦ isLocked
◦ keySet
◦ localKeySet
◦ size
◦ valueCount
◦ values
TransactionalMultiMap:
• WRITE, READ_WRITE:
◦ put
◦ remove
• READ, READ_WRITE:
◦ size
◦ get
◦ valueCount
Split-brain protection for MultiMap can be configured programmatically using the method
setSplitBrainProtectionName(), or declaratively using the element split-brain-protection-ref.
The following is an example declarative configuration:
<hazelcast>
...
<multimap name="default">
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</multimap>
...
</hazelcast>
7.5. Set
Hazelcast Set (ISet) is a distributed and concurrent implementation of java.util.Set. It has the
following features:
• Hazelcast Set is a non-partitioned data structure: all the data that belongs to a set lives on one
single partition in that member.
• Hazelcast Set cannot be scaled beyond the capacity of a single machine. Since the whole set lives
on a single partition, storing a large amount of data on a single set may cause memory pressure.
Therefore, you should use multiple sets to store a large amount of data. This way, all the sets are
spread across the cluster, sharing the load.
• A backup of Hazelcast Set is stored on a partition of another member in the cluster so that data
is not lost in the event of a primary member failure.
• All items are copied to the local member and iteration occurs locally.
• The equals method implemented in Hazelcast Set uses a serialized byte version of objects, as
opposed to java.util.HashSet.
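The last point, equality based on the serialized form rather than on equals(), can be illustrated with standard Java serialization. This is only an analogy under stated assumptions: java.io serialization stands in for Hazelcast's serialization, and the City class and helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

class SerializedEquality {
    // Deliberately does not override equals() or hashCode().
    static class City implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        City(String name) {
            this.name = name;
        }
    }

    // Serializes the object with standard Java serialization.
    static byte[] toBytes(Serializable object) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two objects count as equal here when their serialized bytes are identical.
    static boolean sameSerializedForm(Serializable a, Serializable b) {
        return Arrays.equals(toBytes(a), toBytes(b));
    }
}
```

Two separately created City("Tokyo") objects are different elements for a java.util.HashSet, because equals() is not overridden, yet their serialized forms are identical.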
Use the HazelcastInstance’s getSet method to get the Set, then use the add method to put items into it.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
ISet<String> set = hz.getSet("set");
set.add("Tokyo");
set.add("Paris");
set.add("London");
set.add("New York");
System.out.println("Putting finished!");
Hazelcast Set uses ItemListener to listen to events that occur when items are added to and removed
from the Set. See the Listening for Item Events section for information on how to create an item
listener class and register it.
Declarative Configuration:
<hazelcast>
...
<set name="default">
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<max-size>10</max-size>
<item-listeners>
<item-listener>com.hazelcast.examples.ItemListener</item-listener>
</item-listeners>
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</set>
...
</hazelcast>
Programmatic Configuration:
Config config = new Config();
CollectionConfig collectionSet = config.getSetConfig("MySet");
collectionSet.setBackupCount(1)
.setMaxSize(10)
.setSplitBrainProtectionName("splitbrainprotectionname");
• statistics-enabled: True (default) if statistics gathering is enabled on the Set, false otherwise.
• max-size: The maximum number of entries for this Set. It can be any number between 0 and
Integer.MAX_VALUE. Its default value is 0, meaning there is no capacity constraint.
• item-listeners: Lets you add listeners (listener classes) for the set items. You can also set the
attribute include-value to true if you want the item event to contain the item values. You can
set local to true if you want to listen to the items on the local member.
• split-brain-protection-ref: Name of the split-brain protection configuration that you want this
Set to use. See the Split-Brain Protection for ISet and TransactionalSet section.
ISet and TransactionalSet can be configured to check for a minimum number of available members
before applying set operations (see the Split-Brain Protection section). This is a check to avoid
performing successful set operations on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the protection types, that support split-brain
protection checks:
ISet:
• WRITE, READ_WRITE:
◦ add
◦ addAll
◦ clear
◦ remove
◦ removeAll
• READ, READ_WRITE:
◦ contains
◦ containsAll
◦ isEmpty
◦ iterator
◦ size
◦ toArray
TransactionalSet:
• WRITE, READ_WRITE:
◦ add
◦ remove
• READ, READ_WRITE:
◦ size
Split-brain protection for ISet can be configured programmatically using the method
setSplitBrainProtectionName(), or declaratively using the element split-brain-protection-ref. The
following is an example declarative configuration:
<hazelcast>
...
<set name="default">
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</set>
...
</hazelcast>
7.6. List
Hazelcast List (IList) is similar to Hazelcast Set, but it also allows duplicate elements.
• Besides allowing duplicate elements, Hazelcast List preserves the order of elements.
• Hazelcast List is a non-partitioned data structure where values and each backup are
represented by their own single partition.
While IMap and ICache are the recommended data structures to be used by
Hazelcast Jet, IList can also be used by it for unit testing or similar non-production
situations. See the Hazelcast Jet Reference Manual to learn how Jet can use
IList, e.g., how it can fill IList with data, consume it in a Jet job and drain the
results to another IList. See also the Fast Batch Processing and Real-Time Stream
Processing use cases for Hazelcast Jet.
Use the HazelcastInstance’s getList method to get the List, then use the add method to put items into
it.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
IList<String> list = hz.getList("list");
list.add("Tokyo");
list.add("Paris");
list.add("London");
list.add("New York");
System.out.println("Putting finished!");
Hazelcast List uses ItemListener to listen to events that occur when items are added to and removed
from the List. See the Listening for Item Events section for information on how to create an item
listener class and register it.
Declarative Configuration:
<hazelcast>
...
<list name="default">
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<max-size>10</max-size>
<item-listeners>
<item-listener>com.hazelcast.examples.ItemListener</item-listener>
</item-listeners>
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</list>
...
</hazelcast>
Programmatic Configuration:
• statistics-enabled: True (default) if statistics gathering is enabled on the list, false otherwise.
• item-listeners: Lets you add listeners (listener classes) for the list items. You can also set the
attribute include-value to true if you want the item event to contain the item values. You can set
the attribute local to true if you want to listen to the items on the local member.
• split-brain-protection-ref: Name of the split-brain protection configuration that you want this
List to use. See the Split-Brain Protection for IList and TransactionalList section.
IList and TransactionalList can be configured to check for a minimum number of available members
before applying list operations (see the Split-Brain Protection section). This is a check to avoid
performing successful list operations on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the protection types, that support split-brain
protection checks:
IList:
• WRITE, READ_WRITE:
◦ add
◦ addAll
◦ clear
◦ remove
◦ removeAll
◦ set
• READ, READ_WRITE:
◦ add
◦ contains
◦ containsAll
◦ get
◦ indexOf
◦ isEmpty
◦ iterator
◦ lastIndexOf
◦ listIterator
◦ size
◦ subList
◦ toArray
TransactionalList:
• WRITE, READ_WRITE:
◦ add
◦ remove
• READ, READ_WRITE:
◦ size
Split-brain protection for IList can be configured programmatically using the method
setSplitBrainProtectionName(), or declaratively using the element split-brain-protection-ref.
The following is an example declarative configuration:
<hazelcast>
...
<list name="default">
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</list>
...
</hazelcast>
7.7. Ringbuffer
Hazelcast Ringbuffer is a replicated but not partitioned data structure that stores its data in a ring-
like structure. You can think of it as a circular array with a given capacity. Each Ringbuffer has a tail
and a head. The tail is where the items are added and the head is where the items are overwritten
or expired. You can reach each element in a Ringbuffer using a sequence ID, which is mapped to
the elements between the head and tail (inclusive) of the Ringbuffer.
7.7.1. Getting a Ringbuffer and Reading Items
Reading from Ringbuffer is simple: get the Ringbuffer with the HazelcastInstance getRingbuffer
method, get its current head with the headSequence method and start reading. Use the method
readOne to return the item at the given sequence; readOne blocks if no item is available. To read the
next item, increment the sequence by one.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
Ringbuffer<String> ringbuffer = hz.getRingbuffer("rb");
long sequence = ringbuffer.headSequence();
while(true){
String item = ringbuffer.readOne(sequence);
sequence++;
// process item
}
Because the sequence is exposed, you can read the same item from the Ringbuffer again as long as the
item is still available. If the item is no longer available, a StaleSequenceException is thrown.
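To make the head, tail and sequence terminology concrete, the following is a minimal single-JVM sketch of a circular array addressed by sequence. It is not Hazelcast's implementation: the class MiniRing and its methods are hypothetical, readOne here does not block, and an IllegalStateException stands in for StaleSequenceException.

```java
// Hypothetical single-JVM model of Ringbuffer sequences; not Hazelcast code.
class MiniRing {
    private final String[] items;
    private long tail = -1; // sequence of the newest item, -1 while empty

    MiniRing(int capacity) {
        items = new String[capacity];
    }

    // Appends at the tail and returns the sequence assigned to the item.
    long add(String item) {
        tail++;
        items[(int) (tail % items.length)] = item;
        return tail;
    }

    long tailSequence() {
        return tail;
    }

    // Oldest sequence that has not been overwritten yet.
    long headSequence() {
        return Math.max(0, tail - items.length + 1);
    }

    // Non-blocking read; the real readOne waits when the sequence is ahead of the tail.
    String readOne(long sequence) {
        if (sequence < headSequence()) {
            throw new IllegalStateException("stale sequence " + sequence);
        }
        if (sequence > tail) {
            throw new IllegalStateException("no item yet at sequence " + sequence);
        }
        return items[(int) (sequence % items.length)];
    }

    public static void main(String[] args) {
        MiniRing ring = new MiniRing(3);
        for (String s : new String[] {"a", "b", "c", "d"}) {
            ring.add(s);
        }
        // "a" (sequence 0) was overwritten by "d" (sequence 3).
        System.out.println(ring.headSequence() + ".." + ring.tailSequence()); // 1..3
        System.out.println(ring.readOne(1)); // b
    }
}
```

Because readers only hold a sequence, several readers can walk the same buffer independently, each at its own position.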
Adding an item to a Ringbuffer is also easy with the Ringbuffer add method.
The add method returns the sequence of the inserted item; the sequence value is always
unique. You can use this as a very cheap way of generating unique IDs if you are already using
Ringbuffer.
Hazelcast Ringbuffer can sometimes be a better alternative than a Hazelcast IQueue. Unlike
IQueue, Ringbuffer does not remove the items; it only reads items using a certain position. This
approach has several advantages:
• The same item can be read multiple times by the same thread. This is useful for realizing
semantics of read-at-least-once or read-at-most-once.
• The same item can be read by multiple threads. Normally you could use an IQueue per thread
for the same semantic, but this is less efficient because of the increased remoting. A take from
an IQueue is destructive, so the change needs to be applied for backup also, which is why a
queue.take() is more expensive than a ringBuffer.read(…).
• Reads are extremely cheap since there is no change in the Ringbuffer. Therefore no replication
is required.
• Reads and writes can be batched to speed up performance. Batching can dramatically improve
the performance of Ringbuffer.
7.7.4. Configuring Ringbuffer Capacity
By default, a Ringbuffer is configured with a capacity of 10000 items. This creates an array with a
size of 10000. If a time-to-live is configured, then an array of longs is also created that stores the
expiration time for every item. In a lot of cases you may want to change this capacity number to
something that better fits your needs.
<hazelcast>
...
<ringbuffer name="rb">
<capacity>2000</capacity>
</ringbuffer>
...
</hazelcast>
Currently, Hazelcast Ringbuffer is not a partitioned data structure; its data is stored in a single
partition and the replicas are stored in another partition. Therefore, create a Ringbuffer that can
safely fit in a single cluster member.
Hazelcast Ringbuffer has a single synchronous backup by default. You can control the Ringbuffer
backup just like most of the other Hazelcast distributed data structures by setting the synchronous
and asynchronous backups: backup-count and async-backup-count. In the example below, a
Ringbuffer is configured with no synchronous backups and one asynchronous backup:
<hazelcast>
...
<ringbuffer name="rb">
<backup-count>0</backup-count>
<async-backup-count>1</async-backup-count>
</ringbuffer>
...
</hazelcast>
An asynchronous backup probably gives you better performance. However, there is a chance that
the item added will be lost when the member owning the primary crashes before the backup could
complete. You may want to consider batching methods if you need high performance but do not
want to give up on consistency.
You can configure Hazelcast Ringbuffer with a time-to-live in seconds. Using this setting, you can
control how long the items remain in the Ringbuffer before they are expired. By default, the time-
to-live is set to 0, meaning that unless the item is overwritten, it will remain in the Ringbuffer
indefinitely. If you set a time-to-live and an item is added, then, depending on the Overflow Policy,
either the oldest item is overwritten, or the call is rejected.
<hazelcast>
...
<ringbuffer name="rb">
<time-to-live-seconds>180</time-to-live-seconds>
</ringbuffer>
...
</hazelcast>
Using the overflow policy, you can determine what to do if the oldest item in the Ringbuffer is not
old enough to expire when more items than the configured Ringbuffer capacity are being added.
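The following options are currently available:
• OverflowPolicy.OVERWRITE: The oldest item is overwritten even if it is not old enough to expire.
• OverflowPolicy.FAIL: The call is aborted. The methods that make use of the OverflowPolicy
return -1 to indicate that adding the item has failed.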
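The interaction between capacity, time-to-live and the overflow policy can be sketched with a small local model. This is an illustration under simplifying assumptions, not Hazelcast code: the class OverflowSketch is hypothetical, only expiration times are tracked, and the current time is passed in explicitly so that the behavior is deterministic.

```java
// Hypothetical model of the overflow decision; names are illustrative.
class OverflowSketch {
    enum Policy { OVERWRITE, FAIL }

    private final long[] expiresAt; // expiration time per slot, in milliseconds
    private final long ttlMs;       // 0 means items never expire
    private long tail = -1;

    OverflowSketch(int capacity, long ttlMs) {
        this.expiresAt = new long[capacity];
        this.ttlMs = ttlMs;
    }

    // Returns the sequence of the added item, or -1 if the FAIL policy rejected the call.
    long add(long nowMs, Policy policy) {
        boolean full = tail + 1 >= expiresAt.length;
        if (full && policy == Policy.FAIL) {
            // The next write would land on the slot that holds the oldest item.
            int oldestSlot = (int) ((tail + 1) % expiresAt.length);
            boolean oldestExpired = ttlMs > 0 && expiresAt[oldestSlot] <= nowMs;
            if (!oldestExpired) {
                return -1;
            }
        }
        tail++;
        expiresAt[(int) (tail % expiresAt.length)] = nowMs + ttlMs;
        return tail;
    }

    public static void main(String[] args) {
        OverflowSketch rb = new OverflowSketch(2, 100);
        rb.add(0, Policy.FAIL);
        rb.add(0, Policy.FAIL);
        System.out.println(rb.add(50, Policy.FAIL));      // -1: full, oldest item not expired
        System.out.println(rb.add(50, Policy.OVERWRITE)); // 2: oldest item overwritten
        System.out.println(rb.add(100, Policy.FAIL));     // 3: oldest item has expired by now
    }
}
```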
Overflow policy gives you fine control on what to do if the Ringbuffer is full. You can also use the
overflow policy to apply a back pressure mechanism. The following example code shows the usage
of an exponential backoff.
Random random = new Random();
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
Ringbuffer<Long> rb = hz.getRingbuffer("rb");
long i = 100;
while (true) {
long sleepMs = 100;
for (; ; ) {
long result = rb.addAsync(i, OverflowPolicy.FAIL).toCompletableFuture().get();
if (result != -1) {
break;
}
// the Ringbuffer is full: wait, then double the delay up to a 5-second cap
TimeUnit.MILLISECONDS.sleep(sleepMs);
sleepMs = Math.min(5000, sleepMs * 2);
}
i++;
}
Hazelcast allows you to load and store the Ringbuffer items from/to a persistent datastore using the
interface RingbufferStore. If a Ringbuffer store is enabled, each item added to the Ringbuffer will
also be stored at the configured Ringbuffer store.
If the Ringbuffer store is configured, you can get items with sequences which are no longer in the
actual Ringbuffer but are only in the Ringbuffer store. This is probably much slower but still allows
you to continue consuming items from the Ringbuffer even if they are overwritten with newer
items in the Ringbuffer.
When a Ringbuffer is being instantiated, it checks if the Ringbuffer store is configured and requests
the latest sequence in the Ringbuffer store. This is to enable the Ringbuffer to start with sequences
larger than the ones in the Ringbuffer store. In this case, the Ringbuffer is empty but you can still
request older items from it (which will be loaded from the Ringbuffer store).
The Ringbuffer store stores items in the same format as the Ringbuffer. If the BINARY in-memory
format is used, the Ringbuffer store must implement the interface RingbufferStore<byte[]>, meaning
that the Ringbuffer store receives items in the binary format. If the OBJECT in-memory format is used, the
Ringbuffer store must implement the interface RingbufferStore<K>, where K is the type of item being
stored (meaning that the Ringbuffer store receives the deserialized object).
When adding items to the Ringbuffer, the method storeAll allows you to store items in batches.
public class TheRingbufferObjectStore implements RingbufferStore<Item> {
@Override
public void store(long sequence, Item data) {
System.out.println("Object store");
}
@Override
public void storeAll(long firstItemSequence, Item[] items) {
System.out.println("Object store all");
}
@Override
public Item load(long sequence) {
System.out.println("Object load");
return null;
}
@Override
public long getLargestSequence() {
System.out.println("Object get largest sequence");
return -1;
}
}
Item must be serializable. The following is an example of a Ringbuffer with the Ringbuffer store
configured and enabled.
<hazelcast>
...
<ringbuffer name="default">
<capacity>10000</capacity>
<time-to-live-seconds>30</time-to-live-seconds>
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<in-memory-format>BINARY</in-memory-format>
<ringbuffer-store>
<class-name>com.hazelcast.RingbufferStoreImpl</class-name>
</ringbuffer-store>
</ringbuffer>
...
</hazelcast>
The following are the explanations for the Ringbuffer store configuration elements:
• class-name: Name of the class implementing the RingbufferStore interface.
You can configure Hazelcast Ringbuffer with an in-memory format that controls the format of the
Ringbuffer’s stored items. By default, BINARY in-memory format is used, meaning that the object is
stored in a serialized form. You can select the OBJECT in-memory format, which is useful when
filtering is applied or when the OBJECT in-memory format has a smaller memory footprint than
BINARY.
In the declarative configuration example below, a Ringbuffer is configured with the OBJECT in-
memory format:
<hazelcast>
...
<ringbuffer name="rb">
<in-memory-format>OBJECT</in-memory-format>
</ringbuffer>
...
</hazelcast>
Ringbuffer can be configured to check for a minimum number of available members before
applying Ringbuffer operations. This check prevents Ringbuffer operations from succeeding on all
parts of a cluster during a network partition, and it can be configured using the
element split-brain-protection-ref. Set this element’s value to the name of the split-brain
protection configuration that you defined under the split-brain-protection element, as explained
in the Split-Brain Protection section. Following is an example snippet:
<hazelcast>
...
<ringbuffer name="rb">
<split-brain-protection-ref>quorumname</split-brain-protection-ref>
</ringbuffer>
...
</hazelcast>
The following is a list of methods, grouped by the protection types, that support split-brain
protection checks:
• WRITE, READ_WRITE:
◦ add
◦ addAllAsync
◦ addAsync
• READ, READ_WRITE:
◦ capacity
◦ headSequence
◦ readManyAsync
◦ readOne
◦ remainingCapacity
◦ size
◦ tailSequence
In the previous examples, the method ringBuffer.add() is used to add an item to the Ringbuffer.
The problems with this method are that it always overwrites and that it does not support batching.
Batching can have a huge impact on the performance. You can use the method addAllAsync to
support batching.
In the above case, three strings are added to the Ringbuffer using the policy
OverflowPolicy.OVERWRITE. See the Overflow Policy section for more information.
In the previous example, the readOne method reads items from the Ringbuffer. readOne is simple but
not very efficient for the following reasons:
• readOne does not use batching.
• readOne cannot filter items at the source; the items need to be retrieved before being filtered.
The method readManyAsync can read a batch of items and can filter items at the source.
CompletionStage<ReadResultSet<E>> readManyAsync(
long startSequence,
int minCount,
int maxCount,
IFunction<E, Boolean> filter);
• startSequence: Sequence of the first item to read.
• minCount: Minimum number of items to read. If you do not want to block, set it to 0. If you want
to block for at least one item, set it to 1.
• maxCount: Maximum number of the items to retrieve. Its value cannot exceed 1000.
• filter: A function that accepts an item and checks if it should be returned. If no filtering should
be applied, set it to null.
Please take a careful look at how your sequence is being incremented. You cannot always rely on
the number of items being returned if the items are filtered out.
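The warning about incrementing the sequence can be illustrated with a local model of a filtered batch read. The sketch below is hypothetical and does not use the Hazelcast API: FilteredReadSketch, Batch and readMany are made-up names, and a java.util.function.Predicate stands in for the filter. In Hazelcast, ReadResultSet exposes the number of items read before filtering as readCount(); verify this against the Javadoc of your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical local model of a filtered batch read; not the Hazelcast API.
class FilteredReadSketch {
    static final class Batch {
        final List<String> items = new ArrayList<>();
        int readCount; // items scanned, including those the filter rejected
    }

    // Scans from 'start' until 'maxCount' items passed the filter or the source ends.
    static Batch readMany(List<String> source, int start, int maxCount, Predicate<String> filter) {
        Batch batch = new Batch();
        for (int i = start; i < source.size() && batch.items.size() < maxCount; i++) {
            batch.readCount++;
            if (filter.test(source.get(i))) {
                batch.items.add(source.get(i));
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("apple", "orange", "pear", "peach", "avocado");
        Batch batch = readMany(source, 0, 10, s -> s.startsWith("a"));
        System.out.println(batch.items + " scanned=" + batch.readCount); // [apple, avocado] scanned=5
        // Advance by the number of items scanned (5), not by the number returned (2).
        int nextSequence = 0 + batch.readCount;
        System.out.println("next sequence: " + nextSequence); // next sequence: 5
    }
}
```

Advancing by the size of the returned batch would restart the next read at sequence 2 and scan "pear", "peach" and "avocado" a second time.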
There is not any filtering applied in the above example. The following example shows how you can
apply a filter when reading batched items. First, let’s create our filter as shown below:
So, the FruitFilter checks whether a String object starts with the letter "a". You can see this filter in
action in the below example:
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
Ringbuffer<String> rb = hz.getRingbuffer("rb");
rb.add("apple");
rb.add("orange");
rb.add("pear");
rb.add("peach");
rb.add("avocado");
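// Counts are illustrative: do not block (minCount 0) and read at most 10 items.
CompletableFuture<ReadResultSet<String>> f =
rb.readManyAsync(rb.headSequence(), 0, 10, new FruitFilter()).toCompletableFuture();
ReadResultSet<String> rs = f.join();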
for (String s : rs) {
System.out.println(s);
}
Hazelcast Ringbuffer provides asynchronous methods for more powerful operations like batched
writing or batched reading with filtering. To wait for the result of the operation in a blocking way,
obtain a CompletableFuture from the returned CompletionStage by invoking
CompletionStage#toCompletableFuture() method, then use either CompletableFuture#get() or
CompletableFuture#join().
However, you can also use CompletionStage API to add subsequent dependent computation stages
which will be executed when the operation has completed. This way the thread used for the call is
not blocked until the response is returned.
See the below code as an example of when you want to get notified when a batch of reads has
completed.
CompletionStage<ReadResultSet<String>> stage = rb.readManyAsync(sequence, min, max,
someFilter);
stage.whenCompleteAsync((response, throwable) -> {
if (throwable == null) {
for (String s : response) {
System.out.println("Received:" + s);
}
} else {
throwable.printStackTrace();
}
});
The following shows the declarative configuration of a Ringbuffer called rb. The configuration is
modeled after the Ringbuffer defaults.
<hazelcast>
...
<ringbuffer name="rb">
<capacity>10000</capacity>
<backup-count>1</backup-count>
<async-backup-count>0</async-backup-count>
<time-to-live-seconds>0</time-to-live-seconds>
<in-memory-format>BINARY</in-memory-format>
<split-brain-protection-ref>quorumname</split-brain-protection-ref>
</ringbuffer>
...
</hazelcast>
You can also configure a Ringbuffer programmatically. The following is a programmatic version of
the above declarative configuration.
7.8. Topic
Hazelcast provides a distribution mechanism for publishing messages that are delivered to multiple
subscribers. This is also known as a publish/subscribe (pub/sub) messaging model. Publishing and
subscribing operations are cluster wide. When a member subscribes to a topic, it is actually
registering for messages published by any member in the cluster, including the new members that
joined after you add the listener.
Publish operation is async. It does not wait for operations to run in remote
members; it works as fire and forget.
Use the HazelcastInstance’s getTopic method to get the topic, then use the topic’s publish method to
publish your messages. The following is an example publisher:
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
ITopic<Date> topic = hz.getTopic("topic");
topic.publish(new Date());
Hazelcast Topic uses the MessageListener interface to listen for events that occur when a message is
received. See the Listening for Topic Messages section for information on how to create a message
listener class and register it.
Topic has two statistic variables that you can query. These values are incremental and local to the
member.
myTopic.getLocalTopicStats().getPublishOperationCount();
myTopic.getLocalTopicStats().getReceiveOperationCount();
You can disable this feature with topic configuration. See the Configuring Topic section.
These statistics values can be also viewed in Management Center. See the
Monitoring Topics section in Hazelcast Management Center Reference Manual.
Each cluster member has a list of all registrations in the cluster. When a new member is registered
for a topic, it sends a registration message to all members in the cluster. Also, when a new member
joins the cluster, it receives all registrations made so far in the cluster.
The behavior of a topic varies depending on the value of the configuration parameter
globalOrderEnabled.
If globalOrderEnabled is disabled, messages are not globally ordered: listeners (subscribers) process
the messages of each publisher in the order that the publisher sent them, but different listeners may
see the messages of different publishers interleaved differently. If cluster member M publishes messages
m1, m2, m3, …, mn to a topic T, then Hazelcast makes sure that all of the subscribers of topic T
receive and process m1, m2, m3, …, mn in the given order.
Here is how it works: Let’s say that we have three members (member1, member2 and member3)
and that member1 and member2 are registered to a topic named news. Note that all three members
know that member1 and member2 are registered to news.
In this example, member1 publishes two messages: a1 and a2. Member3 publishes two messages:
c1 and c2. When member1 and member3 publish a message, they check their local list for
registered members, discover that member1 and member2 are in their lists, and then they fire
messages to those members. One possible order of the messages received could be the following.
If globalOrderEnabled is enabled, all members listening to the same topic get its messages in the
same order.
Here is how it works. Let’s say that we have three members (member1, member2 and member3)
and that member1 and member2 are registered to a topic named news. Note that all three members
know that member1 and member2 are registered to news.
In this example, member1 publishes two messages: a1 and a2. Member3 publishes two messages:
c1 and c2. When a member publishes messages over the topic news, it first calculates which partition
the news ID corresponds to. Then it sends an operation to the owner of that partition, and the owner
publishes the messages. Let’s assume that news corresponds to a partition that member2
owns. member1 and member3 first send all messages to member2. Assume that the messages are
published in the following order:
member2 then publishes these messages by looking at registrations in its local list. It sends these
messages to member1 and member2 (it makes a local dispatch for itself).
This way we guarantee that all members see the events in the same order.
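The difference between the two modes can be checked with a small local sketch. The delivery orders below are made up for illustration and the class TopicOrderSketch is hypothetical. Without global ordering, every subscriber must see each publisher's messages in publication order, but two subscribers may see different interleavings. With global ordering, all subscribers see one and the same sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical check of per-publisher ordering; delivery orders are illustrative.
class TopicOrderSketch {
    // True if 'received' contains the messages of one publisher in publication order.
    static boolean keepsPublisherOrder(List<String> received, List<String> published) {
        List<String> seen = new ArrayList<>(received);
        seen.retainAll(published); // keep only this publisher's messages, in received order
        return seen.equals(published);
    }

    public static void main(String[] args) {
        List<String> fromMember1 = Arrays.asList("a1", "a2");
        List<String> fromMember3 = Arrays.asList("c1", "c2");
        // Two deliveries that are both legal when global ordering is disabled.
        List<String> atMember1 = Arrays.asList("c1", "a1", "a2", "c2");
        List<String> atMember2 = Arrays.asList("c1", "c2", "a1", "a2");
        for (List<String> received : Arrays.asList(atMember1, atMember2)) {
            boolean valid = keepsPublisherOrder(received, fromMember1)
                    && keepsPublisherOrder(received, fromMember3);
            System.out.println(received + " valid=" + valid);
        }
        // Global ordering would additionally require the two lists to be equal.
        System.out.println("same order everywhere=" + atMember1.equals(atMember2));
    }
}
```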
In both cases, there is a StripedExecutor in EventService that is responsible for dispatching the
received message. For all events in Hazelcast, the order that events are generated and the order
they are published to the user are guaranteed to be the same via this StripedExecutor.
To configure a topic, set the topic name, decide on statistics and global ordering, and set the
message listeners. The following are the default values:
Declarative Configuration:
<hazelcast>
...
<topic name="yourTopicName">
<global-ordering-enabled>true</global-ordering-enabled>
<statistics-enabled>true</statistics-enabled>
<message-listeners>
<message-listener>MessageListenerImpl</message-listener>
</message-listeners>
</topic>
...
</hazelcast>
Programmatic Configuration:
• statistics-enabled: Specifies whether the statistics gathering is enabled for your topic. If set
to false, you cannot collect statistics in your implementation (using getLocalTopicStats()) and
also Hazelcast Management Center will not show them. Its default value is true.
• message-listeners: Lets you add listeners (listener classes) for the topic messages.
Besides the above elements, there are the following system properties that are topic related but not
topic specific:
For the descriptions of these parameters, see the Global Event Configuration section.
7.9. Reliable Topic
Reliable Topic uses the same ITopic interface as a regular topic. The main difference is that Reliable
Topic is backed up by the Ringbuffer data structure. The following are the advantages of this
approach:
• Events are not lost since the Ringbuffer is configured with one synchronous backup by default.
• Each Reliable ITopic gets its own Ringbuffer; if a topic has a very fast producer, it will not lead
to problems at topics that run at a slower pace.
• Since the event system behind a regular ITopic is shared with other data structures, e.g.,
collection listeners, you can run into isolation problems. This does not happen with the Reliable
ITopic.
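import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.topic.ITopic;
import java.util.Random;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

// Class name, topic name and setup lines are assumed for illustration.
public class PublisherMember {
    public static void main(String[] args) {
        Random random = new Random();
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ITopic<Long> topic = hz.getReliableTopic("sometopic");
        long messageId = 0;
        while (true) {
            topic.publish(messageId);
            // report the ID that was just published, then move on to the next one
            System.out.println("Written: " + messageId);
            messageId++;
            sleepMillis(random.nextInt(100));
        }
    }

    public static boolean sleepMillis(int millis) {
        try {
            MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }
}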
public class SubscribedMember {
When you create a Reliable Topic, Hazelcast automatically creates a Ringbuffer for it. You may
configure this Ringbuffer by adding a Ringbuffer config with the same name as the Reliable Topic.
For instance, if you have a Reliable Topic with the name "sometopic", you should add a Ringbuffer
config with the name "sometopic" to configure the backing Ringbuffer. Some of the things that you
may configure are the capacity, the time-to-live for the topic messages, and you can even add a
Ringbuffer store which allows you to have a persistent topic. By default, a Ringbuffer does not have
any TTL (time-to-live) and it has a limited capacity; you may want to change that configuration. The
following is an example configuration for the "sometopic" given above.
<hazelcast>
...
<!-- This is the ringbuffer that is used by the 'sometopic' Reliable-topic. As you
can see, the ringbuffer has the same name as the topic. -->
<ringbuffer name="sometopic">
<capacity>1000</capacity>
<time-to-live-seconds>5</time-to-live-seconds>
</ringbuffer>
<reliable-topic name="sometopic">
<topic-overload-policy>BLOCK</topic-overload-policy>
</reliable-topic>
...
</hazelcast>
See the Configuring Reliable Topic section below for the descriptions of all Reliable Topic
configuration elements.
By default, the Reliable ITopic uses a shared thread pool. If you need a better isolation, you can
configure a custom executor on the ReliableTopicConfig.
Because the reads on a Ringbuffer are not destructive, batching is easy to apply. ITopic uses read
batching and reads ten items at a time (if available) by default. See Reading Batched Items for more
information.
7.9.1. Slow Consumers
The Reliable ITopic provides control and a way to deal with slow consumers. It is unwise to keep
events for a slow consumer in memory indefinitely since you do not know when the slow consumer
is going to catch up. You can control the size of the Ringbuffer by using its capacity. For the cases
when a Ringbuffer runs out of its capacity, you can specify the following policies for the
TopicOverloadPolicy configuration:
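• DISCARD_OLDEST: Overwrite the oldest item, even if a TTL is set. In this case the fast producer
supersedes a slow consumer.
• DISCARD_NEWEST: Discard the newest item.
• BLOCK: Wait until the items are expired in the Ringbuffer.
• ERROR: Immediately throw TopicOverloadException if there is no space in the Ringbuffer.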
Declarative Configuration:
<hazelcast>
...
<reliable-topic name="default">
<statistics-enabled>true</statistics-enabled>
<message-listeners>
<message-listener>
...
</message-listener>
</message-listeners>
<read-batch-size>10</read-batch-size>
<topic-overload-policy>BLOCK</topic-overload-policy>
</reliable-topic>
...
</hazelcast>
Programmatic Configuration:
• statistics-enabled: Specifies whether the statistics gathering is enabled for your Reliable Topic.
If set to false, you cannot collect statistics in your implementation and also Hazelcast
Management Center will not show them. Its default value is true.
• message-listener: Message listener class that listens to the messages when they are added or
removed.
• read-batch-size: Maximum number of messages that Reliable Topic tries to read in one batch. Its
default value is 10.
7.10. FencedLock
FencedLock is a member of CP Subsystem API. For detailed information, see the CP
Subsystem chapter.
For detailed information and configuration, see the FencedLock section under the
CP Subsystem chapter.
Always use locks with try-finally blocks. This ensures that locks are released if an exception is
thrown from the code in a critical section. Also note that the lock method is outside the try-finally
block because we do not want to unlock if the lock operation itself fails.
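The pattern described above can be sketched against the standard java.util.concurrent.locks.Lock interface, which FencedLock is understood to extend. In this sketch a local ReentrantLock stands in for a FencedLock so that it runs without a cluster; the class LockPatternSketch and the method runGuarded are hypothetical.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Local stand-in sketch: a ReentrantLock plays the role of a FencedLock.
class LockPatternSketch {
    // lock() stays outside the try block: if acquiring fails, unlock() must not run.
    static void runGuarded(Lock lock, Runnable criticalSection) {
        lock.lock();
        try {
            criticalSection.run();
        } finally {
            lock.unlock(); // released even if the critical section throws
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        try {
            runGuarded(lock, () -> {
                throw new IllegalStateException("failure inside the critical section");
            });
        } catch (IllegalStateException e) {
            System.out.println("still locked after failure: " + lock.isLocked()); // false
        }
    }
}
```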
If a lock is not released in the cluster, another thread that is trying to get the lock can wait forever.
To avoid this, use tryLock with a timeout value. You can set a high value (normally it should not take
that long) for tryLock. You can check the return value of tryLock as follows:
if ( lock.tryLock ( 10, TimeUnit.SECONDS ) ) {
try {
// do some stuff here..
} finally {
lock.unlock();
}
} else {
// warning
}
• Locks are fail-safe. If a member holds a lock and some other members go down, the cluster will
keep your locks safe and available. Moreover, when a member leaves the cluster, all the locks
acquired by that dead member will be removed so that those locks are immediately available
for live members.
• Locks are not automatically removed. If a lock is not used anymore, Hazelcast does not
automatically garbage collect the lock. This can lead to an OutOfMemoryError. If
you create locks on the fly, make sure they are destroyed.
• Locks are re-entrant. The same thread can lock multiple times on the same lock. Note that for
other threads to be able to acquire this lock, the owner of the lock must call unlock as many
times as the owner called lock.
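The re-entrancy rule in the last item can be observed locally. The following sketch again uses java.util.concurrent.locks.ReentrantLock as a stand-in for a distributed lock; ReentrancySketch and freeAfter are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

// Local stand-in sketch of lock re-entrancy; not Hazelcast code.
class ReentrancySketch {
    // Locks 'locks' times and unlocks 'unlocks' times on the calling thread,
    // then reports whether the lock is free for other threads.
    static boolean freeAfter(int locks, int unlocks) {
        ReentrantLock lock = new ReentrantLock();
        for (int i = 0; i < locks; i++) {
            lock.lock();
        }
        for (int i = 0; i < unlocks; i++) {
            lock.unlock();
        }
        return !lock.isLocked();
    }

    public static void main(String[] args) {
        System.out.println(freeAfter(2, 1)); // false: one unlock is still missing
        System.out.println(freeAfter(2, 2)); // true
    }
}
```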
7.11. IAtomicLong
IAtomicLong is a member of CP Subsystem API. For detailed information, see the CP
Subsystem chapter.
The following example code creates an instance, increments it by a million and prints the count.
When you start other instances with the code above, you will see the count as member count times
a million.
You can send functions to an IAtomicLong. IFunction is a Hazelcast owned, single method interface.
The following example IFunction implementation adds two to the original value.
• apply: Applies the function to the value in IAtomicLong without changing the actual value and
returns the result.
• alter: Alters the value stored in the IAtomicLong by applying the function. It does not send back
a result.
• alterAndGet: Alters the value stored in the IAtomicLong by applying the function, stores the
result in the IAtomicLong and returns the result.
• getAndAlter: Alters the value stored in the IAtomicLong by applying the function and returns
the original value.
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
IAtomicLong atomicLong = hazelcastInstance.getCPSubsystem().getAtomicLong( "counter" );
atomicLong.set( 1 );
long result = atomicLong.apply( new Add2Function() );
System.out.println( "apply.result: " + result);
System.out.println( "apply.value: " + atomicLong.get() );
atomicLong.set( 1 );
atomicLong.alter( new Add2Function() );
System.out.println( "alter.value: " + atomicLong.get() );
atomicLong.set( 1 );
result = atomicLong.alterAndGet( new Add2Function() );
System.out.println( "alterAndGet.result: " + result );
System.out.println( "alterAndGet.value: " + atomicLong.get() );
atomicLong.set( 1 );
result = atomicLong.getAndAlter( new Add2Function() );
System.out.println( "getAndAlter.result: " + result );
System.out.println( "getAndAlter.value: " + atomicLong.get() );
apply.result: 3
apply.value: 1
alter.value: 3
alterAndGet.result: 3
alterAndGet.value: 3
getAndAlter.result: 1
getAndAlter.value: 3
The reason for using a function instead of a simple code line like atomicLong.set(atomicLong.get() +
2); is that the combination of the IAtomicLong read and write operations is not atomic. Since
IAtomicLong is a distributed implementation, those operations can be remote ones, and another caller
can change the value between the read and the write, which leads to race
problems. By using functions, the data is not pulled into the code, but the code is sent to the data.
This makes it more scalable.
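The four function-based methods have close local analogues in java.util.concurrent.atomic.AtomicLong, which can help when reasoning about what each one returns and stores. The sketch below is only an analogy under that assumption: AtomicFunctionSketch is a hypothetical class, and a LongUnaryOperator lambda stands in for an IFunction implementation that adds two.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Local analogy only: java.util.concurrent.atomic.AtomicLong stands in for IAtomicLong.
class AtomicFunctionSketch {
    // Computes a result from the current value without storing anything.
    static long apply(AtomicLong value, LongUnaryOperator f) {
        return f.applyAsLong(value.get());
    }

    // Stores the transformed value and returns nothing.
    static void alter(AtomicLong value, LongUnaryOperator f) {
        value.updateAndGet(f);
    }

    // Stores the transformed value and returns it.
    static long alterAndGet(AtomicLong value, LongUnaryOperator f) {
        return value.updateAndGet(f);
    }

    // Stores the transformed value and returns the previous one.
    static long getAndAlter(AtomicLong value, LongUnaryOperator f) {
        return value.getAndUpdate(f);
    }

    public static void main(String[] args) {
        LongUnaryOperator add2 = v -> v + 2;
        AtomicLong value = new AtomicLong(1);
        System.out.println("apply.result: " + apply(value, add2) + ", value: " + value.get());
        value.set(1);
        System.out.println("getAndAlter.result: " + getAndAlter(value, add2) + ", value: " + value.get());
    }
}
```

In both the local and the distributed case, the read-modify-write happens as one step where the value lives, which is what removes the race of a separate get followed by set.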
7.12. ISemaphore
ISemaphore is a member of CP Subsystem API. For detailed information, see the CP
Subsystem chapter.
7.12.1. Controlling Thread Counts with Permits
Semaphores offer permits to control the thread counts when performing concurrent activities. To
execute a concurrent activity, a thread acquires a permit or waits until a permit becomes available.
When the execution is completed, the permit is released.
ISemaphore with a single permit may be considered as a lock. Unlike the locks,
when semaphores are used, any thread can release the permit depending on the
configuration, and semaphores can have multiple permits. For more information,
see the Semaphore Configuration section.
Hazelcast ISemaphore does not support fairness at all times. There are some edge
cases where the fairness is not honored, e.g., when the permit becomes available
at the time when an internal timeout occurs.
• If there are permits, the number of permits in the semaphore is decreased by one and the
calling thread performs its activity. If there is contention, the longest waiting thread acquires
the permit before all other threads.
• If no permits are available, the calling thread blocks until a permit becomes available. When a
timeout happens during this block, the thread is interrupted.
The following example code uses an IAtomicLong resource 1000 times, increments the resource
when a thread starts to use it and decrements it when the thread completes.
If you execute the above SemaphoreMember class 5 times, the following output appears:
At iteration: 1, Active Threads: 2
As you can see, the maximum count of concurrent threads is equal to or smaller than three. If you
remove the semaphore acquire/release statements in SemaphoreMember, you will see that there is no
limitation on the number of concurrent usages.
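The effect of the permits can be reproduced in a single JVM with java.util.concurrent.Semaphore as a stand-in for ISemaphore. The sketch below is an illustration, not the SemaphoreMember class: SemaphoreSketch and maxConcurrent are hypothetical names, and an AtomicInteger plays the role of the shared resource counter.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Local stand-in sketch: java.util.concurrent.Semaphore plays the role of ISemaphore.
class SemaphoreSketch {
    // Runs 'tasks' jobs on 'threads' threads and returns the highest concurrency observed.
    static int maxConcurrent(int permits, int threads, int tasks) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                semaphore.acquireUninterruptibly();
                try {
                    int now = active.incrementAndGet(); // thread starts using the resource
                    max.accumulateAndGet(now, Math::max);
                    Thread.yield();                     // give other threads a chance to overlap
                    active.decrementAndGet();           // thread is done with the resource
                } finally {
                    semaphore.release();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return max.get();
    }

    public static void main(String[] args) {
        // With three permits the observed maximum never exceeds three.
        System.out.println("max active threads: " + maxConcurrent(3, 8, 200));
    }
}
```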
7.13. IAtomicReference
IAtomicReference is a member of CP Subsystem API. For detailed information, see
the CP Subsystem chapter.
The IAtomicLong is very useful if you need to deal with a long, but in some cases you need to deal
with a reference. That is why Hazelcast also supports the IAtomicReference which is the distributed
version of the java.util.concurrent.atomic.AtomicReference.
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
foo
Just like IAtomicLong, IAtomicReference has methods that accept a 'function' as an argument, such as
alter, alterAndGet, getAndAlter and apply. There are two big advantages of using these methods:
• From a performance point of view, it is better to send the function to the data than the data to
the function. Often the function is a lot smaller than the data and therefore cheaper to send
over the line. Also the function only needs to be transferred once to the target machine and the
data needs to be transferred twice.
• You do not need to deal with concurrency control. If you performed a load, transform, and
store sequence yourself, you could run into a data race since another thread might have updated
the value you are about to overwrite.
7.13.2. Using IAtomicReference
The following are some considerations you need to know when you use IAtomicReference:
• IAtomicReference works based on the byte-content and not on the object-reference. If you use
the compareAndSet method, do not change the original value because its serialized content will
then be different. It is also important to know that if you rely on Java serialization, sometimes
(especially with hashmaps) the same object can result in different binary content.
• All methods returning an object return a private copy. You can modify the private copy, but the
rest of the world is shielded from your changes. If you want these changes to be visible to the
rest of the world, you need to write the change back to the IAtomicReference; but be careful
about introducing a data-race.
• The 'in-memory format' of an IAtomicReference is binary. The receiving side does not need to
have the class definition available unless it needs to be deserialized on the other side, e.g.,
because a method like 'alter' is executed. This deserialization is done for every call that needs to
have the object instead of the binary content, so be careful with expensive object graphs that
need to be deserialized.
• If you have an object with many fields or an object graph and you only need to calculate some
information or need a subset of fields, you can use the apply method. With the apply method, the
whole object does not need to be sent over the line; only the information that is relevant is sent.
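The first consideration, comparison by byte-content, can be illustrated with a local model. This is a sketch under stated assumptions, not Hazelcast code: ByteContentSketch is a hypothetical class, and plain Java serialization stands in for whichever serializer is configured in the cluster.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical model of a reference that compares serialized bytes, not object identity.
class ByteContentSketch {
    private byte[] stored;

    ByteContentSketch(Object initial) {
        stored = serialize(initial);
    }

    // Succeeds only if 'expected' serializes to exactly the stored bytes.
    boolean compareAndSet(Object expected, Object update) {
        if (!Arrays.equals(stored, serialize(expected))) {
            return false;
        }
        stored = serialize(update);
        return true;
    }

    static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a"));
        ByteContentSketch ref = new ByteContentSketch(original);
        original.add("b"); // mutating the original changes its serialized form
        System.out.println(ref.compareAndSet(original, "new")); // false
        ArrayList<String> untouchedCopy = new ArrayList<>(Arrays.asList("a"));
        System.out.println(ref.compareAndSet(untouchedCopy, "new")); // true
    }
}
```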
7.14. ICountDownLatch
ICountDownLatch is a member of CP Subsystem API. For detailed information, see
the CP Subsystem chapter.
ICountDownLatch is considered to be a gate keeper for concurrent activities. It enables the threads to
wait for other threads to complete their operations. The following examples describe the
mechanism of ICountDownLatch.
Assume that there is a leader process and there are follower processes that will wait until the
leader completes. Here is the leader:
public class Leader {
public static void main( String[] args ) throws Exception {
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
ICountDownLatch latch = hazelcastInstance.getCPSubsystem().getCountDownLatch( "countDownLatch" );
System.out.println( "Starting" );
latch.trySetCount( 1 );
Thread.sleep( 30000 );
latch.countDown();
System.out.println( "Leader finished" );
latch.destroy();
}
}
This sample has only a single step to complete, so the above code initializes the latch
with 1. It then sleeps for a while to simulate a process, counts the latch down and finally
destroys the latch. A follower first retrieves the same ICountDownLatch and then calls the await
method so that its thread waits on the latch. The method await has a timeout value as a parameter.
This is useful when the countDown method fails, since the follower stops waiting once the timeout
elapses. To see ICountDownLatch in action, start the leader first and
then start one or more followers. You will see that the followers wait until the leader completes.
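The leader/follower handshake described above can be tried out in a single JVM with the JDK class java.util.concurrent.CountDownLatch, which offers the same countDown and timed await operations. The following is only a local analogy under that assumption, not Hazelcast code: the class name LatchHandshakeSketch and the method run are hypothetical, and threads stand in for cluster members.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Single-JVM analogy of the leader/follower handshake, using the JDK latch.
class LatchHandshakeSketch {

    // Starts a leader thread and lets the calling thread act as the follower.
    // Returns true if the follower was released before its timeout expired.
    static boolean run(long leaderWorkMillis, long followerTimeoutMillis) {
        CountDownLatch latch = new CountDownLatch(1); // one step to complete

        Thread leader = new Thread(() -> {
            try {
                Thread.sleep(leaderWorkMillis); // simulate the leader's work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // leader failed: countDown() is never called
            }
            latch.countDown(); // release the waiting followers
        });
        leader.start();

        try {
            // The follower blocks here; await returns false when the timeout elapses.
            boolean released = latch.await(followerTimeoutMillis, TimeUnit.MILLISECONDS);
            leader.join();
            return released;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Short leader, long timeout: " + run(100, 5_000));
        System.out.println("Long leader, short timeout: " + run(500, 50));
    }
}
```

The second call shows why the timeout matters: a follower whose leader is slow or has failed gets control back instead of waiting forever.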
7.15. PN Counter
A Conflict-free Replicated Data Type (CRDT) is a distributed data structure that achieves high
availability by relaxing consistency constraints. There may be several replicas for the same data
and these replicas can be modified concurrently without coordination. This means that you may
achieve high throughput and low latency when updating a CRDT data structure. On the other hand,
all of the updates are replicated asynchronously. Each replica eventually receives the updates
made on other replicas and, if no new updates are made, all replicas that can communicate with each
other return the same state (converge) after some time.
Counter method invocations performed on a Hazelcast member are usually local (depending on the
configured replica count). If there is no member failure, it is guaranteed that each replica
eventually sees the final value of the counter. The counter's state converges with each update and
all CRDT replicas that can communicate with each other will eventually have the same state.
Using the PN Counter, you can get a distributed counter, increment and decrement it, and query its
value with RYW (read-your-writes) and monotonic reads. The implementation borrows most
methods from the AtomicLong which should be familiar in most cases and easily interchangeable in
the existing code.
How it works
The counter supports adding and subtracting values as well as retrieving the current counter value.
Each replica of this counter can perform operations locally without coordination with the other
replicas, thus increasing availability. The counter guarantees that whenever two members have
received the same set of updates, possibly in a different order, their state is identical, and any
conflicting updates are merged automatically. If no new updates are made to the shared state, all
members that can communicate will eventually have the same data.
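The convergence property described above can be illustrated with the textbook state-based PN counter: each replica tracks its own increments and decrements, and merging takes the per-replica maximum. This is a minimal sketch of that general technique, not Hazelcast's internal implementation; the class PnCounterSketch and all of its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Textbook state-based PN counter; an illustration, not Hazelcast's internals.
class PnCounterSketch {
    private final String replicaId;
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    PnCounterSketch(String replicaId) {
        this.replicaId = replicaId;
    }

    // Local update: only this replica's own slot changes, so no coordination is needed.
    void add(long delta) {
        if (delta >= 0) {
            increments.merge(replicaId, delta, Long::sum);
        } else {
            decrements.merge(replicaId, -delta, Long::sum);
        }
    }

    // Current value: all known increments minus all known decrements.
    long get() {
        long p = increments.values().stream().mapToLong(Long::longValue).sum();
        long n = decrements.values().stream().mapToLong(Long::longValue).sum();
        return p - n;
    }

    // Merge keeps the per-replica maximum, so the order and repetition of merges do not matter.
    void merge(PnCounterSketch other) {
        other.increments.forEach((id, v) -> increments.merge(id, v, Long::max));
        other.decrements.forEach((id, v) -> decrements.merge(id, v, Long::max));
    }

    // Two replicas update concurrently and then exchange their state.
    static long[] demo() {
        PnCounterSketch a = new PnCounterSketch("A");
        PnCounterSketch b = new PnCounterSketch("B");
        a.add(5);
        b.add(-2);
        b.add(1);
        a.merge(b);
        b.merge(a);
        a.merge(b); // merging the same state again changes nothing
        return new long[] {a.get(), b.get()};
    }

    public static void main(String[] args) {
        long[] values = demo();
        System.out.println("A=" + values[0] + ", B=" + values[1]);
    }
}
```

Because the merge is a per-slot maximum, both replicas end up with the same value once they have seen the same set of updates, regardless of the order in which the states were exchanged.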
The updates to the counter are applied locally when invoked on a CRDT replica. A CRDT replica can
be any Hazelcast instance which is NOT a client or a lite member. You can configure the number
of replicas in the cluster using the replica-count configuration element.
When invoking updates from a non-replica instance, the invocation is remote. This may lead to
an indeterminate state: the update may have been applied although the response has not been
received. In this case, the caller is notified with a TargetDisconnectedException when invoked
from a client or a MemberLeftException when invoked from a member.
The read and write methods provide monotonic read and RYW (read-your-writes) guarantees. These
guarantees are session guarantees, which means that if no replica with the previously observed state
is reachable, the session guarantees are lost and the method invocation throws a
ConsistencyLostException. This does not mean that an update is lost. All of the updates are part of
some replica and are eventually reflected in the state of all other replicas. This exception just
means that you cannot observe your own writes because all replicas that contain your updates are
currently unreachable. After you have received a ConsistencyLostException, you can either wait for
a sufficiently up-to-date replica to become reachable, in which case the session can be continued, or
you can reset the session by calling the method reset(). If you have called this method, a new
session is started with the next invocation to a CRDT replica.
The CRDT state is kept entirely on non-lite (data) members. If there aren’t any and
the methods here are invoked on a lite member, they fail with a
NoDataMemberInClusterException.
The following is an example code.
This code snippet creates an instance of a PN counter, increments it by 5 and retrieves the value.
<hazelcast>
    ...
    <pn-counter name="default">
        <replica-count>10</replica-count>
        <statistics-enabled>true</statistics-enabled>
    </pn-counter>
    ...
</hazelcast>
• replica-count: Number of replicas on which the state for this PN counter is kept. This number
applies in the quiescent state; while membership changes are in progress or clusters are merging,
the state may temporarily be kept on more replicas. Its default value is Integer.MAX_VALUE.
Generally, keeping the state on more replicas means that more Hazelcast members are able to
perform updates locally, but it also increases the network traffic, slows the convergence of
replica states and increases the size of the PN counter state kept on each replica.
• statistics-enabled: Specifies whether the statistics gathering is enabled for your PN Counter. If
set to false, you cannot collect statistics in your implementation (using
getLocalPNCounterStats()) and also Hazelcast Management Center will not show them. Its
default value is true.
7.15.2. Configuring the CRDT Replication Mechanism
Configuring the replication mechanism is for advanced use cases only; the
default configuration works fine for most cases.
In some cases, you may want to configure the replication mechanism for all CRDT implementations.
The CRDT states are replicated in rounds (the period is configurable) and in each round the state is
replicated up to the configured number of members. Generally speaking, you may increase the
speed at which replicas converge at the expense of more network traffic or decrease the network
traffic at the expense of slower convergence of replicas. Hazelcast implements the state-based
replication mechanism: the CRDT state for changed CRDTs is replicated in its entirety to other
replicas on each replication round.
<hazelcast>
    ...
    <crdt-replication>
        <max-concurrent-replication-targets>1</max-concurrent-replication-targets>
        <replication-period-millis>1000</replication-period-millis>
    </crdt-replication>
    ...
</hazelcast>
7.16. Flake ID Generator
Hazelcast Flake ID Generator is used to generate cluster-wide unique identifiers. Generated
identifiers are long primitive values and are k-ordered (roughly ordered). IDs are in the range from
0 to Long.MAX_VALUE.
The IDs contain a timestamp component and a node ID component, which is assigned when the
member joins the cluster. This allows the IDs to be ordered and unique without any coordination
between the members, which makes the generator safe even in split-brain scenarios (for limitations
in this case, see the Node ID assignment section below).
The timestamp component counts milliseconds since January 1, 2018, 0:00 UTC and has 41 bits. This
caps the useful lifespan of the generator at a little less than 70 years (until ~2088). The sequence
component is 6 bits. If more than 64 IDs are requested in a single millisecond, IDs gracefully
overflow to the next millisecond and uniqueness is still guaranteed. The implementation does not
allow overflowing by more than 15 seconds; if IDs are requested at a higher rate, the call blocks.
Note, however, that clients are able to generate IDs even faster because each call goes to a
different (random) member and the 64 IDs/ms limit applies to a single member.
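The bit widths above (41 bits of timestamp, 6 bits of sequence and the 16-bit node ID mentioned later in this section) add up to 63 bits, which matches the stated range of 0 to Long.MAX_VALUE. The following sketch works through that arithmetic. The packing order (timestamp, then sequence, then node ID) is an assumption made for illustration, and the class FlakeIdLayoutSketch and its methods are hypothetical names, not part of the Hazelcast API.

```java
import java.time.Instant;

// Bit layout sketch; the packing order timestamp | sequence | node ID is an assumption.
class FlakeIdLayoutSketch {
    static final int BITS_TIMESTAMP = 41;
    static final int BITS_SEQUENCE = 6;
    static final int BITS_NODE_ID = 16;
    // Epoch stated in the text: January 1, 2018, 0:00 UTC.
    static final long EPOCH_START = Instant.parse("2018-01-01T00:00:00Z").toEpochMilli();

    // Packs the three components into one non-negative long (41 + 6 + 16 = 63 bits).
    static long compose(long millisSinceEpoch, int sequence, int nodeId) {
        return (millisSinceEpoch << (BITS_SEQUENCE + BITS_NODE_ID))
                | ((long) sequence << BITS_NODE_ID)
                | nodeId;
    }

    static long timestampOf(long id) {
        return id >>> (BITS_SEQUENCE + BITS_NODE_ID);
    }

    static int sequenceOf(long id) {
        return (int) ((id >>> BITS_NODE_ID) & ((1L << BITS_SEQUENCE) - 1));
    }

    static int nodeIdOf(long id) {
        return (int) (id & ((1L << BITS_NODE_ID) - 1));
    }

    // Number of distinct sequence values, i.e. IDs per millisecond on one member.
    static long idsPerMillisecond() {
        return 1L << BITS_SEQUENCE;
    }

    // Years until a 41-bit millisecond timestamp overflows.
    static double lifespanYears() {
        double millisPerYear = 365.25 * 24 * 60 * 60 * 1000;
        return (1L << BITS_TIMESTAMP) / millisPerYear;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() - EPOCH_START;
        long id = compose(now, 3, 42);
        System.out.println("id=" + id + " timestamp=" + timestampOf(id)
                + " sequence=" + sequenceOf(id) + " nodeId=" + nodeIdOf(id));
        System.out.println("IDs per ms: " + idsPerMillisecond());
        System.out.println("Lifespan in years: " + lifespanYears());
    }
}
```

With the timestamp in the most significant bits, IDs created in later milliseconds compare greater than earlier ones, which is what makes the IDs roughly ordered across members.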
7.16.2. Performance
An operation on a member is always local if the member has a valid node ID; otherwise it is remote. On
the client, the newId() method goes to a random member and gets a batch of IDs, which is then
returned locally for a limited time. The pre-fetch size and the validity time can be configured for
each client and member.
7.16.3. Example
public class ExampleFlakeIdGenerator {
    public static void main(String[] args) {
        HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance();
        // ...
    }
}
Flake IDs require a unique node ID to be assigned to each member, from which point the member
can generate unique IDs without any coordination. Hazelcast uses the member list version from the
moment when the member joined the cluster as a unique node ID.
The join algorithm is specifically designed to ensure that member list join version is unique for
each member in the cluster. This ensures that IDs are unique even during network splits, with one
caveat: at most one member is allowed to join the cluster during a network split. If two members
join different subclusters, they are likely to get the same node ID. This is resolved when the cluster
heals, but until then, they can generate duplicate IDs.
Node ID Overflow
The node ID component of the ID has 16 bits. Members with a member list join version higher than
2^16 cannot generate IDs themselves, but functionality is preserved by forwarding to another
member. It is possible to generate IDs on any member or client as long as there is at least one
member with a join version smaller than 2^16 in the cluster. The remedy is to restart the cluster:
the node ID component is reset and assigned starting from zero again. Uniqueness after the
restart is preserved thanks to the timestamp component.
<hazelcast>
    ...
    <flake-id-generator name="default">
        <prefetch-count>100</prefetch-count>
        <prefetch-validity-millis>600000</prefetch-validity-millis>
        <id-offset>0</id-offset>
        <node-id-offset>0</node-id-offset>
        <statistics-enabled>true</statistics-enabled>
    </flake-id-generator>
    ...
</hazelcast>
• prefetch-count: Count of IDs that are pre-fetched in the background when one call to
FlakeIdGenerator.newId() is made. Its value must be in the range 1 to 100,000. Its default value is
100. This setting pertains only to newId() calls made on the member that configured it.
• prefetch-validity-millis: Specifies for how long the pre-fetched IDs can be used. After this time
elapses, a new batch of IDs is fetched. The time unit is milliseconds. Its default value is 600,000
milliseconds (10 minutes). The IDs contain a timestamp component, which ensures a rough
global ordering of them. If an ID is assigned to an object that was created later, it will be out of
order. If ordering is not important, set this value to 0. This setting pertains only to newId() calls
made on the member that configured it.
• id-offset: Specifies the offset that is added to the returned IDs. Its default value is 0. This
setting might be useful when migrating from ID Generator; the default value works for all green-field
projects. For example, assume that the largest ID returned from ID Generator is 150 and that Flake ID
Generator now returns 100. If you set this element to 50 and stop using the ID Generator, the
next ID from Flake ID Generator will be 151 or larger and no duplicate IDs will be generated. In
real life, the IDs are much larger. You also need to add a reserve to the offset because the IDs
from Flake ID Generator are only roughly ordered. The recommended reserve is 2^38, that is,
274877906944. Negative values are allowed to increase the lifespan of the generator; however,
keep in mind that the generated IDs might then also be negative.
• node-id-offset: Specifies the offset that is added to the node ID assigned to cluster member for
this generator. Might be useful in A/B deployment scenarios where you have cluster A which
you want to upgrade. You create cluster B and for some time both will generate IDs and you
want to have them unique. In this case, configure node ID offset for generators on cluster B.
• statistics-enabled: Specifies whether the statistics gathering is enabled for your Flake ID
Generator. If set to false, you cannot collect statistics in your implementation (using
getLocalFlakeIdGeneratorStats()) and also Hazelcast Management Center will not show them.
Its default value is true.
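The id-offset arithmetic from the list above can be written out as a small calculation. This is a sketch under the assumption that you know the largest ID issued by the old generator and a recent ID from Flake ID Generator; the class IdOffsetSketch and the method offsetFor are hypothetical helpers, not part of Hazelcast.

```java
// Arithmetic sketch for choosing id-offset when migrating from another ID source.
class IdOffsetSketch {
    // Reserve recommended in the text: 2^38.
    static final long RECOMMENDED_RESERVE = 1L << 38;

    // Offset that lifts the current Flake ID up to the largest legacy ID, so that
    // every later Flake ID exceeds it, plus a reserve for the rough ordering.
    static long offsetFor(long largestLegacyId, long currentFlakeId, long reserve) {
        return Math.max(0L, largestLegacyId - currentFlakeId) + reserve;
    }

    public static void main(String[] args) {
        // Toy numbers from the text: legacy maximum 150, Flake ID Generator at 100.
        System.out.println("Without reserve: " + offsetFor(150, 100, 0));
        System.out.println("With reserve:    " + offsetFor(150, 100, RECOMMENDED_RESERVE));
    }
}
```

The resulting number is what would be placed in the id-offset element of the generator configuration.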
members in the cluster. It provides full replication of entries to all members for high speed access.
• When you have a Replicated Map in the cluster, your clients can communicate with any cluster
member.
• It provides statistics for entry access, write and update so that you can monitor it using
Hazelcast Management Center.
• New members joining the cluster pull all the data from the existing members.
• You can listen to entry events using listeners. See the Using EntryListener on Replicated Map
section.
A Replicated Map does not partition data (it does not spread data to different cluster members);
instead, it replicates the data to all members.
Replication leads to higher memory consumption. However, a Replicated Map has faster read and
write access since the data is available on all members.
Writes could take place on local/remote members in order to provide write-order, eventually being
replicated to all other members.
Replicated Map is suitable for objects, catalog data, or idempotent calculable data (such as HTML
pages). It fully implements the java.util.Map interface, but it lacks the methods from
java.util.concurrent.ConcurrentMap since there are no atomic guarantees to writes or reads.
If Replicated Map is used from a unisocket client and this unisocket client is
connected to a lite member, the entry listeners cannot be registered/de-registered.
Here is an example of Replicated Map code. The HazelcastInstance’s getReplicatedMap method gets
the Replicated Map, and the Replicated Map’s put method creates map entries.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
Map<String, String> map = hz.getReplicatedMap("map");
map.put("1", "Tokyo");
map.put("2", "Paris");
map.put("3", "New York");
The com.hazelcast.core.ReplicatedMap interface has some additional methods for registering entry
listeners or retrieving values in an expected order.
If you have a large cluster or very high occurrences of updates, the Replicated Map may not scale
linearly as expected since it has to replicate update operations to all members in the cluster.
Replicated Map has an anti-entropy system that converges values to a common one if some of the
members are missing replication updates.
Replicated Map does not guarantee eventual consistency because there are some edge cases that
fail to provide consistency.
Replicated Map uses the internal partition system of Hazelcast in order to serialize updates
happening on the same key at the same time. This happens by sending updates of the same key to
the same Hazelcast member in the cluster.
Due to the asynchronous nature of replication, a Hazelcast member could die before successfully
replicating a "write" operation to other members after sending the "write completed" response to
its caller during the write process. In this scenario, Hazelcast’s internal partition system promotes
one of the replicas of the partition as the primary one. The new primary partition does not have the
latest "write" since the dead member could not successfully replicate the update. (This leaves the
system in a state where the caller is the only one that has the update and the rest of the cluster
does not.) In this case, even the anti-entropy system cannot converge the value since the source
of true information for the update is lost. This breaks eventual consistency because
different values can be read from the system for the same key.
Other than the aforementioned scenario, the Replicated Map behaves like an eventually consistent
system with read-your-writes and monotonic-reads consistency.
7.17.4. Configuration Design for Replicated Map
There are several technical design decisions you should consider when you configure a Replicated
Map.
Initial Provisioning
If a new member joins the cluster, there are two ways you can handle the initial provisioning that is
executed to replicate all existing values to the new member. Each involves how you configure the
async fill up.
First, you can configure async fill up to true, which does not block reads while the fill up operation
is underway. That way, you have immediate access on the new member, but it will take time until
all the values are eventually accessible. Not yet replicated values are returned as non-existing
(null).
Second, you can configure for a synchronous initial fill up (by configuring the async fill up to false),
which blocks every read or write access to the map until the fill up operation is finished. Use this
with caution since it might block your application from operating.
Declarative Configuration:
You can declare your Replicated Map configuration in the Hazelcast configuration file
hazelcast.xml. See the following example:
<hazelcast>
    ...
    <replicatedmap name="default">
        <in-memory-format>BINARY</in-memory-format>
        <async-fillup>true</async-fillup>
        <statistics-enabled>true</statistics-enabled>
        <entry-listeners>
            <entry-listener include-value="true">
                com.hazelcast.examples.EntryListener
            </entry-listener>
        </entry-listeners>
        <split-brain-protection-ref>quorumname</split-brain-protection-ref>
    </replicatedmap>
    ...
</hazelcast>
• in-memory-format: Internal storage format. See the In-Memory Format section. Its default value is
OBJECT.
• async-fillup: Specifies whether the Replicated Map is available for reads before the initial
replication is completed. Its default value is true, in which case no exception is thrown when the
Replicated Map is not yet ready, but null values can be seen until the initial replication is
completed. If set to false, i.e., synchronous initial fill up, every read or write access to the
map blocks until the initial replication is completed.
• statistics-enabled: Specifies whether the statistics gathering is enabled for your Replicated
Map. If set to false, you cannot collect statistics in your implementation (using
getLocalReplicatedMapStats()) and also Hazelcast Management Center will not show them. Its
default value is true.
◦ entry-listener#local: Not used for Replicated Map since listeners are always local.
• split-brain-protection-ref: Name of quorum configuration that you want this Replicated Map
to use. See the Split-Brain Protection for Replicated Map section.
Programmatic Configuration:
You can configure a Replicated Map programmatically, as you can do for all other data structures in
Hazelcast. You must create the configuration upfront, when you instantiate the HazelcastInstance. A
basic example of how to configure the Replicated Map using the programmatic approach is shown
in the following snippet.
Config config = new Config();
ReplicatedMapConfig replicatedMapConfig =
        config.getReplicatedMapConfig( "default" );
replicatedMapConfig.setInMemoryFormat( InMemoryFormat.BINARY )
        .setSplitBrainProtectionName( "splitbrainprotectionname" );
All properties that can be configured using the declarative configuration are also available using
programmatic configuration by transforming the tag names into getter or setter names.
Currently, you can use the following in-memory-format options with the Replicated Map:
• OBJECT (default): The data is stored in deserialized form. This configuration is the default choice
since the data replication is mostly used for high speed access. Please be aware that changing
the values without a Map.put() is not reflected on the other members but is visible on the
changing members for later value accesses.
• BINARY: The data is stored in serialized binary format and has to be deserialized on every
request. This option offers higher encapsulation since changes to a value are always discarded
unless the changed object is explicitly put back into the map with Map.put().
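The practical difference between the two formats can be reproduced locally with plain JDK maps. In the sketch below, a map of object references stands in for OBJECT storage and a map of serialized byte arrays stands in for BINARY storage. Java serialization and the class InMemoryFormatSketch are assumptions made for illustration only; Hazelcast uses its own serialization and storage internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Local illustration of the two storage formats using plain JDK maps.
class InMemoryFormatSketch {

    static byte[] serialize(ArrayList<String> value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static ArrayList<String> deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ArrayList<String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static ArrayList<String> cities() {
        ArrayList<String> list = new ArrayList<>();
        list.add("Tokyo");
        return list;
    }

    // OBJECT-like storage: get() hands out the stored instance itself.
    static int sizeAfterMutationObjectFormat() {
        Map<String, ArrayList<String>> store = new HashMap<>();
        store.put("cities", cities());
        store.get("cities").add("Paris"); // no put(), yet later local reads see the change
        return store.get("cities").size();
    }

    // BINARY-like storage: every get() deserializes a fresh copy.
    static int sizeAfterMutationBinaryFormat() {
        Map<String, byte[]> store = new HashMap<>();
        store.put("cities", serialize(cities()));
        deserialize(store.get("cities")).add("Paris"); // the change hits a copy and is discarded
        return deserialize(store.get("cities")).size();
    }

    public static void main(String[] args) {
        System.out.println("OBJECT-like size: " + sizeAfterMutationObjectFormat());
        System.out.println("BINARY-like size: " + sizeAfterMutationBinaryFormat());
    }
}
```

In both cases, a change becomes part of the stored value only through an explicit put; the object-reference variant merely makes an unsaved change visible to later local reads.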
7.17.6. Using EntryListener on Replicated Map
The fundamental difference in Replicated Map behavior, compared to the other data structures, is
that an EntryListener only reflects changes on local data. Since replication is asynchronous, all
listener events are fired only when an operation is finished on a local member. Events can fire at
different times on different members.
The HazelcastInstance’s getReplicatedMap method gets a Replicated Map (customers), and the
ReplicatedMap’s addEntryListener method adds an entry listener to the Replicated Map. Then, the
ReplicatedMap’s put method adds a Replicated Map entry and updates it. The method remove removes
the entry.
// The enclosing class name is illustrative.
public class ReplicatedMapListenerExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ReplicatedMap<String, String> map = hz.getReplicatedMap("somemap");
        map.addEntryListener(new MyEntryListener());
        System.out.println("EntryListener registered");
    }
    private static class MyEntryListener implements EntryListener<String, String> {
        @Override
        public void entryAdded(EntryEvent<String, String> event) {
            System.out.println("entryAdded: " + event);
        }
        @Override
        public void entryRemoved(EntryEvent<String, String> event) {
            System.out.println("entryRemoved: " + event);
        }
        @Override
        public void entryUpdated(EntryEvent<String, String> event) {
            System.out.println("entryUpdated: " + event);
        }
        @Override
        public void entryEvicted(EntryEvent<String, String> event) {
            System.out.println("entryEvicted: " + event);
        }
        @Override
        public void entryExpired(EntryEvent<String, String> event) {
            System.out.println("entryExpired: " + event);
        }
        @Override
        public void mapEvicted(MapEvent event) {
            System.out.println("mapEvicted: " + event);
        }
        @Override
        public void mapCleared(MapEvent event) {
            System.out.println("mapCleared: " + event);
        }
    }
}
Replicated Map can be configured to check for a minimum number of available members before
applying its operations (see the Split-Brain Protection section). This is a check to avoid performing
successful Replicated Map operations on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the protection types, that support split-brain
protection checks:
• WRITE, READ_WRITE:
◦ clear
◦ put
◦ putAll
◦ remove
• READ, READ_WRITE:
◦ containsKey
◦ containsValue
◦ entrySet
◦ get
◦ isEmpty
◦ keySet
◦ size
◦ values
Split-brain protection for Replicated Map can be configured programmatically using the method
setSplitBrainProtectionName(), or declaratively using the element split-brain-protection-ref.
Following is an example declarative configuration:
<hazelcast>
    ...
    <replicatedmap name="default">
        <split-brain-protection-ref>quorumname</split-brain-protection-ref>
    </replicatedmap>
    ...
</hazelcast>
The cardinality estimator service does not provide any ways to configure its properties, but rather
uses some well tested defaults:
• P: Stands for precision with a default value of 14 (using the 14 LSB of the hash for the index)
• M: 2 ^ P = 16384 (16K) registers
The memory consumption of this data structure is close to 16K regardless of the size of the
elements in the source data set or stream.
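The relationship between the precision P and the register count M can be sketched as follows. The code shows how the 14 least significant bits of a hash select one of 2^14 registers, and why the footprint stays near 16K if each register occupies one byte. The one-byte register size is an assumption, and the class RegisterIndexSketch is a hypothetical illustration rather than the estimator's actual implementation.

```java
// Sketch of how an estimator with precision P addresses its registers.
class RegisterIndexSketch {
    static final int P = 14;     // precision from the text
    static final int M = 1 << P; // 2^14 = 16384 registers

    // The P least significant bits of the hash select the register.
    static int registerIndex(long hash) {
        return (int) (hash & (M - 1));
    }

    // Memory estimate under the assumption of one byte per register.
    static int approximateBytes() {
        return M * Byte.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("Registers: " + M);
        System.out.println("Approximate bytes: " + approximateBytes());
        System.out.println("Index for hash 0x1234ABCD: " + registerIndex(0x1234ABCDL));
    }
}
```

The register count depends only on P, not on how many elements are added or how large they are, which is why the memory consumption is fixed.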
1. Add objects to the instance of the estimator, e.g., for IPs estimator.add("0.0.0.0"). The
provided object is first serialized and then the byte array is used to generate a hash for that
object.
See the cardinality estimator Javadoc for more information on its API.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
CardinalityEstimator visitorsEstimator = hz.getCardinalityEstimator("visitors");
Hazelcast.shutdownAll();
7.18.1. Split-Brain Protection for Cardinality Estimator
Cardinality Estimator can be configured to check for a minimum number of available members
before applying its operations (see the Split-Brain Protection section). This is a check to avoid
performing successful estimator operations on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the protection types, that support split-brain
protection checks:
• WRITE, READ_WRITE:
◦ add
◦ addAsync
• READ, READ_WRITE:
◦ estimate
◦ estimateAsync
Split-brain protection for Cardinality Estimator can be configured programmatically using the
method setSplitBrainProtectionName(), or declaratively using the element
split-brain-protection-ref. The following is an example declarative configuration:
<hazelcast>
    ...
    <cardinality-estimator name="default">
        <split-brain-protection-ref>quorumname</split-brain-protection-ref>
    </cardinality-estimator>
    ...
</hazelcast>
While recovering from a split-brain syndrome, Cardinality Estimator in the small cluster merges
into the bigger cluster based on a configured merge policy. When an estimator merges into the
cluster, an estimator with the same name might already exist in the cluster. So the merge policy
resolves these kinds of conflicts with different out-of-the-box strategies. It can be configured
programmatically using the method setMergePolicyConfig(), or declaratively using the element
merge-policy. Following is an example declarative configuration:
<hazelcast>
    ...
    <cardinality-estimator name="default">
        <merge-policy>HyperLogLogMergePolicy</merge-policy>
    </cardinality-estimator>
    ...
</hazelcast>
• HyperLogLogMergePolicy: Estimator merges with the existing one, using the algorithmic merge for
HyperLogLog. This is the default policy.
• PutIfAbsentMergePolicy: Estimator from the smaller cluster wins if it doesn’t exist in the cluster.
Currently the event journal does not expose a public API for reading the event
journal in Hazelcast IMDG. The event journal can be used to stream event data to
Hazelcast Jet, so it should be used in conjunction with Hazelcast Jet. Because of this,
this section describes how to configure it but not how to use it from IMDG. If
you enable and configure the event journal, you can reach it only through a private
API and will most probably get no benefit from it, yet the journal still retains events
and consumes heap space.
The event journal has a fixed capacity and an expiration time. Internally it is structured as a
ringbuffer (partitioned by ringbuffer item) and shares many similarities with it.
Configuring IMap with eviction and expiration can cause the event journal to contain different
events on the different replicas of the same partition. You can run into issues if you are reading
from the event journal and the partition owner is terminated. A backup replica is then promoted
into the partition owner but the event journal will contain different events. The event count should
stay the same but the entries which you previously thought were evicted and expired could now be
"alive" and vice versa.
This is because eviction and expiration randomly choose the entries to be evicted or expired, and
the choice is not coordinated between partition replicas. In these cases, the event journal diverges
and will not converge at any future point; it remains inconsistent, just as the contents of the
internal record stores are inconsistent between replicas. The event journal on a specific
replica is in sync with the record store on that replica, but the event journals and record stores
of different replicas are out of sync.
By default, an event journal is configured with a capacity of 10000 items. This creates a single array
per partition, roughly the size of the capacity divided by the number of partitions. Thus, if the
configured capacity is 10000 and number of partitions is 271, we create 271 arrays of size 36
(10000/271). If a time-to-live is configured, then an array of longs is also created that stores the
expiration time for every item. A single array of the event journal keeps events that are only related
to the map entries in that partition. In a lot of cases you may want to change this capacity number
to something that better fits your needs. As the capacity is shared between partitions, keep in mind
not to set it to a value which is too low for you. Setting the capacity to a number lower than the
partition count results in an error when initializing the event journal.
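The capacity arithmetic above can be checked with a few lines of code. The sketch divides the configured capacity by the partition count using integer division and rejects a capacity smaller than the partition count. The class JournalCapacitySketch is hypothetical, and the exception type is chosen for illustration; the text only states that initialization results in an error.

```java
// Arithmetic sketch for the per-partition size of the event journal.
class JournalCapacitySketch {

    // Items available to each partition; integer division drops the remainder.
    static int perPartitionCapacity(int capacity, int partitionCount) {
        if (capacity < partitionCount) {
            // Illustrative error: each partition needs room for at least one item.
            throw new IllegalArgumentException(
                    "capacity " + capacity + " is lower than partition count " + partitionCount);
        }
        return capacity / partitionCount;
    }

    public static void main(String[] args) {
        System.out.println("Capacity 10000, 271 partitions: " + perPartitionCapacity(10000, 271));
        System.out.println("Capacity 5000, 271 partitions:  " + perPartitionCapacity(5000, 271));
    }
}
```

Halving the capacity halves the number of events each partition can retain, so a busy key range may lose older events sooner than the total capacity suggests.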
Below is a declarative configuration example of an event journal with a capacity of 5000 items for a
map and 10000 items for a cache:
<hazelcast>
    ...
    <event-journal enabled="true">
        <mapName>myMap</mapName>
        <capacity>5000</capacity>
        <time-to-live-seconds>20</time-to-live-seconds>
    </event-journal>
    <event-journal enabled="true">
        <cacheName>myCache</cacheName>
        <capacity>10000</capacity>
        <time-to-live-seconds>0</time-to-live-seconds>
    </event-journal>
    ...
</hazelcast>
You can also configure an event journal programmatically. The following is a programmatic version
of the above declarative configuration:
EventJournalConfig eventJournalMapConfig = new EventJournalConfig()
        .setEnabled(true)
        .setCapacity(5000)
        .setTimeToLiveSeconds(20);
The mapName and cacheName attributes define the map or cache to which this event journal
configuration applies. You can use pattern-matching and the default keyword when doing so. For
instance, by using a mapName of journaled*, the journal configuration applies to all maps whose
names start with "journaled" and don’t have other journal configurations that match (e.g., if you
would have a more specific journal configuration with an exact name match). If you specify the
mapName or cacheName as default, the journal configuration applies to all maps and caches that don’t
have any other journal configuration. This means that potentially all maps and/or caches have one
single event journal configuration.
The event journal is a partitioned data structure. The partitioning is done by the event key. Because
of this, the map and cache entry with a specific key is co-located with the events for that key and
will be migrated accordingly. Also, the backup count for the event journal is equal to the backup
count of the map or cache for which it contains events. The events on the backup replicas will be
created with the map or cache backup operations and no additional network traffic is introduced
when appending events to the event journal.
You can configure Hazelcast event journal with a time-to-live in seconds. Using this setting, you
can control how long the items remain in the event journal before they are expired. By default, the
time-to-live is set to 0, meaning that unless the item is overwritten, it remains in the journal
indefinitely. The expiration time of the existing journal events is checked whenever a new event is
appended to the event journal or when the event journal is being read. If the journal is not being
read from or written to, the journal may keep expired items indefinitely.
In the example below, an event journal is configured with a time-to-live of 180 seconds:
<hazelcast>
    ...
    <event-journal enabled="true">
        <cacheName>myCache</cacheName>
        <capacity>10000</capacity>
        <time-to-live-seconds>180</time-to-live-seconds>
    </event-journal>
    ...
</hazelcast>
8. Distributed Events
You can register for Hazelcast entry events so you are notified when those events occur. Event
listeners are cluster-wide: when a listener is registered on one member of the cluster, it is
actually registered for the events that originate at any member in the cluster. When a new member
joins, events originating at the new member are also delivered.
An event is created only if you registered an event listener. If no listener is registered, then no
event is created. If you provided a predicate when you registered the event listener, the event
must pass the predicate before it is sent to the listener (member/client).
As a rule of thumb, your event listener should not implement heavy processes in its event methods
that block the thread for a long time. If needed, you can use ExecutorService to transfer long
running processes to another thread and thus offload the current listener thread.
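The offloading advice above can be sketched with the JDK's ExecutorService. In this illustration the method onEvent stands in for a listener callback such as entryAdded; it only submits the work and returns, so the thread that delivers events is not blocked. The class OffloadingListenerSketch, its methods and the pool size of 2 are hypothetical choices, not Hazelcast API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Offloading pattern; onEvent stands in for a listener callback such as entryAdded.
class OffloadingListenerSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final AtomicInteger processed = new AtomicInteger();

    // Called on the event thread: hand the work over and return immediately.
    void onEvent(String payload) {
        workers.submit(() -> handleSlowly(payload));
    }

    // Runs on a worker thread, so it may take as long as it needs.
    private void handleSlowly(String payload) {
        try {
            Thread.sleep(50); // stands in for a long-running task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        processed.incrementAndGet();
    }

    // Stops accepting work, waits for queued tasks and returns the number processed.
    int shutdownAndCount() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    // Delivers the given number of events and reports how many were handled.
    static int demo(int events) {
        OffloadingListenerSketch listener = new OffloadingListenerSketch();
        for (int i = 0; i < events; i++) {
            listener.onEvent("event-" + i);
        }
        return listener.shutdownAndCount();
    }

    public static void main(String[] args) {
        System.out.println("Processed events: " + demo(5));
    }
}
```

A bounded pool like this one also caps how much listener work runs concurrently; the trade-off is that events are handled asynchronously, so any ordering requirements have to be enforced by the application.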
In a failover scenario, events are not highly available and may get lost. However,
you can perform workarounds such as configuring the event queue capacity as
explained in the Global Event Configuration section.
• Distributed Object Listener for distributed object creation and destruction events
For Hazelcast JCache implementation:
• Lifecycle Listener
• Membership Listener
The Membership Listener interface has methods that are invoked for the following events:
To write a Membership Listener class, you implement the MembershipListener interface and its
methods.
When a respective event is fired, the membership listener outputs the addresses of the members
that joined and left, and also which attribute changed on which member.
After you create your class, you can configure your cluster to include the membership listener.
Below is an example using the method addMembershipListener.
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
hazelcastInstance.getCluster().addMembershipListener( new ClusterMembershipListener() );
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register listeners in the configuration. You can register listeners using declarative, programmatic,
or Spring configuration, as shown below.
<hazelcast>
...
<listeners>
<listener>
com.yourpackage.ClusterMembershipListener
</listener>
</listeners>
...
</hazelcast>
<hz:listeners>
<hz:listener class-name="com.yourpackage.ClusterMembershipListener"/>
<hz:listener implementation="MembershipListener"/>
</hz:listeners>
public class ExampleDistObjListener implements DistributedObjectListener {

    @Override
    public void distributedObjectCreated(DistributedObjectEvent event) {
        DistributedObject instance = event.getDistributedObject();
        System.out.println("Created " + instance.getName()
                + ", service=" + instance.getServiceName());
    }

    @Override
    public void distributedObjectDestroyed(DistributedObjectEvent event) {
        System.out.println("Destroyed " + event.getObjectName()
                + ", service=" + event.getServiceName());
    }
}
When a respective event is fired, the distributed object listener outputs the event type, the object
name and a service name (for example, for a Map object the service name is "hz:impl:mapService").
After you create your class, you can configure your cluster to include distributed object listeners.
Below is an example using the method addDistributedObjectListener. You can also see this portion
in the above class creation.
hazelcastInstance.addDistributedObjectListener( example );
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register the listeners in the configuration. You can register listeners using declarative,
programmatic, or Spring configuration, as shown below.
config.addListenerConfig(
new ListenerConfig( "com.yourpackage.ExampleDistObjListener" ) );
<hazelcast>
...
<listeners>
<listener>
com.yourpackage.ExampleDistObjListener
</listener>
</listeners>
...
</hazelcast>
<hz:listeners>
<hz:listener class-name="com.yourpackage.ExampleDistObjListener"/>
<hz:listener implementation="DistributedObjectListener"/>
</hz:listeners>
The Migration Listener interface has methods that are invoked for the following events:
• migrationFinished: The migration finishes. MigrationState parameter shows the result of the
migration: number of the completed migrations, number of the remaining migrations, total
elapsed time, etc.
To write a Migration Listener class, you implement the MigrationListener interface and its methods.
public class ClusterMigrationListener implements MigrationListener {

    @Override
    public void migrationStarted(MigrationState state) {
        System.out.println("Migration Started: " + state);
    }

    @Override
    public void migrationFinished(MigrationState state) {
        System.out.println("Migration Finished: " + state);
    }

    @Override
    public void replicaMigrationCompleted(ReplicaMigrationEvent event) {
        System.out.println("Replica Migration Completed: " + event);
    }

    @Override
    public void replicaMigrationFailed(ReplicaMigrationEvent event) {
        System.out.println("Replica Migration Failed: " + event);
    }
}
After you create your class, you can configure your cluster to include migration listeners. Below is
an example using the method addMigrationListener.
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register the listeners in the configuration. You can register listeners using declarative,
programmatic, or Spring configuration, as shown below.
config.addListenerConfig(
new ListenerConfig( "com.yourpackage.ClusterMigrationListener" ) );
<hazelcast>
...
<listeners>
<listener>
com.yourpackage.ClusterMigrationListener
</listener>
</listeners>
...
</hazelcast>
<hz:listeners>
<hz:listener class-name="com.yourpackage.ClusterMigrationListener"/>
<hz:listener implementation="MigrationListener"/>
</hz:listeners>
Hazelcast provides fault-tolerance by keeping multiple copies of your data. For each partition, one
of your cluster members becomes the owner and some of the other members become replica
members, based on your configuration. Nevertheless, data loss may occur if a few members crash
simultaneously.
Let’s consider the following example with three members, N1, N2 and N3, for a given partition-0. N1 is
the owner of partition-0. N2 and N3 are the first and second replicas, respectively. If N1 and N2 crash
simultaneously, any data structure configured with fewer than two backups loses its data in partition-0.
For instance, if we configure a map with one backup, that map loses its data in partition-0 since both
the owner and the first replica of partition-0 have crashed. However, if we configure our map with two
backups, it does not lose any data since a copy of partition-0’s data for the given map also resides in N3.
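The reasoning in this example can be written down as a small check. The sketch below is plain Java and does not use any Hazelcast API; the class and method names are hypothetical, and it assumes that a structure with a given backup count keeps copies on replica indexes 0 (the owner) through that backup count.

```java
public class PartitionLossSketch {

    /**
     * Replica index 0 is the owner, 1 the first backup, and so on.
     * A structure with backupCount backups keeps copies on replicas 0..backupCount.
     * Its data in the partition is lost only when every one of those replicas has crashed.
     */
    public static boolean losesData(int backupCount, boolean[] replicaCrashed) {
        for (int i = 0; i <= backupCount && i < replicaCrashed.length; i++) {
            if (!replicaCrashed[i]) {
                return false; // at least one copy survives
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // N1 (owner) and N2 (first replica) crashed, N3 (second replica) is alive.
        boolean[] crashed = {true, true, false};
        System.out.println("one backup loses data:  " + losesData(1, crashed));
        System.out.println("two backups lose data: " + losesData(2, crashed));
    }
}
```

With N1 and N2 down, the check reports a loss for one backup and no loss for two backups, matching the scenario above.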
The Partition Lost Listener notifies for possible data loss occurrences with the information of how
many replicas are lost for a partition. It listens to PartitionLostEvent instances. Partition lost events
are dispatched per partition.
Partition loss detection is done after a member crash is detected by the other members and the
crashed member is removed from the cluster. Please note that false-positive PartitionLostEvent
instances may be fired during network splits.
To write a Partition Lost Listener, you implement the PartitionLostListener interface and its
partitionLost method, which is invoked when a partition loses its owner and all backups.
public class ConsoleLoggingPartitionLostListener implements PartitionLostListener {

    @Override
    public void partitionLost(PartitionLostEvent event) {
        System.out.println(event);
    }
}
When a PartitionLostEvent is fired, the partition lost listener given above outputs the partition ID,
the replica index that is lost and the member that has detected the partition loss. The following is
an example output.
com.hazelcast.partition.PartitionLostEvent{partitionId=242, lostBackupCount=0,
eventSource=Address[192.168.2.49]:5701}
After you create your class, you can configure your cluster programmatically or declaratively to
include the partition lost listener. Below is an example of its declarative configuration.
<hazelcast>
...
<listeners>
<listener>
com.yourpackage.ConsoleLoggingPartitionLostListener
</listener>
</listeners>
...
</hazelcast>
• MERGED: A member’s merge operation has completed.
This listener is local to an individual member. It notifies the application that uses Hazelcast about
the events mentioned above for a particular member.
After you create your class, you can configure your cluster to include lifecycle listeners. Below is an
example using the method addLifecycleListener.
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register the listeners in the configuration. You can register listeners using declarative,
programmatic, or Spring configuration, as shown below.
config.addListenerConfig(
new ListenerConfig( "com.yourpackage.NodeLifecycleListener" ) );
<hazelcast>
...
<listeners>
<listener>
com.yourpackage.NodeLifecycleListener
</listener>
</listeners>
...
</hazelcast>
<hz:listeners>
<hz:listener class-name="com.yourpackage.NodeLifecycleListener"/>
<hz:listener implementation="LifecycleListener"/>
</hz:listeners>
The client listener is used by the Hazelcast cluster members. It notifies a cluster member when a
client connects to or disconnects from it; that is, the event is fired only on the member to which
the client is connected. Other cluster members do not fire a "client is connected" or "client is
disconnected" event.
To write a client listener class, you implement the ClientListener interface and its methods
clientConnected and clientDisconnected, which are invoked when a client is connected to or
disconnected from the cluster. You can add your client listener as shown below.
hazelcastInstance.getClientService().addClientListener(new ExampleClientListener());
<hazelcast>
...
<listeners>
<listener>
com.yourpackage.ExampleClientListener
</listener>
</listeners>
...
</hazelcast>
<hz:listeners>
<hz:listener class-name="com.yourpackage.ExampleClientListener"/>
<hz:listener implementation="com.yourpackage.ExampleClientListener"/>
</hz:listeners>
You can also add event listeners to a Hazelcast client. See the Client ListenerConfig
section for the related information.
You can listen to map-wide or entry-based events using the listeners provided by Hazelcast’s
eventing framework. To listen to these events, implement a MapListener sub-interface.
The EntryListener interface can still be implemented (it is kept for backward
compatibility). However, if you need to listen to a different event, one that
is not available in the EntryListener interface, you should also implement the
relevant MapListener sub-interface.
MapClearedListener {
@Override
public void entryAdded( EntryEvent<String, String> event ) {
System.out.println( "Entry Added:" + event );
}
@Override
public void entryRemoved( EntryEvent<String, String> event ) {
System.out.println( "Entry Removed:" + event );
}
@Override
public void entryUpdated( EntryEvent<String, String> event ) {
System.out.println( "Entry Updated:" + event );
}
@Override
public void entryEvicted( EntryEvent<String, String> event ) {
System.out.println( "Entry Evicted:" + event );
}
@Override
public void entryLoaded( EntryEvent<String, String> event ) {
System.out.println( "Entry Loaded:" + event );
}
@Override
public void mapEvicted( MapEvent event ) {
System.out.println( "Map Evicted:" + event );
}
@Override
public void mapCleared( MapEvent event ) {
System.out.println( "Map Cleared:" + event );
}
}
}
Now, let’s perform some modifications on the map entries using the following example code.
public class ModifyMap {
If you execute the Listen class and then the Modify class, you get the following output produced by
the Listen class.
Please note that the method IMap.clear() does not fire an "EntryRemoved" event,
but fires a "MapCleared" event.
public class ListenMapPartitionLostEvents {
map.addPartitionLostListener(new MapPartitionLostListener() {
@Override
public void partitionLost(MapPartitionLostEvent event) {
System.out.println(event);
}
});
}
}
See the Listening for Partition Lost Events section for more information about partition lost
detection and partition lost events.
After you create your listener class, you can configure your cluster to include map listeners using
the method addEntryListener (as you can see in the example Listen class above). Below is the
related portion from this code, showing how to register a map listener.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
IMap<String, String> map = hz.getMap( "somemap" );
map.addEntryListener( new MyEntryListener(), true );
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register listeners in configuration. You can register listeners using declarative, programmatic, or
Spring configuration, as shown below.
mapConfig.addEntryListenerConfig(
new EntryListenerConfig( "com.yourpackage.MyEntryListener",
false, false ) );
<hazelcast>
...
<map name="somemap">
<entry-listeners>
<entry-listener include-value="false" local="false">
com.yourpackage.MyEntryListener
</entry-listener>
</entry-listeners>
</map>
...
</hazelcast>
<hz:map name="somemap">
<hz:entry-listeners>
<hz:entry-listener include-value="true"
class-name="com.hazelcast.spring.DummyEntryListener"/>
<hz:entry-listener implementation="dummyEntryListener" local="true"/>
</hz:entry-listeners>
</hz:map>
As you see, there are attributes of the map listeners in the above examples: include-value and local.
The attribute include-value is a boolean attribute that is optional, and if you set it to true, the map
event contains the map value. Its default value is true.
The attribute local is also a boolean attribute that is optional, and if you set it to true, you can listen
to the map on the local member. Its default value is false.
You can listen to entry-based events in the MultiMap using EntryListener. The following is an
example entry listener implementation for MultiMap.
public class ExampleEntryListener implements EntryListener<String, String> {

    @Override
    public void entryAdded(EntryEvent<String, String> event) {
        System.out.println("Entry Added: " + event);
    }

    @Override
    public void entryRemoved(EntryEvent<String, String> event) {
        System.out.println("Entry Removed: " + event);
    }

    @Override
    public void entryUpdated(EntryEvent<String, String> event) {
        System.out.println("Entry Updated: " + event);
    }

    @Override
    public void entryEvicted(EntryEvent<String, String> event) {
        System.out.println("Entry evicted: " + event);
    }

    @Override
    public void entryExpired(EntryEvent<String, String> event) {
        System.out.println("Entry expired: " + event);
    }

    @Override
    public void mapCleared(MapEvent event) {
        System.out.println("Map Cleared: " + event);
    }

    @Override
    public void mapEvicted(MapEvent event) {
        System.out.println("Map Evicted: " + event);
    }
}
After you create your listener class, you can configure your cluster to include MultiMap listeners
using the method addEntryListener. Below is the related portion of the code, showing how to
register a MultiMap listener.
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
MultiMap<String, String> map = hz.getMultiMap( "somemap" );
map.addEntryListener( new ExampleEntryListener(), true );
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register listeners in the configuration. You can register listeners using declarative, programmatic,
or Spring configuration, as shown below.
multiMapConfig.addEntryListenerConfig(
new EntryListenerConfig( "com.yourpackage.ExampleEntryListener",
false, false ) );
<hazelcast>
...
<multimap name="somemap">
<value-collection-type>SET</value-collection-type>
<entry-listeners>
<entry-listener include-value="false" local="false">
com.yourpackage.ExampleEntryListener
</entry-listener>
</entry-listeners>
</multimap>
...
</hazelcast>
As you see, there are attributes of the MultiMap listeners in the above examples: include-value and
local. The attribute include-value is a boolean attribute that is optional, and if you set it to true, the
MultiMap event contains the map value. Its default value is true.
The attribute local is also a boolean attribute that is optional, and if you set it to true, you can listen
to the MultiMap on the local member. Its default value is false.
The Item Listener is used by the Hazelcast IQueue, ISet and IList interfaces.
To write an Item Listener class, you implement the ItemListener interface and its methods
itemAdded and itemRemoved. These methods are invoked when an item is added or removed.
public class ExampleItemListener implements ItemListener<Price> {

    @Override
    public void itemAdded(ItemEvent<Price> event) {
        System.out.println("Item added: " + event);
    }

    @Override
    public void itemRemoved(ItemEvent<Price> event) {
        System.out.println("Item removed: " + event);
    }
}
You can use ICollection when creating any of the collection (queue, set and list)
data structures, as shown above. You can also use IQueue, ISet or IList instead of
ICollection.
After you create your class, you can configure your cluster to include item listeners. Below is an
example using the method addItemListener for ISet (it applies also to IQueue and IList). You can also
see this portion in the above class creation.
With the above approach, there is the possibility of missing events between the creation of the
instance and registering the listener. To overcome this race condition, Hazelcast allows you to
register listeners in the configuration. You can register listeners using declarative, programmatic,
or Spring configuration, as shown below.
setConfig.addItemListenerConfig(
new ItemListenerConfig( "com.yourpackage.ExampleItemListener", true ) );
<hazelcast>
...
<set>
<item-listeners>
<item-listener include-value="true">
com.yourpackage.ExampleItemListener
</item-listener>
</item-listeners>
</set>
...
</hazelcast>
As you see, there is an attribute in the above examples: include-value. It is a boolean attribute that
is optional, and if you set it to true, the item event contains the item value. Its default value is true.
There is also another attribute called local, which is not shown in the above examples. It is also a
boolean attribute that is optional, and if you set it to true, you can listen to the items on the local
member. Its default value is false.
The Message Listener is used by the ITopic interface. It notifies when a message is received for the
registered topic.
To write a Message Listener class, you implement the MessageListener interface and its method
onMessage, which is invoked when a message is received for the registered topic.
Registering Message Listeners
After you create your class, you can configure your cluster to include message listeners. Below is an
example using the method addMessageListener.
With the above approach, there is the possibility of missing messaging events between the creation
of the instance and registering the listener. To overcome this race condition, Hazelcast allows you
to register this listener in the configuration. You can register it using declarative, programmatic, or
Spring configuration, as shown below.
topicConfig.addMessageListenerConfig(
new ListenerConfig( "com.yourpackage.ExampleMessageListener" ) );
<hazelcast>
...
<topic name="default">
<message-listeners>
<message-listener>
com.yourpackage.ExampleMessageListener
</message-listener>
</message-listeners>
</topic>
...
</hazelcast>
<hz:topic name="default">
<hz:message-listeners>
<hz:message-listener
class-name="com.yourpackage.ExampleMessageListener"/>
</hz:message-listeners>
</hz:topic>
8.3. Event Listeners for Hazelcast Clients
You can add event listeners to a Hazelcast Java client. You can configure the following listeners to
listen to the events on the client side. See the respective content under the Cluster Events section for
example code.
• Lifecycle Listener: Notifies when the client is starting, started, shutting down and shutdown.
• Membership Listener: Notifies when a member joins or leaves the cluster to which the client is
connected, or when an attribute is changed in a member.
A striped executor in each cluster member controls and dispatches the received events. This striped
executor also guarantees the event order. For all events in Hazelcast, the order in which events are
generated and the order in which they are published are guaranteed for given keys. For map and
multimap, the order is preserved for the operations on the same key of the entry. For list, set, topic
and queue, the order is preserved for events on that instance of the distributed data structure.
To achieve the order guarantee, only one thread is made responsible for a particular set of events
(entry events of a key in a map, item events of a collection, etc.) in StripedExecutor (within
com.hazelcast.util.executor).
If the event queue reaches its capacity (hazelcast.event.queue.capacity) and the last item cannot be
put into the event queue for the period specified in hazelcast.event.queue.timeout.millis, these
events are dropped with a warning message, such as "EventQueue overloaded".
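The capacity and timeout behavior described above resembles a bounded queue whose producers wait a limited time before giving up. The following sketch shows that pattern with a JDK LinkedBlockingQueue. It is an analogy under stated assumptions, not Hazelcast's internal code: the class name, the constructor parameters and the warning text are made up for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EventQueueSketch {
    private final BlockingQueue<String> queue;
    private final long timeoutMillis;
    private int dropped;

    // capacity plays the role of the queue capacity, timeoutMillis the role of the offer timeout.
    public EventQueueSketch(int capacity, long timeoutMillis) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.timeoutMillis = timeoutMillis;
    }

    // Returns true if the event was queued, false if it was dropped after the timeout.
    public boolean publish(String event) {
        try {
            if (queue.offer(event, timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        dropped++;
        System.out.println("WARN: event queue overloaded, dropping " + event);
        return false;
    }

    public int droppedCount() {
        return dropped;
    }

    public static void main(String[] args) {
        // Nothing consumes the queue here, so the third event cannot be queued.
        EventQueueSketch events = new EventQueueSketch(2, 10);
        events.publish("a");
        events.publish("b");
        events.publish("c");
        System.out.println("dropped=" + events.droppedCount());
    }
}
```

A slow consumer has the same effect as the missing consumer in main: once the queue is full, new events wait for the timeout and are then discarded.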
If event listeners perform a computation that takes a long time, the event queue can reach its
maximum capacity and lose events. For map and multimap, you can configure
hazelcast.event.thread.count to a higher value so that fewer collisions occur for keys, and
therefore worker threads do not block each other in StripedExecutor. For list, set, topic and queue,
you should offload heavy work to another thread. To preserve the order guarantee, you should
implement logic similar to that of StripedExecutor in the offloaded thread pool.
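One way to keep ordering while offloading is to route all work for the same key to the same single-threaded executor. The sketch below shows this striping idea with JDK executors only. It does not reproduce Hazelcast's StripedExecutor; the class name, the stripe count and the helper runOrdered are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StripedOffloadSketch {
    // Each stripe is a single thread, so tasks on one stripe run in submission order.
    private final ExecutorService[] stripes;

    public StripedOffloadSketch(int stripeCount) {
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // The same key always maps to the same stripe.
    public void execute(Object key, Runnable task) {
        int stripe = Math.floorMod(key.hashCode(), stripes.length);
        stripes[stripe].execute(task);
    }

    public void shutdownAndWait() {
        for (ExecutorService stripe : stripes) {
            stripe.shutdown();
        }
        try {
            for (ExecutorService stripe : stripes) {
                stripe.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits count tasks for one key and returns the order in which they ran.
    static List<Integer> runOrdered(int count) {
        StripedOffloadSketch striped = new StripedOffloadSketch(4);
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < count; i++) {
            final int sequence = i;
            striped.execute("key-A", () -> seen.add(sequence));
        }
        striped.shutdownAndWait();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runOrdered(10));
    }
}
```

Work for different keys may still run in parallel on different stripes, which is the trade-off this design makes: ordering per key, concurrency across keys.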
9. Hazelcast Jet
This chapter briefly describes Hazelcast Jet. For detailed information and Jet
documentation, please visit jet.hazelcast.org.
9.1. Overview
Hazelcast Jet, built on top of the Hazelcast IMDG, is a distributed processing engine for fast stream
and batch processing of large data sets. It reuses the features and services of Hazelcast IMDG, but it
is a separate product with features not available in IMDG.
With Hazelcast IMDG providing storage functionality, Jet performs parallel execution in a Hazelcast
Jet cluster, composed of Jet instances, to enable data-intensive applications to operate in near real-
time. Jet uses green threads (threads that are scheduled by a runtime library or VM) to achieve this
parallel execution.
Since Jet uses Hazelcast IMDG’s discovery mechanisms, it can be used both on-premises and in
cloud environments. Hazelcast Jet typically runs on several machines that form a cluster.
The Pipeline API is the primary high-level API of Hazelcast Jet for batch and stream processing. This
API is easy to use and set up; it provides the tools to compose batch computations from
building blocks such as filters, aggregators and joiners, saving time and resources. With the Pipeline
API, you can build bounded and unbounded data pipelines on a variety of sources and sinks.
In addition to the Pipeline API, Jet also offers a distributed implementation of java.util.stream. You
can express your computation over any data source Jet supports using the familiar API from
JDK 8. This distributed implementation can be used for simple transform and reduce operations on
top of IMap and IList.
There is also Jet’s Core API for advanced users to build custom data sources and sinks, to have
low-level control over the data flow, to fine-tune performance and to build DSLs.
See the Work with Jet section in the Hazelcast Jet Reference Manual to see a simple example.
Hazelcast Jet is appropriate for applications that require a near real-time experience such as
operations in IoT architectures (house thermostats, lighting systems, etc.), in-store e-commerce
systems and social media platforms. Typical use cases include the following:
• Streaming analytics
• Online trading
• System log events
The aforementioned use cases require huge amounts of data to be processed in near real-time.
Hazelcast Jet achieves this by processing the incoming records as soon as possible, hence lowering
the latency, and by ingesting the data at high velocity. Jet’s execution model, together with keeping
both the computation and data storage in memory, enables high application speeds.
Data processing is traditionally divided into batch and stream processing.
Batch data is considered bounded, i.e., finite, and fast batch processing typically refers to
running a job on a data set which is available in a data center. You simply provide one or more
pre-existing datasets and order Hazelcast Jet to mine them for the information you need.
Stream data is considered unbounded, i.e., infinite, and infinite stream processing deals with
in-flight data before it is stored. It offers lower latency; data is processed on-the-fly and you do not
have to wait for the whole data set to arrive in order to run a computation.
A Jet job is implemented as a Hazelcast IMDG proxy, similar to the other services and data
structures in Hazelcast. Hazelcast operations are used for different actions that can be performed
on a job. Jet can also be used with the Hazelcast Client, which uses the Hazelcast Open Binary
Protocol to communicate different actions to the server instance.
In the Hazelcast Jet world, Hazelcast IMDG can be used for data ingestion prior to processing,
connecting multiple Jet jobs, enriching processed events, caching the remote data, distributing Jet-
processed data and running advanced data processing tasks on top of IMDG data structures.
Hazelcast Jet can use Hazelcast IMDG’s IMap, ICache and IList on the embedded cluster as sources
(data structures from which Jet reads data) and sinks (data structures to which Jet writes data).
IMap and ICache are partitioned data structures distributed across the cluster and Jet members can
read from these structures by having each member read just its local partitions. Hazelcast IMDG’s
IList is stored on a single partition; all the data is read on the single member that owns that
partition. See the IMap and ICache and IList sections in the Hazelcast Jet Reference Manual to learn
how Jet uses these IMDG data structures. In addition to these data structures, Jet can also process a
stream of changes of IMap and ICache, using the Event Journal.
You can use Hazelcast Jet with embedded Hazelcast IMDG or a remote Hazelcast IMDG cluster.
Benefits of using Hazelcast Jet with embedded Hazelcast IMDG are as follows:
• enriching processed events; cache remote data, e.g., fact tables from a database, on the Jet
members
• improving development processes by making start up of a Jet cluster simple and fast
Jet jobs use the Hazelcast IMDG connector, which allows reading records from and writing records to a
remote Hazelcast IMDG instance. You can use a remote Hazelcast IMDG cluster for the following cases:
• isolating the processing cluster (Jet) from operational data storage cluster (IMDG)
Aggregations are a good fit for simple operations (count, distinct, sum, avg, min, max, etc.).
However, they may not be sufficient for operations that group data by key and produce the results
of size O(keyCount). The architecture of Hazelcast aggregations is not well suited to this use case,
although it still works even for moderately sized results (up to 100 MB, as a ballpark figure).
Hazelcast Jet can be the preferred choice for larger sized results and whenever something more
than a single aggregation step is needed. See the Jet Compared with New Aggregations section.
Another Hazelcast IMDG computing feature is Entry Processors. They are used for fast, atomic mutating
operations, in which the map entry is mutated by executing logic directly on the
JVM where the data resides. This reduces network hops and provides atomicity
in a single step. Keeping this in mind, you can use Hazelcast IMDG Entry Processors
to perform bulk mutations of an IMap, where the processing function is fast and involves a single
map entry per call. On the other hand, you may prefer Hazelcast Jet when the processing
involves multiple entries (aggregations, joins, etc.), or involves multiple computing steps to be run
in parallel, or when the data source and sink are not a single IMap instance.
single JVM (cluster member). In distributed systems, this implementation is not desired since you
may want a task submitted in one JVM and processed in another one. Hazelcast offers
IExecutorService for you to use in distributed environments. It implements
java.util.concurrent.ExecutorService to serve the applications requiring computational and data
processing power.
Note that you may want to use Hazelcast Jet if you want to process batch or real-
time streaming data. See the Fast Batch Processing and Real-Time Stream
Processing use cases for Hazelcast Jet.
With IExecutorService, you can execute tasks asynchronously and perform other useful tasks. If
your task execution takes longer than expected, you can cancel the task execution. Tasks should be
Serializable since they are distributed.
In the Java Executor framework, you implement tasks in two ways: Callable or Runnable.
• Callable: If you need to return a value and submit it to Executor, implement the task as
java.util.concurrent.Callable.
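To make the "Callable and Serializable" requirement concrete, the sketch below defines a value-returning task and pushes it through plain Java serialization before calling it, imitating the hop to another member. The names SquareTask, roundTrip and runRemotely are hypothetical, and standard Java serialization is used here only as a stand-in for whatever serialization a cluster applies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class SerializableTaskSketch {

    // A task that returns a value: Callable, and Serializable so it can leave this JVM.
    static class SquareTask implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final int input;

        SquareTask(int input) {
            this.input = input;
        }

        @Override
        public Integer call() {
            return input * input;
        }
    }

    // Serializes and deserializes the task, imitating the transfer to another member.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the deserialized copy, as the receiving side would.
    static int runRemotely(int input) {
        return roundTrip(new SquareTask(input)).call();
    }

    public static void main(String[] args) {
        System.out.println(runRemotely(7));
    }
}
```

Because only the task's fields travel, every field of a task must itself be serializable (or be re-created on the receiving side).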
Note that the distributed executor service (IExecutorService) is intended to run processing where
the data is hosted: on the server members. In general, you cannot run a Java Runnable or Callable
on the clients as the clients may not be Java. Also, the clients do not host any data, so they would
have to fetch any data they need from the servers. If you want something to run on all
or some clients connected to your cluster, you could implement this using the publish/subscribe
mechanism; a payload could be sent to an ITopic with the necessary execution parameters, and
the listening clients can act on the message.
Below is an example of a Callable task. SumTask prints out map keys and returns the summed map
values.
public class SumTask
implements Callable<Integer>, Serializable, HazelcastInstanceAware {
Another example is the Echo callable below. In its call() method, it returns the local member and
the input passed in. Remember that instance.getCluster().getLocalMember() returns the local
member and toString() returns the member’s address (IP + port) in String form, just to see which
member actually executed the code for our example. Of course, the call() method can do and
return anything you like.
public Echo() {
}
Executing a Callable Task
• after executing the task, you do not have to wait for the execution to complete, you can process
other things
• when ready, use the Future object to retrieve the result as shown in the code example below.
Please note that the Echo callable in the above example also implements a Serializable interface,
since it may be sent to another member to be processed.
In Hazelcast, when you implement a task as java.lang.Runnable (a task that does not
return a value), you implement Runnable and Serializable.
Below is an example Runnable. It is a task that waits for some time and echoes a message.
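public class EchoTask implements Runnable, Serializable {

    // The message to echo; set by the submitter before the task is sent.
    private final String msg;

    public EchoTask( String msg ) {
        this.msg = msg;
    }

    @Override
    public void run() {
        try {
            Thread.sleep( 5000 );
        } catch ( InterruptedException e ) {
            // Restore the interrupt flag so that the executing thread can see it.
            Thread.currentThread().interrupt();
        }
        System.out.println( "echo:" + msg );
    }
}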
Now let’s write a class that submits and executes these echo messages. Executor is retrieved from
HazelcastInstance and 1000 echo tasks are submitted.
You can scale the Executor service both vertically (scale up) and horizontally (scale out).
To scale up, improve the processing capacity of the cluster member (JVM). You can do
this by increasing the pool-size property mentioned in Configuring Executor Service (i.e.,
increasing the thread count). However, be aware of your member’s capacity. If you think it
cannot handle the additional load caused by increasing the thread count, consider
improving the member’s resources (CPU, memory, etc.). As an example, set the pool-size
to 5 and run the above MasterMember. You will see that EchoTask is run as soon as it is produced.
To scale out, add more members instead of increasing only one member’s capacity. In practice, you
may want to expand your cluster by adding more physical or virtual machines. For example, in the
EchoTask example in the Runnable section, you can create another Hazelcast instance. That
instance automatically gets involved in the executions started in MasterMember and starts processing.
• echoOnTheMemberOwningTheKey: On the member owning the key you choose with the
IExecutorService submitToKeyOwner method.
• echoOnSomewhere: On the member Hazelcast picks with the IExecutorService submit method.
public void echoOnSomewhere( String input ) throws Exception {
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
IExecutorService executorService =
hazelcastInstance.getExecutorService( "default" );
A task in the code that you execute in a cluster might take longer than expected. If you cannot
stop/cancel that task, it keeps eating your resources.
To cancel a task, you can use the standard Java executor framework’s cancel() API. This framework
encourages us to code and design for cancellations, an often-neglected part of software development.
The Fibonacci callable class below calculates the Fibonacci number for a given number. In the
calculate method, we check if the current thread is interrupted so that the code can respond to
cancellations once the execution is started.
int input = 0;
The fib() method below submits the Fibonacci calculation task above for the number n and waits a
maximum of 3 seconds for the result. If the execution is not completed within three seconds, the
future.get() method throws a TimeoutException; upon catching it, we cancel the execution,
saving some CPU cycles.
fib(20) probably takes less than 3 seconds. However, fib(50) takes much longer. (This is not an
example for writing better Fibonacci calculation code, but for showing how to cancel a running
execution that takes too long.) The method future.cancel(false) can only cancel an execution before
it starts running, but future.cancel(true) can interrupt a running execution, provided that
your code is able to handle the interruption. If you want to be able to cancel an already running
task, design the task to handle interruptions. If the calculate(int n) method did not have the
Thread.currentThread().isInterrupted() check, you would not be able to cancel the execution after
it has started.
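The timeout-then-cancel pattern described above can be tried without a cluster. The following is a
minimal sketch that uses a plain java.util.concurrent.ExecutorService in place of IExecutorService;
the class name CancellableFibonacci, the method fibWithTimeout and the return value -1 for a
cancelled task are assumptions made for this illustration, not part of the Hazelcast API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration class; runs on a local executor, not on a Hazelcast cluster.
class CancellableFibonacci implements Callable<Long> {
    private final int input;

    CancellableFibonacci(int input) {
        this.input = input;
    }

    @Override
    public Long call() {
        return calculate(input);
    }

    private long calculate(int n) {
        // Cooperative cancellation point: stop recursing once the thread is interrupted.
        if (Thread.currentThread().isInterrupted()) {
            return 0;
        }
        return n <= 1 ? n : calculate(n - 1) + calculate(n - 2);
    }

    /** Returns fib(n), or -1 if the task was cancelled after the timeout elapsed. */
    static long fibWithTimeout(int n, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Long> future = executor.submit(new CancellableFibonacci(n));
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // true = interrupt the worker thread if the task is already running
                future.cancel(true);
                return -1;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Calling fibWithTimeout(20, 3000) returns the result well within the timeout, whereas
fibWithTimeout(50, 300) times out and cancels the task. Because calculate checks the interrupt
flag on every call, the recursion unwinds quickly after the cancellation.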
You can use the ExecutionCallback offered by Hazelcast to be notified asynchronously when the
execution is done. To be notified when your task completes without an error, implement the
onResponse method. To be notified when your task completes with an error, implement the
onFailure method.
Let’s use the Fibonacci series to explain this. The example code below is the calculation that is
executed. Note that it is Callable and Serializable.
The example code below submits the Fibonacci calculation with an ExecutionCallback and prints the
result asynchronously. ExecutionCallback has the methods onResponse and onFailure. In this
example code, onResponse is called upon a valid response and prints the calculation result, whereas
onFailure is called upon a failure and prints the stack trace.
public class MasterMemberCallback {
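The callback flow can also be sketched with JDK classes only. In the sketch below, ResultCallback
is a hypothetical stand-in that mirrors the two methods of ExecutionCallback described above, and
CompletableFuture plays the role of the executor service; neither is how Hazelcast implements the
feature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical stand-in for ExecutionCallback: the same two methods as described above.
interface ResultCallback<V> {
    void onResponse(V response);

    void onFailure(Throwable t);
}

class CallbackDemo {
    /** Runs the task asynchronously and routes its outcome to exactly one callback method. */
    static <V> CompletableFuture<Void> submit(Supplier<V> task, ResultCallback<V> callback) {
        return CompletableFuture.supplyAsync(task).<Void>handle((result, error) -> {
            if (error == null) {
                callback.onResponse(result);
            } else {
                callback.onFailure(error);
            }
            return null;
        });
    }

    static int fibonacci(int n) {
        return n <= 1 ? n : fibonacci(n - 1) + fibonacci(n - 2);
    }

    /** Submits one succeeding and one failing task; returns what the callback recorded. */
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ResultCallback<Integer> callback = new ResultCallback<Integer>() {
            @Override
            public void onResponse(Integer response) {
                log.add("Fibonacci calculation result = " + response);
            }

            @Override
            public void onFailure(Throwable t) {
                log.add("failure: " + t.getMessage());
            }
        };
        // join() is used only so that this demo finishes deterministically.
        submit(() -> fibonacci(10), callback).join();
        submit(() -> { throw new IllegalStateException("boom"); }, callback).join();
        return log;
    }
}
```

The caller thread is not blocked by submit itself; the callback methods run on the thread that
completes the task.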
As previously mentioned, it is possible to indicate where in the Hazelcast cluster the Runnable or
Callable is executed. Usually you execute these in the cluster based on the location of a key or a set
of keys, or you allow Hazelcast to select a member.
If you want more control over where your code runs, use the MemberSelector interface. For example,
you may want certain tasks to run only on certain members, or you may wish to implement some
form of custom load balancing regime. The MemberSelector is an interface that you can implement
and then provide to the IExecutorService when you submit or execute.
The select(Member) method is called for every available member in the cluster. Implement this
method to decide if the member is going to be used or not.
In a simple example shown below, we select the cluster members based on the presence of an
attribute.
10.1.8. Configuring Executor Service
Declarative Configuration:
<hazelcast>
...
<executor-service name="exec">
<pool-size>1</pool-size>
<queue-capacity>10</queue-capacity>
<statistics-enabled>true</statistics-enabled>
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</executor-service>
...
</hazelcast>
Programmatic Configuration:
• pool-size: The number of executor threads per Member for the Executor. By default, Executor is
configured to have 16 threads in the pool. You can change that with this element.
• queue-capacity: Executor’s task queue capacity; the number of tasks this queue can hold.
• statistics-enabled: Specifies whether the statistics gathering is enabled for your Executor
Service. If set to false, you cannot collect statistics in your implementation (using
getLocalExecutorStats()) and also Hazelcast Management Center will not show them. Its default
value is true.
• split-brain-protection-ref: Name of the split-brain protection configuration that you want this
Executor Service to use. See the Split-Brain Protection for IExecutorService section.
IExecutorService can be configured to check for a minimum number of available members before
applying its operations (see the Split-Brain Protection section). This check prevents executor
operations from succeeding on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the operations, that support split-brain protection
checks:
• WRITE, READ_WRITE:
◦ execute
◦ executeOnAllMembers
◦ executeOnKeyOwner
◦ executeOnMember
◦ executeOnMembers
◦ shutdown
◦ shutdownNow
◦ submit
◦ submitToAllMembers
◦ submitToKeyOwner
◦ submitToMember
◦ submitToMembers
Split-brain protection for Executor Service can be configured programmatically using the method
setSplitBrainProtectionName(), or declaratively using the element split-brain-protection-ref.
Following is an example declarative configuration:
<hazelcast>
...
<executor-service name="default">
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</executor-service>
...
</hazelcast>
Processing of the tasks when using durable executor service involves two invocations:
1. Sending the task to primary Hazelcast member (primary partition) and to its backups, if
configured, and executing the task.
2. Retrieving the result of the task.
As you may already know, Hazelcast's executor service returns a future representing the task to the
user. With the two-invocation approach above, the task is guaranteed to be executed before the
future returns, and you can track the response of a submitted task with a unique ID. Hazelcast
stores the task on both the primary and backup members and also starts the execution.
With the first invocation, a Ringbuffer stores the task, and a sequence generated for the task is
returned to the caller as the result. In addition to being stored, the task is executed on the local
execution service of the primary member. This way, the task is resilient to member failures
and you can track it with its ID.
After the first invocation has completed and the task's sequence is returned, the second invocation
starts to retrieve the result of the task with that sequence. This retrieval waits in the waiting
operations queue until notified, or it runs immediately if the result is already available.
When the task execution is completed, the Ringbuffer replaces the task with the result for the given
task sequence. This replacement notifies the waiting operations queue.
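The two invocations can be pictured with a toy, single-JVM model. This is only a sketch of the flow
described above: the class DurableExecutorModel and its methods are hypothetical, a map of futures
stands in for the Ringbuffer, and backups are not modeled at all.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the two-invocation flow; not Hazelcast code.
class DurableExecutorModel {
    // One slot per task sequence; stands in for the Ringbuffer.
    private final Map<Long, CompletableFuture<Object>> slots = new ConcurrentHashMap<>();
    private final AtomicLong nextSequence = new AtomicLong();
    private final ExecutorService localExecutor = Executors.newSingleThreadExecutor();

    /** First invocation: store the task, start executing it and return its sequence. */
    long submit(Callable<?> task) {
        long sequence = nextSequence.getAndIncrement();
        CompletableFuture<Object> slot = new CompletableFuture<>();
        slots.put(sequence, slot);
        localExecutor.execute(() -> {
            try {
                // The result takes the task's place; completing the slot wakes up waiters.
                slot.complete(task.call());
            } catch (Exception e) {
                slot.completeExceptionally(e);
            }
        });
        return sequence;
    }

    /** Second invocation: wait until the result for the sequence is available, then return it. */
    Object retrieveResult(long sequence) {
        return slots.get(sequence).join();
    }

    void shutdown() {
        localExecutor.shutdown();
    }
}
```

A caller first obtains the sequence with submit and later passes it to retrieveResult, which
returns immediately if the task has already finished and blocks otherwise.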
This section presents example configurations for durable executor service along with the
descriptions of its configuration elements and attributes.
Declarative Configuration:
<hazelcast>
...
<durable-executor-service name="myDurableExecSvc">
<pool-size>8</pool-size>
<durability>1</durability>
<capacity>1</capacity>
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</durable-executor-service>
...
</hazelcast>
Programmatic Configuration:
Config config = new Config();
config.getDurableExecutorConfig( "myDurableExecSvc" )
      .setPoolSize( 8 )
      .setDurability( 1 )
      .setCapacity( 1 )
      .setSplitBrainProtectionName( "splitbrainprotection-name" );
The following are the descriptions of each configuration element and attribute:
• durability: Number of backups in the cluster for the submitted task. Its default value is 1.
• capacity: Executor’s task queue capacity; the number of tasks this queue can hold.
• split-brain-protection-ref: Name of the split-brain protection configuration that you want this
Durable Executor Service to use. See the Split-Brain Protection for Durable Executor Service
section.
Durable Executor Service can be configured to check for a minimum number of available members
before applying its operations (see the Split-Brain Protection section). This check prevents
executor operations from succeeding on all parts of a cluster during a network partition.
The following is a list of methods, grouped by the operations, that support split-brain protection
checks:
• WRITE, READ_WRITE:
◦ disposeResult
◦ execute
◦ executeOnKeyOwner
◦ retrieveAndDisposeResult
◦ shutdown
◦ shutdownNow
◦ submit
◦ submitToKeyOwner
• READ, READ_WRITE:
◦ retrieveResult
Split-brain protection for Durable Executor Service can be configured programmatically using the
method setSplitBrainProtectionName(), or declaratively using the element
split-brain-protection-ref. The following is an example declarative configuration:
<hazelcast>
...
<durable-executor-service name="myDurableExecSvc">
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</durable-executor-service>
...
</hazelcast>
On top of the Vanilla Scheduling API, IScheduledExecutorService allows additional methods such as
the following:
1. Upon partition-specific scheduling, the future task is stored both in the primary partition and
in its N backups, N being the <durability> property in the configuration. More specifically,
there are always one or more backups to take ownership of the task in the event of a lost
member. If a member is lost, the task is rescheduled on the backup (new primary) member,
which might further delay the subsequent executions of the task. For example, suppose we
schedule a task to run 10 seconds from now with schedule(new ExampleTask(), 10,
TimeUnit.SECONDS); and the owner member goes down after 5 seconds (before the execution
takes place). The backup owner then reschedules the task to run 10 seconds from that moment.
Therefore, from the perspective of a user waiting on the result, it becomes available in
10 + 5 = 15 seconds rather than the originally anticipated 10 seconds. If atFixedRate was used,
only the initial delay is affected in this scenario; all subsequent executions should adhere to
the given period parameter.
2. Upon member-specific scheduling, the future task is stored only in the member itself, which
means that in the event of a lost member, the task is lost as well.
To accomplish the described durability, all tasks provide a unique identity/name before the
scheduling takes place. The name allows the service to reach the scheduled task even after the
caller (client or member) goes down, and it also helps prevent duplicate tasks. You can define the
name of the task yourself by implementing the
com.hazelcast.scheduledexecutor.NamedTask interface (a plain wrapper utility is also available:
com.hazelcast.scheduledexecutor.TaskUtils.named(java.lang.String, java.lang.Runnable)). If the
task does not provide a name in its implementation, the service internally assigns a random UUID
to it.
Futures associated with a scheduled task, in order to be aware of lost partitions and/or members,
act as listeners on the local member/client. Therefore, they are always strongly referenced, on the
member/client side. In order to clean up their resources, once completed, you can use the method
dispose(). This method also cancels further executions of the task if scheduled at a fixed rate. See
the IScheduledFuture Javadoc for its API details.
The task handler is a descriptor class holding information for the scheduled future, which is used to
pinpoint the actual task in the cluster. It contains the name of the task, the owner (member or
partition) and the scheduler name.
The handler is always available after scheduling. It can be stored in a plain string format using
com.hazelcast.scheduledexecutor.ScheduledTaskHandler.toUrn() and reconstructed from that
string using com.hazelcast.scheduledexecutor.ScheduledTaskHandler.of(). If the handler is lost, you
can still find a task under a given scheduler by using the scheduler's
com.hazelcast.scheduledexecutor.IScheduledExecutorService.getAllScheduledFutures().
Last but not least, similar to the executor service, the scheduled executor service allows stateful
tasks to be scheduled. Stateful tasks are tasks that require some kind of state during their
runtime, which must also be durable along with the task in the event of a lost partition.
As with the tasks, objects stored in the state Map need to be Hazelcast serializable.
This section presents example configurations for scheduled executor service along with the
descriptions of its configuration elements and attributes.
Declarative Configuration:
<hazelcast>
...
<scheduled-executor-service name="myScheduledExecSvc">
<pool-size>16</pool-size>
<durability>1</durability>
<capacity>100</capacity>
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</scheduled-executor-service>
...
</hazelcast>
Programmatic Configuration:
The following are the descriptions of each configuration element and attribute:
• capacity: Maximum number of tasks that a scheduler can have per partition. Attempting to
schedule more tasks results in a RejectedExecutionException. To free up capacity, you should
dispose of the tasks you no longer need.
• split-brain-protection-ref: Name of the split-brain protection configuration that you want this
Scheduled Executor Service to use. See the Split-Brain Protection for IScheduled Executor
Service section.
10.3.2. Examples
Scheduling a callable that computes the cluster size in 10 seconds from now:
static class DelayedClusterSizeTask implements Callable<Integer>,
        HazelcastInstanceAware, Serializable {

    // Set by Hazelcast through setHazelcastInstance(); transient so it is not serialized
    private transient HazelcastInstance instance;

    @Override
    public Integer call() throws Exception {
        return instance.getCluster().getMembers().size();
    }

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.instance = hazelcastInstance;
    }
}
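The scheduling call itself is not shown with the task above. As a rough sketch of its shape, the
following uses the vanilla java.util.concurrent.ScheduledExecutorService with the same (task,
delay, unit) arguments as the schedule(new ExampleTask(), 10, TimeUnit.SECONDS) call quoted
earlier. The class DelayedTaskDemo and the method runAfterDelay are hypothetical names, and the
task runs on a local thread rather than on a cluster member.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class using the JDK scheduler, not IScheduledExecutorService.
class DelayedTaskDemo {
    /** Schedules the task to run once after the given delay and blocks for its result. */
    static <V> V runAfterDelay(Callable<V> task, long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<V> future = scheduler.schedule(task, delay, unit);
            // get() returns only after the delay has elapsed and the task has run.
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }
}
```

For the example in this section, the delay would be 10 with TimeUnit.SECONDS, and the callable
would be a DelayedClusterSizeTask instance.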
The following is a list of methods, grouped by the operations, that support split-brain protection
checks:
• WRITE, READ_WRITE:
◦ schedule
◦ scheduleAtFixedRate
◦ scheduleOnAllMembers
◦ scheduleOnAllMembersAtFixedRate
◦ scheduleOnKeyOwner
◦ scheduleOnKeyOwnerAtFixedRate
◦ scheduleOnMember
◦ scheduleOnMemberAtFixedRate
◦ scheduleOnMembers
◦ scheduleOnMembersAtFixedRate
◦ shutdown
• READ, READ_WRITE:
◦ getAllScheduledFutures
Split-brain protection for Scheduled Executor Service can be configured programmatically using
the method setSplitBrainProtectionName(), or declaratively using the element
split-brain-protection-ref. The following is an example declarative configuration:
<hazelcast>
...
<scheduled-executor-service name="myScheduledExecSvc">
<split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref>
</scheduled-executor-service>
...
</hazelcast>
An entry processor is a good option if you perform bulk processing on an IMap. Usually you loop
over the keys: you execute IMap.get(key), mutate the value and finally put the entry back in the
map using IMap.put(key,value). If you perform this process from a client or from a member where
the keys do not exist, you effectively perform two network hops for each update: the first to
retrieve the data and the second to update the mutated value.
If you are doing the process described above, you should consider using entry processors. An entry
processor reads and updates the entry on the member where the data resides. This eliminates
the costly network hops described above.
Entry processor is meant to process a single entry per call. Processing multiple
entries and data structures in an entry processor is not supported as it may result
in deadlocks.
Note that Hazelcast Jet is a good fit when you want to perform processing that
involves multiple entries (aggregations, joins, etc.), or involves multiple computing
steps to be made parallel. Hazelcast Jet contains an Entry Processor Sink to allow
you to update Hazelcast IMDG data as a result of your Hazelcast Jet computation.
See the Hazelcast Jet Reference Manual.
An entry processor enables fast in-memory operations on your map without you having to worry
about locks or concurrency issues. You can apply it to a single map entry or to all map entries. Entry
processors support choosing target entries using predicates. You do not need any explicit lock on
the entry thanks to the isolated threading model: Hazelcast runs the entry processor for all entries
on a partition thread, so there is no interleaving of the entry processor and other mutations.
Hazelcast sends the entry processor to each cluster member and these members apply it to map
entries. Therefore, if you add more members, your processing completes faster.
Using Indexes
Entry processors can be used with predicates. Predicates help to process a subset of data by
selecting eligible entries. This selection can happen either by doing a full-table scan or by using
indexes. To accelerate the entry selection step, consider adding indexes. If indexes exist, the
entry processor uses them automatically.
If entry processing is the major operation for a map and if the map consists of complex objects, you
should use OBJECT as the in-memory-format to minimize serialization cost. By default, the entry value
is stored as a byte array (BINARY format). When it is stored as an object (OBJECT format), then the
entry processor is applied directly on the object. In that case, no serialization or deserialization is
performed. However, if there is a defined event listener, a new entry value will be serialized when
passing to the event publisher service.
When in-memory-format is OBJECT, the old value of the updated entry will be null.
Processing Entries
The IMap interface provides the following methods for entry processing:
• executeOnKey processes an entry mapped by a key, blocking until the processing is complete and
the result is returned.
• executeOnKeys processes entries mapped by a collection of keys, blocking until the processing is
complete and the results are returned.
• submitToKey processes an entry mapped by a key and provides a way to register a callback to
receive notifications about the result of the entry processing.
• executeOnEntries processes all entries in a map, blocking until the processing is complete and
the results are returned.
• executeOnEntries processes all entries in a map matching the provided predicate, blocking until
the processing is complete and the results are returned.
When using the executeOnEntries method, if the number of entries is high and you do not need the
results, then returning null from the process() method is a good practice. This method is offered by
the EntryProcessor interface. When it returns null, the results of the processing are not collected,
which avoids out-of-memory errors.
If you do not need to read or modify the entry in any way but would like to execute a task on the
member owning the entry with that key (i.e. the member is the partition owner for that key), you
can also use executeOnKeyOwner provided by IExecutorService. You need to make sure that the
runnable can be serialized (using any of the available serialization techniques in Hazelcast). The
runnable will not receive the map entry key or value and is not running on the same thread as
operations reading the map data so operations such as map.get() or map.put() will not be blocked.
Entry processors run via operation threads that are dedicated to specific
partitions. Therefore, with long running entry processor executions, other
partition operations such as map.put(key) on some partitions can be blocked while
partition operations on other partitions might run concurrently. With this in mind,
it is a good practice to make your entry processor executions as quick as possible.
The entry processor respects locks ONLY when it is executed on a single key. As explained in the
above section, the following methods process a single key:
• executeOnKey
• submitToKey
Therefore, if you run an entry processor on a single key using one of these methods and that key
is locked, the execution waits until the lock on that key is released.
If your code modifies the data, then you will most likely need to modify backup entries as well. This
should be done to prevent divergence of map values between copies of data in the cluster (the
primary and backup replicas). In most cases, this is simple. By implementing the EntryProcessor
interface and providing only the process() method, the same entry processor will be applied on all
copies of the map entry.
If, however, you would like to run a custom processor on backup entries, you may provide the
processor by overriding the EntryProcessor#getBackupProcessor method. The method should return
an instance of an EntryProcessor which will be run on backup entries exclusively. As such, it may
carry some state that was derived from running the entry processor on primary replicas.
You may also return null from the EntryProcessor#getBackupProcessor method. This signifies that
there is nothing to be done on the backup replicas, which is most convenient when you are using
the entry processor to read and not modify entries.
It is possible that an entry processor could see that a key exists though its backup
processor may not find it due to an unsent backup of a previous operation, e.g., a
previous put operation. In those situations, Hazelcast internally/eventually
synchronizes those owner and backup partitions so you do not lose any data.
When coding a backup entry processor, you should take that case into account,
otherwise NullPointerException can be seen since Map.Entry.getValue() may return
null.
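One way to take that case into account is to check the value before using it. The sketch below
works on a plain java.util.Map.Entry so that it can be run in isolation; the class name
NullSafeBackupIncrement is hypothetical, and skipping the update when the value is missing is just
one possible policy, chosen here as an assumption.

```java
import java.util.Map;

// Hypothetical illustration class; the logic would live in the backup processor's process().
class NullSafeBackupIncrement {
    /**
     * Increments the entry's value. Returns null and leaves the entry untouched
     * when the backup replica has not received the value yet.
     */
    static Integer process(Map.Entry<Integer, Integer> entry) {
        Integer current = entry.getValue();
        if (current == null) {
            // Backup not populated yet: skip instead of unboxing null (NullPointerException).
            return null;
        }
        int next = current + 1;
        entry.setValue(next); // explicit setValue, so the change is not treated as read-only
        return next;
    }
}
```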
The class IncrementingEntryProcessor creates an entry processor to process the map entries. It
implements the EntryProcessor interface. The process() method will be called for both primary and
backup entries.
@Override
public EntryProcessor<Integer, Integer, Integer> getBackupProcessor() {
return IncrementingEntryProcessor.this;
}
}
You should explicitly call the setValue method of Map.Entry when modifying data in
the entry processor. Otherwise, the entry processor is accepted as read-only.
10.4.3. Entry Processor Performance Optimizations
By default, the entry processor executes on a partition thread. A partition thread is responsible for
handling one or more partitions. The design of the entry processor assumes that the user code in
the process() method executes quickly. In the pathological case where the code is very heavy and
takes multiple milliseconds to execute, this may create a bottleneck.
We have a slow user code detector which can be used to log a warning controlled by the following
system properties:
The defaults catch extremely slow operations but you should set this much lower, say to 1ms, at
development time to catch entry processors that could be problematic in production. These are
good candidates for our optimizations.
• Offloadable which moves execution off the partition thread to an executor thread
These are enabled very simply by implementing these interfaces in your EntryProcessor.
As of Hazelcast IMDG 3.9, these optimizations apply to the following IMap methods only:
• executeOnKey(Object, EntryProcessor)
• submitToKey(Object, EntryProcessor)
• submitToKey(Object, EntryProcessor, ExecutionCallback)
If an entry processor implements the Offloadable interface, the process() method is executed in the
executor specified by the Offloadable's getExecutorName() method.
Offloading unblocks the partition thread allowing the user to profit from much higher throughput.
The key is locked for the time span of the processing in order to not generate a write conflict.
3. partition thread (set new value & unlock key, or just unlock key if the entry has not been
modified)
The getExecutorName() method may also return two constants defined in the Offloadable
interface:
• OFFLOADABLE_EXECUTOR: Processing is offloaded to the default
ExecutionService.OFFLOADABLE_EXECUTOR.
Note that if no executor is found whose name matches the one returned by getExecutorName(),
then the default executor service is used. Here is the configuration for the
"default" executor:
<hazelcast>
...
<executor-service name="default">
<pool-size>16</pool-size>
<queue-capacity>0</queue-capacity>
</executor-service>
...
</hazelcast>
<hazelcast>
...
<executor-service name="OffloadedInventoryEntryProcessor">
<pool-size>30</pool-size>
<queue-capacity>0</queue-capacity>
</executor-service>
...
</hazelcast>
Remember to set the pool-size (count of executor threads per member) according to your execution
needs. See the Configuring Executor Service section for the configuration details.
By default, an entry processor does not run if the key is locked. It waits until the key has been
unlocked (this applies to the executeOnKey and submitToKey methods mentioned before).
If the entry processor implements the ReadOnly interface without implementing the Offloadable
interface, the processing is not offloaded to an external executor. However, the entry processor
does not observe if the key of the processed entry is locked, nor tries to acquire the lock since the
entry processor will not do any modifications.
If the entry processor implements both ReadOnly and Offloadable interfaces, we observe the
combination of both optimizations described above.
The process() method is executed in the executor specified by the Offloadable's getExecutorName()
method. Also, the entry processor does not observe if the key of the processed entry is locked, nor
tries to acquire the lock since the entry processor will not do any modifications.
    @Override
    public Object process(Map.Entry<String, Employee> entry) {
        // heavy logic
        return null;
    }

    @Override
    public EntryProcessor<String, Employee, Object> getBackupProcessor() {
        // ReadOnly EntryProcessor has to return null, since it's just a read-only
        // operation that will not be executed on the backup
        return null;
    }

    @Override
    public String getExecutorName() {
        return OFFLOADABLE_EXECUTOR;
    }
}
Hazelcast partitions your data and spreads it across the cluster members. You can iterate over the
map entries and look for certain entries (specified by predicates) you are interested in. However,
this is not very efficient because you have to bring the entire entry set to the caller and iterate
over it locally. Instead, Hazelcast allows you to run distributed queries on your distributed map.
2. Each member looks at its own local entries and filters them according to the predicate. At this
stage, key/value pairs of the entries are deserialized and then passed to the predicate.
3. The predicate requester merges all the results coming from each member into a single set.
Distributed query is highly scalable. If you add new members to the cluster, the partition count for
each member is reduced and thus the time spent by each member on iterating its entries is
reduced. In addition, the pool of partition threads evaluates the entries concurrently in each
member and the network traffic is also reduced since only filtered data is sent to the requester.
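The filter-locally-then-merge flow can be sketched in plain Java. In this illustration each map in
the list stands for the entries held by one member; the class ScatterGatherQuery and its values
method are hypothetical, and the loop runs sequentially in one JVM, whereas the members of a real
cluster evaluate the predicate in parallel.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical single-JVM illustration of a distributed query; not Hazelcast code.
class ScatterGatherQuery {
    /** Each "member" filters its own entries; the requester merges the partial results. */
    static <K, V> Set<V> values(List<Map<K, V>> memberEntries, Predicate<V> predicate) {
        Set<V> merged = new HashSet<>();
        for (Map<K, V> localEntries : memberEntries) {
            // Work done on one member: evaluate the predicate against local values only.
            for (V value : localEntries.values()) {
                if (predicate.test(value)) {
                    // Only matching values travel back to the requester.
                    merged.add(value);
                }
            }
        }
        return merged;
    }
}
```

Adding a member to the list shrinks the share of entries each member has to scan, which is the
scaling effect described above.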
• Criteria API
Assume that you have an "employee" map containing values of Employee objects, as coded below.
public class Employee implements Serializable {
    private String name;
    private int age;
    private boolean active;
    private double salary;

    public Employee() {
    }

    // ... remaining constructors, getters and setters ...
}
Now let’s look for the employees who are active and have an age less than 30 using the
aforementioned APIs (Criteria API and Distributed SQL Query). The following subsections describe
each query mechanism for this example.
When using Portable objects, if one field of an object exists on one member but
does not exist on another one, Hazelcast does not throw an unknown field
exception. Instead, Hazelcast treats that predicate, which tries to perform a query
on an unknown field, as an always false predicate.
Criteria API is a programming interface offered by Hazelcast that is similar to the Java Persistence
Query Language (JPQL). Below is the code for the above example query.
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate predicate = e.is( "active" ).and( e.get( "age" ).lessThan( 30 ) );
Collection<Employee> employees = map.values( predicate );
In the above example code, predicate verifies whether the entry is active and its age value is less
than 30. This predicate is applied to the employee map using the map.values(predicate) method. This
method sends the predicate to all cluster members and merges the results coming from them. Since
the predicate is communicated between the members, it needs to be serializable.
Predicates can also be applied to keySet, entrySet and localKeySet of the Hazelcast
distributed map.
The Predicates class includes many operators for your query requirements. The following are
descriptions for some of them:
• like: Checks if the result of an expression matches some string pattern. % (percent sign) is
the placeholder for multiple characters; _ (underscore) is the placeholder for a single character.
• greaterEqual: Checks if the result of an expression is greater than or equal to a certain value.
• lessEqual: Checks if the result of an expression is less than or equal to a certain value.
• between: Checks if the result of an expression is between two values (this is inclusive).
You can combine predicates using the and, or and not operators, as shown in the below examples.
public Collection<Employee> getWithNameAndAge( String name, int age ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate agePredicate = Predicates.equal( "age", age );
Predicate predicate = Predicates.and( namePredicate, agePredicate );
return employeeMap.values( predicate );
}
You can simplify predicate usage with the PredicateBuilder interface, which offers simpler
predicate building. See the below example code which selects all people with a certain name and
age.
• active AND age>30
• active=false OR age = 45 OR name = 'Joe'
• active AND ( age > 20 OR salary < 60000 )
• <expression> = value
• age <= 30
• name = 'Joe'
• salary != 50000
The % (percent sign) is the placeholder for multiple characters; the _ (underscore) is the
placeholder for a single character.
• name NOT LIKE 'Jo_' (true for 'Josh'; false for 'Joe')
• name LIKE 'J_s%' (true for 'Josh', 'Joseph'; false for 'John', 'Joe')
• name ILIKE 'Jo_' (true for 'Joe' or 'jOE'; false for 'Josh')
You can escape the % and _ placeholder characters in your SQL queries with
predicates using the backslash (\) character. The apostrophe (') can be escaped
with another apostrophe, i.e., ''. If you use REGEX, you need to escape characters
according to the normal Java escape syntax; see here for the details.
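The placeholder rules above can be checked with a small translation from a LIKE pattern to a Java
regular expression. This is a sketch of the matching semantics as described in this section, not
Hazelcast's own implementation; the class SqlLike and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; approximates LIKE/ILIKE matching with java.util.regex.
class SqlLike {
    /** Translates a LIKE pattern: % becomes .*, _ becomes ., a backslash escapes the next char. */
    static Pattern compile(String like, boolean ignoreCase) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                // Escaped placeholder: match the next character literally.
                regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = Pattern.DOTALL | (ignoreCase ? Pattern.CASE_INSENSITIVE : 0);
        return Pattern.compile(regex.toString(), flags);
    }

    static boolean like(String value, String pattern) {
        return compile(pattern, false).matcher(value).matches();
    }

    static boolean ilike(String value, String pattern) {
        return compile(pattern, true).matcher(value).matches();
    }
}
```

With this sketch, like("Josh", "J_s%") is true and like("John", "J_s%") is false, matching the
examples listed above, and like("100%", "100\\%") matches only a literal percent sign.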
Querying Entry Keys with Predicates
You can use the __key attribute to perform a predicated search for entry keys. See the following
example:
In this example, the code creates a collection with the entries whose keys start with the letter "A".
You can query JSON strings stored inside your Hazelcast clusters. To query a JSON string, you first
need to create a HazelcastJsonValue from the JSON string. You can use HazelcastJsonValues both as
keys and values in the distributed data structures. Then, it is possible to query these objects using
the Hazelcast query methods explained in this section.
When running the queries, Hazelcast treats values extracted from the JSON documents as Java
types so they can be compared with the query attribute. The JSON specification defines five
primitive types to be used in JSON documents: number, string, true, false and null. The string,
true/false and null types are treated as String, boolean and null, respectively. Extracted number
values are treated as longs if they can be represented by a long. Otherwise, they are treated as
doubles.
It is possible to query nested attributes and arrays in JSON documents. The query syntax is the
same as querying other Hazelcast objects as explained in the Querying in Collections and Arrays
section.
/**
* Sample JSON object
*
* {
* "departmentId": 1,
* "room": "alpha",
* "people": [
* {
* "name": "Peter",
* "age": 26,
* "salary": 50000
* },
* {
* "name": "Jonah",
* "age": 50,
* "salary": 140000
* }
* ]
* }
*
*
* The following query finds all the departments that have a person named "Peter"
* working in them.
*/
Collection<HazelcastJsonValue> departmentWithPeter = departments.values(Predicates
.equal("people[any].name", "Peter"));
HazelcastJsonValue is a lightweight wrapper around your JSON strings. It is used merely as a way to
indicate that the contained string should be treated as a valid JSON value. Hazelcast does not check
the validity of JSON strings put into maps. Putting an invalid JSON string in a map is permissible.
However, in that case, it is undefined whether a query returns such an entry.
Hazelcast stores a metadata object per HazelcastJsonValue stored. This metadata object is created
every time a HazelcastJsonValue is put into an IMap. Metadata is later used to speed up the query
operations. Metadata creation is on by default. Depending on your application’s needs, you may
want to turn off the metadata creation to decrease the put latency and increase the throughput. You
can configure this using Metadata Policy.
JSON metadata is stored on-heap even when you use the NATIVE in-memory format.
If you are storing HazelcastJsonValues in your NATIVE maps, there is a certain
amount of on-heap cost per object. Metadata is not created unless you put
HazelcastJsonValues in your NATIVE maps even when metadata creation is on.
11.1.5. Filtering with Paging Predicates
Hazelcast provides paging for defined predicates. With its PagingPredicate interface, you can get a
collection of keys, values, or entries page by page by filtering them with predicates and giving the
size of the pages. Also, you can sort the entries by specifying comparators. In this case, the
comparator should be Serializable and the serialization factory implementations you use, e.g.,
PortableFactory and DataSerializableFactory, should be registered. See the Serialization chapter on
how to register these factories.
Paging predicates require the objects to be deserialized both on the calling side (either a member or
client) and the member side from which the collection is retrieved. Therefore, you need to register
the serialization factories you use on all the members and clients on which the paging predicates
are used. See the Serialization chapter on how to register these factories.
• The greaterEqual predicate gets values from the "students" map. This predicate has a filter to
retrieve the objects with an "age" greater than or equal to 18.
• Then a PagingPredicate is constructed in which the page size is 5, so that there are five objects in
each page. The first call to values() creates the first page.
• It gets subsequent pages by calling the nextPage() method of PagingPredicate and querying the map
again with the updated PagingPredicate.
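The paging flow described in the bullets above can be imitated in plain Java. This is only a sketch of the semantics (filter, order, slice), not the PagingPredicate implementation; the class name, the sample ages and the zero-based page index are assumptions made for the illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PagingSketch {
    // Returns the page with the given zero-based index: filter, sort, then slice.
    static List<Integer> page(List<Integer> ages, int minAge, int pageSize, int pageIndex) {
        return ages.stream()
                .filter(age -> age >= minAge)       // the "greater than or equal" filter
                .sorted()                           // natural (Comparable) ordering
                .skip((long) pageIndex * pageSize)  // skip the earlier pages
                .limit(pageSize)                    // keep a single page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ages = List.of(17, 30, 18, 21, 16, 25, 19, 40, 22);
        System.out.println(page(ages, 18, 5, 0)); // first page
        System.out.println(page(ages, 18, 5, 1)); // what a "next page" call would return
    }
}
```

Because every page is cut from the same ordered, filtered sequence, a stable ordering is what makes paging repeatable.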
If a comparator is not specified for PagingPredicate, but you want to get a collection of keys or
values page by page, these keys or values must be instances of Comparable (i.e., they must implement
java.lang.Comparable). Otherwise, a java.lang.IllegalArgumentException is thrown.
You can also access a specific page more easily with the help of the setPage() method. This way, if
you make a query for the hundredth page, for example, it gets all 100 pages at once instead of
reaching the hundredth page one by one using the nextPage() method. Note that this feature is
memory-intensive; see the PagingPredicate Javadoc for details.
Paging Predicate, also known as Order & Limit, is not supported in Transactional Context.
11.1.6. Filtering with Partition Predicate
You can run queries on a single partition in your cluster using the partition predicate
(PartitionPredicate).
...
Predicate predicate = Predicates.partitionPredicate(partitionKey, Predicates
.alwaysTrue());
By default there are 271 partitions, and with a regular predicate, each partition needs to be
accessed. A partition predicate, however, accesses only a single partition, which can lead to a big
performance gain.
For the partition predicate to work correctly, you need to know which partition your data belongs
to so that you can send the request to the correct partition. One of the ways of doing it is to make
use of the PartitionAware interface when data is inserted, thereby controlling the owning partition.
See the PartitionAware section for more information and examples.
A concrete example may be a web shop that sells phones and accessories. To find all the accessories
of a phone, a query could be executed that selects all accessories for that phone. This query is
executed on all members in the cluster and therefore could generate quite a lot of load. However, if
we store the accessories in the same partition as the phone, the partition predicate can use
the partitionKey of the phone to select the right partition and then query for the accessories of
that phone. This reduces the load on the system and returns query results faster.
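Why a shared partition key keeps the phone and its accessories together can be sketched in plain Java. The hashing below is a deliberate simplification (Hazelcast hashes the serialized partition key rather than calling hashCode()), and the class name and the "phone-42" key are hypothetical.

```java
final class PartitionSketch {
    static final int PARTITION_COUNT = 271; // default partition count mentioned above

    // Simplified stand-in for partition assignment: hash of the partition key modulo the count.
    static int partitionId(Object partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        String phoneKey = "phone-42";
        // An accessory stored under the phone's partition key lands in the phone's partition,
        // so a query restricted to that partition sees both.
        String accessoryPartitionKey = phoneKey;
        System.out.println(partitionId(phoneKey) == partitionId(accessoryPartitionKey));
    }
}
```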
Hazelcast distributed queries run on each member in parallel and return only the results to the
caller. Then, on the caller side, the results are merged.
When a query runs on a member, Hazelcast iterates through all the owned entries and finds the
matching ones. This can be made faster by indexing the most frequently queried fields, just like you
would do for your database. Indexing adds overhead to each write operation, but queries become a lot
faster. If you query your map a lot, make sure to add indexes for the most frequently queried fields.
For example, if you run an active and age < 30 query, make sure you add an index for the active
and age fields. The following example code does that by getting the map from the Hazelcast instance
and adding indexes to the map with the IMap.addIndex() method.
IMap map = hazelcastInstance.getMap( "employees" );
// ordered, since we have ranged queries for this field
map.addIndex(new IndexConfig(IndexType.SORTED, "age"));
// not ordered, because boolean field cannot have range
map.addIndex(new IndexConfig(IndexType.HASH, "active"));
Note that creating indexes once is sufficient. Subsequent operations are reflected in the index
automatically. So, although it is safe to call the addIndex() method consecutively, there will be a
performance penalty due to the redundant index creation.
When you call, for example, map.addIndex(new IndexConfig(IndexType.SORTED, "fieldName")), each
partition iterates over its record set and creates an index entry for each record. A previously
created index entry is recreated and replaced with the new entry. The performance penalty is
proportional to the number of entries. If you have maps with a large number of entries, synchronizing
the index addition process is recommended.
Other than using the addIndex() method, you can define your index declaratively or
programmatically as described in the Configuring IMap Indexes section.
IMap.addIndex(IndexConfig) is used for adding an index. For each indexed field, if you have ranged
queries such as age>30, age BETWEEN 40 AND 60, then use an IndexType.SORTED index. Otherwise, use
IndexType.HASH.
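The difference between the two index types can be illustrated with standard Java collections. This is an analogy only, not how Hazelcast stores its indexes: a hash map answers equality lookups, while an ordered map can also answer range lookups. All names below are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class IndexKindsSketch {
    // HASH-like index: attribute value -> entry keys; suited to equality lookups.
    private final Map<Boolean, Set<String>> activeIndex = new HashMap<>();
    // SORTED-like index: values are kept in order, so range lookups are possible too.
    private final TreeMap<Integer, Set<String>> ageIndex = new TreeMap<>();

    void put(String key, boolean active, int age) {
        activeIndex.computeIfAbsent(active, v -> new HashSet<>()).add(key);
        ageIndex.computeIfAbsent(age, v -> new HashSet<>()).add(key);
    }

    // active = value
    Set<String> activeEquals(boolean active) {
        return activeIndex.getOrDefault(active, Set.of());
    }

    // age < bound, answered from the ordered head of the index
    Set<String> ageLessThan(int bound) {
        Set<String> result = new HashSet<>();
        ageIndex.headMap(bound, false).values().forEach(result::addAll);
        return result;
    }

    public static void main(String[] args) {
        IndexKindsSketch index = new IndexKindsSketch();
        index.put("alice", true, 25);
        index.put("bob", false, 45);
        System.out.println(index.activeEquals(true));
        System.out.println(index.ageLessThan(30));
    }
}
```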
Also, you can define IMap indexes in configuration. An example is shown below.
<hazelcast>
...
<map name="default">
<indexes>
<index type="HASH">
<attributes>
<attribute>name</attribute>
</attributes>
</index>
<index>
<attributes>
<attribute>age</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
You can also define IMap indexes using programmatic configuration, as in the example below.
The following is the Spring declarative configuration for the same example.
<hz:map name="default">
<hz:indexes>
<hz:index type="HASH">
<hz:attributes>
<hz:attribute>name</hz:attribute>
</hz:attributes>
</hz:index>
<hz:index>
<hz:attributes>
<hz:attribute>age</hz:attribute>
</hz:attributes>
</hz:index>
</hz:indexes>
</hz:map>
If you configure the data structure to use High-Density Memory Store and indexes,
the indexes are automatically stored in the High-Density Memory Store as well.
This prevents full garbage collections when the indexes are updated
frequently.
Composite Indexes
Composite indexes, also known as compound indexes, are a special kind of index built on
multiple map entry attributes. They may therefore be used to significantly speed up
queries that involve those attributes simultaneously.
There are two distinct composite index types used for two different purposes: unordered composite
indexes and ordered ones.
The unordered indexes are used to perform equality queries, also known as the point queries, e.g.,
name = 'Alice'. These are specifically optimized for equality queries and don’t support other
comparison operators like > or <=.
Additionally, the composite unordered indexes allow speeding up the equality queries involving
multiple attributes simultaneously, e.g., name = 'Alice' and age = 33. This example query results in
a single composite index lookup operation which can be performed very efficiently.
The unordered composite index on the name and age attributes may be configured for a map as
follows:
<hazelcast>
...
<map name="persons">
<indexes>
<index type="HASH">
<attributes>
<attribute>name</attribute>
<attribute>age</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
The attributes indexed by the unordered composite indexes can’t be matched partially: the name =
'Alice' query can’t utilize the composite index configured above.
The ordered indexes are specifically designed to perform efficient order comparison queries, also
known as range queries, e.g., age > 33. Equality queries, like age = 33, are still supported by
the ordered indexes, but they are handled slightly less efficiently than by the
unordered indexes.
The composite ordered indexes extend the concept by allowing multiple equality predicates and a
single order comparison predicate to be combined into a single index query operation. For
instance, the queries "name = 'Alice' and age > 33" and "name = 'Bob' and age = 33 and balance > 0.0"
are good candidates to be covered by an ordered composite index configured as follows:
<hazelcast>
...
<map name="persons">
<indexes>
<index>
<attributes>
<attribute>name</attribute>
<attribute>age</attribute>
<attribute>balance</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
Unlike the unordered composite indexes, partial attribute prefixes may be matched for the ordered
composite indexes. In general, a valid non-empty attribute prefix is formed as a sequence of zero or
more equality predicates followed by zero or one order comparison predicate. Given the
index definition above, the following queries may be served by the index: name = 'Alice', name >
'Alice', name = 'Alice' and age > 33, name = 'Alice' and age = 33 and balance = 5.0. The
following queries can’t be served by the index: age = 33, age > 33 and balance = 0.0, balance > 0.0.
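The prefix rule above can be written down as a small check. This is a hedged sketch of the rule as stated in the text, not the Hazelcast index matcher: the query is modeled, hypothetically, as a map from attribute name to predicate kind, and only full coverage of the query by the index is tested.

```java
import java.util.List;
import java.util.Map;

final class OrderedCompositeMatchSketch {
    enum Kind { EQUALITY, RANGE }

    // True if all predicates of the query form a valid index prefix: equality predicates
    // on the leading attributes, optionally followed by a single range predicate.
    static boolean servedByIndex(List<String> indexAttributes, Map<String, Kind> query) {
        int matched = 0;
        for (String attribute : indexAttributes) {
            Kind kind = query.get(attribute);
            if (kind == null) {
                break; // gap in the prefix: stop matching
            }
            matched++;
            if (kind == Kind.RANGE) {
                break; // a range predicate has to be the last matched one
            }
        }
        return matched > 0 && matched == query.size();
    }

    public static void main(String[] args) {
        List<String> index = List.of("name", "age", "balance");
        System.out.println(servedByIndex(index, Map.of("name", Kind.EQUALITY, "age", Kind.RANGE)));
        System.out.println(servedByIndex(index, Map.of("age", Kind.EQUALITY)));
    }
}
```

Several order comparisons on the same attribute count as one range predicate in this model, since the map holds a single kind per attribute.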
While matching the ordered composite indexes, multiple order comparison predicates acting on the
same attribute are treated as a single range predicate acting on that attribute. Given the index
definition above, the following queries may be served by the index: name > 'Alice' and name <
'Bob', name = 'Alice' and age > 33 and age < 55, name = 'Alice' and age = 33 and balance > 0.0
and balance < 100.0.
The order of attributes involved in a query plays no role in the selection of the matching composite
index: the queries "name = 'Alice' and age = 33" and "age = 33 and name = 'Alice'" are equivalent from
the point of view of the index matching procedure.
The attributes involved in a query can be matched partially by the composite index matcher. For
example, name = 'Alice' and age = 33 and balance > 0.0 can be partially matched by the name, age
composite index: the name = 'Alice' and age = 33 predicates are served by the matched index, while
the balance > 0.0 predicate is processed by other means.
Bitmap Indexes
Bitmap indexes provide capabilities similar to unordered/hash indexes. The same set of predicates
is supported:
• equal
• notEqual
• in
• and
• or
• not
But, unlike hash indexes, bitmap indexes are able to achieve a much higher memory efficiency for
low cardinality attributes at the cost of reduced query performance. In practice, the query
performance is comparable to the performance of hash indexes, while memory footprint reduction
is high, usually around an order of magnitude.
Bitmap indexes are specifically designed for indexing of collection and array attributes since a
single IMap entry produces many index entries in that case. A single hash index entry costs a few
tens of bytes, while a single bitmap index entry usually costs just a few bytes.
It’s also possible to improve the memory footprint while indexing regular single-value attributes,
but the improvement is usually minor, depending on the data layout and total number of indexes.
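The idea behind a bitmap index can be sketched with java.util.BitSet. This is an illustration under simplifying assumptions (int entry IDs, uncompressed bitmaps, hypothetical names), not the Hazelcast implementation: each distinct attribute value owns one bitmap, and predicates become bitwise operations.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class BitmapIndexSketch {
    // One bitmap per distinct attribute value; bit i is set when entry ID i has that value.
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    void add(int entryId, String value) {
        bitmaps.computeIfAbsent(value, v -> new BitSet()).set(entryId);
    }

    // equal: a copy of the bitmap for the value (empty if the value was never indexed)
    BitSet equal(String value) {
        BitSet found = bitmaps.get(value);
        return found == null ? new BitSet() : (BitSet) found.clone();
    }

    // and: entry IDs present in both results
    static BitSet and(BitSet left, BitSet right) {
        BitSet result = (BitSet) left.clone();
        result.and(right);
        return result;
    }

    // or: entry IDs present in either result
    static BitSet or(BitSet left, BitSet right) {
        BitSet result = (BitSet) left.clone();
        result.or(right);
        return result;
    }

    public static void main(String[] args) {
        BitmapIndexSketch habits = new BitmapIndexSketch();
        habits.add(0, "reading");
        habits.add(0, "running"); // one entry with a collection attribute yields several bits
        habits.add(1, "running");
        System.out.println(and(habits.equal("reading"), habits.equal("running")));
    }
}
```

A collection attribute adds only one bit per element here, which is the intuition behind the small per-entry cost described above.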
Currently, bitmap indexes are not supported by off-heap High-Density Memory
Stores (HD).
In the simplest form, a bitmap index for an IMap entry attribute can be declaratively configured as
follows:
<hazelcast>
...
<map name="persons">
<indexes>
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
Internally, a unique non-negative long ID is assigned to every indexed IMap entry based on the entry
key. That unique ID is required for bitmap indexes to distinguish one indexed IMap entry from
another.
The mapping between IMap entries and long IDs is not free and its performance and memory
footprint can be improved in certain cases. For instance, if IMap entries already have a unique
integer-valued attribute, the attribute values can be used as unique long IDs directly without any
additional transformations. That can be configured as follows:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
<bitmap-index-options>
<unique-key>uniqueId</unique-key>
<unique-key-transformation>RAW</unique-key-transformation>
</bitmap-index-options>
</index>
The index definition above instructs Hazelcast to create a bitmap index on the age attribute, extract
the unique key values from the uniqueId attribute and use the raw (RAW) extracted values directly as
long IDs. If the extracted unique key value is not of type long, a widening conversion is performed
for the following types: byte, short and int; boxed variants are also supported.
In certain cases, the extracted raw IDs might be randomly distributed. This causes increased
memory usage in bitmap indexes since the best case scenario for them is sequential contiguous IDs.
That can be countered by applying the renumbering technique:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
<bitmap-index-options>
<unique-key>uniqueId</unique-key>
<unique-key-transformation>LONG</unique-key-transformation>
</bitmap-index-options>
</index>
The index definition above instructs the bitmap index to extract the unique keys from the uniqueId
attribute, convert every extracted non-negative value to long (LONG) and assign an internal
sequential unique long ID based on that extracted and then converted unique value. A widening
conversion is applied to the extracted values, if necessary.
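The effect of renumbering on bitmap density can be seen in a small sketch. It only contrasts the two ideas (using raw keys as bit positions versus assigning sequential IDs); the class, the method names and the use of int positions in a java.util.BitSet are assumptions for illustration, not Hazelcast internals.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class UniqueKeyTransformationSketch {
    private final Map<Long, Integer> renumbered = new HashMap<>();

    // RAW-like: the extracted unique key is used directly as the bit position.
    static int rawId(long uniqueKey) {
        return Math.toIntExact(uniqueKey);
    }

    // LONG-like: every distinct unique key receives the next sequential ID.
    int longId(long uniqueKey) {
        Integer id = renumbered.get(uniqueKey);
        if (id == null) {
            id = renumbered.size();
            renumbered.put(uniqueKey, id);
        }
        return id;
    }

    public static void main(String[] args) {
        UniqueKeyTransformationSketch sketch = new UniqueKeyTransformationSketch();
        BitSet raw = new BitSet();
        BitSet dense = new BitSet();
        for (long key : new long[] {1_000_000L, 5L}) { // sparse, randomly distributed keys
            raw.set(rawId(key));
            dense.set(sketch.longId(key));
        }
        // The raw bitmap spans the whole key range; the renumbered one stays compact.
        System.out.println(raw.length() + " vs " + dense.length());
    }
}
```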
This long-to-long mapping is performed more efficiently than the general object-to-long mapping
done for the simple index definitions. Basically, the following simple bitmap index definition:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
</index>
is equivalent to the following full-form definition:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
<bitmap-index-options>
<unique-key>__key</unique-key>
<unique-key-transformation>OBJECT</unique-key-transformation>
</bitmap-index-options>
</index>
This definition indexes the age attribute and uses the IMap entry keys (__key), interpreted as Java
objects (OBJECT), to assign internal unique long IDs.
The full-form definition has the following syntax:
<index type="BITMAP">
<attributes>
<attribute><attr></attribute>
</attributes>
<bitmap-index-options>
<unique-key><key></unique-key>
<unique-key-transformation><transformation></unique-key-transformation>
</bitmap-index-options>
</index>
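• <attr>: Specifies the attribute to index.
• <key>: Specifies the attribute to use as a unique key source for internal unique long ID
assignment.
• <transformation>: Specifies the transformation to apply to the extracted unique key values. In
addition to OBJECT, described above, the following transformations are available: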
◦ LONG: Long-to-long transformation. Each extracted unique key value is interpreted as a non-
negative long value, the widening conversion from byte, short and int is performed, if
necessary. Internally, a long-to-long hash table is used to establish the mapping from unique
keys to unique IDs, which is more efficient than the object-to-long hash table. It is good for
sparse/random unique integer-valued keys renumbering to raise the IDs density and to
make the bitmap index more memory-efficient as a result.
◦ RAW: Raw transformation. Each extracted unique key value is interpreted as a non-negative
long value, the widening conversion from byte, short and int is performed, if necessary.
Internally, no hash table of any kind is used to establish the mapping from unique keys to
unique IDs, the raw extracted keys are used directly as IDs. It is good for dense unique
integer-valued keys, and it has the best performance in terms of time and memory.
The regular dotted attribute path syntax is supported for <attr> and <key>:
<index type="BITMAP">
<attributes>
<attribute>name.first</attribute>
</attributes>
</index>
<index type="BITMAP">
<attributes>
<attribute>name.first</attribute>
</attributes>
<bitmap-index-options>
<unique-key>__key.id</unique-key>
</bitmap-index-options>
</index>
<index type="BITMAP">
<attributes>
<attribute>name.first</attribute>
</attributes>
<bitmap-index-options>
<unique-key>id.external</unique-key>
</bitmap-index-options>
</index>
...
Collection and array indexing is also possible using the regular syntax:
<index type="BITMAP">
<attributes>
<attribute>habits[any]</attribute>
</attributes>
</index>
<index type="BITMAP">
<attributes>
<attribute>habits[0]</attribute>
</attributes>
</index>
...
Bitmap index matching and selection for queries are performed automatically. No special treatment
is required. The querying can be performed using the regular IMap querying methods:
IMap.values(Predicate), IMap.entrySet(Predicate), etc.
Copying Indexes
The underlying data structures used by the indexes need to copy the query results to make sure
that the results are correct. This copying process is performed either when reading the index from
the data structure (on-read) or writing to it (on-write).
On-read copying means that, for each index-read operation, the result of the query is copied before
it is sent to the caller. Depending on the query result’s size, this type of index copying may be
slower since the result is stored in a map, i.e., all entries need to have the hash calculated before
being stored. Unlike the index-read operations, each index-write operation is fast, since there is no
copying. So, this option can be preferred in index-write intensive cases.
On-write copying means that each index-write operation completely copies the underlying map to
provide the copy-on-write semantics and this may be a slow operation depending on the index size.
Unlike index-write operations, each index-read operation is fast since the operation only includes
accessing the map that stores the results and returning them to the caller.
Another option is never copying the results of a query to a separate map. This means the results
backed by the underlying index-map can change after the query has been executed (for example, an
entry might have been added to or removed from an index, or it might have been remapped). This
option can be preferred if you expect "mostly correct" results, i.e., if it is not a problem when some
entries returned in the query result set do not match the initial query criteria. This is the fastest
option since there is no copying.
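The trade-off between copying on read and never copying can be demonstrated with ordinary Java sets. This is an analogy with hypothetical names, not the index implementation: a copied result is a stable snapshot, while an uncopied result is a live view that later index writes can change.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class IndexCopySketch {
    private final Set<String> indexedKeys = new HashSet<>();

    void indexWrite(String key) {
        indexedKeys.add(key);
    }

    void indexRemove(String key) {
        indexedKeys.remove(key);
    }

    // On-read copying: the caller gets a snapshot that later index writes cannot change.
    Set<String> readWithCopy() {
        return new HashSet<>(indexedKeys);
    }

    // Never copying: the caller gets a read-only view backed by the index itself.
    Set<String> readWithoutCopy() {
        return Collections.unmodifiableSet(indexedKeys);
    }

    public static void main(String[] args) {
        IndexCopySketch index = new IndexCopySketch();
        index.indexWrite("a");
        index.indexWrite("b");
        Set<String> snapshot = index.readWithCopy();
        Set<String> view = index.readWithoutCopy();
        index.indexRemove("a"); // happens after both "queries" returned
        System.out.println(snapshot.size() + " vs " + view.size());
    }
}
```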
You can set one of these options using the system property hazelcast.index.copy.behavior. The
following values, which are explained in the above paragraphs, can be set:
• COPY_ON_READ
• COPY_ON_WRITE
• NEVER
Usage of this system property is supported for BINARY and OBJECT in-memory
formats. Only in Hazelcast 3.8.7, it is also supported for NATIVE in-memory format.
You can also define custom attributes that may be referenced in predicates, queries and indexes.
Custom attributes can be defined by implementing a ValueExtractor. See the Custom Attributes
section for details.
You can use the keyword this as an attribute name while adding an index or creating a predicate. A
basic usage is shown below.
Predicates.sql("this = 'jones'")
Predicates.sql("this.age > 33")
The special attribute this acts on the value of a map entry. Typically, you do not need to specify it
while accessing a property of an entry’s value, since its presence is implicitly assumed if the special
attribute __key is not specified.
You can change the size of thread pool dedicated to query operations using the pool-size property.
Each query consumes a single thread from a Generic Operations ThreadPool on each Hazelcast
member - let’s call it the query-orchestrating thread. That thread is blocked throughout the whole
execution-span of a query on the member.
The query-orchestrating thread uses the threads from the query-thread pool in the following cases:
See the Filtering with Paging Predicates section and System Properties appendix for information on
paging predicates and for description of the above system property.
<hazelcast>
...
<executor-service name="hz:query">
<pool-size>100</pool-size>
</executor-service>
...
</hazelcast>
When dealing with the query requests coming from the clients to your members, Hazelcast offers
the following system properties to tune your thread pools:
• hazelcast.clientengine.query.thread.count which is the number of threads to process query
requests coming from the clients. Its default value is the number of cores.
If there are a lot of query requests from the clients, you may want to increase the value of
hazelcast.clientengine.query.thread.count. In addition to this tuning, you may also consider
increasing the value of hazelcast.clientengine.thread.count if the CPU load in your system is not
high and there is plenty of free memory.
class Motorbike {
Wheel[] wheels;
}
class Wheel {
String name;
}
In order to query a single element of a collection/array, you can execute the following query:
It is also possible to query a collection/array using the any semantic as shown below:
The exact same query may be executed using the SQL predicate as shown below:
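As a rough plain-Java picture of the two addressing forms: an indexed path such as wheels[0].name (used with Predicates.equal later in this section) looks at one position only, while wheels[any].name matches if any element qualifies. The sketch below is not Hazelcast code; it assumes non-null arrays and elements, and its class and method names are hypothetical.

```java
import java.util.Arrays;

final class CollectionQuerySketch {
    static final class Wheel {
        final String name;

        Wheel(String name) {
            this.name = name;
        }
    }

    static final class Motorbike {
        final Wheel[] wheels;

        Motorbike(Wheel... wheels) {
            this.wheels = wheels;
        }
    }

    // wheels[index].name = expected: only the element at the given position is checked
    static boolean elementNameEquals(Motorbike bike, int index, String expected) {
        return index < bike.wheels.length && expected.equals(bike.wheels[index].name);
    }

    // wheels[any].name = expected: a match on any element is enough
    static boolean anyNameEquals(Motorbike bike, String expected) {
        return Arrays.stream(bike.wheels).anyMatch(wheel -> expected.equals(wheel.name));
    }

    public static void main(String[] args) {
        Motorbike bike = new Motorbike(new Wheel("front-wheel"), new Wheel("rear-wheel"));
        System.out.println(elementNameEquals(bike, 0, "rear-wheel"));
        System.out.println(anyNameEquals(bike, "rear-wheel"));
    }
}
```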
Hazelcast requires all elements of a collection to have the same type. Considering
and expanding the above example:
• If you have a wheels collection attribute, all of its elements must be of the Wheel
type; subclasses of Wheel are not allowed.
• Let’s say you have added a seats collection attribute whose elements are Seat objects.
Then all of its elements must be of this concrete Seat type.
So, you may have collections of different types in your map. However, each
collection’s elements must be of the same concrete type within that collection
attribute.
You can also create an index using a query in collections and arrays.
Please note that in order to leverage the index, the attribute name used in the query has to be the
same as the one used in the index definition.
<hazelcast>
...
<indexes>
<index type="HASH">
<attributes>
<attribute>wheels[any].name</attribute>
</attributes>
</index>
</indexes>
...
</hazelcast>
The following query, however, does NOT leverage the index, since it does not use exactly the same
attribute name that was used in the index:
Predicates.equal("wheels[0].name", "front-wheel")
In order to use the index in the case mentioned above, you have to create another index, as shown
below:
<hazelcast>
...
<indexes>
<index type="HASH">
<attributes>
<attribute>wheels[0].name</attribute>
</attributes>
</index>
</indexes>
...
</hazelcast>
Handling of corner cases may be a bit different than in a programming language like Java.
Let’s have a look at the following examples in order to understand the differences. To make the
analysis simpler, let’s assume that there is only one Motorbike object stored in a Hazelcast Map.
Looking at examples 4, 6 and 8, we can also easily notice that it is impossible to distinguish which
part of the expression was null. If we execute the following query wheels[1].name = null, it may be
evaluated to true because:
• name attribute of the wheels[1] object is null.
In order to make the query unambiguous, extra conditions would have to be added, e.g., wheels !=
null AND wheels[1].name = null.
A custom attribute is a "synthetic" attribute that does not exist as a field or a getter in the object
that it is extracted from. Thus, it is necessary to define the policy on how the attribute is supposed
to be extracted. Currently the only way to extract a custom attribute is to implement a
com.hazelcast.query.extractor.ValueExtractor that encompasses the extraction logic.
Custom Attributes are compatible with all Hazelcast serialization methods, including the Portable
serialization.
Portable serialization is a special kind of serialization where there is no need to have the class of
the serialized object on the classpath in order to read its attributes. That is the reason why the
target object passed to the ValueExtractor.extract() method is not of the exact type that has been
stored. Instead, an instance of a com.hazelcast.query.extractor.ValueReader is passed. ValueReader
enables reading the attributes of a Portable object in a generic and type-agnostic way. It contains
two methods:
• read(String path, ValueCollector<T> collector) - enables passing all results directly to the
ValueCollector.
It sounds counter-intuitive, but a single extraction may return multiple values when arrays or
collections are involved. Let’s have a look at the following data structure in pseudo-code:
class Motorbike {
Wheel[] wheel;
}
class Wheel {
String name;
}
Let’s assume that we want to extract the names of all wheels from a single motorbike object. Each
motorbike has two wheels so there are two names for each bike. In order to return both values
from the extraction operation, collect them separately using the ValueCollector. Collecting multiple
values in this way allows you to operate on these multiple values as if they were single values
during the evaluation of the predicates.
Let’s assume that we registered a custom extractor with the name wheelName and executed the
following query: wheelName = front-wheel.
The extraction may return up to two wheel names for each Motorbike since each Motorbike has up to
two wheels. In such a case, it is enough if a single value evaluates the predicate’s condition to true
to return a match, so it returns a Motorbike if "any" of the wheels matches the expression.
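The "any value matches" behavior can be sketched without Hazelcast on the classpath. The Collector interface below is a local stand-in defined only for this illustration (it is not the Hazelcast ValueCollector), and the other names are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiValueExtractionSketch {
    // Local stand-in for a value collector; receives every extracted value.
    interface Collector<T> {
        void addObject(T value);
    }

    // A single extraction emits one value per wheel.
    static void extractWheelNames(String[] wheelNames, Collector<String> collector) {
        for (String name : wheelNames) {
            collector.addObject(name);
        }
    }

    // wheelName = expected: true when any of the collected values equals the expected one
    static boolean matches(String[] wheelNames, String expected) {
        List<String> collected = new ArrayList<>();
        extractWheelNames(wheelNames, collected::add);
        return collected.contains(expected);
    }

    public static void main(String[] args) {
        String[] wheels = {"front-wheel", "rear-wheel"};
        System.out.println(matches(wheels, "front-wheel"));
    }
}
```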
A ValueExtractor may use a custom argument if it is specified in the query. The custom argument
may be passed within the square brackets located after the name of the custom attribute, e.g.,
customAttribute[argument].
Let’s have a look at the following query: currency[incoming] == EUR. Here, currency is a custom
attribute that uses a com.test.CurrencyExtractor for extraction.
The string incoming is an argument that is passed to the ArgumentParser during the extraction. The
parser parses the string according to its custom logic and it returns a parsed object. The parsed
object may be a single object, array, collection, or any arbitrary object. It is up to the ValueExtractor
implementation to understand the semantics of the parsed argument object.
For now it is not possible to register a custom ArgumentParser, thus a default parser is used. It
follows a pass-through semantic, which means that the string located in the square brackets is
passed "as is" to the ValueExtractor.extract() method.
Please note that using square brackets within the argument string is not allowed.
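How an attribute reference such as currency[incoming] breaks down into a name and a pass-through argument can be sketched as follows. This is a hypothetical helper, not part of Hazelcast; in particular, rejecting nested brackets with an exception is a choice made for this sketch only.

```java
final class AttributeArgumentSketch {
    // Splits "currency[incoming]" into {"currency", "incoming"}; without brackets: {name, null}.
    static String[] split(String attribute) {
        int open = attribute.indexOf('[');
        if (open < 0) {
            return new String[] {attribute, null};
        }
        if (!attribute.endsWith("]")) {
            throw new IllegalArgumentException("Missing closing bracket: " + attribute);
        }
        // The argument is passed through "as is"; it is not interpreted here.
        String argument = attribute.substring(open + 1, attribute.length() - 1);
        if (argument.indexOf('[') >= 0 || argument.indexOf(']') >= 0) {
            throw new IllegalArgumentException("Square brackets are not allowed in the argument");
        }
        return new String[] {attribute.substring(0, open), argument};
    }

    public static void main(String[] args) {
        String[] parts = split("currency[incoming]");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```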
The following snippet demonstrates how to define a custom attribute using a ValueExtractor.
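AttributeConfig attributeConfig = new AttributeConfig();
attributeConfig.setName("currency");
attributeConfig.setExtractorClassName("com.bank.CurrencyExtractor");
// Register the attribute in the map's initial configuration
MapConfig mapConfig = new MapConfig("trades");
mapConfig.addAttributeConfig(attributeConfig);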
currency is the name of the custom attribute that will be extracted using the CurrencyExtractor class.
Keep in mind that an extractor may not be added after the map has been instantiated. All extractors
have to be defined upfront in the map’s initial configuration.
The following snippet demonstrates how to define a custom attribute in the Hazelcast XML
Configuration.
<hazelcast>
...
<map name="trades">
<attributes>
<attribute extractor-class-name="com.bank.CurrencyExtractor">currency</attribute>
</attributes>
</map>
...
</hazelcast>
Analogous to the example above, currency is the name of the custom attribute that will be extracted
using the CurrencyExtractor class.
Please note that an attribute name may begin with an ASCII letter [A-Za-z] or digit [0-9] and may
contain ASCII letters [A-Za-z], digits [0-9] or underscores later on.
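The naming rule above translates directly into a regular expression. The helper below is a hypothetical convenience for checking names up front; Hazelcast performs its own validation.

```java
import java.util.regex.Pattern;

final class AttributeNameSketch {
    // First character: ASCII letter or digit; afterwards: ASCII letters, digits or underscores.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9][A-Za-z0-9_]*");

    static boolean isValid(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("currency_2"));
        System.out.println(isValid("_currency"));
    }
}
```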
The name of the attribute used in the index definition has to match the one used in the attributes
configuration.
Defining indexes with extraction arguments is allowed, as shown in the example below:
<hazelcast>
...
<indexes>
<!-- custom attribute without an extraction argument -->
<index>
<attributes>
<attribute>currency</attribute>
</attributes>
</index>
<!-- custom attribute using an extraction argument -->
<index>
<attributes>
<attribute>currency[incoming]</attribute>
</attributes>
</index>
</indexes>
...
</hazelcast>
11.4. Aggregations
Aggregations let you compute the value of a function (e.g., sum or max) over the stored map
entries. The computation is performed in a fully distributed manner, so no data other than the
computed function value is transferred to the caller, making the computation fast.
If the in-memory format of your data is NATIVE, aggregations always run on the
partition threads. If the data is of type BINARY or OBJECT, they also mostly run on the
partition threads. However, they may run on the separate query threads to avoid
blocking partition threads (if there are no ongoing migrations).
1. accumulate()
2. combine()
3. aggregate()
These callbacks enable releasing the state that might have been initialized and stored in the
Aggregator - to reduce the network traffic.
Each phase is described below. See also the Aggregator Javadoc for the API’s details.
Accumulation:
During the accumulation phase each Aggregator accumulates all entries passed to it by the query
engine. It accumulates only those pieces of information that are required to calculate the
aggregation result in the last phase - that’s implementation specific.
Combination:
Since aggregation is executed in parallel on each partition of the cluster, the results need to be
combined after the accumulation phase in order to be able to calculate the final result.
In case of the DoubleAverage aggregation, the aggregator would sum up all the sums of the elements
and all the counts.
Aggregation:
Aggregation is the last phase that calculates the final result from the results accumulated and
combined in the preceding phases.
In case of the DoubleAverage aggregation, the Aggregator would just divide the sum of the elements
by their count (if non-zero).
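The three phases can be walked through in plain Java for the average case. This sketch does not use the Hazelcast Aggregator API: the class is hypothetical, partitions are modeled as lists of salaries, and everything runs sequentially in one JVM.

```java
import java.util.List;

final class AverageSketch {
    private long sum;
    private long count;

    // Accumulation: keep only what is needed for the final result.
    void accumulate(long salary) {
        sum += salary;
        count++;
    }

    // Combination: add up the sums and the counts of another aggregator.
    void combine(AverageSketch other) {
        sum += other.sum;
        count += other.count;
    }

    // Aggregation: divide the sum by the count, if non-zero.
    Double aggregate() {
        if (count == 0) {
            return null;
        }
        return (double) sum / count;
    }

    // One aggregator per partition, then combine them and compute the final value.
    static Double average(List<List<Long>> partitions) {
        AverageSketch combined = new AverageSketch();
        for (List<Long> partition : partitions) {
            AverageSketch perPartition = new AverageSketch();
            partition.forEach(perPartition::accumulate);
            combined.combine(perPartition);
        }
        return combined.aggregate();
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(List.of(50000L, 140000L), List.of(80000L))));
    }
}
```

Only a sum and a count travel between the phases, which is why the amount of transferred data stays small regardless of the number of entries.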
Aggregations are available on com.hazelcast.map.IMap only. IMap offers the method aggregate to
apply the aggregation logic on the map entries. This method can be called with or without a
predicate. You can refer to its Javadoc to see the method details.
private static void simpleCustomAverageAggregation(IMap<String, FAEmployee> employees) {
    System.out.println("Calculating salary average");
    Double avgSalary = employees.aggregate(new Aggregator<Map.Entry<String, FAEmployee>, Double>() {
        // running totals kept by this aggregator instance
        protected long sum;
        protected long count;

        @Override
        public void accumulate(Map.Entry<String, FAEmployee> entry) {
            count++;
            sum += entry.getValue().getSalaryPerMonth();
        }

        @Override
        public void combine(Aggregator aggregator) {
            this.sum += this.getClass().cast(aggregator).sum;
            this.count += this.getClass().cast(aggregator).count;
        }

        @Override
        public Double aggregate() {
            if (count == 0) {
                return null;
            }
            return ((double) sum / (double) count);
        }
    });
    System.out.println("Overall average salary: " + avgSalary);
}
• the accumulate() method calculates the sum and count of the elements
• the combine() method combines the results from all the accumulations
• count
• distinct
• bigDecimal sum/avg/min/max
• bigInteger sum/avg/min/max
• double sum/avg/min/max
• integer sum/avg/min/max
• long sum/avg/min/max
• number avg
• comparable min/max
• fixedPointSum, floatingPointSum
To use any of these Aggregators, instantiate them using the Aggregators factory class.
Each built-in Aggregator can also navigate to an attribute of the object passed to the accumulate()
method (via reflection). For example, Aggregators.distinct("address.city") extracts the
address.city attribute from the object passed to the Aggregator and accumulates the extracted value.
On each partition, after the entries have been passed to the aggregator, the accumulation runs in
parallel. This means that the aggregator is cloned and each clone receives a subset of the entries
received from a partition. The accumulation phase then runs in all of the cloned aggregators and, at
the end, their results are combined into a single accumulation result. This speeds up the processing
by at least a factor of two, even for simple aggregations. If the accumulation logic is heavier, the
speed-up may be more significant.
To switch the accumulation to sequential mode, set the
hazelcast.aggregation.accumulation.parallel.evaluation property to false (it is set to true by
default).
11.5. Projections
There are cases where instead of sending all the data returned by a query from a member, you
want to transform (strip down) each result object in order to avoid redundant network traffic.
For example, you select all employees based on some criteria, but you want to return only their
names instead of the whole Employee objects. This is easy to do with the Projection API.
The Projection API provides the method transform() which is called on each result object. Its result
is then gathered as the final query result entity. You can refer to the Projection Javadoc for the API’s
details.
Projections are available on com.hazelcast.core.IMap only. IMap offers the method project to apply
the projection logic on the map entries. This method can be called with or without a predicate. See
its Javadoc for the method details.
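public class Employee implements Serializable {
    private String name;

    public Employee() {
    }

    public String getName() {
        return name;
    }
}
To return just the names of the Employees, you can run the query in the following way:
Collection<String> names = employees.project(new Projection<Map.Entry<String, Employee>, String>() {
    @Override
    public String transform(Map.Entry<String, Employee> entry) {
        return entry.getValue().getName();
    }
}, somePredicate);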
• singleAttribute
• multiAttribute
The singleAttribute Projection enables extracting a single attribute from an object (via reflection).
For example, Projections.singleAttribute("address.city") extracts the address.city attribute from
the object passed to the Projection.
The multiAttribute Projection enables extracting multiple attributes from an object (via
reflection). For example, Projections.multiAttribute("address.city", "postalAddress.city") extracts
both attributes from the object passed to the Projection and returns them in an Object[] array.
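To make the idea of navigating a dotted attribute path via reflection more concrete, here is a minimal plain-Java sketch. It is not Hazelcast's extraction code: AttributePathSketch, extract(), extractAll() and the Employee and Address classes are hypothetical names used only for illustration, and the sketch assumes that every path segment maps to a public getter.

```java
import java.lang.reflect.Method;

/** Plain-Java illustration of dotted attribute paths; not Hazelcast's extraction code. */
final class AttributePathSketch {
    /** Follows a path such as "address.city" by calling public getters reflectively. */
    static Object extract(Object target, String path) {
        Object current = target;
        for (String segment : path.split("\\.")) {
            if (current == null) {
                return null; // assumption: a null link ends the walk with null
            }
            String getter = "get" + Character.toUpperCase(segment.charAt(0)) + segment.substring(1);
            try {
                Method method = current.getClass().getMethod(getter);
                method.setAccessible(true);
                current = method.invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot read '" + segment + "' in " + path, e);
            }
        }
        return current;
    }

    /** Extracts several paths and returns the values in the order the paths were given. */
    static Object[] extractAll(Object target, String... paths) {
        Object[] values = new Object[paths.length];
        for (int i = 0; i < paths.length; i++) {
            values[i] = extract(target, paths[i]);
        }
        return values;
    }

    // Hypothetical domain classes used only for this illustration.
    static final class Address {
        private final String city;
        Address(String city) { this.city = city; }
        public String getCity() { return city; }
    }

    static final class Employee {
        private final Address address;
        private final Address postalAddress;
        Employee(Address address, Address postalAddress) {
            this.address = address;
            this.postalAddress = postalAddress;
        }
        public Address getAddress() { return address; }
        public Address getPostalAddress() { return postalAddress; }
    }
}
```

Here extract() plays the role of a single-attribute projection and extractAll() the role of a multi-attribute projection that returns an Object[] array.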
11.6. Continuous Query Cache
A continuous query cache is used to cache the result of a continuous query. After the construction
of a continuous query cache, all changes on IMap are asynchronously reflected to this cache via
events. This makes the cache an asynchronously updated view of the IMap. You can create a
continuous query cache on either a client or a member.
A continuous query cache is beneficial when you need to query the distributed IMap data
frequently and quickly. With a continuous query cache, the result of the query is always
ready and local to the application.
The following code snippet shows how you can access a continuous query cache from a member.
The following code snippet shows how you can access a continuous query cache from the client
side. The difference in this code from the member side code above is that you configure and
instantiate a client instance instead of a member instance.
11.6.4. Features of Continuous Query Cache
The following features of continuous query cache are valid for both the member and client:
• The initial query that is run on the existing IMap data during the continuous query cache
construction can be enabled/disabled according to the supplied predicate via
QueryCacheConfig.setPopulate().
• Continuous query cache allows you to run queries with indexes and perform event batching
and coalescing.
• A continuous query cache is evictable. Note that a continuous query cache has a default
maximum capacity of 10000. If you need a non-evictable cache, you should configure the
eviction via QueryCacheConfig.setEvictionConfig().
• IMap events are reflected in continuous query cache in the same order as they were generated
on map entries. Since events are created on entries stored in partitions, ordering of events is
maintained based on the ordering within the partition. You can add listeners to capture lost
events using EventLostListener and you can recover lost events with the method
QueryCache.tryRecover(). Recovery of lost events largely depends on the size of the buffer on
Hazelcast members. Default buffer size is 16 per partition, i.e., 16 events per partition can be
maintained in the buffer. If the event generation is high, setting the buffer size to a higher
number provides better chances of recovering lost events. You can set buffer size with
QueryCacheConfig.setBufferSize(). You can use the following example code for a recovery case.
• You can populate a continuous query cache with only the keys of its entries and retrieve the
subsequent values directly via QueryCache.get() from the underlying IMap. This helps to
decrease the initial population time when the values are very large.
You can configure continuous query cache declaratively or programmatically; the latter is mostly
explained in the previous section. The parent configuration element is <query-caches> which should
be placed within your <map> configuration. You can create your query caches using the <query-
cache> sub-element under <query-caches>.
<hazelcast>
    ...
    <map>
        <query-caches>
            <query-cache name="myContQueryCache">
                <include-value>true</include-value>
                <predicate type="class-name">com.hazelcast.examples.ExamplePredicate</predicate>
                <entry-listeners>
                    <entry-listener>...</entry-listener>
                </entry-listeners>
                <in-memory-format>BINARY</in-memory-format>
                <populate>true</populate>
                <coalesce>false</coalesce>
                <batch-size>2</batch-size>
                <delay-seconds>3</delay-seconds>
                <buffer-size>32</buffer-size>
                <eviction size="1000" max-size-policy="ENTRY_COUNT" eviction-policy="LFU"/>
                <indexes>
                    <index>
                        <attributes>
                            <attribute>age</attribute>
                        </attributes>
                    </index>
                </indexes>
            </query-cache>
        </query-caches>
    </map>
    ...
</hazelcast>
• include-value: Specifies whether the value will be cached too. Its default value is true.
• predicate: Predicate to filter events which are applied to the query cache.
• entry-listeners: Adds listeners (listener classes) for your query cache entries. See the
Registering Map Listeners section.
• in-memory-format: Type of the data to be stored in your query cache. See the Setting In-Memory
Format section. Its default value is BINARY.
• populate: Specifies whether the initial population of your query cache is enabled. Its default
value is true.
• coalesce: Specifies whether the coalescing of your query cache is enabled. Its default value is
false.
• delay-seconds: Minimum time in seconds that an event waits in the member’s buffer. Its default
value is 0.
• batch-size: Batch size used to determine the number of events sent in a batch to your query
cache. Its default value is 1.
• buffer-size: Maximum number of events which can be stored in a partition buffer. Its default
value is 16.
• eviction: Configuration for the eviction of your query cache. See the Configuring Map Eviction
section.
• indexes: Indexes for your query cache defined by using this element’s <index> sub-elements. See
the Configuring IMap Indexes section.
Please take the following configuration considerations and publishing logic into account:
• If delay-seconds is equal to or smaller than 0, then batch-size has no effect. Each time there is
an event, all the entries in the buffer are pushed to the subscriber.
• If coalesce is set to true, the buffer is checked for an event with the same key; if one exists,
it is overridden by the current event. Then:
◦ The current size of the buffer is checked: if it is equal to or larger than batch-size, then
batch-size events are pushed to the subscriber. Otherwise, no events are sent.
◦ After the batch-size check, delay-seconds is checked. The buffer is scanned
from the oldest to the youngest entries; all the entries that are older than delay-seconds are
pushed to the subscriber.
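The publishing rules above can be modeled with a small plain-Java sketch. It is a simplification under stated assumptions, not Hazelcast's implementation: PublishBufferSketch, Event and onEvent() are hypothetical names, time is passed in explicitly as seconds, and "overridden" is modeled as dropping the older event for the same key and appending the new one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of the publishing rules; not Hazelcast's implementation. */
final class PublishBufferSketch {
    /** A buffered event with its creation time in seconds (hypothetical shape). */
    static final class Event {
        final String key;
        final String value;
        final long createdAtSeconds;

        Event(String key, String value, long createdAtSeconds) {
            this.key = key;
            this.value = value;
            this.createdAtSeconds = createdAtSeconds;
        }
    }

    private final boolean coalesce;
    private final int batchSize;
    private final long delaySeconds;
    private final List<Event> buffer = new ArrayList<>();

    PublishBufferSketch(boolean coalesce, int batchSize, long delaySeconds) {
        this.coalesce = coalesce;
        this.batchSize = batchSize;
        this.delaySeconds = delaySeconds;
    }

    /** Buffers one event and returns the events pushed to the subscriber as a result. */
    List<Event> onEvent(Event event, long nowSeconds) {
        if (coalesce) {
            // Assumption: "overridden" is modeled as dropping the older event for the key.
            buffer.removeIf(buffered -> buffered.key.equals(event.key));
        }
        buffer.add(event);

        List<Event> pushed = new ArrayList<>();
        if (delaySeconds <= 0) {
            // batch-size has no effect: everything is flushed on every event.
            pushed.addAll(buffer);
            buffer.clear();
            return pushed;
        }
        if (buffer.size() >= batchSize) {
            // Push exactly batch-size events, oldest first.
            List<Event> batch = buffer.subList(0, batchSize);
            pushed.addAll(batch);
            batch.clear();
        }
        // Then push every remaining event that is older than delay-seconds, oldest first.
        for (Iterator<Event> it = buffer.iterator(); it.hasNext();) {
            Event buffered = it.next();
            if (nowSeconds - buffered.createdAtSeconds > delaySeconds) {
                pushed.add(buffered);
                it.remove();
            }
        }
        return pushed;
    }
}
```

With coalesce enabled, batch-size 2 and delay-seconds 3, two updates to the same key followed by an update to a second key produce a single batch that contains only the latest value of the first key and the event for the second key.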
11.7.1. Motivation
We’ve decided to deprecate the MapReduce framework in Hazelcast IMDG 3.8. MapReduce support
was completely removed in Hazelcast IMDG 4.0. The MapReduce framework provided the
distributed computing model and it was used to back the old Aggregations system. Unfortunately,
the implementation didn’t live up to the expectations and adoption wasn’t high, so it never got out
of Beta status. Apart from that, the current shift in development away from M/R-like processing to a
more near-realtime, streaming approach left us with the decision to deprecate and finally remove
the MapReduce framework from Hazelcast IMDG. With that said, we want to introduce the
successors and replacements: fast Aggregations on top of the Query infrastructure and the Hazelcast Jet
distributed computing platform.
MapReduce is a very powerful tool; however, it’s demanding in terms of space, time and bandwidth.
We realized that we don’t need so much power when we simply want to find out a simple metric
such as the number of entries matching a predicate. Therefore, the built-in aggregations were
rebuilt on top of the existing Query infrastructure (count, sum, min, max, mean, variance) which
automatically leverages any matching query index. The aggregations are computed in two phases:
This approach is not as flexible as a full-blown M/R system because the 2nd phase runs on a single
member and its input can be massive in some use cases. The member doing the 2nd step needs enough
capacity to hold all intermediate results from all members from the 1st step, but in practice it is
sufficient for many aggregation tasks like "find average" or "find highest" and other common examples.
• improved performance
• simplified API
See the Aggregations section for examples. If you need a more powerful tool like MapReduce, then
there is Hazelcast Jet. See the Hazelcast Jet reference manual and website for more information.
Hazelcast has native support for aggregation operations on the contents of its distributed data
structures. They operate on the assumption that the aggregating function is commutative and
associative, which allows the two-tiered approach where first the local data is aggregated, then all
the local subresults sent to one member, where they are combined and returned to the user. This
approach works quite well as long as the result is of manageable size. Many interesting
aggregations produce an O(1) result and for those, the native aggregations are a good match.
The main area where native aggregations may not be sufficient is operations that group the
data by key and produce results of size O(keyCount). The architecture of Hazelcast aggregations is
not well adapted to this use case, although it still works even for moderately-sized results (up to 100
MB, as a ballpark figure). Beyond these numbers, and whenever something more than a single
aggregation step is needed, Jet becomes the preferred choice. In the mentioned use case Jet helps
because it doesn’t send the entire hashtables in serialized form and materialize all the results on
the user’s machine, but rather streams the key-value pairs directly into a target IMap. Since it is a
distributed structure, it doesn’t focus its load on a single member.
Jet’s DAG paradigm offers much more than the basic map-reduce-combine cascade. Among other
setups, it can compose several such cascades and also perform co-grouping, joining and many other
operations in complex combinations.
12. CP Subsystem
CP Subsystem operates in the unsafe mode by default without the strong
consistency guarantee. See the CP Subsystem Unsafe Mode section for more
information. You should set the CP member count configuration to a positive
number to enable CP Subsystem and use it with the strong consistency
guarantee. See the CP Subsystem Configuration section for details.
CP Subsystem is a component of a Hazelcast cluster that builds a strongly consistent layer for a set
of distributed data structures. Its APIs can be used for implementing distributed coordination use
cases, such as leader election, distributed locking, synchronization, and metadata management. It is
accessed via HazelcastInstance.getCPSubsystem(). Its data structures are CP with respect to the CAP
principle, i.e., they always maintain linearizability and prefer consistency over availability during
network partitions. Besides network partitions, CP Subsystem withstands server and client failures.
Currently, CP Subsystem contains only the implementations of Hazelcast’s concurrency APIs. Since
these APIs do not maintain large states, not all members of a Hazelcast cluster necessarily take
part in CP Subsystem. The number of Hazelcast members that take part in CP Subsystem is specified
with CPSubsystemConfig.setCPMemberCount(int). Say that it is configured as N. Then, when a Hazelcast
cluster starts, the first N members form CP Subsystem. These members are called CP members and
they can also contain data for the other regular AP Hazelcast data structures, such as IMap and ISet.
Data structures in CP Subsystem run in CP groups. Each CP group elects its own Raft leader and
runs the Raft consensus algorithm independently. CP Subsystem runs 2 CP groups by default:
• The first one is the METADATA CP group which is an internal CP group responsible for
managing CP members and CP groups. It is initialized during cluster startup if CP Subsystem is
enabled via CPSubsystemConfig.setCPMemberCount(int).
Besides these 2 predefined CP groups, custom CP groups can be created at run-time while fetching
the CP data structure proxies. For instance, if a CP IAtomicLong is created by calling
.getAtomicLong("myAtomicLong@myGroup"), first a new CP group is created with the name myGroup and
then myAtomicLong is initialized on this custom CP group.
This design implies that each CP member can participate in more than one CP group. CP Subsystem
runs a periodic background task to ensure that each CP member performs the Raft leadership role
for a roughly equal number of CP groups. For instance, if there are 3 CP members and 3 CP groups,
each CP member becomes Raft leader for only 1 CP group. If one more CP group is created, then one
of the CP members gets the Raft leader role for 2 CP groups. This is done because Raft is a leader-
based consensus algorithm. A Raft leader node becomes responsible for handling incoming
requests from callers and replicating them to follower nodes. If a CP member gets the Raft
leadership role for too many CP groups compared to other CP members, it can turn into a
bottleneck.
that this configuration does not have to be the same as the CP member count. Namely, the
number of CP members in CP Subsystem can be larger than the configured CP group size. CP groups
usually consist of an odd number of CP members between 3 and 7. Operations are committed and
executed only after they are successfully replicated to the majority of CP members in a CP group.
An odd number of CP members is more advantageous than an even number because of the quorum or
majority calculations. For a CP group of N members, the majority is calculated as N / 2 + 1. For
instance, in a CP group of 5 CP members, operations are committed when they are replicated to at
least 3 CP members. This CP group can tolerate the failure of 2 CP members and remain available.
However, if we run a CP group with 6 CP members, it can still tolerate the failure of 2 CP members
because the majority of 6 is 4. Therefore, it does not improve the degree of fault tolerance
compared to 5 CP members.
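The majority arithmetic above can be checked with a few lines of plain Java. This is only an illustration of the N / 2 + 1 formula with integer division; MajoritySketch and its method names are hypothetical and not part of the Hazelcast API.

```java
/** Plain-Java illustration of the majority formula; not part of the Hazelcast API. */
final class MajoritySketch {
    /** Majority of a CP group of the given size: N / 2 + 1 with integer division. */
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    /** Number of members that can fail while a majority is still alive. */
    static int toleratedFailures(int groupSize) {
        return groupSize - majority(groupSize);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.println(n + " members: majority " + majority(n)
                    + ", tolerated failures " + toleratedFailures(n));
        }
    }
}
```

Running the loop shows that 5 and 6 members both tolerate 2 failures, which is why adding a sixth member does not improve fault tolerance.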
By default, CP Subsystem works only in memory without persisting any state to disk. This means that a
crashed CP member cannot rejoin the cluster by restoring its previous state. Therefore,
crashed CP members pose a risk of gradually losing the majority of CP groups and eventually
causing the total loss of availability of CP Subsystem. To prevent such situations, crashed CP members
can be removed from CP Subsystem and replaced in CP groups with other available CP members.
This flexibility provides a good degree of fault tolerance at run-time. See the CP Subsystem
Configuration section and CP Subsystem Management section for more details. Moreover, CP
Subsystem Persistence enables more robustness. When it is enabled, CP members persist their local
state to stable storage and can restore their state after crashes. See the CP Subsystem Persistence
section for more details.
CPSubsystem cpSubsystem = hazelcastInstance.getCPSubsystem();
The CP data structure proxies differ from the other Hazelcast data structure
proxies in two aspects:
• The local Hazelcast member commits a CP member list which is different from other members'
committed CP member lists
• The local Hazelcast member fails to commit its discovered CP member list for any reason.
the CP Subsystem Management APIs section for more details.
CP Subsystem Persistence enables CP members to recover from crash scenarios. This capability
significantly improves the overall reliability of CP Subsystem. When it is enabled via
CPSubsystemConfig.setPersistenceEnabled(boolean), CP members persist their local state to stable
storage. When you restart the crashed CP members, they restore their local state and resume
working as if they had never crashed. CP Subsystem Persistence enables you to handle single or
multiple CP member crashes, or even whole cluster crashes, and guarantees that committed
operations are not lost after recovery. In other words, CP member crashes and restarts do not
create any consistency problem. As long as the majority of CP members are available after recovery, CP
Subsystem remains operational.
Please see the CP Subsystem Configuration section for the configuration options of CP Subsystem
Persistence.
When CP Subsystem Persistence is enabled, all Hazelcast cluster members create a sub-directory
under the base persistence directory which is specified via CPSubsystemConfig.getBaseDir(). This
means that AP Hazelcast members, which are the ones not marked as CP members during the CP
discovery process, create their persistence directories as well. Those members persist only the
information that they are not CP members. This is done because when a Hazelcast member starts
with CP Subsystem Persistence enabled, it checks if there is a CP persistence directory belonging to
itself. If it finds one, it skips the CP discovery process and initializes its CP member identity from
the persisted data. If it was an AP member before shutdown or crash, it restores this information
and starts as an AP member. Otherwise, it could think that the CP discovery process has not been
executed and trigger it, which would break CP Subsystem.
In light of this information, if you have both CP and AP members in your cluster
when CP Subsystem Persistence is enabled, and if you want to perform a cluster-
wide restart, you need to ensure that AP members are also restarted with their CP
persistence directories.
You can check the code sample below to see how CP Subsystem Persistence works in general. In this
code sample, we configure CP Subsystem with 3 CP members and also enable CP Subsystem
Persistence. We start 3 Hazelcast members with this configuration and update a CP IAtomicLong
instance. Each member creates a sub-directory for itself inside the default base CP Subsystem
Persistence directory and stores its local CP state there. Then, we terminate two of these members
as if they had crashed and restart only one of them. When we fetch the same IAtomicLong instance from
the restarted member and get its current value, we see that it returns the update that we made
before terminating these members. Please note that we make sure that the majority of CP
members is alive to keep CP Subsystem available after the restart.
instance1.getLifecycleService().terminate();
instance2.getLifecycleService().terminate();
instance1 = Hazelcast.newHazelcastInstance(config);
counter = instance1.getCPSubsystem().getAtomicLong("counter");
If the majority of CP members are permanently lost, CP Subsystem becomes unavailable. There is
no way to recover from this failure case while keeping the strong consistency guarantee. CP Subsystem
Management API contains a method to delete all CP Subsystem state on the remaining CP members
and start from scratch. CPSubsystemManagementService.reset() wipes and resets the whole CP
Subsystem state and initializes it as if the Hazelcast cluster is starting up for the first time. This
method deletes the persisted CP member states as well.
Hazelcast offers another persistence capability which is called Hot Restart Persistence. Hot Restart
Persistence is used for restarting a cluster with large AP data after a planned cluster shutdown or a
cluster-wide crash. Please note that CP Subsystem Persistence and Hot Restart Persistence
are separate features with different behaviors and reliability guarantees. For instance, CP
Subsystem Persistence guarantees that committed operations will be restored and the
linearizability semantics of the CP Subsystem data structures will be preserved on restarts.
However, Hot Restart Persistence may lose some of the acknowledged updates on AP data
structures, based on how you configure the fsync behavior for your persisted AP data structures.
Moreover, if you store AP and CP data in a single Hazelcast cluster and use both of the persistence
features, Hazelcast member restarts or cluster restarts can fail because of the Hot Restart
Persistence recovery semantics, even if the CP Subsystem Persistence recovery procedure is
successful, or vice-versa.
There is a significant behavioral difference during CP member shutdown depending on whether CP
Subsystem Persistence is enabled. When it is disabled (the default mode, in which CP Subsystem
works only in memory), a shutting down CP member is replaced with other available CP members
in all of its CP groups in order not to decrease or, more importantly, not to lose the majorities of
CP groups. This is because CP members keep their local state only in memory when CP Subsystem
Persistence is disabled, so a shut-down CP member cannot rejoin with its CP identity and
state. It is therefore better to remove it from CP Subsystem so as not to harm the availability of CP
groups. If there is no other available CP member to replace a shutting down CP member in a CP group,
that CP group’s size is reduced by 1 and its majority value is recalculated. On the other hand, when CP
Subsystem Persistence is enabled, a shut-down CP member can come back by restoring its CP state.
Therefore, it is not automatically removed from CP Subsystem when CP Subsystem Persistence is
enabled. It is up to you to remove shut-down CP members via
CPSubsystemManagementService.removeCPMember(String) if they will not come back.
• When CP Subsystem Persistence is disabled (the default mode), shutting down CP members are
removed from CP Subsystem and the CP group majority values are recalculated.
• When CP Subsystem Persistence is enabled, shutting down CP members are still kept in CP
Subsystem so they will be a part of the CP group majority calculations.
Moreover, there is a subtle point about concurrent shutdown of CP members when CP Subsystem
Persistence is disabled. If there are N CP members in CP Subsystem, HazelcastInstance.shutdown()
can be called on N-2 CP members concurrently. Once these N-2 CP members complete their
shutdown, the remaining 2 CP members must be shut down serially. Even though the shutdown API
can be called concurrently on multiple members, the METADATA CP group handles shutdown
requests serially. Therefore, it would be simpler to shut down CP members one by one, by calling
HazelcastInstance.shutdown() on the next CP member once the current CP member completes its
shutdown. This rule does not apply when CP Subsystem Persistence is enabled so you can shut
down your CP members concurrently if you enabled CP Subsystem Persistence. It is enough for
users to recall this rule while shutting down CP members when CP Subsystem Persistence is
disabled. If interested, you can read the rest of this paragraph to learn the reasoning behind this
rule. Each shutdown request internally requires a Raft commit to the METADATA CP group when CP
Subsystem Persistence is disabled. A CP member proceeds to shut down after it receives a response
to this commit. To be able to perform a Raft commit, the METADATA CP group must have its
majority up and running. When only 2 CP members are left after graceful shutdowns, the majority
of the METADATA CP group becomes 2. If the last 2 CP members shut down concurrently, one of
them is likely to perform its Raft commit faster than the other one and leave the cluster before the
other CP member completes its Raft commit. In this case, the last CP member waits for a response of
its commit attempt on the METADATA CP group, and times out eventually. This situation causes an
unnecessary delay on the shutdown process of the last CP member. On the other hand, when the
last 2 CP members shut down serially, the N-1th member receives the response of its commit after
its shutdown request is committed also on the last CP member. Then, the last CP member checks its
local data to notice that it is the last CP member alive, and proceeds with its shutdown without
attempting a Raft commit on the METADATA CP group.
• If a CP member leaves the Hazelcast cluster, it is not automatically removed from CP Subsystem
because CP Subsystem cannot determine with certainty whether that member has actually crashed or
just disconnected from the cluster. Therefore, absent CP members are still considered in majority
calculations and put the availability of CP Subsystem at risk. If you know for sure that
an absent CP member has crashed, you can remove that CP member from CP Subsystem via
CPSubsystemManagementService.removeCPMember(String). This API call removes the given CP
member from all CP groups and recalculates their majority values. If there is another available
CP member in CP Subsystem, the removed CP member is replaced with that one, or you can
promote an AP member of the Hazelcast cluster to the CP role via
CPSubsystemManagementService.promoteToCPMember().
• There might be a small window of unavailability after a CP member crash even if the majority
of CP members are still online. For instance, if a crashed CP member is the Raft leader for some
CP groups, those CP groups run a new leader election round to elect a new leader among
remaining CP group members. CP Subsystem API calls that internally hit those CP groups are
retried until they have new Raft leaders. If a failed CP member has the Raft follower role, it
causes a very minimal disruption since Raft leaders are still able to replicate and commit
operations with the majority of their CP group members.
• A CP member can encounter a network issue and disconnect from the cluster. If you remove
this CP member from CP Subsystem even though it is actually alive but only disconnected, you
should terminate it to prevent any accidental communication with the other CP members in CP
Subsystem.
• CP Subsystem can tolerate the failure of a minority of CP members and remain available. With N
CP members, the majority is N / 2 + 1, so at most N - (N / 2 + 1) CP members can fail. If more CP
members crash, CP Subsystem loses its availability. If CP Subsystem Persistence is enabled and the
majority of CP members come back online by successfully restarting some of the failed CP members,
CP Subsystem regains its availability. Otherwise, CP Subsystem has lost its majority irrevocably.
In this case, the only solution is to wipe out the whole CP Subsystem state by performing a
force-reset via CPSubsystemManagementService.reset().
• When the majority of the members of a non-METADATA CP group permanently crash, that CP group
cannot make progress anymore, even though other CP groups in CP Subsystem are running fine. Even a
new CP member cannot join this CP group, because membership changes also go through the
Raft consensus algorithm. For this reason, the only option is to force-destroy this CP group via
CPSubsystemManagementService.forceDestroyCPGroup(String). When this API is called, the CP
group is terminated non-gracefully without the Raft mechanics. After this API call, all existing
CP data structure proxies that talk to this CP group fail with CPGroupDestroyedException.
However, if a new proxy is created afterwards, then this CP group is re-created from scratch
with a new set of CP members. Losing the majority of a non-METADATA CP group can be likened to
a partition-loss scenario of AP Hazelcast. Please note that non-METADATA CP groups that have lost
their majority must be force-destroyed immediately, because they can block the METADATA CP
group from performing membership changes on CP Subsystem.
12.5. CP Sessions
For CP data structures that involve resource ownership management, such as Locks or Semaphores,
sessions are required to keep track of the liveness of callers. In this context, caller means an
entity that uses CP Subsystem APIs. It can be either a Hazelcast member or a client. A caller
creates a session before sending its very first session-based request to the CP group, such as a
Lock / Semaphore acquire. After creating a session on the CP group, the caller stores its session ID
locally and sends it alongside its session-based operations. A single session is used for all lock and
semaphore proxies of the caller. When a CP group receives a session-based operation, it checks the
validity of the session using the session ID information available in the operation. A session is valid
if it is still open in the CP group. An operation with a valid session ID is accepted as a new session
heartbeat. While a caller is idle, in other words, while it does not send any session-based operation
to the CP group for a while, it commits periodic heartbeats to the CP group in the background in
order to keep its session alive. This interval is specified in
CPSubsystemConfig.getSessionHeartbeatIntervalSeconds().
A session is closed when the caller does not touch the session during a predefined duration. In this
case, the caller is assumed to be crashed and all its resources are released automatically. This
duration is specified in CPSubsystemConfig.getSessionTimeToLiveSeconds(). See the CP Subsystem
Configuration section for recommendations to choose a reasonable session time-to-live duration.
Sessions offer a trade-off between liveness and safety. If you set a very small value using
CPSubsystemConfig.setSessionTimeToLiveSeconds(int), then a session owner could be considered
crashed very quickly and its resources can be released prematurely. On the other hand, if you set a
large value, a session could be kept alive for an unnecessarily long duration even if its owner
actually crashes. However, it is safer not to use a small session time-to-live duration.
If a session owner is known to be crashed, its session could be closed manually via
CPSessionManagementService.forceCloseSession(String, long).
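The session lifecycle described above (create, heartbeat, time-to-live expiry) can be modeled with a minimal plain-Java sketch. It is not Hazelcast's session implementation: SessionRegistrySketch and its methods are hypothetical names, time is passed in explicitly as seconds, and releasing the resources of a closed session is left out.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Plain-Java model of session heartbeats and expiry; not Hazelcast's implementation. */
final class SessionRegistrySketch {
    private final long timeToLiveSeconds;
    // Session ID -> last time (in seconds) the caller touched the session.
    private final Map<Long, Long> lastTouched = new HashMap<>();
    private long nextSessionId = 1;

    SessionRegistrySketch(long timeToLiveSeconds) {
        this.timeToLiveSeconds = timeToLiveSeconds;
    }

    /** Opens a new session and returns its ID. */
    long createSession(long nowSeconds) {
        long sessionId = nextSessionId++;
        lastTouched.put(sessionId, nowSeconds);
        return sessionId;
    }

    /** A session-based operation or heartbeat; returns false if the session is closed. */
    boolean touch(long sessionId, long nowSeconds) {
        if (!lastTouched.containsKey(sessionId)) {
            return false;
        }
        lastTouched.put(sessionId, nowSeconds);
        return true;
    }

    /** Closes every session not touched within the time-to-live; returns how many closed. */
    int closeExpired(long nowSeconds) {
        int closed = 0;
        Iterator<Map.Entry<Long, Long>> it = lastTouched.entrySet().iterator();
        while (it.hasNext()) {
            if (nowSeconds - it.next().getValue() > timeToLiveSeconds) {
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    boolean isOpen(long sessionId) {
        return lastTouched.containsKey(sessionId);
    }
}
```

The trade-off is visible in closeExpired(): a small time-to-live closes the session of a merely slow caller, while a large one keeps the session of a crashed caller open for longer.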
12.6. FencedLock
FencedLock is a linearizable, distributed, and reentrant implementation of java.util.concurrent.locks.Lock.
FencedLock is accessed via CPSubsystem.getLock(String). It is CP with respect to the CAP principle. It
works on top of the Raft consensus algorithm. It offers linearizability during crash-stop failures and
network partitions. If a network partition occurs, it remains available on at most one side of the
partition.
FencedLock works on top of CP sessions. See the CP Sessions section for more information about
CP sessions.
By default, FencedLock is reentrant. Once a caller acquires the lock, it can acquire the lock
reentrantly as many times as it wants in a linearizable manner. You can configure the reentrancy
behavior via FencedLockConfig. For instance, reentrancy can be disabled and FencedLock can work as
a non-reentrant mutex. You can also set a custom reentrancy limit. When the reentrancy limit is
already reached, FencedLock does not block a lock call. Instead, it fails with
LockAcquireLimitReachedException or a specified return value. Please check the locking methods to
see details about the behavior and FencedLock Configuration section for the configuration.
Distributed locks are unfortunately not equivalent to single-node mutexes because of the
complexities in distributed systems, such as uncertain communication patterns, and independent
and partial failures. In an asynchronous network, no lock service can guarantee mutual exclusion,
because there is no way to distinguish between a slow and a crashed process. Consider the
following scenario, where a Hazelcast client acquires a FencedLock, then hits a long GC pause. Since
it will not be able to commit session heartbeats while paused, its CP session will be eventually
closed. After this moment, another Hazelcast client can acquire this lock. If the first client wakes up
again, it may not immediately notice that it has lost ownership of the lock. In this case, multiple
clients think they hold the lock. If they attempt to perform an operation on a shared resource, they
can break the system. To prevent such situations, you can choose to use an infinite session timeout,
but then you will probably have to deal with liveness issues. Moreover, in the scenario above, even if
the first client actually crashes, the requests sent by the two clients can be reordered in the network
and hit the external resource in reverse order.
There is a simple solution for this problem. Lock holders are ordered by a monotonic fencing token,
which increments each time the lock is assigned to a new owner. This fencing token can be passed
to external services or resources to ensure sequential execution of the side effects performed by
lock holders.
The following diagram illustrates the idea. Client-1 acquires the lock first and receives 1 as its
fencing token. Then, it passes this token to the external service, which is our shared resource in this
scenario. Just after that, Client-1 hits a long GC pause and eventually loses ownership of the lock
because it fails to commit CP session heartbeats. Then, Client-2 steps in and acquires the lock.
Similar to Client-1, Client-2 passes its fencing token to the external service. After that, once Client-1
comes back alive, its write request will be rejected by the external service, and only Client-2 will be
able to safely talk to it.
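The check performed by the external service in this scenario can be sketched as follows. This is a minimal illustration, assuming the service simply remembers the highest fencing token it has seen; FencedResourceSketch and its methods are hypothetical names and not part of Hazelcast. On the Hazelcast side, the token would be the value returned by FencedLock.lockAndGetFence().

```java
// Hypothetical external service that guards a shared resource with fencing tokens.
final class FencedResourceSketch {
    private long highestTokenSeen = 0L; // assumes valid tokens are positive
    private String value;

    // Accepts a write only if its token is not older than the newest token seen so far.
    synchronized boolean write(long fencingToken, String newValue) {
        if (fencingToken < highestTokenSeen) {
            return false; // stale lock holder, e.g. a client resuming after a long GC pause
        }
        highestTokenSeen = fencingToken;
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }
}
```

Once the service has seen token 2 from Client-2, a delayed write from Client-1 carrying token 1 is rejected, so the side effects are applied in lock-ownership order.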
You can read more about the fencing token idea in Martin Kleppmann’s How to do distributed
locking blog post and Google’s Chubby paper. FencedLock integrates this idea with the
j.u.c.locks.Lock abstraction, excluding j.u.c.locks.Condition. newCondition() is not implemented
and throws UnsupportedOperationException.
All of the API methods in the new FencedLock abstraction offer exactly-once execution semantics.
For instance, even if a lock() call is internally retried because of a crashed CP member, the lock is
acquired only once. The same rule also applies to the other methods in the API.
12.7. Configuration
12.7.1. CP Subsystem Configuration
• session-time-to-live-seconds: Duration for a CP session to be kept alive after the last received
heartbeat. A CP session is closed if no session heartbeat is received during this duration. Session
TTL must be decided wisely. If a very low value is set, a CP session can be closed prematurely if
its owner Hazelcast instance temporarily loses connectivity to CP Subsystem because of a
network partition or a GC pause. In such an occasion, all CP resources of this Hazelcast instance,
such as FencedLock or ISemaphore, are released. On the other hand, if a very large value is set, CP
resources can remain assigned to an actually crashed Hazelcast instance for too long and
liveness problems can occur. CP Subsystem offers an API in CPSessionManagementService to deal
with liveness issues related to CP sessions. In order to prevent premature session expiration,
the session TTL can be set to a relatively large value and
CPSessionManagementService.forceCloseSession(String, long) can be called manually to close the CP
session of a crashed Hazelcast instance.
Default value is 5 seconds.
With the default configuration, missing CP members are automatically removed from CP
Subsystem after 4 hours. This feature is very useful in terms of fault tolerance when CP member
count is also configured to be larger than group size. In this case, a missing CP member will be
safely replaced in its CP groups with other available CP members in CP Subsystem. This
configuration also implies that no network partition is expected to be longer than the
configured duration.
If a missing CP member comes back alive after it is automatically removed from CP Subsystem
with this feature, that CP member must be terminated manually.
• base-dir: Specifies the parent directory where CP data is stored. You can use the default value,
or you can specify the value of another folder, but it is mandatory that base-dir element has a
value. This directory is created automatically if it does not exist.
base-dir is used as the parent directory, and a unique directory is created inside base-dir for
each CP member which uses the same base-dir. That means, base-dir is shared among multiple
CP members safely. This is especially useful for cloud environments where CP members
generally use a shared filesystem.
• data-load-timeout-seconds: Timeout duration for CP members to restore their data from disk. A
CP member fails its startup if it cannot complete its CP data restore process in the configured
duration.
Declarative Configuration:
<hazelcast>
...
<cp-subsystem>
<cp-member-count>7</cp-member-count>
<group-size>3</group-size>
<session-time-to-live-seconds>300</session-time-to-live-seconds>
<session-heartbeat-interval-seconds>5</session-heartbeat-interval-seconds>
<missing-cp-member-auto-removal-seconds>14400</missing-cp-member-auto-removal-seconds>
<fail-on-indeterminate-operation-state>false</fail-on-indeterminate-operation-state>
<persistence-enabled>true</persistence-enabled>
<base-dir>/custom-cp-dir</base-dir>
</cp-subsystem>
...
</hazelcast>
Programmatic Configuration:
config.getCPSubsystemConfig()
.setCPMemberCount(7)
.setGroupSize(3)
.setSessionTimeToLiveSeconds(300)
.setSessionHeartbeatIntervalSeconds(5)
.setMissingCPMemberAutoRemovalSeconds(14400)
.setFailOnIndeterminateOperationState(false);
• lock-acquire-limit: Maximum number of reentrant lock acquires. Once a caller acquires the
lock this many times, it will not be able to acquire the lock again, until it makes at least one
unlock() call.
By default, no upper bound is set for the number of reentrant lock acquires, which means that
once a caller acquires a FencedLock, all of its further lock() calls will succeed. However, for
instance, if you set lock-acquire-limit to 2, once a caller acquires the lock, it will be able to
acquire it once more, but its third lock() call will not succeed.
0 means there is no upper bound for the number of reentrant lock acquires.
Default value is 0.
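The effect of lock-acquire-limit can be modeled with a hold counter. The following is a single-threaded sketch under that assumption, not the FencedLock implementation; AcquireLimitSketch, tryAcquire() and release() are hypothetical names.

```java
// Hypothetical single-threaded model of lock-acquire-limit; not the FencedLock implementation.
final class AcquireLimitSketch {
    private final int acquireLimit; // 0 means no upper bound
    private int holdCount;

    AcquireLimitSketch(int acquireLimit) {
        this.acquireLimit = acquireLimit;
    }

    // Returns false instead of blocking when the caller has reached the limit.
    boolean tryAcquire() {
        if (acquireLimit > 0 && holdCount >= acquireLimit) {
            return false;
        }
        holdCount++;
        return true;
    }

    // Gives back one acquire, which makes room for a further reentrant acquire.
    void release() {
        if (holdCount == 0) {
            throw new IllegalStateException("lock is not held");
        }
        holdCount--;
    }
}
```

With a limit of 2, the first two acquires succeed and the third fails until one release is made. A limit of 1 behaves like a non-reentrant mutex, and a limit of 0 never rejects a reentrant acquire.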
Declarative Configuration:
<hazelcast>
...
<cp-subsystem>
...
<locks>
<fenced-lock>
<name>reentrant-lock</name>
<lock-acquire-limit>0</lock-acquire-limit>
</fenced-lock>
<fenced-lock>
<name>limited-reentrant-lock</name>
<lock-acquire-limit>10</lock-acquire-limit>
</fenced-lock>
<fenced-lock>
<name>non-reentrant-lock</name>
<lock-acquire-limit>1</lock-acquire-limit>
</fenced-lock>
</locks>
</cp-subsystem>
...
</hazelcast>
Programmatic Configuration:
config.getCPSubsystemConfig()
.addLockConfig(new FencedLockConfig("reentrant-lock", 0))
.addLockConfig(new FencedLockConfig("limited-reentrant-lock", 10))
.addLockConfig(new FencedLockConfig("non-reentrant-lock", 1));
must acquire permits before releasing them and it cannot release a permit that it has not
acquired. It means, you can acquire a permit from one thread and release it from another
thread using the same HazelcastInstance, but not different HazelcastInstances. In this mode,
acquired permits are automatically released upon failure of the holder HazelcastInstance. So
there is a minor behavioral difference to the j.u.c.Semaphore.release() method.
• initial-permits: Number of permits to initialize the Semaphore. If a positive value is set, the
Semaphore is initialized with the given number of permits.
Default value is 0.
Declarative Configuration:
<hazelcast>
...
<cp-subsystem>
...
<semaphores>
<cp-semaphore>
<name>jdk-compatible-semaphore</name>
<jdk-compatible>true</jdk-compatible>
</cp-semaphore>
<cp-semaphore>
<name>another-semaphore</name>
<jdk-compatible>false</jdk-compatible>
<initial-permits>5</initial-permits>
</cp-semaphore>
</semaphores>
</cp-subsystem>
...
</hazelcast>
Programmatic Configuration:
config.getCPSubsystemConfig()
.addSemaphoreConfig(new SemaphoreConfig("jdk-compatible-semaphore", true, 0))
.addSemaphoreConfig(new SemaphoreConfig("another-semaphore", false, 5));
Default value is 2000 milliseconds.
Default value is 5.
append entries requests. After a Raft leader sends an append entries request to a follower, it
will not send a subsequent append entries request either until the follower responds or this
timeout occurs. Backoff durations are increased exponentially if followers remain
unresponsive.
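An exponentially growing backoff of this kind can be sketched as below. The doubling factor and the upper bound are assumptions made for illustration only; the text above states just that the durations increase exponentially. AppendBackoffSketch and backoffMillis() are hypothetical names.

```java
// Hypothetical helper; the factor of 2 and the cap are assumptions, not Hazelcast constants.
final class AppendBackoffSketch {
    // Backoff before the leader sends another append entries request to a follower
    // that has left the given number of consecutive requests unanswered.
    static long backoffMillis(long baseMillis, int unansweredRequests, long maxMillis) {
        long backoff = baseMillis;
        for (int i = 0; i < unansweredRequests && backoff < maxMillis; i++) {
            backoff *= 2;
        }
        return Math.min(backoff, maxMillis);
    }
}
```

Starting from a base of 250 milliseconds and an assumed cap of 4000 milliseconds, the backoff grows as 250, 500, 1000, 2000 and then stays at 4000.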
Declarative Configuration:
<hazelcast>
...
<cp-subsystem>
...
<raft-algorithm>
<leader-election-timeout-in-millis>2000</leader-election-timeout-in-millis>
<leader-heartbeat-period-in-millis>5000</leader-heartbeat-period-in-millis>
<max-missed-leader-heartbeat-count>5</max-missed-leader-heartbeat-count>
<append-request-max-entry-count>100</append-request-max-entry-count>
<commit-index-advance-count-to-snapshot>10000</commit-index-advance-count-to-snapshot>
<uncommitted-entry-count-to-reject-new-appends>200</uncommitted-entry-count-to-reject-new-appends>
<append-request-backoff-timeout-in-millis>250</append-request-backoff-timeout-in-millis>
</raft-algorithm>
...
</cp-subsystem>
...
</hazelcast>
Programmatic Configuration:
config.getCPSubsystemConfig()
.getRaftAlgorithmConfig()
.setLeaderElectionTimeoutInMillis(2000)
.setLeaderHeartbeatPeriodInMillis(5000)
.setMaxMissedLeaderHeartbeatCount(5)
.setAppendRequestMaxEntryCount(100)
.setCommitIndexAdvanceCountToSnapshot(10000)
.setUncommittedEntryCountToRejectNewAppends(200)
.setAppendRequestBackoffTimeoutInMillis(250);
structures operate in the unsafe mode. In this mode, they use Hazelcast’s partitioning and lazy
replication mechanisms instead of CP Subsystem’s consensus mechanism. For more information
about the lazy replication mechanism of Hazelcast, see the Consistency and Replication Model
chapter.
The unsafe mode provides weaker consistency guarantees compared to when CP Subsystem is
enabled. For example, when you increment an IAtomicLong or acquire a FencedLock, just before
crash of a member, even though you receive a success response, the write operation (increment of
IAtomicLong or acquire of FencedLock) can be lost (which cannot happen when CP Subsystem is
enabled). For this reason, the unsafe mode is not recommended for use-cases requiring strong
consistency. It is more suitable for development or testing. You should take this limitation into
consideration if you use CP Subsystem in production with the unsafe mode.
Moreover, by default CP Subsystem works in memory without persisting any state to disk. It means
that a crashed CP member will not be able to recover by reloading its previous state. Therefore,
crashed CP members create a danger for gradually losing the majority of CP groups and eventually
total loss of the availability of CP Subsystem. To prevent such situations,
CPSubsystemManagementService offers a set of APIs. In addition, CP Subsystem Persistence can be
enabled to make CP members persist their local CP state to stable storage. Please see CP Subsystem
Persistence section for more details.
CP member additions and removals are internally handled by performing a single membership
change at a time. When multiple CP members are shutting down concurrently, their shutdown
process is executed serially. When a CP membership change is triggered, the METADATA CP group
creates a membership change plan for CP groups. Then, the scheduled changes are applied to the
CP groups one by one. After all CP group member removals are done, the shutting down CP
member is removed from the active CP members list and its shutdown process is completed. A shut-
down CP member is automatically replaced with another available CP member in all of its CP
groups, including the METADATA CP group, in order not to decrease or more importantly not to lose
the majority of CP groups. If there is no available CP member to replace a shutting down CP
member in a CP group, that group’s size is reduced by 1 and its majority value is recalculated.
Please note that this behavior is when CP Subsystem Persistence is disabled. When CP Subsystem
Persistence is enabled, shut-down CP members are not automatically removed from the active CP
members list and they are still considered as part of CP groups and majority calculations, because
they can come back by restoring their local CP state from stable storage. If you know that a shut-
down CP member will not be restarted, you need to remove that member from CP Subsystem via
CPSubsystemManagementService.removeCPMember(String).
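The majority recalculation mentioned above follows the usual Raft rule that a majority is more than half of the group members. The helper below is a worked illustration of that arithmetic; MajoritySketch is a hypothetical name and not a Hazelcast class.

```java
// Hypothetical helper illustrating majority arithmetic for a CP group.
final class MajoritySketch {
    // Majority (quorum) of a CP group: more than half of its members.
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    // Number of simultaneous member failures the group tolerates while keeping its majority.
    static int tolerableFailures(int groupSize) {
        return groupSize - majority(groupSize);
    }
}
```

A CP group of 3 members has a majority of 2 and tolerates 1 failure. If a shutting-down member cannot be replaced and the group shrinks to 2 members, the majority is still 2, so the group can no longer tolerate any failure.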
A new CP member can be added to CP Subsystem to either increase the number of available CP
members for new CP groups or to fill the missing slots in existing CP groups. After the initial
Hazelcast cluster startup is done, an existing Hazelcast member can be promoted to the CP
member role. This new CP member automatically joins CP groups that have missing members,
and majority values of these CP groups are recalculated.
You can access the CP Subsystem management APIs using the Java API or the REST interface. There
are two ways to communicate with the REST interface: accessing the REST endpoint URL directly, or
using the cp-subsystem.sh shell script, which comes with the Hazelcast package.
The cp-subsystem.sh script uses the curl command, so curl must be installed to be able
to use the script.
Java API
REST API
• Get CP Groups:
Java API
REST API
Returns the active CP group with the given name. There can be at most one active CP group with
a given name.
Java API
REST API
• Get CP Members:
Java API
REST API
Unconditionally destroys the given active CP group without using the Raft algorithm mechanics.
This method must be used only when a CP group loses its majority and cannot make progress
anymore. Normally, membership changes in CP groups, such as CP member promotion or
removal, are done via the Raft consensus algorithm. However, when a CP group permanently
loses its majority, it will not be able to commit any new operation. Therefore, this method
ungracefully terminates the execution of the Raft algorithm for the given CP group on the remaining
CP group members. It also performs a Raft commit to the METADATA CP group in order to update
the status of the destroyed group. Once a CP group is destroyed, all CP data structure proxies
created before the destroy fail with CPGroupDestroyedException. However, if a new proxy is
created afterwards, then this CP group is re-created from scratch with a new set of CP members.
This method is idempotent. It has no effect if the given CP group is already destroyed.
Java API
REST API
• Remove a CP Member:
Removes the given unreachable CP member from the active CP members list and all CP groups
it belongs to. If any other active CP member is available, it replaces the removed CP member in
its CP groups. Otherwise, the CP groups that the removed CP member belongs to shrink and
their majority values are recalculated.
Java API
REST API
Promotes the local Hazelcast member to the CP member role. If the local member is already in the
active CP members list, i.e., it is already a CP member, then this method has no effect. When the
local member is promoted to the CP role, its member UUID is assigned as CP member UUID. The
promoted CP member will be added to the CP groups that have missing members, i.e., whose
current size is smaller than CPSubsystemConfig.getGroupSize().
Java API
REST API
Wipes and resets the whole CP Subsystem state and initializes it as if the Hazelcast cluster is
starting up initially. This method must be used only when the METADATA CP group loses its
majority and cannot make progress anymore.
After this method is called, all CP state and data are wiped and CP members start with empty
state.
This method can be invoked only from the Hazelcast master member, which is the first member
in the Hazelcast cluster member list. Moreover, the Hazelcast cluster must have at least
CPSubsystemConfig.getCPMemberCount() members.
This method must not be called while there are membership changes in the Hazelcast cluster.
Before calling this method, please make sure that there is no new member joining and all
existing Hazelcast members have seen the same member list.
To be able to use this method, the initial CP member count of CP Subsystem, which is defined by
CPSubsystemConfig.getCPMemberCount(), must be satisfied. For instance, if
CPSubsystemConfig.getCPMemberCount() is 5 and only 1 CP member is alive, when this method is
called, 4 additional AP Hazelcast members should exist in the cluster, or new Hazelcast
members must be started.
This method triggers a new CP discovery process round. However, if the new CP discovery
round fails for any reason, Hazelcast members are not terminated, because Hazelcast members
are likely to contain data for AP data structures and their termination can cause data loss.
Hence, you need to observe the cluster and check if the CP discovery process completes
successfully.
This method is NOT idempotent and multiple invocations can break the whole
system! After calling this API, you must observe the system to see if the reset
process is successfully completed or failed before making another call.
This method deletes all CP data written by CP Subsystem Persistence.
Java API
REST API
Java API
REST API
If a Hazelcast instance that owns a CP session crashes, its CP session is not terminated
immediately. Instead, the session is closed after
CPSubsystemConfig.getSessionTimeToLiveSeconds() passes. If it is known for sure that the session
owner is not partitioned and definitely crashed, this method can be used for closing the session
and releasing its resources immediately.
Java API
REST API
13. Transactions
This chapter explains the usage of Hazelcast in a transactional context. It describes the Hazelcast
transaction types and how they work, how to provide XA (eXtended Architecture) transactions and
how to integrate Hazelcast with J2EE containers.
Hazelcast supports two types of transactions: ONE_PHASE and TWO_PHASE. The type of transaction
controls what happens when a member crashes while a transaction is committing. The default
behavior is TWO_PHASE.
• ONE_PHASE: By selecting this transaction type, you execute the transactions with a single phase
that is committing the changes. Since a preparing phase does not exist, the conflicts are not
detected. When a conflict happens while committing the changes, e.g., due to a member crash,
not all the changes are written and this leaves the system in an inconsistent state.
• TWO_PHASE: When you select this transaction type, Hazelcast first tries to execute the prepare
phase. This phase fails if there are any conflicts. Once the prepare phase is successful, Hazelcast
executes the commit phase (writing the changes). Before TWO_PHASE commits, Hazelcast
copies the commit log to other members, so in case of a member failure, another member can
complete the commit.
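The difference between the two transaction types can be sketched with a generic coordinator. This is a simplified illustration, assuming participants that can prepare, commit and roll back; it does not model the copying of the commit log to other members. TwoPhaseSketch and Participant are hypothetical names, not Hazelcast types.

```java
import java.util.List;

// Hypothetical coordinator contrasting the two commit styles; not Hazelcast code.
final class TwoPhaseSketch {
    // Hypothetical participant; in Hazelcast the participants are cluster members.
    interface Participant {
        boolean prepare();   // returns false on a conflict
        void commit();
        void rollback();     // expected to be a no-op if nothing was prepared
    }

    // TWO_PHASE: commit only if every participant prepared successfully.
    static boolean commitTwoPhase(List<Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        participants.forEach(Participant::commit);
        return true;
    }

    // ONE_PHASE: no prepare step, so conflicts are not detected up front.
    static void commitOnePhase(List<Participant> participants) {
        participants.forEach(Participant::commit);
    }
}
```

In the two-phase variant a single failed prepare prevents any change from being written, whereas the one-phase variant starts writing immediately and can stop halfway if a failure occurs.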
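import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.transaction.*;

public class TransactionalMember {
    public static void main( String[] args ) {
        HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
        TransactionContext context = hazelcastInstance.newTransactionContext();
        context.beginTransaction();
        // "myqueue", "mymap" and "myset" are example names.
        TransactionalQueue<Object> queue = context.getQueue( "myqueue" );
        TransactionalMap<String, String> map = context.getMap( "mymap" );
        TransactionalSet<String> set = context.getSet( "myset" );
        try {
            Object obj = queue.poll();
            //process obj
            map.put( "1", "value1" );
            set.add( "value" );
            //do other things
            context.commitTransaction();
        } catch ( Throwable t ) {
            context.rollbackTransaction();
        }
    }
}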
In a transaction, operations are not executed immediately. Their changes are local to the
TransactionContext until committed. However, the affected keys are locked so that the changes
can be applied at commit time.
For the above example, when map.put is executed, no data is put in the map but the key is locked
against changes. While committing, operations are executed, the value is put to the map and the
key is unlocked.
The isolation level in Hazelcast Transactions is READ_COMMITTED on the level of a single partition. If
you are in a transaction, you can read the data in your transaction and the data that is already
committed. If you are not in a transaction, you can only read the committed data.
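The visibility rules described above can be illustrated with a buffer of pending changes in front of the committed data. This is a single-JVM sketch that leaves out key locking and partitioning; BufferedTxSketch is a hypothetical name and not the TransactionContext implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of transaction-local changes; not the TransactionContext implementation.
final class BufferedTxSketch {
    private final Map<String, String> committed;                 // shared, committed state
    private final Map<String, String> pending = new HashMap<>(); // local to this transaction

    BufferedTxSketch(Map<String, String> committed) {
        this.committed = committed;
    }

    // The change is only recorded locally; Hazelcast additionally locks the key.
    void put(String key, String value) {
        pending.put(key, value);
    }

    // Inside the transaction you see your own changes plus the committed data.
    String get(String key) {
        return pending.containsKey(key) ? pending.get(key) : committed.get(key);
    }

    // On commit the buffered changes are applied to the shared state.
    void commit() {
        committed.putAll(pending);
        pending.clear();
    }

    // On rollback the buffered changes are discarded.
    void rollback() {
        pending.clear();
    }
}
```

A reader outside the transaction, which looks only at the committed map, does not see the value for key "1" until commit() is called, while the transaction itself can read it right after put().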
The REPEATABLE_READ isolation level can also be exercised using the method
getForUpdate() of TransactionalMap.
The isolation levels might be broken if the objects involved in the transaction span
multiple partitions. A reader which is not in a transaction can then temporarily
observe partially committed data.
13.1.1. Queue/Set/List vs. Map/Multimap
MapStore and QueueStore do not participate in transactions. Hazelcast suppresses exceptions thrown
by the store in a transaction. See the XA Transactions section for further information.
As discussed in Creating a Transaction Interface, when you choose ONE_PHASE as the transaction
type, Hazelcast tracks all changes you make locally in a commit log, i.e., a list of changes. In this
case, all the other members are asked to agree that the commit can succeed and once they agree,
Hazelcast starts to write the changes. However, if the member that initiates the commit crashes
after it has written to at least one member (but has not completed writing to all other members),
your system may be left in an inconsistent state.
On the other hand, if you choose TWO_PHASE as the transaction type, the commit log is again
tracked locally but it is copied to another cluster member. Therefore, when a failure happens, e.g.,
the member initiating the commit crashes, you still have the commit log in another member and
that member can complete the commit. However, copying the commit log to another member
makes the TWO_PHASE approach slow.
Consequently, it is recommended that you choose ONE_PHASE as the transaction type if you want
better performance, and that you choose TWO_PHASE if reliability of your system is more
important than the performance.
When you implement the XAResource interface, Hazelcast provides XA transactions. You can obtain
the HazelcastXAResource instance via the HazelcastInstance getXAResource method. You can see the
HazelcastXAResource API here.
Below is example code that uses JTA API for transaction management.
cleanAtomikosLogs();
tm.commit();
cleanAtomikosLogs();
Hazelcast.shutdownAll();
Caching keeps in memory data that is either slow to calculate/process or originates from another
underlying backend system. Caching is used to prevent additional request round trips for
frequently used data. In both cases, caching can be used to gain performance or decrease
application latencies.
In addition to the given specification, we added some features like asynchronous versions of almost
all operations to give the user extra power.
This chapter gives a basic understanding of how to configure your application and how to set up
Hazelcast to be your JCache provider. It also shows examples of basic JCache usage as well as the
additionally offered features that are not part of JSR-107. To gain a full understanding of the JCache
functionality and provided guarantees of different operations, read the specification document
(which is also the main documentation for functionality) at the specification page of JSR-107.
• The original release, version 1.0.0, was released in March 2014. Hazelcast versions 3.3.1 up to
3.9.2 (included) implement version 1.0.0 of the JCache specification.
• A maintenance release, version 1.1.0 was released in December 2017. Hazelcast version 3.9.3
and higher implement JCache specification version 1.1.0.
• A patch release, version 1.1.1 was released in May 2019. Hazelcast version 3.12.1 and higher
implement JCache 1.1.1.
JCache 1.1.x versions are backwards compatible with JCache 1.0.0. As maintenance releases, JCache
1.1.x versions introduce clarifications and bug fixes in the specification, reference implementation
and TCK, without introducing any additional features.
JCache 1.1.1 is a bug-fix-only release. There are no behavioral differences between the JCache 1.1.0
and 1.1.1 specifications.
When upgrading from a Hazelcast version which implements JCache 1.0.0 to a version that
implements version 1.1.0 of the specification, some behavioral differences must be taken into
account:
• Statistics effects of Cache.putIfAbsent on misses and hits are properly applied when using
JCache 1.1.0, while under JCache 1.0.0 misses and hits were not updated.
Note that these behavioral differences apply on the Hazelcast member that executes the operation.
Thus when performing a rolling member upgrade from a JCache 1.0.0-compliant Hazelcast version
to a newer Hazelcast version that supports JCache 1.1.0, operations executed on the new members
exhibit JCache 1.1.0 behavior while those executed on old members implement JCache 1.0.0
behavior.
The complete list of issues addressed in JCache specification version 1.1.0 is available on GitHub.
14.2. JCache Setup and Configuration
This section shows what is necessary to provide the JCache API and the Hazelcast JCache
implementation for your application. In addition, it demonstrates the different configuration
options and describes the configuration properties.
To provide your application with this JCache functionality, your application needs the JCache API
inside its classpath. This API is the bridge between the specified JCache standard and the
implementation provided by Hazelcast.
The method of integrating the JCache API JAR into the application classpath depends on the build
system used. For Maven, Gradle, SBT, Ivy and many other build systems, all using Maven-based
dependency repositories, perform the integration by adding the Maven coordinates to the build
descriptor.
As already mentioned, you have to add JCache coordinates next to the default Hazelcast coordinates
that might be already part of the application.
For Maven users, the coordinates look like the following code:
<dependency>
<groupId>javax.cache</groupId>
<artifactId>cache-api</artifactId>
<version>1.1.1</version>
</dependency>
With other build systems, you might need to describe the coordinates in a different way.
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast-all</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast</artifactId>
<version>4.0.1</version>
</dependency>
The users of other build systems have to adjust the definition of the dependency to their needs.
When the users want to use Hazelcast clients to connect to a remote cluster, the hazelcast-
client.jar dependency is also required on the client side applications. This JAR is already included
in hazelcast-all.jar. Or, you can add it to the classpath using the following Maven snippet:
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast-client</artifactId>
<version>4.0.1</version>
</dependency>
For other build systems, for instance, ANT, the users have to download these dependencies from
either the JSR-107 specification and Hazelcast community website (hazelcast.org) or from the
Maven repository search page (maven.org).
Before moving on to configuration, let’s have a look at a basic introductory example. The following
code shows how to use the Hazelcast JCache integration inside an application in an easy but
typesafe way.
// Retrieve the CachingProvider which is automatically backed by
// the chosen Hazelcast member or client provider.
CachingProvider cachingProvider = Caching.getCachingProvider();
// Create a CacheManager.
CacheManager cacheManager = cachingProvider.getCacheManager();
Although the example is simple, let’s go through the code lines one by one.
First of all, we retrieve the javax.cache.spi.CachingProvider using the static method
javax.cache.Caching.getCachingProvider(), which automatically picks up Hazelcast as the underlying
JCache implementation, if available in the classpath. This way, the Hazelcast implementation of a
CachingProvider automatically starts a new Hazelcast member or client (depending on the chosen
provider type) and picks up the configuration from either the command line parameter or from the
classpath. We will show how to use an existing HazelcastInstance later in this chapter; for now, we
keep it simple.
In the next line, we ask the CachingProvider to return a javax.cache.CacheManager. This is the general
application’s entry point into JCache. The CacheManager creates and manages named caches.
Creating the Cache
To create the cache, we call javax.cache.CacheManager.createCache() with a name for the cache and
the previously created configuration; the call returns the created cache. If you need to retrieve a
previously created cache, you can use the corresponding method overload
javax.cache.CacheManager.getCache(). If the cache was created using type parameters, you must
retrieve the cache afterward using the type checking version of getCache.
The following lines are simple put and get calls from the java.util.Map interface. The
javax.cache.Cache.put() has a void return type and does not return the previously assigned value of
the key. To imitate the java.util.Map.put() method, the JCache cache has a method called getAndPut.
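The return value that getAndPut imitates can be seen with a plain java.util.Map, which needs no cache provider. The sketch below only demonstrates the Map.put() behavior; the class name PutSemanticsSketch is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing what java.util.Map.put() returns.
final class PutSemanticsSketch {
    // Returns what Map.put() reports for two consecutive writes to the same key.
    static String[] previousValues() {
        Map<Integer, String> map = new HashMap<>();
        String first = map.put(1, "one");  // no previous mapping for the key
        String second = map.put(1, "uno"); // the previously assigned value is returned
        return new String[] {first, second};
    }
}
```

javax.cache.Cache.getAndPut() reports the previous value in the same way, while javax.cache.Cache.put() returns nothing, which lets an implementation skip fetching the old value.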
• programmatically: the typical Hazelcast way, using the Config API seen above
Declarative Configuration
You can declare your JCache cache configuration using the hazelcast.xml or hazelcast-client.xml
configuration files. Using this declarative configuration makes creating the javax.cache.Cache fully
transparent and automatically ensures internal thread safety. You do not need a call to
javax.cache.CacheManager.createCache() in this case: you can retrieve the cache using the
javax.cache.CacheManager.getCache() overloads by passing in the name defined in the configuration
for the cache.
To retrieve the cache that you defined in the declaration files, you only need to perform a simple
call (example below) because the cache is created automatically by the implementation.
Note that this section only describes the JCache provided standard properties. For the Hazelcast
specific properties, see the ICache Configuration section.
<hazelcast>
...
<cache name="default">
<key-type class-name="java.lang.Object" />
<value-type class-name="java.lang.Object" />
<statistics-enabled>false</statistics-enabled>
<management-enabled>false</management-enabled>
<read-through>true</read-through>
<write-through>true</write-through>
<cache-loader-factory
class-name="com.example.cache.MyCacheLoaderFactory" />
<cache-writer-factory
class-name="com.example.cache.MyCacheWriterFactory" />
<expiry-policy-factory
class-name="com.example.cache.MyExpiryPolicyFactory" />
<cache-entry-listeners>
<cache-entry-listener old-value-required="false" synchronous="false">
<cache-entry-listener-factory
class-name="com.example.cache.MyEntryListenerFactory" />
<cache-entry-event-filter-factory
class-name="com.example.cache.MyEntryEventFilterFactory" />
</cache-entry-listener>
</cache-entry-listeners>
</cache>
...
</hazelcast>
• key-type#class-name: Fully qualified class name of the cache key type. Its default value is
java.lang.Object.
• value-type#class-name: Fully qualified class name of the cache value type. Its default value is
java.lang.Object.
• statistics-enabled: If set to true, statistics like cache hits and misses are collected. Its default
value is false.
• management-enabled: If set to true, JMX beans are enabled and collected statistics are provided. It
doesn’t automatically enable statistics collection. Its default value is false.
javax.cache.configuration.Factory implementation providing a
javax.cache.integration.CacheWriter instance to the cache.
The JMX MBeans provided by Hazelcast JCache show statistics of the local member
only. To show the cluster-wide statistics, the user should collect statistic
information from all members and accumulate them to the overall statistics.
Programmatic Configuration
To stay vendor independent, try to keep your code as near as possible to the
standard JCache API. We recommend that you use declarative configuration and
that you use the javax.cache.configuration.Configuration or
javax.cache.configuration.CompleteConfiguration interfaces in your code only
when you need to pass the configuration instance throughout your code.
If you don’t need to configure Hazelcast specific properties, we recommend that you instantiate
javax.cache.configuration.MutableConfiguration and that you use the setters to configure Hazelcast
as shown in the example in the Example JCache Application section. Since the configurable
properties are the same as the ones explained in the JCache Declarative Configuration section, they
are not mentioned here. For Hazelcast specific properties, see the ICache Configuration
section.
Hazelcast has two types of providers. Depending on your application setup and the cluster topology,
you can use the Client Provider (used by Hazelcast clients) or the Server Provider (used by cluster
members).
For more information on cluster topologies and Hazelcast clients, see the Hazelcast Topology
section.
Hazelcast implements a delegating CachingProvider that can automatically be configured for either
client or member mode and that delegates to the real underlying implementation based on the
user’s choice. Hazelcast recommends that you use this CachingProvider implementation.
com.hazelcast.cache.HazelcastCachingProvider
To configure the delegating provider at the command line, add the following parameter to the Java
startup call, depending on the chosen provider:
-Dhazelcast.jcache.provider.type=[client|server]
By default, the delegating CachingProvider is automatically picked up by the JCache SPI and
provided as shown above. In cases where multiple javax.cache.spi.CachingProvider
implementations reside on the classpath (like in some Application Server scenarios), you can select
a CachingProvider by explicitly calling Caching.getCachingProvider() overloads and providing them
using the canonical class name of the provider to be used. The class names of member and client
providers provided by Hazelcast are mentioned in the following two subsections.
Hazelcast advises that you use the Caching.getCachingProvider() overloads to select
a CachingProvider explicitly. This ensures that upgrading to later environments or
Application Server versions doesn’t result in unexpected behavior like choosing the
wrong CachingProvider.
For cluster topologies where Hazelcast light clients are used to connect to a remote Hazelcast
cluster, use the Client Provider to configure JCache.
The Client Provider provides the same features as the Server Provider. However, it does not hold
data on its own but instead delegates requests and calls to the remotely connected cluster.
The Client Provider can connect to multiple clusters at the same time. This can be achieved by
scoping the client side CacheManager with different Hazelcast configuration files. For more
information, see Scoping to Join Clusters.
com.hazelcast.client.cache.impl.HazelcastClientCachingProvider
If a Hazelcast member is embedded into an application directly and the Hazelcast client is not used,
the Server Provider is required. In this case, the member itself becomes a part of the distributed
cache and requests and operations are distributed directly across the cluster by its given key.
The Server Provider provides the same features as the Client provider, but it keeps data in the local
Hazelcast member and also distributes non-owned keys to other direct cluster members.
Like the Client Provider, the Server Provider can connect to multiple clusters at the same time. This
can be achieved by scoping the CacheManager with different Hazelcast configuration files.
For more information, see Scoping to Join Clusters.
com.hazelcast.cache.impl.HazelcastServerCachingProvider
14.4.1. JCache API Application Example
The code in this subsection creates a small account application by providing a caching layer over an
imagined database abstraction. The database layer is simulated with a small set of demo data behind a
simple DAO interface. To show the difference between the "database" access and retrieving values
from the cache, a small waiting time is used in the DAO implementation to simulate network and
database latency.
Before we implement the JCache caching layer, let’s have a quick look at some basic classes we need
for this example.
The User class is the representation of a user table in the database. To keep it simple, it has just two
properties: userId and username.
public User() {
}
The DAO interface is also kept simple in this example. It provides a simple method to retrieve (find) a
user by its userId.
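As an illustration of the two helper types described above, here is a minimal sketch using only the JDK. The User fields follow the text (userId and username), and the DAO method names are taken from how the loader and writer examples in this chapter call them. The boolean return types and the InMemoryUserDao class with its artificial delay are assumptions made for this sketch, not the code of the sample application.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Value class mirroring the "user" table: just userId and username.
class User implements Serializable {
    private int userId;
    private String username;

    public User() {
    }

    public User(int userId, String username) {
        this.userId = userId;
        this.username = username;
    }

    public int getUserId() { return userId; }
    public void setUserId(int userId) { this.userId = userId; }
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
}

// DAO abstraction; the return types are assumptions of this sketch.
interface UserDao {
    User findUserById(int userId);
    boolean storeUser(int userId, User user);
    boolean removeUser(int userId);
}

// Hypothetical in-memory "database" with an artificial delay per lookup.
class InMemoryUserDao implements UserDao {
    private final Map<Integer, User> table = new ConcurrentHashMap<>();
    private final long latencyMillis;

    InMemoryUserDao(long latencyMillis) {
        this.latencyMillis = latencyMillis;
    }

    @Override
    public User findUserById(int userId) {
        try {
            Thread.sleep(latencyMillis); // simulate network and database latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return table.get(userId);
    }

    @Override
    public boolean storeUser(int userId, User user) {
        table.put(userId, user);
        return true;
    }

    @Override
    public boolean removeUser(int userId) {
        return table.remove(userId) != null;
    }
}
```

A DAO created with new InMemoryUserDao(100) would make every lookup take roughly 100 milliseconds, which makes the difference between a cache hit and a "database" read easy to see.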
To show most of the standard features, the configuration example is a little more complex.
// Create javax.cache.configuration.CompleteConfiguration subclass
CompleteConfiguration<Integer, User> config =
    new MutableConfiguration<Integer, User>()
        // Configure the cache to be typesafe
        .setTypes( Integer.class, User.class )
        // Configure to expire entries 30 secs after creation or last access
        .setExpiryPolicyFactory( FactoryBuilder.factoryOf(
            new AccessedExpiryPolicy( new Duration( TimeUnit.SECONDS, 30 ) )
        ) )
        // Configure read-through of the underlying store
        .setReadThrough( true )
        // Configure write-through to the underlying store
        .setWriteThrough( true )
        // Configure the javax.cache.integration.CacheLoader
        .setCacheLoaderFactory( FactoryBuilder.factoryOf(
            new UserCacheLoader( userDao )
        ) )
        // Configure the javax.cache.integration.CacheWriter
        .setCacheWriterFactory( FactoryBuilder.factoryOf(
            new UserCacheWriter( userDao )
        ) )
        // Configure the javax.cache.event.CacheEntryListener with no
        // javax.cache.event.CacheEntryEventFilter, to include old value
        // and to be executed synchronously
        .addCacheEntryListenerConfiguration(
            new MutableCacheEntryListenerConfiguration<Integer, User>(
                new UserCacheEntryListenerFactory(),
                null, true, true
            )
        );
First, we set the expected types for the cache, which is already known from the previous example.
On the next line, a javax.cache.expiry.ExpiryPolicy is configured. Like almost all integration
points, ExpiryPolicy implementations are configured using javax.cache.configuration.Factory instances.
Factory and FactoryBuilder are explained later in this chapter.
The next two lines configure the cache to be read-through and write-through to the underlying
backend resource that is configured over the next few lines. The JCache API offers
javax.cache.integration.CacheLoader and javax.cache.integration.CacheWriter to implement
adapter classes to any kind of backend resource, e.g., JPA, JDBC, or any other backend technology
implementable in Java. The interfaces provide the typical CRUD operations like create, get, update,
delete and some bulk operation versions of those common operations. We will look into the
implementations of those interfaces later.
Configuring Entry Listeners
A full running example that is presented in this subsection is available in the code samples
repository. The application is built to be a command line app. It offers a small shell to accept
different commands. After startup, you can enter help to see all available commands and their
descriptions.
In the Example JCache Application section, we have already seen a couple of the base classes and
explained how those work. The following are quick descriptions of them:
javax.cache.Caching:
The access point into the JCache API. It retrieves the general CachingProvider backed by any
compliant JCache implementation, such as Hazelcast JCache.
javax.cache.spi.CachingProvider:
The SPI that is implemented to bridge between the JCache API and the implementation itself.
Hazelcast members and clients use different providers chosen as seen in the Configuring JCache
Provider section which enable the JCache API to interact with Hazelcast clusters.
javax.cache.CacheManager:
The CacheManager provides the capability to create new and manage existing JCache caches.
A javax.cache.Cache instance created with key and value types in the configuration
provides type checking of those types at retrieval of the cache. For that reason,
all non-typed retrieval methods like getCache throw an exception because types
cannot be checked.
javax.cache.configuration.Configuration, javax.cache.configuration.MutableConfiguration:
These two classes are used to configure a cache prior to retrieving it from a CacheManager. The
Configuration interface, therefore, acts as a common super type for all compatible configuration
classes such as MutableConfiguration.
Hazelcast itself offers a special implementation (com.hazelcast.config.CacheConfig) of the
Configuration interface which offers more options on the specific Hazelcast properties that can be
set to configure features like synchronous and asynchronous backup counts or selecting the
underlying in-memory format of the cache. For more information on this configuration class, see
the reference in the JCache Programmatic Configuration section.
javax.cache.Cache:
This interface represents the cache instance itself. It is comparable to java.util.Map but offers
special operations dedicated to the caching use case. Therefore, for example
javax.cache.Cache.put(), unlike java.util.Map.put(), does not return the old value previously
assigned to the given key.
Bulk operations on the Cache interface guarantee atomicity per entry but not over
all given keys in the same bulk operations since no transactional behavior is
applied over the whole batch process.
Factory implementations are easy to do, as they follow the default Provider- or Factory-Pattern. The
example class UserCacheEntryListenerFactory shown below implements a custom JCache Factory.
public class UserCacheEntryListenerFactory implements Factory<CacheEntryListener<Integer, User>> {
    @Override
    public CacheEntryListener<Integer, User> create() {
        // just create a new listener instance
        return new UserCacheEntryListener();
    }
}
To simplify the process for the users, JCache API offers a set of helper methods collected in
javax.cache.configuration.FactoryBuilder. In the above configuration example,
FactoryBuilder.factoryOf() creates a singleton factory for the given instance.
Cache read-through
the cache when the key or the value is not yet found in the cache. If no value is found for a given
key, it returns null.
If the cache is not configured to be read-through, nothing is loaded automatically. The user code
must call javax.cache.Cache.loadAll() to load data for the given set of keys into the cache.
For the bulk load operation (loadAll()), some keys may not be found in the returned result set. In
this case, a javax.cache.integration.CompletionListener parameter can be used as an asynchronous
callback after all the key-value pairs are loaded because loading many key-value pairs can take lots
of time.
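The difference between read-through and explicit loading can be tried out with a small model that uses only the JDK. In this sketch, ReadThroughSketch and its methods are hypothetical stand-ins and not the javax.cache API: with read-through enabled, a miss consults the loader and a null result stays null. Without it, nothing is loaded until loadAll is called, and keys the loader cannot resolve are skipped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for a cache with an attached loader; not the javax.cache API.
class ReadThroughSketch<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader;
    private final boolean readThrough;

    ReadThroughSketch(Function<K, V> loader, boolean readThrough) {
        this.loader = loader;
        this.readThrough = readThrough;
    }

    // On a miss, consult the loader only when read-through is enabled.
    V get(K key) {
        V value = store.get(key);
        if (value == null && readThrough) {
            value = loader.apply(key);
            if (value != null) {
                store.put(key, value); // keep the loaded value for later reads
            }
        }
        return value; // null when neither the cache nor the loader has the key
    }

    // Explicit bulk load; keys the loader cannot resolve are simply skipped.
    void loadAll(Set<K> keys) {
        for (K key : keys) {
            V value = loader.apply(key);
            if (value != null) {
                store.put(key, value);
            }
        }
    }
}
```

The real loadAll() call additionally accepts a CompletionListener, which this synchronous model leaves out.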
CacheLoader Example
Let’s look at the UserCacheLoader implementation. This implementation is quite straightforward.
• It implements CacheLoader.
• It overrides the load method to compute or retrieve the value corresponding to key.
• It overrides the loadAll method to compute or retrieve the values corresponding to keys.
public class UserCacheLoader implements CacheLoader<Integer, User>, Serializable {
    private final UserDao userDao;
    public UserCacheLoader(UserDao userDao) {
        // store the dao instance created externally
        this.userDao = userDao;
    }
    @Override
    public User load(Integer key) throws CacheLoaderException {
        // just call through into the dao
        return userDao.findUserById(key);
    }
    @Override
    public Map<Integer, User> loadAll(Iterable<? extends Integer> keys)
            throws CacheLoaderException {
        // create the resulting map
        Map<Integer, User> loaded = new HashMap<Integer, User>();
        // for every key in the given set of keys
        for (Integer key : keys) {
            // try to retrieve the user
            User user = userDao.findUserById(key);
            // if user is not found do not add the key to the result set
            if (user != null) {
                loaded.put(key, user);
            }
        }
        return loaded;
    }
}
14.4.5. CacheWriter
• It implements CacheWriter.
• It overrides the write method to write the specified entry to the underlying store.
• It overrides the writeAll method to write the specified entries to the underlying store.
• It overrides the delete method to delete the key entry from the store.
• It overrides the deleteAll method to delete the data and keys from the underlying store for the
given collection of keys, if present.
public class UserCacheWriter implements CacheWriter<Integer, User>, Serializable {
    private final UserDao userDao;
    public UserCacheWriter(UserDao userDao) {
        // store the dao instance created externally
        this.userDao = userDao;
    }
    @Override
    public void write(Cache.Entry<? extends Integer, ? extends User> entry)
            throws CacheWriterException {
        // store the user using the dao
        userDao.storeUser(entry.getKey(), entry.getValue());
    }
    @Override
    public void writeAll(Collection<Cache.Entry<? extends Integer, ? extends User>> entries)
            throws CacheWriterException {
        // retrieve the iterator to clean up the collection from
        // written keys in case of an exception
        Iterator<Cache.Entry<? extends Integer, ? extends User>> iterator = entries.iterator();
        while (iterator.hasNext()) {
            // write entry using dao
            write(iterator.next());
            // remove from collection of keys
            iterator.remove();
        }
    }
    @Override
    public void delete(Object key) throws CacheWriterException {
        // test for key type
        if (!(key instanceof Integer)) {
            throw new CacheWriterException("Illegal key type");
        }
        // remove user using dao
        userDao.removeUser((Integer) key);
    }
    @Override
    public void deleteAll(Collection<?> keys) throws CacheWriterException {
        // retrieve the iterator to clean up the collection from
        // deleted keys in case of an exception
        Iterator<?> iterator = keys.iterator();
        while (iterator.hasNext()) {
            // delete entry using dao
            delete(iterator.next());
            // remove from collection of keys
            iterator.remove();
        }
    }
}
Again, the implementation is pretty straightforward. As above, all exceptions thrown by the
external resource, such as java.sql.SQLException, have to be wrapped into a
javax.cache.integration.CacheWriterException. Note that this is a different exception from the one
thrown by CacheLoader.
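The iterator-based loops in writeAll and deleteAll remove each element once it has been processed, so that after a failure the collection contains only what was not written. The following JDK-only sketch illustrates that effect. WriteAllSketch and its artificially failing write method are hypothetical and use plain integer keys instead of Cache.Entry objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// JDK-only stand-in for the writeAll loop; the "store" fails on a chosen key.
class WriteAllSketch {
    // Hypothetical store: rejects one key to simulate a backend error.
    static void write(int key, int failingKey) {
        if (key == failingKey) {
            throw new IllegalStateException("store rejected key " + key);
        }
    }

    // Writes keys in order and removes each one from the collection once written.
    static void writeAll(Collection<Integer> keys, int failingKey) {
        Iterator<Integer> iterator = keys.iterator();
        while (iterator.hasNext()) {
            write(iterator.next(), failingKey);
            iterator.remove();
        }
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            writeAll(keys, 3);
        } catch (IllegalStateException e) {
            // keys 1 and 2 were written and removed; 3 and 4 remain
            System.out.println("not written: " + keys);
        }
    }
}
```

In a real CacheWriter the failure would surface as a CacheWriterException wrapping the backend exception, as described above.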
By default, Hazelcast JCache sends the complete changed value to the backup partition. Again, this
can cause a lot of traffic if the object is big. The Hazelcast ICache extension can also prevent this.
Further information is available at Implementing BackupAwareEntryProcessor.
• It implements EntryProcessor.
public class UserUpdateEntryProcessor implements EntryProcessor<Integer, User, User> {
@Override
public User process(MutableEntry<Integer, User> entry, Object... arguments) throws
EntryProcessorException {
// test arguments length
if (arguments.length < 1) {
throw new EntryProcessorException("One argument needed: username");
}
In addition, when using a Cache.invokeAll() method, a java.util.Map is returned that maps the key
to its javax.cache.processor.EntryProcessorResult, which itself wraps the actual result or a thrown
javax.cache.processor.EntryProcessorException.
14.4.7. CacheEntryListener
set of sub-interfaces.
• CacheEntryRemovedListener: Fires after a cache entry was removed (not expired) from the cache.
• CacheEntryExpiredListener: Fires after a cache entry has expired. Expiry does not have to
be a parallel process: the Hazelcast JCache implementation detects and removes expired entries
periodically. Therefore, the expiration event may not be fired as soon as the entry expires. See
ExpiryPolicy for details.
In this example application, the listener is implemented to print event information on the console.
That visualizes what is going on in the cache. This application performs the following tasks:
class UserCacheEntryListener implements CacheEntryCreatedListener<Integer, User>,
CacheEntryUpdatedListener<Integer, User>,
CacheEntryRemovedListener<Integer, User>,
CacheEntryExpiredListener<Integer, User> {
@Override
public void onCreated(Iterable<CacheEntryEvent<? extends Integer, ? extends User>>
cacheEntryEvents)
throws CacheEntryListenerException {
printEvents(cacheEntryEvents);
}
@Override
public void onUpdated(Iterable<CacheEntryEvent<? extends Integer, ? extends User>>
cacheEntryEvents)
throws CacheEntryListenerException {
printEvents(cacheEntryEvents);
}
@Override
public void onRemoved(Iterable<CacheEntryEvent<? extends Integer, ? extends User>>
cacheEntryEvents)
throws CacheEntryListenerException {
printEvents(cacheEntryEvents);
}
@Override
public void onExpired(Iterable<CacheEntryEvent<? extends Integer, ? extends User>>
cacheEntryEvents)
throws CacheEntryListenerException {
printEvents(cacheEntryEvents);
}
14.4.8. ExpiryPolicy
entries based on different rules.
JCache does not require expired entries to be removed from the cache immediately. It only enforces
that expired entries are not returned from cache. Therefore, the exact time of removal is
implementation specific. Hazelcast complies with JCache by checking the entries for expiration at the
time of get operations (lazy expiration). In addition to that, Hazelcast uses a periodic task to detect
and remove expired entries as soon as possible (eager expiration). Thanks to eager expiry, all
expired entries are removed from the memory eventually even when they are not touched again,
so the space used by such entries is released as well.
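The two mechanisms can be modelled in a few lines of plain Java. The sketch below is a simplified illustration and not Hazelcast's implementation: ExpirationSketch is a hypothetical class, time is passed in explicitly instead of being read from a clock, and sweep() stands for whatever the periodic task would do.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of lazy and eager expiration; time is passed in explicitly.
class ExpirationSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    void put(String key, String value, long now, long ttlMillis) {
        values.put(key, value);
        expiresAt.put(key, now + ttlMillis);
    }

    // Lazy expiration: an expired entry is never returned and is dropped on access.
    String get(String key, long now) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && now >= deadline) {
            values.remove(key);
            expiresAt.remove(key);
            return null;
        }
        return values.get(key);
    }

    // Eager expiration: a periodic task would call this to free untouched entries.
    int sweep(long now) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = expiresAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now >= entry.getValue()) {
                values.remove(entry.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return values.size();
    }
}
```

Without the sweep, an entry that is never read again would stay in memory indefinitely even though no caller could ever observe it.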
For a detailed explanation of interaction between expiry policies and JCache API, see the table in
the Expiry Policies section of JCache documentation.
By default, JCache delivers a set of predefined expiry strategies in the standard API.
• AccessedExpiryPolicy: Expires after a given amount of time measured from creation of the cache
entry. The expiry timeout is updated on accessing the key.
• CreatedExpiryPolicy: Expires after a given amount of time measured from creation of the cache
entry. The expiry timeout is never updated.
• EternalExpiryPolicy: Never expires. This is the default behavior, similar to ExpiryPolicy being
set to null.
• ModifiedExpiryPolicy: Expires after a given amount of time measured from creation of the cache
entry. The expiry timeout is updated on updating the key.
• TouchedExpiryPolicy: Expires after a given amount of time measured from creation of the cache
entry. The expiry timeout is updated on accessing or updating the key.
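The rules in the list above differ only in which events restart the timer. The following JDK-only sketch models that behavior. PolicyKind and ExpiryTimer are hypothetical names introduced for illustration and are not part of javax.cache.expiry, and time values are plain numbers supplied by the caller.

```java
// Hypothetical model of which events restart the expiry timer for each policy.
enum PolicyKind {
    ACCESSED(true, false),
    CREATED(false, false),
    MODIFIED(false, true),
    TOUCHED(true, true),
    ETERNAL(false, false);

    final boolean resetOnAccess;
    final boolean resetOnUpdate;

    PolicyKind(boolean resetOnAccess, boolean resetOnUpdate) {
        this.resetOnAccess = resetOnAccess;
        this.resetOnUpdate = resetOnUpdate;
    }
}

class ExpiryTimer {
    private final PolicyKind kind;
    private final long duration;
    private long deadline;

    ExpiryTimer(PolicyKind kind, long duration, long createdAt) {
        this.kind = kind;
        this.duration = duration;
        // every policy except ETERNAL starts counting at creation
        this.deadline = kind == PolicyKind.ETERNAL ? Long.MAX_VALUE : createdAt + duration;
    }

    void onAccess(long now) {
        if (kind.resetOnAccess) {
            deadline = now + duration;
        }
    }

    void onUpdate(long now) {
        if (kind.resetOnUpdate) {
            deadline = now + duration;
        }
    }

    boolean isExpired(long now) {
        return now >= deadline;
    }
}
```

For example, with a duration of 30 and creation at time 0, a read at time 20 keeps an ACCESSED or TOUCHED entry alive until time 50, while a CREATED or MODIFIED entry still expires at time 30.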
Although EternalExpiryPolicy does not expire cache entries, it is still possible to evict values from
memory if an underlying CacheLoader is defined.
If you create a cache through an ICacheManager which has its own specified URI scope (and/or
specified classloader), it must be prepended to the pure cache name as a prefix while retrieving the
cache through getCache(String name). Prefix generation for full cache name is exposed through
com.hazelcast.cache.CacheUtil.getPrefixedCacheName(String name, java.net.URI uri, ClassLoader
classloader). If the URI scope and classloader are not specified, the pure cache name can be used
directly while retrieving cache over ICacheManager.
If you have a cache which is defined but not yet created (the cache is specified in the Hazelcast
configuration but has not been created), you can retrieve this cache by its name. This also triggers cache
creation before retrieving it. This retrieval is supported through HazelcastInstance. However,
HazelcastInstance does not support creating a cache by specifying a configuration; this is supported
by Hazelcast’s ICacheManager.
If a valid (rather than 1.0.0-PFD or 0.x versions) JCache library does not exist on
the classpath, IllegalStateException is thrown.
HazelcastInstance is injected into the following cache API interfaces (provided by javax.cache.Cache
and com.hazelcast.cache.ICache) if they implement HazelcastInstanceAware interface:
• Cache operations with custom ExpiryPolicy parameter to apply on that specific operation. See
Custom ExpiryPolicy.
The ICache data structure can also be used by Hazelcast Jet for Real-Time Stream
Processing (by enabling the Event Journal on your cache) and Fast Batch
Processing. Hazelcast Jet uses ICache as a source (reads data from ICache) and as a
sink (writes data to ICache). See the Fast Batch Processing and Real-Time Stream
Processing use cases for Hazelcast Jet. See also the Hazelcast Jet Reference
Manual to learn how Jet reads from and writes to ICache.
14.6.1. Scoping to Join Clusters
Examples
The following examples illustrate how HazelcastInstances are created or reused during the creation
of a new CacheManager. Complete reference on the HazelcastInstance lookup mechanism is provided
in the sections that follow.
Assuming no other HazelcastInstance exists in the same JVM, the cacheManager below starts a new
HazelcastInstance, configured according to the configuration lookup rules as defined for
Hazelcast.newHazelcastInstance() in case of an embedded member or
HazelcastClient.newHazelcastClient() for a client-side CacheManager.
When using both Hazelcast-specific features and JCache, a HazelcastInstance might be already
available to your JCache configuration. By configuring an instance name in hazelcast.xml in the
classpath root, the CacheManager locates the existing instance by name and reuses it.
• hazelcast.xml:
<hazelcast>
...
<instance-name>hz-member-1</instance-name>
...
</hazelcast>
// start hazelcast, configured with default hazelcast.xml
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
// start the default CacheManager -- it locates the default hazelcast.xml configuration
// and identifies the existing HazelcastInstance by its name
CachingProvider caching = Caching.getCachingProvider();
CacheManager cacheManager = caching.getCacheManager();
Starting a CacheManager with a New HazelcastInstance Configured with a Non-default Configuration File
Note that if the Hazelcast configuration file specifies an instance name, then all CacheManagers
referencing the same configuration file locate the HazelcastInstance by name and reuse it.
Properties properties = HazelcastCachingProvider.propertiesByInstanceName("hc-instance");
CachingProvider caching = Caching.getCachingProvider();
CacheManager cacheManager = caching.getCacheManager(new URI("any-uri-will-do"),
null, properties);
To connect or join different clusters, apply a configuration scope to the CacheManager. If the same URI
is used to request a CacheManager that was created previously, those CacheManagers share the same
underlying HazelcastInstance.
• pass the path to the configuration file using the location property
HazelcastCachingProvider#HAZELCAST_CONFIG_LOCATION (which resolves to
hazelcast.config.location) as a mapping inside a java.util.Properties instance to the
CachingProvider.getCacheManager(uri, classLoader, properties) call.
CachingProvider cachingProvider = Caching.getCachingProvider();
The retrieved CacheManager is scoped to use the HazelcastInstance that was just created and
configured using the given XML configuration file.
Available protocols for config file URL include classpath to point to a classpath location, file to
point to a filesystem location and http and https for remote web locations. In addition, everything
that does not specify a protocol is recognized as a placeholder that can be configured using a
system property.
-Dmy-placeholder=classpath:my-configs/scoped-hazelcast.xml
You should consider the following rules about the Hazelcast instance name when you specify the
configuration file location using HazelcastCachingProvider#HAZELCAST_CONFIG_LOCATION (which
resolves to hazelcast.config.location):
• If you do not specify an instance name via property or in the configuration file, the URL of the
configuration file location is used as the instance name.
The configuration file location will not be a part of the resulting identity of the
CacheManager. An attempt to create a CacheManager with a different set of properties
but an already used name results in undefined behavior.
You can bind a CacheManager to an existing and named HazelcastInstance instance. If the instanceName
is specified in com.hazelcast.config.Config, it can be used directly by passing it to the CachingProvider
implementation. Otherwise (instanceName not set or instance is a client instance) you must get the
instance name from the HazelcastInstance instance via the String getName() method and pass it to the
CachingProvider implementation. Please note that instanceName is not configurable for the client side
HazelcastInstance instance and is auto-generated by using the cluster name (if it is specified). In
general, the String getName() method of HazelcastInstance is the safer and preferable way to get the
name of the instance. Multiple CacheManagers created using an equal java.net.URI share the same
HazelcastInstance.
A named scope is applied nearly the same way as the configuration scope. Pass the instance name
using:
If a valid instance name is provided both as property and as URI, then the property value takes
precedence and is used to resolve the HazelcastInstance the first time a CacheManager is created for
the given URI.
Here is an example of Named Instance Scope with specified name passed as URI of the CacheManager:
Config config = new Config();
config.setInstanceName( "my-named-hazelcast-instance" );
// Create a named HazelcastInstance
Hazelcast.newHazelcastInstance( config );
Here is an example of Named Instance Scope with auto-generated name on client instance:
ClientConfig clientConfig = new ClientConfig();
ClientNetworkConfig networkConfig = clientConfig.getNetworkConfig();
networkConfig.addAddress("127.0.0.1", "127.0.0.2");
The instanceName will not be a part of the resulting identity of the CacheManager. An
attempt to create a CacheManager with a different set of properties but an already
used name will result in undefined behavior.
// Create a member HazelcastInstance
HazelcastInstance instance = Hazelcast.newHazelcastInstance();
14.6.2. Namespacing
The java.net.URIs that don’t use the above-mentioned Hazelcast-specific schemes are recognized as
namespacing. Those CacheManagers share the same underlying default HazelcastInstance created (or
set) by the CachingProvider, but caches with the same names in different namespaces are kept
separate on the CacheManager level, and therefore they won’t share the same data. This is useful where
multiple applications might share the same Hazelcast JCache implementation, e.g., on application or OSGi
servers, but are developed by independent teams. To prevent interference between caches using the same
name, every application can use its own namespace when retrieving the CacheManager.
That way both applications share the same HazelcastInstance instance but not the same caches.
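The effect of namespacing can be pictured with a small JDK-only model. In this sketch, SharedInstance and NamespacedManager are hypothetical classes, and the way the namespace is combined with the cache name is purely illustrative. It does not reproduce the naming scheme Hazelcast uses internally.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one shared "instance", several managers keyed by namespace URI.
class SharedInstance {
    // Full cache name -> cache contents; stands in for the shared HazelcastInstance.
    final Map<String, Map<String, String>> caches = new HashMap<>();
}

class NamespacedManager {
    private final SharedInstance instance;
    private final URI namespace;

    NamespacedManager(SharedInstance instance, URI namespace) {
        this.instance = instance;
        this.namespace = namespace;
    }

    // The separator and layout of the full name are illustrative only.
    Map<String, String> getCache(String name) {
        String fullName = namespace + "/" + name;
        return instance.caches.computeIfAbsent(fullName, key -> new HashMap<>());
    }
}
```

Two managers created with URI.create("application-1") and URI.create("application-2") over the same SharedInstance can both ask for a cache named "users" and still receive separate maps.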
Besides Scoping to Join Clusters and Namespacing, which are implemented using the URI feature of
the specification, all other extended operations require you to retrieve the
com.hazelcast.cache.ICache interface instance from the JCache javax.cache.Cache instance. For
Hazelcast, both interfaces are implemented on the same object instance. It is recommended that
you use the method defined by the specification (javax.cache.Cache.unwrap()) to retrieve the ICache
version, since ICache might be subject to change without notification.
To retrieve or unwrap the ICache instance, you can execute the following code example:
CachingProvider cachingProvider = Caching.getCachingProvider();
CacheManager cacheManager = cachingProvider.getCacheManager();
Cache<Object, Object> cache = cacheManager.getCache( ... );
After unwrapping the Cache instance into an ICache instance, you have access to all of the following
operations, e.g., ICache Async Methods and ICache Convenience Methods.
As mentioned in the JCache Declarative Configuration section, the Hazelcast ICache extension offers
additional configuration properties over the default JCache configuration. These additional
properties include internal storage format, backup counts, eviction policy and split-brain protection
reference.
The declarative configuration for ICache is a superset of the previously discussed JCache
configuration:
<hazelcast>
    ...
    <cache>
        <!-- ... default cache configuration goes here ... -->
        <backup-count>1</backup-count>
        <async-backup-count>1</async-backup-count>
        <in-memory-format>BINARY</in-memory-format>
        <eviction size="10000" max-size-policy="ENTRY_COUNT" eviction-policy="LRU" />
        <partition-lost-listeners>
            <partition-lost-listener>CachePartitionLostListenerImpl</partition-lost-listener>
        </partition-lost-listeners>
        <split-brain-protection-ref>split-brain-protection-name</split-brain-protection-ref>
        <disable-per-entry-invalidation-events>true</disable-per-entry-invalidation-events>
    </cache>
    ...
</hazelcast>
• backup-count: Number of synchronous backups. Those backups are executed before the
mutating cache operation is finished. The mutating operation is blocked. Its default value is 1.
• in-memory-format: Internal storage format. For more information, see the in-memory format
section. Its default value is BINARY.
• eviction: Defines the used eviction strategies and sizes for the cache. For more information on
eviction, see the JCache Eviction section.
◦ size: Maximum number of records or maximum size in bytes, depending on the
max-size-policy property. Size can be any integer between 0 and Integer.MAX_VALUE. The default
max-size-policy is ENTRY_COUNT and its default size is 10000.
◦ max-size-policy: Policy that determines how the maximum size is measured. If the maximum size
is reached, the cache is evicted based on the eviction policy. The default max-size-policy is
ENTRY_COUNT and its default size is 10000. The following maximum size policies are available:
▪ ENTRY_COUNT: Maximum number of the entries in cache. Based on this number, Hazelcast
calculates an approximate maximum size for each partition. See the Eviction Algorithm
section for more details. Available on heap based cache record store only.
▪ USED_NATIVE_MEMORY_SIZE: Maximum used native memory size in megabytes per cache for
each Hazelcast instance. Available on High-Density Memory cache record store only.
◦ eviction-policy: Eviction policy that compares values to find the best matching eviction
candidate. Its default value is LRU.
▪ LRU: Least Recently Used - finds the best eviction candidate based on the lastAccessTime.
▪ LFU: Least Frequently Used - finds the best eviction candidate based on the number of hits.
• partition-lost-listeners : Defines listeners for dispatching partition lost events for the cache.
For more information, see the ICache Partition Lost Listener section.
• split-brain-protection-ref : Name of the split-brain protection configuration that you want this
cache to use.
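To make the difference between the LRU and LFU eviction policies described above concrete, the following JDK-only sketch picks a candidate from a small sample of records. RecordInfo and EvictionCandidateSketch are hypothetical names. The sketch shows only the comparison step and does not model how Hazelcast samples entries or sizes partitions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical record metadata; the field names follow the prose, not a Hazelcast class.
class RecordInfo {
    final String key;
    final long lastAccessTime;
    final long hits;

    RecordInfo(String key, long lastAccessTime, long hits) {
        this.key = key;
        this.lastAccessTime = lastAccessTime;
        this.hits = hits;
    }
}

class EvictionCandidateSketch {
    // LRU: the record with the oldest lastAccessTime is the best candidate.
    static RecordInfo lru(List<RecordInfo> sample) {
        return sample.stream()
                .min(Comparator.comparingLong((RecordInfo r) -> r.lastAccessTime))
                .orElse(null);
    }

    // LFU: the record with the fewest hits is the best candidate.
    static RecordInfo lfu(List<RecordInfo> sample) {
        return sample.stream()
                .min(Comparator.comparingLong((RecordInfo r) -> r.hits))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<RecordInfo> sample = Arrays.asList(
                new RecordInfo("a", 10, 5),
                new RecordInfo("b", 20, 1),
                new RecordInfo("c", 30, 9));
        System.out.println("LRU candidate: " + lru(sample).key);
        System.out.println("LFU candidate: " + lfu(sample).key);
    }
}
```

In this sample, record "a" was accessed longest ago and record "b" has the fewest hits, so the two policies choose different candidates.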
As another addition of Hazelcast ICache over the normal JCache specification, Hazelcast provides
asynchronous versions of almost all methods, returning a java.util.concurrent.CompletionStage. By
using these methods and the returned objects, you can use JCache in a reactive way by registering
dependent computation stages on the returned CompletionStage to prevent blocking the current
thread.
The asynchronous versions of the methods append the phrase Async to the method name. The
example code below uses the method putAsync().
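The chaining style can be tried without a cluster. The following minimal sketch uses only the JDK: AsyncCacheSketch and its putAsync method are hypothetical stand-ins backed by CompletableFuture, not the Hazelcast ICache class. They are meant only to show how a dependent stage is registered on the returned CompletionStage instead of blocking.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a cache with an asynchronous put; not the ICache class.
class AsyncCacheSketch<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    // Returns immediately; the stage completes once the value is stored.
    CompletionStage<Void> putAsync(K key, V value) {
        return CompletableFuture.runAsync(() -> store.put(key, value));
    }

    V get(K key) {
        return store.get(key);
    }
}

class AsyncCacheDemo {
    public static void main(String[] args) {
        AsyncCacheSketch<Integer, String> cache = new AsyncCacheSketch<>();
        // Register a dependent stage that runs after the put has completed.
        CompletionStage<String> readBack = cache.putAsync(1, "alice")
                .thenApply(ignored -> cache.get(1));
        // Block only at the very end, so the demo can print the result.
        System.out.println(readBack.toCompletableFuture().join());
    }
}
```

With the real ICache, the same thenApply or thenAccept calls would be attached to the stage returned by the cache's own Async methods.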
• get(key):
◦ getAsync(key)
◦ getAsync(key, expiryPolicy)
• put(key, value):
◦ putAsync(key, value)
◦ putAsync(key, value, expiryPolicy)
• putIfAbsent(key, value):
◦ putIfAbsentAsync(key, value)
◦ putIfAbsentAsync(key, value, expiryPolicy)
• getAndPut(key, value):
◦ getAndPutAsync(key, value)
◦ getAndPutAsync(key, value, expiryPolicy)
• remove(key):
◦ removeAsync(key)
• remove(key, value):
◦ removeAsync(key, value)
• getAndRemove(key):
◦ getAndRemoveAsync(key)
• replace(key, value):
◦ replaceAsync(key, value)
◦ replaceAsync(key, value, expiryPolicy)
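• replace(key, oldValue, newValue):
◦ replaceAsync(key, oldValue, newValue)
◦ replaceAsync(key, oldValue, newValue, expiryPolicy)
• getAndReplace(key, value):
◦ getAndReplaceAsync(key, value)
◦ getAndReplaceAsync(key, value, expiryPolicy)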
The methods with a given javax.cache.expiry.ExpiryPolicy are further discussed in the Defining a
Custom ExpiryPolicy section.
The JCache specification has an option to configure a single ExpiryPolicy per cache. The Hazelcast
ICache extension offers the possibility to define a custom ExpiryPolicy per key by providing a set of
method overloads with an expiryPolicy parameter, as in the list of asynchronous methods in the
ICache Async Methods section. This means that you can pass custom expiry policies to a cache operation.
To pass a custom ExpiryPolicy, a set of overloads is provided. You can use them as shown in the
following code example.
The ExpiryPolicy instance can be pre-created, cached and re-used, but only for each cache instance.
This is because ExpiryPolicy implementations can be marked as java.io.Closeable. The following
list shows the provided method overloads over javax.cache.Cache by com.hazelcast.cache.ICache
featuring the ExpiryPolicy parameter:
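• get(key):
◦ get(key, expiryPolicy)
• getAll(keys):
◦ getAll(keys, expiryPolicy)
• put(key, value):
◦ put(key, value, expiryPolicy)
• putAll(map):
◦ putAll(map, expiryPolicy)
• putIfAbsent(key, value):
◦ putIfAbsent(key, value, expiryPolicy)
• replace(key, value):
◦ replace(key, value, expiryPolicy)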
Asynchronous method overloads are not listed here. See the ICache Async Methods section for the
list of asynchronous method overloads.
ICache also offers the setExpiryPolicy(key, expiryPolicy) method to associate certain keys with custom
expiry policies. Per-key expiry policies defined by this method take precedence over cache policies,
but they are overridden by the expiry policies specified in the overloaded methods mentioned above.
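The precedence described above can be sketched as a small lookup. This is an illustrative model only: the class ExpiryPrecedenceSketch and its method effectivePolicy() are hypothetical names, and a java.time.Duration stands in for a full ExpiryPolicy to keep the example self-contained.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the precedence rule; names are illustrative, not Hazelcast API.
final class ExpiryPrecedenceSketch {
    private final Duration cachePolicy;                           // cache-wide policy
    private final Map<String, Duration> perKey = new HashMap<>(); // per-key policies

    ExpiryPrecedenceSketch(Duration cachePolicy) {
        this.cachePolicy = cachePolicy;
    }

    // Models setExpiryPolicy(key, expiryPolicy): associates a key with its own policy.
    void setExpiryPolicy(String key, Duration policy) {
        perKey.put(key, policy);
    }

    // perCall is null when the plain (non-overloaded) cache method is used.
    Duration effectivePolicy(String key, Duration perCall) {
        if (perCall != null) {
            return perCall;                           // 1. policy passed to the overload
        }
        return perKey.getOrDefault(key, cachePolicy); // 2. per-key policy, 3. cache policy
    }
}
```

In this model, a policy passed with the call wins, then the per-key policy, and the cache-wide policy applies only when neither is present.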
Caches are generally not expected to grow to an infinite size. Implementing an expiry policy is one
way you can prevent infinite growth, but sometimes it is hard to define a meaningful expiration
timeout. Therefore, Hazelcast JCache provides the eviction feature. Eviction offers the possibility of
removing entries based on the cache size or amount of used memory (Hazelcast IMDG Enterprise
Only) and not based on timeouts.
Since a cache is designed for high throughput and fast reads, Hazelcast put a lot of effort into
designing the eviction system to be as predictable as possible. All built-in implementations provide
an amortized O(1) runtime. The default operation runtime is rendered as O(1), but it can be faster
than the normal runtime cost if the algorithm finds an expired entry while sampling.
Cache Types
Most importantly, typical production systems have two common types of caches:
• Reference Caches: Caches for reference data are normally small and are used as lookup tables
to speed up de-referencing. These caches tend to contain a previously known, fixed number of
elements, e.g., states of the USA or abbreviations of elements.
• Active DataSet Caches: The other type of cache normally holds an active data set. These
caches run to their maximum size and evict the oldest or infrequently used entries to stay within
memory bounds. They sit in front of a database or HTML generators to cache the latest
requested data.
Hazelcast JCache eviction supports both types of caches using a slightly different approach based
on the configured maximum size of the cache. For detailed information, see the Eviction Algorithm
section.
Configuring Eviction Policies
Hazelcast JCache provides two commonly known eviction policies, LRU and LFU, but loosens the
rules for predictable runtime behavior. LRU, normally recognized as Least Recently Used, is
implemented as Less Recently Used, and LFU, known as Least Frequently Used, is implemented as
Less Frequently Used. The details about this difference are explained in the Eviction Algorithm
section.
Eviction Policies are configured by providing the corresponding abbreviation to the configuration
as shown in the ICache Configuration section. As already mentioned, two built-in policies are
available:
And to configure the use of the LFU (Less Frequently Used) policy:
The default eviction policy is LRU. Therefore, Hazelcast JCache does not offer the possibility of
performing no eviction.
Besides the out-of-the-box eviction policies LFU and LRU, you can also specify your custom eviction
policies through the eviction configuration either programmatically or declaratively.
Here is an example of a custom eviction policy comparator implementation for JCache:
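public class MyCacheEvictionPolicyComparator
        implements CacheEvictionPolicyComparator<Long, String> {
    @Override
    public int compare(CacheEntryView<Long, String> e1, CacheEntryView<Long, String> e2) {
        long id1 = e1.getKey();
        long id2 = e2.getKey();
        // Assumed ordering: a negative result marks e1 as the better eviction
        // candidate, so the entry with the larger key is evicted first.
        return Long.compare(id2, id1);
    }
}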
A custom eviction policy comparator can be specified through the eviction configuration, either by
giving the full class name of the EvictionPolicyComparator (CacheEvictionPolicyComparator for JCache
and its Near Cache) implementation or by specifying an instance of it.
Programmatic Configuration:
You can specify the custom EvictionPolicyComparator (CacheEvictionPolicyComparator for JCache and
its Near Cache) instance itself directly through EvictionConfig.
CacheConfig cacheConfig = new CacheConfig();
...
EvictionConfig evictionConfig =
new EvictionConfig(50000,
MaxSizePolicy.ENTRY_COUNT,
new MyEvictionPolicyComparator());
cacheConfig.setEvictionConfig(evictionConfig);
Declarative Configuration:
<hazelcast>
...
<cache name="cacheWithCustomEvictionPolicyComparator">
<eviction size="50000" max-size-policy="ENTRY_COUNT" comparator-class-name=
"com.mycompany.MyEvictionPolicyComparator"/>
</cache>
...
</hazelcast>
<hz:cache name="cacheWithCustomEvictionPolicyComparator">
<hz:eviction size="50000" max-size-policy="ENTRY_COUNT" comparator-class-name=
"com.mycompany.MyEvictionPolicyComparator"/>
</hz:cache>
You can specify the custom EvictionPolicyComparator (CacheEvictionPolicyComparator for JCache and
its Near Cache) bean in the <eviction> tag by referencing it through the comparator-bean attribute
in the Hazelcast Spring configuration XML file:
<hz:cache name="cacheWithCustomEvictionPolicyComparator">
<hz:eviction size="50000" max-size-policy="ENTRY_COUNT" comparator-bean=
"myEvictionPolicyComparatorBean"/>
</hz:cache>
Eviction Strategy
Eviction strategies implement the logic of selecting one or more eviction candidates from the
underlying storage implementation and passing them to the eviction policies. Hazelcast JCache
provides an amortized O(1) cost implementation for this strategy, which selects a fixed number of
samples from the partition that it is executed against.
Eviction Algorithm
The Hazelcast JCache eviction algorithm is specially designed for the use case of high performance
caches and with predictability in mind. The built-in implementations provide an amortized O(1)
runtime and therefore a highly predictable runtime behavior that does not rely on any kind of
background thread to handle the eviction. To achieve this, the algorithm makes some assumptions
to prevent network operations and concurrent accesses.
As an explanation of how the algorithm works, let’s examine the following flowchart step by step.
1. A new cache is created. Without any special settings, the eviction is configured to kick in when
the cache exceeds 10,000 elements and an LRU (Less Recently Used) policy is set up.
3. For every put, the eviction strategy evaluates the current cache size and decides if an eviction is
necessary or not. If not, the entry is stored in step 10.
4. If eviction is required, a new sampling is started. The built-in sampler is implemented as a lazy
iterator.
5. The sampling algorithm selects a random sample from the underlying data storage.
6. The eviction strategy tests whether the sampled entry is already expired (lazy expiration). If
expired, the sampling stops and the entry is removed in step 9.
7. If not yet expired, the entry (eviction candidate) is compared to the last best matching candidate
(based on the eviction policy) and the new best matching candidate is remembered.
8. The sampling is repeated 15 times and then the best matching eviction candidate is returned to
the eviction strategy.
9. The expired or best matching eviction candidate is removed from the underlying data storage.
Note that expiration based eviction does not only occur for the above scenario
(Step 6). It is mentioned for the sake of explaining the eviction algorithm.
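The sampling steps above can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not Hazelcast's implementation: the class SamplingEvictionSketch, its Entry type and the list-based store are hypothetical, sampling is done with replacement for brevity, and only the LRU comparison is shown.

```java
import java.util.List;
import java.util.Random;

// Simplified model of one partition's record store; not Hazelcast's implementation.
final class SamplingEvictionSketch {
    static final int SAMPLE_COUNT = 15; // number of samples named in step 8

    static final class Entry {
        final String key;
        final long lastAccessTime;
        final long expirationTime;

        Entry(String key, long lastAccessTime, long expirationTime) {
            this.key = key;
            this.lastAccessTime = lastAccessTime;
            this.expirationTime = expirationTime;
        }

        boolean isExpired(long now) {
            return expirationTime <= now;
        }
    }

    // Returns the entry to remove: the first expired sample, or otherwise the
    // least recently accessed entry among the samples (not among all entries).
    static Entry selectCandidate(List<Entry> store, long now, Random random) {
        Entry best = null;
        for (int i = 0; i < SAMPLE_COUNT && !store.isEmpty(); i++) {
            Entry sample = store.get(random.nextInt(store.size())); // step 5
            if (sample.isExpired(now)) {                             // step 6
                return sample;
            }
            if (best == null || sample.lastAccessTime < best.lastAccessTime) {
                best = sample;                                       // step 7
            }
        }
        return best;                                                 // step 8
    }

    // Steps 3, 4 and 9: evict only when the store is full, then store the entry.
    static void put(List<Entry> store, Entry entry, int maxSize, long now, Random random) {
        if (store.size() >= maxSize) {
            store.remove(selectCandidate(store, now, random));
        }
        store.add(entry);
    }
}
```

Because only a fixed number of samples is compared, the chosen entry is a less recently used one rather than necessarily the least recently used one, which is the trade-off that keeps the cost per put bounded.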
As seen in the flowchart, the general eviction operation is easy. As long as the cache does not reach
its maximum capacity, or you execute updates (put/replace), no eviction is executed.
To prevent network operations and concurrent access, as mentioned earlier, the cache size is
estimated based on the size of the currently handled partition. Because partitions can be
imbalanced, a single partition might start to evict earlier than the other partitions.
As mentioned in the Cache Types section, typically two types of caches are found in the production
systems. For small caches, referred to as Reference Caches, the eviction algorithm has a special set
of rules depending on the maximum configured cache size. See the Reference Caches section for
details. The other type of cache is referred to as an Active DataSet Cache, which in most cases
makes heavy use of the eviction to keep the most active data set in the memory. Those kinds of
caches use a very simple but efficient way to estimate the cluster-wide cache size.
All of the following calculations have a well known set of fixed variables:
Reference Caches
A Reference Cache is typically small and the number of elements to store in the reference caches is
normally known prior to creating the cache. Typical examples of reference caches are lookup tables
for abbreviations or the states of a country. They tend to have a fixed but small element number
and the eviction is an unlikely event and rather undesirable behavior.
Since an imbalanced partition is a worse problem in small and mid-sized caches than in caches
with millions of entries, the normal estimation rule (discussed below) is not applied to these
kinds of caches. To prevent unwanted eviction on small and mid-sized caches, Hazelcast
implements a special set of rules to estimate the cluster size.
To adjust the imbalance of partitions as found in the typical runtime, the actual calculated
maximum cache size (known as the eviction threshold) is slightly higher than the user defined size.
That means more elements can be stored into the cache than expected by the user. This needs to be
taken into account especially for large objects, since those can easily exceed the expected memory
consumption!
Small caches:
If a cache is configured with no more than 4,000 elements, this cache is considered to be a small
cache. The actual partition size is derived from the number of elements (GlobalCapacity) and the
deviation using the following formula:
This formula ends up with big partition sizes which, summed up, exceed the expected maximum
cache size (set by the user). Since the small caches typically have a well known maximum number
of elements, this is not a big issue. Only if the small caches are used for a use case other than as a
reference cache, this needs to be taken into account.
Mid-sized caches
A mid-sized cache is defined as a cache with a maximum number of elements that is bigger than
4,000 but not bigger than 1,000,000 elements. The calculation of mid-sized caches is similar to that
of the small caches but with a different multiplier. To calculate the maximum number of elements
per partition, the following formula is used:
For large caches, where the maximum cache size is bigger than 1,000,000 elements, no additional
calculation is needed. The maximum partition size is considered to be equal to
BalancedPartitionSize since statistically big partitions are expected to almost balance themselves.
Therefore, the formula is as simple as the following:
MaxPartitionSize := BalancedPartitionSize
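The three sizing tiers can be sketched as one function. Only the tier boundaries and the large-cache rule come from the text above. The definitions BalancedPartitionSize = GlobalCapacity / PartitionCount and Deviation = sqrt(BalancedPartitionSize), and the multipliers 5 (small) and 3 (mid-sized), are assumptions made for this illustration and should be checked against the formulas of your Hazelcast version. The class and method names are hypothetical.

```java
// Sketch of the three sizing tiers; the deviation term and both multipliers are assumptions.
final class MaxPartitionSizeSketch {
    static final long SMALL_CACHE_LIMIT = 4_000;
    static final long MID_CACHE_LIMIT = 1_000_000;
    static final double SMALL_MULTIPLIER = 5.0; // assumed value
    static final double MID_MULTIPLIER = 3.0;   // assumed value

    static double maxPartitionSize(long globalCapacity, int partitionCount) {
        // Number of elements per partition if all partitions were perfectly balanced.
        double balanced = (double) globalCapacity / partitionCount;
        // Assumed approximation of the per-partition deviation.
        double deviation = Math.sqrt(balanced);
        if (globalCapacity <= SMALL_CACHE_LIMIT) {
            return deviation * SMALL_MULTIPLIER + balanced; // small caches
        }
        if (globalCapacity <= MID_CACHE_LIMIT) {
            return deviation * MID_MULTIPLIER + balanced;   // mid-sized caches
        }
        return balanced; // large caches: MaxPartitionSize := BalancedPartitionSize
    }
}
```

Under these assumptions, small and mid-sized caches get a per-partition threshold above the balanced size, so the summed thresholds exceed the user-defined maximum, which matches the behavior described for reference caches.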
The estimation algorithm is based on the previously calculated maximum partition size (see the
Reference Caches and Active DataSet Caches sections) and is calculated against the current
partition only.
The algorithm to estimate the number of stored entries in the cache (cluster-wide) and decide if
eviction is necessary is shown in the following pseudo-code example:
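A minimal sketch of such a check, assuming the decision is a plain comparison of the current partition's entry count with the previously calculated maximum partition size. The cluster-wide estimate shown here (current partition size multiplied by the partition count) is also an assumption made for illustration, and the class and method names are hypothetical.

```java
// Hypothetical helper; works on the current partition only, without network calls.
final class EvictionCheckSketch {
    // Assumed estimate: treat every partition as holding as many entries as this one.
    static long estimatedClusterSize(long currentPartitionSize, int partitionCount) {
        return currentPartitionSize * partitionCount;
    }

    // Eviction is required once this partition reaches its maximum size.
    static boolean requiresEviction(long currentPartitionSize, double maxPartitionSize) {
        return currentPartitionSize >= maxPartitionSize;
    }
}
```

Because the check looks at a single partition, a partition that holds more than its share of entries starts evicting before the cache as a whole reaches the configured size.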
The Hazelcast JCache implementation supports a local Near Cache for remotely stored entries to
increase the performance of local read operations. See the Near Cache section for a detailed
explanation of the Near Cache feature and its configuration.
Near Cache for JCache is only available for clients, NOT members.
In addition to the operations explained in ICache Async Methods and Defining a Custom
ExpiryPolicy, Hazelcast ICache also provides a set of convenience methods. These methods are not
part of the JCache specification.
• destroy(): Destroys the cache and removes its data. This differs from the method
javax.cache.Cache.close(), which closes the cache so that no further operational methods
(get, put, remove, etc.; see Section 4.1.6 of the JCache Specification) can be executed on it.
With close(), the data is not necessarily destroyed: if you get the same Cache again
from the same CacheManager, the data is still there. With destroy(), both the cache and
its data are removed.
See the ICache Javadoc to see all the methods provided by ICache.
Another feature, especially interesting for distributed environments like Hazelcast, is the JCache
specified javax.cache.processor.EntryProcessor. For more general information, see the
Implementing EntryProcessor section.
Since Hazelcast provides backups of cached entries on other members, the default way to backup
an object changed by an EntryProcessor is to serialize the complete object and send it to the backup
partition. This can be a huge network overhead for big objects.
To reduce this overhead, an EntryProcessor can be run on backup partitions to apply delta
changes to the backup entries.
The backup partition EntryProcessor can either be the currently running processor (by returning
this) or it can be a specialized EntryProcessor implementation (different from the currently
running one) that does different operations or leaves out operations, e.g., sending emails.
If we again take the EntryProcessor example from the demonstration application provided in the
Implementing EntryProcessor section, the changed code looks like the following snippet:
@Override
public User process( MutableEntry<Integer, User> entry, Object... arguments )
throws EntryProcessorException {
The processor creates or returns the EntryProcessor implementation to run on the backup partition
(in the example above, the same processor again).
For the backup runs, the returned value from the backup processor is ignored and
not returned to the user.
unwrappedCache.addPartitionLostListener(new CachePartitionLostListener() {
@Override
public void partitionLost(CachePartitionLostEvent event) {
System.out.println(event);
}
});
}
}
See the Partition Lost Listener section for more information about partition lost detection and
partition lost events.
14.7. Testing for JCache Specification Compliance
Hazelcast JCache is fully compliant with the JSR 107 TCK (Technology Compatibility Kit), and
therefore is officially a JCache implementation.
You can test Hazelcast JCache for compliance by executing the TCK. Follow the instructions
below:
• Change the properties in pom.xml as shown below. Alternatively, you can set the values of these
properties directly on the maven command line without editing any files as shown in the
command line example below.
• Run the TCK using the command mvn clean install. This runs the tests using an embedded
Hazelcast member.
<properties>
<jcache.version>1.1.1</jcache.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<CacheInvocationContextImpl>
javax.cache.annotation.impl.cdi.CdiCacheKeyInvocationContextImpl
</CacheInvocationContextImpl>
<domain-lib-dir>${project.build.directory}/domainlib</domain-lib-dir>
<domain-jar>domain.jar</domain-jar>
$ git clone https://github.com/jsr107/jsr107tck
(clones JSR107 TCK repository to local directory jsr107tck)
$ cd jsr107tck
See also the TCK 1.1.0 User Guide or TCK 1.0.0 User Guide for more information on the testing
instructions.
15.4. Integration with Spring
You can integrate Hazelcast with Spring. This section explains the configuration of Hazelcast
within the Spring context.
Supported versions are Spring 2.5 and higher; the latest tested Spring version is 4.3.
Some old versions of Spring may require minor changes in the Hazelcast
configuration. The code and configuration snippets provided in this section are
tested using Spring 4.3.
Classpath Configuration:
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast-all</artifactId>
<version>4.0.1</version>
</dependency>
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast-spring</artifactId>
<version>4.0.1</version>
</dependency>
If you use other build systems, you have to adjust the definition of dependencies to your needs.
Troubleshooting
When the Spring Integration JARs are not correctly installed in the Java classpath, you may see
either of the following exceptions:
org.xml.sax.SAXParseException; systemId: http://hazelcast.com/schema/spring/hazelcast-spring.xsd;
lineNumber: 2; columnNumber: 35; s4s-elt-character: Non-whitespace
characters are not allowed in schema elements other than 'xs:appinfo' and
'xs:documentation'. Saw '301 Moved Permanently'.
org.springframework.beans.factory.parsing.BeanDefinitionParsingException:
Configuration problem: Unable to locate Spring NamespaceHandler for XML schema
namespace [http://www.hazelcast.com/schema/spring]
In this case, please ensure that the required classes are in the classpath, as explained above.
Bean Declaration:
You can declare Hazelcast Objects using the default Spring beans namespace. Example code for a
Hazelcast Instance declaration is listed below.
Hazelcast has its own namespace, hazelcast, for bean definitions. You can add the namespace
declaration xmlns:hz="http://www.hazelcast.com/schema/spring" to the beans element in the
context file so that the hz namespace shortcut can be used in bean declarations.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hz="http://www.hazelcast.com/schema/spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-4.3.xsd
           http://www.hazelcast.com/schema/spring
           http://www.hazelcast.com/schema/spring/hazelcast-spring.xsd">
<hz:hazelcast id="instance">
<hz:config>
<hz:cluster name="dev" password="password"/>
<hz:network port="5701" port-auto-increment="false">
<hz:join>
<hz:multicast enabled="false"
multicast-group="224.2.2.3"
multicast-port="54327"/>
<hz:tcp-ip enabled="true">
<hz:members>10.10.1.2, 10.10.1.3</hz:members>
</hz:tcp-ip>
</hz:join>
</hz:network>
<hz:map name="map"
backup-count="2"
read-backup-data="true"
merge-policy="com.hazelcast.spi.merge.PassThroughMergePolicy">
<hz:eviction eviction-policy="NONE" size="0"/>
</hz:map>
</hz:config>
</hz:hazelcast>
<hz:client id="client">
<hz:cluster name="${cluster.name}" password="${cluster.password}" />
<hz:network connection-attempt-period="3000"
connection-timeout="1000"
redo-operation="true"
smart-routing="true">
<hz:member>10.10.1.2:5701</hz:member>
<hz:member>10.10.1.3:5701</hz:member>
</hz:network>
</hz:client>
• Hazelcast Supported Type Configurations and Examples
◦ map
◦ multiMap
◦ replicatedmap
◦ queue
◦ topic
◦ reliableTopic
◦ set
◦ list
◦ executorService
◦ durableExecutorService
◦ scheduledExecutorService
◦ ringbuffer
◦ cardinalityEstimator
◦ idGenerator
◦ flakeIdGenerator
◦ atomicLong
◦ atomicReference
◦ semaphore
◦ countDownLatch
◦ lock
<hz:map id="map" instance-ref="client" name="map" lazy-init="true" />
<hz:multiMap id="multiMap" instance-ref="instance" name="multiMap"
lazy-init="false" />
<hz:replicatedMap id="replicatedmap" instance-ref="instance"
name="replicatedmap" lazy-init="false" />
<hz:queue id="queue" instance-ref="client" name="queue"
lazy-init="true" depends-on="instance"/>
<hz:topic id="topic" instance-ref="instance" name="topic"
depends-on="instance, client"/>
<hz:reliableTopic id="reliableTopic" instance-ref="instance"
name="reliableTopic"/>
<hz:set id="set" instance-ref="instance" name="set" />
<hz:list id="list" instance-ref="instance" name="list"/>
<hz:executorService id="executorService" instance-ref="client"
name="executorService"/>
<hz:durableExecutorService id="durableExec" instance-ref="instance" name=
"durableExec"/>
<hz:scheduledExecutorService id="scheduledExec" instance-ref="instance" name=
"scheduledExec"/>
<hz:ringbuffer id="ringbuffer" instance-ref="instance" name="ringbuffer"/>
<hz:cardinalityEstimator id="cardinalityEstimator" instance-ref="instance" name
="cardinalityEstimator"/>
<hz:idGenerator id="idGenerator" instance-ref="instance"
name="idGenerator"/>
<hz:flakeIdGenerator id="flakeIdGenerator" instance-ref="instance"
name="flakeIdGenerator"/>
<hz:atomicLong id="atomicLong" instance-ref="instance" name="atomicLong"/>
<hz:atomicReference id="atomicReference" instance-ref="instance"
name="atomicReference"/>
<hz:semaphore id="semaphore" instance-ref="instance" name="semaphore"/>
<hz:countDownLatch id="countDownLatch" instance-ref="instance"
name="countDownLatch"/>
<hz:lock id="lock" instance-ref="instance" name="lock"/>
For map-store, you should set either the class-name or the implementation attribute.
<hz:config id="config">
<hz:map name="map1">
<hz:map-store enabled="true" class-name="com.foo.DummyStore"
write-delay-seconds="0" />
<hz:near-cache time-to-live-seconds="0"
max-idle-seconds="60" invalidate-on-change="true" >
<hz:eviction eviction-policy="LRU" size="5000"/>
</hz:near-cache>
</hz:map>
<hz:map name="map2">
<hz:map-store enabled="true" implementation="dummyMapStore"
write-delay-seconds="0" />
</hz:map>
</hz:config>
You can mark Hazelcast Distributed Objects with @SpringAware if the object wants to apply:
• bean properties
Hazelcast Distributed ExecutorService, or more generally any Hazelcast managed object, can benefit
from these features. To enable SpringAware objects, you must first configure HazelcastInstance
using the hazelcast namespace as explained in Configuring Spring and add the <hz:spring-aware /> tag.
SpringAware Examples
• Configure a Hazelcast Instance via Spring Configuration and define someBean as Spring Bean.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:hz="http://www.hazelcast.com/schema/spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context-3.0.xsd
           http://www.hazelcast.com/schema/spring
           http://www.hazelcast.com/schema/spring/hazelcast-spring.xsd">
<context:component-scan base-package="..."/>
<hz:hazelcast id="instance">
<hz:config>
<hz:spring-aware />
<hz:cluster name="dev" password="password"/>
<hz:network port="5701" port-auto-increment="false">
<hz:join>
<hz:multicast enabled="false" />
<hz:tcp-ip enabled="true">
<hz:members>10.10.1.2, 10.10.1.3</hz:members>
</hz:tcp-ip>
</hz:join>
</hz:network>
...
</hz:config>
</hz:hazelcast>
• Create a class called SomeValue which contains Spring Bean definitions like ApplicationContext
and SomeBean.
@SpringAware
@Component("someValue")
@Scope("prototype")
public class SomeValue implements Serializable, ApplicationContextAware {
@Autowired
public void setSomeBean( SomeBean someBean) {
this.someBean = someBean;
}
@PostConstruct
public void init() {
someBean.doSomethingUseful();
init = true;
}
}
• Get SomeValue Object from Context and put it into Hazelcast Distributed Map on the first
member.
HazelcastInstance hazelcastInstance =
(HazelcastInstance) context.getBean( "instance" );
SomeValue value = (SomeValue) context.getBean( "someValue" );
IMap<String, SomeValue> map = hazelcastInstance.getMap( "values" );
map.put( "key", value );
• Read SomeValue Object from Hazelcast Distributed Map and assert that init method is called
since it is annotated with @PostConstruct.
HazelcastInstance hazelcastInstance =
(HazelcastInstance) context.getBean( "instance" );
IMap<String, SomeValue> map = hazelcastInstance.getMap( "values" );
SomeValue value = map.get( "key" );
Assert.assertTrue( value.init );
• Create a Callable Class called SomeTask which contains Spring Bean definitions like
ApplicationContext, SomeBean.
@SpringAware
public class SomeTask
implements Callable<Long>, ApplicationContextAware, Serializable {
@Autowired
public void setSomeBean( SomeBean someBean ) {
this.someBean = someBean;
}
}
• Submit SomeTask to two Hazelcast Members and assert that someBean is autowired.
HazelcastInstance hazelcastInstance =
(HazelcastInstance) context.getBean( "instance" );
SomeBean bean = (SomeBean) context.getBean( "someBean" );
Future<Long> f = hazelcastInstance.getExecutorService("executorService")
.submit(new SomeTask());
Assert.assertEquals(bean.value, f.get().longValue());
// choose a member
Member member = hazelcastInstance.getCluster().getMembers().iterator().next();
As of version 3.1, Spring Framework provides support for adding caching into an existing Spring
application. Spring 3.2 and later versions support JCache compliant caching providers. You can also
use JCache caching backed by Hazelcast if your Spring version supports JCache.
<hz:hazelcast id="instance">
...
</hz:hazelcast>
Hazelcast uses its Map implementation for the underlying cache. You can configure a map with your
cache’s name if you want to set additional configuration such as TTL.
<hz:hazelcast id="instance">
<hz:config>
...
You can define a timeout value for the get operations from your Spring cache. This may be useful
for some cases, such as SLA requirements. Hazelcast provides a property to specify this timeout:
hazelcast.spring.cache.prop. This can be specified as a Java property (using -D) or you can add this
property to your Spring properties file (usually named as application.properties).
hazelcast.spring.cache.prop=defaultReadTimeout=2,cache1=10,cache2=20
The argument defaultReadTimeout applies to all of your Spring caches. If you want to define
different timeout values for some specific Spring caches, you can provide them as a comma
separated list as shown in the above example usage. The values are in milliseconds. If you want to
have no timeout for a cache, simply set it to 0 or a negative value.
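To make the value format concrete, the following sketch parses a string such as the one in the example above and resolves the timeout for a given cache name. It is an illustrative parser only, not Hazelcast's own code: the class ReadTimeoutPropertySketch and its methods are hypothetical, and treating a missing defaultReadTimeout entry as "no timeout" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative parser for the property value format; not Hazelcast's own code.
final class ReadTimeoutPropertySketch {
    static final String DEFAULT_KEY = "defaultReadTimeout";

    // Parses e.g. "defaultReadTimeout=2,cache1=10,cache2=20" into name -> milliseconds.
    static Map<String, Long> parse(String property) {
        Map<String, Long> timeouts = new HashMap<>();
        for (String pair : property.split(",")) {
            String[] parts = pair.trim().split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected name=value but got: " + pair);
            }
            timeouts.put(parts[0].trim(), Long.parseLong(parts[1].trim()));
        }
        return timeouts;
    }

    // A per-cache value wins over the default; 0 is returned to mean "no timeout".
    static long timeoutFor(Map<String, Long> timeouts, String cacheName) {
        long value = timeouts.getOrDefault(cacheName, timeouts.getOrDefault(DEFAULT_KEY, 0L));
        return value > 0 ? value : 0L; // zero and negative values both disable the timeout
    }
}
```

For the example value, this model resolves cache1 to 10 milliseconds and any cache that is not listed to the default of 2 milliseconds.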
<hz:hazelcast id="instance">
...
</hz:hazelcast>
You can use JCache implementation in both member and client mode. A cache manager should be
bound to an instance. Instance can be referenced by instance-ref attribute or provided by
hazelcast.instance.name property which is passed to CacheManager. Instance should be specified
using one of these methods.
You can specify a URI for each cache manager with the uri attribute.
Annotation-Based Configuration does not require any XML definition. To perform Annotation-
Based Configuration:
@Configuration
@EnableCaching
public class CachingConfiguration extends CachingConfigurerSupport {
@Bean
public CacheManager cacheManager() {
ClientConfig config = new ClientConfig();
HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
return new com.hazelcast.spring.cache.HazelcastCacheManager(client);
}
@Bean
public KeyGenerator keyGenerator() {
return null;
}
}
For more information about Spring Cache, see Spring Cache Abstraction.
Code Sample: See the sample application for Hibernate 2nd Level Cache configuration.
If you are using Hibernate with Hazelcast as a second level cache provider, you can easily configure
your LocalSessionFactoryBean to use a Hazelcast instance by passing Hazelcast instance name. That
way, you can use the same HazelcastInstance as Hibernate L2 cache instance.
...
<bean id="sessionFactory"
class="org.springframework.orm.hibernate3.LocalSessionFactoryBean"
scope="singleton">
<property name="dataSource" ref="dataSource"/>
<property name="hibernateProperties">
<props>
...
<prop key="hibernate.cache.region.factory_class"
>com.hazelcast.hibernate.HazelcastLocalCacheRegionFactory</prop>
<prop key="hibernate.cache.hazelcast.instance_name">
${hz.instance.name}</prop>
</props>
</property>
...
</bean>
• com.hazelcast.hibernate.HazelcastLocalCacheRegionFactory
• com.hazelcast.hibernate.HazelcastCacheRegionFactory
See the Configuring RegionFactory section in the Hazelcast Hibernate GitHub repository for more
information.
Code Sample: See the sample application for Hazelcast Transaction Manager in our code samples
repository.
You can get rid of the boilerplate code to begin, commit or roll back transactions by using
HazelcastTransactionManager, which is a PlatformTransactionManager implementation to be used
with the Spring Transaction API.
...
<hz:hazelcast id="instance">
...
</hz:hazelcast>
...
<tx:annotation-driven transaction-manager="transactionManager"/>
<bean id="transactionManager" class=
"com.hazelcast.spring.transaction.HazelcastTransactionManager">
<constructor-arg ref="instance"/>
</bean>
<bean id="transactionalContext" class=
"com.hazelcast.spring.transaction.ManagedTransactionalTaskContext">
<constructor-arg ref="transactionManager"/>
</bean>
<bean id="YOUR_SERVICE" class="YOUR_SERVICE_CLASS">
<property name="transactionalTaskContext" ref="transactionalContext"/>
</bean>
...
@Transactional
public void transactionalPut(String key, String value) {
transactionalTaskContext.getMap("testMap").put(key, value);
}
...
}
After you mark your method as Transactional, either declaratively or by annotation, and access
the data structure through the TransactionalTaskContext, HazelcastTransactionManager begins,
commits or rolls back the transaction for you.
Spring tries to create a new Map/Collection instance and fill the new instance by iterating and
converting values of the original Map/Collection (IMap, IQueue, etc.) to required types when generic
type parameters of the original Map/Collection and the target property/attribute do not match.
Since Hazelcast Maps/Collections are designed to hold very large data which a single machine
cannot carry, iterating through whole values can cause out of memory errors.
To avoid this issue, the target property/attribute can be declared as un-typed Map/Collection as
shown below.
public class SomeBean {
@Autowired
IMap map; // instead of IMap<K, V> map
@Autowired
IQueue queue; // instead of IQueue<E> queue
...
}
Or, parameters of injection methods (constructor, setter) can be un-typed as shown below.
...
16. Storage
This chapter describes Hazelcast’s High-Density Memory Store and Hot Restart Persistence features
along with their configurations and gives recommendations on the storage sizing.
By default, data structures in Hazelcast store data on heap in serialized form for highest data
compaction; yet, these data structures are still subject to Java Garbage Collection (GC). Modern
hardware has much more available memory. If you want to make use of that hardware and scale
up by specifying higher heap sizes, GC becomes an increasing problem: the application faces long
GC pauses that make the application unresponsive. Also, you may get out of memory errors if you
fill your whole heap. Garbage collection, which is the automatic process that manages the
application’s runtime memory, often forces you into configurations where multiple JVMs with small
heaps (sizes of 2-4GB per member) run on a single physical hardware device to avoid garbage
collection pauses. This results in oversized clusters to hold the data and makes performance
requirements harder to meet.
In Hazelcast IMDG Enterprise HD, the High-Density Memory Store is Hazelcast’s new enterprise
in-memory storage solution. It solves garbage collection limitations so that applications can exploit
hardware memory more efficiently without the need of oversized clusters. High-Density Memory
Store is designed as a pluggable memory manager which enables multiple memory stores for
different data structures. These memory stores are all accessible by a common access layer that
scales up to terabytes of the main memory on a single JVM by minimizing the GC pressure.
High-Density Memory Store enables predictable application scaling and improves performance and
latency while minimizing garbage collection pauses.
This foundation includes, but is not limited to, storing keys and values next to the heap in a native
memory region.
High-Density Memory Store is currently provided for the following Hazelcast features and
implementations:
• Map
• JCache Implementation
• Near Cache
To use the High-Density memory storage, the native memory usage must be enabled using the
programmatic or declarative configuration. Also, you can configure its size, memory allocator type,
minimum block size, page size and metadata space percentage.
• size: Size of the total native memory to allocate in megabytes. Its default value is 512 MB.
• allocator type: Type of the memory allocator. Available values are as follows:
◦ STANDARD: This option is used internally by Hazelcast’s POOLED allocator type or for
debugging/testing purposes.
▪ With this option, the memory is allocated or deallocated using your operating system’s
default memory manager.
▪ It uses the GNU C Library’s standard malloc() and free() functions, which are subject to
contention on multithreaded/multicore systems.
▪ Memory operations may become slower when you perform a lot of small allocations and
deallocations.
▪ It may cause significant memory fragmentation, unless you use an allocator in the background
that emphasizes fragmentation avoidance, such as jemalloc. Note that significant memory
fragmentation can trigger the Linux Out of Memory Killer if there is no swap space
enabled in your system. Even if swap space is enabled, the killer can still be
triggered if there is not enough swap space left.
▪ If you still want to use the operating system’s default memory management, you can set
the allocator type to STANDARD in your native memory configuration.
◦ POOLED: This is the default option, Hazelcast’s own pooling memory allocator.
▪ With this option, memory blocks are managed using internal memory pools.
▪ It allocates memory blocks, each of which has a 4MB page size by default, and splits
them into chunks or merges them to create larger chunks when required. Sizing of these
chunks follows the buddy memory allocation algorithm, i.e., power-of-two sizing.
▪ It never frees memory blocks back to the operating system. It marks disposed memory
blocks as available to be used later, meaning that these blocks are reusable.
▪ Memory allocation and deallocation operations (except the ones requiring larger sizes
than the page size) mostly do not interact with the operating system.
▪ For memory allocation, it tries to find the requested memory size inside the internal
memory pools. If it cannot be found, then it interacts with the operating system.
• minimum block size: Minimum size of the blocks in bytes to split and fragment a page block to
assign to an allocation request. It is used only by the POOLED memory allocator. Its default
value is 16 bytes.
• page size: Size of the page in bytes to allocate memory as a block. It is used only by the POOLED
memory allocator. Its default value is 1 << 22 = 4194304 Bytes, about 4 MB.
• metadata space percentage: Defines the percentage of the allocated native memory that is
used for internal memory structures by the High-Density Memory Store for tracking the used and
available memory blocks. It is used only by the POOLED memory allocator. Its default value is
12.5. Note that when the memory runs out, you get a NativeOutOfMemoryException; if your
store has a large number of entries, you should consider increasing this percentage.
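The power-of-two sizing described for the POOLED allocator can be illustrated with a small, self-contained sketch. This is not Hazelcast's internal code: the class and method names (BuddySizing, blockSizeFor) are hypothetical, and the sketch only shows how a request could be rounded up to a power-of-two block between the default minimum block size (16 bytes) and the default page size (4 MB). Treating requests larger than a page as "outside the pool" is an assumption based on the description above.

```java
/** Illustrative only: rounds allocation requests to power-of-two block sizes. */
class BuddySizing {
    static final int MIN_BLOCK_SIZE = 16;   // default minimum block size in bytes
    static final int PAGE_SIZE = 1 << 22;   // default page size: 4194304 bytes

    /**
     * Returns the power-of-two block size that would serve the request, or -1 when
     * the request is larger than a page (assumption: not served from the pool).
     */
    static int blockSizeFor(int requestedBytes) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (requestedBytes > PAGE_SIZE) {
            return -1;
        }
        if (requestedBytes <= MIN_BLOCK_SIZE) {
            return MIN_BLOCK_SIZE;
        }
        // Smallest power of two that is greater than or equal to requestedBytes.
        return Integer.highestOneBit(requestedBytes - 1) << 1;
    }

    public static void main(String[] args) {
        for (int request : new int[] {10, 100, 4096, 5000}) {
            System.out.println(request + " bytes -> block of " + blockSizeFor(request));
        }
    }
}
```

Under these assumptions a 100-byte value occupies a 128-byte block, which is one reason to account for fragmentation headroom when sizing native memory.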
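Programmatic Configuration:
import com.hazelcast.config.NativeMemoryConfig;
import com.hazelcast.memory.MemorySize;
import com.hazelcast.memory.MemoryUnit;

// Enables a 512 MB native memory region managed by the POOLED allocator.
MemorySize memorySize = new MemorySize(512, MemoryUnit.MEGABYTES);
NativeMemoryConfig nativeMemoryConfig =
    new NativeMemoryConfig()
        .setAllocatorType(NativeMemoryConfig.MemoryAllocatorType.POOLED)
        .setSize(memorySize)
        .setEnabled(true)
        .setMinBlockSize(16)
        .setPageSize(1 << 20); // 1 MB pages; the default is 1 << 22 (4 MB)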
Declarative Configuration:
<hazelcast>
...
<native-memory allocator-type="POOLED" enabled="true">
<size unit="MEGABYTES" value="512"/>
<min-block-size>16</min-block-size>
<page-size>4194304</page-size>
<metadata-space-percentage>12.5</metadata-space-percentage>
<persistent-memory-directory>/mnt/persmemory/data</persistent-memory-directory>
</native-memory>
...
</hazelcast>
You can check whether there is enough free physical memory for the requested
number of bytes using the system property hazelcast.hidensity.check.freememory.
See the System Properties appendix on how to use Hazelcast system properties.
To support larger and more affordable storage for data structures like IMap, ICache and Near
Cache, Hazelcast provides integration with persistent memory technologies like Intel® Optane™ DC.
To benefit from the technology you do not need to make any changes in your application code. Only
a few configuration changes are required.
The following snippets demonstrate how to configure the persistent memory as High-Density
Memory Store in Hazelcast. The directory /mnt/optane/data is used as an example for Intel®
Optane™ DC.
Declarative Configuration:
<hazelcast>
...
<native-memory allocator-type="POOLED" enabled="true">
<size unit="GIGABYTES" value="100" />
<persistent-memory-directory>/mnt/optane/data</persistent-memory-directory>
</native-memory>
...
</hazelcast>
Programmatic Configuration:
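The snippet below is a minimal sketch of a programmatic equivalent of the declarative configuration above. It assumes the Hazelcast IMDG 4.0 NativeMemoryConfig API, in particular a setPersistentMemoryDirectory setter that mirrors the persistent-memory-directory element; verify the method against the Javadoc of your version. The class name PersistentMemoryConfigExample is hypothetical.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.NativeMemoryConfig;
import com.hazelcast.memory.MemorySize;
import com.hazelcast.memory.MemoryUnit;

class PersistentMemoryConfigExample {
    static Config build() {
        NativeMemoryConfig nativeMemoryConfig = new NativeMemoryConfig()
                .setEnabled(true)
                .setAllocatorType(NativeMemoryConfig.MemoryAllocatorType.POOLED)
                .setSize(new MemorySize(100, MemoryUnit.GIGABYTES));
        // Assumption: this setter corresponds to <persistent-memory-directory>.
        nativeMemoryConfig.setPersistentMemoryDirectory("/mnt/optane/data");

        Config config = new Config();
        config.setNativeMemoryConfig(nativeMemoryConfig);
        return config;
    }
}
```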
If you use only heap memory, each Hazelcast member with a 4 GB heap should accommodate a
maximum of 3.5 GB of total data (active and backup). If you use the High-Density Memory Store, up
to 75% of the configured physical memory footprint may be used for active and backup data, with
headroom of 25% for normal memory fragmentation. In both cases, however, you should also keep
some memory headroom available to handle any member failure or explicit member shutdown.
When a member leaves the cluster, the data previously owned by the newly offline member is
distributed among the remaining members. For this reason, we recommend that you plan to use
only 60% of available memory, with 40% headroom to handle member failure or shutdown.
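The rules of thumb above can be turned into a quick estimate. The following sketch is only arithmetic on the percentages quoted in this section; the class name SizingEstimate and the example figures (300 GB of entries, one backup copy, 100 GB of memory per member) are hypothetical.

```java
/** Illustrative capacity planning with the rules of thumb above (all sizes in GB). */
class SizingEstimate {
    static final int HD_USABLE_PERCENT = 75; // upper bound for data in High-Density Memory Store
    static final int PLANNED_PERCENT = 60;   // recommended target, leaving 40% headroom

    /** Upper bound of data (active plus backup) one member can hold. */
    static long maxDataGb(long memoryPerMemberGb) {
        return memoryPerMemberGb * HD_USABLE_PERCENT / 100;
    }

    /** Data (active plus backup) to plan for on one member. */
    static long plannedDataGb(long memoryPerMemberGb) {
        return memoryPerMemberGb * PLANNED_PERCENT / 100;
    }

    /** Members needed to hold the total data at the 60% target, rounded up. */
    static long membersNeeded(long totalDataGb, long memoryPerMemberGb) {
        long perMember = plannedDataGb(memoryPerMemberGb);
        if (perMember <= 0) {
            throw new IllegalArgumentException("memory per member is too small");
        }
        return (totalDataGb + perMember - 1) / perMember;
    }

    public static void main(String[] args) {
        long totalDataGb = 300 * 2; // hypothetical: 300 GB of entries plus one backup copy
        System.out.println("members needed: " + membersNeeded(totalDataGb, 100));
    }
}
```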
This chapter explains Hazelcast’s Hot Restart Persistence feature. It provides fast cluster restarts by
storing the states of the cluster members on the disk. This feature is currently provided for the
Hazelcast map data structure and Hazelcast JCache implementation.
Hot Restart Persistence enables you to get your cluster up and running swiftly after a cluster
restart. A restart can be caused by a planned shutdown (including rolling upgrades) or a sudden
cluster-wide crash, e.g., power outage. For Hot Restart Persistence, required states for Hazelcast
clusters and members are introduced. See the Managing Cluster and Member States section for
information on the cluster and member states. The purpose of Hot Restart Persistence is to
provide a maintenance window for member operations and to restart the cluster quickly. It is
not meant to recover from the catastrophic shutdown of a single member.
Hot Restart Persistence supports optional data encryption. See the Encryption at Rest section for
more information.
The Hot Restart feature is supported for the following restart types:
◦ The cluster is shut down completely and restarted with the exact same previous setup and
data.
The difference between explicitly changing the state to PASSIVE before shutdown and shutting down
the cluster directly via HazelcastInstance.getCluster().shutdown() is that, in the latter case,
the restarted cluster returns to the latest state it had before shutdown. That means,
if the cluster is ACTIVE before shutdown, the cluster state automatically becomes ACTIVE after
the restart is completed.
◦ Rolling restart: The cluster is restarted intentionally member by member. For example, this
could be done to install an operating system patch or new hardware.
To be able to shut down the cluster member by member as part of a planned restart, each
member in the cluster should be in the FROZEN or PASSIVE state. After the cluster state is
changed to FROZEN or PASSIVE, you can manually shut down each member by calling the
method HazelcastInstance.shutdown(). When that member is restarted, it rejoins the running
cluster. After all members are restarted, the cluster state can be changed back to ACTIVE.
• Restart after a cluster crash: The cluster is restarted after all its members crashed at the same
time due to a power outage, networking interruptions, etc.
16.3.3. Restart Process
During the restart process, each member waits to load data until all the members in the partition
table are started. During this process, no operations are allowed. Once all cluster members are
started, Hazelcast changes the cluster state to PASSIVE and starts to load data. When all data is
loaded, Hazelcast changes the cluster state to its previous known state before shutdown and starts
to accept the operations which are allowed by the restored cluster state.
If a member fails to either start, join the cluster in time (within the timeout), or load its data, then
that member is terminated immediately. After the problems causing the failure are fixed, that
member can be restarted. If the cluster start cannot be completed in time, then all members fail to
start. See the Configuring Hot Restart section for defining timeouts.
In the case of a restart after a cluster crash, the Hot Restart feature realizes that it was not a clean
shutdown and Hazelcast tries to restart the cluster with the last saved data following the process
explained above. In some cases, specifically when the cluster crashes while it has an ongoing
partition migration process, currently it is not possible to restore the last saved state.
• You have a cluster consisting of members A, B and C with Hot Restart enabled, which is initially
stable.
• Member B is killed.
• Member B restarts.
Since only a single member has failed, the cluster performed the standard High Availability routine
by recovering member B’s data from backups and redistributing the data among the remaining
members (the members A and C in this case). Member B’s persisted Hot Restart data is completely
irrelevant.
Furthermore, when a member starts with existing Hot Restart data, it expects to find itself within a
cluster that has been shut down as a whole and is now restarting as a whole. Since the reality is
that the cluster has been running all along, member B’s persisted cluster state does not match the
actual state. Depending on the automatic removal of stale data (auto-remove-stale-data)
configuration:
• If automatic removal is enabled, member B removes its stale Hot Restart data automatically and
joins the cluster as a completely new member.
• Otherwise, member B aborts the initialization and shuts down. To be able to join the cluster, the
Hot Restart directory previously used by member B inside the base directory (base-dir) must be
deleted manually.
A member can crash permanently and then be unable to recover from the failure. In that case, the
restart process cannot be completed since some of the members do not start or fail to load their
own data. You can then force the cluster to clean its persisted data and make a fresh start.
This process is called force start.
• Member A uses its Hot Restart data to initiate the Hot Restart procedure.
• Since it knows the cluster originally contained member B as well, it waits for it to join.
• Now you have the choice to Force Start the cluster without member B.
You can trigger the force start process using the Management Center, REST API and cluster
management scripts.
Please note that force start is a destructive process, which results in deletion of persisted Hot
Restart data.
See the Hot Restart functionality of the Management Center section to learn how you can perform a
force start using the Management Center.
When one or more members fail to start, have incorrect Hot Restart data (stale or corrupted data)
or fail to load their Hot Restart data, the cluster becomes incomplete and the restart mechanism
cannot proceed. One solution is to use Force Start and make a fresh start with the existing
members. Another solution is to do a partial start.
Partial start means that the cluster starts with an incomplete member set. Data belonging to the
missing members is assumed lost and Hazelcast tries to recover the missing data using the restored
backups. For example, if you have at least two backups configured for all maps and caches, then
a partial start with up to two missing members is safe against data loss. If there are more than
two missing members or there are maps/caches with fewer than two backups, then data loss is expected.
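The rule in the example above can be written as a simple check. This is a simplified sketch of that rule only, not a Hazelcast API: the class and method names are hypothetical, and it assumes the backup counts of all maps and caches are known.

```java
/** Illustrative check of the "backups versus missing members" rule for partial start. */
class PartialStartCheck {
    /**
     * Returns true when every map/cache has at least as many backups as there are
     * missing members, i.e. when a partial start is expected to be safe against data loss.
     */
    static boolean isSafeAgainstDataLoss(int missingMembers, int... backupCounts) {
        for (int backups : backupCounts) {
            if (backups < missingMembers) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two maps with two backups each, two members missing.
        System.out.println(isSafeAgainstDataLoss(2, 2, 2));
    }
}
```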
When partial start is enabled, Hazelcast can perform a partial start automatically or manually in
case some members are unable to restart successfully. Partial start proceeds automatically when
some members fail to start and join the cluster within validation-timeout-seconds. After the
validation-timeout-seconds duration has passed, Hot Restart performs a partial start with the
members present in the cluster. Moreover, partial start can be requested manually using the
Management Center, REST API and cluster management scripts before the validation-timeout-seconds
duration passes.
A partial start can also be triggered by failures during the data load phase. When
Hazelcast learns the data load results of all members which have passed the validation step, it
automatically performs a partial start with the ones which have successfully restored their Hot
Restart data. Note that partial start does not expect every member to succeed in the data load
step. It completes the process when it learns the data load result of every member and there is at
least one member which has successfully restored its Hot Restart data. Relatedly, if it cannot learn
the data load results of all members within the data-load-timeout-seconds duration, it proceeds
with the ones which have already completed the data load process.
The members that perform the partial start are selected among the live members according to the
cluster-data-recovery-policy configuration. The members which are not selected by the
cluster-data-recovery-policy are called excluded members and they are instructed to perform force
start. Excluded members are allowed to join the cluster only when they clean their Hot Restart data
and make a fresh start. This is a completely automatic process. For instance, if you start the
missing members after partial start is completed, they clean their Hot Restart data and join the
cluster.
Please note that partial start is a destructive process. Once it is completed, it cannot be repeated
with a new configuration. For this reason, you may need to perform the partial start process
manually. The automatic behavior of partial start relies on the validation-timeout-seconds and
data-load-timeout-seconds configuration values. If you need to control the process manually, the
validation-timeout-seconds and data-load-timeout-seconds properties can be set to very large values
so that Hazelcast cannot make progress on timeouts automatically. Then, the overall process can be
managed manually via the aforementioned methods, i.e., Management Center, REST API and cluster
management scripts.
You can configure the Hot Restart feature programmatically or declaratively. The configuration
consists of two steps:
1. Enabling and configuring the Hot Restart feature globally in your Hazelcast configuration: This
is done using the configuration element hot-restart-persistence. See the Global Hot Restart
Configuration section below.
2. Enabling and configuring the Hazelcast data structures to use the Hot Restart feature: This is
done using the configuration element hot-restart. See the Per Data Structure Hot Restart
Configuration section below.
This is where you configure the Hot Restart feature itself using the hot-restart-persistence
element. The following are the descriptions of its attribute and sub-elements:
• enabled: Attribute of the hot-restart-persistence element which specifies whether the feature is
globally enabled in your Hazelcast configuration. Set this attribute to true if you want any of
your data structures to use the Hot Restart feature.
• base-dir: Specifies the parent directory where the Hot Restart data is stored. The default value
for base-dir is hot-restart. You can use the default value, or you can specify the value of
another folder containing the Hot Restart configuration, but it is mandatory that the base-dir
element has a value. This directory is created automatically if it does not exist.
base-dir is used as the parent directory, and a unique Hot Restart directory is created inside
base-dir for each Hazelcast member which uses the same base-dir. That means, base-dir can be
shared among multiple Hazelcast members safely. This is especially useful for cloud
environments where the members generally use a shared filesystem.
When a Hazelcast member starts, it tries to acquire the ownership of the first available Hot
Restart directory inside the base-dir. If base-dir is empty or if the starting member fails to
acquire the ownership of any directory (happens when all the directories are already acquired
by other Hazelcast members), then it creates its own fresh directory.
Previously, base-dir was used only by a single Hazelcast member. If such
an existing base-dir is configured for a Hazelcast member, Hot Restart starts in
legacy mode and base-dir is used only by a single member, without creating a
unique sub-directory. Other members trying to use that base-dir fail during
startup.
• backup-dir: Specifies the directory under which Hot Restart snapshots (Hot Backups) are stored.
See the Hot Backup section for more information.
• parallelism: Level of parallelism in Hot Restart Persistence. There are this many I/O threads,
each writing in parallel to its own files. During the Hot Restart procedure, this many I/O threads
are reading the files and this many rebuilder threads are rebuilding the Hot Restart metadata.
The default value for this property is 1. This is a good default in most but not all cases. You
should measure the raw I/O throughput of your infrastructure and test with different values of
parallelism. In some cases, such as dedicated hardware, higher parallelism can yield more
Hot Restart throughput. In other cases, such as running on EC2, it can yield diminishing
returns: more thread scheduling, more contention on I/O and less efficient garbage collection.
• validation-timeout-seconds: Validation timeout for the Hot Restart process when validating the
cluster members expected to join and the partition table on the whole cluster.
• data-load-timeout-seconds: Data load timeout for the Hot Restart process. All members in the
cluster should finish restoring their local data before this timeout.
• cluster-data-recovery-policy: Specifies the data recovery policy that is respected during the Hot
Restart cluster start. Valid values are:
◦ FULL_RECOVERY_ONLY: Starts the cluster only when all expected members are present and
correct. Otherwise, it fails. This is the default value.
◦ PARTIAL_RECOVERY_MOST_RECENT: Starts the cluster with the members which have the most up-to-date
partition table and successfully restored their data. All other members leave the cluster
and force start themselves. If no member restores its data successfully, cluster start fails.
◦ PARTIAL_RECOVERY_MOST_COMPLETE: Starts the cluster with the largest group of members which
have the same partition table version and successfully restored their data. All other
members leave the cluster and force start themselves. If no member restores its data
successfully, cluster start fails.
• auto-remove-stale-data: Enables automatic removal of the stale Hot Restart data. When a
member terminates or crashes when the cluster state is ACTIVE, the remaining members
redistribute the data among themselves and the data persisted on terminated member’s storage
becomes stale. That terminated member cannot rejoin the cluster without removing Hot Restart
data. When auto-removal of stale Hot Restart data is enabled, while restarting that member, Hot
Restart data is automatically removed and it joins the cluster as a completely new member.
Otherwise, Hot Restart data should be removed manually.
• encryption-at-rest: Configures encryption on the Hot Restart data level. See the Encryption at
Rest section for more information.
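To make the three cluster-data-recovery-policy values described above more concrete, the following self-contained sketch models how the starting members could be chosen. It is a simplified model, not Hazelcast's implementation: the Member class, the select method and the tie-breaking rule are hypothetical, and "present and correct" is assumed to mean that all expected members restored their data with the same partition table version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Simplified model of cluster-data-recovery-policy; not Hazelcast's implementation. */
class RecoveryPolicySketch {
    enum Policy { FULL_RECOVERY_ONLY, PARTIAL_RECOVERY_MOST_RECENT, PARTIAL_RECOVERY_MOST_COMPLETE }

    /** Hypothetical view of one member after the data load phase. */
    static final class Member {
        final String name;
        final int partitionTableVersion;
        final boolean dataRestored;

        Member(String name, int partitionTableVersion, boolean dataRestored) {
            this.name = name;
            this.partitionTableVersion = partitionTableVersion;
            this.dataRestored = dataRestored;
        }
    }

    /** Returns the members that start the cluster; an empty list means the start fails. */
    static List<String> select(Policy policy, List<Member> present, int expectedMembers) {
        // Group members that restored their data by partition table version (ascending).
        TreeMap<Integer, List<String>> byVersion = new TreeMap<>();
        for (Member m : present) {
            if (m.dataRestored) {
                byVersion.computeIfAbsent(m.partitionTableVersion, v -> new ArrayList<>()).add(m.name);
            }
        }
        if (byVersion.isEmpty()) {
            return Collections.emptyList();
        }
        switch (policy) {
            case FULL_RECOVERY_ONLY: {
                // Assumption: everyone restored its data and agrees on one partition table.
                List<String> all = byVersion.firstEntry().getValue();
                boolean complete = byVersion.size() == 1 && all.size() == expectedMembers;
                return complete ? all : Collections.<String>emptyList();
            }
            case PARTIAL_RECOVERY_MOST_RECENT:
                return byVersion.lastEntry().getValue();
            default: {
                // PARTIAL_RECOVERY_MOST_COMPLETE; ties go to the newer version (assumption).
                List<String> best = Collections.emptyList();
                for (List<String> group : byVersion.values()) {
                    if (group.size() >= best.size()) {
                        best = group;
                    }
                }
                return best;
            }
        }
    }
}
```

In this model, the members that are not returned correspond to the excluded members, which force start themselves before rejoining.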
This is where you configure the data structures of your choice, so that they can have the Hot Restart
feature. This is done using the hot-restart configuration element. As explained in the
introduction, the Hot Restart feature is currently supported by the Hazelcast map data structure
and JCache implementation (map and cache), each of which has the hot-restart configuration
element. The following are the descriptions of this element’s attribute and sub-element:
• enabled: Attribute of the hot-restart element which specifies whether the Hot Restart feature is
enabled for the related data structure. Its default value is false.
• fsync: Turning on fsync guarantees that data is persisted to the disk device when a write
operation returns a successful response to the caller. By default, fsync is turned off (false).
That means data is persisted to the disk device eventually, instead of on every disk write. This
generally provides better performance.
The following are example configurations for a Hazelcast map and JCache implementation.
Declarative Configuration:
<hazelcast>
...
<hot-restart-persistence enabled="true">
<base-dir>/mnt/hot-restart</base-dir>
<parallelism>1</parallelism>
<validation-timeout-seconds>120</validation-timeout-seconds>
<data-load-timeout-seconds>900</data-load-timeout-seconds>
<cluster-data-recovery-policy>FULL_RECOVERY_ONLY</cluster-data-recovery-policy>
<auto-remove-stale-data>true</auto-remove-stale-data>
</hot-restart-persistence>
...
<map name="test-map">
<hot-restart enabled="true">
<fsync>false</fsync>
</hot-restart>
</map>
...
<cache name="test-cache">
<hot-restart enabled="true">
<fsync>false</fsync>
</hot-restart>
</cache>
...
</hazelcast>
Programmatic Configuration:
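The following is a minimal sketch of a programmatic equivalent of the declarative configuration above. It assumes the Hazelcast IMDG 4.x configuration classes (HotRestartPersistenceConfig, HotRestartConfig, MapConfig, CacheSimpleConfig); the wrapper class name HotRestartConfigExample is hypothetical.

```java
import com.hazelcast.config.CacheSimpleConfig;
import com.hazelcast.config.Config;
import com.hazelcast.config.HotRestartClusterDataRecoveryPolicy;
import com.hazelcast.config.HotRestartPersistenceConfig;
import com.hazelcast.config.MapConfig;
import java.io.File;

class HotRestartConfigExample {
    static Config build() {
        Config config = new Config();

        // Global settings, equivalent to the <hot-restart-persistence> element.
        HotRestartPersistenceConfig hotRestartPersistenceConfig = new HotRestartPersistenceConfig()
                .setEnabled(true)
                .setBaseDir(new File("/mnt/hot-restart"))
                .setParallelism(1)
                .setValidationTimeoutSeconds(120)
                .setDataLoadTimeoutSeconds(900)
                .setClusterDataRecoveryPolicy(HotRestartClusterDataRecoveryPolicy.FULL_RECOVERY_ONLY)
                .setAutoRemoveStaleData(true);
        config.setHotRestartPersistenceConfig(hotRestartPersistenceConfig);

        // Per data structure settings, equivalent to the <hot-restart> elements.
        MapConfig mapConfig = config.getMapConfig("test-map");
        mapConfig.getHotRestartConfig().setEnabled(true).setFsync(false);

        CacheSimpleConfig cacheConfig = config.getCacheConfig("test-cache");
        cacheConfig.getHotRestartConfig().setEnabled(true).setFsync(false);
        return config;
    }
}
```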
Configuring Hot Restart Store on Intel® Optane™ DC Persistent Memory
Hazelcast can be configured to use Intel® Optane™ DC Persistent Memory as the Hot Restart
directory. Using this persistent memory, you can improve the Hot Restart times by approximately
250%. You can find the configuration steps in the Hot Restart Store section of the Hazelcast IMDG
Operations and Deployment Guide.
After the Hazelcast member owning the Hot Restart data is shut down, the Hot Restart base-dir can be
copied/moved to a different server (which may have a different IP address and/or a different number
of CPU cores) and the Hazelcast member can be restarted using the existing Hot Restart data on that
new server. Having a new IP address does not affect Hot Restart, since it does not rely on the IP
address of the server but instead uses the member UUID as a unique identifier.
• Replacing one or more faulty servers with new ones easily without touching the remaining
cluster.
• Using Hot Restart on cloud environments easily. Sometimes cloud providers do not preserve
the IP addresses on restart or after shutdown. It is also possible to start up the whole cluster
on a different set of machines.
• Copying production data to the test environment, so that a more functional test cluster can be
set up.
Unfortunately, having a different number of CPU cores is not that straightforward. By default, the
number of Hazelcast partition threads is derived heuristically from the number of cores, e.g., # of
partition threads = # of CPU cores. When a Hazelcast member is started on a server with a different
CPU core count, the number of Hazelcast partition threads changes and that makes Hot Restart fail
during startup. The solution is to explicitly set the number of Hazelcast partition threads
(hazelcast.operation.thread.count system property) and the Hot Restart parallelism configuration,
and to use the same parameters on the new server. For setting system properties, see the System
Properties appendix.
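One way to keep the partition thread count stable across servers is to pin the system property before the member starts. The sketch below uses only the JDK; the class name PinThreadCount and the value 16 are hypothetical, and the property must be set before the Hazelcast member is created for it to take effect. The Hot Restart parallelism value still has to be kept identical in the member configuration.

```java
/** Illustrative only: pins the partition thread count before a member is started. */
class PinThreadCount {
    static final String PARTITION_THREADS = "hazelcast.operation.thread.count";

    /** Sets the property unless it was already supplied, e.g. with -D on the command line. */
    static String pin(int threads) {
        if (System.getProperty(PARTITION_THREADS) == null) {
            System.setProperty(PARTITION_THREADS, Integer.toString(threads));
        }
        return System.getProperty(PARTITION_THREADS);
    }

    public static void main(String[] args) {
        // Hypothetical value: the server that wrote the Hot Restart data used 16 threads.
        System.out.println(PARTITION_THREADS + "=" + pin(16));
        // Start the Hazelcast member only after this point.
    }
}
```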
Hazelcast’s Hot Restart Persistence uses the log-structured storage approach. The following is a
top-level design description:
• What is appended are facts about events that happened to the data model represented by the
store; either a new value was assigned to a key or a key was removed.
• Each record associated with a key makes stale the previous record that was associated with that
key.
• Stale records contribute to the amount of garbage present in the persistent storage.
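The design points above can be sketched with a tiny in-memory model. This is not the Hot Restart store itself: the class AppendOnlyLogSketch and its methods are hypothetical, the "file" is a plain list, and a removal is modeled as an appended record with a null value. The sketch only shows that every update is an append and that each new record for a key turns the previous one into garbage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative append-only log: updates are appended, older records become garbage. */
class AppendOnlyLogSketch {
    /** One appended fact: a value assigned to a key, or (value == null) the key removed. */
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Record> log = new ArrayList<>();              // the append-only "file"
    private final Map<String, Integer> liveIndex = new HashMap<>();  // key -> position of its live record
    private int garbage;                                             // number of stale records

    void put(String key, String value) {
        append(key, value);
    }

    void remove(String key) {
        append(key, null);
    }

    private void append(String key, String value) {
        // The new record will sit at position log.size(); remember it as the live one.
        Integer previous = liveIndex.put(key, log.size());
        if (previous != null) {
            garbage++; // the older record for this key is now stale
        }
        log.add(new Record(key, value));
    }

    String get(String key) {
        Integer position = liveIndex.get(key);
        return position == null ? null : log.get(position).value;
    }

    int recordCount() {
        return log.size();
    }

    int garbageCount() {
        return garbage;
    }
}
```

Nothing is ever overwritten in place in this model; reclaiming the stale records is left entirely to a separate garbage collection step.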
This kind of design focuses almost all of the system’s complexity into the garbage collection (GC)
process, stripping down the client’s operation to the bare necessity of guaranteeing persistent
behavior: a simple file append operation. Consequently, the latency of operations is close to the
theoretical minimum in almost all cases. Complications arise only during prolonged periods of
maximum load; this is where the details of the GC process begin to matter.
In order to maintain the lowest possible footprint in the update operation latency, the following
properties are built into the garbage collection process:
• A dedicated thread performs the GC. In Hazelcast terms, this thread is called the Collector and
the application thread is called the Mutator.
• The Collector keeps draining its work queue at all times, including the time it goes through the
GC cycle. Updates are taken into account at each stage in the GC cycle, preventing the copying of
already dead records into compacted files.
• All GC-induced I/O competes for the same resources as the Mutator’s update operations.
Therefore, measures are taken to minimize the impact of I/O done during GC:
◦ a heuristic scheme is employed which minimizes the number of bytes written to the disk for
each kilobyte of the reclaimed garbage
◦ measures are also taken to achieve a good interleaving of Collector and Mutator operations,
minimizing latency outliers perceived by the Mutator
The success of this scheme is subject to a bet on the Weak Generational Garbage Hypothesis, which
states that a new record entering the system is likely to become garbage soon. In other words, a key
updated now is more likely than average to be updated again soon.
The scheme was taken from the seminal Sprite LFS paper, Rosenblum, Ousterhout, The Design and
Implementation of a Log-Structured File System. The following is an outline of the paper:
• Data is not written to one huge file, but to many files of moderate size (8 MB) called "chunks".
• Garbage is collected incrementally, i.e. by choosing several chunks, then copying all their live
data to new chunks, then deleting the old ones.
• I/O is minimized using a collection technique which results in a bimodal distribution of chunks
with respect to their garbage content: most files are either almost all live data or they are all
garbage.
• The technique consists of two main principles:
Cost-Benefit Factor
1. The ratio of benefit (amount of garbage that can be collected) to I/O cost (amount of live data to
be written).
2. The age of the data in the chunk, measured as the age of the youngest record it contains.
The essence is in the second component: given equal amount of garbage in all chunks, it makes the
young ones less attractive to the Collector. Assuming the generational garbage hypothesis, this
allows the young chunks to quickly accumulate more garbage. On the flip side, it also ensures that
even files with little garbage are eventually garbage collected. This removes garbage which would
otherwise linger on, thinly spread across many chunk files.
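A hedged sketch of how the two components could be combined into one score is shown below. It assumes that the components are multiplied, in the spirit of the cost-benefit policy of the Sprite LFS paper; the exact formula used by Hazelcast is not given in this section, and the class and parameter names are hypothetical. The age is expressed in arbitrary time units.

```java
/** Illustrative cost-benefit score for choosing chunks to collect (assumed formula). */
class CostBenefitSketch {
    /**
     * Score = (garbage / live data) * age of the youngest record.
     * A chunk without live data costs nothing to collect, so it scores highest.
     */
    static double score(long garbageBytes, long liveBytes, long youngestRecordAge) {
        if (liveBytes == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (double) garbageBytes / liveBytes * youngestRecordAge;
    }

    public static void main(String[] args) {
        // Equal garbage content, different age: the older chunk is more attractive.
        System.out.println("young chunk: " + score(4, 4, 1));
        System.out.println("old chunk:   " + score(4, 4, 100));
    }
}
```

With equal garbage content, the young chunk scores lower and is left alone to accumulate more garbage, while an old chunk with little garbage still gains score over time and is eventually collected.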
Sorting records by age groups the young records together in a single chunk and does the same for
older records. Therefore, the chunks either tend to keep their data live for a longer time or
quickly become full of garbage.
In this section you can find performance test summaries which are results of benchmark tests
performed with a single Hazelcast member running on a physical server and on AWS R3.
We have tested a member which has an IMap with High-Density Data Store. Its data size is changed
for each test, ranging from 10 GB to 500 GB (each map entry has a value of 1 KB).
The tests investigate the write and read performance of Hot Restart Persistence and are performed
on HP ProLiant servers with RHEL 7 operating system using Hazelcast Simulator.
The following are the specifications of the server hardware used for the test:
• CPU: 2x Intel® Xeon® CPU E5-2687W v3 @ 3.10GHz – with 10 cores per processor. Total 20
cores, 40 with hyper threading enabled.
• Memory: 768GB 2133 MHz memory 24x HP 32GB 4Rx4 PC4-2133P-L Kit
The following are the storage media used for the test:
Performance on AWS R3
We have tested a member which has an IMap with High-Density Data Store:
• This map has 40 million distinct keys, each map entry is 1 KB.
The tests investigate the write and read performance of Hot Restart Persistence and are performed
on R3.2xlarge and R3.4xlarge EC2 instances using Hazelcast Simulator.
The following are the AWS storage types used for the test:
• Elastic Block Storage with Provisioned IOPS (IO1) (Provisioned 10,000 IOPS on a 340 GiB volume,
enabled EBS-optimized on instance)
16.3.11. Hot Backup
During Hot Restart operations you can take a snapshot of the Hot Restart Store at a certain point in
time. This is useful when you wish to bring up a new cluster with the same data or parts of the data.
The new cluster can then be used to share load with the original cluster, to perform testing, QA or
reproduce an issue on production data.
Simple file copying of a currently running cluster does not suffice and can produce inconsistent
snapshots with problems such as resurrection of deleted values or missing values.
To create snapshots you must first configure the Hot Restart backup directory. You can configure
the directory programmatically or declaratively using the following configuration element:
• backup-dir: This element is included in the hot-restart-persistence and denotes the destination
under which backups are stored. If this element is not defined, hot backup is disabled. If a
directory is defined which does not exist, it is created on the first backup. To avoid clashing data
on multiple backups, each backup has a unique sequence ID which determines the name of the
directory which contains all Hot Restart data. This unique directory is created as a subdirectory
of the configured backup-dir.
Declarative Configuration:
<hazelcast>
...
<hot-restart-persistence enabled="true">
<backup-dir>/mnt/hot-backup</backup-dir>
...
</hot-restart-persistence>
...
</hazelcast>
Programmatic Configuration:
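The following is a minimal sketch of a programmatic equivalent of the declarative configuration above, assuming the HotRestartPersistenceConfig API of Hazelcast IMDG 4.x. The wrapper class name HotBackupConfigExample is hypothetical.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.HotRestartPersistenceConfig;
import java.io.File;

class HotBackupConfigExample {
    static Config build() {
        HotRestartPersistenceConfig hotRestartPersistenceConfig = new HotRestartPersistenceConfig();
        hotRestartPersistenceConfig.setEnabled(true);
        // Equivalent to the <backup-dir> element; hot backup is disabled when it is not set.
        hotRestartPersistenceConfig.setBackupDir(new File("/mnt/hot-backup"));

        Config config = new Config();
        config.setHotRestartPersistenceConfig(hotRestartPersistenceConfig);
        return config;
    }
}
```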
Once configured, you can initiate a new backup via API or from the Management Center. The
backup is started transactionally and cluster-wide. This means that either all or none of the
members start the same backup. The member which receives the backup request determines a new
backup sequence ID and sends that information to all members. If all members respond that no
other backup is currently in progress and that no other backup request has already been made,
then the coordinating member commands the other members to start the actual backup process.
Each member then creates a directory under the configured backup-dir with the name
backup-<backupSeq> and starts copying the data from the original store.
The backup process is initiated nearly instantaneously on all members. Note that since there is no
limitation as to when the backup process is initiated, it may be initiated during membership
changes, partition table changes or normal data updates. Some of these operations may not
be fully completed yet, which means that some members back up some data while other
members back up a previous version of the same data. This is usually solved by the anti-entropy
mechanism on the new cluster which reconciles different versions of the same data. See
the Achieving High Consistency of Backup Data section for more information.
The duration of the backup process and its disk usage depend heavily on what is
supported by the system and on the configuration. See the Achieving high performance of
backup process section for more information on achieving better resource usage of the backup
process.
HotRestartService service = instance.getCluster().getHotRestartService();
service.backup();
The backupSeq is generated by the hot backup process, but you can define your own backup
sequences as shown below:
Keep in mind that the backup fails if any member contains a backup directory with the name
backup-<backupSeq>, where backupSeq is the given sequence.
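The naming rule and the "directory already exists" failure can be illustrated with plain java.nio.file calls. This is only a sketch of the rule as described here, not Hazelcast's implementation; the class BackupDirectories and its resolve method are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BackupDirectories {

    /** Returns backupDir/backup-<backupSeq>, refusing a sequence that is already in use. */
    static Path resolve(Path backupDir, long backupSeq) {
        Path target = backupDir.resolve("backup-" + backupSeq);
        if (Files.exists(target)) {
            // Mirrors the documented behavior: an existing directory makes the backup fail.
            throw new IllegalStateException("Backup directory already exists: " + target);
        }
        return target;
    }

    public static void main(String[] args) {
        // "/mnt/hot-backup" is the backup-dir used in the examples of this section.
        System.out.println(resolve(Paths.get("/mnt/hot-backup"), 42L));
    }
}
```

Choosing a sequence that no member has used before avoids the failure case.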
As mentioned in the previous section, the hot backup process creates subdirectories named
backup-<backupSeq> under the configured hot backup directory (i.e., backup-dir). When starting your
cluster with data from a hot backup, you need to set the base directory (i.e., base-dir) to the
desired backup subdirectory.
Let’s say you have configured your hot backup directory as follows:
<hazelcast>
...
<hot-restart-persistence enabled="true">
<backup-dir>/mnt/hot-backup</backup-dir>
...
</hot-restart-persistence>
...
</hazelcast>
And let’s say you have a subdirectory named backup-2018Oct24 under the backup directory
/mnt/hot-backup. When you want to start your cluster with data from this backup (backup-
2018Oct24), here is the configuration you should have for the base-dir while starting the cluster:
<hazelcast>
...
<hot-restart-persistence enabled="true">
<base-dir>backup-2018Oct24</base-dir>
<parallelism>1</parallelism>
</hot-restart-persistence>
...
<map name="test-map">
<hot-restart enabled="true">
<fsync>false</fsync>
</hot-restart>
</map>
...
</hazelcast>
The backup is initiated nearly simultaneously on all members, but you can still encounter some
inconsistencies in the data. Some members might not yet have received updated values from
executed operations, the system could be undergoing partition and membership changes, or
some transactions might not have been committed yet.
To achieve a high consistency of data on all members, the cluster should be put into the PASSIVE
state for the duration of the call to the backup method. See the Cluster Member States section for
information on how to do this. The cluster does not need to be in the PASSIVE state for the entire
duration of the backup process, though. Because of the design, only partition metadata is copied
synchronously during the invocation of the backup method. By the time the backup method has
returned, all cluster metadata has been copied and the exact partition data which needs to be
copied has been marked. After that, the backup process continues asynchronously and you can
return the cluster to the ACTIVE state and resume operations.
Because of the design of Hot Restart Store, we can use hard links to achieve backups/snapshots of
the store. The hot backup process uses hard links whenever possible because they provide big
performance benefits and because the backups share disk usage.
The performance benefit comes from the fact that Hot Restart file contents are not duplicated
(which would use disk and I/O resources). Instead, a new file name is created for the same contents
on disk (another pointer to the same inode). Since all backups and stores share the same inode, disk
usage drops.
The bigger the percentage of stable data in the Hot Restart Store (data not undergoing changes), the
more files each backup shares with the operational Hot Restart Store and the less disk space it uses.
For the hot backup to use hard links, you must be running Hazelcast members on JDK 7 or higher
and must satisfy all requirements for the Files.createLink() method to be supported.
The backup process initially attempts to create a new hard link and if that fails for any reason it
continues by copying the data. Subsequent backups also attempt to use hard links.
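The link-then-copy fallback can be sketched with the standard java.nio.file API. This is an illustration of the technique only, not Hazelcast's code; LinkOrCopy and linkOrCopy are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LinkOrCopy {

    /** Tries a hard link first and falls back to a full copy; reports which one happened. */
    static String linkOrCopy(Path source, Path target) {
        try {
            Files.createLink(target, source); // a second name for the same inode
            return "linked";
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            try {
                Files.copy(source, target);   // duplicates the bytes on disk
                return "copied";
            } catch (IOException copyFailure) {
                throw new UncheckedIOException(copyFailure);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hot-backup-demo");
        Path chunk = Files.write(dir.resolve("chunk.bin"), new byte[] {1, 2, 3});
        System.out.println(linkOrCopy(chunk, dir.resolve("chunk-backup.bin")));
    }
}
```

Whether the link succeeds depends on the file system, for example on whether source and target are on the same volume.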
Only cluster and distributed object metadata is copied synchronously during the invocation of the
backup method. The rest of the Hot Restart Store containing partition data is copied
asynchronously after the method call has ended. You can track the progress by API or view it from
the Management Center.
The completed and total counts give you a way to track the percentage of the copied data.
Currently the counts represent the number of copied and total local member Hot Restart Stores
(defined by HotRestartPersistenceConfig.setParallelism()), but this can change at a later point to
provide greater resolution.
Besides tracking the Hot Restart status by API, you can view the status in the Management Center
and you can inspect the on-disk files for each member. Each member creates an inprogress file
in each of the copied Hot Restart Stores. The presence of this file means that the backup is
currently in progress. When the backup task completes the backup operation, this file is removed.
If an error occurs during the backup task, the inprogress file is renamed to failure and contains a
stack trace of the exception.
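A small sketch of how the marker files could be checked from a script or tool, assuming you pass the path of one copied Hot Restart Store directory. BackupMarkers, its State enum and the exact directory layout are assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BackupMarkers {

    enum State { IN_PROGRESS, FAILED, NO_MARKER }

    /** Classifies one copied store directory by the marker file it contains. */
    static State inspect(Path copiedStoreDir) {
        if (Files.exists(copiedStoreDir.resolve("failure"))) {
            return State.FAILED;      // the file holds the stack trace of the error
        }
        if (Files.exists(copiedStoreDir.resolve("inprogress"))) {
            return State.IN_PROGRESS;
        }
        return State.NO_MARKER;       // the backup finished, or never started here
    }

    public static void main(String[] args) {
        System.out.println(inspect(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```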
Once the backup method call has returned and asynchronous copying of the partition data has
started, the backup task can be interrupted. This is helpful in situations where the backup task has
started at an inconvenient time. For instance, the backup task could be automated and it could be
accidentally triggered during high load on the Hazelcast instances, causing the performance of the
Hazelcast instances to drop.
The backup task mainly uses disk I/O, consumes little CPU and generally does not last for a long
time (although you should test it in your environment to determine the exact impact).
Nevertheless, you can abort the backup tasks on all members via a cluster-wide interrupt
operation. This operation can be triggered programmatically or from the Management Center.
HotRestartService service = instance.getCluster().getHotRestartService();
service.interruptBackupTask();
...
This method sends an interrupt to all members. The interrupt is ignored if the backup task is
not currently in progress, so you can safely call this method even if it has previously been
called or when some members have already completed their local backup tasks.
You can also interrupt the local member backup task as shown below:
The backup task stops as soon as possible. It does not remove the contents of the backup
directory from disk, so you must remove them manually.
Records stored in the Hot Restart Store may contain sensitive information. This sensitive
information may be present in the keys, in the values, or in both. In Hot Restart terms, Encryption
at Rest is concerned with encryption at the chunk file level. Since complete chunk files are encrypted,
all data stored in the Hot Restart Store is protected when Encryption at Rest is enabled.
Data persisted in the Hot Restart Store is encrypted using symmetric encryption. The
implementation is based on Java Cryptography Architecture (JCA). The encryption scheme uses two
levels of encryption keys: auto-generated Hot Restart Store-level encryption keys (one per
configured parallelism) that are used to encrypt the chunk files and a master encryption key that is
used to encrypt the store-specific encryption keys. The master encryption key is sourced from an
external system called Secure Store and, in contrast to the Hot Restart Store-level encryption keys, it
is not persisted anywhere within the Hot Restart Store.
When Hot Restart with Encryption at Rest is first enabled on a member, the member contacts the
Secure Store during the startup and retrieves the master encryption key. Then it generates the Hot
Restart Store-level encryption keys for the parallel Stores and stores them (encrypted using the
master key) under the Hot Restart Store’s directory. The subsequent writes to Hot Restart chunk
files will be encrypted using the Store-level encryption key. During Hot Restart, the member
retrieves the master encryption key from the Secure Store, decrypts the Store-level encryption keys
and uses those to decrypt the chunk files.
Master key rotation is supported. If the master encryption key changes in the Secure Store, the Hot
Restart subsystem will detect it and retrieve the new master encryption key. During this process, it
will also re-encrypt the Hot Restart Store-level encryption keys using the new master encryption
key.
The Configuring a Secure Store section provides information about the supported Secure Store
types.
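The two-level key scheme and master key rotation can be illustrated with plain JCA calls. The sketch below is not Hazelcast's on-disk format or implementation: the class TwoLevelKeys, the IV handling and the 128-bit key size are assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class TwoLevelKeys {

    private static final String ALGORITHM = "AES/CBC/PKCS5Padding";

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** Encrypts with a fresh random IV and prepends the IV to the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(plain);
        byte[] out = new byte[iv.length + body.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    /** Reads the IV from the first 16 bytes and decrypts the rest. */
    static byte[] decrypt(SecretKey key, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(data, 0, 16));
        return cipher.doFinal(data, 16, data.length - 16);
    }

    /** Rotation re-encrypts only the store-level key; chunk data stays untouched. */
    static byte[] rotate(SecretKey oldMaster, SecretKey newMaster, byte[] wrappedStoreKey)
            throws GeneralSecurityException {
        return encrypt(newMaster, decrypt(oldMaster, wrappedStoreKey));
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey master = newKey();
        SecretKey storeKey = newKey();
        byte[] chunk = encrypt(storeKey, "record".getBytes(StandardCharsets.UTF_8));
        byte[] wrapped = encrypt(master, storeKey.getEncoded());

        SecretKey rotatedMaster = newKey();
        byte[] rewrapped = rotate(master, rotatedMaster, wrapped);
        SecretKey recovered = new SecretKeySpec(decrypt(rotatedMaster, rewrapped), "AES");
        System.out.println(new String(decrypt(recovered, chunk), StandardCharsets.UTF_8));
    }
}
```

The point of the design is visible in rotate: changing the master key costs one small re-encryption per store-level key, regardless of how much chunk data exists.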
Configuring Encryption at Rest
Encryption at Rest can be enabled and configured programmatically or declaratively using the
encryption-at-rest sub-element of hot-restart-persistence. The encryption-at-rest element has the
following attributes and sub-elements:
• enabled: Attribute that specifies whether Encryption at Rest is enabled; false by default.
• key-size: The size of the auto-generated Hot Restart Store-level encryption key.
• secure-store: Specifies the Secure Store to use for the retrieval of master encryption keys. See
the Configuring a Secure Store section for more details.
Declarative Configuration:
<hazelcast>
...
<hot-restart-persistence enabled="true">
...
<encryption-at-rest enabled="true">
<algorithm>AES/CBC/PKCS5Padding</algorithm>
<salt>thesalt</salt>
<key-size>128</key-size>
<secure-store>...</secure-store>
</encryption-at-rest>
...
</hot-restart-persistence>
...
</hazelcast>
Programmatic Configuration:
Configuring a Secure Store
A Secure Store represents a (secure) source of master encryption keys and is required for using
Encryption at Rest.
Hazelcast IMDG Enterprise provides Secure Store implementations for the Java KeyStore and for
HashiCorp Vault.
The Java KeyStore Secure Store provides integration with the Java KeyStore. It can be configured
programmatically or declaratively using the keystore sub-element of secure-store. The keystore
element has the following sub-elements:
• current-key-alias: The alias for the current encryption key entry (optional).
• polling-interval: The polling interval (in seconds) for checking for changes in the KeyStore.
Disabled by default.
The Java KeyStore Secure Store treats all KeyStore.SecretKeyEntry entries stored in the KeyStore as
encryption keys. It expects that these entries use the same protection password as the KeyStore
itself. Entries of other types (private key entries, certificate entries) are ignored. If current-key-
alias is set, the corresponding entry will be treated as the current encryption key; otherwise, the
highest entry in the alphabetical order will be used. The remaining entries will represent historical
versions of the encryption key.
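The alias selection rule can be sketched with the standard java.security.KeyStore API. This is an illustration, not the Secure Store implementation; CurrentKeyAlias and its methods are hypothetical names, and plain String ordering is assumed to stand in for "alphabetical order".

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;
import java.util.TreeSet;

class CurrentKeyAlias {

    /** Collects the aliases of secret-key entries; other entry types are skipped. */
    static TreeSet<String> secretKeyAliases(KeyStore keyStore) throws GeneralSecurityException {
        TreeSet<String> aliases = new TreeSet<>();
        for (String alias : Collections.list(keyStore.aliases())) {
            if (keyStore.entryInstanceOf(alias, KeyStore.SecretKeyEntry.class)) {
                aliases.add(alias);
            }
        }
        return aliases;
    }

    /** Uses the configured alias when given, otherwise the highest alias in sort order. */
    static String select(TreeSet<String> aliases, String configuredAlias) {
        if (configuredAlias != null) {
            if (!aliases.contains(configuredAlias)) {
                throw new IllegalArgumentException("No secret key entry for alias " + configuredAlias);
            }
            return configuredAlias;
        }
        return aliases.last(); // throws NoSuchElementException if there are no entries
    }
}
```

All aliases other than the selected one would then correspond to historical versions of the key.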
<secure-store>
<keystore>
<path>/path/to/keystore.file</path>
<type>PKCS12</type>
<password>password</password>
<current-key-alias>current</current-key-alias>
<polling-interval>60</polling-interval>
</keystore>
</secure-store>
JavaKeyStoreSecureStoreConfig keyStoreConfig =
new JavaKeyStoreSecureStoreConfig(new File("/path/to/keystore.file"))
.setType("PKCS12")
.setPassword("password")
.setCurrentKeyAlias("current")
.setPollingInterval(60);
The HashiCorp Vault Secure Store provides integration with HashiCorp Vault. It can be configured
programmatically or declaratively using the vault sub-element of secure-store. The vault element
has the following sub-elements:
• secret-path: The secret path under which the encryption keys are stored.
• polling-interval: The polling interval (in seconds) for checking for changes in Vault. Disabled
by default.
• ssl: The TLS/SSL configuration for HTTPS support. See the TLS/SSL section for more information
about how to use the ssl element.
The HashiCorp Vault Secure Store implementation uses the official REST API to integrate with
HashiCorp Vault. Only the KV secrets engine is supported. Both KV V1 and KV V2 can be used, but
since only V2 provides secrets versioning, V2 is the recommended option. With KV V1 (no versioning
support), only one version of the encryption key can be kept, whereas with KV V2, the HashiCorp
Vault Secure Store can also retrieve the historical encryption keys. (Note that the size of the
version history is configurable on the Vault side.) Having access to the previous encryption keys
may be critical to avoid scenarios where the Hot Restart data becomes undecryptable because the
master encryption key is no longer usable (for instance, when the original master encryption key
got rotated out in the Secure Store while the cluster was down).
The encryption key is expected to be stored at the specified secret path and represented as a single
key/value pair in the following format:
name=Base64-encoded-data
where name can be an arbitrary string. Multiple key/value pairs under the same secret path are not
supported. Here is an example of how such a key/value pair can be stored using the HashiCorp
Vault command-line client (under the secret path hz/cluster):
With KV V2, a second put to the same secret path creates a new version of the encryption key. With
KV V1, it simply overwrites the current encryption key, discarding the old value.
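The Base64-encoded value of such a key/value pair can be produced with standard Java APIs. A minimal sketch, assuming a 128-bit random key and the arbitrary name "value"; the class VaultKeyValue is hypothetical and the sketch does not talk to Vault.

```java
import java.security.SecureRandom;
import java.util.Base64;

class VaultKeyValue {

    /** Builds a pair in the form name=Base64-encoded-data from raw key bytes. */
    static String pair(String name, byte[] rawKey) {
        return name + "=" + Base64.getEncoder().encodeToString(rawKey);
    }

    public static void main(String[] args) {
        byte[] rawKey = new byte[16];          // 128 bits, an assumed key size
        new SecureRandom().nextBytes(rawKey);
        System.out.println(pair("value", rawKey));
    }
}
```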
<secure-store>
<vault>
<address>http://localhost:1234</address>
<secret-path>secret/path</secret-path>
<token>token</token>
<polling-interval>60</polling-interval>
<ssl>...</ssl>
</vault>
</secure-store>
VaultSecureStoreConfig vaultConfig =
new VaultSecureStoreConfig("http://localhost:1234", "secret/path", "token")
.setPollingInterval(60);
configureSSL(vaultConfig.getSSLConfig());
Change Data Capture (CDC) refers to the technology for identifying and capturing changes made to
a data source. These changes can then be applied to another data repository or made available in a
format supported by data integration tools.
Striim is a real-time data integration and streaming analytics software platform. It uses the CDC
mechanism to detect changes performed on a data source.
Hazelcast Striim Hot Cache, the integration solution of Hazelcast and Striim, enables real-time,
push-based propagation of changes from the database to the cache. The following sections describe
this integration.
17.1. Introduction
Through CDC, Striim is able to recognize which tables and key values have changed. It immediately
captures these changes with their table and key, and pushes the changes into a cache. Supported
databases are Oracle, MySQL and Microsoft SQL Server.
When it comes to Hazelcast, you can get the changes in a database and put them into your
Hazelcast IMDG member using a "writer" developed by Striim, i.e., Hazelcast Writer. This writer
creates a Hazelcast client once you start Striim, to connect to your IMDG member.
17.3. Logging
You can enable logging to see the status of the Hazelcast client created by the Hazelcast Writer. For
this, you need to add the following line to the server.sh file on the machine where Striim is
running:
-Dhazelcast.logging.type=log4j
You can also set the logging level by adding the following line to the log4j.server.properties file:
log4j.logger.com.hazelcast=debug
In the above example line, the logging level is set to DEBUG. The available levels are listed
below:
• TRACE
• DEBUG
• INFO
• WARN
• ERROR
• OFF
The logs are written into the striim.server.log which is typically located at the /opt/striim/logs
directory.
The above settings are for the Hazelcast Client created by the writer. You can also change the
logging level dynamically for Hazelcast Writer. Follow the below instructions for this:
1. Open the Striim console using the console.sh command. See here for the usage of this command.
17.4. Full Worked Example Application
We have created a full example application with step-by-step instructions which guides you
through using Striim to load data from an Oracle database using the Striim Hazelcast Writer. We
recommend you start here before applying this to your own application.
Download a fully loaded evaluation copy of Striim for Hazelcast Hot Cache.
• Java
• C++
• .NET
• Memcache
• REST
• Node.js
• Go
• Python
• Scala
See the feature comparison matrix to learn about the features implemented across the clients and
language APIs.
Code Samples:
In the following client sections, you will find links to each client’s code samples.
Hundreds or even thousands of clients can be connected to the cluster. By default, there are core
count * 20 threads on the server side that handle all the requests, e.g., if the server has 4 cores,
there will be 80 threads.
Imagine a trading application where all the trading data are stored and managed in a Hazelcast
cluster with tens of members. Swing/Web applications at the traders' desktops can use clients to
access and modify the data in the Hazelcast cluster.
Starting with Hazelcast 3.5, a new Java Native Client Library is introduced in the
release package. This library contains clients which use the new Hazelcast Open
Binary Client Protocol.
• For 3.5.x releases: You can use the new client experimentally with the library
hazelcast-client-new. This library does not exist for the releases before 3.5.
Please do not use this library with the Hazelcast clusters from 3.6.x and higher
releases since it is not compatible with those releases.
• For 3.6.x releases: You can use the new client with the library hazelcast-client.
The old client’s library is hazelcast-client-legacy and you can still use it.
• For 3.7.x and newer releases: There is no more old client for these releases. The
only one is the hazelcast-client library, which includes clients implemented
with the Hazelcast Open Binary Client Protocol.
You do not need to set a license key for your Java clients for which you want to use
Hazelcast IMDG Enterprise features. Hazelcast IMDG Enterprise license keys are
required only for members.
Simply include the hazelcast.jar dependency in your classpath to start using the Hazelcast Java
client. Once included, you can start using this client as if you are using the Hazelcast API. The
differences are discussed in the below sections.
If you prefer to use Maven, simply add the hazelcast dependency to your pom.xml, which you may
already have done to start using Hazelcast IMDG:
<dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast</artifactId>
<version>4.0.1</version>
</dependency>
Client API
The first step is the configuration. You can configure the Java client declaratively or
programmatically. We use the programmatic approach for this section, as shown below.
ClientConfig clientConfig = new ClientConfig();
clientConfig.setClusterName("dev");
clientConfig.getNetworkConfig().addAddress("10.90.0.1", "10.90.0.2:5702");
This client interface is your gateway to access all Hazelcast distributed objects.
As the final step, if and when you are done with your client, you can shut it down as shown below:
client.shutdown();
The above code line releases all the used resources and closes connections to the cluster.
The client has two operation modes because of the distributed nature of the data and cluster.
Smart Client: In the smart mode, the clients connect to each cluster member. Since data
partitions are assigned using a well-known consistent hashing algorithm, each client can send an
operation to the relevant cluster member, which increases the overall throughput and efficiency.
Smart mode is the default mode.
Unisocket Client: For some cases, the clients can be required to connect to a single member instead
of to each member in the cluster. Firewalls, security, or some custom networking issues can be the
reason for these cases.
In the unisocket client mode, the clients only connect to one of the configured addresses. This single
member behaves as a gateway to the other members. For any operation requested from the client,
it redirects the request to the relevant member and returns that member’s response back to the
client.
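The difference between the two modes can be sketched as a routing decision. This is a simplified model, not the client's real logic: 271 is Hazelcast's default partition count, but the hashCode-based partitioning and the modulo ownership rule are stand-ins for the real key hashing and partition table, and RoutingSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

class RoutingSketch {

    static final int PARTITION_COUNT = 271; // Hazelcast's default partition count

    /** Simplified stand-in for the real key-to-partition hashing. */
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    /** Smart mode goes to the assumed partition owner; unisocket always uses the gateway. */
    static String target(Object key, List<String> members, boolean smart) {
        if (!smart) {
            return members.get(0);                              // single gateway member
        }
        return members.get(partitionId(key) % members.size()); // assumed ownership rule
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("10.90.0.1:5701", "10.90.0.2:5702");
        System.out.println(target("some-key", members, true));
        System.out.println(target("some-key", members, false));
    }
}
```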
Handling Failures
There are two main failure cases, and you can configure the client to behave properly in each of them.
While the client is initially trying to connect to one of the members in the
ClientNetworkConfig.addressList, all the members might be unavailable. Instead of giving up,
throwing an exception and stopping, the client retries the connection as configured. This is
described in the Configuring Client Connection Retry section.
The client executes each operation through the already established connection to the cluster. If this
connection is lost or dropped, the client tries to reconnect as configured.
While sending the requests to related members, operations can fail due to various reasons. Read-
only operations are retried by default. If you want to enable retry for the other operations, you can
set the redoOperation to true. See the Enabling Redo Operation section.
You can set a timeout for retrying the operations sent to a member. This can be provided by using
the property hazelcast.client.invocation.timeout.seconds in ClientProperties. The client retries an
operation within this given period, provided that it is a read-only operation or that you enabled
redoOperation as stated in the above paragraph. This timeout value is important when a
failure is caused by one of the following:
See the Client System Properties section for the description of the
hazelcast.client.invocation.timeout.seconds property.
When any failure happens between a client and member (such as an exception on the member side
or connection issues), an operation is retried if:
• it is certain that it has not run on the member yet,
• or if it is idempotent such as a read-only operation, i.e., retrying does not have a side effect.
If it is not certain whether the operation has run on the member, then the non-idempotent
operations are not retried. However, as explained in the first paragraph of this section, you can
force all client operations to be retried (redoOperation) when there is a failure between the client
and member. But in this case, you should know that some operations may run multiple times
causing conflicts. For example, assume that your client sent a queue.offer operation to the member
and then the connection is lost. Since there will be no response for this operation, you will not know
whether it has run on the member or not. If you enabled redoOperation, that queue.offer operation
may rerun, which causes the same object to be offered twice to the member’s queue.
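The retry rule and the duplicate-offer risk can be modeled in a few lines. This is a toy model of the decision described in this section, not the client's code; RetrySketch and its methods are hypothetical, and a local ArrayDeque stands in for the member's queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class RetrySketch {

    /** An operation is retried if it surely did not run, is idempotent, or redo is forced. */
    static boolean shouldRetry(boolean certainlyNotRun, boolean idempotent, boolean redoOperation) {
        return certainlyNotRun || idempotent || redoOperation;
    }

    /** Simulates an offer whose response is lost; returns the resulting queue size. */
    static int offerWithLostResponse(boolean redoOperation) {
        Queue<String> memberQueue = new ArrayDeque<>();
        memberQueue.offer("order-1");     // ran on the member, but the response was lost
        if (shouldRetry(false, false, redoOperation)) {
            memberQueue.offer("order-1"); // the retry runs the same offer a second time
        }
        return memberQueue.size();
    }

    public static void main(String[] args) {
        System.out.println(offerWithLostResponse(false)); // prints 1
        System.out.println(offerWithLostResponse(true));  // prints 2
    }
}
```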
Using Supported Distributed Data Structures
Most of the Distributed Data Structures are supported by the Java client. When you use clients in
other languages, you should check for the exceptions.
As a general rule, you configure these data structures on the server side and access them through a
proxy on the client side.
You can use any Distributed Map object with the client, as shown below.
map.put(1, "Ali");
String value = map.get(1);
map.remove(1);
Locality is ambiguous for the client, so addLocalEntryListener and localKeySet are not supported.
See the Distributed Map section for more information.
multiMap.put(1, "ali");
multiMap.put(1, "veli");
getLocalQueueStats is not supported because locality is ambiguous for the client. See the Distributed
Queue section for more information.
Using Other Supported Distributed Structures
The distributed data structures listed below are also supported by the client. Since their logic is the
same on both the member side and the client side, you can see their sections as listed below.
• Replicated Map
• List
• Set
• IAtomicLong
• IAtomicReference
• ICountDownLatch
• ISemaphore
• FlakeIdGenerator
• Lock
Hazelcast provides the services discussed below for some common functionalities on the client side.
The distributed executor service is for distributed computing. It can be used to execute tasks on the
cluster on a designated partition or on all the partitions. It can also be used to process entries. See
the Distributed Executor Service section for more information.
After getting an instance of IExecutorService, you can use it in the same way as the interface
provided on the member side. See the Distributed Computing chapter for detailed usage.
If you need to track clients and you want to listen to their connection events, you can use the
clientConnected and clientDisconnected methods of the ClientService class. This class must be run
on the member side. The following is an example code.
ClientConfig clientConfig = new ClientConfig();
//clientConfig.setClusterName("dev");
clientConfig.getNetworkConfig().addAddress("10.90.0.1", "10.90.0.2:5702");
clientService.addClientListener(new ClientListener() {
@Override
public void clientConnected(Client client) {
//Handle client connected event
}
@Override
public void clientDisconnected(Client client) {
//Handle client disconnected event
}
});
You use the partition service to find the partition of a key. It can also return all partitions. See
the example code below.
//partition of a key
Partition partition = partitionService.getPartition(key);
//all partitions
Set<Partition> partitions = partitionService.getPartitions();
Handling Lifecycle
• adding/removing lifecycle listeners.
if(lifecycleService.isRunning()){
//it is running
}
You can define labels in your Java client, similar to the way it can be done for the members.
Through the client labels, you can assign special roles for your clients and use these roles to
perform some actions specific to those client connections.
You can also group your clients using the client labels. These client groups can be blacklisted in the
Hazelcast Management Center so that they can be prevented from connecting to a cluster. See the
related section in the Hazelcast Management Center Reference Manual for more information on
this topic.
Declaratively, you can define the client labels using the client-labels configuration element. See
the below example.
<hazelcast-client>
...
<instance-name>barClient</instance-name>
<client-labels>
<label>user</label>
<label>bar</label>
</client-labels>
....
</hazelcast-client>
HazelcastClient.newHazelcastClient(clientConfig);
See the code sample for the client labels to see them in action.
Client Listeners
You can configure listeners to listen to various event types on the client side. You can configure
global events not relating to any distributed object through Client ListenerConfig. You should
configure distributed object listeners like map entry listeners or list item listeners through their
proxies. See the related sections under each distributed data structure in this Reference Manual.
Client Transactions
Transactional distributed objects are supported on the client side. See the Transactions chapter on
how to use them.
The Java client can be configured to connect to a cluster asynchronously during the client start and
to reconnect after a cluster disconnect. Both of these options are configured via
ClientConnectionStrategyConfig.
Async client start is configured by setting the configuration element async-start to true. This
configuration changes the behavior of the HazelcastClient.newHazelcastClient() call. It returns a
client instance without waiting to establish a cluster connection. Until the client connects to the
cluster, it throws HazelcastClientOfflineException on any network-dependent operation, so these
operations do not block. If you want to check or wait for the client to complete its cluster
connection, you can use the built-in lifecycle listener:
The Java client can also be configured to specify how it reconnects after a cluster disconnection.
The following are the options:
• The client can refuse to reconnect to the cluster and trigger the client shutdown process.
• The client can open a connection to the cluster by blocking all waiting invocations.
• The client can open a connection to the cluster without blocking the waiting invocations. All
invocations receive HazelcastClientOfflineException while the cluster connection is being
established. If the connection attempt fails, the client shutdown is triggered.
See the Java Client Connection Strategy section to learn how to configure these.
You can configure Hazelcast Java Client declaratively (XML), programmatically (API), or using client
system properties.
For declarative configuration, the Hazelcast client looks at the following places for the client
configuration file:
• System property: The client first checks if hazelcast.client.config system property is set to a
file path, e.g., -Dhazelcast.client.config=C:/myhazelcast.xml.
• Classpath: If the configuration file is not set as a system property, the client checks the
classpath for the hazelcast-client.xml file.
If the client does not find any configuration file, it starts with the default configuration (hazelcast-
client-default.xml) located in the hazelcast.jar library. Before configuring the client, please try to
work with the default configuration to see if it works for you. The default should be just fine for
most users. If not, then consider custom configuration for your environment.
If you want to specify your own configuration file to create a Config object, the Hazelcast client
supports the following:
For programmatic configuration of the Hazelcast Java Client, just instantiate a ClientConfig object
and configure the desired aspects. An example is shown below:
Client Network
All network related configuration of Hazelcast Java Client is performed via the network element in
the declarative configuration file, or in the class ClientNetworkConfig when using programmatic
configuration. Let’s first give the examples for these two approaches. Then we will look at its sub-
elements and attributes.
Declarative Configuration:
Here is an example declarative configuration of network for Java Client, which includes all the
parent configuration elements.
<hazelcast-client>
...
<network>
<cluster-members>
<address>127.0.0.1</address>
<address>127.0.0.2</address>
</cluster-members>
<outbound-ports>
<ports>34600</ports>
<ports>34700-34710</ports>
</outbound-ports>
<smart-routing>true</smart-routing>
<redo-operation>true</redo-operation>
<connection-timeout>60000</connection-timeout>
<socket-options>
...
</socket-options>
<socket-interceptor enabled="true">
...
</socket-interceptor>
<ssl enabled="false">
...
</ssl>
<aws enabled="true" connection-timeout-seconds="11">
...
</aws>
<gcp enabled="false">
...
</gcp>
<azure enabled="false">
...
</azure>
<kubernetes enabled="false">
...
</kubernetes>
<eureka enabled="false">
...
</eureka>
<icmp-ping enabled="false">
...
</icmp-ping>
<hazelcast-cloud enabled="false">
<discovery-token>EXAMPLE_TOKEN</discovery-token>
</hazelcast-cloud>
<discovery-strategies>
<node-filter class="DummyFilterClass" />
<discovery-strategy class="DummyDiscoveryStrategy1" enabled="true">
<properties>
<property name="key-string">foo</property>
<property name="key-int">123</property>
<property name="key-boolean">true</property>
</properties>
</discovery-strategy>
</discovery-strategies>
</network>
...
</hazelcast-client>
Programmatic Configuration:
When an operation with sync backup is sent by a client to the Hazelcast member(s), the
acknowledgment of the operation’s backup is sent to the client by the backup replica member(s).
This improves the performance of the client operations.
By default, backup acknowledgement to the client is enabled for smart clients (unisocket clients do
not support it).
The backup acknowledgement for the Java client can be configured programmatically using the following method:
clientConfig.setBackupAckToClientEnabled(boolean enabled)
You can also fine tune this feature using the following system properties:
Address List is the initial list of cluster addresses to which the client will connect. The client uses
this list to find an alive member. Although it may be enough to give only one address of a member
in the cluster (since all members communicate with each other), it is recommended that you give
the addresses for all the members.
Declarative Configuration:
<hazelcast-client>
...
<network>
<cluster-members>
<address>10.1.1.21</address>
<address>10.1.1.22:5703</address>
</cluster-members>
</network>
...
</hazelcast-client>
Programmatic Configuration:
If the port part is omitted, then 5701, 5702 and 5703 are tried in a random order.
You can provide multiple addresses with ports provided or not, as seen above. The provided list is
shuffled and tried in random order. Its default value is localhost.
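The port handling described above can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not part of the Hazelcast API. The sketch assumes IPv4 addresses or hostnames, and it lists the candidate ports in ascending order, whereas the client tries them in a random order.

```java
import java.util.ArrayList;
import java.util.List;

public class AddressCandidates {

    // Ports tried when an address is given without a port, as stated above.
    static final int[] DEFAULT_PORTS = {5701, 5702, 5703};

    // Returns the "host:port" candidates for one configured address.
    // Hypothetical helper for illustration only (IPv4 or hostname input assumed).
    static List<String> candidatesFor(String address) {
        List<String> candidates = new ArrayList<>();
        if (address.contains(":")) {
            // An explicit port was given, for example "10.1.1.22:5703".
            candidates.add(address);
        } else {
            // No port given: every default port is a candidate.
            for (int port : DEFAULT_PORTS) {
                candidates.add(address + ":" + port);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // prints [10.1.1.21:5701, 10.1.1.21:5702, 10.1.1.21:5703]
        System.out.println(candidatesFor("10.1.1.21"));
        // prints [10.1.1.22:5703]
        System.out.println(candidatesFor("10.1.1.22:5703"));
    }
}
```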
If you have multiple members on a single machine and you are using unisocket
clients, we recommend that you set explicit ports for each member. Then you should
provide those ports in your client configuration when you give the member
addresses (using the address configuration element or the addAddress method as
exemplified above). Otherwise, all the load coming from your clients may go
through a single member.
You may want to restrict outbound ports to be used by Hazelcast-enabled applications. To fulfill this
requirement, you can configure Hazelcast Java client to use only defined outbound ports. The
following are example configurations.
Declarative Configuration:
<hazelcast-client>
...
<network>
<outbound-ports>
<!-- ports between 34700 and 34710 -->
<ports>34700-34710</ports>
<!-- comma separated ports -->
<ports>34700,34701,34702,34703</ports>
<ports>34700,34705-34710</ports>
</outbound-ports>
</network>
...
</hazelcast-client>
Programmatic Configuration:
...
ClientNetworkConfig networkConfig = config.getNetworkConfig();
// ports between 34700 and 34710
networkConfig.addOutboundPortDefinition("34700-34710");
// comma separated ports
networkConfig.addOutboundPortDefinition("34700,34701,34702,34703");
networkConfig.addOutboundPort(34705);
...
As shown in the programmatic configuration, you use the method addOutboundPort to add only one
port. If you need to add a group of ports, then use the method addOutboundPortDefinition.
In the declarative configuration, the element ports can be used for both single and multiple port
definitions.
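To make the port definition syntax concrete, the following is a minimal sketch that expands a definition string into individual port numbers. It is not Hazelcast's own parser: the class and method names are hypothetical, and the sketch assumes that a range such as 34700-34710 includes both of its end points.

```java
import java.util.Set;
import java.util.TreeSet;

public class OutboundPortDefinitions {

    // Expands a definition such as "34700,34705-34710" into individual ports.
    // Illustrative only; ranges are assumed to be inclusive on both ends.
    static Set<Integer> expand(String definition) {
        Set<Integer> ports = new TreeSet<>();
        for (String part : definition.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                // A single port, for example "34700".
                ports.add(Integer.parseInt(token));
            } else {
                // A range of ports, for example "34705-34710".
                int from = Integer.parseInt(token.substring(0, dash).trim());
                int to = Integer.parseInt(token.substring(dash + 1).trim());
                for (int port = from; port <= to; port++) {
                    ports.add(port);
                }
            }
        }
        return ports;
    }

    public static void main(String[] args) {
        // prints [34700, 34705, 34706, 34707, 34708, 34709, 34710]
        System.out.println(expand("34700,34705-34710"));
    }
}
```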
Setting Smart Routing
Smart routing defines whether the client operation mode is smart or unisocket. See Java Client
Operation Modes to learn about these modes.
Declarative Configuration:
<hazelcast-client>
...
<network>
<smart-routing>true</smart-routing>
</network>
...
</hazelcast-client>
Programmatic Configuration:
Note that you need to disable smart routing (false) for the clients which want to use temporary
permissions defined in a member. See the Handling Permissions section.
Declarative Configuration:
<hazelcast-client>
...
<network>
<redo-operation>true</redo-operation>
</network>
...
</hazelcast-client>
Programmatic Configuration:
ClientConfig clientConfig = new ClientConfig();
ClientNetworkConfig networkConfig = clientConfig.getNetworkConfig();
networkConfig.setRedoOperation(true);
Connection timeout is the timeout value in milliseconds for members to accept client connection
requests. The following are the example configurations.
Declarative Configuration:
<hazelcast-client>
...
<network>
<connection-timeout>5000</connection-timeout>
</network>
...
</hazelcast-client>
Programmatic Configuration:
Connection attempt period is the duration in milliseconds between the connection attempts defined
by ClientNetworkConfig.connectionAttemptLimit. The following are example configurations.
Declarative Configuration:
<hazelcast-client>
...
<network>
<connection-attempt-period>5000</connection-attempt-period>
</network>
...
</hazelcast-client>
Programmatic Configuration:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptPeriod(5000);
SocketInterceptor has two steps. First, it is initialized by the configured properties. Second, it is
informed just after the socket is connected using the onConnect method.
socketInterceptorConfig.setEnabled(true);
socketInterceptorConfig.setImplementation(myClientSocketInterceptor);
If you want to configure the socket interceptor with a class name instead of an instance, see the
example below.
socketInterceptorConfig.setEnabled(true);
socketInterceptorConfig.setClassName(MyClientSocketInterceptor.class.getName());
Configuring Network Socket Options
You can configure the network socket options using SocketOptions. It has the following methods:
You can use TLS/SSL to secure the connection between the client and the members. If you want
TLS/SSL enabled for the client-cluster connection, you should set SSLConfig. Once set, the
connection (socket) is established out of a TLS/SSL factory defined either by a factory class name
or factory implementation. See the TLS/SSL section.
As explained in the TLS/SSL section, Hazelcast members have keyStores used to identify themselves
(to other members) and Hazelcast clients have trustStores used to define which members they can
trust. The clients also have their keyStores and members have their trustStores so that the members
can know which clients they can trust; see the Mutual Authentication section.
You can connect your Java client to a Hazelcast cluster which is hosted on Hazelcast Cloud. For this,
you simply enable the Hazelcast Cloud and specify the cluster’s discovery token provided by
Hazelcast Cloud while creating the cluster; this allows the Hazelcast cluster to discover your clients.
See the following example configurations.
Declarative Configuration:
<hazelcast-client>
...
<network>
<ssl enabled="true"/>
<hazelcast-cloud enabled="true">
<discovery-token>YOUR_TOKEN</discovery-token>
</hazelcast-cloud>
</network>
...
</hazelcast-client>
Programmatic Configuration:
Hazelcast Cloud is disabled for the Java client, by default (enabled attribute is false).
See this Hazelcast Cloud web page for more information on Hazelcast Cloud.
Since this is a REST based discovery, you need to enable the REST listener service.
See the Using the REST Endpoint Groups section on how to enable REST endpoints.
• com.sun.net.ssl.checkRevocation
• com.sun.security.enableCRLDP
And you need to set the Java security property as follows:
Security.setProperty("ocsp.enable", "true");
You can find more details on the related security topics here and here.
The example declarative and programmatic configurations below show how to configure a Java
client for connecting to a Hazelcast cluster in AWS.
Declarative Configuration:
<hazelcast-client>
...
<network>
<aws enabled="true">
<inside-aws>false</inside-aws>
<access-key>my-access-key</access-key>
<secret-key>my-secret-key</secret-key>
<iam-role>s3access</iam-role>
<region>us-west-1</region>
<host-header>ec2.amazonaws.com</host-header>
<security-group-name>hazelcast-sg</security-group-name>
<tag-key>type</tag-key>
<tag-value>hz-members</tag-value>
</aws>
</network>
...
</hazelcast-client>
Programmatic Configuration:
See the aws element section for the descriptions of the above AWS configuration elements except
inside-aws and iam-role, which are explained below.
If the inside-aws element is not set, the private addresses of cluster members are always converted
to public addresses, and the client uses these public addresses to connect to the members. In order to
use private addresses, set the inside-aws parameter to true. Also note that, when connecting from
outside AWS, setting the inside-aws parameter to true prevents the client from reaching the
members.
IAM roles are used to make secure requests from your clients. You can provide the name of the
IAM role that you created previously on your AWS console using the iam-role element or the
setIamRole() method.
Configuring Client Load Balancer
LoadBalancer allows you to send operations to one of a number of endpoints (Members). Its main
purpose is to determine the next Member if queried. It is up to your implementation to use different
load balancing policies. You should implement the interface com.hazelcast.client.LoadBalancer for
that purpose.
If it is a smart client, only the operations that are not key-based are routed to the endpoint that is
returned by the LoadBalancer. If it is not a smart client, LoadBalancer is ignored.
Declarative Configuration:
<hazelcast-client>
...
<load-balancer type="random">
yourLoadBalancer
</load-balancer>
...
</hazelcast-client>
Programmatic Configuration:
You can configure global event listeners using ListenerConfig as shown below.
• LifecycleListener
• MembershipListener
• DistributedObjectListener
Configuring Client Near Cache
The Hazelcast distributed map supports a local Near Cache for remotely stored entries to increase
the performance of local read operations. Since the client always requests data from the cluster
members, it can be helpful in some use cases to configure a Near Cache on the client side. See the
Near Cache section for a detailed explanation of the Near Cache feature and its configuration.
Clients should provide a cluster name and password in order to connect to the cluster. You can
configure them using ClientConfig, as shown below.
clientConfig.setClusterName("dev").setClusterPassword("dev-pass");
In the cases where the security established with Config is not enough and you want your clients
connecting securely to the cluster, you can use ClientSecurityConfig. This configuration has a
credentials parameter to set the IP address and UID. See the ClientSecurityConfig Javadoc.
For the client side serialization, use the Hazelcast configuration. See the Serialization chapter.
Hazelcast has an internal executor service (different from the data structure Executor Service)
that has threads and queues to perform internal operations such as handling responses.
This can be configured using the executor-pool-size parameter in your Java client’s XML or YAML
configuration file. Alternatively, you can use the programmatic configuration approach using the
setExecutorPoolSize(int) method of your client configuration object (ClientConfig). This parameter
specifies the size of the pool of threads which perform these operations lying in the executor’s
queue. If not configured, this parameter has the value of 5 x core size of the client, i.e., it is 20 for a
machine that has 4 cores.
Configuring ClassLoader
You can configure a custom classLoader. It is used by the serialization service and to load any class
configured in configuration, such as event listeners or ProxyFactories.
Normally when a client uses a Hazelcast data structure, that structure is configured on the member
side and the client makes use of that configuration. For the Reliable Topic structure, this is not the
case; since it is backed by Ringbuffer, you should configure it on the client side. The class used for
this configuration is ClientReliableTopicConfig.
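// Member side: the Ringbuffer that backs the Reliable Topic.
Config config = new Config();
RingbufferConfig ringbufferConfig = new RingbufferConfig("default");
ringbufferConfig.setCapacity(10000000)
        .setTimeToLiveSeconds(5);
config.addRingBufferConfig(ringbufferConfig);
// Client side: the Reliable Topic configuration.
ClientConfig clientConfig = new ClientConfig();
ClientReliableTopicConfig topicConfig = new ClientReliableTopicConfig("default");
topicConfig.setTopicOverloadPolicy(TopicOverloadPolicy.BLOCK)
        .setReadBatchSize(10);
clientConfig.addReliableTopicConfig(topicConfig);
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
ITopic topic = client.getReliableTopic(topicConfig.getName());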
Note that, when you create a Reliable Topic structure on your client, a Ringbuffer (with the same
name as the Reliable Topic) is automatically created on the member side, with its default
configuration. See the Configuring Ringbuffer section for the defaults. You can edit that
configuration according to your needs.
You can configure a Reliable Topic structure on the client side also declaratively. The following is
the declarative configuration equivalent to the above example:
<hazelcast-client>
...
<ringbuffer name="default">
<capacity>10000000</capacity>
<time-to-live-seconds>5</time-to-live-seconds>
</ringbuffer>
<reliable-topic name="default">
<topic-overload-policy>BLOCK</topic-overload-policy>
<read-batch-size>10</read-batch-size>
</reliable-topic>
...
</hazelcast-client>
You can configure the client’s starting mode as async or sync using the configuration element
async-start. When it is set to true (async), Hazelcast creates the client without waiting for a connection
to the cluster. In this case, the client instance throws an exception until it connects to the cluster. If
it is false, the client is not created until the cluster is ready to use clients and a connection with the
cluster is established. Its default value is false (sync).
You can also configure how the client reconnects to the cluster after a disconnection. This is
configured using the configuration element reconnect-mode; it has three options (OFF, ON or ASYNC).
The option OFF disables the reconnection. ON enables reconnection in a blocking manner where all
the waiting invocations are blocked until a cluster connection is established or fails. The option
ASYNC enables reconnection in a non-blocking manner where all the waiting invocations receive a
HazelcastClientOfflineException. Its default value is ON.
The example declarative and programmatic configurations below show how to configure a Java
client’s starting and reconnecting modes.
Declarative Configuration:
<hazelcast-client>
...
<connection-strategy async-start="true" reconnect-mode="ASYNC" />
...
</hazelcast-client>
Programmatic Configuration:
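A minimal sketch of the programmatic counterpart of the declarative example above is given here. It assumes the Hazelcast 4.x client API (ClientConnectionStrategyConfig and its ReconnectMode enum); the wrapping class and method names are illustrative only.

```java
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientConnectionStrategyConfig;
import com.hazelcast.client.config.ClientConnectionStrategyConfig.ReconnectMode;

public class ConnectionStrategySketch {

    // Builds a client configuration equivalent to
    // <connection-strategy async-start="true" reconnect-mode="ASYNC" />.
    static ClientConfig build() {
        ClientConfig clientConfig = new ClientConfig();
        ClientConnectionStrategyConfig strategyConfig = clientConfig.getConnectionStrategyConfig();
        // Create the client without waiting for a cluster connection.
        strategyConfig.setAsyncStart(true);
        // Reconnect in the background after a disconnection.
        strategyConfig.setReconnectMode(ReconnectMode.ASYNC);
        return clientConfig;
    }
}
```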
When the client is disconnected from the cluster, it searches for new connections to reconnect. You can
configure the frequency of the reconnection attempts and the client shutdown behavior using
ConnectionRetryConfig (programmatic approach) or connection-retry (declarative approach).
Declarative Configuration:
<hazelcast-client>
...
<connection-strategy async-start="false" reconnect-mode="ON">
<connection-retry>
<initial-backoff-millis>1000</initial-backoff-millis>
<max-backoff-millis>60000</max-backoff-millis>
<multiplier>2</multiplier>
<cluster-connect-timeout-millis>50000</cluster-connect-timeout-millis>
<jitter>0.2</jitter>
</connection-retry>
</connection-strategy>
...
</hazelcast-client>
Programmatic Configuration:
ClientConfig config = new ClientConfig();
ClientConnectionStrategyConfig connectionStrategyConfig = config
.getConnectionStrategyConfig();
ConnectionRetryConfig connectionRetryConfig = connectionStrategyConfig
.getConnectionRetryConfig();
connectionRetryConfig.setInitialBackoffMillis(1000)
.setMaxBackoffMillis(60000)
.setMultiplier(2)
.setClusterConnectTimeoutMillis(50000)
.setJitter(0.2);
• initial-backoff-millis: Specifies how long to wait (backoff) after the first failure before
retrying in milliseconds. Its default value is 1000 ms.
• max-backoff-millis: Specifies the upper limit for the backoff in milliseconds. Its default value is
30000 ms.
• multiplier: Factor to multiply the backoff after a failed retry. Its default value is 1.
The pseudo-code is as follows:
begin_time = getCurrentTime()
current_backoff_millis = INITIAL_BACKOFF_MILLIS
while (TryConnect(connectionTimeout) != SUCCESS) {
    if (getCurrentTime() - begin_time >= CLUSTER_CONNECT_TIMEOUT_MILLIS) {
        // Give up connecting to the current cluster and switch to another one, if it exists.
    }
    Sleep(current_backoff_millis + UniformRandom(-JITTER * current_backoff_millis,
        JITTER * current_backoff_millis))
    current_backoff_millis = Min(current_backoff_millis * MULTIPLIER, MAX_BACKOFF_MILLIS)
}
Note that TryConnect above tries to connect to any member that the client knows, and each
connection attempt has its own connection timeout; see the Setting Connection Timeout section.
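The backoff calculation in the pseudo-code can be sketched in plain Java as follows. This is an illustration and not the client's actual implementation. The class and method names are hypothetical, and the sketch assumes that every connection attempt fails immediately, so only the sleep durations count towards the cluster connect timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ConnectionRetryBackoff {

    // Returns the sleep durations (in milliseconds) used between failed attempts
    // until the accumulated sleep time reaches clusterConnectTimeoutMillis.
    static List<Long> sleeps(long initialBackoffMillis, long maxBackoffMillis, double multiplier,
                             double jitter, long clusterConnectTimeoutMillis, Random random) {
        List<Long> result = new ArrayList<>();
        double backoff = initialBackoffMillis;
        long elapsed = 0;
        while (elapsed < clusterConnectTimeoutMillis) {
            // Uniform random offset between -jitter * backoff and +jitter * backoff.
            double offset = (random.nextDouble() * 2 - 1) * jitter * backoff;
            long sleep = Math.round(backoff + offset);
            result.add(sleep);
            elapsed += sleep;
            // Grow the backoff, but never beyond the configured maximum.
            backoff = Math.min(backoff * multiplier, maxBackoffMillis);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values of the example configuration above, with the jitter set to 0
        // so that the sequence is deterministic.
        // prints [1000, 2000, 4000, 8000, 16000, 32000]
        System.out.println(sleeps(1000, 60000, 2.0, 0.0, 50000, new Random()));
    }
}
```

With a non-zero jitter, each sleep deviates from the listed value by at most the jitter ratio, for example by up to 20% when the jitter is 0.2.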
Hazelcast provides disaster recovery for the client-cluster connections and can use the well-known
blue-green mechanism, so that a Java client is automatically diverted to another cluster on demand
or when the intended cluster becomes unavailable.
Using the blue-green system, the clients can connect to another cluster automatically when they are
blacklisted from their currently connected cluster. See the Hazelcast Management Center Reference
Manual for information on blacklisting the clients.
Blue-Green Mechanism
You can make your clients connect to another cluster by blacklisting them in a cluster and using
the blue-green mechanism. This basically means having two alive clusters, one of which is active (blue)
and the other one is idle (green).
When you blacklist a client in a cluster, the client is disconnected from the cluster and
first tries to connect to another member of the same cluster. This is because the client
is not aware if this is a blacklisting or a normal disconnection.
The client’s behavior after this disconnection depends on its reconnect-mode. The following are the
options when you are using the blue-green mechanism, i.e., you have alternative clusters for your
clients to connect:
• If reconnect-mode is set to ON, the client changes the cluster and blocks the invocations while
doing so.
• If reconnect-mode is set to ASYNC, the client changes the cluster in the background and throws
HazelcastClientOfflineException while doing so.
• If reconnect-mode is set to OFF, the client does not change the cluster; it shuts down immediately.
Here it could be the case that the whole cluster is restarted. In this case, the owner
member of the client connection in the restarted cluster rejects the client’s
connection request, since the client is trying to connect to the old cluster. So, the
client needs to search for a new cluster, if available and according to the blue-
green configuration (see the following configuration related sections in this
section).
Consider the following notes for the blue-green mechanism (also valid for the disaster recovery
mechanism described in the next section):
• When a client disconnects from a cluster and connects to a new one the InitialMemberEvent and
CLIENT_CHANGED_CLUSTER events are fired.
• The client’s listener service re-registers its listeners to the new cluster; the listener service opens
a new connection to all members in the current member list and registers the listeners for each
connection.
• The client’s Near Caches and Continuous Query Caches are cleared when the client joins a new
cluster successfully.
• If the new cluster’s partition count is different, the client is rejected by the cluster. The client is
not able to connect to a cluster with a different partition count.
Disaster Recovery Mechanism
When one of your clusters is gone due to a failure, the connection between your clients and owner
member in that cluster is gone, too. When a client is disconnected because of a failure in the
cluster, it first tries to connect to another member of that same cluster.
The client’s behavior after this disconnection depends on its reconnect-mode, and it has the same
options that are described in the above section (Blue-Green Mechanism).
If you have provided alternative clusters for your clients to connect, the client tries to connect to
those alternative clusters (depending on the reconnect-mode).
When a failover starts, i.e., the client is disconnected and was configured to connect to alternative
clusters, the current member list is not considered; the client cuts all the connections before
attempting to connect to a new cluster and tries the clusters as configured. See the below
configuration related sections.
The order of the clusters that the client tries to connect to in a blue-green or disaster recovery
scenario is decided by the order of the cluster declarations as given in the client configuration.
Each time the client is disconnected from a cluster and it cannot connect back to the same one, the
configured list is iterated over. The number of these iterations before the client decides to shut down is
provided using the try-count configuration element. See the following configuration related
sections.
We didn’t go over the configuration yet (see the following configuration related sections), but for
the sake of explaining the ordering, assume that you have client-config1, client-config2 and
client-config3 in the given order as shown below. This means you have three alternative clusters.
<try-count>4</try-count>
<clients>
<client>client-config1.xml</client>
<client>client-config2.xml</client>
<client>client-config3.xml</client>
</clients>
Let’s say the client is disconnected from the cluster whose configuration is given by
client-config2.xml. Then, the client tries to connect to the next cluster in this list, whose configuration is
given by client-config3.xml. When the end of the list is reached, as in this example, and
the client could not connect to client-config3, the try counter is incremented and the client
continues to try to connect starting with client-config1.
This iteration continues until the client connects to a cluster or the try counter reaches the
configured try-count value. When the counter reaches this value and the client still could not connect to a
cluster, it shuts down. Note that, if try-count was set to 1 in the above example, and the client could
not connect to client-config3, it would shut down since it already tried once to connect to an
alternative cluster.
The following sections describe how you can configure the Java client for blue-green and disaster
recovery scenarios.
Using CNAME, you can change the hostname resolutions and use them dynamically. Let’s describe
the configuration with examples.
Assume that you have two clusters, Cluster A and Cluster B, and two Java clients.
<hazelcast>
...
<network>
<join>
<tcp-ip enabled="true">
<member>clusterA.member1</member>
<member>clusterA.member2</member>
</tcp-ip>
</join>
</network>
...
</hazelcast>
<hazelcast>
...
<network>
<join>
<tcp-ip enabled="true">
<member>clusterB.member1</member>
<member>clusterB.member2</member>
</tcp-ip>
</join>
</network>
...
</hazelcast>
<hazelcast-client>
...
<cluster-name>cluster-a</cluster-name>
<network>
<cluster-members>
<address>production1.myproject</address>
<address>production2.myproject</address>
</cluster-members>
</network>
...
</hazelcast-client>
<hazelcast-client>
...
<cluster-name>cluster-b</cluster-name>
<network>
<cluster-members>
<address>production1.myproject</address>
<address>production2.myproject</address>
</cluster-members>
</network>
...
</hazelcast-client>
4. Assuming that the client configuration file names of the above example clients are
hazelcast-client-c1.xml and hazelcast-client-c2.xml, you should configure the client failovers for a
blue-green deployment scenario as follows:
<hazelcast-client-failover>
<try-count>4</try-count>
<clients>
<client>hazelcast-client-c1.xml</client>
<client>hazelcast-client-c2.xml</client>
</clients>
</hazelcast-client-failover>
You can find the complete Hazelcast client failover example configuration file
(hazelcast-client-failover-full-example) both in XML and YAML formats
including the descriptions of elements and attributes, in the /bin folder of your
Hazelcast download directory.
5. You should also configure your clients to forget DNS lookups using the networkaddress.cache.ttl
JVM parameter.
6. Configure the addresses in your clients' configuration to resolve to the hostnames of Cluster A via
CNAME so that the clients connect to Cluster A when they start:
production1.myproject → clusterA.member1
production2.myproject → clusterA.member2
7. When you want the clients to switch to the other cluster, change the mapping as follows:
production1.myproject → clusterB.member1
production2.myproject → clusterB.member2
8. Wait for the time you configured using the networkaddress.cache.ttl JVM parameter for the
client JVM to forget the old mapping.
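The networkaddress.cache.ttl setting mentioned in the steps above is a Java security property, so one way to apply it is from the application code, before the first hostname lookup is performed. The following is a minimal sketch; the class name and the 30-second value are illustrative assumptions, not recommendations from this manual.

```java
import java.security.Security;

public class DnsCacheTtl {

    // Sets how long (in seconds) successful DNS lookups are cached by the JVM.
    // Call this before the client performs its first hostname lookup,
    // since the JVM reads the value when its address cache is initialized.
    static void configure(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        configure(30);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

With such a setting, the wait in the last step is bounded by the configured number of seconds.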
Let’s first give example configurations and describe the configuration elements.
Declarative Configuration:
<hazelcast-client-failover>
<try-count>4</try-count>
<clients>
<client>hazelcast-client-c1.xml</client>
<client>hazelcast-client-c2.xml</client>
</clients>
</hazelcast-client-failover>
Programmatic Configuration:
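A minimal sketch of a programmatic failover configuration is given below, assuming the Hazelcast 4.x ClientFailoverConfig API. The wrapping class, the cluster names and the member addresses are placeholders chosen for illustration.

```java
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientFailoverConfig;

public class FailoverConfigSketch {

    // Builds a failover configuration with two alternative clusters.
    static ClientFailoverConfig build() {
        // First alternative cluster (placeholder name and addresses).
        ClientConfig clusterA = new ClientConfig();
        clusterA.setClusterName("cluster-a");
        clusterA.getNetworkConfig().addAddress("10.0.0.1", "10.0.0.2");

        // Second alternative cluster (placeholder name and addresses).
        ClientConfig clusterB = new ClientConfig();
        clusterB.setClusterName("cluster-b");
        clusterB.getNetworkConfig().addAddress("10.0.1.1", "10.0.1.2");

        // The clusters are tried in the order in which they are added.
        ClientFailoverConfig failoverConfig = new ClientFailoverConfig();
        failoverConfig.addClientConfig(clusterA)
                .addClientConfig(clusterB)
                .setTryCount(4);
        return failoverConfig;
    }
}
```

The resulting object can then be passed to HazelcastClient.newHazelcastFailoverClient(failoverConfig) to create the client.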
• try-count: Count of connection retries by the client to the alternative clusters. When this value
is reached and the client still could not connect to a cluster, the client shuts down. Note that this
value applies to the alternative clusters whose configurations are provided with the client
element. For the above example, two alternative clusters are given with the try-count set as 4.
This means the number of connection attempts is 4 x 2 = 8.
• client: Path to the client configuration that corresponds to an alternative cluster that the client
will try to connect to.
The client configurations must be exactly the same except the following configuration options:
• SecurityConfig
• NetworkConfig.Addresses
• NetworkConfig.SocketInterceptorConfig
• NetworkConfig.SSLConfig
• NetworkConfig.AwsConfig
• NetworkConfig.GcpConfig
• NetworkConfig.AzureConfig
• NetworkConfig.KubernetesConfig
• NetworkConfig.EurekaConfig
• NetworkConfig.CloudConfig
• NetworkConfig.DiscoveryConfig
You can also configure it within the Spring context, as shown below:
<beans>
<hz:client-failover id="blueGreenClient" try-count="5">
<hz:client>
<hz:cluster-name name="dev"/>
<hz:network>
<hz:member>127.0.0.1:5700</hz:member>
<hz:member>127.0.0.1:5701</hz:member>
</hz:network>
</hz:client>
<hz:client>
<hz:cluster-name name="alternativeClusterName"/>
<hz:network>
<hz:member>127.0.0.1:5702</hz:member>
<hz:member>127.0.0.1:5703</hz:member>
</hz:network>
</hz:client>
</hz:client-failover>
</beans>
The client failure detectors are responsible for determining if a member in the cluster is unreachable
or crashed. The most important problem in failure detection is to distinguish whether a
member is still alive but slow, or has crashed. But according to the famous FLP result, it is
impossible to distinguish a crashed member from a slow one in an asynchronous system. A
workaround to this limitation is to use unreliable failure detectors. An unreliable failure detector
allows a member to suspect that others have failed, usually based on liveness criteria, but it can
make mistakes to a certain degree.
Hazelcast Java client has two built-in failure detectors: Deadline Failure Detector and Ping Failure
Detector. These client failure detectors work independently from the member failure detectors, e.g.,
you do not need to enable the member failure detectors to benefit from the client ones.
Deadline Failure Detector uses an absolute timeout for missing/lost heartbeats. After timeout, a
member is considered as crashed/unavailable and marked as suspected.
The following is a declarative example showing how you can configure the Deadline Failure
Detector for your client (in the client’s configuration XML file, e.g., hazelcast-client.xml):
<hazelcast-client>
...
<properties>
<property name="hazelcast.client.heartbeat.timeout">60000</property>
<property name="hazelcast.client.heartbeat.interval">5000</property>
</properties>
...
</hazelcast-client>
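The idea behind a deadline-based detector can be sketched in a few lines of plain Java. This is not Hazelcast's implementation; the class and method names are hypothetical, and the timestamps are passed in explicitly to keep the example deterministic.

```java
public class DeadlineDetectorSketch {

    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    // timeoutMillis plays the role of hazelcast.client.heartbeat.timeout.
    DeadlineDetectorSketch(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    // Records that a heartbeat was received from the member.
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // The member is suspected once no heartbeat has arrived within the timeout.
    boolean isSuspected(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        DeadlineDetectorSketch detector = new DeadlineDetectorSketch(60000, 0);
        System.out.println(detector.isSuspected(60000)); // prints false
        System.out.println(detector.isSuspected(60001)); // prints true
    }
}
```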
Client Ping Failure Detector
In addition to the Deadline Failure Detector, the Ping Failure Detector may be configured on your
client. Please note that this detector is disabled by default. The Ping Failure Detector operates at
Layer 3 of the OSI model and provides much quicker and more deterministic detection of
hardware and other lower level events. When the JVM process has enough permissions to create
RAW sockets, the implementation chooses to rely on ICMP Echo requests. This is preferred.
If there are not enough permissions, it can be configured to fall back on attempting a TCP Echo on
port 7. In the latter case, either a successful connection or an explicit rejection is treated as "Host is
Reachable". Or, it can be forced to use only RAW sockets. This is not preferred as each call creates a
heavyweight socket and, moreover, the Echo service is typically disabled.
For the Ping Failure Detector to rely only on the ICMP Echo requests, the following criteria need to
be met:
• The file ld.conf must be edited to overcome the rejection by the dynamic linker when loading
libs from untrusted paths.
The details of these requirements are explained in the Requirements section of Hazelcast members'
Ping Failure Detector.
If any of the above criteria isn’t met, then isReachable always falls back on TCP Echo attempts on
port 7.
An example declarative configuration to use the Ping Failure Detector is as follows (in the client’s
configuration XML file, e.g., hazelcast-client.xml):
<hazelcast-client>
...
<network>
<icmp-ping enabled="true">
<timeout-milliseconds>1000</timeout-milliseconds>
<interval-milliseconds>1000</interval-milliseconds>
<ttl>255</ttl>
<echo-fail-fast-on-startup>false</echo-fail-fast-on-startup>
<max-attempts>2</max-attempts>
</icmp-ping>
</network>
...
</hazelcast-client>
ClientConfig config = ...;
• enabled: Enables the legacy ICMP detection mode, works cooperatively with the existing failure
detector and only kicks-in after a pre-defined period has passed with no heartbeats from a
member. Its default value is false.
• max-attempts: Maximum number of ping attempts before the member gets suspected by the
detector. Its default value is 3.
• ttl: Maximum number of hops the packets should go through. Its default value is 255. You can
set it to 0 to use your system’s default TTL.
In the above example configuration, the Ping Failure Detector attempts 2 pings, one every second,
and waits up to 1 second for each to complete. If there is no successful ping after 2 seconds, the
member gets suspected.
To enforce the Requirements, the property echo-fail-fast-on-startup can also be set to true, in
which case Hazelcast fails to start if any of the requirements isn’t met.
Unlike on the Hazelcast members, the Ping Failure Detector always works in parallel with the Deadline Failure
Detector on the clients. Below is a summary table of all possible configuration combinations of the
Ping Failure Detector.
ICMP | Fail-Fast | Description | Linux | Windows | macOS
true | false | Parallel ping detector, works in parallel with the configured failure detector. Checks periodically if members are live (OSI Layer 3) and suspects them immediately, regardless of the other detectors. | Supported. ICMP Echo if available - Falls back on TCP Echo on port 7 | Supported. TCP Echo on port 7 | Supported. ICMP Echo if available - Falls back on TCP Echo on port 7
true | true | Parallel ping detector, works in parallel with the configured failure detector. Checks periodically if members are live (OSI Layer 3) and suspects them immediately, regardless of the other detectors. | Supported - Requires OS Configuration. Enforcing ICMP Echo if available - No start up if not available | Not Supported | Not Supported - Requires root privileges
There are some advanced client configuration properties to tune some aspects of Hazelcast Client.
You can set them as property name and value pairs through declarative configuration,
programmatic configuration, or JVM system property. See the System Properties appendix to learn
how to set these properties.
When you want to reconfigure a system property, you need to restart the clients
for which the property is modified.
The table below lists the client configuration properties with their descriptions.
Property Name  Default Value  Type  Description
hazelcast.invalidation.max.tolerated.miss.count  10  int  If missed invalidation count is bigger than this value, the relevant cached data is made unreachable.
hazelcast.client.shuffle.member.list  true  string  The client shuffles the given member list to prevent all the clients from connecting to the same member when this property is true. When it is set to false, the client tries to connect to the members in the given order.
hazelcast.client.heartbeat. 60000 int Timeout for the heartbeat messages sent by the
timeout
client to members. If no messages pass between
the client and member within the given time via
this property in milliseconds, the connection
will be closed.
440
Property Name Default Type Description
Value
hazelcast.client.invocation -1 int Controls the maximum timeout, in milliseconds,
.backoff.timeout.millis
to wait for an invocation space to be available. If
an invocation cannot be made because there are
too many pending invocations, then an
exponential backoff is done to give the system
time to deal with the backlog of invocations.
This property controls how long an invocation is
allowed to wait before getting a
HazelcastOverloadException. When set to -1 then
HazelcastOverloadException is thrown
immediately without any waiting.
441
Property Name Default Type Description
Value
hazelcast.client.concurrent. 100 int Property needed for concurrency detection so
window.ms that write through and dynamic response
handling can be done correctly. This property
sets the window for a concurrency detection
(duration when it signals that a concurrency has
been detected), even if there are no further
updates in that window. Normally in a
concurrent system the windows keeps sliding
forward so it always remains concurrent.
Setting it too high effectively disables the
optimization because once concurrency has
been detected it will keep that way. Setting it too
low could lead to suboptimal performance
because the system will try write through and
other optimizations even though the system is
concurrent.
442
Property Name Default Type Description
Value
hazelcast.client.operation. 5000 int If an operation has backups, this property
backup.timeout.millis
specifies how long the invocation will wait for
acks from the backup replicas. If acks are not
received from some backups, there will not be
any rollback on other successful replicas.
hazelcast.discovery.public. false bool Enables the discovery joiner to use public IPs
ip.enabled
from DiscoveredNode. See Discovery SPI for more
information.
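The behavior described for hazelcast.client.invocation.backoff.timeout.millis can be pictured with the following minimal sketch. It is not the client's implementation: the class, the OverloadException stand-in, the 1 ms starting pause and the doubling factor are all assumptions chosen for illustration. It only shows the documented idea of backing off exponentially until a timeout, and of failing immediately when the timeout is -1.

```java
import java.util.function.BooleanSupplier;

final class InvocationBackoffSketch {
    /** Hypothetical stand-in for HazelcastOverloadException. */
    static final class OverloadException extends RuntimeException {
        OverloadException(String message) { super(message); }
    }

    /**
     * Waits until slotAvailable reports true, pausing 1, 2, 4, ... ms between checks.
     * A negative timeout means "fail immediately". Returns the number of pauses taken.
     */
    static int awaitSlot(BooleanSupplier slotAvailable, long backoffTimeoutMillis) {
        long waitedMillis = 0;
        long pauseMillis = 1; // assumed starting pause
        int pauses = 0;
        while (!slotAvailable.getAsBoolean()) {
            if (backoffTimeoutMillis < 0 || waitedMillis >= backoffTimeoutMillis) {
                throw new OverloadException("no invocation slot after " + waitedMillis + " ms");
            }
            // Never sleep past the configured timeout.
            long sleepMillis = Math.min(pauseMillis, backoffTimeoutMillis - waitedMillis);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", e);
            }
            waitedMillis += sleepMillis;
            pauseMillis *= 2; // exponential growth
            pauses++;
        }
        return pauses;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // The slot becomes free on the fourth check, so three pauses are taken.
        System.out.println(awaitSlot(() -> ++checks[0] > 3, 1000));
    }
}
```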
If you have Hazelcast IMDG Enterprise HD, your Hazelcast Java client’s Near Cache can benefit
from the High-Density Memory Store.
Let’s recall the Java client’s Near Cache configuration (see the Configuring Client Near Cache
section) without High-Density Memory Store:
<hazelcast-client>
...
<near-cache name="MENU">
<eviction size="2000" eviction-policy="LFU"/>
<time-to-live-seconds>0</time-to-live-seconds>
<max-idle-seconds>0</max-idle-seconds>
<invalidate-on-change>true</invalidate-on-change>
<in-memory-format>OBJECT</in-memory-format>
</near-cache>
...
</hazelcast-client>
You can configure this Near Cache to use Hazelcast’s High-Density Memory Store by setting the in-
memory format to NATIVE. See the following configuration example:
<hazelcast-client>
...
<near-cache>
<eviction size="1000" max-size-policy="ENTRY_COUNT" eviction-policy="LFU"/>
<time-to-live-seconds>0</time-to-live-seconds>
<max-idle-seconds>0</max-idle-seconds>
<invalidate-on-change>true</invalidate-on-change>
<in-memory-format>NATIVE</in-memory-format>
</near-cache>
</hazelcast-client>
• max-size-policy: Maximum size policy for eviction of the Near Cache. Available values are as
follows:
• eviction-policy: Eviction policy configuration. Its default value is NONE. Available values are
as follows:
◦ NONE: No items are evicted and the size property is ignored. You still can combine it with
time-to-live-seconds.
Keep in mind that you should have already enabled the High-Density Memory Store usage for your
client, using the <native-memory> element in the client’s configuration.
See the High-Density Memory Store section for more information on Hazelcast’s High-Density
Memory Store feature.
• Ability to add cluster listeners to a cluster and entry/item listeners to distributed data
structures.
See Hazelcast C++ client’s own GitHub repo for information on setting the client up, installing and
compiling it, its serialization support and APIs such as raw pointer and query. You can also find
code samples for this client in this repo.
See Hazelcast .NET client’s own GitHub repo for information on configuring and starting the client.
You can also find code samples for this client in this repo.
http://member IP address:port/hazelcast/rest/maps/mapName/key
http://member IP address:port/hazelcast/rest/queues/queueName
For the operations to be performed, standard REST conventions for HTTP calls are used.
REST client request listener service is not enabled by default. You should enable it
on your cluster members to use REST client. It can be enabled using the DATA
endpoint group; see the Using the REST Endpoint Groups section.
All parameters that are used in REST API URLs, such as the distributed data
structure and key names, must be URL encoded when composing a call. As an
example, name.with/special@chars parameter value would be encoded as
name.with%2Fspecial%40chars.
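A minimal sketch of composing such a URL in Java follows. The class and method names are hypothetical; only the /hazelcast/rest/maps/mapName/key path shape and the encoding example come from the text. Note that java.net.URLEncoder performs form encoding and turns spaces into "+", so the sketch rewrites "+" to "%20" to keep the result usable as a path segment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class RestUrlDemo {
    /** Percent-encodes a single path parameter such as a map name or key. */
    static String encode(String value) {
        try {
            // A literal '+' in the input is already encoded as %2B,
            // so any remaining '+' stands for a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds the URL of a map entry, encoding the map name and the key. */
    static String mapEntryUrl(String hostAndPort, String mapName, String key) {
        return "http://" + hostAndPort + "/hazelcast/rest/maps/"
                + encode(mapName) + "/" + encode(key);
    }

    public static void main(String[] args) {
        System.out.println(mapEntryUrl("10.20.17.1:5701", "name.with/special@chars", "key1"));
    }
}
```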
In the following GET, POST and DELETE examples, assume that your cluster members are as shown
below.
Members [5] {
Member [10.20.17.1:5701]
Member [10.20.17.2:5701]
Member [10.20.17.4:5701]
Member [10.20.17.3:5701]
Member [10.20.17.5:5701]
}
All of the requests below can return one of the following responses in case of a
failure.
• If the HTTP request syntax is not known, the following response is returned.
Content-Length: 0
You can put a new key1/value1 entry into a map by using a POST call to the
http://10.20.17.1:5701/hazelcast/rest/maps/mapName/key1 URL. This call’s content body should
contain the value of the key. Also, if the call contains the MIME type, Hazelcast stores this
information, too.
An example POST call is shown below.
This GET call returned a value, its length and also the MIME type (text/plain) since the POST call
example shown above included the MIME type.
You can use a DELETE call to remove an entry. An example DELETE call is shown below with its
response.
< HTTP/1.1 200 OK
< Content-Length: 0
If you leave the key empty as follows, the DELETE call deletes all entries from the map.
You can use a POST call to create an item on the queue. An example is shown below.
It returns the following if the queue is full and the item is not able to be offered to the queue:
You can use a DELETE call for retrieving items from a queue. Note that you should state the poll
timeout while polling for queue events by an extra path parameter.
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 3
foo
When the timeout is reached, the response is No Content success, i.e., there is no item on the queue
to be returned.
Besides the above operations, you can check the status of your cluster, an example of which is
shown below.
$ curl -v http://127.0.0.1:5701/hazelcast/rest/cluster
< HTTP/1.1 200 OK
{
"members": [
{
"address": "[127.0.0.1]:5701",
"liteMember": false,
"localMember": true,
"uuid": "73f5d6ad-7b51-4e74-bd74-15b2e7de7edd",
"memberVersion": "4.0.0"
},
{
"address": "[127.0.0.1]:5702",
"liteMember": false,
"localMember": false,
"uuid": "e8b41ac6-9db9-43f1-9e98-8b0392891560",
"memberVersion": "4.0.0"
},
{
"address": "[127.0.0.1]:5703",
"liteMember": false,
"localMember": false,
"uuid": "c6929312-d4d3-4527-83bc-474c229394d6",
"memberVersion": "4.0.0"
}
],
"connectionCount": 1,
"allConnectionCount": 3
}
Additionally, you can check the name of any instance of your cluster. An example is shown below.
$ curl -v http://127.0.0.1:5701/hazelcast/rest/instance
{"name":"adoring_brattain"}
RESTful access is provided through any member of your cluster. You can even put an HTTP load-
balancer in front of your cluster members for load balancing and fault tolerance.
You need to handle the failures on REST polls as there is no transactional
guarantee.
A Memcache client written in any language can talk directly to a Hazelcast cluster. No additional
configuration is required.
To be able to use a Memcache client, you must enable the Memcache client request listener service
using either one of the following configuration options:
<hazelcast>
...
<network>
<memcache-protocol enabled="true"/>
</network>
...
</hazelcast>
<hazelcast>
...
<advanced-network>
<memcache-server-socket-endpoint-config name="memcache">
<port auto-increment="false" port-count="10">6000</port>
</memcache-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
Members [5] {
Member [10.20.17.1:5701]
Member [10.20.17.2:5701]
Member [10.20.17.4:5701]
Member [10.20.17.3:5701]
Member [10.20.17.5:5701]
}
Assume that you have a PHP application that uses PHP Memcache client to cache things in
Hazelcast. All you need to do is have your PHP Memcache client connect to one of these members. It
does not matter which member the client connects to because the Hazelcast cluster looks like one
giant machine (Single System Image). Here is a PHP client code example.
<?php
$memcache = new Memcache;
$memcache->connect( '10.20.17.1', 5701 ) or die ( "Could not connect" );
$memcache->set( 'key1', 'value1', 0, 3600 );
$get_result = $memcache->get( 'key1' ); // retrieve your data
var_dump( $get_result ); // show it
?>
Notice that Memcache client connects to 10.20.17.1 and uses port 5701. Here is a Java client code
example with SpyMemcached client:
If you want your data to be stored in different maps, for example to utilize per map configuration,
you can do that with a map name prefix as in the following example code.
The hz_memcache_ prefix separates Memcache maps from Hazelcast maps. If no map name is given,
the entry is stored in a default map named hz_memcache_default.
An entry written with a Memcache client can be read by another Memcache client written in
another language.
18.5.2. Unsupported Operations for Memcache
• CAS operations are not supported. In operations that get CAS parameters, such as append, CAS
values are ignored.
• Only a subset of statistics are supported. Below is the list of supported statistic values.
◦ cmd_set
◦ cmd_get
◦ incr_hits
◦ incr_misses
◦ decr_hits
◦ decr_misses
See Hazelcast Python client’s GitHub repo for its documentation and code samples.
See Hazelcast Node.js client’s GitHub repo for its documentation and code samples.
18.8. Go Client
Go Client implementation for Hazelcast. It is implemented using the Hazelcast Open Binary Client
Protocol.
See Hazelcast Go client’s GitHub repo for its documentation and code samples.
18.9. Scala
The API for Hazelcast Scala is based on Scala 2.11 and Hazelcast 3.6/3.7/3.8 releases. However, these
are not hard dependencies provided that you include the relevant Hazelcast dependencies.
19. Serialization
Hazelcast needs to serialize the Java objects that you put into Hazelcast because Hazelcast is a
distributed system. The data and its replicas are stored in different partitions on multiple cluster
members. The data you need may not be present on the local member, and in that case, Hazelcast
retrieves that data from another member. This requires serialization.
• Locking an object
Hazelcast optimizes the serialization for the basic types and their array types. You cannot override
this behavior.
Hazelcast optimizes all of the above object types. You do not need to worry about their
(de)serializations.
3. If the above check fails, then Hazelcast checks if it is an instance of
com.hazelcast.nio.serialization.Portable.
4. If the above check fails, then Hazelcast checks if it is an instance of one of the default types (see
the Serialization chapter introduction for default types).
5. If the above check fails, then Hazelcast looks for a user-specified Custom Serializer, i.e. an
implementation of ByteArraySerializer or StreamSerializer. Custom serializer is searched using
the input Object’s Class and its parent class up to Object. If parent class search fails, all
interfaces implemented by the class are also checked (excluding java.io.Serializable and
java.io.Externalizable).
7. If the above check fails, Hazelcast uses the registered Global Serializer if one exists.
If all of the above checks fail, then serialization fails. When a class implements multiple interfaces,
the above steps are important to determine the serialization mechanism that Hazelcast uses. When
a class definition is required for any of these serializations, you need to have all the classes needed
by the application on your classpath because Hazelcast does not download them automatically,
unless you are using user code deployment.
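The custom serializer search described in step 5 (the class itself, then its parent classes up to Object, then the implemented interfaces except the two Java serialization interfaces) can be sketched as below. This is an illustrative model only, not Hazelcast's code: the registry, the use of plain strings as serializer names and the class name are assumptions.

```java
import java.io.Externalizable;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

final class SerializerLookupSketch {
    // Hypothetical registry: registered type -> serializer name.
    private final Map<Class<?>, String> registry = new HashMap<>();

    void register(Class<?> type, String serializerName) {
        registry.put(type, serializerName);
    }

    /** Returns the name of the first matching serializer, or null if none is registered. */
    String lookup(Class<?> type) {
        // First pass: the class itself, then each superclass up to Object.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            String found = registry.get(c);
            if (found != null) {
                return found;
            }
        }
        // Second pass: interfaces declared by the class and its superclasses,
        // skipping java.io.Serializable and java.io.Externalizable.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (iface == Serializable.class || iface == Externalizable.class) {
                    continue;
                }
                String found = registry.get(iface);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SerializerLookupSketch lookup = new SerializerLookupSketch();
        lookup.register(Number.class, "numberSerializer");
        // Integer has no direct registration, so its parent class Number matches.
        System.out.println(lookup.lookup(Integer.class));
    }
}
```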
| Serialization Interface | Advantages | Drawbacks |
|---|---|---|
| IdentifiedDataSerializable | More CPU and memory usage efficient than Serializable; Reflection is not used during deserialization; Supported by all Native Clients | Specific to Hazelcast; Serialization interface must be implemented; A Factory and configuration must be implemented |
Let’s dig into the details of the above serialization mechanisms in the following sections.
Let’s take a look at the example code below for Java Serializable.
public class Employee implements Serializable {
private static final long serialVersionUID = 1L;
private String surname;
Here, the fields that are non-static and non-transient are automatically serialized. To eliminate class
compatibility issues, it is recommended that you add a serialVersionUID, as shown above. Also,
when you are using methods that perform byte-content comparisons, such as IMap.replace(), and if
byte-content of equal objects is different, you may face unexpected behaviors. For example, if the
class relies on a hash map, the replace method may fail. The reason for this is the hash map is a
serialized data structure with unreliable byte-content.
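The byte-content issue can be reproduced with plain Java serialization. In the sketch below (class and method names are illustrative), two HashMap instances hold the same entry and are equal according to equals(), but they were created with different capacities and load factors. Since HashMap's serialized form includes its load factor and threshold, their serialized bytes are expected to differ, which is exactly the situation that confuses byte-content comparisons.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ByteContentDemo {
    /** Serializes an object with standard Java serialization. */
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>(16, 0.75f);
        Map<String, Integer> second = new HashMap<>(64, 0.5f);
        first.put("age", 30);
        second.put("age", 30);

        // Logical equality of the two maps.
        System.out.println(first.equals(second));
        // Byte-level equality of their serialized forms.
        System.out.println(Arrays.equals(serialize(first), serialize(second)));
    }
}
```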
Hazelcast also supports java.io.Externalizable. This interface offers more control on the way fields
are serialized or deserialized. Compared to native Java serialization, it also can have a positive
effect on performance. With java.io.Externalizable, there is no need to add serialVersionUID.
@Override
public void readExternal( ObjectInput in )
throws IOException, ClassNotFoundException {
this.surname = in.readUTF();
}
@Override
public void writeExternal( ObjectOutput out )
throws IOException {
out.writeUTF(surname);
}
}
You explicitly perform writing and reading of fields. Perform reading in the same order as writing.
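For reference, here is a self-contained sketch of an Externalizable class together with a round trip through standard Java object streams. The wrapper class and the roundTrip helper are hypothetical and only serve to make the fragment runnable outside Hazelcast. An Externalizable class needs a public no-argument constructor, because the object is instantiated first and then filled by readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

final class ExternalizableRoundTrip {
    public static class Employee implements Externalizable {
        private String surname;

        public Employee() {} // required by Externalizable

        public Employee(String surname) { this.surname = surname; }

        public String getSurname() { return surname; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(surname);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Read fields in the same order as they were written.
            this.surname = in.readUTF();
        }
    }

    /** Serializes and deserializes an Employee using Java object streams. */
    static Employee roundTrip(Employee original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Employee) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Employee("Smith")).getSurname());
    }
}
```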
19.4. Implementing DataSerializable
As mentioned in Implementing Java Serializable & Externalizable, Java serialization is an easy
mechanism. However, it does not control how fields are serialized or deserialized. Moreover, Java
serialization can lead to excessive CPU loads since it keeps track of objects to handle the cycles and
streams class descriptors. These are performance decreasing factors; thus, serialized data may not
have an optimal size.
The DataSerializable interface of Hazelcast overcomes these issues. Here is an example of a class
implementing the com.hazelcast.nio.serialization.DataSerializable interface.
public Address() {}
//getters setters..
Since the address field itself is DataSerializable, it calls address.writeData(out) when writing and
address.readData(in) when reading. Also note that you should have writing and reading of the
fields occur in the same order. When Hazelcast serializes a DataSerializable, it writes the className
first. When Hazelcast deserializes it, className is used to instantiate the object using reflection.
public class Employee implements DataSerializable {
private String firstName;
private String lastName;
private int age;
private double salary;
private Address address; //address itself is DataSerializable
public Employee() {}
//getters setters..
As you can see, since the address field itself is DataSerializable, it calls address.writeData(out) when
writing and address.readData(in) when reading. Also note that you should have writing and
reading of the fields occur in the same order. While Hazelcast serializes a DataSerializable, it
writes the className first. When Hazelcast deserializes it, className is used to instantiate the object
using reflection.
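The delegation and ordering rules above can be sketched without Hazelcast on the classpath. In the block below, DataWritable is a hypothetical stand-in that mirrors the writeData/readData shape of DataSerializable but uses java.io.DataOutput and java.io.DataInput; the Address fields and the reduced Employee field set are assumptions made for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class NestedDataOrderSketch {
    /** Hypothetical stand-in for the DataSerializable contract. */
    interface DataWritable {
        void writeData(DataOutput out) throws IOException;
        void readData(DataInput in) throws IOException;
    }

    static final class Address implements DataWritable {
        String city;
        int zipCode;

        public void writeData(DataOutput out) throws IOException {
            out.writeUTF(city);
            out.writeInt(zipCode);
        }

        public void readData(DataInput in) throws IOException {
            city = in.readUTF();     // same order as writeData
            zipCode = in.readInt();
        }
    }

    static final class Employee implements DataWritable {
        String firstName;
        int age;
        Address address = new Address();

        public void writeData(DataOutput out) throws IOException {
            out.writeUTF(firstName);
            out.writeInt(age);
            address.writeData(out);  // delegate to the nested object
        }

        public void readData(DataInput in) throws IOException {
            firstName = in.readUTF();
            age = in.readInt();
            address = new Address();
            address.readData(in);    // nested object is read last, as it was written last
        }
    }

    /** Writes an Employee to a byte array and reads it back into a fresh instance. */
    static Employee roundTrip(Employee original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.writeData(new DataOutputStream(bytes));
            Employee copy = new Employee();
            copy.readData(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```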
19.4.2. IdentifiedDataSerializable
For a faster serialization of objects, avoiding reflection and long class names, Hazelcast
recommends you implement com.hazelcast.nio.serialization.IdentifiedDataSerializable which is
a slightly better version of DataSerializable.
• int getClassId();
• int getFactoryId();
Implementing IdentifiedDataSerializable
Let’s take a look at the following example code and configuration to see IdentifiedDataSerializable
in action.
public class Employee
implements IdentifiedDataSerializable {
private String surname;
public Employee() {}
@Override
public void readData( ObjectDataInput in )
throws IOException {
this.surname = in.readUTF();
}
@Override
public void writeData( ObjectDataOutput out )
throws IOException {
out.writeUTF( surname );
}
@Override
public int getFactoryId() {
return EmployeeDataSerializableFactory.FACTORY_ID;
}
@Override
public int getClassId() {
return EmployeeDataSerializableFactory.EMPLOYEE_TYPE;
}
@Override
public String toString() {
return String.format( "Employee(surname=%s)", surname );
}
}
The methods getClassId and getFactoryId return a unique positive number within the
EmployeeDataSerializableFactory. Now, let’s create an instance of this
EmployeeDataSerializableFactory.
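public class EmployeeDataSerializableFactory
implements DataSerializableFactory{
// must match the factory-id used when registering the factory
public static final int FACTORY_ID = 1;
// any positive ID that is unique within this factory
public static final int EMPLOYEE_TYPE = 1;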
@Override
public IdentifiedDataSerializable create(int typeId) {
if ( typeId == EMPLOYEE_TYPE ) {
return new Employee();
} else {
return null;
}
}
}
The only method you should implement is create, as seen in the above example. It is recommended
that you use a switch-case statement instead of multiple if-else blocks if you have a lot of
subclasses. Hazelcast throws an exception if null is returned for typeId.
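A switch-based create method could look like the following sketch. To keep it runnable without Hazelcast, Identified is a hypothetical stand-in for IdentifiedDataSerializable, and the Manager type and the concrete ID values are invented for illustration.

```java
final class SwitchFactorySketch {
    /** Hypothetical stand-in for IdentifiedDataSerializable. */
    interface Identified {
        int getClassId();
    }

    static final int EMPLOYEE_TYPE = 1;
    static final int MANAGER_TYPE = 2; // hypothetical second type

    static final class Employee implements Identified {
        public int getClassId() { return EMPLOYEE_TYPE; }
    }

    static final class Manager implements Identified {
        public int getClassId() { return MANAGER_TYPE; }
    }

    /** Creates an empty instance for the given type ID; one case per registered type. */
    static Identified create(int typeId) {
        switch (typeId) {
            case EMPLOYEE_TYPE:
                return new Employee();
            case MANAGER_TYPE:
                return new Manager();
            default:
                // Unknown type ID. As noted above, Hazelcast throws an exception
                // when a factory returns null.
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(create(MANAGER_TYPE).getClassId());
    }
}
```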
Registering EmployeeDataSerializableFactory
As the last step, you need to register EmployeeDataSerializableFactory declaratively (declare in the
configuration file hazelcast.xml) as shown below. Note that factory-id has the same value of
FACTORY_ID in the above code. This is crucial to enable Hazelcast to find the correct factory.
<hazelcast>
...
<serialization>
<data-serializable-factories>
<data-serializable-factory factory-id="1">
EmployeeDataSerializableFactory
</data-serializable-factory>
</data-serializable-factories>
</serialization>
...
</hazelcast>
• support for multi-version of the same object type
In order to support these features, a serialized Portable object contains meta information like the
version and the concrete location of each field in the binary data. This way, Hazelcast navigates
in the byte[] and deserializes only the required field without actually deserializing the whole
object. This improves the Query performance.
With multi-version support, you can have two cluster members where each has different versions
of the same object. Hazelcast stores both meta information and uses the correct one to serialize and
deserialize Portable objects depending on the member. This is very helpful when you are doing a
rolling upgrade without shutting down the cluster.
Portable serialization is totally language independent and is used as the binary protocol between
Hazelcast server and clients.
public class Foo implements Portable {
final static int ID = 5;
private String foo;
@Override
public int getFactoryId() {
return 1;
}
@Override
public int getClassId() {
return ID;
}
@Override
public void writePortable( PortableWriter writer ) throws IOException {
writer.writeUTF( "foo", foo );
}
@Override
public void readPortable( PortableReader reader ) throws IOException {
foo = reader.readUTF( "foo" );
}
}
Similar to IdentifiedDataSerializable, a Portable Class must provide classId and factoryId. The
Factory object creates the Portable object given the classId.
public class MyPortableFactory implements PortableFactory {
@Override
public Portable create( int classId ) {
if ( Foo.ID == classId )
return new Foo();
else
return null;
}
}
The last step is to register the Factory to the SerializationConfig. Below are the programmatic and
declarative configurations for this step.
<hazelcast>
...
<serialization>
<portable-version>0</portable-version>
<portable-factories>
<portable-factory factory-id="1">
com.hazelcast.nio.serialization.MyPortableFactory
</portable-factory>
</portable-factories>
</serialization>
...
</hazelcast>
Note that the id that is passed to the SerializationConfig is the same as the factoryId that the Foo
class returns.
More than one version of the same class may need to be serialized and deserialized. For example, a
client may have an older version of a class and the member to which it is connected may have a
newer version of the same class.
Portable serialization supports versioning. It is a global versioning, meaning that all portable
classes that are serialized through a member get the globally configured portable version.
You can declare Version in the configuration file hazelcast.xml using the portable-version element,
as shown below.
<hazelcast>
...
<serialization>
<portable-version>1</portable-version>
<portable-factories>
<portable-factory factory-id="1">
PortableFactoryImpl
</portable-factory>
</portable-factories>
</serialization>
...
</hazelcast>
You can also use the VersionedPortable interface, which lets you upgrade the version per class
instead of using global versioning. If you need to update only one class, you can use this interface. In this
case, your class should implement VersionedPortable instead of Portable, and you can give the
desired version using the method VersionedPortable.getClassVersion().
• It is important to change the version whenever an update is performed in the serialized fields of
a class, for example by incrementing the version.
• If a client performs a Portable deserialization on a field and then that Portable is updated by
removing that field on the cluster side, this may lead to a problem.
• Portable serialization does not use reflection and hence, fields in the class and in the serialized
content are not automatically mapped. Field renaming is a simpler process. Also, since the class
ID is stored, renaming the Portable does not lead to problems.
• Types of fields need to be updated carefully. Hazelcast performs basic type upgradings, such as
int to float.
Assume that a new member joins the cluster with a class that has been modified and the class’s
version has been upgraded due to this modification.
• If you modified the class by adding a new field, the new member’s put operations include that
new field. If this new member tries to get an object that was put from the older members, it gets
null for the newly added field.
• If you modified the class by removing a field, the old members get null for the objects that are
put by the new member.
• If you modified the class by changing the type of a field, the error IncompatibleClassChangeError
is generated unless the change was made on a built-in type or the byte size of the new type is
less than or equal to the old one. The following are examples of allowed type conversions:
If you have not modified a class at all, it works as usual.
public EmployeePortable() {
}
As you see in the above example, first the name and then the age is written. This order should be
preserved in other members or clients.
19.5.5. Null Portable Serialization
Be careful with serializing null portables. Hazelcast lazily creates a class definition of portable
internally when the user first serializes. This class definition is stored and used later for
deserializing that portable class. When the user tries to serialize a null portable when there is no
class definition at the moment, Hazelcast throws an exception saying that
com.hazelcast.nio.serialization.HazelcastSerializationException: Cannot write null portable
without explicitly registering class definition!.
There are two solutions to get rid of this exception. Either put a non-null portable class of the same
type before any other operation, or manually register a class definition in serialization
configuration as shown below.
Putting a DistributedObject (Hazelcast Semaphore, Queue, etc.) in a cluster member and getting it
from another one is not a straightforward operation. Passing the ID and type of the
DistributedObject can be a solution. For deserialization, you can get the object from
HazelcastInstance. For instance, if your object is an instance of IQueue, you can either use
HazelcastInstance.getQueue(id) or Hazelcast.getDistributedObject.
You can use the HazelcastInstanceAware interface in the case of a deserialization of a Portable
DistributedObject if it gets an ID to be looked up. HazelcastInstance is set after deserialization, so
you first need to store the ID and then retrieve the DistributedObject using the
setHazelcastInstance method.
You can use a stream to serialize and deserialize data by using StreamSerializer. This is a good
option for your own implementations. It can also be adapted to external serialization libraries like
Kryo, JSON and protocol buffers.
StreamSerializer Example Code 1
@Override
public int getTypeId () {
return 1;
}
@Override
public void write( ObjectDataOutput out, EmployeeSS employee )
throws IOException {
out.writeUTF(employee.getSurname());
}
@Override
public EmployeeSS read( ObjectDataInput in )
throws IOException {
String surname = in.readUTF();
return new EmployeeSS(surname);
}
@Override
public void destroy () {
}
}
In practice, classes may have many fields. Just make sure the fields are read in the same order as
they are written. The type ID must be unique and greater than or equal to 1. Uniqueness of the type
ID enables Hazelcast to determine which serializer is used during deserialization.
As the last step, let’s register the EmployeeStreamSerializer in the configuration file hazelcast.xml, as
shown below.
<hazelcast>
...
<serialization>
<serializers>
<serializer type-class="EmployeeSS" class-name="EmployeeStreamSerializer"
/>
</serializers>
</serialization>
...
</hazelcast>
StreamSerializer cannot be created for well-known types, such as Long and String
and primitive arrays. Hazelcast already registers these types.
Assume that our custom serialization serializes Foo into XML. First you need to implement a
com.hazelcast.nio.serialization.StreamSerializer. A very simple one that uses XMLEncoder and
XMLDecoder could look like the following:
public class FooXmlSerializer implements StreamSerializer<Foo> {
@Override
public int getTypeId() {
return 10;
}
Configuring StreamSerializer
Note that typeId must be unique because Hazelcast uses it to look up the StreamSerializer while it
deserializes the object. The last required step is to register the StreamSerializer in your Hazelcast
configuration. Below are the programmatic and declarative configurations for this step.
<hazelcast>
<serialization>
<serializers>
<serializer type-class="com.www.Foo" class-name="com.www.FooXmlSerializer"
/>
</serializers>
</serialization>
...
</hazelcast>
From now on, this Hazelcast example will use FooXmlSerializer to serialize Foo objects. In this way,
you can write an adapter (StreamSerializer) for any Serialization framework and plug it into
Hazelcast.
ByteArraySerializer exposes the raw ByteArray used internally by Hazelcast. It is a good option if
the serialization library you are using deals with ByteArrays instead of streams.
@Override
public void destroy () {
}
@Override
public int getTypeId () {
return 1;
}
@Override
public byte[] write( EmployeeSS object )
throws IOException {
return object.getName().getBytes();
}
@Override
public EmployeeSS read( byte[] buffer )
throws IOException {
String surname = new String( buffer );
return new EmployeeSS( surname );
}
}
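One detail worth noting in byte-array based serializers: String.getBytes() and new String(byte[]) without arguments use the JVM's default charset, which may differ between machines. A minimal sketch with an explicit charset follows; the class and method names are illustrative and are not part of the Hazelcast API.

```java
import java.nio.charset.StandardCharsets;

final class ByteArrayRoundTripDemo {
    /** Encodes the surname with a fixed charset so every JVM produces the same bytes. */
    static byte[] write(String surname) {
        return surname.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes with the same charset that was used for writing. */
    static String read(byte[] buffer) {
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = write("M\u00fcller");
        System.out.println(bytes.length);
        System.out.println(read(bytes).equals("M\u00fcller"));
    }
}
```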
Configuring ByteArraySerializer
<hazelcast>
...
<serialization>
<serializers>
<serializer type-class="Employee">EmployeeByteArraySerializer</serializer>
</serializers>
</serialization>
...
</hazelcast>
By default, the global serializer does not handle java.io.Serializable and java.io.Externalizable
instances. However, you can configure it to be responsible for those instances.
A custom serializer should be registered for a specific class type. The global serializer handles all
class types if all the steps in searching for a serializer fail as described in Serialization Interface
Types.
• Third party serialization frameworks can be integrated using the global serializer.
• For your custom objects, you can implement a single serializer to handle all of them.
• You can replace the internal Java serialization by enabling the overrideJavaSerialization option
of the global serializer configuration.
Any custom serializer can be used as the global serializer. See the Custom Serialization section for
implementation details.
An example global serializer that integrates with a third party serializer is shown below.
public class GlobalStreamSerializer
implements StreamSerializer<Object> {
private void init() {
//someThirdPartySerializer = ...
}
@Override
public int getTypeId () {
return 123;
}
@Override
public void write( ObjectDataOutput out, Object object ) throws IOException {
byte[] bytes = someThirdPartySerializer.encode(object);
out.writeByteArray(bytes);
}
@Override
public Object read( ObjectDataInput in ) throws IOException {
byte[] bytes = in.readByteArray();
return someThirdPartySerializer.decode(bytes);
}
@Override
public void destroy () {
someThirdPartySerializer.destroy();
}
}
Now, we can register the global serializer in the configuration file hazelcast.xml, as shown below.
<hazelcast>
...
<serialization>
<serializers>
<global-serializer override-java-serialization="true"
>GlobalStreamSerializer</global-serializer>
</serializers>
</serialization>
...
</hazelcast>
19.8. Implementing HazelcastInstanceAware
You can implement the HazelcastInstanceAware interface to access distributed objects for cases
where an object is deserialized and needs access to HazelcastInstance.
Let’s implement it for the Employee class mentioned in the Custom Serialization section.
PersonAwr(String name) {
this.name = name;
}
@Override
public void setHazelcastInstance(HazelcastInstance hz) {
this.hazelcastInstance = hz;
System.out.println("hazelcastInstance set");
}
@Override
public String toString() {
return String.format("Person(name=%s)", name);
}
}
After deserialization, the object is checked to see if it implements HazelcastInstanceAware and the
method setHazelcastInstance is called. Notice the hazelcastInstance is transient. This is because
this field should not be serialized.
It may be a good practice to inject a HazelcastInstance into a domain object, e.g., Employee in the
above example, when used together with Runnable/Callable implementations. These
runnables/callables are executed by IExecutorService which sends them to another machine. And
after a task is deserialized, run/call method implementations need to access HazelcastInstance.
We recommend that you only set the HazelcastInstance field in the setHazelcastInstance method
and do not execute operations on the HazelcastInstance there. The reason is that when
HazelcastInstance is injected for a HazelcastInstanceAware implementation, it may not be up and
running at the injection time.
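The pattern of a transient reference that is re-injected after deserialization can be tried out with plain Java serialization. In this sketch, InstanceAware is a hypothetical stand-in for HazelcastInstanceAware, a plain Object takes the place of HazelcastInstance, and the deserialize helper plays the role of the framework that performs the injection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientInjectionSketch {
    /** Hypothetical stand-in for HazelcastInstanceAware. */
    interface InstanceAware {
        void setInstance(Object instance);
    }

    static final class Person implements Serializable, InstanceAware {
        private static final long serialVersionUID = 1L;
        private final String name;
        private transient Object instance; // never written to the stream

        Person(String name) { this.name = name; }

        @Override
        public void setInstance(Object instance) {
            // Only store the reference here; do not call operations on it yet.
            this.instance = instance;
        }

        String getName() { return name; }
        Object getInstance() { return instance; }
    }

    static byte[] serialize(Person person) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(person);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Deserializes, then injects the local instance if the object asks for it. */
    static Person deserialize(byte[] bytes, Object localInstance) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Object value = in.readObject();
            if (value instanceof InstanceAware) {
                ((InstanceAware) value).setInstance(localInstance);
            }
            return (Person) value;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```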
19.9. Untrusted Deserialization Protection
Hazelcast offers a Java deserialization protection based on whitelisting and blacklisting the
class/package names. These listings support prefixes.
<hazelcast>
...
<serialization>
<java-serialization-filter defaults-disabled="true">
<whitelist>
<class>example.Foo</class>
<package>com.acme.app</package>
<prefix>com.hazelcast.</prefix>
<prefix>java.</prefix>
<prefix>javax.</prefix>
<prefix>[</prefix>
</whitelist>
<blacklist>
<class>com.acme.app.BeanComparator</class>
</blacklist>
</java-serialization-filter>
</serialization>
...
</hazelcast>
The protection uses a whitelist as the default configuration. When this list is not explicitly provided,
the following default prefixes are used for the whitelist:
• java
• com.hazelcast.
• [ (for primitives and arrays)
If you do not want to use the default whitelist prefixes, you must set the defaults-disabled attribute
to true.
Once the protection is enabled, the following filtering rules are used when the objects are
deserialized:
Note that the safest way to provide a protection against untrusted deserialization
is using whitelisting (also keep in mind that maintaining such a whitelist can be
difficult).
Declarative Configuration:
<hazelcast>
...
<serialization>
<portable-version>2</portable-version>
<use-native-byte-order>true</use-native-byte-order>
<byte-order>BIG_ENDIAN</byte-order>
<enable-compression>true</enable-compression>
<enable-shared-object>false</enable-shared-object>
<allow-unsafe>true</allow-unsafe>
<data-serializable-factories>
<data-serializable-factory factory-id="1001">
abc.xyz.Class
</data-serializable-factory>
</data-serializable-factories>
<portable-factories>
<portable-factory factory-id="9001">
xyz.abc.Class
</portable-factory>
</portable-factories>
<serializers>
<global-serializer>abc.Class</global-serializer>
<serializer type-class="Employee" class-name="com.EmployeeSerializer">
</serializer>
</serializers>
<check-class-def-errors>true</check-class-def-errors>
</serialization>
...
</hazelcast>
Programmatic Configuration:
changing a type of a field.
• use-native-byte-order: Set to true to use native byte order for the underlying platform. Its
default value is false.
• byte-order: Defines the byte order that the serialization uses: BIG_ENDIAN or LITTLE_ENDIAN. Its
default value is BIG_ENDIAN.
• enable-compression: Enables compression if default Java serialization is used. Its default value is
false.
• enable-shared-object: Enables shared object if default Java serialization is used. Its default value
is false.
• allow-unsafe: Set to true to allow unsafe to be used. Its default value is false.
• check-class-def-errors: When set to true, the serialization system checks for class definition
errors at start and throws a Serialization Exception with an error definition.
20. Management
This chapter provides information on managing and monitoring your Hazelcast cluster. It gives
detailed instructions related to gathering statistics, monitoring via JMX protocol and managing the
cluster with useful utilities.
To get local map statistics, use the getLocalMapStats() method from the IMap interface. This method
returns a LocalMapStats object that holds local map statistics.
Below is an example code where the getLocalMapStats() method and the getOwnedEntryCount()
method get the number of entries owned by this member.
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
IMap<String, String> customers = hazelcastInstance.getMap( "customers" );
LocalMapStats mapStatistics = customers.getLocalMapStats();
System.out.println( "number of entries owned on this member = "
+ mapStatistics.getOwnedEntryCount() );
The following are some of the metrics that you can access via the LocalMapStats object:
• Number of queries executed on the map (getQueryCount() and getIndexedQueryCount()). These
counts may be imprecise for queries involving partition predicates (PartitionPredicate) on
off-heap storage.
To access map index statistics, if you are using indexes to speed up map queries, use the
getIndexStats() method of the LocalMapStats interface returned by IMap.getLocalMapStats().
Below is an example where the getIndexStats() method is used to examine an average selectivity of
index hits:
The following are some of the metrics that you can obtain via the LocalIndexStats interface:
• Number of queries and hits into an index (getQueryCount() and getHitCount()): Number of hits
and queries may differ since a single query may hit the same index more than once.
• Average index hit selectivity (getAverageHitSelectivity()): Returned values are in the range from
0.0 to 1.0. Values close to 1.0 indicate a high selectivity, meaning the index is efficient; values
close to 0.0 indicate a low selectivity, meaning the index efficiency approaches that of a simple
full scan.
• Number of index insert, update and remove operations (getInsertCount(), getUpdateCount() and
getRemoveCount()).
• Memory cost of an index (getMemoryCost()): For on-heap storages, this memory cost metric value
is a best-effort approximation and doesn’t indicate a precise on-heap memory usage of an
index.
To compute an aggregated value of getAverageHitSelectivity() for all cluster members, you can use
a simple averaging computation as shown below:
(s(1) + s(2) + ... + s(n)) / n
In this computation, s(i) is an average hit selectivity on the member i and n is the total number of
cluster members.
A more advanced solution is to weight the selectivity of each member by its hit count:
(s(1) * h(1) + s(2) * h(2) + ... + s(n) * h(n)) / (h(1) + h(2) + ... + h(n))
Here, s(i) is an average hit selectivity on the member i, h(i) is a hit count (getHitCount()) on the
member i and n is the total number of cluster members. This more advanced solution may produce
more precise results in unstable dynamic clusters where new members do not have enough
statistics accumulated. The same technique may be applied to the getAverageHitLatency() metric.
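As a rough illustration, the following Java sketch computes both the simple average and the hit-count-weighted average from per-member values. The class SelectivityAggregator, the holder type MemberIndexStats and the sample numbers are hypothetical and are not part of the Hazelcast API; in a real cluster you would collect each member's getAverageHitSelectivity() and getHitCount() values yourself.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper only; not part of the Hazelcast API.
class SelectivityAggregator {

    // Per-member snapshot: s(i) is the average hit selectivity, h(i) is the hit count.
    static final class MemberIndexStats {
        final double averageHitSelectivity;
        final long hitCount;

        MemberIndexStats(double averageHitSelectivity, long hitCount) {
            this.averageHitSelectivity = averageHitSelectivity;
            this.hitCount = hitCount;
        }
    }

    // Simple average: (s(1) + ... + s(n)) / n
    static double simpleAverage(List<MemberIndexStats> members) {
        if (members.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (MemberIndexStats member : members) {
            sum += member.averageHitSelectivity;
        }
        return sum / members.size();
    }

    // Weighted average: (s(1) * h(1) + ... + s(n) * h(n)) / (h(1) + ... + h(n))
    static double weightedAverage(List<MemberIndexStats> members) {
        double weightedSum = 0.0;
        long totalHits = 0;
        for (MemberIndexStats member : members) {
            weightedSum += member.averageHitSelectivity * member.hitCount;
            totalHits += member.hitCount;
        }
        // Without any hits there is nothing to weight by.
        return totalHits == 0 ? 0.0 : weightedSum / totalHits;
    }

    public static void main(String[] args) {
        // A long-running member with many hits and a freshly joined member with few hits.
        List<MemberIndexStats> members = Arrays.asList(
                new MemberIndexStats(0.9, 1000),
                new MemberIndexStats(0.5, 10));
        System.out.println("simple average   = " + simpleAverage(members));
        System.out.println("weighted average = " + weightedAverage(members));
    }
}
```

With these sample values, the weighted result stays close to the selectivity of the member that has served most of the hits, whereas the simple average gives the new member the same influence.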
• The values returned by getQueryCount() and getHitCount() may be imprecise for queries
involving partition predicates (PartitionPredicate) on off-heap storage.
• The index statistics may be imprecise after a new cluster member is added or an existing
member is removed, until enough fresh statistics are accumulated on the new owner of an index
or its partition.
20.1.3. Near Cache Statistics
To get Near Cache statistics, use the getNearCacheStats() method from the LocalMapStats object. This
method returns a NearCacheStats object that holds Near Cache statistics.
Below is an example code where the getNearCacheStats() method and the getRatio method from
NearCacheStats get a Near Cache hit/miss ratio.
The following are some of the metrics that you can access via the NearCacheStats object (applies to
both client and member Near Caches):
• memory cost (number of bytes) of owned entries in the Near Cache (getOwnedEntryMemoryCost())
To get MultiMap statistics, use the getLocalMultiMapStats() method from the MultiMap interface. This
method returns a LocalMultiMapStats object that holds local MultiMap statistics.
Below is an example code where the getLocalMultiMapStats() method and the getLastUpdateTime
method from LocalMultiMapStats get the last update time.
The following are some of the metrics that you can access via the LocalMultiMapStats object:
• creation time of the multimap on the member (getCreationTime())
To get local queue statistics, use the getLocalQueueStats() method from the IQueue interface. This
method returns a LocalQueueStats object that holds local queue statistics.
Below is an example code where the getLocalQueueStats() method and the getAverageAge method
from LocalQueueStats get the average age of items.
The following are some of the metrics that you can access via the LocalQueueStats object:
• minimum and maximum ages of the items in the member (getMinAge() and getMaxAge())
To get local topic statistics, use the getLocalTopicStats() method from the ITopic interface. This
method returns a LocalTopicStats object that holds local topic statistics.
The following are the metrics that you can access via the LocalTopicStats object:
• creation time of the topic on the member (getCreationTime())
To get local executor statistics, use the getLocalExecutorStats() method from the IExecutorService
interface. This method returns a LocalExecutorStats object that holds local executor statistics.
The following are some of the metrics that you can access via the LocalExecutorStats object:
The metrics are local to the members, i.e., they do not reflect cluster wide values.
You can find the JMX API definition below with descriptions and the API methods in parenthesis.
• Name ( name )
• Decrement and Get ( decrementAndGet() )
• Name ( name )
• Name ( name )
• Countdown ( countDown() )
List ( IList )
• Name ( name )
Lock ( ILock )
• Name ( name )
Map ( IMap )
• Name ( name )
• Size ( size )
• Config ( config )
• Hits ( localHits )
• Clear ( clear() )
• Values ( values(p))
MultiMap ( MultiMap )
• Name ( name )
• Size ( size )
• Backup entry cost ( localBackupEntryMemoryCost )
• Hits ( localHits )
• Clear ( clear() )
• Name ( name )
• Size ( size )
• Config ( config )
• Hits ( localHits )
• Total remove latency ( localTotalRemoveLatency )
• Clear ( clear() )
• Values ( values())
Queue ( IQueue )
• Name ( name )
• Config ( QueueConfig )
• Clear ( clear() )
Semaphore ( ISemaphore )
• Name ( name )
• Drain ( drain())
Set ( ISet )
• Name ( name )
• Clear ( clear() )
Topic ( ITopic )
• Name ( name )
• Config ( config )
• Name ( name )
• Version ( version )
• Build ( build )
• Configuration ( config )
• Node ( HazelcastInstance.Node )
• Address ( address )
◦ Running operation count ( runningOperationsCount )
◦ Name ( name )
◦ Is shutdown ( isShutdown )
◦ Is terminated ( isTerminated )
◦ Name ( name )
◦ Is shutdown ( isShutdown )
◦ Is terminated ( isTerminated )
• Scheduled Executor ( HazelcastInstance.ManagedExecutorService )
◦ Name ( name )
◦ Is shutdown ( isShutdown )
◦ Is terminated ( isTerminated )
◦ Name ( name )
◦ Is shutdown ( isShutdown )
◦ Is terminated ( isTerminated )
◦ Name ( name )
◦ Is shutdown ( isShutdown )
◦ Is terminated ( isTerminated )
◦ Name ( name )
◦ Is shutdown ( isShutdown )
◦ Is terminated ( isTerminated )
◦ Completed task count ( completedTaskCount )
To achieve this, first add the following system properties to enable the JMX agent:
• -Dcom.sun.management.jmxremote
• -Dcom.sun.management.jmxremote.port=portNo (optional; specifies the JMX port, the default is 1099)
Then, enable JMX by setting the hazelcast.jmx property to true using one of the following ways:
• By declarative configuration:
<hazelcast>
...
<properties>
<property name="hazelcast.jmx">true</property>
</properties>
...
</hazelcast>
• By programmatic configuration:
config.setProperty("hazelcast.jmx", "true");
• By Spring XML configuration:
<hz:properties>
<hz:property name="hazelcast.jmx">true</hz:property>
</hz:properties>
The MBean name consists of the Hazelcast instance name, the type of the data structure and that
data structure’s name. In the above example, _hzInstance_1_dev is the instance name and we connect
to an IMap with the name trial.
20.3.2. Connecting to JMX Agent
One of the ways you can connect to JMX agent is using jconsole, jvisualvm (with MBean plugin) or
another JMX compliant monitoring tool.
First, you need to specify the URL where the Hazelcast JMX service is running. See the following
code snippet:
The port in the above example should be the one that you define while setting the JMX remote port
number (if different than the default port 1099).
Then use the URL you acquired to connect to the JMX service and get the JMXConnector object. Using
this object, get the MBeanServerConnection object. The MBeanServerConnection object enables you to
use the MBean methods. See the example code below.
Once you get the MBeanServerConnection object, you can call the getter methods of MBeans as
follows:
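The steps above can be sketched in Java with the standard javax.management API as follows. This is a minimal sketch, not a definitive implementation: the class JmxMapReader and its method names are hypothetical, and the MBean name layout (the com.hazelcast domain with instance, type and name keys) is an assumption that you should verify in jconsole for your Hazelcast version.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Illustrative helper only; the class and method names are made up for this sketch.
class JmxMapReader {

    // Builds the standard RMI-based JMX service URL for the given host and JMX remote port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // ASSUMPTION: the IMap MBean is registered in the "com.hazelcast" domain
    // with the keys instance, type and name.
    static ObjectName mapMBeanName(String instanceName, String mapName) {
        try {
            return new ObjectName(
                    "com.hazelcast:instance=" + instanceName + ",type=IMap,name=" + mapName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Connects to the JMX agent, reads one MBean attribute and closes the connection.
    static Object readAttribute(String host, int port, ObjectName mbean, String attribute)
            throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getAttribute(mbean, attribute);
        }
    }
}
```

For example, readAttribute("localhost", 1099, mapMBeanName("_hzInstance_1_dev", "trial"), "size") would read the size attribute of the map named trial, provided that a member with JMX enabled is reachable on that port.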
• CLUSTER_READ
• CLUSTER_WRITE
• HEALTH_CHECK
• HOT_RESTART
• WAN
• DATA
• CP
And the following table lists all the endpoints along with the groups they belong to.
• /hazelcast/rest/management/cluster/version (GET)
• /hazelcast/rest/management/cluster/nodes
• /hazelcast/rest/instance
• /hazelcast/rest/management/cluster/version (POST)
• /hazelcast/rest/management/cluster/clusterShutdown
• /hazelcast/rest/management/cluster/memberShutdown
• /hazelcast/rest/cp-subsystem/members/local
• /hazelcast/rest/cp-subsystem/groups
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}
• /hazelcast/rest/cp-subsystem/members
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}/remove
• /hazelcast/rest/cp-subsystem/members/${CPMEMBER_UUID}/remove
• /hazelcast/rest/cp-subsystem/restart
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}/sessions
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}/sessions/${CP_SESSION_ID}/remove
• /hazelcast/ (Other HTTP REST API operations)
Endpoint Group Default Endpoints
HOT_RESTART Disabled • /hazelcast/rest/management/cluster/forceStart
• /hazelcast/rest/management/cluster/partialStart
• /hazelcast/rest/management/cluster/hotBackup
• /hazelcast/rest/management/cluster/hotBackupInterrupt
CP Disabled • /hazelcast/rest/cp-subsystem/members/local
• /hazelcast/rest/cp-subsystem/groups
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}
• /hazelcast/rest/cp-subsystem/members
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}/remove
• /hazelcast/rest/cp-subsystem/members/${CPMEMBER_UUID}/remove
• /hazelcast/rest/cp-subsystem/reset
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}/sessions
• /hazelcast/rest/cp-subsystem/groups/${CPGROUP_NAME}/sessions/${CP_SESSION_ID}/remove
You can enable or disable any REST endpoint group using the following declarative configuration
(HEALTH_CHECK group is used as an example):
<hazelcast>
...
<network>
<rest-api enabled="true">
<endpoint-group name="HEALTH_CHECK" enabled="false"/>
</rest-api>
</network>
...
</hazelcast>
Alternatively, you can also use the advanced-network element for the same purpose:
<hazelcast>
...
<advanced-network enabled="true">
<rest-server-socket-endpoint-config>
<endpoint-groups>
<endpoint-group name="HEALTH_CHECK" enabled="false"/>
</endpoint-groups>
</rest-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
See the Advanced Network Configuration section for more information on the
advanced-network element.
When you enable or disable a REST endpoint group, all the endpoints in that group are enabled or
disabled, respectively. For the examples above, we disabled the endpoints belonging to the
HEALTH_CHECK endpoint group.
Hazelcast allows you to register for membership events so that you are notified when members are
added or removed. You can also get the set of cluster members.
The following example code does the above: registers for member events, notifies when members
are added or removed and gets the set of cluster members.
See the Membership Listener section for more information on membership events.
20.5.2. Managing Cluster and Member States
This section explains the states of Hazelcast clusters and members which you can use to allow or
restrict the designated cluster/member operations.
Cluster States
By changing the state of your cluster, you can allow/restrict several cluster operations or change the
behavior of those operations. You can use the methods changeClusterState() and shutdown() which
are in the Cluster interface to change your cluster’s state.
• ACTIVE: This is the default cluster state. Cluster continues to operate without restrictions.
• NO_MIGRATION:
◦ In this state, there is no data movement between Hazelcast members. It means that when
there is a member crash or a new member in the cluster, there won’t be any partition
rebalancing, partition backup replica creation or migration.
Please note that promoting a backup replica to the primary replica is a local operation and
does not involve any data movement between cluster members. Hence, backup promotions
occur on member crashes when the cluster is in this mode. Upon a member crash, all other
members that keep backup replicas of the crashed member promote backup replicas to the
primary replica role and restore availability. However, there is a limitation here. Since the
maximum number of backups is 6, if you lose 7 members in your large cluster, you can lose
availability of the partitions whose primary and backup replicas are mapped to those
crashed members.
◦ You cannot change the state of a cluster to NO_MIGRATION when migration/replication tasks
are being performed.
◦ When you want to add multiple new members to the cluster, you can first change the cluster
state to NO_MIGRATION, then start the new members. Once all of them have joined the cluster, you
can change the cluster state back to ACTIVE. The cluster then rebalances the partition replica
distribution at once.
• FROZEN:
◦ In this state, the partition table is frozen and partition assignments are not performed.
◦ If a member leaves, it can join back. Its partition assignments (both primary and backup
replicas) remain the same until either it joins back or the cluster state is changed to ACTIVE.
When it joins back to the cluster, it owns all of its previous partition assignments as before.
On the other hand, when the cluster state changes to ACTIVE, re-partitioning starts and unassigned
partition replicas are assigned to the active members.
◦ All other operations in the cluster, except migration, continue without restrictions.
◦ You cannot change the state of a cluster to FROZEN when migration/replication tasks are being
performed.
◦ You can make use of the FROZEN state along with the Hot Restart Persistence feature. You can
change the cluster state to FROZEN, then restart some of your members using the Hot Restart
feature. The data on the restarting members will not be accessible, but you will be able to
access the data that is stored in the other members. Basically, the FROZEN cluster state allows
you to perform maintenance on your members while degrading availability only partially.
• PASSIVE:
◦ In this state, the partition table is frozen and partition assignments are not performed.
◦ If a member leaves while the cluster is in this state, the member will be removed from the
partition table if cluster state moves back to ACTIVE.
◦ This state rejects ALL operations immediately EXCEPT the read-only operations like
map.get() and cache.get(), replication and cluster heartbeat tasks.
◦ You cannot change the state of a cluster to PASSIVE when migration/replication tasks are
being performed.
◦ You can make use of PASSIVE state along with the Hot Restart Persistence feature. See the
Cluster Shutdown API for more info.
• IN_TRANSITION:
◦ During this state, your cluster does not accept new members and migration/replication tasks
are paused.
All in-cluster methods are fail-fast, i.e., when a method fails in the cluster, it throws
an exception immediately (it is not retried).
The following snippet is from the Cluster interface showing the methods used to manage your
cluster’s states.
Cluster Member States
• ACTIVE: This is the initial member state. The member can execute and process all operations.
When the state of the cluster is ACTIVE or FROZEN, the members are in the ACTIVE state.
• PASSIVE: In this state, member rejects all operations EXCEPT the read-only ones, replication and
migration operations, heartbeat operations and the join operations as explained in the Cluster
States section above. A member can go into this state when either of the following happens:
1. Until the member’s shutdown process is completed after the method Node.shutdown(boolean)
is called. Note that, when the shutdown process is completed, member’s state changes to
SHUT_DOWN.
• SHUT_DOWN: A member goes into this state when the member’s shutdown process is completed.
The member in this state rejects all operations and invocations. A member in this state cannot
be restarted.
You can use the script cluster.sh, which comes with the Hazelcast package, to get or change the
state of your cluster, to shut down your cluster and to force your cluster to clean its persisted
data and make a fresh start. The latter is the Force Start operation of Hazelcast’s Hot Restart
Persistence feature. See the Force Start section.
The script cluster.sh uses curl command and curl must be installed to be able to
use the script.
The script cluster.sh takes the following parameters to operate according to your needs. If these
parameters are not provided, the default values are used.
Parameter        Default Value  Description
-s or --state    None           Updates the state of the cluster to a new state. New state can be active, no_migration, frozen, passive. This is used with the operation change-state. This parameter has no default value; when you use this, you should provide a valid state.
-a or --address  127.0.0.1      Defines the IP address of a cluster member. If you want to manage your cluster remotely, you should use this parameter to provide the IP address of a member to this script.
The script cluster.sh is self-documented; you can see the parameter descriptions using the
command ./cluster.sh -h or ./cluster.sh --help.
You can perform the above operations using the Hot Restart tab of Hazelcast
Management Center or using the REST API. See the Hot Restart and Using REST API
for Cluster Management sections in the Hazelcast Management Center Reference
Manual.
Example Usages for cluster.sh
Let’s say you have a cluster running on remote machines and one Hazelcast member is running on
the IP 172.16.254.1 and on the port 5702. The cluster name and password of the cluster are test and
test.
The following also gets the cluster state, using the alternative parameter names, e.g., --port instead
of -p:
To change the state of the cluster to frozen, use the following command:
Similarly, you can use the following command for the same purpose:
Similarly, you can use the following command for the same purpose:
To partial start the cluster when Hot Restart is enabled, use the following command:
Similarly, you can use the following command for the same purpose:
To force start the cluster when Hot Restart is enabled, use the following command:
Similarly, you can use the following command for the same purpose:
The following also gets the cluster state, using the alternative parameter names, e.g., --port instead
of -p:
See the Rolling Member Upgrades chapter to learn more about the cases when you should change
the cluster version.
The cluster version is always in the major.minor format, e.g., 3.12. Using other formats results in a
failure.
When the member has TLS configured, use the --https argument to instruct cluster.sh to use the
proper URL scheme:
./cluster.sh --https \
--operation get-state --address member1.example.com --port 5701
If the default set of trusted certificate authorities is not sufficient, e.g., you use a self-signed
certificate, you can provide a custom file with the root certificates:
./cluster.sh --https \
--cacert /path/to/ca-certs.pem \
--operation get-state --address member1.example.com --port 5701
When the TLS mutual authentication is enabled, you have to provide the client certificate and
related private key:
./cluster.sh --https \
--key privkey.pem \
--cert cert.pem \
--operation get-state --address member1.example.com --port 5701
Besides the Management Center’s Hot Restart tab and the script cluster.sh, you can also use REST
API to manage your cluster’s state. The following are the operations you can perform.
Some of the REST calls listed below need their REST endpoint groups to be enabled.
See the Using the REST Endpoint Groups section on how to enable them.
Also note that the value of ${PASSWORD} in the following calls is checked only if the security is
enabled in Hazelcast IMDG, i.e., if you have Hazelcast IMDG Enterprise Edition. If the security is
disabled, the ${PASSWORD} can be left empty.
IMDG Open Source commands
When a member joins the cluster, you can check whether it is ready to be used with the
following HTTP call. It should return the 200 status code, meaning that the member can be
safely used. Otherwise, it returns the 503 status code indicating the member is not available
yet. Only HTTP GET request method is supported.
curl http://127.0.0.1:${PORT}/hazelcast/health/ready
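The same readiness check can be made from Java using only the JDK, for example from a deployment tool. The following is a minimal sketch: the class MemberReadinessCheck, its method names and the two-second timeouts are hypothetical choices, and it assumes the health check endpoint is enabled and served over plain HTTP on the member port.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative helper only; names and timeouts are made up for this sketch.
class MemberReadinessCheck {

    // Builds the URL of the readiness endpoint described above.
    static String readyUrl(String host, int port) {
        return "http://" + host + ":" + port + "/hazelcast/health/ready";
    }

    // 200 means the member can be safely used; any other code (e.g., 503) means it is not ready.
    static boolean isReady(int httpStatus) {
        return httpStatus == HttpURLConnection.HTTP_OK;
    }

    // Sends an HTTP GET request to the member and interprets the status code.
    static boolean checkReady(String host, int port) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(readyUrl(host, port)).openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setConnectTimeout(2000);
            connection.setReadTimeout(2000);
            return isReady(connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }
}
```

A caller could poll checkReady("127.0.0.1", 5701) until it returns true before routing traffic to a newly started member; a refused connection surfaces as an IOException, which the caller can treat as "not ready yet".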
To change the state of the cluster to frozen, use the following command:
To get the current cluster version, use the following curl command:
$ curl http://127.0.0.1:${PORT}/hazelcast/rest/management/cluster/version
{"status":"success","version":"3.9"}
IMDG Enterprise commands
To partial start the cluster when Hot Restart is enabled, use the following command:
To force start the cluster when Hot Restart is enabled, use the following command:
You can also perform the above operations (partialStart and forceStart) using
the Hot Restart tab of Hazelcast Management Center or using the script
cluster.sh. See the Hot Restart and cluster.sh sections.
To upgrade the cluster version, after having upgraded all members of your cluster to a new
minor version, use the following curl command:
For example, assuming the default cluster name and password, issue the following command
to any member of the cluster to upgrade from cluster version 3.8 to 3.9:
You can also perform the above cluster version operations using Hazelcast
Management Center or using the script cluster.sh. See the Rolling Member
Upgrades and cluster.sh sections.
20.5.5. Enabling Lite Members
Lite members are the Hazelcast cluster members that do not store data. These members are used
mainly to execute tasks and register listeners and they do not have partitions.
You can form your cluster to include the regular Hazelcast members to store data and Hazelcast lite
members to run heavy computations. The presence of the lite members does not affect the operations
performed on the other members in the cluster. You can directly submit your tasks to the lite
members, register listeners on them and invoke operations for the Hazelcast data structures on
them such as map.put() and map.get().
You can enable a cluster member to be a lite member using declarative or programmatic
configuration.
Declarative Configuration:
<hazelcast>
...
<lite-member enabled="true"/>
...
</hazelcast>
Programmatic Configuration:
Lite members can be promoted to data members using the Cluster interface. When they are
promoted, cluster partitions are rebalanced and ownerships of some portion of the partitions are
assigned to the newly promoted data members.
20.5.6. Defining Member Attributes
You can define various member attributes on your Hazelcast members. You can use these member
attributes to tag your members as may be required by your business logic.
• or provide the member attributes at runtime via attribute setter methods on the Member
interface.
For example, you can tag your members with their CPU characteristics and you can route CPU
intensive tasks to those CPU-rich members. Here is how you can do it:
public class ExampleMemberAttributes {
For another example, you can tag some members with a filter so that a member in the cluster can
load classes from those tagged members. See the User Code Deployment section for more
information.
You can also define your member attributes through declarative configuration and start your
member afterwards. Here is how you can use the declarative approach:
<hazelcast>
...
<member-attributes>
<attribute name="CPU_CORE_COUNT">4</attribute>
</member-attributes>
...
</hazelcast>
To prevent data loss when shutting down a cluster member, Hazelcast provides a graceful
shutdown feature. You perform this shutdown by calling the method HazelcastInstance.shutdown().
The oldest cluster member migrates all of the replicas owned by the shutdown-requesting member
to the other running cluster members (those that have not initiated shutdown). After these
migrations are completed, the shutting down member is no longer the owner or a backup of any
partition. This means that you can shut down any number of Hazelcast members in a cluster
concurrently with no data loss.
Please note that the process of shutting down members waits for a predefined amount of time for
the oldest member to migrate their partition replicas. You can specify this graceful shutdown
timeout duration using the property hazelcast.graceful.shutdown.max.wait. Its default value is 10
minutes. If migrations are not completed within this duration, shutdown may continue non-
gracefully and lead to data loss. Therefore, you should choose your own timeout duration
considering the size of data in your cluster.
With the improvements in graceful shutdown procedure in Hazelcast 3.7, the following methods
are not needed to perform graceful shutdown. Nevertheless, you can use them to check the current
safety status of the partitions in your cluster.
The method isClusterSafe checks whether the cluster is in a safe state. It returns true if there are
no active partition migrations and all backups are in sync for each partition.
The method isMemberSafe checks whether a specific member is in a safe state. It checks if all
backups of partitions of the given member are in sync with the primary ones. Once it returns true,
the given member is safe and it can be shut down without data loss.
Similarly, the method isLocalMemberSafe does the same check for the local member. The method
forceLocalMemberToBeSafe forces the owned and backup partitions to be synchronized, making the
local member safe.
20.6. Metrics
Metrics are <string,value> key-value pairs of data that capture the runtime information about the
members and clients in a Hazelcast cluster. Such a metric can be the number of entries stored in a
particular IMap on a given member, JVM metrics like used heap, OS metrics like load average, and
so on. The metrics system is responsible for collecting these metrics and making them available for
the consumers of the metrics. There are a few hundred metrics collected during every metrics
collection cycle by default, but the number of metrics grows as more features and data structures
are used. This is because every data structure provides its own metrics. For example, if there are
two IMaps used in a cluster, both IMaps produce their metrics on every member.
You can configure the metrics system declaratively or programmatically. The following is an
example declarative configuration with the default values, on the member side:
<metrics enabled="true">
<management-center enabled="true">
<retention-seconds>5</retention-seconds>
</management-center>
<jmx enabled="true"/>
<collection-frequency-seconds>5</collection-frequency-seconds>
</metrics>
Note that all of the metrics configuration values can be overridden with system properties. The
properties are listed below:
• hazelcast.metrics.mc.retention: Duration, in seconds, for which the metrics are retained for
Management Center.
• hazelcast.metrics.jmx.enabled: Enables exposing the collected metrics over JMX if set to true,
disables it otherwise.
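Besides passing these properties as -D arguments to the JVM, you can set them in code before the member is started. The following is a minimal sketch that uses only the JDK; the class MetricsSystemProperties and its method are hypothetical, and the values in the usage note are arbitrary examples.

```java
// Illustrative helper only; not part of the Hazelcast API.
class MetricsSystemProperties {

    // Sets the two override properties listed above as JVM system properties.
    // Call this before the Hazelcast member is created so that the overrides are picked up.
    static void apply(int retentionSeconds, boolean jmxEnabled) {
        System.setProperty("hazelcast.metrics.mc.retention", Integer.toString(retentionSeconds));
        System.setProperty("hazelcast.metrics.jmx.enabled", Boolean.toString(jmxEnabled));
    }
}
```

For example, calling MetricsSystemProperties.apply(10, false) before starting the member would request a 10-second retention for Management Center and disable exposing the metrics over JMX.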
The client configuration is very similar; it just lacks the Management Center configuration block
(management-center configuration element), as shown below. This is because the clients are not
connected to Management Center and the client metrics are sent to Management Center through a
member to which the client is connected.
<metrics enabled="true">
<jmx enabled="true"/>
<collection-frequency-seconds>5</collection-frequency-seconds>
</metrics>
Similarly to the member configuration, the client metrics configuration can be overridden with the
following system properties:
Metrics are part of and consumed by the following Hazelcast tools and interfaces:
• Management Center
• JMX
• Diagnostics
Management Center
Management Center receives the metrics used for building its view about the Hazelcast cluster
from the metrics system. The members collect their metrics with the frequency defined with
collection-frequency-seconds, which is by default once in every 5 seconds. Each member then saves
the collected metrics into a blob stored in an in-memory buffer. The blob is then retained for the
time configured in retention-seconds under the management-center configuration block. This is also
5 seconds by default, which means there is at most one blob stored by default. Management Center
periodically reads out the metrics from this buffer, which frees up the heap occupied by the blob
once it is consumed.
As mentioned earlier, the client metrics are also stored in these blobs on the member side with
timestamps assigned to them on the client side.
JMX
The metrics are available on the JMX interface of the Hazelcast members and clients. The metrics
are exposed under com.hazelcast/$INSTANCE_NAME/Metrics where $INSTANCE_NAME is the name of the
member or client instance to which the JMX client is connected.
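As an illustration, the exposed MBeans can be listed from inside the same JVM with the standard JMX API. This is a minimal sketch using only JDK classes; the class name is hypothetical, and the query pattern simply selects every MBean whose domain starts with com.hazelcast rather than relying on the exact object name layout.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class HazelcastMBeanLister {

    // Returns the names of all MBeans in the local JVM whose domain starts with
    // "com.hazelcast". The set is empty when no Hazelcast instance runs in this JVM
    // or when JMX exposure of the metrics is disabled.
    static Set<ObjectName> listHazelcastMBeans() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            return server.queryNames(new ObjectName("com.hazelcast*:*"), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException("Invalid object name pattern", e);
        }
    }

    public static void main(String[] args) {
        for (ObjectName name : listHazelcastMBeans()) {
            System.out.println(name);
        }
    }
}
```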
Diagnostics
There are no diagnostics related settings in the metrics configuration section. See the Metrics
section of the Diagnostics for the details.
Version Compatibility
Note that the metric names may change between MINOR versions but not between PATCH versions.
The metrics system is designed to have the least possible impact on the performance of
the cluster. Since the metrics collection takes place periodically, every few seconds, the
main focus is keeping allocation rates and memory footprint at a minimum. Therefore, the blobs that
store the metrics for Management Center are kept in memory in a compressed format.
Measurements that use multiple IMaps to scale up the number of metrics show that one blob
occupies only a few KBs and grows above 10KB only if there are more than 1000 IMaps.
The allocation rate of a metric collection cycle is also low. With both Management Center and JMX
consumers enabled, the allocation rate with 100 IMaps is below 256KB per cycle, and it grows
above 1MB with 1000 IMaps. This means that metrics collection does not increase the frequency of
the garbage collection (GC) noticeably.
While the metrics collection is considered GC friendly, note that the blobs are not
recycled. Configure the retention time with the frequency of the GC in mind; otherwise, the blobs
may get promoted into the tenured region of the heap, which eventually contributes to major GCs.
20.7. Diagnostics
Hazelcast offers an extended set of diagnostics plugins for both Hazelcast members and clients. A
dedicated log file is used to write the diagnostics content, and a rolling file approach is used to
prevent taking up too much disk space.
To enable diagnostics logging, you should specify the following properties on the member side:
-Dhazelcast.diagnostics.enabled=true
-Dhazelcast.diagnostics.metric.level=info
-Dhazelcast.diagnostics.invocation.sample.period.seconds=30
-Dhazelcast.diagnostics.pending.invocations.period.seconds=30
-Dhazelcast.diagnostics.slowoperations.period.seconds=30
-Dhazelcast.diagnostics.storeLatency.period.seconds=60
On the client side, you should specify the following properties:
-Dhazelcast.diagnostics.enabled=true
-Dhazelcast.diagnostics.metric.level=info
You can use the following property to specify the location of the diagnostics log file:
-Dhazelcast.diagnostics.directory=/your/log/directory
The name of the log file can be prefixed with a custom string as shown below:
-Dhazelcast.diagnostics.filename.prefix=foobar
<Date> BuildInfo[
<log content for BuildInfo diagnostics plugin>]
<Date> SystemProperties[
<log content for SystemProperties diagnostics plugin>]
<Date> ConfigProperties[
<log content for ConfigProperties diagnostics plugin>]
<Date> Metrics[
<log content for Metrics diagnostics plugin>]
<Date> SlowOperations[
<log content for SlowOperations diagnostics plugin>]
<Date> HazelcastInstance[
<log content for HazelcastInstance diagnostics plugin>]
...
...
...
A rolling file approach is used to prevent creating too much data. By default 10 files of 50MB each
are allowed to exist. The size of the file can be changed using the following property:
-Dhazelcast.diagnostics.max.rolled.file.size.mb=100
You can also set the number of files using the following property:
-Dhazelcast.diagnostics.max.rolled.file.count=5
In Hazelcast 3.9 the default file size has been upgraded from 10MB to 50MB.
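The same properties can also be set programmatically before the member or client is created. The following is a minimal sketch using only JDK classes; the class and method names are hypothetical, the property names are the ones listed in this section, and the directory and prefix values are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DiagnosticsProperties {

    // Collects the diagnostics-related system properties named in this section.
    static Map<String, String> build(String directory, String prefix,
                                     int maxFileSizeMb, int maxFileCount) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hazelcast.diagnostics.enabled", "true");
        props.put("hazelcast.diagnostics.directory", directory);
        props.put("hazelcast.diagnostics.filename.prefix", prefix);
        props.put("hazelcast.diagnostics.max.rolled.file.size.mb", Integer.toString(maxFileSizeMb));
        props.put("hazelcast.diagnostics.max.rolled.file.count", Integer.toString(maxFileCount));
        return props;
    }

    // Upper bound on the disk space taken by the rolled files, in megabytes.
    static long maxDiskUsageMb(int maxFileSizeMb, int maxFileCount) {
        return (long) maxFileSizeMb * maxFileCount;
    }

    public static void main(String[] args) {
        Map<String, String> props = build("/your/log/directory", "foobar", 100, 5);
        // Must run before the Hazelcast member or client is created in this JVM.
        props.forEach(System::setProperty);
        System.out.println("Max disk usage (MB): " + maxDiskUsageMb(100, 5));
    }
}
```

With the default settings (10 files of 50MB each), the upper bound is 500MB per member.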
As stated in the introduction of this section and shown in the log file content above, the diagnostics
utility consists of plugins such as BuildInfo, SystemProperties and HazelcastInstance.
BuildInfo
It shows the detailed Hazelcast build information including the Hazelcast release number, Git
revision number and whether you have Hazelcast IMDG Enterprise or not.
SystemProperties
It shows all the properties and their values in your system used by and configured for your
Hazelcast installation. These are the properties starting with java (excluding java.awt), hazelcast,
sun and os. It also includes the arguments that are used to start up the JVM.
ConfigProperties
It shows the Hazelcast properties and their values explicitly set by you either on the command line
(with -D) or by using declarative/programmatic configuration.
Metrics
You can configure the frequency of dumping information to the log file using the following
property:
SlowOperations
It shows the slow operations and invocations. See the SlowOperationDetector section for more
information.
Invocations
It shows all kinds of statistics about current and past invocations including current pending
invocations, history of invocations and slow history, i.e., all samples where the invocation took
more than the defined threshold. Slow history does not only include the invocations where the
operations took a lot of time, but it also includes any other invocations that have been obstructed.
Using the following properties, you can configure the frequency of scanning all pending
invocations and the threshold above which an invocation is considered slow:
HazelcastInstance
It shows the basic state of your Hazelcast cluster including the count and addresses of current
members and the address of oldest cluster member. It is useful to get a fast impression of the
cluster without needing to analyze a lot of data.
You can configure the frequency at which the cluster information is dumped to the log file using the
following property:
SystemLog
It shows the activities in your cluster including when a connection/member is added or removed
and if there is a change in the lifecycle of the cluster. It also includes the reasons for connection
closings.
You can enable or disable the system log diagnostics plugin, and configure whether it shows
information about partition migrations using the following properties:
StoreLatency
It shows statistics including the call count for each store method (load, loadAll, loadAllKeys, etc.),
the average and maximum latencies of each store method call and latency distributions for each
store. The following is an example output snippet as part of the diagnostics log file for Hazelcast
MapStore:
17-9-2016 13:12:34 MapStoreLatency[
map[
loadAllKeys[
count=1
totalTime(us)=8
avg(us)=8
max(us)=8
latency-distribution[
0..99us=1]]
load[
count=100
totalTime(us)=4,632,190
avg(us)=46,321
max(us)=99,178
latency-distribution[
0..99us=1
1600..3199us=3
3200..6399us=3
6400..12799us=7
12800..25599us=13
25600..51199us=32
51200..102399us=41]]]]
According to your store usage, a similar output can be seen for Hazelcast JCache, Queue and
Ringbuffer with persistent datastores.
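To help read the latency-distribution entries, the following sketch maps a latency to its bucket label. The class is hypothetical and the bucket boundaries are inferred from the sample output above (a first bucket of 0..99us, then buckets whose lower bound doubles each time); they are an assumption, not a documented contract.

```java
final class LatencyBuckets {

    // Returns the bucket label for a latency given in microseconds.
    // Assumed layout: 0..99us, 100..199us, 200..399us, 400..799us, and so on.
    static String bucketFor(long latencyMicros) {
        if (latencyMicros < 0) {
            throw new IllegalArgumentException("latency must be non-negative");
        }
        if (latencyMicros < 100) {
            return "0..99us";
        }
        long lower = 100;
        // Double the lower bound until the latency falls below the next bound.
        while (latencyMicros >= lower * 2) {
            lower *= 2;
        }
        return lower + ".." + (lower * 2 - 1) + "us";
    }

    public static void main(String[] args) {
        // Average and maximum load latencies from the sample output above.
        System.out.println(bucketFor(46_321));
        System.out.println(bucketFor(99_178));
    }
}
```

In the sample, the average load latency of 46,321 microseconds falls into the 25600..51199us bucket and the maximum of 99,178 microseconds into the 51200..102399us bucket.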
You can control the StoreLatency plugin using the following properties:
OperationHeartbeats
It shows the deviation between member/member operation heartbeats. Each member, regardless of
whether there is an operation running on behalf of that member, sends an operation heartbeat to every
other member. It contains a listing of all callIds of the running operations from a given member.
This plugin also works fine between members/lite-members.
This operation heartbeat is sent periodically, by default at 1/4 of the operation call timeout of
60 seconds, so we would expect an operation heartbeat to be received every 15 seconds. Operation
heartbeats are high priority packets (so they overtake regular packets) and are processed by an
isolated thread in the invocation monitor. If there is any deviation in the frequency of receiving
these packets, it may be due to problems such as network latency.
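The arithmetic behind the expected interval can be sketched as follows. The class is hypothetical, and the deviation formula (relative difference between the observed and the expected interval) is an assumption made for illustration; the plugin's exact calculation may differ.

```java
final class HeartbeatDeviation {

    // Heartbeats are expected at 1/4 of the operation call timeout.
    static long expectedIntervalMillis(long callTimeoutMillis) {
        return callTimeoutMillis / 4;
    }

    // Assumed formula: how much later than expected the heartbeat arrived, in percent.
    static double deviationPercent(long expectedMillis, long observedMillis) {
        return 100.0 * (observedMillis - expectedMillis) / expectedMillis;
    }

    public static void main(String[] args) {
        long expected = expectedIntervalMillis(60_000); // default call timeout of 60 seconds
        System.out.println("expected interval (ms): " + expected);
        // A heartbeat that has not been received for 37 seconds.
        System.out.printf("deviation: %.1f%%%n", deviationPercent(expected, 37_000));
    }
}
```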
The following shows an example of the output where an operation heartbeat has not been received
for 37 seconds:
The OperationHeartbeats plugin is enabled by default since it has very little overhead and only
prints to the diagnostics file if the maximum deviation percentage (explained below) is exceeded.
You can control the OperationHeartbeats plugin using the following properties:
MemberHeartbeats
This plugin looks a lot like the OperationHeartbeats plugin, but instead of relying on operation
heartbeats to determine the deviation, it relies on member/member cluster heartbeats. Every
member sends a heartbeat to other members periodically (by default every 5 seconds).
Just like the OperationHeartbeats, the MemberHeartbeats plugin can be used to detect if there are
networking problems long before they actually lead to problems such as split-brain syndromes.
The following shows an example of the output where no member/member heartbeat has been
received for 9 seconds:
The MemberHeartbeats plugin is enabled by default since it has very little overhead and only prints
to the diagnostics file if the maximum deviation percentage (explained below) is exceeded.
You can control the MemberHeartbeats plugin using the following properties:
OperationThreadSamples
This plugin samples the operation threads and checks the running operations/tasks. Hazelcast has
the slow operation detector, which is useful for very slow operations, but it may not be effective for
high volumes of operations that are only moderately slow. The OperationThreadSamples plugin makes it
clearer which operations are actually running.
You can control the OperationThreadSamples plugin using the following properties:
28-08-2018 07:40:07 1535442007330 OperationThreadSamples[
Partition[
com.hazelcast.map.impl.operation.MapSizeOperation=304623 85.6927%
com.hazelcast.map.impl.operation.PutOperation=33061 9.300304%
com.hazelcast.map.impl.operation.GetOperation=17799 5.0069904%]
Generic[
com.hazelcast.client.impl.ClientEngineImpl$PriorityPartitionSpecificRunnable=2308 35.738617%
com.hazelcast.nio.Packet=1767 27.361412%
com.hazelcast.internal.cluster.impl.operations.JoinRequestOp=821 12.712914%
com.hazelcast.spi.impl.operationservice.impl.operations.PartitionIteratingOperation=278 4.3047385%
com.hazelcast.internal.cluster.impl.operations.HeartbeatOp=93 1.4400743%
com.hazelcast.internal.cluster.impl.operations.OnJoinOp=89 1.3781357%
com.hazelcast.internal.cluster.impl.operations.WhoisMasterOp=75 1.1613503%
com.hazelcast.client.impl.operations.ClientReAuthOperation=33 0.51099414%]]
As can be seen above, the MapSizeOperations run on the operation threads most of the time.
WanDiagnostics
The WAN diagnostics plugin provides information about the WAN replication.
20.8. Health Check and Monitoring
Hazelcast provides the HTTP-based Health Check endpoint, Health Check script and Health
Monitoring utility.
To be able to benefit from the Health Check endpoint and script, you must enable the Health Check
using either one of the following configuration options:
<hazelcast>
...
<network>
<rest-api enabled="true">
<endpoint-group name="HEALTHCHECK" enabled="true"/>
</rest-api>
</network>
...
</hazelcast>
<hazelcast>
...
<advanced-network>
<rest-server-socket-endpoint-config>
<endpoint-groups>
<endpoint-group name="HEALTHCHECK" enabled="true"/>
</endpoint-groups>
</rest-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
This is Hazelcast’s HTTP based health check implementation which provides basic information
about your cluster and member (on which it is launched).
First, you need to enable the health check as explained in the introduction of this section above.
Now you can retrieve information about your cluster’s health status (member state, cluster state,
cluster size, etc.) by opening http://<your member's host IP>:5701/hazelcast/health in your
preferred browser.
{
"nodeState": "ACTIVE",
"clusterState": "ACTIVE",
"clusterSafe": true,
"migrationQueueSize": 0,
"clusterSize": 3
}
See the Managing Cluster and Member States section to learn more about each state of a Hazelcast
cluster and member.
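The endpoint can also be queried from code. The following is a minimal sketch using only JDK classes; the class name is hypothetical, and treating a member as healthy when the node state and cluster state are ACTIVE and the cluster is safe is an assumption of this example, not a rule defined by Hazelcast.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class HealthCheckClient {

    // Fetches the raw JSON document from the health endpoint of one member.
    static String fetch(String host, int port) throws IOException {
        URL url = new URL("http://" + host + ":" + port + "/hazelcast/health");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Naive check without a JSON library: strips whitespace and looks for the
    // expected field values. The health criteria are an assumption of this sketch.
    static boolean isHealthy(String json) {
        String compact = json.replaceAll("\\s+", "");
        return compact.contains("\"nodeState\":\"ACTIVE\"")
                && compact.contains("\"clusterState\":\"ACTIVE\"")
                && compact.contains("\"clusterSafe\":true");
    }

    public static void main(String[] args) throws IOException {
        // Requires a running member with the HEALTHCHECK endpoint group enabled.
        System.out.println(isHealthy(fetch("127.0.0.1", 5701)));
    }
}
```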
The healthcheck.sh script comes with the Hazelcast package. Internally, it uses the HTTP-based
Health Check endpoint. You will need to enable the endpoint by using the advanced-network or the
network configuration element. See the Health Check and Monitoring section.
You can use the script to check health parameters in the following manner:
$ ./healthcheck.sh <parameters>
Parameter | Default Value | Description
--cacert | set of well-known CA certificates | Defines trusted PEM-encoded certificate file path. It’s used to verify member certificates.
Assuming the member is deployed under the address 127.0.0.1:5701 and it is in the healthy state,
the following output is expected:
Assuming there is no member running under the address 127.0.0.1:5701, the following output is
expected:
Health monitor periodically prints logs in your console to provide information about your
member’s state. By default, it is enabled when you start your cluster.
You can set the interval of health monitoring using the hazelcast.health.monitoring.delay.seconds
system property. Its default value is 30 seconds.
hazelcast.health.monitoring.threshold.cpu.percentage system properties, whose default values are
both 70.
operations.running.count=0, operations.pending.invocations.percentage=0.00%,
operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=1,
See the Configuring with System Properties section to learn how to set system
properties.
The F5® BIG-IP® Local Traffic Manager™ (LTM) can be used as a load balancer for Hazelcast
cluster members. This section describes how you can configure a health monitor to check the
Hazelcast member states.
Monitor Types
• HTTP Monitor: A custom HTTP monitor enables you to send a command to Hazelcast’s Health
Check API using HTTP requests. This is a good choice if SSL/TLS is not enabled in your cluster.
• HTTPS Monitor: A custom HTTPS monitor enables you to verify the health of Hazelcast cluster
members by sending a command to Hazelcast’s Health Check API using Secure Socket Layer
(SSL) security. This is a good choice if SSL/TLS is enabled in your cluster.
Configuration
After signing in to the BIG-IP LTM User Interface, follow F5’s instructions to create a new monitor.
Next, apply the following configuration according to your monitor type.
HTTP/HTTPS Monitors
Please note that you should enable the Hazelcast health check for HTTP/HTTPS
monitors to run. You will need to enable the endpoint by using the advanced-network
or the network configuration element. See the Health Check and Monitoring
section.
{"nodeState":"ACTIVE","clusterState":"ACTIVE","clusterSafe":true,"migrationQueueSize":0,"clusterSize":([^\s]+)}
The BIG-IP LTM monitors accept regular expressions in these strings allowing you to configure
them as needed. The example provided above remains green even if the cluster size changes.
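The effect of the capture group can be checked outside of BIG-IP with any regular expression engine. The following sketch uses java.util.regex; the class name is hypothetical, and the curly braces are escaped because Java treats an unescaped "{" as a quantifier, so the pattern is an adaptation of the string above rather than a verbatim copy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReceiveStringCheck {

    // Java adaptation of the receive string: braces escaped, cluster size captured.
    static final Pattern RECEIVE = Pattern.compile(
            "\\{\"nodeState\":\"ACTIVE\",\"clusterState\":\"ACTIVE\",\"clusterSafe\":true,"
            + "\"migrationQueueSize\":0,\"clusterSize\":([^\\s]+)\\}");

    // Returns the captured cluster size if the body matches, otherwise null.
    static String clusterSizeIfGreen(String body) {
        Matcher m = RECEIVE.matcher(body);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String body = "{\"nodeState\":\"ACTIVE\",\"clusterState\":\"ACTIVE\","
                + "\"clusterSafe\":true,\"migrationQueueSize\":0,\"clusterSize\":3}";
        System.out.println(clusterSizeIfGreen(body));
    }
}
```

Any cluster size matches, while a response with a different node state, cluster state or a non-empty migration queue does not.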
200 OK
As you can see, the HEAD request only checks for a 200 OK response. A Hazelcast cluster member
sends this status code when it is alive and running without an issue. This provides a very basic
health check. For increased flexibility, we recommend using the GET request API.
TCP_HALF_OPEN Monitors
• Optionally, set the "Alias Service Port" as the port of Hazelcast cluster member if you want to
specify the port in the monitor.
See the Hazelcast Management Center Documentation for its usage details.
The support for script execution is disabled by default for security reasons. Script engines allow
access to the underlying system on the members (files and other resources). Scripts access the
system on which the member runs with the permissions of the running user.
<hazelcast>
...
<management-center scripting-enabled="false" />
...
</hazelcast>
Note that the JSR 223 API is used in Hazelcast IMDG to support scripting.
It’s possible to restrict the source IP addresses from which Management Center operations are
allowed. By default, all source connections are allowed.
<hazelcast>
<properties>
<property name="hazelcast.mc.trusted.interfaces">192.168.1.*</property>
</properties>
</hazelcast>
This property is only available in the patch releases for Hazelcast IMDG minor
version 4.0. Starting with IMDG 4.1 a new configuration element is added under
management-center configuration block and the property-based configuration is not
supported anymore.
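To illustrate what a wildcard entry such as 192.168.1.* covers, the following sketch compares an IPv4 address with a pattern octet by octet. It is a simplified illustration with a hypothetical class name: it handles only the * wildcard and is not Hazelcast's own matching logic, which may accept other forms.

```java
final class TrustedInterfaceMatcher {

    // Returns true if every octet of the IPv4 address equals the corresponding
    // pattern octet or the pattern octet is the "*" wildcard.
    static boolean matches(String pattern, String ip) {
        String[] patternParts = pattern.split("\\.");
        String[] ipParts = ip.split("\\.");
        if (patternParts.length != 4 || ipParts.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!patternParts[i].equals("*") && !patternParts[i].equals(ipParts[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("192.168.1.*", "192.168.1.42"));
        System.out.println(matches("192.168.1.*", "192.168.2.42"));
    }
}
```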
See the Hazelcast Management Center Documentation for information on Clustered JMX and
Clustered REST (via Management Center) features.
21. Security
Hazelcast IMDG Enterprise Feature
This chapter describes the security features of Hazelcast. These features allow you to perform
security activities, such as intercepting socket connections and remote operations executed by the
clients, encrypting the communications between the members at socket level and using SSL socket
communication. All of the Security features explained in this chapter are the features of Hazelcast
IMDG Enterprise edition.
• and perform access control checks on client operations. Access control can be done according to
endpoint principal and/or endpoint address.
<hazelcast>
...
<security enabled="true">
...
</security>
...
</hazelcast>
Config cfg = new Config();
SecurityConfig securityCfg = cfg.getSecurityConfig();
securityCfg.setEnabled( true );
Also, see the Setting License Key section for information on how to set your Hazelcast IMDG
Enterprise license.
Hazelcast allows you to intercept socket connections before a member joins a cluster or a client
connects to a member of a cluster. This allows you to add custom hooks to the join and
connection procedures (like identity checking using Kerberos, etc.).
The following is an example socket interceptor implementation for the member side.
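import com.hazelcast.nio.MemberSocketInterceptor;

import java.io.IOException;
import java.net.Socket;
import java.util.Properties;

public class MySocketInterceptor implements MemberSocketInterceptor {

    // Identifier of this member, read from the "member-id" interceptor property.
    private String memberId;

    public MySocketInterceptor() {
    }

    @Override
    public void init(Properties properties) {
        memberId = properties.getProperty("member-id");
    }

    // Accepting side: send our ID, then check the ID sent by the connecting member.
    @Override
    public void onAccept(Socket socket) throws IOException {
        socket.getOutputStream().write(memberId.getBytes());
        byte[] bytes = new byte[1024];
        int len = socket.getInputStream().read(bytes);
        if (len < 0) {
            throw new IOException("Connection closed before the member ID was received");
        }
        String otherMemberId = new String(bytes, 0, len);
        if (!otherMemberId.equals("secondMember")) {
            throw new RuntimeException("Not a known member!!!");
        }
    }

    // Connecting side: send our ID, then check the ID sent by the accepting member.
    @Override
    public void onConnect(Socket socket) throws IOException {
        socket.getOutputStream().write(memberId.getBytes());
        byte[] bytes = new byte[1024];
        int len = socket.getInputStream().read(bytes);
        if (len < 0) {
            throw new IOException("Connection closed before the member ID was received");
        }
        String otherMemberId = new String(bytes, 0, len);
        if (!otherMemberId.equals("firstMember")) {
            throw new RuntimeException("Not a known member!!!");
        }
    }
}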
<hazelcast>
...
<network>
<socket-interceptor enabled="true">
<class-name>com.hazelcast.examples.MySocketInterceptor</class-name>
<properties>
<property name="kerberos-host">kerb-host-name</property>
<property name="kerberos-config-file">kerb.conf</property>
</properties>
</socket-interceptor>
</network>
...
</hazelcast>
The following is an example configuration of the above socket interceptor for the client side.
return config;
}
21.3. Security Interceptor
Hazelcast IMDG Enterprise Feature
Hazelcast allows you to intercept every remote operation executed by the client. This lets you add
very flexible custom security logic. To do this, implement
com.hazelcast.security.SecurityInterceptor.
@Override
public void before(Credentials credentials, String objectType, String objectName,
                   String methodName, Parameters parameters) throws AccessControlException {
    if (objectName.equals(DENIED_MAP_NAME)) {
        throw new RuntimeException("Denied Map!!!");
    }
    if (methodName.equals(DENIED_METHOD)) {
        throw new RuntimeException("Denied Method!!!");
    }
    Object firstParam = parameters.get(0);
    Object secondParam = parameters.get(1);
    if (firstParam.equals(DENIED_KEY)) {
        throw new RuntimeException("Denied Key!!!");
    }
    if (secondParam.equals(DENIED_VALUE)) {
        throw new RuntimeException("Denied Value!!!");
    }
}

@Override
public void after(Credentials credentials, String objectType, String objectName,
                  String methodName, Parameters parameters) {
    System.err.println("qwe c: " + credentials + "\t\tt: " + objectType + "\t\tn: " + objectName
            + "\t\tm: " + methodName + "\t\tp1: " + parameters.get(0) + "\t\tp2: "
            + parameters.get(1));
}
The before method is called before processing the request on the remote server. The after method
is called after the processing. Exceptions thrown while executing the before method are propagated
to the client, but exceptions thrown while executing the after method are suppressed.
21.4. Encryption
Hazelcast IMDG Enterprise Feature
Hazelcast offers features that provide privacy at the communication level by
enabling encryption. Encryption is based on Java Cryptography Architecture (JCA).
1. TLS protocol
The preferred and recommended feature is the TLS protocol as it is the standard way to protect
communication at the transport level.
Symmetric encryption for Hazelcast member protocol can be configured with cipher algorithms
implemented by security providers and accessed through Java Cryptography Architecture. Check
documentation of your Java version to learn about supported algorithm names. The following are
some examples:
• AES
• PBEWithMD5AndDES
• DES/ECB/PKCS5Padding
• Blowfish
Hazelcast uses MD5 message-digest algorithm as the cryptographic hash function. You can also use
the salting process by giving a salt and password which are then concatenated and processed with
MD5, and the resulting output is stored with the salt.
In symmetric encryption, each member uses the same key, so the key is shared. Here is an example
configuration for symmetric encryption.
<hazelcast>
...
<network>
<symmetric-encryption enabled="true">
<algorithm>AES</algorithm>
<salt>thesalt</salt>
<password>thepass</password>
<iteration-count>175</iteration-count>
</symmetric-encryption>
</network>
...
</hazelcast>
You set the encryption algorithm, the salt, password and the iteration count to be used for
generating the secret key. You also need to set the enabled attribute to true. Note that all members
should have the same encryption configuration.
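To show how a password, a salt and an iteration count turn into a working cipher in JCA, the following sketch performs an encrypt/decrypt round trip with PBEWithMD5AndDES. It uses only JDK classes and a hypothetical class name. It is not Hazelcast's internal implementation: in particular, hashing the configured salt and keeping 8 bytes is an assumption made here because this algorithm requires an 8-byte salt.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

final class PbeRoundTrip {

    private static final String ALGORITHM = "PBEWithMD5AndDES";

    // Assumption of this sketch: derive the required 8-byte salt from the salt string.
    private static byte[] saltBytes(String salt) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(salt.getBytes(StandardCharsets.UTF_8));
        return Arrays.copyOf(digest, 8);
    }

    private static byte[] run(int mode, byte[] input, String password, String salt, int iterations) {
        try {
            SecretKey key = SecretKeyFactory.getInstance(ALGORITHM)
                    .generateSecret(new PBEKeySpec(password.toCharArray()));
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(mode, key, new PBEParameterSpec(saltBytes(salt), iterations));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption setup failed", e);
        }
    }

    static byte[] encrypt(byte[] plain, String password, String salt, int iterations) {
        return run(Cipher.ENCRYPT_MODE, plain, password, salt, iterations);
    }

    static byte[] decrypt(byte[] encrypted, String password, String salt, int iterations) {
        return run(Cipher.DECRYPT_MODE, encrypted, password, salt, iterations);
    }

    public static void main(String[] args) {
        // Values taken from the example configuration above.
        byte[] encrypted = encrypt("hello".getBytes(StandardCharsets.UTF_8), "thepass", "thesalt", 175);
        byte[] decrypted = decrypt(encrypted, "thepass", "thesalt", 175);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }
}
```

Decryption succeeds only when the algorithm, password, salt and iteration count are identical on both sides, which is why all members need the same encryption configuration.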
Since symmetric encryption relies on JCA, you can additionally benefit from the algorithms
provided by the Bouncy Castle Crypto APIs. For this, you need to set the
hazelcast.security.bouncy.enabled property to true.
21.5. TLS/SSL
Hazelcast IMDG Enterprise Feature
You can use the SSL (Secure Sockets Layer) protocol to establish an encrypted communication
across your Hazelcast cluster with key stores and trust stores. Note that, if you are developing
applications using Java 8, you will be using its successor TLS (Transport Layer Security).
It is NOT recommended to reuse the key stores and trust stores for external
applications.
Hazelcast allows you to encrypt socket level communication between Hazelcast members and
between Hazelcast clients and members, for end to end encryption. To use it, you need to
implement com.hazelcast.nio.ssl.SSLContextFactory and configure the SSL section in the network
configuration.
The following is the base declarative configuration for the implemented SSLContextFactory:
<hazelcast>
...
<network>
<ssl enabled="true">
<factory-class-name>
com.hazelcast.examples.MySSLContextFactory
</factory-class-name>
<properties>
<property name="foo">bar</property>
</properties>
</ssl>
</network>
...
</hazelcast>
<hazelcast>
...
<network>
<ssl enabled="true">
<factory-class-name>
com.hazelcast.nio.ssl.BasicSSLContextFactory
</factory-class-name>
<properties>
<property name="keyStore">/opt/hazelcast-keystore.p12</property>
<property name="keyStorePassword">secret.123</property>
<property name="keyStoreType">PKCS12</property>
<property name="trustStore">/opt/hazelcast-truststore.p12</property>
<property name="trustStorePassword">changeit</property>
<property name="trustStoreType">PKCS12</property>
<property name="protocol">TLSv1.2</property>
<property name="mutualAuthentication">REQUIRED</property>
</properties>
</ssl>
</network>
...
</hazelcast>
• keyManagerAlgorithm: Name of the algorithm based on which the authentication keys are
provided.
• keyStoreType: Type of the keystore. Its default value is JKS. Another commonly used type is
PKCS12. Available keystore/truststore types depend on your operating system and the Java
runtime.
• trustStore: Path of your truststore file. The truststore file is a keystore file that contains a
collection of certificates trusted by your application.
• trustManagerAlgorithm: Name of the algorithm based on which the trust managers are provided.
• trustStoreType: Type of the truststore. Its default value is JKS. Another commonly used type is
PKCS12. Available keystore/truststore types depend on your operating system and the Java
runtime.
• ciphersuites: Comma-separated list of cipher suite names allowed to be used. By default,
all suites supported by your Java runtime are allowed.
• protocol: Name of the algorithm which is used in your TLS/SSL. Its default value is TLS. Available
values are:
◦ TLS
◦ TLSv1
◦ TLSv1.1
◦ TLSv1.2
◦ TLSv1.3
For the protocol property, we recommend that you provide TLS with its version information,
e.g., TLSv1.2. Note that if you write only TLS, your application chooses the TLS version
according to your Java version.
• validateIdentity: Flag which enables endpoint identity validation. During the
TLS handshake, the client verifies that the server’s hostname (or IP address) matches the information
in the X.509 certificate (Subject Alternative Name extension). Possible values are "true" and "false"
(default).
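To see what a given protocol value means on your Java version, you can ask the JDK directly. The following sketch uses only the standard javax.net.ssl API with a hypothetical class name; it creates an SSLContext for a protocol name and lists the protocol versions that are enabled by default for it.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

final class TlsProtocolProbe {

    // Creates an SSLContext for the given protocol name with the JVM's default
    // key managers, trust managers and random source, then returns the protocol
    // versions enabled by default on connections created from it.
    static List<String> defaultProtocols(String protocol) {
        try {
            SSLContext context = SSLContext.getInstance(protocol);
            context.init(null, null, null);
            return Arrays.asList(context.getDefaultSSLParameters().getProtocols());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Protocol not supported by this JVM: " + protocol, e);
        }
    }

    public static void main(String[] args) {
        // The generic name lets the Java version decide; the result differs between releases.
        System.out.println("TLS     -> " + defaultProtocols("TLS"));
        System.out.println("TLSv1.2 -> " + defaultProtocols("TLSv1.2"));
    }
}
```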
<hazelcast-client>
...
<network>
<ssl enabled="true">
<factory-class-name>
com.hazelcast.nio.ssl.BasicSSLContextFactory
</factory-class-name>
<properties>
<property name="keyStore">/opt/hazelcast-client.keystore</property>
<property name="keyStorePassword">clientsSecret</property>
<property name="trustStore">/opt/hazelcast-client.truststore</property>
<property name="trustStorePassword">changeit</property>
<property name="protocol">TLSv1.2</property>
</properties>
</ssl>
</network>
...
</hazelcast-client>
The same BasicSSLContextFactory properties used for members are available on clients. Clients
don’t need to set the mutualAuthentication property as it is used to configure the server side of TLS
connections.
TLS connections have two sides: the one opening the connection (TLS client) and the one accepting
the connection (TLS server). By default, only the TLS server proves its identity by presenting a
certificate to the TLS client. Mutual authentication means that the TLS clients also prove their
identity to the TLS servers.
Hazelcast members can be on both sides of a TLS connection: TLS servers and TLS clients. Hazelcast
clients are always on the client side of a TLS connection.
By default Hazelcast members have keyStore used to identify themselves to the clients and other
members. Both Hazelcast members and Hazelcast clients have trustStore used to define which
members they can trust.
When the mutual authentication feature is enabled, Hazelcast clients need to provide keyStore. A
client proves its identity by providing its certificate to the Hazelcast member it’s connecting to. The
member only accepts the connection if the client’s certificate is present in the member’s trustStore.
To enable the mutual authentication, set the mutualAuthentication property value to REQUIRED on the
member side, as shown below:
Config cfg = new Config();
Properties props = new Properties();
props.setProperty("mutualAuthentication", "REQUIRED");
props.setProperty("keyStore", "/opt/hazelcast.keystore");
props.setProperty("keyStorePassword", "123456");
props.setProperty("trustStore", "/opt/hazelcast.truststore");
props.setProperty("trustStorePassword", "123456");
cfg.getNetworkConfig().setSSLConfig(new SSLConfig().setEnabled(true).setProperties(props));
Hazelcast.newHazelcastInstance(cfg);
And on the client side, you need to set client identity by providing the keystore:
clientSslProps.setProperty("keyStore", "/opt/client.keystore");
clientSslProps.setProperty("keyStorePassword", "123456");
• REQUIRED: Server asks for client certificate. If the client does not provide a keystore or the
provided keystore is not verified against member’s truststore, the client is not authenticated.
• OPTIONAL: Server asks for client certificate, but client is not required to provide any valid
certificate.
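In plain JSSE terms, these two values resemble the "need" and "want" client authentication modes of the standard Java TLS API. The following sketch shows that correspondence using only JDK classes; the class name is hypothetical, and the mapping is an illustration under that assumption, not a description of how Hazelcast applies the property internally.

```java
import javax.net.ssl.SSLParameters;

final class ClientAuthMapping {

    // Maps a mutualAuthentication value to JSSE client authentication settings.
    // A null value leaves client authentication switched off.
    static SSLParameters forMutualAuthentication(String value) {
        SSLParameters params = new SSLParameters();
        if (value == null) {
            return params;
        }
        switch (value) {
            case "REQUIRED":
                // The handshake fails if the client presents no acceptable certificate.
                params.setNeedClientAuth(true);
                break;
            case "OPTIONAL":
                // The server asks for a certificate but continues without one.
                params.setWantClientAuth(true);
                break;
            default:
                throw new IllegalArgumentException("Unknown value: " + value);
        }
        return params;
    }

    public static void main(String[] args) {
        SSLParameters required = forMutualAuthentication("REQUIRED");
        System.out.println("need=" + required.getNeedClientAuth() + ", want=" + required.getWantClientAuth());
    }
}
```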
When a new client is introduced with a new keystore, the truststore on the
member side should be updated to include the new client's information so that
the member can accept it.
See the following example snippet for the full configuration on the client side:
config.getNetworkConfig().setSSLConfig(new SSLConfig().setEnabled(true).setProperties(clientSslProps));
HazelcastClient.newHazelcastClient(config);
If the mutual authentication is not required, the Hazelcast members accept all incoming TLS
connections without verifying if the connecting side is trusted. Therefore it’s recommended to
require the mutual authentication in Hazelcast members configuration.
21.5.4. TLS/SSL Performance Improvements for Java
TLS/SSL can have a significant impact on performance. There are a few ways to increase the
performance.
The first thing that can be done is making sure that AES intrinsics are used. Modern CPUs (2010 or
newer Westmere) have hardware support for AES encryption/decryption and if a Java 8 or newer
JVM is used, the JIT automatically makes use of these AES intrinsics. They can also be explicitly
enabled using -XX:+UseAES -XX:+UseAESIntrinsics, or disabled using -XX:-UseAES
-XX:-UseAESIntrinsics.
A lot of encryption algorithms make use of padding because they encrypt/decrypt in fixed-size
blocks. If there is not enough data for a block, the algorithm relies on random number generation to
pad. Under Linux, the JVM automatically makes use of /dev/random for the generation of random
numbers. /dev/random relies on entropy to be able to generate random numbers. However, if this
entropy is insufficient to keep up with the rate at which random numbers are requested, it can slow
down the encryption/decryption since /dev/random will block; it could block for minutes waiting for
sufficient entropy. This can be fixed by setting the -Djava.security.egd=file:/dev/./urandom system property.
For a more permanent solution, modify the <JAVA_HOME>/jre/lib/security/java.security file: look
for securerandom.source=/dev/urandom and change it to securerandom.source=file:/dev/./urandom.
Switching to /dev/urandom could be controversial because /dev/urandom will not block if there is a
shortage of entropy and the returned random values could theoretically be vulnerable to a
cryptographic attack. If this is a concern in your application, use /dev/random instead.
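To check which entropy settings a JVM actually runs with, the relevant properties can be printed and the random number generation timed. The following is a minimal sketch with JDK classes only; the class name is hypothetical, and the timing is a rough indication rather than a benchmark.

```java
import java.security.SecureRandom;
import java.security.Security;

final class EntropySourceProbe {

    // Measures how long it takes to generate the given number of random bytes
    // with the JVM's default SecureRandom implementation.
    static long timeRandomBytesNanos(int count) {
        SecureRandom random = new SecureRandom();
        byte[] bytes = new byte[count];
        long start = System.nanoTime();
        random.nextBytes(bytes);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // System property set with -Djava.security.egd=..., null if not set.
        System.out.println("java.security.egd   = " + System.getProperty("java.security.egd"));
        // Value configured in the java.security file.
        System.out.println("securerandom.source = " + Security.getProperty("securerandom.source"));
        System.out.println("algorithm           = " + new SecureRandom().getAlgorithm());
        System.out.println("1 KB of random data took " + timeRandomBytesNanos(1024) + " ns");
    }
}
```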
Hazelcast’s Java smart client automatically makes use of extra I/O threads for
encryption/decryption and this has a significant impact on the performance. The thread counts can be
changed using the hazelcast.client.io.input.thread.count and hazelcast.client.io.output.thread.count
client system properties. By default, the client uses 1 input thread and 1 output thread. If TLS/SSL
is enabled, it defaults to 3 input threads and 3 output threads. Having more client I/O threads than
members in the cluster does not lead to increased performance. So with a 2-member cluster, 2 input
and 2 output threads give the best performance.
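The sizing rule above can be applied with a few lines of code. The following sketch is only a heuristic derived from this paragraph; the class and method names are hypothetical, while the two property names are the ones mentioned above.

```java
final class ClientIoThreads {

    // Caps the configured thread count at the number of members, since more
    // I/O threads than members do not improve performance. Never returns less than 1.
    static int cappedAtMembers(int configuredThreads, int memberCount) {
        return Math.max(1, Math.min(configuredThreads, memberCount));
    }

    public static void main(String[] args) {
        int memberCount = 2;      // size of the cluster the client connects to
        int tlsDefault = 3;       // default thread count when TLS/SSL is enabled
        String threads = Integer.toString(cappedAtMembers(tlsDefault, memberCount));
        // Must be set before the Hazelcast client is created in this JVM.
        System.setProperty("hazelcast.client.io.input.thread.count", threads);
        System.setProperty("hazelcast.client.io.output.thread.count", threads);
        System.out.println("client I/O threads per direction: " + threads);
    }
}
```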
TLS/SSL in Java is normally provided by the JRE. However, the performance overhead can be
significant, even with AES intrinsics enabled. If you are using an x86_64 system (Linux, Mac,
Windows), Hazelcast supports native integration for TLS/SSL which can provide significant
performance improvements. There are two supported native TLS/SSL libraries available through
netty-tcnative libraries:
• OpenSSL
◦ dynamically linked
• BoringSSL: Google-managed fork of OpenSSL
◦ statically linked
The native TLS integration can be used on clients and/or members. For best performance, it is
recommended to install it on both the client and the member and to configure the appropriate cipher suite(s).
Starting with Hazelcast IMDG 4.0, if the Java version is less than 11 and OpenSSL
capabilities are detected (also the appropriate Java libraries are included) and if no
explicit SSLEngineFactory is set, Hazelcast IMDG defaults to use OpenSSL.
For the native TLS/SSL integration in Java, the Netty library is used.
Make sure the following libraries from the Netty framework are on the classpath:
◦ or OpenSSL: netty-tcnative-{tcnative_version}-{os_arch}.jar
It is very important that the version of the Netty JAR(s) corresponds to a very specific
version of netty-tcnative. In case of doubt, the simplest thing to do is to download
the netty-<version>.tar.bz2 file from the Netty website and check which netty-tcnative
version is used for that Netty release.
The statically linked BoringSSL binaries are included within the netty-tcnative libraries. There is
no need to install additional software on supported systems.
<dependencies>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-tcnative-boringssl-static</artifactId>
<version>2.0.12.Final</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-handler</artifactId>
<version>4.1.27.Final</version>
</dependency>
</dependencies>
1. Install OpenSSL. Make sure that you install version 1.0.1 or newer. See its documentation
at github.com/openssl.
<dependencies>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-tcnative</artifactId>
<version>2.0.12.Final</version>
<classifier>linux-x86_64</classifier>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-handler</artifactId>
<version>4.1.27.Final</version>
</dependency>
</dependencies>
Configuring OpenSSL in Hazelcast is straightforward. On the client and/or member side, the
following snippet enables TLS/SSL using OpenSSL:
<hazelcast>
...
<network>
<ssl enabled="true">
<factory-class-name>com.hazelcast.nio.ssl.OpenSSLEngineFactory</factory-class-name>
<properties>
<property name="protocol">TLSv1.2</property>
<property name="trustCertCollectionFile">trusted-certs.pem</property>
<!-- If the TLS mutual authentication is not used,
then the key configuration is not needed on client side. -->
<property name="keyFile">privkey.pem</property>
<property name="keyCertChainFile">chain.pem</property>
</properties>
</ssl>
</network>
...
</hazelcast>
The configuration is similar to a regular TLS/SSL integration. The main differences are the
OpenSSLEngineFactory factory class and the following properties:
• fipsMode: Boolean flag to switch OpenSSL into the FIPS mode. See the FIPS 140-2 section.
The key and certificate related properties take precedence over the keyStore and trustStore
configurations. Using keystores and truststores together with OpenSSL causes problems on some
Java versions; therefore, we recommend using the native OpenSSL approach.
◦ Using the keyStore property is not recommended, use keyFile and keyCertChainFile instead
• keyStoreType: Type of the keystore. Its default value is JKS. Another commonly used type is the
PKCS12. Available keystore/truststore types depend on your Operating system and the Java
runtime.
• keyManagerAlgorithm: Name of the algorithm based on which the authentication keys are
provided.
• trustManagerAlgorithm: Name of the algorithm based on which the trust managers are provided.
• trustStore: Path of your truststore file. The truststore file is a keystore file that contains a
collection of certificates trusted by your application. Its type should be JKS.
◦ Using the trustStore property is not recommended, use trustCertCollectionFile instead
• trustStoreType: Type of the truststore. Its default value is JKS. Another commonly used type is
the PKCS12. Available keystore/truststore types depend on your operating system and the Java
runtime.
• protocol: Name of the algorithm which is used in your TLS/SSL. Its default value is TLSv1.2.
Available values are:
◦ TLS
◦ TLSv1
◦ TLSv1.1
◦ TLSv1.2
◦ SSL (insecure!)
◦ SSLv2 (insecure!)
◦ SSLv3 (insecure!)
All of the algorithms listed above are supported on Java 8 and higher versions. For the protocol
property, we recommend that you provide SSL or TLS with its version information, e.g.,
TLSv1.2. Note that if you provide only SSL or TLS as a value for the protocol property, they are
converted to SSLv3 and TLSv1.2, respectively. We strongly recommend avoiding the SSL protocols.
• validateIdentity: Flag which enables endpoint identity validation. When it is enabled, the client
verifies during the TLS handshake that the server’s hostname (or IP address) matches the information
in the X.509 certificate (Subject Alternative Name extension). Possible values are "true" and "false"
(default).
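The recommendation for the protocol property can be checked with a few lines of plain JSSE code. The sketch below is an illustration only, not Hazelcast code: the ProtocolCheck class and its method names are hypothetical. It rejects the generic SSL/TLS aliases and the SSL versions, and lists the protocol names the local JVM reports as supported.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

final class ProtocolCheck {
    // True only for values that name an explicit TLS version, e.g. "TLSv1.2".
    // "SSL", "SSLv2", "SSLv3" and the bare alias "TLS" are rejected.
    static boolean hasExplicitTlsVersion(String protocol) {
        return protocol != null && protocol.startsWith("TLSv");
    }

    // Protocol names supported by the default SSLContext of this JVM.
    static List<String> jvmSupportedProtocols() throws NoSuchAlgorithmException {
        return Arrays.asList(
                SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("JVM supports: " + jvmSupportedProtocols());
        System.out.println("TLSv1.2 explicit: " + hasExplicitTlsVersion("TLSv1.2"));
        System.out.println("SSLv3 explicit: " + hasExplicitTlsVersion("SSLv3"));
    }
}
```

The list printed by the first line depends on the Java runtime and its security configuration, so it is worth running on the same JVM that hosts the member or client.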
In order to use a secured communication between the Hazelcast cluster and Management Center,
you have to configure Management Center as explained in the Connecting Hazelcast members to
Management Center section in the Hazelcast Management Center Reference Manual.
Hazelcast allows updating TLS certificates on the members without fully stopping the cluster. You
can stop the cluster members one by one and replace the certificates gradually. We can distinguish
two cases, depending on whether the new certificate is already trusted:
This is usually the case when self-signed certificates are used on the members.
Before we can deploy new member certificates, we have to update the list of trusted certificates
on all members. Complete the following steps on each member (one by one) in the cluster:
◦ Import all new certificates to the member’s truststore, so it contains both old and new ones.
You can use the keytool executable from Java installation to import the new certificates.
Example:
After completing the above steps, follow the steps described in the next point (certificates
trusted).
To get the best performance, the correct cipher suites need to be configured. Each cipher suite has
different performance and security characteristics and depending on the hardware and selected
cipher suite, the overhead of TLS can range from dramatic to almost negligible.
The cipher suites are configured using the ciphersuites property as shown below:
<hazelcast>
...
<network>
<ssl enabled="true">
<factory-class-name>...</factory-class-name>
<properties>
<property name="keyStore">upload/hazelcast.keystore</property>
<property name="ciphersuites">TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA</property>
</properties>
</ssl>
</network>
...
</hazelcast>
The ciphersuites property accepts a comma-separated list of cipher suites in the order of
preference. Spaces, line breaks and tabs are filtered out.
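The parsing rule described above can be sketched as follows. This is an illustration of the described behavior, not Hazelcast's actual implementation; the CipherSuiteList class and its parse method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class CipherSuiteList {
    // Splits a ciphersuites value into suite names, keeping the order of preference.
    static List<String> parse(String value) {
        List<String> suites = new ArrayList<>();
        if (value == null) {
            return suites;
        }
        // Drop spaces, tabs and line breaks first, then split on commas.
        for (String suite : value.replaceAll("\\s+", "").split(",")) {
            if (!suite.isEmpty()) {
                suites.add(suite);
            }
        }
        return suites;
    }
}
```

With this sketch, a value that is wrapped over two lines in the XML file yields the same list as the same value written on a single line.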
You can configure a member and a client with different cipher suites, but they must share at least
one cipher suite.
One of the cipher suites that gives very low overhead but still provides solid security is
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256. However, in our measurements this cipher suite only
performs well using OpenSSL; using the regular Java TLS integration, it performs badly. Keep
that in mind when configuring a client using regular SSL and a member using OpenSSL.
Please check with your security expert to determine which cipher suites are appropriate and run
performance tests to see which ones perform well in your environment.
If you don’t configure the cipher suites, then the client and member determine a cipher suite by
themselves during the TLS/SSL handshake. This can lead to suboptimal performance and lower
security than required.
You can set all the properties presented in this section using the javax.net.ssl prefix, e.g.,
javax.net.ssl.keyStore and javax.net.ssl.keyStorePassword.
Also note that these properties can be specified using the related Java system properties and also
Java’s -D command line option. This is very useful if you require a more flexible configuration, e.g.,
when doing performance tests.
System.setProperty("javax.net.ssl.trustStore", "/user/home/hazelcast.ts");
Or,
-Djavax.net.ssl.trustStore=/user/home/hazelcast.ts
System.setProperty("javax.net.ssl.ciphersuites",
"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA");
Or,
-Djavax.net.ssl.ciphersuites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA
Hazelcast IMDG Enterprise offers a secret validation mechanism including a strength policy. The
term "secret" here refers to the symmetric encryption password, salt and other passwords and keys.
For this validation, Hazelcast IMDG Enterprise comes with the DefaultSecretStrengthPolicy class to
identify all possible weaknesses of secrets and to display a warning in the system logger. Note that,
by default, no matter how weak the secrets are, the cluster members still start after logging this
warning; however, this is configurable (see the Enforcing the Secret Strength Policy section).
• Large keyspace use, ensuring the use of at least three of the following:
◦ mixed case
◦ alpha
◦ numerals
◦ special characters
• No dictionary words
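The length and keyspace rules can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the DefaultSecretStrengthPolicy implementation: the SecretStrengthSketch class is hypothetical, "mixed case" is interpreted as containing both lower and upper case letters, and the dictionary check is left out because it needs a wordlist file.

```java
final class SecretStrengthSketch {
    // Returns true if the secret is long enough and uses at least three of:
    // mixed case, alpha, numerals, special characters.
    static boolean isStrong(String secret, int minLength) {
        if (secret == null || secret.length() < minLength) {
            return false;
        }
        boolean lower = false;
        boolean upper = false;
        boolean digit = false;
        boolean special = false;
        for (char c : secret.toCharArray()) {
            if (Character.isLowerCase(c)) {
                lower = true;
            } else if (Character.isUpperCase(c)) {
                upper = true;
            } else if (Character.isDigit(c)) {
                digit = true;
            } else {
                special = true;
            }
        }
        int classes = 0;
        if (lower && upper) classes++;   // mixed case
        if (lower || upper) classes++;   // alpha
        if (digit) classes++;            // numerals
        if (special) classes++;          // special characters
        return classes >= 3;
    }
}
```

For example, isStrong("password", 8) is false because only the alpha class is present, while a secret that mixes cases, digits and a special character passes.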
The rules "Minimum length of eight characters" and "no dictionary words" can be configured using
the following system properties:
Example:
-Dhazelcast.security.secret.policy.min.length=10
hazelcast.security.dictionary.policy.wordlist.path: Set the path of a wordlist available in the file
system. The default is /usr/share/dict/words.
Example:
-Dhazelcast.security.dictionary.policy.wordlist.path="/Desktop/myWordList"
You can implement SecretStrengthPolicy to develop a custom strength policy with more flexible
or stricter rules. After you implement it, you can use the following system property to point to your
custom class:
Example:
-Dhazelcast.security.secret.strength.default.policy.class="com.impl.myStrengthPolicy"
By default, secret strength policy is NOT enforced. This means, if a weak secret is detected, an
informative warning is shown in the system logger and the members continue to initialize.
However, you can enforce the policy using the following system property so that the members are
not started until the weak secret errors are fixed:
-Dhazelcast.security.secret.strength.policy.enforced=true
The following is an example warning when secret strength policy is NOT enforced, i.e., the above
system property is set to “false”:
The following is an example warning when secret strength policy is enforced, i.e., the above system
property is set to “true”:
WARNING: [192.168.2.112]:5701 [dev] [4.0-SNAPSHOT]
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ SECURITY WARNING @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Symmetric Encryption Password does not meet the current policy and complexity requirements.
*Must contain at least 1 number.
*Must contain at least 1 special character.
Group Password does not meet the current policy and complexity requirements.
*Must not be set to the default.
*Must have at least 1 lower and 1 upper case characters.
*Must contain at least 1 number.
*Must contain at least 1 special character.
Symmetric Encryption Salt does not meet the current policy and complexity requirements.
*Must contain 8 or more characters.
*Must contain at least 1 number.
*Must contain at least 1 special character.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Exception in thread "main" com.hazelcast.security.WeakSecretException: Weak secrets found in configuration, check output above for more details.
at com.hazelcast.security.impl.WeakSecretsConfigChecker.evaluateAndReport(WeakSecretsConfigChecker.java:49)
at com.hazelcast.instance.EnterpriseNodeExtension.printNodeInfo(EnterpriseNodeExtension.java:197)
at com.hazelcast.instance.Node.<init>(Node.java:194)
at com.hazelcast.instance.HazelcastInstanceImpl.createNode(HazelcastInstanceImpl.java:163)
at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:130)
at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:195)
at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:174)
at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:124)
at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:58)
<security enabled="true">
<realms>
<realm name="realm1">
<authentication>
<jaas>
<login-module class-name=
"com.hazelcast.examples.MyRequiredLoginModule" usage="REQUIRED">
<properties>
<property name="property">value</property>
</properties>
</login-module>
</jaas>
</authentication>
<identity>
<credentials-factory class-name=
"com.hazelcast.examples.MyCredentialsFactory">
<properties>
<property name="property">value</property>
</properties>
</credentials-factory>
</identity>
</realm>
</realms>
<member-authentication realm="realm1"/>
<client-authentication realm="realm1"/>
</security>
There are several types of authentication configuration available in a security realm. The realm
cannot have more than one authentication method specified.
• tls: verifies that the TLS mutual authentication was used in the incoming connection and the
peer’s certificate chain is available.
The <jaas> authentication configuration is the most flexible form of the authentication
configuration, but it requires knowledge of JAAS login modules and related concepts. It allows
using custom login modules and ordering them in a login module stack.
The following is a sample configuration which authenticates against an LDAP server or database as
a fallback:
<realm name="jaasRealm">
<authentication>
<jaas>
<login-module class-name="com.examples.LdapLoginModule" usage="SUFFICIENT">
<properties>
<property name="url">ldap://corp-ldap/</property>
</properties>
</login-module>
<login-module class-name="com.examples.DatabaseLoginModule" usage=
"SUFFICIENT">
<properties>
<property name="type">ora18</property>
<property name="host">corp-db</property>
<property name="table">USERS</property>
</properties>
</login-module>
</jaas>
</authentication>
</realm>
LDAP servers are one of the most popular identity stores in companies. They can track information
about the organization structure, users, groups, servers and configurations.
Hazelcast supports authentication and authorization against LDAP servers. The authentication
verifies the provided name and password. The authorization part maps roles to the
authenticated user.
• making a new LDAP bind operation with the given name and password
• using a separate "admin connection" to verify the provided password against an LDAP object
attribute.
The LDAP authentication also allows role mapping. As there are several ways in which roles can be
mapped in LDAP, Hazelcast provides several approaches to retrieve them:
• attribute: The role name is stored as an attribute in the object representing the identity.
• direct mapping: The identity object contains an attribute with reference to the role object(s).
• reverse mapping: The role objects having a reference to the identity object are searched.
The direct and reverse mapping modes also allow a role search recursion.
• url: URL of the LDAP server. The value is configured as the JNDI environment property, i.e.,
java.naming.provider.url.
• socket-factory-class-name: Socket factory class name. The factory can be used for fine-grained
configuration of the TLS protocol on top of the LDAP protocol, i.e., the ldaps scheme.
• parse-dn: If set to true, it treats the value of role-mapping-attribute as a DN and extracts only the
role-name-attribute values as role names. If set to false, the whole value of role-mapping-attribute
is used as a role name. Its default value is false.
• role-mapping-mode: Role mapping mode. Its default value is attribute. It can have one of the
following values:
• system-user-dn: Admin account DN. If configured, then the following are true:
◦ For the user and role object search queries, an admin connection is used instead of the "user"
one created by the LDAP bind with the provided credentials.
• user-search-scope: LDAP search scope used for the user-filter search. Its default value is subtree.
It can have one of the following values:
Detailed logging for LDAP authentication can be enabled by configuring a more verbose logger
level for the com.hazelcast.security package as described in the Security Debugging section.
The LDAP authentication type supports TLS-protected connections to LDAP servers, i.e., the ldaps
protocol scheme. TLS is handled on the Java runtime side (JNDI API and URL handlers).
When using TLS, the LDAP provider, by default, uses the javax.net.ssl.SSLSocketFactory socket
factory to create a TLS socket for communicating with the server, using the default JSSE
configuration. By default, the server’s certificate is validated against the default Java CA
certificate store and the hostname in the ldaps URL is verified against the name(s) in the server
certificate. This behavior can be controlled globally by using the javax.net.ssl.* properties. Here is
an example:
java -Djavax.net.ssl.trustStore=/opt/hazelcast.truststore \
-Djavax.net.ssl.trustStorePassword=123456 \
-Djavax.net.ssl.keyStore=/opt/hazelcast.keystore \
-Djavax.net.ssl.keyStorePassword=123456 \
...
There may also be vendor-specific or Java version-specific properties that allow more fine-grained
control. The following example disables hostname validation:
-Dcom.sun.jndi.ldap.object.disableEndpointIdentification=true
When even more control is necessary, you can implement your own SSLSocketFactory and use its
class name as the value in the ldap authentication option socket-factory-class-name.
package security.ldap;

import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.security.KeyStore;
import java.security.SecureRandom;

import javax.net.SocketFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

public class CustomSSLSocketFactory extends SSLSocketFactory {

    // Shared instance handed out to JNDI.
    private static final SocketFactory INSTANCE = new CustomSSLSocketFactory();

    // Factory built from the custom truststore; every call is forwarded to it.
    private final SSLSocketFactory delegate;

    /**
     * JNDI uses this method when creating {@code ldaps} connections.
     */
    public static SocketFactory getDefault() {
        return INSTANCE;
    }

    public CustomSSLSocketFactory() {
        try {
            // Load the truststore that holds the LDAP server's certificate(s).
            KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
            try (FileInputStream fis = new FileInputStream("/opt/ldap.truststore")) {
                trustStore.load(fis, "S3cr3t".toCharArray());
            }
            TrustManagerFactory tmFactory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmFactory.init(trustStore);
            // No key managers are needed: only the server side is authenticated.
            SSLContext sc = SSLContext.getInstance("TLS");
            sc.init(null, tmFactory.getTrustManagers(), new SecureRandom());
            delegate = sc.getSocketFactory();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String[] getDefaultCipherSuites() {
        return delegate.getDefaultCipherSuites();
    }

    @Override
    public String[] getSupportedCipherSuites() {
        return delegate.getSupportedCipherSuites();
    }

    @Override
    public Socket createSocket(Socket arg0, String arg1, int arg2, boolean arg3)
            throws IOException {
        return delegate.createSocket(arg0, arg1, arg2, arg3);
    }

    @Override
    public Socket createSocket(String arg0, int arg1) throws IOException {
        return delegate.createSocket(arg0, arg1);
    }

    @Override
    public Socket createSocket(InetAddress arg0, int arg1) throws IOException {
        return delegate.createSocket(arg0, arg1);
    }

    @Override
    public Socket createSocket(String arg0, int arg1, InetAddress arg2, int arg3)
            throws IOException {
        return delegate.createSocket(arg0, arg1, arg2, arg3);
    }

    @Override
    public Socket createSocket(InetAddress arg0, int arg1, InetAddress arg2, int arg3)
            throws IOException {
        return delegate.createSocket(arg0, arg1, arg2, arg3);
    }
}
<realm name="ldapsRealm">
<authentication>
<ldap>
<url>ldaps://ldapserver.acme.com</url>
<socket-factory-class-name>security.ldap.CustomSSLSocketFactory</socket-factory-class-name>
<role-mapping-attribute>cn</role-mapping-attribute>
</ldap>
</authentication>
</realm>
The LDAP authentication is backed by the JNDI API in Java. It also supports failover: you can
configure multiple space-separated URLs in the <url> option:
<realm name="ldapFallbackRealm">
<authentication>
<ldap>
<url>ldap://ldap-master.example.com ldap://ldap-backup.example.com</url>
</ldap>
</authentication>
</realm>
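Since the LDAP authentication is backed by JNDI, the failover setup above corresponds to a JNDI environment whose provider URL holds several space-separated entries, which the JNDI LDAP provider tries in order. The following is a minimal sketch of such an environment. The LdapFailoverEnv class is hypothetical, and how Hazelcast builds its environment internally is not shown here.

```java
import java.util.Hashtable;
import javax.naming.Context;

final class LdapFailoverEnv {
    // Builds a JNDI environment with one or more LDAP URLs for failover.
    static Hashtable<String, String> build(String... urls) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // java.naming.provider.url accepts a space-separated list of URLs.
        env.put(Context.PROVIDER_URL, String.join(" ", urls));
        return env;
    }
}
```

Passing the result of build("ldap://ldap-master.example.com", "ldap://ldap-backup.example.com") to new javax.naming.directory.InitialDirContext(env) would attempt the connection, so that call is left out of the sketch.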
Hazelcast is able to protect network communication using TLS. TLS mutual authentication is
also supported. This means that not only does the server side identify itself to the client side
(member, client, REST client, etc.), but the client side also needs to prove its identity by using a
TLS (X.509) certificate.
The tls authentication type verifies within the JAAS authentication that the incoming connection
already authenticated the client’s TLS certificate. A ClusterIdentityPrincipal uses the subject DN
(distinguished name) from the client’s TLS certificate.
This authentication type is able to parse a role name from the client’s certificate subject DN. The
<tls> element has an attribute, roleAttribute, which specifies a part of DN to be used as a role
name.
<realm name="tlsRealm">
<authentication>
<tls roleAttribute="cn" />
</authentication>
</realm>
This tls authentication uses the cn attribute from the subject DN as the role name. For instance, if
the subject DN in the certificate is cn=admin,ou=Devs,o=Hazelcast, then the following Principals are
added:
• ClusterIdentityPrincipal: CN=admin,OU=Devs,O=Hazelcast
• ClusterRolePrincipal: admin
• ClusterEndpointPrincipal: [remote address of the connecting party]
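How a role name can be taken from a subject DN is easy to illustrate with the JDK's own LDAP name parser. The sketch below is not the Hazelcast implementation; DnRoleSketch and roleFrom are hypothetical names, and it simply returns the value of the first matching attribute, starting from the most specific part of the DN.

```java
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

final class DnRoleSketch {
    // Returns the value of the given attribute (e.g. "cn") in the DN, or null if absent.
    static String roleFrom(String subjectDn, String roleAttribute) {
        List<Rdn> rdns;
        try {
            rdns = new LdapName(subjectDn).getRdns();
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid DN: " + subjectDn, e);
        }
        // getRdns() lists the right-most RDN first, so walk backwards
        // to visit the most specific (left-most) RDN first.
        for (int i = rdns.size() - 1; i >= 0; i--) {
            Rdn rdn = rdns.get(i);
            if (rdn.getType().equalsIgnoreCase(roleAttribute)) {
                return String.valueOf(rdn.getValue());
            }
        }
        return null;
    }
}
```

For the DN used in this section, roleFrom("cn=admin,ou=Devs,o=Hazelcast", "cn") yields admin, which matches the ClusterRolePrincipal listed above.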
The Identity configuration allows defining your own Credentials. These Credentials are used to
authenticate to other systems.
• credentials-factory: Configures the factory class which creates the Credentials objects.
Credentials
One of the key elements in Hazelcast security is the Credentials object, which represents evidence
of the identity (member or client). The content of Credentials object is verified during the
authentication. Credentials is an interface which extends Serializable.
There are two subtype interfaces which simplify the Credentials usage. The subtypes reflect data
provided in the client authentication messages:
The Credentials instance can be retrieved in the login modules by handling a CredentialsCallback.
Here is an example:
CredentialsCallback credcb = new CredentialsCallback();
try {
callbackHandler.handle(new Callback[] { credcb });
} catch (IOException | UnsupportedCallbackException e) {
throw new LoginException("Unable to retrieve credentials");
}
Credentials credentials = credcb.getCredentials();
if (credentials instanceof PasswordCredentials) {
PasswordCredentials passwordCredentials = (PasswordCredentials) credentials;
if (expectedName.equals(credentials.getName())
&& expectedPassword.equals(passwordCredentials.getPassword())) {
name = credentials.getName();
addRole(name);
return true;
}
}
throw new FailedLoginException("Credentials verification failed.");
Password Credentials
<realms>
<realm name="passwordRealm">
<identity>
<username-password username="member1" password="s3crEt" />
</identity>
</realm>
</realms>
<member-authentication realm="passwordRealm" />
Token Credentials
TokenCredentials instances can also be configured easily for identity representation. The <token/>
XML configuration element allows using either plain ASCII tokens or Base64-encoded values. Its
optional encoding attribute can have either base64 or none (default) as its value.
The following two realms define the same token value - bytes of the "Hazelcast" string:
<realm name="tokenRealm1">
<identity>
<token>Hazelcast</token>
</identity>
</realm>
<realm name="tokenRealm2">
<identity>
<token encoding="base64">SGF6ZWxjYXN0</token>
</identity>
</realm>
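The equivalence of the two realms above can be verified with the JDK's Base64 decoder. The snippet below is a small sketch; the TokenEncodingSketch class and its tokenBytes method are hypothetical and only mimic the two encoding values named in this section.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class TokenEncodingSketch {
    // Converts a configured token value to bytes according to its encoding.
    static byte[] tokenBytes(String value, String encoding) {
        if ("base64".equals(encoding)) {
            return Base64.getDecoder().decode(value);
        }
        // "none": the token is taken as plain ASCII text.
        return value.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] plain = tokenBytes("Hazelcast", "none");
        byte[] decoded = tokenBytes("SGF6ZWxjYXN0", "base64");
        System.out.println(Arrays.equals(plain, decoded)); // prints "true"
    }
}
```

Base64 encoding is useful when the token contains bytes that cannot be written as plain text in the XML file.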
Credentials Factory
The most flexible way to define the Credentials objects is using a custom credential factory. It is an
implementation of com.hazelcast.security.ICredentialsFactory interface. Its newCredentials()
method is the one which provides credentials.
The XML configuration uses <credentials-factory> element to define the factory class.
The behavior of credential factories can be controlled by specifying factory properties. The
properties are provided in the init(Properties) method.
<realm name="credentialsFactoryRealm">
<identity>
<credentials-factory class-name="com.examples.TOTPCredentialsFactory">
<properties>
<property name="seed">3132333435363738393031323334353637383930</property>
</properties>
</credentials-factory>
</identity>
</realm>
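To show how a factory property such as seed might be consumed, the following sketch reads it in a method shaped like init(Properties) and decodes it from hexadecimal. It is an assumption that the sample seed is hex-encoded; TOTPCredentialsFactory is only a class name from the sample configuration, and SeedPropertySketch below is a hypothetical stand-alone class that does not implement the Hazelcast interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class SeedPropertySketch {
    private byte[] seed = new byte[0];

    // Mirrors the shape of init(Properties); reads and hex-decodes the "seed" property.
    void init(Properties properties) {
        // trim() tolerates a value that is wrapped onto its own line in the XML file.
        String hex = properties.getProperty("seed", "").trim();
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex seed must have an even length");
        }
        byte[] decoded = new byte[hex.length() / 2];
        for (int i = 0; i < decoded.length; i++) {
            decoded[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        seed = decoded;
    }

    // Returns the decoded seed interpreted as ASCII text.
    String seedAsAscii() {
        return new String(seed, StandardCharsets.US_ASCII);
    }
}
```

Under that assumption, the sample value decodes to the 20-character ASCII string 12345678901234567890. A real factory would use the decoded bytes when creating the Credentials object in newCredentials().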
Hazelcast works with the following JAAS Principal implementations added to the Subject:
• ClusterIdentityPrincipal: Represents the name of authenticated party (usually one instance in
the Subject).
• ClusterRolePrincipal: Represents the role assigned to the authenticated party (usually zero or
more instances in the Subject).
Set<HazelcastPrincipal> hazelcastPrincipals =
subject.getPrincipals(HazelcastPrincipal.class);
JAAS Callback instances are used for accessing different kinds of data from the LoginModule
implementations. Hazelcast supports the following Callback types:
The callbacks are usually used in the login() method of a login module:
CredentialsCallback credcb = new CredentialsCallback();
ConfigCallback ccb = new ConfigCallback();
ClusterNameCallback cncb = new ClusterNameCallback();
try {
callbackHandler.handle(new Callback[] { credcb, ccb, cncb });
} catch (IOException | UnsupportedCallbackException e) {
throw new LoginException("Unable to retrieve necessary data");
}
Credentials remoteCredentials = credcb.getCredentials();
String remoteClusterName = cncb.getClusterName();
Config hazelcastConfig = ccb.getConfig();
21.10.3. ClusterLoginModule
Hazelcast has an abstract implementation of LoginModule that contains shared logic and cleanup
operations. It automatically creates the ClusterEndpointPrincipal instance and it also provides the
addRole(String) method which simplifies adding the ClusterRolePrincipal instances.
ClusterLoginModule implements all methods from the LoginModule interface and makes them final. It
provides protected methods with empty implementations, e.g., onCommit(), which you can override to
adapt the logic to your needs. The module also comes with the following abstract methods:
We recommend extending ClusterLoginModule instead of implementing everything required
from scratch.
public abstract class ClusterLoginModule implements LoginModule {
ClusterLoginModule supports a basic set of login module options, which allow skipping the addition
of principals of a given type to the JAAS Subject. This makes it possible, for instance, to have just
one ClusterIdentityPrincipal in the Subject even if there are more login modules in the chain:
Using the above API, you can implement a LoginModule that performs authentication against the
security system of your choice, such as databases, directory services or some other corporate
standard you might have. For example, you may wish to have your clients send an identification token
in the Credentials object. This token can then be sent to your backend security system via the
LoginModule that runs on the cluster side.
Additionally, the same system may authenticate the user and also then return the roles that are
attributed to the user. These roles can then be used for data structure authorization.
21.11. Cluster Member Security
Hazelcast IMDG Enterprise Feature
Hazelcast supports the standard Java Security (JAAS) based authentication between the cluster
members. A Security Realm can be referenced by the <member-authentication/> element to define how
members authenticate each other and which identity the current member presents.
<hazelcast>
...
<security enabled="true">
<realms>
<realm name="memberRealm">
<authentication>
<ldap>
<url>ldap://corp-ldap.example.com/</url>
</ldap>
</authentication>
<identity>
<username-password username="uid=member1,dc=example,dc=com" password=
"s3crEt"/>
</identity>
</realm>
</realms>
<member-authentication realm="memberRealm"/>
</security>
...
</hazelcast>
21.12.1. Authentication
The client authentication mechanism works in a similar way to the cluster member authentication.
To implement the client authentication, you reference a Security Realm with the authentication
section defined in the <client-authentication/> element of a cluster member configuration.
<hazelcast>
...
<security enabled="true">
<realms>
<realm name="clientRealm">
<authentication>
<ldap>
<url>ldap://corp-ldap.example.com/</url>
<role-mapping-attribute>cn</role-mapping-attribute>
</ldap>
</authentication>
</realm>
</realms>
<client-authentication realm="clientRealm"/>
</security>
...
</hazelcast>
The identity of the connecting client is defined on the client side. There are no security realms on
the clients, but just identity defined directly in the security configuration:
<hazelcast-client>
...
<security>
<username-password username="uid=member1,dc=example,dc=com" password="s3crEt"/>
</security>
...
</hazelcast-client>
On the clients, you can use the same identity types as in security realms:
• username-password
• token
• credentials-factory
21.12.2. Authorization
Hazelcast client authorization is configured by a client permission policy. Hazelcast has a default
permission policy implementation that uses permission configurations defined in the Hazelcast
security configuration. Default policy permission checks are done against instance types (map,
queue, etc.), instance names (map, queue, name, etc.), instance actions (put, read, remove, add, etc.),
the client endpoint address (ClusterEndpointPrincipal) and client roles (ClusterRolePrincipal).
The default permission policy allows using comma-separated names in the principal attribute
configuration. Instance names, principal names and endpoint addresses can be defined using
wildcards (*). See the Network Configuration and Using Wildcards sections.
<hazelcast>
...
<security enabled="true">
<client-permissions>
<!-- Principals 'admin' and 'root' from endpoint '127.0.0.1' have all
permissions. -->
<all-permissions principal="admin,root">
<endpoints>
<endpoint>127.0.0.1</endpoint>
</endpoints>
</all-permissions>
<!-- Principals named 'dev' from all endpoints have 'create', 'destroy',
'put', 'read' permissions for map named 'myMap'. -->
<map-permission name="myMap" principal="dev">
<actions>
<action>create</action>
<action>destroy</action>
<action>put</action>
<action>read</action>
</actions>
</map-permission>
</queue-permission>
package com.hazelcast.security;
/**
* IPermissionPolicy is used to determine any Subject's
* permissions to perform a security sensitive Hazelcast operation.
*
*/
public interface IPermissionPolicy {
void configure( SecurityConfig securityConfig, Properties properties );
void destroy();
}
Permission policy implementations can access client-permissions that are in the configuration by
using SecurityConfig.getClientPermissionConfigs() when Hazelcast calls the
configure(SecurityConfig securityConfig, Properties properties) method.
The permission policy should return a PermissionCollection containing the permissions of the given
type for the given Subject. The Hazelcast access controller calls
PermissionCollection.implies(Permission) on the returned PermissionCollection to decide
whether the current Subject has permission to access the requested resources.
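The implies check itself is plain java.security behavior and can be tried without Hazelcast. The sketch below uses a hypothetical NamedPermission type and made-up resource names purely for illustration; Hazelcast's own permission classes are not shown here.

```java
import java.security.BasicPermission;
import java.security.PermissionCollection;
import java.security.Permissions;

final class ImpliesSketch {
    // Hypothetical permission type used only for this illustration.
    static final class NamedPermission extends BasicPermission {
        private static final long serialVersionUID = 1L;

        NamedPermission(String name) {
            super(name);
        }
    }

    // The collection a permission policy might return for some Subject.
    static PermissionCollection granted() {
        Permissions granted = new Permissions();
        granted.add(new NamedPermission("map.myMap"));
        granted.add(new NamedPermission("queue.*")); // BasicPermission wildcard
        return granted;
    }

    // The access decision: does the granted collection imply the requested permission?
    static boolean allowed(String resource) {
        return granted().implies(new NamedPermission(resource));
    }
}
```

Here allowed("map.myMap") and allowed("queue.orders") return true, while allowed("map.other") returns false because no granted permission implies it.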
21.12.3. Permissions
The following is the list of client permissions that can be configured on the member:
• All Permission
<all-permissions principal="principal">
<endpoints>
...
</endpoints>
</all-permissions>
• Map Permission
Actions: all, create, destroy, put, read, remove, lock, intercept, index, listen
• Queue Permission
• Multimap Permission
• Topic Permission
<topic-permission name="name" principal="principal">
<endpoints>
...
</endpoints>
<actions>
...
</actions>
</topic-permission>
• List Permission
• Set Permission
• Lock Permission
<lock-permission name="name" principal="principal">
<endpoints>
...
</endpoints>
<actions>
...
</actions>
</lock-permission>
• AtomicLong Permission
• CountDownLatch Permission
• IdGenerator Permission
<id-generator-permission name="name" principal="principal">
<endpoints>
...
</endpoints>
<actions>
...
</actions>
</id-generator-permission>
• FlakeIdGenerator Permission
• Semaphore Permission
<executor-service-permission name="name" principal="principal">
<endpoints>
...
</endpoints>
<actions>
...
</actions>
</executor-service-permission>
• Transaction Permission
<transaction-permission principal="principal">
<endpoints>
...
</endpoints>
</transaction-permission>
• Cache Permission
<user-code-deployment-permission principal="principal">
<endpoints>
...
</endpoints>
<actions>
...
</actions>
</user-code-deployment-permission>
The name provided in cache-permission must be the Hazelcast distributed object
name corresponding to the Cache as described in the JCache - Hazelcast Instance
Integration section.
By default, the set of permissions defined in the leader member of a cluster is distributed to the
newly joining members, overriding their own permission configurations, if any. However, you can
configure a joining member so that it keeps its own set of permissions and even sends
them to the existing members in the cluster. This can be done dynamically, i.e., without
restarting the cluster, using either declarative or programmatic configuration.
With this setting, you choose whether a new member joining a cluster applies the client
permissions stored in its own configuration or uses the ones defined in the cluster. The behaviors
that you can specify are RECEIVE, SEND and NONE, which are described after the
examples below.
The following are examples of both approaches:
Declarative Configuration:
<hazelcast>
...
<security enabled="true">
<client-permissions on-join-operation="SEND">
<!-- ... -->
</client-permissions>
</security>
...
</hazelcast>
Programmatic Configuration:
• RECEIVE: Applies the permissions from the leader member in the cluster before join. This is the
default value.
• SEND: Doesn’t apply the permissions from the leader member before join. If the security is
enabled, then it refreshes or replaces the cluster wide permissions with the ones in the new
member after the join is complete. This option is suitable for the scenarios where you need to
replace the cluster wide permissions without restarting the cluster.
• NONE: Neither applies pre-join permissions, nor sends the local permissions to the other
members. It means that the new member does not send its own permission definitions to the
cluster, but keeps them when it joins. However, after the join, when you update the permissions
in the other cluster members, those updates are also sent to the newly joining member.
Therefore, this option is suitable for the scenarios where you need to elevate privileges
temporarily on a single member (preferably a lite member) for a limited time period. The
clients which want to use these temporary permissions have to access the cluster through this
single new member, meaning that you need to disable smart routing for such clients.
Note that the create and destroy permissions do not work with the NONE option, since
distributed objects need to be created/destroyed on all the members.
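The three behaviors can be summarized as a small decision table. The sketch below is a plain Java
model of the descriptions above, not the Hazelcast API: OnJoinOperation, Outcome and join() are
hypothetical names, and permissions are reduced to strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OnJoinSketch {
    enum OnJoinOperation { RECEIVE, SEND, NONE }

    // Permissions held by the joining member and by the existing members after the join.
    static final class Outcome {
        final Set<String> joiningMember;
        final Set<String> existingMembers;

        Outcome(Set<String> joiningMember, Set<String> existingMembers) {
            this.joiningMember = joiningMember;
            this.existingMembers = existingMembers;
        }
    }

    static Outcome join(OnJoinOperation operation, Set<String> local, Set<String> cluster) {
        switch (operation) {
            case RECEIVE:
                return new Outcome(cluster, cluster); // member adopts the cluster's set
            case SEND:
                return new Outcome(local, local);     // member's set replaces the cluster's
            default: // NONE
                return new Outcome(local, cluster);   // both sides keep what they had
        }
    }

    public static void main(String[] args) {
        Set<String> local = new HashSet<>(Arrays.asList("map:read", "map:put"));
        Set<String> cluster = new HashSet<>(Arrays.asList("map:read"));
        for (OnJoinOperation operation : OnJoinOperation.values()) {
            Outcome outcome = join(operation, local, cluster);
            System.out.println(operation + ": member=" + outcome.joiningMember
                    + ", cluster=" + outcome.existingMembers);
        }
    }
}
```

The model covers only the state right after the join; later permission updates made on the other
members still reach a member that joined with NONE, as described above.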
<Configuration>
<Loggers>
<Logger name="com.hazelcast.security" level="ALL"/>
</Loggers>
</Configuration>
Java can print debug information about the use of its security components. When
troubleshooting security issues, it is often helpful to print this additional information by using the
following system property:
-Djava.security.debug=all
To assist with the TLS/SSL issues, you can use the following system property:
-Djavax.net.debug=all
This property produces a lot of logging output, including the TLS/SSL handshake, which can be used to
determine the cause of the problem. See the Debugging TLS/SSL Connections guide for more
information.
The NIST manages a list of FIPS certified cryptographic modules. These modules are certified under
the Cryptographic Module Validation Program. The list can be searched online here.
Hazelcast uses external modules for cryptographic tasks and can be configured to use a FIPS 140-2
validated module. This means most of the configuration required for FIPS lies outside of the Hazelcast
configuration. To run Hazelcast in the FIPS compliant mode, you have to set the underlying Java
runtime into FIPS mode. It may also require switching the underlying operating system into
FIPS mode. We recommend using a FIPS enabled OS even when it is not explicitly required.
Documenting how to switch each Java runtime into FIPS mode is outside the scope of Hazelcast.
Consult the documentation of your Java version to learn how to enable the FIPS
mode. Usually it means changing the list of security providers in the java.security
JRE configuration file.
Hazelcast is only responsible for switching the native OpenSSL library into FIPS mode (see the
Integrating OpenSSL section). If the Hazelcast cluster configuration enables TLS communication
using the native OpenSSL library, you have to enable its FIPS mode in the Hazelcast
OpenSSLEngineFactory configuration. The FIPS mode is controlled by an optional boolean property
called fipsMode, which is disabled by default.
<hazelcast>
...
<network>
<ssl enabled="true">
<factory-class-name>com.hazelcast.nio.ssl.OpenSSLEngineFactory</factory-class-name>
<properties>
<property name="fipsMode">true</property>
<property name="protocol">TLSv1.2</property>
<property name="trustCertCollectionFile">trusted-certs.pem</property>
<property name="keyFile">privkey.pem</property>
<property name="keyCertChainFile">chain.pem</property>
</properties>
</ssl>
</network>
...
</hazelcast>
When the fipsMode property is set to true, the native OpenSSL engine is either switched to FIPS mode
or an exception is thrown, e.g., when OpenSSL is compiled without FIPS support.
If multiple Hazelcast instances (members or clients) use TLS with OpenSSL,
all of them must have the fipsMode property configured in the same way, either enabled or
disabled.
When the FIPS mode is successfully enabled, you will see the following INFO level message in the log
files:
The FIPS environment configuration steps depend on the operating system and Java version in use.
Consult their documentation for the specific configurations.
We will describe a sample configuration which uses Red Hat Enterprise Linux (RHEL) version 7 and
IBM Java SDK 8. If you find any difference between the sample configuration described here and
the documentation of the OS and Java vendors, use the vendor’s up-to-date instructions instead.
The steps on how to configure RHEL 7 in FIPS 140-2 mode are described in the Security guide on the
Red Hat customer portal.
3. Add the fips=1 option to the kernel command line of the boot loader.
After finishing these steps, check if the FIPS mode is enabled by running the following command:
# The following command should print "crypto.fips_enabled = 1"
# (value 1 means the FIPS mode is enabled)
sysctl crypto.fips_enabled
To automate the FIPS mode enablement on RHEL 7, you can check the script which is shared in the
Red Hat discussion forum.
IBM Java 8 provides the FIPS mode itself without any third party dependencies.
Details on how to enable the FIPS 140-2 validated configuration can be found in the Security guide
in the Java 8 documentation.
First, edit the jre/lib/security/java.security file and make the following changes:
• Put IBMJCEFIPS as the first security provider. It will be the first provider to be selected when a
JCA API call is made without specifying an explicit security provider.
security.provider.1=com.ibm.crypto.fips.provider.IBMJCEFIPS
Then renumber the original security providers by increasing each provider's index by
one, i.e., the old security.provider.1 becomes security.provider.2 and so on.
• Add the new security properties (related to handling TLS protected communication):
ssl.SocketFactory.provider=com.ibm.jsse2.SSLSocketFactoryImpl
ssl.ServerSocketFactory.provider=com.ibm.jsse2.SSLServerSocketFactoryImpl
The security provider covering the TLS implementation in IBM Java is IBMJSSE2. To instruct this
provider to use the FIPS validated security primitives (from IBMJCEFIPS), set the following additional
system properties:
-Dcom.ibm.jsse2.usefipsprovider=true -Dcom.ibm.jsse2.usefipsProviderName=IBMJCEFIPS
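After editing java.security, it can be useful to confirm which provider the running JVM selects
first. The following is a minimal sketch that uses only the standard java.security.Security API;
the class name ProviderOrderSketch is illustrative, and the output depends on your JVM.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

class ProviderOrderSketch {

    // Returns the installed provider names in preference order (most preferred first).
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Provider provider : Security.getProviders()) {
            names.add(provider.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = providerNames();
        for (int i = 0; i < names.size(); i++) {
            System.out.println((i + 1) + ". " + names.get(i));
        }
    }
}
```

On a JVM configured as described above, you would expect IBMJCEFIPS to be the first entry.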
22. Performance
This chapter provides information on the performance features of Hazelcast including near cache,
slow operations detector, back pressure and data affinity. Moreover, the chapter describes the best
performance practices for Hazelcast deployed on Amazon EC2. It also describes the threading
models for I/O, events, executors and operations.
22.1. Pipelining
With pipelining, you can send multiple requests in parallel using a single thread and therefore
increase throughput. As an example, suppose that the round trip time for a request/response is
1 millisecond. If synchronous requests are used, e.g., IMap.get(), then the maximum throughput
of these requests from a single thread is 1/0.001 = 1000 operations/second. One way to solve this
problem is to introduce multithreading to make the requests in parallel. For the same example, if
we use 2 threads, then the maximum throughput doubles from 1000 operations/second to
2000 operations/second.
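The arithmetic above generalizes to any number of requests kept in flight at the same time. A
minimal sketch, assuming every request takes exactly one round trip and nothing else limits the
system (ThroughputSketch and maxThroughput() are illustrative names):

```java
class ThroughputSketch {

    // Upper bound on operations/second when `concurrentRequests` requests are always
    // in flight and each one takes `roundTripMillis` to complete.
    static double maxThroughput(int concurrentRequests, double roundTripMillis) {
        return concurrentRequests * (1000.0 / roundTripMillis);
    }

    public static void main(String[] args) {
        System.out.println("1 in flight:  " + maxThroughput(1, 1.0) + " ops/s");
        System.out.println("2 in flight:  " + maxThroughput(2, 1.0) + " ops/s");
        System.out.println("10 in flight: " + maxThroughput(10, 1.0) + " ops/s");
    }
}
```

The bound depends only on how many requests are outstanding, not on how many threads issue them.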
However, introducing threads for the sake of executing requests isn’t always convenient and
doesn’t always lead to optimal performance; this is where pipelining can be used. Instead of
using multiple threads to have concurrent invocations, you can use asynchronous method calls
such as IMap.getAsync(). If you make 2 asynchronous calls from a single thread, then the
maximum throughput is 2*(1/0.001) = 2000 operations/second. Therefore, to benefit from
pipelining, asynchronous calls need to be made from a single thread. The pipelining is a
convenience implementation that provides back pressure, i.e., it controls the number of inflight
operations, and it provides a convenient way to wait for all the results.
In the above example, we make 100 asynchronous map.getAsync() calls, but the maximum number
of inflight calls is 10.
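The depth limit described here can be modeled with the JDK alone. The sketch below is not the
Hazelcast pipelining class: BoundedPipeline is a hypothetical stand-in built on a Semaphore and
CompletableFuture, and the "remote" calls are simulated on a local thread pool. It issues 100
asynchronous calls from a single thread while never allowing more than 10 of them in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class BoundedPipelineSketch {

    // Hypothetical stand-in for a pipeline: at most `depth` calls are in flight.
    static final class BoundedPipeline<T> {
        private final Semaphore permits;
        private final List<CompletableFuture<T>> futures = new ArrayList<>();

        BoundedPipeline(int depth) {
            this.permits = new Semaphore(depth);
        }

        // Blocks the caller until a slot is free, then starts the asynchronous call.
        void add(Supplier<CompletableFuture<T>> asyncCall) {
            permits.acquireUninterruptibly();
            CompletableFuture<T> future = asyncCall.get();
            future.whenComplete((value, error) -> permits.release());
            futures.add(future);
        }

        // Waits for every call and returns the results in submission order.
        List<T> results() {
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> future : futures) {
                results.add(future.join());
            }
            return results;
        }
    }

    // Sends `calls` simulated requests through a pipeline of the given depth and returns
    // {number of results, highest number of in-flight calls observed}.
    static int[] run(int calls, int depth) {
        ExecutorService remote = Executors.newFixedThreadPool(4);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        try {
            BoundedPipeline<Integer> pipeline = new BoundedPipeline<>(depth);
            for (int i = 0; i < calls; i++) {
                final int key = i;
                pipeline.add(() -> {
                    maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    return CompletableFuture.supplyAsync(() -> {
                        inFlight.decrementAndGet();
                        return key * 2; // pretend this is the value read for `key`
                    }, remote);
                });
            }
            return new int[] {pipeline.results().size(), maxInFlight.get()};
        } finally {
            remote.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] outcome = run(100, 10);
        System.out.println("results=" + outcome[0] + ", max in flight=" + outcome[1]);
    }
}
```

The caller never spawns threads of its own; the semaphore alone decides when the next call may
start, which is the back pressure the section refers to.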
By increasing the depth of the pipelining, throughput can be increased. The pipelining has its own
back pressure; you do not need to enable the back pressure on the client or member to have this
feature on the pipelining. However, if you have many pipelines, you may still need to enable the
client/member back pressure because it is possible to overwhelm the system with requests in that
situation. See the Back Pressure section to learn how to enable it on the client or member.
You can use the pipelining on both clients and members. It needs no special
configuration and works out of the box.
The pipelining can be used for any asynchronous call. You can use it for IMap asynchronous get/put
methods as well as for ICache, IAtomicLong, etc. It cannot be used as a transaction mechanism
though. So you cannot do some calls and throw away the pipeline and expect that none of the
requests are executed. If you want to use an atomic behavior, see the Transactions chapter. The
pipelining is just a performance optimization, not a mechanism for atomic behavior.
The pipelines are cheap and should be replaced frequently because they accumulate results. It is
fine to have a few hundred or even a few thousand calls being processed with the pipelining.
However, all the responses to all requests are stored in the pipeline as long as the pipeline is
referenced. So if you want to process a huge number of requests, wait for the pipelining results
every few hundred or few thousand calls and then create a new instance.
Note that the pipelines are not thread-safe. They must be used by a single thread.
22.2.1. PartitionAware
Hazelcast has a standard way of finding out which member owns/manages each key object. The
following operations are routed to the same member, since all of them are operating based on the
same key "key1".
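HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
Map<String, String> mapA = hazelcastInstance.getMap( "mapA" );
Map<String, String> mapB = hazelcastInstance.getMap( "mapB" );
Map<String, String> mapC = hazelcastInstance.getMap( "mapC" );
// Different maps, so different entries, but the same member handles all three calls.
mapA.put( "key1", "value1" );
mapB.get( "key1" );
mapC.remove( "key1" );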
When the keys are the same, entries are stored on the same member. But we sometimes want to
have related entries stored on the same member, such as a customer and his/her order entries. We
would have a customers map with customerId as the key and an orders map with orderId as the
key. Since customerId and orderId are different keys, a customer and his/her orders may fall into
different members in your cluster. So how can we have them stored on the same member? We
create an affinity between the customer and orders. If we make them part of the same partition, then
these entries will be co-located. We achieve this by making the OrderKey class PartitionAware.
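final class OrderKey implements PartitionAware, Serializable {
    private final long orderId;
    private final long customerId;

    OrderKey(long orderId, long customerId) {
        this.orderId = orderId;
        this.customerId = customerId;
    }

    @Override
    public Object getPartitionKey() {
        return customerId;
    }

    @Override
    public String toString() {
        return "OrderKey{"
                + "orderId=" + orderId
                + ", customerId=" + customerId
                + '}';
    }
}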
Notice that OrderKey implements PartitionAware and that getPartitionKey() returns the customerId.
These make sure that the Customer entry and its Orders are stored on the same member.
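The effect of getPartitionKey() can be illustrated without a cluster. The sketch below is a
simplified model: HasPartitionKey is a hypothetical stand-in for PartitionAware, and
Math.floorMod(hashCode, 271) replaces the real lookup, which hashes the serialized key. Only the
principle carries over: the partition is derived from the partition key, not from the whole key.

```java
class AffinitySketch {
    static final int PARTITION_COUNT = 271;

    // Hypothetical stand-in for Hazelcast's PartitionAware interface.
    interface HasPartitionKey {
        Object getPartitionKey();
    }

    static final class OrderKey implements HasPartitionKey {
        final long orderId;
        final long customerId;

        OrderKey(long orderId, long customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }

        @Override
        public Object getPartitionKey() {
            return customerId;
        }
    }

    // Simplified lookup: use the partition key if the key provides one, then hash it.
    static int partitionId(Object key) {
        Object partitionKey = key instanceof HasPartitionKey
                ? ((HasPartitionKey) key).getPartitionKey()
                : key;
        return Math.floorMod(partitionKey.hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        long customerId = 42L;
        System.out.println("customer partition: " + partitionId(customerId));
        System.out.println("order 1001 partition: " + partitionId(new OrderKey(1001L, customerId)));
        System.out.println("order 1002 partition: " + partitionId(new OrderKey(1002L, customerId)));
    }
}
```

Because every OrderKey of a customer resolves to the customerId, all three lookups land in the
same partition and therefore on the same member.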
Assume that you have a customers map where customerId is the key and the customer object is the
value. You want to remove one of the customer orders and return the number of remaining orders.
Here is how you would normally do it.
public static int removeOrder( long customerId, long orderId ) throws Exception {
    IMap<Long, Customer> mapCustomers = hazelcastInstance.getMap( "customers" );
    IMap<OrderKey, Order> mapOrders = hazelcastInstance.getMap( "orders" );
    mapCustomers.lock( customerId );
    try {
        mapOrders.remove( new OrderKey(orderId, customerId) );
        Set<OrderKey> orders = mapOrders.keySet( Predicates.equal( "customerId", customerId ) );
        return orders.size();
    } finally {
        mapCustomers.unlock( customerId );
    }
}
• There are four distributed operations there: lock, remove, keySet, unlock. Can you reduce the
number of distributed operations?
• The customer object may not be that big, but can you avoid passing that object over the
wire? Think about a scenario where you store the order count in the customer object for fast access:
you would need a get and a put, and as a result, the customer object is passed over the wire
twice.
Instead, why not move the computation over to the member (JVM) where your customer data
resides? Here is how you can do this with the distributed executor service.
1. Send a PartitionAware Callable task to the member that owns your customer key.
2. The Callable deletes the order right there and returns the remaining order count.
3. Upon completion of the Callable task, return the result (remaining order count). You do not
have to wait until the task is completed; since distributed executions are asynchronous, you can
do other things in the meantime.
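public static int removeOrder( long customerId, long orderId ) throws Exception {
    IExecutorService executorService = hazelcastInstance.getExecutorService( "ExecutorService" );
    OrderDeletionTask task = new OrderDeletionTask( customerId, orderId );
    Future<Integer> future = executorService.submit( task );
    int remainingOrders = future.get();
    return remainingOrders;
}

public static class OrderDeletionTask
        implements Callable<Integer>, PartitionAware, HazelcastInstanceAware, Serializable {
    private long orderId;
    private long customerId;
    private transient HazelcastInstance hazelcastInstance;

    public OrderDeletionTask() {
    }

    public OrderDeletionTask(long customerId, long orderId) {
        this.customerId = customerId;
        this.orderId = orderId;
    }

    @Override
    public Integer call() {
        IMap<Long, Customer> customerMap = hazelcastInstance.getMap("customers");
        IMap<OrderKey, Order> orderMap = hazelcastInstance.getMap("orders");
        customerMap.lock(customerId);
        int orderCount;
        try {
            orderMap.remove(new OrderKey(orderId, customerId));
            orderCount = orderMap.localKeySet(Predicates.equal("customerId", customerId)).size();
        } finally {
            customerMap.unlock(customerId);
        }
        return orderCount;
    }

    @Override
    public Object getPartitionKey() {
        return customerId;
    }

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.hazelcastInstance = hazelcastInstance;
    }
}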
The following are the benefits of doing the same operation with the distributed ExecutorService
based on the key:
• less data is sent over the wire
• shorter lock duration, i.e., higher concurrency, for the Customer entry, since the
lock/update/unlock cycle is done locally (local to the customer data)
22.2.2. PartitioningStrategy
Another way of storing related data in the same location is using/implementing the
PartitioningStrategy interface. Normally (if no partitioning strategy is defined), Hazelcast finds the
partition of a key first by converting the object to binary and then by hashing this binary. If a
partitioning strategy is defined, Hazelcast passes the key to the strategy and the strategy returns an
object from which the partition is calculated by hashing it.
• StringPartitioningStrategy: Works only for string keys. It uses the string after the @ character as
the partition key. For example, if you have two keys ordergroup1@region1 and customergroup1@region1,
both ordergroup1 and customergroup1 fall into the partition where region1 is located.
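The key handling described for string keys can be sketched in a few lines. This is an illustration
of the rule, not the Hazelcast class itself; in particular, splitting at the first @ and returning
the whole key when there is no @ are assumptions of this sketch.

```java
class StringPartitionKeySketch {

    // Returns the part of the key after the first '@', or the whole key if there is none.
    static String partitionKey(String key) {
        int at = key.indexOf('@');
        return at == -1 ? key : key.substring(at + 1);
    }

    public static void main(String[] args) {
        System.out.println(partitionKey("ordergroup1@region1"));
        System.out.println(partitionKey("customergroup1@region1"));
        System.out.println(partitionKey("standalone"));
    }
}
```

Both example keys yield the same partition key, so hashing that value places them in the same
partition.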
The following are example configuration snippets. Note that these strategy configurations are per
map.
Declarative Configuration:
<hazelcast>
...
<map name="name-of-the-map">
<partition-strategy>
com.hazelcast.partition.strategy.StringAndPartitionAwarePartitioningStrategy
</partition-strategy>
</map>
...
</hazelcast>
Programmatic Configuration:
Config config = new Config();
MapConfig mapConfig = config.getMapConfig("name-of-the-map");
PartitioningStrategyConfig psConfig = mapConfig.getPartitioningStrategyConfig();
psConfig.setPartitioningStrategyClass(
        "com.hazelcast.partition.strategy.StringAndPartitionAwarePartitioningStrategy" );
// OR
psConfig.setPartitioningStrategy( new YourCustomPartitioningStrategy() );
...
You can also define your own partitioning strategy by implementing the PartitioningStrategy interface.
To enable your implementation, add the full class name to your Hazelcast configuration using either
the declarative or programmatic approach, as shown above.
As stated previously, the above strategies are defined per map in your Hazelcast member. You can also
define a strategy per member in your Hazelcast cluster by setting the
hazelcast.partitioning.strategy.class system property. An example of configuring
this property declaratively is shown below:
<hazelcast>
...
<properties>
<property name="hazelcast.partitioning.strategy.class">
com.hazelcast.partition.strategy.StringAndPartitionAwarePartitioningStrategy
</property>
</properties>
...
</hazelcast>
You can also use other system property configuring options as explained in the Configuring with
System Properties section.
• Select at least a system with 8 vCPUs, e.g., c4.2xlarge. For an overview of all types of EC2
instances, please check this web page.
• Consider setting a placement group.
• Asynchronous calls: With async calls, the system may be flooded with the requests.
To prevent the system from crashing, Hazelcast provides back pressure. Back pressure works by:
Back pressure is disabled by default and you can enable it using the following system property:
hazelcast.backpressure.enabled
To control the number of concurrent invocations, you can configure the number of invocations
allowed per partition using the following system property:
hazelcast.backpressure.max.concurrent.invocations.per.partition
The default value of this system property is 100. With the default configuration, a system is allowed
to have (271 + 1) * 100 = 27200 concurrent invocations (271 partitions + 1 for generic operations).
Back pressure is only applied to normal operations. System operations like heartbeats and
partition migration operations are not influenced by back pressure. 27200 invocations might seem
like a lot, but keep in mind that executing a task on IExecutor or acquiring a lock also requires an
operation.
If the maximum number of invocations has been reached, Hazelcast automatically applies an
exponential backoff policy. This gives the system some time to deal with the load. Using the
following system property, you can configure the maximum time to wait before a
HazelcastOverloadException is thrown:
hazelcast.backpressure.backoff.timeout.millis
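The interplay of the invocation limit, the exponential backoff and the timeout can be modeled as
follows. This is a sketch under stated assumptions, not Hazelcast's implementation: the delays of
1, 2, 4, ... milliseconds are chosen for illustration, and IllegalStateException stands in for
HazelcastOverloadException.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BackPressureSketch {
    private final int limit;
    private final long backoffTimeoutMillis;
    private final AtomicInteger inFlight = new AtomicInteger();

    BackPressureSketch(int limit, long backoffTimeoutMillis) {
        this.limit = limit;
        this.backoffTimeoutMillis = backoffTimeoutMillis;
    }

    // (partitions + 1 slot for generic operations) * invocations allowed per partition
    static int maxConcurrentInvocations(int partitionCount, int perPartition) {
        return (partitionCount + 1) * perPartition;
    }

    // Reserves a slot for one invocation. While the limit is reached, it sleeps
    // 1, 2, 4, ... ms between attempts and gives up once the timeout is used up.
    void acquire() {
        long waitedMillis = 0;
        long delayMillis = 1;
        while (true) {
            int current = inFlight.get();
            if (current < limit) {
                if (inFlight.compareAndSet(current, current + 1)) {
                    return;
                }
                continue; // another thread changed the counter, so retry at once
            }
            if (waitedMillis >= backoffTimeoutMillis) {
                // Stand-in for HazelcastOverloadException
                throw new IllegalStateException("overloaded: " + limit + " invocations in flight");
            }
            long sleepMillis = Math.min(delayMillis, backoffTimeoutMillis - waitedMillis);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", e);
            }
            waitedMillis += sleepMillis;
            delayMillis *= 2;
        }
    }

    // Frees the slot when the invocation completes.
    void release() {
        inFlight.decrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println("default limit: " + maxConcurrentInvocations(271, 100));
        BackPressureSketch regulator = new BackPressureSketch(2, 20);
        regulator.acquire();
        regulator.acquire();
        try {
            regulator.acquire();
        } catch (IllegalStateException e) {
            System.out.println("third invocation rejected: " + e.getMessage());
        }
    }
}
```

A caller is only rejected after it has spent the full backoff timeout waiting, which gives
in-flight invocations time to complete first.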
The Health Monitor keeps an eye on the usage of the invocations. If it sees a member has consumed
70% or more of the invocations, it starts to log health messages.
Apart from controlling the number of invocations, you also need to control the number of pending
async backups. This is done by periodically making these backups sync instead of async. This forces
all pending backups to get drained. For this, Hazelcast tracks the number of asynchronous backups
for each partition. At every Nth call, one synchronization is forced. This N is controlled through the
following property:
hazelcast.backpressure.syncwindow
This system property’s default value is 100. It means, out of 100 asynchronous backups, Hazelcast
makes 1 of them a synchronous one. A randomization is added, so the sync window with default
configuration is between 75 and 125 invocations.
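The randomized window can be sketched as a draw from the range of 75% to 125% of the configured
value. The uniform distribution and the rounding below are assumptions of this sketch; only the
75 to 125 range for the default value of 100 comes from the text.

```java
import java.util.concurrent.ThreadLocalRandom;

class SyncWindowSketch {

    // Picks how many asynchronous backups pass before the next one is forced to be
    // synchronous, drawn from [0.75 * syncWindow, 1.25 * syncWindow] (both inclusive).
    static int nextSyncDelay(int syncWindow) {
        int low = Math.round(0.75f * syncWindow);
        int high = Math.round(1.25f * syncWindow);
        return ThreadLocalRandom.current().nextInt(low, high + 1);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("next forced sync after " + nextSyncDelay(100) + " backups");
        }
    }
}
```

Randomizing the window prevents all partitions from forcing a synchronous backup at the same
moment.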
To prevent the system on the client side from overloading, you can apply a constraint on the
number of concurrent invocations. You can use the following system property on the client side for
this purpose:
hazelcast.client.max.concurrent.invocations
This property defines the maximum allowed number of concurrent invocations. When it is not
explicitly set, it defaults to Integer.MAX_VALUE, which effectively means unlimited. When you set it
and the number of concurrent invocations exceeds this value, Hazelcast throws
HazelcastOverloadException when a new invocation comes in.
Please note that the back off timeout and controlling the number of pending async backups (sync
window) are not supported on the client side.
See the System Properties appendix to learn how to configure the system
properties.
Hazelcast uses a pool of threads for I/O. A single thread does not perform all the I/O. Instead,
multiple threads perform the I/O. On each cluster member, the I/O threading is split up in 3 types of
I/O threads:
You can configure the number of I/O threads using the hazelcast.io.thread.count system property.
Its default value is 3 per member. If 3 is used, in total there are 7 I/O threads: 1 accept I/O thread, 3
read I/O threads and 3 write I/O threads. Each I/O thread has its own Selector instance and waits on
the Selector.select if there is nothing to do.
You can also specify counts for input and output threads separately. There are
hazelcast.io.input.thread.count and hazelcast.io.output.thread.count properties
for this purpose. See the System Properties appendix for information on these
properties and how to set them.
Hazelcast periodically scans utilization of each I/O thread and can decide to migrate a connection to
a new thread if the existing thread is servicing a disproportionate number of I/O events. You can
customize the scanning interval by configuring the hazelcast.io.balancer.interval.seconds system
property; its default interval is 20 seconds. You can disable the balancing process by setting this
property to a negative value.
In case of the read I/O thread, when sufficient bytes for a packet have been received, the Packet
object is created. This Packet object is then sent to the system where it is de-multiplexed. If the
Packet header signals that it is an operation/response, the Packet is handed over to the operation
service (see the Operation Threading section). If the Packet is an event, it is handed over to the
event service (see the Event Threading section).
Hazelcast uses a shared event system to deal with components that rely on events, such as topic,
collections, listeners and Near Cache.
Each cluster member has an array of event threads and each thread has its own work queue. When
an event is produced, either locally or remotely, an event thread is selected (depending on whether
message ordering applies) and the event is placed in the work queue for that event thread.
You can set the following properties to alter the system’s behavior:
If you process a lot of events and have many cores, changing the value of
hazelcast.event.thread.count property to a higher value is a good practice. This way, more events
can be processed in parallel.
Multiple components share the same event queues. If there are 2 topics, say A and B, for certain
messages they may share the same queue(s) and hence the same event thread. If there are a lot of
pending messages produced by A, then B needs to wait. Also, when processing a message from A
takes a lot of time and the event thread is used for that, B suffers from this. That is why it is better
to offload processing to a dedicated thread (pool) so that systems are better isolated.
If the events are produced at a higher rate than they are consumed, the queue grows in size. To
prevent overloading the system and running into an OutOfMemoryError, the queue is given a
capacity of 1 million items. When the maximum capacity is reached, the items are dropped. This
means that the event system is a 'best effort' system. There is no guarantee that you are going to get
an event. Topic A might have a lot of pending messages and therefore B cannot receive messages
because the queue has no capacity and messages for B are dropped.
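The drop behavior follows directly from using bounded queues with a non-blocking insert. The
following is a minimal model, not Hazelcast's event service: EventQueueSketch is a hypothetical
class, the capacity is reduced to 3 for the demonstration, and selecting the queue by the hash of
an ordering key is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class EventQueueSketch {
    private final List<BlockingQueue<String>> queues = new ArrayList<>();
    private final AtomicInteger dropped = new AtomicInteger();

    EventQueueSketch(int eventThreadCount, int queueCapacity) {
        for (int i = 0; i < eventThreadCount; i++) {
            queues.add(new ArrayBlockingQueue<>(queueCapacity));
        }
    }

    // Events with the same ordering key always go to the same queue, which keeps their
    // order. offer() never blocks: if the queue is full, the event is dropped.
    boolean publish(String orderingKey, String event) {
        int index = Math.floorMod(orderingKey.hashCode(), queues.size());
        boolean accepted = queues.get(index).offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    int droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        EventQueueSketch events = new EventQueueSketch(1, 3);
        for (int i = 1; i <= 3; i++) {
            events.publish("topicA", "a" + i);
        }
        System.out.println("message for B accepted: " + events.publish("topicB", "b1"));
        System.out.println("dropped so far: " + events.droppedCount());
    }
}
```

With a single shared queue that topic A has filled, the message for topic B is rejected even
though B itself produced nothing, which is the situation described above.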
Executor threading is straightforward. When a task is received to be executed on Executor E, it is
placed in the work queue of E's own ThreadPoolExecutor instance.
Thus, executors are fully isolated, but still share the same underlying hardware, most
importantly the CPUs.
You can configure the IExecutor using the ExecutorConfig (programmatic configuration) or using
<executor> (declarative configuration). See also the Configuring Executor Service section.
Each of these operation types has a different threading model explained in the following sections.
Partition-aware Operations
To execute partition-aware operations, an array of operation threads is created. The default value
of this array’s size is the number of cores and it has a minimum value of 2. This value can be
changed using the hazelcast.operation.thread.count property.
Each operation thread has its own work queue and it consumes messages from this work queue. If
a partition-aware operation needs to be scheduled, the right thread is found using the formula
below.
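A minimal sketch of the mapping, assuming a simple modulo over the number of operation threads.
The modulo is an assumption that is consistent with the thread-1/thread-2 example later in this
section; it is not a quote of the Hazelcast source. The method names are illustrative.

```java
class OperationThreadSketch {

    // Default described in the text: one thread per core, but never fewer than 2.
    static int defaultThreadCount(int cores) {
        return Math.max(2, cores);
    }

    // Assumed mapping from a partition to the operation thread that serves it.
    static int threadIndex(int partitionId, int threadCount) {
        return partitionId % threadCount;
    }

    // Counts how many partitions each operation thread is responsible for.
    static int[] partitionsPerThread(int partitionCount, int threadCount) {
        int[] counts = new int[threadCount];
        for (int partitionId = 0; partitionId < partitionCount; partitionId++) {
            counts[threadIndex(partitionId, threadCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("partition 1  -> thread " + threadIndex(1, 10));
        System.out.println("partition 11 -> thread " + threadIndex(11, 10));
        int[] counts = partitionsPerThread(271, 10);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("thread " + i + " serves " + counts[i] + " partitions");
        }
    }
}
```

Under this mapping a partition never changes its thread, so operations on one partition are
always executed sequentially by the same thread.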
After the threadIndex is determined, the operation is put in the work queue of that operation
thread. This means the following:
• A single operation thread executes operations for multiple partitions; if there are 271 partitions
and 10 operation threads, then every operation thread executes operations for roughly 27
partitions.
• Each partition belongs to only 1 operation thread. All operations for a partition are always
handled by exactly the same operation thread.
• Concurrency control is not needed to deal with partition-aware operations because once a
partition-aware operation is put in the work queue of a partition-aware operation thread, only 1
thread is able to touch that partition.
Because of this threading strategy, there are two forms of false sharing you need to be aware of:
• False sharing of the partition - two completely independent data structures share the same
partition. For example, if there is a map employees and a map orders, the method
employees.get("peter") running on partition 25 may be blocked by the method orders.get(1234)
also running on partition 25. If independent data structures share the same partition, a slow
operation on one data structure can slow down the other data structures.
• False sharing of the partition-aware operation thread - each operation thread is responsible for
executing operations on a number of partitions. For example, thread-1 could be responsible for
partitions 0, 10, 20, etc. and thread-2 could be responsible for partitions 1, 11, 21, etc. If an
operation for partition 1 takes a lot of time, it blocks the execution of an operation for partition
11 because both of them are mapped to the same operation thread.
You need to be careful with long running operations because you could starve operations of a
thread. As a general rule, the partition thread should be released as soon as possible because
operations are not designed as long running operations. That is why, for example, it is very
dangerous to execute a long running operation using IAtomicReference.alter() or
IMap.executeOnKey(), because these operations block other operations from being executed.
Currently, there is no support for work stealing. Different partitions that map to the same thread
may need to wait until one of the partitions is finished, even though there are other free
partition-aware operation threads available.
Example:
Take a cluster with three members. Two members have 90 primary partitions and one member has
91 primary partitions. Let’s say you have one CPU and four cores per CPU. By default, four
operation threads will be allocated to serve 90 or 91 partitions.
Non-Partition-aware Operations
A non-partition-aware operation thread does not execute an operation for a specific partition. Only
partition-aware operation threads execute partition-aware operations.
Unlike the partition-aware operation threads, all the generic operation threads share the same
work queue: genericWorkQueue.
If a non-partition-aware operation needs to be executed, it is placed in that work queue and any
generic operation thread can execute it. The big advantage is that you automatically have work
balancing since any generic operation thread is allowed to pick up work from this queue.
The disadvantage is that this shared queue can be a point of contention. You may not see this
contention in production since performance is dominated by I/O and the system does not run many
non-partition-aware operations.
Priority Operations
In some cases, the system needs to run operations with a higher priority, e.g., an important system
operation. To support priority operations, Hazelcast has the following features:
• For partition-aware operations: Each partition thread has its own work queue and it also has a
priority work queue. The partition thread always checks the priority queue before it processes
work from its normal work queue.
Since a worker thread blocks on the normal work queue (either partition specific or generic), a
priority operation may not be picked up because it is not put in the queue on which the worker is
blocking. To handle this, Hazelcast always sends a 'kick the worker' operation that only triggers the
worker to wake up and check the priority queue.
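The two-queue arrangement and the 'kick' can be modeled in a single thread. The sketch below is
illustrative only: PriorityWorkerSketch and its KICK marker are hypothetical, and drain() polls
the queues where a real worker thread would block on the normal work queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class PriorityWorkerSketch {
    private static final String KICK = "<kick>";
    private final Queue<String> priorityQueue = new ArrayDeque<>();
    private final Queue<String> workQueue = new ArrayDeque<>();

    void submit(String operation) {
        workQueue.add(operation);
    }

    void submitPriority(String operation) {
        priorityQueue.add(operation);
        workQueue.add(KICK); // wakes a worker that waits on the normal work queue
    }

    // Runs until both queues are empty and returns the operations in execution order.
    List<String> drain() {
        List<String> executed = new ArrayList<>();
        while (true) {
            // The priority queue is always checked before the normal work queue.
            String priority;
            while ((priority = priorityQueue.poll()) != null) {
                executed.add(priority);
            }
            String next = workQueue.poll();
            if (next == null) {
                return executed;
            }
            if (!KICK.equals(next)) {
                executed.add(next); // the kick itself does no work
            }
        }
    }

    public static void main(String[] args) {
        PriorityWorkerSketch worker = new PriorityWorkerSketch();
        worker.submit("map.put");
        worker.submit("map.get");
        worker.submitPriority("system operation");
        System.out.println(worker.drain());
    }
}
```

Although the priority operation is submitted last, it runs first, and the kick entry is consumed
without doing any work.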
When an Operation is invoked, a Future is returned. See the example code below.
The calling side blocks for a reply. In this case, GetOperation is set in the work queue for the
partition of key, where it eventually is executed. Upon execution, a response is returned and placed
on the genericWorkQueue where it is executed by a "generic operation thread". This thread signals the
future and notifies the blocked thread that a response is available. Hazelcast has a plan of exposing
this future to the outside world, and we will provide the ability to register a completion listener so
you can perform asynchronous calls.
Local Calls
When a local partition-aware call is done, an operation is made and handed over to the work queue
of the correct partition operation thread, and a future is returned. When the calling thread calls get
on that future, it acquires a lock and waits for the result to become available. When a response is
calculated, the future is looked up and the waiting thread is notified.
In the future, this will be optimized to reduce the number of expensive system calls, such as
lock.acquire()/notify(), and the expensive interaction with the operation queue. Probably, we will
add support for a caller-runs mode, so that an operation is directly run on the calling thread.
22.6. SlowOperationDetector
The SlowOperationDetector monitors the operation threads and collects information about all slow
operations. An Operation is a task executed by a generic or partition thread (see Operation
Threading). An operation is considered as slow when it takes more computation time than the
configured threshold.
The SlowOperationDetector stores the fully qualified classname of the operation and its stacktrace as
well as operation details, start time and duration of each slow invocation. All collected data is
available in the Management Center.
• hazelcast.slow.operation.detector.enabled
• hazelcast.slow.operation.detector.log.purge.interval.seconds
• hazelcast.slow.operation.detector.log.retention.seconds
• hazelcast.slow.operation.detector.stacktrace.logging.enabled
• hazelcast.slow.operation.detector.threshold.millis
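As one way of applying these properties, they can be set as JVM system properties before the
Hazelcast instance is created. The sketch below uses only System.setProperty; the property names
come from the list above, while the values and the class name are examples, not recommendations.

```java
class SlowOperationDetectorSettings {

    // Call this before creating the Hazelcast instance so that the values are picked up.
    static void apply() {
        System.setProperty("hazelcast.slow.operation.detector.enabled", "true");
        // Example value: report operations that run longer than 5 seconds.
        System.setProperty("hazelcast.slow.operation.detector.threshold.millis", "5000");
        System.setProperty("hazelcast.slow.operation.detector.stacktrace.logging.enabled", "true");
    }

    public static void main(String[] args) {
        apply();
        System.out.println(System.getProperty("hazelcast.slow.operation.detector.threshold.millis"));
    }
}
```

The same properties can also be passed on the command line with -D, as shown elsewhere in this
manual for other system properties.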
The detected slow operations are logged as warnings in the Hazelcast log files.
Stacktraces are always reported to the Management Center, but by default they are not printed to
keep the log size small. If logging of stacktraces is enabled, the full stacktrace is printed every 100
invocations. All other invocations print a shortened version.
Since a Hazelcast cluster can run for a very long time, Hazelcast purges the slow operation logs
periodically to prevent an OutOfMemoryError. You can configure the purge interval and the retention
time for each invocation.
The purging removes each invocation whose retention time is exceeded. When all invocations are
purged from a slow operation log, the log is deleted.
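The properties listed above are ordinary Hazelcast system properties, so one way to set them is as JVM system properties before the member is started. The following is a minimal sketch of that approach; the class name is hypothetical and the values are purely illustrative, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SlowOperationDetectorSettings {

    /** Illustrative values only; tune them against your own workload. */
    static Map<String, String> exampleSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hazelcast.slow.operation.detector.enabled", "true");
        settings.put("hazelcast.slow.operation.detector.threshold.millis", "5000");
        settings.put("hazelcast.slow.operation.detector.stacktrace.logging.enabled", "true");
        settings.put("hazelcast.slow.operation.detector.log.retention.seconds", "1800");
        settings.put("hazelcast.slow.operation.detector.log.purge.interval.seconds", "300");
        return settings;
    }

    /** Intended to be called before the Hazelcast member is created. */
    static void apply() {
        exampleSettings().forEach(System::setProperty);
    }
}
```

The same keys can also be passed on the command line as -D options; which form is more convenient depends on how your members are launched.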
not have local data at all. Suppose you read the key k a number of times from a Hazelcast client or k
is owned by another member in your cluster. Then each map.get(k) or cache.get(k) will be a remote
operation, which creates a lot of network trips. If you have a data structure that is mostly read, then
you should consider creating a local Near Cache, so that reads are sped up and less network traffic
is created.
These benefits do not come for free. See the following trade-offs:
• Members with a Near Cache have to hold the extra cached data, which increases memory
consumption.
• If invalidation is enabled and entries are updated frequently, then invalidations will be costly.
• Near Cache breaks the strong consistency guarantees; you might be reading stale data.
Near Cache is highly recommended for data structures that are mostly read.
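To make the benefit and the staleness trade-off concrete, here is a small conceptual sketch in plain Java. It is not Hazelcast's implementation: NearCacheSketch and remoteGet are hypothetical names, and remoteGet merely stands in for a remote map.get(k) call. The sketch is single-threaded for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Conceptual stand-in for a Near Cache; this is not Hazelcast's implementation. */
final class NearCacheSketch<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Function<K, V> remoteGet;   // hypothetical stand-in for a remote map.get(k)
    private int remoteCalls;                  // counts simulated network trips

    NearCacheSketch(Function<K, V> remoteGet) {
        this.remoteGet = remoteGet;
    }

    V get(K key) {
        V cached = local.get(key);
        if (cached != null) {
            return cached;                    // served locally, no network trip
        }
        remoteCalls++;
        V value = remoteGet.apply(key);
        if (value != null) {
            local.put(key, value);
        }
        return value;
    }

    /** What an invalidation event would do: drop the local copy. */
    void invalidate(K key) {
        local.remove(key);
    }

    int remoteCalls() {
        return remoteCalls;
    }
}
```

Repeated reads of the same key cost a single remote call, but the local copy keeps returning the old value until it is invalidated, which is the stale-read risk mentioned in the list above.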
In a client/server system you must enable the Near Cache separately on the client, without the need
to configure it on the server. Please note that Near Cache configuration is specific to the server or
client itself: a data structure on a server may not have Near Cache configured while the same data
structure on a client may have Near Cache configured. They also can have different Near Cache
configurations.
If you are using Near Cache, you should take into account that hits to the keys in the Near
Cache are not reflected as hits to the original keys on the primary members. This affects, for
example, IMap’s maximum idle seconds or time-to-live seconds expiration. Therefore, even
though there is a hit on a key in Near Cache, your original key on the primary member may expire.
Near Cache works only when you access data via map.get(k) or cache.get(k)
methods. Data returned using a predicate is not stored in the Near Cache.
The following matrix shows the Hazelcast data structures with Near Cache support. Please have a
look at the next section for a detailed explanation of cache-local-entries, local-update-policy,
preloader and serialize-keys.
Data structure             Near Cache Support   cache-local-entries   local-update-policy   preloader   serialize-keys
Transactional Map member   limited              no                    no                    no          no
Transactional Map client   no                   no                    no                    no          no
Even though lite members do not store any data for Hazelcast data structures, you
can enable Near Cache on lite members for faster reads.
The following shows the configuration for the Hazelcast Near Cache.
Please keep in mind that, if you want to use Near Cache on a Hazelcast member,
configure it on the member; if you want to use it on a Hazelcast client, configure it
on the client.
Declarative Configuration:
<hazelcast>
    ...
    <near-cache name="myDataStructure">
        <in-memory-format>(OBJECT|BINARY|NATIVE)</in-memory-format>
        <serialize-keys>(true|false)</serialize-keys>
        <invalidate-on-change>(true|false)</invalidate-on-change>
        <time-to-live-seconds>(0..INT_MAX)</time-to-live-seconds>
        <max-idle-seconds>(0..INT_MAX)</max-idle-seconds>
        <eviction eviction-policy="(LRU|LFU|RANDOM|NONE)"
                  max-size-policy="(ENTRY_COUNT
                      |USED_NATIVE_MEMORY_SIZE|USED_NATIVE_MEMORY_PERCENTAGE
                      |FREE_NATIVE_MEMORY_SIZE|FREE_NATIVE_MEMORY_PERCENTAGE)"
                  size="(0..INT_MAX)"/>
        <cache-local-entries>(false|true)</cache-local-entries>
        <local-update-policy>(INVALIDATE|CACHE_ON_UPDATE)</local-update-policy>
        <preloader enabled="(true|false)"
                   directory="nearcache-example"
                   store-initial-delay-seconds="(0..INT_MAX)"
                   store-interval-seconds="(0..INT_MAX)"/>
    </near-cache>
    ...
</hazelcast>
The element <near-cache> has an optional attribute name whose default value is default.
Programmatic Configuration:
EvictionConfig evictionConfig = new EvictionConfig()
        .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
        .setEvictionPolicy(EvictionPolicy.LRU)
        .setSize(1);
The class NearCacheConfig is used for all supported Hazelcast data structures on members and
clients.
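As a hedged illustration of how an EvictionConfig is typically attached to a NearCacheConfig on a member, the sketch below assumes the Hazelcast IMDG 4.x configuration API on the classpath. The class name MemberNearCacheConfigSketch, the map name myDataStructure and all numeric values are illustrative assumptions, not recommendations.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.EvictionConfig;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.MaxSizePolicy;
import com.hazelcast.config.NearCacheConfig;

final class MemberNearCacheConfigSketch {

    static Config build() {
        EvictionConfig evictionConfig = new EvictionConfig()
                .setEvictionPolicy(EvictionPolicy.LRU)
                .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
                .setSize(10000);                       // illustrative value

        NearCacheConfig nearCacheConfig = new NearCacheConfig()
                .setInMemoryFormat(InMemoryFormat.OBJECT)
                .setInvalidateOnChange(true)
                .setTimeToLiveSeconds(600)             // illustrative value
                .setMaxIdleSeconds(300)                // illustrative value
                .setCacheLocalEntries(true)
                .setEvictionConfig(evictionConfig);

        Config config = new Config();
        // On a member, the Near Cache configuration is attached to the map configuration.
        config.getMapConfig("myDataStructure").setNearCacheConfig(nearCacheConfig);
        return config;
    }
}
```

On a client, the equivalent step would be to give the NearCacheConfig the name of the data structure and register it on the client configuration instead of on a map configuration.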
The following are the descriptions of all configuration elements and attributes:
• in-memory-format: Specifies in which format data is stored in your Near Cache. Note that a map’s
in-memory format can be different from that of its Near Cache. Available values are as follows:
◦ NATIVE: Data is stored in the Near Cache that uses Hazelcast’s High-Density Memory Store
feature. This option is available only in Hazelcast IMDG Enterprise HD. Note that a map and
its Near Cache can independently use High-Density Memory Store. For example, while your
map does not use High-Density Memory Store, its Near Cache can use it.
• serialize-keys: Specifies whether the keys of a Near Cache entry should be serialized.
Serializing the keys has a big impact on the read performance of the Near Cache. Enable it
only when you have mutable keys that are changed after being used for the Near Cache. Its
default value is false.
• invalidate-on-change: Specifies whether the cached entries are evicted when the entries are
updated or removed. Its default value is true.
• time-to-live-seconds: Maximum number of seconds for each entry to stay in the Near Cache.
Entries that are older than this period are automatically evicted from the Near Cache.
Regardless of the eviction policy used, time-to-live-seconds still applies. It can be any integer
between 0 and Integer.MAX_VALUE; 0 means infinite. Its default value is 0.
• max-idle-seconds: Maximum number of seconds each entry can stay in the Near Cache
untouched (not read). Entries that are not read for longer than this period are removed from the
Near Cache. It can be any integer between 0 and Integer.MAX_VALUE; 0 means Integer.MAX_VALUE. Its
default value is 0.
• eviction: Specifies the eviction behavior when you use High-Density Memory Store for your
Near Cache. It has the following attributes:
▪ NONE: No items are evicted and the property max-size is ignored. You still can combine it
with time-to-live-seconds and max-idle-seconds to evict items from the Near Cache.
◦ max-size-policy: Maximum size policy for eviction of the Near Cache. Available values are as
follows:
▪ ENTRY_COUNT: Maximum size based on the entry count in the Near Cache (default value).
▪ USED_NATIVE_MEMORY_SIZE: Maximum used native memory size of the specified Near Cache
in MB to trigger the eviction. If the used native memory size exceeds this threshold, the
eviction is triggered. Available only for NATIVE in-memory format. This is supported only
by Hazelcast IMDG Enterprise.
▪ FREE_NATIVE_MEMORY_SIZE: Minimum free native memory size of the specified Near Cache
in MB to trigger the eviction. If free native memory size goes below this threshold,
eviction is triggered. Available only for NATIVE in-memory format. This is supported only
by Hazelcast IMDG Enterprise.
◦ size: Maximum size of the Near Cache used for max-size-policy. When this is reached the
Near Cache is evicted based on the policy defined. Any integer between 1 and
Integer.MAX_VALUE. This value has different defaults, depending on the data structure.
▪ IMap: Its default value is Integer.MAX_VALUE for on-heap maps and 10000 for the NATIVE in-
memory format.
• cache-local-entries: Specifies whether the local entries are cached. It can be useful when the
in-memory format of the Near Cache is different from that of the map. By default, it is disabled. It is
available only on Hazelcast members, not on Hazelcast clients (which have no local entries).
• local-update-policy: Specifies the update policy of the local Near Cache. It is available on JCache
clients. Available values are as follows:
◦ INVALIDATE: Removes the Near Cache entry on mutation. After the mutative call to the
member completes but before the operation returns to the caller, the Near Cache entry is
removed. Until the mutative operation completes, the readers still continue to read the old
value. But as soon as the update completes the Near Cache entry is removed. Any threads
reading the key after this point will have a Near Cache miss and call through to the member,
obtaining the new entry. This setting provides read-your-writes consistency. This is the
default setting.
◦ CACHE_ON_UPDATE: Updates the Near Cache entry on mutation. After the mutative call to the
member completes but before the put returns to the caller, the Near Cache entry is updated.
So a remove will remove it and one of the put methods will update it to the new value. Until
the update/remove operation completes, the entry’s old value can still be read from the Near
Cache. But before the call completes the Near Cache entry is updated. Any threads reading
the key after this point read the new entry. If the mutative operation was a remove, the key
will no longer exist in the cache, both the Near Cache and the original copy in the member.
The member initiates an invalidate event to any other Near Caches; however, the caller’s Near
Cache is not invalidated as it already has the new value. This setting also provides
read-your-writes consistency.
• preloader: Specifies whether the Near Cache should store and preload its keys for faster
repopulation after a Hazelcast client restart. It is available only on IMap and JCache clients. It has
the following attributes:
◦ enabled: Specifies whether the preloader for this Near Cache is enabled or not, true or false.
◦ directory: Specifies the parent directory for the preloader of this Near Cache. The filenames
for the preloader storage are generated from the Near Cache name. You can additionally
specify the parent directory to have multiple clients on the same machine with the same
Near Cache names.
◦ store-initial-delay-seconds: Specifies the delay in seconds until the keys of this Near Cache
are stored for the first time. Its default value is 600 seconds.
◦ store-interval-seconds: Specifies the interval in seconds in which the keys of this Near
Cache are stored. Its default value is 600 seconds.
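The interaction of time-to-live-seconds and max-idle-seconds described above can be summarized as a single expiry check. The following plain-Java sketch is conceptual only; the class and method names are hypothetical, and a value of 0 is treated as "no limit" for both settings.

```java
final class NearCacheExpirySketch {

    /**
     * Decides whether an entry would count as expired.
     *
     * @param ttlSeconds     stands for time-to-live-seconds; 0 is treated as "no limit" here
     * @param maxIdleSeconds stands for max-idle-seconds; 0 is treated as "no limit" here
     */
    static boolean isExpired(long nowMillis, long createdMillis, long lastReadMillis,
                             int ttlSeconds, int maxIdleSeconds) {
        // Time-to-live is measured from creation, independent of reads.
        boolean ttlExceeded = ttlSeconds > 0
                && nowMillis - createdMillis > ttlSeconds * 1000L;
        // Max idle is measured from the last read.
        boolean idleExceeded = maxIdleSeconds > 0
                && nowMillis - lastReadMillis > maxIdleSeconds * 1000L;
        return ttlExceeded || idleExceeded;
    }
}
```

Either limit alone is enough to expire an entry, which is why a frequently read entry can still disappear once its time-to-live has passed.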
This section shows some configuration examples for different Hazelcast data structures.
The following are configuration examples for IMap Near Caches for Hazelcast members and clients.
<hazelcast>
    ...
    <map name="mostlyReadMap">
        <in-memory-format>BINARY</in-memory-format>
        <near-cache>
            <in-memory-format>OBJECT</in-memory-format>
            <invalidate-on-change>false</invalidate-on-change>
            <time-to-live-seconds>600</time-to-live-seconds>
            <eviction eviction-policy="NONE" max-size-policy="ENTRY_COUNT" size="5000"/>
            <cache-local-entries>true</cache-local-entries>
        </near-cache>
    </map>
    ...
</hazelcast>
The Near Cache configuration for maps on members is a child of the map configuration, so you do
not have to define the map name in the Near Cache configuration.
<hazelcast-client>
    ...
    <near-cache name="mostlyReadMap">
        <in-memory-format>OBJECT</in-memory-format>
        <invalidate-on-change>true</invalidate-on-change>
        <eviction eviction-policy="LRU" max-size-policy="ENTRY_COUNT" size="50000"/>
    </near-cache>
    ...
</hazelcast-client>
EvictionConfig evictionConfig = new EvictionConfig()
        .setEvictionPolicy(EvictionPolicy.LRU)
        .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
        .setSize(50000);
The Near Cache on the client side must have the same name as the data structure on the member
for which this Near Cache is being created. You can use wildcards, so in this example mostlyRead*
would also match the map mostlyReadMap.
A Near Cache can have its own in-memory-format which is independent of the in-memory-format of
the data structure.
The following is a configuration example for a JCache Near Cache for a Hazelcast client.
<hazelcast-client>
    ...
    <near-cache name="mostlyReadCache">
        <in-memory-format>OBJECT</in-memory-format>
        <invalidate-on-change>true</invalidate-on-change>
        <eviction eviction-policy="LRU" max-size-policy="ENTRY_COUNT" size="30000"/>
        <local-update-policy>CACHE_ON_UPDATE</local-update-policy>
    </near-cache>
    ...
</hazelcast-client>
EvictionConfig evictionConfig = new EvictionConfig()
        .setEvictionPolicy(EvictionPolicy.LRU)
        .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
        .setSize(30000);
The following is a configuration example for an IMap High-Density Near Cache for a Hazelcast
member.
<hazelcast>
    ...
    <map name="mostlyReadMapWithHighDensityNearCache">
        <in-memory-format>OBJECT</in-memory-format>
        <near-cache>
            <in-memory-format>NATIVE</in-memory-format>
            <eviction eviction-policy="LFU" max-size-policy="USED_NATIVE_MEMORY_PERCENTAGE" size="90"/>
        </near-cache>
    </map>
    ...
</hazelcast>
EvictionConfig evictionConfig = new EvictionConfig()
        .setEvictionPolicy(EvictionPolicy.LFU)
        .setMaxSizePolicy(MaxSizePolicy.USED_NATIVE_MEMORY_PERCENTAGE)
        .setSize(90);
Keep in mind that you should have already enabled the High-Density Memory Store usage for your
cluster. See the Configuring High-Density Memory Store section.
Note that a map and its Near Cache can independently use High-Density Memory Store. For
example, if your map does not use High-Density Memory Store, its Near Cache can still use it.
In the scope of Near Cache, eviction means evicting (clearing) the entries selected according to the
given eviction-policy when the specified max-size-policy has been reached.
The max-size-policy defines the state when the Near Cache is full and determines whether the
eviction should be triggered. The size is either interpreted as entry count, memory size or
percentage, depending on the chosen policy.
Once the eviction is triggered the configured eviction-policy determines which, if any, entries must
be evicted.
• In the background: there is an expiration task that periodically (currently 5 seconds) scans
records and evicts the expired records.
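The division of labor described above, where the max-size-policy decides when the Near Cache is full and the eviction-policy decides which entry goes, can be illustrated with a tiny entry-count bounded LRU map in plain Java. This is a conceptual stand-in built on LinkedHashMap, not Hazelcast's eviction code, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Entry-count bounded LRU map; a conceptual stand-in, not Hazelcast's eviction code. */
final class LruEntryCountSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;   // plays the role of "size" with ENTRY_COUNT

    LruEntryCountSketch(int maxEntries) {
        super(16, 0.75f, true);     // access order, so reads refresh an entry's recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // "Full" check first; if exceeded, the least recently used entry is evicted.
        return size() > maxEntries;
    }
}
```

With a limit of two entries, inserting a, then b, reading a and then inserting c evicts b, because a was touched more recently.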
22.7.6. Near Cache Invalidation
Invalidation is the process of removing an entry from the Near Cache when its value is updated or
it is removed from the original data structure (to prevent stale reads). Near Cache invalidation
happens asynchronously at the cluster level, but synchronously at the current member. This means
that the Near Cache is invalidated within the whole cluster after the modifying operation is
finished, but updated from the current member before the modifying operation is done. A
modifying operation can be an EntryProcessor, an explicit update or remove as well as an
expiration or eviction. Generally, whenever the state of an entry changes in the record store by
updating its value or removing it, the invalidation event is sent for that entry.
Invalidations can be sent from members to client Near Caches or to member Near Caches, either
individually or in batches. Default behavior is sending in batches. If there are lots of mutating
operations such as put/remove on data structures, it is advised that you configure batch
invalidations. This reduces the network traffic and keeps the eventing system less busy, but may
increase the delay of individual invalidations.
You can use the following system properties to configure the Near Cache invalidation:
If there are a lot of clients or many mutating operations, batching should remain enabled and the
batch size should be configured with the hazelcast.map.invalidation.batch.size system property to
a suitable value.
Eventual Consistency
Near Caches are invalidated by invalidation events. Invalidation events can be lost due to the
fire-and-forget nature of the eventing system. If an event is lost, reads from the Near Cache can
remain stale indefinitely.
To solve this problem, starting with Hazelcast 3.8, Hazelcast provides eventually consistent
behavior for IMap/JCache Near Caches by detecting invalidation losses. After detection of an
invalidation loss, stale data is made unreachable and Near Cache’s get calls to that data are directed
to the underlying IMap/JCache to fetch the fresh data.
You can configure eventual consistency with the system properties below (same properties are
valid for both member and client side Near Caches):
• hazelcast.invalidation.reconciliation.interval.seconds: Default value is 60 seconds. This is a
periodic task that scans cluster members to compare the generated invalidation events
with the ones received by the Near Cache.
For local invalidations, when a record is updated/removed, future reads will see this
update/remove to provide read-your-writes consistency. To achieve this consistency, Near Cache
configuration provides the following update policies:
• INVALIDATE
• CACHE_ON_UPDATE
If you choose INVALIDATE, the entry is removed from the Near Cache after the update/remove occurs
in the underlying data structure and before the mutating operation returns to the caller. Until the
update/remove operation completes, the entry’s old value can still be read from the Near Cache.
If you choose CACHE_ON_UPDATE, the entry is updated after the update/remove occurs in the
underlying data structure and before the mutating operation returns to the caller. If it is an update
operation, the cached entry is replaced with the new value. Until the update/remove operation
completes, the entry’s old value can still be read from the Near Cache. Any threads reading the key
after this point read the new entry. If the mutative operation was a remove, the key no longer
exists in the Near Cache or in the original copy in the member.
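The difference between the two policies is easiest to see side by side. The following plain-Java sketch models the caller's Near Cache and the member's data as two maps; it is a conceptual model only, all names are hypothetical, and it ignores concurrency and invalidation events sent to other Near Caches.

```java
import java.util.HashMap;
import java.util.Map;

/** Conceptual model of the two local update policies; not Hazelcast code. */
final class LocalUpdatePolicySketch {
    enum Policy { INVALIDATE, CACHE_ON_UPDATE }

    private final Map<String, String> member = new HashMap<>();     // stand-in for the data on the member
    private final Map<String, String> nearCache = new HashMap<>();  // the caller's local Near Cache
    private final Policy policy;

    LocalUpdatePolicySketch(Policy policy) {
        this.policy = policy;
    }

    void put(String key, String value) {
        member.put(key, value);              // the mutating call completes on the member first
        if (policy == Policy.INVALIDATE) {
            nearCache.remove(key);           // next read misses and fetches the new value
        } else {
            nearCache.put(key, value);       // next read hits and already sees the new value
        }
        // Only now would the call return to the caller.
    }

    String get(String key) {
        // Near Cache hit if present; otherwise read through to the member and cache the result.
        return nearCache.computeIfAbsent(key, member::get);
    }

    boolean isCachedLocally(String key) {
        return nearCache.containsKey(key);
    }
}
```

Both policies give the caller read-your-writes behavior; they differ in whether the first read after a write is a Near Cache miss (INVALIDATE) or a hit (CACHE_ON_UPDATE).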
The Near Cache preloader stores the keys of a Near Cache to provide fast repopulation of the
previous hot data set after a Hazelcast client has been restarted. It is available on
IMap and JCache clients.
The Near Cache preloader stores the keys (not the values) of Near Cache entries in regular intervals.
You can define the initial delay via store-initial-delay-seconds, e.g., if you know that your hot data
set needs some time to build up. You can configure the interval via store-interval-seconds which
determines how often the key-set is stored. The persistence does not run continuously. Whenever
the storage is scheduled, it is performed on the actual keys in the Near Cache.
The Near Cache preloader is triggered on the first initialization of the data structure on the client,
e.g., client.getMap("myNearCacheMap"). This schedules the preloader, which works in the
background, so your application is not blocked. The storage is enabled after the loading is
completed.
The configuration parameter directory is optional. If you omit it, the base folder becomes the user
working directory (normally where the JVM was started or configured with the system property
user.dir). The storage filenames are always created from the Near Cache name. So even if you use a
wildcard name in the Near Cache Configuration, the preloader filenames are unique.
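The store-then-preload cycle described above can be sketched in plain Java as follows. This is a conceptual illustration only: the class name, the one-key-per-line file layout and the loader function are assumptions of this sketch and do not reflect the file format or internals of the actual preloader.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Conceptual key preloading; the one-key-per-line file layout is an assumption of this sketch. */
final class PreloaderSketch {

    /** Stores only the keys, never the values. */
    static void storeKeys(Set<String> hotKeys, Path file) throws IOException {
        Files.write(file, hotKeys, StandardCharsets.UTF_8);
    }

    /** Reads each stored key through the given loader to warm up the cache after a restart. */
    static void preload(Path file, Function<String, String> loader, Map<String, String> nearCache)
            throws IOException {
        List<String> keys = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String key : keys) {
            String value = loader.apply(key);    // fetch the current value, not a stored one
            if (value != null) {
                nearCache.put(key, value);
            }
        }
    }
}
```

Because only keys are persisted, the values loaded after a restart are the current ones from the cluster, so the preloaded data is not stale at the moment it is loaded.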
If you run multiple Hazelcast clients with enabled Near Cache preloader on the
same machine, you have to configure a unique storage filename for each client or
run them from different user directories. If two clients write to the same
file, only the first client succeeds. Subsequent clients throw an exception as soon
as the Near Cache preloader is triggered.
<hazelcast>
    ...
    <map name="myMap">
        <in-memory-format>BINARY</in-memory-format>
        <cache-deserialized-values>INDEX-ONLY</cache-deserialized-values>
        <backup-count>1</backup-count>
    </map>
    ...
</hazelcast>
The cache-deserialized-values element controls the caching of deserialized values. Note that
caching makes the query evaluation faster, but it consumes more memory. This element has the
following values:
• INDEX-ONLY: Deserialized values are cached only when they are inserted into an index.
If you are using portable serialization or your map’s in-memory format is OBJECT or NATIVE, then
cache-deserialized-values element does not have any effect.
This section covers various recommendations to improve the performance of your Hazelcast IMDG
clusters.
Using Operation Threads Efficiently
By default, Hazelcast uses the machine’s core count to determine the number of operation threads.
Creating more operation threads than this core count is highly unlikely to improve
performance, since there will be more context switching, more thread notification, etc.
Especially if you have a system that does simple operations like put and get, it is better to use a
lower thread count than the number of cores. The reason reducing the thread count improves
performance is that the operations executed on the operation threads normally execute
very fast, and thread parking and unparking can cause a very significant amount of overhead.
If there are fewer threads, each thread needs to do more work, blocks less and therefore
needs to be notified less.
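As a hedged sketch of how such a reduction might be applied, the code below sets the operation thread count through a JVM system property before the member starts. The property name hazelcast.operation.thread.count is not defined in this section and is assumed from the Hazelcast system properties list; the class name is hypothetical, and halving the core count is an arbitrary starting point to benchmark, not a recommendation.

```java
final class OperationThreadCountSketch {

    /** Picks a thread count below the core count, but never below 1. */
    static int reducedThreadCount(int coreCount) {
        return Math.max(1, coreCount / 2);   // halving is an arbitrary starting point to benchmark
    }

    /** Intended to be called before the Hazelcast member is created. */
    static void apply() {
        int cores = Runtime.getRuntime().availableProcessors();
        System.setProperty("hazelcast.operation.thread.count",
                Integer.toString(reducedThreadCount(cores)));
    }
}
```

Whatever value you choose, compare it against the default under a realistic load before adopting it.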
Tweaking can be very rewarding because significant performance improvements are possible. By
default, Hazelcast tries to behave at its best for all situations, but this doesn’t always lead to the best
performance. So if you know what you are doing and what to look for, it can be very rewarding to
tweak. However it is also important that tweaking should be done with proper testing to see if there
is actually an improvement. Tweaking without proper benchmarking is likely going to lead to
confusion and could cause all kinds of problems. In case of doubt, we recommend not to tweak.
When benchmarking, it is important that the benchmark reflects your production environment.
Sometimes, with a calculated guess, a representative smaller environment can be set up; but if you
want to use the benchmark statistics to infer how your production system is going to behave,
you need to get as close to your production setup as possible. Otherwise, you
risk spotting issues too late or focusing on things that are not relevant.
24. WAN
Hazelcast IMDG Enterprise Feature
24.1. WAN Replication
There could be cases where you need to synchronize multiple Hazelcast clusters to the same state.
Hazelcast WAN Replication allows you to keep multiple Hazelcast clusters in sync by replicating
their state over WAN environments such as the Internet.
Imagine you have different data centers in New York, London and Tokyo each running an
independent Hazelcast cluster. Every cluster would be operating at native speed in its own LAN
(Local Area Network), but you also want some or all record sets in these clusters to be replicated to
each other: updates in the Tokyo cluster should also replicate to London and New York and updates
in the New York cluster are to be synchronized to the Tokyo and London clusters.
This section explains how you can replicate the state of your clusters over Wide Area Network
(WAN) through Hazelcast WAN Replication.
You can download the white paper Hazelcast on AWS: Best Practices for
Deployment at Hazelcast.com.
To be able to use the REST calls related to WAN Replication mentioned in the
following sections, you need to enable the WAN REST endpoint group. See the Using
the REST Endpoint Groups section for details.
• Active-Passive: This mode is mostly used for failover scenarios where you want to replicate an
active cluster to one or more passive clusters, for the purpose of maintaining a backup.
• Active-Active: Every cluster is equal, each cluster replicates to all other clusters. This is
normally used to connect different clients to different clusters for the sake of the shortest path
between client and server.
There are two different ways of defining the WAN replication endpoints:
• Static endpoints
• Discovery SPI
You can use at most one of these when defining a single WAN publisher.
Below is an example of a declarative configuration of WAN Replication from the New York cluster to
the London cluster:
<hazelcast>
    ...
    <wan-replication name="my-wan-cluster-batch">
        <batch-publisher>
            <cluster-name>london</cluster-name>
            <queue-full-behavior>THROW_EXCEPTION</queue-full-behavior>
            <queue-capacity>1000</queue-capacity>
            <batch-size>500</batch-size>
            <batch-max-delay-millis>1000</batch-max-delay-millis>
            <snapshot-enabled>false</snapshot-enabled>
            <response-timeout-millis>60000</response-timeout-millis>
            <acknowledge-type>ACK_ON_OPERATION_COMPLETE</acknowledge-type>
            <target-endpoints>10.3.5.1:5701, 10.3.5.2:5701</target-endpoints>
            <discovery-period-seconds>20</discovery-period-seconds>
        </batch-publisher>
        <consumer>
            <class-name>com.hazelcast.enterprise.wan.replication.YourWanConsumer</class-name>
            <persist-wan-replicated-data>false</persist-wan-replicated-data>
            <properties>
                <property name="host">192.168.2.66</property>
                <property name="vpn.name">YOUR_VPN_NAME</property>
                <property name="username">admin</property>
                <property name="password">YOUR_PASSWORD</property>
                <property name="queue.name">Q/hz/clusterA</property>
            </properties>
        </consumer>
    </wan-replication>
    ...
</hazelcast>
The wan-replication configuration element has the following sub-elements and attributes:
• name: Name of your WAN replication scheme. This name is referenced in IMap or ICache
configuration when you add WAN Replication for these data structures (using the element wan-
replication-ref in the configuration of IMap or ICache).
• batch-publisher: WAN publisher which uses the built-in WAN replication implementation in the
Hazelcast Enterprise Edition. It waits until a batch size is reached or a delay time is passed.
• consumer: Configuration for processing the WAN events received from a target cluster. You can
configure a certain behavior when processing the incoming WAN events or even configure your
own implementation for a WAN consumer. A custom WAN consumer allows you to define
custom processing logic and is usually used in combination with a custom WAN publisher. A
custom consumer is optional and you may simply omit defining it which causes the default
processing logic to be used.
A single WAN replication scheme can contain multiple batch-publisher and custom-publisher WAN
replication publishers.
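The batch publisher's rule of sending when either a batch size is reached or a delay has passed can be sketched in plain Java as below. This is a conceptual model, not the Hazelcast WAN publisher implementation: the class name is hypothetical, time is passed in explicitly to keep the sketch deterministic, and the two constructor arguments merely play the roles of batch-size and batch-max-delay-millis.

```java
import java.util.ArrayList;
import java.util.List;

/** Conceptual size-or-delay batching; not the Hazelcast WAN publisher implementation. */
final class BatchSketch<E> {
    private final int batchSize;            // plays the role of batch-size
    private final long maxDelayMillis;      // plays the role of batch-max-delay-millis
    private final List<E> pending = new ArrayList<>();
    private long oldestPendingMillis;

    BatchSketch(int batchSize, long maxDelayMillis) {
        this.batchSize = batchSize;
        this.maxDelayMillis = maxDelayMillis;
    }

    void add(E event, long nowMillis) {
        if (pending.isEmpty()) {
            oldestPendingMillis = nowMillis; // the delay clock starts with the first pending event
        }
        pending.add(event);
    }

    /** Returns the batch to send, or an empty list if neither limit has been reached yet. */
    List<E> pollBatch(long nowMillis) {
        boolean full = pending.size() >= batchSize;
        boolean waitedLongEnough = !pending.isEmpty()
                && nowMillis - oldestPendingMillis >= maxDelayMillis;
        if (!full && !waitedLongEnough) {
            return new ArrayList<>();
        }
        int count = Math.min(batchSize, pending.size());
        List<E> batch = new ArrayList<>(pending.subList(0, count));
        pending.subList(0, count).clear();
        oldestPendingMillis = nowMillis;     // simplification: leftovers restart the delay clock
        return batch;
    }
}
```

A larger batch size reduces the number of round trips, while the maximum delay bounds how long a single event can wait when traffic is low.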
The batch-publisher element is used to configure the built-in WAN replication implementation in
the Hazelcast Enterprise Edition. It defines how to connect to the target cluster and how WAN
events are sent to a specific target endpoint. As mentioned above, just before the configuration
example, the target endpoints can be a different cluster defined by static IPs or discovered using a
cloud discovery mechanism.
• cluster-name: Sets the cluster name used as an endpoint cluster name for authentication on the
target endpoint. If there is no separate publisher ID element defined, this cluster name is also
used as a WAN publisher ID. This ID is then used for identifying the publisher in a WAN
replication scheme. It is mandatory to set this attribute.
• publisher-id: Sets the publisher ID used to identify the publisher in a WAN replication scheme.
Setting this ID may be useful when the wan-replication element contains multiple WAN
publishers and the cluster names are not unique for all of the WAN replication publishers in a
single WAN replication scheme. It is optional to set this attribute. If this ID is not specified, the
cluster-name is used as a publisher ID.
• batch-size: Changes the maximum number of events that are sent to the target cluster in a single
batch. The batch of events is not sent until this size is reached or enough time has elapsed. Its
default value is 500. See the Batch Size section.
• batch-max-delay-millis: If the number of events generated does not reach the batch-size, they
are sent to the target cluster after a certain amount of time is passed. You can set this time in
milliseconds using this element. Its default value is 1000 milliseconds. See the Batch Maximum
Delay section.
• response-timeout-millis: After a replication event batch is sent to the target cluster, the source
member waits for a confirmation that says the event has reached the target. If confirmation is
not received for a period of response-timeout-millis, the event is resent to the target cluster.
The default value is 60000 milliseconds. See the Response Timeout section.
• acknowledge-type: Acknowledgment type for each target cluster when the events are replicated.
See the Acknowledgment Types section.
• snapshot-enabled: Sets if key-based coalescing is configured for this WAN publisher. When
enabled, only the latest event for a key is sent to target. Its default value is false.
• idle-max-park-ns: Sets the maximum duration in nanoseconds that the WAN replication thread
is parked if there are no events to replicate.
• idle-min-park-ns: Sets the minimum duration in nanoseconds that the WAN replication thread
is parked if there are no events to replicate.
• max-concurrent-invocations: Sets the maximum number of WAN event batches being sent to the
target cluster concurrently. Setting this element to anything less than 2 only allows a single
batch of events to be sent to each target endpoint and will maintain causality of events for a
single partition.
Setting this element to 2 or higher allows multiple batches of WAN events to be sent to each
target endpoint. Since this allows reordering of batches due to network conditions, causality
and ordering of events for a single partition are lost, and batches for a single partition are
sent randomly to any available target endpoint.
This does, however, provide faster WAN replication for certain scenarios, such as replicating
immutable, independent map entries which are only added once and where the order in which
these entries are added does not matter.
Keep in mind that if you set this element to a value which is less than the target endpoint count,
you will lose performance as not all target endpoints are used at any point in time to process the
WAN event batches. So, for instance, if you have a target cluster with 3 members (target
endpoints) and you want to use this element, it makes sense to set it to a value higher than 3.
Otherwise, you can simply disable it by setting it to less than 2 in which case WAN will use the
default replication strategy and adapt to the target endpoint count while maintaining causality.
• queue-full-behavior: Policy to be applied when the WAN Replication event queues are full. See
the Queue Full Behavior section.
• queue-capacity: Size of the queue of events. Its default value is 10000. If you exceed this queue
size, then the oldest, not yet replicated updates might get lost. Therefore, if you have a large rate
of put/update/remove operations, you should increase the queue capacity. See the Queue
Capacity section.
• target-endpoints: IP addresses and ports of the cluster members for which the WAN replication
is implemented. It is enough to specify some of the member IP/ports available in the target
cluster, i.e., you don’t need to provide the IP/ports of all members in there. WAN does not
perform the discovery of other members in the target cluster; it only expects that the IP
addresses you provide are available.
• aws: Configuration for discovery strategy for Amazon EC2 discovery plugin.
• gcp: Configuration for discovery strategy for Google cloud platform discovery plugin.
• azure: Configuration for discovery strategy for Microsoft Azure discovery plugin.
• discovery-strategies: Set its enabled attribute to true for discovery in various cloud
infrastructures. You can define multiple discovery strategies using the discovery-strategy sub-
element and its elements.
• class-name: Mandatory configuration value defining the fully qualified class name of the WAN
publisher implementation.
• publisher-id: Mandatory configuration value for the publisher ID used for identifying the
publisher in a WanReplicationConfig.
The consumer is used to configure the processing of the WAN events received from a source cluster.
You can configure a certain behavior when processing the incoming WAN events or even configure
your own implementation for a WAN consumer. A custom WAN consumer allows you to define a
custom processing logic and is usually used in combination with a custom WAN publisher. A
custom consumer is optional and you may simply omit defining it which causes the default
processing logic to be used. It has the following sub-elements:
• class-name: Name of the class implementing a custom WAN consumer (WanConsumer). If you don’t
define a class name, the default processing logic for incoming WAN events is used.
• properties: Properties for the custom WAN consumer. These properties are accessible when
initializing the WAN consumer. You can define the host, username and password for the host,
name of the queue to be polled by the consumer, etc.
• persist-wan-replicated-data: When set to true, an incoming event over WAN replication can be
persisted to a database for example, otherwise it is not persisted. Default value is true.
Using this configuration, the cluster replicates to a cluster with the name "london". The "london"
cluster should have a similar configuration if you want to run in Active-Active mode.
You can achieve various WAN topologies using different configurations on different clusters. For
instance, if the New York and London cluster configurations contain the wan-replication element
and the Tokyo cluster does not, it might mean that the New York and London clusters are active
endpoints and Tokyo is a passive endpoint.
In addition to defining target cluster endpoints with static IP addresses, you can configure WAN to
work with the discovery SPI and determine the endpoint IP addresses at runtime. This allows you
to use WAN with endpoints on various cloud infrastructures (such as Amazon EC2 or GCP Compute)
where the IP address is not known in advance. Typically you use a readily available discovery SPI
plugin such as Hazelcast AWS EC2 discovery plugin, Hazelcast GCP discovery plugin, or similar. For
more advanced cases, you can provide your own discovery SPI implementation with custom logic
for determining the WAN target endpoints such as looking up the endpoints in some service
registry or even reading the endpoint addresses from a file.
When using the discovery SPI, WAN always connects to the public address of the
members returned by the discovery SPI implementation. This is the opposite of the
cluster membership mechanism using the discovery SPI, where a member connects
to a different member in the same cluster through its private address. If you
prefer WAN to use the private address of the discovered member as well,
use the use-endpoint-private-address publisher element (see below).
Following is an example of setting up the WAN replication with the EC2 discovery plugin. You must
have the Hazelcast AWS EC2 discovery plugin on the classpath.
<hazelcast>
...
<wan-replication name="my-wan-cluster-batch">
<batch-publisher>
<cluster-name>london</cluster-name>
<queue-full-behavior>THROW_EXCEPTION</queue-full-behavior>
<queue-capacity>1000</queue-capacity>
<batch-size>500</batch-size>
<batch-max-delay-millis>1000</batch-max-delay-millis>
<snapshot-enabled>false</snapshot-enabled>
<response-timeout-millis>60000</response-timeout-millis>
<acknowledge-type>ACK_ON_OPERATION_COMPLETE</acknowledge-type>
<discovery-period-seconds>20</discovery-period-seconds>
<max-target-endpoints>5</max-target-endpoints>
<discovery-strategies>
<discovery-strategy enabled="true" class="com.hazelcast.aws.AwsDiscoveryStrategy">
<properties>
<property name="access-key">test-access-key</property>
<property name="secret-key">test-secret-key</property>
<property name="region">test-region</property>
<property name="iam-role">test-iam-role</property>
<property name="host-header">ec2.test-host-header</property>
<property name="security-group-name">test-security-group-name</property>
<property name="tag-key">test-tag-key</property>
<property name="tag-value">test-tag-value</property>
<property name="connection-timeout-seconds">10</property>
<property name="hz-port">5701</property>
</properties>
</discovery-strategy>
</discovery-strategies>
</batch-publisher>
</wan-replication>
...
</hazelcast>
The hz-port property defines the port or the port range on which the target endpoint is running.
The default port range 5701-5708 is used if this property is not defined. This is needed because the
Amazon API which the AWS plugin uses does not provide the port on which Hazelcast is running,
only the IP address. For some other discovery SPI implementations, this might not be necessary and
it might discover the port as well, e.g., by looking up in a service registry.
The other properties are the same as when using the aws element. In case of EC2 discovery, you can
also configure the WAN replication using the aws element. You may use either of these, but not both at
the same time.
<hazelcast>
...
<wan-replication name="my-wan-cluster-batch">
<batch-publisher>
<cluster-name>london</cluster-name>
<queue-full-behavior>THROW_EXCEPTION</queue-full-behavior>
<queue-capacity>1000</queue-capacity>
<batch-size>500</batch-size>
<batch-max-delay-millis>1000</batch-max-delay-millis>
<snapshot-enabled>false</snapshot-enabled>
<response-timeout-millis>60000</response-timeout-millis>
<acknowledge-type>ACK_ON_OPERATION_COMPLETE</acknowledge-type>
<discovery-period-seconds>20</discovery-period-seconds>
<use-endpoint-private-address>false</use-endpoint-private-address>
<max-target-endpoints>5</max-target-endpoints>
<aws enabled="true">
<access-key>my-access-key</access-key>
<secret-key>my-secret-key</secret-key>
<region>us-west-1</region>
<security-group-name>hazelcast-sg</security-group-name>
<tag-key>type</tag-key>
<tag-value>hz-members</tag-value>
<hz-port>5701</hz-port>
</aws>
</batch-publisher>
</wan-replication>
...
</hazelcast>
See the aws element and the Configuring Client for AWS sections for the descriptions of the above AWS
configuration elements.
• discovery-period-seconds: Sets the period in seconds in which WAN tries to discover new target
endpoints and reestablish connections to failed endpoints.
• max-target-endpoints: Sets the maximum number of endpoints that WAN connects to when
a discovery mechanism is used to define the endpoints. This element has no effect when static
endpoint addresses are defined using target-endpoints. Default is Integer.MAX_VALUE.
• use-endpoint-private-address: Sets whether the WAN connection manager should connect to the
endpoint on the private address returned by the discovery SPI. By default this element is false
which means the WAN connection manager always uses the public address.
You can also define the WAN publisher with discovery SPI using the programmatic configuration:
discoveryConfig.addDiscoveryStrategyConfig(discoveryStrategyConfig);
publisherConfig.setDiscoveryConfig(discoveryConfig);
wrConfig.addBatchReplicationPublisherConfig(publisherConfig);
config.addWanReplicationConfig(wrConfig);
Hazelcast IMDG Enterprise Edition offers the WAN batch publisher implementation for WAN
replication.
As you see in the above configuration examples, this implementation is specified simply by using
the batch-publisher element (in the declarative configuration) or the
WanBatchReplicationPublisherConfig class (in the programmatic configuration) when defining a
WAN replication publisher.
The WAN batch publisher implementation collects replication events and sends them to the target
cluster as a batch once either of the following occurs:
• a pre-defined number of replication events is generated (see the Batch Size section),
• or a pre-defined amount of time has passed (see the Batch Maximum Delay section).
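The two conditions above can be illustrated with a minimal, self-contained Java sketch. The class WanBatchTrigger and its method are hypothetical names used only for illustration; they are not part of the Hazelcast API, and the real publisher may evaluate the conditions differently.

```java
/** Hypothetical helper, not a Hazelcast class: decides when a collected batch is sent. */
final class WanBatchTrigger {
    private final int batchSize;        // corresponds to the batch-size element
    private final long maxDelayMillis;  // corresponds to the batch-max-delay-millis element

    WanBatchTrigger(int batchSize, long maxDelayMillis) {
        this.batchSize = batchSize;
        this.maxDelayMillis = maxDelayMillis;
    }

    /**
     * Returns true when the pending events should be sent: either the batch is
     * full, or the maximum delay has elapsed and there is at least one event.
     */
    boolean shouldSend(int pendingEvents, long millisSinceLastBatch) {
        if (pendingEvents <= 0) {
            return false; // nothing to send
        }
        return pendingEvents >= batchSize || millisSinceLastBatch >= maxDelayMillis;
    }
}
```

With the default values (500 events, 1000 milliseconds), a member that has collected only 10 events would still send them once a second has passed since the previous batch.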
You can configure WAN replication for Hazelcast’s IMap and ICache data structures. To enable WAN
replication for an IMap or ICache instance, you can use the wan-replication-ref element. Each IMap
and ICache instance can have a different WAN replication configuration.
Imagine you have different distributed maps, however only one of those maps should be replicated
to a target cluster. To achieve this, configure the map that you want to be replicated by adding the
wan-replication-ref element in the map configuration as shown below.
<hazelcast>
...
<wan-replication name="my-wan-cluster">
...
</wan-replication>
<map name="my-shared-map">
<wan-replication-ref name="my-wan-cluster">
<merge-policy>com.hazelcast.spi.merge.PassThroughMergePolicy</merge-policy>
<republishing-enabled>false</republishing-enabled>
</wan-replication-ref>
</map>
...
</hazelcast>
Config config = new Config();
config.addWanReplicationConfig(wrConfig);
You see that we have my-shared-map configured to replicate itself to the cluster targets defined in the
earlier wan-replication element.
• name: Name of wan-replication configuration. IMap or ICache instance uses this wan-replication
configuration.
• merge-policy: Policy used to resolve conflicts that occur when the target cluster already has the
replicated entry key.
When using Active-Active Replication, multiple clusters can simultaneously update the same entry
in a distributed data structure. You can configure a merge policy to resolve these potential conflicts,
as shown in the above example configuration (using the merge-policy sub-element under the wan-
replication-ref element).
• com.hazelcast.spi.merge.LatestUpdateMergePolicy: Incoming entry merges from the source map
to the target map if the source entry has been updated more recently than the target entry.
Please note that this merge policy can only be used when the clusters' clocks are in sync.
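The "latest update wins" rule described above can be sketched in plain Java as follows. The class and field names are hypothetical and do not belong to the Hazelcast API; how the real policy treats equal timestamps is not stated here, so keeping the existing entry on a tie is an assumption of this sketch.

```java
/** Hypothetical illustration of a latest-update merge decision; not a Hazelcast class. */
final class LatestUpdateMergeSketch {

    /** Minimal stand-in for an entry view carrying a value and its last update time. */
    static final class Entry {
        final String value;
        final long lastUpdateTime;

        Entry(String value, long lastUpdateTime) {
            this.value = value;
            this.lastUpdateTime = lastUpdateTime;
        }
    }

    /**
     * Returns the entry that should end up in the target map. The incoming entry
     * wins only if it was updated more recently than the existing one
     * (assumption: on a tie the existing entry is kept).
     */
    static Entry merge(Entry incoming, Entry existing) {
        if (existing == null) {
            return incoming; // no conflict, the key is new on the target
        }
        return incoming.lastUpdateTime > existing.lastUpdateTime ? incoming : existing;
    }
}
```

Because the decision compares timestamps taken on different clusters, it only gives meaningful results when the clusters' clocks are in sync, as noted above.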
When using WAN replication, please note that only key based events are replicated
to the target cluster. Operations like clear, destroy, evict and evictAll are NOT
replicated.
The following is a declarative configuration example for enabling WAN Replication for ICache:
<hazelcast>
...
<wan-replication name="my-wan-cluster">
...
</wan-replication>
<cache name="my-shared-cache">
<wan-replication-ref name="my-wan-cluster">
<merge-policy>com.hazelcast.spi.merge.PassThroughMergePolicy</merge-policy>
<republishing-enabled>true</republishing-enabled>
</wan-replication-ref>
</cache>
...
</hazelcast>
config.addWanReplicationConfig(wrConfig);
Caches that are created dynamically do not support WAN replication functionality.
Cache configurations should be defined either declaratively (by XML) or
programmatically on both source and target clusters.
• com.hazelcast.spi.merge.PutIfAbsentMergePolicy: Incoming entry merges from the source cache
to the target cache if it does not exist in the target cache.
The maximum size of events that are sent in a single batch can be changed depending on your
needs. Default value for batch size is 500.
Batch size can be set for each target cluster by modifying the related
WanBatchReplicationPublisherConfig.
Below is the declarative configuration for changing the value of the element:
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>london</cluster-name>
<batch-size>1000</batch-size>
</batch-publisher>
</wan-replication>
...
</hazelcast>
WanReplicationConfig wanConfig = config.getWanReplicationConfig("my-wan-cluster");
WanBatchReplicationPublisherConfig publisherConfig = new WanBatchReplicationPublisherConfig()
.setClusterName("london")
.setBatchSize(1000);
wanConfig.addBatchReplicationPublisherConfig(publisherConfig);
When using the built-in WAN batch replication, if the number of WAN replication events generated
does not reach Batch Size, they are sent to the target cluster after a certain amount of time is
passed. You can set this duration in milliseconds using this batch maximum delay configuration.
The default value for this duration is 1 second (1000 milliseconds).
Maximum delay can be set for each target cluster by modifying the related
WanBatchReplicationPublisherConfig.
You can change this element using the declarative configuration as shown below.
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>london</cluster-name>
<batch-max-delay-millis>2000</batch-max-delay-millis>
</batch-publisher>
</wan-replication>
...
</hazelcast>
After a replication event is sent to the target cluster, the source member waits for an
acknowledgement of the delivery of the event to the target. If the confirmation is not received
within the timeout duration, the event is resent to the target cluster. The default value of this
duration is 60000 milliseconds.
You can change this duration depending on your network latency for each target cluster by
modifying the related WanBatchReplicationPublisherConfig.
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>london</cluster-name>
<response-timeout-millis>5000</response-timeout-millis>
</batch-publisher>
</wan-replication>
...
</hazelcast>
For huge clusters or high data mutation rates, you might need to increase the replication queue
size. The default queue size for replication queues is 10000. This means, if you have heavy
put/update/remove rates, you might exceed the queue size so that the oldest, not yet replicated,
updates might get lost. Note that a separate queue is used for each WAN Replication configured for
IMap and ICache.
Queue capacity can be set for each target cluster by modifying the related
WanBatchReplicationPublisherConfig.
You can change this element using the declarative configuration as shown below.
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>london</cluster-name>
<queue-capacity>15000</queue-capacity>
</batch-publisher>
</wan-replication>
...
</hazelcast>
Note that you can clear a member’s WAN replication event queue. It can be initiated through
Management Center’s Clear Queues action or Hazelcast’s REST API. Below is the URL for its REST
call:
http://member_ip:port/hazelcast/rest/wan/clearWanQueues
You need to add the following URL-encoded parameters to the request in the following order,
separated by "&":
• Cluster name
• Cluster password
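As a minimal sketch of how such a request body could be assembled, the following plain Java helper URL-encodes each parameter and joins them with "&" in the given order. The class name WanRestParams is hypothetical, and the sketch only builds the parameter string; it makes no assumption about how the request is sent or how the member answers.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper, not part of Hazelcast: builds an "&"-separated, URL-encoded body. */
final class WanRestParams {

    /** Encodes every value with UTF-8 URL encoding and joins them in the given order. */
    static String encode(String... values) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                body.append('&'); // separator between consecutive parameters
            }
            try {
                body.append(URLEncoder.encode(values[i], "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is always available on a Java runtime
                throw new IllegalStateException(e);
            }
        }
        return body.toString();
    }
}
```

For example, WanRestParams.encode("my cluster", "p&ss") yields my+cluster&p%26ss, so a password containing "&" cannot be mistaken for a parameter separator.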
This may be useful, for instance, to release the consumed heap if you know that the target cluster is
being shut down, decommissioned, put out of use and it will never come back. Or, when a failure
happens and queues are not replicated anymore, you could clear the queues using this clearing
action.
In previous Hazelcast releases, WAN replication dropped new events when the WAN
replication event queues were full. This behavior is now configurable.
• DISCARD_AFTER_MUTATION: If you select this option, the new WAN events generated by the member
are dropped and not replicated to the target cluster when the WAN event queues are full.
• THROW_EXCEPTION: If you select this option, the WAN queue size is checked before each supported
mutating operation (like IMap.put(), ICache.put()). If one of the queues of the target cluster is full,
WanReplicationQueueFullException is thrown and the operation is not allowed.
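The difference between the two behaviors described above can be sketched with a small bounded queue in plain Java. Everything in this sketch is hypothetical: the class is not a Hazelcast class, and IllegalStateException merely stands in for WanReplicationQueueFullException.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical illustration of the queue-full behaviors; not a Hazelcast class. */
final class BoundedWanQueueSketch {
    enum QueueFullBehavior { DISCARD_AFTER_MUTATION, THROW_EXCEPTION }

    private final Queue<String> events = new ArrayDeque<>();
    private final int capacity;
    private final QueueFullBehavior behavior;

    BoundedWanQueueSketch(int capacity, QueueFullBehavior behavior) {
        this.capacity = capacity;
        this.behavior = behavior;
    }

    /**
     * Runs the local mutation and enqueues its WAN event.
     * Returns true if the event was enqueued, false if it was discarded.
     */
    boolean mutate(Runnable mutation, String event) {
        boolean full = events.size() >= capacity;
        if (full && behavior == QueueFullBehavior.THROW_EXCEPTION) {
            // the queue is checked before the mutation, so the operation is rejected
            throw new IllegalStateException("WAN queue is full");
        }
        mutation.run(); // the local mutation is applied
        if (full) {
            return false; // DISCARD_AFTER_MUTATION: the new event is dropped
        }
        events.add(event);
        return true;
    }

    int size() {
        return events.size();
    }
}
```

With DISCARD_AFTER_MUTATION the local data changes but the target cluster never sees the event; with THROW_EXCEPTION neither the local data nor the queue changes.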
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>test-cluster-1</cluster-name>
<queue-full-behavior>DISCARD_AFTER_MUTATION</queue-full-behavior>
</batch-publisher>
</wan-replication>
...
</hazelcast>
Starting with Hazelcast 3.6, WAN replication allows you to intercept WAN replication events before
they are placed to WAN event replication queues by providing a filtering API. Using this API, you
can monitor WAN replication events of each data structure separately.
You can attach filters to your data structures using the filter element of wan-replication-ref
configuration inside hazelcast.xml as shown below. You can also configure it using the
programmatic configuration.
<hazelcast>
...
<map name="testMap">
<wan-replication-ref name="test">
<filters>
<filter-impl>com.example.MyFilter</filter-impl>
<filter-impl>com.example.MyFilter2</filter-impl>
</filters>
</wan-replication-ref>
</map>
...
</hazelcast>
As shown in the above configuration, you can define more than one filter. Filters are called in the
order that they are introduced. A WAN replication event is only eligible to publish if it passes all the
filters.
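The ordered, all-must-pass evaluation described above can be sketched in plain Java. This is only an illustration using java.util.function.Predicate as a stand-in; it does not use MapWanEventFilter or CacheWanEventFilter, and the return convention of the real filter method may differ from the "true means accepted" convention assumed here.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of an ordered filter chain; not a Hazelcast class. */
final class WanFilterChainSketch {

    /**
     * Returns true if the event is eligible for publishing, i.e. every filter
     * accepts it. Filters are consulted in the order they were configured and
     * evaluation stops at the first rejection.
     */
    static <E> boolean eligible(List<Predicate<E>> filters, E event) {
        for (Predicate<E> filter : filters) {
            if (!filter.test(event)) {
                return false; // one rejecting filter is enough to drop the event
            }
        }
        return true;
    }
}
```

An empty filter list accepts every event in this sketch, which corresponds to a wan-replication-ref without a filters element.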
Map and Cache have different filter interfaces: MapWanEventFilter and CacheWanEventFilter. Both of
these interfaces have the method filter which takes the following parameters:
LOADED events are filtered out and not replicated to target cluster.
Starting with Hazelcast 3.6, WAN replication supports different acknowledgment (ACK) types for
each target cluster. You can choose from two different ACK types depending on your consistency
requirements. The following ACK types are supported:
• ACK_ON_OPERATION_COMPLETE: This option guarantees that the event is received by the target
cluster and it is applied. It is more time consuming. But it is the best way if you have strong
consistency requirements.
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>test-cluster-1</cluster-name>
<acknowledge-type>ACK_ON_OPERATION_COMPLETE</acknowledge-type>
</batch-publisher>
</wan-replication>
...
</hazelcast>
Starting with Hazelcast 3.7 you can initiate a synchronization operation on an IMap for a specific
target cluster. Synchronization operation sends all the data of an IMap to a target cluster to align
the state of target IMap with source IMap. Synchronization is useful if two remote clusters lost their
synchronization due to WAN queue overflow or in restart scenarios.
Synchronization can be initiated through Management Center and Hazelcast’s REST API.
http://member_ip:port/hazelcast/rest/wan/sync/map
You need to add URL-encoded parameters to the request in the following order, separated by "&":
• Cluster name
• Cluster password
Assume that you have configured an IMap with a WAN replication configuration as follows:
<hazelcast>
...
<wan-replication name="my-wan-cluster">
<batch-publisher>
<cluster-name>istanbul</cluster-name>
</batch-publisher>
</wan-replication>
<map name="my-map">
<wan-replication-ref name="my-wan-cluster">
<merge-policy>com.hazelcast.spi.merge.PassThroughMergePolicy</merge-policy>
</wan-replication-ref>
</map>
...
</hazelcast>
Then, an example CURL command to initiate the synchronization for "my-map" would be as
follows:
http://member_ip:port/hazelcast/rest/wan/sync/allmaps
You need to add the following URL-encoded parameters to the request in the
following order, separated by "&":
• Cluster name
• Cluster password
Synchronization for a target cluster operates only with the data residing in the
memory. Therefore, evicted entries are not synchronized, not even if MapLoader is
configured.
When running clusters for an extensive period, you might need to dynamically change the
configuration while the cluster is running. This includes dynamically adding new WAN replication
publishers (new target clusters) and replicating the subsequent map and cache updates to the new
publishers without any manual intervention.
You can add new WAN publishers to an existing WAN replication using almost all of the
configuration options that are available when configuring the WAN publishers in the static
configuration (including using Discovery SPI). The new configuration is not persisted but it is
replicated to all existing and new members. Once the cluster is completely restarted, the
dynamically added publisher configuration is lost and the updates are not replicated to the target
cluster anymore until added again.
If you wish to preserve the new configuration over cluster restarts, you must add the exact same
configuration to the static configuration file after dynamically adding the publisher configuration
to a running cluster.
You cannot remove the existing configurations but can put the publishers into a STOPPED state
which prevents the WAN events from being enqueued in the WAN queues and prevents the
replication, rendering the publisher idle. The configurations also cannot be changed.
Dynamically adding a WAN publisher can be initiated through Management Center and Hazelcast’s REST API.
http://member_ip:port/hazelcast/rest/wan/addWanConfig
You need to add the following URL-encoded parameters to the request in the following order,
separated by "&":
• Cluster name
• Cluster password
You can, at any point, even when maps and caches are concurrently mutated, add a new WAN
publisher to an existing WAN replication configuration. The limitation is that there must be an
existing WAN replication configuration but it can be empty, without any publishers (target clusters).
For instance, this is an example of an XML configuration to which you can dynamically add new
publishers:
<hazelcast>
...
<wan-replication name="wanReplication"></wan-replication>
<map name="my-map">
<wan-replication-ref name="wanReplication">
<merge-policy>com.hazelcast.spi.merge.PassThroughMergePolicy</merge-policy>
<republishing-enabled>false</republishing-enabled>
</wan-replication-ref>
</map>
...
</hazelcast>
Note that the map has defined WAN replication but there is no target cluster yet. You can then add
the new WAN replication publishers (target clusters) by performing an HTTP POST as shown below:
You can provide the full configuration as JSON as a parameter. Any WAN configuration supported
in the XML and programmatic configurations is also supported in this JSON format. Below are some
examples of JSON configuration for a WAN publisher using the Discovery SPI and static IP
configuration. Here are the integer values for initialPublisherState, queueFullBehavior and
consistencyCheckStrategy:
• initialPublisherState:
◦ 0: REPLICATING
◦ 1: PAUSED
◦ 2: STOPPED
• queueFullBehavior:
◦ 0: DISCARD_AFTER_MUTATION
◦ 1: THROW_EXCEPTION
◦ 2: THROW_EXCEPTION_ONLY_IF_REPLICATION_ACTIVE
• consistencyCheckStrategy:
◦ 0: NONE
◦ 1: MERKLE_TREES
{
"name":"wanReplication",
"publishers":[
{
"clusterName":"tokyo",
"queueCapacity":10000,
"queueFullBehavior":0,
"initialPublisherState":0,
"discovery":{
"nodeFilterClass":null,
"discoveryStrategy":[
{
"className":"com.hazelcast.aws.AwsDiscoveryStrategy",
"properties":{
"security-group-name":"hazelcast",
"tag-value":"cluster1",
"host-header":"ec2.amazonaws.com",
"tag-key":"aws-test-cluster",
"secret-key":"my-secret-key",
"iam-role":"s3access",
"access-key":"my-access-key",
"hz-port":"5701-5708",
"region":"us-west-1"
}
}
]
}
}
]
}
{
"name":"wanReplication",
"publishers":[
{
"clusterName":"tokyo",
"queueCapacity":1000,
"queueFullBehavior":0,
"initialPublisherState":0,
"aws":{
"enabled":true,
"usePublicIp":false,
"properties":{
"security-group-name":"hazelcast-sg",
"tag-value":"hz-nodes",
"host-header":"ec2.amazonaws.com",
"tag-key":"type",
"secret-key":"my-secret-key",
"iam-role":"dummy",
"access-key":"my-access-key",
"region":"us-west-1"
}
},
"sync":{
"consistencyCheckStrategy":0
}
}
]
}
{
"name":"wanReplication",
"publishers":[
{
"clusterName":"tokyo",
"queueCapacity":1000,
"queueFullBehavior":0,
"initialPublisherState":0,
"responseTimeoutMillis":5000,
"targetEndpoints":"10.3.5.1:5701, 10.3.5.2:5701",
"batchMaxDelayMillis":3000,
"batchSize":50,
"snapshotEnabled":false,
"acknowledgeType":1,
"sync":{
"consistencyCheckStrategy":0
}
}
]
}
Below is a JSON configuration with two publishers and several (disabled) discovery strategy
configurations:
{
"name":"wanReplication",
"publishers":[
{
"clusterName":"tokyo",
"queueCapacity":1000,
"queueFullBehavior":0,
"initialPublisherState":0,
"aws":{
"enabled":true,
"usePublicIp":false,
"properties":{
"security-group-name":"hazelcast-sg",
"tag-value":"hz-nodes",
"host-header":"ec2.amazonaws.com",
"tag-key":"type",
"secret-key":"my-secret-key",
"iam-role":"dummy",
"access-key":"my-access-key",
"region":"us-west-1"
}
},
"gcp":{
"enabled":false,
"usePublicIp":true,
"properties":{
"gcp-prop":"gcp-val"
}
},
"azure":{
"enabled":false,
"usePublicIp":true,
"properties":{
"azure-prop":"azure-val"
}
},
"kubernetes":{
"enabled":false,
"usePublicIp":true,
"properties":{
"k8s-prop":"k8s-val"
}
},
"eureka":{
"enabled":false,
"usePublicIp":true,
"properties":{
"eureka-prop":"eureka-val"
}
},
"discovery":{
"nodeFilterClass":null,
"discoveryStrategy":[
]
},
"sync":{
"consistencyCheckStrategy":0
}
},
{
"clusterName":"london",
"queueCapacity":1000,
"queueFullBehavior":0,
"initialPublisherState":0,
"responseTimeoutMillis":5000,
"targetEndpoints":"10.3.5.1:5701, 10.3.5.2:5701",
"batchMaxDelayMillis":3000,
"batchSize":50,
"snapshotEnabled":false,
"acknowledgeType":1,
"aws":{
"enabled":false,
"usePublicIp":false
},
"gcp":{
"enabled":false,
"usePublicIp":false
},
"azure":{
"enabled":false,
"usePublicIp":false
},
"kubernetes":{
"enabled":false,
"usePublicIp":false
},
"eureka":{
"enabled":false,
"usePublicIp":false
},
"discovery":{
"nodeFilterClass":null,
"discoveryStrategy":[
]
},
"sync":{
"consistencyCheckStrategy":1
}
}
]
}
The failure detection and recovery mechanisms in WAN handle failures during WAN replication
and they closely interact with the list of endpoints that WAN is replicating to. There might be some
small differences when using static endpoints or the discovery SPI but here we will outline the
general mechanism of failure detection and recovery.
The WAN connection manager maintains a list of public addresses that it can replicate to at any
moment. This list may change over time as failures are detected or as new addresses are discovered
when using the discovery SPI. The connection manager does not eagerly create connections to these
addresses as they are added to the list to avoid overloading the endpoint with connections from all
members using the same configuration. It tries to connect to the endpoint just before WAN events
are about to be transmitted. This means that if there are no updates on the map or cache using
WAN replication, there are no WAN events and the connection will not be established to the
endpoint.
When more than one endpoint is configured, traffic is load balanced between them using the
partition, so that the same partitions are always sent to the same target member, ensuring ordering
by partition.
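The idea of keeping a stable partition-to-endpoint assignment can be sketched as follows. The modulo mapping used here is an assumption chosen for illustration; the actual assignment used by the WAN connection manager is not specified in this section, and the class name is hypothetical.

```java
import java.util.List;

/** Hypothetical illustration of partition-based endpoint selection; not a Hazelcast class. */
final class PartitionEndpointSelector {

    /**
     * Maps a partition ID to one of the target endpoints. As long as the endpoint
     * list does not change, the same partition always maps to the same endpoint,
     * which preserves the order of events within a partition.
     */
    static String endpointFor(int partitionId, List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no target endpoints available");
        }
        return endpoints.get(Math.floorMod(partitionId, endpoints.size()));
    }
}
```

Note that in such a scheme the mapping shifts when an endpoint is added to or removed from the list, so the ordering guarantee only holds while the list is stable.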
WAN Failure Detection
If using the Hazelcast IMDG Enterprise edition class WanBatchReplication (see the Defining WAN
replication section), the WAN replication catches any exceptions when sending the WAN events to
the endpoint. In the case of an exception, the endpoint is removed from the endpoint list to which
WAN replicates and the WAN events are resent to a different address. The replication is retried
until it is successful.
The WAN connection manager tries to "rediscover" new endpoints periodically. The period is 10
seconds by default but can be configured using the discovery-period-seconds element (see the
Defining WAN replication section).
The discovered endpoints depend on the configuration used to define WAN replication. If using
static WAN endpoints (by using the target-endpoints element), the discovered endpoints are always
the same and are equal to the values defined in the configuration. If using discovery SPI with WAN,
the discovered endpoints may be different each time.
When the discovery returns a list of endpoints (addresses), the WAN target endpoint list is updated.
Newly discovered endpoints are added and endpoints which are no longer in the discovered list are
removed. Newly discovered endpoints may include addresses to which WAN replication has
previously failed. This means that once a new WAN event is about to be sent, a connection is
reestablished to the previously failed endpoint and WAN replication is retried. The endpoint can
later be again removed from the target endpoint list if the replication again encounters failure.
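The interplay between failure detection and rediscovery described above can be summarized in a small plain Java sketch. The class and method names are hypothetical and the sketch ignores connections, threading and timing; it only models how the target endpoint list changes.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the WAN target endpoint list; not a Hazelcast class. */
final class TargetEndpointListSketch {
    private final Set<String> endpoints = new LinkedHashSet<>();

    /** Failure detection: an endpoint that caused an exception is removed from the list. */
    void onSendFailure(String endpoint) {
        endpoints.remove(endpoint);
    }

    /**
     * Periodic rediscovery: endpoints no longer returned are dropped, and all
     * returned endpoints are (re)added, including previously failed ones.
     */
    void onDiscovery(Collection<String> discovered) {
        endpoints.retainAll(discovered);
        endpoints.addAll(discovered);
    }

    /** Returns a copy of the current target endpoint list. */
    Set<String> current() {
        return new LinkedHashSet<>(endpoints);
    }
}
```

With static target-endpoints, every discovery round returns the same configured addresses, so a failed endpoint is simply put back on the list at the next round.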
Starting with Hazelcast IMDG 3.12, we have redesigned the WAN replication mechanism to allow
tuning for lower latencies of replication and higher throughput. In most cases, WAN replication is
sufficient with out-of-the-box settings which cause WAN replication to replicate the map and cache
events with little overhead. However, there might be some use cases where the latency between a
map/cache mutation on one cluster and its visibility on the other cluster must be kept within some
bounds. To achieve such demands, you can first try tuning the WAN replication mechanism using
the following publisher elements:
• batch-size
• batch-max-delay-millis
• idle-min-park-ns
• idle-max-park-ns
To understand the implications of these elements, let’s first dive into how WAN replication works.
WAN replication runs in a separate thread and tries to send map and cache mutation events in
batches to the target endpoints for higher throughput. The target endpoints are usually members in
a target Hazelcast cluster but different WAN implementations may have different target endpoints.
The event batch is collected by iterating over the WAN queues for different partitions and for
different maps and caches. WAN replication tries to collect a batch of a size which can be
configured using the batch-size element.
If enough time has passed and the WAN replication thread hasn’t collected enough events to fill a
batch, it sends what it has collected nevertheless. This is controlled by the batch-max-delay-millis
element. Here, "enough time" means that more than the configured number of milliseconds
has passed since the time the last batch was sent to any target endpoint.
If there are no events in any of the WAN queues, the WAN replication thread goes into the idle state
by parking the WAN replication thread. The minimum park time can be defined using the idle-min-
park-ns element and the maximum park time can be controlled using the idle-max-park-ns element.
If a WAN event is enqueued while the WAN replication thread is in the idle state, the latency for
replication of that WAN event increases.
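To make the effect of the two park elements more concrete, the following plain Java sketch computes a park time that starts at the minimum and grows towards the maximum while the thread stays idle. The doubling step is an assumption made for illustration only; this section does not state how the park time progresses between the two bounds, and the class name is hypothetical.

```java
/** Hypothetical illustration of a bounded idle backoff; not a Hazelcast class. */
final class IdleParkSketch {

    /**
     * Returns the park time in nanoseconds for the given number of consecutive
     * idle rounds, starting at minParkNs and doubling each round (assumed growth
     * rule) until maxParkNs is reached.
     */
    static long parkNanos(long idleRounds, long minParkNs, long maxParkNs) {
        if (minParkNs <= 0 || maxParkNs < minParkNs) {
            throw new IllegalArgumentException("expected 0 < minParkNs <= maxParkNs");
        }
        long park = minParkNs;
        for (long i = 0; i < idleRounds && park < maxParkNs; i++) {
            // cap at the maximum and avoid overflow when doubling
            park = park > maxParkNs / 2 ? maxParkNs : park * 2;
        }
        return park;
    }
}
```

Under these assumptions, an event enqueued while the thread is parked waits at most the current park time before it is noticed, which is why lowering both bounds reduces latency at low mutation rates.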
An example WAN replication configuration using the default values of the above elements is shown
below.
<hazelcast>
...
<wan-replication name="my-wan-cluster-batch">
<batch-publisher>
<cluster-name>london</cluster-name>
<batch-size>500</batch-size>
<batch-max-delay-millis>1000</batch-max-delay-millis>
<idle-min-park-ns>10000000</idle-min-park-ns> <!-- 10 ms -->
<idle-max-park-ns>250000000</idle-max-park-ns> <!-- 250 ms -->
...
</batch-publisher>
</wan-replication>
...
</hazelcast>
We will now discuss tuning these elements. Unfortunately, the exact tuning parameters heavily
depend on the load, mutation rate, latency between the source and target clusters and even use
cases. We will thus discuss some general approaches and pointers.
When tuning for low latency, the first thing you might want to do is lower the idle-min-park-ns and
idle-max-park-ns element values. This will affect the latencies that you see when having a low
number of operations per second, since this is when the WAN replication thread will be mostly in
idle state. Try lowering both elements but keep in mind that the lower the element value, the more
time the WAN replication thread will spend consuming CPU in a quiescent state - when there is no
mutation on the maps or caches.
The next element you might lower is the batch-max-delay-millis. If you have a strict upper bound
on the latency for WAN replication, this element must be below that limit. Setting this value too low
might adversely affect the performance as well since then the WAN replication thread might be
sending smaller batches than what it would if the element was higher and it had waited for some
more time. You can even try setting this element to zero which instructs the WAN replication thread
to send batches as soon as it is able to collect any events; but keep in mind this will result in many
smaller batches instead of fewer, bigger event batches.
When tuning for lower latencies, configuring the batch-size usually has little effect, especially at
lower mutation rates. At a low number of operations per second, the event batches will usually be
very small since the WAN replication thread will not be able to collect the full batch and respect the
required latencies for replication. The batch-size element might have more effect at higher
mutation rates. Here, you will probably want to use bigger batches to avoid paying for the latencies
when sending lots of smaller batches, so try increasing the batch size and benchmarking at high
load.
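The interplay between batch-size and batch-max-delay-millis described above can be illustrated with a small model. The sketch below is not Hazelcast's implementation: the class BatchModel and its offer method are hypothetical, and the model only checks its limits when an event arrives, whereas a real replication thread would also wake up on its own. It merely mimics the rule "send when the batch is full or when the oldest pending event has waited for the maximum delay".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of size- and delay-bounded batching; not Hazelcast code. */
final class BatchModel {
    private final int batchSize;
    private final long maxDelayMillis;
    private final List<String> pending = new ArrayList<>();
    private long firstEventAtMillis = -1;

    BatchModel(int batchSize, long maxDelayMillis) {
        this.batchSize = batchSize;
        this.maxDelayMillis = maxDelayMillis;
    }

    /**
     * Offers an event observed at nowMillis. Returns the batch to send if the
     * batch became full or the oldest pending event has waited long enough;
     * otherwise returns an empty list.
     */
    List<String> offer(String event, long nowMillis) {
        if (pending.isEmpty()) {
            firstEventAtMillis = nowMillis;
        }
        pending.add(event);
        boolean full = pending.size() >= batchSize;
        boolean delayElapsed = nowMillis - firstEventAtMillis >= maxDelayMillis;
        if (full || delayElapsed) {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            return batch;
        }
        return new ArrayList<>();
    }
}
```

In this model, a maximum delay of zero makes every event leave in its own batch, while a larger delay lets the batch fill up at the cost of added latency.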
There are a couple of other configuration values that you might try changing but it depends on
your use case. The first one is adding a separate configuration for a WAN replication executor.
Collecting WAN event batches and processing the responses from the target endpoints are both done
on a shared executor. This executor is shared with other parts of the Hazelcast system, and
all of the WAN replication publishers use the same executor. In some cases, you might want to
create a dedicated executor for all WAN replication publishers. The name of this executor is hz:wan.
Below is an example of a concrete, dedicated executor for WAN replication. See the Configuring
Executor Service section for more information on the configuration options of the executor.
<hazelcast>
...
<executor-service name="hz:wan">
<pool-size>16</pool-size>
</executor-service>
...
</hazelcast>
The last two elements that you might want to change are acknowledge-type and
max-concurrent-invocations. Changing these elements allows you to get a greater throughput at the
expense of event ordering. This means that these elements may only be changed if your application can
tolerate WAN events being received out-of-order. For instance, if you are updating or removing the existing
map or cache entries, an out-of-order WAN event delivery would mean that the event for the entry
removal or update might be processed by the target cluster before the event is received to create
that entry. This does not cause exceptions but it causes the clusters to fall out-of-sync. In these
cases, you most probably will not be able to use these elements. On the other hand, if you are only
creating new, immutable entries (which are then removed by the expiration mechanism), you can
use these elements to achieve a greater throughput.
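The effect of out-of-order delivery on mutable entries can be shown with a minimal sketch. It does not use any Hazelcast API: the class OutOfOrderDemo and its textual event format are hypothetical, and a plain HashMap stands in for the target cluster's map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of event reordering; not Hazelcast code. */
final class OutOfOrderDemo {

    /** Applies "PUT key value" or "REMOVE key" events to an empty map in the given order. */
    static Map<String, String> apply(String... events) {
        Map<String, String> target = new HashMap<>();
        for (String event : events) {
            String[] parts = event.split(" ");
            if (parts[0].equals("PUT")) {
                target.put(parts[1], parts[2]);
            } else if (parts[0].equals("REMOVE")) {
                target.remove(parts[1]);
            }
        }
        return target;
    }
}
```

Applying the creation and then the removal of a key leaves the map empty, as on the source. Applying the same two events in the opposite order leaves the entry in place, so the two sides no longer agree even though no exception occurred.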
The acknowledge-type element controls at which time the target cluster will send a response for the
received WAN event batch. The default value is ACK_ON_OPERATION_COMPLETE which will ensure that
all events are processed before the response is sent to the source cluster. The value ACK_ON_RECEIPT
instructs the target cluster to send a response as soon as it has received the WAN event batch but
before it has been processed. This has two implications. One is that events can now be processed
out-of-order (see the previous paragraph) and the other is that the exceptions thrown on processing
the WAN event batch will not be received by the source cluster and the WAN event batch will not be
retried. As such, some events might get lost in case of errors and the clusters may fall out-of-sync.
WAN sync can help bring those clusters in-sync. The benefit of the ACK_ON_RECEIPT value is that now
the source cluster can send a new batch sooner, without waiting for the previous batch to be
processed fully.
Neither WAN synchronization strategy (the default one or the Delta WAN
Synchronization) synchronizes deletions, since deletions are not yet tracked
under WAN.
The max-concurrent-invocations element controls the maximum number of WAN event batches
being sent to the target cluster concurrently. Setting this element to anything less than 2 will only
allow a single batch of events to be sent to each target endpoint and will maintain causality of
events for a single partition (events are not received out-of-order). Setting this element to 2 or
higher will allow multiple batches of WAN events to be sent to each target endpoint. Since this
allows reordering of batches due to the network conditions, causality and ordering of events for a
single partition is lost and batches for a single partition are now sent randomly to any available
target endpoint. This, however, does present a faster WAN replication for certain scenarios such as
replicating immutable, independent map entries which are only added once and where ordering,
when these entries are added, is not necessary. Keep in mind that if you set this element to a value
which is less than the target endpoint count, you will lose performance as not all target endpoints
will be used at any point in time to process the WAN event batches. So, for instance, if you have a
target cluster with 3 members (target endpoints) and you want to use this element, it only makes
sense to set it to a value higher than 3. Otherwise, you can simply disable it by setting it to less than
2 in which case WAN will use the default replication strategy and adapt to the target endpoint count
while maintaining causality.
An example WAN replication configuration using the default values of the aforementioned
elements is shown below.
<hazelcast>
...
<wan-replication name="my-wan-cluster-batch">
<batch-publisher>
<cluster-name>london</cluster-name>
<acknowledge-type>ACK_ON_OPERATION_COMPLETE</acknowledge-type>
<max-concurrent-invocations>-1</max-concurrent-invocations>
...
</batch-publisher>
</wan-replication>
...
</hazelcast>
Finally, as we’ve mentioned, the exact values which will give you the optimal performance depend
on your environment and use case. Please benchmark and try out different values to find out the
right values for you.
Each cluster in WAN topology has to have a unique cluster-name property for a proper handling of
forwarded events.
Starting with Hazelcast 3.6, WAN replication backs up its event queues to other members to prevent
event loss in case of member failures. WAN replication’s backup mechanism depends on the related
data structures' backup operations. Note that WAN replication is supported for IMap and ICache.
That means, as long as you set a backup count for your IMap or ICache instances, WAN replication
events generated by these instances are also replicated.
Besides the default WAN synchronization, Hazelcast provides Delta WAN Synchronization, which
uses a Merkle tree for the same purpose. A Merkle tree is a data structure used for efficient
comparison of the contents of large data structures. The precision of this comparison is defined by
the Merkle tree’s depth. Merkle tree hash exchanges can detect inconsistencies in the map data and
synchronize only the different entries when using WAN synchronization, instead of sending all the
map entries.
24.2.1. Requirements
1. Configure the WAN synchronization mechanism for your WAN publisher so that it uses the
Merkle tree: If configuring declaratively, you can use the consistency-check-strategy sub-
element of the wan-sync element. If configuring programmatically, you can use the setter of the
WanSyncConfig object.
2. Bind that WAN synchronization configuration to the data structure (currently IMap): Simply set
the WAN replication reference of your map to the name of the WAN replication configuration
which uses the Merkle tree.
Following is a declarative configuration example of the above:
<hazelcast>
...
<map name="myMap">
<wan-replication-ref name="wanReplicationScheme">
</wan-replication-ref>
</map>
<wan-replication name="wanReplicationScheme">
<batch-publisher>
<cluster-name>clusterName</cluster-name>
<wan-sync>
<consistency-check-strategy>MERKLE_TREES</consistency-check-strategy>
</wan-sync>
</batch-publisher>
</wan-replication>
...
</hazelcast>
Here, the element consistency-check-strategy sets the strategy for checking the consistency of data
between the source and target clusters. You must initiate the WAN synchronization (via
Management Center or REST API as explained in Synchronizing WAN clusters) to let this strategy
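reconcile the inconsistencies. The element consistency-check-strategy currently has two values:
• NONE: Means that there are no consistency checks. This is the default value.
• MERKLE_TREES: Means that WAN synchronization uses Merkle trees for the consistency checks, as in
the example above.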
You can configure Delta WAN Synchronization declaratively using the merkle-tree element or
programmatically using the MerkleTreeConfig object.
<hazelcast>
...
<map name="someMap">
<merkle-tree enabled="true">
<depth>5</depth>
</merkle-tree>
</map>
...
</hazelcast>
• enabled: Specifies whether the Merkle tree structure is enabled. Its default value is true.
• mapName: Specifies the name of the map for which the Merkle tree structure is used.
• depth: Specifies the depth of Merkle tree. Valid values are between 2 and 27 (exclusive). Its
default value is 10.
◦ A larger depth means that a data synchronization mechanism is able to pinpoint a smaller
subset of the data structure (e.g., IMap) contents in which a change has occurred. This
causes the synchronization mechanism to be more efficient. However, keep in mind that a
large depth means that the Merkle tree will consume more memory. As the comparison
mechanism is iterative, a larger depth also prolongs the comparison duration. Therefore, it
is recommended not to have large tree depths if the latency of the comparison operation is
high.
◦ A smaller depth means that the Merkle tree is shallower and the data synchronization
mechanism transfers larger chunks of the data structure (e.g., IMap) in which a possible
change has happened. As you can imagine, a shallower Merkle tree will consume less
memory.
<hazelcast>
...
<map name="myMap">
<wan-replication-ref name="wanReplicationScheme">
...
</wan-replication-ref>
<merkle-tree enabled="true">
<depth>10</depth>
</merkle-tree>
</map>
<wan-replication name="wanReplicationScheme">
<batch-publisher>
<cluster-name>clusterName</cluster-name>
<wan-sync>
<consistency-check-strategy>MERKLE_TREES</consistency-check-strategy>
</wan-sync>
</batch-publisher>
</wan-replication>
...
</hazelcast>
Merkle trees are created for each partition holding IMap data. Therefore,
increasing the partition count also increases the efficiency of the Delta WAN
Synchronization.
1. Consistency check: Process of exchanging and comparing the hashes stored in the Merkle tree
structures in the source and target clusters. The check starts with the root node and continues
recursively with the children that have different hash codes. Both sides send the children of the
nodes that the other side sent, hence the comparison is done in depth/2 steps. After this check,
the tree leaves holding different entries are identified.
2. Synchronization: Process of transferring the entries belonging to the leaves identified by the
consistency check from the source to the target cluster. On the target cluster, the configured merge
policy is applied for each entry that is in both the source and target clusters.
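The consistency check can be sketched locally with a simplified Merkle tree. The code below is an illustration under stated assumptions, not Hazelcast's implementation: the class MerkleSketch, its hash functions, and the way keys are assigned to leaves are all hypothetical, both trees are assumed to have the same depth, and the network exchange between the clusters is left out. It only shows how comparing hashes from the root downwards narrows the difference to individual leaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified Merkle tree; not Hazelcast's implementation. */
final class MerkleSketch {
    final int[] nodes;     // heap layout: children of node i are 2i+1 and 2i+2
    final int leafOffset;  // index of the first leaf in the array

    /** Builds a tree with the given number of levels (root included) over the entries. */
    MerkleSketch(int depth, Map<String, String> entries) {
        int leafCount = 1 << (depth - 1);
        this.leafOffset = leafCount - 1;
        this.nodes = new int[2 * leafCount - 1];
        for (Map.Entry<String, String> e : entries.entrySet()) {
            int leaf = Math.floorMod(e.getKey().hashCode(), leafCount);
            // Order-independent leaf hash: the sum of the entry hashes.
            nodes[leafOffset + leaf] += 31 * e.getKey().hashCode() + e.getValue().hashCode();
        }
        // Each inner node hashes its two children, bottom-up.
        for (int i = leafOffset - 1; i >= 0; i--) {
            nodes[i] = 31 * nodes[2 * i + 1] + nodes[2 * i + 2];
        }
    }

    /** Returns the indices of the leaves whose hashes differ between the two trees. */
    static List<Integer> differingLeaves(MerkleSketch a, MerkleSketch b) {
        List<Integer> result = new ArrayList<>();
        compare(a, b, 0, result);
        return result;
    }

    private static void compare(MerkleSketch a, MerkleSketch b, int node, List<Integer> out) {
        if (a.nodes[node] == b.nodes[node]) {
            return; // identical subtree, nothing to descend into
        }
        if (node >= a.leafOffset) {
            out.add(node - a.leafOffset);
            return;
        }
        compare(a, b, 2 * node + 1, out);
        compare(a, b, 2 * node + 2, out);
    }
}
```

Only the entries that belong to the returned leaves would then need to be transferred in the synchronization phase. A deeper tree has more leaves, so each differing leaf covers fewer entries.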
If you only need the differences between the clusters, you can trigger the
consistency check without performing synchronization.
Since Merkle trees are built for each partition and each map, the memory overhead of the trees
with high entry count and deep trees can be significant. The trees are maintained on-heap,
therefore, besides the memory consumption, garbage collection could be another concern. Make
sure the configuration is tested with a realistic data size before it is deployed to production.
The table below shows a few examples for what the memory overhead could be.
The efficiency of the Delta WAN Synchronization (WAN synchronization based on Merkle trees) is
determined by the average number of entries per tree leaf, which is proportional to the
number of entries in the map. The bigger this average, the more entries are synchronized for
the same difference. Raising the depth decreases this average at the cost of increasing the memory
overhead.
In order to specify the tree depth, a trade-off between memory consumption and effectiveness
might be needed.
Even if the map is huge and the Merkle trees are configured to be relatively shallow, the Merkle
tree based synchronization may be leveraged if only a small subset of the whole map is expected to
be synchronized. The table below illustrates the efficiency of the Merkle tree based synchronization
compared to the default synchronization mechanism.
The Difference count column shows the number of the entries different in the source and the target
clusters. This is the minimum number of the entries that need to be synchronized to make the
clusters consistent. The Entries synced column shows how many entries are synchronized in the
given case, calculated as Entries synced = Difference count * Avg entries / leaf.
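The formula above can be turned into a small calculation. The sketch below is hypothetical and rests on two assumptions that are not stated in this section: that a tree of a given depth has 2^(depth-1) leaves, and that the entries are spread evenly over all partitions and leaves. The class and method names are made up for illustration.

```java
/** Hypothetical helper for estimating Delta WAN Synchronization efficiency. */
final class MerkleEfficiency {

    /**
     * Average entries per leaf, assuming 2^(depth-1) leaves per partition tree
     * and a perfectly even distribution of the entries.
     */
    static double avgEntriesPerLeaf(long entryCount, int partitionCount, int depth) {
        long leavesPerTree = 1L << (depth - 1);
        return (double) entryCount / (partitionCount * leavesPerTree);
    }

    /** Entries synced = Difference count * Avg entries / leaf, as given in the text. */
    static double entriesSynced(long differenceCount, double avgEntriesPerLeaf) {
        return differenceCount * avgEntriesPerLeaf;
    }
}
```

Under these assumptions, a map with 1,000,000 entries on the default 271 partitions and a depth of 8 averages a little under 29 entries per leaf, so 100 differing entries would cause roughly 2,900 entries to be synchronized instead of the whole map.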
As shown in the last two rows, the Merkle tree based synchronization transfers significantly fewer
entries than the default mechanism does, even with trees of depth 8. The efficiency with depth 12
is even better but consumes much more memory.
The average entries per leaf number above assumes perfect distribution of the
entries amongst the leaves. Since this is typically not true in real-life scenarios the
efficiency can be slightly worse. The statistics section below describes how to get
the actual average for the leaves involved in the synchronization.
To be able to use the REST calls related to synchronization, you need to enable the
WAN REST endpoint group. See the Using the REST Endpoint Groups section for
details.
The two phases of the Merkle tree based synchronization can be triggered by a REST call, as with
the default synchronization.
The URL for the consistency check REST call:
http://member_ip:port/hazelcast/rest/wan/consistencyCheck/map
The URL for the synchronization REST call, which is the same as for the default synchronization:
http://member_ip:port/hazelcast/rest/wan/sync/map
You need to add URL-encoded parameters to the request in both cases, in the following order and
separated by "&":
• Cluster name
• Cluster password
You can also use the following URL in your REST call if you want to synchronize all
the maps in source and target cluster:
http://member_ip:port/hazelcast/rest/wan/sync/allmaps. You need to add the
following URL-encoded parameters to the request, in the following order and separated
by "&":
• Cluster name
• Cluster password
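Building the request body for these calls amounts to URL-encoding each parameter value and joining the results with "&" in the required order. The helper below is a hypothetical sketch that uses only the Java standard library; the class name WanRestParams is made up, and the values passed to it are placeholders. Which parameters a given endpoint expects, and in which order, must be taken from the lists in this section.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

/** Hypothetical helper for building the "&"-separated, URL-encoded request body. */
final class WanRestParams {

    /** URL-encodes each value and joins them with "&", preserving the given order. */
    static String encode(String... values) {
        StringJoiner joiner = new StringJoiner("&");
        for (String value : values) {
            joiner.add(URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }
}
```

The resulting string can be sent as the body of the request with any HTTP client. Encoding each value separately keeps characters such as "&" or spaces inside a password from being mistaken for separators.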
24.2.8. Statistics
The consistency check and the synchronization both write statistics into the diagnostics subsystem
and send them to the Management Center. The following reported fields can be used to reason about
the efficiency of the configuration.
Consistency check reports the number of the:
• average number of the entries per tree leaves in the synchronized leaves.
25. OSGI
This chapter explains how Hazelcast is supported on OSGI (Open Service Gateway Initiatives)
environments.
Created Hazelcast instances can be served as an OSGI service that the other Hazelcast bundles can
access. Registering created Hazelcast instances behavior is enabled by default; you can disable it
using the property hazelcast.osgi.register.disabled.
Each Hazelcast bundle provides a different OSGI service. Their instances can be grouped (clustered)
together to prevent possible compatibility issues between different Hazelcast versions/bundles.
This grouping behavior is enabled by default; you can disable it using the property
hazelcast.osgi.grouping.disabled.
Hazelcast OSGI service’s lifecycle (and the owned/created instances’ lifecycles) is the same as that
of the owner Hazelcast bundle. When the bundle is stopped (deactivated), the owned service and
Hazelcast instances are also deactivated/shut down and deregistered automatically. When the
bundle is re-activated, its service is registered again.
The Hazelcast IMDG Enterprise JAR package is also an OSGI bundle like the Hazelcast Open Source
JAR package.
25.2. API
HazelcastOSGiService: Contract point for Hazelcast services on top of OSGI. Registered to
org.osgi.framework.BundleContext as the OSGI service so the other bundles can access and use
Hazelcast on the OSGI environment through this service.
HazelcastOSGiInstance: Contract point for HazelcastInstance implementations based on OSGI
service. HazelcastOSGiService provides proxy Hazelcast instances typed HazelcastOSGiInstance
which is a subtype of HazelcastInstance and these instances delegate all calls to the underlying
HazelcastInstance.
25.4. Design
HazelcastOSGiService is specific to each Hazelcast bundle. This means that every Hazelcast bundle
has its own HazelcastOSGiService instance.
Every Hazelcast bundle registers its HazelcastOSGiService instances via Hazelcast Bundle Activator
(com.hazelcast.osgi.impl.Activator) while they are being started, and it deregisters its
HazelcastOSGiService instances while they are being stopped.
The HazelcastOSGiService instance lifecycle is the same as that of the owner Hazelcast bundle. This
means that when the owner bundle is deactivated, the owned HazelcastOSGiService instance is deactivated,
and all active Hazelcast instances that are created and served by that HazelcastOSGiService instance
are also shut down and deregistered. When the Hazelcast bundle is re-activated, its
HazelcastOSGiService instance is registered again as the OSGI service.
25.5. Using Hazelcast OSGI Service
25.5.1. Getting Hazelcast OSGI Service Instances
You can use a HazelcastOSGiService instance to create and shut down Hazelcast instances on OSGI
environments. The created Hazelcast instances are of type HazelcastOSGiInstance (which is a subtype
of HazelcastInstance) and are just proxies to the underlying Hazelcast instance. There are several
methods in HazelcastOSGiService to use Hazelcast instances on OSGI environments as shown below.
// Get the default Hazelcast instance owned by `hazelcastOsgiService`
// Returns null if `HAZELCAST_OSGI_START` is not enabled
HazelcastOSGiInstance defaultInstance = hazelcastOsgiService
.getDefaultHazelcastInstance();
// Gets the Hazelcast instance with the name `OSGI-Instance`, which is `newInstance2`
// created above
HazelcastOSGiInstance instance = hazelcastOsgiService.getHazelcastInstanceByName(
"OSGI-Instance");
26.1. OperationParker
OperationParker is an interface offered by SPI for the objects, such as Lock and Semaphore, to be
used when a thread needs to wait for a lock to be released.
• it looks for a waiter
• if the waiter responds no, the service executes its registered operation (operation itself knows
where to send a response)
Each waiter can sit on a wait-notify queue for, at most, its operation’s call timeout. For example, by
default, each waiter can wait here for at most 1 minute. A continuous task scans expired/timed-out
waiters and invalidates them with CallTimeoutException. Each waiter on the remote side should
retry and keep waiting if it still wants to wait. This is a liveness check for remote waiters.
This way, it is possible to distinguish an unresponsive member and a long (~infinite) wait. On the
caller side, if the waiting thread does not get a response for either a call timeout or for more than 2
times the call-timeout, it will exit with OperationTimeoutException.
Note that this behavior breaks the fairness. Hazelcast does not support fairness for any of the data
structures with blocking operations, such as Lock and Semaphore.
Since there is an ever growing number of public and private cloud environments, as well as
numerous Service Discovery systems in the wild, Hazelcast provides cloud or service discovery
vendors with the option to implement their own discovery strategy.
Over the course of this section, we will build a simple discovery strategy based on the /etc/hosts
file.
The Hazelcast Discovery SPI (Member Discovery Extensions) consists of multiple interfaces and
abstract classes. In the following subsections, we will have a quick look at all of them and briefly
introduce the idea and usage behind them. The example will follow in the next section, Discovery
Strategy.
DiscoveryStrategy: Implement
DiscoveryStrategys can also do automatic registration / de-registration on service discovery systems
if necessary. You can use the provided DiscoveryNode that is passed to the factory method to retrieve
local addresses and ports, as well as metadata.
You can configure com.hazelcast.spi.discovery.NodeFilter before startup and you can implement
logic to do additional filtering of members. This might be necessary if query languages for
discovery strategies are not expressive enough to describe members or to overcome inefficiencies
of strategy implementations.
The DiscoveryStrategy vendor does not need to take possibly configured filters into
account as their use is transparent to the strategies.
or frameworks. Certain needs might be necessary as part of the classloading or Java Service
Provider Interface lookup.
This subsection walks through the implementation of a simple DiscoveryStrategy and its necessary
setup.
The example strategy uses the local /etc/hosts file (on Windows it uses the equivalent of the *nix
hosts file, named %SystemRoot%\system32\drivers\etc\hosts) to look up IP addresses of different hosts.
The strategy implementation expects hosts to be configured with hostname sub-groups under the
same domain. So much for the theory; let’s get into it.
As a first step we do some basic configuration setup. We want the user to be able to configure the
site domain for the discovery inside the hosts file, therefore we define a configuration property
called site-domain. The configuration is not optional: you need to configure it before the creation of
the HazelcastInstance, either via Hazelcast’s declarative or programmatic configuration.
It is recommended that you keep all defined properties in a separate configuration class as public
constants (public static final) with sufficient documentation. This allows users to easily look up
possible configuration values.
public final class HostsDiscoveryConfiguration {
private HostsDiscoveryConfiguration() {
}
}
An additional ValueValidator could be passed to the definition to make sure the configured value
looks like a domain or has a special format.
Creating Discovery
As the second step we create the very simple DiscoveryStrategyFactory implementation class. To
keep things clear we are going to name the discovery strategy after its purpose: looking into the
hosts file.
@Override
public Class<? extends DiscoveryStrategy> getDiscoveryStrategyType() {
return HostsDiscoveryStrategy.class;
}
@Override
public DiscoveryStrategy newDiscoveryStrategy(DiscoveryNode discoveryNode, ILogger
logger, Map<String, Comparable> properties) {
return new HostsDiscoveryStrategy(logger, properties);
}
@Override
public Collection<PropertyDefinition> getConfigurationProperties() {
return PROPERTIES;
}
}
This factory now defines properties known to the discovery strategy implementation and provides
a clean way to instantiate it. While creating the HostsDiscoveryStrategy we ignore the passed
DiscoveryNode since this strategy does not support automatic registration of new members. In cases
where the strategy does not support registration, the environment has to handle this in some
provided way.
Remember that, when created on a Hazelcast client, the provided DiscoveryNode is
null, as there is no local member in existence.
com.hazelcast.examples.spi.discovery.HostsDiscoveryStrategyFactory
If our JAR file contains multiple factories, each consecutive line can define another full canonical
DiscoveryStrategyFactory implementation class name.
Now comes the interesting part. We are going to implement the discovery itself. The previous parts
are normally pretty similar for all strategies, aside from the configuration properties themselves.
However, implementing the discovery heavily depends on the way the strategy has to come up with
IP addresses of other Hazelcast members.
@Override
public Iterable<DiscoveryNode> discoverNodes() {
List<String> assignments = filterHosts();
return mapToDiscoveryNodes(assignments);
}
String os = System.getProperty("os.name");
String hostsPath;
if (os.contains("Windows")) {
hostsPath = HOSTS_WINDOWS;
} else {
hostsPath = HOSTS_NIX;
}
if (matchesDomain(line)) {
assignments.add(line);
}
}
return assignments;
}
String line;
while ((line = reader.readLine()) != null) {
line = line.trim();
if (!line.startsWith("#")) {
lines.add(line.trim());
}
}
return lines;
} catch (IOException e) {
throw new RuntimeException("Could not read hosts file", e);
}
}
So far our implementation retrieves the configuration property for the site-domain. Our
implementation offers the option to override the value from the configuration (declarative or
programmatic) right from the system environment or JVM properties. That can be useful when the
hazelcast.xml defines a setup for a developer system (like cluster.local) and operations wants to
override it for the real deployment. By providing a prefix (in this case discovery.hosts) we created
an external property named discovery.hosts.site-domain which can be set as an environment
variable or passed as a JVM property from the startup script.
The following list shows the lookup priority, from top (highest) to bottom:
• System environment
• Configuration properties
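The fallback between the two sources listed above can be sketched as follows. This is a hypothetical illustration that does not use the Hazelcast API: the class PropertyLookup and its resolve method are made up, and the environment is passed in as a map (for example System.getenv()) to keep the example self-contained. A JVM property lookup via System.getProperty could be chained in the same way; where it sits in the priority order is not shown in the list above, so it is left out here.

```java
import java.util.Map;

/** Hypothetical sketch of an external override with a fallback to the configuration. */
final class PropertyLookup {

    /**
     * Looks up "prefix.key" in the environment first and falls back to the
     * configured value of "key". Returns null if neither source has a value.
     */
    static String resolve(String prefix, String key,
                          Map<String, String> environment,
                          Map<String, String> configured) {
        String externalName = prefix + "." + key;
        String fromEnvironment = environment.get(externalName);
        if (fromEnvironment != null) {
            return fromEnvironment;
        }
        return configured.get(key);
    }
}
```

With the prefix discovery.hosts and the key site-domain, an environment value for discovery.hosts.site-domain wins over the site-domain value from the declarative or programmatic configuration.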
Implementing Lookup
Since we have the value for our property now, we can implement the actual lookup and mapping as
already prepared in the discoverNodes method. The following part is very specific to this special
discovery strategy; for completeness we are showing it anyway.
String hostsPath;
if ( os.contains( "Windows" ) ) {
hostsPath = HOSTS_WINDOWS;
} else {
hostsPath = HOSTS_NIX;
}
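The filtering step of the lookup can be sketched independently of Hazelcast. The code below is a hypothetical, standalone illustration: the class HostsFilterSketch is made up, the hosts file content is passed in as a list of lines, and the matching rule (a host name ending with "." followed by the site domain) is an assumption, since the matchesDomain method is not shown in this section.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of filtering hosts file lines by site domain. */
final class HostsFilterSketch {

    /** Keeps the non-comment lines that name at least one host in the site domain. */
    static List<String> filterHosts(List<String> lines, String siteDomain) {
        List<String> assignments = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] tokens = line.split("\\s+");
            // tokens[0] is the IP address; the remaining tokens are host names
            for (int i = 1; i < tokens.length; i++) {
                if (tokens[i].startsWith("#")) {
                    break; // the rest of the line is a trailing comment
                }
                if (tokens[i].endsWith("." + siteDomain)) {
                    assignments.add(line);
                    break;
                }
            }
        }
        return assignments;
    }
}
```

Each returned line still contains the IP address as its first token, which is what the mapping step needs in order to build the member addresses.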
Mapping to DiscoveryNode
After we have collected the address assignments configured in the hosts file, we can go to the final
step and map those to the DiscoveryNodes to return them from our strategy.
private Iterable<DiscoveryNode> mapToDiscoveryNodes( List<String> assignments ) {
Collection<DiscoveryNode> discoveredNodes = new ArrayList<DiscoveryNode>();
With that mapping, we now have a full discovery, executed whenever Hazelcast asks for IPs. So
why don’t we read them in once and cache them? The answer is simple: it might happen that
members go down or come up over time. Since we expect the hosts file to be injected into the
running container, it also might change over time. We want to get the latest available members,
therefore we read the file on request.
Configuring DiscoveryStrategy
To actually use the new DiscoveryStrategy implementation we need to configure it like in the
following example:
<hazelcast>
...
<!-- activate Discovery SPI -->
<properties>
<property name="hazelcast.discovery.enabled">true</property>
</properties>
<network>
<join>
<!-- deactivating other discoveries -->
<multicast enabled="false"/>
<tcp-ip enabled="false" />
<aws enabled="false"/>
To find out further details, please have a look at the Discovery SPI Javadoc.
While the DiscoveryStrategy is meant for cloud vendors or implementors of service discovery
systems, the DiscoveryService is meant for integrators. In this case, integrators mean people
integrating Hazelcast into their own systems or frameworks. In those situations, there may be
special requirements on how to look up framework services like the discovery strategies or similar
services. Integrators can extend or implement their own DiscoveryService and
DiscoveryServiceProvider and inject them using the com.hazelcast.config.DiscoveryConfig
configuration API prior to instantiating the HazelcastInstance. In any case, integrators might have
to remember that a DiscoveryService might have to change behavior based on the runtime
environment (Hazelcast member or client) and then the DiscoveryServiceSettings should provide
information about the started HazelcastInstance.
Since the implementation heavily depends on one’s needs, there is no reason to provide an example
of how to implement your own DiscoveryService. However, Hazelcast provides a default
implementation which can be a good example to get started. This default implementation is
com.hazelcast.spi.discovery.impl.DefaultDiscoveryService.
26.3. Config Properties SPI
The Config Properties SPI offers an easy way to configure SPI plugins using a prebuilt system
of automatic conversion and validation.
The Config Properties SPI consists of a small set of classes and provided implementations.
You can mark properties as optional and you can have an additional validation step to make sure
the provided value matches certain rules (for example, port numbers must be between 0 and 65535).
• String
• Short
• Integer
• Long
• Float
• Double
• Boolean
26.3.2. Config Properties SPI Example
This sub-section shows a quick example of how to set up, configure and use the Config Properties
SPI.
We defined a property named my-key of type string. If none of the predefined
TypeConverters matches the need, users are free to provide their own implementation.
In any case, both options can be used interchangeably; however, the latter version is
recommended by Hazelcast for schema applicability.
To eventually retrieve a value, use the PropertyDefinition to get and convert the value
automatically.
27. Hazelcast Plugins
This chapter describes the plugins with which you can extend Hazelcast IMDG’s functionality.
Apache jclouds® is an open source multi-cloud library for the Java platform which lets you create
applications that are portable across clouds and gives you the full control to use cloud-specific
features. Hazelcast members and native clients support Apache jclouds® for discovery.
You can configure your cluster to use jclouds® discovery by adding hazelcast-jclouds.jar
dependency to your project and enabling Hazelcast’s Discovery SPI. Since jclouds® depends on
various libraries, you also need to configure its dependencies using build automation tools like
Maven. Note that you can also define multiple regions in your jclouds® configuration; the members
can find each other over a different region.
You can easily configure your cluster to use EC2 discovery by adding hazelcast-aws.jar dependency
to your project and enabling Hazelcast’s Discovery SPI. This plugin does not depend on any other
third party modules. Note that this plugin puts the zone information into the Hazelcast’s member
attributes map during the discovery process; you can use its ZONE_AWARE configuration to create
backups in other Availability Zones (AZ). Each zone is accepted as one partition group. Note that,
when using the ZONE_AWARE partition grouping, a Hazelcast cluster spanning multiple AZs should
have an equal number of members in each AZ. Otherwise, it results in an uneven partition
distribution among the members.
You can easily configure Hazelcast members discovery, WAN replication and Hazelcast clients to
work seamlessly on the native GCP VM Instances. This plugin supports ZONE_AWARE configuration to
create backups in separate zones to prevent data loss in the case of a zone outage. This plugin also
supports discovering a Hazelcast cluster deployed on GCP by the Hazelcast client running outside of
the GCP infrastructure.
Microsoft Azure is a cloud computing service provided by Microsoft for managing applications
through a global network of Microsoft-managed data centers. Hazelcast Azure plugin provides a
discovery strategy for Hazelcast enabled applications running on Microsoft Azure. It provides all
Hazelcast instances by returning VMs within your Azure resource group that are tagged with a
specified value.
To use this plugin in your Java project, simply add the Azure dependency to your Maven or Gradle
configurations and enable Hazelcast’s Discovery SPI. Then you need to configure a couple of
properties at both Hazelcast and Azure sides.
Consul is a distributed service mesh to connect, secure and configure services across any public or
private cloud platforms. This plugin provides a Consul based discovery strategy for Hazelcast
clusters.
You can add the Consul dependency to your Maven or Gradle configurations and enable Hazelcast’s
Discovery SPI to use this plugin. You can then start Consul in your network and set the Consul
related properties in your Hazelcast configuration.
etcd is an open-source distributed key value store that provides shared configuration and service
discovery for Container Linux clusters. This plugin enables the Hazelcast members to dynamically
discover each other through etcd.
Add the etcd dependency to either your Maven or Gradle configurations and enable Hazelcast’s
Discovery SPI. Then start etcd in your network and set the etcd related properties (such as username,
password and registrator) in your Hazelcast configuration.
Eureka is a REST based service that is primarily used in the AWS cloud for load balancing and
failover of middle-tier servers. Hazelcast supports Eureka V1 discovery.
To use this plugin, add the hazelcast-eureka-one.jar dependency to your project and enable
Hazelcast’s Discovery SPI. You also need to specify the Eureka properties file.
See Hazelcast Eureka plugin’s documentation for more information.
Pivotal Cloud Foundry (PCF) is an open source cloud platform on which you can build, deploy, run
and scale applications. You can deploy your Hazelcast IMDG Enterprise clusters on PCF using
clickable tiles.
After you install and configure Hazelcast IMDG Enterprise, you can create services, and configure
WAN replications, user code deployments and TLS.
See Hazelcast IMDG Enterprise for PCF documentation for more information.
OpenShift is an open source container application platform by Red Hat based on top of Docker
containers and the Kubernetes container cluster manager for application development and
deployment. Hazelcast can run inside OpenShift.
You can use Kubernetes for discovery of Hazelcast members. By using Hazelcast Docker images,
templates and default configuration files, you can deploy Hazelcast IMDG, Hazelcast IMDG
Enterprise and Management Center onto OpenShift.
See Hazelcast IMDG for OpenShift documentation and Hazelcast Management Center for OpenShift
documentation for more information.
Heroku is a cloud platform as a service supporting several programming languages so that you can
build, run and operate applications entirely in the cloud. This plugin offers a discovery strategy that
looks for IP addresses of members by resolving service names against the Heroku DNS Discovery in
Heroku Private Spaces.
You can use this plugin by adding the hazelcast-heroku dependency to your Maven or Gradle
configurations and enabling Hazelcast’s Discovery SPI. By default, no configuration is needed,
but you can configure the service names or the initial run delay for the merge after a Split-Brain.
Kubernetes is an open source container orchestration system to automate deployment, scaling and
management of containerized applications. This plugin looks up the IP addresses of Hazelcast
members by resolving the requests against a Kubernetes Service Discovery system. It supports two
different options of resolving against the discovery registry: a request to the REST API and DNS
lookup against a given DNS service name.
To use this plugin, add the hazelcast-kubernetes dependency to your Maven or Gradle
configurations and enable Hazelcast’s Discovery SPI. You need to configure Hazelcast according to
the option you want the plugin to use, i.e., REST API or DNS lookup.
See Hazelcast Kubernetes plugin’s documentation for more information.
To use this plugin, add the Curator dependencies to your Maven or Gradle configurations and
enable Hazelcast’s Discovery SPI. Then configure properties such as the URL of the
ZooKeeper server and the cluster ID.
Spring Data provides a consistent, Spring-based programming model for data access while
preserving the features of the underlying data store. This plugin provides Spring Data repository
support for Hazelcast IMDG. This integration enables the Spring Data paradigm to gain the power
of a distributed data repository.
To use this plugin, add the Spring Data dependency to your Maven or Gradle configurations and
specify the base packages and repositories.
This plugin provides Spring Integration extensions for Hazelcast. These extensions include, but are
not limited to, the following:
• Event-driven inbound channel adapter: Listens to the related Hazelcast data structure events and
sends event messages to the defined channel.
• Continuous query inbound channel adapter: Listens to the modifications performed on specific
map entries.
• Cluster monitor inbound channel adapter: Listens to the modifications performed on the cluster.
• Distributed SQL inbound channel adapter: Runs the defined distributed SQL and returns the
results according to the iteration type.
• Outbound channel adapter: Listens to the defined channel and writes the incoming messages to
the related distributed data structure.
• Leader election: Elects a cluster member, for example, for highly available message consumer
where only one member should receive messages.
See Spring Integration Extension for Hazelcast documentation for more information.
Hazelcast JCA Resource Adapter is a system-level software driver which can be used by a Java
application to connect to a Hazelcast cluster. Using this adapter, you can integrate Hazelcast into
Java EE containers. After a proper configuration, Hazelcast can participate in standard Java EE
transactions.
Deploying and configuring the Hazelcast JCA Resource Adapter is no different from configuring
any other resource adapter since it is a standard JCA one. However, resource adapter installation
and configuration is container-specific, so you need to consult your Java EE vendor
documentation for details.
See Hazelcast JCA Resource Adapter documentation for information on configuring the resource
adapter, Glassfish applications and JBoss web applications.
loader.override=com.hazelcast
Grails is an open source web application framework that uses the Apache Groovy programming
language. This plugin integrates Hazelcast data distribution framework into your Grails application.
You can reach the distributed data structures by injecting the HazelService. Also you can cache your
domain class into Hazelcast distributed cache.
See Hazelcast Grails plugin’s documentation and this blogpost for more information.
Hibernate is an object-relational mapping tool for the Java programming language. It provides a
framework for mapping an object-oriented domain model to a relational database and enables
developers to more easily write applications whose data outlives the application process. This
plugin provides Hazelcast’s own distributed second level cache implementation for your Hibernate
(versions 3, 4 and 5) entities, collections and queries.
To use this plugin, add the Hazelcast Hibernate dependency into your classpath depending on your
Hibernate version. Then you need to specify various properties in your Hibernate configuration
such as the RegionFactory and query cache properties.
See the documentation of this plugin for Hibernate 3.x, 4.x and for Hibernate 5.x.
27.2.6. Hazelcast DynaCache
DynaCache by IBM is used to store objects, and later, based on some data matching rules, to retrieve
those objects and serve them from its cache. This plugin is for Liberty Profile which is a lightweight
profile of IBM WebSphere Application Server.
In the Liberty Profile, you can use a dynamic cache engine in order to cache your data. With this
plugin, you can use Hazelcast as a cache provider.
This plugin allows you to write events from Kafka to Hazelcast. It takes the value from the Kafka
Connect SinkRecords and inserts/updates an entry in Hazelcast. It supports writing to Hazelcast
distributed data structures including Reliable Topic, Ringbuffer, Queue, Set, List, Map, MultiMap
and ICache (Hazelcast’s JCache extension).
27.2.8. Openfire
Openfire is an open source real time collaboration server. It uses XMPP which is an open protocol
for instant messaging. This plugin adds support for running multiple redundant Openfire servers
together in a cluster.
By running Openfire as a cluster, you can distribute the connection load among several servers,
while also providing failover in the event of failures.
27.2.9. SubZero
Kryo is a popular serialization library. It is fast, easy to use, and it does not pollute your domain
model. It can even serialize classes which are not marked as Serializable.
Hazelcast has no out-of-the-box support for Kryo. Although it is rather easy to integrate it, everyone
has to write the same code and face the same bugs. This plugin, SubZero, simplifies the integration
of Hazelcast and Kryo. Simply add the SubZero dependency to your Maven or Gradle configurations,
and add the SubZero plugin as a global serializer (if you want to use it for all classes in your project)
or as a serializer (to have the option of selecting the classes in your project).
27.3.1. Filter Based Web Session Replication
This plugin (a.k.a. Generic Web Session Replication) provides HTTP session replication capabilities
across a Hazelcast cluster in order to handle failover cases. Assume you have multiple web
servers behind load balancers: if one server goes down, the users on that server are directed to one
of the other live servers, and their sessions are not lost. This plugin backs up these HTTP
sessions and clusters them automatically. To use it, put the hazelcast-wm JAR file into your WEB-INF/lib
folder and configure your web.xml file according to your needs.
See the plugin’s documentation for information on configuring and using it.
See also the example application which uses filter based web session replication.
Note that filter based web session replication has the option to use a map with High-Density
Memory Store, which is available in Hazelcast IMDG Enterprise HD, to keep your session objects. See the
High-Density Memory Store section for details on this feature.
Tomcat based web session replication is offered through Hazelcast Tomcat Session Manager. It is a
container specific module that enables session replication for JEE Web Applications without
requiring changes to the application.
See the plugin’s documentation for information on configuring and using it.
See also the example application which uses Tomcat based web session replication.
Jetty based web session replication is offered through Hazelcast Jetty Session Manager. It is a
container specific module that enables session replication for JEE Web Applications without
requiring changes to the application.
See the plugin’s documentation for information on configuring and using it.
See also the example application which uses Jetty based web session replication.
Apache Spark is an open source cluster-computing platform which has become one of the key big
data distributed processing frameworks. There is a Spark connector for Hazelcast which allows
your Spark applications to connect to a Hazelcast cluster with the Spark RDD API. See this
integration’s documentation for information on configuring and using it.
Apache Mesos is an open source cluster manager that handles workloads efficiently in a distributed
environment through dynamic resource sharing and isolation; you can run any distributed
application that requires clustered resources. It is widely used to manage big data infrastructures.
Hazelcast Mesos integration gives you the ability to deploy Hazelcast on the Mesos cluster. See this
integration’s documentation for information on configuring and using it.
One aspect of replication techniques concerns where a replicated data set is accessed and updated.
For instance, primary-copy systems first elect a replica, which can be called the primary, master, etc.,
and use that replica to access the data. Changes in the data on the primary replica are propagated
to other replicas. This approach goes by several names, such as primary-copy, single-master and passive
replication. The primary-copy technique is a powerful model as it prevents conflicts and deadlocks
among the replicas. However, primary replicas can become bottlenecks. On the other hand, we can
have a different technique by eliminating the primary copy and treating each replica as equal.
These systems can achieve a higher level of availability as a data entry can be accessed and updated
using any replica. However, it can become more difficult to keep the replicas in sync with each
other.
Replication techniques also differ in how updates are propagated among replicas. One option is to
update each replica as part of a single atomic transaction, called eager replication or synchronous
replication. Consensus algorithms apply this approach to achieve strong consistency on a replicated
data set. The main drawback is the amount of coordination and communication required while
running the replication algorithm. CP systems implement consensus algorithms under the hood.
Another option is the lazy replication technique, which is also called asynchronous replication.
Lazy replication algorithms execute updates on replicas with separate transactions. They generally
work on a best-effort basis. In this way, the amount of coordination among the replicas is reduced
and data can be accessed in a more performant manner. Yet, it can happen that a particular update
is executed on some replicas but not on others, which causes replicas to diverge. Such problems can
be resolved with different approaches, such as read-repair, write-repair and anti-entropy. Lazy
replication techniques are popular among AP systems.
the Raft consensus algorithm and accessed via HazelcastInstance.getCPSubsystem() which provides
CP data structures and APIs.
The replication algorithm and consistency model explained below apply to AP data
structures only. For CP subsystem and CP data structures, see the CP Subsystem
section.
For AP data structures, Hazelcast employs a combination of the primary-copy and lazy replication
techniques. As briefly described in the Data Partitioning section, each data entry is mapped to a
single Hazelcast partition and put into replicas of that partition. One of the replicas is elected as the
primary replica, which is responsible for performing operations on that partition. When you read
or write a map entry, you transparently talk to the Hazelcast member to which the primary replica of
the corresponding partition is assigned. In this way, each request hits the most up-to-date version
of a particular data entry in a stable cluster. Backup replicas stay in standby mode until the
primary replica fails. Upon failure of the primary replica, one of the backup replicas is promoted to
the primary role.
With lazy replication, when the primary replica receives an update operation for a key, it executes
the update locally and propagates it to backup replicas. It marks each update with a logical
timestamp so that backups apply them in the correct order and converge to the same state as the
primary. Backup replicas can be used to scale reads (see the Enabling Backup Reads section); such
reads are not strongly consistent but do come with a monotonic reads guarantee.
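The ordering role of the logical timestamp can be illustrated with a minimal, self-contained sketch. It is not Hazelcast code: BackupReplicaSketch and its buffering strategy are hypothetical, and they show only one simple way a backup could use timestamps to apply updates in the order in which the primary executed them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class LazyReplicationSketch {
    public static void main(String[] args) {
        BackupReplicaSketch backup = new BackupReplicaSketch();
        // The primary executed put(k, v1) at timestamp 1 and put(k, v2) at timestamp 2,
        // but the network delivers the two backup updates in reverse order.
        backup.receive(2, "k", "v2");
        backup.receive(1, "k", "v1");
        System.out.println(backup.get("k") + " @ " + backup.lastApplied()); // prints "v2 @ 2"
    }
}

// Hypothetical stand-in for a backup replica of one partition.
final class BackupReplicaSketch {
    private final Map<String, String> data = new HashMap<>();
    // Updates that arrived early, keyed by their logical timestamp.
    private final TreeMap<Long, String[]> pending = new TreeMap<>();
    private long lastApplied = 0;

    // Buffers the update, then applies every buffered update that is next in sequence.
    void receive(long timestamp, String key, String value) {
        if (timestamp <= lastApplied) {
            return; // duplicate or stale update, already reflected in the data
        }
        pending.put(timestamp, new String[] {key, value});
        while (!pending.isEmpty() && pending.firstKey() == lastApplied + 1) {
            String[] update = pending.pollFirstEntry().getValue();
            data.put(update[0], update[1]);
            lastApplied++;
        }
    }

    String get(String key) {
        return data.get(key);
    }

    long lastApplied() {
        return lastApplied;
    }
}
```

Because the backup never applies timestamp 2 before timestamp 1, it ends in the same state as the primary even though the messages were reordered.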
Hazelcast offers features such as SplitBrainProtection, ILock and AtomicLong. In the journey of
being a highly elastic, dynamic and easy to use product, Hazelcast tries to provide best-effort
consistency guarantees without being a complete CP solution. Therefore, we recommend using these
features for efficiency purposes in general, rather than for correctness. For instance, they can
be used to avoid running a resource-intensive computation multiple times, where running it
more than once would not create any correctness problem. See the Best-Effort Consistency and Network
Partitioning sections for more information.
Hazelcast’s replication technique enables Hazelcast clusters to offer high throughput. However, due
to temporary situations in the system, such as network interruption, backup replicas can miss some
updates and diverge from the primary. Backup replicas can also hit VM or long GC pauses and fall
behind the primary, a situation called replication lag. If the member holding a partition’s primary
replica crashes while there is a replication lag between itself and the backups, strong
consistency of the data can be lost.
Please note that CP systems can have similar problems as well. However, in a CP system, once a
replica performs an update locally (i.e., commits the update), the underlying consensus algorithm
guarantees durability of the update for the rest of the execution.
On the other hand, in AP systems like Hazelcast, a replica can perform an update locally, even if the
update is not performed on other replicas. This is a fair trade-off to reduce the amount of
coordination among replicas and maintain high throughput and high availability of the system. These
systems employ additional measures to maintain consistency in a best-effort manner. In this
regard, Hazelcast tries to minimize the effect of such scenarios using an active anti-entropy solution
as follows:
• Each Hazelcast member creates summary information for each primary replica assigned to it and
sends it to the backups.
• Then, each backup member compares the summary information with its own data to see if it is
up-to-date with the primary.
• If a backup member detects a missing update, it triggers the synchronization process with the
primary.
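The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not Hazelcast's implementation: the Replica class is hypothetical, the summary is reduced to a single version counter, and synchronization is modeled as copying the whole partition from the primary.

```java
import java.util.HashMap;
import java.util.Map;

final class AntiEntropySketch {
    // Hypothetical replica: partition data plus a version counter that grows with each update.
    static final class Replica {
        final Map<String, String> data = new HashMap<>();
        long version = 0;

        void apply(String key, String value) {
            data.put(key, value);
            version++;
        }
    }

    // Step 1 (primary side): the summary here is simply the replica version.
    static long summarize(Replica primary) {
        return primary.version;
    }

    // Steps 2 and 3 (backup side): compare the summary with the local state and
    // synchronize from the primary when an update is missing.
    // Returns true if a synchronization was triggered.
    static boolean checkAndSync(Replica backup, Replica primary, long primarySummary) {
        if (backup.version >= primarySummary) {
            return false; // the backup is up to date
        }
        backup.data.clear();
        backup.data.putAll(primary.data);
        backup.version = primary.version;
        return true;
    }

    public static void main(String[] args) {
        Replica primary = new Replica();
        Replica backup = new Replica();
        primary.apply("a", "1");
        backup.apply("a", "1");
        primary.apply("b", "2"); // this backup update is lost in transit

        boolean synced = checkAndSync(backup, primary, summarize(primary));
        System.out.println("synced=" + synced + ", backup=" + backup.data);
    }
}
```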
When a Hazelcast member receives a partition specific operation, it executes the operation and
propagates it to the backup replica(s) with a logical timestamp. The number of backups for each operation
depends on the data structure and its configuration. See Threading Model - Operation Threading for
threading details.
Two types of backup replication are available: sync and async. Despite what their names imply,
both types are still implementations of the lazy (async) replication model. The only difference
between sync and async is that the former makes the caller block until backup updates are applied
by backup replicas and acknowledgments are sent back to the caller, whereas the latter is fire and
forget. The numbers of sync and async backups are defined in the data structure configurations, and you
can use a combination of sync and async backups.
When backup updates are propagated, the response of the execution, including the number of sync backup
updates, is sent to the caller. After receiving the response, the caller waits up to a predefined timeout
to receive the specified number of sync backup acknowledgements. This timeout is 5 seconds by
default and is defined by the system property hazelcast.operation.backup.timeout.millis (see the System
Properties appendix).
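The caller-side wait can be pictured with a small sketch built only on java.util.concurrent. It is an illustration rather than Hazelcast's invocation code: awaitAcks is a hypothetical helper, the backups are simulated by tasks on a thread pool, and the 5000 ms value simply mirrors the default mentioned above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SyncBackupAckSketch {
    // Waits until the expected number of backup acks arrive or the timeout elapses.
    // Returns true only if every expected ack was received in time.
    static boolean awaitAcks(CountDownLatch acks, long timeoutMillis) {
        try {
            return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        int syncBackupCount = 2; // assumed to come from the primary's response
        CountDownLatch acks = new CountDownLatch(syncBackupCount);
        ExecutorService backups = Executors.newFixedThreadPool(syncBackupCount);
        for (int i = 0; i < syncBackupCount; i++) {
            backups.execute(acks::countDown); // each simulated backup applies the update and acks
        }
        boolean allAcked = awaitAcks(acks, 5000);
        backups.shutdown();
        System.out.println("all sync backups acknowledged: " + allAcked);
    }
}
```

When awaitAcks returns false, the caller only knows that at least one acknowledgement did not arrive in time; it cannot tell whether the backup update itself was applied.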
A backup update can be missed for several reasons, such as stale partition table information
on a backup replica member, network interruption, or a member crash. That’s why sync backup
acks require a timeout to give up. Regardless of whether a backup is sync or async, if a backup update is
missed, the periodically running anti-entropy mechanism detects the inconsistency and
synchronizes backup replicas with the primary. Also, the graceful shutdown procedure ensures that
all backup replicas for partitions whose primary replicas are assigned to the shutting down
member are consistent.
In some cases, although the target member of an invocation is assumed to be alive by the failure
detector, the target may not execute the operation or send the response back in time. Network
splits, long pauses caused by high load, GC or I/O (disk, network) can be listed as a few possible
reasons. When an invocation doesn’t receive any response from the member that owns the primary
replica, the invocation fails with an OperationTimeoutException. This timeout is 2 minutes by
default and is defined by the system property hazelcast.operation.call.timeout.millis (see the System
Properties appendix). Once the timeout has passed, the result of the invocation is indeterminate.
In the following failure case, the exactly-once guarantee can be broken: When the target member of a
pending invocation leaves the cluster while the invocation is waiting for a response, that invocation
is re-submitted to its new target according to the new partition table. It may have already been
executed on the leaving member, with backup updates propagated to the backup replicas, without
the response having reached the caller. If that happens, the operation is executed twice.
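A toy simulation makes the double execution concrete. It uses no Hazelcast classes: the AtomicLong stands in for replicated state that survives the member leaving, and invoke and incrementWithRetry are hypothetical helpers that model a non-idempotent increment whose response is lost.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RetrySketch {
    // Simulates one invocation of a non-idempotent increment.
    // When responseLost is true, the operation is applied but the caller never learns about it.
    static boolean invoke(AtomicLong counter, boolean responseLost) {
        counter.incrementAndGet();
        return !responseLost;
    }

    // Re-submits the invocation until a response is received, as happens when the
    // target member leaves while the invocation is pending.
    static long incrementWithRetry(AtomicLong counter, int lostResponses) {
        int attempt = 0;
        while (!invoke(counter, attempt < lostResponses)) {
            attempt++;
        }
        return counter.get();
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        // The first response is lost because the member leaves; the re-submission succeeds.
        System.out.println(incrementWithRetry(counter, 1)); // prints 2: one logical increment ran twice
    }
}
```

Operations that are idempotent, such as setting a key to a fixed value, are not affected by this kind of re-submission in the same way.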
In the following failure case, the invocation state becomes indeterminate: As explained above, when an
invocation does not receive a response in time, it fails with an OperationTimeoutException.
This exception does not say anything about the outcome of the operation: the operation may
not have been executed at all, or it may have been executed once or twice (due to the member-left case explained above).
28.5. IndeterminateOperationStateException
As described in the Invocation Lifecycle section, for partition-based mutating invocations, such as
map.put(), a caller waits with a timeout for the operation that is executed on the corresponding
partition’s primary replica and backup replicas, based on the sync backup configuration of the
distributed data structure. Hazelcast 3.9 introduces a new mechanism to detect indeterminate
situations while making such invocations. If the hazelcast.operation.fail.on.indeterminate.state
system property is enabled, a mutating invocation throws IndeterminateOperationStateException
when it encounters the following cases:
• The operation fails on partition primary replica member with MemberLeftException. In this case,
the caller may not determine the status of the operation. It could happen that the primary
replica executes the operation, but fails before replicating it to all the required backup replicas.
Even if the caller receives backup acks from some backup replicas, it cannot decide if it has
received all required ack responses, since it does not know how many acks it should wait for.
• There is at least one missing ack from the backup replicas for the given timeout duration. In this
case, the caller knows that the operation is executed on the primary replica, but some backup
may have missed it. It could be also a false-positive, if the backup timeout duration is configured
with a very small value. However, Hazelcast’s active anti-entropy mechanism eventually kicks
in and resolves durability of the write on all available backup replicas as long as the primary
replica member is alive.
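The two cases above amount to a small decision rule on the caller side, sketched below. The Outcome enum and the classify method are hypothetical names introduced for illustration; the real check lives inside Hazelcast's invocation machinery and is not reproduced here.

```java
final class IndeterminateStateSketch {
    enum Outcome { COMPLETED, INDETERMINATE }

    // primaryLeft: the primary replica member left before responding (the MemberLeftException case).
    // expectedAcks and receivedAcks: sync backup acks required versus received before the backup timeout.
    static Outcome classify(boolean primaryLeft, int expectedAcks, int receivedAcks) {
        if (primaryLeft) {
            return Outcome.INDETERMINATE; // the caller cannot know how many acks to wait for
        }
        if (receivedAcks < expectedAcks) {
            return Outcome.INDETERMINATE; // at least one backup ack is missing
        }
        return Outcome.COMPLETED;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 1, 1)); // prints COMPLETED
        System.out.println(classify(false, 2, 1)); // prints INDETERMINATE
        System.out.println(classify(true, 1, 1));  // prints INDETERMINATE
    }
}
```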
When an invocation fails with IndeterminateOperationStateException, the system does not try to
roll back the changes which are executed on healthy replicas. The effect of a failed invocation may
even be observed by another caller, if the invocation has succeeded on the primary replica. Hence, this
new behavior does not guarantee linearizability. However, if an invocation completes without
IndeterminateOperationStateException when the configuration is enabled, it is guaranteed that the
operation has been executed exactly once on the primary replica and the specified number of backup
replicas of the partition.
Please note that IndeterminateOperationStateException does not apply to read-only operations, such
as map.get(). If a partition primary replica member crashes before replying to a read-only
operation, the operation is retried on the new owner of the primary replica.
Even though this communication failure is called network partitioning, in practice a process or
an entire OS that is suspended or paused for a very long time can also cause communication interruptions. If these
interruptions last long enough for one side to assume that the other side has crashed, the cluster splits into
multiple partitions and they start operating independently. That’s why any sufficiently long communication
failure or interruption can be classified as network partitioning.
Moreover, communication failures don’t have to be symmetrical. A network failure can interrupt
only one side of the channel, or a suspended process/member may not even observe the rest as
crashed. That kind of network partitioning can be called partial network partitioning.
• Split-brain protection (quorums): Split-brain protection can be used when consistency is the
major concern during network partitioning. It requires a minimum cluster size to keep a
particular data structure available. When the cluster size is below the defined split-brain protection
size, subsequent operations are rejected with a SplitBrainProtectionException. See the
Split-Brain Protection section.
• Split-brain recovery (merge policies): Split-brain recovery makes data structures available
and operational on both sides of a network partition, and merges their data once the network
partitioning problem is resolved. See the Split-Brain Recovery section.
Split-brain recovery is also supported for the data structures whose in-memory
format is NATIVE.
29.3. Split-Brain Protection
The split-brain protection mechanism provided in Hazelcast protects your cluster in case the number of
cluster members drops below the specified minimum. How to respond to a split-brain scenario depends
on whether consistency of data or availability of your application is of primary concern. In either
case, because a split-brain scenario is caused by a network failure, you must initiate an effort to
identify and correct the network failure. Your cluster cannot be brought back to steady state
operation until the underlying network failure is fixed. If consistency is your primary concern, you
can use Hazelcast’s split-brain protection feature.
This feature enables you to specify the minimum cluster size required for operations to occur. This
is achieved by defining and configuring a minimum-cluster-size for the cluster. If the cluster size is
below this minimum value, the operations are rejected and the rejected operations return a
SplitBrainProtectionException to their callers. Additionally, it is possible to configure this size with
a user-defined SplitBrainProtectionFunction which is consulted on each cluster membership change
to determine whether there is a split-brain.
Your application continues its operations on the remaining operating cluster. Any application
instances connected to the cluster with sizes below the minimum threshold defined by the split-
brain protection configuration receive exceptions which, depending on the programming and
monitoring setup, should generate alerts. The key point is that rather than applications continuing
in error with stale data, they are prevented from doing so.
Each data structure to be protected should have the configuration added to it as explained in the
Configuring Split-Brain Protection section.
For this reason, there will be a time window between the network partitioning and the application
of split-brain protection. The length of this window depends on the failure detector. The guarantee is that
every member eventually detects the failed members and rejects operations on the data
structure which requires the split-brain protection.
Starting with Hazelcast 3.10, split-brain protection can be configured with new out-of-the-box
SplitBrainProtectionFunctions which determine whether there is a split-brain independently of the cluster
membership manager, taking advantage of heartbeat and other failure-detector information
configured on Hazelcast members.
For more information, see the Consistency and Replication Model chapter.
You can set up the split-brain protection configuration either declaratively or
programmatically.
Assume that you have a 7-member Hazelcast cluster and you want to require a minimum of
four members for the cluster to continue operating. In this case, if a split-brain happens, the
sub-clusters of sizes 1, 2 and 3 are prevented from being used. Only a sub-cluster of at least four
members is allowed to be used.
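The rule in this scenario reduces to a single comparison, sketched below without any Hazelcast dependency. RejectedException is a hypothetical stand-in for SplitBrainProtectionException, and isPresent and checkBeforeOperation are illustrative helper names, not part of the Hazelcast API.

```java
final class MemberCountSketch {
    // Hypothetical stand-in for SplitBrainProtectionException.
    static final class RejectedException extends RuntimeException {
        RejectedException(String message) {
            super(message);
        }
    }

    // Protection is present when the locally visible cluster is large enough.
    static boolean isPresent(int visibleMembers, int minimumClusterSize) {
        return visibleMembers >= minimumClusterSize;
    }

    // Called before an operation on a protected data structure.
    static void checkBeforeOperation(int visibleMembers, int minimumClusterSize) {
        if (!isPresent(visibleMembers, minimumClusterSize)) {
            throw new RejectedException(
                    "cluster size " + visibleMembers + " is below " + minimumClusterSize);
        }
    }

    public static void main(String[] args) {
        int minimumClusterSize = 4;
        // Possible sub-cluster sizes after a 7-member cluster splits.
        for (int size : new int[] {1, 2, 3, 4}) {
            String verdict = isPresent(size, minimumClusterSize) ? "allowed" : "rejected";
            System.out.println(size + " members -> " + verdict);
        }
    }
}
```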
This type of split-brain protection function determines the presence of split-brain protection based
on the count of members in the cluster, as observed by the local member’s cluster membership
manager and is available since Hazelcast 3.5. The following are map configurations for the example
7-member cluster scenario described above:
Declarative Configuration:
<hazelcast>
    ...
    <split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
        <minimum-cluster-size>4</minimum-cluster-size>
    </split-brain-protection>
    <map name="default">
        <split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
    </map>
    ...
</hazelcast>
Programmatic Configuration:
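SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
splitBrainProtectionConfig.setName("splitBrainProtectionRuleWithFourMembers")
        .setEnabled(true)
        .setMinimumClusterSize(4);
// Attach the rule to the default map, mirroring the declarative example.
MapConfig mapConfig = new MapConfig("default");
mapConfig.setSplitBrainProtectionName("splitBrainProtectionRuleWithFourMembers");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addMapConfig(mapConfig);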
The probabilistic split-brain protection function uses a private instance of the Phi Accrual Cluster
Failure Detector, which is updated with member heartbeats. Its parameters can be fine-tuned to
determine the count of live members in the cluster, independently of the cluster’s membership
manager.
• suspicion-threshold: Threshold for suspicion (φ) level. A low threshold is prone to generate
many wrong suspicions but ensures a quick detection in the event of a real crash. Conversely, a
high threshold generates fewer mistakes but needs more time to detect actual crashes. Its
default value is 10.
• max-sample-size: Number of samples to use for calculation of mean and standard deviation of
inter-arrival times. Its default value is 200.
• heartbeat-interval-millis: Bootstraps the statistics with heartbeats that correspond to this duration
in milliseconds, with a rather high standard deviation (since the environment is unknown in the
beginning). Its default value is 5000 milliseconds.
• min-std-deviation-millis: Minimum standard deviation (in milliseconds) to use for the normal
distribution used when calculating phi. Too low standard deviation might result in too much
sensitivity for sudden, but normal, deviations in heartbeat inter arrival times. Its default value
is 100 milliseconds.
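To give a feel for how these parameters interact, the sketch below computes a suspicion level from the time elapsed since the last heartbeat. It is an assumption-laden illustration, not Hazelcast's source: the phi method uses a commonly used logistic approximation of the normal distribution, the mean and standard deviation are passed in directly instead of being computed from max-sample-size samples, and all names are hypothetical.

```java
final class PhiSketch {
    // Suspicion level for the time elapsed since the last heartbeat, given the mean and
    // standard deviation of past inter-arrival times. The normal CDF is approximated with
    // a logistic function; phi = -log10(probability that the heartbeat is merely late).
    static double phi(double elapsedMillis, double meanMillis,
                      double stdDevMillis, double minStdDevMillis) {
        // min-std-deviation-millis guards against an overly sensitive detector.
        double stdDev = Math.max(stdDevMillis, minStdDevMillis);
        double y = (elapsedMillis - meanMillis) / stdDev;
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        if (elapsedMillis > meanMillis) {
            return -Math.log10(e / (1.0 + e));
        }
        return -Math.log10(1.0 - 1.0 / (1.0 + e));
    }

    // A member is suspected once phi reaches the suspicion-threshold.
    static boolean isSuspected(double phi, double suspicionThreshold) {
        return phi >= suspicionThreshold;
    }

    public static void main(String[] args) {
        double mean = 1000, stdDev = 100, minStdDev = 100, threshold = 10;
        for (double elapsed : new double[] {1000, 1500, 1700}) {
            double value = phi(elapsed, mean, stdDev, minStdDev);
            System.out.printf("elapsed=%.0f ms phi=%.2f suspected=%b%n",
                    elapsed, value, isSuspected(value, threshold));
        }
    }
}
```

With these assumed statistics, a heartbeat that is only slightly late yields a small phi, while phi grows quickly as the silence extends to several standard deviations beyond the mean, which is the trade-off that suspicion-threshold tunes.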
Declarative Configuration:
<hazelcast>
    ...
    <split-brain-protection enabled="true" name="probabilistic-split-brain-protection">
        <minimum-cluster-size>3</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <probabilistic-split-brain-protection acceptable-heartbeat-pause-millis="5000"
                max-sample-size="500" suspicion-threshold="10" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>probabilistic-split-brain-protection</split-brain-protection-ref>
    </set>
    ...
</hazelcast>
Programmatic Configuration:
SplitBrainProtectionConfig splitBrainProtectionConfig =
        SplitBrainProtectionConfig.newProbabilisticSplitBrainProtectionConfigBuilder(
                "probabilist-splitBrainProtection", 3)
        .withAcceptableHeartbeatPauseMillis(5000)
        .withMaxSampleSize(500)
        .withSuspicionThreshold(10)
        .build();
splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
SetConfig setConfig = new SetConfig("split-brain-protected-set");
setConfig.setSplitBrainProtectionName("probabilist-splitBrainProtection");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addSetConfig(setConfig);
This function can be used to implement a more conservative split-brain protection by requiring
that a heartbeat has been received from each member within a configurable time window before the
current time.
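The idea can be sketched in a few lines of plain Java. This is an illustration under assumptions, not the built-in function: the map of last heartbeat times and the isPresent helper are hypothetical, and a member counts as live only if its last heartbeat falls within the tolerance window.

```java
import java.util.HashMap;
import java.util.Map;

final class RecentlyActiveSketch {
    // lastHeartbeatMillis maps a member name to the time its last heartbeat was received.
    // Protection is present when enough members sent a heartbeat within the tolerance window.
    static boolean isPresent(Map<String, Long> lastHeartbeatMillis, long nowMillis,
                             long toleranceMillis, int minimumClusterSize) {
        long recentlyActive = lastHeartbeatMillis.values().stream()
                .filter(t -> nowMillis - t <= toleranceMillis)
                .count();
        return recentlyActive >= minimumClusterSize;
    }

    public static void main(String[] args) {
        Map<String, Long> lastHeartbeat = new HashMap<>();
        long now = 100_000L;
        lastHeartbeat.put("m1", now - 1_000);
        lastHeartbeat.put("m2", now - 5_000);
        lastHeartbeat.put("m3", now - 30_000);
        lastHeartbeat.put("m4", now - 90_000); // silent for longer than the tolerance

        System.out.println(isPresent(lastHeartbeat, now, 60_000, 4)); // prints false
        System.out.println(isPresent(lastHeartbeat, now, 60_000, 3)); // prints true
    }
}
```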
Declarative Configuration:
<hazelcast>
    ...
    <split-brain-protection enabled="true" name="recently-active-split-brain-protection">
        <minimum-cluster-size>4</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <recently-active-split-brain-protection heartbeat-tolerance-millis="60000" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>recently-active-split-brain-protection</split-brain-protection-ref>
    </set>
    ...
</hazelcast>
Programmatic Configuration:
SplitBrainProtectionConfig splitBrainProtectionConfig =
        SplitBrainProtectionConfig.newRecentlyActiveSplitBrainProtectionConfigBuilder(
                "recently-active-splitBrainProtection", 4, 60000)
        .build();
splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
SetConfig setConfig = new SetConfig("split-brain-protected-set");
setConfig.setSplitBrainProtectionName("recently-active-splitBrainProtection");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addSetConfig(setConfig);
• protect-on: Type of the cluster split-brain protection. Available values are READ, WRITE and
READ_WRITE.
class-name or recently-active-split-brain-protection.
package my.domain;
<hazelcast>
...
<split-brain-protection enabled="true" name="member-count-split-brain-protection">
<protect-on>READ_WRITE</protect-on>
<minimum-cluster-size>3</minimum-cluster-size>
<split-brain-protection-function-class-name>my.domain.CustomSplitBrainProtectionFunction</split-brain-protection-function-class-name>
</split-brain-protection>
...
</hazelcast>
You can register listeners to be notified about the split-brain protection results. Split-brain
protection listeners are local to the member where they are registered, so they receive only events
that occurred on that local member.
These listeners can be configured via declarative or programmatic configuration. The following
examples are such configurations.
Declarative Configuration:
<hazelcast>
...
<split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
<minimum-cluster-size>4</minimum-cluster-size>
<split-brain-protection-listeners>
<split-brain-protection-listener>
com.company.splitbrainprotection.FourMemberSplitBrainProtectionListener
</split-brain-protection-listener>
</split-brain-protection-listeners>
</split-brain-protection>
<map name="default">
<split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
</map>
...
</hazelcast>
Programmatic Configuration:
SplitBrainProtectionListenerConfig listenerConfig = new SplitBrainProtectionListenerConfig();
// You can either directly set a SplitBrainProtectionListener implementation of your own
listenerConfig.setImplementation(new SplitBrainProtectionListener() {
    @Override
    public void onChange(SplitBrainProtectionEvent splitBrainProtectionEvent) {
        if (splitBrainProtectionEvent.isPresent()) {
            // handle SplitBrainProtection presence
        } else {
            // handle SplitBrainProtection absence
        }
    }
});
// Or you can give the name of the class that implements the SplitBrainProtectionListener interface.
listenerConfig.setClassName("com.company.splitBrainProtection.ThreeMemberSplitBrainProtectionListener");
Split-brain protection service gives you the ability to query split-brain protection results over the
SplitBrainProtection instances. These instances let you query the result of a particular split-brain
protection.
/**
* {@link SplitBrainProtection} provides access to the current status of a split-brain protection.
*/
public interface SplitBrainProtection {
/**
* Returns true if the minimum cluster size is satisfied, otherwise false.
*
* @return boolean whether the minimum cluster size property is satisfied
*/
boolean hasMinimumSize();
}
After the merging side is decided, the oldest cluster member of the merging side initiates the cluster
merge process by sending merge instructions to the members in its cluster.
While recovering from partitioning, Hazelcast uses merge policies for supported data structures to
resolve data conflicts between split clusters. A merge policy is a callback function to resolve
conflicts between the existing and merging data. Hazelcast provides an interface to be implemented
and also a selection of out-of-the-box policies. Data structures without split-brain recovery support
discard the data from the merging side.
For more information, see the Consistency and Replication Model chapter.
Since Hazelcast 3.10 all merge policies implement the unified interface
com.hazelcast.spi.SplitBrainMergePolicy. We provide the following out-of-the-box
implementations:
• HyperLogLogMergePolicy: Specialized merge policy for the CardinalityEstimator, which uses the
default merge algorithm from HyperLogLog research, keeping the maximum register value of
the two given instances.
• PutIfAbsentMergePolicy: The entry from the smaller cluster wins if it doesn’t exist in the cluster.
Additionally, you can develop a custom merge policy by implementing the SplitBrainMergePolicy
interface, as explained in the Custom Merge Policies section.
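To make the put-if-absent behavior concrete, here is a minimal sketch using plain Java maps as stand-ins for the two sides of a split. The class and method names are hypothetical and do not come from the Hazelcast API; only the rule itself (the merging entry wins when no existing entry is present) is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mimics the put-if-absent rule on plain maps.
final class PutIfAbsentMergeSketch {

    /** The merging value wins only when there is no existing value. */
    static <V> V merge(V mergingValue, V existingValue) {
        return existingValue == null ? mergingValue : existingValue;
    }

    /** Merges every entry of the smaller cluster into a copy of the larger cluster. */
    static <K, V> Map<K, V> mergeClusters(Map<K, V> largerCluster, Map<K, V> smallerCluster) {
        Map<K, V> result = new HashMap<>(largerCluster);
        for (Map.Entry<K, V> entry : smallerCluster.entrySet()) {
            V existing = result.get(entry.getKey());
            result.put(entry.getKey(), merge(entry.getValue(), existing));
        }
        return result;
    }
}
```

An entry that exists on both sides keeps the value of the larger cluster, while an entry that exists only in the smaller cluster is added.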
• ReplicatedMap
• MultiMap
• IAtomicLong
• IAtomicReference
• IQueue
• IList
• ISet
• RingBuffer
• CardinalityEstimator
• ScheduledExecutorService
The statistics-based out-of-the-box merge policies are only supported by IMap, ICache, ReplicatedMap
and MultiMap. The HyperLogLogMergePolicy is supported by the CardinalityEstimator.
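The "maximum register value" rule of the HyperLogLog merge can be sketched as an element-wise maximum over two register arrays. This is only an illustration of the rule; the class name and the byte-array representation of the registers are assumptions and do not reflect how the CardinalityEstimator stores its state.

```java
// Hypothetical stand-in illustrating the register-wise maximum of a HyperLogLog merge.
final class HyperLogLogRegisterMergeSketch {

    /** Returns a new register array holding the larger value of each register pair. */
    static byte[] mergeRegisters(byte[] merging, byte[] existing) {
        if (merging.length != existing.length) {
            throw new IllegalArgumentException("Both estimators must use the same number of registers");
        }
        byte[] merged = new byte[merging.length];
        for (int i = 0; i < merging.length; i++) {
            merged[i] = (byte) Math.max(merging[i], existing[i]);
        }
        return merged;
    }
}
```

Because each register only ever keeps the larger value, the merge result does not depend on which side is treated as merging and which as existing.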
Except for the CardinalityEstimator data structure, the default merge policy for all the
Hazelcast data structures that support split-brain recovery (listed above) is
PutIfAbsentMergePolicy. For the CardinalityEstimator data structure, the default
merge policy is HyperLogLogMergePolicy.
See also the Merge Types section for a complete overview of supported merge types of each data
structure. There is a config validation which checks these constraints to provide fail-fast behavior
for invalid configurations.
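The idea behind such a fail-fast validation can be sketched as a simple set comparison: the merge types a policy requires must all be provided by the data structure. The sketch below is not Hazelcast's validator. The class name, the use of strings for merge types and the IllegalArgumentException are assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a fail-fast merge type check.
final class MergeTypeValidationSketch {

    /** Throws if the data structure does not provide every merge type the policy requires. */
    static void validate(String dataStructure, Set<String> providedTypes, Set<String> requiredTypes) {
        Set<String> missing = new TreeSet<>(requiredTypes);
        missing.removeAll(providedTypes);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                    dataStructure + " does not provide the required merge types " + missing);
        }
    }
}
```

For example, a policy that requires MergingTTL would be rejected for a data structure that does not list MergingTTL among its provided merge types, before any member starts merging data.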
For the other data structures, e.g., ISemaphore, ICountDownLatch and ILock, the
instance from the smaller cluster is discarded during the split-brain recovery.
The merge policies are configured via a MergePolicyConfig, which can be set for all supported data
structures. The only exception is ICache, which just accepts the merge policy classname (due to
compatibility reasons with older Hazelcast clients). For ICache, all other configurable merge
parameters are the default values from MergePolicyConfig.
For custom merge policies you should set the full class name of your implementation as the merge-
policy configuration. For the out-of-the-box merge policies the simple classname is enough.
Declarative Configuration
Here are examples of how merge policies can be specified for various data structures:
<hazelcast>
...
<map name="default">
<merge-policy batch-size="100">LatestUpdateMergePolicy</merge-policy>
</map>
<replicatedmap name="default">
<merge-policy batch-size="100">org.example.merge.MyMergePolicy</merge-policy>
</replicatedmap>
<multimap name="default">
<merge-policy batch-size="50">HigherHitsMergePolicy</merge-policy>
</multimap>
<list name="default">
<merge-policy batch-size="500">org.example.merge.MyMergePolicy</merge-policy>
</list>
<atomic-long name="default">
<merge-policy>PutIfAbsentMergePolicy</merge-policy>
</atomic-long>
...
</hazelcast>
Here is how merge policies are specified for ICache (it is the same configuration tag, but lacks the
support for additional attributes like batch-size):
<hazelcast>
...
<cache name="default">
<merge-policy>org.example.merge.MyMergePolicy</merge-policy>
</cache>
...
</hazelcast>
Programmatic Configuration
Here are examples of how merge policies can be specified for various data structures:
MergePolicyConfig mergePolicyConfig = new MergePolicyConfig()
.setPolicy("org.example.merge.MyMergePolicy")
.setBatchSize(100);
Here is how merge policies are specified for ICache (you can only set the merge policy classname):
Keep in mind that existingValue can be null. This happens when a data
structure or key-based entry was just created in the smaller cluster.
Merge Types
A merge type defines an attribute which is required by a merge policy and provided by a data
structure.
MergingValue is the main merge type, which is required by all merge policies and provided by all
data structures. It contains the value of the merged data in raw (in-memory storage) and
deserialized format:
V getValue();
Object getRawValue();
}
MergingValue extends MergingView, which is a marker interface extended by all provided merge
types.
The most common extension of MergingValue is MergingEntry, which additionally provides the key in
raw (in-memory storage) and deserialized format (used by all key-based data structures like IMap or
ICache):
K getKey();
Object getRawKey();
}
In addition, there are several specialized merge types, e.g., for provided statistics. An example is
MergingHits, which provides the hit counter of the merge data:
long getHits();
}
The ReplicatedMap provides key/value merge data, with the creation time, access hits, last access
time, last update time and TTL. The return type of the merge policy is Object.
The IQueue just provides a collection of values. The return type is also a Collection<Object>.
The following is the full list of merge types:
• MergingLastAccessTime: Represents the last time when the merged data is accessed.
• MergingLastUpdateTime: Represents the last time when the merged data is updated.
• MergingMaxIdle: Represents the maximum idle timeout value of the merged data.
• MergingCost: Represents the memory costs for the merging process after a split-brain.
And the following table shows the merge types provided by each data structure:
Data Structure Merge Type
ICache • MergingEntry
• MergingCreationTime
• MergingHits
• MergingLastAccessTime
• MergingLastUpdateTime
• MergingTTL
ReplicatedMap • MergingEntry
• MergingCreationTime
• MergingHits
• MergingLastAccessTime
• MergingLastUpdateTime
• MergingTTL
MultiMap • MergingEntry
• MergingCreationTime
• MergingHits
• MergingLastAccessTime
• MergingLastUpdateTime
ScheduledExecutorService • MergingEntry
The following sections show various examples on how to implement merge type interfaces for all
data structures, specific merge types or a specific data structure.
If you need the deserialized key or value, you have to call MergingValue.getValue() or
MergingEntry.getKey(). The deserialization is done lazily on that method call, since it’s quite
expensive and should be avoided if the result is not needed. This also requires the deserialized
classes to be on the classpath of the server. Otherwise a ClassNotFoundException is thrown.
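The cost argument above can be illustrated with a small holder that keeps the raw bytes and deserializes only on the first getValue() call. This is a sketch under assumptions: the class is a hypothetical stand-in rather than a Hazelcast type, the deserializer is passed in as a function, and caching the deserialized result is a choice of this sketch that the text does not confirm.

```java
import java.util.function.Function;

// Hypothetical stand-in showing lazy deserialization of a raw value.
final class LazyValueSketch<V> {
    private final byte[] rawValue;
    private final Function<byte[], V> deserializer;
    private V value;
    private boolean deserialized;
    private int deserializationCount;

    LazyValueSketch(byte[] rawValue, Function<byte[], V> deserializer) {
        this.rawValue = rawValue;
        this.deserializer = deserializer;
    }

    /** Cheap: returns the stored in-memory format without touching the deserializer. */
    byte[] getRawValue() {
        return rawValue;
    }

    /** Potentially expensive: runs the deserializer on first use only. */
    V getValue() {
        if (!deserialized) {
            value = deserializer.apply(rawValue);
            deserialized = true;
            deserializationCount++;
        }
        return value;
    }

    int getDeserializationCount() {
        return deserializationCount;
    }
}
```

A merge policy that only compares or returns raw values never triggers the deserializer, which is why it also never needs the value classes on the classpath.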
This is an example which checks if the (deserialized) value of the mergingValue or existingValue is
an Integer. If so it is merged, otherwise null is returned (which removes the entry):
public class MergeIntegerValuesMergePolicy<V> implements SplitBrainMergePolicy<V,
MergingValue<V>, Object> {
@Override
public Object merge(MergingValue<V> mergingValue, MergingValue<V> existingValue) {
Object mergingUserValue = mergingValue.getValue();
Object existingUserValue = existingValue == null ? null : existingValue
.getValue();
System.out.println("========================== Merging..."
+ "\n mergingValue: " + mergingUserValue
+ "\n existingValue: " + existingUserValue
+ "\n mergingValue class: " + mergingUserValue.getClass().getName()
+ "\n existingValue class: " + (existingUserValue == null ? "null"
: existingUserValue.getClass().getName())
);
if (mergingUserValue instanceof Integer) {
return mergingValue.getRawValue();
}
return null;
}
@Override
public void writeData(ObjectDataOutput out) {
}
@Override
public void readData(ObjectDataInput in) {
}
}
For data structures like ISet or ICollection you need a merge policy, which supports collections:
public class MergeCollectionOfIntegerValuesMergePolicy
implements SplitBrainMergePolicy<Collection<Object>, MergingValue<Collection
<Object>>, Collection<Object>> {
@Override
public Collection<Object> merge(MergingValue<Collection<Object>> mergingValue,
MergingValue<Collection<Object>> existingValue) {
Collection<Object> result = new ArrayList<>();
for (Object value : mergingValue.getValue()) {
if (value instanceof Integer) {
result.add(value);
}
}
if (existingValue != null) {
for (Object value : existingValue.getValue()) {
if (value instanceof Integer) {
result.add(value);
}
}
}
return result;
}
@Override
public void writeData(ObjectDataOutput out) {
}
@Override
public void readData(ObjectDataInput in) {
}
}
You can also combine both merge policies to support single values and collections. This merge
policy is a bit more complex and less type safe, but can be configured on all data structures:
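public class MergeIntegerValuesMergePolicy2<V, T extends MergingValue<V>> implements
        SplitBrainMergePolicy<V, T, Object> {

    @Override
    public Object merge(T mergingValue, T existingValue) {
        // Single values: prefer the merging value, then the existing value.
        if (mergingValue.getValue() instanceof Integer) {
            return mergingValue.getRawValue();
        }
        if (existingValue != null && existingValue.getValue() instanceof Integer) {
            return existingValue.getRawValue();
        }
        // Collections: keep only the Integer elements.
        if (mergingValue.getRawValue() instanceof Collection) {
            Collection<Object> result = new ArrayList<>();
            addIntegersToCollection(mergingValue, result);
            if (result.isEmpty() && existingValue != null) {
                addIntegersToCollection(existingValue, result);
            }
            return result;
        }
        return null;
    }

    // This helper is not part of the extracted listing; it is an assumed implementation
    // that copies the Integer elements of a collection value into the result.
    @SuppressWarnings("unchecked")
    private void addIntegersToCollection(T value, Collection<Object> result) {
        for (Object element : (Collection<Object>) value.getValue()) {
            if (element instanceof Integer) {
                result.add(element);
            }
        }
    }

    @Override
    public void writeData(ObjectDataOutput out) {
    }

    @Override
    public void readData(ObjectDataInput in) {
    }
}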
Keep in mind that existingValue can be null, so a null check is mandatory
before calling existingValue.getValue() or existingValue.getRawValue().
If you return null on a collection based data structure, the whole data structure
will be removed. An empty collection works in the same way, so you don’t have to
check Collection.isEmpty() in your merge policy.
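The effect of the merge result on a collection-based data structure can be sketched as follows. The map standing in for the cluster's collections, the class name and the method name are hypothetical; only the rule (a null or empty result removes the data structure) comes from the note above.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical stand-in showing how a merge result could be applied to a named collection.
final class CollectionMergeResultSketch {

    /** A null or empty result removes the collection; any other result replaces it. */
    static void applyMergeResult(Map<String, Collection<Object>> store, String name,
                                 Collection<Object> mergeResult) {
        if (mergeResult == null || mergeResult.isEmpty()) {
            store.remove(name);
        } else {
            store.put(name, mergeResult);
        }
    }
}
```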
Accessing Hazelcast UserContext
If you need access to external references in your merge policy, you can use the Hazelcast
UserContext to get them injected. An example would be a database connection to check which value
is stored in your database. To achieve this your merge policy needs to implement
HazelcastInstanceAware and call HazelcastInstance.getUserContext():
public class UserContextMergePolicy<V> implements SplitBrainMergePolicy<V, MergingValue<V>, Object>,
        HazelcastInstanceAware {

    // Assumed declaration: the extracted listing omits this field. TruthProvider and
    // TRUTH_PROVIDER_ID are user-defined and are not part of the Hazelcast API.
    private transient TruthProvider truthProvider;

    @Override
    public Object merge(MergingValue<V> mergingValue, MergingValue<V> existingValue) {
        Object mergingUserValue = mergingValue.getValue();
        Object existingUserValue = existingValue == null ? null : existingValue.getValue();
        boolean isMergeable = truthProvider.isMergeable(mergingUserValue, existingUserValue);
        System.out.println("========================== Merging..."
                + "\n mergingValue: " + mergingUserValue
                + "\n existingValue: " + existingUserValue
                + "\n isMergeable(): " + isMergeable
        );
        if (isMergeable) {
            return mergingValue.getRawValue();
        }
        return null;
    }

    @Override
    public void writeData(ObjectDataOutput out) {
    }

    @Override
    public void readData(ObjectDataInput in) {
    }

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        // Look up the externally provided reference from the user context.
        ConcurrentMap<String, Object> userContext = hazelcastInstance.getUserContext();
        truthProvider = (TruthProvider) userContext.get(TRUTH_PROVIDER_ID);
    }
}

MergePolicyConfig mergePolicyConfig = new MergePolicyConfig()
        .setPolicy(UserContextMergePolicy.class.getName());
// The steps that build config (registering the TruthProvider in the user context and
// attaching mergePolicyConfig to a data structure) are not shown in this listing.
Hazelcast.newHazelcastInstance(config);
The merge operations are executed on the partition threads. Database accesses are
slow compared to in-memory operations. The SplitBrainMergePolicy.merge()
method is called for every key-value pair or every collection from your smaller
cluster that has a merge policy defined. A merge policy that accesses a database
can therefore cause millions of database accesses. Be aware that this can
block your cluster for a long time or overload your database due to the high
number of queries.
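A rough back-of-the-envelope estimate shows why this matters. The sketch below assumes one database query per merge() call, entries spread evenly over the partition threads, and no other work on those threads. The class name, the method and all numbers are hypothetical.

```java
// Hypothetical estimate of how long partition threads stay busy with database lookups.
final class MergeCostEstimateSketch {

    /** Entries per thread (rounded up) multiplied by the time of one database query. */
    static long estimateBlockingMillis(long entries, long millisPerQuery, int partitionThreads) {
        long entriesPerThread = (entries + partitionThreads - 1) / partitionThreads;
        return entriesPerThread * millisPerQuery;
    }
}
```

Under these assumptions, merging 1,000,000 entries at 2 milliseconds per query on 8 partition threads keeps each thread busy for about 250,000 milliseconds, which is more than four minutes during which those threads cannot serve regular operations.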
You can also write a merge policy that requires multiple merge types. This merge policy is
supported by all data structures that provide MergingHits and MergingCreationTime:
public class ComposedHitsAndCreationTimeMergePolicy<V, T extends MergingValue<V> & MergingHits & MergingCreationTime>
        implements SplitBrainMergePolicy<V, T, Object> {

    @Override
    public Object merge(T mergingValue, T existingValue) {
        if (existingValue == null) {
            return mergingValue.getValue();
        }
        System.out.println("========================== Merging value " + mergingValue.getValue() + "..."
                + "\n mergingValue creation time: " + mergingValue.getCreationTime()
                + "\n existingValue creation time: " + existingValue.getCreationTime()
                + "\n mergingValue hits: " + mergingValue.getHits()
                + "\n existingValue hits: " + existingValue.getHits()
        );
        // The decision logic is missing from the extracted listing. The lines below are an
        // assumed completion: prefer the merging entry if it is older and has more hits.
        if (mergingValue.getCreationTime() < existingValue.getCreationTime()
                && mergingValue.getHits() > existingValue.getHits()) {
            return mergingValue.getValue();
        }
        return existingValue.getValue();
    }

    @Override
    public void writeData(ObjectDataOutput out) {
    }

    @Override
    public void readData(ObjectDataInput in) {
    }
}
If you configure this merge policy on a data structure that does not provide these merge types,
you get an InvalidConfigurationException with a message like:
It’s also possible to restrict a merge policy to a specific data structure. This merge policy, for
example, only works on IMap:
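public class MapEntryCostsMergePolicy implements SplitBrainMergePolicy<Object,
        MapMergeTypes<Object, Object>, Object> {

    @Override
    public Object merge(MapMergeTypes<Object, Object> mergingValue, MapMergeTypes<Object, Object> existingValue) {
        if (existingValue == null) {
            return mergingValue.getValue();
        }
        System.out.println("========================== Merging key " + mergingValue.getKey() + "..."
                + "\n mergingValue costs: " + mergingValue.getCost()
                + "\n existingValue costs: " + existingValue.getCost()
        );
        // The decision logic is missing from the extracted listing. The lines below are an
        // assumed completion: keep the entry with the higher memory cost.
        if (mergingValue.getCost() > existingValue.getCost()) {
            return mergingValue.getValue();
        }
        return existingValue.getValue();
    }

    @Override
    public void writeData(ObjectDataOutput out) {
    }

    @Override
    public void readData(ObjectDataInput in) {
    }
}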
This is another example of a merge policy that only works on IAtomicReference:
public class AtomicReferenceMergeIntegerValuesMergePolicy
implements SplitBrainMergePolicy<Object, AtomicReferenceMergeTypes, Object> {
@Override
public Object merge(AtomicReferenceMergeTypes mergingValue,
AtomicReferenceMergeTypes existingValue) {
Object mergingUserValue = mergingValue.getValue();
Object existingUserValue = existingValue == null ? null : existingValue
.getValue();
System.out.println("========================== Merging..."
+ "\n mergingValue: " + mergingUserValue
+ "\n existingValue: " + existingUserValue
+ "\n mergingValue class: " + mergingUserValue.getClass().getName()
+ "\n existingValue class: " + (existingUserValue == null ? "null"
: existingUserValue.getClass().getName())
);
if (mergingUserValue instanceof Integer) {
return mergingValue.getRawValue();
}
return null;
}
@Override
public void writeData(ObjectDataOutput out) {
}
@Override
public void readData(ObjectDataInput in) {
}
}
Although every data structure supports MergingValue, which is the only merge type of
AtomicReferenceMergeTypes, this merge policy is restricted to IAtomicReference data structures:
Best Practices
Here are some best practices when implementing your own merge policy:
• Only call MergingValue.getValue() and MergingEntry.getKey() when you really need the
deserialized value to save costs (CPU and memory) and avoid ClassNotFoundException.
• If you want to return one of the given values (merging or existing), it’s best to return
mergingValue.getRawValue() or existingValue.getRawValue(), since they are already in the correct
in-memory format of the data structure. If you return a deserialized value, it might need to be
serialized again, which is an avoidable cost.
• Be careful with slow operations in the merge policy (like database accesses), since they block
your partition threads. Also, the LifeCycleEvent.MERGED or LifeCycleEvent.MERGE_FAILED event may be
fired too early if the merge operations take too long to finish.
When you want to reconfigure a system property, you need to restart the members
for which the property is modified.
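As a minimal sketch, system properties can be set as JVM system properties before the member is started. This assumes the member reads them at startup, for example when they are passed with -D on the command line. The class and method names are hypothetical, the property names are taken from the table in this section, and the values are only examples.

```java
// Hypothetical helper that sets two Hazelcast system properties before member startup.
final class SystemPropertySketch {

    static void configure() {
        // Print health monitoring logs periodically instead of only above the thresholds.
        System.setProperty("hazelcast.health.monitoring.level", "NOISY");
        // Example value only: wait up to 30 seconds for the response to a remote call.
        System.setProperty("hazelcast.operation.call.timeout.millis", "30000");
    }
}
```

Since a changed property requires a restart of the affected members, such a call only has an effect if it runs before the member instance is created.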
hazelcast.backpressure.backoff.timeout.millis 60000 int Controls the maximum timeout in milliseconds to wait for an invocation space to be available. The value needs to be equal to or larger than 0.
hazelcast.cache.invalidation.batch.enabled true bool Specifies whether the cache invalidation event batch sending is enabled or not.
hazelcast.client.cleanup.timeout.millis 120000 int Timeout duration to decide if a client is still part of the cluster. If a member cannot find any connection to a client in the cluster, it cleans up the local resources that are owned by that client.
hazelcast.client.protocol.max.message.bytes 1024 int Client protocol message size limit (in bytes) for unverified connections, i.e., the maximum length of the client authentication message.
hazelcast.cluster.version.auto.upgrade.enabled false bool Specifies whether the automatic cluster version upgrading is enabled.
hazelcast.diagnostics.max.rolled.file.size.mb 50 int Size of each diagnostic file to be rolled.
hazelcast.diagnostics.member-heartbeat.seconds 10 long Period for which the MemberHeartbeats plugin of the diagnostics tool runs. 0 means this plugin is disabled.
hazelcast.diagnostics.storeLatency.reset.period.seconds 0 long Period, in seconds, for resetting the statistics for the StoreLatency plugin of the diagnostics tool.
hazelcast.diagnostics.systemlog.enabled true bool Specifies whether the SystemLog plugin of the diagnostics tool is enabled or not.
hazelcast.discovery.enabled false bool Enables/disables the Discovery SPI lookup over the old native implementations. See Discovery SPI for more information.
hazelcast.discovery.public.ip.enabled false bool Enables the discovery joiner to use public IPs from DiscoveredNode. See Discovery SPI for more information.
hazelcast.executionservice.taskscheduler.remove.oncancel false bool Controls whether the task scheduler removes tasks immediately upon cancellation. This is disabled by default, because it can cause severe delays on the other operations. By default, all cancelled tasks are eventually removed by the scheduler workers.
hazelcast.client.max.no.heartbeat.seconds 300 int Time after which the member assumes the client is dead and closes its connections to the client.
hazelcast.connect.all.wait.seconds 120 int Timeout to connect all other cluster members when a member is joining a cluster.
hazelcast.discovery.public.ip.enabled false bool Enable use of public IP address in member discovery with Discovery SPI. If you set this property to true in your source cluster, please make sure you have set the public addresses for your target members since they will be discovered using their public addresses. Otherwise, they cannot be discovered. See the Public Address section.
hazelcast.health.monitoring.delay.seconds 30 int Health monitoring logging interval in seconds. NOTE: For detailed information on the health monitoring tool, along with this and the following health monitoring related system properties, see the Health Check and Monitoring section.
hazelcast.health.monitoring.level SILENT string Health monitoring log level. When SILENT, logs are printed only when values exceed some predefined threshold. When NOISY, logs are always printed periodically. Set OFF to turn off completely.
hazelcast.health.monitoring.threshold.cpu.percentage 70 int When the health monitoring level is SILENT, logs are printed only when the CPU usage exceeds this threshold.
hazelcast.health.monitoring.threshold.memory.percentage 70 int When the health monitoring level is SILENT, logs are printed only when the memory usage exceeds this threshold.
hazelcast.hidensity.check.freememory true bool If enabled and able to fetch memory statistics via Java’s OperatingSystemMXBean, Hazelcast checks whether there is enough free physical memory for the requested number of bytes. If the free memory checker is disabled (false), it acts as if the check succeeded.
hazelcast.icmp.echo.fail.fast.on.startup true bool Specifies whether ICMP Echo Request mode for the ping detector is enforced. If the OS is not supported, or not configured correctly, as explained in Requirements and Linux/Unix Configuration, Hazelcast fails to start.
hazelcast.icmp.timeout 1000 int ICMP timeout in milliseconds. This cannot be more than the value of the hazelcast.icmp.interval property; it should always be smaller.
hazelcast.icmp.ttl 0 int ICMP TTL (maximum number of hops to try).
hazelcast.index.copy.behavior COPY_ON_READ string Defines the behavior for index copying on index read/write. See the Copying Indexes section.
hazelcast.initial.min.cluster.size 0 int Initial expected cluster size to wait for before a member starts completely.
hazelcast.invalidation.reconciliation.interval.seconds 60 int Period for which the cluster members are scanned to compare generated invalidation events with the received ones from Near Cache.
hazelcast.legacy.memberlist.format.enabled false bool Enables the legacy (for the releases before Hazelcast 3.9) member list format which is printed in the logs. The new format is introduced starting with Hazelcast 3.9 and includes member list version. Any change in the cluster, such as a member leaving or joining, increments the member list version. See the Starting the Member and Client section.
hazelcast.lock.max.lease.time.seconds Long.MAX_VALUE long All locks which are acquired without an explicit lease time use this value (in seconds) as the lease time. When you want to set an explicit lease time for your locks, you cannot set it to a longer time than this value.
hazelcast.logging.type jdk enum Name of logging framework type to send logging events.
hazelcast.map.entry.filtering.natural.event.types false bool Notify entry listeners with predicates on map entry updates with events that match entry, update or exit from predicate value space.
hazelcast.map.invalidation.batch.enabled true bool Enable or disable batching. When it is set to false, all invalidations are sent immediately.
hazelcast.map.replica.wait.seconds.for.scheduled.tasks 10 int Scheduler delay for map tasks that are executed on backup members.
hazelcast.mastership.claim.timeout.seconds 120 int Timeout which defines when a master candidate gives up waiting for a response to its mastership claim. After the timeout happens, the non-responding member is removed from the member list.
hazelcast.max.no.master.confirmation.seconds 150 int Max timeout of master confirmation from other members. This property is deprecated as of Hazelcast IMDG 3.10.
hazelcast.member.naming.moby.enabled true bool Defines whether the Moby naming should be used for generating instance names when they are not provided by user. Moby name is a short human-readable name consisting of a randomly chosen adjective and the surname of a famous person. If set to true, a Moby name is generated. Otherwise, a name that is concatenation of a static prefix, number and cluster name is provided.
hazelcast.metrics.mc.enabled true bool Enables buffering the collected metrics for Management Center if set to true, disables it otherwise. Note that the preferred way for controlling this setting is Metrics Configuration.
hazelcast.metrics.mc.retention 5 int Duration, in seconds, that the metrics are retained for Management Center. Note that the preferred way for controlling this setting is Metrics Configuration.
hazelcast.metrics.jmx.enabled true bool Enables exposing the collected metrics over JMX if set to true, disables it otherwise. Note that the preferred way for controlling this setting is Metrics Configuration.
hazelcast.metrics.debug.enabled false bool Enables collecting debug metrics if set to true, disables it otherwise. Note that this is meant to be enabled only if diagnostics feature is enabled, since currently only this feature consumes the debug metrics.
hazelcast.client.metrics.enabled true bool Enables the metrics collection if set to true, disables it otherwise. Note that the preferred way for controlling this setting is Metrics Configuration.
hazelcast.client.metrics.jmx.enabled true bool Enables exposing the collected metrics over JMX if set to true, disables it otherwise. Note that the preferred way for controlling this setting is Metrics Configuration.
hazelcast.client.metrics.debug.enabled false bool Enables collecting debug metrics if set to true, disables it otherwise. Note that this is meant to be enabled only if diagnostics is enabled, since currently only diagnostics consumes the debug metrics.
hazelcast.nio.tcp.spoofing.checks false bool Controls whether more strict checks upon BIND requests towards a cluster member are applied. The checks mainly validate the remote BIND request against the remote address as found in the socket. By default they are disabled, to avoid connectivity issues when deployed under NAT’ed infrastructure.
hazelcast.operation.backup.timeout.millis 5000 int Maximum time a caller waits for backup responses of an operation. After this timeout, the operation response is returned to the caller even if no backup response is received.
hazelcast.operation.fail.on.indeterminate.state false bool When enabled, an operation fails with IndeterminateOperationStateException, if it does not receive backup acks in time with respect to backup configuration of its data structure, or the member which owns primary replica of the target partition leaves the cluster.
hazelcast.operation.call.timeout.millis 60000 int Timeout to wait for a response when a remote call is sent, in milliseconds.
hazelcast.operation.responsequeue.idlestrategy block string Specifies whether the response thread for internal operations on the member side is blocked or not. If you use block (the default value), the thread is blocked and needs to be notified, which can reduce performance. If you use backoff, there is no blocking. By enabling the backoff mode and depending on your use case, you can get a 5-10% performance improvement. However, keep in mind that this increases the CPU utilization. We recommend that you use backoff with care and only if you have a tool for measuring your cluster’s performance.
hazelcast.partition.migration.fragments.enabled true bool When enabled, which is the default behavior, partitions are migrated/replicated in small fragments instead of one big chunk. Migrating partitions in fragments reduces pressure on the memory and network since smaller packets are created in the memory and sent through the network. Note that it can increase the time a migration takes to complete.
hazelcast.performance.monitoring.enabled bool Enable the performance monitor, a tool which allows you to see internal performance metrics. These metrics are written to a dedicated log file.
hazelcast.prefer.ipv4.stack true bool Prefer IPv4 network interface when picking a local address.
hazelcast.query.max.local.partition.limit.for.precheck 3 int Maximum value of local partitions to trigger local pre-check for Predicates#alwaysTrue() query operations on maps.
hazelcast.query.optimizer.type RULES String Type of the query optimizer. For optimizations based on static rules, set the value to RULES. To disable the optimization, set the value to NONE.
hazelcast.query.predicate.parallel.evaluation false bool Each Hazelcast member evaluates query predicates using a single thread by default. In most cases, the overhead of inter-thread communication outweighs the benefit of parallel execution. When you have a large dataset and/or slow predicates, you may benefit from parallel predicate evaluation. Set to true if you are using slow predicates or have hundreds of thousands of entries per member.
hazelcast.query.result.size.limit -1 int Result size limit for query operations on maps. This value defines the maximum number of returned elements for a single query result. If a query exceeds this number of elements, a QueryResultSizeExceededException is thrown. Its default value is -1, meaning it is disabled.
hazelcast.shutdownhook.policy TERMINATE string Specifies the behavior when JVM is exiting while the Hazelcast instance is still running. It has two values: TERMINATE and GRACEFUL. The former terminates the Hazelcast instance immediately. The latter, GRACEFUL, initiates the graceful shutdown which can significantly slow down the JVM exit process, but it tries to retain data safety. Note that you should always shut down Hazelcast explicitly using the method HazelcastInstance.shutdown(). It’s not recommended to rely on the shutdown hook; this is a last-effort measure.
hazelcast.slow.operation.detector.log.retention.seconds 3600 int Defines the retention time of invocations in slow
operation logs. If an invocation is older than this value, it
is purged from the log to prevent unlimited memory
usage. When all invocations are purged from a log, the log
itself is deleted.
hazelcast.slow.operation.detector.stacktrace.logging.enabled false bool Defines whether the stack traces of slow operations are logged in
the log file. Stack traces are always reported to the
Management Center, but by default, they are not printed
to keep the log size small.
hazelcast.socket.bind.any true bool Bind both server-socket and client-sockets to any local
interface.
hazelcast.socket.buffer.direct false bool Specifies whether the byte buffers used in the socket
should be a direct byte buffer (true) or a regular one
(false). When it is set to true, Hazelcast internally uses the
method ByteBuffer.allocateDirect (instead of
ByteBuffer.allocate), which uses off-heap memory and
may skip the memory copying when performing socket I/O
operations. See here for more information.
hazelcast.socket.client.bind.any true bool Bind client-sockets to any local interface. If not set,
hazelcast.socket.bind.any is used as the default.
hazelcast.socket.client.receive.buffer.size -1 int Hazelcast creates all connections with the receive buffer size
set according to hazelcast.socket.receive.buffer.size.
When it detects a connection opened by a client, it
adjusts the receive buffer size according to this property.
It is in kilobytes and its default value is -1.
hazelcast.socket.client.send.buffer.size -1 int Hazelcast creates all connections with the send buffer size set
according to hazelcast.socket.send.buffer.size. When
it detects a connection opened by a client, it adjusts
the send buffer size according to this property. It is in
kilobytes and its default value is -1.
hazelcast.socket.receive.buffer.size 128 int Socket receive buffer (SO_RCVBUF) size in KB. If you have a
very fast network (e.g., 10 Gbit) and/or you have large
entries, then you may benefit from increasing
sender/receiver buffer sizes. Use this property and
hazelcast.socket.send.buffer.size to tune the sizes.
hazelcast.socket.server.bind.any true bool Bind server-socket to any local interface. If not set,
hazelcast.socket.bind.any is used as the default.
hazelcast.tcp.join.port.try.count 3 int The number of incremental ports, starting with the port
number defined in the network configuration, that is used
to connect to a host which is defined without a port in the
TCP/IP member list while a member is searching for a
cluster.
hazelcast.unsafe.mode auto string "auto" (the default value) automatically detects whether
the usage of Unsafe is suitable for a given platform.
"disabled" explicitly disables the Unsafe usage in your
platform. "enforced" enforces the usage of Unsafe even if
your platform does not support it. This property can only
be set by passing a JVM-wide system property.
hazelcast.phone.home.enabled true bool Enable or disable the sending of phone home data to
Hazelcast’s phone home server.
hazelcast.wan.map.useDeleteWhenProcessingRemoveEvents false bool Configures WAN replication for IMap on the PASSIVE
cluster to remove entries using delete instead of remove
when using
com.hazelcast.enterprise.wan.replication.WanBatchReplication
as an endpoint implementation. The member which
receives the event batch in the PASSIVE cluster dispatches
WAN events to the partition owners as map merge and
remove operations. When using remove operations, the
old entry value is sent from the partition owner to the
caller even though the caller does not use the old value.
This can also lead to issues if the PASSIVE cluster does not
contain the class definition for the entry value, as the
value tries to get deserialized, causing
ClassNotFoundExceptions. You can switch to using map
delete instead on the PASSIVE cluster with this property.
This both saves bandwidth and avoids the exceptions.
hazelcast.write.through true bool Optimization that allows sending of packets over the
network to be done on the calling thread if the conditions are right. This can reduce the latency and
increase the performance for low threaded environments.
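The difference behind hazelcast.socket.buffer.direct can be seen with plain JDK calls. The following is a minimal sketch, not Hazelcast internal code: the class name BufferKindDemo and its helper are hypothetical, and the 128 KB size is only borrowed from the default of hazelcast.socket.receive.buffer.size.

```java
import java.nio.ByteBuffer;

// Illustrative only: shows the two JDK allocation calls named in the
// hazelcast.socket.buffer.direct description. Not Hazelcast internal code.
final class BufferKindDemo {

    // Hypothetical helper: picks the allocation call from a boolean flag,
    // mirroring the true/false meaning of the property.
    static ByteBuffer allocate(boolean direct, int sizeInKb) {
        int bytes = sizeInKb * 1024;
        return direct ? ByteBuffer.allocateDirect(bytes) : ByteBuffer.allocate(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer heap = allocate(false, 128);
        ByteBuffer offHeap = allocate(true, 128);
        System.out.println("heap buffer direct?     " + heap.isDirect());
        System.out.println("off-heap buffer direct? " + offHeap.isDirect());
        // A heap buffer is backed by a byte[] that lives on the Java heap.
        System.out.println("heap buffer has array?  " + heap.hasArray());
    }
}
```

Whether a direct buffer actually helps depends on the workload and the JVM, so it is worth measuring before changing the default.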
IMDG 4.0 is a major version release. The last major version release was over five years ago. Major
releases allow us to break compatibility in the wire protocols and API, as well as removing the
previously deprecated API.
As breaking changes have been made to the client and cluster member protocols, it is not possible
to perform an in-place or rolling upgrade from a running IMDG 3.x cluster to IMDG 4.x. The only
way to upgrade to IMDG 4.x is to shut down the cluster completely.
• The hazelcast-client module has been merged into the core module: All the classes in the
hazelcast-client module have been moved to hazelcast. hazelcast-client.jar will not be
created anymore.
• Also the com.hazelcast.client Java module is not used anymore. All classes are now available
within the com.hazelcast.core module.
Hazelcast IMDG’s public SPI (Service Provider Interface) which was known as User Defined Service
has been removed. It was not simple enough and backwards compatibility was broken. A new and
clearly defined SPI may be developed in the future if there is enough interest. The removed SPI’s
classes will be kept to be used internally.
If there are 20 or more processors detected, the Hazelcast member by default starts 4+4 (4 input
and 4 output) I/O threads. This is to increase out of the box performance on faster machines
because often (especially in caching scenarios) the performance is I/O bound and
having some extra cores available for I/O can make a significant difference. If fewer than 20 cores are
detected, 3+3 I/O threads are used and the behavior remains the same as in the Hazelcast IMDG 3.x series.
A smart client, by default, gets 3+3 (3 input and 3 output) I/O threads to speed up the performance.
Before Hazelcast IMDG 4.0, this was 1+1. However, the client I/O can become a bottleneck with too
few threads. If TLS/SSL is enabled, then by default a smart client makes use of 3+3 I/O threads
which was already the case with previous versions.
There is a new performance feature in Hazelcast IMDG 4.0 called thread overcommit. By default,
Hazelcast creates more threads than it has cores, e.g., on a 20-core machine it creates 28 threads:
20 threads for the partition operations and 4+4 threads for I/O. In a typical caching usage
(get/put/set, etc.), having too many threads can cause a performance degradation due to increased
context switching. So there is a new option called hazelcast.operation.thread.overcommit. If this
property is set to true, i.e., -Dhazelcast.operation.thread.overcommit=true, which is the default,
Hazelcast uses the old style thread configuration where there are more threads than cores. If set to
false, the number of partition threads plus the I/O threads will be equal to the core count. Whether
this gives a performance boost depends on the environment. In some environments it can
give a significant boost and in others a significant loss; it is best to benchmark for your
specific situation. If you are doing lots of queries or other CPU-bound tasks, e.g.,
aggregations, you probably want to have as many cores available to partition operations as
possible.
See the Threading Model section for more information on Hazelcast IMDG’s threading model.
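The thread counts described above can be worked through with a little arithmetic. The sketch below only restates the numbers given in the text (4+4 I/O threads from 20 processors up, 3+3 below that, 28 threads on a 20-core machine with overcommit). The class ThreadCountSketch, the "one partition thread per core" rule and the lower bound of one partition thread are assumptions made for illustration; this is not Hazelcast's actual sizing code.

```java
// Illustrative arithmetic only; this is not Hazelcast's implementation.
final class ThreadCountSketch {

    // Total I/O threads (input + output) as described in the text:
    // 4+4 when 20 or more processors are detected, otherwise 3+3.
    static int ioThreads(int cores) {
        return cores >= 20 ? 8 : 6;
    }

    // Partition operation threads.
    // Assumption: with overcommit there is one partition thread per core,
    // which matches the 20-core example in the text. Without overcommit,
    // partition threads plus I/O threads add up to the core count.
    // Assumption: never fewer than one partition thread on small machines.
    static int partitionThreads(int cores, boolean overcommit) {
        return overcommit ? cores : Math.max(1, cores - ioThreads(cores));
    }

    static int totalThreads(int cores, boolean overcommit) {
        return partitionThreads(cores, overcommit) + ioThreads(cores);
    }

    public static void main(String[] args) {
        System.out.println("overcommit=true:  " + totalThreads(20, true));
        System.out.println("overcommit=false: " + totalThreads(20, false));
    }
}
```

On the 20-core example this gives 20 partition threads plus 8 I/O threads with overcommit, and 12 partition threads plus 8 I/O threads without it.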
A write-through optimization has been performed. This helps to reduce the latency in case of single
threaded usages.
Normally, when a request is made, the request is handed over to the I/O system where an I/O thread
takes care of sending it over the wire. This is great for throughput, but in case of single threaded
setups, it adds to the latency and therefore it reduces the throughput because threads need to be
notified.
With this release, Hazelcast IMDG detects the single threaded usage and tries to write through to
the socket directly instead of handing it over to the I/O thread; this optimization is called "write-
through".
This technique is being applied on the client, but also on the member. We have something similar
when responses are received: normally a response is processed by the response thread, but in case
of a single threaded usage, the response is processed on the I/O thread so we can remove a thread
notification and therefore get higher throughput.
Both the write-through and response-through are enabled by default. If Hazelcast IMDG detects that
there are many active threads, response- and write-through are disabled so it won’t cause a
performance degradation.
• setSmartRouting(boolean) → getNetworkConfig().setSmartRouting(boolean);
• getSocketInterceptorConfig() → getNetworkConfig().getSocketInterceptorConfig();
• setSocketInterceptorConfig(SocketInterceptorConfig) →
getNetworkConfig().setSocketInterceptorConfig(SocketInterceptorConfig);
• getConnectionTimeout() → getNetworkConfig().getConnectionTimeout();
• setConnectionTimeout(int) → getNetworkConfig().setConnectionTimeout(int);
• addAddress(String) → getNetworkConfig().addAddress(String);
• getAddresses() → getNetworkConfig().getAddresses();
• setAddresses(List) → getNetworkConfig().setAddresses(List);
• isRedoOperation() → getNetworkConfig().isRedoOperation();
• setRedoOperation(boolean) → getNetworkConfig().setRedoOperation(boolean);
• getSocketOptions() → getNetworkConfig().getSocketOptions();
• setSocketOptions(SocketOptions) → getNetworkConfig().setSocketOptions(SocketOptions);
• getNetworkConfig().setAwsConfig(new ClientAwsConfig()); →
getNetworkConfig().setAwsConfig(new AwsConfig());
The naming of the declarative configuration elements has not been changed. See the Release
Notes for new/removed configuration features.
See the following table for the before/after configuration samples.
Programmatic Configuration
Before IMDG 4.0
ClientConfig clientConfig = new ClientConfig();
clientConfig.setSmartRouting(true);
clientConfig.isSmartRouting();
clientConfig.getSocketInterceptorConfig();
clientConfig.setSocketInterceptorConfig(new SocketInterceptorConfig());
clientConfig.getConnectionTimeout();
clientConfig.setConnectionTimeout(1000);
clientConfig.addAddress("127.0.0.1:5701");
clientConfig.getAddresses();
clientConfig.setAddresses(Collections.singletonList("127.0.0.1:5701"));
clientConfig.isRedoOperation();
clientConfig.setRedoOperation(true);
clientConfig.getSocketOptions();
clientConfig.setSocketOptions(new SocketOptions());
clientConfig.getNetworkConfig().setAwsConfig(new ClientAwsConfig());
ClientAwsConfig awsConfig = clientConfig.getNetworkConfig().getAwsConfig();
After IMDG 4.0
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setSmartRouting(true);
clientConfig.getNetworkConfig().isSmartRouting();
clientConfig.getNetworkConfig().getSocketInterceptorConfig();
clientConfig.getNetworkConfig().setSocketInterceptorConfig(new SocketInterceptorConfig());
clientConfig.getNetworkConfig().getConnectionTimeout();
clientConfig.getNetworkConfig().setConnectionTimeout(1000);
clientConfig.getNetworkConfig().addAddress("127.0.0.1:5701");
clientConfig.getNetworkConfig().getAddresses();
clientConfig.getNetworkConfig().setAddresses(Collections.singletonList("127.0.0.1:5701"));
clientConfig.getNetworkConfig().isRedoOperation();
clientConfig.getNetworkConfig().setRedoOperation(true);
clientConfig.getNetworkConfig().getSocketOptions();
clientConfig.getNetworkConfig().setSocketOptions(new SocketOptions());
clientConfig.getNetworkConfig().setAwsConfig(new AwsConfig());
AwsConfig awsConfig = clientConfig.getNetworkConfig().getAwsConfig();
B.1.8. Changes in Index Configuration
In order to support further extensibility of Hazelcast, index configuration has been refactored.
Index type is now defined through the IndexType enumeration instead of the boolean flag: ordered
index is now referred to as IndexType.SORTED, unordered as IndexType.HASH.
In composite indexes, index parts are now defined as a list of strings instead of a single string with
comma-separated values.
With these changes, the following configuration parameters have been renamed:
• MapIndexConfig → IndexConfig
• MapConfig.getMapIndexConfig → MapConfig.getIndexConfig
• MapConfig.setMapIndexConfig → MapConfig.setIndexConfig
• MapConfig.addMapIndexConfig → MapConfig.addIndexConfig
Programmatic Configuration
Declarative Configuration
Before IMDG 4.0
<hazelcast>
    ...
    <map name="person">
        <indexes>
            <index ordered="false">name, age</index>
        </indexes>
    </map>
    ...
</hazelcast>
After IMDG 4.0
<hazelcast>
    ...
    <map name="person">
        <indexes>
            <index type="HASH">
                <attributes>
                    <attribute>name</attribute>
                    <attribute>age</attribute>
                </attributes>
            </index>
        </indexes>
    </map>
    ...
</hazelcast>
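When migrating existing composite index definitions, the old comma-separated string has to be turned into one entry per attribute. The following is a minimal sketch of that conversion using only the JDK. The class IndexAttributeMigration and its method are hypothetical helpers and are not part of the Hazelcast API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical migration helper; not part of the Hazelcast API.
final class IndexAttributeMigration {

    // Splits a 3.x-style definition such as "name, age" into one trimmed
    // entry per attribute, as expected by the list-based 4.0 format.
    static List<String> toAttributeList(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(attribute -> !attribute.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toAttributeList("name, age"));
    }
}
```

Each resulting entry corresponds to one <attribute> element in the declarative form.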
Custom attributes are referenced in predicates, queries and indexes. Some improvements have
been performed in Hazelcast’s query engine and one of the results is the change in custom attribute
configurations.
With this change, the following configuration parameters have been renamed:
• extractor → extractor-class-name
• MapAttributeConfig → AttributeConfig
• setExtractor() → setExtractorClassName()
• addMapAttributeConfig() → addAttributeConfig()
Programmatic Configuration
Declarative Configuration
Before IMDG 4.0
<hazelcast>
    ...
    <map name="trades">
        <attributes>
            <attribute extractor="com.bank.CurrencyExtractor">currency</attribute>
        </attributes>
    </map>
    ...
</hazelcast>
After IMDG 4.0
<hazelcast>
    ...
    <map name="trades">
        <attributes>
            <attribute extractor-class-name="com.bank.CurrencyExtractor">currency</attribute>
        </attributes>
    </map>
    ...
</hazelcast>
Also, some custom query attribute classes were previously abstract classes with one abstract
method. They have been converted into functional interfaces:
• ValueCallback
• ValueExtractor
Implementing ValueExtractor
Before IMDG 4.0
public static class PortableNameExtractor extends ValueExtractor<ValueReader, Object> {
    @Override
    public void extract(ValueReader target, Object argument, ValueCollector collector) {
        target.read("name", new ValueCallback<Object>() {
            @Override
            public void onResult(Object value) {
                collector.addObject(value);
            }
        });
    }
}
After IMDG 4.0
public static class PortableNameExtractor implements ValueExtractor<ValueReader, Object> {
    @Override
    public void extract(ValueReader target, Object argument, ValueCollector collector) {
        target.read("name", (ValueCallback) value -> collector.addObject(value));
    }
}
The MapReduce API, which had been deprecated since Hazelcast IMDG 3.8, has been removed. Instead, you
can use the Aggregations on top of the Query infrastructure and the Hazelcast Jet distributed
computing platform as its successors and replacements.
See the following table for the before (MapReduce) and after (Hazelcast Jet) word count samples.
Before
JobTracker tracker = hazelcastInstance.getJobTracker("default");

IMap<String, String> map = hazelcastInstance.getMap(MAP_NAME);
KeyValueSource<String, String> source = KeyValueSource.fromMap(map);

Job<String, String> job = tracker.newJob(source);
ICompletableFuture<Map<String, Integer>> future = job
        .mapper(new TokenizerMapper())
        .combiner(new WordcountCombinerFactory())
        .reducer(new WordcountReducerFactory())
        .submit();

System.out.println(ToStringPrettyfier.toString(future.get()));
After
JobTracker t = hz.getJobTracker("word-count");
IMap<Long, String> documents = hz.getMap("documents");
LongSumAggregation<String, String> aggr = new LongSumAggregation<>();
Map<String, Long> counts =
        t.newJob(KeyValueSource.fromMap(documents))
         .mapper((Long x, String document, Context<String, Long> ctx) ->
                 Stream.of(document.toLowerCase().split("\\W+"))
                       .filter(w -> !w.isEmpty())
                       .forEach(w -> ctx.emit(w, 1L)))
         .combiner(aggr.getCombinerFactory())
         .reducer(aggr.getReducerFactory())
         .submit()
         .get();
The MigrationListener API has been refactored. With this change, an event is published when a new
migration process starts and another event when migration is completed. These events include
statistics about the migration process including the start time, planned migration count, completed
migration count, etc.
Additionally, a migration event is published on each replica migration, both for primary and
backup replica migrations. This event includes the partition ID, replica index and migration
progress statistics.
Before IMDG 4.0, MigrationListener listened for the following events:
• migrationStarted
• migrationCompleted
• migrationFailed
In IMDG 4.0, MigrationListener listens for the following events:
• migrationStarted
• migrationFinished
• replicaMigrationCompleted
• replicaMigrationFailed
Before IMDG 4.0
import com.hazelcast.core.MigrationEvent;
import com.hazelcast.core.MigrationListener;

public class ClusterMigrationListener implements MigrationListener {
    @Override
    public void migrationStarted(MigrationEvent migrationEvent) {
        System.err.println("Started: " + migrationEvent);
    }
    @Override
    public void migrationCompleted(MigrationEvent migrationEvent) {
        System.err.println("Completed: " + migrationEvent);
    }
    @Override
    public void migrationFailed(MigrationEvent migrationEvent) {
        System.err.println("Failed: " + migrationEvent);
    }
}
After IMDG 4.0
import com.hazelcast.partition.MigrationListener;
import com.hazelcast.partition.MigrationState;
import com.hazelcast.partition.ReplicaMigrationEvent;

public class ClusterMigrationListener implements MigrationListener {

    @Override
    public void migrationStarted(MigrationState state) {
        System.out.println("Migration Started: " + state);
    }

    @Override
    public void migrationFinished(MigrationState state) {
        System.out.println("Migration Finished: " + state);
    }

    @Override
    public void replicaMigrationCompleted(ReplicaMigrationEvent event) {
        System.out.println("Replica Migration Completed: " + event);
    }

    @Override
    public void replicaMigrationFailed(ReplicaMigrationEvent event) {
        System.out.println("Replica Migration Failed: " + event);
    }
}
B.1.12. Defaulting to OpenSSL
• when you use TLS/SSL and Hazelcast IMDG detects some OpenSSL capabilities
The GroupConfig class has been removed. In both the client and member configurations,
GroupConfig (or <group> in XML) has been replaced by a simple cluster name configuration. The password
part of GroupConfig, which was already deprecated, has been removed.
Declarative Configuration
Before IMDG 4.0
<hazelcast>
    <group>
        <name>dev</name>
        <password>dev-pass</password>
    </group>
</hazelcast>
After IMDG 4.0
<hazelcast>
    <cluster-name>dev</cluster-name>
</hazelcast>
Programmatic Configuration
requesting security just references the security realm name.
The <credentials> configuration is not supported anymore in the client security configuration.
The existing <credentials-factory> configuration can fully replace it, as it is more
flexible. There are also new <username-password> and <token> configuration elements which simplify
the migration.
Before IMDG 4.0
<security>
    <credentials>com.acme.security.JohnDoeCredentials</credentials>
</security>
After IMDG 4.0
<security>
    <username-password username="johndoe" password="s3crEt"/>
</security>
The ClusterPrincipal class representing an authenticated user within the JAAS Subject has been
replaced by three different principal types:
• ClusterIdentityPrincipal
• ClusterRolePrincipal
• ClusterEndpointPrincipal
All these new principal types share the HazelcastPrincipal interface so it is simple to get or remove
them all from the subject.
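The benefit of a shared interface can be illustrated with the standard JAAS Subject API. The sketch below does not use the Hazelcast classes: MarkerPrincipal, IdentityPrincipal and RolePrincipal are hypothetical stand-ins for HazelcastPrincipal and its implementations, so that the example runs with the JDK alone.

```java
import java.security.Principal;
import java.util.Set;
import javax.security.auth.Subject;
import javax.security.auth.x500.X500Principal;

// Stand-in types only; the real Hazelcast principal classes are not used here.
final class PrincipalCleanupSketch {

    // Hypothetical shared interface, playing the role of HazelcastPrincipal.
    interface MarkerPrincipal extends Principal { }

    static final class IdentityPrincipal implements MarkerPrincipal {
        private final String name;
        IdentityPrincipal(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    static final class RolePrincipal implements MarkerPrincipal {
        private final String name;
        RolePrincipal(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    // Builds a Subject with two marker principals and one unrelated principal.
    static Subject sampleSubject() {
        Subject subject = new Subject();
        subject.getPrincipals().add(new IdentityPrincipal("johndoe"));
        subject.getPrincipals().add(new RolePrincipal("admin"));
        subject.getPrincipals().add(new X500Principal("CN=unrelated"));
        return subject;
    }

    // Removes every principal implementing the shared interface and
    // returns how many were removed. getPrincipals(Class) returns a copy,
    // so it is safe to pass it to removeAll on the live principal set.
    static int removeAll(Subject subject) {
        Set<MarkerPrincipal> found = subject.getPrincipals(MarkerPrincipal.class);
        subject.getPrincipals().removeAll(found);
        return found.size();
    }

    public static void main(String[] args) {
        Subject subject = sampleSubject();
        int removed = removeAll(subject);
        System.out.println(removed + " removed, " + subject.getPrincipals().size() + " left");
    }
}
```

A single lookup by the shared interface type covers all three principal kinds, so no per-type handling is needed.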
With this change, the Credentials object is not referenced from the principals anymore.
Also, DefaultPermissionPolicy, which was consuming ClusterPrincipal and reading the
endpoint address from it, now works with the new ClusterRolePrincipal and ClusterEndpointPrincipal
types.
See the following table for the before/after sample IPermissionPolicy implementations.
Before IMDG 4.0
public PermissionCollection getPermissions(Subject subject, Class<? extends Permission> type) {
    PermissionCollection collection = ...;
    for (ClusterPrincipal principal : subject.getPrincipals(ClusterPrincipal.class)) {
        String endpoint = principal.getEndpoint();
        String principalName = principal.getPrincipal();
        addPermissionsToPrincipal(collection, principalName, endpoint);
    }
    return collection;
}
After IMDG 4.0
public PermissionCollection getPermissions(Subject subject, Class<? extends Permission> type) {
    PermissionCollection collection = ...;
    Set<ClusterEndpointPrincipal> endpointPrincipals = subject.getPrincipals(ClusterEndpointPrincipal.class);
    // Use the first endpoint principal, if any, as the endpoint address.
    String endpoint = endpointPrincipals.isEmpty() ? null : endpointPrincipals.iterator().next().getName();
    for (ClusterRolePrincipal rolePrincipal : subject.getPrincipals(ClusterRolePrincipal.class)) {
        String role = rolePrincipal.getName();
        addPermissionsToPrincipal(collection, role, endpoint);
    }
    return collection;
}
Changes in ClusterLoginModule
ClusterLoginModule in Hazelcast IMDG 3.x contained four abstract methods to alter the behavior of
LoginModule:
• onLogin
• onCommit
• onAbort
• onLogout
The login module was retrieving Credentials and using it to create the ClusterPrincipal back then.
In Hazelcast IMDG 4.0, only onLogin is abstract. Others now have empty implementations. The login
module creates ClusterEndpointPrincipal automatically and adds it to the Subject.
The getName() abstract method has been added. It is used for constructing ClusterIdentityPrincipal.
The addRole(String) method can be called by the child implementations to add
ClusterRolePrincipals with the given name.
Also, ClusterLoginModule introduces three boolean login module options, which allow skipping the
addition of principals of a given type to the JAAS Subject. This makes it possible, for instance, to have just one
ClusterIdentityPrincipal in the Subject even if there are more login modules in the chain. These
options are:
• skipIdentity
• skipRole
• skipEndpoint
public class TestLoginModule extends public class TestLoginModule extends
ClusterLoginModule { ClusterLoginModule {
In Hazelcast IMDG 3.x, the custom credentials coming through the client protocol were always
automatically deserialized. To avoid this, the Credentials interface has been redesigned in
Hazelcast IMDG 4.0 to contain only the getName() (renamed from getPrincipal()) method. The
endpoint handling has been moved out of the interface.
Now, Credentials has two new subinterfaces:
• PasswordCredentials: The existing UsernamePasswordCredentials class implements it.
• TokenCredentials: The new SimpleTokenCredentials class has been introduced to implement it.
TokenCredentials is just a holder for byte array, and the authentication implementations
themselves, i.e., custom LoginModules, are responsible for the data deserialization when needed.
The data from client authentication message is not deserialized by Hazelcast members anymore.
For standard authentication, UsernamePasswordCredentials is constructed. For custom
authentication, SimpleTokenCredentials is constructed. If the original Credentials object is not a
PasswordCredentials or TokenCredentials instance, then it can be deserialized manually. However,
the deserialization during authentication remains a dangerous operation and should be avoided.
Credentials serialization and deserialization in the member protocol has not been
changed.
In Hazelcast IMDG 3.x, the CallbackHandler implementation ClusterCallbackHandler was only able to
work with Hazelcast’s CredentialsCallback. In Hazelcast IMDG 4.0, it also works with the standard
Java Callback implementations NameCallback and PasswordCallback.
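The standard JAAS callback exchange mentioned above can be sketched with the JDK alone. FixedCredentialsHandler below is a hypothetical handler written for illustration; it is not ClusterCallbackHandler and makes no claim about how Hazelcast resolves the values internally.

```java
import java.io.IOException;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

// Hypothetical handler answering the two standard Java callbacks.
final class FixedCredentialsHandler implements CallbackHandler {
    private final String name;
    private final char[] password;

    FixedCredentialsHandler(String name, char[] password) {
        this.name = name;
        this.password = password.clone();
    }

    @Override
    public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {
        for (Callback callback : callbacks) {
            if (callback instanceof NameCallback) {
                ((NameCallback) callback).setName(name);
            } else if (callback instanceof PasswordCallback) {
                ((PasswordCallback) callback).setPassword(password);
            } else {
                throw new UnsupportedCallbackException(callback);
            }
        }
    }

    // What a login module would typically do: create the callbacks,
    // pass them to the handler and read the answers back.
    static String[] resolve(CallbackHandler handler) {
        NameCallback nameCallback = new NameCallback("name");
        PasswordCallback passwordCallback = new PasswordCallback("password", false);
        try {
            handler.handle(new Callback[] {nameCallback, passwordCallback});
        } catch (IOException | UnsupportedCallbackException e) {
            throw new IllegalStateException(e);
        }
        return new String[] {nameCallback.getName(), new String(passwordCallback.getPassword())};
    }

    public static void main(String[] args) {
        String[] answers = resolve(new FixedCredentialsHandler("johndoe", "s3crEt".toCharArray()));
        System.out.println("name = " + answers[0]);
        System.out.println("password length = " + answers[1].length());
    }
}
```

Because only standard callback types are involved, a login module written this way has no compile-time dependency on Hazelcast classes.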
DefaultLoginModule was using the login module options to retrieve the member’s Config object. Now,
custom Callback types have been implemented which can be used to retrieve additional data
required for the authentication.
• javax.security.auth.callback.NameCallback
• javax.security.auth.callback.PasswordCallback
• com.hazelcast.security.CredentialsCallback (provides access to the incoming Credentials
instance)
Both in the API/code samples and documentation, the term "quorum" has been replaced by
"split-brain protection".
With this change, the following configuration parameters have been renamed:
• quorum → split-brain-protection
• quorum-size → minimum-cluster-size
• quorum-ref → split-brain-protection-ref
• quorum-type → protect-on
• probabilistic-quorum → probabilistic-split-brain-protection
• recently-active-quorum → recently-active-split-brain-protection
• quorum-function-class-name → split-brain-protection-function-class-name
• quorum-listeners → split-brain-protection-listeners
• QuorumConfig → SplitBrainProtectionConfig
• QuorumConfig.setSize() → SplitBrainProtectionConfig.setMinimumClusterSize()
• QuorumConfig.setType() → SplitBrainProtectionConfig.setProtectOn()
• QuorumListenerConfig → SplitBrainProtectionListenerConfig
• QuorumEvent → SplitBrainProtectionEvent
• QuorumService → SplitBrainProtectionService
• QuorumService.getQuorum() → SplitBrainProtectionService.getSplitBrainProtection()
• isPresent() → hasMinimumSize()
• setQuorumName() → setSplitBrainProtectionName()
• addQuorumConfig() → addSplitBrainProtectionConfig()
• newProbabilisticQuorumConfigBuilder() → newProbabilisticSplitBrainProtectionConfigBuilder()
• newRecentlyActiveQuorumConfigBuilder() →
newRecentlyActiveSplitBrainProtectionConfigBuilder()
Before IMDG 4.0
<hazelcast>
    ...
    <quorum name="quorumRuleWithFourMembers" enabled="true">
        <quorum-size>4</quorum-size>
    </quorum>
    <map name="default">
        <quorum-ref>quorumRuleWithFourMembers</quorum-ref>
    </map>
    ...
</hazelcast>
After IMDG 4.0
<hazelcast>
    ...
    <split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
        <minimum-cluster-size>4</minimum-cluster-size>
    </split-brain-protection>
    <map name="default">
        <split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
    </map>
    ...
</hazelcast>
See the Split-Brain Protection section for more information on network partitioning.
The getId() method of the IdentifiedDataSerializable interface is a method with a common name,
meaning a naming conflict would happen frequently. For example, database entities also have a
getId() method. Therefore, it has been renamed as getClassId().
See the following table showing the interface code before and after IMDG 4.0.
package com.hazelcast.nio.serialization; package com.hazelcast.nio.serialization;
Entry Processor
The EntryBackupProcessor interface has been removed in favor of EntryProcessor which now
defines how the entries will be processed both on the primary and the backup replicas.
Because of this, the AbstractEntryProcessor interface has been removed. This should make writing
entry processors more lambda friendly.
This should cover most cases. If you need to define a custom backup entry processor, you can
override the EntryProcessor#getBackupProcessor method.
@Nullable
@Override
public EntryProcessor<Object, Object, Object> getBackupProcessor() {
return this::processBackupEntry;
}
});
Hazelcast IMDG 4.0 introduces interfaces with a single abstract method which declares a checked
exception. The interfaces are also Serializable and can be readily used when providing a lambda
which is then serialized.
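The idea can be sketched with the JDK alone. ThrowingFunction below is a hypothetical interface defined only for this example, not one of the Hazelcast interfaces. It has a single abstract method that declares a checked exception and extends Serializable, so a lambda assigned to it survives a Java serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializableLambdaSketch {

    // Hypothetical interface for illustration: one abstract method,
    // a checked exception in its signature, and Serializable.
    @FunctionalInterface
    interface ThrowingFunction<T, R> extends Serializable {
        R apply(T input) throws Exception;
    }

    // Serializes the function to bytes and reads it back.
    @SuppressWarnings("unchecked")
    static <T, R> ThrowingFunction<T, R> roundTrip(ThrowingFunction<T, R> fn)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ThrowingFunction<T, R>) in.readObject();
        }
    }

    // The lambda may throw a checked exception without a try/catch inside it.
    static int parseAfterRoundTrip(String text) {
        ThrowingFunction<String, Integer> parse = s -> {
            if (s.isEmpty()) {
                throw new IOException("empty input");
            }
            return Integer.parseInt(s);
        };
        try {
            return roundTrip(parse).apply(text);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAfterRoundTrip("42"));
    }
}
```

With a plain java.util.function.Function, the same lambda would need its own try/catch and would not be serializable without an extra cast.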
Projection was an abstract class for historical reasons. It has been turned into a
functional interface so it’s more lambda-friendly.
The APIs of the distributed data structures have been made cleaner by adding Nullable and Nonnull
annotations, and their API documentation has been improved:
• Now, it is obvious when looking at the API where null is allowed and where it is not.
• Some methods actually allowed null but there was no indication that they did.
• A method, when used on the member, would accept null and behave accordingly
while, on the client, the same method would throw a NullPointerException. Now, the behavior of the
member and client has been aligned.
The data structures and interfaces enhanced in this sense are listed below:
• Cluster
• ITopic
• Ringbuffer
• ScheduledExecutor
Dependent computation stages registered using default async methods which do not accept an
explicit Executor argument (such as thenAcceptAsync, whenCompleteAsync etc) are executed by the
java.util.concurrent.ForkJoinPool#commonPool() (unless it does not support a parallelism level of at
least two, in which case, a new Thread is created to run each task).
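The executor rule described above is the standard CompletionStage behavior and can be observed with a plain JDK CompletableFuture. The sketch below does not involve a Hazelcast future. The class AsyncStageSketch and the thread name my-callback-thread are made up for the example, and thenApplyAsync is used instead of thenAcceptAsync only so that the thread name can be returned.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncStageSketch {

    // Dependent stage with an explicit Executor: runs on that executor's thread.
    static String threadNameWithExecutor(ExecutorService executor) {
        return CompletableFuture.completedFuture("value")
                .thenApplyAsync(v -> Thread.currentThread().getName(), executor)
                .join();
    }

    // Same stage without an Executor argument: the default applies, which is
    // normally a ForkJoinPool.commonPool() worker thread.
    static String threadNameWithDefault() {
        return CompletableFuture.completedFuture("value")
                .thenApplyAsync(v -> Thread.currentThread().getName())
                .join();
    }

    public static void main(String[] args) {
        ExecutorService executor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "my-callback-thread"));
        try {
            System.out.println("default:  " + threadNameWithDefault());
            System.out.println("explicit: " + threadNameWithExecutor(executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Passing an explicit Executor is the usual way to keep long-running callbacks off the shared common pool.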
import import com.hazelcast.core.Hazelcast;
com.hazelcast.core.ExecutionCallback; import
import com.hazelcast.core.Hazelcast; com.hazelcast.core.HazelcastInstance;
import import com.hazelcast.map.IMap;
com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap; public class Main {
@Override
public void onFailure
(Throwable t) {
t.printStackTrace();
}
});
}
}
B.1.20. WAN Replication Configuration Changes
• You needed to specify the fully qualified class name of the WAN implementation that should be
used. In most cases, this was the built-in Hazelcast IMDG Enterprise Edition (EE)
implementation.
• There were various configuration options, some of which were present as Java class instance
fields or XML child nodes and attributes while others were present in a properties list. The issue
with the property list is that there was no checking for typos, no documentation and no IDE
help.
• If you wanted to use a custom WAN publisher SPI implementation, some configuration options
did not make sense as they were tied to our implementation, e.g., WAN queue size.
• It was verbose.
The tag which was supposed to cover both cases, using the built-in Hazelcast EE implementation
and a custom WAN replication implementation (wan-publisher or WanPublisherConfig), has been
separated into two configuration elements/classes to be used for built-in and custom WAN
publishers:
This means, if you’re using the Hazelcast built-in WAN replication, the new configuration element is
batch-publisher or WanBatchPublisherConfig. If you’re using a custom WAN replication
implementation, the new configuration element is custom-publisher or WanCustomPublisherConfig.
Additionally, the group password has been removed from the configuration and now only the
cluster name is checked when connecting to the target cluster. This has been done to align the
behavior with members forming a single cluster, where members with different passwords but
with the same cluster name (previously group name) could form a cluster.
See the following table for the before/after built-in WAN publisher examples:
Declarative Configuration
Before IMDG 4.0
<wan-publisher group-name="builtInPublisher" publisher-id="builtInPublisherId">
    <class-name>com.hazelcast.enterprise.wan.impl.replication.WanBatchReplication</class-name>
    <queue-capacity>15000</queue-capacity>
    <queue-full-behavior>DISCARD_AFTER_MUTATION</queue-full-behavior>
    <initial-publisher-state>REPLICATING</initial-publisher-state>
    <wan-sync>
        <consistency-check-strategy>NONE</consistency-check-strategy>
    </wan-sync>
    <properties>
        <property name="endpoints">10.3.5.1:5701,10.3.5.2:5701</property>
        <property name="batch.size">1000</property>
        <property name="batch.max.delay.millis">2000</property>
        <property name="response.timeout.millis">60000</property>
        <property name="ack.type">ACK_ON_OPERATION_COMPLETE</property>
        <property name="snapshot.enabled">false</property>
        <property name="group.password">nyc-pass</property>
    </properties>
</wan-publisher>
After IMDG 4.0
<batch-publisher>
    <cluster-name>builtInPublisher</cluster-name>
    <publisher-id>builtInPublisherId</publisher-id>
    <batch-size>1000</batch-size>
    <batch-max-delay-millis>2000</batch-max-delay-millis>
    <response-timeout-millis>60000</response-timeout-millis>
    <acknowledge-type>ACK_ON_OPERATION_COMPLETE</acknowledge-type>
    <initial-publisher-state>REPLICATING</initial-publisher-state>
    <snapshot-enabled>false</snapshot-enabled>
    <queue-full-behavior>DISCARD_AFTER_MUTATION</queue-full-behavior>
    <queue-capacity>10000</queue-capacity>
    <target-endpoints>10.3.5.1:5701,10.3.5.2:5701</target-endpoints>
    <wan-sync>
        <consistency-check-strategy>NONE</consistency-check-strategy>
    </wan-sync>
</batch-publisher>
Programmatic Configuration
Before IMDG 4.0
WanPublisherConfig publisherConfig = new WanPublisherConfig()
        .setGroupName("builtInPublisher")
        .setPublisherId("builtInPublisherId")
        .setClassName("com.hazelcast.enterprise.wan.impl.replication.WanBatchReplication")
        .setQueueCapacity(15000)
        .setQueueFullBehavior(WANQueueFullBehavior.DISCARD_AFTER_MUTATION)
        .setInitialPublisherState(WanPublisherState.REPLICATING);
publisherConfig.getWanSyncConfig().setConsistencyCheckStrategy(ConsistencyCheckStrategy.NONE);
Map<String, Comparable> properties = publisherConfig.getProperties();
properties.put("endpoints", "10.3.5.1:5701,10.3.5.2:5701");
properties.put("batch.size", 1000);
properties.put("batch.max.delay.millis", 2000);
properties.put("response.timeout.millis", 60000);
properties.put("ack.type", WanAcknowledgeType.ACK_ON_OPERATION_COMPLETE.toString());
properties.put("snapshot.enabled", false);
properties.put("group.password", "nyc-pass");
After IMDG 4.0
WanBatchPublisherConfig publisherConfig = new WanBatchPublisherConfig()
        .setClusterName("builtInPublisher")
        .setPublisherId("builtInPublisherId")
        .setClassName("com.hazelcast.enterprise.wan.impl.replication.WanBatchReplication")
        .setQueueCapacity(15000)
        .setQueueFullBehavior(WanQueueFullBehavior.DISCARD_AFTER_MUTATION)
        .setInitialPublisherState(WanPublisherState.REPLICATING)
        .setTargetEndpoints("10.3.5.1:5701,10.3.5.2:5701")
        .setBatchSize(1000)
        .setBatchMaxDelayMillis(2000)
        .setResponseTimeoutMillis(60000)
        .setAcknowledgeType(WanAcknowledgeType.ACK_ON_OPERATION_COMPLETE)
        .setSnapshotEnabled(false);
publisherConfig.getWanSyncConfig().setConsistencyCheckStrategy(ConsistencyCheckStrategy.NONE);
See the following table for the before/after custom WAN publisher examples:
Declarative Configuration
Before IMDG 4.0:
<wan-publisher group-name="customWanPublisherId">
    <class-name>com.myCompany.MyImplementation</class-name>
    <properties>
        <property name="some.property">some-value</property>
        <property name="some.other.property">some-other-value</property>
    </properties>
</wan-publisher>
After IMDG 4.0:
<custom-publisher>
    <publisher-id>customPublisherId</publisher-id>
    <class-name>com.myCompany.MyImplementation</class-name>
    <properties>
        <property name="some.property">some-value</property>
        <property name="some.other.property">some-other-value</property>
    </properties>
</custom-publisher>
Programmatic Configuration
In the IMDG 3.x series, the WAN publisher SPI allowed you to plug into the lifecycle of a map/cache
entry and replicate the updates to another system. For example, you might implement replication
to Kafka or some JMS queue or even write out map and cache event changes to a log on disk.
However, the SPI was not very intuitive:
• You had to implement different interfaces, depending on whether you were using Hazelcast
IMDG Open Source or Enterprise edition.
• There were cases of leaking internals which don’t make sense for some custom
implementations.
In Hazelcast IMDG 4.0, we have provided a new and cleaner WAN publisher SPI. You only need to
implement a single interface: com.hazelcast.wan.WanPublisher. This implementation can then be set
in the WAN replication configuration and be used with both Hazelcast Open Source and Enterprise
editions.
The following refactors and cleanups have been performed on the public Predicate related API:
◦ IndexAwarePredicate
◦ VisitablePredicate
◦ SqlPredicate/Parser
◦ TruePredicate
• Moved the FalsePredicate and SkipIndexPredicate classes to the
com.hazelcast.query.impl.predicates package.
◦ newPredicateBuilder()
◦ sql()
◦ pagingPredicate()
◦ partitionPredicate()
Consequently, the public Predicate API now provides only interfaces (Predicate, PagingPredicate
and PartitionPredicate) with no dependencies on any internal APIs.
Some public APIs that return UUID strings have been changed to return UUID. These changes
include getUuid() method of the Endpoint interface, getTxnId() method of the TransactionContext
interface, return values of the listener registrations and registrationId parameters for the methods
that de-register the listeners.
See the following table for the before/after sample implementations.
Before IMDG 4.0:
    @Override
    public void clientDisconnected(Client client) {
        String clientUuid = client.getUuid();
        System.out.println("Client disconnected >>> " + clientUuid);
    }
});
hazelcastInstance.getClientService().removeClientListener(registrationId);
After IMDG 4.0:
    @Override
    public void clientDisconnected(Client client) {
        UUID clientUuid = client.getUuid();
        System.out.println("Client disconnected >>> " + clientUuid);
    }
});
hazelcastInstance.getClientService().removeClientListener(registrationId);
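For code that still holds such identifiers as strings (for example, IDs that a 3.x application wrote to a log or a database), the standard java.util.UUID class converts in both directions. The following is a minimal sketch using only the JDK; the class and method names are hypothetical helpers and are not part of the Hazelcast API.

```java
import java.util.UUID;

final class RegistrationIds {
    private RegistrationIds() {
    }

    /** Converts an ID kept as a String by older code into a java.util.UUID. */
    static UUID fromLegacyString(String id) {
        // Throws IllegalArgumentException if the text is not a valid UUID.
        return UUID.fromString(id);
    }

    /** Renders a UUID as text, e.g. for logs; round-trips with fromLegacyString. */
    static String toExternalForm(UUID id) {
        return id.toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        String stored = toExternalForm(id);
        // The round trip yields an equal UUID.
        System.out.println(fromLegacyString(stored).equals(id));
    }
}
```

A value converted this way can then be passed to the methods that now expect a UUID, such as the de-registration methods mentioned above.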
Unlike in Hazelcast IMDG 3.12, CP Subsystem in IMDG 4.0 has an unsafe operation mode, which
provides weaker consistency guarantees, similar to the former implementations in the Hazelcast
IMDG 3.x series.
Before IMDG 4.0:
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IAtomicLong;
import com.hazelcast.core.IAtomicReference;
import com.hazelcast.core.ICountDownLatch;
import com.hazelcast.core.ILock;
import com.hazelcast.core.ISemaphore;
After IMDG 4.0:
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.CPSubsystem;
import com.hazelcast.cp.IAtomicLong;
import com.hazelcast.cp.IAtomicReference;
import com.hazelcast.cp.ICountDownLatch;
import com.hazelcast.cp.ISemaphore;
import com.hazelcast.cp.lock.FencedLock;
B.1.25. Removal of Legacy Merge Policies
All legacy merge policies have been removed. Their replacements are in the
com.hazelcast.spi.merge package.
• com.hazelcast.map.merge.HigherHitsMapMergePolicy →
com.hazelcast.spi.merge.HigherHitsMergePolicy
• com.hazelcast.map.merge.LatestUpdateMapMergePolicy →
com.hazelcast.spi.merge.LatestUpdateMergePolicy
• com.hazelcast.map.merge.PassThroughMergePolicy →
com.hazelcast.spi.merge.PassThroughMergePolicy
• com.hazelcast.map.merge.PutIfAbsentMapMergePolicy →
com.hazelcast.spi.merge.PutIfAbsentMergePolicy
• com.hazelcast.cache.merge.HigherHitsCacheMergePolicy →
com.hazelcast.spi.merge.HigherHitsMergePolicy
• com.hazelcast.cache.merge.LatestAccessCacheMergePolicy →
com.hazelcast.spi.merge.LatestAccessMergePolicy
• com.hazelcast.cache.merge.PassThroughCacheMergePolicy →
com.hazelcast.spi.merge.PassThroughMergePolicy
• com.hazelcast.cache.merge.PutIfAbsentCacheMergePolicy →
com.hazelcast.spi.merge.PutIfAbsentMergePolicy
The merge-policy declarative configuration element that was used in the older IMDG versions
can still be used:
<merge-policy batch-size="100">LatestAccessMergePolicy</merge-policy>
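The replacements listed above can be applied mechanically when migrating configuration that names the legacy classes. The sketch below keeps the legacy-to-replacement pairs from this section in a lookup table; LegacyMergePolicies and replacementFor are hypothetical helpers, not part of Hazelcast.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LegacyMergePolicies {
    // Legacy class name -> replacement, exactly as listed in this section.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();

    static {
        String spi = "com.hazelcast.spi.merge.";
        REPLACEMENTS.put("com.hazelcast.map.merge.HigherHitsMapMergePolicy", spi + "HigherHitsMergePolicy");
        REPLACEMENTS.put("com.hazelcast.map.merge.LatestUpdateMapMergePolicy", spi + "LatestUpdateMergePolicy");
        REPLACEMENTS.put("com.hazelcast.map.merge.PassThroughMergePolicy", spi + "PassThroughMergePolicy");
        REPLACEMENTS.put("com.hazelcast.map.merge.PutIfAbsentMapMergePolicy", spi + "PutIfAbsentMergePolicy");
        REPLACEMENTS.put("com.hazelcast.cache.merge.HigherHitsCacheMergePolicy", spi + "HigherHitsMergePolicy");
        REPLACEMENTS.put("com.hazelcast.cache.merge.LatestAccessCacheMergePolicy", spi + "LatestAccessMergePolicy");
        REPLACEMENTS.put("com.hazelcast.cache.merge.PassThroughCacheMergePolicy", spi + "PassThroughMergePolicy");
        REPLACEMENTS.put("com.hazelcast.cache.merge.PutIfAbsentCacheMergePolicy", spi + "PutIfAbsentMergePolicy");
    }

    private LegacyMergePolicies() {
    }

    /** Returns the replacement for a legacy class name, or the input unchanged if it is not a legacy name. */
    static String replacementFor(String className) {
        return REPLACEMENTS.getOrDefault(className, className);
    }

    public static void main(String[] args) {
        System.out.println(replacementFor("com.hazelcast.map.merge.PutIfAbsentMapMergePolicy"));
    }
}
```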
AWS programmatic configuration has been merged with a more universal configuration
infrastructure common to all cloud providers. The declarative configuration remains unchanged.
See here for more information on configuring Hazelcast IMDG on AWS.
Before IMDG 4.0 After IMDG 4.0
• hazelcast.rest.enabled
• hazelcast.memcache.enabled
• hazelcast.http.healthcheck.enabled
See the Using the REST Endpoint Groups section on how to configure a Hazelcast instance to
expose REST endpoints. See the Health Check and Monitoring section on how to enable
the health check. See the Memcache Client section on how to enable the memcache client
request listener service.
See here for more information on configuring MultiMap.
The deprecated getObjectId() method has been removed; use getObjectName() instead.
The following set of deprecated EntryListener-based listener API methods has been removed:
• addLocalEntryListener(EntryListener<K, V>)
• addLocalEntryListener(EntryListener<K, V>, Predicate<K, V>, boolean)
• addLocalEntryListener(EntryListener<K, V>, Predicate<K, V>, K, boolean)
• addEntryListener(EntryListener<K, V>, boolean)
• addEntryListener(EntryListener<K, V>, K, boolean)
• addLocalEntryListener(MapListener)
• addLocalEntryListener(MapListener, Predicate<K,V>, boolean)
• addLocalEntryListener(MapListener, Predicate<K,V>, K, boolean)
• addEntryListener(MapListener, boolean)
• addEntryListener(MapListener, K, boolean)
EntryListener-based listeners are still supported by the newer MapListener-based API and
declarative configuration.
There has been a simplification and improvement in the way of configuring the eviction for a map.
Before IMDG 4.0:
<hazelcast>
    ...
    <map name="default">
        <eviction-policy>LRU</eviction-policy>
        <max-size policy="PER_NODE">20</max-size>
    </map>
    ...
</hazelcast>
After IMDG 4.0:
<hazelcast>
    ...
    <map name="default">
        <eviction eviction-policy="LRU" max-size-policy="PER_NODE" size="20"/>
    </map>
    ...
</hazelcast>
There has been a simplification and improvement in the way of configuring the custom eviction
policy for a map.
Before IMDG 4.0:
<hazelcast>
    ...
    <map name="default">
        <map-eviction-policy-class-name>
            com.mycompany.MyMapEvictionPolicyComparator
        </map-eviction-policy-class-name>
    </map>
    ...
</hazelcast>
After IMDG 4.0:
<hazelcast>
    ...
    <map name="default">
        <eviction comparator-class-name="com.mycompany.MyMapEvictionPolicyComparator"/>
    </map>
    ...
</hazelcast>
B.1.37. Changes in EntryListenerConfig
• /hazelcast/rest/mancenter/changeurl is removed
The following REST endpoints now require cluster name and password as the first two URL-
encoded parameters:
• /hazelcast/rest/wan/sync/map
• /hazelcast/rest/wan/sync/allmaps
• /hazelcast/rest/wan/clearWanQueues
• /hazelcast/rest/wan/addWanConfig
• /hazelcast/rest/wan/pausePublisher
• /hazelcast/rest/wan/stopPublisher
• /hazelcast/rest/wan/resumePublisher
• /hazelcast/rest/wan/consistencyCheck/map
• /hazelcast/health/node-state
• /hazelcast/health/cluster-state
• /hazelcast/health/cluster-safe
• /hazelcast/health/migration-queue-size
• /hazelcast/health/cluster-size
• /hazelcast/health/ready
• /hazelcast/rest/cluster
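As an illustration of supplying the cluster name and password as the first two URL-encoded parameters, the sketch below builds such a request body with the JDK's URLEncoder. It assumes that the parameters are joined with the '&' character, which this section does not state, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class RestCredentials {
    private RestCredentials() {
    }

    /**
     * Builds a request body whose first two parameters are the URL-encoded
     * cluster name and password, followed by any further parameters.
     * Assumption: parameters are separated by '&'.
     */
    static String body(String clusterName, String password, String... otherParams) {
        StringBuilder sb = new StringBuilder();
        sb.append(encode(clusterName)).append('&').append(encode(password));
        for (String param : otherParams) {
            sb.append('&').append(encode(param));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(body("dev", "p&ss w", "customers"));
    }
}
```

Encoding each parameter separately keeps a '&' or a space inside the password from being read as a parameter separator.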
B.1.39. Changes in the Diagnostics Configuration
With the introduction of the metrics system in Hazelcast IMDG 4.0, the metrics collected by
Diagnostics and by the metrics system are shared. This has come with the following changes to
the system properties that configure diagnostics:
As Management Center now uses Hazelcast Java client for communication with the cluster, all
attributes and nested elements have been removed from programmatic, XML and YAML
configurations for Management Center, i.e., from ManagementCenterConfig class and management-
center configuration element, except for the scripting-enabled attribute.
A full example of settings available in the Management Center configuration now looks like the
following:
This has come with the following changes of the system properties that configure Management
Center:
• Upgrading Cluster Version From IMDG 3.11 to 3.12: For the IMDG versions before 3.12, REST
API could be enabled by using the hazelcast.rest.enabled system property, which is deprecated
now. IMDG 3.12 and newer versions introduce the rest-api configuration element along with
REST endpoint groups. Therefore, a configuration change is needed specifically when
performing a rolling member upgrade from IMDG 3.11 to 3.12.
So, the steps listed in the above Rolling Upgrade Procedure section should be as follows:
3. Update the member with 3.12 binaries
For the 4th step ("Update the configuration"), the configuration should be updated as
follows:
<hazelcast>
...
<rest-api enabled="true">
<endpoint-group name="CLUSTER_WRITE" enabled="true"/>
</rest-api>
...
</hazelcast>
See the Using the REST Endpoint Groups section for more information.
• Starting with Hazelcast 3.10, split-brain recovery is supported for the data structures whose in-
memory format is NATIVE.
• The system property based configuration for Ping Failure Detector is deprecated. Instead, use
the elements to configure it, an example of which is shown below:
<hazelcast>
<network>
...
<failure-detector>
<icmp enabled="true">
<timeout-milliseconds>1000</timeout-milliseconds>
<fail-fast-on-startup>true</fail-fast-on-startup>
<interval-milliseconds>1000</interval-milliseconds>
<max-attempts>2</max-attempts>
<parallel-mode>true</parallel-mode>
<ttl>255</ttl>
</icmp>
</failure-detector>
</network>
...
</hazelcast>
Until Hazelcast IMDG 3.10, the configuration has been like the following:
<hazelcast>
...
<properties>
<property name="hazelcast.icmp.enabled">true</property>
<property name="hazelcast.icmp.parallel.mode">true</property>
<property name="hazelcast.icmp.timeout">1000</property>
<property name="hazelcast.icmp.max.attempts">3</property>
<property name="hazelcast.icmp.interval">1000</property>
<property name="hazelcast.icmp.ttl">0</property>
</properties>
...
</hazelcast>
• Synchronizing WAN Target Cluster: The URL for the related REST call has been changed from
http://member_ip:port/hazelcast/rest/wan/sync/map to
http://member_ip:port/hazelcast/rest/mancenter/wan/sync/map.
• JCache usage: Due to a compatibility problem, CacheConfig serialization may not work if your
member is 3.8.x where x < 5. You need to use the 3.8.5 or higher versions where the problem is
fixed.
B.6. Upgrading to Hazelcast IMDG 3.7.x
This section provides information to be considered when upgrading from Hazelcast IMDG 3.6.x to
3.7.x and newer.
• Important note about Hazelcast System Properties: Although Hazelcast has never
recommended using the GroupProperties.java class to work with system properties, users of
this class should note that GroupProperties.java has been replaced by GroupProperty.java.
In this new class, system properties are instances of the newly introduced HazelcastProperty
object. You can access the names of these properties by calling the getName() method of
HazelcastProperty.
• JCache usage: Changes in the JCache implementation broke the compatibility of 3.6.x clients
with 3.7, 3.7.1 and 3.7.2 cluster members and vice versa: 3.7, 3.7.1 and 3.7.2 clients are also
incompatible with 3.6.x cluster members. This issue only affects Java clients which use JCache
functionality.
You can use a compatibility option to ensure backwards compatibility with 3.6.x clients.
In order to upgrade a 3.6.x cluster and clients to 3.7.3 (or later), you need to use this
compatibility option on either the member or the client side, depending on which one is
upgraded first:
You may use any of the supported ways as described in the System Properties section to
configure the compatibility option. When done upgrading your cluster and clients, you may
remove the compatibility property from your Hazelcast member configuration.
B.7. Upgrading to Hazelcast IMDG 3.6.x
This section provides information to be considered when upgrading from Hazelcast IMDG 3.5.x to
3.6.x and newer.
• Introducing new configuration options for WAN replication: WAN replication related system
properties, which are configured on a per member basis, can now be configured per target
cluster. The following system properties are no longer valid.
• Change in the Custom Serialization in the C++ Client Distribution: Before, the method
getTypeId() was used to retrieve the ID of the object to be serialized. With this release, the
method getHazelcastTypeId() is used and you give your object as a parameter to this new
method. Also, getTypeId() was used in your custom serializer class; it has been renamed to
getHazelcastTypeId(), too.
• The LOCAL transaction type has been deprecated. Use ONE_PHASE for the Hazelcast IMDG releases
3.6 and higher.
• Removal of deprecated static methods: The static methods of Hazelcast class reaching
Hazelcast data components have been removed. The functionality of these methods can be
reached from the HazelcastInstance interface. You should replace the following:
with
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
// or if you already started an instance named "instance1"
// HazelcastInstance hazelcastInstance = Hazelcast.getHazelcastInstanceByName(
"instance1" );
Map<Integer, String> customers = hazelcastInstance.getMap( "customers" );
• Renaming "instance" to "distributed object": The term "instance" caused confusion because it
was used for both the cluster members and the distributed objects (map, queue, topic, etc.
instances). Starting with this release, the term "instance" is used for Hazelcast instances and
the term "distributed object" is used for map, queue, etc. instances. You should replace the
related methods with the new renamed ones. 3.0.x clients are smart clients in that they know in
which cluster member the data is located, so you can replace your lite members with native clients.
with
• Listener API change: The removeListener methods used to take the listener object as a
parameter. This caused confusion since the same listener object may be used as a parameter
for different listener registrations. So we have changed the listener API: the addListener
methods now return a unique ID and you can remove a listener by using this ID. You should do
the following replacement if needed:
IMap map = hazelcastInstance.getMap( "map" );
map.addEntryListener( listener, true );
map.removeEntryListener( listener );
with
• IMap changes:
◦ tryRemove(K key, long timeout, TimeUnit timeunit) returns boolean indicating whether
operation is successful.
◦ getMapEntry(K key) is renamed as getEntryView(K key). The returned object’s type (MapEntry
class) is renamed as EntryView.
◦ There are no predefined names for merge policies. You just give the full class name of the
merge policy implementation:
<merge-policy>com.hazelcast.map.merge.PassThroughMergePolicy</merge-policy>
Also the MergePolicy interface has been renamed as MapMergePolicy and returning null from
the implemented merge() method causes the existing entry to be removed.
• IQueue changes: There is no change on IQueue API but there are changes on how IQueue is
configured: there is no backing map configuration for queue. Settings like backup count are
directly configured on the queue configuration. See the Queue section.
• Transaction API change: Transaction API has been changed. See the Transactions chapter.
• ExecutorService API change: The MultiTask and DistributedTask classes have been removed.
All the functionality is supported by the newly presented interface IExecutorService. See the
Executor Service section.
• LifeCycleService API: The lifecycle has been simplified. The pause(), resume(), restart()
methods have been removed.
• ICountDownLatch: The await() operation has been removed. We expect users to use await()
method with timeout parameters.
• ISemaphore API: The ISemaphore has been substantially changed. The attach(), detach()
methods have been removed.
• Before, the default value for the max-size eviction policy was cluster_wide_map_size. Starting
with this release, the default is PER_NODE. After upgrading, if you have not changed the policy,
set max-size according to this new default. Otherwise, an OutOfMemoryError is likely to be
thrown.
• HazelcastOverloadException: Thrown when the system cannot handle any more load due to an
overload. This exception is thrown when back pressure is enabled.
Hazelcast also throws the following exceptions in the cases of overall system problems such as
networking issues and long pauses:
Hazelcast IMDG Enterprise is a commercial product of Hazelcast, Inc. and is distributed under a
commercial license that must be acquired before using it in any type of released software. Feel free
to contact Hazelcast sales department for more information on commercial offers.
Depending on the used feature-set, Hazelcast has certain runtime dependencies which might have
different licenses. Following are dependencies and their respective licenses.
D.1. Embedded Dependencies
Embedded dependencies are merged (shaded) with the Hazelcast codebase at compile-time. These
dependencies become an integral part of the Hazelcast distribution.
For license files of embedded dependencies, see the license directory of the Hazelcast distribution,
available at our download page.
minimal-json:
minimal-json is a JSON parsing and generation library which is a part of the Hazelcast distribution.
It is used for communication between the Hazelcast cluster and the Management Center.
minimal-json is distributed under the MIT license and offers the same rights to add, use, modify
and distribute the source code as the Apache License 2.0 that Hazelcast uses. However, some other
restrictions might apply.
Spring Framework:
Hazelcast offers a tight integration into the Spring Framework. Hazelcast can be configured and
controlled using Spring.
The Spring Framework is distributed under the terms of the Apache License 2 and therefore it is
fully compatible with Hazelcast.
Hibernate:
Hibernate is distributed under the terms of the Lesser General Public License 2.1, also known as
LGPL. Please read carefully the terms of the LGPL since restrictions might apply.
Apache Tomcat:
Hazelcast IMDG Enterprise offers native integration into Apache Tomcat for web session clustering.
Apache Tomcat is distributed under the terms of the Apache License 2 and therefore fully
compatible with Hazelcast.
Eclipse Jetty:
Hazelcast IMDG Enterprise offers native integration into Jetty for web session clustering.
Jetty is distributed with a dual licensing strategy. It is licensed under the terms of the Apache
License 2 and under the Eclipse Public License v1.0, also known as EPL. Due to the Apache License,
it is fully compatible with Hazelcast.
Hazelcast offers a native implementation for JCache (JSR 107), which has a runtime dependency to
the JCache API.
The JCache API is distributed under the terms of the so called Specification License. Please read
carefully the terms of this license since restrictions might apply.
Hazelcast IMDG Enterprise offers a native C++ client, which has a link-time dependency to the Boost
C++ Libraries.
The Boost Libraries are distributed under the terms of the Boost Software License, which is very
similar to the MIT or BSD license. Please read the terms of this license carefully since restrictions
might apply.
Hazelcast IMDG member instances call our phone home server initially when they are started and
then every 24 hours. This applies to all the instances joined to the cluster.
• Download ID
• Cluster size bands for 5, 10, 20, 40, 60, 100, 150, 300, 600 and > 600
• Number of connected clients bands of 5, 10, 20, 40, 60, 100, 150, 300, 600 and > 600
• Cluster uptime
• Member uptime
• Environment Information:
• Hazelcast IMDG Enterprise specific:
Set the hazelcast.phone.home.enabled system property to false either in the config or on the Java
command line. See the System Properties appendix for information on how to set a property.
You can also disable the phone home using the environment variable HZ_PHONE_HOME_ENABLED.
export HZ_PHONE_HOME_ENABLED=false
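When the member is started from your own code, the same system property can also be set programmatically. The following is a minimal sketch using only the JDK; the class name is hypothetical, and it assumes the property is set before the Hazelcast member is created so that it is in place when the member reads its properties.

```java
final class PhoneHomeSettings {
    // Property name as given in this section.
    static final String PHONE_HOME_PROPERTY = "hazelcast.phone.home.enabled";

    private PhoneHomeSettings() {
    }

    /** Sets the JVM-wide system property that disables phone home. */
    static void disablePhoneHome() {
        System.setProperty(PHONE_HOME_PROPERTY, "false");
    }

    public static void main(String[] args) {
        disablePhoneHome();
        // Start the Hazelcast member after this point (assumption, see above).
        System.out.println(PHONE_HOME_PROPERTY + "=" + System.getProperty(PHONE_HOME_PROPERTY));
    }
}
```

Passing -Dhazelcast.phone.home.enabled=false on the Java command line has the same effect without a code change.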
The partition count of 271, being a prime number, is a good choice because it is distributed to the
members almost evenly. For a small to medium sized cluster, the count of 271 gives an almost even
partition distribution and optimal-sized partitions. As your cluster becomes bigger, you should
make this count bigger to have evenly distributed partitions.
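The arithmetic behind an "almost even" distribution can be sketched as follows. This is only an illustration under simplifying assumptions: partitions are dealt to members round-robin, and a key's partition is taken as its hash modulo the partition count. The actual assignment is decided by the cluster and the actual hash is computed on the serialized key; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class PartitionMath {
    static final int DEFAULT_PARTITION_COUNT = 271;

    private PartitionMath() {
    }

    /** Maps a key hash to a partition ID in the range [0, partitionCount). */
    static int partitionId(int keyHash, int partitionCount) {
        // floorMod keeps the result non-negative for negative hashes.
        return Math.floorMod(keyHash, partitionCount);
    }

    /** Deals partitions to members round-robin and returns how many each member owns. */
    static int[] partitionsPerMember(int partitionCount, int memberCount) {
        int[] owned = new int[memberCount];
        for (int partition = 0; partition < partitionCount; partition++) {
            owned[partition % memberCount]++;
        }
        return owned;
    }

    public static void main(String[] args) {
        // With 4 members, every member owns 67 or 68 of the 271 partitions.
        System.out.println(Arrays.toString(partitionsPerMember(DEFAULT_PARTITION_COUNT, 4)));
    }
}
```

With any member count, the per-member totals differ by at most one partition, so the relative imbalance grows as the number of members approaches the partition count. This is one way to read the advice to increase the count for larger clusters.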
Is Hazelcast thread-safe?
• Discovery by TCP/IP: The first member created in the cluster (leader) forms a list of IP addresses
of other joining members and sends this list to these members so the members will know each
other.
• Discovery on clouds: Hazelcast supports discovery on cloud platforms such as jclouds based
environments, Azure, Consul and PCF.
• Multicast discovery: The members in a cluster discover each other by multicast, by default. It is
not recommended for production since UDP is often blocked in production environments and
other discovery mechanisms are more definite.
Once members are discovered, all the communication between them is via TCP/IP.
• And finally, new backups for these recovered data are formed.
If you notice that there is a problem with a member joining a cluster, you may want to perform a
connectivity test between the member to be joined and a member from the cluster. You can use the
iperf tool for this purpose. For example, you can execute the below command on one member (i.e.
listening on port 5701).
iperf -s -p 5701
And you can execute the below command on the other member.
The output should include connection information, such as the IP addresses, transfer speed and
bandwidth. Otherwise, if the output says No route to host, it means a network connection problem
exists.
When you store a key and value in a distributed Map, Hazelcast serializes the key and value and
stores the byte array version of them in local ConcurrentHashMaps. These ConcurrentHashMaps
use the equals and hashCode methods of the byte array version of your key. They do not take into
account the actual equals and hashCode implementations of your objects. So it is important that
you choose your keys in a proper way.
Implementing equals and hashCode is not enough; it is also important that the object is always
serialized into the same byte array. Simple types like String, Long, Integer, etc. are good
candidates for keys to be used in Hazelcast. An unsorted Set is an example of a very bad candidate
because Java Serialization may serialize the same unsorted set into two different byte arrays.
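The effect can be reproduced with plain Java Serialization, without Hazelcast. The sketch below uses LinkedHashSet only to make the element order, and therefore the outcome, deterministic: the two sets are equal according to equals, yet they serialize to different byte arrays. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class KeyBytesDemo {
    private KeyBytesDemo() {
    }

    /** Serializes an object with standard Java Serialization. */
    static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a set that remembers the insertion order of its elements. */
    static Set<String> setOf(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    /** True if both objects serialize to exactly the same bytes. */
    static boolean sameBytes(Object a, Object b) {
        return Arrays.equals(serialize(a), serialize(b));
    }

    public static void main(String[] args) {
        Set<String> first = setOf("x", "y");
        Set<String> second = setOf("y", "x");
        System.out.println("equals:     " + first.equals(second));
        System.out.println("same bytes: " + sameBytes(first, second));
        System.out.println("String key: " + sameBytes("customer-1", "customer-1"));
    }
}
```

A key whose serialized form varies like this would be treated as a different key on each variation, which is why keys with a stable serialized form are preferred.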
Hazelcast always returns a cloned copy of a value. Modifying the returned value does not change
the actual value in the map (or multimap, list, set). You should put the modified value back to
make the changes visible to all members.
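The read, modify and put-back cycle can be illustrated without a cluster. The sketch below is not Hazelcast code: CopyOnReadStore is a hypothetical stand-in that, like the behavior described above, keeps values in serialized form and hands out a fresh copy on every read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CopyOnReadStore {
    private final Map<String, byte[]> data = new HashMap<>();

    /** Stores the serialized form of the value. */
    void put(String key, Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            data.put(key, bytes.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a new deserialized copy on every call, or null if the key is absent. */
    Object get(String key) {
        byte[] stored = data.get(key);
        if (stored == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stored))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the stored list size after a local edit, then after putting the copy back. */
    static int[] sizesAfterEdit() {
        CopyOnReadStore store = new CopyOnReadStore();
        ArrayList<String> tags = new ArrayList<>();
        tags.add("a");
        store.put("tags", tags);

        @SuppressWarnings("unchecked")
        List<String> copy = (List<String>) store.get("tags");
        copy.add("b"); // changes only the local copy
        int afterLocalEdit = ((List<?>) store.get("tags")).size();

        store.put("tags", (Serializable) copy); // put back to make the change visible
        int afterPutBack = ((List<?>) store.get("tags")).size();
        return new int[] {afterLocalEdit, afterPutBack};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizesAfterEdit()));
    }
}
```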
Hazelcast allows you to create more than one instance on the same JVM. Each member is called
HazelcastInstance and each has its own configuration, socket and threads, so you can treat them as
totally separate instances.
This enables you to write and run cluster unit tests on a single JVM. You can use this feature
to create separate members for different applications running on the same JVM (imagine
running multiple web applications on the same JVM), and you can also use it to test your
Hazelcast cluster.
Let’s say you want to test if two members have the same size of a map.
@Test
public void testTwoMemberMapSizes() {
    // start the first member
    HazelcastInstance h1 = Hazelcast.newHazelcastInstance();
    // get the map and put 1000 entries
    Map<Integer, String> map1 = h1.getMap( "testmap" );
    for ( int i = 0; i < 1000; i++ ) {
        map1.put( i, "value" + i );
    }
    // check the map size
    assertEquals( 1000, map1.size() );
    // start the second member
    HazelcastInstance h2 = Hazelcast.newHazelcastInstance();
    // get the same map from the second member
    Map<Integer, String> map2 = h2.getMap( "testmap" );
    // check the size of map2
    assertEquals( 1000, map2.size() );
    // check the size of map1 again
    assertEquals( 1000, map1.size() );
}
In the test above, everything happens in the same thread. When developing a multi-threaded test,
you need to carefully handle coordination of the thread executions. It is highly recommended that
you use CountDownLatch for thread coordination (you can certainly use other ways). Here is an
example where we need to listen for messages and make sure that we got these messages.
@Test
public void testTopic() {
    // start a two-member cluster
    HazelcastInstance h1 = Hazelcast.newHazelcastInstance();
    HazelcastInstance h2 = Hazelcast.newHazelcastInstance();
    String topicName = "TestMessages";
    // get a topic from the first member and add a message listener
    ITopic<String> topic1 = h1.getTopic( topicName );
    final CountDownLatch latch1 = new CountDownLatch( 1 );
    topic1.addMessageListener( new MessageListener<String>() {
        public void onMessage( Message<String> msg ) {
            // the published payload is carried inside the Message wrapper
            assertEquals( "Test1", msg.getMessageObject() );
            latch1.countDown();
        }
    } );
    // get a topic from the second member and add a message listener
    ITopic<String> topic2 = h2.getTopic( topicName );
    final CountDownLatch latch2 = new CountDownLatch( 2 );
    topic2.addMessageListener( new MessageListener<String>() {
        public void onMessage( Message<String> msg ) {
            assertEquals( "Test1", msg.getMessageObject() );
            latch2.countDown();
        }
    } );
    // publish the first message, both should receive this
    topic1.publish( "Test1" );
    // shutdown the first member
    h1.shutdown();
    // publish the second message, second member's topic should receive this
    topic2.publish( "Test1" );
    try {
        // assert that the first member's topic got the message
        assertTrue( latch1.await( 5, TimeUnit.SECONDS ) );
        // assert that the second member's topic got two messages
        assertTrue( latch2.await( 5, TimeUnit.SECONDS ) );
    } catch ( InterruptedException ignored ) {
    }
}
You can start Hazelcast members with different configurations. Remember to call
Hazelcast.shutdownAll() after each test case to make sure that there is no other running member
left from the previous tests.
@After
public void cleanup() throws Exception {
Hazelcast.shutdownAll();
}
For more information please check our existing tests.
Yes. Hazelcast performed a successful test on Amazon EC2 with 200 members.
Yes. However, there are some points you should consider. The environment should be LAN with a
high stability and the network speed should be 10 Gbps or higher. If the number of members is
high, the client type should be selected as Unisocket, not Smart Client. In the case of Smart Clients,
since each client opens a connection to the members, these members should be powerful enough
(for example, more cores) to handle hundreds or thousands of connections and client requests.
Also, you should consider using Near Caches in clients to lower the network traffic. And you should
use the Hazelcast releases with the NIO implementation (which starts with Hazelcast 3.2).
Also, you should configure the clients attentively. See the Clients section for configuration notes.
A lite member supports task execution (distributed executor service); a smart client does not.
Also, a lite member is highly coupled with the cluster; a smart client is not. Starting with
Hazelcast 3.9, you can also promote lite members to data members. See the Lite Members section
for more information.
We have two support services: community and commercial support. Community support is
provided through our Mail Group and the StackOverflow web site. For information on support
subscriptions, see Hazelcast.com.
No. However, Hazelcast provides MapStore and MapLoader interfaces. For example, when you
implement the MapStore interface, Hazelcast calls your store and load methods whenever needed.
Yes. But please note that Hazelcast’s main design focus is multi-member clusters to be used as a
distribution platform.
How can I monitor Hazelcast?
Hazelcast Management Center is what you use to monitor and manage the members running
Hazelcast. In addition to monitoring the overall state of a cluster, you can analyze and browse data
structures in detail, you can update map configurations and you can take thread dumps from
members.
You can also use Hazelcast’s HTTP based health check implementation and health monitoring
utility. See the Health Check and Monitoring section. There is also a diagnostics tool where you
can see detailed logs enhanced with diagnostic plugins.
Moreover, JMX monitoring is also provided. See the Monitoring with JMX section for details.
By changing the log level to "Debug". Below are example lines for log4j logging framework. See the
Logging Configuration section to learn how to set logging types.
Then set the log level to "Debug" in the properties file. Below is example content.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.logger.com.hazelcast=debug
#log4j.logger.com.hazelcast.cluster=debug
#log4j.logger.com.hazelcast.partition=debug
#log4j.logger.com.hazelcast.partition.InternalPartitionService=debug
#log4j.logger.com.hazelcast.nio=debug
#log4j.logger.com.hazelcast.hibernate=debug
The line log4j.logger.com.hazelcast=debug is used to see debug logs for all Hazelcast operations.
Below this line, you can select to see specific logs (cluster, partition, hibernate, etc.).
Client-server vs. embedded topologies?
In the embedded topology, members include both the data and the application. This type of
topology is the most useful if your application focuses on high performance computing and many
task executions. Since the application is close to the data, this topology supports data locality.
In the client-server topology, you create a cluster of members and scale the cluster independently.
Your applications are hosted on the clients and the clients communicate with the members in the
cluster to reach data.
Client-server topology fits better if there are multiple applications sharing the same data or if
application deployment is significantly greater than the cluster size (for example, 500 application
servers vs. 10 member cluster).
• You can call kill -9 <PID> in the terminal (which sends a SIGKILL signal). This results in the
immediate shutdown which is not recommended for production systems. If you set the property
hazelcast.shutdownhook.enabled to false and then kill the process using kill -15 <PID>, its result
is the same (immediate shutdown).
• You can call kill -15 <PID> in the terminal (which sends a SIGTERM signal), or you can call the
method HazelcastInstance.getLifecycleService().terminate() programmatically, or you can use
the script stop.sh located in your Hazelcast’s /bin directory. All three of them terminate your
member ungracefully: they do not wait for migration operations and force the shutdown. Still,
this is much better than kill -9 <PID> since it releases most of the used resources.
• To shut down a Hazelcast member gracefully (so that it waits for the migration operations to
complete), you have the following options:
◦ You can use the JMX API’s shutdown method. You can do this by implementing a JMX client
application or using a JMX monitoring tool (like JConsole).
◦ You can set the property hazelcast.shutdownhook.policy to GRACEFUL and then shut down by
using kill -15 <PID>. Your member will be shut down gracefully.
◦ You can use the "Shutdown Member" button in the member view of Hazelcast Management
Center.
If you use systemd’s systemctl utility, i.e., systemctl stop service_name, a SIGTERM signal is sent.
By default, it is followed by a SIGKILL signal after 90 seconds of waiting. Thus, it first calls
terminate and then kills the member directly after 90 seconds. We do not recommend using it with its
defaults. However, systemd is very customizable and well-documented; you can see its details using
the command man systemd.kill. If you can customize it to shut down your Hazelcast member gracefully
(by using the methods above), then you can use it.
How do I know it is safe to kill the second member?
Starting with Hazelcast 3.7, graceful shutdown of a Hazelcast member can be initiated any time as
follows:
hazelcastInstance.shutdown();
Once a Hazelcast member initiates a graceful shutdown, data of the shutting down member is
migrated to the other members automatically.
The code snippet below terminates a member if the cluster is safe, which means that there are no
partitions being migrated and all backups are in sync when this method is called.
The code snippet below terminates the local member if the member is safe to terminate, which means
that all backups of partitions currently owned by the local member are in sync when this method is
called.
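The check-then-terminate pattern described here can be sketched as follows. To keep the sketch runnable without a cluster, the safety check and the terminate action are passed in as plain functions; the class and method names are hypothetical. With a real member, and assuming the Hazelcast 3.x API, the check would be hazelcastInstance.getPartitionService().isClusterSafe() or isLocalMemberSafe(), and the action would be hazelcastInstance.getLifecycleService().terminate().

```java
import java.util.function.BooleanSupplier;

final class SafeTerminateSketch {
    /**
     * Runs the terminate action only when the safety check passes.
     * Returns true if the terminate action was invoked.
     */
    static boolean terminateIfSafe(BooleanSupplier safetyCheck, Runnable terminate) {
        if (safetyCheck.getAsBoolean()) {
            // Racy by design: the cluster may become unsafe between the check and this call.
            terminate.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins for illustration. Assumed real-member equivalents (Hazelcast 3.x):
        //   hazelcastInstance.getPartitionService()::isClusterSafe      (cluster-wide check)
        //   hazelcastInstance.getPartitionService()::isLocalMemberSafe  (local member check)
        //   hazelcastInstance.getLifecycleService()::terminate
        boolean terminated = terminateIfSafe(() -> true,
                () -> System.out.println("terminate() called"));
        System.out.println("terminated=" + terminated);
    }
}
```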
Please keep in mind that the two code snippets shown above are inherently racy. If member failures
occur in the cluster after the safety condition check passes, termination of the local member can
lead to data loss. For the safety of the data, the graceful shutdown API is highly recommended.
See the Safety Checking Cluster Members section for more information.
• the amount of data per member is large enough to create significant garbage collection pauses
The only disadvantage when using Near Cache is that it may cause stale reads.
Is Hazelcast secure?
Hazelcast supports symmetric encryption, transport layer security/secure sockets layer (TLS/SSL)
and Java Authentication and Authorization Service (JAAS). See the Security chapter for more
information.
Hazelcast allows you to set some socket options such as SO_KEEPALIVE, SO_SNDBUF and SO_RCVBUF using
Hazelcast configuration properties. See the hazelcast.socket.* properties explained in the System
Properties appendix.
In Hazelcast, socket connections are created with the SO_KEEPALIVE option enabled by default. In
most operating systems, default keep-alive time is 2 hours. If you have a firewall between clients
and servers which is configured to reset idle connections/sessions, make sure that the firewall’s idle
timeout is greater than the TCP keep-alive defined in the OS.
See Using TCP keepalive under Linux and Microsoft TechNet for additional information.
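For orientation, the socket options named above correspond to the following setters on a plain java.net.Socket. This is a generic JDK sketch, not Hazelcast's own networking code; the class name and the 32 KB buffer sizes are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class SocketOptionsSketch {
    /** Creates an unconnected socket with keep-alive on and the given buffer size hints (bytes). */
    static Socket configuredSocket(int sendBufferBytes, int receiveBufferBytes) throws IOException {
        Socket socket = new Socket();                      // not connected yet
        socket.setKeepAlive(true);                         // SO_KEEPALIVE
        socket.setSendBufferSize(sendBufferBytes);         // SO_SNDBUF (a hint to the OS)
        socket.setReceiveBufferSize(receiveBufferBytes);   // SO_RCVBUF (a hint to the OS)
        return socket;
    }

    /** Returns the keep-alive flag of a freshly configured socket. */
    static boolean keepAliveAfterConfigure() {
        try (Socket socket = configuredSocket(32 * 1024, 32 * 1024)) {
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keepAlive=" + keepAliveAfterConfigure());
    }
}
```

Note that the keep-alive interval itself is not set here; as the text says, it is defined by the operating system.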
The JVM on Linux consumes a file descriptor for each thread created. The default number of file
descriptors available in Linux is usually 1024. If you have many JVMs running on a single machine,
it is possible to exceed this default number.
# ulimit -a
At the operating system level, Linux users can control the amount of resources (and in particular,
file descriptors) used via one of the following options.
# vi /etc/security/limits.conf
testuser soft nofile 4096
testuser hard nofile 10240
# ulimit -Hn
10240
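To see how close a running JVM is to the limit, the descriptor counts can also be read from inside the process. The following is a minimal sketch that relies on the JDK-specific com.sun.management.UnixOperatingSystemMXBean, which is only available on Unix-like platforms with HotSpot/OpenJDK-style JVMs; the class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FileDescriptorUsage {
    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or null when the platform bean does not expose them (for example on Windows).
     */
    static long[] openAndMaxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = openAndMaxFileDescriptors();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this platform.");
        } else {
            System.out.println("open=" + counts[0] + ", max=" + counts[1]);
        }
    }
}
```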
The default number of processes per user is 1024. Adding the following to your $HOME/.profile could
solve the issue:
# ulimit -u 4096
Repartitioning is the process of redistributing the partition ownerships. Hazelcast performs
repartitioning when a member leaves or joins the cluster. If a repartitioning happens while an
entry processor is active on a member and processing an entry object, the repartitioning waits
for the entry processor to complete its job.
Assume you have two instances on two different machines and you develop a configuration as
shown below.
When you create the Hazelcast instance, you have to pass the configuration to the instance. If you
create the instances without passing the configuration, the instances start but cannot see each
other. Therefore, a correct way to create the instance is the following:
HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
When you start more members after the first one is started, you will see replica: 1 has no owner
entry in the newly started member’s log. There is no need to worry about it since it refers to a
transitory state. It only means the replica partition is not ready/assigned yet and eventually it will
be.
Glossary
2-phase Commit
The 2-phase commit protocol is an atomic commitment protocol for distributed systems. It consists
of two phases: commit-request and commit. In the commit-request phase, the transaction manager
asks all of the transaction resources to vote on whether to commit or abort. In the commit phase,
the transaction manager finalizes the operation by committing or aborting according to the votes
of each transaction resource.
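The two phases can be sketched as follows. This is a simplified, single-process illustration with a hypothetical TransactionResource interface; it is not Hazelcast's transaction implementation and it ignores timeouts, failures, and recovery.

```java
import java.util.Arrays;
import java.util.List;

final class TwoPhaseCommitSketch {
    /** Hypothetical participant interface, for illustration only. */
    interface TransactionResource {
        boolean prepare();   // vote: true = ready to commit, false = must abort
        void commit();
        void rollback();
    }

    /** Simple participant that votes as configured and records the final outcome. */
    static final class RecordingResource implements TransactionResource {
        private final boolean vote;
        String outcome = "pending";

        RecordingResource(boolean vote) { this.vote = vote; }
        @Override public boolean prepare() { return vote; }
        @Override public void commit() { outcome = "committed"; }
        @Override public void rollback() { outcome = "rolled back"; }
    }

    /** Returns true if the transaction committed, false if it was rolled back. */
    static boolean run(List<? extends TransactionResource> resources) {
        // Phase 1 (commit-request): collect a vote from every resource.
        boolean allVotedYes = true;
        for (TransactionResource resource : resources) {
            allVotedYes &= resource.prepare();
        }
        // Phase 2 (commit): apply the same decision to every resource.
        for (TransactionResource resource : resources) {
            if (allVotedYes) {
                resource.commit();
            } else {
                resource.rollback();
            }
        }
        return allVotedYes;
    }

    public static void main(String[] args) {
        RecordingResource a = new RecordingResource(true);
        RecordingResource b = new RecordingResource(false);
        boolean committed = run(Arrays.asList(a, b));
        System.out.println("committed=" + committed + ", a=" + a.outcome + ", b=" + b.outcome);
    }
}
```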
ACID
A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing that transactions
are processed reliably. Atomicity requires that each transaction be all or nothing, i.e., if one part
of the transaction fails, the entire transaction fails. Consistency ensures that only valid data
following all rules and constraints is written. Isolation ensures that transactions are securely
and independently processed at the same time without interference (and without transaction
ordering). Durability means that once a transaction has been committed, it will remain so, no
matter if there is a power loss, crash, or error.
Cache
A high-speed access area that can be either a reserved section of main memory or a storage
device.
Embedded Topology
Hazelcast topology where the members are in-process with the user application and act as both
client and server.
Garbage Collection
Garbage collection is the recovery of storage that is being used by an application when that
application no longer needs the storage. This frees the storage for use by other applications (or
processes within an application). It also ensures that an application using increasing amounts of
storage does not reach its quota. Programming languages that use garbage collection are often
interpreted within virtual machines like the JVM. The environment that runs the code is also
responsible for garbage collection.
Hazelcast Cluster
A virtual environment formed by Hazelcast members communicating with each other in the
cluster.
Hazelcast Partitions
Memory segments containing the data. Hazelcast is built on the partition concept; it uses
partitions to store and process data. Each partition can have hundreds or thousands of data
entries depending on your memory capacity. You can think of a partition as a block of data. As a
rule of thumb, a partition should be at most 50-100 megabytes in size.
IMDG
An in-memory data grid (IMDG) is a data structure that resides entirely in memory and is
distributed among many members in a single location or across multiple locations. IMDGs can
support thousands of in-memory data updates per second and they can be clustered and scaled
in ways that support large quantities of data.
Invalidation
The process of marking an object as being invalid across the distributed cache.
Java heap
Java heap is the space that Java can reserve and use in memory for dynamic memory allocation.
All runtime objects created by a Java application are stored in heap. By default, the heap size is
128 MB, but this limit is reached easily for business applications. Once the heap is full, new
objects cannot be created and the Java application shows errors.
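The effective limit depends on the JVM and on settings such as -Xmx, so it is worth checking in the running process. A minimal sketch using only the standard java.lang.Runtime API (the class name is illustrative):

```java
final class HeapLimits {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently occupied by objects (live or not yet collected), in mebibytes. */
    static long usedHeapMiB() {
        Runtime runtime = Runtime.getRuntime();
        return (runtime.totalMemory() - runtime.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```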
LRU, LFU
LRU and LFU are two eviction algorithms. LRU is the abbreviation for Least Recently Used. It
evicts the entries that have gone the longest without being accessed. LFU is the abbreviation for
Least Frequently Used. It evicts the entries that have the lowest usage frequency.
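As a generic illustration of LRU (not Hazelcast's eviction implementation), the JDK's LinkedHashMap can be turned into a small LRU cache by enabling access ordering. The class name and the capacity of 2 are arbitrary choices for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration goes from least to most recently accessed entry.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evicts the least recently used entry when over capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", 3);   // over capacity: "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```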
Member
A Hazelcast instance. Depending on your Hazelcast usage, it can refer to a server or a Java
virtual machine (JVM). Members belong to a Hazelcast cluster. Members are also referred to as
member nodes, cluster members, or Hazelcast members.
Multicast
A type of communication where data is addressed to a group of destination members
simultaneously.
Near Cache
A caching model. When Near Cache is enabled, an object retrieved from a remote member is put
into the local cache and the future requests made to this object will be handled by this local
member. For example, if you have a map with data that is mostly read, then using Near Cache is
a good idea.
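The idea, including the stale-read risk and the role of invalidation, can be sketched with two ordinary maps. The "remote" map below is a stand-in for data owned by another member; the class and its methods are hypothetical and do not reflect Hazelcast's Near Cache API.

```java
import java.util.HashMap;
import java.util.Map;

final class NearCacheSketch {
    private final Map<String, String> remote;                   // stand-in for remotely owned data
    private final Map<String, String> local = new HashMap<>();  // near cache held by the caller

    NearCacheSketch(Map<String, String> remote) {
        this.remote = remote;
    }

    /** The first read of a key goes to the remote side; later reads are answered locally. */
    String get(String key) {
        return local.computeIfAbsent(key, remote::get);
    }

    /** Drops the local copy so that the next read fetches the current remote value. */
    void invalidate(String key) {
        local.remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> remote = new HashMap<>();
        remote.put("k", "v1");
        NearCacheSketch cache = new NearCacheSketch(remote);

        System.out.println(cache.get("k")); // v1 (fetched remotely, then cached)
        remote.put("k", "v2");              // remote value changes
        System.out.println(cache.get("k")); // v1 (stale read from the near cache)
        cache.invalidate("k");
        System.out.println(cache.get("k")); // v2 (fresh after invalidation)
    }
}
```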
NoSQL
"Not Only SQL". A database model that provides a mechanism for storage and retrieval of data
that is modeled in means other than the tabular relations used in relational databases. It is a type
of database which does not adhere to the traditional relational database management system
(RDBMS) structure. It is not built on tables and does not employ SQL to manipulate data. It also
may not provide full ACID guarantees, but still has a distributed and fault tolerant architecture.
OSGi
Formerly known as the Open Services Gateway initiative, it describes a modular system and a
service platform for the Java programming language that implements a complete and dynamic
component model.
Partition Table
Table containing all members in the cluster, mappings of partitions to members and further
metadata.
Race Condition
This condition occurs when two or more threads can access shared data and they try to change it
at the same time.
RSA
An algorithm developed by Rivest, Shamir and Adleman to generate, encrypt and decrypt keys
for secure data transmissions.
Serialization
Process of converting an object into a stream of bytes in order to store the object or transmit it to
memory, a database, or a file. Its main purpose is to save the state of an object in order to be able
to recreate it when needed. The reverse process is called deserialization.
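A minimal round trip using the JDK's built-in java.io.Serializable mechanism illustrates both directions. This is a generic Java sketch with illustrative names; Hazelcast offers additional serialization options that are not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationRoundTrip {
    /** Serialization: converts an object into a byte array. */
    static byte[] serialize(Serializable object) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserialization: recreates the object from its byte array. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize("hazelcast");
        System.out.println(data.length + " bytes -> " + deserialize(data));
    }
}
```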
Split-brain
Split-brain syndrome, in a clustering context, is a state in which a cluster of members gets
divided (or partitioned) into smaller clusters of members, each of which believes it is the only
active cluster.
Transaction
A sequence of information exchange and related work (such as data store updating) that
is treated as a unit for the purposes of satisfying a request and for ensuring data store integrity.