Ems Fault Tolerant Configuration On Amazon v2.7
Trademarks
All brand and product names are trademarks or registered trademarks of their respective holders and are
hereby acknowledged. Technologies described herein are either covered by existing patents or patent
applications are in progress.
Content Warranty
The information in this document is subject to change without notice. THIS DOCUMENT IS PROVIDED "AS
IS" AND TIBCO MAKES NO WARRANTY, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT
NOT LIMITED TO ALL WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. TIBCO Software Inc. shall not be liable for errors contained herein or for incidental or
consequential damages in connection with the furnishing, performance or use of this material.
Export
This document and related technical data, are subject to U.S. export control laws, including without limitation
the U.S. Export Administration Act and its associated regulations, and may be subject to export or import
regulations of other countries. You agree not to export or re-export this document in any form in violation of
the applicable export or import laws of the United States or any foreign jurisdiction.
1.4 Assumptions
• The reader of this document is familiar with the following concepts:
o The use of Amazon Web Services, and the AWS Console
o TIBCO EMS installation and configuration
o Linux configuration
o NFSv4
• This document only provides information for Amazon Linux 2. Configuration for other Linux distributions will be similar.
The following steps will outline setting up the EC2 instances in the AWS console. Amazon Linux 2 was used for the
EC2 instances. Other operating systems or versions are not covered. For the following examples, the US-East-1 region
was used. Note: Ensure that the Elastic File System (EFS) is available in the AWS region used.
o Select Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-0742b4e673072066f (64-bit x86)
o Select the appropriate size instance (vCPUs and memory) for the environment. A t3.medium is fine
for development/testing. For production, use something similar to a c5.2xlarge.
o Click on Next:Configure Instance Details
o Change the number of instances to 3. Two of the instances are for the EMS server instances,
while the third is used for client testing. Use the default VPC, and defaults for the other settings, or
set for the environment. Note: The instances can also be created individually, to change the instance
size for the client EC2 instance and to ensure the EC2 instances are in different Availability Zones
(recommended). An AWS CLI equivalent is sketched below.
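The EC2 instances can also be launched from the AWS CLI instead of the console. The following is a minimal
sketch, assuming the AMI shown above; the key pair, security group, and subnet placeholders must be replaced
with values from the environment:

aws ec2 run-instances \
    --image-id ami-0742b4e673072066f \
    --instance-type t3.medium \
    --count 3 \
    --key-name <my-key-pair> \
    --security-group-ids <sg-id> \
    --subnet-id <subnet-id> \
    --region us-east-1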
• Once the EC2 Linux instances have started, PuTTY (on Windows) or ssh (on Mac/Linux/UNIX) can be used
to access the EC2 instances using the public IP address created by AWS. Note: The Security Group must be
configured to allow connectivity.
• Click on the EC2 instance, and then on the Connect button. The Connect to Your Instance instructions
will be displayed. Follow the instructions to connect to the EC2 instance.
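For example, on Mac/Linux/UNIX the connection will be similar to the following (ec2-user is the default user
for Amazon Linux 2; <my-key-pair>.pem is the key pair selected when the instances were launched):

ssh -i <my-key-pair>.pem ec2-user@<public-ip>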
o Under Customize:
• Choose throughput mode: Bursting or Provisioned. Bursting throughput mode should be used
for most EMS shared file systems, such as development or testing. Use Provisioned throughput
mode only for production applications that require more throughput than Bursting provides.
If Provisioned is used, select the Throughput in MiB/s; the valid range is 1 – 1024 MiB/s.
Note: Provisioned can be expensive, at $6.00 per MiB/s per month ($6,144.00 for 1024 MiB/s).
A minimum of 100 MiB/s is recommended, if used.
• Choose performance mode: General Purpose or Max I/O. No measurable performance gains
were seen with Max I/O over General Purpose. General Purpose is recommended for all environments.
• Click on Enable encryption, if desired (recommended). Note: There will be some read/write
performance degradation if selected. Note: If using TLS, select Enable.
• No tags are required for EMS.
• Click on Next Step
o Under Network Access:
• Ensure the same security group is used as for the EC2 Linux instances, and that the security group
has the necessary rules to support TCP inbound/outbound to/from EFS (NFS port 2049).
• Ensure all Availability Zones where the EC2 instances are running are selected. Then click Next
Step
o Under File system policy
• Select any policy, if required. No policies are necessary for EMS shared storage.
o Review and Create File System
o Wait for a “Successful” creation
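The console steps above can also be scripted with the AWS CLI. The following is a minimal sketch matching the
recommendations above (General Purpose, Bursting, encryption enabled); the Name tag is only illustrative:

aws efs create-file-system \
    --performance-mode generalPurpose \
    --throughput-mode bursting \
    --encrypted \
    --tags Key=Name,Value=<ems-efs> \
    --region us-east-1

Note: A mount target must also exist in each Availability Zone used (created by the console, or with
aws efs create-mount-target) before the file system can be mounted.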
Note: If not using an Amazon Linux 2 EC2 instance or TLS, install nfs-utils.
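For example, on RPM-based distributions such as RHEL or CentOS:

sudo yum install -y nfs-utils

On Debian-based distributions, the equivalent package is nfs-common. For TLS mounts, the mount -t efs
helper used below is provided by the amazon-efs-utils package.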
Note: The <DNS Name for the EFS File system> is the DNS name shown in the file systems page shown
previously in the AWS console. It will be similar to fs-abcd1234.efs.us-east-1.amazonaws.com. For the
EFS/TLS mount, just use fs-abcd1234. The mount would be similar to:
sudo mount -t efs -o tls,_netdev,soft,timeo=300,actimeo=1,noresvport fs-abcd1234:/ /home/ec2-user/efs
• The mount can also be added to /etc/fstab. The following is an example of /etc/fstab with EFS/TLS:
#
UUID=76e177a9-8195-43cf-84ae-14ea371008b6 / xfs defaults,noatime 1 1
fs-abcd1234:/ /home/ec2-user/efs efs tls,_netdev,soft,actimeo=1,timeo=300,noresvport 0 0
Figure 6 - Example of /etc/fstab with TLS
Use the mount and the df commands to verify the EFS file share is mounted.
Note: The <DNS Name for the EFS File system> is the DNS name shown in the file systems page
shown previously in the AWS console. It will be similar to fs-abcd1234.efs.us-east-1.amazonaws.com
Note: Though actimeo is not required, it is recommended. The actimeo=1 option provides
better asynchronous persisted-message write performance.
• The mount can also be added to /etc/fstab to make it permanent. The following is an example of the
/etc/fstab with the NFS4 mount.
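The original entry is not reproduced here; the following is a minimal sketch, assuming the same fs-abcd1234
example file system and the same mount options recommended above (the NFS4 mount uses the full EFS DNS name):

fs-abcd1234.efs.us-east-1.amazonaws.com:/ /home/ec2-user/efs nfs4 nfsvers=4.1,_netdev,soft,timeo=300,actimeo=1,noresvport 0 0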
Use the mount and the df commands to verify the EFS file share is mounted.
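For example:

mount | grep efs
df -h /home/ec2-user/efs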
This section will outline the installation and configuration of EMS on the EC2 Linux instances.
3.2.1 Stores.conf
In stores.conf, modify/add the following:
• The file_minimum=xxGB property should be added to each synchronous data store. By adding this property, EMS
will pre-allocate the space on the shared storage for the data store, which provides better message write
throughput to disk. The minimum should be 1GB. Expect the initial startup of EMS to take longer, as it
creates and allocates the space for the store file.
• The file_crc=enabled property should be added. This enables EMS to check the data integrity of the data store.
This is the default in EMS, but setting it explicitly ensures it is enabled.
[$sys.failsafe]
type=file
file=sync-msgs.db
mode=sync
file_minimum=2GB
file_crc=enabled
[sync2]
type=file
file=sync2-msgs.db
mode=sync
file_minimum=2GB
3.2.2 Factories.conf
The EMS client reconnect properties must be set to enable the EMS client to reconnect to the EMS server in the event
of an EMS server failure in an F/T configuration. The reconnect properties can be defined in a number of ways,
including in the Java/C code, in a TIBCO application's configuration file, and/or through the connection factory,
when connection factories are used.
The default values may be too low in AWS to reliably allow the EMS client to reconnect to the EMS server after a
fail-over, especially after a network or system failure.
It is recommended to set reconnect_attempt_count to 100, reconnect_attempt_delay to 5000, and
reconnect_attempt_timeout to 5000. With these values, the EMS client will attempt to reconnect 100 times, once every
5 seconds, giving a total reconnect window of roughly 500 seconds.
Note: For configurations with a high number of EMS connections, producers and/or consumers, these numbers may
need to be tuned to provide optimal fail-over reliability.
The following example shows the values for the FTConnectionFactory in factories.conf.
Note: In the following example for the url, <server1> is tibems1 and the <port1> is 7222, and <server2> is tibems2
and the <port2> is 7222. Substitute with the appropriate values for the environment.
[FTConnectionFactory]
type = generic
url = tcp://tibems1:7222,tcp://tibems2:7222
reconnect_attempt_count = 100
reconnect_attempt_delay = 5000
reconnect_attempt_timeout = 5000
3.2.3 Tibemsd.conf
The tibemsd.conf for both EMS Servers needs to be updated for multiple properties.
These include:
• Location of all configuration files – The location must be on the EFS shared storage device.
########################################################################
# Configuration files.
########################################################################
users = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/users.conf"
groups = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/groups.conf"
topics = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/topics.conf"
queues = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/queues.conf"
acl_list = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/acl.conf"
factories = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/factories.conf"
routes = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/routes.conf"
bridges = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/bridges.conf"
transports = "/home/ec2-
user/efs/tibco/cfgmgmt/ems/data/transports.conf"
tibrvcm = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/tibrvcm.conf"
durables = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/durables.conf"
channels = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/channels.conf"
stores = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/stores.conf"
########################################################################
# Persistent Storage.
#
# store: directory to store persistent messages.
########################################################################
store = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/datastore"
• Log file location – The location must be on the local disk of the EC2 instance. The following example
locates the log file in the ems/8.6/bin/logs directory.
logfile = "/opt/tibco/ems/8.6/bin/logs/tibemsd.log"
• Server and client heartbeat and timeout values – These properties determine how long the server and client
listen for each other's heartbeats before disconnecting. These properties must be set longer
than what is normally configured for a local F/T environment. The values shown below have been tested,
and work well on AWS with NFS4.
server_heartbeat_client = 10
server_timeout_client_connection = 120
client_heartbeat_server = 10
client_timeout_server_connection = 120
Note: For configurations with a high number of EMS connections, producers and/or consumers,
these numbers may need to be tuned to provide optimal fail-over reliability.
• Enable the exit-on-disk-error property – New since EMS version 8.4, this property instructs EMS to exit
when there is a disk error reading from or writing to a device. It helps prevent "Dual
Active Server" conditions, sometimes seen with networked storage devices. This must be enabled.
always_exit_on_disk_error = enable
• FT properties – The normal properties for defining the peer EMS server instance, the heartbeat between
instances, etc. Ensure the following is set, in addition to the peer definition sketched below.
ft_reconnect_timeout = 120
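The peer EMS server instance itself is defined with the ft_active property in tibemsd.conf. The following is a
minimal sketch using this document's example hostnames and port, with the heartbeat between instances shown at
its default of 3 seconds:

On tibems1:
ft_active = tcp://tibems2:7222
ft_heartbeat = 3

On tibems2:
ft_active = tcp://tibems1:7222
ft_heartbeat = 3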
• Define a value for destination_backlog_swapout. This will help limit excessive reads to the shared disk.
A minimum of 20000 is recommended. If the queues will persist a larger number of messages,
increase the size.
destination_backlog_swapout = 20000
Once EMS has been started on the EC2 Linux instances, the failover testing can be performed.
This section will outline several test cases, including EMS Server process failure, machine failure, and network failure.
Tests are performed using queues with synchronous persistence set. This guarantees that the shared file system will be
accessed during the tests.
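The tests assume a queue named sync that is backed by a synchronous store. The following is a minimal
queues.conf sketch, assuming the [$sys.failsafe] store defined earlier; the exact store used in the original
tests is an assumption:

sync store=$sys.failsafe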
Note: In the following examples, <server> is the IP address of tibems1 and the <port> is 7222. Substitute
with the appropriate values for the environment.
• After the Java application completes, run tibemsadmin tcp://tibems2:7222 (or tibems1, if it is active), to
verify that there is a minimum of 20000 messages in the sync queue.
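For example, the queue depth can be checked from the tibemsadmin prompt (the tool will prompt for the admin
password):

tibemsadmin tcp://tibems2:7222
> show queue sync

The purge step used below is performed with purge queue sync.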
• Restart the EMS instance on tibems1, and stop the EMS instance on tibems2. EMS on tibems1 should
become active, and recover all 20K messages with no errors.
• Use tibemsadmin, and purge the sync queue in preparation for the next test.
• Stop and restart EMS on tibems1 and tibems2 in the foreground. EMS on tibems1 should be the active
EMS instance.
#
# Script to get the current iptables definitions, drop NFS port, then restore the original table definitions
#
#
echo " Saving existing IP table definitions"
echo ""
sudo iptables-save >iptables_save
#
# Drop the NFS port 2049
#
date
echo " Dropping NFS 2049 port"
echo ""
# Block outbound NFS requests to the server (destination port 2049)
sudo iptables -A OUTPUT -p tcp --dport 2049 -j DROP
# Block inbound NFS replies from the server (traffic sourced from port 2049)
sudo iptables -A INPUT -p tcp --sport 2049 -j DROP
#
# Sleep for 2 minutes
#
echo " Sleeping for 2 minutes..."
echo ""
sleep 2m
#
# Restore original IP table definitions"
#
echo " Restoring original IP table definitions"
echo ""
sudo iptables -F
sudo iptables-restore <iptables_save
#
date
echo "Done."
• The active EMS instance on tibems1 should terminate with a disk write error.
• The standby EMS instance on tibems2 should determine that EMS on tibems1 is no longer producing a
heartbeat, and will attempt to become active. Depending on the amount of data, this should take ~60 seconds
to occur. There can be other warnings, depending on how long it takes for tibems2 to obtain the locks.
• After the Java application completes, run tibemsadmin tcp://tibems2:7222 (or tibems1, if it is active), to
verify that there is a minimum of 20000 messages in the sync queue.
• Restart the EMS instance on tibems1, and stop the EMS instance on tibems2. EMS on tibems1 should
become active, and recover all 20K messages with no errors.
• While still in tibemsadmin, purge the sync queue in preparation for the next test.
• Stop and restart EMS on tibems1 and tibems2 in the foreground. EMS on tibems1 should be the active
EMS instance.
Figure 18 - Stand-by EMS instance recovering from system failure of the primary EMS Instance
• After the Java application completes, run tibemsadmin tcp://tibems2:7222 (or to the active EMS
instance), to verify that there is a minimum of 20000 messages in the sync queue.
• Restart the EMS instance on the restarted EC2 instance, and stop the currently active EMS instance on
the second EC2 instance. EMS should become active on the restarted instance, and recover all 20K
messages with no errors.
• Use tibemsadmin to verify, then purge the sync queue.
• Stop EMS on both Linux EC2 instances.
• This concludes the tests, so all processes, terminal windows, and EC2 instances can be stopped.