
Configuring a TIBCO Enterprise Message Service™ Fault Tolerant Environment On Amazon Web Services

This document provides the steps for configuring and testing TIBCO Enterprise Message Service Fault Tolerance in a Linux operating environment on AWS utilizing the Elastic File System (EFS).

Version .1 Initial Document
Version .2 Added information on support for TLS Tunneling on Amazon Linux2
Version .3 Added info on EMS 8.5.1 and EFS configuration options
Version .4 Updated for EMS 8.6 and changes to the EFS mount

Global Headquarters
3307 Hillview Avenue
Palo Alto, CA 94304
Tel: +1 650-846-1000
Toll Free: 1 800-420-8450
Fax: +1 650-846-1005
www.tibco.com

TIBCO enables digital business solutions through smart technologies that interconnect everything and augment intelligence. This combination delivers faster answers, better decisions, and smarter actions. TIBCO provides a connected set of technologies and services, based on 20 years of innovation, to serve the needs of all parts of an organization, from business users to developers to data scientists. Thousands of customers around the globe differentiate themselves by relying on TIBCO to power innovative business designs and compelling customer experiences. Learn how TIBCO makes digital smarter at www.tibco.com
Copyright Notice
COPYRIGHT© 2021 TIBCO Software Inc. This document is unpublished and the foregoing notice is affixed
to protect TIBCO Software Inc. in the event of inadvertent publication. All rights reserved. No part of this
document may be reproduced in any form, including photocopying or transmission electronically to any
computer, without prior written consent of TIBCO Software Inc. The information contained in this document
is confidential and proprietary to TIBCO Software Inc. and may not be used or disclosed except as
expressly authorized in writing by TIBCO Software Inc. Copyright protection includes material generated
from our software programs displayed on the screen, such as icons, screen displays, and the like.

Trademarks
All brand and product names are trademarks or registered trademarks of their respective holders and are
hereby acknowledged. Technologies described herein are either covered by existing patents or patent
applications are in progress.

Content Warranty
The information in this document is subject to change without notice. THIS DOCUMENT IS PROVIDED "AS
IS" AND TIBCO MAKES NO WARRANTY, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT
NOT LIMITED TO ALL WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. TIBCO Software Inc. shall not be liable for errors contained herein or for incidental or
consequential damages in connection with the furnishing, performance or use of this material.

Export
This document and related technical data, are subject to U.S. export control laws, including without limitation
the U.S. Export Administration Act and its associated regulations, and may be subject to export or import
regulations of other countries. You agree not to export or re-export this document in any form in violation of
the applicable export or import laws of the United States or any foreign jurisdiction.

For more information, please contact:

TIBCO Software Inc.


3303 Hillview Avenue
Palo Alto, CA 94304
USA

©2021 TIBCO Software, Inc. All Rights Reserved. 2


Table of Contents
Configuring a TIBCO Enterprise Message Service™ Fault Tolerant Environment On Amazon Web Services
1 Overview
1.1 Document Purpose
1.2 Linux Kernel Version Support
1.3 EMS Version Support
1.4 Assumptions
2 AWS Setup for Linux
2.1 Creating a new EC2 Instance
2.2 Setting Up the Elastic File System (EFS)
2.3 Setting up the EC2 Instance for EMS F/T
2.3.1 Additional Software Installation
2.3.2 Setting up the EFS/TLS mount for the EFS Storage
2.3.3 Setting up the NFS mount for the EFS Storage
3 Enterprise Message Service Installation and Configuration
3.1 EMS Installation
3.2 EMS Configuration
3.2.1 Stores.conf
3.2.2 Factories.conf
3.2.3 Tibemsd.conf
3.2.4 Starting the EMS Instances
4 Testing EMS Fault Tolerance on AWS with EFS
4.1 EMS Client App Setup
4.2 Performing the EMS Fault Tolerant Test Cases
4.2.1 EMS Process Failure Test
4.2.2 Network Failure Test on Linux
4.2.3 System Failure Test



Table of Figures

FIGURE 1 – AMAZON WEB SERVICES CONSOLE
FIGURE 2 – CONFIGURE EC2 INSTANCE DETAILS
FIGURE 3 – SECURITY GROUP INPUTS
FIGURE 4 - RUNNING AWS EC2 INSTANCES
FIGURE 5 - CUSTOMIZE EFS
FIGURE 6 - EXAMPLE OF /ETC/FSTAB WITH TLS
FIGURE 7 - STORES.CONF EXAMPLE
FIGURE 8 - CONNECTION FACTORY SETTINGS
FIGURE 9 – LINUX EMS STARTUP
FIGURE 10 - CREATE THE SYNC QUEUE
FIGURE 11 - RUNNING TIBJMSMSGPRODUCERPERF
FIGURE 12 - STANDBY EMS BECOMING ACTIVE ON TIBEMS2
FIGURE 13 - PURGE THE SYNC QUEUE FROM TIBEMSADMIN
FIGURE 14 - DROP_NFS.SH SCRIPT
FIGURE 15 - RUNNING DROP_NFS.SH
FIGURE 16 - DISK WRITE ERROR ON TIBEMS1
FIGURE 17 - AWS CONSOLE ON THE EC2 DASHBOARD
FIGURE 18 - STAND-BY EMS INSTANCE RECOVERING FROM SYSTEM FAILURE OF THE PRIMARY EMS INSTANCE



1 Overview

1.1 Document Purpose


The purpose of the document is to provide a guide to install, configure, and run TIBCO Enterprise Message Service™
(EMS) in a fault-tolerant configuration on Amazon Web Services (AWS) utilizing AWS’s Elastic File System (EFS) for
shared storage. In addition, the document provides the steps and expected results for testing EMS F/T on AWS with
EFS.
The document will outline:
• Setting up Amazon Linux 2 EC2 instances on AWS
• Setting up EFS for the shared file system in AWS
• Setting up the NFS4 mount on the Amazon/Linux EC2 instance
• Setting up the Amazon Linux2 EC2 instance to support TLS tunneling when mounting the EFS file
system
• Installing and configuring EMS for F/T
• Tuning EMS for AWS/EFS
• Running tests for:
o EMS process failure
o Network failure between the EC2 instance running EMS and the shared storage
o Accidental reboot of the EC2 instance from the AWS Dashboard

1.2 Linux Kernel Version Support


This document covers the installation and configuration of EMS on the Amazon Linux 2 kernel, 4.14.225-169.362.amzn2.x86_64.
However, other equivalent Linux distributions can be used, such as Red Hat/CentOS 7.9 or greater. Ensure that a
supported Linux kernel is used.
Note: The Amazon Linux 2 kernel supports TLS tunneling over NFS4 to support EMS data encryption end to end on
AWS. Modifications to the EMS configuration and to EFS are required to support this capability. The necessary
modifications are provided in this document, where noted. If Red Hat, CentOS, or another Linux OS is used, check with the
vendor on support of TLS tunneling.
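
Before relying on the TLS tunneling note above, it can help to confirm which kernel an instance is actually running. The following is a minimal shell sketch and not part of the TIBCO procedure. It assumes that Amazon Linux 2 kernel release strings contain the substring amzn2; the function name is_amzn2_kernel is illustrative only.

```shell
#!/bin/sh
# Return 0 if the given kernel release string looks like an Amazon Linux 2 kernel.
# Assumption: Amazon Linux 2 kernel releases contain the substring "amzn2".
is_amzn2_kernel() {
    case "$1" in
        *amzn2*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Check the running kernel; other distributions fall through to the reminder.
if is_amzn2_kernel "$(uname -r)"; then
    echo "Amazon Linux 2 kernel detected: $(uname -r)"
else
    echo "Not an Amazon Linux 2 kernel: $(uname -r); confirm TLS tunneling support with the vendor"
fi
```

The check only inspects the release string, so it says nothing about whether a non-Amazon kernel supports TLS tunneling; that still has to be confirmed with the vendor.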

1.3 EMS Version Support


EMS version 8.6.0 or later should be used for EMS server installations running on AWS environments. Any available
EMS hot fixes should also be applied.

1.4 Assumptions
• The reader of this document is familiar with the following concepts:
o The use of Amazon Web Services, and the AWS Console
o TIBCO EMS installation and configuration
o Linux configuration
o NFSv4

• This document only provides information for Amazon Linux 2. Other Linux distributions will be similar.



2 AWS Setup for Linux

The following steps outline setting up the EC2 instances in the AWS console. Amazon Linux 2 was used for the
EC2 instances. Other operating systems or versions are not covered. For the following examples, the US-East-1 region
was used. Note: Ensure that the Elastic File System (EFS) is available in the AWS region used.

2.1 Creating a new EC2 Instance


• Log in to the AWS console.
• To create a new AWS EC2 Linux instance, use the following:
o In the AWS console, select Services, EC2, and then Launch Instance to create a new Amazon EC2
instance. Three EC2 instances are needed: two for EMS F/T and one for the client application.

Figure 1 – Amazon Web Services Console

o Select Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-0742b4e673072066f (64-bit x86)
o Select the appropriate size instance (vCPUs and Memory) for the environment. A t3.medium is fine
for development/testing. Production should be similar to a c5.2xlarge.
o Click on Next:Configure Instance Details
o Change the number of instances to 3. Two of the instances are for the EMS server instances,
while the third is used for client testing. Use the default VPC and the defaults for the other settings, or
set them for the environment. Note: The instances can also be created individually, to change the instance size for the
client EC2 instance and to ensure the EC2 instances are in different zones (recommended).



Figure 2 – Configure EC2 Instance Details

o Click Next: Storage


o Configure the Volume Type and the Size in GB. These will depend on what else will be installed other
than EMS. If just EMS, 20GB is sufficient.
o Click on Next: Add Tags, and then Next: Configure Security Group
o Use either a new security group or an existing security group. Ensure the security group used has
rules to support SSH, TCP, NFS, and ICMP-IPv4 at a minimum. Consider the Source, Port
Range, Protocol, and Type for each rule.

Figure 3 – Security Group Inputs

o Click on Review and Launch



Figure 4 - Running AWS EC2 Instances

• Once the EC2 Linux instances have started, PuTTY (on Windows) or ssh (on Mac/Linux/UNIX) can be used
to access the EC2 instances using the public IP address created by AWS. Note: The Security Group must be
configured to allow connectivity.
• Click on the EC2 instance, and then on the Connect button. The Connect to Your Instance instructions
will be displayed. Follow the instructions to connect to the EC2 instance.
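
The security group requirements listed earlier in this section (SSH, TCP, NFS, and ICMP-IPv4) can also be expressed with the AWS CLI instead of the console. The sketch below only prints the commands rather than running them. The group ID and administrator CIDR are hypothetical placeholders, port 7222 is the EMS port used in the examples in this document, port 2049 is the standard NFS port, and limiting the source to members of the same security group is an assumption that may not suit every environment.

```shell
#!/bin/sh
# Print (not run) AWS CLI commands for the minimum inbound rules.
# $1 = security group ID, $2 = CIDR allowed to reach SSH (both hypothetical).
print_sg_rules() {
    sg_id="$1"
    admin_cidr="$2"
    # SSH from the administrator's network
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 22 --cidr $admin_cidr"
    # EMS traffic between members of the same group (port 7222 as in this document)
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 7222 --source-group $sg_id"
    # NFS (TCP 2049) between the EC2 instances and the EFS mount targets
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 2049 --source-group $sg_id"
    # ICMP IPv4, all types, for basic reachability checks
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol icmp --port -1 --source-group $sg_id"
}

print_sg_rules "sg-0123456789abcdef0" "203.0.113.0/24"
```

Printing the commands first makes it possible to review the Source, Port Range, Protocol, and Type of each rule before anything is applied.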

2.2 Setting Up the Elastic File System (EFS)


To set up the Elastic File System to mount on your Linux EC2 instance, use the following:
• From the AWS console, click on Services
• Under Storage, click on EFS
• Click on Create file system
o Use the appropriate VPC. This must be the same VPC used for the EC2 Linux instances
o Select Regional support, if the two EMS server instances are running in separate AWS zones
(recommended).
o Click on the customize tab.



Figure 5 - Customize EFS

o Under Customize:
• Choose throughput mode: Bursting or Provisioned. Bursting throughput mode should be used
for most EMS shared file systems, such as development or testing. Use Provisioned throughput
mode for applications that require more throughput than Bursting provides. Only
production should utilize Provisioned throughput. If Provisioned is used, select the Throughput
in MiB/s. A valid range of 1 – 1024 MiB/s is available. Note: Provisioned can be expensive, at
$6.00 per month for 1 MiB/s and $6144.00 per month for 1024 MiB/s. A minimum of 100 MiB/s is
recommended, if used.
• Choose performance mode: General Purpose (GP) or Max I/O. No measurable performance gains
were seen with Max I/O over GP. GP is recommended for all environments.
• Click on Enable encryption, if desired (recommended). Note: There will be some read/write
performance degradation if selected. Note: If using TLS, select Enable.
• No tags are required for EMS.
• Click on Next Step
o Under Network Access:
• Ensure the same security group is used as the EC2 Linux instances, and that the security group
has the necessary rules required to support TCP inbound/outbound to/from EFS.
• Ensure all Availability Zones where the EC2 instances are running are selected. Then click Next
Step
o Under File system policy
• Select any policy, if required. No policies are necessary for EMS shared storage.
o Review and Create File System
o Wait for a “Successful” creation

• Once the new EFS File System is created:


o You should still be on the File systems page in the AWS console
• Refresh the page
• Select the newly created file system. Note that the “File system state” is “Available” for all
zones.
• Verify the correct VPC and Security Group were used
• Click on the Attach tab, and review the Amazon EC2 mount instructions and the EFS ID.
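
To put the Provisioned throughput note under Customize into numbers, the monthly cost can be estimated from the rate quoted in this document ($6.00 per MiB/s per month). The sketch below is plain arithmetic under that assumption; actual AWS pricing may differ by region and over time, and the function name is illustrative.

```shell
#!/bin/sh
# Monthly cost in whole US dollars for Provisioned throughput,
# using the $6.00 per MiB/s per month rate quoted in this document.
provisioned_cost_usd() {
    mibps="$1"
    echo $(( mibps * 6 ))
}

provisioned_cost_usd 100     # recommended minimum: prints 600
provisioned_cost_usd 1024    # top of the valid range: prints 6144
```

At the recommended minimum of 100 MiB/s this comes to about $600 per month, which is why Bursting mode is suggested for development and testing.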

2.3 Setting up the EC2 Instance for EMS F/T


Use the following to set up two of the newly created EC2 instances for EMS fault tolerance. All steps must be
completed on both of the EMS Server instances.

2.3.1 Additional Software Installation


• Login to the EC2 instance using SSH. Use the instructions provided in Connect to Your Instance from the
previous section
• Update the EC2 instance to the latest Linux packages, and install Java and amazon-efs-utils. Note: This step
must be performed on all three EC2 instances.
o sudo yum update
o sudo yum install amazon-efs-utils java-devel

Note: If not using an Amazon Linux2 EC2 instance or TLS, install nfs-utils.

2.3.2 Setting up the EFS/TLS mount for the EFS Storage


If TLS is being used to provide encryption between the EMS Server and the EFS file system (recommended), use the
following steps.

• Ensure amazon-efs-utils have been installed.


• Create a new mount point on the Amazon Linux2 EC2 instance.
o mkdir /home/ec2-user/efs
• Mount the new EFS file system using the efs mount type with the tls option. Note: The options used in the EFS mount are required! Failure
to use the listed options may result in message loss/corruption, or EMS not performing at an acceptable level:
o sudo mount -t efs -o tls,_netdev,soft,timeo=300,actimeo=1,noresvport <file system ID from the DNS Name
for EFS file system>:/ /home/ec2-user/efs

Note: The <DNS Name for the EFS File system> is the DNS name shown in the file systems page shown
previously in the AWS console. It will be similar to fs-abcd1234.efs.us-east-1.amazonaws.com. For the
EFS/TLS mount, just use the file system ID, fs-abcd1234. The mount would be similar to:
sudo mount -t efs -o tls,_netdev,soft,timeo=300,actimeo=1,noresvport fs-abcd1234:/ /home/ec2-user/efs

• The mount can also be added to /etc/fstab to make it permanent. The following is an example of /etc/fstab with EFS/TLS:

#
UUID=76e177a9-8195-43cf-84ae-14ea371008b6 / xfs defaults,noatime 1 1
fs-abcd1234:/ /home/ec2-user/efs efs tls,_netdev,soft,actimeo=1,timeo=300,noresvport 0 0
Figure 6 - Example of /etc/fstab with TLS

Use the mount and the df commands to verify the EFS file share is mounted.

2.3.3 Setting up the NFS mount for the EFS Storage


The EFS file system is mounted on the Linux EC2 instances using standard NFS4 mount commands, if encryption is
not required.
• Create a new mount point on the Linux EC2 instance.
o mkdir /home/ec2-user/efs
• Mount the new EFS file system with NFS. Note: The options used in the NFS4 mount are required!
Failure to use the listed options may result in message loss/corruption, or EMS not performing at an
acceptable level:
o sudo mount -t nfs4 -o nfsvers=4.1,_netdev,soft,timeo=300,noresvport,actimeo=1 <DNS Name for
EFS file system>:/ /home/ec2-user/efs

Note: The <DNS Name for the EFS File system> is the DNS name shown in the file systems page
shown previously in the AWS console. It will be similar to fs-abcd1234.efs.us-east-
1.amazonaws.com

Note: Though actimeo is not required, it is recommended. The actimeo=1 option provides
better asynchronous persisted message write performance.
• The mount can also be added to /etc/fstab to make it permanent. The following is an example of the
/etc/fstab with the NFS4 mount.

UUID=f5bd1ae0-85b5-4686-85ff-ed8deb328c92 / xfs defaults,noatime 1 1
fs-abcd1234.efs.us-east-1.amazonaws.com:/ /home/ec2-user/efs nfs4 nfsvers=4.1,_netdev,soft,noresvport,actimeo=1,timeo=300

Use the mount and the df commands to verify the EFS file share is mounted.
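
Because the mount options are described above as required, it may be worth checking an /etc/fstab entry (or a mount command line) for them before mounting. The following shell sketch checks a comma-separated option string for the options common to both the EFS/TLS and NFS4 examples in this document. It deliberately inspects the configured option string rather than live mount output, on the assumption that the kernel may report some options in a different form. The function name is illustrative.

```shell
#!/bin/sh
# Return 0 if the comma-separated option string contains every option this
# document lists for both mount types; otherwise print each missing option.
check_mount_opts() {
    opts=",$1,"
    missing=0
    for required in soft timeo=300 actimeo=1 noresvport; do
        case "$opts" in
            *",$required,"*) ;;
            *) echo "missing option: $required"; missing=1 ;;
        esac
    done
    return "$missing"
}

# Check the option string from the EFS/TLS fstab example in this document.
check_mount_opts "tls,_netdev,soft,actimeo=1,timeo=300,noresvport" && echo "options OK"

# To check a real entry, the fourth fstab field could be extracted first, e.g.:
#   check_mount_opts "$(awk '$2 == "/home/ec2-user/efs" {print $4}' /etc/fstab)"
```

The mount and df commands remain the way to confirm that the file share is actually mounted; this check only guards against a mistyped option list.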



3 Enterprise Message Service Installation and Configuration

This section will outline the installation and configuration of EMS on the EC2 Linux instances.

3.1 EMS Installation


EMS version 8.6.0 or later is required for EMS installations running on AWS. Install EMS on all of the EC2 instances.
Nothing specific or custom is required to the base configuration of EMS. Follow the TIBCO EMS Installation Guide
for installing EMS 8.6.0.
Once EMS is installed, use the following to configure EMS for F/T.
• On one of the EC2 instances used for the EMS server:
o Create the directory on the EFS file system for the shared EMS configuration files and data stores.
Ex: mkdir --parents /home/ec2-user/efs/tibco/cfgmgmt/ems/data/datastore
o Copy the EMS configuration files (.conf) installed during the EMS installation
($TIBCO_HOME/ems/8.6/samples/config) to the newly created
/home/ec2-user/efs/tibco/cfgmgmt/ems/data directory
o Copy the tibemsd.conf to $TIBCO_HOME/ems/8.6/bin
o Create the $TIBCO_HOME/ems/8.6/bin/logs directory
• On the second EC2 instance used for the EMS server:
o Copy the tibemsd.conf to $TIBCO_HOME/ems/8.6/bin
o Create the $TIBCO_HOME/ems/8.6/bin/logs directory
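
The directory and copy steps above can be collected into two small shell functions, one for the shared EFS layout and one for the per-instance local files. This is a sketch rather than a TIBCO-supplied script. The function names are illustrative, and it assumes that tibemsd.conf is taken from the sample configuration directory, which the steps above do not state explicitly.

```shell
#!/bin/sh
# Create the shared directory tree on EFS and seed it with the sample .conf files.
# $1 = EFS mount point, $2 = EMS home (for example $TIBCO_HOME/ems/8.6)
prepare_shared_config() {
    efs_root="$1"
    ems_home="$2"
    data_dir="$efs_root/tibco/cfgmgmt/ems/data"

    mkdir -p "$data_dir/datastore" || return 1
    cp "$ems_home"/samples/config/*.conf "$data_dir"/ || return 1
}

# Per-instance local setup: tibemsd.conf next to the binary plus a local log directory.
# Assumption: tibemsd.conf is copied from the sample configuration directory.
prepare_local_instance() {
    ems_home="$1"
    mkdir -p "$ems_home/bin/logs" || return 1
    cp "$ems_home/samples/config/tibemsd.conf" "$ems_home/bin/" || return 1
}

# Example, on the first EMS server instance:
#   prepare_shared_config /home/ec2-user/efs "$TIBCO_HOME/ems/8.6"
#   prepare_local_instance "$TIBCO_HOME/ems/8.6"
# On the second EMS server instance only prepare_local_instance is needed.
```

Keeping the shared and local steps separate mirrors the split above: the shared tree is created once, while the local tibemsd.conf and logs directory are created on each EMS server instance.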

3.2 EMS Configuration


There are specific configuration changes which must be made to provide better write performance and reliability of
EMS F/T on AWS/EFS. This section discusses these changes. See the EMS User Guide for additional information
on setting or using any of the properties discussed.

3.2.1 Stores.conf
In stores.conf, modify/add the following:
• The file_minimum=xxGB property should be added to each synchronous data store. By adding this property, EMS
will pre-allocate the space on the shared storage for the data store. This will provide better message write
throughput to disk. The minimum should be 1GB. Expect the initial startup of EMS to take longer as it
creates and allocates the space for the store file.
• The file_crc=enabled property should be added. This enables EMS to check the data integrity of the data store.
This is the default on EMS, but this ensures it is set.

The following is an example of stores.conf with the changes.

[$sys.failsafe]
type=file
file=sync-msgs.db
mode=sync
file_minimum=2GB
file_crc=enabled

[sync2]
type=file
file=sync2-msgs.db
mode=sync
file_minimum=2GB

file_crc=enabled

Figure 7 - Stores.conf example

3.2.2 Factories.conf
The EMS client reconnect properties must be set to enable the EMS client to reconnect to the EMS server in the event
of an EMS server failure in an F/T configuration. The reconnect properties can be defined in a number of ways,
including in the Java/C code, TIBCO application’s configuration file, and/or through the connection factory when they
are used.
The default values may be too low in AWS to reliably allow the EMS client to reconnect to the EMS server after a fail-
over, especially with network or system failure.
It is recommended to set reconnect_attempt_count to 100, reconnect_attempt_delay to 5000, and
reconnect_attempt_timeout to 5000 (delay and timeout are in milliseconds). With these values, the EMS client will
attempt to reconnect 100 times, every 5 seconds.
Note: For configurations with a high number of EMS connections, producers and/or consumers, these numbers may
need to be tuned to provide optimal fail-over reliability.
The following example shows the values for the FTConnectionFactory in factories.conf.
Note: In the following example for the url, <server1> is tibems1 and the <port1> is 7222, and <server2> is tibems2
and the <port2> is 7222. Substitute with the appropriate values for the environment.

[FTConnectionFactory]
type = generic
url = tcp://tibems1:7222,tcp://tibems2:7222
reconnect_attempt_count = 100
reconnect_attempt_delay = 5000
reconnect_attempt_timeout = 5000

Figure 8 - Connection Factory Settings
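
As a rough guide to what the recommended reconnect values mean in practice, the total time a client keeps retrying can be approximated as the attempt count multiplied by the delay between attempts. The sketch below is simple arithmetic on the values above; it ignores the time each attempt itself takes, so the real window may be somewhat longer. The function name is illustrative.

```shell
#!/bin/sh
# Approximate reconnect window in seconds: attempts multiplied by the delay
# between attempts (delay given in milliseconds, as in factories.conf).
reconnect_window_seconds() {
    count="$1"
    delay_ms="$2"
    echo $(( count * delay_ms / 1000 ))
}

reconnect_window_seconds 100 5000    # prints 500
```

With the recommended settings the client keeps trying for roughly 500 seconds, a little over 8 minutes, which gives a failover on AWS/EFS time to complete before the client gives up.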

3.2.3 Tibemsd.conf
The tibemsd.conf for both EMS Servers needs to be updated for multiple properties.
These include:
• Location of all configuration files – The location must be on the EFS shared storage device.

########################################################################
# Configuration files.
########################################################################

users = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/users.conf"
groups = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/groups.conf"
topics = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/topics.conf"
queues = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/queues.conf"
acl_list = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/acl.conf"
factories = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/factories.conf"
routes = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/routes.conf"
bridges = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/bridges.conf"
transports = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/transports.conf"
tibrvcm = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/tibrvcm.conf"
durables = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/durables.conf"
channels = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/channels.conf"
stores = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/stores.conf"

########################################################################
# Persistent Storage.
#
# store: directory to store persistent messages.
########################################################################

store = "/home/ec2-user/efs/tibco/cfgmgmt/ems/data/datastore"

• Log File location – The location must be on the local disk for the EC2 instance. The following example
places the log file in the ems/8.6/bin/logs directory.

logfile = "/opt/tibco/ems/8.6/bin/logs/tibemsd.log"

• Server and Client Heartbeat and timeout values – These properties determine how long the client/server
listen for the heartbeat from the client/server, before disconnecting. These properties must be set longer
than what is normally configured for a local F/T environment. The values shown below have been tested,
and work well on AWS with NFS4.

server_heartbeat_client = 10
server_timeout_client_connection = 120
client_heartbeat_server = 10
client_timeout_server_connection = 120

Note: For configurations with a high number of EMS connections, producers and/or consumers,
these numbers may need to be tuned to provide optimal fail-over reliability.

• Enabling the exit-on-disk-error property – New property since EMS version 8.4. This property tells
EMS to exit when there is a disk error reading from or writing to a device. This property will help prevent “Dual
Active Server” conditions, sometimes seen with networked storage devices. This must be enabled.

always_exit_on_disk_error = enable

• FT properties – The normal properties for defining the peer EMS server instance, the heartbeat between
instances, etc. Ensure the following is set.
ft_reconnect_timeout = 120

• Define a value for destination_backlog_swapout. This will help limit excessive reads to the shared disk.
A minimum of 20000 is recommended. If the queues will persist a larger number of messages,
increase the value.

destination_backlog_swapout = 20000
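
Since both EMS servers need the same tibemsd.conf changes, a quick check of each file can catch a missed property. The following shell sketch reads name = value pairs from a tibemsd.conf-style file and compares a few of them with the values shown above. It is not a TIBCO tool: the function names are illustrative, the parser is a simplification that ignores comment lines and reports the last occurrence of a property, and the expected values are the tested values from this document, which may legitimately be tuned for a given environment.

```shell
#!/bin/sh
# Print the value of a "name = value" property from a tibemsd.conf-style file.
# $1 = file, $2 = property name. Comment lines are skipped; the last match is printed.
conf_value() {
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == key) {
                value = $2
                gsub(/^[ \t]+|[ \t\r]+$/, "", value)
                found = value
            }
        }
        END { print found }
    ' "$1"
}

# Compare selected properties with the values shown in this document.
# Returns 0 when all match, 1 otherwise (printing each mismatch).
check_ft_settings() {
    conf="$1"
    status=0
    for pair in \
        "always_exit_on_disk_error=enable" \
        "ft_reconnect_timeout=120" \
        "server_heartbeat_client=10" \
        "server_timeout_client_connection=120" \
        "client_heartbeat_server=10" \
        "client_timeout_server_connection=120"; do
        key="${pair%%=*}"
        want="${pair#*=}"
        got="$(conf_value "$conf" "$key")"
        if [ "$got" != "$want" ]; then
            echo "$key: expected $want, found '${got:-unset}'"
            status=1
        fi
    done
    return "$status"
}

# Example, on each EMS server instance:
#   check_ft_settings "$TIBCO_HOME/ems/8.6/bin/tibemsd.conf"
```

Running the check on both instances before the first start is a cheap way to confirm that always_exit_on_disk_error in particular has not been left at its default.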

3.2.4 Starting the EMS Instances


Once the configuration files are updated, EMS can be started. It is recommended that the -forceStart parameter is used
when starting EMS. Start both instances, taking note of which instance is the active EMS instance.
Note: Leave both terminal windows for the EMS server instances open. They will be needed for the testing.



Figure 9 – Linux EMS Startup



4 Testing EMS Fault Tolerance on AWS with EFS

Once EMS has been started on the EC2 Linux instances, the failover testing can be performed.
This section will outline several test cases, including EMS Server process failure, machine failure, and network failure.
Tests are performed using queues with synchronous persistence set. This guarantees that the shared file system will be
accessed during the tests.

4.1 EMS Client App Setup


The third EC2 instance is used to run the test application. EMS is shipped with sample Java applications which can be
used for the testing. The tibjmsMsgProducerPerf utility should be used for the testing. All sample Java applications are
located in $TIBCO_HOME/ems/8.6/samples/java. Use the following to setup the environment:
• Ensure the Java 1.8 development environment is installed.
• Install EMS version 8.6.0 on the third EC2 instance following the EMS installation procedures.
• After the installation of EMS is completed:
o cd to $TIBCO_HOME/ems/8.6/samples/java
o . ./setup.sh on Linux
o javac *.java – This should compile all Java apps in the directory
• Ensure that at least one of the EMS server instances is running (both should be running)
• Use the TIBCO EMS Administration Tool to create the EMS Queue sync utilizing the $sys.failsafe data
store. This is required for testing with a synchronous data store:
o cd to $TIBCO_HOME/ems/8.6/bin
o ./tibemsadmin -server tcp://<server>:<port>

Note: In the following examples, <server> is the IP address of tibems1 and the <port> is 7222. Substitute
with the appropriate values for the environment.
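
The queue creation can also be scripted instead of typed interactively. The sketch below writes a small tibemsadmin command file that creates the sync queue on the $sys.failsafe store and then displays it. The file name is hypothetical, the function name is illustrative, and the example invocation assumes that any credentials required by the environment are supplied as well.

```shell
#!/bin/sh
# Write a tibemsadmin command file that creates the "sync" queue on the
# $sys.failsafe store. The heredoc delimiter is quoted so that the shell
# does not try to expand "$sys" as a variable.
write_queue_script() {
    cat > "$1" <<'EOF'
create queue sync store=$sys.failsafe
show queue sync
EOF
}

# Example (hypothetical file name; add user/password options as required):
#   write_queue_script /tmp/create_sync_queue.scr
#   ./tibemsadmin -server tcp://tibems1:7222 -script /tmp/create_sync_queue.scr
```

The quoted heredoc matters here: with an unquoted delimiter the shell would replace $sys with an empty string and the queue would be bound to a store named .failsafe.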



Figure 10 - Create the Sync queue

4.2 Performing the EMS Fault Tolerant Test Cases


Three different tests should be performed:
1. EMS Process failure – Active EMS is stopped
2. Network failure – Network failure between the Active EMS Server machine, and the Elastic File System
3. System failure – Accidental restart of the EC2 Instance running the Active EMS server instance
This section will outline how to run these three tests, and what the expected results should be.
Note: All test cases must be run from the third virtual machine where the Java sample app was compiled.

4.2.1 EMS Process Failure Test


This test verifies that an EMS client continues to function correctly, with no message loss during an EMS server
process failover.
Two EMS server instances will be running in an F/T configuration while messages are being sent. The active EMS
instance will be stopped, and the stand-by EMS instance should take over and continue processing messages until the EMS
Java application completes publishing messages.
Note: In the following examples, <server1> is the IP address of tibems1 and the <port1> is 7222, and <server2> is the
IP address of tibems2 and the <port2> is 7222. Tibclient is the EC2 instance running the Java apps. Substitute with the
appropriate values for the environment.

4.2.1.1 Running the Process Failure Test


• Three ssh terminal sessions are needed for this test; one for tibems1, one for tibems2, and one for tibclient
• Start EMS on tibems1 and tibems2 in the foreground. EMS on tibems1 should be the active EMS
instance.
• From tibclient, start the Java application
o cd to $TIBCO_HOME/ems/8.6/samples/java
o . ./setup.sh
o java tibjmsMsgProducerPerf -server tcp://tibems1:7222,tcp://tibems2:7222 -factory
FTConnectionFactory -delivery PERSISTENT -connections 10 -threads 8 -count 20000 -size 1024
-queue sync

Figure 11 - Running tibjmsMsgProducerPerf

• Immediately stop the EMS instance on tibems1 with Ctrl-C.


• The standby EMS instance on tibems2 will become active and recover all messages. It should be possible
to stop and start the EMS instances a few times while the Java test application is running. Each time, the
number of recovered messages will increase.

Figure 12 - Standby EMS becoming active on tibems2

• After the Java application completes, run tibemsadmin -server tcp://tibems2:7222 (or tibems1, if it is active)
and verify that the sync queue holds at least 20000 messages.
• Restart the EMS instance on tibems1, and stop the EMS instance on tibems2. EMS on tibems1 should
become active, and recover all 20K messages with no errors.
• Use tibemsadmin, and purge the sync queue in preparation for the next test.

Figure 13 - Purge the Sync queue from tibemsadmin

• Stop and restart EMS on tibems1 and tibems2 in the foreground. EMS on tibems1 should be the active
EMS instance.
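
The producer command used in this test takes a fault-tolerant URL list, that is, the two server URLs joined by a comma with no space. The following is a minimal sketch of building that list and the command line. The helper names ft_url and producer_cmd are hypothetical; the options are the ones used in the test above.

```shell
#!/bin/sh
# Hypothetical helper: joins host:port pairs into a fault-tolerant EMS URL list.
# The URLs are separated by commas only; a space would split the argument.
ft_url() {
    url=""
    for hp in "$@"; do
        url="${url:+$url,}tcp://$hp"
    done
    printf '%s\n' "$url"
}

# Hypothetical helper: prints the producer command for a given URL list.
producer_cmd() {
    printf 'java tibjmsMsgProducerPerf -server %s -factory FTConnectionFactory -delivery PERSISTENT -connections 10 -threads 8 -count 20000 -size 1024 -queue sync\n' "$1"
}

# Example:
#   producer_cmd "$(ft_url tibems1:7222 tibems2:7222)"
```

Printing the command before running it gives a chance to confirm that the -server value is a single shell argument.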

4.2.1.2 Expected Results


The Java test application should complete. It pauses briefly during failover and resumes sending messages
once the failover is complete. No messages should be lost. There may be more than 20K messages, but there should
never be fewer than 20K. The fail-over should be very short, possibly less than 1 second, depending on the number
of messages that must be recovered.
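
The pass criterion above can be written as a simple check. This is a minimal sketch; the function name enough_messages is hypothetical, and the observed count is assumed to be read by hand from the tibemsadmin queue display.

```shell
#!/bin/sh
# Hypothetical check: succeeds when the observed queue depth is at least
# the number of messages the producer was asked to send (default 20000).
enough_messages() {
    observed=$1
    expected=${2:-20000}
    [ "$observed" -ge "$expected" ]
}

# Example:
#   enough_messages 20013 && echo "no message loss"
```

A count above 20000 passes the check, because extra messages are acceptable in this test while missing messages are not.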

4.2.2 Network Failure Test on Linux


This test verifies that an EMS client continues to function correctly, with no message loss during a network failure
between the active EMS server instance, and the EFS shared file system.
Two EMS server instances run in an F/T configuration while messages are being sent. The NFS TCP port is
blocked between the active EMS instance and the AWS/EFS file system via iptables. The active EMS instance should
get a write error and exit, allowing the stand-by EMS instance to gain the locks on the EMS data stores and take over.
The EMS Java application should continue processing messages until it completes.
Note: In the following Linux examples, <server1> is the IP address of tibems1 and the <port1> is 7222, and <server2> is
the IP address of tibems2 and the <port2> is 7222. Tibclient is the EC2 instance running the Java apps. Substitute
the appropriate values for the environment.



4.2.2.1 Running the Network Failure Test
• Four ssh terminal sessions are needed for this test; two for tibems1, one for tibems2, and one for tibclient
• A script is needed to block the NFS port (2049) on tibems1 while the Java app is publishing
messages. Figure 14 shows the drop_nfs.sh script. Copy and paste it to create the script in the
second ssh terminal window on tibems1.

#!/bin/bash
#
# Script to save the current iptables definitions, drop the NFS port, then restore the original table definitions
#
echo " Saving existing IP table definitions"
echo ""
sudo iptables-save >iptables_save
#
# Drop the NFS port 2049
#
date
echo " Dropping NFS 2049 port"
echo ""
sudo iptables -A INPUT -p TCP --dport 2049 -j DROP
sudo iptables -A OUTPUT -p TCP --dport 2049 -j DROP

sudo iptables --list

#
# Sleep for 2 minutes
#
echo " Sleeping for 2 minutes..."
echo ""
sleep 2m
#
# Restore original IP table definitions
#
echo " Restoring original IP table definitions"
echo ""
sudo iptables -F
sudo iptables-restore <iptables_save
#
date
echo "Done."

Figure 14 - drop_nfs.sh script

• From tibclient, start the Java application


o cd to $TIBCO_HOME/ems/8.6/samples/java
o . ./setup.sh
o java tibjmsMsgProducerPerf -server tcp://tibems1:7222,tcp://tibems2:7222 -factory
FTConnectionFactory -delivery PERSISTENT -connections 10 -threads 8 -count 20000 -size 1024
-queue sync
• From the second ssh terminal window on tibems1, run drop_nfs.sh



Figure 15 - Running drop_nfs.sh

• The active EMS instance on tibems1 should terminate with a disk write error:

Figure 16 - Disk Write Error on tibems1

• The standby EMS instance on tibems2 should detect that EMS on tibems1 is no longer producing a
heartbeat and attempt to become active. Depending on the amount of data, this should take ~60 seconds.
There can be other warnings, depending on how long it takes for tibems2 to obtain the locks.
• After the Java application completes, run tibemsadmin -server tcp://tibems2:7222 (or tibems1, if it is active)
and verify that the sync queue holds at least 20000 messages.
• Restart the EMS instance on tibems1, and stop the EMS instance on tibems2. EMS on tibems1 should
become active, and recover all 20K messages with no errors.
• While still in tibemsadmin, purge the sync queue in preparation for the next test.
• Stop and restart EMS on tibems1 and tibems2 in the foreground. EMS on tibems1 should be the active
EMS instance.
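
If drop_nfs.sh (Figure 14) is interrupted during its sleep, port 2049 stays blocked until the rules are restored by hand. The following is a sketch of an alternative, not the method used in Figure 14: it removes only the two rules it added (iptables -D) and does so from a trap, so that Ctrl-C also cleans up. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: same DROP rules as drop_nfs.sh, removed again via a trap.
# DRY_RUN=1 prints the commands instead of executing them.
run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

block_nfs() {
    run sudo iptables -A INPUT -p tcp --dport 2049 -j DROP
    run sudo iptables -A OUTPUT -p tcp --dport 2049 -j DROP
}

unblock_nfs() {
    # -D deletes exactly the rules added above and leaves other rules intact.
    run sudo iptables -D INPUT -p tcp --dport 2049 -j DROP
    run sudo iptables -D OUTPUT -p tcp --dport 2049 -j DROP
}

# Blocks the NFS port for the given number of seconds (default 120).
drop_nfs_for() {
    # On Ctrl-C or termination, remove the rules and stop.
    trap 'unblock_nfs; exit 1' INT TERM
    block_nfs
    run sleep "${1:-120}"
    unblock_nfs
    trap - INT TERM
}

# Example:
#   DRY_RUN=1 drop_nfs_for 120    # show what would be executed
#   drop_nfs_for 120              # run the test for two minutes
```

Deleting the specific rules avoids flushing unrelated iptables entries, at the cost of assuming that no identical rules existed before the test.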

4.2.2.2 Expected Results


The Java test application should complete. It pauses during the failover and resumes sending messages once the
failover is complete. No messages should be lost. There can be more than 20K messages, depending on the number of
connections/threads, but there should never be fewer than 20K. Depending on the number of messages that must be
recovered, the fail-over can take ~60 seconds. With the AWS/EFS file system, EMS message recovery in the network
failure test has been observed to take longer than in the other tests. The EMS instance on tibems2 should be able to
obtain the locks without multiple attempts. This is expected behavior, provided the mount options described in
section 2.3.2 are used.

4.2.3 System Failure Test


This test verifies that an EMS client continues to function correctly, with no message loss during a system failure on
the EC2 instance running the active EMS server instance. This is not a normal occurrence. However, it is possible to
accidentally restart the virtual machine from the AWS console.
Two EMS server instances will be running in a F/T configuration, while messages are being sent. From the AWS
console, the EC2 instance where the active EMS instance is running will be restarted. The stand-by EMS instance
should be able to gain the locks on the EMS data stores, and take over. The EMS Java application should continue
processing messages until it completes.
Note: In the following examples, <server1> is the IP address of tibems1 and the <port1> is 7222, and <server2> is the
IP address of tibems2 and the <port2> is 7222. Tibclient is the EC2 instance running the Java apps. Substitute with the
appropriate values for the environment.
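
The reboot can also be triggered from a command line instead of the AWS console. The following is a minimal sketch, assuming the AWS CLI is installed and configured with permission to reboot the instance. The function name reboot_instance, the DRY_RUN switch, and the instance ID in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints (DRY_RUN=1) or runs the AWS CLI reboot call
# for the EC2 instance whose ID is passed as the first argument.
reboot_instance() {
    set -- aws ec2 reboot-instances --instance-ids "$1"
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example (replace the placeholder with the instance ID of tibems1):
#   reboot_instance i-0123456789abcdef0
```

Using the CLI makes the moment of the reboot easier to line up with the timestamps in the EMS server output.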

4.2.3.1 Running the System Failure Test


• Three ssh terminal sessions are needed for this test; one for each of the virtual machines.
• The AWS console must also be available, and be on the EC2 dashboard page.

Figure 17 - AWS Console on the EC2 Dashboard

• From tibclient, start the Java application


o cd to $TIBCO_HOME/ems/8.6/samples/java
o . ./setup.sh
o java tibjmsMsgProducerPerf -server tcp://tibems1:7222,tcp://tibems2:7222 -factory
FTConnectionFactory -delivery PERSISTENT -connections 10 -threads 8 -count 20000 -size 1024
-queue sync
• From the EC2 dashboard, click on the tibems1 instance, click on instance state, and then click on reboot
instance. This will reboot the tibems1 EC2 instance.



• The ssh terminal window to the tibems1 EC2 Linux instance should immediately terminate, and the
stand-by EMS instance should recover all messages, and become active within a few seconds, depending
on the number of messages to be recovered.

Figure 18 - Stand-by EMS instance recovering from system failure of the primary EMS Instance

• After the Java application completes, run tibemsadmin -server tcp://tibems2:7222 (or the URL of the active EMS
instance) and verify that the sync queue holds at least 20000 messages.
• Restart the EMS instance on the restarted EC2 instance, and stop the currently active EMS instance on
the second EC2 instance. EMS should become active on the restarted instance, and recover all 20K
messages with no errors.
• Use tibemsadmin to verify, then purge the sync queue.
• Stop EMS on both Linux EC2 instances.
• This concludes the tests, so all processes, terminal windows, and EC2 instances can be stopped.

4.2.3.2 Expected Results


The Java test application should complete. It pauses during failover and resumes sending messages once the
failover is complete. No messages should be lost. There can be more than 20K messages, depending on the number of
connections/threads, but there should never be fewer than 20K messages. The fail-over should only take a few seconds,
depending on the number of messages that must be recovered. It has been observed that EMS recovery after an AWS
reboot of the EC2 instance takes virtually no longer than recovery after an EMS process failure.

