FIS Report
Workflow of FIS
Simulation Process
Simulating failures with AWS Fault Injection Simulator (FIS) involves setting up
experiments that inject faults into your AWS environment to test the
resilience of your applications. These experiments can, for example, stop
EC2 instances, inject network latency, or simulate Availability Zone (AZ)
failures. Here’s a step-by-step guide to running an AWS FIS experiment:
1. Prerequisites
IAM Role: Create an IAM role that grants AWS FIS permission to perform
actions on the targeted resources (e.g., EC2, RDS).
AWS CLI / Console Access: You can use the AWS Management Console, the
AWS CLI, or an AWS SDK to create and manage FIS experiments.
AWS Services Setup: Ensure you have running resources such as EC2
instances, Auto Scaling Groups, RDS databases, or other AWS services
that you want to test.
2. Create an IAM Role for FIS
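FIS needs permission to perform fault injection actions such as stopping EC2
instances or injecting network latency. To create the role:
1. Go to the IAM Console.
2. Create a role whose trust policy allows the FIS service principal
(fis.amazonaws.com) to assume it.
3. Attach permissions for the actions you want to run:
o Use the AWS managed policies for FIS experiments (for example,
AWSFaultInjectionSimulatorEC2Access for EC2 actions), or
o Add specific permissions for EC2, RDS, Auto Scaling, etc.,
depending on what you want to test.
Note that AWSFISServiceRolePolicy is attached to the FIS service-linked role,
which AWS creates automatically; it is not meant for the experiment role.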
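The role setup above can be sketched with Boto3 (the AWS SDK for Python). This is a minimal sketch that assumes the experiments only stop and start EC2 instances; the role name, the inline policy name, and the permission list are illustrative choices, not requirements. The `iam` argument is a Boto3 IAM client, passed in so the function can be tried without touching AWS.

```python
import json

# Trust policy that lets the AWS FIS service assume the experiment role.
FIS_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "fis.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Assumption: minimal permissions for the aws:ec2:stop-instances action.
# Add permissions for any other services (RDS, Auto Scaling, ...) you target.
EC2_STOP_START_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:StopInstances",
                "ec2:StartInstances",
                "ec2:DescribeInstances",
            ],
            "Resource": "*",
        }
    ],
}


def create_fis_role(iam, role_name="fis-experiment-role"):
    """Create an IAM role for FIS experiments and return its ARN.

    `iam` is a Boto3 IAM client, e.g. boto3.client("iam").
    The default role name is a hypothetical example.
    """
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(FIS_TRUST_POLICY),
        Description="Role assumed by AWS FIS to run experiments",
    )
    # Attach the EC2 permissions as an inline policy on the new role.
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="fis-ec2-stop-start",
        PolicyDocument=json.dumps(EC2_STOP_START_POLICY),
    )
    return role["Role"]["Arn"]


if __name__ == "__main__":
    import boto3

    print(create_fis_role(boto3.client("iam")))
```

The `Resource` is left as `*` because `ec2:DescribeInstances` does not support resource-level restrictions; the stop/start permissions could be scoped more tightly in a separate statement.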
3. Define an AWS FIS Experiment Template
An experiment template is a blueprint that defines the actions, resources,
and conditions under which AWS FIS injects faults.
1. Access FIS in the AWS Management Console:
o Navigate to AWS Fault Injection Simulator.
o Choose Create experiment template.
2. Fill in the sections of the template:
Actions: Add the actions that the experiment will take (e.g., rebooting
or stopping resources).
Targets: Add targets to specify which AWS resources the actions will
be applied to.
Service access: Specify the IAM role that AWS FIS uses to run the
experiment. In this case, the console creates a new role named
AWSSIMRole-1729235690655.
Stop conditions: Optionally add a CloudWatch alarm as a stop condition;
if the alarm is triggered, AWS FIS stops the experiment.
Logs: Optionally send experiment logs to Amazon CloudWatch Logs or an
Amazon S3 bucket.
Tags: Add tags (up to 50) to help manage and identify the experiment
template.
Experiment options: Optional settings that control how the experiment
runs.
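The console sections above correspond to keyword arguments of the Boto3 call `fis.create_experiment_template`. The sketch below is a hedged illustration of that mapping for the optional sections (stop conditions, logs, tags); the helper names are hypothetical, and the log schema version is an assumption to check against the current FIS documentation.

```python
def stop_conditions(alarm_arn=None):
    """Return the stopConditions list for create_experiment_template.

    Without an alarm, FIS expects the placeholder source "none".
    """
    if alarm_arn is None:
        return [{"source": "none"}]
    return [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}]


def log_configuration(log_group_arn):
    """Send experiment logs to a CloudWatch Logs log group."""
    return {
        "cloudWatchLogsConfiguration": {"logGroupArn": log_group_arn},
        # Assumption: check the FIS docs for the current log schema version.
        "logSchemaVersion": 2,
    }


def template_skeleton(description, role_arn, alarm_arn=None,
                      log_group_arn=None, tags=None):
    """Map the console sections to create_experiment_template arguments.

    Targets and actions are left empty here and must be filled in
    before the template is created.
    """
    template = {
        "description": description,
        "roleArn": role_arn,                           # Service access
        "stopConditions": stop_conditions(alarm_arn),  # Stop conditions
        "targets": {},                                 # Targets
        "actions": {},                                 # Actions
        "tags": dict(tags or {}),                      # Tags (on the template)
    }
    if log_group_arn:
        template["logConfiguration"] = log_configuration(log_group_arn)  # Logs
    if len(template["tags"]) > 50:
        raise ValueError("AWS resources support at most 50 tags")
    return template
```

Keeping the stop condition as an alarm (rather than `"none"`) is the safer default for anything beyond a test environment, since it lets FIS halt the experiment automatically.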
Availability Zone Failover Test
The Availability Zone Failover Test uses a standardized model with two
nodes in each of two separate Availability Zones (AZs), for a total of four
nodes. The test verifies that if one AZ becomes unavailable, the
application shifts workloads to the healthy AZ with minimal impact on
performance. This approach is crucial for keeping downtime low in
multi-AZ deployments, improving the overall reliability and resilience of
the application.
Test Model
The test is designed around a standardized model that sets up:
4 nodes in total
o 2 nodes (EC2 instances) in AZ1
o 2 nodes (EC2 instances) in AZ2
The test uses AWS Fault Injection Simulator (FIS) to inject faults that simulate a
failure in one AZ, and monitors the system’s response in the other AZ.
Test Setup
1. AWS Resources
AWS EC2 Instances:
o Two instances in us-east-1a (AZ1)
o Two instances in us-east-1b (AZ2)
AWS Fault Injection Simulator (FIS):
o Used to stop the EC2 instances in AZ1 to simulate an outage.
CloudWatch:
o Used to monitor performance metrics like CPU utilization to ensure that
failover is functioning as expected.
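For the CloudWatch monitoring described above, CPU utilization per instance can be pulled with `get_metric_statistics`. A minimal sketch, assuming you pass a Boto3 CloudWatch client and one instance ID at a time; the 15-minute window and 60-second period are illustrative defaults.

```python
from datetime import datetime, timedelta, timezone


def average_cpu(cloudwatch, instance_id, minutes=15, period=60):
    """Return (timestamp, average CPU %) pairs for one instance, oldest first.

    `cloudwatch` is a Boto3 CloudWatch client, e.g. boto3.client("cloudwatch").
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=period,
        Statistics=["Average"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

During the experiment, rising CPU on the AZ2 instances while the AZ1 instances stop is one signal that AZ2 has taken over the load. With basic monitoring, EC2 publishes CPU metrics at 5-minute intervals, so a 60-second period only yields finer data if detailed monitoring is enabled.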
2. Test Steps
Step 1: Create EC2 Instances in Multiple AZs
A Python script using Boto3 provisions two EC2 instances in each of two
different AZs (us-east-1a and us-east-1b), four instances in total.
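The original script is not reproduced in this report; the following is a minimal sketch of such a provisioning step, not the author's script. The AMI ID, instance type, and the `fis-test` tag are hypothetical placeholders, and the sketch assumes a default VPC so that `Placement` alone selects a subnet in each AZ.

```python
def launch_instances(ec2, image_id, azs=("us-east-1a", "us-east-1b"),
                     per_az=2, instance_type="t3.micro"):
    """Launch `per_az` instances in each AZ and return {az: [instance IDs]}.

    `ec2` is a Boto3 EC2 client, e.g. boto3.client("ec2", region_name="us-east-1").
    `image_id` (an AMI ID) and `instance_type` are placeholders you must choose.
    """
    result = {}
    for az in azs:
        response = ec2.run_instances(
            ImageId=image_id,
            InstanceType=instance_type,
            MinCount=per_az,
            MaxCount=per_az,
            # In a default VPC, the AZ picks that zone's default subnet.
            Placement={"AvailabilityZone": az},
            # Hypothetical tag that FIS targets can later select on.
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "fis-test", "Value": "az-failover"}],
            }],
        )
        result[az] = [inst["InstanceId"] for inst in response["Instances"]]
    return result
```

Without a default VPC, pass a `SubnetId` for each AZ instead of relying on `Placement`.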
Instances Created:
o AZ1 (us-east-1a):
Instance ID: i-xxxxxxxxxxxxxxxxx
Instance ID: i-xxxxxxxxxxxxxxxxx
o AZ2 (us-east-1b):
Instance ID: i-xxxxxxxxxxxxxxxxx
Instance ID: i-xxxxxxxxxxxxxxxxx
Step 2: Create AWS FIS Experiment Template
A FIS experiment template is created to simulate an AZ failure by stopping
all instances in AZ1 (us-east-1a). This tests the system's ability to handle
an AZ outage and shift traffic or workloads to the healthy nodes in AZ2
(us-east-1b).
Create the experiment template using the AWS FIS console. The template
specifies two actions that run sequentially; each stops its target instances
and restarts them after three minutes. The first action stops one of the two
AZ1 test instances, chosen at random by AWS FIS. The second action stops
both AZ1 test instances.
To create an experiment template
1. Open the AWS FIS console at https://console.aws.amazon.com/fis/.
2. In the navigation pane, choose Experiment templates.
3. Choose Create experiment template.
4. For Description and name, enter a description and a name for the
template.
5. For Actions, do the following:
a. Choose Add action.
b. Enter a name for the action. For example, enter stopOneInstance.
c. For Action type, choose aws:ec2:stop-instances.
d. For Target, keep the target that AWS FIS creates for you.
e. For Action parameters, Start instances after duration, specify 3
minutes (PT3M).
f. Choose Save.
6. For Targets, do the following:
a. Choose Edit for the target that AWS FIS automatically created for you
in the previous step.
b. Replace the default name with a more descriptive name. For example,
enter oneRandomInstance.
c. Verify that Resource type is aws:ec2:instance.
d. For Target method, choose Resource IDs, and then choose the IDs of
the two test instances.
e. For Selection mode, choose Count. For Number of resources,
enter 1.
f. Choose Save.
7. Choose Add target and do the following:
a. Enter a name for the target. For example, enter bothInstances.
b. For Resource type, choose aws:ec2:instance.
c. For Target method, choose Resource IDs, and then choose the IDs of
the two test instances.
d. For Selection mode, choose All.
e. Choose Save.
8. From the Actions section, choose Add action. Do the following:
a. For Name, enter a name for the action. For example,
enter stopBothInstances.
b. For Action type, choose aws:ec2:stop-instances.
c. For Start after, choose the first action that you added
(stopOneInstance).
d. For Target, choose the second target that you added
(bothInstances).
e. For Action parameters, Start instances after duration, specify 3
minutes (PT3M).
f. Choose Save.
9. For Service Access, choose Use an existing IAM role, and then
choose the IAM role that you created in step 2 (Create an IAM Role for
FIS). If your role is not displayed, verify that it has the required trust
relationship. For more information, see IAM roles for AWS FIS
experiments.
10. (Optional) For Tags, choose Add new tag and specify a tag key
and tag value. The tags that you add are applied to your experiment
template, not the experiments that are run using the template.
11. Choose Create experiment template. When prompted for
confirmation, enter create and then choose Create experiment
template.
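The same two-action template can be created through the API instead of the console. This is a hedged sketch of the console steps above using Boto3's `create_experiment_template`; the instance ARNs, role ARN, and `Name` tag value are placeholders, and storing the template name as a `Name` tag is an assumption about how the console records it.

```python
import uuid


def tutorial_template(role_arn, instance_arns):
    """Return create_experiment_template arguments for the two-action template.

    role_arn: ARN of the FIS experiment role from step 2.
    instance_arns: ARNs of the two AZ1 test instances, e.g.
        "arn:aws:ec2:us-east-1:<account-id>:instance/<instance-id>".
    """
    def instance_target(selection_mode):
        return {
            "resourceType": "aws:ec2:instance",
            "resourceArns": list(instance_arns),
            "selectionMode": selection_mode,
        }

    def stop_action(target_name, start_after=None):
        action = {
            "actionId": "aws:ec2:stop-instances",
            # Restart the stopped instances after 3 minutes (ISO 8601 duration).
            "parameters": {"startInstancesAfterDuration": "PT3M"},
            "targets": {"Instances": target_name},
        }
        if start_after:
            action["startAfter"] = [start_after]
        return action

    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "description": "Stop one random AZ1 instance, then both AZ1 instances",
        "roleArn": role_arn,
        "stopConditions": [{"source": "none"}],  # no CloudWatch alarm here
        "targets": {
            "oneRandomInstance": instance_target("COUNT(1)"),
            "bothInstances": instance_target("ALL"),
        },
        "actions": {
            "stopOneInstance": stop_action("oneRandomInstance"),
            "stopBothInstances": stop_action("bothInstances",
                                             start_after="stopOneInstance"),
        },
        # Assumption: the console's template name is kept as a Name tag.
        "tags": {"Name": "stop-az1-instances"},
    }


def create_template(fis, role_arn, instance_arns):
    """Create the template with a Boto3 FIS client and return its ID."""
    response = fis.create_experiment_template(
        **tutorial_template(role_arn, instance_arns))
    return response["experimentTemplate"]["id"]
```

`COUNT(1)` and `ALL` mirror the console's Count and All selection modes, and `startAfter` reproduces the "Start after" dependency between the two actions.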
Next Steps
Schedule recurring failover tests using AWS FIS and CloudWatch.
Integrate failover testing into the CI/CD pipeline to ensure system
reliability under real-world failure conditions.
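For the CI/CD integration, a pipeline stage can start an experiment from an existing template and wait for it to finish. A minimal sketch, assuming the template ID is passed on the command line and the pipeline has AWS credentials; the 15-second poll interval is an illustrative choice.

```python
import sys
import time
import uuid

# Experiment statuses after which FIS no longer changes the experiment.
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def run_experiment(fis, template_id, poll_seconds=15, sleep=time.sleep):
    """Start an experiment from a template and block until it finishes.

    `fis` is a Boto3 FIS client. Returns the final state dict,
    e.g. {"status": "completed", "reason": "..."}.
    """
    experiment = fis.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template_id,
    )["experiment"]
    while True:
        state = fis.get_experiment(id=experiment["id"])["experiment"]["state"]
        if state["status"] in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


if __name__ == "__main__":
    import boto3

    final = run_experiment(boto3.client("fis"), sys.argv[1])
    print(final)
    # Fail the pipeline stage unless the experiment completed normally.
    sys.exit(0 if final["status"] == "completed" else 1)
```

Treating `stopped` as a failure is a deliberate choice: an experiment usually ends in that state because a stop-condition alarm fired, which is exactly what a pipeline gate should catch.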
Recovery Testing
Objective
The primary objective of recovery testing is to verify that the system can
recover from failures and return to its normal operational state within
acceptable limits, with minimal data loss or corruption. This is critical to
ensure business continuity, minimize downtime, and guarantee that users
can rely on the system during and after failures.
Key Scenarios in Recovery Testing
Recovery testing typically covers scenarios such as:
1. Hardware Failures: Disk crashes, server shutdowns, or network
disconnections.
2. Software Failures: Application crashes, database connection failures,
or service interruptions.
3. Power Failures: Sudden loss of power, especially in physical data
centers.
4. System Overload: Recovery from high-load conditions where the
system or its components have failed due to resource exhaustion.
Steps for Conducting Recovery Testing
1. Identify Failure Scenarios
First, identify the types of failures your system could experience. Examples
include:
EC2 instance shutdown in one Availability Zone (AZ).
Database disconnection or failure.
Disk failure in a storage service like S3 or EBS.
Network partition or disconnect between services.
Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html
2. Simulate Failures
Use tools like AWS Fault Injection Simulator (FIS) or manual approaches to
simulate the failures:
AWS FIS: Inject failures such as stopping or rebooting EC2 instances
or causing network latency to simulate a failure.
Example scenario:
Stop the instances in us-east-1a (AZ1) to simulate an AZ failure, and
monitor the recovery process in us-east-1b (AZ2).
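Instead of listing instance IDs, a FIS target can select every matching instance in one AZ by combining resource tags with filters. A hedged sketch of such a template: the `fis-test` tag, the target and action names, and the five-minute restart delay are hypothetical choices.

```python
import uuid


def az_outage_template(role_arn, az="us-east-1a",
                       tag=("fis-test", "az-failover"), restart_after="PT5M"):
    """Return create_experiment_template arguments that stop every tagged,
    running instance in `az` and restart them after `restart_after`."""
    return {
        "clientToken": str(uuid.uuid4()),
        "description": f"Simulate an outage in {az} by stopping its instances",
        "roleArn": role_arn,
        "stopConditions": [{"source": "none"}],
        "targets": {
            "instancesInAz": {
                "resourceType": "aws:ec2:instance",
                # Tags identify candidate instances; filters narrow them down.
                "resourceTags": {tag[0]: tag[1]},
                "filters": [
                    {"path": "Placement.AvailabilityZone", "values": [az]},
                    {"path": "State.Name", "values": ["running"]},
                ],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "stopAzInstances": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": restart_after},
                "targets": {"Instances": "instancesInAz"},
            }
        },
    }
```

With a Boto3 FIS client, `fis.create_experiment_template(**az_outage_template(role_arn))` would register it. Selecting by filter means instances launched later in the AZ are picked up without editing the template.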
3. Monitor Recovery Process
Once the failure is induced, monitor the system to assess how it responds:
Instance Recovery: Did EC2 instances in AZ2 take over the workload
when AZ1 instances failed?
Service Continuity: Was the service available during recovery, even
with degraded performance?
Data Integrity: Was any data lost or corrupted during the failure and
recovery process?
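Service continuity can be quantified by probing the application endpoint during the experiment and summarizing the results. A minimal sketch using only the Python standard library; the endpoint URL (for example, the load balancer's DNS name) and the one-second probe interval are assumptions you would set for your environment.

```python
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class Sample:
    t: float   # seconds since probing started
    ok: bool   # True if the request succeeded


def probe(url, timeout=2.0):
    """Return True if `url` answers with a 2xx or 3xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # covers URLError, HTTPError (4xx/5xx), and timeouts
        return False


def collect(url, duration_s, interval_s=1.0):
    """Probe `url` every `interval_s` seconds for `duration_s` seconds."""
    start = time.monotonic()
    samples = []
    while (now := time.monotonic() - start) < duration_s:
        samples.append(Sample(now, probe(url)))
        time.sleep(interval_s)
    return samples


def recovery_summary(samples):
    """Availability ratio and longest outage (first failure to next success)."""
    if not samples:
        raise ValueError("no samples")
    samples = sorted(samples, key=lambda s: s.t)
    longest, outage_start = 0.0, None
    for s in samples:
        if not s.ok and outage_start is None:
            outage_start = s.t
        elif s.ok and outage_start is not None:
            longest = max(longest, s.t - outage_start)
            outage_start = None
    if outage_start is not None:  # still failing when probing ended
        longest = max(longest, samples[-1].t - outage_start)
    availability = sum(s.ok for s in samples) / len(samples)
    return {"availability": availability, "longest_outage_s": longest}
```

For example, ten one-second samples with failures at t = 3, 4, and 5 give an availability of 0.7 and a longest outage of 3.0 seconds. The longest outage approximates recovery time at the resolution of the probe interval.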
Conclusion
Recovery testing verifies that your system can return to a stable state
after encountering failures, which is crucial for maintaining availability
and minimizing downtime. By automating recovery processes and
monitoring system performance, you can help ensure business continuity
even in the face of unexpected failures.
Apache JMeter
Apache JMeter is an open-source tool designed to load test applications
and measure their performance under heavy loads.
How to Use It with AWS FIS:
o Simulate Load: Use JMeter to simulate high traffic or load on
your web application (hosted on EC2 instances).
o Failover Testing: While the load is being simulated by JMeter,
you can use AWS FIS to inject failures like EC2 instance
termination, rebooting, or network latency in multiple AZs.
o Test Use Cases: Test how your auto-scaling group, load
balancer, or multi-AZ architecture handles the EC2 instance
failures while under load.
Example Use Case:
1. Create a JMeter test to simulate HTTP requests to your application
(hosted on multiple EC2 instances in different AZs).
2. Run the FIS experiment to terminate EC2 instances in a specific AZ
or introduce network latency between AZs.
3. Monitor the application’s performance under load using CloudWatch
metrics and JMeter reports to verify that failover happens correctly and
the load balancer is distributing traffic effectively.
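The use case above can be orchestrated from one script: start JMeter in non-GUI mode, let the load stabilize, then start the FIS experiment. A hedged sketch: the test plan path, output names, and 60-second warm-up are hypothetical, `jmeter` is assumed to be on the PATH, and the JMeter test plan must run longer than the warm-up plus the experiment.

```python
import subprocess
import time
import uuid


def jmeter_command(test_plan, results_file, report_dir):
    """Non-GUI JMeter run that writes raw results and an HTML dashboard.

    -n: non-GUI mode, -t: test plan, -l: results file,
    -e/-o: generate the report dashboard into an empty or new folder.
    """
    return ["jmeter", "-n", "-t", test_plan, "-l", results_file,
            "-e", "-o", report_dir]


def load_test_with_fault(fis, template_id, test_plan,
                         results_file="results.jtl", report_dir="report",
                         warmup_seconds=60, popen=subprocess.Popen,
                         sleep=time.sleep):
    """Start JMeter, wait for steady load, then start the FIS experiment.

    `fis` is a Boto3 FIS client. Returns (JMeter exit code, experiment ID).
    """
    proc = popen(jmeter_command(test_plan, results_file, report_dir))
    sleep(warmup_seconds)  # hypothetical warm-up before injecting the fault
    experiment = fis.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template_id,
    )["experiment"]
    return proc.wait(), experiment["id"]
```

Starting the fault after a warm-up keeps the baseline and fault periods separate in the JMeter results, which makes the before/after comparison easier.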
Steps to Integrate JMeter with AWS FIS:
1. Set up JMeter to run load tests on your application endpoint (e.g., a
web app running on EC2).
2. Use AWS FIS to create a failover scenario (e.g., terminating EC2
instances in one AZ).
3. Analyze the response times, error rates, and failover behavior
through both JMeter and AWS CloudWatch.
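Response times and error rates can also be computed directly from JMeter's results file, optionally restricted to the window in which the FIS experiment ran. A minimal sketch, assuming the results were saved as CSV with a header row containing the `timeStamp` (epoch milliseconds), `elapsed` (milliseconds), and `success` columns; the window bounds would come from the experiment's start and end times.

```python
import csv
import math


def summarize_jtl(lines, window=None):
    """Error rate and 95th-percentile response time from a CSV .jtl file.

    `lines` is any iterable of CSV lines (e.g. an open file).
    `window` is an optional (start_ms, end_ms) epoch range.
    """
    elapsed, errors = [], 0
    for row in csv.DictReader(lines):
        ts = int(row["timeStamp"])
        if window and not (window[0] <= ts <= window[1]):
            continue
        elapsed.append(int(row["elapsed"]))
        errors += row["success"].lower() != "true"
    if not elapsed:
        return {"samples": 0, "error_rate": 0.0, "p95_ms": None}
    elapsed.sort()
    # Nearest-rank percentile: smallest value covering 95% of samples.
    p95 = elapsed[math.ceil(0.95 * len(elapsed)) - 1]
    return {"samples": len(elapsed), "error_rate": errors / len(elapsed),
            "p95_ms": p95}
```

Comparing the summary for the fault window against the same-length period before it shows how much the injected failure degraded the user-facing experience.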