Cloud Incident Response (CIR) Framework
The permanent and official location for the Cloud Incident Response Working Group is https://
cloudsecurityalliance.org/research/working-groups/cloud-incident-response/.
© 2021 Cloud Security Alliance – All Rights Reserved. You may download, store, display on your
computer, view, print, and link to the Cloud Security Alliance at https://cloudsecurityalliance.org
subject to the following: (a) the draft may be used solely for your personal, informational, non-
commercial use; (b) the draft may not be modified or altered in any way; (c) the draft may not be
redistributed; and (d) the trademark, copyright or other notices may not be removed. You may quote
portions of the draft as permitted by the Fair Use provisions of the United States Copyright Act,
provided that you attribute the portions to the Cloud Security Alliance.
Special Thanks:
Key Contributors:
Bowen Close
Aristide Bouix
David Chong
David Cowen
Karen Gispanski
Dennis Holstein
Christopher Hughes
Ashish Kurmi
Larry Marks
Abhishek Pradhan
Michael Roza
Ashish Vashishtha
Reviewers:
Oscar Monge España
Nirenj George
Tanner Jamison
Chelsea Joyce
Vani Murthy
Sandeep Singh
Fadi Sodah
With today’s emerging and rapidly evolving threat landscape, a holistic cloud incident response
framework that considers an expansive scope of factors for cloud outages is necessary. The
Cloud Incident Response (CIR) Working Group (WG) aims to develop a holistic CIR framework that
comprehensively covers fundamental causes of cloud incidents (both security and non-security
related) and their handling and mitigation strategies. The aim is to serve as a go-to guide for cloud
users to prepare a detailed plan for responding to and managing the aftermath of cloud incidents.
The CIR framework also gives cloud service providers a transparent, common way to share their
cloud incident response practices with cloud customers. The framework covers key causes of cloud
incidents such as operational mistakes, infrastructure or system failure, environmental issues,
cybersecurity incidents, and malicious acts.
Incident response frameworks have already been documented in many governmental and industry
guidelines, such as the NIST 800-61r2 Computer Security Incident Handling Guide or SANS Institute
Information Security Reading Room Incident Handler’s Handbook for traditional on-premises
information technology (IT) environments. However, when a cloud computing environment is
incorporated, the roles and responsibilities defined in the traditional incident response frameworks
must be revised and refined to align with the roles and responsibilities of cloud service providers (CSPs)
and cloud service customers (CSCs) for different cloud service models and deployment models.
Purpose
This document seeks to provide a Cloud Incident Response (CIR) framework that serves as a go-to
guide for a CSC to effectively prepare for and manage cloud incidents through the entire lifecycle of
a disruptive event. It also serves as a transparent and common framework for CSPs to share cloud
incident response practices with their CSCs.
Target Audience
The key beneficiaries are CSCs. This framework guides CSCs in determining their organization's
security requirements and choosing the appropriate level of incident protection. Through this, CSCs
can negotiate with CSPs or select security capabilities that are made-to-measure, providing a clear
understanding of the division of security roles and responsibilities. The framework draws on the
following primary references:
• CSA Security Guidance For Critical Areas of Focus In Cloud Computing v4.0
• NIST 800-61r2 Computer Security Incident Handling Guide
• ITSC Technical Reference (TR) 62 – Cloud Outage Incident Response (COIR)
• FedRAMP Incident Communications Procedure
• NIST 800-53 Security and Privacy Controls for Information Systems and Organizations
• SANS Institute Information Security Reading Room Incident Handler’s Handbook
• ENISA Cloud Computing Risk Assessment
Figure 1 shows the relationship between the CIR phases and the primary references.
Phase 5.1 Preparation
• CSA Sec. Guidance v4.0: 9.1.2.1 Preparation
• NIST 800-61r2: 3.1 Preparation
• TR 62: 0.1 Cloud Outage Risks; 4.2 COIR Categories; 5.1 Before Cloud Outage: CSC; 6.1 Before Cloud Outage: CSP
• FedRAMP Incident Comm. Procedure: 5.1 Preparation
• NIST (SP) 800-53 r4: 3.1 Selecting Security Control Baselines; Appendix F-IR IR-1, IR-2, IR-3, IR-8
• Incident Handler's Handbook: 2 Preparation; 8 Checklist
• ENISA Cloud Computing Security Risk Assessment: Business Continuity Management, page 79

Phase 5.2 Detection and Analysis
• CSA Sec. Guidance v4.0: 9.1.2.2 Detection and Analysis
• NIST 800-61r2: 3.2 Detection and Analysis
• TR 62: 5.2 During Outage: CSC; 6.2 During Outage: CSP
• FedRAMP Incident Comm. Procedure: 5.2 Detection and Analysis
• NIST (SP) 800-53 r4: Appendix F-IR AT-2, IR-4, IR-6, IR-7, IR-9, SC-5, SI-4
• Incident Handler's Handbook: 3 Identification; 8 Checklist

Phase 5.3 Containment, Eradication and Recovery
• CSA Sec. Guidance v4.0: 9.1.2.3 Containment, Eradication, and Recovery
• NIST 800-61r2: 3.3 Containment, Eradication, and Recovery
• TR 62: 5.2 During Outage: CSC; 6.2 During Outage: CSP
• FedRAMP Incident Comm. Procedure: 5.3 Containment, Eradication, and Recovery
• NIST (SP) 800-53 r4: Appendix F-IR IR-4, IR-6, IR-7, IR-9
• Incident Handler's Handbook: 4 Containment; 5 Eradication; 6 Recovery; 8 Checklist

Phase 5.4 Postmortem
• CSA Sec. Guidance v4.0: 9.1.2.4 Postmortem
• NIST 800-61r2: 3.4 Post-Incident Activity
• TR 62: 5.3 After Outage: CSC; 6.3 After Outage: CSP
• FedRAMP Incident Comm. Procedure: Post-Incident Activity
• Incident Handler's Handbook: 7 Lessons Learned; 8 Checklist
• Phase 1: Preparation
• Phase 2: Detection and Analysis
• Phase 3: Containment, Eradication, and Recovery
• Phase 4: Postmortem
There are several key aspects of a CIR system that differentiate it from a non-cloud incident response
(IR) system, such as governance, shared responsibility, and visibility.
Governance
Data in the cloud resides in multiple locations, perhaps with different CSPs. Getting the various
organizations together to investigate an incident is a significant challenge. It is also resource-draining
on large CSPs that have a colossal client pool.
Shared responsibility
Cloud service customers, CSPs, and/or third-party providers all have different roles to ensure cloud
security. Generally, customers are responsible for their data and the CSPs for the cloud infrastructure
and services they provide. Cloud incident response should always be coordinated across all parties.
The domains of shared responsibilities are also different between the CSPs and CSCs depending
on the model of cloud services chosen, such as software-as-a-service (SaaS), platform-as-a-service
(PaaS), and infrastructure-as-a-service (IaaS). This division must be well understood. For example, in
IaaS, managing the operating system (OS) lies with the CSC; therefore, the IR responsibilities for the
OS also lie with the CSC.
(Figure: shared responsibility risk layers, including host risks and infrastructure risks.)
It is essential to ensure, in granular detail, that roles and governance are clear and well-documented
in the contract or service-level agreement (SLA) with the CSP. The CSC should not create or settle for
any policy that cannot be enforced. Organizations should understand they can never outsource their
part of governance or shared responsibilities.
A single-CSP approach to the supply of cloud services may result in a situation where the
organization's business suffers a sustained outage due to failures introduced at the CSC/CSP over
which the organization has no control. This scenario would impact business operations substantially
and raises the possibility that a business continuity plan (BCP) strategy cannot recover, resulting in a
systemic CIR event.
When approaching service provider diversity from the CIR perspective, organizations are also
encouraged to consider aspects of digital service sovereignty (e.g., data residency, data sovereignty)
in their plans.
5 Microsoft TechNet, 25 October 2019, Shared Responsibilities for Cloud Computing, https://gallery.technet.microsoft.com/Shared-Responsibilities-81d0ff91
Incident response and management frameworks have been developed and documented by many
organizations, as stated in chapter 2 of this document. Different frameworks have their own objectives
and target audiences. This framework has adopted the commonly accepted “Incident Response
Lifecycle” described in CSA Security Guidance for Critical Areas of Focus in Cloud Computing v4.0 and
NIST Computer Security Incident Handling Guide (NIST 800-61rev2 08/2012).
Figure: Incident response lifecycle: Preparation; Detection & Analysis; Containment, Eradication, & Recovery; Post-Mortem.
When an incident occurs, the objectives of CIR are to achieve the following:
To understand an organization's incident response capability, note that one of the critical differences
between the traditional IR framework and the CIR framework is the presence of the shared
responsibility model.
6 Cloud Security Alliance 2017, Security Guidance for Critical Areas of Focus in Cloud Computing
v4, https://cloudsecurityalliance.org/artifacts/security-guidance-v4/
However, in a cloud environment, the CSC is not the owner of all the systems. Depending on the
adopted service models and their corresponding shared responsibility model, some artifacts and logs
are managed by the CSP. When third-party IR providers are engaged, the CIR plan should also include
them in the overall process. This juncture presents an appropriate opportunity for organizations
to consider vetting any third-party IR vendors to ensure quick access to resources should they be
needed in an emergency response situation.
Organizations should familiarize themselves with their CSPs' business continuity and disaster
recovery capabilities and make full use of them during incidents. Thus, it is necessary for the CSC to
understand the IR procedures of the CSP and to align with them through the SLA and contract. To
manage and execute this initiative, a CIR plan should include:
5.1.1 Documentation
Throughout the IR process, the organization shall maintain incident documentation to ensure a
systematic record for efficient review of the incident and lessons learned. The organization should
manage the following information about an incident record (a minimal record structure is sketched after this list):
1. The current status of the incident (“new,” “in progress,” “forwarded for investigation,”
“resolved,” etc.).8
2. A summary of the incident.
3. Indicators of compromise related to the incident.
4. Other incidents related to the original incident.
5. Actions taken by handlers of this incident.
6. Chain of custody, if applicable.
7. Impact assessments related to the incident.
8. Contact information for other involved parties (e.g., system owners, system administrators).
9. A list of evidence gathered during the incident investigation.
10. Comments from incident handlers.
11. Planned next steps (e.g., rebuild the host, upgrade an application).
12. Restrict access to the incident record to appropriate personnel, since it may contain
sensitive information with regulatory or compliance implications, IP addresses, exploited
vulnerabilities, and confidential business information.
13. Retrospective/lessons learned: Document any lessons learned, such as successes, areas of
improvement, actions to avoid, and new procedures to improve outcomes.
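As an illustration only, the fields above could be captured in a structured record along the lines of the following sketch; the field names and types are assumptions made for the example, not a schema mandated by this framework.

```python
# Minimal sketch of an incident record capturing the fields listed above.
# Field names and example values are illustrative, not a CSA-mandated schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class IncidentRecord:
    status: str                                                  # e.g., "new", "in progress", "resolved"
    summary: str
    indicators_of_compromise: List[str] = field(default_factory=list)
    related_incidents: List[str] = field(default_factory=list)   # identifiers of linked incidents
    handler_actions: List[str] = field(default_factory=list)
    chain_of_custody: List[str] = field(default_factory=list)    # if applicable
    impact_assessments: List[str] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)            # system owners, administrators, etc.
    evidence: List[str] = field(default_factory=list)            # references to gathered evidence
    handler_comments: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)          # e.g., "rebuild the host"
    authorized_viewers: List[str] = field(default_factory=list)  # access restriction (item 12)
    lessons_learned: List[str] = field(default_factory=list)     # retrospective notes (item 13)
```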
A cloud incident—as defined in this document—is an occurrence that harms the operation of IaaS,
PaaS, desktop-as-a-service (DaaS), SaaS, or related services that CSPs provide.
There is usually a sign before an incident. According to the National Institute of Standards and
Technology (NIST) definition, scenarios that constitute a sign include:
• Precursor: a sign that an incident may occur in the future.
• Indicator: a sign that an incident may have occurred or may be occurring now.
The CSP and CSC must have a system or process to detect these signs, which may prevent an actual
occurrence. The common sources of precursors and indicators include:
1. Alerts
2. Logs
3. Indicators of compromise (IoC)
4. Industry events
5. Market analysis reports
6. Threat intelligence reports
7. Publicly available information
8. People
9. Social media
It is recommended to have systems in place to collect and analyze these precursors and indicators,
ranging from system logs, alerts, SIEM, and a security ops center to an integrated ops center. Ideally,
monitor and correlate the various alerts, logs, events, and calls for comprehensive cyber-
situation awareness via an integrated ops center. In all cases, the collection and analysis scope must
cover the cloud's management plane and not merely the deployed assets.
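As a minimal sketch of such collection and analysis, the snippet below scans hypothetical management-plane audit events against a simple indicator list; the log format, field names, and rules are illustrative assumptions, and a production deployment would typically rely on a SIEM or the CSP's native monitoring services instead.

```python
# Minimal sketch: flag management-plane audit events that match simple indicator rules.
# The event format and the rule set are hypothetical placeholders.
import json

SUSPICIOUS_ACTIONS = {"DeleteLoggingConfig", "CreateAccessKey", "DisableMFA"}

def flag_events(audit_log_lines):
    """Yield management-plane events whose action matches a known indicator."""
    for line in audit_log_lines:
        event = json.loads(line)
        if event.get("action") in SUSPICIOUS_ACTIONS:
            yield event

sample = ['{"time": "2021-04-01T10:22:00Z", "actor": "user1", "action": "DisableMFA"}']
for hit in flag_events(sample):
    print("precursor/indicator:", hit["action"], "by", hit["actor"])
```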
Part of the incident information collection effort is to determine if the issue is a false positive or a
false negative.9 If the issue is a “false alarm,” then the documentation (i.e., ticket) should be updated
to document this assessment and close the issue. Each indicator must be evaluated to determine
legitimacy.
The incident response plan should be organized systematically to minimize impact to business and
service operations, and relevant parties should be informed when incidents occur. Incident escalation
should be based on the severity of the incident’s impact. Because of the high volume of incidents
in a highly complex cloud environment, senior management should only be informed of critical and
high-impact incidents. The CSP and CSCs should develop and integrate an escalation matrix into
the contract and/or SLAs. Note: The CSP may obligate CSCs to inform the CSP of any significant CSC
incidents, as these may pose a threat to the CSP’s infrastructure and operations.
For accurate reporting, the following critical information (5Ws) should be gathered from the incident
reporter and, if possible, the affected environment:
1. What happened? Did the user take any actions before and after the incident?
2. Where did the incident happen? Has it been contained, or have any other areas been
impacted? What is the confidence level for non-impacted zones?
3. When did it happen?
4. Who discovered it? Who is affected or not affected? How was it discovered?
5. Why did the incident occur? Has the source or “patient zero” been found?
Time is of the essence. Though there is a need to resolve the incident quickly, it is equally essential
to inform relevant stakeholders promptly to allow them to understand the situation so they can
advise or take necessary actions to reduce incident impact. During an incident, crisis communication
is an integral part of the crisis management plan covering any incident related to service or
business outages, including a cyberattack. Poor incident management can lead to regulatory fines,
reputational damage, loss of customer trust, and severe financial loss.
• The initial incident notification should be disseminated to key stakeholders internally and
externally within the first two to eight hours to enable horizon scanning at the CSC/CSP/
third-party provider.
• A primary informational incident report should be shared with internal stakeholders
containing information on at least the first 4Ws within the first four to 48 hours (depending
on the incident’s impact). When necessary, external stakeholders (CSP/third-party providers)
may need to be involved in the investigation and containment.
• The CSCs/CSPs usually undertake self-reporting within an agreeable timeframe, as per
generalized contractual terms and conditions.
Depending on the escalation workflow, organizations should send notifications expeditiously via the
agreed-upon medium (phone call, SMS, email, etc.). Incidents of different severity levels should be
escalated to different execution and management parties as agreed to in the CIR plan. If there is a
critical impact on business continuity or reputation, organizations should also activate their BCP and/
or crisis management plans (CMP).
An incident impact model must be developed upfront and used by the CSP and CSC to ensure
consistency in event assessment, impact, notifications, and actions needed to respond accordingly.
The incident priority matrix (also called an “impact and urgency matrix”) is derived from impact
severity and urgency levels. A quick and proper impact assessment must be performed to determine
the damage extent. The following examples include the key impact types that both the CSP and CSC
should consider together:
Organizations must establish and define proper categorization of the impact severity levels based
on their tolerances and appetite for risk. Under the European Union Agency for Cybersecurity
(ENISA) Cloud Security Incident Reporting, one or more parameters can assess the level of impact.
For example: For one day of downtime and a geographic spread of 70 percent, the incident’s impact
would register “Level 2/Level 1.” Upon this determination, users should reference containment
guidelines for incidents with “Level 2/Level 1” impact. It is important to note that the given values
serve as examples, and the values should be adjusted to reflect organizational nature, priorities, and
business objectives.
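A minimal sketch of such an impact-and-urgency lookup follows, assuming a two-parameter ENISA-style impact scale; the thresholds, levels, and matrix values are illustrative placeholders to be calibrated to the organization's risk appetite.

```python
# Minimal sketch of an impact/urgency ("incident priority") matrix lookup.
# Thresholds and matrix values are illustrative assumptions, not prescribed figures.
PRIORITY_MATRIX = {  # (impact level, urgency level) -> priority, where 1 is the highest
    (1, 1): 1, (1, 2): 1, (2, 1): 1,
    (2, 2): 2, (1, 3): 2, (3, 1): 2,
}

def impact_level(downtime_hours: float, geographic_spread_pct: float) -> int:
    """Derive an example impact level (1 = highest) from two ENISA-style parameters."""
    if downtime_hours >= 24 or geographic_spread_pct >= 70:
        return 1
    if downtime_hours >= 8 or geographic_spread_pct >= 40:
        return 2
    return 3

# One day of downtime and 70 percent geographic spread lands at the highest impact level here.
impact = impact_level(downtime_hours=24, geographic_spread_pct=70)
print("priority:", PRIORITY_MATRIX.get((impact, 2), 3))  # urgency level 2 assumed for the example
```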
Urgency levels range from the lowest (“Level 5”) to the highest (“Level 1/2”) using the following
considerations:
Organizations should adopt incident classification scales used by several standards and guidelines to
help users gauge the severity of impact and/or the relevant importance of cloud services availability to
business operations. The following is a set of policies based on the current operational trend of CSPs:
Organizations may also wish to undertake a business impact analysis (BIA)—or a threat, vulnerability,
and risk assessment (TVRA) specific to organizational parameters—and consider purchasing cyber-
insurance to mitigate the potential financial impact of a cloud incident.
Identifying data relevant to the investigation is vital in determining the root cause of the incident
and identifying lessons learned to avoid repeated incidents. Identified data can also help support
beneficial information-sharing initiatives that others can leverage to prevent similar incidents.
Note that CSPs may limit log retention periods due to GDPR or other compliance requirements.
These limitations must be understood and accounted for in incident response planning as log
availability will affect necessary evidence gathering (depending on the cloud service chosen).
Possible locations of relevant data include storage drives attached to virtual instances and the
memory space of an instance. By utilizing CSP capabilities, such as instance snapshots, CIR
teams can obtain snapshots of the virtualized storage drives attached to affected instances and
utilize them for further analysis and discovery. These snapshots can be mounted to digital forensic
investigative resources for scrutiny with widely used forensic analysis tool sets.
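As one hedged example, the sketch below uses the AWS boto3 SDK to snapshot a suspect volume for later forensic mounting; the volume ID, incident ID, and tagging scheme are placeholders, and other CSPs expose equivalent snapshot capabilities.

```python
# Minimal sketch, assuming an AWS environment with boto3 and credentials configured;
# the volume ID and incident ID below are placeholders.
import boto3

ec2 = boto3.client("ec2")

def snapshot_for_forensics(volume_id: str, incident_id: str) -> str:
    """Snapshot a suspect volume so analysis can run on a copy rather than the original evidence."""
    snap = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"CIR evidence for incident {incident_id}",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "incident", "Value": incident_id}],
        }],
    )
    return snap["SnapshotId"]

# Example: snapshot_for_forensics("vol-0123456789abcdef0", "IR-2021-042")
```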
Any collected evidence should also be hashed. This helps ensure the integrity of the collected
information and confirms that the data has not been altered from its original source. This
undertaking also helps ensure evidence admissibility in potential legal proceedings. Ensure
forensic work is performed on a copy of the collected evidence (rather than the original data that has
been hashed) to preserve court admissibility.
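A minimal sketch of such hashing follows; the file paths are placeholders, and SHA-256 is used purely as a common choice of digest.

```python
# Minimal sketch: record a digest of collected evidence so a later re-hash can prove
# that the working copy has not been altered. Paths below are placeholders.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to handle large images."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hash the original, work on a copy, and verify the copy before presenting findings.
# original_hash = sha256_of("/evidence/vol-0123_snapshot.img")
# assert sha256_of("/evidence/working_copy.img") == original_hash
```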
For cybersecurity incidents, the following steps should occur to identify attacking hosts:
Note: Depending on the incident and its effects, containment, eradication, and recovery may all be
part of the same process.
Next, the attacker may move laterally and establish persistence by installing different malware on a
small number of other machines. This keeps their detection risk low while providing means of re-
entering the network should the initial compromise be detected. The attacker is now established in
the system and will start to execute their mission.
Upon incident discovery, affected organizations should execute predefined CIR plans (as stipulated
in “Phase 1: Preparation”), such as taking systems offline, quarantining systems, and restricting
connectivity. It is paramount not to remove the threat by blind deletion, as this destroys forensic
evidence needed for CIR plan revisions. Containment provides time to develop a remediation
strategy. An essential part of containment is decision-making (e.g., shut down a system, disconnect
it from a network, delete API keys, disable username). Such decisions are much easier to make with
predetermined strategies and procedures for incident containment. To define and document the
strategies and procedures, IR teams should utilize playbooks and runbooks to simplify tasks.
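As an illustration, a containment runbook entry could be expressed in machine-readable form along the lines of the sketch below; the incident type, criteria, and steps are assumptions for the example rather than content prescribed by this framework.

```python
# Minimal sketch of a machine-readable containment runbook entry; all values are illustrative.
COMPROMISED_VM_RUNBOOK = {
    "incident_type": "compromised virtual machine",
    "decision_criteria": ["business impact", "evidence preservation", "service availability"],
    "containment_steps": [
        "Snapshot attached volumes and capture memory before any change (preserve forensics)",
        "Isolate the instance with a deny-all security group instead of deleting it",
        "Revoke or rotate API keys and credentials used by the instance",
        "Notify stakeholders per the agreed escalation matrix",
    ],
}

for step in COMPROMISED_VM_RUNBOOK["containment_steps"]:
    print("-", step)
```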
Organizations should define acceptable risks in dealing with incidents and develop strategies
accordingly. Containment strategies vary based on incident types. For example, the process to
contain an email-borne malware infection is quite different compared to a network-based DDoS
attack response. Organizations should create separate containment strategies for each major
incident type, with criteria documented clearly to facilitate decision-making.
• Business impact
• Potential resource theft and damage
• Need for evidence preservation
• Service availability (e.g., network connectivity, services provided to external parties)
• Time and resources needed to implement the strategy
• Strategy effectiveness (e.g., partial containment, full containment)
• Containment approach duration, complexity (e.g., an emergency workaround to be removed
in four hours vs. temporary workaround to be removed in two weeks vs. permanent
solution)
• Resource availability (particularly technical expertise)
• Availability and integrity of backup/copies/snapshots
• Availability of sandbox/honeypots environments
The appropriate containment strategy’s ultimate goal is to limit the attacker’s movement and prevent
further unauthorized access or infection within the shortest possible time while minimizing service
disruptions. An appropriate strategy will prevent further damage from happening while preserving
forensic evidence necessary for investigation.
In certain cases, some organizations redirect attackers to a sandbox (a form of containment similar to
a honeypot) so they can monitor the attacker's activity (usually to gather additional evidence). The IR
team should discuss this strategy with its legal counsel to determine feasibility.
Organizations should not implement alternative methods to monitor attacker activities (other
than sandboxing). If an organization detects a system compromise and allows the compromise to
continue, the organization may be held liable if the attacker uses the compromised system to attack
other systems.
The delayed containment strategy is dangerous because an attacker could escalate unauthorized
access or compromise other systems. Another potential issue is that some attacks may cause
additional damage after containment. For example, a compromised host may run a malicious process
that pings another host periodically. When the incident handler attempts to contain the incident by
disconnecting the compromised host from the network, the subsequent pings will fail.
As a result of the failure, the malicious process may overwrite or encrypt all the data on the host’s
hard drive. Even after a host has been disconnected from the network, handlers must not assume
that further damage to the host will be prevented.
An objective postmortem will also help the team use collected information to gauge the overall
effectiveness of the CIR process.
If incident data is collected and stored properly, it should highlight several measures of success (or at
least activities) of the IR team.
Incident data can also be collected to determine if notable trends exist over time. These patterns may
reveal more about how the team is doing over a defined duration and if there are improvements (e.g.,
a decreasing number of incidents) or areas that warrant increased attention (e.g., a spike in security-
related incidents). Organizations must typically report such information in regulated industries—
especially major incidents—to regulatory bodies and management. The CSCs are expected to collect
necessary data in a timely, accurate, and complete manner to meet these requirements.
Data such as flow logs or other traffic logs should be collected to review unauthorized access or
suspicious traffic.
• Mean time to detect (MTTD): The average time to discover the security incident. How long
did it take from when the incident started until the team became aware of it? This is directly
linked to attacker dwell time (the time between attacker infiltration and the detection point).
• Mean time to acknowledge (MTTA): The time it takes a security operator to respond to a
system alert. While MTTD measures the time before an attacker is noticed, MTTA focuses
on measuring a security operator’s time responding to the security alert and starting the
analysis.
• Mean time to recovery (MTTR): The time required to bring a system back into an operating
state (linked to phase 3).
• Mean time to containment (MTTC): The average time required to detect, respond to,
eradicate, and recover from an incident. The MTTC can be calculated by adding up the MTTD,
MTTA, and MTTR for all in-scope incidents and dividing by the number of in-scope incidents
(a minimal calculation sketch follows this list). This metric is considered a key performance
indicator (KPI), as it shows how well the incident response team is organized. An elevated
MTTC signals that some subprocesses are not optimal during incident response, while a lower
MTTC indicates a well-organized team.
• Threat metrics, e.g., Gbps or Tbps if a DDoS attack
• Threat actor TTPs (tactics, techniques, and procedures). These include phishing and account
manipulation. More examples can be found in MITRE’s ATT&CK® Cloud Matrix13
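The sketch below shows one way to compute these metrics from incident timestamps; the record fields and timestamps are hypothetical, and hours are used as the unit purely for the example.

```python
# Minimal sketch: derive MTTD, MTTA, MTTR, and MTTC from hypothetical incident records.
from datetime import datetime
from statistics import mean

incidents = [
    {"started": "2021-03-01T02:00", "detected": "2021-03-01T06:00",
     "acknowledged": "2021-03-01T06:30", "recovered": "2021-03-01T10:00"},
    {"started": "2021-03-05T14:00", "detected": "2021-03-05T15:00",
     "acknowledged": "2021-03-05T15:10", "recovered": "2021-03-05T18:00"},
]

def hours(start: str, end: str) -> float:
    """Elapsed hours between two ISO-8601 timestamps."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600

mttd = mean(hours(i["started"], i["detected"]) for i in incidents)       # dwell time until detection
mtta = mean(hours(i["detected"], i["acknowledged"]) for i in incidents)  # operator response time
mttr = mean(hours(i["acknowledged"], i["recovered"]) for i in incidents) # time to restore service
mttc = mttd + mtta + mttr  # per the text: MTTD + MTTA + MTTR averaged over in-scope incidents

print(f"MTTD={mttd:.1f}h MTTA={mtta:.1f}h MTTR={mttr:.1f}h MTTC={mttc:.1f}h")
```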
Once the incident has closed, the CIR team that managed the event shall compose a formal after-
action report (AAR) using data collected from previous phases and incident evaluation. This task
is essential in the postmortem phase and should be performed as soon as possible while lessons
are still fresh. If delayed, critical details may be lost or forgotten—potentially making a significant
difference in future incident prevention. The CIR team should present the AAR to key stakeholders
within two weeks of the incident closure.14 Appropriate countermeasures must be formulated and
validated by (senior) management. The AARs are best created using a formally approved reporting
template to ensure that reports consistently meet expected standards.
• Review the incident’s timeline and any CIRT and CSP CIRT observations.
• Perform a thorough root-cause analysis supported by the “5 Whys” (or “5Y”) technique to
identify and review all contributing event factors.
It is often considered necessary to publish the report to the broader public after reporting the
information to top management to facilitate incident information sharing across enterprises. This
transparency helps peers better identify and control risks.
The final step in handling a security incident is determining what was learned. If gaps are identified
during the incident response related to personnel, processes, or technology, they must be
addressed. The person who closes the event must ensure that a retrospective review is held
regarding the security incident—an undertaking referred to as “lessons learned.” Use “lessons
learned” to help revise and solidify the CIR plan. Each IR team should proactively evolve to reflect
new threats, improved technology, and lessons learned16—improving future response actions.
Security Guidance: Pay particular attention to data collection limitations and determine how to
address the issues moving forward. As cloud data resides in multiple locations (and perhaps with
different CSPs), the following considerations present challenges to this phase of the process:
• Challenges related to obtaining and coordinating incident data collection from various third-
party providers (internet service providers)
• Resource dependency on third-party providers (potentially due to the size of their client pool).
The following suggested questions can help CSCs come up with their own inquiries:
• What part(s) of the service layer had an issue? What was the impact on the affected
applications and users?
• How long did the problem last, and at what time?
• Is the problem cause known?
• What was learned that could prevent or mitigate the occurrence of this event?
15 CSA Security Guidance For Critical Areas of Focus In Cloud Computing v4.0, section 9.1.2
16 NIST.SP.800-61r2 Computer Security Incident Handling Guide
All identified evidence collected during “Phase 2 Detection and Analysis” must be retained according
to the requirements set forward for the enterprise’s applicable legal, regulatory, industry, or
contractual obligations. Evidence is kept for the following purposes:
• Regulatory compliance requirements (i.e., specific levels and granularities of audit logging,
alert generation, activity reporting, and data retention). Data retention may not be part of
providers' standard service agreements.
• Legal: To support a prosecution for compromise of PII/PHI or enterprise systems.
• Risk management: To reflect on and reassess new threat tactics and techniques.
• Training: To facilitate better team preparedness for future incidents, incorporate adaptive
incident learning.17
The enterprise forensic model must be capable of facilitating the required evidence retention periods
and technologies used. As per prior CSA guidance,18 the CSC should work with the CSP to evaluate
incident handling. The retention of digital forensic evidence in a cloud context must be seen as an
integrated model between the CSP and CSC.19
17 Incident Response Teams – Challenges in Supporting the Organizational Security Function, Ahmad, Hadgkiss & Ruighaver 2012; Shedden, Ahmad & Ruighaver 2011, https://www.sciencedirect.com/science/article/pii/S0167404812000624?via%3Dihub
18 CSA Security Guidance For Critical Areas of Focus In Cloud Computing v4.0
19 An integrated conceptual digital forensic framework for cloud computing, Martini and Choo, https://www.sciencedirect.com/science/article/abs/pii/S174228761200059X
The communication path between the provider and the users should be established appropriately.
Regular updates should be available for any impacted users to mitigate losses and strategize
business recovery methods. Effective coordination and communication go beyond just reporting to
the customers.
Because of the shared nature of cloud computing, an attack typically affects more than one
organization simultaneously. Thus, incident information sharing is mutually beneficial in helping
involved organizations guard against the same threats. The CSA runs the Cloud Cyber Incident
Sharing Center (CloudCISC)20 that facilitates incident data sharing between participating CSPs.
Coordination with key partners, IR teams in other departments, and law enforcement agencies
significantly reinforces CIR capabilities. This communication should be set up from the start, during
the planning phase, and maintained throughout the entire CIR process, as necessary.
Figure: Crisis communication flow: Preparedness; Identifying Communication Team; Selecting Communication Channels (e.g., website, social media, customer support); Message for Target Audience (e.g., shareholders).
6.1 Coordination
6.1.1 Coordination Relationships
All stakeholders should work together to explicitly identify their roles and responsibilities during
cloud security incidents. Traditionally, these roles tie closely to their duties in the shared responsibility
model. For example:
• a security incident occurring in the platform or service layer for a PaaS or SaaS application
should be driven by the CSP;
• a security incident occurring in the application layer for a PaaS application should be driven
by the CSC;
• a security incident occurring in the platform layer for an IaaS infrastructure cloud should be
driven jointly by the CSC and the CSP to determine if it originated in the CSC’s environment
or the CSP’s environment.
Stakeholders should proactively identify such incident scenarios along with their roles and
responsibilities. They should also identify communication channels (e.g., email, video or conference
call details) for use during incidents so that stakeholders know how to share information efficiently.
Once stakeholders identify their roles and responsibilities, it’s essential to get these relationships
formalized in contract agreements. These agreements should include nondisclosure agreements
(NDAs) for all stakeholders so they can share information confidentially (including an enterprise’s
most sensitive information). Organizations trying to share information with external organizations
should consult with their legal departments before initiating coordination efforts. There may be
contracts or other agreements that must be put into place before discussions occur.
Organizations should also consider any existing reporting requirements, such as sharing incident
information with an information sharing and analysis center (ISAC) or reporting incidents to a higher-
level CIRT.
Approaches to information sharing include:
1. Ad hoc
2. Partially automated
3. Security considerations
Cloud security incidents are both business problems and IT problems. Cloud security incidents
may cause a range of negative business impacts, such as financial loss (e.g., service unavailability,
loss of compliance certifications resulting in an inability to do business, incident response costs),
reputational impact (loss of customer trust), trade secret disclosures, intellectual property theft,
sensitive data breaches, or other issues.
Business impact information is only useful for reporting to organizations that are interested in
ensuring the mission of the affected enterprise. In many cases, IR teams should avoid sharing
business impact information with outside organizations unless there is a clear value proposition or
formal reporting requirements. However, in some cases, organizations may be forced to share this
information publicly due to regulatory and legal requirements.
Business impact information describes how the incident affects the organization in terms of mission
impact, financial impact, etc. At least at a summary level, such information is often reported to
higher-level coordinating IR teams to communicate an incident’s damage estimate.
Because CSPs cater to many clients, adversaries often use the same weakness to compromise
multiple CSP customers. Once a CSC/CSP extracts technical details about an attack or emerging
threat, this data can be distributed to enhance defenses against the specific attack.
In today’s digital economy, speed and efficiency are essential. The speed at which cybercriminals
operate can be worrying for those tasked with defending networks from attacks. The industry must
share more security intelligence with industry peers to better protect and adapt to evolving threats.
While enterprises gain value from collecting their internal indicators, they may gain additional value
from analyzing indicators received from partner organizations and sharing their internal indicators for
external analysis and use. If organizations receive external indicator data about an incident they have
not seen, they can use that indicator data to identify the incident as it begins. Similarly, organizations
may use external indicator data to detect an ongoing incident they were not aware of due to a lack of
internal resources to capture the specific indicator data.
Technical indicator data is useful when it allows an organization to identify an actual incident.
However, not all indicator data received from external sources will pertain to the organization
receiving it. External data may occasionally generate false positives within the receiving
organization's network and cause unnecessary resource allocation on nonexistent problems.
Organizations should share as much insight as possible. However, there may be security and liability
reasons dictating why organizations may withhold details of an exploited vulnerability.
A CSP should offer a self-service customizable dashboard for users that notifies them about
incidents so customers are up-to-date. These dashboards are typically used to communicate
incidents that impact a great number of customers. The CSPs should also support configuration
options to customize cloud alerts and create personalized dashboards to analyze relevant incidents,
monitor cloud resource impacts, provide guidance and support, and share details and updates. These
dashboards can be designed as the single source of truth concerning cloud resources and should
give users more visibility into any issues that may affect them.
This is where table-top exercises, a pure simulation of an attack scenario and a security incident
preparedness activity, show their value. Table-top exercises help organizations consider various security
incident scenarios and prepare for potential cyber threats by guiding participants through the
process of responding to a simulated incident scenario. The experience provides hands-on training
for participants that can then highlight flaws in the IR process.
Any organization should be able to perform table-top exercises (as opposed to introducing bugs into
customer environments, which requires sophisticated technical and operational capabilities). Furthermore,
table-top exercises are far less resource-intensive compared to the “real-world” simulations.
Table-top exercises help improve the overall incident response posture and the collective team
preparedness and decision-making process when incidents occur. Exercises begin with the IR plan
and gauge team performance against it. Since most organizations are unprepared for cloud security
incidents, having a well-executed IR plan is critical.
In many ways, this sentiment hits home for organizations concerning cyberattack threats.
Organizations should develop a solid understanding of the incident response process—and its
incident response capabilities—to prepare for any potential incidents.
This paper explored the CIR framework and the preparation required to respond to incidents
effectively. It serves as a go-to guide for a CSC to prepare for and manage cloud incidents through
the entire lifecycle of a disruptive event. It also provides a transparent, common framework for CSPs
and CSCs to share cloud incident response practices.
We presented the CIR framework in four phases (plus a final section covering coordination and
information sharing).
Preparation addresses the strategies and actions required in advance of a cloud incident. An effective
incident response plan includes forming a CIR team (CIRT), strategy planning and preparation,
procedures development, technical preparation, and communication plan creation.
Detection and analysis covers the various signs and possible causes of cloud incidents for early
detection. To determine the root cause, multiple means are discussed. The speed of early incident
notification (and the corresponding resolution timing based on business impacts) is also highlighted
for CSP/CSC consideration.
Containment, eradication, and recovery explains the importance of choosing the right strategy
to stop the attacker from doing further damage to systems while investigations and forensics are
undertaken.
The Postmortem process identifies gaps in personnel, processes, or technology and translates these
into “lessons learned” that must be fed back into the preparation phase. The key objective of this
closing phase is to improve future incident handling. To improve an enterprise’s security capabilities,
it is critical to review the incident/forensic support of the CSP(s) (if applicable), the available
technological tools to support event analysis, the TTPs used by the actor, and to conduct forensic
investigations.
The Coordination and information sharing section describes how the complexity of threats to the
cloud requires stakeholders to coordinate and share security information to mitigate losses.
In conclusion, this framework will help guide CSCs in determining their security requirements and
appropriate incident protection levels. Additionally, CSCs can use this guide to negotiate with CSPs
and/or third parties to ascertain capabilities and shared responsibilities.