
2024 Edition

CLOUD SECURITY:
FIRST PRINCIPLES
AND FUTURE
OPPORTUNITIES

In partnership with:
Foreword
It is estimated that approximately 45% of enterprise workloads are in the public cloud today.1 This is forecast to increase to more than half in three years.1 Although this may seem like a large number, cloud adoption has been slow and steady since the first public cloud services were made available some 20 years ago.

It’s been a journey of increased enterprise cloud adoption for many organizations. Every
security team and security professional also has been on a journey—a cloud security
journey to improve their capabilities, skills, and opportunities to keep up with the
changing business, technology, and threat landscape.

That’s why we are back, for the third year in a row, pairing leaders from the three major cloud providers—Amazon Web Services (AWS), Google Cloud, and Microsoft Azure—with independent technical experts from SANS Institute to give you insights to improve your cloud security capabilities.

This book has chapters on your cloud security journey ranging from architecture to threat detection to investigations. Along the way, the authors cover Secure by Design principles, identity modernization, and how to update security best practices specifically for the cloud.

Additionally, no resource today would be timely without a discussion of AI hype,
challenges, and opportunities. This up-to-the-minute content includes an overview
of the typical generative AI (GenAI) application architecture, as well as the risks,
mitigations, and common security use cases for GenAI applications.

I hope you enjoy the content, and good luck on your cloud security journey!

Frank Kim
Fellow and Curriculum Lead
SANS Institute
February 2023
1 “The race to cloud: Reaching the inflection point to long sought value,” www.accenture.com/us-en/insights/cloud/cloud-outcomes-perspective
Table of Contents

Chapter 1: The Cloud Security Journey: Day One
  Introduction 1
  Prepared, Protected, and Ready to Proceed 6

Chapter 2: Building Security from the Ground up with Secure by Design
  Introduction 7
  Understanding Secure by Design and Secure by Default 7
  Embedding Secure by Design into Your Security Strategy 8
  Planning for the Short- and Long-Term 14
  Facilitating a Culture of Security 15
  Benefits 17
  Key Considerations on the Secure by Design Path 18
  Getting Started 19
  Conclusion 21

Chapter 3: Identity Modernization
  Introduction 22
  Challenges in Identity Management 23
  The Imperative for Identity Modernization 26
  Zero Trust and Conditional Access Controls 30
  Integrating Legacy Systems with Modern Identity Solutions 32
  Conclusion 36

Chapter 4: Evolving Cloud Security with a Modern Approach
  Introduction: Why Bad Things Keep Happening in Cloud 38
  Old Things Done New Ways: Why Core Best Practices Are Still Good, but Likely Need Updates 38
  The Top 10 Things to Do for Sound Cloud Security 39
  Looking Ahead: Adaptation and Better Security in the Cloud 47

Chapter 5: AI Security Challenges, Hype, and Opportunities
  Introduction 48
  Terminology, Concepts, and Typical Architecture 48
  Risk Considerations for AI Applications 50
  Mitigation Strategies for Addressing GenAI Risks 52
  Security Use Cases for GenAI Applications 53
    Software Composition Analysis (SCA) 54
    Static/Dynamic Application Security Testing (SAST/DAST) 54
    Policy as Code Development and Analysis 55
    Automated Abuse Case Testing 55
  Conclusion 56

© 2023 SANS™ Institute. All rights reserved.


Chapter 1

The Cloud Security Journey: Day One

Written by Shaun McCullough, Ashish Rajan, and Megan Roddie-Fonseca


Advisor: Terry Hicks

Introduction
Let’s face it, the first day in any new position is inevitably challenging, stressful, even a
little scary. That’s the case for Katie M., who’s just stepped into an important new role
with Cyrene Life Sciences, a multinational pharmaceutical manufacturer headquartered
in the United States, where she’s now responsible for the security of all the systems,
applications, and data the company maintains in the cloud. As a senior-level security
professional with more than 15 years’ experience in the field—including, most recently,
five years as manager of Cyrene’s central security operations center (SOC)—Katie’s not
a newcomer to cloud computing. She can draw on a broad general understanding of
the cloud delivery model and its security implications, and she has more in-depth
knowledge of one of the “big three” cloud service providers (CSPs). Even so, she knows
she has a lot to learn, especially since the company is planning to increase both the
size and the complexity of its already extensive cloud operations, and she knows she
must hit the ground running on day one. That’s why three SANS Institute cloud security
experts have come together to lay out the most critical steps Katie—or any other
security professional in her position—will need to take on that all-important first day
on the job. These steps are presented in roughly the order she’ll need to take them,
but they’re all essential. And, she’ll need to address them all, at least at a basic level,
starting on day one.

Step 1: Architecture and the Cloud Footprint (Ashish Rajan)


• Katie’s first task is to get her arms around the enterprise’s cloud architecture,
especially the architecture of the business-critical applications the company cares
the most about. This will make it possible for her to develop a comprehensive,
high-level view of the current state of the cloud environment at Cyrene. That
means establishing how well its cloud usage and security posture align with
present and future needs, and especially its compliance with the company’s
regulatory requirements, corporate governance standards, and defined
risk posture.
• As a multinational company in a highly regulated industry, Cyrene faces an
exceptionally complex, wide-ranging, and sometimes even contradictory set of
compliance requirements. But literally every enterprise—large or small, private-
or public-sector, whatever industry vertical it operates in—has no choice but to
ensure that its cloud architecture addresses its own highly specific requirements.



There are four primary areas of architecture/footprint focus for day one:

• Identity and access management (IAM)—Ensuring that the right people have the
right access to the right resources for the right reasons
• Data security—Protecting the sensitive information the enterprise maintains
and manages, including personally identifiable information (PII), personal health
information (PHI), and intellectual property (IP)
• Asset management—Identifying, tracking, and securing all the enterprise’s
digital resources, including the types of compute services and cloud-native
services in use, the software and network infrastructure—and any on-premises
implementations that interact with the cloud—and whether network connectivity
exists between on-premises implementations and the cloud
• Regulatory compliance and corporate governance—Identifying any security
controls that have already been applied to the cloud environment based on
Cyrene’s regulatory, legal, and business requirements

Some of what Katie needs to do first is fairly straightforward. She needs to find out
what CSPs the company is currently working with, which ones host business-critical
applications, how large the overall cloud footprint is. For example, she needs to
know how many cloud accounts there are—and what identities have access to the
CSP. She also must determine whether there’s an associated organizational setup for
governance policy in place and identify any areas across the entire cloud footprint
where responsibility for compliance with industry standards may be shared with the
CSP. Most large enterprises have built their foundational cloud infrastructure mainly
on one of the big three—Amazon Web Services (AWS), Microsoft Azure, or Google
Cloud Platform (GCP)—and Cyrene is no exception. But the company does use other,
smaller providers, especially in locations where it needs local-language services.
And, crucially, the company’s CIO has expressed an interest in moving to a more
comprehensive multicloud environment, in the hope that the approach will better
address the company’s needs for scalability, cost savings, and application-specific
support. As Cyrene’s cloud infrastructure grows more complex and more heterogeneous,
securing that infrastructure—even with basic IAM capabilities like determining who
has rightful access to a given account—will inevitably become exponentially more
challenging. Understanding how identities can access the CSP will give Katie a gateway
into any identity-related threats that impact the cloud footprint where business-critical
applications are hosted.
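To make this concrete, a minimal sketch of the account and identity inventory, assuming an AWS footprint managed with AWS Organizations and boto3 credentials that have read-only permissions, might look like the following; other CSPs expose comparable inventory APIs:

```python
# Minimal inventory sketch: list the accounts in an AWS Organization and the
# IAM users/roles in the current account. Assumes boto3 is installed and the
# caller has organizations:ListAccounts and iam:List* permissions.
import boto3

def list_org_accounts():
    """Return (id, name, status) for every account in the organization."""
    org = boto3.client("organizations")
    accounts = []
    for page in org.get_paginator("list_accounts").paginate():
        for acct in page["Accounts"]:
            accounts.append((acct["Id"], acct["Name"], acct["Status"]))
    return accounts

def list_identities():
    """Return IAM user names and role names in the current account."""
    iam = boto3.client("iam")
    users = [u["UserName"]
             for page in iam.get_paginator("list_users").paginate()
             for u in page["Users"]]
    roles = [r["RoleName"]
             for page in iam.get_paginator("list_roles").paginate()
             for r in page["Roles"]]
    return users, roles

if __name__ == "__main__":
    for acct_id, name, status in list_org_accounts():
        print(f"Account {acct_id} ({name}): {status}")
    users, roles = list_identities()
    print(f"{len(users)} IAM users and {len(roles)} IAM roles in this account")
```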

Another important step for day one is to understand how data classification is being
conducted and, specifically, what form of tagging or labeling is being used. Defining
an enterprise-level data classification structure helps: it is the only way to identify sensitive data, prioritize investigations and security controls according to the data’s associated risks, and ensure that appropriate compliance and governance practices and protections are in place. And as Katie knows all too well, that’s critical because Cyrene is subject to
an extraordinary range of data protection requirements, including the European Union’s
(EU’s) rigorous General Data Protection Regulation (GDPR) and the US Health Insurance
Portability and Accountability Act (HIPAA). A data breach or other security failure that
violates those requirements—or myriad other requirements—could result in serious
financial penalties, legal liability, and reputational damage.
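One way to check how consistently classification tagging is applied is a simple audit script. The sketch below assumes AWS S3 and boto3; the tag key "data-classification" is a hypothetical naming convention, not a standard:

```python
# Minimal audit sketch: flag S3 buckets that lack a data classification tag.
# Assumes s3:ListAllMyBuckets and s3:GetBucketTagging permissions; the tag key
# "data-classification" is a hypothetical naming convention.
import boto3
from botocore.exceptions import ClientError

def untagged_buckets(tag_key="data-classification"):
    s3 = boto3.client("s3")
    missing = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            tags = s3.get_bucket_tagging(Bucket=name)["TagSet"]
        except ClientError:
            tags = []  # NoSuchTagSet: the bucket has no tags at all
        if not any(t["Key"] == tag_key for t in tags):
            missing.append(name)
    return missing

if __name__ == "__main__":
    for name in untagged_buckets():
        print(f"Bucket missing classification tag: {name}")
```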

And, of course, Katie will need to understand which business-critical assets she has to protect and where they’re located: for example, what geographical regions they’re in and what types of computing are involved, everything from virtual machines (VMs) to Kubernetes containers to serverless architectures.

These first steps will give Katie an overall understanding of the cloud-specific threats
and vulnerabilities Cyrene faces or is likely to face in the future, and the company’s
current-state ability to address those threats and vulnerabilities with the skills and
technologies it has in place. In addition, crucially, they will enable her to begin identifying
and prioritizing the gaps in those skills and technologies she’ll need to fill.

Step 2: Threat Detection (Shaun McCullough)


Katie’s next step is to take her knowledge of the assets and vulnerabilities in the cloud
infrastructure and build a threat detection program. Although a vulnerability or posture
management program will evaluate the way resources are deployed, a threat detection program evaluates the activities of—and against—those resources to determine whether any of that activity could be a threat. Her examination of Cyrene’s cloud assets will have
given her a solid understanding of Cyrene’s IT footprint, both cloud and on premises.
That understanding will enable her to answer some fundamental questions that are
central to effective threat detection: Are Cyrene’s applications running on VMs, in
managed containers, or using serverless computing? Was the implementation a lift-
and-shift undertaking or is it cloud-native? Are applications integrated across multiple
clouds or on-premises systems? Every enterprise deploys its resources differently, and
that means there’s no such thing as a one-size-fits-all threat detection program. Katie
will need to look beyond apparent indicators to understand what types of attacks and
other threats Cyrene’s business might face, its enterprise-specific vulnerabilities, and
how well-prepared the security organization is to respond to them.

It will be important for her to evaluate the threat detection tooling of Cyrene’s CSPs,
identify how—and whether—the company is using those tools, and determine what
detection gaps need to be filled immediately. One considerable benefit of CSP-
provided threat detection tooling is that there is no need to manage log collection and
shipping because it’s integrated into the cloud platform itself. Each CSP approaches
the implementation differently, but the core concepts are essentially the same: Identify
which family of detections to turn on and in which account or region they should
operate. However, these detections inevitably have gaps that make it necessary for SOCs
to create and deploy custom detections from logs.
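As a concrete example of turning on a CSP's detection family, the following sketch enables Amazon GuardDuty detectors in a set of regions; the region list is illustrative, and Azure and Google Cloud offer comparable services with their own APIs:

```python
# Minimal sketch: make sure a GuardDuty detector is enabled in each region of
# interest. Assumes guardduty:ListDetectors and guardduty:CreateDetector
# permissions; the region list is illustrative only.
import boto3

REGIONS = ["us-east-1", "eu-west-1"]  # hypothetical footprint

def ensure_guardduty(region):
    gd = boto3.client("guardduty", region_name=region)
    existing = gd.list_detectors().get("DetectorIds", [])
    if existing:
        return existing[0]
    return gd.create_detector(Enable=True)["DetectorId"]

if __name__ == "__main__":
    for region in REGIONS:
        detector_id = ensure_guardduty(region)
        print(f"{region}: GuardDuty detector {detector_id} is enabled")
```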



All this means that Katie needs to build a logging or telemetry program. She’ll look at
the rules or policies already established for telemetry collection and data retention.
She’ll evaluate what’s being collected, identify any gaps, and push to implement
automated log collection throughout the company’s infrastructure. Then she’ll ensure
that a security information and event management (SIEM) application is in place to collect,
query, and build alerts on all that telemetry data to detect activity that should be
investigated.
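A hedged sketch of the log-collection side, assuming AWS CloudTrail as the management API log source and a pre-existing central S3 bucket (the names used here are hypothetical), could look like this:

```python
# Minimal sketch: create and start a multi-region CloudTrail trail that writes
# management events to a central S3 bucket for the SIEM to ingest. Assumes the
# bucket already exists with a suitable bucket policy; names are hypothetical.
import boto3

def enable_management_trail(trail_name="org-management-events",
                            bucket="central-cloudtrail-logs"):
    ct = boto3.client("cloudtrail")
    ct.create_trail(Name=trail_name,
                    S3BucketName=bucket,
                    IsMultiRegionTrail=True,
                    EnableLogFileValidation=True)
    ct.start_logging(Name=trail_name)

if __name__ == "__main__":
    enable_management_trail()
```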

The logs that Katie will decide on first are those that can be collected without interfering
with Cyrene’s product teams:

• Management API logs—These are the most important logs, detailing user and
resource authentication, identifying interactions with the cloud’s management
API, and tracking changes to cloud-managed resources like access privileges. This
means that attacks that engage with the cloud API—for example, creating new accounts or destroying cloud resources—can be detected in the logs (see the query sketch after this list).
• Cloud storage access logs—These logs record create/read/write/delete actions performed on data inside a cloud storage container. They can be used to detect unusual activity against high-risk data stores, especially for data loss detection.
• Network logs—Network traffic data recorded in these logs provides metadata
about the network’s interactions with data (for example, timestamp, destination
and source port and IP address, amount of data exchanged, and network interface
involved). This network flow data is especially useful when security professionals
already know what they’re looking for because it makes it possible to find open
ports, uncover traffic patterns, and identify the parts of the network that have
interacted with a suspicious endpoint. Katie must determine whether the CSP’s
threat detection tools cover the network traffic detections needed.
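Here is the query sketch referenced above: a minimal example of searching management API logs (AWS CloudTrail in this case) for recent IAM user creation. The event name and 24-hour window are illustrative choices rather than a complete detection:

```python
# Minimal query sketch: look for recent CreateUser calls in the management API
# log (AWS CloudTrail). Assumes cloudtrail:LookupEvents permission; the event
# name and 24-hour window are illustrative, not a complete detection.
from datetime import datetime, timedelta
import boto3

def recent_user_creations(hours=24):
    ct = boto3.client("cloudtrail")
    start = datetime.utcnow() - timedelta(hours=hours)
    resp = ct.lookup_events(
        LookupAttributes=[{"AttributeKey": "EventName",
                           "AttributeValue": "CreateUser"}],
        StartTime=start,
    )
    return resp.get("Events", [])

if __name__ == "__main__":
    for event in recent_user_creations():
        print(event["EventTime"], event.get("Username"), event["EventName"])
```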

Next, Katie will start looking at logs that may require interaction with product
teams to collect:

• Host and container logs—CSP hosts also provide detailed metrics that can be forwarded into the cloud’s log management tool. This can be useful in detecting server-side attacks.
• Database logs—These logs can capture queries and database management
activities. Although most database threats can be detected by capturing the logs
of the application in front of the database, there may be focused monitoring of
the most critical databases.
• Orchestration logs—Katie will likely find a combination of VMs and containers
running the applications at Cyrene. Orchestration logs provide insight into how
an entire cluster is operating and can be used to detect attackers’ manipulations.
However, CSPs are increasing their built-in detections of container orchestrators,
so identifying gaps will be necessary. Katie may want to focus detection on
deployment or manipulation outside the company’s established paved paths.



Katie has now established the logs needed for detection, ensured they are collected
in the SIEM, implemented automated deployment of the CSP’s detection services, and
is providing the SOC with new alerts. (There may be too many alerts, making ongoing
tuning necessary.) The bottom line: Katie and her security organization now have the
information they need for effective threat detection.

Step 3: Investigation and Pursuit (Megan Roddie-Fonseca)


Nobody needs to tell a security professional as experienced as Katie that, inevitably,
something will go wrong. No matter how well-designed Cyrene’s security architecture
is or what advanced threat detection technologies and processes are in place, sooner
or later an attacker will make it through the company’s defenses or a malicious
insider will expose sensitive data or an employee will simply make a careless, but very
damaging, mistake. When that happens, investigation and pursuit—figuring out what
went wrong, collecting and analyzing evidence about the incident, trying to identify the
responsible parties, and, if possible, going after them (sometimes in collaboration with
law enforcement)—becomes critical. This means working to understand the nature and
scope of the event, its potential impact on the company, and the attackers’ methods and
motivations. Different circumstances require different responses. An insider attack, for
example, may be handled discreetly, with the employee quietly terminated, their system
access canceled, and an incident report submitted to senior management. By contrast,
a highly sophisticated ransomware attack backed by a nation-state actor will almost
always involve reporting to and collaboration with external parties, including regulators
and law enforcement agencies.

This is where the understanding that Katie has developed of Cyrene’s cloud and security
environment will come into play. Without it, she and her team are likely to waste
precious time and resources chasing down blind alleys. They’ll have a better sense of
what to look for because they’ll know what resources and services the company uses
and what “normal” looks like across its cloud environment. The key areas she’ll need to
consider on day one will be:

• Logging—Katie will need to find ways to leverage logs for investigative purposes,
which will require understanding the content of the logs, as well as how to
effectively search through them, whether with cloud-native tooling or with a
dedicated SIEM platform. These logs should enable Katie to identify a broad range
of concerning activity, like suspicious log-in attempts, abnormal service account
usage, and other indicators of compromise. The ability to take raw logs and extract
meaningful insights from them will allow the security organization to rapidly
triage and respond to security incidents and other events. These logs also can
be leveraged for threat hunting to identify potential vulnerabilities before they
can be exploited, or to catch threat activity before the threat actor can further
their attack.



• Access—When a security event occurs—whatever type it is, however serious its
impact may be—time is always of the essence. Cyrene’s investigators will need to
be able to access relevant logs, cloud resources, and all kinds of organizational
information at a moment’s notice. That means proper access controls must be in
place. For this reason, Katie will have to clearly define and establish the digital
forensics and incident response (DFIR) roles, rights, and permissions (at minimum,
read-only across the entire enterprise) needed for a competent investigation to
be conducted (a provisioning sketch follows this list).
• Technologies and processes—It will be critical to identify the security tools
already in place (or those that will need to be in place) to carry out an effective
investigation. Here’s one example: If the DFIR team will be conducting forensic
analysis in-cloud, Katie will have to ensure that the necessary resources, including
a DFIR workstation image and the permissions required to deploy it, exist. On the
other hand, performing investigations outside of the cloud will require physical
workstations that meet the technical requirements of forensics tasks, such as
processor speed and storage that can handle enormous amounts of cloud data.
From a procedural perspective, she’ll need to develop and communicate an
incident response plan that can be easily followed during times of urgency to
ensure investigations are conducted as quickly and efficiently as possible.
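Relating to the access controls described above, the following is a minimal sketch of provisioning a read-only DFIR investigator role in AWS IAM; the role name and trusted account ID are placeholders, and a production trust policy would be scoped far more tightly:

```python
# Minimal sketch: create a read-only DFIR investigator role and attach the
# AWS-managed ReadOnlyAccess policy. The role name and trusted account ID are
# placeholders; a real trust policy would also require MFA, specific
# principals, an external ID, and so on.
import json
import boto3

TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder
        "Action": "sts:AssumeRole",
    }],
}

def create_dfir_role(role_name="dfir-investigator"):
    iam = boto3.client("iam")
    iam.create_role(RoleName=role_name,
                    AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
                    Description="Read-only access for incident investigations")
    iam.attach_role_policy(RoleName=role_name,
                           PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess")

if __name__ == "__main__":
    create_dfir_role()
```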

Just as in the first two steps, Katie will need to use the knowledge she’s gained from
this assessment to determine what her current-state resources are in terms of people,
processes, and technology. She also must identify and prioritize the gaps in those
resources that need to be addressed most urgently. When hiring for DFIR roles, she’ll
have to look not only for people with incident response (IR) experience, but also for
individuals who have a basic understanding of the cloud computing delivery model.
The cloud presents new concepts and new challenges not faced by incident responders
of days past, and without that basic understanding of the cloud, general IR experience
simply won’t be enough.

Prepared, Protected, and Ready to Proceed


It’s been a grueling first day for Katie—with many more to come—but she and the
security organization are now much better positioned to move forward in improving
Cyrene’s cloud security. The knowledge she’s gained will enable her to address Cyrene’s
cloud security requirements more rapidly, more efficiently, and likely more cost
effectively. She’ll be able to report to her management and to the business on the
current state of the company’s cloud security, making it possible for her to demonstrate
return on investment (ROI) for any new security personnel or technologies she decides
she needs. And, most importantly, the company’s sensitive systems, applications, and
data will be better protected against an ever more dangerous threat environment.



Chapter 2

Building Security from the Ground up


with Secure by Design

Written by Eric Johnson, Bertram Dorn, and Paul Vixie

Introduction
System design often prioritizes performance, functionality, and user experience over
security. This approach yields vulnerabilities that can be exploited in the product
and across the supply chain. Achieving a better outcome requires a significant shift
toward integrating security measures into every stage of development, from inception
through deployment.

As the threat landscape continues to evolve, the concept of Secure by Design (SbD) is gaining importance in the effort to mitigate vulnerabilities early, minimize risks, and recognize security as a core business requirement. SbD aims to reduce the burden of cybersecurity and break the cycle of constantly creating and applying updates by developing products that are foundationally secure.

A total of 26,447 critical vulnerabilities were disclosed in 2023, surpassing the previous year by more than 1,500.1 Insecure design is ranked as the number four critical web application security concern on the Open Web Application Security Project (OWASP) Top 10.2 Supply chain vulnerabilities are ranked fifth on the OWASP Top 10 for Large Language Model (LLM) Applications.3

The Cybersecurity and Infrastructure Security Agency (CISA), National Security Agency
(NSA), Federal Bureau of Investigation (FBI), and international partners including the Five
Eyes (FVEY) intelligence alliance have adopted the SbD mindset and are evangelizing
it to help encourage proactive security practices and avoid creating target-rich
environments for threat actors.

More than 60 technology companies—including AWS, Microsoft, and Google—recently signed CISA’s Secure by Design Pledge as part of a push to put security first when designing products and services.4

This chapter explores what SbD actually means and discusses its benefits, cultural
aspects, key considerations, and action items that can set you on the path to
successfully embedding SbD into your security strategy.

Understanding Secure by Design and Secure by Default


The term Secure by Design is often confused with Secure by Default. These are two
distinct but complementary elements of a holistic security strategy.

• Secure by Default is a user-centric approach that indicates the default settings of a product are secure out-of-the-box and resilient against common exploitation techniques, without the need for additional security configuration.

1 “2023 Threat Landscape Year in Review: If Everything Is Critical, Nothing Is,” January 2024, https://blog.qualys.com/vulnerabilities-threat-research/2023/12/19/2023-threat-landscape-year-in-review-part-one
2 “OWASP Top Ten,” https://owasp.org/www-project-top-ten/
3 “OWASP Top 10 for Large Language Model Applications,” https://owasp.org/www-project-top-10-for-large-language-model-applications/
4 “Secure by Design Pledge,” www.cisa.gov/securebydesign/pledge



• Secure by Design is a developer-centric approach that goes beyond implementing
standard security measures to evaluate and address risks and vulnerabilities
at every stage of the development life cycle—from design to deployment and
maintenance—rather than reacting to them later.

Both ensure that security is inherent. Together, they work to establish a solid foundation
for proactive security, build trust with customers, and increase the level of difficulty for
threat actors seeking to exploit products and systems.

Secure by Design offers more flexibility to help protect resources and withstand threats
that originate outside of architectural components. It allows you to use products with
different options and settings, so the outcome aligns with your risk tolerance level.

With SbD, the security of architectural components that products are built around
cannot be altered without changing their fundamental design or setup. SbD principles
can be applied to components ranging from IT workloads to services, microservices,
libraries, and beyond.

Another way to think of SbD is to consider the topology of a space, such as a house. An
SbD setup should have only closed, finite rooms in the configuration space (house) that
do not allow access to an infinite space (outside of the house) except through well-
defined and carefully controlled ingress and egress points. This absence of open configuration space options facilitates added security. When builders work within such a design, they create IT workloads in a secure environment without having to reason about every configuration option.

When software is in the cloud, SbD helps eliminate access points. Identity and access management (IAM) is your first line of defense, as IAM misconfigurations can lead to insecure configurations and usage elsewhere. An example of an SbD approach in an
IAM system for distinct principals (IAM users, federated users, IAM roles, or applications)
is to rely on testable outcomes that make them atomic. Because IAM is inherently based
on the “default deny” principle that either explicitly allows or implicitly denies access,
SbD helps you lay the foundation of a secure IAM setup for builders and operators
within the cloud environment as part of an overarching, centralized IAM system that is
accompanied by centralized logging. New design elements should automatically inherit
the secure setup; otherwise, they shouldn’t work.
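As a rough illustration of this default-deny idea, using AWS-style policy syntax with hypothetical resource names, a baseline grant for a single principal might look like the following; anything not explicitly allowed remains implicitly denied:

```python
# Illustrative only: because cloud IAM evaluates to deny unless an explicit
# allow matches, a Secure by Design baseline gives each principal one narrow,
# testable allow. The bucket and statement names below are hypothetical.
import json

READ_ONLY_DATA_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowReadOnInventoryBucket",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::cyrene-inventory-data",
            "arn:aws:s3:::cyrene-inventory-data/*",
        ],
    }],
}

if __name__ == "__main__":
    # Everything not explicitly allowed above remains implicitly denied.
    print(json.dumps(READ_ONLY_DATA_POLICY, indent=2))
```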

Embedding Secure by Design into Your Security Strategy


Incorporating SbD into your overall security strategy can help your organization
minimize potential risks, boost productivity, build trust with customers and partners,
and reduce costs over time by developing products and services that require less
patching after delivery. Software development life cycle (SDLC) processes, automation,
defense-in-depth, artificial intelligence (AI), threat modeling, and compliance are key
factors to keep in mind.



Integrating SbD into the SDLC
SbD contrasts with more traditional development approaches that introduce security
measures as additional layers to the end product. Shift left and DevSecOps5 are
related concepts for incorporating security throughout the SDLC that result from an
SbD approach.

Commonly used SDLC models such as waterfall, spiral, and agile don’t address software
security in detail, so secure development practices need to be added to set foundational
security. Additionally, in a cloud environment, infrastructure is also code that should fall
under the purview of the SDLC.

The National Institute of Standards and Technology (NIST) Secure Software Development
Framework (SSDF), also known as SP 800-218, can support efforts to strengthen the
security of your SDLC. The SSDF describes a set of high-level practices based on
established standards, guidance, and secure software development practice documents
from organizations such as SAFECode, BSA, and OWASP. The framework is divided into
four groups (see Figure 1) that are designed to help prepare the organization’s people,
processes, and technology to perform secure software development, protect software
from tampering and unauthorized access, produce well-secured software with minimal
security vulnerabilities in its releases, and respond to residual vulnerabilities. Although
it’s not a checklist—and the degree to which you choose to implement the practices
depends on your organization’s requirements—it can help you adopt well-understood
best practices and ensure team members across all phases of the development pipeline
assume responsibility for security.

Supporting SbD with Automation


Two areas for automation in relation to SbD workloads are important in the effort
to maintain healthy and secure setups. The first is preventive controls that ensure
configurations can be rolled out only in a secure mode that is defined by the design.

Continuous integration and continuous delivery (CI/CD) pipelines that help automate
the software delivery process are substantial contributors to SbD environments, as
they include a comprehensive set of checks to be run—such as firewall settings, OS
configurations, libraries used, security-related reviews, and software components used—
before a target configuration is implemented.
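One possible preventive check in such a pipeline, sketched here under the assumption that an earlier stage exports a Terraform plan as JSON (for example, `terraform show -json plan.out > plan.json`), fails the build when a security group rule would expose SSH to the internet:

```python
# Minimal pipeline check sketch: fail the build if a planned security group
# rule opens SSH (port 22) to 0.0.0.0/0. Assumes an earlier stage ran
# `terraform show -json plan.out > plan.json`; the JSON shape follows the
# AWS provider's aws_security_group schema.
import json
import sys

def open_ssh_findings(plan_path="plan.json"):
    with open(plan_path) as f:
        plan = json.load(f)
    findings = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_security_group":
            continue
        after = (change.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if rule.get("from_port") == 22 and \
                    "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                findings.append(change.get("address"))
    return findings

if __name__ == "__main__":
    bad = open_ssh_findings()
    if bad:
        print("Blocked by design policy (SSH open to the internet):", ", ".join(bad))
        sys.exit(1)  # a non-zero exit code fails the pipeline stage
    print("Security group check passed")
```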

The second area of automation includes detective systems, which can identify
noncompliant components or configurations. Misconfigurations generally shouldn’t
happen within SbD setups, as they are largely prevented through the design and
preventive controls in the implementation. Additionally, it is important to note that a
vulnerability in a system due to either the design or the implementation may not pose
an immediate problem, if the design includes defense-in-depth elements that protect
the overall system despite any individual flaws. Nevertheless, if a detective system finds
something that doesn’t adhere to the design, it’s a signal that the design needs to be
improved and/or preventive controls need to be added, and that the anticipated “closed” space has not, in fact, been closed.

5 “What is DevSecOps?” https://aws.amazon.com/what-is/devsecops/
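As a small example of the detective side described above (assuming AWS Config is already recording resources), a scheduled job might simply surface rules with noncompliant resources as signals that the design or its guardrails need attention:

```python
# Minimal detective-control sketch: report AWS Config rules that currently have
# noncompliant resources, as a signal that the design or its preventive
# controls need revisiting. Assumes AWS Config is recording and the caller has
# config:DescribeComplianceByConfigRule.
import boto3

def noncompliant_rules():
    config = boto3.client("config")
    resp = config.describe_compliance_by_config_rule(
        ComplianceTypes=["NON_COMPLIANT"]
    )
    return [c["ConfigRuleName"] for c in resp.get("ComplianceByConfigRules", [])]

if __name__ == "__main__":
    for rule in noncompliant_rules():
        print(f"Design gap signal: rule {rule} has noncompliant resources")
```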

Recognizing the Importance of Defense-in-Depth
In an SbD approach, it’s important to recognize
that no matter how careful the design or
implementation, systems or controls may fail.
Leveraging diverse security measures, such
as network hardening and security system
integration, along with secure software design,
can help you address threats and eliminate
single points of compromise, so the failure of one
control doesn’t lead to the failure of the overall
protection provided.

Figure 1. Establishing Secure Development Practices

If a builder successfully places a potential target, such as sensitive data or critical workloads, in an SbD structure, that target is protected by a layered defense. A secure design can, and should, be embedded (or nested) into another secure design to form a shield of defensive layers that support each other. In situations involving an unsecured resource, such as a legacy application with undefined software supply chain risks, the resource can be protected with an SbD approach by creating a closed space around it.

If the design needs to be open to enable some business processes, a layered defense
can address threats and either prevent a breach or limit potential damage.

Applying SbD to Artificial Intelligence (AI)


The need for SbD applies to AI like any other software system. Organizations in all industries have started building generative AI (GenAI) applications using large language models (LLMs) and other foundation models (FMs) to enhance customer experiences, transform operations, improve employee productivity, and create new revenue channels. As you explore the advantages of GenAI, it’s important not to let innovation take precedence over security.

82% of business leaders view secure and trustworthy AI as essential for their operations, but only 24% are actively securing GenAI models and embedding security processes in AI development.6

6 “Securing generative AI: What matters now,” May 2024, www.ibm.com/downloads/cas/2L73BYB4?mod=djemCybersecruityPro&tpl=cs

FMs and the applications built around them are often used with highly sensitive business data, such as personal data, compliance data, operational data, and financial information, to optimize the model’s output. Risks can stem from the setup around training data, the origin and nature of training data, prompt design, and the use of techniques that can lead to hallucinations, all of which can impact the usability of results. To protect users and data, security needs to be built
into machine learning (ML) and AI with an SbD approach that considers them to be part
of a larger software system and weaves security into the AI pipeline. A “kill” switch may
be needed at the output of a GenAI system to prevent it from leading in unwanted or
misleading directions. Building finite, closed spaces can help you secure training data
against poisoning and misuse. The finite space for training data should be empty up
front, and individual decisions about introducing datasets for the targeted model should be documented. To further address risks, additional measures should be taken to keep AI systems Secure by Design. These may include placing output in a closed space and, in security-relevant
situations, specifying that it should only leave the space for use if there’s a plausibility,
applicability, and sanity check performed by humans in the loop.

Integrating an AI/ML bill of materials (AI/ML BOM) and cryptography bill of materials
(CBOM) into BOM processes can help you catalog security-relevant information, and
gain visibility into model components and data sources. Additionally, frameworks and
standards such as the NIST AI Risk Management Framework (AI RMF 1.0), the HITRUST AI
Assurance Program, and ISO/IEC 42001 can facilitate the incorporation of trustworthiness
considerations into the design, development, and use of AI systems.

Identifying Threats in the Design Phase with Threat Modeling


Threat modeling is an essential part of an SbD approach that can help you analyze the
security of a product from an adversarial perspective, identify potential threats, and
determine responses to these threats.

Threat modeling in the design phase fosters a culture of secure design and
implementation, which in today’s landscape includes infrastructure, configuration
management, and application code. Threat modeling exercises conducted during
design allow development and security teams to conceptualize and then document
potential threats before code has been written, and can save time and money by
avoiding rework or escalations later in the development process. Threat models should
be treated as living documents that can be integrated into your SDLC processes and
evolve with experience and learnings, as well as the overall product evolution and threat
landscape over time.

During a threat modeling exercise, mitigation activity should focus on both the design
and the available technology.

CVE-2024-3094—a critical supply-chain compromise recently found in the XZ Utils data compression library—can be used as an example. The malicious code leverages testing mechanisms in the build process and attempts to weaken the authentication of secure shell protocol (SSH) sessions via SSHD (the server-side daemon process for SSH) within some operating system environments to allow unauthorized access. The accessibility of operating systems should be considered when threat modeling. The challenges of the library supply chain are included in a standard lift-and-shift cloud migration approach. If the access approach to affected XZ Utils versions 5.6.0 and 5.6.1 isn’t limited to one daemon (in combination with the operating system’s login process), and instead includes your cloud provider’s IAM security layer, the pure vulnerability of the SSHD would be mitigated. The finite space of the IAM setup would create a finite space for the login, which aligns with an SbD approach.

Rethinking threats to the login process itself can lead to either network-based control of
the communication as discussed, or to the login process being embedded in a different
environment, which helps to prevent the threat. The cloud- and IAM-based login design
can also provide you with benefits through the scalability or responsibility shift that
comes with the cloud. Keeping the IAM system secure is a critical task for the cloud
service provider (CSP) and part of its core business functions. The centralized logging
and monitoring capabilities provided by the CSP’s IAM system can help you ensure that
only authorized users have access to sensitive data and resources.

Design-focused exercises should be prioritized and conducted for common threats, starting with physical aspects, and moving up to the network layer, operating system
and setup topics, software topics, database queries, storage usage, authorization and
authentication methodologies, and deep into software and hardware design supply
chains. Rather than using the most flexible design with fine-grained configuration
options, threat modeling can help you define the most secure design. If you need
customization, you can document adjustments and run them through an approval
workflow. Misconfigurations can be mitigated using a combination of policy guardrails
and detective automation.

The Impact of SbD Approaches on Threat Modeling


To stay with the model of spaces, an SbD approach closes the topological space and
creates a finite environment, which has an impact on typical threat-modeling trees.
Imagine a bowl or torus (see Figure 2) that
represents all interaction possibilities (instead
of a tree structure with open ends on all
sides). As all possibilities in a closed space
are known, they can be designed to be safe
by nature. The impact of this approach is
that—because the resources within the space
are safe—the actors inside can navigate freely
within those constraints. The space needs
to be reviewed periodically to determine
whether the implementation of this freedom
based on limited choices meets your business
requirements. It should be noted that open and closed spaces are not to be directly equated with network boundaries. Network boundaries can be helpful to design and defense-in-depth, but zero trust principles mean that network boundaries are never fundamental to a secure design.

Figure 2. Closing Topological Space with an SbD Approach



This allows you to scale out new deployments without reiterating the security
setup, including access control, logging, configuration recording, connectivity, restore
possibilities, and all the other security domains that are required after conducting threat
modeling and compliance analytics.

The SbD nature of the created space (through automation or systems on the borders)
keeps new resources of the same type in the safe state and free from inherited threats.
Resources with unwanted configurations, states, or dependencies cannot be deployed.
However, the design’s assumptions should be challenged as often as your commercial
requirements allow. Each vulnerability scan with findings should be tested for an impact
on design aspects. New business requirements and compliance considerations also may
necessitate changes or add to your design targets.

Guiding Principles
There are three guiding security design principles to consider when applying SbD to
threat modeling:

1. Address threats through design first, before turning to a technical or organizational measure to try to eliminate the threat area. Choose design options that help keep the threat out of your product, so you don’t need to address them later.

2. Monitor the logical borders created by closed spaces to gather baselines. This
helps create enforcement automation through filter and approval technology.
Typical examples of borders include firewalls, web application firewalls (WAFs),
landing zones, system-call jails for preventing unwanted interaction with the
operating system (such as creating users), IAM settings, closed-source libraries,
and software repositories.

3. Design your spaces to allow changes to flow within. You can achieve this by
defining behavior outcome at the design level and having your builders make
decisions along this path in the design space you create.

Leveraging SbD to Simplify Compliance


An SbD approach isn’t a direct path to legal compliance. However, the use of SbD
practices can help you conform to internal technical compliance requirements as well
as regulatory requirements that include technical and organizational measures (TOMs)
addressing data management processes in relation to the General Data Protection
Regulation (GDPR) and detailed guidance from the Payment Card Industry Data Security
Standard (PCI DSS). SbD can also help you adhere to voluntary best practice frameworks,
such as the International Organization for Standardization (ISO) 27000-series
information security standards and the NIST SSDF.

An SbD approach can help you meet requirements across people, process, and
technology. From a people perspective, you can provide mandatory, role-specific training
on secure coding best practices. From a technology perspective, your design can be set to automatically initiate and enforce backups or allow only preconfigured, finite, and
known network connections. From a process perspective, you could build a deployment
pipeline with an attached ticket system that can automatically enforce necessary
processes and documentation.

You can get there by defining your compliance requirements from the domains to
the controls, and then iteratively working backward to meet them. Instead of linking
the controls directly to TOMs, such as architectural components, configurations,
processes, and procedures, the goal is to link them to design principles, which are
then implemented in TOMs outside of the target system. The TOMs are then part of
the surrounding design and unchangeable configurations, rather than part of the
workload itself.

Consider zero trust, for example. If your target system is in an environment that enables
communication only after context-based authentication and authorization verify the
identity of the actor attempting access, you can meet a set of technical compliance
requirements around access and user management and facilitate a finite and closed
space by allowing only approved connections.

Planning for the Short- and Long-Term


SbD environments address vulnerabilities early with a focus on building closed spaces
that include guardrails for workloads. This creates important short- and long-term
effects to consider.

Short-Term Effects to Mitigate


In the short term, builders who aren’t accustomed to keeping security in mind at every
stage of development may feel their agility is being constrained. Developers are often
encouraged to focus on launching new products as quickly as possible to support
competitive advantage. An SbD approach may not allow them to use all the code
libraries and system access they want in the process, which can impact time to market.

On the other hand, supporting builders with a design phase in which the main security
and technical compliance aspects are handled can make them feel safe, because the
design helps prevent insecure configurations. This is typically the situation in cloud
systems, where builders only get access to approved cloud services over approved
authorization and authentication paths, with prepared logging and reporting (landing
zones). The what, who, and where is defined and controlled by preventive automation,
such as a centrally managed IAM system and infrastructure as code (IaC).

Landing zones with built-in policy guardrails may initially add friction to the developer
workflow and decrease agility during a period of adaptation. However, the freedom
provided within well-defined constraints will ultimately pay off in terms of both better
security outcomes and the agility needed to help you achieve business goals.



Some may wonder: Is speed of the essence for the product, or should you invest more in
its security? The answer provided by experience is increasingly clear. Organizations that
prioritize speed of execution over security are paying a heavy price. A properly designed
SbD engineering framework provides more than adequate speed and agility. Not only
will your products—and your customers—be better protected with security incorporated
into every development decision, but your security efforts also will ultimately cost
less overall. In many cases, good security can’t simply be added after the product is
substantially complete.

Another short-term aspect of the targeted closed space of an SbD setup to consider is
how to deal with new technology, testing, and exploration. Testing is critical to verify the
quality and reliability of applications. This requires teams to create test suites and mock
data to maximize code coverage in lower-level environments. Doing so creates a secure
space for development and testing, which is often managed by version control systems
and continuous delivery pipelines.

Long-Term Effects to Consider


In the long term, designs can ultimately create challenges if they are too fixed and
can’t allow for changes in technology. You should, therefore, check design decisions for
unnecessary rigidity to prevent negative effects.

The evolution of cryptography is one example. If a design specifies a specific algorithm or key length, its efficacy will slowly erode. It could either stop providing the security
needed to protect sensitive data due to increases in computing power—as happened
with the Data Encryption Standard (DES), which was withdrawn and superseded by
Advanced Encryption Standard (AES)—or it could cause a device to stop working because
of an expired certificate that it cannot renew.

To make products and services more sustainable, the SbD approach requires a balance
between the usability of the product, fundamental parameters, and malleability of the
design. Designs should take long-term effects into consideration by calling out functions
and their outcome and risk surface but staying out of technical details.

Facilitating a Culture of Security


An SbD approach facilitates cultural transformation that makes security a shared
responsibility for everyone involved in development. It applies to the different functions involved with your IT workloads along the path from identifying requirements, to creating a design that defines the output, to the technical and organizational measures that are then looped back through controls.



The Builder
Builders are the main target of an SbD approach. Although they may initially
have concerns over access and resource limitations—as well as debugging and
implementation challenges if a design does not offer the anticipated levels of technical
readiness—builders also will realize benefits.

The design should provide builders with clear guidance and eliminate risk decisions
that can be made without undergoing a security approval process. Because builders may
not always realize when they are making a risk-based decision during the development
process, the design should prevent them from having to make them in the first place.

Additionally, builders can sometimes address challenges with the borders of the created
space through the design. For example, it may be easier to design an intake process
for code libraries and other elements of the supply chain to get software components
from existing outside repositories rather than create new components that may also
introduce new risks. It’s important to note that this intake process would be subject to
threat modeling as part of your SbD approach.

The Owner
Business process owners define the target behavior and, therefore, the design space. In
many cases, business requirements are found to be broader than expected. Regulatory
considerations, commercial aspects ranging from costs to time to market, and long-term
strategies for asset development should be incorporated into the design so that its
parameters can be modeled as input data for technical integration.

Business and IT workload owners are also key players in the process of working
backward from the most secure design to a desired working point that balances
security needs with commercial aspects. Informed decisions should be made, with clear
documentation and defined risk takers.

The Supervisors
Supervisory functions, such as design review boards or auditors that define and
potentially manage the data exchange between systems, address the integration
feasibility of the design through technical or organizational measures. They implement
preventive controls to set the borders of the finite space defined by the design and can
use detective controls to guide the design into the future. Additionally, the automated
implementation of those controls, through configuration management systems, can
generate evidence and documentation to prove the compliance status of the overall IT
workload. In cloud environments, where all configurations are known, the automation of
evidence generation is particularly important. Because these controls are running on the
principle of SbD to create finite configuration spaces, the desired result is a technically
compliant state.



The Insurance
When accompanied by a layered defense for situations in which a workload needs to
deal with infinite configuration spaces, the SbD approach shifts the risk discussion away
from individual technical decisions to the design layer. However, the insurance aspects
of transferring or sharing risks also should be considered. Managed services, especially
those with higher integration layers, offer the ability to transfer risk and reduce your
operational burden. The shared responsibility between your organization and the
managed service provider (or CSP) encapsulates risk, and auditor-issued reports,
certifications, accreditations, and other third-party attestations provide visibility into the
effectiveness of their security and compliance posture. Partnering with providers can
help you cut off areas of risk and minimize the potential impact of security incidents as
part of an SbD approach.

Benefits
A robust SbD approach establishes a solid foundation that reduces risks and yields
security benefits for your development teams—and your business.

Scalability
Operations inside an SbD setup allow you to quickly scale,
without reiterating security settings. This is particularly beneficial
in environments where the demand cannot be predicted
precisely up front. An SbD approach helps create well-architected
landing zones that are both scalable and secure. Automation
through code pipelines, including automatic code checks against
attempts to open the room, are a key element here. These
pipelines also add to detective controls to help identify attempts
(or the need) to cross the borders of the design. Systems that
execute these pipelines concentrate risk through the need to
be able to execute everything that is required. Therefore, they
should be outside of the target design in their own closed SbD
room. This provides a starting point your organization can use to
efficiently launch and deploy workloads and applications with
confidence in your security and infrastructure environment.

Repeatability
Having prepared spaces also allows you to repeat setups in an
agile way. With an SbD approach, you can build products and
services that are designed to be foundationally secure using
a repeatable mechanism (see Figure 3) that can anchor your
development life cycle.
Figure 3. SbD as the Anchor of Your Development Life Cycle



Agility
While your builders may be concerned about the access and resource limitations
associated with an SbD approach, agility inside a closed space can be higher in the
long term. When the design of an environment is secure, builders inside the SbD
configuration do not need to rethink the security setup and can concentrate on
their areas of expertise. By weaving security into your development practices, your
organization can become more agile, resilient, and responsive to threats.

Sustainability
A solid SbD approach includes built-in feedback loops through detective controls that
facilitate sustainability by enabling you to analyze data and leverage insights to enhance
the security of your products, services, or processes. If the design considers future
developments in the technology—such as cryptography changes, for example—following
them should be possible by design. This leads to products and services with a longer
lifetime, with potentially fewer changes and iterations, and a stable interaction surface.

In a cloud environment, log data can be used to create detective controls. Detection
of anomalies, and failed attempts to configure things outside the closed space should
lead to documentation (through tickets) and can drive the creation of new controls that
can further advance detection. This creates a flywheel effect that can help you tighten
security and cut off open threats. The closed space remains closed while taking care of
the dynamics on its borders.

Manageability
Manageability functions such as logging, reporting, and gathering data for compliance
purposes can generally be built into the design and don’t need to be rethought.
Included preventive controls will automatically generate the data needed to
keep the IT workload under control. A predesigned operating setup for compute
instances, for example, can include backup and restore, logging, access management,
patch management, inventory management, and telemetry data functions that
are automatically rolled out. Nowadays, you can orchestrate these things through
automated systems and document them with detective controls.

Key Considerations on the Secure by Design Path


Security is not like a Boolean parameter that’s either true or false. It is closer to a
quantum state, since known security posture is constantly evolving. Design changes and
implementation efforts should be key considerations in your SbD approach.

Design Changes
Threats to new and existing technologies are constantly evolving. SbD preventive
controls, such as firewalls and runtime security agents, can help security teams respond
quickly to an intrusion. Consider a scenario in which your security team has discovered a threat actor compromising the environment through a vulnerable library. Following
an agile workflow, the security team quickly writes a firewall policy and deploys the
rule through a CI/CD pipeline to thwart the intrusion. From there, additional root cause
analysis can identify permanent design improvements to help prevent future incidents.
Security teams performing continuous, incremental design enhancements will be in a
better position to respond quickly to evolving threats.

Implementation Efforts
The path to an SbD approach includes an additional step between requirements and
practical implementation. You will need time to qualify design-level mitigations, and
traditional conversations and documentation may be required to establish the “why”
and “how” behind your approach.

In addition to setting aside time for the design phase, focus on standardizing
approaches that can be reused by others. You also should choose your tech stacks
carefully, keeping an eye on complexity and related security overhead. Consider
leveraging cloud services to address problems, instead of building everything
on your own.

The tool landscape also should be approached through an SbD lens. Free selection of tools, such as programming languages and methodologies, in software development can create more dependencies (and open spaces) than the risk appetite for the deployment allows. Time to market and implementation considerations might outweigh security concerns related to the selected language or programming framework and its dependencies; this is exactly the tradeoff you should weigh with care.

Using services with a higher integration level, such as cloud services, can reduce
your implementation efforts. Critical capabilities, such as IAM and connectivity, are
already designed and have security built in, which helps provide you with a closed risk
environment by design.

Getting Started
Taking a new approach to security can be daunting, especially for organizations used to
focusing on “check the box” compliance exercises. Five key action items that can help
you avoid frustration and set you on the path to successfully implementing SbD include:

• Identify your core SbD pillars—Evaluate a matrix of your technical domains against the security domains that are called out by your business processes. Technical domain examples include logging, security domain examples include integrity, and design examples include mandatory checksums and authenticated encryption. Each node in the matrix will require a decision to be made regarding how to address the associated security domains.



• Define the scope of your design—Attempt to eliminate risks within the matrix with
a design change. If elimination is possible, consider the potential effects on the
business and on TOMs. Document and communicate the elimination process to
relevant stakeholders.
• Validate technical feasibility—Verify whether the identified elimination process
can be implemented with your technical or organizational capabilities. There may
be commercial aspects that prevent the most secure design. If there are conflicts
and/or a need to move away from the secure design to a less secure setup, verify
that the change and its outcome are understood. Document economic tradeoffs
and undertake a risk-acceptance process with the owner of the workloads and
the builders.
• Stay flexible—Emphasize the need to remain flexible with all your SbD
stakeholders. Known and established architectural components may not meet
actual design options, which presents an opportunity to invent and simplify.
Keep the design thinking flexible, as well. Technical needs might create topics
that require urgent attention. In such a case, the design is temporarily secondary
to security needs. However, the design should take the lead in incorporating
learnings from the emergency into the overall setup and components.
• Review your design continuously—Implement detective and preventive controls
and pay attention to their findings. Expect to discover vulnerabilities, and
continuously feed them into your design processes. Allow your products and
services to react to design changes and avoid one-way-door design decisions that
could have significant and irrevocable consequences.
• Create open lines of communication—Set up processes to gather feedback
from stakeholders and users about security issues with regular meetings to
update your leadership team. Measuring progress is key to quantifying positive
impact. However, much of the impact of an SbD approach amounts to measuring
what didn’t happen or what might otherwise happen if security doesn’t stay
central to your efforts. Although it’s impossible to measure these outcomes with
certainty, you can present metrics that convey a reasonable view of progress.
Sample metrics that can help you account for both direct costs, such as patch
management, and indirect costs, such as brand
reputation, might include:
• How much your organization spends each year fixing security issues after software is deployed
• How much you could or have saved by building security in at the design phase
• Changes in customer satisfaction since developing products that require fewer patches

“Keeping the world safe as the digital economy grows is a big challenge that can only be accomplished through automation. Secure by Design (SbD) is how that automation vitally appears at the system design level. Every digital architect needs to know what SbD is and how to apply it.” —Paul Vixie, AWS Deputy CISO, VP, and Distinguished Engineer
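
Returning to the first two sample metrics above, the minimal sketch below shows how such figures might be estimated; every count and cost in it is a hypothetical placeholder to be replaced with your organization's own data.

# All figures are hypothetical placeholders, not benchmarks.
issues_caught_at_design = 40       # issues found by SbD reviews and pipeline checks this year
issues_found_post_deploy = 12      # issues that still reached production
avg_cost_design_fix = 500          # average cost (USD) to fix an issue at design time
avg_cost_post_deploy_fix = 7_500   # average cost (USD) to fix an issue after deployment

spend_fixing_after_deploy = issues_found_post_deploy * avg_cost_post_deploy_fix
estimated_savings = issues_caught_at_design * (avg_cost_post_deploy_fix - avg_cost_design_fix)

print(f"Annual spend fixing issues after deployment: ${spend_fixing_after_deploy:,}")
print(f"Estimated savings from catching issues at design time: ${estimated_savings:,}")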



Conclusion
Good security is the key to experimenting with new technologies. A proactive, Secure by
Design approach to development that builds security from the ground up allows you to
identify and fix vulnerabilities early, increase cost efficiency, and create more resilient
products. The use of closed, finite spaces for development activities can provide your
builders with a secure environment to work in, and help you withstand threats that
originate outside of architectural components. As you consider your organization’s
development processes, start thinking about the spaces team members are using for
engineering activities. Are they open or closed? Taking a new approach may seem
daunting, but establishing the right foundations and keeping key considerations in mind
along the way can help you successfully embed SbD into your security strategy—and
build trust as you innovate.



Chapter 3

Identity Modernization

Written by Simon Vernon and Angelica Faber

Introduction
Identity is a cornerstone of modern IT security, serving as the backbone for
authentication, authorization, and access control across cloud and hybrid environments.
Whether in a small business or a large enterprise, identity determines who has access
to what, ensuring data integrity and regulatory compliance. As organizations embrace
digital transformation and generative AI (GenAI) technologies, the traditional approaches
to identity management are no longer sufficient. Legacy systems, primarily built on
Active Directory, face significant challenges in adapting to modern security practices,
leading to vulnerabilities and operational complexities.

The concept of zero trust has emerged as a guiding principle, emphasizing that no
user or device should be trusted by default. This shift requires organizations to adopt
modern identity solutions that provide granular and dynamic control over access while
integrating with existing legacy infrastructure. Microsoft Entra ID (formerly Azure AD),
with its suite of advanced features, offers a pathway to identity modernization, enabling
organizations to implement Zero Trust Network Access (ZTNA) and protect both modern
and legacy systems.

This chapter explores the key aspects of identity modernization, focusing on the
challenges posed by legacy systems, the benefits of modern identity solutions, and
the critical role of conditional access and zero trust. We will delve into the unique
challenges of integrating legacy systems, such as outdated protocols, misconfigurations,
and technical debt, and examine how Microsoft Entra Private Access addresses these
issues. By combining advanced security features with flexible integration, Entra ID
provides a robust framework for organizations to secure their identity infrastructure and
reduce the risk of unauthorized access.

Identity management is a cornerstone of business operations, whether you’re an established enterprise, a dynamic startup, or an expanding organization. Central to
this is the triad of authentication, authorization, and accountability—key elements that
empower your users, partners, and clients while safeguarding against malicious entities.

Businesses find themselves in one of three states concerning identity management:

• Locked-in—This state signifies that an organization has an established, yet rigid, identity system that is difficult to update or migrate due to legacy infrastructure. The primary concern here is vulnerability due to the inflexibility of the system. It restricts the organization to minor updates, which may only be applied to limited areas, thus not adequately addressing potential security threats.
Solutions include incremental modernization, in which parts of the legacy system are gradually replaced with more flexible, modular solutions, or a hybrid approach, in which a hybrid identity framework is implemented that allows new technologies to be integrated with the existing legacy systems. In this approach, the organization often partners with security specialists and works with cybersecurity firms to enhance the existing systems’ security layers without a full system overhaul.



• Migratory—Common among organizations, especially those transitioning to cloud
services, this state occurs where on-premises legacy systems meet cloud-based
solutions. The blend of old and new systems can lead to inconsistent security
protocols and potential gaps.
Solutions can include phased migration strategies (carefully planning the transition phases to minimize disruptions and security risks) or unified security protocols (establishing comprehensive security standards that apply to both legacy and cloud components).
• Outsourced—Typically adopted by smaller or newer startups, this state involves
outsourcing all identity services to an external provider. This does create
challenges as outsourcing can lead to reduced oversight of identity management
and potential dependencies on the third-party provider’s security practices.
Vetting and continuous assessment of providers must be in place to ensure the
provider upholds high standards of security and data management. In these
relationships, it’s important to establish clear service level agreements (SLAs) that
define precise expectations and responsibilities concerning identity management
and security incident responses.

In each of these states, it is crucial to continually evaluate and adapt identity management strategies to fit the evolving business needs and threat landscapes.
By doing so, organizations can ensure that their identity management systems not
only protect against current threats but are also scalable and flexible enough to
accommodate future growth and technological advancements.

Challenges in Identity Management


Identity management in the cloud presents unique challenges. The complexity of
deploying cloud solutions, coupled with a limited understanding of technology, often
leads to poor implementation. Many businesses rely on legacy systems, which are
difficult to integrate with modern cloud strategies, hampering their ability to adapt
to new technologies. Legacy systems may use outdated protocols that do not align
with modern security practices, leading to vulnerabilities. Additionally, a lack of
skilled personnel to manage identity solutions further complicates the process. These
challenges create an environment where security risks proliferate, with organizations
struggling to maintain control over their data and access points.

Active Directory (AD) has been a foundational component of many enterprise networks
for more than 25 years, serving as a central repository for authentication, authorization,
and policy enforcement. However, the legacy nature of AD brings several challenges
that can hinder identity modernization and pose significant security risks. In this
section, we explore these challenges in detail, focusing on the outdated nature of AD,
misconfigurations, technical debt, stale objects, and the lack of regular assessments
or monitoring.



Outdated Systems
AD has roots in technology designed in the 1990s, which means its architecture and
underlying protocols were built for an era with different security concerns. As a result,
older operating systems tied to AD often lack the security features found in newer
versions. This can force organizations to weaken certain AD security configurations to
support legacy protocols and authentication mechanisms.

Legacy protocols like NTLM (now officially deprecated) and older versions of Kerberos, while once standard, are now considered security risks. Their continued use is often due to compatibility requirements with legacy systems, creating vulnerabilities that can be
exploited by malicious actors. The risk is further compounded when outdated systems
cannot be upgraded due to software or hardware constraints, leaving organizations with
limited options for mitigation.

Small and medium-sized enterprises (SMEs) particularly struggle to deal with legacy
systems, often relying heavily on on-premises AD. This reliance is exacerbated by a
rapidly dwindling technical workforce capable of managing these systems securely.
The talent shortage means fewer skilled professionals are available to implement and
maintain robust security measures. Financial constraints further limit the ability of
SMEs to invest in necessary upgrades or additional security tools, making them more
vulnerable to cyber threats. As a result, SMEs face significant challenges in maintaining
secure and compliant IT environments, often having to balance operational needs
against security risks in ways that larger organizations might not.

Misconfiguration
Even in environments where systems are generally kept current and patched,
misconfigurations can still pose significant risks. AD is a complex system with many
moving parts and ensuring that every component is configured correctly requires
expertise and constant attention. Misconfigurations can lead to unintended access,
security gaps, or vulnerabilities that malicious actors can exploit.

A common issue is the failure to apply current recommended practices, often due
to a lack of personnel with the necessary knowledge or a lack of time to implement
changes. For example, administrators may leave default settings intact, such as allowing
anonymous LDAP binds or enabling unconstrained delegation, which can lead to
privilege escalation and unauthorized access.

Technical Debt
Technical debt accumulates as systems evolve through mergers, acquisitions, and
other organizational changes. This debt can manifest as complex configurations, such
as multiple domain trusts, which should ideally undergo consolidation. However, for
various reasons, consolidation often does not happen, leaving organizations with
complicated and potentially insecure AD environments.



It can be challenging to manage and secure multiple domain trusts, especially when
the security practices across domains are inconsistent. This inconsistency increases the
attack surface, allowing malicious actors to exploit weaker domains to gain access to
more secure ones. Trust relationships that were once necessary can become liabilities if
not properly managed or re-evaluated over time.

Life Cycling of Objects and Security Implications


Stale objects, particularly dormant user accounts, pose a significant security risk. These
dormant accounts are prime targets for malicious actors seeking unauthorized access
to sensitive data. Organizations that fail to secure or remove these accounts leave a
proverbial “back door” open, enabling attackers to move through the environment
undetected. Stale objects can result from employee turnover, resource mismanagement,
or other operational inefficiencies. If these accounts retain high-level permissions or are
linked to critical systems, they become even more valuable to attackers. Regular audits
and clean-up of stale objects are essential to mitigate this risk, yet many organizations
neglect this critical step, leaving sensitive entities vulnerable.

Implementing robust policies and procedures for the life cycle management of AD
objects is crucial. This includes:

• Regular audits—Organizations should conduct frequent and thorough audits of AD to identify stale objects, dormant accounts, and other potential security
risks. These audits should be performed at least quarterly, if not more frequently,
depending on the size and complexity of the organization.
• Automated tools—Automated tools can be used to monitor and manage the life
cycle of AD objects. These tools can help identify dormant accounts and other
stale objects more efficiently than manual processes.
• Strict onboarding and offboarding processes—It’s key to ensure that there are
well-defined processes for adding and removing user accounts. This includes
immediate deactivation and removal of accounts when employees leave the
organization.
• Regular password updates—Organizations need to enforce policies for regular
password updates and ensure that inactive accounts do not bypass these policies.
• Permission reviews—Regularly reviewing and updating permissions for all
accounts, especially those with high-level access, helps ensure they align with
current roles and responsibilities.

By raising the profile of poor life cycle management through these practices,
organizations can significantly reduce the security risks associated with AD. Effective
management of object life cycles ensures that stale and potentially harmful accounts
are quickly identified and remediated, maintaining a more secure and resilient IT
environment.
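
As one illustration of the “regular audits” and “automated tools” practices described above, the following Python sketch uses the open source ldap3 library to list enabled accounts that have not logged on in 90 days. The server name, service account, and base DN are placeholders, and lastLogonTimestamp is only replicated periodically, so treat the output as a starting point for review rather than an authoritative list.

from datetime import datetime, timedelta, timezone
from ldap3 import Server, Connection, ALL, SUBTREE

# Placeholder connection details; use a least-privilege audit account in your environment.
server = Server("ldaps://dc01.example.com", get_info=ALL)
conn = Connection(server, user="EXAMPLE\\svc-audit", password="use-a-vault", auto_bind=True)

# lastLogonTimestamp is a Windows FILETIME: 100-nanosecond intervals since 1601-01-01 (UTC).
cutoff = datetime.now(timezone.utc) - timedelta(days=90)
filetime_cutoff = int((cutoff - datetime(1601, 1, 1, tzinfo=timezone.utc)).total_seconds() * 10_000_000)

# Enabled user accounts whose last logon is older than the cutoff.
# The bitwise-AND matching rule excludes disabled accounts (userAccountControl bit 2).
# Note: accounts that have never logged on carry no lastLogonTimestamp and will not match this filter.
ldap_filter = (
    "(&(objectCategory=person)(objectClass=user)"
    "(!(userAccountControl:1.2.840.113556.1.4.803:=2))"
    f"(lastLogonTimestamp<={filetime_cutoff}))"
)

conn.search(
    search_base="DC=example,DC=com",
    search_filter=ldap_filter,
    search_scope=SUBTREE,
    attributes=["sAMAccountName", "lastLogonTimestamp"],
)

for entry in conn.entries:
    print(entry.sAMAccountName, entry.lastLogonTimestamp)

Findings from a query like this feed naturally into the offboarding and permission-review processes listed above.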



Lack of Regular Assessments or Monitoring
Monitoring identity systems presents numerous challenges, including data collection,
categorization of data types, identification, and analysis. The complexity increases when
dealing with vendors and implementing hybrid solutions. Interestingly, more than 50%
of the organizations I have consulted with in the past six years have had extremely
limited or no visibility into their authentication and authorization processes. This
highlights a significant oversight in managing identity security effectively.

The final challenge with legacy AD systems is the lack of regular assessments or
monitoring. Continuous monitoring and regular security assessments are critical for
maintaining a secure identity management environment. However, many organizations
fall short in this area, either due to resource constraints or a lack of prioritization.

Without regular assessments, security gaps can go undetected and misconfigurations can persist, increasing the risk of data breaches or unauthorized access. Monitoring
tools and practices are essential for identifying abnormal behavior, such as unusual
login patterns or unauthorized access attempts. A lack of effective monitoring can leave
organizations blind to these warning signs, allowing threats to escalate unnoticed.
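
To illustrate the kind of lightweight monitoring that starts to close this gap, the sketch below flags sign-ins from countries where a user has not been seen before. The CSV layout (timestamp, user, country) is hypothetical; a real deployment would read your identity provider's sign-in logs and feed findings into your ticketing or SIEM workflow.

import csv
from collections import defaultdict

def first_seen_countries(signin_csv: str) -> list:
    """Return (timestamp, user, country) rows where a user signs in from a new country."""
    seen = defaultdict(set)
    findings = []
    with open(signin_csv, newline="") as fh:
        # Hypothetical columns: timestamp, user, country (rows assumed sorted by timestamp).
        for row in csv.DictReader(fh):
            user, country = row["user"], row["country"]
            if seen[user] and country not in seen[user]:
                findings.append((row["timestamp"], user, country))
            seen[user].add(country)
    return findings

if __name__ == "__main__":
    for ts, user, country in first_seen_countries("signins.csv"):
        print(f"{ts}: {user} signed in from a previously unseen country: {country}")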

This detailed examination of the challenges in legacy AD systems underscores the need for identity modernization. Addressing these issues is critical for organizations to
maintain a robust security posture in the face of evolving threats. The following sections
will explore how modern identity solutions can mitigate these risks and provide a
pathway toward a more secure future.

The Imperative for Identity Modernization


The transition from traditional identity systems to modern solutions is no longer a
question of “if” but “when.” As organizations tepidly adapt to using GenAI technologies
and cloud-based applications, the need for robust and secure identity management has
never been more critical. Microsoft Entra ID emerges as a potential solution, addressing
key challenges in identity management while providing enhanced security, compliance,
efficiency, and user experience. This section delves into the specific benefits of Entra ID
and why it’s one of the leading contenders for identity modernization.

Security
Security is the cornerstone of identity modernization, and Microsoft Entra ID delivers a
robust framework for protecting against suspicious activities and unauthorized access.
The secure access management system is designed to counter emerging threats,
especially in the context of GenAI technologies like AI copilots and complex cloud
applications. Entra ID effectively assigns permissions and monitors access, ensuring that
identities are secure and supervised.

A critical aspect of modern identity management is the on-behalf-of (OBO) flow, where
a web API uses a different identity to call another web API. This OAuth-based delegation
requires careful handling to prevent security risks. Entra ID controls permissions and monitors access to ensure these interactions are secure. This level of security is crucial
when dealing with GenAI, as these technologies often require access to sensitive data
and systems.
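
For readers who want to see what the OBO pattern looks like in practice, here is a minimal sketch using the MSAL for Python library, assuming a confidential client (a web API) registered in Entra ID. The tenant, client, and scope values are placeholders, and secrets should come from a vault rather than source code.

import msal

# Placeholder values; use your own tenant, app registration, and downstream API scope.
TENANT_ID = "00000000-0000-0000-0000-000000000000"
CLIENT_ID = "11111111-1111-1111-1111-111111111111"
CLIENT_SECRET = "replace-with-a-secret-from-a-vault"
DOWNSTREAM_SCOPE = ["api://downstream-api/.default"]

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    client_credential=CLIENT_SECRET,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
)

def downstream_token(incoming_access_token: str) -> str:
    """Exchange the token presented to this API for a token to the downstream API (OBO flow)."""
    result = app.acquire_token_on_behalf_of(
        user_assertion=incoming_access_token,   # the caller's access token
        scopes=DOWNSTREAM_SCOPE,
    )
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "OBO token request failed"))
    return result["access_token"]

Because the downstream token carries the caller's delegated permissions, careful permission assignment and monitoring of these exchanges is exactly the control Entra ID needs to enforce.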

Compliance
Compliance is a major concern for organizations operating in regulated industries or
working with government entities. Microsoft Entra ID is designed to meet stringent
compliance standards, such as the Federal Risk and Authorization Management Program
(FedRAMP), the General Data Protection Regulation (GDPR), the Health Insurance
Portability and Accountability Act (HIPAA), and others, ensuring that organizations can
maintain regulatory requirements. This compliance focus is particularly important for
industries like healthcare, finance, and government, where data security and privacy
are paramount.

Entra ID’s compliance features help organizations achieve and maintain certification
with various standards, reducing the risk of noncompliance and the associated
penalties. The solution’s built-in controls and audit capabilities allow organizations
to demonstrate compliance with ease, providing peace of mind in a complex
regulatory landscape.

Microsoft has continuously invested in improving security features within Entra ID to help organizations bolster their security posture without requiring highly specialized
cybersecurity professionals. Key initiatives and features include:

• Conditional access—This allows organizations to set granular access policies based on user location, device state, and risk level, providing a dynamic and
context-aware approach to security.
• Identity protection—Using machine learning and behavioral analytics, identity
protection detects and mitigates identity-based risks by identifying suspicious
activities such as atypical sign-ins or brute force attacks.
• Multifactor authentication (MFA)—MFA adds an additional layer of security,
making it significantly harder for attackers to gain unauthorized access. Entra
ID simplifies MFA deployment, making it accessible even to organizations with
limited IT resources. Entra ID supports phishing-resistant MFA, which provides a
stronger defense against sophisticated phishing attacks by using methods such
as hardware tokens and biometric authentication. Additionally, Entra ID includes
support for passkeys, a modern and secure authentication method that eliminates
the need for traditional passwords and further strengthens security.
• Self-service password reset—This feature empowers users to reset their
passwords without IT intervention, reducing the administrative burden and
improving user experience.
• Integrated compliance management—Entra ID provides tools to manage and
monitor compliance, including audit logs, compliance scorecards, and policy
templates aligned with various regulatory requirements.



Organizations can leverage these features to enhance their security posture without
needing a team of highly specialized cybersecurity professionals. By adopting Entra ID,
organizations can:

• Simplify security management—Entra ID’s intuitive interface and integrated tools reduce the complexity of managing security, making it accessible to
general IT staff.
• Automate compliance reporting—Built-in audit and reporting tools streamline
compliance processes, enabling organizations to quickly generate the necessary
documentation for regulatory audits.
• Implement best practices—Entra ID provides preconfigured policies and
recommendations based on industry best practices, ensuring that even
organizations with limited expertise can implement robust security measures.
• Reduce costs—By leveraging Entra ID’s cloud-based infrastructure, organizations
can reduce the costs associated with maintaining on-premises security solutions
and the need for specialized staff.

Remaining Gaps and the Need for Specialists


Despite the significant advancements and tools provided by Microsoft Entra ID, certain
gaps still require the expertise of cybersecurity specialists, including:

• Advanced threat detection—Although Entra ID offers robust protection against many common threats, advanced persistent threats (APTs) and sophisticated
cyberattacks may still require specialized detection and response capabilities.
• Custom security policies—Organizations with unique or highly specialized security
requirements may need custom policies and configurations beyond what Entra ID
provides out of the box.
• Incident response—In the event of a security breach, skilled cybersecurity
professionals are crucial for effective incident response, forensic analysis, and
remediation.
• Continuous security improvement—Ongoing security assessments and
improvements often require specialized knowledge to stay ahead of emerging
threats and vulnerabilities.

Passwordless Authentication
Passwordless authentication is a significant advancement in identity management,
offering both convenience and enhanced security. Microsoft Entra ID supports
passwordless authentication methods, utilizing FIDO (Fast IDentity Online), which
incorporates the latest web authentication (WebAuthn) standard. This approach
eliminates the need for traditional passwords, reducing the risk of phishing and
credential theft.



Passwordless authentication not only improves security but also enhances the user
experience by providing a more intuitive and straightforward login process. Users can
authenticate using biometric methods or hardware-based security keys, ensuring a high
level of security without compromising ease of use.

However, SMEs often face significant challenges when it comes to adopting new
technologies like passwordless authentication. Limited capability to adapt to these
advancements can slow down adoption rates, primarily due to:

• Resource constraints—SMEs often have limited IT resources and budgets, making it difficult to invest in the latest technologies and infrastructure required for
passwordless authentication.
• Legacy systems—Many SMEs still rely heavily on legacy systems that are not
compatible with modern authentication methods. Integrating passwordless
solutions with these outdated systems can be complex and costly.
• Technical expertise—The lack of specialized cybersecurity professionals
within SMEs can hinder the implementation and management of advanced
authentication methods, leaving them reliant on traditional, less secure password-
based systems.

How Microsoft Entra ID Can Help SMEs


Despite these challenges, Microsoft Entra ID provides solutions that can help SMEs
overcome barriers to adopting passwordless authentication, such as:

• Simplified implementation—Entra ID offers user-friendly interfaces and straightforward integration processes, reducing the complexity of deployment
even for organizations with limited IT expertise.
• Cost-effective solutions—By leveraging cloud-based infrastructure, SMEs can
reduce the costs associated with on-premises solutions and make passwordless
authentication more affordable.
• Comprehensive support—Microsoft provides extensive documentation, training
resources, and support services to assist SMEs in adopting and managing
passwordless authentication technologies.
• Interoperability—Entra ID is designed to work with a wide range of devices and
systems, making it easier for SMEs to integrate passwordless authentication with
their existing infrastructure.

Innovation
Microsoft Entra ID is built on open standards, fostering innovation and facilitating
trustworthy interactions. By adopting open standards, the platform encourages
interoperability and reduces business inefficiencies associated with proprietary identity
management systems. This focus on innovation allows organizations to benefit from the latest advancements in identity management, ensuring they remain at the forefront
of security and technology trends. Entra ID’s commitment to innovation means that
organizations can rely on a solution that evolves with the industry, providing ongoing
value and reducing the need for costly system overhauls.

Governance
Finally, Entra ID includes robust identity governance capabilities, ensuring that proper
access controls are in place. The platform offers life cycle workflows for onboarding and
offboarding users, managing role changes, and ensuring that access is appropriately
managed throughout an employee’s tenure. These governance features are essential for
maintaining a secure and compliant identity management environment.

Identity governance is crucial for organizations that need to demonstrate compliance and maintain security across a diverse workforce. By providing automated workflows
and comprehensive audit trails, Entra ID simplifies governance and reduces the risk of
unauthorized access or stale objects.

The combination of security, compliance, efficiency, user experience, passwordless authentication, flexibility, integration, innovation, and governance makes Microsoft
Entra ID a compelling choice for identity modernization. These features address the
unique challenges faced by organizations in a rapidly changing technological landscape,
providing a pathway to a more secure and efficient future.

Zero Trust and Conditional Access Controls


The concept of zero trust has become a fundamental principle in modern security
frameworks, emphasizing the importance of not trusting any user or device until proven
otherwise. Conditional Access, a core component of Microsoft Entra ID, plays a crucial
role in implementing zero trust by providing flexible, risk-based controls that adapt to
changing circumstances. In this section, we explore how Conditional Access integrates
with Entra ID to enhance security, offering a dynamic approach to managing user access.

Risk-Based Policies
Conditional Access introduces the ability to define risk-based policies that respond
to varying levels of risk. By integrating with Microsoft Entra ID, Conditional Access can
calculate a risk score for each user or sign-in attempt, allowing organizations to enforce
MFA or other security measures in high-risk scenarios. This approach helps mitigate
threats by adapting to the specific context of each access request, ensuring that users
and devices are authenticated based on their risk profile.

Risk-based policies are especially useful in environments with high user turnover
or remote work scenarios, where the risk of unauthorized access may be higher. By analyzing user behavior, sign-in patterns, and other factors, Conditional Access can
identify potentially risky activities and enforce additional security controls to reduce the
risk of a security breach.
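
As an illustrative sketch only, the following Python snippet shows roughly how such a risk-based policy can be created through the Microsoft Graph conditional access endpoint. The policy body is a simplified example (require MFA when sign-in risk is medium or high), token acquisition is omitted, and you should confirm the current schema in the Graph documentation before relying on it.

import requests

GRAPH_URL = "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies"

# Simplified example: require MFA when Entra ID rates the sign-in risk medium or high.
policy = {
    "displayName": "Require MFA for medium/high sign-in risk",
    "state": "enabledForReportingButNotEnforced",   # start in report-only mode, then enforce
    "conditions": {
        "users": {"includeUsers": ["All"]},
        "applications": {"includeApplications": ["All"]},
        "signInRiskLevels": ["medium", "high"],
    },
    "grantControls": {"operator": "OR", "builtInControls": ["mfa"]},
}

def create_policy(access_token: str) -> dict:
    """POST the policy; the token must carry the Policy.ReadWrite.ConditionalAccess permission."""
    resp = requests.post(
        GRAPH_URL,
        json=policy,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

Starting in report-only mode lets you observe the policy's impact on real sign-ins before enforcing the control.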

Flexible Access Rules


Conditional Access provides a high degree of flexibility in defining access rules, allowing
organizations to create tailored policies based on their unique security requirements.
Organizations can specify how and where users can access resources authenticated with
Entra ID. For example, they can require phishing-resistant MFA from specific locations
or devices, activate certain roles only in designated scenarios, or block access from
embargoed nations.

This flexibility enables organizations to implement policies that align with their security
goals while minimizing the impact on user experience. By allowing more stringent
controls for high-risk situations and relaxing them for trusted scenarios, Conditional
Access supports a balanced approach to security and usability.

Continuous Access Evaluation


A key aspect of zero trust is the continuous evaluation of user access throughout a
session. Conditional Access employs continuous access evaluations (CAE) to ensure the
user’s circumstances have not changed during their session. For example, if a user’s
location changes unexpectedly or their account status is updated, Conditional Access
can automatically re-evaluate their access permissions and enforce additional controls
if necessary.

CAE provides an added layer of security by detecting anomalies and unusual behavior
in real time. This feature is critical in environments where users move between
different locations or devices, as it helps prevent unauthorized access due to changing
circumstances.

Identity Protection Risk Controls


Identity protection risk controls are another significant feature of Conditional Access,
allowing the system to respond dynamically to unusual circumstances during a logon
attempt. If the system detects a strange location, a known bad actor’s IP address, or
other suspicious indicators, it can require additional authentication factors or block
access altogether. This adaptive approach helps protect against emerging threats and
reduces the risk of unauthorized access.

Risk controls are particularly valuable in preventing phishing attacks and other types
of social engineering. By adjusting security requirements based on real-time risk
assessments, organizations can better protect their resources without relying solely on
static security measures.



Integration with Modern Technologies
Conditional Access policies are essential for the secure use of modern technologies,
including generative AI and cloud applications. These policies ensure that only
authorized users can access sensitive resources, providing an additional layer of security
in environments where AI technologies require access to critical data. By integrating with
Entra ID, Conditional Access can enforce strict controls over who can access generative AI
technologies, reducing the risk of unauthorized data exposure.

The integration capabilities of Conditional Access also extend to other modern technologies, allowing organizations to implement a unified security strategy across
their entire IT landscape. This broad compatibility helps maintain a consistent security
posture, even as technology evolves.

Baseline Security
Microsoft provides a baseline set of Conditional Access policies to customers, which
has been shown to reduce compromises by up to 80% when turned on. These baseline
policies offer a starting point for organizations to implement basic security measures,
including MFA, device compliance checks, and other common controls. By adopting
these baseline policies, organizations can quickly improve their security posture and
reduce the risk of security incidents.

The baseline policies serve as a foundation for building more complex and tailored
Conditional Access rules. They allow organizations to adopt a zero trust mindset without
extensive customization, providing an immediate boost to security while allowing for
future adjustments as needed.

The flexibility, adaptability, and integration capabilities of Conditional Access make it a critical tool in implementing zero trust and ensuring a robust security posture. By
embracing risk-based policies, continuous access evaluations, adaptive controls, and other
features, organizations can create a dynamic and secure identity management system.
These controls are vital in modernizing identity and protecting against emerging threats,
ensuring a secure foundation for GenAI and other advanced technologies.

Integrating Legacy Systems with Modern Identity Solutions


One of the primary challenges of identity modernization is integrating legacy systems
with modern identity solutions. Many organizations have invested heavily in their existing
infrastructure, making a complete overhaul impractical. However, with the right approach,
it’s possible to integrate legacy systems without compromising security or productivity.

A key strategy is to implement a phased approach to modernization, gradually transforming applications and transitioning users to the new identity platform. This
approach minimizes disruption and allows for thorough testing at each stage. One of the
stages of this modernization will likely involve securing existing legacy applications that for a variety of reasons cannot be updated or modernized. This section explores how
modern identity solutions can integrate with legacy systems, with a focus on Microsoft
Entra Private Access and its role in enabling ZTNA.

Identity as the Foundation of Zero Trust


Zero trust has become a guiding principle for modern security frameworks, emphasizing
that no user or device should be trusted by default. The shift toward identity as the
foundation of network security reflects the need for more granular and dynamic control
over access. This shift requires modern identity solutions that can adapt to changing
circumstances and offer a high degree of flexibility.

Microsoft Entra ID embodies this approach, providing a comprehensive identity management system that integrates with network security. By leveraging Conditional
Access, CAE, and other advanced features, Entra ID allows organizations to implement
zero trust in a way that protects both modern and legacy systems. And, yes, even legacy
applications can take advantage of these powerful features.

The Challenge of Legacy Systems


As we mentioned earlier, despite the advantages of modern identity solutions, many
organizations face significant challenges when integrating them with legacy systems.
Active Directory, a staple of enterprise networks for more than 25 years, often serves as
the backbone for authentication and authorization in legacy environments. However, its
architecture and protocols are not inherently aligned with modern security practices,
creating a barrier to adopting zero trust (see Figure 1).

Figure 1. Challenges of Legacy Systems



Legacy systems can be complex, with outdated protocols, misconfigurations, and
technical debt complicating the integration process. Additionally, many legacy
applications are still in use, and transitioning to modern identity solutions requires
careful planning to avoid disrupting business operations.

Protecting Legacy Systems with Entra ID Private Access


The key to protecting legacy systems while embracing modern identity solutions lies in
ZTNA solutions like Microsoft Entra Private Access. This platform removes the risk and
operational complexity of traditional VPNs while boosting user productivity. It provides
secure access to private apps and resources from anywhere, using identity-centric ZTNA
(see Figure 2).

Figure 2. Microsoft Entra Private Access

Entra Private Access enables organizations to quickly and securely connect remote users
from any device and network to private applications—whether on premises, across
clouds, or anywhere in between. This seamless connection eliminates the excessive
access and lateral threat movement commonly associated with legacy VPNs.

Features of Entra Private Access


Microsoft Entra Private Access offers a range of features that support zero trust and help
protect legacy systems, including:

• Granular app segmentation—Limit threat exposure by defining granular app segments and microsegments at the user, process, or device level. This level of
segmentation allows for more precise control over access to private apps across
hybrid and multicloud environments.



• Adaptive per-app access controls—Control access to private apps based on
Conditional Access policies. These policies adapt to changing risk profiles and
ensure that users meet specific security requirements before gaining access.
• Continuous access evaluations (CAE)—CAE ensures that users’ circumstances have
not changed during a session. This continuous monitoring helps detect unusual
behavior or account changes that could indicate a security threat.
• Optimized local access—Deliver fast and seamless access experiences through
optimized local access. This feature enhances user productivity while maintaining
a high level of security.

Enabling Modern Security for Legacy Applications


One of the significant benefits of Entra Private Access is its ability to protect legacy
applications with modern identity features. By integrating with Conditional Access and
CAE, organizations can enforce security policies on legacy systems that would otherwise
be vulnerable to modern threats. This capability allows organizations to maintain their
existing infrastructure while gaining the security benefits of a modern identity solution.

The ability to enforce per-app access controls based on Conditional Access policies is
a game-changer for legacy applications. It means organizations can apply the same
level of security and oversight to legacy systems as they do to modern cloud-based
applications. This uniform approach to security helps reduce the risk of unauthorized
access and lateral threat movement.

Integrating legacy systems with modern identity solutions like Microsoft Entra Private
Access is a critical step toward achieving a zero trust strategy. By providing secure access
to private apps, adaptive controls, and granular app segmentation, Entra ID bridges the
gap between legacy and modern security. This integration not only protects existing
investments in legacy infrastructure but also paves the way for a more secure and
efficient future.

Organizations that embrace modern identity solutions while addressing the challenges
of legacy systems can significantly improve their security posture. By focusing on
identity as the foundation of network security, implementing ZTNA, and leveraging
Conditional Access, businesses can ensure that they are equipped to navigate the
evolving threat landscape.



Conclusion
Identity modernization is no longer a luxury. It’s a necessity for organizations
navigating the evolving landscape of cloud technologies. As GenAI and other advanced
technologies become more prevalent, the risks associated with outdated identity
systems increase. Platforms like Microsoft Entra ID offer a pathway to modernize identity
while maintaining compatibility with legacy systems.

The traditional approaches to identity management, often rooted in legacy systems like
Active Directory, can no longer keep pace with the demands of modern security. As a
result, organizations must embrace modern identity solutions that prioritize zero trust
principles, adaptive security, and seamless integration with existing infrastructure.

Microsoft Entra ID stands out as a comprehensive platform that addresses these challenges, providing a robust framework for modern identity management. Through
features like Conditional Access and adaptive per-app access controls, Entra ID offers
a flexible and dynamic approach to identity security. The integration of Microsoft
Entra Private Access with legacy systems enables organizations to protect their
existing investments while reducing the risk of unauthorized access and lateral
threat movement.

The emphasis on risk-based policies, flexible access rules, and innovative security
controls ensures that organizations can maintain a high level of security without
compromising user experience. This balance between security and usability is crucial as
organizations strive to enhance productivity and foster a seamless digital environment.

In this chapter, we’ve explored the unique challenges of legacy systems and the
solutions offered by modern identity platforms. By adopting a zero trust mindset
and leveraging Conditional Access, organizations can effectively secure their identity
infrastructure while adapting to the evolving threat landscape.

Technology has the potential to address many of the challenges associated with
legacy authentication systems. Advanced solutions like Microsoft Entra ID offer robust,
modern authentication methods, including passwordless options that significantly
enhance security and user experience. By adopting these technologies, organizations
can reduce the risks associated with outdated protocols and improve compliance with
regulatory standards.

However, although technology provides powerful tools to mitigate the risks of legacy
authentication, the importance of a well-thought-out architecture and careful
implementation cannot be overstated. Simply adopting new technologies without
considering the broader context of the organization’s IT environment can lead to new
vulnerabilities and operational challenges.



Key considerations should include:

• Comprehensive planning—A thorough assessment of existing systems and careful planning is essential to ensure a smooth transition from legacy to modern
authentication methods.
• Integration with legacy systems—Effective integration strategies are needed
to bridge the gap between old and new systems, ensuring compatibility and
minimizing disruptions.
• Ongoing management and monitoring—Continuous monitoring and management
are crucial to maintain the security and efficiency of the authentication system.
This includes regular audits, updates, and the adoption of best practices.
• User training and awareness—Ensuring that users are well-trained and aware
of new authentication methods is critical for successful implementation and
user adoption.

In conclusion, although advanced technologies like Microsoft Entra ID can significantly improve the security and functionality of authentication systems, they must be
implemented as part of a comprehensive strategy that includes careful planning,
integration, and ongoing management. By balancing technological advancements with
robust architectural and implementation practices, organizations can effectively resolve
legacy authentication issues and build a more secure and resilient IT environment.



Chapter 4

Evolving Cloud Security with a Modern Approach

Written by Dave Shackleford and Anton Chuvakin

Introduction: Why Bad Things Keep Happening in Cloud
Over the course of the past several years, we’ve seen some alarming, but ultimately
expected trends from attackers. They’re continuing to target end users and credentials,
as well as known vulnerabilities in web apps and network services, but … there’s
something new. It’s becoming obvious that attackers and more sophisticated campaigns
are focusing on cloud deployments more than ever.

As cloud usage grows and evolves, attackers are paying more attention.

The evidence for this is mounting. In its 2024 Threat Detection Report, Red Canary research found that cloud account compromise was the fourth most prevalent MITRE ATT&CK® technique used by threat actors in 2023 (a 16x increase from 2022).1 The “Five Eyes”
nations (US, UK, Australia, New Zealand, and Canada) also released a joint report
detailing evolving tactics for cloud attacks and compromise seen with the APT29 threat
actor operating out of Russia.2 This group, which goes by many names, is believed to be
a state-sponsored team that is responsible for the recent “Midnight Blizzard” attacks
against Microsoft and others. In short, the attackers have found that cloud deployments
are targets ripe for the picking, and we’re likely to see that trend continue in the next
several years.

To that end, organizations need to ask themselves whether their models of cloud
security are in line with current best practices. There are many advantages to cloud
native security controls and processes, and some of the traditional models of security
controls and practices may not be the most efficient or effective. It’s never a bad time
to take stock of security strategies and consider how we might improve and elevate our
capabilities, particularly with the advantages that leading cloud environments offer.

Old Things Done New Ways: Why Core Best Practices Are Still Good, but
Likely Need Updates
For most organizations born before cloud, a natural path to the cloud includes adapting
tried-and-true controls and processes to cloud environments. For some scenarios, this
actually can work well. For example, a “lift and shift” effort will likely mean you’re moving
legacy infrastructure and applications to a cloud hosting environment, with traditional
databases, operating systems, and workload models. Existing patching and configuration
management models will probably be easily adapted into cloud environments with
minor updates (image management, cloud service offerings and permissions, etc.).
Things will ultimately work well enough, even if not every benefit of the cloud will be
realized in the short term.


¹ “Welcome to the 2024 Threat Detection Report,” https://redcanary.com/threat-detection-report/


² “SVR cyber actors adapt tactics for initial cloud access,” National Cyber Security Centre, www.ncsc.gov.uk/news/svr-cyber-actors-adapt-tactics-for-initial-cloud-access



However, for a vast number of more progressive cloud deployment options, the “old
ways” of doing things probably need more updates. Some of these are obvious: Cloud is
NOT the same as our traditional on-premises environments, and some services should
be treated differently as a result (for example, serverless or even containers). In many
cases, though, it’s just a subtle mindset shift with some new tactics and processes that
reflect the dynamic nature of cloud assets and deployment strategy. Things move faster
in the cloud, without a doubt! Developers make more decisions on their own. Security
architecture, operations, and controls models need to adapt accordingly.

The Top 10 Things to Do for Sound Cloud Security


Deciding which of the varied types of cloud controls and security categories are the
most important is a challenge. But we’ve learned a lot in the past decade or two of
practicing cloud computing. When approaching each of these, it’s critical to ask the
question: “Is the current model we have in place working?”

A second question could be: “Are we doing things as efficiently and effectively as we
could?” And lastly: “Are we missing something important by not doing security the
cloud-centric way?”

Let’s explore some of the most important cloud controls and categories of security and
look at classic models of implementation and management, more cloud native models,
and how they differ.

Governance for Cloud/Asset Discovery and Asset Management


Cloud governance is an area that security teams are internally debating and refining
more than ever. Governance describes how different groups work with each other,
report to each other, and provide data and metrics to each other. There are a number of
considerations for implementing cloud governance.3

In early implementations of cloud governance models, we’ve seen two consistent themes. First, many teams have been operating in silos without any alignment.
The second, and often even more pervasive issue, is the lack of coordination and
cooperation across teams and technical/business areas that needs to be in place
to ensure that everyone is on the same page. Given that cloud architecture and
deployment models are much more converged than ever, it’s imperative to move away
from isolated teams that don’t know what other teams are planning and doing. In fact,
even separating a cloud security team from the rest of security may eventually evolve
to be an anti-pattern. As Mandiant data indicates, in many cases, the compromise
spans the cloud and traditional environments, and on-premises mistakes lead to
cloud incidents.4


3 “Leading through change: 5 steps for executives on the cloud transformation path,” March 2024,
https://cloud.google.com/transform/leading-through-change-5-steps-for-executives-on-the-cloud-transformation-path
4 “Cloud compromises: Lessons learned from Mandiant investigations in 2023,” www.youtube.com/watch?v=Fg13kGsN9ok



To better accommodate modern day-to-day cloud engineering, oversight, and
administration (including change management), organizations should design a
governance model with the following team breakdown:

1. Central DevOps and cloud engineering—This team should manage the DevOps
pipeline (code, builds, validation, and deployment). Security tools, like static
code assessment and dynamic web scanning, should ideally be integrated with
automation. This should be a multidisciplinary team that includes developers
and infrastructure specialists who have adapted their skills to infrastructure as
code (IaC) and more software-defined environments.

2. Workload image management—Ideally, to maintain some degree of separation of duties, a distinct team (perhaps admins for OS builds) can build and maintain
a repository of container and workload images that are then used by developers
within the pipelines intended for cloud deployment.

3. Identity and access management (IAM)—The most mature governance models
include a separate IAM team that manages directory service integration,
federation, and single sign-on (SSO), as well as policy and role definitions within
SaaS, PaaS, and IaaS environments. If this is not a definitive team, there should
be at least a small number of IT operations and/or DevOps engineers focused on
this for a significant amount of time.

4. Information security—Infosec should be aligned across all of these teams to integrate scanning tools and standards for acceptable code (bugs), system/image
vulnerabilities, pipeline monitoring, and secrets management among other
things. Standard definitions for network security parameters and tools should
also be defined and maintained. Even if there is not a separate cloud security
team, cloud security expertise on the team is a must for cloud adoption.

To ensure cohesion across teams, there should exist a Cloud Governance Committee
(or Cloud Center of Excellence)5 that includes representatives from all of these areas,
as well as dotted-line representation from legal, compliance, audit, and technology
leadership. Without a dedicated focus on cloud governance and oversight of what the
organization’s standards and processes are, it’s highly likely that “shadow cloud” will
crop up as different teams start deploying resources and “experimenting” (not always
with risk in mind!) with various cloud services. Executive support for a central committee
and coordination process is crucial. Communicating the appropriate cloud deployment
processes and approach also is important so everyone knows how best to develop and
deploy cloud assets to approved providers.

_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

5 “Building a Cloud Center of Excellence,” https://services.google.com/fh/files/misc/cloud_center_of_excellence.pdf



So how do cloud consumers go about developing a governance model? The first stage—
the “requirements definition” phase—should ideally consist of the following major
tasks and goals:

• Determine the business needs for a cloud service—This should include costs,
savings, and pros and cons of insourcing and outsourcing. When evaluating
the different types of CSP offerings, keep in mind the types of responsibility for
different computing elements, as this will significantly impact the security and
governance of cloud services for most organizations. Any outsourced elements will
be much more difficult to evaluate on a regular basis in most cases, and this also
may impact compliance posture.
• Determine policy and compliance requirements—This should be done with input from legal
and audit teams. For this reason, most large organizations will undoubtedly want
to ensure that legal and audit teams are represented within the governance
model chosen.

Once these factors have been collectively brought together as a formal requirements
definition for cloud services, providers can be evaluated based on these needs.
Contracts, control responsibilities, and auditing can then be hashed out accordingly,
which leads to a larger cloud security discussion.

For day-to-day governance, the distinct teams should include:

• DevOps and cloud engineering
• Workload image creation and management for containers, workloads, etc.
• Information security
• Identity and access management
• (Possibly) distinct application development teams

A cross-functional governance board or committee also should be in place to shape
current and future cloud initiatives for the organization.

Workload Configuration and Patching


More organizations than ever are moving workloads to the cloud, as well as building
more innovative cloud-native application models in IaaS and PaaS cloud environments.
With this growth in cloud workloads and services comes an increasing need to ensure
all workloads and services are secured according to best practices, security monitoring
is enabled for the entire cloud environment, and the cloud control plane and all assets
in the cloud are protected from attacks. In the past, many organizations relied on the
tried-and-true model of building “gold images” and patching them on a regular cadence
to ensure desired configuration state and patch levels.

However, a challenge that may impact security operations and monitoring is the pace
of deployments in DevOps pipelines. With developers making frequent changes and
new workloads starting frequently, it may be harder for security operations to keep track
of inventory and asset state. Also, ephemeral workloads may only run for very short
periods of time, which could hinder traditional approaches to monitoring and visibility.

Newer, more modern models of workload configuration and patching should shift
toward tearing down workloads that don’t meet desired patching and configuration
requirements and replacing them with new workloads based on updated images. Some
of the assessments also need to shift from assessing running workloads to embedding
the assessment step in the build pipeline.
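
To make this shift concrete, the following is a minimal sketch, in Python, of a pipeline gate that runs an image scan during the build stage. It assumes the open source Trivy scanner is available on the build agent and that the CI system passes in the image tag; a real pipeline would add reporting, allow-listing, and exception handling appropriate to the organization.

    import subprocess
    import sys

    def scan_image(image_tag: str) -> int:
        # Fail the gate if the image contains HIGH or CRITICAL findings.
        # Trivy returns the value given to --exit-code when such findings exist.
        result = subprocess.run(
            ["trivy", "image", "--severity", "HIGH,CRITICAL", "--exit-code", "1", image_tag],
            check=False,
        )
        return result.returncode

    if __name__ == "__main__":
        image = sys.argv[1]  # e.g., registry.example.com/app:build-1234
        if scan_image(image) != 0:
            print(f"Image {image} failed the vulnerability gate; rejecting the build.")
            sys.exit(1)
        print(f"Image {image} passed the vulnerability gate.")

Workloads already running from an older image can then be torn down and replaced once a newer image passes this gate, rather than being patched in place.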

Privileged Identity Management


Ah, the old days. In many enterprises, privileged identity management was only available
through policy assignment via Group Policy Objects (GPOs) or other centralized controls,
or through complex, standalone IAM products that were difficult to manage and automate.
Alas, the 1990s are over!

Today, in the cloud, privileged identity management is often much more integrated
into the cloud platform itself and was designed for automation from the start. Cloud
privileged identity management also comprises identity relationship and entitlements
mapping and risk analysis, cloud IAM and configuration through cloud security
posture management (CSPM) solutions, privileged user management, and just-in-time
access management, as well as SSO and federation for identities. Cloud infrastructure
entitlement management (CIEM), a whole new technology category, has emerged. And
here’s the great news: This is usually built in, and organizations just need to manage and
control policies and IAM groupings and federation effectively to significantly enhance
controls over privileged access.
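
To make the just-in-time idea concrete, the sketch below grants a time-bound role binding using an IAM condition. It is written as a thin Python wrapper around the gcloud CLI purely for illustration; the project, principal, and role values are placeholders, and a production workflow would normally run through an approval step and a PAM or CIEM capability rather than an ad hoc script.

    import subprocess
    from datetime import datetime, timedelta, timezone

    def grant_temporary_role(project: str, member: str, role: str, hours: int = 4) -> None:
        # Add an IAM policy binding that expires automatically via an IAM condition.
        expiry = (datetime.now(timezone.utc) + timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
        condition = (
            f"expression=request.time < timestamp('{expiry}'),"
            f"title=jit-access,description=Temporary grant expiring {expiry}"
        )
        subprocess.run(
            ["gcloud", "projects", "add-iam-policy-binding", project,
             "--member", member, "--role", role, "--condition", condition],
            check=True,
        )

    # Example (placeholder values):
    # grant_temporary_role("my-project", "user:[email protected]", "roles/compute.admin", hours=2)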

Network Access Controls and Segmentation


Overly permissive cloud network access and segmentation controls are common areas
of cloud misconfigurations. These access control lists are defined as policies that can
be applied to cloud subscriptions or individual workloads. This often comes down to
unrestricted inbound and outbound TCP/UDP ports (for services such as HTTP/HTTPS)
within cloud-native access control models, which can lead to overly exposed services and workloads.
Many organizations have tried to apply traditional network security models to cloud,
whether through VPN or traditional firewall appliances. Given the dynamic nature of
cloud environments, this may not work well.

To mitigate this issue, security and operations teams should review all security groups
and cloud firewall rule sets to ensure only the network ports, protocols, and addresses
needed are permitted to communicate. Rule sets should never allow access from
anywhere to administrative services running on ports 22 (Secure Shell) or 3389 (Remote
Desktop Protocol).
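
As a simple illustration of this kind of review, the following Python sketch uses the AWS SDK (boto3) to flag security group rules that expose SSH or RDP to the entire internet. It is one example only; equivalent checks can be written against other providers' firewall APIs or, preferably, enforced continuously by a CSPM tool. Region and credential handling are omitted for brevity.

    import boto3

    ADMIN_PORTS = {22, 3389}  # SSH and RDP

    def find_exposed_admin_ports():
        ec2 = boto3.client("ec2")
        findings = []
        for group in ec2.describe_security_groups()["SecurityGroups"]:
            for permission in group.get("IpPermissions", []):
                from_port = permission.get("FromPort")
                to_port = permission.get("ToPort")
                if from_port is None or to_port is None:
                    continue  # e.g., rules that cover all protocols/ports
                open_to_world = any(
                    r.get("CidrIp") == "0.0.0.0/0" for r in permission.get("IpRanges", [])
                )
                exposed = sorted(p for p in ADMIN_PORTS if from_port <= p <= to_port)
                if open_to_world and exposed:
                    findings.append((group["GroupId"], exposed))
        return findings

    for group_id, ports in find_exposed_admin_ports():
        print(f"Security group {group_id} allows {ports} from 0.0.0.0/0")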

In some cases, organizations have connected workloads to the internet accidentally or
without realizing what services are exposed. This exposure allows would-be attackers
to assess these systems for vulnerabilities. Another evolving strategy that is becoming
more prevalent with cloud access is zero trust and microsegmentation at the workload

level, whether for traditional servers, virtual machines, or containers. A key theme to adopt in
network access control for the cloud is parameterization of each workload, meaning
all VMs and containers have their own isolation boundaries, and these can be set and
managed programmatically through policy. This is much easier to accomplish in the
cloud through IaC and cloud provider service configuration than in mixed on-premises
environments. Managing these network access rules—essentially, host firewall rules—
has created a need for new technology as well.

Business Continuity, High Availability, and Resilience


Most organizations are coming around to the idea that traditional models of business
continuity and disaster recovery, which usually rely on dual data center infrastructure,
traditional storage and backups, and more manual processes, are becoming more
unwieldy and expensive to maintain.

In the cloud, building a more resilient and available infrastructure is actually much
simpler. Leading providers offer a wide range of cloud regions and zones, fully
automated high availability (HA) and failover controls within load balancing and other
components, and a highly redundant and resilient cloud storage infrastructure (as
starting points). Most provider SLAs are also equivalent to or better than those of many
hosting providers' data centers.

Secrets and Key Management


For most organizations, managing sensitive secrets (including encryption keys, API
keys, passwords and other credentials, connection strings, and more) has proven
immensely challenging within a diverse technology ecosystem. Most mature enterprises
have managed secrets through a complex combination of technologies that includes
key management systems (KMS), hardware key storage (hardware security modules,
or HSMs), and privileged access management (PAM) platforms that involve check-in
procedures and time-based secrets grants. Cloud Key Management System (Cloud KMS),
Google Cloud’s key management service, offers software-based encryption or hardware-
backed HSMs, easily imported keys from on-premises cryptographic systems, simple
rotation and policy control over keys, and much more.

The following are examples of traditional best practices for secrets management that
should be considered when designing a secrets management strategy:

• No secret should be written to disk in cleartext or transmitted over a network in
cleartext, and any tools you utilize should ensure this is not done.
• All secret life cycle and access events should be recorded in an incorruptible audit
log, which should ideally be sent to a remote, immutable log store.
• Secret distribution should be coordinated by an authoritative delegator such
as a container/service scheduler or working in a close trust relationship with
the scheduler.
• Operator access to secret cleartext should be limited with least privilege access
models to secret data and values.



• Versioning (rotating) a secret should be easier to accomplish than revealing its
cleartext, wherever possible.
• All infrastructure components related to secret management and distribution
should be mutually authenticated using keys and/or certificates.
• Secure system configuration should be implemented for any platform managing
secrets or storing secret data.
• The attachment of a secret to a service or container should be protected by strong
access control mechanisms; role-based access control is preferred.

Fortunately, enterprise PaaS/IaaS clouds have incorporated these capabilities into
native service offerings. Within Google Cloud, all of these considerations are embedded
into cloud services used to build applications (workloads, storage, orchestration, etc.)
and centralized within Secret Manager, a service that manages secret versioning, access
controls, life cycle and rotation, and audit trails simply and centrally.6 In addition, secrets
can be automatically detected within Google Cloud DLP with a catalog of detectors that
identify and flag them for protection within Secret Manager.7
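
As a brief illustration of retrieving a secret at runtime instead of embedding it in code or configuration, the following Python sketch uses the Secret Manager client library. The project and secret names are placeholders, and the calling workload is assumed to have an IAM binding granting it access to that specific secret only.

    from google.cloud import secretmanager

    def get_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
        # Access a single secret version; the access is recorded in Cloud Audit Logs.
        client = secretmanager.SecretManagerServiceClient()
        name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
        response = client.access_secret_version(request={"name": name})
        return response.payload.data.decode("UTF-8")

    # Example (placeholder values):
    # db_password = get_secret("my-project", "db-password")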

Data Security and Protection


For many organizations, protecting sensitive data was solely the realm of encryption,
primarily through encrypted drives on servers or within a larger storage environment
such as a SAN or NAS. Specialized encryption within databases (column-level or other
types) has also been somewhat common in traditional enterprises. Another core
control for data security has been data loss prevention (DLP), which tracks and controls
sensitive data movement in enterprise environments.

What’s great about the transition to cloud is that none of these data protection controls
go out the window at all. In fact, database and workload volume encryption are all
readily available, with workload volume encryption enabled by default. DLP is also more
accessible than ever for many organizations.

Cloud providers have the capability to implement encryption at scale fairly easily, and
accordingly, all data at rest within Google Cloud is encrypted by default. For some
organizations, this automatic encryption will prove sufficient for protecting data at rest
for both workloads shifted into the cloud and new cloud native application stacks.
For others, customer-generated encryption keys will be preferred or required for
compliance, and this is also easily managed through Cloud KMS. Google has added
additional encryption solutions for data processing in Compute Engine and BigQuery as
well. The Cloud External Key Manager service provides segregation for encryption keys
with external key storage in a third-party environment for Google Cloud Platform data,
and encryption policies for access and use still apply here.8
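
For teams that need customer-managed keys, the sketch below shows one way to attach a Cloud KMS key to a Cloud Storage bucket as its default encryption key using the Python client library. The project, bucket, and key names are placeholders, and the bucket's service agent must already be authorized to use the key.

    from google.cloud import storage

    def set_bucket_default_kms_key(bucket_name: str, kms_key_name: str) -> None:
        # kms_key_name format:
        #   projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY
        client = storage.Client()
        bucket = client.get_bucket(bucket_name)
        bucket.default_kms_key_name = kms_key_name
        bucket.patch()  # New objects are now encrypted with this key by default.

    # Example (placeholder values):
    # set_bucket_default_kms_key(
    #     "example-sensitive-data",
    #     "projects/my-project/locations/us/keyRings/data-ring/cryptoKeys/data-key",
    # )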

_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

6 https://cloud.google.com/security/products/secret-manager
7 https://cloud.google.com/security/products/dlp?hl=en
8 https://cloud.google.com/kms/docs/ekm



To track and control sensitive data, many organizations turn to DLP tools and services,
which can be notoriously difficult to implement and maintain. Within a cloud
environment, discovering, classifying, and tracking data requires deep integration with
a cloud provider’s storage infrastructure. Fortunately, Google Cloud DLP provides the
following benefits:

• All data discovery is automated for BigQuery databases. This is important, as large
scale data storage environments like BigQuery are constantly changing and may
be difficult to monitor without automation.
• DLP can be implemented across a variety of storage types, including BigQuery,
Cloud Storage, and Datastore. Providing DLP across a range of different data
storage types in a cloud environment is critical for security professionals needing
comprehensive data protection coverage.
• Google Cloud DLP can help organizations flexibly protect data with policies to
classify, mask, tokenize, and transform data as desired.

Cloud DLP, for many organizations, may prove simpler to implement and maintain
than traditional on-premises DLP options—and at lower cost. Even organizations that
traditionally didn’t implement DLP due to cost and complexity can now affordably
discover, track, and protect data with this advanced capability.
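
To illustrate the kind of classification these services perform, here is a minimal Python sketch that asks the Cloud DLP API to inspect a snippet of text for two common infoTypes. In practice, inspection and discovery jobs would run against BigQuery tables or Cloud Storage buckets rather than inline strings; the project ID and sample values are placeholders.

    from google.cloud import dlp_v2

    def inspect_text(project_id: str, text: str) -> None:
        client = dlp_v2.DlpServiceClient()
        response = client.inspect_content(
            request={
                "parent": f"projects/{project_id}",
                "inspect_config": {
                    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "CREDIT_CARD_NUMBER"}],
                    "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
                },
                "item": {"value": text},
            }
        )
        for finding in response.result.findings:
            print(f"Found {finding.info_type.name} (likelihood: {finding.likelihood.name})")

    # Example (placeholder values):
    # inspect_text("my-project", "Contact [email protected], card 4111 1111 1111 1111")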

Vulnerability Management/Vulnerability Scanning


Vulnerability management has been a notoriously challenging area for security teams
for a long time. Many teams today still rely on vulnerability scanning and agent
reporting—along with configuration and patch management tools and practices—to keep
systems locked down. In theory, these are all sound controls and methods that have
helped security and operations teams keep systems and applications as up to date and
secure as possible (with the understanding that every organization is somewhat unique
and there are many organizational differences).

Vulnerability management for cloud workloads and services is definitely an area that
needs to become more dynamic. Cloud workloads change much more rapidly, and
container and other images tend to now include a vast array of third-party packages
that may be vulnerable. Although some of the traditional vulnerability management
solutions have adapted relatively well to cloud, organizations are wise to look into
cloud-native scanning and repository analysis engines that can aid in providing
more effective continuous monitoring and vulnerability management of all types of
cloud assets.

Logging and Event Management


Traditional logging and event management strategies often fail in the cloud. Many
organizations don’t properly enable logging and monitoring for the right security-
related events, ranging from failed authentication to blocked network traffic to unusual
use of IAM policies and roles. In addition, they often assume that sending server and
network security logs to an on-premises SIEM will be the most efficient way to maintain
security operational continuity. Many teams also don't understand the cloud well enough
to log effectively without breaking the bank.

The reality is that cloud log data and other events are being produced in enormous
quantities, and security teams need to recognize specific indicators quickly, see patterns
across events, and spot activity in the cloud environments where it occurs. Sending logs
and cloud telemetry and observability data to a cloud SIEM makes a lot more sense today.
Machine learning (ML) and AI also can easily augment massive event data processing
technology to build more intelligent detection and alerting tactics. Google Security
Operations is an excellent example of a massive-scale event management engine that
leverages AI and ML capabilities.9
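
As a small illustration of cloud-native log analysis, the Python sketch below uses the Cloud Logging client library to list recent Admin Activity audit entries for IAM policy changes, a common starting point for spotting unusual use of roles and policies. The filter shown is an illustrative assumption; production detections would normally live in the SIEM rather than in an ad hoc script.

    from google.cloud import logging as cloud_logging

    def list_iam_policy_changes(project_id: str) -> None:
        client = cloud_logging.Client(project=project_id)
        log_filter = (
            f'logName="projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity" '
            'AND protoPayload.methodName="SetIamPolicy"'
        )
        for entry in client.list_entries(filter_=log_filter, max_results=20):
            # Entry payload structure varies by log type; timestamp and log name
            # are enough to confirm the query before building a real detection.
            print(entry.timestamp, entry.log_name)

    # Example (placeholder project):
    # list_iam_policy_changes("my-project")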

Incident Response and/or Forensics


In years past, many forensics teams considered laptop or workload image acquisition
to be the most prevalent model of evidence collection, and this grew to incorporate
memory and other ephemeral data collection, as well. In the cloud, this proved difficult
for many reasons in the early years of deployments. Fortunately, the cloud now offers
a wide range of forensic artifacts and solutions that include not only cloud workload
image access, but also cloud logging and behavioral monitoring.

Automation has become another major focus area for cloud computing forensics
and incident response. Consider the following activities as potential opportunities to
implement automation (a brief scripted sketch follows the list):

• Assess the environment—continuously—Use cloud-native tools, such as
Google Security Command Center, to evaluate resources for security conditions,
where possible.
• Locate and tag suspect assets—Any number of network traffic patterns or events
in a cloud environment could indicate suspicious or malicious behavior. One of
the most effective ways to label suspicious assets is by automatically assigning
metadata tags to assets behaving unusually. This enables organizations to track
them and respond more effectively.
• Perform evidence acquisition—Automated processes can be initiated to acquire
evidence, such as memory and disk, along with local processes or indicators of
compromise. Initiate scripts or tools through cloud-compatible methods that
produce logs and audit trails to ensure proper monitoring and chain of custody.
• Remediate—For any remediation efforts—including quarantine of assets or
termination of workloads—automation can help ensure the process is executed
immediately and consistently when suspicious behavior is detected.
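
The sketch below illustrates the tagging and evidence acquisition steps in Python, using AWS APIs as one example; equivalent calls exist in other providers' SDKs. The tag names and the single-volume assumption are illustrative only, and a real workflow should also log its own actions to support chain of custody.

    import boto3

    ec2 = boto3.client("ec2")

    def quarantine_and_snapshot(instance_id: str, volume_id: str, case_id: str) -> str:
        # Tag the suspect instance so responders and downstream automation can find it.
        ec2.create_tags(
            Resources=[instance_id],
            Tags=[
                {"Key": "forensics-status", "Value": "suspect"},
                {"Key": "case-id", "Value": case_id},
            ],
        )
        # Acquire a point-in-time disk image for offline analysis.
        snapshot = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"Forensic snapshot for case {case_id} ({instance_id})",
        )
        return snapshot["SnapshotId"]

    # Example (placeholder identifiers):
    # quarantine_and_snapshot("i-0123456789abcdef0", "vol-0123456789abcdef0", "IR-2024-001")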

_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

9 https://chronicle.security/



Looking Ahead: Adaptation and Better Security in the Cloud
In 2024 and beyond, we see a variety of trends that are likely to grow and continue:

• Major emphasis on data protection and privacy—Especially for massive-scale
data analytics and processing capabilities that exist across numerous
accounts and regions
• Continued focus on identity and access management—Especially centralized
monitoring and control of identities and privileged identity control and oversight
• A trend toward continuous analysis of trust and privileges within the cloud—
Aligning and focusing assets and workloads/applications based on a principle of
least privilege and access minimization

Although newer concepts, ML and AI are also areas where we definitely see significant
growth, both for business use cases and security analytics.

In all, these types of security controls and services are simply a natural evolution
that reflects the nature of PaaS and IaaS software-defined cloud platforms and
infrastructure. Security operations in large, distributed cloud environments will need
to adapt to accommodate more dynamic deployments and changes, new services and
workloads, and a significantly greater reliance on automation. In the next year and
beyond, it’s likely that all these trends will grow and mature significantly.

Given the attack trends we see, however, organizations should be honest in assessing
their current controls and processes for cloud security and risk management. If you’re
still relying on those from your on-premises data center environments, you’re likely out
of touch with cloud security best practices today.



Chapter 5

AI Security Challenges, Hype, and Opportunities

Written by Brandon Evans and Ahmed Abugharbia

Introduction
Nearly two decades ago, the public cloud introduced a powerful tool with countless
opportunities and underestimated risks. Today, that hot new tool is generative AI
(GenAI). Although GenAI enables organizations to solve new problems and reduce the
resources necessary to do so, it also enables attackers to leverage new attack vectors.
This is often because organizations do not understand the intricate details of how GenAI
works. At the same time, the security industry sees promise in GenAI to help improve
their operations and tooling. However, although GenAI is highly promising in many cases,
it is useless or counterproductive in some others.

To understand the security risks associated with hosting or building a GenAI application,
let us first dissect how one is typically built.

Terminology, Concepts, and Typical Architecture


Creating a GenAI application involves assembling different components, including
these major ones:

• Large language model (LLM)—These models serve as the core engine for
generating output. An example is a model that generates text.
• Vector database—A database employed to store specialized knowledge bases
tailored for the model’s use, often referred to as VectorDB.
• Retrieval-augmented generation (RAG)—Combines the retrieval of relevant
documents from a large library (a VectorDB) with the generation capabilities of
LLM models to produce more accurate and contextually informed responses.
• Embedding models—These models facilitate the retrieval of pertinent data from
the VectorDB.
• Agents—These are programs that enable communication between a GenAI
application and the external world.

Designing GenAI applications begins with models capable of autonomously creating new
content, such as text or images, from extensive training data. Notable examples include
LLMs, like GPT-3, GPT-4, and BERT, which are trained on vast datasets to understand
and generate human-like language. These models can serve as foundational elements
for various GenAI applications. They can be hosted either locally or in the cloud using
providers like OpenAI, Hugging Face, AWS, Azure, or Google Cloud.

The next critical component is a specialized dataset tailored to the application’s specific
task. For example, a chatbot designed to address medical queries about patients’
symptoms and conditions necessitates access to patient data stored in a database. RAG
GenAI applications typically use vector databases for this purpose, efficiently storing,
retrieving, and manipulating the necessary vector data. This is similar to how search and
machine learning (ML) applications operate.



Another essential component is the embedding model, which is distinct from the text
generation model. The embedding model’s role is to analyze the user’s input and
retrieve the most relevant data chunks from a VectorDB. Subsequently, the application
combines this retrieved data with the user’s original query and any additional
instructions to form a coherent prompt. This prompt is then sent to the text generation
model, and the final responses generated by the chatbot are based on the output of
this process.
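
To make that flow concrete, here is a schematic Python sketch of the request path just described. The helper functions (embed_query, vector_search, and generate_text) are hypothetical stand-ins for whichever embedding model, vector database, and LLM the application uses; the point is simply how the pieces compose into a single prompt.

    SYSTEM_INSTRUCTIONS = (
        "You are a support assistant. Answer only from the provided context. "
        "If the context does not contain the answer, say so."
    )

    def answer_question(user_query: str) -> str:
        # 1. Embed the user's query (hypothetical call to the embedding model).
        query_vector = embed_query(user_query)

        # 2. Retrieve the most relevant chunks from the vector database (hypothetical).
        context_chunks = vector_search(query_vector, top_k=5)

        # 3. Assemble the prompt: instructions + retrieved data + the user's query.
        prompt = (
            SYSTEM_INSTRUCTIONS
            + "\n\nContext:\n" + "\n---\n".join(context_chunks)
            + "\n\nQuestion: " + user_query
        )

        # 4. Send the assembled prompt to the text generation model (hypothetical).
        return generate_text(prompt)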

Finally, the application may require access to external components, such as a search
engine. This access is facilitated through agents. An “agent” refers to a program or
system capable of interacting with external services and taking actions to achieve
specific goals. Through agents, GenAI applications can access various tools such as
Google search, a terminal, or external APIs, allowing interaction with other parts of
the application or the environment. For example, there are agents that can read logs
and derive conclusions, find answers using Google search, and execute commands
(see Figure 1).

Due to the complexity and myriad components involved, numerous projects have emerged
to assist developers in building RAG GenAI applications and GenAI applications in general.
Many of these projects are libraries that abstract the concepts we are discussing here.
A notable example is LangChain, which simplifies the process of working with vector
databases, agents, and LLMs themselves.1

Figure 1. RAG GenAI High-Level Design

In addition to these libraries, we are also witnessing the emergence of services aimed at
facilitating the publishing and hosting of models. Hugging Face stands out as a prominent
example, providing a hub that enables users to deploy, share, and collaborate on their
models with other users and developers.2

The access requirements of a GenAI application depend on its specific function.


Although a basic chatbot necessitates access to a VectorDB, an embedding model, and
a text generation model, more complex applications may require additional resources
via agents such as external search engines, cloud service provider (CSP) APIs, or
internal systems. For example, consider a GenAI assistant with access to corporate
emails, calendars, meeting transcripts, and recordings. This is where security risks
begin to surface.
_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

1 www.langchain.com/
2 https://huggingface.co/



Risk Considerations for AI Applications
Based on our understanding of the GenAI applications so far, we can break the risks
associated with these applications into three categories:

• Data risks
• LLM risks
• Application risks

Data Risks
As discussed earlier, the specific requirements of a GenAI application dictate the level
and nature of the access it needs. A GenAI application requiring access to corporate
resources can provide substantial advantages, but it also poses certain risks.

Data Sharing with Models


A crucial issue revolves around how data sent to these models for processing is handled,
raising privacy concerns. Because the process entails retrieving relevant data from a
VectorDB to incorporate into prompts, there is a risk of sharing this data with external
model providers like OpenAI. Examples of such data could be corporate emails, calendar
items, meeting transcripts, and patient information, as previously noted. The model
provider may use this data to improve the quality of their responses. However, this
could result in the unintentional disclosure of one organization’s private data to another
organization using the same model.

Data Poisoning
In addition to concerns about exposing data to external parties, unauthorized access to
the data stored in a VectorDB and used by the models also presents risks. Unauthorized
changes to the data could alter the behavior of the models, resulting in malformed
outputs. This is especially critical if a decision-making process relies on the output of
a GenAI application. An example of this is a GenAI application that helps a purchasing
committee compare different RFPs. Unauthorized modification of the data (the RFPs) could
lead the GenAI application to favor a specific candidate.

LLM Risks
LLMs are central to GenAI applications, and decisions about their usage, deployment,
and communication methods entail significant risks. Key vulnerabilities include
tampering with prompt components (leading to instructions poisoning and prompt
injection) and deploying malicious or untrusted third-party models, which can result
in data manipulation, compromised response integrity, sensitive data leaks, and
unintended behavior.



Instructions Poisoning and Prompt Injection
The prompt sent to an LLM typically comprises:

• Instructions
• Data fetched from a VectorDB
• User’s query or request

The combination of these elements creates the prompt, which determines the responses
generated by the LLM.

If any of these components are tampered with, it can affect the expected output. We
have addressed the aspects of data poisoning in previous sections. However, if an
attacker gains access to any of the three mentioned components, they can manipulate
not only the data but also the instructions or prompt sent to the model. This could
lead to data exfiltration, manipulation of results, or misuse of the access granted to the
GenAI application.

For example, consider an assistant with access to corporate emails. An attacker could
send a carefully crafted email intended for consumption by the AI assistant to trigger a
specific action. By tricking the AI assistant, the attacker could compel it to expose data
from other accessible components, such as a user’s calendar, thereby gaining insights
into someone’s schedule.

Malicious Models
The proliferation of third-party models shared on platforms like Hugging Face brings
a significant risk of encountering malicious entities. Deploying these models exposes
organizations to various threats, including data breaches, altered outputs, compromised
systems, reputational harm, and compliance violations. Malicious models may illicitly
access sensitive data, produce misleading results, or harbor vulnerabilities that enable
unauthorized access.

This risk was exemplified in a recent incident reported by “The Hacker News” in March
2024. The discovery of more than 100 malicious AI and ML models on platforms like
Hugging Face underscored the severity of the threat.3

Untrusted Models
Malicious models demonstrate how LLM marketplaces are yet another supply chain.
It is unlikely that organizations are going to extensively vet all the third-party models
they use. This is especially unlikely in the early days of GenAI as organizations are
just starting to understand its fundamental concepts. As a result, they are once again
trusting contributions from strangers on the internet to meet critical business needs.
Even if the developers of these models mean well, they can still inadvertently introduce
risk and bias.

_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

3 “Over 100 Malicious AI/ML Models Found on Hugging Face Platform,” The Hacker News, March 2024, https://thehackernews.com/2024/03/over-100-malicious-aiml-models-found-on.html



Unauthorized Model Sharing
An organization’s custom models might be considered valuable intellectual property. As
such, an attacker might attempt to steal a model, either by downloading it or sharing it
with their own account on the same platform.

Application Risks
In addition to the GenAI-specific security risks already discussed, we still have the
traditional risks associated with running a multicomponent application.

Access Keys
Authentication keys are necessary for communication between the various components
used by GenAI applications. The security of these components depends on safeguarding
access keys from unauthorized exposure. The compromise of VectorDB access keys
carries significant consequences as they are crucial for maintaining the integrity of the
GenAI application’s knowledge base. Breach of these keys could corrupt the knowledge
base, resulting in inaccurate outcomes or compromised data integrity. Such incidents
not only undermine the reliability of the GenAI application but also jeopardize the
trustworthiness of its outputs.

Furthermore, the keys associated with model providers, such as OpenAI, are critical assets
that require protection. Breaching these keys could provide attackers with unauthorized
access to the service, posing a threat to the confidentiality, integrity, and availability of the
GenAI models. They also could use the organization’s premium plans with the service for
malicious purposes while leaving the organization on the hook for the bill.

Additionally, agents rely on keys to access services such as search engines and CSP
APIs. Moreover, using CSP GenAI services, such as Amazon Bedrock, involves the use
of CSP keys across the application. Exposing CSP access keys poses a severe risk,
potentially compromising the entire cloud account. This could lead to unauthorized
access and misuse of cloud resources, which, in turn, could result in data breaches or
financial losses.

Mitigation Strategies for Addressing GenAI Risks


At a high level, mitigation strategies for addressing GenAI risks align closely with those
for other types of applications, with most effective measures beginning at the design
phase. Understanding potential risks helps prevent design flaws that could lead to
security vulnerabilities.

Key strategies include applying the defense-in-depth principle, following secure
development best practices, and maintaining a curated list of supported models and
trusted sources. Additionally, implementing comprehensive logging and continuous
monitoring, along with meticulous access control, ensures robust protection. Managing
access to and between different components of a GenAI application is particularly
crucial. Keys and secrets are at the heart of that.



To maintain security and enhance flexibility in application development, it’s crucial
to avoid hardcoding keys. Instead, consider using key vaults, which are specialized
tools designed to securely store and manage sensitive information such as API keys,
passwords, and encryption keys. Several options are available, including HashiCorp
Vault, AWS Secrets Manager, AWS SSM Parameter Store, and Azure Key Vault.

HashiCorp Vault offers a robust solution for secret management, with features like
dynamic secrets, encryption as a service, and access control policies. Similarly, AWS
Secrets Manager and AWS SSM Parameter Store are AWS-native services tailored for
securely storing and managing secrets, offering integration with other AWS services and
robust security features.

Azure Key Vault is Microsoft’s cloud-based service for securely storing and managing
cryptographic keys, certificates, and secrets. Its features include hardware security
module (HSM) protection and role-based access control (RBAC), ensuring the
confidentiality and integrity of stored keys.

Using key vaults, developers can centralize the management of keys and secrets to
reduce the risk of unauthorized access and improve operational efficiency. Additionally,
keys stored in key vaults can be dynamically retrieved at runtime, allowing for easy
integration into applications without the need for hardcoding.
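
For example, the following Python sketch retrieves an API key from AWS Secrets Manager at runtime rather than hardcoding it. The secret name and region are placeholders, and a comparable pattern works with HashiCorp Vault, Azure Key Vault, or Google Cloud Secret Manager.

    import boto3

    def get_api_key(secret_name: str, region: str = "us-east-1") -> str:
        # The workload's IAM role needs secretsmanager:GetSecretValue on this secret only.
        client = boto3.client("secretsmanager", region_name=region)
        response = client.get_secret_value(SecretId=secret_name)
        return response["SecretString"]

    # Example (placeholder name):
    # llm_api_key = get_api_key("genai/model-provider-api-key")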

Applications running in the cloud can access that cloud provider’s AI service without
needing long-lived credentials. For example, an application running in the Amazon
Elastic Kubernetes Service (EKS) can use temporary, automatically rotating credentials to
assume an identity and access management (IAM) role with the necessary permissions
to use Amazon Bedrock. This is much better than using long-lived access key pairs for
an IAM user. Multicloud environments complicate this further as Microsoft Azure cannot
rotate an AWS IAM principal’s credentials without extensive permissions. This can be
resolved with the Workload Identity Federation tool, which is both powerful and often
hard to use. For more information, refer to a previously published webcast from SANS on
this subject.4
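
As a brief sketch of this temporary-credential pattern, the Python snippet below calls Amazon Bedrock using whatever credentials the environment provides (for example, an IAM role attached to an EKS pod) instead of long-lived access keys. The model ID and request body shown are illustrative; each model family expects its own request format, so consult the provider documentation.

    import json
    import boto3

    # boto3 resolves temporary credentials from the environment (e.g., an IAM role
    # assumed through IAM Roles for Service Accounts), so no access keys are stored
    # in the application itself.
    bedrock = boto3.client("bedrock-runtime")

    def ask_model(prompt: str, model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> str:
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        })
        response = bedrock.invoke_model(modelId=model_id, body=body)
        return json.loads(response["body"].read())["content"][0]["text"]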

Security Use Cases for GenAI Applications


One of the core issues with every hype cycle is that proponents of the technology
apply it to every conceivable problem. We can confidently say that both public cloud
computing and GenAI are here to stay. But that does not mean that all its use cases
will stand the test of time. Just like there are workloads that would be better suited on
premises, there are tools that will not be enhanced with GenAI tacked on. This section
will discuss some of the most prominent security product categories and analyze if they
would benefit from GenAI. We will start with the tools we believe will benefit the least
from GenAI and work our way up to what we believe are its most promising applications.

_________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

⁴ “Securely Integrate Multicloud Environments with Workload Identity Federation,” www.youtube.com/watch?v=fYKhCmDUr4M



Software Composition Analysis (SCA)
SCA tools enumerate the third-party software packages used by an application and alert
the user when they have known issues. We believe the SCA tools on the market would
benefit the least from GenAI for the following reasons:

• Timeliness—SCA platforms help developers and security professionals in
vulnerability management detect packages that have known vulnerabilities. If a
new vulnerability is discovered, the SCA tool should report on it immediately. At
this time, LLMs cannot ingest data in real time, both because there is so much
data and because this would allow for model poisoning.
• Lack of perceived benefit—SCA tools are already quite accurate. They can say
with a high degree of confidence that a dependency has an associated CVE. We
cannot think of a compelling reason why GenAI would improve the accuracy of
this process. However, some SCA platforms attempt to analyze how the application
uses these dependencies. This can help determine if the usage of these
vulnerable dependencies actually makes their application vulnerable. It is worth
considering how GenAI may improve this functionality.
• Regular AI could yield similar benefits—AI has been around for a long time. If
these platforms could benefit from AI, they could have implemented it many
years ago—and many of them likely have. The novelty of GenAI and LLMs is that
they allow human beings to easily interact with AI using human language. These
platforms are developed by highly technical engineers, and they do not need to
be extensively customized by the user, so we cannot think of a reason why this
innovation would be helpful.

Static/Dynamic Application Security Testing (SAST/DAST)


SAST tools analyze application code and flag potential mistakes that can impact the
application’s security. These checks are much more complicated than the ones made by
SCA tools, but they are still fairly simple to conceptualize. Most of these checks look for
user input, trace the flow of that input throughout the application, and flag when the
input is not properly encoded or validated.

It is unclear how GenAI can improve this process. One area to consider is false-positive
reduction. SAST has earned a bad reputation for generating a massive number of false
positives that take time to triage. It is conceivable that the user could use an LLM to ask
questions about a finding using natural language. However, because these checks are
fairly rudimentary, we do not think this would be particularly useful.

DAST tools work very differently, interacting with a live application to look for suspicious
responses. However, the same principles apply. Like SAST, DAST performs straightforward
checks that would not obviously benefit from natural language augmentation. For example,
DAST tools will let the user know if a response is missing a security header. This finding is
so self-evident that the only question we can imagine the customer would want to ask
ChatGPT is “Why should I care?” That question should be answered by the DAST tool
itself, and if not, the answer could be found with a simple search just as easily.

Policy as Code Development and Analysis


Here, we are referring to infrastructure as code (IaC), IAM policies defined using formats
like JSON, and anything else that defines security controls using code. These capabilities
are especially useful in automatically provisioning, configuring, and protecting cloud
environments. Unlike the tools previously mentioned, you cannot simply procure an IaC
service and expect it to generate policies for you. These policies are highly specific to
each organization. At the same time, reading and writing code can be daunting, even for
technical security professionals.

These challenges are opportunities for GenAI. Most security professionals would prefer
to define their security requirements using natural language, not brackets and braces.
GenAI can help them translate the former into the latter. Similarly, the user could
provide GenAI with an existing policy and have it explain what it accomplishes in a way
they can understand. At the same time, the user should not take these results at face
value because of the prevalence of GenAI hallucinations.
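
As one illustration of this workflow, the Python sketch below asks an LLM to explain an IAM policy in plain language using the OpenAI client library. The model name is only an example, and, as noted above, the output should be reviewed by a human rather than trusted blindly.

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def explain_policy(policy_document: dict) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # example model name
            messages=[
                {"role": "system",
                 "content": "You are a cloud security reviewer. Explain what this IAM "
                            "policy allows and call out anything overly permissive."},
                {"role": "user", "content": json.dumps(policy_document, indent=2)},
            ],
        )
        return response.choices[0].message.content

    # Example (illustrative policy):
    # print(explain_policy({
    #     "Version": "2012-10-17",
    #     "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
    # }))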

Automated Abuse Case Testing


General purpose testing can only find so much. Traditional security tools cannot find
business logic flaws because they do not understand the organization’s business. This
problem is analogous to tests for quality assurance (QA). There are no tools that can
validate the quality of an application out of the box. Instead, development organizations
rely on QA professionals. As manual QA testing is slow and expensive, many QA
professionals are now expected to develop automated test suites using code. Security
organizations will benefit greatly by following a similar approach for security testing.

We believe GenAI has the potential to revolutionize this area. Automated test cases
are arguably the hardest to write, the most contextual, and the least commonly
implemented controls we have mentioned. In our opinion, even implementing IaC is
orders of magnitude easier than developing automated tests for abuse cases. These
test cases are more or less defined using application code, and very few security
professionals are comfortable developing reliable applications using Java(Script),
Python, and similar languages. As such, security professionals can benefit even more by
working with GenAI to develop them.
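
To show what such a test might look like, here is a small pytest-style sketch of one abuse case: a user from one tenant attempting to read another tenant's invoice. The api_client fixture, credentials, and endpoint path are hypothetical stand-ins for the application under test; the value of GenAI here is helping security teams turn a natural-language abuse case description into code like this.

    import pytest

    @pytest.mark.abuse_case
    def test_user_cannot_read_another_tenants_invoice(api_client):
        # api_client is a hypothetical fixture wrapping the application's HTTP API.
        attacker = api_client.login("[email protected]", "valid-password")

        # Invoice 1001 belongs to tenant B; tenant A should never be able to read it.
        response = attacker.get("/api/v1/invoices/1001")

        assert response.status_code in (403, 404), (
            "Broken object-level authorization: tenant A read tenant B's invoice"
        )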

Like with IaC, this approach has issues. Thankfully, these are the same issues that exist
with all code. Developers frequently fail to account for edge cases in their code. They
also may uncritically copy and paste bad code, and it does not really matter if this
code came from a knowledge base like Stack Overflow or a hallucinating LLM. Business
logic flaws will not be eliminated anytime soon, but GenAI is a worthwhile tool for
tackling them.

Conclusion
As organizations increasingly rely on GenAI applications, securing these systems will
become critically important. The security community must proactively prepare by gaining
a deep understanding of GenAI technologies and anticipating the associated security
risks. By starting early, we can develop robust strategies to protect these applications
and ensure their safe and effective use.

GenAI is not magic. It is not the solution to all security woes. It is a tool. Like any other
tool, it can be useful or detrimental. The security industry must think critically about this
tool’s costs and benefits. There is no doubt that GenAI is a remarkable innovation that
will change the industry in ways that we do not yet fully comprehend. We encourage you
to improve your comprehension by continuing to research this topic and using GenAI
platforms with caution. This is the same advice we would have given you in 2006 if you
were looking to explore the cloud, and it is the same advice we will give you in 2042 for
whatever revolutionary technology is the zeitgeist of that year.



Appendix

Access more free educational
content from SANS at:

SANS Reading Room

Check out upcoming and on-demand webcasts:

SANS Webcasts

Aviata Cloud Solo Flight Challenge

