Accelerating Machine Learning Innovation Through Security
Security features from Amazon SageMaker and the
AWS Cloud can help you go from idea to production faster.
INTRODUCTION
Executive summary
As a managed AWS service, Amazon SageMaker automatically inherits the AWS global infrastructure and its network security features. AWS is
purpose-built for the cloud, with data centers and a network architected to help protect your information, identities, applications, and devices.
The AWS network and infrastructure are monitored 24/7 to help protect the confidentiality, integrity, and availability of your data. In addition, Amazon
SageMaker offers a comprehensive set of security capabilities, so you can run your machine learning workloads in a flexible and secure
environment.
Customers have told us that the following are the key security criteria they consider when evaluating machine learning solutions. Together, AWS
Cloud and Amazon SageMaker security features allow you to meet these criteria readily—so you can put machine learning to work securely in
production applications.
Infrastructure and network security
Machine learning security starts with the core infrastructure, including underlying
compute, storage, and networking. When assessing infrastructure and network security
of machine learning solutions, look for these critical qualifications: 1) the ability to isolate
the network and keep data traffic across the various components of the workflow within
secure private network connections; 2) the ability to control access, and, more specifically,
to block inflow (ingress) and outflow (egress) of data and code from and to the internet;
and 3) a tenancy model that provides isolation between user environments.
Amazon SageMaker launches its resources in a virtual network of its own using Amazon
Virtual Private Cloud (Amazon VPC), a service that provides logically isolated sections
of the AWS Cloud. All data traffic between Amazon SageMaker components flows within
this network and is tightly controlled by security group permissions. You also have the
option to deploy Amazon SageMaker within your own VPC to provide secure access to your
private resources. In addition, Amazon SageMaker supports network isolation, which lets
you disable outbound data traffic from its network to the internet.
This option helps prevent users from engaging in risky behaviors, such as installing
unauthorized software.
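As a rough illustration of the VPC and network isolation options described above, the following Python sketch assembles the network-related fields of a SageMaker CreateTrainingJob request. The helper name isolated_training_params and every ID shown are hypothetical placeholders. The field names VpcConfig, Subnets, SecurityGroupIds, and EnableNetworkIsolation come from the API. The request is deliberately incomplete: the algorithm, role, and data settings are omitted.

```python
def isolated_training_params(job_name, subnet_ids, security_group_ids):
    """Return the network-related fields of a CreateTrainingJob request.

    Hypothetical helper: it only builds a dictionary and calls no AWS API.
    """
    if not subnet_ids or not security_group_ids:
        raise ValueError("a VPC configuration needs at least one subnet and one security group")
    return {
        "TrainingJobName": job_name,
        # Run the training containers inside subnets of your own VPC,
        # governed by your own security groups.
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Prevent the training container from making outbound network calls.
        "EnableNetworkIsolation": True,
    }


# Placeholder IDs for illustration only.
params = isolated_training_params("demo-job", ["subnet-0example"], ["sg-0example"])

# With boto3 installed and credentials configured, these fields would be merged
# with the remaining required arguments and passed to:
#   boto3.client("sagemaker").create_training_job(**full_params)
```

Keeping the network settings in one helper makes it easier to review them separately from the rest of the job definition.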
You can also control Amazon SageMaker’s network traffic using AWS PrivateLink,
a service that provides private connectivity between VPCs, AWS services, and on-premises
applications. Further, Amazon SageMaker instances are deployed on single-tenancy
Amazon EC2 instances to ensure that your machine learning environments are isolated
from other customers. Lastly, Amazon SageMaker lets you programmatically restrict
root access for users, so you can decide when to give your data scientists the
flexibility they need to install and use external libraries.
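The settings mentioned above can be sketched in Python as request parameters for two API calls: CreateNotebookInstance in SageMaker and CreateVpcEndpoint in Amazon EC2 (the call used to set up an AWS PrivateLink interface endpoint). The helper names, the instance type chosen, and all IDs are assumptions made for illustration. The parameter names and the Enabled/Disabled values are those of the respective APIs.

```python
def private_notebook_params(name, role_arn, subnet_id, security_group_ids, allow_root=False):
    """Build CreateNotebookInstance parameters for a VPC-only notebook (hypothetical helper)."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t3.medium",  # example size, chosen arbitrarily
        "RoleArn": role_arn,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
        # Route notebook traffic through the VPC instead of a direct internet path.
        "DirectInternetAccess": "Disabled",
        # Root access is opt-in, so it is granted only when explicitly requested.
        "RootAccess": "Enabled" if allow_root else "Disabled",
    }


def sagemaker_api_endpoint_params(vpc_id, region, subnet_ids, security_group_ids):
    """Build CreateVpcEndpoint parameters for a PrivateLink endpoint to the SageMaker API."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.sagemaker.api",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }


# Placeholder values for illustration only.
notebook = private_notebook_params(
    "team-notebook",
    "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    "subnet-0example",
    ["sg-0example"],
)
endpoint = sagemaker_api_endpoint_params("vpc-0example", "us-east-1", ["subnet-0example"], ["sg-0example"])

# With boto3 and credentials available, these would be passed to:
#   boto3.client("sagemaker").create_notebook_instance(**notebook)
#   boto3.client("ec2").create_vpc_endpoint(**endpoint)
```

Defaulting allow_root to False reflects the idea in the text that root access is granted deliberately rather than by default.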
3M innovates while maintaining focus
Authentication and authorization
One of the fundamental capabilities you need to secure your machine learning environment is a strong mechanism to define, enforce, and audit
who can sign in (called authentication) and what resources and functions they are authorized to access (called authorization).
Amazon SageMaker is governed by AWS Identity and Access Management (IAM), a service that enables you to manage access to AWS services and
resources securely. With IAM, you can implement fine-grained access controls that specify who can perform which actions on
which resources and under what circumstances, at the level of specific features, users, groups, and roles. You can also bring existing user identities
from AWS Directory Service, from an enterprise user directory such as Active Directory (AD) or a Lightweight Directory Access Protocol (LDAP) directory, or from a web
identity provider.
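To make the "who, what, which resources, under what circumstances" idea concrete, here is a minimal Python sketch that builds an IAM policy document as a dictionary. The function name, the job-name prefix, and the account ID are hypothetical. The policy structure, the sagemaker: action names, and the sagemaker:VpcSubnets condition key are standard IAM elements. Treat it as one possible policy, not a recommended baseline.

```python
import json


def training_policy(region, account_id, job_prefix):
    """Return an IAM policy document scoped to one team's training jobs (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # What: create and inspect training jobs.
                # Which resources: only jobs whose names start with the team's prefix.
                "Sid": "AllowTeamTrainingJobs",
                "Effect": "Allow",
                "Action": ["sagemaker:CreateTrainingJob", "sagemaker:DescribeTrainingJob"],
                "Resource": f"arn:aws:sagemaker:{region}:{account_id}:training-job/{job_prefix}-*",
            },
            {
                # Under what circumstances: deny any training job request
                # that does not specify VPC subnets.
                "Sid": "DenyTrainingOutsideVpc",
                "Effect": "Deny",
                "Action": "sagemaker:CreateTrainingJob",
                "Resource": "*",
                "Condition": {"Null": {"sagemaker:VpcSubnets": "true"}},
            },
        ],
    }


# Placeholder account ID and prefix for illustration only.
policy = training_policy("us-east-1", "111122223333", "fraud-team")
policy_json = json.dumps(policy, indent=2)
```

The "who" is determined by the user, group, or role that the policy is attached to, which is why it does not appear inside the document itself.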
1. Multi-factor authentication (MFA), which prompts users for their user name and password (the first factor), and an authentication code from their AWS MFA device (the second factor)
2. Tag-based access control to categorize resources by purpose, owner, environment, and other criteria, making it easier to manage, search, and filter resources
3. Detective controls that identify potential security threats or incidents using user behavior in Amazon SageMaker
4. Preventive controls that can stop a potentially harmful action before it takes place
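The MFA and tag-based controls listed above can be expressed as IAM policy statements. The Python sketch below is an assumption-laden example: the function name, the tag key and value, and the choice of actions are hypothetical, while aws:ResourceTag and aws:MultiFactorAuthPresent are global IAM condition keys. The Deny statement also serves as a simple example of a preventive control.

```python
def tag_and_mfa_statements(tag_key, tag_value):
    """Return two illustrative IAM statements: tag-based access and an MFA requirement."""
    return [
        {
            # Tag-based access control: users may operate only notebook
            # instances that carry the expected tag.
            "Sid": "AllowTaggedNotebooks",
            "Effect": "Allow",
            "Action": [
                "sagemaker:StartNotebookInstance",
                "sagemaker:StopNotebookInstance",
                "sagemaker:CreatePresignedNotebookInstanceUrl",
            ],
            "Resource": "*",
            "Condition": {"StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}},
        },
        {
            # Preventive control: block destructive actions unless the
            # caller signed in with MFA.
            "Sid": "DenyDeleteWithoutMfa",
            "Effect": "Deny",
            "Action": ["sagemaker:DeleteNotebookInstance", "sagemaker:DeleteEndpoint"],
            "Resource": "*",
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        },
    ]


# Hypothetical tag used for illustration only.
statements = tag_and_mfa_statements("Project", "churn-model")
```

BoolIfExists is used so that requests in which the MFA key is missing altogether are denied as well, instead of slipping past the condition.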
Data protection
Another important security requirement for machine learning solutions is protecting data
through automatic encryption at rest, in transit, and during training across distributed
clusters. Machine learning solutions should also provide the flexibility to bring your own
encryption keys.
Amazon SageMaker comes with built-in encryption capabilities to ensure that training
datasets, input data for inference, and other machine learning model and system artifacts
are encrypted in transit and at rest. Amazon SageMaker also gives you flexible data
encryption options through Amazon SageMaker managed keys, AWS managed keys,
and customer managed keys.
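As a sketch of how a customer managed key might be supplied, the Python snippet below builds the encryption-related fields of a CreateTrainingJob request. The helper name, bucket, key ARN, and instance settings are placeholders. KmsKeyId, VolumeKmsKeyId, and EnableInterContainerTrafficEncryption are fields of the API. Other required fields are omitted.

```python
def encrypted_training_params(output_s3_uri, kms_key_id, instance_count=2):
    """Return the encryption-related fields of a CreateTrainingJob request (hypothetical helper)."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("output location must be an s3:// URI")
    return {
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            # Encrypt model artifacts written to Amazon S3 with your own key.
            "KmsKeyId": kms_key_id,
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # example size, chosen arbitrarily
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
            # Encrypt the storage volume attached to each training instance.
            "VolumeKmsKeyId": kms_key_id,
        },
        # Encrypt traffic between containers during distributed training.
        "EnableInterContainerTrafficEncryption": True,
    }


# Placeholder bucket and key ARN for illustration only.
enc = encrypted_training_params(
    "s3://example-bucket/models/",
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
```

Inter-container encryption matters only when more than one instance is used, which is why the example sets an instance count of two.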
The NFL tackles player safety
Together, the NFL and AWS are leveraging machine learning to build the
“Digital Athlete,” a platform to improve injury prevention and treatment—and,
ultimately, predict injury. The program will use anonymized and aggregated
player data to create a composite that will simulate infinite scenarios of the
game environment. The NFL and AWS hope the program will eventually have
implications beyond football—for example, it could become a useful tool in
the healthcare industry.
Monitoring and auditability
Auditability is about tracking, tracing, and monitoring API calls, events, data access, and interactions down to the user and IP levels to ensure quick
remediation (if necessary). It’s critical to be able to capture audit trails at the granular level of users, files, and objects.
Amazon SageMaker is integrated with Amazon CloudWatch Logs and AWS CloudTrail for logging events and API calls. You can also set alarms that
watch for certain thresholds and send notifications or take actions when those thresholds are met. You can identify which users and accounts
called AWS, the source IP address from which the calls were made, and when the calls occurred. Because Amazon SageMaker reads data from Amazon
Simple Storage Service (Amazon S3), you can also log data access activities for monitoring by enabling Amazon S3 server access logging or AWS CloudTrail data events.
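The audit questions above (who called, from which IP address, and when) map onto fields of a CloudTrail event record. The Python sketch below filters records for SageMaker calls and builds parameters for a CloudWatch alarm. The sample records, helper names, endpoint name, and topic ARN are invented for illustration. The record fields (eventSource, eventName, sourceIPAddress, eventTime, userIdentity) and the PutMetricAlarm parameters are those used by the services.

```python
def summarize_sagemaker_calls(records):
    """Return who called which SageMaker API, from where, and when."""
    summary = []
    for record in records:
        if record.get("eventSource") != "sagemaker.amazonaws.com":
            continue  # skip calls to other services
        summary.append({
            "who": record.get("userIdentity", {}).get("arn", "unknown"),
            "action": record.get("eventName"),
            "source_ip": record.get("sourceIPAddress"),
            "when": record.get("eventTime"),
        })
    return summary


def endpoint_error_alarm(endpoint_name, variant_name, topic_arn, threshold=1):
    """Build PutMetricAlarm parameters for server-side errors on an endpoint."""
    return {
        "AlarmName": f"{endpoint_name}-5xx-errors",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # for example, an SNS topic to notify
    }


# Invented sample records, trimmed to the fields used here.
sample_records = [
    {
        "eventSource": "sagemaker.amazonaws.com",
        "eventName": "CreateTrainingJob",
        "sourceIPAddress": "203.0.113.10",
        "eventTime": "2024-01-15T09:30:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
    },
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]

calls = summarize_sagemaker_calls(sample_records)
alarm = endpoint_error_alarm(
    "example-endpoint", "AllTraffic", "arn:aws:sns:us-east-1:111122223333:example-topic"
)
# With boto3 and credentials available, the alarm would be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```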
Regulatory compliance
In many cases, machine learning solutions need to comply with regulatory standards and
pass compliance certifications that vary significantly across countries and industries.
AWS supports more security standards and compliance certifications than any
other cloud vendor. As an AWS service, Amazon SageMaker is covered by a wide
range of compliance programs, including PCI DSS, HIPAA, SOC 1/2/3, FedRAMP, and ISO
9001/27001/27017/27018. In addition, to aid your compliance efforts, AWS regularly
achieves third-party validation for thousands of global compliance requirements across
finance, retail, healthcare, government, and more. For the latest SageMaker certifications,
see the AWS Compliance Program website.
Thomson Reuters innovates faster with security
Try Amazon SageMaker for two months, free
Amazon SageMaker can help your organization secure your machine
learning environment quickly—so you can focus on scaling and innovating
faster. As part of the AWS Free Tier, you can get started with Amazon
SageMaker for free.