ML at Scale Ebook
at scale
High-performance, low-cost
machine learning for any use case
INTRODUCTION
The relevance and impact of ML are expected to accelerate. According to IDC,
by 2025, global spending on AI will reach $204 billion.1
In this eBook, we’ll explore the major barriers to ML scalability and success.
Then we’ll demonstrate how solutions and services from AWS can help
virtually any organization overcome those challenges—and leverage ML to
drive innovation and achieve tangible business results.
1 “Investment in Artificial Intelligence Solutions Will Accelerate as Businesses Seek Insights, Efficiency, and Innovation,
According to a New IDC Spending Guide,” IDC, 2021
1 Data processing
2 Acquiring data science skills
3 Responsible use of machine learning
4 Expensive infrastructure
5 Lack of development tools and MLOps
Examine the barriers to
machine learning success
For many organizations, ML has proven difficult to scale, leading to a
lack of progress and frustration with the technology.
With the right services, solutions, tools, and processes, any organization
can achieve success with ML and scale it across their business. But
determining what those solutions are—and how best to implement
them—starts with examining and understanding the barriers that must
be overcome.
In that spirit, let’s take a look at the five greatest challenges to driving
widespread adoption and business results with ML.
1 Data processing
Data processing is very time-consuming, typically comprising about 80
percent of an ML project. Further, ML models are built on an enormous
foundation of data from multiple modalities—tabular, text, audio,
video, and others—which need to be managed differently. There are
many disparate tools for processing structured data, and individual
teams will have their own preferred approach. This makes it difficult for
organizations to centralize their efforts into a single method for creating
data pipelines.
2 Acquiring data science skills
Unfortunately, a shortage of data science professionals makes it difficult
to acquire new talent or train existing talent for ML development. Without
proper training or skilled data scientists to pick up the slack, developers
often struggle to make effective use of ML. Providing developers access
to pretrained models and fully managed solutions can help bridge this gap.
4 Expensive infrastructure
With the increased use of ML comes more requirements for compute, storage,
and networking. This can lead to burdens of time, cost, and resources,
especially for organizations that choose to house and manage their ML
infrastructure on premises. As organizations push the boundaries of ML
complexity, creating models that use billions of parameters to make
thousands of predictions, these problems can escalate exponentially if
left unchecked.
Costs can be controlled by only procuring the amount of infrastructure that
is needed for an organization’s ML workloads. But this can prove difficult, as
infrastructure requirements drastically change throughout the ML lifecycle.
For example, moving ML workloads to production can account for up to 90
percent of the overall operational budget.
5 Lack of development tools and MLOps
Due to the relative newness and rapidly changing nature of ML, most
organizations don’t have standard processes for ML development. Most also
lack an integrated set of securely connected ML tools, such as integrated
development environments (IDEs), debuggers, profilers, and solutions for
collaboration, workflows, and project management.
Instead, teams are forced to rely on disparate, disconnected tools for ML
development. This makes it difficult to scale ML throughout the organization,
as business analysts, developers, and data scientists will struggle to collaborate,
deliver results at the speed the business demands, and involve non-technical
teams in the process. By adopting ML operations (MLOps) processes and
standardizing ML development, organizations can move faster and more
efficiently toward achieving success with ML at scale.
1 Simplified data processing
2 No-code/low-code solutions
3 Responsible machine learning
4 Flexible infrastructure
5 Development tools and MLOps
Achieve machine learning success with AWS
Now you can overcome ML challenges, accelerate your ML journey, and reach your business
goals faster by using cloud services designed specifically for ML.
2 No-code/low-code solutions
To open ML to a broad range of users, Amazon SageMaker Canvas offers
a visual point-and-click interface to generate predictions. You can easily
access data from the cloud and on-premises data sources and automatically
generate predictions without having to write a single line of code. SageMaker
also comes with hundreds of built-in algorithms and pre-built ML solutions
that you can deploy with just a few clicks. Additionally, you can pick from
25+ API-based AI services for top ML use cases.
3 Responsible machine learning
SageMaker provides bias detection, explainability, security, and governance
features to help you support responsible use of ML and offer
transparency to your business stakeholders and customers. SageMaker
detects potential bias during data preparation, after model training, and in
your deployed model and includes feature importance graphs that help you
explain model predictions and produce reports for stakeholders.
4 Flexible infrastructure
SageMaker offers you the ideal combination of high-performance
and low-cost infrastructure available in a fully managed service. For
example, AWS Trainium is a custom ML chip designed by AWS specifically
for training deep learning applications, such as image classification, semantic
search, translation, voice recognition, natural language processing (NLP),
and recommendation engines, to deliver the best performance for training
in the cloud. AWS Inferentia, Amazon’s first custom silicon, is designed
to accelerate deep learning workloads and drive down the total cost
of inference.
Simplify machine learning at scale
with SageMaker
To maintain focus on your core business objectives, avoid the struggle of building
your own ML solution. Instead, offload the heavy lifting to SageMaker, which provides
high-performance, cost-effective, and scalable ML capabilities to implement an ML
environment across your entire business. Regardless of your organization’s level of ML
skills and experience, your teams can use SageMaker to prepare data and build, train,
and deploy ML models for virtually any use case. With SageMaker, your organization can
access a broad set of purpose-built ML capabilities under one unified visual user interface.
How does Amazon deliver packages so quickly?
Take a virtual tour of an Amazon Fulfillment Center to find out. Discover how
Amazon uses a “symphony of machine learning” to help fulfill, sort, and deliver
packages in record time.
The top 4 benefits you can achieve with SageMaker:
1. Enable a wider range of people to innovate with ML through a choice of tools: IDEs
for data scientists and a no-code interface for business analysts.
2. Access, label, and process large amounts of structured data (tabular data) and
unstructured data (photo, video, and audio) for ML.
3. Reduce training time from hours to minutes with optimized infrastructure. Achieve up
to 10 times better team productivity with purpose-built tools.
4. Automate and standardize MLOps practices across your organization to build, train,
deploy, and manage models at scale.
Intuit empowers smarter financial decisions with machine learning
Intuit began its ML journey with just one model that empowered its customers to get
the most out of their tax deductions.
Since then, ML models have become a core part of Intuit’s business, and the company
has seen a massive expansion of the number of ML models it uses, from fraud detection
to customer service, personalization, and development of new product features.
In 2020 alone, Intuit increased the number of models deployed across its platform by
over 50 percent. Intuit turned to Amazon SageMaker to develop and deploy hundreds of
models at scale. Using SageMaker, Intuit modernized its ML platform, saved tax filers
over 25,000 hours by utilizing self-help tools and cutting expert review time in half,
and improved customer confidence.
Process machine learning data at scale
SageMaker helps with both structured and unstructured data processing. Your ML
practitioners can prepare data in fully managed Jupyter notebooks, where they can visually
browse, discover, and connect to Apache Spark data processing environments running on
Amazon EMR. They can also interactively query, explore, and visualize data. And they can
run Spark jobs to build end-to-end data preparation and ML workflows.
You can also use Amazon SageMaker Data Wrangler to prepare structured data with
a no-code visual interface. SageMaker Data Wrangler contains over 300 built-in data
transformations, so you can quickly normalize, transform, and combine features without
having to write any code.
Additionally, Amazon SageMaker Ground Truth Plus can be used to build high-quality
ML training datasets at a lower cost, and without having to build labeling applications or
manage a labeling workforce on your own.
AWS customers are achieving massive scale in data preparation:
• The NFL, in collaboration with AWS, developed the Digital Athlete program, which uses ML
to track and identify risks coming from helmet collisions. This requires labeling hours of
video footage so that computer vision models can be trained on SageMaker and then track
helmet collisions and detect impact during games.
• Postis created a scalable system with the power to run heavy ML workloads and support
its global growth using AWS. Postis now serves more than 200 customers in 25 countries,
including leading companies such as Ikea, Carrefour, Auchan, and Intersport.
• Aurora, a leader in self-driving vehicle technology, trains ML and cloud-based simulation
workloads using AWS, processing trillions of data points each day. The company is scaling
to complete up to 12 million physics-based driving simulations, building on the petabytes
of data it collects during real-world road tests.
Thomson Reuters accelerates research with Amazon SageMaker
Thomson Reuters, the world’s leading source of news and information for professional
markets, accelerated research and development of NLP solutions with cost savings and
flexibility using Amazon SageMaker.
Thomson Reuters developed an internal platform to apply ML at scale with AWS. The
platform enables its developers and data scientists to quickly gain new insights from
real-time and historical data in a fully managed and secure environment. It saves
developers and data scientists countless hours of coding by providing all the components
used for ML in a single toolset. This helps the company put models into production
faster, with much less effort and at a lower cost.
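To make the feature preparation described in this section more concrete, here is a minimal sketch in plain Python of what normalizing, transforming, and combining features can look like. It does not use SageMaker Data Wrangler or any AWS API. The column names, sample values, and helper functions (`min_max_normalize`, `log_transform`, `combine_ratio`) are hypothetical and chosen only for illustration.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """Compress a long-tailed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]

def combine_ratio(numerators, denominators):
    """Combine two columns into one ratio feature (0.0 where the denominator is 0)."""
    return [n / d if d else 0.0 for n, d in zip(numerators, denominators)]

# Hypothetical tabular data, one list per column.
orders = [2, 10, 4, 0]
visits = [10, 20, 16, 0]
revenue = [0.0, 99.0, 999.0, 9999.0]

features = {
    "orders_scaled": min_max_normalize(orders),
    "revenue_log": log_transform(revenue),
    "orders_per_visit": combine_ratio(orders, visits),
}

for name, column in features.items():
    print(name, column)
```

A visual tool applies steps like these without hand-written code. The sketch only shows the kind of operation involved, not how any particular product implements it.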
Make machine learning available
to more users
SageMaker enables all users (including business analysts with no coding or ML experience
and ML practitioners from beginners to experts) to generate predictions and transform their
businesses with ML.
For data scientists and developers who prefer to write code in Python, SageMaker offers fully
managed Jupyter notebook environments available through the SageMaker Studio IDE. For
builders who prefer more automation, Amazon SageMaker Autopilot automatically builds,
trains, and tunes ML models without any loss of visibility or control. When projects need to get
fast-tracked, Amazon SageMaker JumpStart offers hundreds of pre-built algorithms, models,
and solutions for the most common use cases, which can be deployed in just a few clicks.
For line-of-business analysts supporting finance, marketing, and operations, SageMaker
Canvas offers a visual point-and-click interface to generate accurate ML predictions without
requiring any ML experience or having to write a single line of code. SageMaker Canvas helps
business analysts support common use cases such as churn prediction, forecasting, and pricing
recommendations. Developers who prefer not to create their own models in SageMaker
can use any of 25+ AI services from AWS for all top ML use cases, including text and
documents, chatbots, speech, vision, search, business processes, code and DevOps, and even
industry-specific services for healthcare and industrial applications.
Freddy’s orders up insights two times faster with Amazon SageMaker
Freddy’s Frozen Custard & Steakburgers, a fast-casual restaurant chain headquartered in
Wichita, Kansas, turned to data science to find a better way to evaluate the quality of its
restaurants. Leveraging the accessibility of Domo AutoML powered by Amazon SageMaker
Autopilot, Freddy’s built ML models to optimize staffing levels in its restaurants without
having to hire ML experts.
Foster responsible machine learning
Responsible use of ML is key to achieving tangible benefits that scale across the business.
AWS is committed to developing fair and accurate AI and ML services and helping
organizations transform responsible AI from theory into practice with purpose-built tools
and guidance.
To use ML in a responsible manner, ML models need to be built with transparency, fairness,
and security in mind. Amazon SageMaker Clarify provides bias detection across the
ML workflow and includes feature importance graphs. These explain model predictions
and produce reports to support internal presentations while also identifying issues with
models to enable course correction.
To help your organization meet security criteria applicable to ML workloads, SageMaker
includes solutions for encryption, private network connectivity, authorization, authentication,
monitoring, and auditability.
Achieve responsible and secure machine learning with SageMaker Clarify:
• Gain greater visibility into data and models to identify and limit bias
• Detect potential bias throughout the entire workflow
Bundesliga scores higher fan engagement with Amazon SageMaker
The Deutsche Fußball Liga (DFL) GmbH, responsible for organizing and marketing
German professional football, set out to create a more engaging experience for Bundesliga
fans around the world by uncovering game insights during football matches.
Bundesliga Match Facts, powered by AWS, give viewers information on the difficulty of a
shot, the performance of their favorite players, and an exploration of offensive and
defensive trends of their team. Using Amazon SageMaker Clarify, the DFL can now
interactively explain the key components of the Bundesliga Match Facts insights
predictions to improve its ML models and ultimately deliver higher-quality game
insights to fans.
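As an illustration of what detecting potential bias in training data can involve, the sketch below computes one simple pre-training check: the gap between two groups in how often they receive the positive label. It is written in plain Python with made-up data. The function names and the group labels "a" and "d" are hypothetical, and the code is not the SageMaker Clarify API or its implementation.

```python
def positive_rate(labels):
    """Fraction of labels equal to 1."""
    if not labels:
        raise ValueError("group has no samples")
    return sum(labels) / len(labels)

def difference_in_positive_proportions(labels, groups, group_a, group_d):
    """Positive-label rate of group_a minus that of group_d.

    A value near 0 suggests both groups receive the positive label about
    equally often; a large absolute value flags a possible imbalance that
    deserves a closer look before training.
    """
    labels_a = [y for y, g in zip(labels, groups) if g == group_a]
    labels_d = [y for y, g in zip(labels, groups) if g == group_d]
    return positive_rate(labels_a) - positive_rate(labels_d)

# Hypothetical dataset: 1 = favorable outcome, 0 = unfavorable outcome.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "d", "d", "d", "d"]

gap = difference_in_positive_proportions(labels, groups, "a", "d")
print(f"difference in positive proportions: {gap:+.2f}")
```

A single number like this is only a starting point. Whether a gap is a problem depends on the use case, which is why a review of the data and the model behind the metric still matters.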
Improve cost-efficiency with purpose-built machine learning tools
As your use of ML grows, so will your infrastructure requirements. To prevent
your costs from becoming prohibitive, you’ll need tools and processes that allow
you to dynamically match your spend to your specific compute, storage, and
networking needs throughout the ML lifecycle. You’ll also need to maximize
productivity and efficiency, enabling your developers to avoid wasted time
and duplicative efforts and to put models into production quickly.
By using services and tools that are purpose-built for ML, you can achieve
speed, scale, and cost-efficiency that go far beyond general-purpose and
on-premises solutions.
Throughout the ML lifecycle (including labeling, data preparation, feature
engineering, training, hosting, monitoring, and workflows), your team can use
a single visual interface in SageMaker Studio. This provides you with greater
control over your infrastructure spend. Furthermore, it can improve your data
science team’s productivity by up to 10 times and enable them to develop
models in weeks instead of months.2
AWS customers are achieving massive scale, productivity, and cost-efficiency
with purpose-built tools from AWS:
• Vanguard has fully automated the setup of its ML environments and can
now deploy ML models 20 times faster.
• AstraZeneca can deploy new ML environments in five minutes versus one
month to generate insights that improve research and development and
accelerate the commercialization of new therapeutics.
• NerdWallet reduced training costs by about 75 percent, even while
increasing the number of models it trained.
• Zendesk reduced ML inference costs by 90 percent by deploying thousands
of models per endpoint using SageMaker multi-model endpoints.
• Mueller Water Products used SageMaker to improve leak detection
performance. One of its customers estimates the solution will save it $8
million over five years.
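To see why matching spend to needs across the ML lifecycle matters, consider a small worked example. The sketch below compares keeping peak capacity running for the whole project with paying only for what each phase uses. The hourly rate, phase names, instance counts, and durations are all hypothetical numbers chosen for illustration, not AWS pricing.

```python
HOURLY_RATE = 4.0  # hypothetical price per instance-hour

# (phase, instances needed, hours the phase lasts); all values are made up
phases = [
    ("data preparation", 2, 40),
    ("training", 16, 10),
    ("hosting", 4, 100),
]

def fixed_capacity_cost(phases, rate):
    """Cost when enough capacity for the peak phase runs the whole time."""
    peak_instances = max(instances for _, instances, _ in phases)
    total_hours = sum(hours for _, _, hours in phases)
    return peak_instances * total_hours * rate

def matched_capacity_cost(phases, rate):
    """Cost when each phase pays only for the instances it needs."""
    return sum(instances * hours * rate for _, instances, hours in phases)

fixed = fixed_capacity_cost(phases, HOURLY_RATE)
matched = matched_capacity_cost(phases, HOURLY_RATE)
savings = 1 - matched / fixed

print(f"fixed capacity:   ${fixed:,.2f}")
print(f"matched capacity: ${matched:,.2f}")
print(f"savings:          {savings:.0%}")
```

With these assumed numbers, the short training phase sets the peak, so provisioning for it permanently costs several times more than scaling each phase separately. Real savings depend entirely on the actual workload profile.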
2 “Lowering total cost of ownership for machine learning and increasing productivity with Amazon SageMaker”
Scale machine learning across your
business with MLOps
MLOps practices help you streamline the ML lifecycle by automating and standardizing
ML workflows. With standardized MLOps processes in place, your teams can get models
into production faster and collaborate more effectively. Over time, MLOps can help you
reach your ultimate goal—scaling ML adoption and using ML to improve results across the
entire organization.
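One way to picture a standardized, automated ML workflow is as a fixed sequence of named steps that every project runs in the same order. The sketch below is a minimal plain-Python illustration of that idea. The step names, the toy "model" (a simple mean), and the `max_allowed_error` threshold are hypothetical, and the code does not use any SageMaker or AWS API.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]

def run_pipeline(steps: List[Step], context: Context) -> Context:
    """Run each step in order, passing a shared context and recording step names."""
    history = []
    for name, step in steps:
        context = step(context)
        history.append(name)
    return {**context, "history": history}

def prepare(ctx: Context) -> Context:
    # Drop missing values from the raw input.
    return {**ctx, "rows": [r for r in ctx["raw"] if r is not None]}

def train(ctx: Context) -> Context:
    # Stand-in for real training: the "model" is just the mean of the rows.
    return {**ctx, "model": sum(ctx["rows"]) / len(ctx["rows"])}

def evaluate(ctx: Context) -> Context:
    # Largest absolute error of the toy model on its own training rows.
    return {**ctx, "max_error": max(abs(r - ctx["model"]) for r in ctx["rows"])}

def deploy(ctx: Context) -> Context:
    # Quality gate: only mark the model as deployed if it meets the threshold.
    return {**ctx, "deployed": ctx["max_error"] <= ctx["max_allowed_error"]}

# The same ordered steps are reused by every team and project.
STANDARD_STEPS: List[Step] = [
    ("prepare", prepare),
    ("train", train),
    ("evaluate", evaluate),
    ("deploy", deploy),
]

result = run_pipeline(
    STANDARD_STEPS,
    {"raw": [1.0, None, 2.0, 3.0], "max_allowed_error": 1.5},
)
print(result["history"], result["deployed"])
```

Because the steps and their order are defined once, teams can swap in their own data and models while the evaluation gate and deployment decision stay consistent.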
It’s time to embrace
machine learning
By using purpose-built development and data tools, MLOps, no-code ML, infrastructure,
and solutions focused on responsible use of data and models on a fully managed service,
you can propel many more models from concept to production in a repeatable way for less cost.
And with 22 compliance programs (including PCI, HIPAA, SOC 1/2/3, FedRAMP, and ISO),
AWS can help you gain the swiftness and security that powers your business into the future.
©️ 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.