Analytics Services v2

The document discusses how customers are moving from traditional data warehouse architectures to a Lake House architecture on AWS. It gives examples of how customers have benefited from this approach, including increased agility and innovation, cost optimization, and improved performance and scalability. Specifically, customers have seen faster query performance, lower costs, the ability to ingest and analyze vast amounts of data daily, and the ability to experiment safely and drive insights.

Uploaded by jagsp.hp
Copyright © All Rights Reserved

AWS Lake House architecture for analytics:
Turn data into insights

© 2021, Amazon Web Services, Inc. or its Affiliates.


Agility is more
important than ever



Customers want more value from their data

• Growing exponentially
• From new sources
• Increasingly diverse
• Used by many people
• Analyzed by many applications



Customers moving from traditional data
warehouse approach

[Diagram: from data silos to a data lake. Data warehouse silos (DW Silo 1 and DW Silo 2, each feeding business intelligence) give way to a data lake connected to relational databases, non-relational databases, big data processing, log analytics, machine learning, and data warehousing. Data sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social.]



Customers moving to Lake House architecture
[Diagram: a data lake surrounded by relational databases, non-relational databases, big data processing, log analytics, machine learning, and data warehousing]
• Scalable data lakes
• Purpose-built data services
• Seamless data movement
• Unified governance
• Performant and cost-effective
Lake House architecture on AWS
[Diagram: Amazon S3 at the center, surrounded by Amazon Aurora, Amazon EMR, Amazon DynamoDB, Amazon Athena, Amazon Elasticsearch Service, Amazon SageMaker, and Amazon Redshift]
• Scalable data lakes
• Purpose-built data services
• Seamless data movement
• Unified governance
• Performant and cost-effective
Focusing on business outcomes

Customer experience
• Twilio: Built a customer engagement service using a Lake House Architecture to serve over eight million developers working with 190k+ businesses in 100+ countries
• Disney+: Real-time insights give tens of millions of users personalized streaming recommendations
• OneFootball: Increased the use of its self-service analytics platform by over 40% for daily active fans, sharing richer information in near real time
• Zappos: Personalizes searches for a better customer experience and gets fewer returns due to improved sizing recommendations

Agility and innovation
• ENGIE: Accelerates its zero-carbon transition with automated energy predictions and maximized wind farm energy production
• Toyota Racing Development: Helps drive the insights needed to make key race-time decisions, giving a technological edge over competitors
• New Relic: With Amazon Managed Streaming for Apache Kafka, the company is able to experiment with big changes safely and with little risk
• Erickson Living: Built a sophisticated infectious disease tracker in four months for retirement community residents and employees

Cost optimization
• FINRA: Manages over 150 PB of data at $5 per terabyte of data scanned
• INVISTA: Shifting to AWS saves more than $2 million annually in data storage costs
• Pinterest: AWS Analytics reduced operational costs by over 30% while freeing software engineers from low-value work
• Eightfold.ai: Using Amazon EMR as its core ML platform allows for more accurate ML models, 80% faster and at an 80% lower cost

Performance and scale
• Nasdaq: Moved to a Lake House Architecture to ingest 70 billion records per day, and now runs Amazon Redshift queries 32% faster
• Vyaire Medical: Scalability and cost efficiency during a global pandemic, with a 20x increase in ventilator production while reducing first-pass inspection failures by 60%
• Pearson: Scaled ingestion to six billion documents per day using Amazon Elasticsearch Service
• Duolingo: Had the tools to support a 101% increase in language learners



Scalable data lakes



Amazon S3 is the most popular choice
for data lakes

• Unmatched durability, availability, and scalability
• Best security, compliance, and audit capabilities
• Most object-level controls
• Broadest portfolio of analytics tools
• Most ways to get data in
• Easiest to use with cost optimization: S3 Intelligent-Tiering
• Cold storage and archive capabilities


More data lakes run on AWS than anywhere else
Tens of thousands of data lakes run on AWS across all industries



ENGIE builds the Common
Data Hub on AWS, accelerates
zero-carbon transition

Challenge
ENGIE's decentralized global customer base had accumulated large
amounts of data, and the company needed a smarter, tailored approach
to align its initiatives and to provide data efficiently across its
global business units.

Solution
ENGIE built its Common Data Hub data lake on AWS, enabling the
company’s business units to collect and analyze data to support
a data-driven strategy and to lead the zero-carbon transition.

Result
• Collected 95 TB of data across 351 projects
• Automated energy predictions
• Maximized wind farm energy production

AWS services used: Amazon Kinesis Data Streams, Amazon Redshift, AWS Glue, Amazon Athena, Amazon S3, Amazon SageMaker

Purpose-built data services



Purpose-built data services
Optimize performance, cost, and scale for your use cases

• Amazon Athena: interactive query
• Amazon EMR: big data processing
• Amazon Elasticsearch Service: log and search analytics
• Amazon Kinesis and Amazon MSK: real-time analytics
• Amazon Redshift: data warehousing



Amazon Athena
Query data in S3 using SQL
• Serverless: Quickly query S3 data without managing infrastructure, and pay only for the queries you run
• Open and standard: Use ANSI SQL for querying, with support for Parquet, CSV, JSON, Avro, and other standard data formats
• Fast interactive performance: Parallel execution delivers most results within seconds, with no cluster management required
• Cost-effective: Pay only for the queries you run, and save 30–90% by compressing, partitioning, and converting your data into columnar formats
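Because Athena charges per query, the cost of a query tracks the number of bytes it scans. That is why columnar formats and partitioning save money. The rough estimator below is a sketch, not an AWS tool. The $5-per-TB price comes from the FINRA example in this deck. The decimal units, the rounding up to whole megabytes, and the 10 MB per-query minimum are assumptions based on Athena's published pricing at the time of writing, so check current pricing for your Region.

```python
import math

# Assumptions, not authoritative values: the price comes from the FINRA
# example in this deck; decimal units, rounding up to whole megabytes, and a
# 10 MB minimum per query follow Athena's published pricing at the time of
# writing. Check current pricing for your Region before relying on them.
PRICE_PER_TB_USD = 5.00
BYTES_PER_MB = 10**6
MB_PER_TB = 10**6
MIN_BILLED_MB = 10


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return an approximate cost in USD for one query."""
    # Round the scanned bytes up to whole megabytes, then apply the minimum.
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / BYTES_PER_MB))
    return billed_mb / MB_PER_TB * price_per_tb


if __name__ == "__main__":
    # Hypothetical workload: one query over raw CSV versus a compressed,
    # partitioned Parquet copy from which it scans a tenth of the bytes.
    csv_cost = estimate_query_cost(2 * 10**12)      # 2 TB scanned
    parquet_cost = estimate_query_cost(2 * 10**11)  # 0.2 TB scanned
    print(f"CSV: ${csv_cost:.2f}  Parquet: ${parquet_cost:.2f}")
    print(f"Savings: {1 - parquet_cost / csv_cost:.0%}")
```

With these made-up sizes the script reports $10.00 versus $1.00, a 90% saving. That figure only mirrors the assumed tenfold drop in bytes scanned. Real savings depend on the data and the query, which is why the slide quotes a 30–90% range.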
FINRA operates petabyte-
scale analytics on data
lakes using Amazon Athena

Challenge
FINRA sought a serverless technology solution for cost-effectively
handling data at petabyte scale while maintaining high performance
and security under a heavy load.

Solution
FINRA used Amazon Athena to meet its required specifications,
including compatibility with Apache Hive Metastore, which
FINRA uses for its data catalog.

Result
• Scales automatically
• Provides secure internode encryption and AWS KMS encryption
• Has a 17-second response time to requests
• Manages over 150 PB of data at $5 per terabyte of data scanned

AWS services used: Amazon Athena, Amazon S3

Amazon EMR
Easily run Spark, Hadoop, Hive, Presto, HBase, and other big data frameworks
• Automate provisioning, configuring, and tuning: Easy setup, management, and monitoring, with the latest open-source framework updates within 30 days
• Run workloads faster and more cost-effectively: 1.7x faster than standard Apache Spark 3.0 at 40% of the cost, and 2.6x faster than open-source Presto 0.238 at 80% of the cost
• Automatically scale up and down: Manage cluster size based on utilization to reduce costs
• Simple and predictable pricing: Per-second pricing, and save 50%–80% with Amazon EC2 Spot and Reserved Instances
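The idea of resizing a cluster based on utilization can be shown with a simple target-utilization rule. This is a hypothetical sketch, not the algorithm that Amazon EMR managed scaling uses. The function name, the 70% target, and the minimum and maximum node counts are illustrative assumptions.

```python
import math


def desired_node_count(current_nodes: int, utilization: float,
                       target_utilization: float = 0.70,
                       min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Hypothetical target-utilization rule for resizing a cluster.

    utilization is the observed fraction (0.0 to 1.0) of cluster capacity in
    use. All parameters are illustrative and are not Amazon EMR settings.
    """
    if current_nodes <= 0:
        raise ValueError("current_nodes must be positive")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    # Work currently being done, expressed in fully busy nodes.
    busy_nodes = current_nodes * utilization
    # Size the cluster so the same work would run at the target utilization.
    proposed = math.ceil(busy_nodes / target_utilization)
    # Never go below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, proposed))


if __name__ == "__main__":
    for nodes, util in [(10, 0.95), (10, 0.20), (18, 1.0)]:
        print(nodes, util, "->", desired_node_count(nodes, util))
```

Growing when utilization is high and shrinking when it is low is what keeps you paying only for the capacity a workload actually needs. A production policy would also add cooldown periods and react to richer signals than a single utilization number.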
Amazon EMR differentiated performance
• 1.7x faster performance than standard Apache Spark 3.0 at 40% of the cost
• Up to 2.6x faster performance than open-source Presto 0.238 at 80% of the cost
• 11.5% average performance improvement with Graviton2
• 25.7% average cost reduction with Graviton2


Nielsen builds a multi-petabyte
data platform using Amazon EMR

Challenge
By 2019, Nielsen Marketing Cloud (NMC) had a third-party data
warehouse that cost $3 million per year to run. Much of the raw
data was inaccessible: data scientists needed help from developers
to access the data.

Solution
NMC built its own solution on Amazon S3, using Amazon EMR both to build
and to query data marts, which eliminated its need for the third-party
data warehouse. It further simplified data access by adopting Apache
Spark, enabling users to launch their own clusters to develop and run
queries and to interact with the data using SQL.

Result
• Lowered costs by $1 million per year
• Made data accessible and available immediately

AWS services used: Amazon EMR, Amazon S3
IAS moves big data to AWS,
accelerating speed to market
and reducing costs by 12%

Challenge
Digital ad verification leader Integral Ad Science (IAS) was
using an on-premises data infrastructure that was difficult
to manage and scale and cost $16 million for equipment alone.

Solution
IAS used multiple AWS services, including Amazon EC2,
Amazon EMR, and Amazon S3, to build a comprehensive
cloud-based infrastructure and integrate contextual
analysis technology.

Result
• Can scale to meet fluctuating demand without provisioning servers
• Reduced costs by 12%

AWS services used: Amazon EMR, Amazon S3

Amazon Elasticsearch Service
Fully managed, scalable, and secure Elasticsearch service
• Easy integration: Open-source Elasticsearch APIs, managed Kibana, and integration with Logstash
• Fully managed: Deployment in minutes, plus software installation and patching, failure recovery, backups, and monitoring
• Scalable, secure, and compliant: Network isolation with Amazon VPC, encryption at rest and in transit, and compliance with HIPAA, PCI DSS, and ISO
• Cost-effective: Pay only for resources used, with a choice of on-demand and Reserved Instance compute pricing, and save up to 90% with the UltraWarm low-cost storage tier
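To illustrate the open-source Elasticsearch APIs mentioned above, the function below builds a search request body in the Elasticsearch query DSL that finds recent error logs. The field names (`level`, `service`, `@timestamp`) are hypothetical and are assumed to be mapped as keyword and date fields. Sending the request, for example as a `POST` to `<index>/_search` or through an Elasticsearch client, is left out so the sketch runs without a cluster.

```python
import json


def recent_errors_query(service: str, minutes: int = 15, size: int = 50) -> dict:
    """Build a query DSL body for error-level logs from one service.

    Field names are hypothetical; `level` and `service` are assumed to be
    keyword fields so that exact `term` matches work.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter clauses skip relevance scoring and can be cached.
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
        # Newest log entries first.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(recent_errors_query("checkout"), indent=2))
```

The body uses `filter` rather than `must` because these conditions are exact matches, so relevance scores add nothing to a log search like this one.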
Open Distro for Elasticsearch
An Apache 2.0-licensed distribution of Elasticsearch enhanced with enterprise security, alerting, SQL, and more
• 100% open source: Providing you the freedom to view, use, change, and distribute the code
• Enterprise-grade: Delivering security and advanced capabilities such as alerting, SQL, and cluster diagnostics
• Community-driven: Giving individuals and organizations the freedom to easily contribute changes to the distro



Pinterest scales daily log search
and analytics with Amazon
Elasticsearch Service
Challenge
As one of the largest visual discovery engines in the world, with
400 million monthly active users, Pinterest needed a way to handle the
growing volume of log data it ingests so that its engineers could
analyze that data effectively.

Solution
Pinterest moved to managed analytics using Amazon
Elasticsearch Service, enabling it not only to scale its log-
analysis capabilities but also to reduce operational burdens,
improve security, and save costs.

Result
• Scaled monitoring and alerting capabilities for
software deployment
• Reduced operational costs by 30%, with 40–50% expected
• Increased productivity by freeing software engineers from
low-value work

AWS services used: Amazon Elasticsearch Service

Amazon MSK
Fully managed, highly available, and secure Apache Kafka service
• Fully compatible: Run your existing Apache Kafka applications on AWS without changes to source code
• Fully managed: Focus on creating applications, not managing your Apache Kafka environment
• Elastic stream processing: Run Apache Flink applications written in SQL, Java, or Scala that elastically scale to process data streams
• Highly available: Take advantage of multi-AZ replication within an AWS Region
• Highly secure: Protect your data with multiple levels of security, including VPC network isolation, encryption at rest and in transit, and more



New Relic migrates its Apache
Kafka Cluster to Amazon MSK

Challenge
New Relic was handling incoming data with a single
monolithic on-premises Apache Kafka cluster, resulting
in minimal fault isolation, a lurking scalability wall, and
difficulty evaluating big changes safely.

Solution
New Relic migrated to AWS and used Amazon Managed
Streaming for Apache Kafka (Amazon MSK) to create and
manage a new cellular architecture on the cloud.

Result
• Routes most incoming data to Amazon MSK clusters
• Limits the blast radius of incidents
• Lowers the risk of experimentation
• Enables scaling

AWS services used: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon Kinesis
Easily collect, process, and analyze data and video streams in real time
• Kinesis Data Streams: Capture, process, and store data streams
• Kinesis Data Firehose: Load data streams into AWS data stores
• Kinesis Data Analytics: Analyze data streams with serverless Apache Flink or SQL
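Kinesis Data Streams assigns each record to a shard in two steps. It takes the MD5 hash of the record's partition key as a 128-bit integer, then picks the shard whose hash key range contains that value. The sketch below reproduces the mapping locally. It assumes the stream's shards split the 128-bit key space into equal ranges. After resharding you would instead read the actual ranges with the `ListShards` API. The function names are illustrative.

```python
import hashlib

HASH_KEY_SPACE = 2**128  # MD5 produces 128-bit digests


def partition_key_hash(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key as an unsigned 128-bit integer.

    Reads the digest big-endian, the same value as int(hexdigest, 16).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash key range contains the key.

    Assumes shard_count shards splitting the key space into equal ranges,
    with the last shard absorbing any remainder.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    range_size = HASH_KEY_SPACE // shard_count
    return min(partition_key_hash(partition_key) // range_size, shard_count - 1)


if __name__ == "__main__":
    # Records with the same partition key always map to the same shard.
    for key in ["user-1", "user-2", "user-3", "user-1"]:
        print(key, "-> shard", shard_for_key(key, 4))
```

In practice, this means partition keys should have many distinct values. The hash then spreads records across shards instead of concentrating them on one "hot" shard.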



Disney+ empowers fast data
ubiquity using Amazon Kinesis

Challenge
Experiencing slow, limited data insights from data silos and batch
processing, Disney+ needed to give its teams fast data access so
that they could improve the Disney+ customer experience at scale.

Solution
To achieve fast data democracy and near-real-time insights,
Disney+ built a streaming data platform using AWS Analytics
services, including Amazon Kinesis Data Streams and Amazon
Kinesis Data Firehose.

Result
Using AWS, Disney+ now supports a data-driven culture that
provides near-real-time data and insights based on billions of
events to improve the experience of tens of millions of users
reliably and cost efficiently.

AWS services used: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose

Amazon Redshift
Analyze all your data with the fastest and most widely used cloud data warehouse
• Analyze all your data: Deepest integration with your data lake
• Performance at any scale: Up to 3x better price performance than other cloud data warehouses
• Lower your costs: At least 50% less expensive than other cloud data warehouses



Amazon Redshift innovates to meet your needs
• Analyze all your data (Lake House with AWS integration): Amazon Redshift Spectrum + AWS Lake Formation, data lake export, federated query, data sharing, Amazon Redshift ML
• Performance & scale (fast and self-tuning): concurrency scaling, RA3 nodes & managed storage, AQUA, materialized views, automated performance tuning
• Low cost & best value (predictable costs): on-demand and RIs, cross-AZ cluster recovery, pause and resume, cost controls, built-in security features, automatic workload manager



Tens of thousands of customers process exabytes
of data with Amazon Redshift daily

• NTT DOCOMO: Moved >10 PB of data from on-premises to the cloud
• FOX Corp.: Taking a lake house approach with RA3 nodes and Amazon S3
• Yelp: Enabling a data-driven organization with concurrency scaling
• Jack in the Box: Improved operations by moving off an on-premises data warehouse
• Warner Bros. Games: Performance, scale, and cost-effectiveness



Simulmedia delivers richer
advertising experiences
using Amazon Redshift

Challenge
Simulmedia uses the power of data to target TV advertisements to
the right viewers. However, analyzing large volumes of data was
costly and inefficient due to the constraints of the company’s
legacy on-premises data center and Apache Hadoop environment.

Solution
Simulmedia migrated to an Amazon Redshift cloud data
warehouse on AWS, enabling it to analyze data at scale.

Result
Amazon Redshift enables Simulmedia to flexibly scale its
technology and its business, achieve data insights more efficiently,
and take advantage of upgrades while keeping costs manageable.

AWS services used: Amazon Redshift, Amazon S3

Seamless data movement



Seamless data movement
Move your data, at scale, to where you need it the most

[Diagram: a data lake at the center of five data movement patterns]
• Extract, transform, load
• Visual data preparation
• Data replication
• Data warehouse to/from data lake
• Federated query



AWS Glue
Simple, scalable, and serverless data integration
• Connect to more sources: Easily ingest data from hundreds of popular data sources
• Simplify workflow orchestration: Easily run and manage thousands of data integration jobs
• No servers to manage: Pay only for the resources your jobs consume
• Simplify development: Visually develop and manage data integration jobs
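The extract-transform-load pattern that AWS Glue runs at scale can be shown in miniature with the Python standard library. This is a stand-in, not Glue's API. Glue ETL jobs typically run on Apache Spark with the `awsglue` libraries. The sketch parses CSV, casts and filters rows, and groups the output under Hive-style `dt=YYYY-MM-DD/` partition prefixes. That layout lets engines such as Athena skip partitions a query does not need. The column names and rows are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical raw input; the second order is missing its amount.
RAW_CSV = """order_id,order_ts,amount
1001,2021-03-01T09:15:00,19.99
1002,2021-03-01T17:40:00,
1003,2021-03-02T08:05:00,42.50
"""


def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: drop rows without an amount, cast types, derive a date."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "dt": row["order_ts"][:10],  # YYYY-MM-DD part of the timestamp
        })
    return cleaned


def load(rows: list[dict]) -> dict[str, list[dict]]:
    """Load: group rows under Hive-style partition prefixes (dt=...)."""
    partitions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        partitions[f"dt={row['dt']}/"].append(row)
    return dict(partitions)


if __name__ == "__main__":
    for prefix, rows in load(transform(extract(RAW_CSV))).items():
        print(prefix, rows)
```

In a real pipeline, the load step would write each group as a columnar file under the matching S3 prefix and register the partitions in a data catalog. Here the groups are simply returned so the sketch stays self-contained.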



AWS Glue Studio
Easily author, run, and monitor AWS Glue ETL jobs

• Author AWS Glue jobs visually without coding
• Monitor 1,000s of jobs through a single pane of glass
• Distributed processing without the learning curve
• Advanced transforms through code snippets



AWS Glue DataBrew (NEW, GA)
Visual data preparation for analytics and machine learning
• Clean and normalize data with a visual interface
• 250+ built-in transformations without writing code
• Work on large datasets at scale



AWS Glue DataBrew
Visual data preparation for analytics and machine learning
• Clean and normalize data with a rich visual interface
• Choose from 250+ built-in transformations to automate tasks
• Profile data to understand data patterns and anomalies
• Work on large datasets at scale
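To show what profiling data involves, here is a small column profiler. For each column it reports row counts, missing values, distinct values, and the numeric range. It is a hypothetical illustration written with the Python standard library. It does not reproduce DataBrew's profiling output or API, and the sample rows are invented for the example.

```python
from __future__ import annotations


def profile_column(values: list) -> dict:
    """Summarize one column: counts, missing values, distinct values, range."""
    present = [v for v in values if v not in (None, "")]
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass  # non-numeric values are counted but not ranged
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if numeric:
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
    return summary


def profile_table(rows: list[dict]) -> dict[str, dict]:
    """Profile every column that appears in a list of row dictionaries."""
    columns = {key for row in rows for key in row}
    return {col: profile_column([row.get(col) for row in rows])
            for col in sorted(columns)}


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    rows = [
        {"sku": "A1", "price": "9.99"},
        {"sku": "A2", "price": ""},
        {"sku": "A1", "price": "120"},
    ]
    for column, stats in profile_table(rows).items():
        print(column, stats)
```

Summaries like these are what make anomalies stand out, such as an unexpected share of missing prices or a value far outside the usual range. You can spot them before deciding which cleaning transformations to apply.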



AWS Glue Elastic Views (NEW, PREVIEW)
Easily combine and replicate data across multiple data stores
• Create materialized views across a wide variety of databases and data stores using familiar SQL
• Continually monitors source databases for changes and updates targets within seconds
• Serverless, and automatically scales capacity up and down to accommodate your workloads
• Handles the heavy lifting of copying and combining data without requiring custom code
[Diagram: sources such as Amazon DynamoDB, Amazon Aurora, and Amazon RDS feed AWS Glue Elastic Views, which maintains materialized views that give access to up-to-date data in targets such as Amazon Redshift, Amazon Elasticsearch Service, and Amazon S3]
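Monitoring sources for changes and updating targets is a form of incremental materialized-view maintenance. The sketch below applies a stream of hypothetical change events (insert, update, delete) from a source table to a projected view held in memory. It illustrates the concept only. Glue Elastic Views defines views in SQL and handles change capture itself, and none of the names below are its API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChangeEvent:
    """A hypothetical change-data-capture record from a source table."""
    op: str            # "insert", "update", or "delete"
    key: str           # primary key of the source row
    row: dict | None   # full new row image; None for deletes


class MaterializedView:
    """Keeps a projection (selected columns) of a source table up to date."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns
        self.rows: dict[str, dict] = {}

    def apply(self, event: ChangeEvent) -> None:
        """Apply one change incrementally instead of recomputing the view."""
        if event.op == "delete":
            self.rows.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Upsert, keeping only the columns the view exposes.
            self.rows[event.key] = {c: event.row.get(c) for c in self.columns}
        else:
            raise ValueError(f"unknown operation: {event.op}")


if __name__ == "__main__":
    view = MaterializedView(["customer_id", "status"])
    events = [
        ChangeEvent("insert", "o1", {"customer_id": "c9", "status": "NEW", "total": 30}),
        ChangeEvent("update", "o1", {"customer_id": "c9", "status": "SHIPPED", "total": 30}),
        ChangeEvent("insert", "o2", {"customer_id": "c4", "status": "NEW", "total": 12}),
        ChangeEvent("delete", "o2", None),
    ]
    for event in events:
        view.apply(event)
    print(view.rows)  # {'o1': {'customer_id': 'c9', 'status': 'SHIPPED'}}
```

Each event touches only the affected key, so applying it takes the same effort however large the view grows. That property is what makes keeping targets current within seconds practical.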
Moving data to and from the data lake

Extend the data warehouse to exabytes of data in Amazon S3 data lakes
• Directly query data stored in Amazon S3
• Parquet, ORC, Avro, JSON, and CSV data formats
• Any scale of data; pay for what you use
[Diagram: Amazon Redshift and its managed storage querying across data in Amazon Redshift and Amazon S3]

Unload Amazon Redshift data as Parquet to Amazon S3 data lakes for faster sharing and analytics
• Parquet is an open data format supported by Amazon EMR, Athena, and Amazon Redshift
• Amazon Redshift now supports exporting data to Amazon S3 in Parquet format
• Use SQL with the Amazon Redshift UNLOAD command to export data in Parquet format
• Unloaded data is automatically registered in the AWS Glue Data Catalog
[Diagram: Amazon Redshift unloads data to Amazon S3, where it is cataloged by AWS Glue and queried by Amazon EMR and Amazon Athena]
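The UNLOAD path can be scripted. The helper below only builds the SQL text for an `UNLOAD ... FORMAT AS PARQUET` statement, which you would then run through your usual Amazon Redshift client. The query, S3 prefix, IAM role ARN, and partition column are placeholders. The clauses shown (`TO`, `IAM_ROLE`, `FORMAT AS PARQUET`, and optional `PARTITION BY`) are a minimal subset of UNLOAD's options, so check the UNLOAD reference for the full syntax.

```python
from __future__ import annotations


def build_unload_parquet(select_sql: str, s3_prefix: str, iam_role_arn: str,
                         partition_by: list[str] | None = None) -> str:
    """Return an Amazon Redshift UNLOAD statement that writes Parquet to S3."""
    # UNLOAD takes the query as a quoted string literal, so single quotes
    # inside the query are doubled to escape them.
    quoted_query = select_sql.replace("'", "''")
    parts = [
        f"UNLOAD ('{quoted_query}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS PARQUET",
    ]
    if partition_by:
        # Writes Hive-style key=value folders under the S3 prefix.
        parts.append(f"PARTITION BY ({', '.join(partition_by)})")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(build_unload_parquet(
        "SELECT * FROM sales WHERE region = 'EU'",
        "s3://example-bucket/exports/sales/",                  # placeholder
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # placeholder
        partition_by=["sale_date"],
    ))
```

Partitioning the unloaded Parquet by a commonly filtered column, such as a date, means that engines like Athena and Amazon EMR can later read only the folders a query needs.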
Federated query in Amazon Redshift and Athena
Unified analytics across databases, data warehouse, and data lake
• Integrate operational databases with the data warehouse and Amazon S3 data lake
• Analytics on operational data without data movement and ETL delays
• A flexible and easy way to ingest data, avoiding complex ETL pipelines
[Diagram: Amazon Redshift and Amazon Athena querying operational databases (e.g., Aurora, RDS) and the Amazon S3 data lake*]
*Other sources available in Amazon Athena: Amazon ElastiCache for Redis, Amazon DocumentDB, Amazon DynamoDB, HBase in Amazon EMR
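Federated query means a single SQL statement reads from several independent stores in place, without first copying the data into one of them. As a local analogy only, not Redshift or Athena, SQLite's `ATTACH DATABASE` lets one query join tables that live in separate databases. The database aliases, tables, and rows below are hypothetical.

```python
from __future__ import annotations

import sqlite3


def federated_demo() -> list[tuple]:
    """Join an 'operational' database with a 'data lake' database in one query."""
    conn = sqlite3.connect(":memory:")                  # stands in for the query engine
    conn.execute("ATTACH DATABASE ':memory:' AS ops")   # operational store
    conn.execute("ATTACH DATABASE ':memory:' AS lake")  # data lake store

    conn.execute("CREATE TABLE ops.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE lake.clicks (customer_id INTEGER, page TEXT)")
    conn.executemany("INSERT INTO ops.customers VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben")])
    conn.executemany("INSERT INTO lake.clicks VALUES (?, ?)",
                     [(1, "/home"), (1, "/pricing"), (2, "/home")])

    # One statement spans both attached databases; no rows are copied between them.
    rows = conn.execute(
        """
        SELECT c.name, COUNT(*) AS clicks
        FROM ops.customers AS c
        JOIN lake.clicks AS k ON k.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(federated_demo())  # [('Ana', 2), ('Ben', 1)]
```

In Redshift or Athena, the attached databases would be external schemas or data source connectors pointing at the operational database and the S3 data lake. The principle carries over: the join happens at query time, with no ETL step in between.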



Dollar Shave Club uses
AWS to speed data analysis,
improve user experience

Challenge
Dollar Shave Club needed to find the best way to optimize
storage and compute for its growing analytics environment.

Solution
Dollar Shave Club created a Lake House Architecture featuring
Amazon S3 and Amazon Redshift, taking advantage of the
Amazon Redshift Spectrum feature to query 60 TB of data.

Result
• Builds analytical reports in 5 minutes instead of 8 hours
• Saves $300,000 a year by optimizing cluster sizes
• Puts savings into research and development
• Creates multiple reports daily instead of 3–4 a week

AWS services used: Amazon Redshift, AWS Glue, Amazon S3

Unified governance



AWS Lake Formation
Build a secure data lake in days
• Build data lakes quickly: Move, store, and catalog your data faster; simplify data management with governed storage
• Simplify security management: Centrally define and enforce security, governance, and auditing policies
• Provide self-service access to data: Share datasets easily and securely within your organization and with partners
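Centrally defined access policies come down to checks like the one below: a principal may read a column of a table only if some grant covers it. This is a hypothetical in-memory model of column-level grants. It is not the Lake Formation API, which manages grants through actions such as `GrantPermissions`, and the principals, tables, and columns are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One hypothetical SELECT grant on a subset of a table's columns."""
    principal: str
    table: str
    columns: frozenset[str]


class Catalog:
    """A toy central permission store standing in for a governance service."""

    def __init__(self) -> None:
        self.grants: list[Grant] = []

    def grant_select(self, principal: str, table: str, columns: set[str]) -> None:
        self.grants.append(Grant(principal, table, frozenset(columns)))

    def allowed_columns(self, principal: str, table: str) -> set[str]:
        """Union of all columns granted to the principal on the table."""
        allowed: set[str] = set()
        for g in self.grants:
            if g.principal == principal and g.table == table:
                allowed |= g.columns
        return allowed

    def can_select(self, principal: str, table: str, columns: set[str]) -> bool:
        """True only if every requested column is covered by some grant."""
        return columns <= self.allowed_columns(principal, table)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.grant_select("analyst", "sales", {"order_id", "amount"})
    print(catalog.can_select("analyst", "sales", {"amount"}))                    # True
    print(catalog.can_select("analyst", "sales", {"amount", "customer_email"}))  # False
```

Keeping grants in one place means that every engine consulting the catalog enforces the same rule. That shared enforcement is the point of defining policies centrally rather than per service.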



Introducing Governed Tables (preview released in 2021)
A new type of S3 table
• Update data (reliable): ACID transactions; consistent across tasks; INSERT, UPDATE, DELETE; converge batch and real-time
• Optimization and acceleration (performant): storage optimization; auto-compact small files; push-down filters; reduce data scan
• Time travel (versioned): data versions; data history; reproduce experiments; audit changed data
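The "reliable" and "versioned" properties can be illustrated with a copy-on-write table. Each committed transaction produces a new immutable version, so readers never see a half-applied change and any earlier version can be read back (time travel). This is a conceptual in-memory sketch with hypothetical names. It is not how Lake Formation governed tables are implemented or accessed.

```python
from __future__ import annotations


class VersionedTable:
    """Copy-on-write table: every successful commit creates a new snapshot."""

    def __init__(self) -> None:
        self._versions: list[dict[str, dict]] = [{}]  # version 0 is empty

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def read(self, version: int | None = None) -> dict[str, dict]:
        """Read the latest snapshot, or an earlier one (time travel)."""
        v = self.latest_version if version is None else version
        return dict(self._versions[v])

    def commit(self, inserts=None, updates=None, deletes=None) -> int:
        """Apply a batch atomically: build a private draft, then publish once."""
        inserts, updates, deletes = inserts or {}, updates or {}, deletes or []
        draft = dict(self._versions[-1])  # earlier snapshots are never mutated
        for key, row in inserts.items():
            if key in draft:
                raise KeyError(f"duplicate key {key!r}; transaction aborted")
            draft[key] = row
        for key, row in updates.items():
            if key not in draft:
                raise KeyError(f"missing key {key!r}; transaction aborted")
            draft[key] = {**draft[key], **row}  # new dict, old version intact
        for key in deletes:
            draft.pop(key, None)
        self._versions.append(draft)  # the single publish step
        return self.latest_version


if __name__ == "__main__":
    t = VersionedTable()
    v1 = t.commit(inserts={"a": {"qty": 1}, "b": {"qty": 2}})
    t.commit(updates={"a": {"qty": 5}}, deletes=["b"])
    try:
        t.commit(inserts={"c": {"qty": 1}}, updates={"zzz": {"qty": 0}})
    except KeyError:
        pass  # the whole batch is rejected, so "c" is never added
    print(t.read())    # {'a': {'qty': 5}}
    print(t.read(v1))  # {'a': {'qty': 1}, 'b': {'qty': 2}}
```

If any step in a batch fails, the draft is discarded and no version is published. That all-or-nothing behavior is the essence of an atomic commit. Retaining every published snapshot is what makes it possible to reproduce experiments and audit changed data.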


INVISTA builds a modern
data architecture on AWS

Challenge
INVISTA needed a more efficient way to glean insights from
manufacturing sites around the world and drive digital innovation
within the company.

Solution
The company migrated from siloed data to a data lake on AWS,
building a modern data architecture with AWS Analytics services to
unlock the potential of the digital plant and using data to remove
manual processes and transform its manufacturing workstreams.

Result
• Employees gain access to data in minutes instead of months
• Optimizes the whole chain from supply to production,
delivery, and sales
• Lowers maintenance costs thanks to machinery maintenance data

AWS services used: AWS Lake Formation, AWS Glue, Amazon Athena, Amazon SageMaker

Performant and cost-effective



Performant and cost-effective

• Industry-leading choice of 200+ instance types to meet workload needs
• 100 Gbps bandwidth network interfaces for performance
• On-demand, Reserved, and Spot Instances to reduce costs
• Five highly available storage tiers and intelligent tiering



No compromises on performance, scale, and cost
• Amazon Redshift: Up to 3x better price performance than other cloud data warehouses; automated performance tuning and near-linear scaling
• Amazon EMR: Optimized runtimes that provide the best price-performance; 1.7x faster than standard Apache Spark and up to 2.6x faster than standard Presto
• Amazon Elasticsearch Service: The UltraWarm storage tier reduces costs by up to 90%; store 6x more log data without increasing costs
• Amazon S3: Amazon S3 Select retrieves a subset of data, so queries run up to 400% faster; Amazon S3 Intelligent-Tiering saves up to 70% on storage costs for data lakes
• Compute integration: With Graviton2 instances, customers save 25.7% on typical workloads
Reinventing
business intelligence



Amazon QuickSight
A scalable, embeddable, ML-powered BI service built for the cloud
• BI at scale: No servers to manage; pay-per-use billing
• Embedded analytics: Quickly embed dashboards in your apps
• ML-powered insights: Built-in anomaly detection and forecasting, plus written narratives that interpret your data for you



Amazon QuickSight Q (NEW, PREVIEW)
ML-powered natural language capability in Amazon QuickSight
• Enter business questions in the search bar and get answers in seconds
• ML generates data models that automatically understand meanings and relationships
• Not limited to asking only a specific set of questions



Erickson Living built a solution
on AWS for managing
infectious diseases

Challenge
Erickson Living needed a sophisticated way to track infectious
diseases among its resident and employee populations, particularly
during the COVID-19 pandemic.

Solution
Erickson Living built a solution from AWS Analytics services, including
Amazon Elasticsearch Service and Amazon QuickSight, for data
intake, contact tracing, and analytics.

Result
• Built its infectious disease management solution (IDMS) in 4 months
after attending the AWS Data Lab for help with the build
• Tracks events and conducts contact tracing
• Provided a single source of truth for data about its communities
• Uses data and analytics to inform corporate decision-making

AWS services used: Amazon Elasticsearch Service, Amazon Athena, AWS Glue, Amazon QuickSight, Amazon S3

Accelerate your predictive analytics & machine learning journey
Broadest and most complete set of machine learning capabilities

AI SERVICES
• Health AI: Amazon HealthLake (NEW), Amazon Transcribe Medical, Amazon Comprehend Medical
• Industrial AI: AWS Panorama + Appliance (NEW), Amazon Monitron (NEW), Amazon Lookout for Equipment (NEW), Amazon Lookout for Vision (NEW)
• Anomaly detection: Amazon Lookout for Metrics (NEW)
• Code and DevOps: Amazon DevOps Guru (NEW), Amazon CodeGuru
• Vision: Amazon Rekognition
• Speech: Amazon Polly, Amazon Transcribe (+Medical)
• Text: Amazon Comprehend (+Medical), Amazon Translate, Amazon Textract
• Search: Amazon Kendra
• Chatbots: Amazon Lex
• Personalization: Amazon Personalize
• Forecasting: Amazon Forecast
• Fraud: Amazon Fraud Detector
• Contact centers: Contact Lens and Voice ID for Amazon Connect

ML SERVICES
Amazon SageMaker, with SageMaker Studio IDE: label data, aggregate & prepare data, store & share features, Auto ML, Spark/R, detect bias, visualize in notebooks, pick algorithm, train models, tune parameters, debug & profile, deploy in production, manage & monitor, CI/CD, human review
• NEW: SageMaker JumpStart
• NEW: Model management for edge devices

FRAMEWORKS & INFRASTRUCTURE
Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Trainium, Inferentia, FPGA, Deep Graph Library



Amazon SageMaker: Built to make ML more accessible

Workflow steps, all within SageMaker Studio IDE: label data, collect and prepare data, store features, check data, visualize in notebooks, pick algorithm, train models, tune parameters, deploy in production, manage and monitor, CI/CD



Amazon Redshift ML (PREVIEW)
Create, train, and deploy machine learning (ML) models using familiar SQL commands
• Simple, optimized, and secure integration between Amazon Redshift and Amazon SageMaker
• Train and deploy an ML model using a SQL command in your data warehouse
• Embed predictions such as fraud detection, risk scoring, and churn prediction in queries and reports
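In Redshift ML, the SQL command that trains a model is `CREATE MODEL`. The helper below assembles a minimal statement from the clauses documented for Redshift ML at the time of writing: `FROM`, `TARGET`, `FUNCTION`, `IAM_ROLE`, and `SETTINGS (S3_BUCKET ...)`. The model, query, column, function, role, and bucket names are placeholders. Because the feature was in preview when this deck was written, verify the syntax against current documentation.

```python
def build_create_model(model: str, select_sql: str, target: str,
                       function: str, iam_role_arn: str, s3_bucket: str) -> str:
    """Return a minimal Redshift ML CREATE MODEL statement (placeholders only)."""
    return (
        f"CREATE MODEL {model}\n"
        f"FROM ({select_sql})\n"          # training data
        f"TARGET {target}\n"              # column the model learns to predict
        f"FUNCTION {function}\n"          # SQL function created for inference
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"  # bucket for intermediate artifacts
    )


if __name__ == "__main__":
    print(build_create_model(
        model="customer_churn_model",
        select_sql="SELECT age, tenure_months, monthly_spend, churned FROM customer_activity",
        target="churned",
        function="predict_churn",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
        s3_bucket="example-redshift-ml-bucket",
    ))
```

Once training finishes, embedding predictions in a report is an ordinary query that calls the generated function. For example, `SELECT customer_id, predict_churn(age, tenure_months, monthly_spend) FROM customer_activity;` uses the hypothetical names above.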



Lake House architecture on AWS
[Diagram: Amazon S3 at the center, surrounded by Amazon Aurora, Amazon EMR, Amazon DynamoDB, Amazon Athena, Amazon Elasticsearch Service, Amazon SageMaker, and Amazon Redshift]
• Scalable data lakes
• Purpose-built data services
• Seamless data movement
• Unified governance
• Performant and cost-effective
Accelerating innovation with AWS analytics

• Purpose-built data services: Amazon EMR on Amazon EKS, Amazon EMR Studio, Amazon Kinesis Data Streams long-term retention, Amazon Redshift data sharing, Amazon Redshift ML
• Seamless data movement: AWS Glue Schema Registry, AWS Glue DataBrew, AWS Glue Elastic Views
• Unified governance: AWS Lake Formation transactions, row-level security, and acceleration
• Performant and cost-effective: Amazon Redshift automated performance tuning, Amazon EMR performance improvements, AQUA for Amazon Redshift



Want to build a data vision and strategy?
• Joint engagements with business and technology stakeholder alignment
• Create an organizational vision for innovation with data to drive business outcomes
• Define the first pilot, learn, and build
• Jumpstart the data flywheel

Have a strategy and need help executing it?
• Joint engineering engagements between customers and AWS technical resources
• Create tangible deliverables to accelerate strategic databases, analytics, and ML initiatives
• Leave with an architecture, working prototype, path to production, and deeper knowledge of AWS services
• Come with an idea, leave with a solution



Thank you



OneFootball built a data lake in
days using AWS Lake Formation

Challenge
OneFootball needed to ensure its various teams could easily
access data needed to make informed business decisions and
build and test machine learning models.

Solution
The company used AWS Lake Formation to rapidly build a data lake
and enable self-service analytics, increasing data availability to
internal staff and end users and reducing technical debt.

Result
• Increased data coverage from relevant backend databases
from 30% to 60%
• Increased usage of analytics platform by 40%
• Cut time needed to request and receive data insights from
4–6 weeks to 2 days

AWS services used: AWS Lake Formation, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis Data Streams, Amazon S3
