DataAnalytics AWS PDF
AWS Partners: Data Analytics on AWS – Technical
Nidhi Seth
Course objectives
In this course, you will learn how to:
• Identify Amazon Web Services (AWS) services in the AWS analytics stack
• Describe decision points and technology selections for data analytics architectures
• Design highly available and fault-tolerant serverless data analytics architectures
• Discuss the AWS Data Pipeline and the customer data analytics journey using the Data Flywheel
• Describe five AWS data analytics technical solutions:
  • Modernizing a data warehouse with Amazon Redshift
  • Data lakes
  • Streaming data
  • Data governance
  • Machine learning (ML)
• Identify technical engagement strategies and best practices for delivering a proof of concept (POC)
• Locate and use AWS Partner Network (APN) Partner resources for opportunities and training
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Module 3: AWS Data Analytics Solutions – Part I
Module 6: APN Partner Opportunities and Resources
Break
Module 2: AWS Data Analytics Portfolio
Objectives
Customer challenges and opportunities for APN Partners
New realities
https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#51efb027578b
http://newvantage.com/wp-content/uploads/2017/01/Big-Data-Executive-Survey-2017-Executive-Summary.pdf
Common data analytics challenges
What challenges do you see when using big data analytics/technologies? (n=545)
https://bi-survey.com/challenges-big-data-analytics
AWS data analytics portfolio overview
Secure infrastructure for analytics
• AWS Identity and Access Management (IAM)
• AWS Certificate Manager Private Certificate Authority (ACM Private CA)
• AWS Artifact
• Amazon GuardDuty
• Amazon Inspector
• AWS Shield
• AWS Key Management Service (AWS KMS)
• AWS Single Sign-On
• AWS CloudHSM
• AWS Well-Architected Tool
• Encryption at rest
• Encryption in transit
• Bring your own keys, hardware security module (HSM) support
• Amazon Cloud Directory
• Amazon Macie
• Amazon Cognito
• AWS Directory Service
• Amazon Virtual Private Cloud (Amazon VPC)
• AWS CloudTrail
• AWS Organizations
AWS data analytics portfolio
Data visualization, engagement, and machine learning
AWS Data Exchange | Amazon QuickSight | Amazon Pinpoint | Amazon SageMaker | Amazon Comprehend | Amazon Polly | Amazon Lex | Amazon Rekognition | Amazon Translate
Analytics
Amazon Redshift | Amazon EMR (Spark and Presto) | AWS Glue (Spark and Python) | Amazon Athena | Amazon Elasticsearch Service | Amazon Kinesis Data Analytics
Data lake
Amazon Simple Storage Service (Amazon S3) & Amazon S3 Glacier | AWS Lake Formation | AWS Glue
Data movement
AWS Database Migration Service (AWS DMS) | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams | Amazon Managed Streaming for Apache Kafka
Data movement services
Help customers move data from on premises to the cloud
Data lake services
Analytics services
Help customers extract value out of their data
Data visualization, engagement, and machine learning services
Help customers understand and visualize their data, and use machine learning (ML) for advanced analytics and predictions
AWS value proposition
Standards, formats, and open source
Amazon EMR: Spark
Amazon ES: Logstash, Kibana
Managed Streaming for Apache Kafka
Data analytics pipeline
Data management challenges
Data analytics pipeline
Collect → Store → Process and analyze → Visualize
Data → Insights
Time-to-answer (latency)
Balance of throughput and cost
https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html
Data pipeline challenges
Building a data pipeline is challenging. Customers must:
• Manage updates, patches, and software integrations
• Handle increased overhead costs plus need for support
• Maintain focus on the core task of building applications that lead to data insights
AWS data analytics pipeline services
Collect: Amazon Kinesis Data Firehose | AWS Snowball | AWS Database Migration Service
Store: Amazon S3 | Amazon S3 Glacier | Amazon Aurora
Process and analyze: Amazon EMR | Amazon Athena
Visualize: Amazon QuickSight
Automate: AWS Glue
Data Flywheel
Data Flywheel and customer journey
Migrate data and workloads to the cloud
✓ Save time
✓ Save costs
✓ Agility
✓ Global distribution
✓ Scale and performance
Store and manage data
Modernize data warehouse and build a data lake
✓ New and faster insights
✓ Broader access to analytics
Build data-driven applications
Innovate with machine learning
✓ Better experiences
✓ Deeper engagement
✓ Efficient processes
Attract new customers
Generate more data
https://pages.awscloud.com/data-flywheel.html
Summary
Module 3: Data Analytics Solutions on AWS – Part I
Objectives
In this module, you will learn how to:
• Explain data migration options from on premises to the AWS Cloud
• Describe two AWS data analytics technical solutions:
  • Modernizing a data warehouse with Amazon Redshift
  • Data lakes
Evolution of data architecture
Data migration options
Journey to a modern data architecture
Evolution of data architecture
Types of data
AWS data migration options
• AWS Direct Connect
• AWS Storage Gateway
• Amazon S3 Transfer Acceleration
• AWS Snowball
• Amazon Kinesis Data Firehose
• AWS Database Migration Service
Solution 1: Modernizing a data warehouse with Amazon Redshift
Journey to a modern data architecture
Evolution of data architecture
Types of data
Data warehouses
Data warehouse defined
Central repository of curated data from different sources
• Data optimized for reporting and data analysis
• Data extracted, cleaned, transformed,
Diagram: Source 1, Source 2 (database), Source 3 (database) → Extract → Staging area → Data warehouse
Benefits
• Better decision making
• Consolidated data from many sources
• Improved data quality, consistency, and accuracy
• Access to historical intelligence
• Improved performance
Data warehouse concepts: https://aws.amazon.com/data-warehouse/
OLTP and OLAP comparison
Characteristics | Online Transactional Processing (OLTP): Relational Database | Online Analytical Processing (OLAP): Data Warehouse
Data Source | Application create, read, update, delete (CRUD), origin | OLTP and secondary source
Workloads | SQL INSERT, UPDATE, DELETE – short and fast queries | ETL focused, batch job to import, JOINs, run complex queries
Database Design | Highly normalized, many distinct tables to reduce duplication | Denormalized using fewer tables in star and snowflake schema with duplicated data for fast performance
Database Size | Depends on the amount of data, typically from MB to TB in size | Growth over time, typically ranges from TB to PB in size
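The OLTP and OLAP contrast above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for both the transactional database and the data warehouse, and the table and column names (customers, orders, sales_fact) are hypothetical and do not come from the course.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny normalized OLTP schema and a denormalized warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        CREATE TABLE sales_fact (order_id INTEGER, region TEXT, amount REAL);
        """
    )
    # OLTP-style workload: short, fast single-row writes against normalized tables.
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "east"), (2, "west")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5)],
    )
    # Batch ETL step: join once and load the result in a denormalized shape,
    # duplicating the region on every fact row.
    conn.execute(
        """
        INSERT INTO sales_fact
        SELECT o.id, c.region, o.amount
        FROM orders o JOIN customers c ON c.id = o.customer_id
        """
    )
    conn.commit()
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> dict:
    """OLAP-style workload: an aggregate query over the fact table."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())
```

Calling revenue_by_region(build_demo()) aggregates over the denormalized table without repeating the join at query time, which is the trade-off the Database Design row describes.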
Traditional architecture and on-premises data warehouse challenges
• Difficult to scale
• Long lead times for hardware procurement
• Complex upgrades are the norm
• High overhead costs for administration
• Expensive licensing and support costs
• Proprietary formats do not support newer open data formats, which results in data silos
• Data not cataloged, unreliable quality
• Licensing cost limits number of users and how much data can be accommodated
• Difficult to integrate with services and tools
Amazon Redshift
Secure data warehouse that extends seamlessly to a data lake
• Breaks a large job into smaller tasks, then distributes the tasks to multiple compute nodes. Result: Faster processing time
• Data from each column is stored together so the data can be accessed faster, without scanning and sorting all other columns. Result: Compression of stored data improves performance
• Independent and resilient nodes without any dependencies. Result: Improves scalability
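The first two ideas above (splitting a job across workers, and storing each column together) can be illustrated with a small pure-Python sketch. It is not how Amazon Redshift is implemented: threads stand in for compute nodes, run-length encoding stands in for whatever column compression the service applies, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of repeated values into [value, count] pairs.

    Values from one column tend to repeat, which is why storing a
    column together makes this kind of compression effective.
    """
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


def parallel_sum(values, workers=2):
    """Split one job into slices, sum each slice on a worker, then combine."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    slices = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, slices))
```

A query that only needs the amount column would read to_columns(rows)["amount"] and never touch the other columns, then hand slices of that list to parallel_sum.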
Amazon Redshift architecture
Leader node
https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
Amazon Redshift differentiating features
• Federated query
• Amazon Redshift lake house architecture
Federated query
Integrate queries on live data in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL with queries on Amazon Redshift and Amazon data lake
Benefits
• Incorporate live data into business intelligence (BI) and reporting applications
• Ingest data into Amazon Redshift
• Query operational databases directly
• Apply transformations on the fly
• Load data into target tables without complex ETL pipelines
Diagram sources: OLTP | ERP | CRM | LOB
Amazon Redshift lake house architecture
Amazon Redshift lake house queries are run by a fleet of nodes that are owned and maintained by AWS.
https://aws.amazon.com/redshift/lake-house-architecture/
1 Query from SQL clients, business intelligence tools:
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…
2 JDBC/ODBC
9 Result is sent to client.
AWS SCT data extractors
Extract data from your data warehouse and migrate to Amazon Redshift
• Extracts data through local migration agents
• Data is optimized for Amazon Redshift and saved in local files
• Files are loaded to an Amazon S3 bucket (through network or AWS Snowball Edge) and then to Amazon Redshift
Microsoft SQL Server
Solution
Equinox migrated from a legacy data warehouse to Amazon Redshift
to combine data from disparate sources like clickstream data, cycling
log data, club management software, and more. They land data
directly in an Amazon S3 data lake and perform analytics using
Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR.
Result
Their monthly Amazon Redshift bill is now 20% of prior yearly
maintenance of their legacy data warehouse. AWS data lake and
analytics reduced report delivery time from months to days.
Diagram sources: Club management software | Cycling logs | Clickstream | Social
Diagram components: Maximilian (ELT scripts) | Amazon S3 | Amazon Redshift | Spark on Amazon EMR | Amazon Athena | Amazon EMR | Equinox applications | Third-party applications
• Migrated from Teradata data warehouse
• Built a data warehouse with Amazon Redshift and data lake with Amazon S3
• Analytics on data lake with Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR
• Increased user productivity to move faster
• Amazon Redshift costs approximately 20% of original Teradata maintenance and support
• Report time reduced from months to days
Solution 2: Data lakes
Journey to a modern data architecture
Evolution of data architecture
Types of data
Data lakes defined
Architectural approach for a centralized enterprise data repository stored on Amazon S3
• Stores all structured, semi-structured, unstructured, and binary data at unlimited scale
• Holds curated and raw data
• Uses AWS data analytics tools for analytics
• Increases pace of innovation by extracting insights from data
• Enables more organizational agility
• Reduces cost and delivers results with predictive analytics and ML
Diagram: Data lake (open formats, central catalog) serving machine learning, business intelligence and analytics, and data warehousing
Secure data lake on Amazon S3
Amazon FSx for Lustre
https://aws.amazon.com/compliance/services-in-scope
Reference architecture: Data lake on AWS
Catalog and search: AWS Glue | Amazon DynamoDB | Amazon ES
Access and user interface: Amazon API Gateway | IAM | Amazon Cognito
Cleansing data
After migration, data still presents challenges:
Job workflow
AWS Glue crawlers
AWS Glue crawler: AWS IAM role | Built-in classifiers
Databases (JDBC connection): MySQL | MariaDB | PostgreSQL | Amazon Aurora | Oracle
Amazon Athena | Amazon Redshift lake house | Amazon EMR
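The idea behind a classifier, reading sample data and proposing a table schema, can be sketched in a few lines of Python. This is a simplified illustration and not the AWS Glue implementation: the function names are hypothetical, only CSV text is handled, and the type labels (bigint, double, string) are chosen for the example.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type label that fits every non-empty value in a column."""

    def fits(cast):
        for value in values:
            if value == "":
                continue  # treat empty cells as missing, not as a type conflict
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if fits(int):
        return "bigint"
    if fits(float):
        return "double"
    return "string"


def infer_schema(csv_text):
    """Return {column_name: type_label} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_type([row[name] for row in rows])
        for name in reader.fieldnames
    }
```

A catalog entry built from such a result is what lets query engines treat files in Amazon S3 as tables.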
Use case: Log aggregation with ETL
Amazon Athena
Interactive query service to analyze data in Amazon S3 using standard SQL
• Zero setup costs, point to Amazon S3 and start querying
• Pay only for queries run, save 30%–90% on per-query costs through compression
• ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
• Serverless, zero infrastructure, zero administration, integrated with Amazon QuickSight
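The link between compression and per-query savings follows from paying for the data a query scans. A minimal arithmetic sketch, assuming a price of 5.00 USD per TB scanned (an assumption for illustration; check current pricing for your Region) and hypothetical function names:

```python
TB = 1024 ** 4  # bytes per terabyte (binary definition, assumed for this sketch)


def query_cost(bytes_scanned, price_per_tb=5.0):
    """Cost of one query when billing is proportional to bytes scanned.

    price_per_tb is an assumed example value, not a quoted price.
    """
    return bytes_scanned / TB * price_per_tb


def savings_fraction(raw_bytes, compressed_bytes):
    """Fraction of scan cost saved by querying the compressed copy instead."""
    return 1 - compressed_bytes / raw_bytes
```

For example, if compression shrinks a dataset to 30% of its raw size, savings_fraction returns about 0.7, which falls inside the 30%–90% range quoted above.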
AWS Lake Formation
Challenges of building a secure data lake
1 Set up storage
2 Move data
3 Cleanse, prepare, and catalog data
4 Configure and enforce security and compliance policies
5 Make data available for analytics
1 Ingest and organize: Automates creating data lake and data ingestion.
2 Secure and control: Sets up fine-grained access control and data governance. To protect data, all access is checked against set policies.
3 Collaborate and use: Search and data discovery using Data Catalog metadata.
4 Monitor and audit: Based on data access and governance policies, alert notifications are raised on policy violation and logged.
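The "secure and control" and "monitor and audit" steps above can be pictured as a policy check that runs on every access and records its decision. The sketch below is a toy model under stated assumptions, not the AWS Lake Formation API: Grant, PolicyStore, and their methods are hypothetical names, and permissions are reduced to column-level read grants.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """A hypothetical column-level read grant for one principal on one table."""
    principal: str
    table: str
    columns: frozenset


@dataclass
class PolicyStore:
    grants: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def grant(self, principal, table, columns):
        self.grants.append(Grant(principal, table, frozenset(columns)))

    def check(self, principal, table, columns):
        """Allow access only if every requested column is covered by a grant."""
        allowed = set()
        for g in self.grants:
            if g.principal == principal and g.table == table:
                allowed |= g.columns
        ok = set(columns) <= allowed
        # Every decision is logged so that violations can be reviewed later.
        decision = "ALLOW" if ok else "DENY"
        self.audit_log.append((principal, table, tuple(columns), decision))
        return ok
```

Keeping the grants in one store, rather than scattered across each analytics tool, is the centralization that the slide describes.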
AWS Lake Formation builds on AWS Glue
AWS Lake Formation
AWS Glue: AWS Glue ETL jobs | AWS Glue crawlers | Connections, databases, tables
AWS Lake Formation benefits
• Centralized management of fine-grained permissions empowers security officers.
• Simplified ingest and cleaning enables data engineers to build faster.
AWS Lake Formation: AWS Glue | Blueprints | ML Transforms | Data Catalog | Access control
Amazon QuickSight
BI service built for the cloud with pay-per-session pricing and ML insights
Diagram components: Amazon RDS | Amazon EMR | Amazon S3 | Other databases | On-premises data | Streaming data | AWS Glue crawler | AWS Glue Data Catalog | Amazon Redshift Spectrum | Amazon QuickSight
Use case: COVID-19 pandemic
Use case: COVID-19 data lake architecture
Tableau: COVID-19 data platform
• Publish and update data products with AWS Data Exchange
• Upload to Amazon S3 (AWS Cloud)
• Define Athena schema
• Connect to S3 data with Amazon Athena connector in Tableau
• Visualization for users (Tableau desktop)
https://d2908q01vomqb2.cloudfront.net/77de68daecd823babbb58edb1c8e14d7106e83bb/2020/05/29/COVID-19-AWS-Tableau-1.1.jpg
Summary
Activity: Serverless Data Lake Lab Demonstration
Step 1: Serverless data lake architecture
Pipeline (left to right): Amazon S3 | AWS Lambda | AWS Glue (crawler) | Amazon CloudWatch | AWS Lambda | AWS Glue | Amazon S3
Supporting services: Amazon SQS | Amazon SNS (email notification) | Amazon CloudWatch
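The first hop in an architecture like the one above is typically a function reacting to new objects in a bucket. The sketch below assumes the function receives a standard Amazon S3 event notification payload; the handler logic (collecting key prefixes that would need crawling) is a hypothetical illustration and is not taken from the lab. It makes no AWS calls, so it can be run locally.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification (spaces as '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point: report which prefixes received new data."""
    prefixes = set()
    for _bucket, key in extract_s3_objects(event):
        prefix = key.rsplit("/", 1)[0] if "/" in key else ""
        prefixes.add(prefix)
    return {"prefixes_to_crawl": sorted(prefixes)}
```

In a real deployment the returned prefixes would be passed to whatever starts the crawler or ETL job; that step is left out here because it depends on the lab's own configuration.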
Module 4: AWS Data Analytics Solutions – Part II
Objectives
In this module, you will learn about three key types of data analytics
technical solutions on AWS:
• Streaming and real-time analytics with Amazon Kinesis
• Data governance
• Extended solution: Insights and monetization with machine learning (ML)
Evolution of data architecture
Solution 3: Streaming and real-time analytics with Amazon Kinesis
Journey to a modern data architecture
Evolution of data architecture
Common use cases: Real-time analytics
The value of data diminishes over time
Enabling real-time analytics
Data streaming technology enables a customer to ingest, process, and analyze high
volumes of high-velocity data from a variety of sources, in real time.
Data streaming solution challenges
AWS streaming data solutions
Efficiently collect, process, and analyze data streams in real time
Data generators: Simple streaming data patterns
Data producers → Streaming services → Data consumers
Amazon Kinesis Data Streams
Massively scalable, highly durable data ingestion and processing service
optimized for real-time data streaming
• Data collected is available within 70 milliseconds
• Real-time analytics: dashboards, anomaly detection, dynamic pricing
• Synchronously replicates data across 3 Availability Zones in a Region
• Data can be stored up to 365 days
• Serverless, can scale dynamically to handle MB to TB each hour and thousands to millions of PutRecords each second
• No upfront cost, low, pay-as-you-go pricing
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
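Scaling a stream relies on spreading records across shards by partition key. Kinesis Data Streams documentation describes hashing the partition key with MD5 into a 128-bit integer and routing the record to the shard whose hash key range contains it. The sketch below illustrates that routing under two assumptions that are mine, not the course's: the digest is read as a big-endian integer, and the shards split the hash space evenly. Function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # byte order is an assumption here


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range holds the key, assuming evenly split ranges."""
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs the remainder when the space does not divide evenly.
    return min(hash_key(partition_key) // range_size, shard_count - 1)
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which preserves per-key ordering but can create hot shards if a few keys dominate the traffic.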
How Kinesis Data Streams works
Input: Capture and send data
Amazon Kinesis Data Streams: Ingest and store data streams for processing
Processing shown: Amazon Kinesis Data Analytics | Amazon EC2 | AWS Lambda
Output: Analyze streaming data using BI tools
How Kinesis Data Firehose works
Capture and send data → Prepares and loads data continuously to the selected destinations → Analyze streaming data using analytics tools
Destinations shown: Amazon S3 | Amazon Elasticsearch Service | Splunk
Durably store the data for analytics
Kinesis Data Streams and Kinesis Data Firehose
Characteristics | Amazon Kinesis Data Streams | Amazon Kinesis Data Firehose
Stream storage and duration | In shards, default 24 hours and up to 7 days | Max buffer size 128 MB and max time 900 seconds
Data transformation and conversion | None | Uses AWS Lambda and AWS Glue
Data producer (both services) | Amazon Kinesis Agent, applications using Amazon Kinesis Producer Library (KPL), AWS SDK for Java, Amazon CloudWatch Logs and CloudWatch Events, AWS IoT
Data consumer | AWS Lambda, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, applications using the Kinesis Client Library (KCL) and SDK for Java | AWS Lambda, Amazon Kinesis Data Analytics, and Kinesis Data Firehose, apps using the KCL and SDK for Java, Amazon S3, Amazon Redshift, Amazon ES, Splunk, and Amazon Kinesis Data Analytics
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
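The buffer size and time limits in the table describe a flush rule: deliver a batch when either the buffered bytes or the buffer age reaches its limit. The sketch below models that rule locally, using the 128 MB and 900 second maximums from the table as defaults. The class and its methods are hypothetical and are not the Kinesis Data Firehose API; time is passed in explicitly so the behavior is easy to test.

```python
class BufferedDelivery:
    """Toy model of size-or-time buffering before delivery to a destination."""

    def __init__(self, max_bytes=128 * 1024 * 1024, max_seconds=900):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # time the current buffer received its first record
        self.delivered = []    # each entry is one batch written to the destination

    def put(self, record: bytes, now: float):
        """Add a record to the buffer, then flush if a limit has been reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush(now)

    def maybe_flush(self, now: float):
        """Deliver the buffer when it is large enough or old enough."""
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.delivered.append(list(self.records))
            self.records, self.size, self.opened_at = [], 0, None
```

Larger limits mean fewer, bigger objects at the destination at the cost of higher delivery latency, which is why this style of delivery is described as near real time rather than real time.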
When to use Kinesis Data Streams and Kinesis Data Firehose
For data streaming applications that require near real-time responses in seconds
Amazon Kinesis Data Analytics
Input → Amazon Kinesis Data Analytics → Output
Kinesis data analytics application details
Use case: Clickstream analytics
Evolve from batch processing to real-time analytics
1. Websites send clickstream data
2. Collects the data and sends to Kinesis Data Analytics
3. Processes data in near-real time
4. Loads processed data into Amazon Redshift
5. Runs analytics models to identify content recommendations
6. Readers see personalized content suggestions and increase engagement
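The near-real-time processing step in a clickstream flow like this one often comes down to windowed aggregation. A minimal sketch of a tumbling window count in plain Python follows; it illustrates the technique only, the function names and the (timestamp, page) event shape are hypothetical, and a streaming service would apply the same logic continuously rather than over a finished list.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count clicks per page in fixed, non-overlapping time windows.

    events is an iterable of (timestamp_seconds, page) pairs.
    Returns {window_start_seconds: Counter({page: clicks})}.
    """
    windows = defaultdict(Counter)
    for timestamp, page in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][page] += 1
    return dict(windows)


def top_page(counts):
    """Most-clicked page within one window."""
    return counts.most_common(1)[0][0]
```

The per-window counts are the kind of processed output that could then be loaded into a warehouse table and fed to a recommendation model.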
Streaming data analytics architecture
Diagram components (numbered 1 to 5 on the slide):
• Millions of data sources and Kinesis enabled applications
• Kinesis Data Streams, with fan-out to additional Kinesis Data Streams
• Kinesis Data Analytics
• Kinesis Data Firehose
• AWS Lambda
• Amazon Simple Notification Service (alerts, notifications)
• Amazon S3 data lake (logs and processed data)
• Amazon RDS | Amazon Redshift | DynamoDB | Amazon Elasticsearch Service
• Machine learning | Data science applications | Downstream applications | Reporting
Solution 4: Data governance
Journey to a modern data architecture
Evolution of data architecture
• Securing data
• Auditing data usage
• Managing data access
• Safeguarding sensitive data and PII
• Maintaining regulations and mandates
Data security and governance
• 34% of IT decision makers cite ensuring data governance/privacy as one of their organization’s biggest digital transformation challenges
• 37% of IT decision makers cite ensuring security/compliance upon movement of data as one of their most important IoT priorities over the next 18–24 months
More than one in three companies cite data privacy and governance as a hurdle to both digital transformation and IoT initiatives
© ENTERPRISE STRATEGY GROUP, 2019.
https://www.esg-global.com/hubfs/ESG-Infographic-IT-Spending-Intentions-2019.pdf
Resolving PII dangers
• Do these issues need to be resolved?
• Is there a solution architecture that solves all PII issues?
• What best practices can be used to mitigate PII dangers?
Dangers to personally identifiable information (PII): Consumer consent violation | External hacking | Data breach | Second-party misuse | Spyware | Unsecured devices | Espionage | Rogue agents
Amazon Macie
• Amazon Macie: Enable Amazon Macie with one click in the AWS Management Console or with a single API call
• Continually evaluate Amazon S3 environment: Automatically generates an inventory of Amazon S3 buckets and details on the bucket-level security and access controls
• Discover sensitive data: Analyzes buckets using ML and pattern matching to discover sensitive data, like PII
  • Financial
  • Personal
  • National
  • Medical
  • Credentials and secrets
• Take action: Generates findings and sends them to Amazon CloudWatch Events for integration into workflows and remediation actions
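The pattern-matching half of sensitive data discovery can be illustrated with regular expressions. This sketch is not how Amazon Macie works internally and covers only two simplified patterns (an email address and a US Social Security number in NNN-NN-NNNN form); the pattern set and function name are hypothetical, and real detectors use far more robust rules plus ML.

```python
import re

# Simplified example patterns; real-world detection needs stricter validation.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings
```

Each finding would correspond to an entry that a governance workflow can route for remediation, for example by masking the field before the data enters a data lake.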
De-identified data lake (DIDL) on AWS
A de-identified data lake (DIDL) is an architectural approach that reduces the risks
associated with managing data, particularly personally identifiable information (PII).
Benefits
Reduce risk
• Remove PII before it enters a data lake
Understand all the data
• Create a Data Catalog of an entire data lake
Reduce compliance costs
• Automate the discovery, classification, de-identification,
and ongoing monitoring of data across an organization
Turn data into an asset, not a liability
• Enable a broader set of governed analytic and machine learning use cases
Masking PII data
Email ID Name, SSN, State
NA 3344325 It is expected to rain tomorrow
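One common way to mask PII fields such as an email ID or SSN is to replace each value with a keyed hash, so the original cannot be read but equal values still map to the same token. The sketch below is one possible approach under stated assumptions, not the method used in the course: the function names, the 16-character token length, and the sample field names are all hypothetical, and a production system would manage the secret key in a key management service.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace a value with a keyed SHA-256 token (truncated for readability)."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_record(record: dict, pii_fields: set, secret: bytes) -> dict:
    """Return a copy of the record with the listed PII fields pseudonymized."""
    return {
        key: pseudonymize(str(value), secret) if key in pii_fields else value
        for key, value in record.items()
    }
```

Non-PII fields, such as free text with no identifiers, pass through unchanged, so the masked record remains usable for analytics.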
Extended solution 5: Insights and monetization with ML on AWS
Journey to a modern data architecture
Evolution of data architecture
Diagram: Business analytics | AI and machine learning | Data warehouse | Big data processing | Interactive queries | Real time | Data Catalog | Data lake
Machine learning requires:
• More data: Collect all types of data
• Flexibility: Define schema during analysis
• Scalability: Scale storage and compute (CPU or GPU) independently
• Data transformation and processing: Run a broad set of processing and analytics on the
• Security: Networking, identity, encryption, and compliance
https://aws.amazon.com/machine-learning/?nc2=h_ql_prod_ml
Machine learning resources
AWS STP: Machine Learning (ML) on AWS for ML Practitioners - Technical
https://partnercentral.awspartner.com/LmsSsoRedirect?RelayState=%2flearningobject%2fcurriculum%3fid%3d25521
AWS Foundations: How Amazon SageMaker Can Help
• Fundamental digital course on how SageMaker mitigates the core challenges of implementing an ML pipeline
• Duration: 30 minutes
• https://www.aws.training/Details/Video?id=49646
Practical Data Science with Amazon SageMaker
• Learn to solve real-world use cases with machine learning (intermediate)
• Duration: 1 day
• https://www.aws.training/SessionSearch?pageNumber=1&courseId=40748
The Machine Learning Pipeline on AWS
• Explore how to use the machine learning pipeline to solve a real business problem (intermediate)
• Duration: 4 days
• https://www.aws.training/SessionSearch?pageNumber=1&courseId=38910
Summary
Module 5: AWS Technical Conversations and Engagement
Technical engagement conversations using the Data Flywheel
The Data Flywheel
2. Move and manage all workloads in the cloud
5. Innovate with machine learning
Conversations using the Data Flywheel
1. Move and store data in the cloud
2. Move and manage all workloads in the cloud
3. Build data-driven apps
4. Analyze with data lake architectures
5. Innovate with machine learning
Architecture and data migration: APN Partner best practices
Do | Avoid
Discounting and funding programs
AWS Data and Analytics Competency categories
• Data Analytics Platforms: Provide a set of integrated tools to solve data analytics challenges within a standard framework
• Business Intelligence (BI) and Data Visualization: Help customers turn raw data into actionable business information, such as reporting, dashboards, and data visualization
Collaboration workflow
• Register an opportunity on APN Partner Central
• Receive approval from AWS PSM
• Engage AWS sales
• Engage AWS account or Partner SA
• Before SA involvement
AWS data analytics solutions and Immersion Days
AWS Immersion Days
Designed to help APN Advanced and Premier Consulting Partners deliver technical data analytics workshops to their customers and help grow their businesses
• Database Migration Immersion Day: Give your customers a head start with the AWS Database Migration Service and the Schema Conversion Tool
• Data Engineering Immersion Day: Build a serverless data lake solution on AWS including modules focusing on ingestion, hydration, exploration, and consumption
• Amazon EMR Immersion Day: Focus on unique facets of Amazon EMR for big data workloads
Benefits: Access to technical workshop content, AWS usage credits, Market Development Funds (MDF) opportunities, and support from AWS teams
https://aws.amazon.com/partners/immersion-days/
AWS Certified data analytics and learning resources
AWS Technical Professional Learning Path
AWS Certified Data Analytics – Specialty
https://aws.amazon.com/certification/certified-data-analytics-specialty/
Partner Cast: Analytics
Thank you!
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-[email protected]. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.