
DataAnalytics AWS PDF

Uploaded by

bhanu ac
Copyright
© All Rights Reserved

We will begin the session at 9:05 am IST.

Please join using the email id used for registration to enable us to mark
your attendance.
AWS Partners: Data Analytics on AWS –
Technical
Nidhi Seth
[email protected]
Course objectives
In this course, you will learn how to:
• Identify Amazon Web Services (AWS) services in the AWS analytics stack
• Describe decision points and technology selections for data analytics architectures
• Design highly available and fault-tolerant serverless data analytics architectures
• Discuss the AWS Data Pipeline and the customer data analytics journey using the Data Flywheel
• Describe five AWS data analytics technical solutions:
• Modernizing a data warehouse with Amazon Redshift
• Data lakes
• Streaming data
• Data governance
• Machine learning (ML)
• Identify technical engagement strategies and best practices for delivering a proof of concept
(POC)
• Locate and use AWS Partner Network (APN) Partner resources for opportunities and training
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda

Module 1: Course Introduction
Module 2: AWS Data Analytics Stack Portfolio
Break
Module 3: AWS Data Analytics Solutions – Part I
- Data lake solution
Break
Module 4: AWS Data Analytics Solutions – Part II
Break
Module 5: Technical Engagement Strategies
Module 6: APN Partner Opportunities and Resources
Module 2: AWS Data Analytics
Portfolio
Objectives

In this module, you will learn how to:


• Understand customer challenges related to data analytics in their business
• Provide a technical overview of AWS data analytics portfolio
• Discuss technical advantages and position of data analytics solutions on AWS
• Explain how to build a data analytics pipeline
• Explain the Data Flywheel

Customer challenges and
opportunities for APN Partners

New realities

• Explosion of data: connected devices, apps, and systems generate more data than ever before.
• Demand is growing for faster decision making on real-time data.
• Pay-as-you-go pricing allows organizations to analyze data to gain insights.

By making 10% more data accessible, a typical Fortune 1000 company will see a $65 million increase in net income.*
*Source: Forbes Online; New Vantage Partners - Big Data Executive Survey
https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#5062b36b578b
Customers need your help

85% of businesses want to be data driven, but only 37% have been successful.

https://www.forbes.com/sites/cognitiveworld/2019/02/06/data-the-fuel-powering-ai-digital-transformation/#51efb027578b

http://newvantage.com/wp-content/uploads/2017/01/Big-Data-Executive-Survey-2017-Executive-Summary.pdf
Common data analytics challenges
What challenges do you see when using big data analytics/technologies? (n=545)

• Inadequate analytical know-how in our company: 53%
• Data privacy issues (safety of personal data): 49%
• Inadequate technical know-how in our company: 48%
• Data security (unauthorized access to company data): 48%

The top four challenges involve knowledge, skill, security, and privacy.

This is your opportunity.

https://bi-survey.com/challenges-big-data-analytics
AWS data analytics portfolio
overview

Secure infrastructure for analytics

Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake.

Security
• Amazon GuardDuty
• AWS Shield
• AWS Well-Architected Tool
• Amazon Macie
• Amazon Virtual Private Cloud (Amazon VPC)

Identity
• AWS Identity and Access Management (IAM)
• AWS Single Sign-On
• Amazon Cloud Directory
• AWS Directory Service
• AWS Organizations

Encryption
• AWS Certificate Manager Private Certificate Authority (ACM Private CA)
• AWS Key Management Service (AWS KMS)
• Encryption at rest
• Encryption in transit
• Bring your own keys, hardware security module (HSM) support

Compliance
• AWS Artifact
• Amazon Inspector
• AWS CloudHSM
• Amazon Cognito
• AWS CloudTrail
AWS data analytics portfolio
Data visualization, engagement, and machine learning
AWS Data Exchange | Amazon QuickSight | Amazon Pinpoint | Amazon SageMaker | Amazon Comprehend | Amazon Polly | Amazon Lex | Amazon Rekognition | Amazon Translate

Analytics
Amazon Redshift | Amazon EMR (Spark and Presto) | AWS Glue (Spark and Python) | Amazon Athena | Amazon Elasticsearch Service | Amazon Kinesis Data Analytics

Data lake infrastructure and management
Amazon Simple Storage Service (Amazon S3) & Amazon S3 Glacier | AWS Lake Formation | AWS Glue

Data movement
AWS Database Migration Service (AWS DMS) | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams | Amazon Managed Streaming for Apache Kafka
Data movement services
Help customers move data from on premises to the cloud

AWS DMS | AWS Snowball | AWS Snowmobile
Amazon Kinesis Data Streams | Amazon Kinesis Data Firehose | Amazon Managed Streaming for Kafka
Data lake services

Customers are constrained by volume, variety, veracity, and velocity of on-premises data, and data silos pose a major challenge.

Amazon S3 | Amazon S3 Glacier | AWS Lake Formation | AWS Glue
Analytics services
Help customers extract value out of their data

Amazon Redshift | Amazon EMR | AWS Glue
Amazon Athena | Amazon ES | Amazon Kinesis Data Analytics
Data visualization, engagement, and
machine learning services
Help customers understand and visualize their data, and use
machine learning (ML) for advanced analytics and predictions

AWS Data Exchange | Amazon QuickSight | Amazon SageMaker
AWS value proposition

Standards, formats, and open source

• Apache Flink
• Ganglia
• Apache HBase
• HCatalog
• Hadoop Distributed File System (HDFS)
• Apache Hive
• Hudi
• Java
• JupyterHub
• Apache Kafka
• Apache Livy
• Apache Mahout
• MapReduce
• Apache MXNet
• MySQL
• Apache Oozie
• Apache ORC
• Apache Parquet
• Phoenix
• Apache Pig
• Presto
• Python
• PyTorch
• R
• Scala
• Apache Spark
• Sqoop
• SQL
• TensorFlow
• Tez
• Yarn
• Apache Zeppelin
• Apache Zookeeper

…and many more
AWS alternatives to open source

Amazon EMR
Spark, Hive, Presto, Flink, HBase
Open source equivalents: Hadoop, Spark

Amazon ES
Operational analytics
Open source equivalents: Elasticsearch, Logstash, Kibana

Managed Streaming for Apache Kafka
Real-time analytics
Open source equivalent: Kafka
Data analytics pipeline

Data management challenges

How can customers:


• Collect a variety of data types accumulating at varying velocities?
• Collect data from numerous sources accumulating at differing velocities?
• Store massive amounts of data without running out of space?
• Cleanse data and improve its quality before it is analyzed?

Can they automate these steps?

Data analytics pipeline

Data → Collect → Store → Process and analyze → Visualize → Insights

Time-to-answer (latency)
Balance of throughput and cost

https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html
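The four pipeline stages can be illustrated with a toy in-memory sketch. Everything here is hypothetical (the sample events, the function names, the text "chart"); it only shows how data moves from collection to an insight, not how any AWS service implements a stage.

```python
from collections import Counter
from typing import Iterable


def collect() -> list[dict]:
    """Collect: stand-in for an ingestion source (hypothetical sample events)."""
    return [
        {"user": "a", "page": "/home"},
        {"user": "b", "page": "/pricing"},
        {"user": "a", "page": "/pricing"},
    ]


def store(events: Iterable[dict], storage: list[dict]) -> list[dict]:
    """Store: append raw events to durable storage (here, just a list)."""
    storage.extend(events)
    return storage


def process_and_analyze(storage: list[dict]) -> Counter:
    """Process and analyze: aggregate page views per page."""
    return Counter(event["page"] for event in storage)


def visualize(page_views: Counter) -> str:
    """Visualize: render a tiny text bar chart, one row per page."""
    rows = [f"{page:<10} {'#' * count}" for page, count in sorted(page_views.items())]
    return "\n".join(rows)


data_store: list[dict] = []
store(collect(), data_store)
insights = process_and_analyze(data_store)
report = visualize(insights)
print(report)
```

Each function maps to one stage, so swapping a stage (for example, a different store) does not change the others.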
Data pipeline challenges
Building a data pipeline is challenging. Customers must:
• Manage updates, patches, and software integrations
• Handle increased overhead costs plus need for support
• Maintain focus on the core task of building applications that lead to data insights

AWS data analytics pipeline services
Collect: Amazon Kinesis Data Firehose | AWS Snowball | Amazon Kinesis Data Streams | AWS Direct Connect | Amazon Managed Streaming for Kafka | AWS Database Migration Service
Store: Amazon S3 | Amazon S3 Glacier | Amazon DynamoDB | Amazon RDS | Amazon ES | Amazon CloudSearch | Amazon Aurora
Process and analyze: Amazon EMR | Amazon Athena | Amazon Kinesis Data Analytics | Amazon Redshift
Visualize: Amazon QuickSight | Amazon SageMaker
Automate: AWS Glue
Data Flywheel

Data Flywheel and customer journey
Flywheel stages (around a central "Data" hub):
• Migrate data and workloads to the cloud
• Store and manage data
• Modernize data warehouse and build a data lake
• Build data-driven applications
• Innovate with machine learning
• Attract new customers
• Generate more data

Benefits shown on the diagram:
✓ Save time
✓ Save costs
✓ Agility
✓ Global distribution
✓ Scale and performance
✓ New and faster insights
✓ Broader access to analytics
✓ Better experiences
✓ Deeper engagement
✓ Efficient processes

https://pages.awscloud.com/data-flywheel.html
Summary

In this module, you learned about:


• Customer challenges related to data analytics
• AWS data analytics portfolio
• Technical benefits of AWS data analytics solutions
• Data analytics pipeline
• Data Flywheel

Module 3: Data Analytics
Solutions on AWS – Part I
Objectives
In this module, you will learn how to:
• Explain data migration options from on premises to the AWS Cloud
• Describe two AWS data analytics technical solutions
• Modernizing a data warehouse with Amazon Redshift
• Data lakes
Evolution of data architecture

Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning
Data migration options

Journey to a modern data architecture
Evolution of data architecture

Types of data
Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning
AWS data migration options

AWS Direct Connect
AWS Storage Gateway
• File gateway
• Tape gateway
• Volume gateway
Amazon S3 Transfer Acceleration
AWS Snowball
• Snowball Edge storage optimized
• AWS Snowmobile
Amazon Kinesis Data Firehose
AWS Database Migration Service
Solution 1: Modernizing a data
warehouse with Amazon Redshift

Journey to a modern data architecture
Evolution of data architecture

Types of data
Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning
Data warehouses

Data warehouse defined
Central repository of curated data from different sources
• Data optimized for reporting and data analysis
• Data extracted, cleaned, transformed, and loaded into a data warehouse using extract, transform, load (ETL) tool

Benefits
• Better decision making
• Consolidated data from many sources
• Improved data quality, consistency, and accuracy
• Access to historical intelligence
• Improved performance

Diagram: Source 1 (flat files), Source 2 (database), Source 3 (database) → Extract → Staging area → Transform and load → Data warehouse → Analytics

Data warehouse concepts: https://aws.amazon.com/data-warehouse/
OLTP and OLAP comparison
Online Transactional Processing (OLTP): relational database
Online Analytical Processing (OLAP): data warehouse

Data source
• OLTP: Application create, read, update, delete (CRUD), origin
• OLAP: OLTP and secondary source

Purpose
• OLTP: Capture and store transactional data
• OLAP: Analyze and gain insights from historical data

Workloads
• OLTP: SQL INSERT, UPDATE, DELETE – short and fast queries
• OLAP: ETL focused, batch job to import, JOINs, run complex queries

Database design
• OLTP: Highly normalized, many distinct tables to reduce duplication
• OLAP: Denormalized using fewer tables in star and snowflake schema with duplicated data for fast performance

Database size
• OLTP: Depends on the amount of data, typically from MB to TB in size
• OLAP: Growth over time, typically ranges from TB to PB in size
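The workload difference can be seen in miniature with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `orders` table and its values are made up, and a single SQLite table stands in for both systems, whereas in practice the OLTP side would be a relational database and the OLAP side a data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style workload: short, fast statements that touch individual rows.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("EU", 20.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("EU", 30.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("US", 50.0))
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (25.0, 1))

# OLAP-style workload: a query that scans and aggregates historical rows.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
print(totals)
```

The first block changes one row per statement; the final query reads every row to produce a summary, which is the access pattern a data warehouse is designed for.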
Traditional architecture and on-premises
data warehouse challenges
• Difficult to scale
• Long lead times for hardware procurement
• Complex upgrades are the norm
• High overhead costs for administration
• Expensive licensing and support costs
• Proprietary formats do not support newer open data formats, which results in data silos
• Data not cataloged, unreliable quality
• Licensing cost limits number of users and how much data can be accommodated
• Difficult to integrate with services and tools

Amazon Redshift

Amazon Redshift
Secure data warehouse that extends seamlessly to a data lake

A fully managed data warehouse that is highly integrated with other AWS services. Features include:
• Optimized for high performance
• Support for open file formats
• Petabyte-scale capability
• Support for complex queries and analytics, with data visualization tools
• Secure end-to-end encryption and certified compliance
• Service Level Agreement (SLA) of 99.9 percent
• Based on open source Postgres database
• Cost efficient
https://aws.amazon.com/redshift/pricing/
Amazon Redshift performance features

Massively parallel processing (MPP)
Breaks a large job into smaller tasks, then distributes the tasks to multiple compute nodes.
Result: Faster processing time

Columnar storage
Data from each column is stored together so the data can be accessed faster, without scanning and sorting all other columns.
Result: Compression of stored data improves performance

Shared-nothing architecture
Independent and resilient nodes without any dependencies.
Result: Improves scalability
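A minimal sketch of the first two ideas, assuming a made-up two-column table: values are kept per column (columnar layout), and a sum over one column is split into slices that are processed in parallel and then combined. The names and the thread pool are illustrative stand-ins, not how Amazon Redshift is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Columnar layout: each column's values are stored together, so a query
# over "amount" never has to read "region".
table = {
    "region": ["EU", "US", "EU", "US", "EU", "US"],
    "amount": [10, 20, 30, 40, 50, 60],
}


def split_into_slices(values, n_slices):
    """Distribute values round-robin across n_slices independent slices."""
    return [values[i::n_slices] for i in range(n_slices)]


def total_amount(n_slices=3):
    """MPP-style sum: each slice is summed separately, then partials are combined."""
    slices = split_into_slices(table["amount"], n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials)


result = total_amount()
print(result)
```

Because each slice holds its own data and shares nothing with the others, adding slices only changes how the work is divided, not the result.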
Amazon Redshift architecture

Client applications connect to the leader node through Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC).

Data warehouse cluster:
• Leader node
• Compute Node 1 (node slices)
• Compute Node 2 (node slices)

https://docs.aws.amazon.com/redshift/index.html
Amazon Redshift instance types

https://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
Amazon Redshift
differentiating features

• Federated query
• Amazon Redshift lake house architecture
Federated query
Integrate queries on live data in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL with queries on Amazon Redshift and Amazon data lake.

Reduce data moved over the network with Amazon Redshift’s intelligent optimizer, which pushes and distributes portions of computation directly into remote operational databases.

Benefits
• Incorporate live data into business intelligence (BI) and reporting applications
• Ingest data into Amazon Redshift
• Query operational databases directly
• Apply transformations on the fly
• Load data into target tables without complex ETL pipelines

Diagram: data warehouse and Amazon Aurora, with operational sources OLTP, ERP, CRM, and LOB
Amazon Redshift
lake house architecture
Amazon Redshift lake house queries are run by a fleet of nodes that are owned and maintained by AWS.

With Amazon Redshift lake house architecture, customers can:
• Query data in the data lake and write data back in open formats
• Use familiar SQL statements to combine and process data across data stores
• Run queries on live data in operational databases without requiring data loading and ETL pipelines

https://aws.amazon.com/redshift/lake-house-architecture/
1 Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY… is sent from SQL clients or business intelligence tools over JDBC/ODBC.
2 Query is optimized and compiled using ML at the leader node. Determine what is run locally and what goes to Amazon Redshift lake house.
3 Query plan sent to all compute nodes.
4 Compute nodes obtain partition information from the Data Catalog; dynamically prune partitions.
5 Each compute node issues multiple requests to Amazon Redshift lake house layers.
6 Amazon Redshift lake house nodes scan Amazon S3 data.
7 Amazon Redshift lake house projects, filters, joins, and aggregates.
8 Final aggregations and joins with local Amazon Redshift tables done in-cluster.
9 Result is sent to client.

Diagram components: leader node; compute nodes 1 and 2 (node slices); Amazon Redshift lake house fleet; Amazon S3; AWS Glue Data Catalog
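Steps 4 and 5 of the query flow above can be sketched as follows. This is a simplified model under stated assumptions: the catalog contents, bucket name, and function names are hypothetical, and the round-robin split is just one plausible way to spread requests across compute nodes.

```python
# Hypothetical Data Catalog entry: partition name -> objects in that partition.
CATALOG_PARTITIONS = {
    "month=2020-01": ["s3://example-bucket/sales/month=2020-01/part-0.parquet"],
    "month=2020-02": [
        "s3://example-bucket/sales/month=2020-02/part-0.parquet",
        "s3://example-bucket/sales/month=2020-02/part-1.parquet",
    ],
    "month=2020-03": ["s3://example-bucket/sales/month=2020-03/part-0.parquet"],
}


def prune_partitions(partitions, wanted):
    """Step 4: keep only partitions that the query's filter can match."""
    return {name: objs for name, objs in partitions.items() if name in wanted}


def plan_scan_requests(partitions, n_compute_nodes):
    """Step 5: spread the surviving objects across compute nodes round-robin."""
    objects = [obj for objs in partitions.values() for obj in objs]
    return [objects[i::n_compute_nodes] for i in range(n_compute_nodes)]


pruned = prune_partitions(CATALOG_PARTITIONS, {"month=2020-02"})
requests = plan_scan_requests(pruned, n_compute_nodes=2)
print(requests)
```

Pruning happens before any object is read, so a filter on the partition column reduces the amount of Amazon S3 data the lake house nodes have to scan in step 6.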
Advanced Query Accelerator (AQUA)
A new distributed and hardware-accelerated cache that makes Amazon Redshift faster than other cloud data warehouses, without increasing cost

• Minimizes data movement over the network by pushing operations to Advanced Query Accelerator (AQUA) nodes
• AQUA nodes use custom AWS-designed analytics processors to make operations (compression, encryption, filtering, and aggregations) faster than traditional CPUs

Diagram: RA3 clusters → AQUA nodes running in parallel (each with a custom AWS-designed processor) → Amazon Redshift managed storage
Migration to Amazon Redshift

AWS SCT data extractors

• AWS SCT extracts data through local migration agents.
• Data is optimized for Amazon Redshift and saved in local files.
• Files are loaded to an Amazon S3 bucket (through network or AWS Snowball) and then to Amazon Redshift.

Diagram: Legacy data warehouse → AWS SCT → Amazon S3 bucket → Amazon Redshift
AWS SCT data extractors
Extract data from your data warehouse and migrate to Amazon Redshift
• Extracts data through local migration agents
• Data is optimized for Amazon Redshift and saved in local files
• Files are loaded to an Amazon S3 bucket (through network or AWS Snowball Edge)
and then to Amazon Redshift

Diagram: Source DW (Microsoft SQL Server, Netezza) → AWS SCT → Amazon S3 bucket → Amazon Redshift
Equinox sees faster
reports, 80% cost savings
Challenge
Their data warehouse had limited integration, was very expensive, and
required a lot of platform-specific domain knowledge. They needed to
reduce administration and costs, blend structured and semi-
structured data for analytics, and evolve into a data lake strategy.

Solution
Equinox migrated from a legacy data warehouse to Amazon Redshift
to combine data from disparate sources like clickstream data, cycling
log data, club management software, and more. They land data
directly in an Amazon S3 data lake and perform analytics using
Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR.

Result
Their monthly Amazon Redshift bill is now 20% of prior yearly
maintenance of their legacy data warehouse. AWS data lake and
analytics reduced report delivery time from months to days.

Amazon Redshift Amazon S3 Amazon EMR


Use case: Equinox (continued)

• Migrated from Teradata data warehouse
• Built a data warehouse with Amazon Redshift and data lake with Amazon S3
• Analytics on data lake with Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR
• Increased user productivity to move faster
• Amazon Redshift costs approximately 20% of original Teradata maintenance and support
• Report time reduced from months to days

Diagram labels: club management software, applications, cycling logs, clickstream, social; Amazon S3; Amazon EMR; Spark on Amazon EMR; Amazon Redshift; Amazon Athena; Maximilian (ELT scripts); Equinox applications; third-party applications
Solution 2: Data lakes

Journey to a modern data architecture
Evolution of data architecture

Types of data
Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning
Data lakes defined
Architectural approach for a centralized enterprise data repository stored on Amazon S3
• Stores all structured, semi-structured, unstructured, and binary data at unlimited scale
• Holds curated and raw data
• Uses AWS data analytics tools for analytics
• Increases pace of innovation by extracting insights from data
• Enables more organizational agility
• Reduces cost and delivers results with predictive analytics and ML

Diagram: data lake (open formats, central catalog) with machine learning, business intelligence and analytics, and data warehousing
Secure data lake on Amazon S3

Amazon S3 Access Points
• Multi-tenant bucket
• Dedicated access points
• Customer permissions from an Amazon Virtual Private Cloud (Amazon VPC)

Amazon S3 Block Public Access
• Across AWS accounts and Amazon S3 bucket level
• Specify public permissions using Access Control List (ACL) or policy
• Four settings: BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets

Amazon S3 object tags
• Access control, lifecycle policies, and analysis
• Classify data with metadata
• Use tags to filter objects
• Define replication policies
• Populate tags with AWS Lambda functions or S3 Batch Operations

Amazon S3 object lock
• Immutable Amazon S3 objects
• Retention management controls
• Data protection and compliance

Amazon FSx for Lustre

https://aws.amazon.com/compliance/services-in-scope
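The four Block Public Access settings named on this slide can be written as a configuration like the sketch below. It assumes the caller passes in a boto3 S3 client (for example `boto3.client("s3")`) and uses its `put_public_access_block` call; the bucket name and the helper functions are hypothetical.

```python
# The four Block Public Access settings, all enabled.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def is_fully_blocked(config: dict) -> bool:
    """Return True only if every one of the four settings is explicitly True."""
    return all(config.get(name) is True for name in REQUIRED_SETTINGS)


def apply_to_bucket(s3_client, bucket_name: str, config: dict = PUBLIC_ACCESS_BLOCK) -> None:
    """Apply the settings to one bucket.

    s3_client is assumed to be a boto3 S3 client supplied by the caller.
    """
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=config,
    )


print(is_fully_blocked(PUBLIC_ACCESS_BLOCK))
```

Passing the client in, rather than creating it inside the function, keeps the sketch runnable without AWS credentials and makes the call easy to check with a stub.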
Reference architecture: Data lake on AWS

Catalog and search: AWS Glue | Amazon DynamoDB | Amazon ES
Access and user interface: Amazon API Gateway | IAM | Amazon Cognito
Data ingestion: AWS Data Exchange | Amazon Kinesis | AWS Direct Connect | AWS DMS | AWS Snowball
Central storage: Amazon S3
Processing and analytics: Machine learning | Amazon QuickSight | Amazon EMR | Amazon Redshift | Amazon Athena
Protect and secure: Amazon CloudWatch | IAM | AWS STS | AWS KMS | AWS CloudTrail
Data services – AWS Glue

Cleansing data
After migration, data still presents challenges:

Data is increasingly diverse | It accumulates rapidly | It must be cleansed before being analyzed by many applications
• Volume • Missing or incorrect data
• Variety • Wrong data format Avoid unsearchable data
• Velocity • Partial missing data
• Veracity

How can customers provide access to users to gain insights?


AWS Glue
AWS Glue Data Catalog
• Hive metastore compatible with enhanced functionality
• Crawlers automatically extract metadata and create tables
• Integrates with Amazon Athena, Amazon EMR, and many more

Job authoring
• Generates ETL code
• Built on open frameworks – Python, Scala, and Apache Spark
• Developer-centric – editing, debugging, sharing

Job running
• Run jobs on a serverless Spark platform
• Use flexible scheduling, job monitoring, and alerting

Job workflow
• Orchestrate triggers, crawlers, and jobs
• Author and monitor entire flows and integrated alerting
AWS Glue crawlers
An AWS Glue crawler runs with an AWS IAM role and uses built-in classifiers.

Data stores:
• Databases (JDBC connection)
• Amazon Redshift
• Amazon DynamoDB (NoSQL connection)
• Amazon S3 (Object connection)

Built-in classifiers: MySQL, MariaDB, PostgreSQL, Amazon Aurora, Oracle, Amazon Redshift, Apache Avro, Parquet, ORC, XML, JSON and JSONPaths, AWS CloudTrail, Binary JSON (BSON), Logs, Delimited, … growing
AWS Glue Data Catalog services

AWS Glue Data Catalog

AWS Glue ETL | Amazon Athena | Amazon Redshift lake house | Amazon EMR
Use case: Log aggregation with ETL

Diagram components: AWS service logs, web application logs, and server logs; Amazon S3 buckets; AWS Glue crawler; AWS Glue ETL; AWS Glue Data Catalog; Amazon Athena
Diagram actions: create partition on Amazon S3; update table partition; query data
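The "create partition on Amazon S3" action is commonly done by writing logs under Hive-style `key=value` prefixes, which crawlers and query engines can interpret as partition columns. The sketch below assumes a hypothetical `logs/<source>/year=/month=/day=/` layout; the slide does not specify the layout actually used.

```python
from datetime import datetime, timezone


def partition_prefix(source: str, ts: datetime) -> str:
    """Build a Hive-style partition prefix for one log source and timestamp.

    The "logs/<source>/year=/month=/day=/" layout is an assumption for
    illustration only.
    """
    return f"logs/{source}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


def object_key(source: str, ts: datetime, file_name: str) -> str:
    """Full object key for a log file inside its partition."""
    return partition_prefix(source, ts) + file_name


example = object_key(
    "web", datetime(2020, 5, 29, 12, 0, tzinfo=timezone.utc), "app.log.gz"
)
print(example)
```

With keys laid out this way, a query that filters on year, month, or day only needs to read the objects under the matching prefixes.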
Amazon Athena
Interactive query service to analyze data in Amazon S3 using standard SQL

No setup costs
Zero setup costs, point to Amazon S3 and start querying

Pay per query
Pay only for queries run, save 30%–90% on per-query costs through compression

Open
ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types

Streamlined
Serverless, zero infrastructure, zero administration, integrated with Amazon QuickSight
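The 30%–90% figure can be made concrete with a small calculation. Two assumptions are made here and are not taken from the slide: queries are billed by the amount of data scanned, and the price is 5.00 USD per TB (check the current Amazon Athena pricing page for the real figure in your Region). The function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.00  # assumed price per TB scanned, for illustration only


def query_cost(tb_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost of one query that scans tb_scanned terabytes."""
    return tb_scanned * price_per_tb


def cost_after_compression(raw_tb: float, size_reduction: float) -> float:
    """Cost when compression removes size_reduction (0.0 to 1.0) of the bytes scanned."""
    return query_cost(raw_tb * (1.0 - size_reduction))


uncompressed = query_cost(1.0)
low_saving = cost_after_compression(1.0, 0.30)
high_saving = cost_after_compression(1.0, 0.90)
print(uncompressed, round(low_saving, 2), round(high_saving, 2))
```

Under these assumptions, a query over 1 TB of raw data drops from 5.00 USD to between 3.50 USD and 0.50 USD once the data is compressed.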
AWS Lake Formation

Challenges of building a secure data lake

Typical steps to build a secure data lake

Ingestion and cleaning (data engineer)
1 Set up storage
2 Move data
3 Cleanse, prepare, and catalog data

Security (data security officer)
4 Configure and enforce security and compliance policies

Analytics and machine learning (data analyst)
5 Make data available for analytics
AWS Lake Formation for a secure data lake

1 Ingest and organize
Automates creating data lake and data ingestion.

2 Secure and control
Sets up fine-grained access control and data governance. To protect data, all access is checked against set policies.

3 Collaborate and use
Search and data discovery using Data Catalog metadata.

4 Monitor and audit
Based on data access and governance policies, alert notifications are raised on policy violation and logged.
AWS Lake Formation builds on AWS Glue
AWS Lake Formation: Blueprints | Security, search, collaboration | Monitoring
AWS Glue: Workflow | AWS Glue Data Catalog | Connections, databases, tables | AWS Glue ETL jobs | AWS Glue crawlers
AWS Lake Formation benefits

• Comprehensive set of integrated tools enables every user equally.
• Centralized management of fine-grained permissions empowers security officers.
• Simplified ingest and cleaning enables data engineers to build faster.
• Cost effective, durable storage includes global replication capabilities.

Diagram layers:
Analytics tools: Amazon Athena | Amazon QuickSight | Amazon Redshift | Amazon SageMaker | Amazon EMR
AWS Lake Formation: AWS Glue | Blueprints | ML Transforms | Data Catalog | Access control
Amazon S3 data lake storage
Amazon QuickSight
BI service built for the cloud with pay-per-session pricing and ML insights

Scalable
Automatically scales with use and activity, with no additional infrastructure requirements. Seamlessly grows with customers.

Pay for use
Pay monthly or annually. With pay-per-session pricing, customers only pay when they access their reports and dashboards, with no upfront costs.

Serverless and fully managed
Fully managed cloud application, meaning there's no upfront cost, software to deploy, capacity planning, maintenance, upgrades, or migrations.

Fully integrated
Deeply integrated with data sources and other AWS services like Amazon Redshift, Amazon S3, Athena, Amazon Aurora, Amazon RDS, IAM, AWS CloudTrail, and Amazon Cloud Directory, providing customers with everything they need for an end-to-end cloud BI solution.
Serverless data lakes and analytics

Data sources: Web app data | Amazon RDS | Other databases | On-premises data | Streaming data
Data lake and catalog: Amazon S3 | AWS Glue crawler | AWS Glue Data Catalog
Query and processing: Amazon Athena | Amazon EMR | Amazon Redshift Spectrum
Visualization: Amazon QuickSight
Use case: COVID-19 pandemic

Challenge
The COVID-19 pandemic has stressed healthcare systems, businesses, and economies. It has disrupted the daily lives of people around the world.

People need a solution to capture data (diagnosis, mortality, and recovery rates) globally in real time, and turn the data into insights they can share and respond to with confidence.

Solution
Amazon worked with APN Partners Salesforce, Tableau, and MuleSoft to create a secure data lake using AWS Data Exchange, AWS Glue, Amazon Athena, and Amazon S3 as a store of trusted data from open source COVID-19 data providers.

Benefits
Health workers, scientists, and decision makers can access and compare international data to their local data, enabling understanding and visualization of the impact of COVID-19 locally and globally.

This solution enables decision making and deeper insights to help manage and flatten the COVID-19 curve until a vaccine is available.
Use case: COVID-19 data lake architecture
[Architecture diagram]
Outside AWS: Tableau COVID-19 data platform (upload to Amazon S3); Tableau Desktop (visualization for users)
In the AWS Cloud: Amazon S3, Lambda function (data revision export to Amazon S3), AWS Data Exchange, AWS Glue, Amazon Athena (define Athena schema)
Callouts: Publish and update data products with AWS Data Exchange. Connect to S3 data with the Amazon Athena connector in Tableau.
Image source: https://d2908q01vomqb2.cloudfront.net/77de68daecd823babbb58edb1c8e14d7106e83bb/2020/05/29/COVID-19-AWS-Tableau-1.1.jpg
Summary

Evolution of data architecture

Stages: Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning

Services by stage:
Traditional data warehousing: AWS data migration options
Data warehouse modernization: Amazon Redshift
Data lakes on AWS:
• Amazon S3
• AWS Glue
• AWS Data Exchange
• Amazon Athena
• AWS Lake Formation
• Amazon QuickSight
Activity: Serverless Data Lake
Lab Demonstration
Step 1: Serverless data lake architecture

[Architecture diagram, left to right]
Raw zone (Amazon S3) → AWS Lambda → AWS Glue crawler → Amazon CloudWatch → AWS Lambda → AWS Glue ETL job → Processed zone (Amazon S3)
Also shown: Amazon SQS; Amazon CloudWatch and Amazon SNS for email notification
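The first hop in the lab architecture, an AWS Lambda function reacting to new objects in the raw zone and starting the crawler, can be sketched as follows. This is a minimal illustration and not the lab's actual code: the crawler name is hypothetical, and `glue_client` is assumed to behave like `boto3.client("glue")`, which is passed in so the logic can be exercised without AWS access.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # skip records that are not S3 notifications
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handle_raw_zone_event(event, glue_client, crawler_name="raw-zone-crawler"):
    """Start the crawler when at least one new object landed in the raw zone.

    crawler_name is a hypothetical name chosen for this sketch.
    """
    objects = extract_s3_objects(event)
    if objects:
        glue_client.start_crawler(Name=crawler_name)
    return {"new_objects": len(objects), "crawler_started": bool(objects)}
```

Passing the client in as a parameter keeps the handler testable with a stand-in object; a deployed function would typically create the client once at module load.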
Module 4: AWS Data Analytics
Solutions – Part II
Objectives
In this module, you will learn about three key types of data analytics
technical solutions on AWS:
• Streaming and real-time analytics with Amazon Kinesis
• Data governance
• Extended solution: Insights and monetization with machine learning (ML)
Evolution of data architecture

Stages: Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning
Solution 3: Streaming and
real-time analytics with
Amazon Kinesis

Journey to a modern data architecture
Evolution of data architecture

Stages: Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning

Types of data used
Streaming data defined
Data that is generated continuously from thousands of data sources, sent simultaneously

Examples:
• Player-game interactions
• Social media streams
• Music downloads
• Geolocation of cars and devices
• Website clicks
Common use cases: Real-time analytics
The value of data diminishes over time

Latency scale: Milliseconds → Seconds → Minutes → Hours

Use cases, from lowest to highest latency:
• Messaging between microservices
• Response analytics (web and mobile application notifications)
• Log ingestion
• Internet of Things (IoT) device maintenance
• Change data capture (CDC)
• Streaming ETL into data lakes and data warehouses
Enabling real-time analytics
Data streaming technology enables a customer to ingest, process, and analyze high
volumes of high-velocity data from a variety of sources, in real time.

Data streaming solution challenges

Challenges of building on-premises, real-time streaming solutions:

• Difficult to set up
• Tricky to scale
• Difficult to achieve high availability
• Integration requires development
• Error prone and complex to manage
• Expensive to maintain
AWS streaming data solutions
Efficiently collect, process, and analyze data streams in real time

• Amazon Kinesis Data Streams
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Analytics
Data generators: Simple streaming
data patterns
[Diagram with three columns]
Data producers: Mobile and applications, Amazon Kinesis Agent, Amazon Kinesis Producer Library (KPL), Amazon CloudWatch Logs, Amazon CloudWatch Events, AWS IoT
Streaming services: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics
Data consumers: Amazon Simple Storage Service (S3), Amazon EMR, Amazon EC2, Amazon Redshift, Amazon Kinesis Connector Library
Other labels in the diagram: Apache Kafka, Amazon Kinesis Data Streams
Amazon Kinesis Data Streams
Massively scalable, highly durable data ingestion and processing service optimized for real-time data streaming
• Data collected is available within 70 milliseconds
• Real-time analytics: dashboards, anomaly detection, dynamic pricing
• Data is synchronously replicated across 3 Availability Zones in a Region
• Data can be stored for up to 365 days
• Serverless; can scale dynamically to handle MB to TB each hour and thousands to millions of PutRecords each second
• No upfront cost; low, pay-as-you-go pricing
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
How Kinesis Data Streams works

[Flow diagram]
Input: Capture and send data
→ Amazon Kinesis Data Streams: Ingest and store data streams for processing
→ Build custom, real-time applications with Amazon Kinesis Data Analytics, Spark on Amazon EMR, Amazon EC2, or AWS Lambda
→ Output: Analyze streaming data using BI tools
Kinesis Data Streams architecture
[Architecture diagram]
Data producers: Amazon EC2 instances, client, mobile client, traditional server
Amazon Kinesis data stream: Shard 1, Shard 2, ... Shard N
Data record (stored in a shard):
• Sequence #
• Partition key
• Data blob
Data consumers: EC2 instances, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose
Destinations: Amazon S3, Amazon DynamoDB, Amazon Redshift, Amazon EMR, Amazon Kinesis Data Firehose
https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
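The partition key in each data record decides which shard receives it: Kinesis Data Streams hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash key space. The sketch below illustrates the idea in plain Python. It assumes the hash space is split evenly across shards, which may not hold after resharding, and the function names are hypothetical and not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count):
    """Split the 128-bit hash key space into equal, contiguous ranges (assumption)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")
```

Because the mapping is deterministic, all records that share a partition key land in the same shard, which is what preserves per-key ordering. A key with very high traffic can therefore overload a single shard.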
Amazon Kinesis Data Firehose

How Kinesis Data Firehose works

[Flow diagram]
Input: Capture and send data
→ Amazon Kinesis Data Firehose: Prepares and loads data continuously to the selected destinations
→ Durably store the data for analytics: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, Splunk
→ Output: Analyze streaming data using analytics tools
Kinesis Data Streams and
Kinesis Data Firehose
Processing time
  Amazon Kinesis Data Streams: As fast as 70 milliseconds after ingestion
  Amazon Kinesis Data Firehose: Between 60–900 seconds

Stream storage and duration
  Amazon Kinesis Data Streams: In shards; default 24 hours, extendable up to 365 days
  Amazon Kinesis Data Firehose: Max buffer size 128 MB and max time 900 seconds

Data transformation and conversion
  Amazon Kinesis Data Streams: None
  Amazon Kinesis Data Firehose: Uses AWS Lambda and AWS Glue

Data producer
  Both services: Amazon Kinesis Agent, applications using Amazon Kinesis Producer Library (KPL), AWS SDK for Java, Amazon CloudWatch Logs and CloudWatch Events, AWS IoT

Data consumer
  Amazon Kinesis Data Streams: AWS Lambda, Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, applications using the Kinesis Client Library (KCL) and SDK for Java
  Amazon Kinesis Data Firehose: AWS Lambda, Amazon Kinesis Data Analytics, and Kinesis Data Firehose, apps using the KCL and SDK for Java, Amazon S3, Amazon Redshift, Amazon ES, Splunk, and Amazon Kinesis Data Analytics

Data compression
  Amazon Kinesis Data Streams: None
  Amazon Kinesis Data Firehose: gzip, Snappy, Zip, or no data compression

https://aws.amazon.com/kinesis/data-streams/faqs/?nc=sn&loc=5
https://aws.amazon.com/kinesis/data-firehose/faqs/?nc=sn&loc=5
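The Kinesis Data Firehose processing time of 60–900 seconds follows from buffering: records accumulate until either a size limit or a time limit is reached, and only then is the batch delivered. The toy model below illustrates that size-or-time rule. It is not the Firehose API; the class and parameter names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class BufferedDelivery:
    """Toy model of size-or-time buffering; not the Firehose API."""

    def __init__(self, max_bytes, max_seconds, deliver):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver      # callable that receives a list of records
        self._records = []
        self._size = 0
        self._opened_at = None      # arrival time of the first buffered record

    def put(self, record, now):
        """Buffer one record (bytes) arriving at time `now` (seconds)."""
        self.tick(now)              # deliver an expired batch before adding more
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        if self._size >= self.max_bytes:
            self.flush()            # size limit reached

    def tick(self, now):
        """Deliver the batch if the time limit has elapsed."""
        if self._records and now - self._opened_at >= self.max_seconds:
            self.flush()

    def flush(self):
        if self._records:
            self.deliver(self._records)
        self._records, self._size, self._opened_at = [], 0, None
```

Larger buffers mean fewer, bigger objects at the destination at the cost of higher latency, which is the trade-off behind choosing Kinesis Data Streams for millisecond responses and Kinesis Data Firehose for near-real-time loading.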
When to use Kinesis Data Streams and
Kinesis Data Firehose

Amazon Kinesis Data Streams
For data streaming applications with massive ingestion requirements
• Requires data to be sent to consumer analytics services for millisecond response time
• Massively scalable
• Data retention time ranging from hours to days
• Example: Real-time gaming

Amazon Kinesis Data Firehose
For data streaming applications that require near real-time responses in seconds
• Need for data augmentation, data transformation, or data compression
• Need to save data to Amazon S3, Amazon Redshift, Amazon ES, Splunk, or send data to Amazon Kinesis Data Analytics for analytics
• Example: Log analytics
Amazon Kinesis Data Analytics

[Flow diagram]
Input: Capture streaming data with Amazon MSK, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, or other data sources
→ Amazon Kinesis Data Analytics: Query and analyze streaming data
→ Output: Send processed data to analytics tools to create alerts and respond in real time
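A common way to query and analyze streaming data is a windowed aggregation, for example counting events per key in fixed, non-overlapping (tumbling) time windows. The sketch below shows the idea in plain Python over an in-memory list. It is an illustration only, not Kinesis Data Analytics SQL or application code, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: Counter({key: count})}.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


clicks = [(0, "/home"), (12, "/home"), (59, "/pricing"), (60, "/home"), (125, "/pricing")]
per_minute = tumbling_window_counts(clicks, 60)
# per_minute[0] counts two "/home" clicks and one "/pricing" click
```

A streaming engine computes the same result incrementally and emits each window as it closes, instead of waiting for the full data set.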
Kinesis data analytics application details

Use case: Clickstream analytics
Evolve from batch processing to real-time analytics

1. Input: Websites send clickstream data
2. Amazon Kinesis Data Firehose: Collects the data and sends to Kinesis Data Analytics
3. Amazon Kinesis Data Analytics: Processes data in near-real time
4. Amazon Kinesis Data Firehose: Loads processed data into Amazon Redshift
5. Amazon Redshift: Runs analytics models to identify content recommendations
6. Output: Readers see personalized content suggestions and increase engagement
Streaming data analytics architecture
[Architecture diagram, numbered steps 1 to 5]
Sources: Millions of data sources
Ingestion and processing: Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, AWS Lambda
Fan-out: Kinesis Data Streams, Kinesis Data Firehose, Amazon Simple Notification Service (alerts, notifications)
Storage: Amazon S3 (logs and data lake); Amazon RDS, Amazon Redshift, DynamoDB, Amazon Elasticsearch Service (processed data)
Consumers: Amazon Kinesis enabled applications, machine learning, data science, downstream applications, reporting
Solution 4: Data governance

Journey to a modern data architecture
Evolution of data architecture

Stages: Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning

Types of data used
Challenges of data in data lakes

• Securing data
• Auditing data usage
• Managing data access
• Safeguarding sensitive data and PII
• Maintaining regulations and mandates

Data security and governance

With big data comes big responsibility.

• 34% of IT decision makers cite ensuring data governance/privacy as one of their organization’s biggest digital transformation challenges
• 37% of IT decision makers cite ensuring security/compliance upon movement of data as one of their most important IoT priorities over the next 18–24 months

More than one in three companies cite data privacy and governance as a hurdle to both digital transformation and IoT initiatives

Source: © ENTERPRISE STRATEGY GROUP, 2019. https://www.esg-global.com/hubfs/ESG-Infographic-IT-Spending-Intentions-2019.pdf
Resolving PII dangers
• Do these issues need to be resolved?
• Is there a solution architecture that solves all PII issues?
• What best practices can be used to mitigate PII dangers?

[Diagram] Dangers surrounding personally identifiable information (PII): consumer consent violation, external hacking, data breach, second-party misuse, spyware, unsecured devices, espionage, rogue agents
Amazon Macie

1. Enable Amazon Macie: Enable Amazon Macie with one click in the AWS Management Console or with a single API call.
2. Continually evaluate the Amazon S3 environment: Automatically generates an inventory of Amazon S3 buckets and details on the bucket-level security and access controls.
3. Discover sensitive data: Analyzes buckets using ML and pattern matching to discover sensitive data, like PII:
• Financial
• Personal
• National
• Medical
• Credentials and secrets
4. Take action: Generates findings and sends them to Amazon CloudWatch Events for integration into workflows and remediation actions.
De-identified data lake (DIDL) on AWS
A de-identified data lake (DIDL) is an architectural approach that reduces the risks
associated with managing data, particularly personally identifiable information (PII).

Benefits
Reduce risk
• Remove PII before it enters a data lake
Understand all the data
• Create a Data Catalog of an entire data lake
Reduce compliance costs
• Automate the discovery, classification, de-identification,
and ongoing monitoring of data across an organization
Turn data into an asset, not a liability
• Enable a broader set of governed analytic and machine learning use cases

Masking PII data
PII highlighted: Email ID (Email column); Name, SSN, State (Transcript column)

Before masking
Email              Customer ID   Transcript
[email protected]  19664         Just talked to Carlos Salazar
[email protected]  23423         Mary’s SSN is 000000000
[email protected]  99644         Mateo is moving to Nevada
NA                 02945         It is expected to rain tomorrow

After masking
Email              Customer ID   Transcript
4t34gttt           7462391       Just talked to Jane Roe
44e5325            1239474       Jorge’s SSN is 666666666
0we&yrw            9983487       Sofia is moving to Texas
NA                 3344325       It is expected to rain tomorrow
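One simple way to mask fields like those in the table is to replace identifiers with salted hash tokens and to redact number patterns in free text. The sketch below is an assumption-laden illustration, not the method used in the slide: the slide substitutes realistic-looking names, SSNs, and states, which would need entity recognition that this sketch does not attempt. The field names, salt handling, and token length are hypothetical choices.

```python
import hashlib
import re

# Nine digits, with or without dashes, bounded so longer numbers are not matched.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def pseudonymize(value, salt):
    """Replace an identifier with a salted SHA-256 token (first 12 hex characters)."""
    if value in (None, "", "NA"):
        return "NA"  # keep missing values recognizable as missing
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record, salt):
    """Mask the email and customer ID, and redact SSN-like numbers in the transcript."""
    return {
        "email": pseudonymize(record["email"], salt),
        "customer_id": pseudonymize(record["customer_id"], salt),
        "transcript": SSN_PATTERN.sub("[SSN REDACTED]", record["transcript"]),
    }
```

Using the same salt yields the same token for the same input, so masked records can still be joined on the masked customer ID. The salt itself then has to be protected like any other secret.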
Extended solution 5: Insights and
monetization with ML on AWS

Journey to a modern data architecture
Evolution of data architecture

Stages: Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning

Types of data used
Data lakes and machine learning

Machine learning requires:
• More data: Collect all types of data
• Flexibility: Define schema during analysis
• Scalability: Scale storage and compute (CPU or GPU) independently
• Data transformation and processing: Run a broad set of processing and analytics on the same data without movement
• Security: Networking, identity, encryption, and compliance

[Diagram]
Sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social
Storage: Data warehouse, data lake, Data Catalog
Workloads: Business analytics, AI and machine learning, data warehouse, big data, interactive queries, real-time processing
Amazon SageMaker
Machine learning at enterprise scale
Build (notebooks for common problems, high-performance algorithms)
• Managed Jupyter for enterprise data science
• Sample notebooks for most common use cases
• Single-pass, streaming training algorithms

Train and tune (one-click training, hyperparameter optimization)
• Training models at scale without DevOps assistance
• ML on ML to optimize hyperparameters

Deploy and manage (one-click deployment, fully managed elastic hosting)
• Deploy to production with a single call
• Fully managed, production-grade inferences

https://aws.amazon.com/machine-learning/?nc2=h_ql_prod_ml
Machine learning resources
AWS STP: Machine Learning (ML) on AWS for ML Practitioners - Technical
https://partnercentral.awspartner.com/LmsSsoRedirect?RelayState=%2flearningobject%2fcurriculum%3fid%3d25521

AWS Foundations: How Amazon SageMaker Can Help
• Fundamental digital course on how SageMaker mitigates the core challenges of implementing an ML pipeline
• Duration: 30 minutes
• https://www.aws.training/Details/Video?id=49646

Practical Data Science with Amazon SageMaker
• Learn to solve real-world use cases with machine learning (intermediate)
• Duration: 1 day
• https://www.aws.training/SessionSearch?pageNumber=1&courseId=40748

The Machine Learning Pipeline on AWS
• Explore how to use the machine learning pipeline to solve a real business problem (intermediate)
• Duration: 4 days
• https://www.aws.training/SessionSearch?pageNumber=1&courseId=38910
Summary

Evolution of data architecture

Stages: Traditional data warehousing → Data warehouse modernization → Data lakes on AWS → Real-time analytics with streaming data → Data governance → Machine learning

Services by stage:
Real-time analytics with streaming data:
• Kinesis Data Streams
• Kinesis Data Firehose
• Kinesis Data Analytics
Data governance: Amazon Macie
Machine learning: Amazon SageMaker
Module 5: AWS Technical
Conversations and Engagement
Technical engagement
conversations using the Data
Flywheel

The Data Flywheel
1. Move and store data in the cloud
2. Move and manage all workloads in the cloud
3. Build data-driven applications
4. Analyze with data lake architectures
5. Innovate with machine learning
Conversations using the Data Flywheel
1. Move and store data in the cloud
2. Move and manage all workloads in the cloud
3. Build data-driven apps
4. Analyze with data lake architectures
5. Innovate with machine learning
Architecture and data migration: APN Partner best
practices

Phase 3: Architecture and data migration

Do
Engage AWS:
• AWS Partner Development Managers
• Partner Solutions Architects
• AWS Professional Services

Avoid
• Engaging AWS Support too late in the process
Module 6: APN Partner
Opportunities and Resources
APN Partners and
AWS for Data Analytics

Discounting and funding programs

• Migration programs
• POC funding
• Enterprise Discount Program (EDP)
AWS Data and Analytics Competency
categories
• Data Analytics Platforms: Provide a set of integrated tools to solve data analytics challenges within a standard framework
• NoSQL/New SQL: Provide highly scalable databases that organize data into a structure
• Data Integration and Preparation: Enable customers to move and consolidate data from disparate sources, transform it, and prepare it for analytics
• Business Intelligence (BI) and Data Visualization: Help customers turn raw data into actionable business information, such as reporting, dashboards, and data visualization
• Data Governance and Security: Help customers discover, categorize, and control their data
Collaboration workflow
Before SA involvement
• Register an opportunity on APN Partner Central
• Receive approval from AWS PSM
• Engage AWS sales
• Engage AWS account or Partner SA

Direct SA involvement
• Build a reference solution
• Conduct a big data POC
• Validate the POC
• Build and deliver the live solution
AWS data analytics solutions and
Immersion Days

AWS Immersion Days
Designed to help APN Advanced and Premier Consulting Partners deliver technical data analytics
workshops to their customers and help grow their businesses

Data Engineering Immersion Day
Build a serverless data lake solution on AWS including modules focusing on ingestion, hydration, exploration, and consumption

Database Migration Immersion Day
Give your customers a head start with the AWS Database Migration Service and the Schema Conversion Tool

Amazon EMR Immersion Day
Focus on unique facets of Amazon EMR for big data workloads

… and many more.

Benefits: Access to technical workshop content, AWS usage credits, Market Development Funds (MDF) opportunities, and support from AWS teams
https://aws.amazon.com/partners/immersion-days/
AWS Certified data analytics and
learning resources

AWS Technical Professional Learning Path

AWS Certified Data Analytics – Specialty

https://aws.amazon.com/certification/certified-data-analytics-specialty/
Partner Cast: Analytics
Thank You !

Nidhi Seth: [email protected]

© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-[email protected]. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
