0% found this document useful (0 votes)
70 views

Data Architecture On Aws Slides

Uploaded by

iransamir
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
70 views

Data Architecture On Aws Slides

Uploaded by

iransamir
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Data Architecture on AWS

David Tucker
TECHNICAL ARCHITECT & CTO CONSULTANT
@_davidtucker_ davidtucker.net
Reviewing approaches for integrating
Overview data from your own data center
Examining approaches for processing
data
Exploring data analysis approaches
Integrating machine learning and AI into
data analysis
Integrating On-premise Data
On-premise Data Integration Services

AWS Storage AWS


Gateway DataSync
Hybrid-cloud storage Automated data
service transfer service
AWS Storage Gateway
Integrates cloud storage into your local
network
Deployed as a VM or specific hardware
appliance
Integrates with S3 and EBS
Supports three different gateway types
- Tape Gateway
- Volume Gateway
- File Gateway
Gateway Types

File Gateway Tape Gateway Volume Gateway


Stores files in Amazon S3 Enables tape backup Provides cloud based
while providing cached processes to store data in iSCSI volumes to local
low-latency local access the cloud on virtual tapes applications
AWS DataSync

Leverages the DataSync agent deployed


as a VM on your network
Integrates with S3, EFS, and FSx for
Windows File Server on AWS
Greatly improved speed of transfer due
to custom protocol and optimizations
Charged per GB of data transferred
Processing Data
Data Processing Services

AWS Glue Amazon EMR AWS Data Pipeline


Managed Extract, Big-data cloud Data workflow
Transform, and Load processing using orchestration service
(ETL) Service popular tools across AWS services
AWS Glue

Fully managed ETL (extract, transform,


and load) service on AWS
Supports data in Amazon RDS,
DynamoDB, Redshift, and S3
Supports a serverless model of
execution
Amazon EMR

Enables big-data processing on


Amazon EC2 and S3
Supports popular open-source
frameworks and tools
Operates in a clustered environment
without additional configuration
Supports many different big-data use
cases
Supported Amazon EMR Frameworks

Apache Spark Apache Hive Apache HBase

Apache Flink Apache Hudi Presto


AWS Data Pipeline

Managed ETL (extract, transform, and


load) service on AWS
Manages data workflow through AWS
services
Supports S3, EMR, Redshift,
DynamoDB, and RDS
Can integrate on-premise data stores
Analyzing Data
Data Analysis Services

Amazon Athena Amazon Quicksight Amazon CloudSearch


Service that enables Business intelligence Managed search
querying of data service enabling data service for custom
stored in Amazon S3 dashboards applications
Amazon Athena

Fully-managed serverless service


Enables querying of large-scale data
stored within Amazon S3
Queries are written using standard SQL
Charged based on data scanned for
query
Amazon Quicksight

Fully managed business intelligence


service
Enables dynamic data dashboard
based on data stored in AWS
Charged on a per-user and per-session
pricing model
Multiple versions provided based on
needs
Amazon CloudSearch

Fully-managed search service on AWS


Support scaling of search
infrastructure to meet demand
Charged per hour and instance type of
search infrastructure
Enables developers to integrate search
into custom applications
Integrating AI and Machine Learning
AI and Machine Learning Services

Amazon Rekognition Amazon Translate Amazon Transcribe


Computer vision Text translation Speech to text
service powered by service powered by solution using
Machine Learning Machine Learning Machine Learning
Amazon Rekognition

Fully-managed image and video


recognition deep learning service
Identifies objects in images
Identifies objects and actions in videos
Can detect specific people using facial
analysis
Supports custom labels for your
business objects
Amazon Translate

Fully-managed service for translation


of text
Currently supports 54 languages
Can perform language identification
Works both in batch and real-time
Amazon Transcribe
Fully-managed speech recognition
services
Recorded speech is converted into text
in custom applications
Includes a specific sub-service for
medical use
Supports batch and real-time
transcription
Currently supports 31 languages
Scenario Based Review
Scenario 1

Ruth is a data scientist for a financial


services company
Large-scale data set needs to be
processed before analysis
Ruth doesn’t want to manage servers
but just wants to define processing
What service would you recommend to
Ruth?
Scenario 2

Jessi is a member of the IT team for a


biotech company
She is currently working to identify an
approach for controlled lab access
She wants leverage AI to determine
access based on facial imaging
Is there an AWS service that can help
with this approach?
Scenario 3

Roger’s company sells custom services


around machine learning
His head of sales is trying to find a great
way to visualize their sales data
This data is currently stored in Redshift
as their data warehouse
What AWS service would allow this
access to the data by non-technical
resources?
Summary
Reviewed approaches for integrating
Summary data from your own data center
Examined approaches for processing
data
Explored data analysis approaches
Integrated machine learning and AI into
data analysis
Scenario 1

Ruth is a data scientist for a financial


services company
Large-scale data set needs to be
processed before analysis
Ruth doesn’t want to manage servers
but just wants to define processing
What service would you recommend to
Ruth?
Solution: AWS Glue
Scenario 2

Jessi is a member of the IT team for a


biotech company
She is currently working to identify an
approach for controlled lab access
She wants leverage AI to determine
access based on facial imaging
Is there an AWS service that can help
with this approach?

Solution: Amazon Rekognition


Scenario 3

Roger’s company sells custom services


around machine learning
His head of sales is trying to find a great
way to visualize their sales data
This data is currently stored in Redshift
as their data warehouse
What AWS service would allow this
access to the data by non-technical
resources?

Solution: Amazon Quicksight

You might also like