
Getting Started with Amazon Redshift

Maor Kleider, Sr. Product Manager, Amazon Redshift

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda

• Introduction
• Benefits
• Use cases
• Getting started
• Q&A
What is Big Data?

When your data sets become so large and diverse that you have to start
innovating around how to collect, store, process, analyze, and share them.

It’s never been easier to generate vast amounts of data

Individual AWS customers generate over a PB/day.

[Diagram: Generate → Collect & Store → Analyze → Collaborate & Act]

Amazon S3 lets you collect and store all this data

Store exabytes of data in S3.

[Diagram: Generate → Collect & Store → Analyze → Collaborate & Act]

But how do you analyze it?

[Diagram: Generate → Collect & Store → Analyze (highly constrained) →
Collaborate & Act]

The Dark Data Problem

Most generated data is unavailable for analysis.

[Chart: data volume by year, 1990–2020; generated data grows far faster
than the data available for analysis.]

Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

AWS Big Data Portfolio

Collect: Amazon Kinesis Firehose, AWS Direct Connect, Amazon Kinesis Analytics, AWS Snowball, Amazon Kinesis Streams, AWS Database Migration Service
Store: Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Aurora, Amazon CloudSearch, Amazon Elasticsearch
Analyze: Amazon EMR, Amazon EC2, Amazon Athena, Amazon Redshift, Amazon QuickSight, Amazon Machine Learning, AWS Glue

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for $1,000/TB/year

• 150+ features
• A lot faster, a lot simpler, a lot cheaper
• Relational data warehouse
• Massively parallel; petabyte scale
• Fully managed
• HDD and SSD platforms
• $1,000/TB/year; starts at $0.25/hour

Selected Amazon Redshift customers

Use Case: Traditional Data Warehousing

Business reporting • Advanced pipelines and queries • Secure and compliant • Bulk loads and updates

• Easy migration – point & click using AWS Database Migration Service
• Secure & compliant – end-to-end encryption; SOC 1/2/3, PCI-DSS, HIPAA, and FedRAMP compliant
• Large ecosystem – variety of cloud and on-premises BI and ETL tools

Customers: a Japanese mobile phone provider, the world’s largest
children’s book publisher, and a company powering 100 marketplaces
in 50 countries.

Use Case: Log Analysis

IoT data • Log & machine data • Clickstream events • Time-series data

• Cheap – analyze large volumes of data cost-effectively
• Fast – massively parallel processing (MPP) and columnar architecture for fast queries and parallel loads
• Near real-time – micro-batch loading and Amazon Kinesis Firehose for near-real-time analytics

Customers: interactive data analysis and recommendation engines, ride
analytics for pricing and product development, ad prediction and
on-demand analytics.

Use Case: Business Applications

Multi-tenant BI applications • Back-end services • Analytics as a Service

• Fully managed – provisioning, backups, upgrades, security, and compression all come built-in so you can focus on your business applications
• Ease of chargeback – pay as you go, add clusters as needed: a few big common clusters, several data marts
• Service-oriented architecture – integrated with other AWS services; easy to plug into your pipeline

Customers: Infosys Information Platform (IIP), Analytics-as-a-Service,
product and consumer analytics.

Amazon Redshift architecture

Leader node
• Simple SQL endpoint (JDBC/ODBC) for BI tools, analytics tools, and SQL clients
• Stores metadata
• Optimizes the query plan
• Coordinates query execution

Compute nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, and resizes
• Interconnected over 10 GigE (HPC) networking
• Ingestion and backup/restore against Amazon S3, Amazon EMR, Amazon DynamoDB, and SSH sources

Start at just $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale from 160 GB to 326 TB
• DS2: HDD; scale from 2 TB to 2 PB
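
Distribution and sort keys are how a table takes advantage of this
architecture. A minimal sketch, assuming the TICKIT-style listing table
used in the compression example later in this deck (column types and key
choices here are illustrative, not prescriptive):

-- Illustrative table design: DISTKEY co-locates rows that join on
-- listid on the same node slice; SORTKEY orders rows on disk so zone
-- maps can skip blocks when filtering by listtime.
CREATE TABLE listing (
    listid         INTEGER NOT NULL,
    sellerid       INTEGER NOT NULL,
    eventid        INTEGER,
    dateid         SMALLINT,
    numtickets     SMALLINT,
    priceperticket DECIMAL(8,2),
    totalprice     DECIMAL(8,2),
    listtime       TIMESTAMP
)
DISTSTYLE KEY
DISTKEY (listid)
SORTKEY (listtime);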


Benefit #1: Amazon Redshift is fast

• Dramatically less I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes

analyze compression listing;

Table   | Column         | Encoding
--------+----------------+----------
listing | listid         | delta
listing | sellerid       | delta32k
listing | eventid        | delta32k
listing | dateid         | bytedict
listing | numtickets     | bytedict
listing | priceperticket | delta32k
listing | totalprice     | mostly32
listing | listtime       | raw

[Zone map illustration: each block stores the min/max of its values
(e.g. a block with min 10 holds 10 | 13 | 14 | 26 …), so scans skip
blocks whose range cannot match the query predicate.]
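
The encodings that ANALYZE COMPRESSION suggests can be applied explicitly
at table creation. A minimal sketch based on the output above (the table
name is illustrative):

-- Apply the recommended encodings at CREATE TABLE time; leaving the
-- sort key column RAW keeps zone maps effective for range filters.
CREATE TABLE listing_compressed (
    listid         INTEGER      ENCODE delta,
    sellerid       INTEGER      ENCODE delta32k,
    eventid        INTEGER      ENCODE delta32k,
    dateid         SMALLINT     ENCODE bytedict,
    numtickets     SMALLINT     ENCODE bytedict,
    priceperticket DECIMAL(8,2) ENCODE delta32k,
    totalprice     DECIMAL(8,2) ENCODE mostly32,
    listtime       TIMESTAMP    ENCODE raw
);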


Benefit #1: Amazon Redshift is fast

Parallel and distributed:
• Query
• Load
• Export
• Backup
• Restore
• Resize
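
Loads and exports run in parallel across all compute nodes. A minimal
sketch of a parallel load and export, assuming a hypothetical S3 bucket
and IAM role (names are placeholders):

-- Parallel load: Redshift splits the input files across node slices.
COPY listing
FROM 's3://my-bucket/tickit/listings/'   -- hypothetical bucket
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
DELIMITER '|';

-- Parallel export: by default, one output file per slice.
UNLOAD ('SELECT * FROM listing WHERE dateid > 2000')
TO 's3://my-bucket/exports/listing_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';
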
Benefit #1: Amazon Redshift is fast

• Hardware optimized for I/O-intensive workloads: 4 GB/sec/node
• Enhanced networking: over 1 million packets/sec/node
• Choice of storage type and instance size
• Regular cadence of auto-patched improvements


Benefit #1: Amazon Redshift is fast

“Did I mention that it’s ridiculously fast? We’re using it to provide our
analysts with an alternative to Hadoop.”

“On our previous big data warehouse system, it took around 45 minutes to
run a query against a year of data, but that number went down to just
25 seconds using Amazon Redshift.”

“We regularly process multibillion-row datasets and we do that in a matter
of hours. We are heading to up to 10 times more data volumes in the next
couple of years, easily.”

“After investigating Redshift, Snowflake, and BigQuery, we found that
Redshift offers top-of-the-line performance at best-in-market price points.”

“…[Redshift] performance has blown away everyone here. We generally see
50-100X speedup over Hive.”

“We saw a 2X performance improvement on a wide variety of workloads. The
more complex the queries, the higher the performance improvement.”

And it has gotten faster...

Fast: 5X query throughput improvement over the past year
• Memory allocation (launched)
• Improved commit and I/O logic (launched)
• Queue hopping (launched)
• Query monitoring rules (launched)

Efficient: 10X vacuuming performance improvement
• Ensures data is sorted for efficient and fast I/O
• Reclaims space from deleted rows
• Enhanced vacuum performance leads to better system throughput
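
The vacuum improvements above apply to the standard maintenance commands.
A minimal sketch against the illustrative listing table:

-- Re-sort rows and reclaim space from deleted rows, then refresh the
-- planner's statistics.
VACUUM FULL listing;
ANALYZE listing;
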
The life of a query

[Diagram: (1) a query arrives from BI tools, analytics tools, or SQL
clients at the leader node; (2) workload management (WLM) places it in a
queue (e.g. Queue 1, Queue 2); (3) it executes in parallel on the
compute nodes of the Amazon Redshift cluster.]
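
To watch step 2 in practice, you can inspect the WLM state; a minimal
sketch using the standard STV_WLM_QUERY_STATE system view:

-- Queries currently tracked by WLM: their queue (service class),
-- state, and how long they have queued and executed (microseconds).
SELECT query, service_class, state, queue_time, exec_time
FROM stv_wlm_query_state
ORDER BY queue_time DESC;
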
Query monitoring rules

• Allow automatic handling of runaway (poorly written) queries
• Metrics with operators and values (e.g. query_cpu_time > 1000) create a predicate
• Multiple predicates can be AND-ed together to create a rule
• Multiple rules can be defined for a queue in WLM; these rules are OR-ed together

If {rule} then [action]

{rule : metric operator value}, e.g. rows_scanned > 100000
• Metric: cpu_time, query_blocks_read, rows_scanned, query_execution_time, CPU & I/O skew per slice, join_row_count, etc.
• Operator: <, >, ==
• Value: integer
[action]: hop, log, abort

Query monitoring rules

• Monitor and control cluster resources consumed by a query
• Get notified, abort, and reprioritize long-running / bad queries
• Pre-defined templates for common use cases

Query monitoring rules

Common use cases:

• Protect interactive queues
  INTERACTIVE = { "query_execution_time > 15 sec" or
                  "query_cpu_time > 1500 uSec" or
                  "query_blocks_read > 18000 blocks" } [HOP]

• Monitor ad-hoc queues for heavy queries
  AD-HOC = { "query_execution_time > 120" or
             "query_cpu_time > 3000" or
             "query_blocks_read > 180000" or
             "memory_to_disk > 400000000000" } [LOG]

• Limit the number of rows returned to a client
  MAXLINES = { "RETURN_ROWS > 50000" } [ABORT]
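
When rules like these fire, the actions they take are logged. A hedged
sketch of auditing them via the STL_WLM_RULE_ACTION system log:

-- Which queries triggered a monitoring rule, in which queue, and
-- what action (log, hop, abort) was taken.
SELECT query, service_class, rule, action, recordtime
FROM stl_wlm_rule_action
ORDER BY recordtime DESC
LIMIT 50;
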
Benefit #2: Amazon Redshift is inexpensive

DS2 (HDD)            Price per hour (DS2.XL single node)   Effective annual price per TB (compressed)
On-demand            $0.850                                $3,725
1-year reservation   $0.500                                $2,190
3-year reservation   $0.228                                $999

DC1 (SSD)            Price per hour (DC1.L single node)    Effective annual price per TB (compressed)
On-demand            $0.250                                $13,690
1-year reservation   $0.161                                $8,795
3-year reservation   $0.100                                $5,500

Pricing is simple:
• Number of nodes x price/hour
• No charge for the leader node
• No upfront costs
• Pay as you go
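
As a worked example using the on-demand rows above: a 10-node DS2.XL
cluster costs 10 x $0.850 = $8.50/hour, with no additional charge for
the leader node.
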
Benefit #3: Amazon Redshift is fully managed

Continuous/incremental backups
• Multiple copies within the cluster
• Continuous and incremental backups to Amazon S3
• Continuous and incremental backups across regions
• Streaming restore

[Diagram: compute nodes back up to Amazon S3 in Region 1, with
cross-region copies to Amazon S3 in Region 2.]

Benefit #3: Amazon Redshift is fully managed

Fault tolerance
• Disk failures
• Node failures
• Network failures
• Availability Zone/region level disasters

[Diagram: the same cluster plus cross-region Amazon S3 backups protect
against each failure mode.]

Node fault tolerance

Data-path monitoring agents provide node-level monitoring: they can
detect software/hardware issues and take action.

[Diagram sequence:]
1. A failure is detected at one of the compute nodes.
2. Redshift parks the client connections.
3. The failed node is replaced.
4. Queries are re-submitted.

A cluster-level monitoring agent adds an additional monitoring layer for
the leader node and network.

Benefit #4: Security is built-in

• Load encrypted from S3
• SSL to secure data in transit
• ECDHE for perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest
  - All blocks on disks and in S3 encrypted
  - Block key, cluster key, master key (AES-256)
  - On-premises HSM & AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA

[Diagram: BI tools, analytics tools, and SQL clients connect over
JDBC/ODBC from the customer VPC to the leader node in an internal VPC;
compute nodes, linked over 10 GigE (HPC), exchange data with Amazon S3,
Amazon EMR, Amazon DynamoDB, and SSH sources for ingestion and
backup/restore.]
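
A hedged sketch of the first point, loading client-side encrypted files
from S3 (the bucket, role, and key value are hypothetical placeholders):

-- Load data that was client-side encrypted before upload to S3.
-- The master symmetric key must match the one used to encrypt the files.
COPY listing
FROM 's3://my-bucket/encrypted/listings/'   -- hypothetical bucket
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
MASTER_SYMMETRIC_KEY '<base64-encoded-AES-256-key>'
ENCRYPTED
DELIMITER '|';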


Benefit #5: Amazon Redshift is powerful
• Approximate functions

• User defined functions

• Machine learning

• Data science
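
A hedged sketch of the first two points; the function below is an
illustrative example, not a built-in:

-- Approximate functions: HyperLogLog-based distinct count, much faster
-- than an exact COUNT(DISTINCT ...) on large tables.
SELECT APPROXIMATE COUNT(DISTINCT sellerid) FROM listing;

-- User-defined function: a scalar Python UDF (name and logic are
-- hypothetical).
CREATE OR REPLACE FUNCTION f_ticket_margin (price float, cost float)
RETURNS float
IMMUTABLE
AS $$
    if cost == 0:
        return None
    return (price - cost) / cost
$$ LANGUAGE plpythonu;
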
Benefit #6: Amazon Redshift has a large ecosystem

Data integration • Business intelligence • Systems integrators


Benefit #7: Service-oriented architecture

[Diagram: Amazon Redshift at the center, integrated with DynamoDB,
EC2/SSH, RDS/Aurora, Amazon ML, EMR, CloudSearch, Data Pipeline,
Amazon Kinesis, Mobile Analytics, and S3.]

Amazon Redshift Spectrum

Run SQL queries directly against data in S3 using thousands of nodes

• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: multiple clusters access the same data in S3
• No ETL: query data in place using open file formats
• Full Amazon Redshift SQL support

Life of a query

SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…

[Diagram: the query arrives over JDBC/ODBC at the Amazon Redshift
cluster, which fans out to Redshift Spectrum workers (1 … N) that scan
Amazon S3 (exabyte-scale object storage) using the Data Catalog
(Apache Hive Metastore). Fast @ exabyte scale.]

Amazon Redshift Spectrum – Current support

File formats: Parquet, CSV, Sequence, RCFile, ORC (coming soon), RegExSerDe (coming soon)
Compression: Gzip, Snappy, Lzo (coming soon), Bz2
Encryption: SSE with AES256, SSE-KMS with default key

Column types:
• Numeric: bigint, int, smallint, float, double, and decimal
• Char/varchar/string
• Timestamp
• Boolean
• DATE type can be used only as a partitioning key

Table types:
• Non-partitioned table (s3://mybucket/orders/..)
• Partitioned table (s3://mybucket/orders/date=YYYY-MM-DD/..)
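
A hedged sketch of defining such a table; the schema, database, bucket,
role, and column names are hypothetical placeholders:

-- Register an external schema backed by the Hive-compatible Data Catalog.
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Define a partitioned external table over Parquet files in S3.
CREATE EXTERNAL TABLE spectrum.orders (
    orderid BIGINT,
    total   DECIMAL(12,2)
)
PARTITIONED BY (orderdate DATE)
STORED AS PARQUET
LOCATION 's3://mybucket/orders/';

-- Partitions are registered explicitly, pointing at the date= layout
-- shown on the slide above.
ALTER TABLE spectrum.orders
ADD PARTITION (orderdate='2017-06-01')
LOCATION 's3://mybucket/orders/date=2017-06-01/';
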
The Emerging Analytics Architecture

Storage
• Amazon S3 – exabyte-scale object storage
• AWS Glue Data Catalog – Hive-compatible metastore

Serverless compute
• Amazon Kinesis Firehose – real-time data streaming
• AWS Glue – ETL & data catalog
• Amazon Redshift Spectrum – fast @ exabyte scale
• AWS Lambda – trigger-based code execution

Data processing
• Amazon EMR – managed Hadoop applications
• Amazon Redshift – petabyte-scale data warehousing
• Amazon Athena – interactive query

Over 20 customers helped preview Amazon Redshift Spectrum
Use cases
NTT Docomo: Japan’s largest mobile service provider

• 68 million customers
• Tens of TBs per day of data across a mobile network
• 6 PB of total data (uncompressed)
• Data science for marketing operations, logistics, and so on
• Previously Greenplum on-premises

Challenges: scaling, performance issues, the need for the same level of
security, and the need for a hybrid environment.

NTT Docomo: Japan’s largest mobile service provider

• 125-node DS2.8XL cluster
• 4,500 vCPUs, 30 TB RAM
• 2 PB compressed
• 10x faster analytic queries
• 50% reduction in time for new BI application deployment
• Significantly less operations overhead

[Diagram: data flows from the data source through an ETL forwarder,
state management, and loader over AWS Direct Connect into Amazon
Redshift, with S3 storage and a sandbox available to clients.]

Nasdaq: powering 100 marketplaces in 50 countries

• Orders, quotes, trade executions, market “tick” data from 7 exchanges
• 7 billion rows/day
• Analyze market share, client activity, surveillance, billing, and so on
• Previously Microsoft SQL Server on-premises

Challenges: an expensive legacy DW ($1.16M/yr.), limited capacity (1 yr.
of data online), the need for lower TCO at similar performance, and
multiple security and regulatory requirements to satisfy.

Nasdaq: powering 100 marketplaces in 50 countries

• 23-node DS2.8XL cluster
• 828 vCPUs, 5 TB RAM
• 368 TB compressed
• 2.7 trillion rows, 900 billion derived
• 8 tables with 100 billion rows
• 7 man-month migration
• ¼ the cost, 2x the storage, room to grow
• Faster performance, very secure

Amazon.com clickstream analytics

Web log analysis for Amazon.com
• PB-scale workload; 2 TB/day, growing 67% YoY
• Largest table: 400 TB
• Goal: understand customer behavior

Previous solution
• Legacy DW (Oracle): query across 1 week/hr
• Hadoop: query across 1 month/hr

Results with Amazon Redshift
• Query 15 months in 14 min
• Load 5B rows in 10 min
• 21B w/ 10B rows: 3 days to 2 hrs (Hive → Redshift)
• Load pipeline: 90 hrs to 8 hrs (Oracle → Redshift)
• 100-node DS2.8XL clusters; easy resizing
• Managed backups and restore; failure tolerance and recovery
• 20% time of one DBA; increased productivity

Resources

Detail Pages
• http://aws.amazon.com/redshift
• https://aws.amazon.com/marketplace/redshift/
• https://aws.amazon.com/redshift/developer-resources/
• Amazon Redshift Utilities - GitHub

Best Practices
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html
Thank you!
