0% found this document useful (0 votes)
34 views

Using Machine Learning From Your Database: Danilo Poccia, Chief Evangelist (EMEA) @danilop

Uploaded by

Tony
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views

Using Machine Learning From Your Database: Danilo Poccia, Chief Evangelist (EMEA) @danilop

Uploaded by

Tony
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

Using Machine Learning

From Your Database


Danilo Poccia, Chief Evangelist (EMEA)
@danilop

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Where to store data?
Object Storage
Unstructured
Database LOB (Large OBject)

NoSQL Database
Semi-Structured Time Series Database
Relational Database
Data
with JSON Document Extensions

Structured Relational Database

NoSQL Database

Transient Memory
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora: Design Considerations for High
Throughput Cloud-Native Relational Databases (2017)
Amazon Aurora: Design Considerations for High
Throughput Cloud-Native Relational Databases
Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta,
Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, Xiaofeng Bao

Amazon Web Services

ABSTRACT The I/O bottleneck faced by traditional database systems changes


Amazon Aurora is a relational database service for OLTP in this environment. Since I/Os can be spread across many nodes
workloads offered as part of Amazon Web Services (AWS). In and many disks in a multi-tenant fleet, the individual disks and
this paper, we describe the architecture of Aurora and the design nodes are no longer hot. Instead, the bottleneck moves to the
considerations leading to that architecture. We believe the central network between the database tier requesting I/Os and the storage
constraint in high throughput data processing has moved from tier that performs these I/Os. Beyond the basic bottlenecks of
compute and storage to the network. Aurora brings a novel packets per second (PPS) and bandwidth, there is amplification of
architecture to the relational database to address this constraint, traffic since a performant database will issue writes out to the
most notably by pushing redo processing to a multi-tenant scale- storage fleet in parallel. The performance of the outlier storage
out storage service, purpose-built for Aurora. We describe how node, disk or network path can dominate response time.
doing so not only reduces network traffic, but also allows for fast Although most operations in a database can overlap with each
crash recovery, failovers to replicas without loss of data, and other, there are several situations that require synchronous
fault-tolerant, self-healing storage. We then describe how Aurora operations. These result in stalls and context switches. One such
achieves consensus on durable state across numerous storage situation is a disk read due to a miss in the database buffer cache.
nodes using an efficient asynchronous scheme, avoiding A reading thread cannot continue until its read completes. A cache
expensive and chatty recovery protocols. Finally, having operated miss may also incur the extra penalty of evicting and flushing a
Aurora as a production service for over 18 months, we share dirty cache page to accommodate the new page. Background
lessons we have learned from our customers on what modern processing such as checkpointing and dirty page writing can
cloud applications expect from their database tier. reduce the occurrence of this penalty, but can also cause stalls,
context switches and resource contention.
Keywords
Databases; Distributed Systems; Log Processing; Quorum Transaction commits are another source of interference; a stall in
Models; Replication; Recovery; Performance; OLTP committing one transaction can inhibit others from progressing.
Handling commits with multi-phase synchronization protocols
1. INTRODUCTION such as 2-phase commit (2PC) [3][4][5] is challenging in a cloud-
IT workloads are increasingly moving to public cloud providers. scale distributed system. These protocols are intolerant of failure
Significant reasons for this industry-wide transition include the and high-scale distributed systems have a continual “background
ability to provision capacity on a flexible on-demand basis and to noise” of hard and soft failures. They are also high latency, as
pay for this capacity using an operational expense as opposed to high scale systems are distributed across multiple data centers.
capital expense model. Many IT workloads require a relational
OLTP database; providing equivalent or superior capabilities to
Data Plane Control Plane
on-premise databases is critical to support this secular transition.
In modern distributed cloud services, resilience and scalability are SQL
increasingly achieved by decoupling compute from storage
Transactions
[10][24][36][38][39] and by replicating storage across multiple
nodes. Doing so lets us handle operations such as replacing Caching Amazon
DynamoDB
misbehaving or unreachable hosts, adding replicas, failing over
from a writer to a replica, scaling the size of a database instance
up or down, etc. Logging + Storage
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or Amazon SWF
distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned
by others than the author(s) must be honored. Abstracting with credit is permitted.
To copy otherwise, or republish, to post on servers or to redistribute to lists, require
prior specific permission and/or a fee. Request permissions from Amazon S3
[email protected]
SIGMOD’17, May 14 – 19, 2017, Chicago, IL, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM. Figure 1: Move logging and storage off the database engine
ACM 978-1-4503-4197-4/17/05…$15.00
DOI: http://dx.doi.org/10.1145/3035918.3056101

1041

https://www.amazon.science/publications/amazon-aurora-design-considerations-for-high-throughput-cloud-native-relational-databases

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Move logging and storage off the database engine

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Network IO in mirrored MySQL (not Amazon Aurora)

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Network IO in Amazon Aurora

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora

• MySQL 5.6 and 5.7 compatible

• PostgreSQL 9.6, 10, 11 compatible

• Serverless
• MySQL 5.6 and 5.7, and PostgreSQL 10.7 compatible
• Built-in synchronous Data API with an
• HTTP endpoint
• Integration with AWS SDKs

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora

• Integrated with Machine Learning services


• Amazon Comprehend
• Amazon SageMaker
• Aurora MySQL 5.7 – version 2.07.0 and higher
• Aurora PostgreSQL 10.11
• Aurora PostgreSQL 11.6 and higher

• Global Database
• Sub-Second Data Access in Any Region
• Read Replica Write Forwarding
• Cross-Region Disaster Recovery

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Machine Learning directly from your databases

https://aws.amazon.com/blogs/aws/new-for-amazon-aurora-use-machine-learning-directly-from-your-databases/

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Machine Learning directly from your databases

https://aws.amazon.com/blogs/aws/new-for-amazon-aurora-use-machine-learning-directly-from-your-databases/

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Amazon Aurora
+ Amazon Comprehend

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sentiment Analysis – Managing Comments

CREATE TABLE IF NOT EXISTS comments (


comment_id INT AUTO_INCREMENT PRIMARY KEY,
comment_text VARCHAR(255) NOT NULL
);

INSERT INTO comments (comment_text)


VALUES ("This is very useful, thank you for writing it!");
INSERT INTO comments (comment_text)
VALUES ("Awesome, I was waiting for this feature.");
INSERT INTO comments (comment_text)
VALUES ("An interesting write up, please add more details.");
INSERT INTO comments (comment_text)
VALUES ("I don’t like how this was implemented.");

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Checking Sentiment & Confidence of Comments

SELECT comment_text,
aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment,
aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence
FROM comments;

+---------------------------------------------------+-----------+--------------------+
| comment_text | sentiment | confidence |
+---------------------------------------------------+-----------+--------------------+
| This is very useful, thank you for writing it! | POSITIVE | 0.9996347427368164 |
| Awesome, I was waiting for this feature. | POSITIVE | 0.9977971315383911 |
| An interesting write up, please add more details. | NEUTRAL | 0.5156506896018982 |
| I don’t like how this was implemented. | NEGATIVE | 0.9982835054397583 |
+---------------------------------------------------+-----------+--------------------+

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Estimating Overall Sentiment from Comments

SELECT AVG(CASE aws_comprehend_detect_sentiment(comment_text, 'en')


WHEN 'POSITIVE' THEN 1.0
WHEN 'NEGATIVE' THEN -1.0
ELSE 0.0 END) AS avg_sentiment,
COUNT(*) AS total
FROM comments
WHERE aws_comprehend_detect_sentiment_confidence(comment_text, 'en') >= 0.80;

+---------------+-------+
| avg_sentiment | total |
+---------------+-------+
| 0.33333 | 3 |
+---------------+-------+

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Amazon Aurora
+ Amazon SageMaker

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Churn Prediction with Amazon SageMaker Autopilot

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/autopilot/autopilot_customer_churn.ipynb

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Managing Customer Churn

CREATE FUNCTION will_churn (


state varchar(2048),
acc_length bigint(20),
area_code bigint(20), phone varchar(2048),
int_plan varchar(2048),
vmail_plan varchar(2048), vmail_msg bigint(20),
day_mins double, day_calls bigint(20), day_charge double,
eve_mins double, eve_calls bigint(20), eve_charge double,
night_mins double, night_calls bigint(20), night_charge double,
int_mins double, int_calls bigint(20), int_charge double,
cust_service_calls bigint(20))
RETURNS varchar(2048) CHARSET utf8mb4
alias aws_sagemaker_invoke_endpoint
endpoint name 'tuning-job-123’;

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Estimating (and Caching) Customer Churn

CREATE TABLE IF NOT EXISTS customers_churn AS


SELECT *, will_churn(state, acc_length, area_code, phone, int_plan,
vmail_plan, vmail_msg, day_mins, day_calls, day_charge,
eve_mins, eve_calls, eve_charge, night_mins, night_calls, night_charge,
int_mins, int_calls, int_charge, cust_service_calls) will_churn
FROM customers;

SELECT * FROM customers_churn LIMIT 7;

SELECT will_churn, COUNT(*) FROM customers_churn GROUP BY will_churn;

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Churn – Some Results

SELECT * FROM customers_churn LIMIT 7;

+-------+------------+-----------+----------+----------+------------+-----------+----------+-----------+------------+----------+-----------+------------+------------+-------------+--------------+----------+-----------+------------+--------------------+------------+
| state | acc_length | area_code | phone | int_plan | vmail_plan | vmail_msg | day_mins | day_calls | day_charge | eve_mins | eve_calls | eve_charge | night_mins | night_calls | night_charge | int_mins | int_calls | int_charge | cust_service_calls | will_churn |
+-------+------------+-----------+----------+----------+------------+-----------+----------+-----------+------------+----------+-----------+------------+------------+-------------+--------------+----------+-----------+------------+--------------------+------------+
| KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10 | 3 | 2.7 | 1 | False. |
| OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False. |
| NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False. |
| OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False. |
| OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False. |
| AL | 118 | 510 | 391-8027 | yes | no | 0 | 223.4 | 98 | 37.98 | 220.6 | 101 | 18.75 | 203.9 | 118 | 9.18 | 6.3 | 6 | 1.7 | 0 | False. |
| MA | 121 | 510 | 355-9993 | no | yes | 24 | 218.2 | 88 | 37.09 | 348.5 | 108 | 29.62 | 212.6 | 118 | 9.57 | 7.5 | 7 | 2.03 | 3 | False. |
+-------+------------+-----------+----------+----------+------------+-----------+----------+-----------+------------+----------+-----------+------------+------------+-------------+--------------+----------+-----------+------------+--------------------+------------+

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Churn – Overall Stats

SELECT will_churn, COUNT(*)


FROM customers_churn
GROUP BY will_churn;

+------------+----------+
| will_churn | COUNT(*) |
+------------+----------+
| False. | 2898 |
| True. | 435 |
+------------+----------+

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Customer Churn – Diving Deeper

SELECT will_churn,
AVG(acc_length),
AVG(cust_service_calls),
COUNT(*)
FROM customers_churn
GROUP BY will_churn;

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Aurora Labs for MySQL

https://awsauroralabsmy.com/ml/overview/

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Takeaways

Relational databases contain highly structured data

Using machine learning from your databases unleashes


new use cases for new users

You don’t have to change traditional SQL-based apps to leverage


sentiment analysis and machine learning inference

Machine learning adoption is easier and broader in scope

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
@danilop Please give me your feedback J

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

You might also like