Aki Ariga | Field Data Scientist
2018.05.17
2 © Cloudera, Inc. All rights reserved.
● Field Data Scientist at Cloudera
● Previously research engineer at Toshiba, Rails developer at Cookpad
● Co-author of “仕事ではじめる機械学習” (Machine Learning at Work)
● Founder of kawasaki.rb & MLCT
● Twitter: @chezou
● GitHub: https://github.com/chezou/
Hidden technical debt in machine learning systems [2] + Project procedure + Culture
Building a Data-driven product ≠ Research
A journey to a data-driven product
1.
2.
3. A/B
4. A/B
5.
6.
7.
http://tjo.hatenablog.com/entry/2016/01/18/080000
Culture
BI
Statistics
ML
1.
2.
3.
4.
5.
6.
7.
8.
Procedure in a Machine Learning project
Step.4 7
•
•
•
• / Web
•
Recommended team members for a typical ML project
What’s the difference between academia and industry for ML?
Production by Nick Youngson CC BY-SA 3.0 Alpha Stock Images
Sample data science/machine learning workflow
From data to exploration to action
[Diagram: three stages under data governance. Data Engineering: acquisition, curation, data wrangling. Data Science (Exploratory): data exploration, model training & testing. Production (Operational): production data pipelines, batch scoring, online scoring, serving. Outputs along the way: data, reports and dashboards, models, predictions, business value.]
1.
2.
3.
Production
MLOps
1. Train by batch, predict on the fly, serve via REST API
2. Train by batch, predict by batch, serve through the shared DB
3. Train, predict, serve by streaming
4. Train by batch, predict on a mobile app
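Pattern 1: Train by batch, predict on the fly, serve via REST API
[Diagram: the Web Application sends a User ID/Item ID to the ML System's API Server over a REST API and receives a prediction result. Inside the ML System, the Batch System extracts features from the activity log/contents data in the DB, executes training, and stores the training result as the Trained Model. The API Server extracts features and predicts with the Trained Model.]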
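A minimal sketch of Pattern 1, assuming a logistic-regression model whose weights were already produced by a training batch. The names `TRAINED_MODEL`, `FEATURES`, `predict`, `PredictHandler`, and `serve` are hypothetical, and the standard-library HTTP server stands in for a production API server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from math import exp

# Hypothetical artifact written by the training batch. A real API server would
# load it from a file or object storage at startup.
TRAINED_MODEL = {"bias": -1.0, "weights": {"views": 0.8, "purchases": 1.5}}

# Hypothetical features prepared by the batch system, keyed by (user, item).
FEATURES = {("u1", "i1"): {"views": 2.0, "purchases": 1.0}}


def predict(user_id, item_id, model=TRAINED_MODEL, features=FEATURES):
    """Score one (user, item) pair on the fly with the batch-trained model."""
    x = features.get((user_id, item_id), {})
    z = model["bias"] + sum(w * x.get(name, 0.0) for name, w in model["weights"].items())
    return 1.0 / (1.0 + exp(-z))


class PredictHandler(BaseHTTPRequestHandler):
    """REST endpoint: POST {"user_id": ..., "item_id": ...} returns {"score": ...}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        score = predict(body.get("user_id"), body.get("item_id"))
        payload = json.dumps({"score": score}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocking entry point; call it explicitly to start the API server."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

The web application only needs to know the REST contract, which is why this pattern keeps the application and the ML system loosely coupled.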
Example architecture: PMML + OpenScoring
[Diagram: in the model building layer, CDSW reads the activity log from HDFS, extracts features, trains/updates the model, and exports the trained model as PMML. In the predicting & serving layer, the serving component loads the updated model and, for each request to predict, extracts features, predicts, and returns prediction results.]
Example architecture: Docker based API Server
[Diagram: in the model building layer, CDSW reads the activity log from HDFS, extracts features, trains/updates the model, and saves the trained model on object storage; the runtime environment is packed with Docker. In the predicting & serving layer, the API server loads the updated model from object storage and, for each request to predict, extracts features, predicts, and returns prediction results.]
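Pattern 2: Train by batch, predict by batch, serve through the shared DB
[Diagram: the Batch System runs a training batch and a prediction batch. The training batch extracts features from the activity log/contents data in the DB, executes training, and stores the training result as the Trained Model. The prediction batch extracts features, predicts with the Trained Model, and writes the prediction result to the DB. The Web Application serves predictions by reading them from the DB.]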
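A minimal sketch of Pattern 2, assuming an in-memory SQLite database as a stand-in for the shared DB. The `predictions` table, the `score` function, and the sample features are hypothetical.

```python
import sqlite3


def score(features):
    """Stand-in for a batch-trained model (hypothetical linear scorer)."""
    return 0.7 * features["views"] + 0.3 * features["purchases"]


def run_prediction_batch(conn, feature_rows):
    """Prediction batch: score every user and upsert the results into the shared DB."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions (user_id TEXT PRIMARY KEY, score REAL)"
    )
    rows = [(user_id, score(features)) for user_id, features in feature_rows.items()]
    conn.executemany(
        "INSERT OR REPLACE INTO predictions (user_id, score) VALUES (?, ?)", rows
    )
    conn.commit()


def serve_prediction(conn, user_id):
    """Web application side: a plain key lookup, with no model code involved."""
    row = conn.execute(
        "SELECT score FROM predictions WHERE user_id = ?", (user_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
run_prediction_batch(
    conn,
    {"u1": {"views": 10, "purchases": 2}, "u2": {"views": 1, "purchases": 0}},
)
```

Because the application only reads a table, the prediction batch can take as long as it needs, at the cost of predictions that are only as fresh as the last batch run.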
Example architecture: Serving by HBase/Kudu
[Diagram: in the model building & predicting layer, CDSW reads the activity log and historical data from HDFS, extracts features, trains/updates the model, loads the trained model, predicts, and writes prediction results to Kudu/HBase. In the serving layer, Kudu/HBase serves the prediction results.]
Pattern 3: Train, predict, serve by streaming
[Diagram: log data flows from the Web Application through a message queue (e.g. Kafka) into a stream-based ML system (e.g. Spark Streaming), which extracts features from recent log data, trains and predicts, and applies model updates to the Trained Model. Prediction results flow back to the Web Application, which queries for predictions and shows or sends alerts; this component may work with a message queue like Kafka.]
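A minimal sketch of Pattern 3, assuming `queue.Queue` objects as stand-ins for message-queue topics and a tiny online linear model updated by stochastic gradient descent. `OnlineLinearModel`, `process_stream`, and the event layout are hypothetical.

```python
import queue


class OnlineLinearModel:
    """Linear regression trained one event at a time with SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def process_stream(log_queue, result_queue, model):
    """Consume log events, publish a prediction, then learn from the label if present."""
    while True:
        event = log_queue.get()
        if event is None:  # sentinel that ends the stream in this sketch
            break
        result_queue.put((event["id"], model.predict(event["x"])))
        if event.get("y") is not None:
            model.update(event["x"], event["y"])


log_queue, result_queue = queue.Queue(), queue.Queue()
for i in range(50):
    log_queue.put({"id": i, "x": [1.0], "y": 2.0})
log_queue.put(None)

model = OnlineLinearModel(n_features=1)
process_stream(log_queue, result_queue, model)
```

Training and prediction share one loop here, which gives very low latency from new data to prediction but makes the system harder to operate than the batch patterns.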
Pattern 4: Train by batch, predict on a mobile app
[Diagram: the Batch System extracts features from the activity log/contents data in the DB, executes training, stores the training result as the Trained Model, and converts the model. The Mobile Application holds the converted Trained Model and its own DB of activity logs/contents data, extracts features for each request for prediction, and produces the prediction result on the device.]
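A minimal sketch of the convert-then-predict-on-device idea in Pattern 4. A JSON file stands in for a converted mobile model format, and `convert_model` and `OnDeviceModel` are hypothetical names; a real app would use its framework's converter and runtime instead.

```python
import json
import os
import tempfile


def convert_model(trained, path):
    """Server side: write a small, framework-free artifact the app can ship or download."""
    portable = {
        "version": trained["version"],
        "bias": round(trained["bias"], 4),
        "weights": [round(w, 4) for w in trained["weights"]],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(portable, f)


class OnDeviceModel:
    """App side: load the artifact from local storage and predict in-process."""

    def __init__(self, path):
        with open(path, encoding="utf-8") as f:
            artifact = json.load(f)
        self.version = artifact["version"]
        self.bias = artifact["bias"]
        self.weights = artifact["weights"]

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))


# Assumed example values: a model produced by the training batch.
artifact_path = os.path.join(tempfile.mkdtemp(), "model.json")
convert_model({"version": 3, "bias": 1.0, "weights": [0.5, 0.25]}, artifact_path)
on_device = OnDeviceModel(artifact_path)
```

No network call happens at prediction time, but the app is now tied to the artifact format, which is the tight coupling this pattern trades for low latency.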
Example architecture: Serving on a mobile app
[Diagram: in the model building layer, CDSW reads the activity log from HDFS, extracts features, trains/updates the model, and converts the trained model to TFLite/CoreML. In the predicting & serving layer, the mobile app loads the updated model from storage in the smartphone and, for each request to predict, extracts features, predicts, and returns prediction results.]
Pattern 4’: Federated learning
https://research.googleblog.com/2017/04/federated-learning-collaborative.html
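A minimal sketch of the federated averaging idea behind federated learning: each device trains locally on its own data and only model weights are sent to the server, which averages them weighted by sample count. The linear model and the names `local_update`, `federated_average`, and `federated_round` are assumptions for illustration, not the implementation described at the link.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """On-device step: SGD on local (x, y) pairs; raw data never leaves the device."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, len(data)


def federated_average(client_results):
    """Server step: average client weights, weighted by each client's sample count."""
    total = sum(n for _, n in client_results)
    dim = len(client_results[0][0])
    return [sum(w[i] * n for w, n in client_results) / total for i in range(dim)]


def federated_round(global_weights, clients):
    """One round: broadcast the global weights, train locally, aggregate."""
    return federated_average([local_update(global_weights, data) for data in clients])
```

Calling `federated_round` repeatedly with the previous result as `global_weights` moves the shared model toward a fit of all clients' data without centralizing that data.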
Comparison of the 4 patterns

| | Pattern 1 (REST API) | Pattern 2 (Shared DB) | Pattern 3 (Streaming) | Pattern 4 (Mobile app) |
|---|---|---|---|---|
| Training | By batch | By batch | NRT (by streaming) | By batch |
| Prediction | NRT (on the fly) | By batch | NRT (by streaming) | NRT (on the fly) |
| Prediction result delivery | NRT (via REST API) | NRT (through the shared DB) | NRT (by streaming via MQ) | NRT (via in-process API on mobile) |
| Latency from getting new data to prediction | Moderate | Moderate to long | Very low | Low |
| Required time to predict | Short | Long | Short | Short |
| Tight/loose coupling with app | Loose | Loose | Loose | Tight |
| Dependency on languages | Independent | Independent | Independent | Depends on frameworks |
| System management difficulty | Moderate | Easy | Very hard | Moderate |

NRT: Near real time
CI, CD and Blue Green deployment
https://www.slideshare.net/hiroakikudo77/ss-84593653/14
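A minimal sketch of blue-green deployment applied to model serving, assuming models are plain callables. `BlueGreenRouter` and its methods are hypothetical names; the point is that a candidate is loaded into the idle slot, smoke-tested, and only then made live, with the previous model kept for rollback.

```python
import threading


class BlueGreenRouter:
    """Keeps two model slots and routes all traffic to the live one."""

    def __init__(self, blue_model):
        self._slots = {"blue": blue_model, "green": None}
        self._live = "blue"
        self._lock = threading.Lock()

    @property
    def live(self):
        return self._live

    def _idle(self):
        return "green" if self._live == "blue" else "blue"

    def predict(self, x):
        with self._lock:
            model = self._slots[self._live]
        return model(x)

    def deploy(self, new_model, smoke_inputs):
        """Smoke-test the candidate, then switch traffic to it in one step."""
        for x in smoke_inputs:
            new_model(x)  # any exception aborts the deployment before the switch
        with self._lock:
            idle = self._idle()
            self._slots[idle] = new_model
            self._live = idle
        return idle

    def rollback(self):
        """Switch traffic back to the previously live slot."""
        with self._lock:
            previous = self._idle()
            if self._slots[previous] is None:
                raise RuntimeError("no previous model to roll back to")
            self._live = previous
```

For example, `router = BlueGreenRouter(old_model)` followed by `router.deploy(new_model, smoke_inputs=[...])` switches traffic only if every smoke input is scored without an exception.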
• /Feedback loop
•
•
•
• ) MeCab
•
• )
•
•
•
/Feedback loop
https://twitter.com/hagino3000/status/986257856730034177
•
• “safe to serve” & “desired prediction quality” [4]
• (offline) (online)
• “Silent failures” [3]
• ) Join
• )
•
•
•
• serving
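A minimal sketch of a pre-deployment gate built on the two checks named above, “safe to serve” and “desired prediction quality”. The function names, the accuracy metric, and the comparison against a baseline model are assumptions for illustration, not the validation logic of [4].

```python
import math


def is_safe_to_serve(model, sample_inputs):
    """The candidate must score every sample without crashing or returning NaN/inf."""
    for x in sample_inputs:
        try:
            y = model(x)
        except Exception:
            return False
        if not isinstance(y, (int, float)) or not math.isfinite(y):
            return False
    return True


def accuracy(model, labeled_examples, threshold=0.5):
    """Offline quality metric on a labeled holdout set of (input, 0/1 label) pairs."""
    hits = sum((model(x) >= threshold) == bool(label) for x, label in labeled_examples)
    return hits / len(labeled_examples)


def should_push(candidate, baseline, labeled_examples, tolerance=0.0):
    """Push only if the candidate is safe and not worse than the serving baseline."""
    inputs = [x for x, _ in labeled_examples]
    if not is_safe_to_serve(candidate, inputs):
        return False
    return accuracy(candidate, labeled_examples) >= accuracy(baseline, labeled_examples) - tolerance
```

A gate like this only covers offline checks; silent failures in live data still need online monitoring of the served predictions.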
• •
• [1]
• ) DVC, Bitemporal Modeling
• [4]
• )
•
• [2,4]
• [4]
•
• [7]
• Google, Facebook [4, 9]
• /
• /
•
•
Researcher, Dev, Ops:
https://www.slideshare.net/syou6162/ss-88255142
• IoT
[8]
•
•
(GDPR)
• Data-driven product
•
•
•
• ML systems Production
•
•
•
•
• [1] “My model has higher BLEU, can I ship it? The Joel Test for machine learning systems”, L. Park, ACML-AIMLP Workshop, 2017
• [2] “Hidden Technical Debt in Machine Learning Systems”, D. Sculley et al., NIPS, 2015
• [3] “Rules of Machine Learning: Best Practices for ML Engineering”, M. Zinkevich
• [4] “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform”, D. Baylor et al., KDD, 2017
• [5] “What’s your ML test score? A rubric for ML production systems”, E. Breck et al., Reliable Machine Learning in the Wild, NIPS Workshop, 2016
• [6] , 2017, ML Ops Study #1
• [7] , , 2018, HACKER TACKLE 2018
• [8] “DevOps for models: How to manage millions of models in production—and at the edge”, T. Tung et al., Strata Data Singapore, 2017
• [9] “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”, K. Hazelwood et al., IEEE HPCA, 2018
THANK YOU

仕事ではじめる機械学習
