Introduction To Data Science

Uploaded by

Murari Rajagopalan

Real-world demonstration

For the beginner modeler

Revisit Today’s Webinar Materials
For anyone who may have been running late
or wanted to reference these materials, we are
happy to provide the presentation and a link
to the recording of the webinar.

Expect to hear from us after the presentation!

© Minitab Inc. 10/24/2017 2

Today’s Discussion (10/24)
• Quick Refresher – What can Machine Learning do for you
• Salford Systems – Pioneering Predictive Analytics and Machine Learning
• Manufacturing Defects Dataset: Applied Examples
  • CART
  • TreeNet
  • Random Forest

Today’s Presenter
Charlie Harrison
Charlie is part of Salford’s Data Scientist Team, and has been providing customer support and training for several years. His favorite thing about Data Science is proving theoretical results.

What Can Machine Learning Do For You?

1. Explore Data
2. Discover the Most Important Features
3. Find the Most Important Relationships in Factors & Response
4. Predict Future Observations
5. Solve Your Problem

How Broad and Deep is the Application
Potential?
Machine learning methods can be applied in almost any context. The following is a
brief selection of industry and functional examples:
INDUSTRIES
• MANUFACTURING: Manufacturing Defects; Preventative Maintenance
• FINANCIAL SERVICES: Loan Defaults; Fraud Prevention
• HEALTH CARE: Disease Prevention; Genetics
• OTHER INDUSTRIES: Insurance Claims; Environmental Impacts

FUNCTIONAL AREAS
• SALES: Customer Churn; Cross-Sell/Upsell
• MARKETING: Customer Segmentation; Marketing Lift

CLASSIFICATION MODELS using CART, Gradient Boosting & Random Forests

• REGRESSION: Predict a quantitative value
• CLASSIFICATION: Predict a qualitative value
• UNSUPERVISED LEARNING: Clustering
• TIME SERIES: Predict future values based on past values
• SURVIVAL ANALYSIS: Predict time until occurrence

What Do You Need to Get Started?

1. Sufficient Data
2. Pick the Right Problem
3. Solve with the Right Tool

Have you downloaded SPM 8.2? After this webinar, we’ll give you access to the dataset used so you can try it out for yourself.

https://info.salford-systems.com/spm-8-download

Salford Systems

Salford’s Legacy in Pioneering Predictive Analytics &
Machine Learning
Salford’s solutions are innovative, reliable and robust because they were created and
are implemented by inventors and pioneers of Predictive Analytics & Machine
Learning (PAML):
• Dr. Jerome Friedman (Professor of Statistics, Stanford)
• Dr. Leo Breiman (Professor of Statistics, UC Berkeley)

Each of the algorithms covered today was created or co-created by Dr. Breiman or Dr. Friedman.

Salford Stands Out Against Competitors
Salford solutions are distinguished in particular by their:

Ease of Use
Salford’s models don’t require coding

Accuracy of Prediction
Salford’s models stand the test of time and are used by some of the biggest
corporations in the world

Defensibility of Models
Salford’s models are defensible internally to executive stakeholders and
externally to regulators


Suite of Solutions – Data Science Toolkit
Time- and market-tested predictive modeling tools including
everything from market-leading decision tree and classification
engines to advanced interaction detection and automation to state-
of-the-art machine learning capabilities.

SPM Software Suite
• CART: Decision trees
• MARS: Nonlinear regression
• Random Forests: Data ensemble bagging
• TreeNet: Gradient boosting
• RuleLearner: Rule ensemble
• ISLE: Model compression
• GPS: Regularized regression

Why Do Classification Models Matter?
Classification methods are a simple, effective and accurate approach to solving an organization’s most difficult problems and uncovering new opportunities by narrowing down which factors have the most impact on your outcome.

Some of the most common applications include:
• Fraud Prevention
• Risk Reduction in Credit Scoring and Loan Default
• Optimizing Marketing Campaigns
• Improving Operations

INDUSTRIES
• MANUFACTURING: What machine signals are predictive of defects?
• FINANCIAL SERVICES: Does level of education impact credit risk?
• HEALTH CARE: Does body weight influence the risk of heart disease?

FUNCTIONAL AREAS
• SALES: What promotions are most effective?
• MARKETING: Does customer satisfaction influence loyalty?

Machine Learning Terminology
Response Variable = Dependent Variable = Target Variable
This is what we are trying to predict.
Examples: default vs. no default, air pressure, number of claims, etc.

Predictor Variables = Predictors = Factors
This is what we use to predict the response.
Example: I will use two predictors, level of education and work experience, to predict income, which is the target variable.

Algorithm = Method Used = Technique
This is the method that we will use to both predict the target variable and discover the relationships, if any, between the predictors and the target.
Examples: CART decision trees, gradient boosted trees, Random Forests, LASSO, Elastic Net, MARS, Support Vector Machines (SVMs), and Neural Networks.

Putting It All Together

Example 1
Target Variable: Defect
Predictor Variables: Signal 1, Signal 2, …, Signal 590
Algorithm: Logistic Regression
$$\text{Defect} = \text{Regression} = \beta_0 + \beta_1\,\text{Signal}_1 + \cdots + \beta_{590}\,\text{Signal}_{590}$$
(In logistic regression this linear combination is the log-odds of a defect.)

Example 2
Target Variable: Defect
Predictor Variables: Signal 1, Signal 2, …, Signal 590
Algorithm: CART decision tree
Defect = CART = [Figure: decision tree built from Signal 1, Signal 2, …, Signal 590]

Hands-on Practice

Manufacturing Defects: Let’s Get Started . . .
Live Demo: Open SPM

MANUFACTURING DATA SET
A manufacturing process involves myriad machines, and the information concerning the operation of the machines is recorded. There are 590 metrics recorded from the machines from the start of the process to the end and we’ll refer to these metrics as “signals.”

1. What signals, if any, are predictive of manufacturing defects?
2. If signals are predictive of defects, then how are these signals related to the likelihood of manufacturing defects?

CART and Random Forests

A Random Forest prediction is really just an average of CART tree predictions. When you build a Random
Forest model just keep this picture in the back of your mind:

[Figure: SPM engine comparison table. Columns: SPM Engine; Predictive Performance; Automatic Variable Selection; Automatic Missing Value/Outlier Handling; Automatic Interaction Detection; Automatic Modeling of Local Effects; Invariant to Monotone Transformations of Predictors; Interpretability]

Solving Problems with Machine Learning:
Machine Settings and Manufacturing Defects

A manufacturing process involves myriad machines, and the information concerning the operation of the
machines is recorded. There are 590 metrics recorded from the machines from the start of the process to
the end and we’ll refer to these metrics as “signals.”

We will try to answer two primary questions:

1. What signals, if any, are predictive of manufacturing defects?
2. If signals are predictive of defects, then how are these signals related to the likelihood of manufacturing
defects?

We will use an algorithm called gradient boosting to do this. TreeNet® software will be used. TreeNet is
unique in that its code was originally written by Jerome Friedman, the creator of gradient boosting.


Dataset Citations
Manufacturing Defect Dataset: Michael McCann and Adrian Johnston donated the dataset to the UCI Machine Learning Repository in 2008.

Link: https://archive.ics.uci.edu/ml/datasets/SECOM

CART
Let’s apply CART to the manufacturing defect dataset.

Applying CART
1. Build the model in SPM
2. Understand CART Relative Cost
3. Find the most interesting rules that are predictive of manufacturing defects using Hotspot Detection
4. Using the model: Generating manufacturing defect predictions and deploying CART outside of SPM

[Figure: CART tree for the manufacturing defect dataset, with SIGNAL_294 at the root and further splits on SIGNAL_293, SIGNAL_359, SIGNAL_60, SIGNAL_246, SIGNAL_158, SIGNAL_21, SIGNAL_247, SIGNAL_111, SIGNAL_66, SIGNAL_311, SIGNAL_112, and SIGNAL_549]

CART Review
CART is a decision tree algorithm that divides the data so that the dependent variable can be predicted more accurately.

CART automatically:
1. Selects variables
2. Models nonlinear relationships
3. Models local effects
4. Models interactions
5. Handles missing values

CART: Relative Cost

$$\text{Relative Cost} = \frac{\text{Overall Misclassification Cost Using a CART tree}}{\text{Overall Misclassification Cost Using the No Data Optimal Rule}}$$

The No Data Optimal Rule classifies every observation as one class. More specifically, the class chosen for the No Data Optimal Rule is the class that has the lowest cost compared to the other(s).

Example tree (Relative Cost = .44):
• Node 1 (root): Class = Circle; Circle 16 (64.0%), Triangle 9 (36.0%); W = 25.00, N = 25; splits on X2 <= -0.49
• Terminal Node 1 (X2 <= -0.49): Class = Circle; Circle 6 (100.0%), Triangle 0 (0.0%); W = 6.00, N = 6
• Node 2 (X2 > -0.49): Class = Circle; Circle 10 (52.6%), Triangle 9 (47.4%); W = 19.00, N = 19; splits on X1 <= 0.23
• Terminal Node 2 (X2 > -0.49 and X1 <= 0.23): Class = Triangle; Circle 1 (14.3%), Triangle 6 (85.7%); W = 7.00, N = 7
• Terminal Node 3 (X2 > -0.49 and X1 > 0.23): Class = Circle; Circle 9 (75.0%), Triangle 3 (25.0%); W = 12.00, N = 12

Good: If the relative cost is closer to zero (closer is better) then CART is better than the No Data Optimal Rule.

Bad: If the relative cost is equal to 1 then the CART error is the same as the No Data Optimal Rule, which means that CART is no better than just predicting every observation as the same class.

The relative cost can be greater than 1, which is especially bad and, more generally, values around 1 should be considered “bad”.
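The relative cost of .44 for the small Circle/Triangle tree can be checked by hand. The sketch below is an illustration only, not SPM's computation: it assumes unit misclassification costs and class priors proportional to the data, so that "cost" is simply the fraction of records misclassified. The function name `misclassification_cost` is made up for this example.

```python
def misclassification_cost(nodes):
    """Fraction of records misclassified when each node predicts its majority class.

    `nodes` is a list of dicts mapping class label -> record count.
    Assumption: unit misclassification costs, priors proportional to the data.
    """
    total = sum(sum(counts.values()) for counts in nodes)
    errors = sum(sum(counts.values()) - max(counts.values()) for counts in nodes)
    return errors / total


# No Data Optimal Rule: one node holding all 25 records, always predicts Circle.
root = [{"Circle": 16, "Triangle": 9}]

# The three terminal nodes of the example tree.
terminal_nodes = [
    {"Circle": 6, "Triangle": 0},   # Terminal Node 1 predicts Circle
    {"Circle": 1, "Triangle": 6},   # Terminal Node 2 predicts Triangle
    {"Circle": 9, "Triangle": 3},   # Terminal Node 3 predicts Circle
]

no_data_cost = misclassification_cost(root)        # 9 errors out of 25
tree_cost = misclassification_cost(terminal_nodes)  # 4 errors out of 25
relative_cost = tree_cost / no_data_cost

print(round(relative_cost, 2))  # 0.44
```

Under these assumptions the tree makes 4 errors where the No Data Optimal Rule makes 9, which gives the ratio shown on the slide. With other priors or unequal costs the number would differ.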
CART Confusion Matrix
Use the Confusion Matrix to assess CART and the types of correct or incorrect predictions that it makes.

• CART correctly predicted “No Defect” 935 times
• CART correctly predicted “Defect” 57 times
• CART incorrectly predicted “Defect” when there was actually no defect 528 times (we call this a false positive)
• CART incorrectly predicted “No Defect” when there actually was a defect 47 times (we call this a false negative)
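The four counts can be turned into the usual summary rates. This is a minimal sketch, taking the counts as 935 true negatives, 57 true positives, 528 false positives, and 47 false negatives, with “Defect” treated as the positive class. The metric names are standard, but they are not taken from the SPM output.

```python
# Counts from the confusion matrix, with "Defect" as the positive class.
tn, tp, fp, fn = 935, 57, 528, 47

total = tn + tp + fp + fn        # all records
accuracy = (tn + tp) / total     # share of all predictions that were correct
sensitivity = tp / (tp + fn)     # share of actual defects that were caught
specificity = tn / (tn + fp)     # share of actual non-defects predicted correctly
precision = tp / (tp + fp)       # share of "Defect" predictions that were right

print(f"records:     {total}")
print(f"accuracy:    {accuracy:.3f}")
print(f"sensitivity: {sensitivity:.3f}")
print(f"specificity: {specificity:.3f}")
print(f"precision:   {precision:.3f}")
```

Looking at sensitivity and specificity separately matters here because defects are rare: a model can reach high accuracy by predicting “No Defect” almost everywhere.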

CART: Variable Selection & Importance
There were 590 variables
available to be selected by CART.

13 variables appear in the tree.

79 variables are used in the model (i.e. 13 variables used in the tree and 66 used to handle missing values via surrogate splits).

CART: Hotspot Detection
Recall: a CART tree can be thought of as a collection of rules.

Each rule defines a path to a terminal node.

For large CART trees, is there an easy way to find the “most interesting” rules? Yes, use Hotspot Detection.

[Figure: CART tree for the manufacturing defect dataset, with SIGNAL_294 at the root]

CART: Hotspot Detection
Hotspot Detection computes summary information about each terminal node (every rule leads to a terminal node) and displays the information conveniently to the user.

Use this information to easily and efficiently find the most important rules in your CART tree.

CART: Using Hotspot Detection

Here terminal node 5 has the largest class count and a lift value of around 2.5. This means that a “Defect” is about 2.5 times as likely in this node as in the overall population.

What rule leads to terminal node 5?

CART Hotspot Interpretation
If Signal 294 <= 368.82 and Signal 293 > .006 and Signal 60 > 1.51 and Signal 246 <= 1.42 and Signal 247 > 2.98 then we predict “Defect”.

If the machine signals satisfy this rule then the probability of a defect is 2.5 times larger than the overall probability of a defect.

CART: Hotspot Detection
Focus Class: the class (i.e. “Defect” or “No Defect”) that you want to generate the hotspot report for. I set the focus class to be “Defect.”

$$\text{Lift} = \frac{\text{Probability of Focus Class in the terminal node}}{\text{Probability of Focus Class Overall}}$$

Node Class Count = number of records in the sample that fall into the node

• If Lift = 1, then the probability of a “Defect” is the same as it is in the overall sample.
• If Lift = 2, then the probability of a “Defect” is twice as high in the terminal node as it is in the overall sample.
• If Lift = .5, then the probability of a “Defect” is half as high in the terminal node as it is in the overall sample.
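The lift formula is a ratio of two proportions, which the following sketch makes explicit. The overall counts (104 defects among 1,567 records) are derived from the confusion matrix shown earlier (57 + 47 defects out of 935 + 57 + 528 + 47 records). The terminal-node counts (30 defects among 180 records) are hypothetical and chosen only to land near the lift of 2.5 discussed above; they are not read from the SPM hotspot report.

```python
def lift(focus_in_node, node_size, focus_overall, sample_size):
    """Lift = P(focus class | terminal node) / P(focus class overall)."""
    p_node = focus_in_node / node_size
    p_overall = focus_overall / sample_size
    return p_node / p_overall


# Overall: 104 defects in 1,567 records (from the confusion matrix counts).
# Node: 30 defects in 180 records (hypothetical, for illustration only).
node_lift = lift(focus_in_node=30, node_size=180, focus_overall=104, sample_size=1567)
print(round(node_lift, 2))

# A node with the same defect rate as the whole sample has a lift of exactly 1.
print(lift(104, 1567, 104, 1567))
```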

Deploying CART
If you want to use CART to generate predictions, you have two primary options:

1. Generate Predictions inside of SPM
2. Translate CART into a programming language and deploy it in your environment

Generating Predictions Inside of SPM
Let’s suppose that you have a
set of machine signal values
(i.e. you know the values for
Signal 1 – Signal 590) and you
want to predict if there will be
a product defect (i.e. you don’t
know the “STATUS” value)


Deploying CART via Code Translations
A CART model is fundamentally a
collection of rules where each rule is an
if-then statement (also else-if statements
etc.). We can then take these if-then
statements and translate them into
different programming languages. In
SPM we can translate into 4 languages: C,
PMML, Java, and SAS.

***Use the code to generate CART predictions in other applications/programs or to make predictions in real-time.
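To make the "collection of if-then rules" idea concrete, here is the hotspot rule for terminal node 5, quoted earlier in this deck, written by hand as a Python function. This is only a sketch: Python is not one of the four export targets listed above, the function covers a single rule rather than the whole tree, and it is not the code SPM generates. The function name, the dictionary input format, and the example signal values are assumptions.

```python
def satisfies_hotspot_rule(signals):
    """Return True if the signal values satisfy the rule leading to terminal node 5.

    `signals` maps signal names such as "SIGNAL_294" to numeric values.
    Thresholds are copied from the rule quoted in the deck.
    """
    return (
        signals["SIGNAL_294"] <= 368.82
        and signals["SIGNAL_293"] > 0.006
        and signals["SIGNAL_60"] > 1.51
        and signals["SIGNAL_246"] <= 1.42
        and signals["SIGNAL_247"] > 2.98
    )


# Hypothetical machine readings, for illustration only.
reading = {
    "SIGNAL_294": 300.0,
    "SIGNAL_293": 0.010,
    "SIGNAL_60": 2.00,
    "SIGNAL_246": 1.00,
    "SIGNAL_247": 3.50,
}
print("Defect" if satisfies_hotspot_rule(reading) else "rule not satisfied")
```

A full translation would contain one such branch per terminal node, plus handling for missing values, which this sketch leaves out.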

CART: Finding Rules
CART automatically gave us a set of interpretable rules that are
predictive of manufacturing defects. Now we will need to determine
what the signals actually measure and determine if we can control
the inputs that drive the settings.


CART: Generating Predictions
1. Use CART to predict if there will or will not be a product defect
inside of SPM.

2. Translate CART into C (or Java, PMML, or SAS) and deploy your
CART model in your environment in order to make predictions
in real-time.


TreeNet Gradient Boosting
Let’s apply the gradient boosting algorithm using TreeNet® software

Applying TreeNet
1. Understanding the model: Partial Dependency Plots

2. Choosing the number of trees (set the maximum number of trees such that the
error no longer meaningfully declines; SPM will choose the optimal number for
you)

3. Choosing the number of nodes with Automate NODES

4. Discover important interactions with interaction reporting

5. Making predictions and deploying the model.


Gradient Boosting Review
Idea: fit a CART tree to the error from the previous model and use this new prediction to update the model.

Gradient Boosting: Why it works
How does TreeNet model this curve? It makes small improvements (i.e. the learning rate is a small number that “shrinks” the model updates). The small improvements, taken together, produce an accurate model.

[Figure: TreeNet fit to a noisy curve after Tree 1, Tree 10, Tree 50, Tree 100, Tree 150, Tree 200, Tree 400, and Tree 600. Note: Noise ~ N(0,1)]
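The "many small improvements" idea can be sketched in plain Python. This is not TreeNet: single-split trees (stumps) stand in for the small CART trees, the loss is squared error, and the noisy sine curve, the noise level, and the learning rate of 0.1 are all assumptions made for illustration.

```python
import math
import random


def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error of the residuals."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees, learning_rate):
    """Start from the mean, then repeatedly fit a stump to the current errors."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]   # error of the current model
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrink each update by the learning rate before adding it to the model.
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(0)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]   # hypothetical noisy curve

errors = {n: mse(boost(xs, ys, n, 0.1), xs, ys) for n in (1, 10, 50)}
for n, err in errors.items():
    print(f"{n:>3} trees: training MSE = {err:.4f}")
```

Training error keeps falling as trees are added, which is why the number of trees is chosen on test or validation error rather than training error.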

Manufacturing Defects: Most Important Signals
TreeNet, like CART, automatically selects the most important variables (i.e. the signals).

Steps
1. Import the dataset
2. Select “TreeNet Gradient Boosting Machine”
3. Set variables
4. Click “Start”
5. View variable importance measures

Of the 590 signals, TreeNet automatically identifies 299 of them as useful (you can actually run a series of variable “shaving” experiments to see if you can reduce the number of variables used even more).

Manufacturing Defects: How are Most Important Signals Related to the Likelihood of Product Defects?

The plots on the right are generated automatically from a TreeNet model, so you only have to click two buttons to see the plots.

The plots are ordered in terms of the variable importance (most important first).

Manufacturing Defects: Most Important Signal: Signal 60
This plot tells us that, after accounting for the other 299 variables in the model, the likelihood of a product defect increases once Signal 60 has values beyond 3.25. Once Signal 60 reaches about 13.3, the likelihood of a defect remains constant.

[Figure: partial dependency plot for Signal 60, with Signal_60=3.25 and Signal_60=13.3 marked]

TreeNet automatically discovered this relationship. Now we have a few questions to answer:
• What does Signal 60 actually measure?
• What machine settings have an effect on Signal 60? To what extent, if any, can we control these settings?
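A plot of this kind "accounts for the other variables" by averaging over them. The sketch below shows the general partial dependence calculation, not SPM's implementation: for each candidate value of one signal, that value is forced onto every record and the model's predictions are averaged. The `toy_model` function and its coefficients are made up to mimic the shape described above (flat below 3.25, rising, flat beyond 13.3), and the two data rows are hypothetical.

```python
def partial_dependence(model, rows, feature, grid):
    """Average model prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)          # copy so the original record is untouched
            modified[feature] = value     # overwrite only the feature of interest
            total += model(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model; returns a defect probability."""
    s60 = min(max(row["SIGNAL_60"], 3.25), 13.3)   # effect is flat outside [3.25, 13.3]
    return 0.05 + 0.02 * (s60 - 3.25) + 0.01 * (row["SIGNAL_334"] > 30)


rows = [
    {"SIGNAL_60": 2.0, "SIGNAL_334": 10.0},
    {"SIGNAL_60": 8.0, "SIGNAL_334": 50.0},
]
grid = [1.0, 3.25, 8.0, 13.3, 20.0]

curve = partial_dependence(toy_model, rows, "SIGNAL_60", grid)
for value, avg_prediction in curve:
    print(f"SIGNAL_60 = {value:>5}: average prediction = {avg_prediction:.3f}")
```

Because every record keeps its own values for the other signals, the resulting curve reflects the effect of the chosen signal averaged over the rest of the data.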

Manufacturing Defects: Most Important Two-Way Interaction: Signal 60 and Signal 334

The most important two-way interaction in the model is between Signal 60 and Signal 334.

The red and orange areas in the plot on the right mean that the likelihood of a defect is higher. When Signal 60 is between about 15 and 150 and Signal 334 is between 30 and 100, then the likelihood of a defect is higher.

[Figure: two-way interaction plot for Signal 60 and Signal 334; annotated region: “Defect is more likely”]

Follow-up questions for identifying the machine settings that affect the signals:
• What do the two signals measure?
• What machine settings, if any, have an effect on Signal 60 and Signal 334?

Interaction Statistics: Global Score
Use the Global Score to find the most important two-way interactions in the model. The
Global Score for a pair of variables tells you the percentage of the total variation in the
predicted response that is accounted for by the two-way interaction between two variables. A
value of 5.66 means that 5.66% of the variation in the predicted response is accounted for by
the interaction between Signal 60 and Signal 334.

$$\text{Global Score} = 100 \times \frac{\text{Variation in the Predicted Response Due to the Two-Way Interaction}}{\text{Total Variation in the Predicted Response}}$$

Using the Interaction Statistics: Next Webinar
One way to leverage the interaction statistics is to allow only interactions between the pairs of variables deemed to be “important” by the TreeNet interaction statistics and disallow interactions among the unimportant variables. If we do this and the model error does not change meaningfully then we can be more confident that the interaction is real (i.e. not noise!). We will talk more about this in Webinar 5.

Manufacturing Defects: Solving the Problem: Predicting Future Observations & Running Simulations

Engineers can predict the likelihood of a defect based on the signal values:
1. Take data (i.e. hypothetical signal values or estimated signal values given the machine settings) and substitute the values into the TreeNet model
2. TreeNet will generate the probability of a defect based on the signal values supplied.

[Figure: Proposed Machine Settings → Hypothetical (or estimated) Signal Values → Predicted probability of “Defect” and the predicted class: “Defect” or “No Defect.”]

***If we can predict signal values based on the machine settings, then we could predict the probability of a defect based on chosen machine settings***

Generating Predictions in SPM
We can generate predictions
inside of SPM just like CART
(the same is true for Random
Forests, MARS, etc.)

Click the “Score” button

Deploying TreeNet via Code Translations
A TreeNet model is fundamentally a
collection of rules where each rule is an
if-then statement (also else-if statements
etc.). We can then take these if-then
statements and translate them into
different programming languages. In
SPM we can translate into 4 languages: C,
PMML, Java, and SAS.

***Use the code to generate TreeNet predictions in other applications/programs or to make predictions in real-time.

Manufacturing Defects: Solving the Problem: Understanding the relationship of signals and the likelihood of defects

Use TreeNet gradient boosting to
1. View signals that are useful in predicting defects (or, conversely, non-defects; signals that are not important are either rarely used in the model or not used at all)
2. Visually understand the relationship between the likelihood of a defect and a signal
3. Visually understand the nature of the interactions that are important in the model.

Optimizing Models with SPM Automates
One way to choose the optimal value for a model parameter in TreeNet is to run an experiment: build multiple TreeNet models with identical settings, except that you change the value of one parameter each time.

Model experimentation and optimization routines are pre-packaged for you in SPM, so you
never have to write even a single line of code. We want you to spend time on solving problems,
not troubleshooting while loops and function calls!

We will discuss this more in the second webinar, but we will provide one example.

Automate NODES
The number of terminal nodes in each tree in the TreeNet model controls the extent to which the model can capture interactions.

Use Automate NODES to easily find the optimal number of terminal nodes in each tree. Here the optimal number of terminal nodes is 6 (this is actually the default value).

Random Forests: Review
Idea: fit CART trees to
independent bootstrap samples
and combine the predictions
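The idea can be sketched in a few lines of plain Python. This is a toy illustration, not the Random Forests engine in SPM: each "tree" is a single split, the random predictor subset has size one, and the data (`x1` informative, `x2` pure noise) are made up.

```python
import random


def fit_stump(rows, labels, feature):
    """Best single split on `feature`; each side predicts its share of class 1."""
    best = None
    values = sorted(set(row[feature] for row in rows))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # n * p * (1 - p) is proportional to the Gini impurity of each side.
        impurity = len(left) * p_left * (1 - p_left) + len(right) * p_right * (1 - p_right)
        if best is None or impurity < best[0]:
            best = (impurity, threshold, p_left, p_right)
    if best is None:  # the sample had a single distinct value for this feature
        p = sum(labels) / len(labels)
        return lambda row: p
    _, threshold, p_left, p_right = best
    return lambda row: p_left if row[feature] <= threshold else p_right


def fit_forest(rows, labels, n_trees, rng):
    """Fit each tree to its own bootstrap sample and randomly chosen predictor."""
    features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        feature = rng.choice(features)                   # random predictor subset of size 1
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_proba(forest, row):
    """Combine the trees by averaging their predictions."""
    return sum(tree(row) for tree in forest) / len(forest)


rng = random.Random(0)
rows = [{"x1": rng.random(), "x2": rng.random()} for _ in range(200)]
labels = [1 if row["x1"] > 0.5 else 0 for row in rows]   # hypothetical target

forest = fit_forest(rows, labels, 50, rng)
p_hi = predict_proba(forest, {"x1": 0.9, "x2": 0.5})
p_lo = predict_proba(forest, {"x1": 0.1, "x2": 0.5})
print(p_hi, p_lo)
```

Trees built on the noise variable `x2` give both test points the same prediction, so the gap between the two averaged probabilities comes entirely from the trees that happened to draw `x1`.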

Random Forest Output
For smaller datasets (i.e. <10,000 records) we can compute a variety
of useful metrics including outlier statistics.

Optimizing Random Forests: Automate
RFNPREDS
Use Automate RFNPREDS to conveniently find the optimal value for the random variable subset size.

Here the optimal size is

$$\sqrt{\text{number of predictors}} \times 2 = \sqrt{590} \times 2 \approx 49$$

Other Machine Learning Applications

Manufacturing: Value Creation through Machine Learning Application
INDUSTRIES: MANUFACTURING

Organizations Gain Efficiencies Through Smarter Lean Adoption
Identifying challenges and the benefits of LEAN implementation in small to medium sized companies using CART.

Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical study
Proceedings of the 2011 International Conference on Materials and Products Manufacturing Technology: https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf

Financial Services: Value Creation through Machine Learning Application
INDUSTRIES: FINANCIAL SERVICES

Improving Credit Scoring in Highly-Competitive Environment
Accurate credit scoring using CART and TreeNet is critical for financial services and is increasingly competitive. Less risk is assumed as future instances of loan default are predicted.

Mining the customer credit using classification and regression tree and multivariate adaptive regression splines
Computational Statistics & Data Analysis: http://www.sciencedirect.com/science/article/pii/S016794730400355X

Healthcare: Value Creation through Machine Learning Application
INDUSTRIES: HEALTHCARE

Predicting Lung Cancer for High Risk Patients
Medical researchers were looking to improve lung cancer detection through blood testing. CART analysis was leveraged to predict which patients had cancer given the serum biomarkers.

Panel of Serum Biomarkers for the Diagnosis of Lung Cancer
Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392

Continue To Use Machine Learning On Your Own
Practice, Practice, Practice

• We’ll provide you a link to the dataset used today in a follow up email
• Download a trial version of SPM: https://info.salford-systems.com/spm-8-download
• Schedule a demo and we’ll walk you through the example shown today

Feeling Stuck? We Can Help!

• Check out our other training materials online: https://www.salford-systems.com/resources/training-videos
• If you need help getting started, give us a shout: [email protected]

Ready For More? Join Our Next Webinar
Tuesday October 31, 2017 @ 10 am (PDT):
Real-world demonstration for the advanced modeler
Register: http://info.salford-systems.com/datascience101webinarseries

In this webinar I am going to explain in detail how to leverage powerful Machine Learning algorithms using SPM software.

Appendix

CART® Software Applications

Predicting Return to Work with Data Mining
Society of Actuaries: https://www.soa.org/files/research/projects/data-mining.pdf

Implementation of lean manufacturing in Saudi manufacturing organizations: an empirical study
Proceedings of the 2011 International Conference on Materials and Products Manufacturing Technology: https://eprints.qut.edu.au/46594/1/2011011893_Karim_ePrints.pdf

Assessing the prediction of employee productivity: a comparison of OLS vs. CART
International Journal of Productivity and Quality Management: http://www.inderscienceonline.com/doi/abs/10.1504/IJPQM.2011.042511

Mining the customer credit using classification and regression tree and multivariate adaptive regression splines
Computational Statistics & Data Analysis: http://www.sciencedirect.com/science/article/pii/S016794730400355X

Panel of Serum Biomarkers for the Diagnosis of Lung Cancer
Journal of Clinical Oncology: http://ascopubs.org/doi/full/10.1200/JCO.2007.13.5392

Automated urban land-use classification with remote sensing
International Journal of Remote Sensing: http://www.tandfonline.com/doi/abs/10.1080/01431161.2012.714510

Random Forest® Software Applications

Mapping Oil and Gas Development Potential in the US Intermountain West and Estimating Impacts to Species
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007400

Random Forests applied as a soil spatial predictive model in arid Utah
Digital Soil Mapping: http://link.springer.com/content/pdf/10.1007/978-90-481-8863-5.pdf#page=188

Factors Associated With Increased Reading Frequency in Children Exposed to Reach Out and Read
Academic Pediatrics: http://www.sciencedirect.com/science/article/pii/S1876285915002752
This paper used Random Forests® software to pick the factors.

Using Random Forests to Provide Predicted Species Distribution Maps as a Metric for Ecological Inventory & Monitoring Programs
Applications of Computational Intelligence in Biology: https://link.springer.com/chapter/10.1007/978-3-540-78534-7_9

Random Forest for Gene Expression Based Cancer Classification: Overlooked Issues
Iberian Conference on Pattern Recognition and Image Analysis: https://link.springer.com/chapter/10.1007/978-3-540-72849-8_61

02 Predictive Maintenance Workshop - PIDI 4.0
No ratings yet
02 Predictive Maintenance Workshop - PIDI 4.0
39 pages
BSC Aeronautical
No ratings yet
BSC Aeronautical
144 pages
Lec 2
No ratings yet
Lec 2
13 pages
Introduction To Machine Learning
No ratings yet
Introduction To Machine Learning
8 pages
Lecture Notes 4
No ratings yet
Lecture Notes 4
6 pages
Rough Volatility 2023 Part 1 Handout
No ratings yet
Rough Volatility 2023 Part 1 Handout
43 pages
From Field Problems To Machine Learning
No ratings yet
From Field Problems To Machine Learning
51 pages
Glycol Dehydrator Design Manual
No ratings yet
Glycol Dehydrator Design Manual
36 pages
Preparing Data For Machine Learning - Pluralsight PDF
No ratings yet
Preparing Data For Machine Learning - Pluralsight PDF
74 pages
ml-4
No ratings yet
ml-4
22 pages
What Is Trip Circuit Supervision (TCS) Protection
No ratings yet
What Is Trip Circuit Supervision (TCS) Protection
7 pages
Mini Project Report
No ratings yet
Mini Project Report
21 pages
3-1 Derivatives of Elementary Weaves
No ratings yet
3-1 Derivatives of Elementary Weaves
20 pages
Intro To Stat (STAT 111) by Ewens
No ratings yet
Intro To Stat (STAT 111) by Ewens
113 pages
Artificial Intelligence and Machine Learning
No ratings yet
Artificial Intelligence and Machine Learning
12 pages
Algorithms For Predictive Maintenance Efficiently Developed With Matlab
No ratings yet
Algorithms For Predictive Maintenance Efficiently Developed With Matlab
22 pages
Assembly Procedure 24M
No ratings yet
Assembly Procedure 24M
21 pages
Introduction To Machine Learning
No ratings yet
Introduction To Machine Learning
24 pages
An Introduction To Machine Learning and Its Applications
No ratings yet
An Introduction To Machine Learning and Its Applications
8 pages
Machine Learning Section1 Ebook
No ratings yet
Machine Learning Section1 Ebook
12 pages
Restaurant
No ratings yet
Restaurant
24 pages
Oracle Data Encryption
No ratings yet
Oracle Data Encryption
40 pages
Prediction of Mental Health (Depression) Using Data Science Technique
No ratings yet
Prediction of Mental Health (Depression) Using Data Science Technique
6 pages
Kulfoldi Kutatasi Jelentesek Gyujtemenye
No ratings yet
Kulfoldi Kutatasi Jelentesek Gyujtemenye
92 pages
AI Strategy Flow Chart Share by WorldLine Technology
No ratings yet
AI Strategy Flow Chart Share by WorldLine Technology
1 page
Machine Learning P
No ratings yet
Machine Learning P
9 pages
Predictive Maintenance With Matlab and Simulink
No ratings yet
Predictive Maintenance With Matlab and Simulink
24 pages
Toyota 4Y Motor Spec - Motorpower
No ratings yet
Toyota 4Y Motor Spec - Motorpower
1 page
MAE 3181 Materials and Structures Laboratory
No ratings yet
MAE 3181 Materials and Structures Laboratory
22 pages
Problems Chapter 1 Sec B
No ratings yet
Problems Chapter 1 Sec B
7 pages
Mantenimiento Predictivo MATLAB
No ratings yet
Mantenimiento Predictivo MATLAB
11 pages
C11.4.QA1.Chemical Bonding.R
No ratings yet
C11.4.QA1.Chemical Bonding.R
9 pages
Feature Labs - ML 2.0
No ratings yet
Feature Labs - ML 2.0
13 pages
Machine Learning Canvas (v1.1)
No ratings yet
Machine Learning Canvas (v1.1)
2 pages
Machine Learning Ai Manufacturing PDF
No ratings yet
Machine Learning Ai Manufacturing PDF
6 pages
An Extension of The Finite Hankel Transforms
No ratings yet
An Extension of The Finite Hankel Transforms
21 pages
An Empirical Assessment of Empirical Corporate Finance
No ratings yet
An Empirical Assessment of Empirical Corporate Finance
40 pages
Enhanced Performance of Air-Cooled Chillers Using Evaporative Cooling PDF
No ratings yet
Enhanced Performance of Air-Cooled Chillers Using Evaporative Cooling PDF
5 pages
Bcf42ht Maruyama
No ratings yet
Bcf42ht Maruyama
16 pages
Canopus
No ratings yet
Canopus
6 pages