3 Pred Analysis

Uploaded by

namisha211
Predictive Analytics aka Machine Learning
Dr Srinivas Padmanabhuni

[email protected]
Predictive Analytics / ML Types
Machine learning
Inductive – from raw data to general patterns/rules/models/knowledge/program

No explicit programming required

No control over the program, model, knowledge, or rule generated

Data In Program Out

Debugging is by changing data

More data better model/program – Improves with experience

Deep learning typically needs large datasets (thousands of records or more)

Model/pattern could be of several types – equation, tree, graph, brain-like structure
Types of Machine Learning
Supervised
• A human builds a model based on labelled input and output data
• That model is trained with a training set of data
• That model is tested with a test set of data
• Deployment if the output is satisfactory
• Techniques: Classification, Regression

Unsupervised
• No output variable
• Identification of patterns in unlabelled input data
• Patterns could be clusters, associations, etc.
• Techniques: Clustering, Association Rule Mining, Feature Extraction, Anomaly Detection, Collaborative Filtering, PCA

Reinforcement
• Notion of a set of states from start to finish
• At each state a set of rewards and punishments
• Goal is to improve performance of reaching the goal by changing values
Reinforcement Learning

• Emotion of self-learning
• Technically, Reinforcement Learning need not have past data
• Notion of a set of states from start to finish
• At each state a set of rewards and punishments
• Goal is to improve performance of reaching the goal by changing values
• Example applications include games, dynamic path planning for robots, etc.
• Longitudinal analysis
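
The ideas of states, rewards, and "changing values" can be illustrated with a minimal tabular Q-learning sketch. Everything concrete here is an assumption made for illustration and does not come from the slides: the five-state corridor, the reward of 1 at the goal, and the values of ALPHA, GAMMA and EPSILON.

```python
import random

N_STATES = 5          # states 0..4 laid out in a line; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative values, not from the slides


def step(state, action):
    """Move within the corridor; reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Learn action values by trial and error; no past data is needed."""
    rng = random.Random(seed)
    # q[state][action_index] is the learned value of taking that action there
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)                       # explore, or break a tie
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state, reward = step(state, ACTIONS[a])
            # Nudge the value toward reward + discounted best future value
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q_table = train()
# Greedy policy for every non-goal state
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent starts with all values at zero and improves them only from the rewards it experiences, which is the "learning by doing" idea in its smallest form.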
Reinforcement Learning

Learning by doing. See:

Google DeepMind's Deep Q-learning playing Atari Breakout! (youtube.com)
Supervised Learning

Training data is usually 80 to 90 percent of the available data; the remainder is held out as the test set
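
A minimal sketch of such a split, assuming the records fit in a list and that a random shuffle is appropriate (it would not be for time-ordered data). The function name, the 0.8 default, and the seed are illustrative choices.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and split it into training and test sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))   # stand-in for 100 labelled examples
train_set, test_set = train_test_split(records, train_fraction=0.8)
print(len(train_set), len(test_set))
```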


Supervised vs Unsupervised

Supervised: output known and can be labeled (e.g. right or wrong)
• Autonomous Cars
• Customer Lifetime Value
• Marketing campaigns
• Default or not

Unsupervised: detect patterns or groups
• Upselling
• Recommendation engine
• Security threats
• Business Analytics
Supervised Regression
In regression, the target variable is continuous.

For a new input, you have to predict the value of the output, e.g. a risk rating score or a default rating score.

Example: Derive a model for fuel consumption based on vehicle model, time of day, age of vehicle, etc.

Example: Predict the potential future price of a stock from historical data, using features such as the stock, quantity, news related to the stock, etc.
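
As a rough illustration of the fuel consumption example, the sketch below fits a straight line with ordinary least squares using a single feature, age of vehicle. The data values are made up for illustration, and a real model would use more features than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: vehicle age in years vs fuel consumption in litres per 100 km
ages = [1, 2, 3, 4, 5]
consumption = [6.0, 6.5, 7.0, 7.5, 8.0]

slope, intercept = fit_line(ages, consumption)
predicted = slope * 6 + intercept   # continuous prediction for a 6-year-old vehicle
print(slope, intercept, predicted)
```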
Supervised Classification

In classification, the target variable is categorical.

For a new input, you have to place it in the category it fits best.

Example: Classify customers into Potential Default and Good, or into high default risk (0), lower default risk (1), and negligible default risk (2)
Simplest: Binary classification
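
A binary classifier for the Potential Default / Good example could be sketched with k-nearest neighbours, one of many possible techniques and not one the slides name. The two features (debt-to-income ratio and credit utilisation) and all data values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, point, k=3):
    """Label a new point by majority vote among its k closest training points."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled customers: (debt-to-income ratio, credit utilisation)
training = [
    ((0.9, 0.8), "Potential Default"),
    ((0.8, 0.9), "Potential Default"),
    ((0.7, 0.7), "Potential Default"),
    ((0.2, 0.1), "Good"),
    ((0.1, 0.3), "Good"),
    ((0.3, 0.2), "Good"),
]

risky = knn_predict(training, (0.85, 0.75))
safe = knn_predict(training, (0.15, 0.20))
print(risky, safe)
```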
Unsupervised Clustering
No output class

Data points are not labeled

Group the items which are close to each other

Identify structure or pattern in data

Example: Identify customers with similar buying habits

Techniques: K-means, hierarchical dendrogram clustering

How to optimise clusters: elbow method, dendrogram slicing, hierarchical K-means

Challenge: define distance
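
A minimal K-means sketch, assuming Euclidean distance as the answer to the "define distance" challenge and a fixed number of iterations instead of a convergence check. The points are made up, and the evenly spaced choice of starting centroids is a simplification for repeatability, not standard practice.

```python
import math


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving centroids."""
    # Deterministic start: take k points spread evenly through the list
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers: (visits per month, average basket size)
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = k_means(customers, k=2)
print(labels)
```

Swapping `math.dist` for another distance function changes which items count as "close", which is why the choice of distance matters so much.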


Unsupervised Clustering – Credit Scores vs ML-based scores

Human-crafted rules
Anomaly Detection
Often some data points do not follow the usual behaviour of the rest of the data

Identifying such anomalies/outliers is a key ML task

Example: Fraud detection
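
One simple way to flag such points is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. This is only one of many anomaly detection techniques and is not named in the slides. The transaction amounts and the threshold of 2.0 are assumptions for illustration.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)   # population standard deviation
    if spread == 0:
        return []                        # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / spread > threshold
    ]


# Hypothetical card transaction amounts; the last one is unusually large
amounts = [20, 22, 19, 21, 23, 20, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```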


Unsupervised Association Rule Mining
No output class
Items bought together most frequently
Identify items occurring together in market baskets or shopping carts
Example: Identify items to cross-sell with market basket analysis, co-occurring payment types, fraud detection in credit cards, next best offer

Example of co-occurring payment types for cross-sell:

• If SII payment order, then VAT payment order (confidence = 81)
• If VAT payment order, then SII payment order (confidence = 51)

When you buy something on Amazon, this is the kind of suggestion it makes
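
The confidence of a rule "if A then B" is the share of baskets containing A that also contain B, which is why the two directions of the SII/VAT rule above give different values. A minimal sketch follows; the baskets are made up and do not reproduce the figures on the slide.

```python
def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    with_both = [b for b in with_antecedent if consequent in b]
    return len(with_both) / len(with_antecedent)


# Hypothetical payment orders grouped per customer
baskets = [
    {"SII", "VAT"},
    {"SII", "VAT"},
    {"SII"},
    {"VAT"},
    {"VAT"},
    {"VAT", "SII"},
]

sii_to_vat = confidence(baskets, "SII", "VAT")   # 3 of the 4 SII baskets also hold VAT
vat_to_sii = confidence(baskets, "VAT", "SII")   # 3 of the 5 VAT baskets also hold SII
print(sii_to_vat, vat_to_sii)
```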
METRICS
Regression

Regression Model Evaluation Techniques

• Diverse notions of error are used in regression.
• Some common notions are as below

MSE (Mean Squared Error) is the mean of the squared differences between actual and predicted values

RMSE is the square root of MSE

MAE (Mean Absolute Error) is the mean of the absolute differences between actual and predicted values

Both MSE and MAE depend on the unit of measurement

Avoiding dependence on unit

R-squared, often called the coefficient of determination, is defined as the ratio of the sum of squares explained by a regression model to the "total" sum of squares around the mean

R2 = 1 - SSE / SST

SSE = sum of squared errors (differences between actual and predicted values)

SST = sum of squared differences from the mean

Unit independent
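
The four metrics can be computed in a few lines. The sketch below uses the R2 = 1 - SSE / SST form given above; the actual and predicted values are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e ** 2 for e in errors)                 # sum of squared errors
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual) # total sum of squares
    mse = sse / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - sse / sst,   # undefined when all actual values are equal
    }


# Hypothetical actual and predicted values
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```

Rescaling both lists (say, from rupees to thousands of rupees) changes MSE, RMSE and MAE but leaves R2 unchanged.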
Classification Model Evaluation Techniques

• Diverse notions of error are used in classification.
• The usual method is to hold out 10 to 20 percent of the data as a test set
• A useful metric is ACCURACY

Accuracy = (Number of correct classifications) / (Total number of test cases)
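
As a small worked example of the accuracy formula, assuming the true and predicted labels for the test set are held in two lists of equal length (the label values are made up):

```python
def accuracy(true_labels, predicted_labels):
    """Number of correct classifications divided by total number of test cases."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical test set: 1 = default, 0 = no default
true_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]
score = accuracy(true_labels, predicted_labels)   # 3 correct out of 4
print(score)
```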
