
Pma 5

Uploaded by

927621bad030
PREDICTIVE MODELLING ANALYTICS

DIVYA M

SANTHANAM L
Corporate Trainer
UNIT V
INTRODUCTION TO MODEL
MODELLING ALGORITHMS
Modelling algorithms give machines the ability to learn
automatically from large amounts of data.
TYPES OF MACHINE LEARNING

• SUPERVISED LEARNING
• UNSUPERVISED LEARNING
• REINFORCEMENT LEARNING
SUPERVISED LEARNING
• Supervised learning is a technique in which we teach or train
the machine using data that is well labeled.
This unit groups the algorithms into three categories. Classification
and regression are supervised; segmentation needs no labeled target
and is, strictly speaking, an unsupervised technique:
1. CLASSIFICATION
2. REGRESSION
3. SEGMENTATION
CLASSIFICATION
The model is trained so that the output is separated into different
labels (or categories) according to the given input data.
The output variable is assigned to a category or class, so it takes a
discrete value.
EXAMPLES:
1. Predict whether a customer is eligible for a loan.
o/p: yes or no

2. Predict whether team India will win or lose.
o/p: win or lose / yes or no
CLASSIFIER
The algorithm that performs classification on a dataset is known as a
classifier. There are two types of classifier:

• BINARY CLASSIFIER: The classification problem has only two possible
outcomes.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or
DOG, etc.

• MULTI CLASS CLASSIFIER: The classification problem has more than two
outcomes.
Example: Classification of types of music.


REGRESSION
It is a statistical method for modelling the relationship between a
dependent (target) variable and one or more independent (predictor)
variables.

No class labels are defined: the output variable is a continuous
numerical value.

Examples:
1. Predict the weather for the next 24 hours.
o/p: a continuous value, for example the temperature

2. Predict a share price.
o/p: a value in a continuous numeric range, not one of a fixed set of
categories
1. Predict the House price in 2030?

2. Spam Email Detection

3. Temperature Forecasting

4. Stock Price Prediction

5. Medical Diagnosis

6. Predict Netflix's monthly income?

7. Predict whether Mr. Narendra Modi will win in 2024?

8. Predict whether an individual likes IPL?


SEGMENTATION
Segmentation is the technique of splitting customers into separate groups
depending on their attributes or behavior.

Cluster customers based on age, call usage, data usage, etc. in order to
divide them into gold, silver and bronze segments.

Each segment of customers can then be approached separately.

Cluster insurance claims and look for unusual cases within the groups.
This is also known as anomaly detection and is a commonly used method to
detect fraud.
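The gold, silver and bronze split can be sketched with a plain k-means clustering on a single attribute. This is only an illustration: the usage figures are hypothetical, the function name `kmeans_1d` is made up for this example, and SPSS Modeler's own segmentation nodes may work differently.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Cluster 1-D values into k groups (k >= 2) with plain k-means."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly data usage in GB for nine customers.
usage_gb = [1, 2, 3, 10, 11, 12, 40, 42, 45]
centers, groups = kmeans_1d(usage_gb)

# Centers start in ascending order, so the groups run from low to high usage.
segments = dict(zip(["bronze", "silver", "gold"], groups))
print(segments)
```

No target field is used here: the groups come only from the input values.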
CREATING A MODEL IN IBM SPSS MODELER
When you execute a model, a model nugget (a yellow diamond node)
is added to the stream canvas.
The model nugget stores the results of the analysis and is linked
to the modeling node.
The link ensures that when you rerun the model, for example with
other inputs, the model nugget is updated with the new results.
To view the model's results, open the model nugget. The output
depends on which model was executed. For example, you will have
a tree diagram when you run a CHAID node, a cluster profile
when you run a segmentation node, and a set of rules when you
execute an association model.
MODELLING PALETTE
The Modeling palette is organized into categories based on the type of
model: each type is a sub palette.

Selecting one of the sub palettes will show all modeling nodes
suitable for that category.
Each type of model requires specific field roles:
❑ Supervised models require one or more input fields
(predictors) and a target field.
❑ Segmentation models only require input fields. The cluster
solution will be based on these fields. No target field is
specified.
❑ Association models involve rules where a field can appear both
as input and as target.
NEURAL NETWORKS
A Neural Network has an Input Layer, a Hidden Layer, and an
Output Layer.

The input layer consists of all predictors / input variables.

The output layer has the target variable.

Hidden Layer(s) are created automatically during model training.


[Figure: a simple Neural Network with an input layer, a hidden layer and an output layer]
The predictors have an effect on the target via a hidden layer. This extra
layer enables you to model non-linear relationships between the
predictors and the target.
Neural networks are generally viewed as powerful models but the
interpretation of the results is difficult. Even when you know the values
of all the coefficients (the connections in the diagram), it will still be
hard to establish the relationship between a predictor and the target
because of the hidden layer, and there can be more than one hidden
layer. That is why neural networks are referred to as black box models.
Neural Networks can score new data by just inputting the values of the
predictors and passing these values through the network, which will
return a value for the target.
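Scoring can be sketched as one forward pass through the network. The coefficients below are hand-picked, hypothetical values and the sigmoid activation is an assumption; a trained SPSS Modeler network would supply its own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(predictors, hidden_weights, hidden_biases, output_weights, output_bias):
    """Pass predictor values through one hidden layer and return the target value."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, predictors)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical coefficients: 2 predictors -> 2 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.3], [0.8, 0.2]]
hidden_biases = [0.1, -0.4]
output_weights = [1.2, -0.7]
output_bias = 0.05

prediction = score([1.0, 2.0], hidden_weights, hidden_biases, output_weights, output_bias)
print(round(prediction, 3))
```

Even in this tiny network each predictor reaches the target through both hidden neurons, which is why reading the effect of a single predictor from the coefficients is hard.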
INTRODUCTION TO LINEAR REGRESSION
Linear Regression is a linear approach to modeling the
relationship between a continuous target variable and one or
more predictor variables.

It is used to predict a continuous target by finding a linear
combination of predictors such that the correlation between the
actual values of the target and the predicted values of the target
is maximum.
Linear Node is available under the Modeling palette of SPSS Modeler.

Some examples of Linear Regression:


Predicting house prices (target) using input variables like total area
of house (square feet, square meter), number of rooms, distance
from nearest shopping mall, etc.
Predicting the weight of a person using input variables like height,
age, etc.
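For a single predictor, the linear combination can be found with ordinary least squares. The sketch below uses hypothetical house data built to lie exactly on a line; `fit_line` is an illustrative helper, not part of SPSS Modeler.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one predictor: y is approximated by intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    variance = sum((a - x_bar) ** 2 for a in x)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: area in square metres and price in arbitrary units,
# constructed so that price = 10 + 2 * area.
area = [50, 70, 90, 110, 130]
price = [110, 150, 190, 230, 270]

intercept, slope = fit_line(area, price)
predicted_price = intercept + slope * 100   # prediction for a 100 square metre house
print(intercept, slope, predicted_price)
```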
INTRODUCTION TO LOGISTIC REGRESSION
The logistic model is expressed in terms of a ratio: the probability that a
particular event occurs (a customer churns, a customer accepts an offer,
a claim is fraudulent, a customer does not pay back a loan, a student
passes an exam, and so forth) versus the probability that the event does
not occur. This ratio is known as odds.

Logistic Node can be found under the Modeling Palette


1. LOGISTIC REGRESSION
It is a simple and widely used algorithm for solving classification problems.

It is used for binary classification and estimates the probability of
belonging to a particular class using a sigmoid function.

Logistic Regression produces an 'S'-shaped sigmoid curve.

This function is responsible for predicting values between 0 and 1.


THRESHOLD VALUE
The threshold value is a parameter that
turns the predicted probability into a
class.
Predicted values higher than the
threshold are assigned to class 1.
Predicted values lower than the
threshold are assigned to class 0.
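A minimal sketch of applying a threshold, assuming a default cut-off of 0.5 and some hypothetical predicted probabilities:

```python
def classify(probability, threshold=0.5):
    """Return class 1 when the predicted probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0

# Hypothetical predicted probabilities from a logistic model.
probabilities = [0.91, 0.48, 0.50, 0.07]

default_classes = [classify(p) for p in probabilities]
lenient_classes = [classify(p, threshold=0.4) for p in probabilities]
print(default_classes)
print(lenient_classes)
```

Lowering the threshold moves borderline cases into class 1, so the choice of threshold depends on which kind of mistake is more costly.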
Sigmoid (logistic) function
It gives an 'S'-shaped curve that takes any real-valued
number 'z' and maps it to a value between 0
and 1.

▪ As 'z' becomes large and positive, the sigmoid function
approaches 1.

▪ As 'z' becomes large and negative, the sigmoid function
approaches 0.
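The standard sigmoid is 1 / (1 + e^(-z)). A short sketch that evaluates it at a few values of z shows both limits:

```python
import math

def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```

At z = 0 the function returns exactly 0.5, the midpoint of the S-shaped curve.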
CASE STUDY:
Problem Statement: Predict the Bank Customers' Loan eligibility using
logistic regression.

Input data:

Age    Have Insurance
50     1
34     0
42     0
46     1
54     1
33     0
...    ...
23     0
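1. Draw the scatterplot of the values.
[Figure: scatterplot of Have Insurance (y-axis) against Age (x-axis)]
2. Suppose we use Linear regression:
[Figure: Have Insurance (y-axis) against Age (x-axis), with a linear regression fit]
3. In Logistic Regression:
[Figure: Have Insurance (y-axis) against Age (x-axis), with a logistic regression fit]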
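The logistic fit in step 3 can be sketched with plain gradient descent. This uses only the seven rows listed in the input data (the elided rows are not guessed), and the learning rate and number of steps are hypothetical settings. SPSS Modeler's Logistic node uses its own estimation procedure.

```python
import math

# Only the seven rows listed in the case study.
ages = [50, 34, 42, 46, 54, 33, 23]
has_insurance = [1, 0, 0, 1, 1, 0, 0]
n = len(ages)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Standardise age so that plain gradient descent behaves well.
mu = sum(ages) / n
sd = math.sqrt(sum((a - mu) ** 2 for a in ages) / n)
x = [(a - mu) / sd for a in ages]

w, b = 0.0, 0.0
learning_rate, steps = 0.5, 20000   # hypothetical settings
for _ in range(steps):
    # Difference between predicted probability and actual label for every row.
    errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, has_insurance)]
    w -= learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    b -= learning_rate * sum(errors) / n

def predict_probability(age):
    return sigmoid(w * (age - mu) / sd + b)

predictions = [1 if predict_probability(a) > 0.5 else 0 for a in ages]
print(predictions)
```

Unlike a straight line, the fitted curve stays between 0 and 1 for every age, so its output can be read as a probability.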
For example, when the probability that a customer pays back a loan is 4/5, then the
probability that the customer does not pay back the loan = 1 - 4/5 = 1/5.

Therefore, the odds will be (4/5) / (1/5) = 4 for this customer. When the odds are 1,
you know that the probability for the event to occur equals the probability that the
event does not occur, and both probabilities are 0.5.
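The same arithmetic as a small sketch, using exact fractions so that the results match the worked example:

```python
from fractions import Fraction

def odds(probability):
    """Odds = P(event occurs) / P(event does not occur)."""
    return probability / (1 - probability)

loan_odds = odds(Fraction(4, 5))   # customer pays back with probability 4/5
even_odds = odds(Fraction(1, 2))   # event is as likely as not
print(loan_odds, even_odds)
```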
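The odds are linked to the predictors by the equation given as:

odds = exp(B0 + B1*X1 + B2*X2 + ... + Bn*Xn)

where X1 ... Xn are the predictors and B0 ... Bn are the coefficients.
Here, exp (…) is another way to write e ^ (…), where e is, approximately,
the number 2.72.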
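A sketch of how odds and probability follow from the predictors, assuming the usual logistic form odds = exp(intercept + sum of coefficient * predictor). The coefficients and predictor values are hypothetical and chosen so that the exponent is exactly 1.

```python
import math

def odds_from_predictors(x, coefficients, intercept):
    """odds = exp(intercept + sum of coefficient * predictor)."""
    return math.exp(intercept + sum(c * v for c, v in zip(coefficients, x)))

def probability_from_odds(odds):
    """Invert odds = p / (1 - p) to get p."""
    return odds / (1.0 + odds)

# Hypothetical coefficients for two predictors: -1.0 + 0.5*2.0 + 0.25*4.0 = 1.0
event_odds = odds_from_predictors([2.0, 4.0], coefficients=[0.5, 0.25], intercept=-1.0)
event_probability = probability_from_odds(event_odds)
print(round(event_odds, 2), round(event_probability, 4))
```

With an exponent of 1 the odds equal e itself, about 2.72, and the probability is the sigmoid of 1.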
Logistic Regression can be used for classification problems such as:
Predict whether a customer churns or not
Predict whether a customer accepts an offer or not
Predict whether an email is spam or not
INTRODUCTION OF NEURAL NETWORKS
Neural networks attempt to solve problems using methods modeled on
how the human brain operates.

A typical neural network consists of several neurons arranged in layers to
create a network.

Each neuron can be thought of as a processing element that is given a
simple part of a task.

The connections between the neurons provide the network with the ability
to learn patterns and relationships in data.
MULTILAYER PERCEPTRON (MLP)
The Multilayer Perceptron consists of several processing units, the neurons, arranged in layers
to create a network.

The neurons in the input layer represent the predictors.

The neuron in the output layer represents the target.

Each neuron in the hidden layer receives an input based on a weighted combination of the
values of the neurons in the previous layer.

The neurons within the hidden layer are, in turn, combined to produce an output value, the
prediction.

This predicted value is compared to the actual value of the target, and the difference between
the two values (the error) is fed back into the network (known as "back propagation"), whose
weights are updated in turn.
HOW DOES A MULTILAYER PERCEPTRON
NEURAL NETWORK LEARN?
Consider the example of a child learning the difference between an apple and a pear. The child
currently does not know the difference between an apple and a pear.

When shown the first example of a fruit, the child may look at the fruit and decide that it is round,
red in color and of a particular weight.

Not knowing what an apple or a pear actually looks like, the child may decide to place equal
importance on each of these factors.

The importance is what a network refers to as weights. At this stage the child is most likely to
randomly choose either an apple or a pear for the prediction. On being told the correct response, the
child will increase or decrease the relative importance of each of the factors to improve the decision
(reduce the error).
In a similar fashion, a Multilayer Perceptron network begins with
random weights placed on each of the inputs and generates a
predicted value of the target.
On being told the actual value of the target, the network adjusts
these internal weights. In time, the child and the network will
hopefully make correct predictions.
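The weight adjustment can be sketched for a single sigmoid neuron, which is a deliberate simplification of a full Multilayer Perceptron. The fruit encoding, the equal starting weights (following the child analogy rather than random initialisation) and the learning rate are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    return sigmoid(sum(w * f for w, f in zip(weights, features)))

def update(weights, features, actual, learning_rate=0.5):
    """One gradient step on the squared error for a single sigmoid neuron."""
    predicted = predict(weights, features)
    error = predicted - actual
    slope = predicted * (1.0 - predicted)   # derivative of the sigmoid
    return [w - learning_rate * error * slope * f for w, f in zip(weights, features)]

# Hypothetical encoding of one fruit: [roundness, redness, weight], each scaled to 0..1.
fruit = [0.9, 0.8, 0.4]
actual = 1.0                  # 1 = apple, 0 = pear
weights = [0.1, 0.1, 0.1]     # equal importance to start with

before = abs(predict(weights, fruit) - actual)
for _ in range(100):
    weights = update(weights, fruit, actual)
after = abs(predict(weights, fruit) - actual)
print(round(before, 3), round(after, 3))
```

Each step nudges the weights in the direction that reduces the error, and features with larger values receive larger adjustments.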
RADIAL BASIS FUNCTION (RBF)
The Radial Basis Function (RBF) network is a more recent type of network
and is quicker to train than the Multilayer Perceptron.
The RBF can be thought of as performing a type of clustering within the
input space, encircling individual clusters of data by a number of
so-called basis functions.
If a data point falls within the region of activation of a particular basis
function, then the neuron corresponding to that basis function
responds most strongly.
The selection of the centers of each basis function is where difficulties
arise.
RBF networks are typically quicker to train than an MLP, and they can
model data that are clustered within the input space.
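The idea of a region of activation can be sketched with Gaussian basis functions. The Gaussian form, the width and the centers are assumptions made for illustration; the text does not say which basis function SPSS Modeler uses.

```python
import math

def rbf_activation(point, center, width=1.0):
    """Gaussian basis function: 1 at the center, falling towards 0 with distance."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-squared_distance / (2.0 * width ** 2))

# Hypothetical centers of three basis functions in a 2-D input space.
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
point = (4.5, 5.2)

activations = [rbf_activation(point, c) for c in centers]
strongest = activations.index(max(activations))
print(strongest, [round(a, 3) for a in activations])
```

The point lies closest to the second center, so the neuron for that basis function responds most strongly while the other two stay near zero.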
