Module 5 – Prediction
• Introduction: Overview,
• Classification,
• Regression,
• Building a Prediction Model,
• Applying a Prediction Model,
• Simple Linear Regression,
• Simple Non-Linear Regression.
Machine learning is divided into two main categories:
• Supervised learning
• Unsupervised learning
In supervised learning, an algorithm learns the mapping function from the input (x) to the output (y).
A supervised learning technique:
Regression
How does regression work?
•Regression models use an algorithm to understand the
relationship between a dependent variable (the output) and
one or more independent variables (the inputs).
y = wX + b
Simple linear regression only has one y variable and one x variable.
Example: predicting umbrellas sold from rainfall.
• The independent variable x: rainfall measured in millimeters
• The dependent variable y: umbrellas sold
[Figure: scatterplot of umbrellas sold against rainfall, with a fitted line]
1 We fit a line y = wx + b through the data points, where w is the slope (weight)
2 We measure the distance between the line and each datapoint (the residuals)
3 We sum up the squared residuals; the best-fitting line minimizes this sum
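The steps above can be sketched with the closed-form least-squares estimates. The rainfall/umbrella numbers below are made up and chosen to lie exactly on a line, so the recovered slope and intercept are easy to check:

```python
import numpy as np

# Hypothetical data, constructed to lie exactly on y = 0.8*x + 8
# so that the fit can be verified.
rainfall = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])   # x, in mm
umbrellas = np.array([16.0, 17.6, 20.0, 22.4, 24.0, 28.0])  # y, units sold

# Closed-form least-squares estimates for y = w*x + b:
# w minimizes the sum of squared residuals between line and points.
x_mean, y_mean = rainfall.mean(), umbrellas.mean()
w = np.sum((rainfall - x_mean) * (umbrellas - y_mean)) / np.sum((rainfall - x_mean) ** 2)
b = y_mean - w * x_mean

# The residuals and their sum of squares (what least squares minimizes).
residuals = umbrellas - (w * rainfall + b)
sse = np.sum(residuals ** 2)
```

Because the points sit exactly on a line, the fit recovers w = 0.8 and b = 8 with essentially zero residual sum of squares.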
y = w₁x₁ + w₂x₂ + … + b
•Multiple linear regression relates many independent variables
to one dependent variable
•It suits datasets with multiple features, such as the number of
bedrooms, age of the building, covered area, etc.
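The multiple-regression equation above can be fitted by ordinary least squares. A sketch with hypothetical housing data — the feature values and the "true" weights are invented so the fit can be verified:

```python
import numpy as np

# Hypothetical features: [bedrooms, age (years), covered area (100 m^2)]
X = np.array([
    [3.0, 10.0, 1.2],
    [2.0, 30.0, 0.8],
    [4.0,  5.0, 1.5],
    [3.0, 20.0, 1.0],
    [5.0,  2.0, 2.0],
])
# Prices generated from known (made-up) weights and intercept:
# y = 50*bedrooms - 1*age + 100*area + 20
true_w = np.array([50.0, -1.0, 100.0])
y = X @ true_w + 20.0

# Append a column of ones so the intercept b is estimated with the weights.
X1 = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = coef[:-1], coef[-1]
```

Since y was generated exactly from the model, least squares recovers the weights (50, −1, 100) and intercept 20.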
How can we evaluate the
performance of a regression model?
• We use performance evaluation metrics
• The most commonly used metrics are based on the difference
between the predicted and actual values of some test points:
• The mean of the squared differences is the Mean Squared
Error (MSE)
• The size of the error, in the same units as y, is measured by
taking the square root of MSE – the Root Mean Squared Error (RMSE)
Evaluating the performance of a regression model
using MSE & RMSE
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²   (sum over i = 1, …, n)

RMSE = (MSE)^(1/2)

where:
MSE = Mean squared error        yᵢ = Observed values
n = Number of data points       ŷᵢ = Predicted values
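These two metrics can be computed directly; a minimal sketch with a small worked example (the test points are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the same units as y."""
    return np.sqrt(mse(y_true, y_pred))

# Hypothetical test points:
y_actual    = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.5, 5.0, 7.5, 10.0]
# Squared errors: 0.25, 0, 0.25, 1.0 -> MSE = 1.5 / 4 = 0.375
error = mse(y_actual, y_predicted)
root_error = rmse(y_actual, y_predicted)
```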
Simple Non-Linear Regression
• In situations where the relationship between two
variables is nonlinear, a simple way of generating a
regression equation is to transform the nonlinear
relationship to a linear relationship using a
mathematical transformation.
• A linear model can then be generated.
• Once a prediction has been made, the predicted value
is transformed back to the original scale.
• For example, in Table 7.10 two columns show a
nonlinear relationship.
Simple Non-Linear Regression
• Plotting these values results in the scatterplot in
Figure 7.13.
• There is no linear relationship between these two
variables and hence we cannot calculate a linear model
directly from the two variables.
• To generate a model, we transform x or y or both to
create a linear relationship.
• In this example, we transform the y variable using the
transformation y′ = −1/y (whose inverse is y = −1/y′).
• We now generate a new column, y′ (Table 7.12). Plotting x
against y′ shows an approximately linear relationship (see
Figure 7.14).
• Using x we can now calculate a
predicted value for the
transformed value of y (y’).
• To map this prediction of y′ back to the original scale,
we apply the inverse transformation, y = −1/y′.
• In Table 7.12, we have calculated
the predicted value for y’ and
transformed the number to
Predicted y.
• The Predicted y values are close to
the actual y values.
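The transform–fit–invert procedure can be sketched as below. The data are generated from y = −1/(0.5x + 1), a made-up relationship chosen so that the transform y′ = −1/y linearizes it exactly:

```python
import numpy as np

# Hypothetical nonlinear data: y = -1/(0.5*x + 1),
# so the transform y' = -1/y yields the exact line y' = 0.5*x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = -1.0 / (0.5 * x + 1.0)

# Step 1: transform y to create a linear relationship.
y_t = -1.0 / y

# Step 2: fit a linear model to (x, y').
w, b = np.polyfit(x, y_t, 1)

# Step 3: predict on the transformed scale, then invert (y = -1/y').
x_new = 10.0
y_t_pred = w * x_new + b
y_pred = -1.0 / y_t_pred
```

Because the transformed data lie exactly on a line, the fit recovers the slope 0.5 and intercept 1, and the back-transformed prediction at x = 10 matches −1/(0.5·10 + 1) = −1/6.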
• Some common nonlinear relationships are shown in Figure 7.15.
• The following transformations may create a linear relationship for the
charts shown:
Situation a: Transformations on the x, y or both x and y
variables such as log or square root
Situation b: Transformation on the x variable such as square
root, log or -1/x.
Situation c: Transformation on the y variable such as square
root, log or -1/y.
• This approach of creating simple nonlinear models can only be
used when there is a clear transformation of the data to a linear
relationship.
(See Figures 7.14 and 7.15.)
A supervised learning technique:
classification
What is classification?
•Classification is the process of categorizing a given set
of data into classes. The pre-defined classes act as our
labels, or ground truth.
•The model uses the features of an object to predict its
label. E.g., filtering spam from non-spam emails, or
classifying types of fruit based on their color, weight,
and size.
What types of problems does
classification solve?
There are two types of classification problems:
• Binary
• Multi-class
Support vector machine (SVM)
• What is a support vector machine (SVM)?
• A support vector machine (SVM) is a supervised ML technique
that can be used to solve classification and regression
problems. It is, however, mostly used for classification.
• In this algorithm, each data point is plotted in the feature
space. The SVM model then finds the boundary that best
separates the data samples into their classes.
A practical example: finding a 2D
plane that differentiates two classes
Let’s say we have a dataset of
different animals of two classes:
birds & fish
•There are only three features:
body weight, body length, and
daily food consumption
•We draw a 3D grid and plot all
these points
If there are more than three
features, we would have a hyper-
space
A hyper-space is a space with more than 3 dimensions (4D, 5D,
etc.), and a separating boundary in such a space is called a
hyper-plane.
•If a linear boundary separates the classes, the SVM is called
a Linear Kernel SVM
•For nonlinear boundaries, a Polynomial Kernel or other
advanced kernels are used
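As a sketch (assuming scikit-learn is available), a linear-kernel SVM on a tiny, made-up, linearly separable two-class dataset — the points and feature meanings are invented for illustration:

```python
from sklearn.svm import SVC

# Hypothetical 2D points for two linearly separable classes
# (say, fish vs. birds on two made-up features).
X = [[1.0, 1.0], [1.5, 0.5], [2.0, 1.0], [1.0, 0.5],   # class 0
     [4.0, 4.0], [4.5, 3.5], [5.0, 4.0], [4.0, 3.0]]   # class 1
y = [0, 0, 0, 0, 1, 1, 1, 1]

# A linear-kernel SVM finds the separating line with the widest margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Classify two new points, one near each cluster.
pred = clf.predict([[1.2, 0.8], [4.2, 3.8]])
```

Since the classes are cleanly separable, the model classifies both training data and the two new points correctly.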
What is a Prediction Model
• Predictive models are used in many situations where an
estimate or forecast is required.
• Ex: To project sales or forecast the weather.
A portion of the observations is shown in the table below.
• A model to predict the car fuel efficiency was
built using:
• The MPG variable as the response and,
• The Cylinders, Displacement, Horsepower, Weight
and Acceleration variables as descriptors.
Ex: The observations in the table below could be presented to the model, and the model
would predict the MPG column.
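The build-then-apply workflow could be sketched as below; the observations and their values are invented stand-ins for the table, not the actual data:

```python
import numpy as np

# Hypothetical training observations: columns are
# Cylinders, Displacement, Horsepower, Weight, Acceleration.
X_train = np.array([
    [8, 307.0, 130.0, 3504.0, 12.0],
    [4,  97.0,  46.0, 1835.0, 20.5],
    [6, 199.0,  97.0, 2774.0, 15.5],
    [4, 120.0,  87.0, 2979.0, 19.5],
    [8, 318.0, 150.0, 3436.0, 11.0],
    [4, 113.0,  95.0, 2372.0, 15.0],
], dtype=float)
mpg_train = np.array([18.0, 26.0, 25.0, 21.0, 15.0, 24.0])  # response (MPG)

# Build: fit MPG = w . descriptors + b by least squares.
A = np.hstack([X_train, np.ones((len(X_train), 1))])
coef, *_ = np.linalg.lstsq(A, mpg_train, rcond=None)

# Apply: predict the MPG column for an unseen observation.
X_new = np.array([[4, 107.0, 90.0, 2430.0, 14.5]])
mpg_pred = np.hstack([X_new, np.ones((1, 1))]) @ coef
```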
• There are many methods for building prediction models
and they are often characterized based on the response
variable.
• When the response is a categorical variable, the model
is called a classification model.
• When the response is a continuous variable, then the
model is called a regression model.
• The table below summarizes some of the methods available:
• There are two distinct phases, each with a unique set of
processes and issues to consider:
• Building
• Applying
Building a Prediction Model
• Building:
• The prediction model is built using existing data
called the training set.
• This training set contains examples with values for
the descriptor and response variables.
• The training set is used to determine and quantify
the relationships between the input descriptors
and the output response variables.
• This set will be divided into observations used to
build the model and observations used to assess the
quality of any model built.
A. Preparing the Data set
• It is important to prepare a data set prior to modeling.
• Preparation should include the operations outlined
such as characterizing, cleaning, and transforming the
data.
• Particular care should be taken to determine whether
subsetting the data is needed to simplify the resulting
models.
B. Designing a Modelling Experiment:
• Building a prediction model is an experiment.
• It will be necessary to build many models, without
knowing in advance which model will be the ‘best’.
• This experiment should be appropriately designed to
ensure an optimal result.
• There are three major dimensions that should be
explored:
1. Different models:
• There are many different approaches to building prediction
models.
• A series of alternative models should be explored since all
models work well in different situations.
• The initial list of modeling techniques to be explored can be
based on the criteria previously defined as important to the
project.
3. Model parameters:
• Most predictive models can be optimized by fine-tuning
different model parameters.
• Building a series of models with different parameter
settings and comparing the quality of each model will allow
you to optimize the model.
• For example, when building a neural network model there
are a number of settings that will influence the quality of
the models built, such as the number of cycles or the
number of hidden layers.
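The parameter-sweep idea can be sketched with a simpler model than a neural network. Here a hypothetical polynomial-degree parameter is tuned by fitting one model per setting and comparing validation MSE; the data-generating function and noise level are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data from a quadratic relationship plus noise.
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.2, x.size)

# Hold out every third point for validation.
val = np.arange(x.size) % 3 == 0
x_tr, y_tr, x_va, y_va = x[~val], y[~val], x[val], y[val]

# Build one model per parameter setting (the polynomial degree here)
# and compare validation MSE to choose the best setting.
results = {}
for degree in [1, 2, 3, 5, 8]:
    coeffs = np.polyfit(x_tr, y_tr, degree)
    pred = np.polyval(coeffs, x_va)
    results[degree] = np.mean((y_va - pred) ** 2)

best_degree = min(results, key=results.get)
```

A degree-1 model clearly underfits the quadratic data, so any degree of 2 or more beats it on the validation set; the same sweep-and-compare pattern applies to neural-network settings such as the number of hidden layers.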
• What counts as the ‘best’ model depends on the objective
of the modeling process defined at the start of the
project.
• Other issues, for example, the ability to explain how a
prediction was made, may also be important and
should be taken into account when assessing the
models generated.
• Wherever possible, when two or more models give
comparable results, the simpler model should be
selected.
C. Separating Test and Training Sets:
• The goal of building a predictive model is to generalize
the relationship between the input descriptors and the
output responses.
• The quality of the model depends on how well the
model is able to predict correctly for a given set of
input descriptors.
• If the model generalizes the input/output relationships
too much, it will miss real patterns and its accuracy will
be low – underfitting.
• If the model does not generalize the relationships
enough (it fits the training data too closely), then it
will have difficulty making predictions for observations
not included in the data set used to build the model –
overfitting.
• Hence, when assessing the quality of the model, it is
important to use a data set to build the model, which is
different from the data set used to test the accuracy of
the model.
• There are a number of ways for achieving this
separation of test and training set.
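One simple way of achieving this separation is a random split. A minimal sketch (the 25% test fraction, the seed, and the data are arbitrary choices):

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.25, seed=42):
    """Randomly separate observations into a training and a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))       # shuffle the observation indices
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical dataset of 20 observations with 2 descriptors each.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20, dtype=float)
X_train, X_test, y_train, y_test = train_test_split(X, y)
```

The model is then built on (X_train, y_train) and its accuracy assessed on the held-out (X_test, y_test), which it never saw during fitting.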
Applying a Prediction Model
• Applying:
• Once a model has been built, a data set with no
output response variables can be fed into this
model and the model will produce an estimate for
this response.
• A measure that reflects the confidence in this
prediction is often calculated along with an
explanation of how the value was generated.
• Once a model has been built and verified, it can be
used to make predictions.
• Along with the presentation of the prediction, there
should be some indications of the confidence in this
value.
• During the data preparation step of the process, the
descriptors and/or the response variables may have been
translated to facilitate analysis.
• Once a prediction has been made, the variables should be
translated back into their original format prior to
presenting the information to the end user.
• For example, the log of the variable Weight was taken in
order to create a new variable log(Weight) since the original
variable was not normally distributed.
• This variable was used as a response variable in a model.
• Before any results are presented to the end user, the
log(Weight) response should be translated back to Weight
by applying the inverse of the log (exponentiation) and
presenting the value on the original weight scale.
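A sketch of this translate-then-translate-back step, using a hypothetical Weight variable constructed so that log(Weight) is exactly linear in x:

```python
import numpy as np

# Hypothetical skewed response: Weight grows exponentially with x,
# so log(Weight) is linear in x (here exactly: log(Weight) = 0.4*x + 2).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
weight = np.exp(0.4 * x + 2.0)           # Weight on the original scale

# Translate for analysis: model the log of the response.
log_w = np.log(weight)
slope, intercept = np.polyfit(x, log_w, 1)

# Predict on the log scale, then translate back before presenting:
x_new = 6.0
log_pred = slope * x_new + intercept
weight_pred = np.exp(log_pred)           # inverse of the log transform
```

The user-facing prediction is weight_pred, on the original weight scale; log_pred on its own would be meaningless to the end user.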
• When applying these models to new data, some
criteria will need to be established as to which model
the observation will be presented to.
• For example, a series of models predicting house
prices in different locations such as coastal,
downtown, and suburbs were built.
• When applying these models to a new data set, the
observations should be applied only to the appropriate
model.