
Predictive Analytics - Regression

Uploaded by

i2brdii
Copyright
© © All Rights Reserved

COE 102
Introductory Big Data
College of Engineering

Chapter 8 - Predictive Analytics - Regression
Learning Objectives
• Predictive analytics elements
• Regression Models
• Linear Regression
• Predictive performance measures
Predictive Analytics
Descriptive analytics: Answers "What happened?"
• Summarizes or condenses data to extract patterns
• The result of a given method or technique is obtained directly by applying an algorithm to the data
• Examples: relationship between height and weight, average grade in the class, students with similar study interests, clusters of mall customers or patient groups, etc.

Predictive analytics: Answers "What might happen in the future?"
• Produces predictions based on predictive models.
• A predictive model is a generalization of the relationship between data and the desired output. It associates the hidden relationships in data with a sought or target prediction
• Predictive tasks do not predict what is going to happen in the future, but how likely or probable the outcomes of a given event are.
• Examples: predicting the possibility of getting a certain disease for a new patient based on the history of genetic data in previous patients' records.
Examples – Predictive analytics
Name    Age   Diagnosis
Ahmad   19    Positive-COVID
Ali     71    Negative-COVID
Randa   18    Positive-COVID

Predictive analytics can answer the following:
• Does this patient have COVID? Name: Sara, Age: 19 years
Predictive Analytics
• Data Labels
• Predictive Analytics Elements
Data Labels
Data Labels – Desired Predictions
• Datasets can be labeled and unlabeled (i.e., annotated or
unannotated)
• Predictive tasks use labeled data
• Labeled data is data whose outcome is already known, to guide the
prediction of labels (desired outcome) for new, unlabeled, data.
• A label represents a possible outcome of an event and can be of
several types.
• For example:
  • a person can be labeled "child" or "adult", or "male" or "female" (binary labels)
  • a car can be of types "family", "sport", "terrain" or "truck" (multi labels)
  • movies can be rated "worst", "bad", "neutral", "good" and "excellent" (multi labels)
Predictive Analytics Elements
• Training data: the historical data used to induce a model
• Training: the process of utilizing an algorithm to build a generalizable model
that can correctly associate the attributes of each instance to a true prediction
• Predictive attributes: all the features utilized in the training process
• Target attributes: data labels that indicate the desired prediction
• Test data: the data used to test the performance/quality of the predictive model.
• Predictive model: predictive techniques usually build or induce a predictive
model from the labeled data
• Once a predictive model is induced on the training set, it can be used to predict
the correct label for new data in the next step
• Prediction models are not 100% accurate
• We aim to minimize the number or extent of future mispredictions produced by prediction models
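The elements above can be sketched in a few lines of Python. This is only an illustration: the records, the "fever" attribute, and the 3/2 split are hypothetical and are not taken from the course data.

```python
# Hypothetical labeled records: each one has predictive attributes and a label.
records = [
    {"age": 19, "fever": 1, "diagnosis": "positive"},
    {"age": 71, "fever": 0, "diagnosis": "negative"},
    {"age": 18, "fever": 1, "diagnosis": "positive"},
    {"age": 45, "fever": 0, "diagnosis": "negative"},
    {"age": 33, "fever": 1, "diagnosis": "positive"},
]

PREDICTIVE_ATTRIBUTES = ["age", "fever"]  # features used in training
TARGET_ATTRIBUTE = "diagnosis"            # label that holds the desired prediction


def split_attributes(record):
    """Separate one record into (predictive attribute values, target label)."""
    x = [record[name] for name in PREDICTIVE_ATTRIBUTES]
    y = record[TARGET_ATTRIBUTE]
    return x, y


# Assumed split: the first three records are training data,
# the last two are held out as test data.
training_data = [split_attributes(r) for r in records[:3]]
test_data = [split_attributes(r) for r in records[3:]]

print("training instances:", len(training_data))
print("test instances:", len(test_data))
```

A training algorithm would see only `training_data`; `test_data` is kept aside to estimate how often the induced model mispredicts.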
Predictive Analytics - Example
• A hospital might have the records of several patients, and each record would be the result of a set of clinical examinations (i.e., the feature space) and the diagnosis for one of the patients. The set of patient records is called the training data
• Each clinical examination represents a predictive attribute of a patient. In other words, it helps in predicting the patient's diagnosis.
• The diagnosis is the target attribute (label)
• The predictive model induced from these training data is then used to predict the most likely diagnosis of new patients belonging to the test set, for which we know the attributes (the results of their clinical examinations)
• What is the scale type of the target variable?
Predictive Analytics
• Predictive tasks are divided between:
• classification tasks
• regression tasks

• A regression task is a predictive task whose aim is to assign a quantitative value to a new, unlabeled object, given the values of its predictive attributes.
• A mathematical relation is induced from the training instances and their labels to predict labels for test instances
Regression Algorithms
• Several regression algorithms have been proposed, most of them in the area of statistics:
1. Linear regression: the Linear Regression (LR) algorithm is one of the oldest and simplest regression algorithms. It is able to induce good regression models:
   1. Univariate linear regression: one predictive attribute is considered
   2. Multivariate linear regression: more than one predictive attribute is considered
Univariate Linear Regression

• A linear regression task is called univariate when only one predictive attribute is considered in training
• In the social network example:
  • each instance (friend) x is associated with only one attribute: the weight
  • the target attribute y is associated with the height.
• The univariate linear model is:
  $\hat{y} = \beta_0 + \beta_1 \times x_1$, i.e., $\beta_0 + \beta_1$ * predictive attribute
• We can see that there are two parameters, $\beta_0$ and $\beta_1$:
  • $\beta_1$ is associated with the importance of the predictive attribute $x_1$ (the weight in our example); this is an unknown value that needs to be calculated
  • $\beta_0$ is called the intercept and is the value of y where the linear model intercepts the y-axis, in other words when $x_1 = 0$.
• When developing a univariate regression model we aim to find the parameters $\beta_0$ and $\beta_1$ that produce predictions with the minimum mispredictions possible
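One common way to find the intercept and the slope of a univariate linear model is ordinary least squares, which picks the line with the smallest sum of squared errors on the training set. The slides do not state which estimation method is used, so the sketch below is an assumption, and the four (weight, height) pairs are hypothetical values, not the social network data set.

```python
def fit_univariate(xs, ys):
    """Estimate intercept b0 and slope b1 by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: weight given to the predictive attribute
    b0 = mean_y - b1 * mean_x   # intercept: predicted y when x = 0
    return b0, b1


# Hypothetical training data (predictive attribute: weight, target: height).
weights = [50.0, 60.0, 70.0, 80.0]
heights = [155.0, 162.0, 171.0, 178.0]

b0, b1 = fit_univariate(weights, heights)
print(f"height = {b0:.3f} + {b1:.3f} * weight")
```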
Regression - Example
• Let us use the social network data set to predict the
height of new friends
• The “weight” is a predictive attribute
• The “height” is the target attribute,
• The data in the table is the training set,
• The predictive model is a simple linear regression
model:
height = 128.017 + 0.611 × weight
• Let us predict the height of our new friends Omar and Patricia, whose weights are 91 and 58
• The predicted height of Omar will be:
height = 128.017 + 0.611 × 91 = 183.618 cm
• Can you find the height of Patricia using the same regression model?
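The prediction step can be checked with a short Python sketch. The coefficients 128.017 and 0.611 come from the model above; the function name `predict_height` is only an illustrative choice.

```python
def predict_height(weight):
    """Apply the induced model: height = 128.017 + 0.611 * weight."""
    return 128.017 + 0.611 * weight


# New, unlabeled instances: only the predictive attribute (weight) is known.
new_friends = [("Omar", 91), ("Patricia", 58)]

for name, weight in new_friends:
    print(f"{name}: {predict_height(weight):.3f} cm")
```

For Omar this reproduces the 183.618 cm computed by hand, and the same call answers the question about Patricia.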
Multivariate Linear Regression
• The linear model can be generalized for any number p of predictive attributes
• A multivariate regression model can be expressed as:
  $\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j \times x_j$
• where p is the number of predictive attributes (i.e., the number of dimensions)
• $\beta_0$ is the value of y when all $x_j = 0$
• $\beta_j$ is the slope of the linear model according to the jth axis
• When developing a multivariate regression model we aim to find the parameters $\beta_0, \beta_1, \ldots, \beta_p$ that produce predictions with the minimum mispredictions possible
Multivariate Linear Regression - Example
Consider the following example, where we aim to predict the BMI:

The diet score feature determines how closely it was associated with actual measurements of BMI
• What are the predictive attributes?
• What are the possible values for each attribute?
• What is the target attribute?
Multivariate Linear Regression - Example
• The regression mathematical function:
BMI = 18.0 + 1.5 × (diet score) + 1.6 × (male) + 4.2 × (age>20)
• What is the BMI for the following new unseen test instance: (J, 3, 1, 1)
• Answer:
BMI = 18.0 + (1.5 × 3 + 1.6 × 1 + 4.2 × 1)
BMI = 28.3
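The same calculation can be written as a small Python sketch of a multivariate linear model. The coefficients are those of the BMI function above; the dictionary keys (`diet_score`, `male`, `age_over_20`) are illustrative names, and `male` and `age_over_20` are assumed to be 0/1 indicators, as the test instance suggests.

```python
INTERCEPT = 18.0
# One coefficient per predictive attribute.
COEFFICIENTS = {"diet_score": 1.5, "male": 1.6, "age_over_20": 4.2}


def predict_bmi(instance):
    """Return intercept + sum of coefficient * attribute value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * instance[name] for name in COEFFICIENTS
    )


# Test instance J: diet score 3, male = 1, age > 20 = 1.
instance_j = {"diet_score": 3, "male": 1, "age_over_20": 1}
print(f"Predicted BMI for J: {predict_bmi(instance_j):.1f}")
```

Any other unseen instance can be scored by passing a dictionary with the same three keys.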
Class Activity
• Use the same regression model:
BMI = 18.0 + 1.5 × (diet score) + 1.6 × (male) + 4.2 × (age>20)
to predict the BMI for the following new unseen test instance: (L, 3, 0, 0)
Advantages and disadvantages of linear regression
Advantages:
• Strong mathematical foundation
• Easily interpretable
Disadvantages:
• Poor fit if the relationship between predictive attributes and target is non-linear
• The number of instances must be larger than the number of attributes
• Sensitive to outliers
Predictive Performance Measures for Regression
• In the social network example, the regression model was induced from the 14-record training set
• The predicted heights for Omar and Patricia were 183.618 and 163.455 cm, respectively
• The real heights of Omar and Patricia are, however, different from these predicted values:
  • The real, measured heights of Omar and Patricia are 176 and 168 cm, respectively
• This means that there are errors in our predictions!
Predictive Performance Measures for Regression
• It is important to measure and assess the quality of a regression model:
  • Models that produce an unacceptable level of mispredictions must be discarded
• The quality of the induced model is obtained by comparing the predicted values $\hat{y}_i$ with the respective true values $y_i$ on the given test set S.
• Various performance measures can be set up:
  • Mean absolute error (MAE)
  • Mean square error (MSE)
  • Root mean square error (RMSE)
  • Relative mean square error (RelMSE)
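Predictive Performance Measures for Regression
Mean absolute error (MAE)
  $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$, where n is the number of instances in the test set S
• The value of MAE has the same unit measure as y
Mean square error (MSE)
  $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Compared to MAE, MSE emphasizes bigger errors more.
• The value of MSE has the square of the y unit's measure, and is therefore hard to interpret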
Predictive Performance Measures for Regression
Root mean square error (RMSE)
  $\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
• This measure has the same unit measure as y, so it is easier to interpret than MSE.
Performance Measures - Example
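• In our social network example, the performance is measured on two test instances:
  • $(x_1, y_1)$ = ((Omar, 91), 176) and $(x_2, y_2)$ = ((Patricia, 58), 168)
• The predicted values of the target attribute are $\hat{y}_1 = 183.618$ and $\hat{y}_2 = 163.455$
• The MAE:
  $\mathrm{MAE} = \frac{|176 - 183.618| + |168 - 163.455|}{2} = \frac{7.618 + 4.545}{2} \approx 6.08$ cm
• The MSE:
  $\mathrm{MSE} = \frac{(176 - 183.618)^2 + (168 - 163.455)^2}{2} = \frac{58.034 + 20.657}{2} \approx 39.35$ cm²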
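A minimal Python sketch of these two measures, applied to the two test instances above, is given below. The function names are illustrative; the definitions used are the standard ones (the mean of the absolute errors and the mean of the squared errors over the test set).

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all test instances."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2 over all test instances."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Test set: measured heights and model predictions for Omar and Patricia (cm).
y_true = [176.0, 168.0]
y_pred = [183.618, 163.455]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

print(f"MAE = {mae:.4f} cm")
print(f"MSE = {mse:.4f} cm^2")
```

Because MSE squares each error, Omar's larger error (about 7.6 cm) contributes far more to it than Patricia's (about 4.5 cm), which is what "emphasizes bigger errors" means in practice.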
Class Activity

• Calculate the RMSE


Reading
• Chapter 8 from the textbook “A General Introduction to Data
Analytics”
