Lecturer-Predictive Analytics Techniques and Regression Analysis

Uploaded by

hoangha43kd

Predictive analytics techniques and models

COMP 1810 –Data and Web Analytics.

Predictive analytics techniques and models 1


Predictive Analytics Techniques And Models
Predictive analytics is making predictions about future events using
historical data and statistical techniques. It includes, but is not limited
to, fields such as data mining and machine learning.

Data mining and machine learning are similar in many ways. Both use the
same algorithms. Data mining is discovering patterns and making predictions
in large collections of data, such as a data warehouse, while machine
learning uses the same type of algorithms and historical data to train a
computer to learn to make predictions.

Predictive analytics techniques follow the stages of CRISP-DM or the Data
Analytics Life Cycle.

https://www.voksedigital.com/data-analytics-life-cycle/
https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome

Machine learning, data mining, data modelling, inference, descriptive
statistics and many more are all parts of data analytics.

The two life cycles are the same (different ways of presenting the same
thing). The Access and Explore data stages are the same as data
understanding: data are collected, accessed and explored in order to
understand them.

Stage: Access and Data Exploration
Activities carried out:
•Get appropriate data from different sources and integrate them into one.
•Get appropriate data, even if in different formats, and integrate them into the required format (unified view).
•Identify types of errors.
•Identify the data types.
•Run descriptive statistics on the data.

Stage: Preprocessing
Activities carried out:
•Cleaning of the errors (if necessary).
•Feature selection (if necessary).
•Normalization (if necessary).
•Dimensionality reduction (if necessary).
•Data transformation (if necessary).
•Decision on the machine learning algorithm to use.
All activities in this stage are important but also optional. For example, you may have a dataset with no errors, in which case you do not need to clean it.

Stage: Modelling
Activities carried out:
•Split the data into train and test sets.
•Train the model.
•Score the model.
•Optimize the model (if necessary).
•Implement cross-validation (if necessary).

Stage: Evaluation
Activities carried out:
•Evaluate the model using an independent dataset.
•If the model did not perform well during evaluation, repeat the modelling.

Stage: Deployment
Activities carried out:
•Make decisions based on the results of the evaluation.
•Put the modelling information
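
The Modelling and Evaluation activities listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the workflow of any particular tool: the data, the `train_majority_model` placeholder and every function name here are assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for k roughly equal folds."""
    indices = list(range(n_rows))
    for fold in range(k):
        validation = indices[fold::k]
        validation_set = set(validation)
        train_part = [i for i in indices if i not in validation_set]
        yield train_part, validation


def train_majority_model(labels):
    """Placeholder 'model' (an assumption): remember the most frequent label."""
    return max(set(labels), key=labels.count)


def accuracy(model_label, labels):
    """Fraction of labels that equal the single label the model predicts."""
    return sum(1 for y in labels if y == model_label) / len(labels)


# Hypothetical labelled rows: (feature, label), with one positive in four.
rows = [(i, 1 if i % 4 == 0 else 0) for i in range(40)]

train, test = train_test_split(rows)                       # split
model = train_majority_model([label for _, label in train])  # train
test_accuracy = accuracy(model, [label for _, label in test])  # score
folds = list(k_fold_indices(len(train), k=5))               # cross-validation folds
```

A real project would replace the placeholder model with the algorithm chosen during preprocessing and would train and score it once per fold.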

Supervised Learning
This is the group of Machine Learning (ML) techniques where what is being
predicted is known, for example:

I. Predicting the outcome of an election (win or lose). (Classification)
II. Predicting if a patient has cancer or not. (Classification)
III. Predicting if a patient has diabetes or not. (Classification)
IV. Predicting if a customer is going to churn or stay. (Classification)
V. Predicting the prices of houses in London. (Regression)
VI. Predicting the prices of cars. (Regression)
VII. Predicting the amount of rainfall. (Regression)

Classification
This is a type of Machine Learning (ML) where what is being predicted (the
output class label or target) is known and is a category or group. For
example, below is a cross-section of the Wisconsin Breast Cancer dataset.

•It has nine attributes and one output class.
•The algorithm will teach the computer system to learn to predict who will have cancer
(predict 1) or not (predict 0).
•The emphasis here is that we know what is being predicted: predict 1 or 0.
Another example of binary classification, using customer data:

In the customer data, this is also classification because the prediction
puts the customers into groups. When there are two groups it is called
Binary Classification, while more than two groups is called Multi-class
Classification.

Metrics for Measuring the Performance of Classification
Imbalanced data:
One of the pre-existing conditions of most real-life datasets is class
imbalance. This is a situation where the groups in the dataset are not equally
distributed: one group is much larger in number than the others. In a binary
(two groups) or multi-class (more than two groups) dataset, the larger group
is called the majority class, while the smaller groups are called the minority
classes. The ratio of the majority to the minority class is called the
imbalance ratio (IR).
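
As a small illustration of the imbalance ratio, the sketch below counts the labels and divides the majority count by the minority count. The labels are hypothetical, and for a multi-class dataset the choice of the smallest class as "the minority" is an assumption of this example.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by the smallest class count."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())  # assumption: smallest class is the minority
    return majority / minority


# Hypothetical dataset: 900 negative cases and 100 positive cases.
labels = [0] * 900 + [1] * 100
ir = imbalance_ratio(labels)  # 900 / 100
```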

Due to the imbalance, most classification algorithms tend to be biased
towards predicting the majority class(es) instead of the minority classes.
A model could have an accuracy as high as 98% without adequately
predicting the minority classes.

Yet in most predictive modelling the minority classes are usually the
reason for the prediction. For example, in a cancer dataset the minority
are the few patients who have cancer, compared with the many patients
without cancer, who are the majority. Likewise, in an intrusion detection
dataset it is the minority that is being sought.

An analogy is to consider a study of a population of 1000 patients. Assuming
that 900 of the 1000 patients have no disease, a model that predicts all 1000
as not having the disease would still appear to be 90% accurate, even though
the remaining 100 patients have the disease and were not identified.
Accuracy has therefore failed, or rather is not enough, to estimate the
performance of classification models, owing to the imbalanced nature of
datasets.
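
The 1000-patient analogy can be checked numerically. The sketch below builds the labels described above (900 healthy, 100 diseased), applies a model that always predicts "no disease", and compares accuracy with the fraction of diseased patients actually found. The variable names are illustrative only.

```python
# 0 = no disease, 1 = disease, as in the analogy above.
actual = [0] * 900 + [1] * 100
predicted = [0] * 1000  # the model predicts "no disease" for everyone

# Accuracy: share of all patients whose prediction matches the truth.
accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

# Share of the diseased patients that the model identified.
diseased_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
diseased_rate = diseased_found / actual.count(1)
```

Here `accuracy` is 0.9 while `diseased_rate` is 0.0, which is why accuracy alone is misleading on imbalanced data.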

Research into techniques for handling imbalanced datasets, or reducing the
negative effects of the imbalance, is an active area. Please see Synthetic
Minority Over-sampling Technique (SMOTE) and Variance Ranking Attributes
Selection Techniques for Binary Classification Problem in Imbalance Data.
If accuracy has failed, how then do we measure the performance of
classification modelling?

The Confusion Matrix.
A confusion matrix, also known as an error matrix, is a table
used to visualize classification performance.

• True positives (TP): the algorithm predicted positive, and the
correct answer is positive (correctly predicted);
• True negatives (TN): the algorithm predicted negative, and the
correct answer is negative (correctly predicted);
• False positives (FP): the algorithm predicted positive, but
the correct answer is negative (incorrectly predicted); and
• False negatives (FN): the algorithm predicted negative, but
the correct answer is positive (incorrectly predicted).
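
The four cells defined above can be counted directly from a list of actual and predicted labels. The following is a minimal sketch with hypothetical labels. The derived rates use the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, answer positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, answer negative
        elif p == positive:
            fp += 1  # predicted positive, answer negative
        else:
            fn += 1  # predicted negative, answer positive
    return tp, tn, fp, fn


# Hypothetical labels for ten cases.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)  # TP rate
specificity = tn / (tn + fp)  # TN rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
```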

The Receiver Operating Characteristics and Area Under the Curve.
The Receiver Operating Characteristics (ROC) graph shows the trade-off between
Sensitivity and Specificity. The TP rate (Sensitivity) on the y-axis is plotted
against the FP rate (1 - Specificity) on the x-axis. The graph gives the
corresponding value for any change in either rate, so it is possible to read off
the TP rate and FP rate for any type of classifier, both binary and multi-class.
In the figure below, the scale of the graph is from 0.00 to 1.00 on both axes.

The graph has four curves: yellow, green, red and blue. The TP rate is
plotted on the y-axis and its highest value is 1.00; therefore the yellow
curve, which reaches the highest TP rate at the position (0.00, 1.00),
is a perfect classifier (the most accurate), followed by green and red
in that order. The blue curve (a straight line) is the result of random
guess classification. The closer a curve gets to the position
(0.00, 1.00), the better the classification.

AUC = A + B = area of the shaded portions
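
To make the ROC curve and the AUC concrete, the sketch below builds the curve by sorting cases from the highest to the lowest predicted score and recording the (FP rate, TP rate) point after each case, then sums the area under the points with the trapezoid rule. The scores and labels are hypothetical, and the code assumes 0/1 labels and distinct scores (tied scores would need to be grouped).

```python
def roc_points(actual, scores):
    """Return (FP rate, TP rate) points obtained by lowering the threshold."""
    positives = sum(actual)            # labels are assumed to be 0 or 1
    negatives = len(actual) - positives
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores (higher means "more likely positive").
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(actual, scores)
auc = area_under_curve(points)  # 8/9 for this data

# A perfect classifier ranks every positive above every negative.
perfect_auc = area_under_curve(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

An AUC of 1.0 corresponds to the curve passing through (0.00, 1.00), and an AUC near 0.5 corresponds to the random-guess diagonal.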


Logistic Regression Classification

The screenshot is a typical output of a Logistic Regression
classification model in Weka.

Regression
Here, what is being predicted is known, just as in classification. The
difference is that it is not a group but a continuous number. In the
example below, the algorithm will teach the computer system to learn to
predict the prices of cars based on the seven attributes.

The prediction is a continuous value: it has a range from minimum to
maximum and cannot be grouped into categories, hence it is called
supervised learning Regression.
Regression is used to predict an output (Y), called the dependent
variable, using inputs (X), called independent variables.
Example: we want to predict the Sales of ice cream based on
the Temperature of the day.
Sales (Y) is the output >>>>> dependent variable.
Temperature (X) is the input >>>>> independent variable.
Another example: Sales (Y) of tickets based on TV advertisement.
Sales (Y) is the output >>>>> dependent variable.
TV advertisement (X) is the input >>>>> independent variable.

Univariate and Multivariate Linear Regression
The TV advertisement example is called Univariate Linear Regression because
a single input variable, TV advertisement (x), is used to predict the
number of Sales (Y).
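
A univariate linear regression can be fitted with the ordinary least squares formulas: slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below applies them to invented TV advertisement and Sales figures. The numbers and names are assumptions for illustration, not data from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: TV advertisement spend (x) and Sales (Y).
tv_advertisement = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

intercept, slope = fit_line(tv_advertisement, sales)
predicted_sales = intercept + slope * 60  # prediction for a new spend of 60
```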

Multivariate Linear Regression is when more than one input
(x1, x2, …, xn) is used to predict an output (Y).
•For example:
•Predict the Sales (Y) based on three inputs: TV advertisement (x1),
Radio advertisement (x2) and Newspaper advertisement (x3).
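
With several inputs, the least squares coefficients can be found by solving the normal equations (X^T X) b = X^T y. The sketch below is one possible way to do this in plain Python, under stated assumptions: the advertising figures and the "true" coefficients used to generate Sales are invented for the example, and the helper names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_multivariate(rows, ys):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical spend on TV (x1), Radio (x2) and Newspaper (x3).
adverts = [(10, 5, 2), (20, 3, 8), (30, 9, 4), (40, 1, 6), (50, 7, 1), (60, 2, 9)]
# Sales generated from assumed coefficients so the fit can be checked.
true_coefficients = [3.0, 0.05, 0.2, 0.01]
sales = [
    true_coefficients[0]
    + true_coefficients[1] * tv
    + true_coefficients[2] * radio
    + true_coefficients[3] * news
    for tv, radio, news in adverts
]

coefficients = fit_multivariate(adverts, sales)
```

Because the Sales values were generated without noise, the fitted coefficients should match the assumed ones up to rounding error. Real data would not fit exactly.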

Some sections of the car dataset used to predict the prices of cars:

Predicting the prices of different cars based on their properties.

Metrics for Measuring the Performance of Regression
Evaluating the performance of a regression model is totally different
from evaluating classification. In regression, every evaluation metric
tries to quantify how close the predicted points are to the actual points.

1. R-squared (R2) is the proportion of variation in the output explained by the input variables. The higher the
R-squared, the better the model.
2. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) measure the mean error made when predicting
the output. Mathematically, MSE = mean((observeds - predicteds)^2) and RMSE = sqrt(MSE). The lower the
RMSE, the better the model.
3. Residual Standard Error (RSE), also called the model sigma, is a variant of the RMSE adjusted for the number
of inputs in the model. The lower the RSE, the better the model.
4. Mean Absolute Error (MAE), like the RMSE, measures the prediction error. Mathematically, it is the
average absolute difference between actual and predicted outputs, MAE = mean(abs(observeds - predicteds)).
MAE is less sensitive to outliers than RMSE.
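
The four metrics above can be computed from a list of observed and predicted values. The sketch below follows the formulas in the list. For R-squared it uses 1 - SSE/SST, and for the RSE it assumes the common definition sqrt(SSE / (n - p - 1)), where p is the number of inputs. The data values are hypothetical.

```python
import math


def regression_metrics(observed, predicted, n_inputs):
    """Return R-squared, MSE, RMSE, RSE and MAE for a fitted regression."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)               # sum of squared errors
    mean_observed = sum(observed) / n
    sst = sum((o - mean_observed) ** 2 for o in observed)  # total variation
    mse = sse / n
    return {
        "r2": 1 - sse / sst,
        "mse": mse,
        "rmse": math.sqrt(mse),
        # Assumed definition: divide by the residual degrees of freedom.
        "rse": math.sqrt(sse / (n - n_inputs - 1)),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical observed outputs and model predictions (one input variable).
observed = [10, 12, 15, 19, 24]
predicted = [11, 12, 14, 20, 23]

metrics = regression_metrics(observed, predicted, n_inputs=1)
```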

Unsupervised Learning
The output, or what is being predicted, is not known. It is a type of pattern discovery.
•The algorithm runs through the data and puts the data points that share similar characteristics
into the same cluster, hence the name Clustering or Association.
•It is then left to the ML expert to discover the pattern or similar characteristics that the
members of each cluster or association have in common.
•Why were they grouped together?
•Unsupervised Learning does not require a label or output target.
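
As an illustration of grouping unlabelled data points, the sketch below uses k-means, one common clustering algorithm. The slides do not name a specific algorithm, so this choice is an assumption. It works on one-dimensional hypothetical values with fixed starting centroids so the result is repeatable.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid
    to the mean of its group; repeat for a fixed number of iterations."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled values (no output target is supplied).
values = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(values, centroids=[1.0, 12.0])
```

The algorithm only returns the groups. Explaining why the members of each group belong together is still left to the analyst.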

End:
