11 - Model Eval and Tuning
Cross-industry Standard Process for Data Mining (CRISP-DM)
R-Squared
◦ Sometimes called “coefficient of determination”
◦ Interpreted as the percent of variance explained by the model, compared to the default (mean-only) model
Adjusted R-Squared
◦ Same interpretation as R²
◦ Includes a penalty for adding predictors:
Adj R² = 1 − [ SSE_reg / (n − K) ] / [ SSE_simple / (n − 1) ]
where n is the number of observations, K is the number of model parameters, and SSE_simple is the SSE of the default (mean-only) model
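A minimal sketch of both computations, assuming NumPy arrays y (observed) and y_pred (predicted) and a parameter count K; all names here are placeholders:

```python
import numpy as np

def r_squared(y, y_pred):
    sse_reg = np.sum((y - y_pred) ** 2)         # SSE of the fitted model
    sse_simple = np.sum((y - np.mean(y)) ** 2)  # SSE of the default (mean-only) model
    return 1 - sse_reg / sse_simple

def adj_r_squared(y, y_pred, K):
    n = len(y)
    sse_reg = np.sum((y - y_pred) ** 2)
    sse_simple = np.sum((y - np.mean(y)) ** 2)
    # Penalty for adding predictors: each SSE is divided by its degrees of freedom
    return 1 - (sse_reg / (n - K)) / (sse_simple / (n - 1))
```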
Evaluating Residuals
Normal distribution of residuals
◦ Jarque-Bera Test
◦ Evaluates skewness and kurtosis
◦ Shapiro-Wilk Test
◦ Evaluates whether the sample comes from a normally distributed population
◦ Graphical Tests
◦ Quantile-Quantile (QQ) plot
◦ Prediction versus Observed Plots
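A short sketch of these checks with SciPy; the residuals here are generated synthetically as a stand-in for a fitted model's residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.random.default_rng(0).normal(size=200)  # stand-in for model residuals

jb_stat, jb_p = stats.jarque_bera(residuals)  # jointly tests skewness and kurtosis
sw_stat, sw_p = stats.shapiro(residuals)      # tests against a normal population
print(f"Jarque-Bera p={jb_p:.3f}, Shapiro-Wilk p={sw_p:.3f}")

# Graphical test: QQ plot of residuals against the normal distribution
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```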
Homoscedasticity
◦ Bartlett’s Test
◦ Rule of thumb: if the ratio of the largest variance to the smallest variance is 1.5 or below, the data can be treated as homoscedastic
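A brief sketch of Bartlett's test alongside the variance-ratio rule of thumb, assuming SciPy; the groups are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = [rng.normal(0, s, 100) for s in (1.0, 1.1, 1.2)]  # illustrative groups

stat, p = stats.bartlett(*groups)        # H0: all group variances are equal
variances = [np.var(g, ddof=1) for g in groups]
ratio = max(variances) / min(variances)  # rule of thumb: <= 1.5 is acceptable
print(f"Bartlett p={p:.3f}, variance ratio={ratio:.2f}")
```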
Classifier Evaluation Measures
Confusion matrix
                Classified or Predicted
                   a       b
Actual     a      aa      ab
           b      ba      bb

D = aa + bb + ab + ba
Actual a = (aa + ab), also Actual non-b
Actual b = (ba + bb), also Actual non-a
Classified a = (aa + ba), Classified b = (ab + bb)
Evaluation Metrics
Accuracy is the overall correctness of the model and is calculated as the sum of correct
classifications divided by the total number of classifications.
Accuracy = (aa+bb) / D
True Positive Rate (a) = aa / Actual a
True Positive Rate (b) = bb / Actual b
False Positive Rate (a) = ba / Actual b
False Positive Rate (b) = ab / Actual a
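A minimal sketch computing these rates directly from the confusion-matrix cells; the counts below are made up for illustration:

```python
# Cell notation from the confusion matrix above (rows = actual, columns = classified)
aa, ab, ba, bb = 40, 10, 5, 45  # illustrative counts

D = aa + ab + ba + bb
accuracy = (aa + bb) / D

tpr_a = aa / (aa + ab)  # True Positive Rate (a)  = aa / Actual a
tpr_b = bb / (ba + bb)  # True Positive Rate (b)  = bb / Actual b
fpr_a = ba / (ba + bb)  # False Positive Rate (a) = ba / Actual b
fpr_b = ab / (aa + ab)  # False Positive Rate (b) = ab / Actual a
print(accuracy, tpr_a, tpr_b, fpr_a, fpr_b)
```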
Precision, Recall, and F-Measure
Precision
◦ Measure of accuracy for a specific class.
◦ Precision (a) = aa / Classified a
◦ Precision (b) = bb / Classified b
Recall is a measure of the ability of a classification model to select instances of a certain class from a data set. It is commonly also called sensitivity.
◦ Equivalent to TP rate.
◦ Recall (a) = aa / Actual a
◦ Recall (b) = bb / Actual b
F-Measure
◦ The F-measure is the harmonic mean of precision and recall.
◦ It can be used as a single measure of performance of the test.
◦ F = ( 2 x Precision x Recall ) / ( Precision + Recall )
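A brief sketch of the same measures via scikit-learn, assuming a binary problem; the y_true and y_pred labels are made up:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ["a", "a", "b", "b", "a", "b"]
y_pred = ["a", "b", "b", "b", "a", "a"]

# pos_label selects which class counts as "positive"
prec_a = precision_score(y_true, y_pred, pos_label="a")  # aa / Classified a
rec_a = recall_score(y_true, y_pred, pos_label="a")      # aa / Actual a
f_a = f1_score(y_true, y_pred, pos_label="a")            # harmonic mean of the two
print(prec_a, rec_a, f_a)
```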
ROC Curves
Used for binary classifiers
Originated with radar analysis of signal vs noise
Plot of True Positive Rate (Recall) against False Positive Rate
The larger the Area Under the Curve (AUC), the better the classifier's discrimination
Can monitor the relationship through the full ranges of positives and negatives
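A short sketch of an ROC/AUC computation, assuming scikit-learn, a synthetic binary dataset, and a classifier that exposes predict_proba; all names and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Predicted probabilities for the positive class drive the curve
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)  # TPR vs FPR across thresholds
print("AUC:", roc_auc_score(y_te, probs))      # larger AUC = better discrimination
```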
K-Fold Cross Validation
Useful for smaller data sets
Alternative to train/test splitting
◦ Every piece of data is used for both training and testing
◦ If k = number of folds, then:
◦ Each data point will be used for training k − 1 times
◦ Each data point will be used for testing 1 time
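A minimal sketch of k-fold cross validation with scikit-learn; the dataset is synthetic and the scoring choice is illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=0)

# With k=5 folds, each point is used for training 4 times and testing once
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```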
Optimization – Hyperparameter Tuning
A hyperparameter is defined as
◦ A parameter that impacts model performance but …
◦ Is NOT learned from the data
Examples include:
◦ Number of neighbors in a KNN model (MAE, RMSE)
◦ Number of clusters in a clustering model (Silhouette Score)
◦ Number of levels in a decision tree (Precision / Recall)
◦ Number of epochs in a neural network model (Train / Validation performance by epoch)
Tuning procedure
◦ Decide on the metric you will use to evaluate model performance
◦ Examples include MAE, RMSE, silhouette score, and precision/recall (as listed above)
◦ Repeat the training/testing process for many hyperparameter values
Example – KNN Tuning with MAE
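A minimal sketch of this tuning loop, assuming scikit-learn, a synthetic regression dataset, and a simple grid of k values; all names and values are placeholders:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Repeat the train/test process across candidate n_neighbors values,
# tracking the chosen evaluation metric (MAE) for each
results = {}
for k in range(1, 21):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr)
    results[k] = mean_absolute_error(y_te, model.predict(X_te))

best_k = min(results, key=results.get)  # k with the lowest MAE
print(best_k, results[best_k])
```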
Techniques for Improving Performance
Transforming target variables
◦ When targets are highly skewed it can cause issues with model performance
◦ Can make the model difficult to interpret
◦ Example: Regression predicting diamond prices
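A minimal sketch of the transform-train-invert pattern, assuming a log transform and synthetic, price-like data (not the actual diamond dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.default_rng(0).normal(size=(200, 3))
price = np.exp(X @ np.array([0.5, 0.3, 0.2]) + 8)  # skewed, price-like target

model = LinearRegression().fit(X, np.log(price))  # train on the log of the target
pred_price = np.exp(model.predict(X))             # invert back to original units
print(pred_price[:3])
```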
The group with the highest presentation score (Items 2 and 3 above) will be given the choice to waive the final exam, which means that you can take the grade you have without taking the final.
Final Exam
I will present you with two short scenarios with a data set for each scenario.
Each scenario will require data analysis to address the scenario requirements. Therefore you must:
◦ Select the appropriate analytical technique based on
◦ Scenario objective
◦ Structure of the data set, especially the target variable.
Submission will be a Word Document or PDF illustrating your decisions, analysis, processes, etc.
The final will be released after the presentations on Dec 2 and will be due at 10 PM on Dec 9.
This is a take-home exam, so be aware that the following conditions will be strictly enforced:
◦ Work alone. Anyone reported or observed to be working with others will have their scores adjusted downward. This will be strictly enforced for the final. No exceptions.
◦ Please do not ask other students in the class for help. That is unfair to them because they will have to say no or could have their own grade impacted.
◦ No late exams will be accepted. If your exam is not submitted on time, your score will be 0.
The team that wins the presentation judging will have the option to waive the exam. This does not mean that you will get 100% on the exam; it means that you can accept your grade before the exam as your final grade.