
Model Evaluation and Tuning
Cross-industry Standard Process for Data Mining (CRISP-DM)

[Figure: CRISP-DM process diagram, by Kenneth Jensen]


Evaluating Regressions
Mean Squared Error (MSE)
◦ Average squared residuals
◦ Pros
◦ Easy to calculate
◦ Widely used
◦ Cons
◦ Can be difficult to interpret

Root Mean Squared Error (RMSE)
◦ Square root of MSE
◦ Puts MSE back into original units
◦ How far, on average, the predictions are from actuals
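A minimal sketch of computing these two metrics with NumPy and scikit-learn; the y_true and y_pred arrays are hypothetical placeholders, not data from the slides:

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actuals and predictions, for illustration only
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # average of the squared residuals
rmse = np.sqrt(mse)                        # back in the units of the target
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")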
Evaluating Regressions (cont.)
Mean Absolute Error (MAE)
◦ The average of the absolute values of deviations
◦ Easy to interpret
◦ Easy to calculate

R-Squared
◦ Sometimes called “coefficient of determination”
◦ Interpreted as
◦ Percent of variance explained by the model, as compared to the default (mean-only) model

Adjusted R-Squared
◦ Same interpretation as R²
◦ Includes a penalty for adding predictors
◦ Adj R² = 1 − [SSE_reg / (n − K)] / [SSE_simple / (n − 1)], where K is the number of model parameters and SSE_simple is the SSE of the default (mean-only) model
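A sketch of MAE, R², and adjusted R² that mirrors the formula above; the arrays are synthetic and K (the number of model parameters) is an assumed value:

import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0, 6.0, 4.5])
y_pred = np.array([2.5, 5.0, 4.0, 8.0, 5.5, 4.0])

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# Adjusted R^2 per the formula above: n observations, K model parameters (assumed)
n, K = len(y_true), 2
sse_reg = np.sum((y_true - y_pred) ** 2)
sse_simple = np.sum((y_true - y_true.mean()) ** 2)   # SSE of the default (mean-only) model
adj_r2 = 1 - (sse_reg / (n - K)) / (sse_simple / (n - 1))
print(f"MAE = {mae:.3f}, R^2 = {r2:.3f}, Adj R^2 = {adj_r2:.3f}")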
Evaluating Residuals
Normal distribution of residuals
◦ Jarque-Bera Test
◦ Evaluates skewness and kurtosis
◦ Shapiro-Wilk Test
◦ Evaluates whether the sample comes from a normally distributed population
◦ Graphical Tests
◦ Quantile-Quantile (QQ) plot
◦ Prediction versus Observed Plots

Homoscedasticity
◦ Bartlett’s Test
◦ Rule of thumb: if the ratio of the largest variance to the smallest variance is 1.5 or below, the variances can be treated as approximately equal
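A sketch of these residual checks using SciPy and statsmodels; the residuals array is synthetic and would normally be y_true - y_pred from a fitted regression:

import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
residuals = rng.normal(0, 1, 200)   # placeholder residuals

jb_stat, jb_p = stats.jarque_bera(residuals)   # tests skewness and kurtosis
sw_stat, sw_p = stats.shapiro(residuals)       # tests normality of the sample
print(f"Jarque-Bera p = {jb_p:.3f}, Shapiro-Wilk p = {sw_p:.3f}")

sm.qqplot(residuals, line="45")                # graphical QQ check
plt.show()

# Bartlett's test compares variances across groups, e.g.:
# stats.bartlett(residuals[:100], residuals[100:])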
Classifier Evaluation Measures
Confusion matrix
                    Classified or Predicted
                        a        b
Actual       a         aa       ab
             b         ba       bb

D = aa + bb + ab + ba (total number of classifications)
Actual a = (aa + ab), also Actual non-b
Actual b = (ba + bb), also Actual non-a
Classified a = (aa + ba), Classified b = (ab + bb)
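A minimal sketch of building this 2x2 matrix with scikit-learn; the label lists are hypothetical:

from sklearn.metrics import confusion_matrix

actual    = ["a", "a", "b", "b", "a", "b", "b", "a"]
predicted = ["a", "b", "b", "b", "a", "a", "b", "a"]

# Rows are actual classes, columns are classified/predicted classes, matching the layout above
cm = confusion_matrix(actual, predicted, labels=["a", "b"])
aa, ab = cm[0]
ba, bb = cm[1]
print(cm)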

Evaluation Metrics
Accuracy is the overall correctness of the model and is calculated as the sum of correct
classifications divided by the total number of classifications.
Accuracy = (aa+bb) / D
True Positive Rate (a) = aa / Actual a
True Positive Rate (b) = bb / Actual b
False Positive Rate (a) = ba / Actual b
False Positive Rate (b) = ab / Actual a
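A sketch of these metrics computed directly from the cell counts; the counts themselves are made up for illustration:

# Hypothetical cell counts from a 2x2 confusion matrix
aa, ab, ba, bb = 40, 10, 5, 45

D = aa + bb + ab + ba
accuracy = (aa + bb) / D
tpr_a = aa / (aa + ab)   # True Positive Rate (a) = aa / Actual a
tpr_b = bb / (ba + bb)   # True Positive Rate (b) = bb / Actual b
fpr_a = ba / (ba + bb)   # False Positive Rate (a) = ba / Actual b
fpr_b = ab / (aa + ab)   # False Positive Rate (b) = ab / Actual a
print(f"Accuracy = {accuracy:.2f}")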

Precision, Recall, and F-Measure
Precision
◦ Measure of accuracy for a specific class.
◦ Precision (a) = aa / Classified a
◦ Precision (b) = bb / Classified b

Recall is a measure of the ability of a classification model to select instances of a certain class from a
data set. It is commonly also called sensitivity.
◦ Equivalent to TP rate.
◦ Recall (a) = aa / Actual a
◦ Recall (b) = bb / Actual b

F-Measure
◦ The F-measure is the harmonic mean of precision and recall.
◦ It can be used as a single measure of performance of the test.
◦ F = ( 2 x Precision x Recall ) / ( Precision + Recall )
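A sketch of per-class precision, recall, and F-measure with scikit-learn; the label lists are the same hypothetical example used for the confusion matrix above:

from sklearn.metrics import precision_recall_fscore_support

actual    = ["a", "a", "b", "b", "a", "b", "b", "a"]
predicted = ["a", "b", "b", "b", "a", "a", "b", "a"]

prec, rec, f1, _ = precision_recall_fscore_support(actual, predicted, labels=["a", "b"])
# Index 0 corresponds to class "a", index 1 to class "b"
print(f"Precision(a) = {prec[0]:.2f}, Recall(a) = {rec[0]:.2f}, F(a) = {f1[0]:.2f}")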

ROC Curves
Used for binary classifiers
Originated with radar analysis of signal vs noise
Plot of the True Positive Rate (Recall) against the False Positive Rate
The larger the Area Under the Curve (AUC), the better the classifier discriminates between the classes
Lets you monitor the relationship between true positives and false positives across the full range of classification thresholds
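A sketch of an ROC curve and its AUC with scikit-learn; the labels and scores are hypothetical model outputs:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3])   # predicted scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")   # chance (no-discrimination) line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()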
K-Fold Cross Validation
Useful for smaller data sets
Alternative to a single train–test split
◦ Every piece of data is used for both training and testing
◦ If k = number of folds then
◦ Each data point will be used for training k-1 times
◦ Each data point will be used for testing 1 time
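A sketch of 5-fold cross-validation with scikit-learn; the synthetic data, the linear model, and the MAE scoring choice are all illustrative assumptions:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=0)

# k = 5: each point is used for training 4 times and for testing once
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
print("Mean MAE:", -scores.mean())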
Optimization – Hyperparameter Tuning
A hyperparameter is defined as
◦ A parameter that impacts model performance but …
◦ Is NOT learned from the data

Examples include:
◦ Number of neighbors in a KNN model (MAE, RMSE)
◦ Number of clusters in a clustering model (Silhouette Score)
◦ Number of levels in a decision tree (Precision / Recall)
◦ Number of epochs in a neural network model (Train / Validation performance by epoch)

Tuning procedure
◦ Decide on the metric you will use to evaluate model performance
◦ Examples include the metrics noted above (e.g., MAE, RMSE, silhouette score, precision/recall)
◦ Repeat the training / testing process for many hyperparameter values
Example – KNN Tuning with MAE
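The slide's chart is not reproduced here, but a minimal sketch of the tuning loop, assuming a synthetic data set and k values from 1 to 20, might look like this:

from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=15, random_state=0)

# Repeat the training/testing process for many values of the hyperparameter k,
# scoring each with MAE; the k with the lowest mean MAE would be chosen
for k in range(1, 21):
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                             cv=5, scoring="neg_mean_absolute_error")
    print(f"k = {k:2d}  MAE = {-scores.mean():.2f}")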
Techniques for Improving Performance
Transforming target variables
◦ Highly skewed targets can cause issues with model performance
◦ Can make the model difficult to interpret
◦ Example: Regression predicting diamond prices

** Note: You can also do this with predictors
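A sketch of transforming a skewed target, in the spirit of the diamond-price example; the data is synthetic, and np.log1p / np.expm1 are one common transform pair, not necessarily the one used in the slides:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0.2, 3.0, size=(500, 1))                       # e.g., carat weight
price = np.exp(7 + 1.5 * X[:, 0] + rng.normal(0, 0.2, 500))    # right-skewed target

model = LinearRegression().fit(X, np.log1p(price))   # fit on the transformed target
pred_price = np.expm1(model.predict(X))              # invert back to original units

Because the model is fit on the log scale, its coefficients are no longer in price units, which is the interpretability trade-off noted above.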


Look for segments of better prediction
Evaluate residuals over the entire range of target values
It might show that prediction is better in a specific range
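One way to sketch this check is to bin the data by target value and compare the average error per bin; the data and bin edges are illustrative:

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
y_true = rng.uniform(0, 100, 1000)                       # hypothetical actuals
y_pred = y_true + rng.normal(0, 1 + y_true / 25, 1000)   # error grows with the target

df = pd.DataFrame({"y": y_true, "abs_err": np.abs(y_true - y_pred)})
df["range"] = pd.cut(df["y"], bins=[0, 25, 50, 75, 100])
print(df.groupby("range", observed=True)["abs_err"].mean())   # MAE per target range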
Questions?
Presentations
The following team members from RevUnit will be here to judge your presentations:
◦ Tim Lee: Product Owner
◦ Anh Ta: VP of Design
◦ Renee Lu: Machine Learning Engineer (UNLV Grad)

Group projects will be graded using the following rubric:


1. 10%: Evaluation of other group members (360 Eval)
2. 30%: Quality of presentation / audience engagement
3. 30%: Appropriateness of analytical processes and relevance of results
4. 30%: Executive Summary – completeness, grammar, design, etc.

The group with the highest presentation score (Items 2 and 3 above) will be given the choice to waive the final exam, meaning you can keep the grade you have without taking the final.
Final Exam
I will present you with two short scenarios, each with its own data set.
Each scenario will require data analysis to address its requirements. Therefore, you must:
◦ Select the appropriate analytical technique based on
◦ Scenario objective
◦ Structure of the data set, especially the target variable.

Submission will be a Word Document or PDF illustrating your decisions, analysis, processes, etc.
The final will be released after the presentations on Dec 2 and will be due at 10 PM on Dec 9.
This is a take-home exam, so be aware that the following conditions will be strictly enforced:
◦ Work alone. Anyone reported or observed to be working with others will have their scores adjusted downward. This will be strictly
enforced for the final. No exceptions.
◦ Please do not ask other students in the class for help. That is unfair to them because they will have to say no or could have their own
grade impacted.
◦ No late exams will be accepted. If your exam is not submitted on time, your score will be 0.

The team that wins the presentation judging will have the option to waive the exam. This does not mean that you will get 100% on the exam; it means that you can accept your grade before the exam as your final grade.
