Lecture 5b - Model Performance Analytics
• $f(x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$
• $f(x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5$
• $f(x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_1^2$
• $f(x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_1^2 + w_7\, x_2 / x_3$
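These bullets list models of increasing flexibility: more raw features, then a squared term, then a ratio feature. A minimal sketch of fitting such progressively richer models with scikit-learn; the data and coefficients here are hypothetical, not the lecture's example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(200, 5))          # hypothetical features x1..x5 (kept > 0 so x2/x3 is safe)
y = 1.0 + X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

m1 = LinearRegression().fit(X[:, :3], y)          # three raw features
m2 = LinearRegression().fit(X, y)                 # all five raw features
X3 = np.column_stack([X, X[:, 0] ** 2])           # add engineered term x1^2
m3 = LinearRegression().fit(X3, y)
X4 = np.column_stack([X3, X[:, 1] / X[:, 2]])     # add engineered term x2 / x3
m4 = LinearRegression().fit(X4, y)

for name, m, Xi in [("3 feats", m1, X[:, :3]), ("5 feats", m2, X),
                    ("+ x1^2", m3, X3), ("+ x2/x3", m4, X4)]:
    print(name, round(m.score(Xi, y), 3))         # training R^2 rises as the model gets more flexible
```

Training fit improves monotonically with flexibility, which is exactly why the next slides turn to holdout evaluation.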
Example: Classifying Flowers
Need for Holdout Evaluation
[Figure: fitted models illustrating over-fitting vs. a good fit]
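Holdout evaluation makes over-fitting visible: training accuracy keeps improving with model flexibility while accuracy on held-out data stalls or drops. A minimal sketch, assuming the classic Iris data as the flower example and an illustrative range of tree depths:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 2, 5, None):   # None = grow until leaves are pure (prone to over-fitting)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_te, y_te), 3))
```

The gap between the training column and the holdout column is what a holdout set reveals and what training accuracy alone hides.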
• Tree Induction:
• Post-pruning
• takes a fully-grown decision tree and discards unreliable parts
• Pre-pruning
• stops growing a branch when information becomes unreliable
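Both strategies can be sketched with scikit-learn's decision trees: stopping criteria such as max_depth and min_samples_leaf give pre-pruning, while minimal cost-complexity pruning via ccp_alpha gives post-pruning. The parameter values below are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growing a branch early via stopping criteria
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_tr, y_tr)

# Post-pruning: grow the full tree, then prune back weak subtrees
# (larger ccp_alpha = more aggressive pruning)
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
post.fit(X_tr, y_tr)

print("pre-pruned test accuracy: ", round(pre.score(X_te, y_te), 3))
print("post-pruned test accuracy:", round(post.score(X_te, y_te), 3))
```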
• Linear Models:
• Feature Selection
• Regularization
• Optimize some combination of fit and simplicity
Regularization
• $\arg\max_{\mathbf{w}} \big[\, \mathrm{fit}(\mathbf{x}, \mathbf{w}) - \lambda \cdot \mathrm{penalty}(\mathbf{w}) \,\big]$
• “L2-norm”
• The sum of the squares of the weights
• L2-norm + standard least-squares linear regression = ridge regression
• “L1-norm”
• The sum of the absolute values of the weights
• L1-norm + standard least-squares linear regression = lasso
• Automatic feature selection: the L1 penalty drives some weights exactly to zero, dropping those features from the model
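A minimal sketch contrasting the two penalties with scikit-learn's Ridge and Lasso, whose alpha plays the role of λ above; the data and alpha values are illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Hypothetical data: 20 features, only 5 of which actually matter
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights toward zero
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: sets many weights exactly to zero

print("ridge zero weights:", np.sum(ridge.coef_ == 0))  # typically none
print("lasso zero weights:", np.sum(lasso.coef_ == 0))  # typically most of the 15 noise features
```

Counting the zeroed coefficients shows the lasso performing automatic feature selection, while the ridge merely shrinks weights.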
Nested Cross-Validation
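Nested cross-validation tunes hyperparameters (such as λ) in an inner loop while an outer loop estimates how well the entire tune-and-fit procedure generalizes, so the tuning never sees the outer test folds. A minimal sketch with scikit-learn; the parameter grid and fold counts are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_iris(return_X_y=True)

# Inner loop: pick the regularization strength C by 3-fold grid search
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={"C": [0.01, 0.1, 1, 10]}, cv=3)

# Outer loop: 5-fold CV around the *entire* tuning procedure
scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```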
Some Possible Remedies*
• Use simpler (less flexible) models.
• Use fewer features in final model (feature selection).
• Enrich data with influential predictors.
• Enrich (training) data with more observations.
• Tip: Also check that the training and validation sets are compatible, i.e., drawn from the same distribution.
Q&A