l05 Machine Learning
(regularized) Empirical Risk Minimization
By default, both LogisticRegression and LinearSVC apply L2 regularization.
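Generically, both minimize a regularized empirical risk of the following form (a standard formulation; ℓ is the surrogate loss and α the regularization strength):

```latex
\min_{w}\; \sum_{i=1}^{n} \ell\big(f_w(x_i),\, y_i\big) \;+\; \alpha\,\|w\|_2^2
```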
Picking a loss?
Obvious idea: minimize the number of misclassifications, i.e. the 0-1 loss.
But: the 0-1 loss is non-convex and not continuous => relax it to a convex surrogate.
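Two standard convex relaxations, written for labels y ∈ {-1, +1} and score s = w^T x + b: the hinge loss (linear SVM) and the log loss (logistic regression):

```latex
\ell_{\text{hinge}}(y, s) = \max\big(0,\; 1 - y\,s\big), \qquad
\ell_{\text{log}}(y, s) = \log\big(1 + e^{-y\,s}\big)
```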
Logistic Regression
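The model in its standard form: a linear score pushed through the sigmoid:

```latex
p(y = 1 \mid x) \;=\; \sigma(w^\top x + b) \;=\; \frac{1}{1 + e^{-(w^\top x + b)}}
```

Fitting maximizes the likelihood of these probabilities on the training data, i.e. minimizes the log loss.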
Penalized Logistic Regression
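scikit-learn parameterizes the penalized objective with C on the data term, so smaller C means stronger regularization:

```latex
\min_{w,\, b}\;\; \tfrac{1}{2}\,\|w\|_2^2 \;+\; C \sum_{i=1}^{n} \log\big(1 + e^{-y_i (w^\top x_i + b)}\big)
```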
Effect of Regularization
SVM or LogReg?
LR Examples: decision boundaries for C = 1, C = 100, and C = 0.01
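A minimal sketch of how such examples can be produced; the 2-D toy data set is an illustrative choice, not the one from the slides:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy 2-D binary classification data (illustrative stand-in).
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

for C in (0.01, 1, 100):
    # Smaller C = stronger L2 regularization = smaller coefficients.
    clf = LogisticRegression(C=C).fit(X, y)
    print(f"C={C}: coef={clf.coef_.ravel().round(3)}, "
          f"train acc={clf.score(X, y):.2f}")
```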
Overfitting and Underfitting
Multiclass Classification
Reduction to Binary Classification
For 4 classes:
● One vs Rest (standard): one binary classifier per class, each trained on all of the data
● One vs One: 1v2, 1v3, 1v4, 2v3, 2v4, 3v4
n * (n-1) / 2 binary classifiers, each trained on only a fraction of the data
(a code sketch of both follows below)
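A minimal sketch of both reductions in scikit-learn; the 4-class toy data is an illustrative stand-in:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# 4-class toy data (illustrative stand-in).
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)

print(len(ovr.estimators_))  # 4 classifiers: one per class
print(len(ovo.estimators_))  # 6 classifiers: 4 * 3 / 2 pairs
```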
Prediction with One vs Rest
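Score the input with each per-class classifier and take the best:

```latex
\hat{y} \;=\; \arg\max_{c \in \{1,\dots,K\}} \; \big(w_c^\top x + b_c\big)
```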
One vs One
Prediction: each pairwise classifier votes; the class with the most votes wins.
Iris Dataset
What about Coefficients?
● Each row of coef_ contains the coefficient vector for one of the three classes
● Each column holds the coefficient value for a specific feature
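A quick check on the iris data (3 classes, 4 features) with the standard scikit-learn API:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.coef_.shape)       # (3, 4): one row per class, one column per feature
print(clf.intercept_.shape)  # (3,): one intercept per class
```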
Coefficient and Intercept
Multinomial Logistic Regression
Probabilistic multi-class model (a softmax over per-class linear scores):
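In its standard softmax form, with per-class weights w_c and intercepts b_c:

```latex
p(y = c \mid x) \;=\; \frac{e^{\,w_c^\top x + b_c}}{\sum_{k=1}^{K} e^{\,w_k^\top x + b_k}}
```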
Same prediction rule as OvR: predict the class with the largest linear score!
In scikit-learn
● OvO: only SVC
● OvR: the default for all linear models, even LogisticRegression
● LogisticRegression(multi_class='multinomial') fits the multinomial model
● clf.decision_function(X) returns the linear scores w^T x + b
● logreg.predict_proba(X) returns class probabilities
● SVC(probability=True): not great (expensive internal cross-validation for Platt scaling)
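A minimal sketch on iris; note that recent scikit-learn versions fit the multinomial model for multi-class data by default, so the explicit multi_class='multinomial' flag from the slides is deprecated there:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Multinomial by default in recent scikit-learn; older versions
# needed multi_class='multinomial' explicitly.
logreg = LogisticRegression(max_iter=1000).fit(X, y)

scores = logreg.decision_function(X[:2])  # linear scores w^T x + b, shape (2, 3)
probs = logreg.predict_proba(X[:2])       # softmax of the scores; rows sum to 1
print(scores.round(2))
print(probs.round(2))
```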
Computational Considerations
(for all linear models)
Solver Choices
● Don't use SVC(kernel='linear'), use LinearSVC
● For n_features >> n_samples: Lars (or LassoLars) instead of Lasso
● For small n_samples (< 10,000?), don't worry; anything bigger counts as "large"
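A sketch of the first recommendation; both estimators fit a linear SVM, but LinearSVC (liblinear-based) scales far better with n_samples than the libsvm-based SVC:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Prefer this: roughly linear scaling in n_samples.
fast = LinearSVC().fit(X, y)

# Avoid this for large n_samples: kernelized solver, roughly quadratic scaling.
slow = SVC(kernel='linear').fit(X, y)
```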
Stochastic Gradient Descent
Léon Bottou and Olivier Bousquet. 2007. The tradeoffs of large scale learning. In Proceedings of the 20th International Conference on Neural Information Processing Systems, 161–168.
https://scikit-learn.org/stable/modules/sgd.html
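A minimal SGD sketch (loss='log_loss' is the spelling in recent scikit-learn; older versions use loss='log'); standardizing the features first matters because SGD is sensitive to feature scaling:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# loss='log_loss' gives logistic regression; loss='hinge' gives a linear SVM.
clf = make_pipeline(
    StandardScaler(),  # SGD is sensitive to feature scaling
    SGDClassifier(loss='log_loss', alpha=1e-4, random_state=0),
)
clf.fit(X, y)
print(clf.score(X, y))
```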
SGD Pros and Cons
PROS
● Efficiency
● Ease of implementation (lots of opportunities for code tuning)
CONS
● SGD requires a number of hyperparameters, such as the regularization parameter and the number of iterations
● SGD is sensitive to feature scaling