The document discusses linear models for classification, focusing on logistic regression and support vector machines (SVM). It covers concepts such as empirical risk minimization, regularization, and the differences between binary and multiclass classification methods. Additionally, it highlights practical considerations for implementing these models using scikit-learn and the trade-offs associated with stochastic gradient descent.

6012B0419Y Machine Learning

Linear Models for Classification


13-11-2022

Guido van Capelleveen

(Prepared by: Stevan Rudinac)


Slide Credit
● Andreas Müller, lecturer at the Data Science Institute at Columbia University
● Author of the book we will be using for this course, “Introduction to Machine Learning with Python”

● Great materials available at:
● https://github.com/amueller/applied_ml_spring_2017/
● https://amueller.github.io/applied_ml_spring_2017/
Linear Models for Binary Classification

(Regularized) Empirical Risk Minimization

The objective combines two parts: a data-fitting term (the loss on the training data) and a regularization term that penalizes complex models.

Who remembers this slide from the previous lecture?
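As a sketch of the objective this slide refers to, using assumed notation (per-sample loss ℓ, weights w, intercept b, regularization strength α, penalty R):

```latex
\min_{w,\,b} \;\;
\underbrace{\sum_{i=1}^{n} \ell\big(y_i,\; w^{\top} x_i + b\big)}_{\text{data fitting}}
\;+\;
\underbrace{\alpha \, \mathcal{R}(w)}_{\text{regularization}}
```

Common choices for the penalty R(w) are the squared L2 norm and the L1 norm.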


Differences Between Algorithms
● The way in which they measure how well a particular combination of coefficients and intercept fits the training data (the loss function)

● If, and what kind of, regularization they use

By default, both logistic regression and linear SVM in scikit-learn apply L2 regularization.
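A quick illustration of the defaults (a minimal sketch; the toy dataset is an assumption, not from the slides):

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Toy two-class dataset, chosen here only for illustration
X, y = make_blobs(centers=2, random_state=42)

# Both estimators apply an L2 penalty on the coefficients by default
logreg = LogisticRegression().fit(X, y)   # penalty="l2" by default
svm = LinearSVC().fit(X, y)               # penalty="l2" by default

print(logreg.coef_, logreg.intercept_)
print(svm.coef_, svm.intercept_)
```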
Picking a loss?

Obvious idea: minimize the number of misclassifications, i.e. the 0-1 loss.
But the 0-1 loss is non-convex and discontinuous => relax it to a convex surrogate, such as the log loss (logistic regression) or the hinge loss (SVM).
Logistic Regression

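For reference, a minimal sketch of the logistic regression model and its per-sample loss, assuming the usual convention y_i ∈ {-1, +1}:

```latex
% Logistic (sigmoid) model for the probability of the positive class
p(y = +1 \mid x) = \frac{1}{1 + \exp\!\big(-(w^{\top} x + b)\big)}

% Corresponding per-sample loss (log loss)
\ell(y_i, x_i) = \log\!\big(1 + \exp\!\big(-y_i (w^{\top} x_i + b)\big)\big)
```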
Penalized Logistic Regression

C is the inverse of the regularization strength alpha (up to a factor of n_samples, depending on whether the loss is summed or averaged).

Both versions (L1- and L2-penalized) are strongly convex; the L2 version is also smooth (differentiable).

All points contribute to w (a dense solution to the dual).
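A sketch of the two penalized objectives in the C-parameterization (labels y_i ∈ {-1, +1} assumed; constant factors omitted):

```latex
% L2-penalized logistic regression
\min_{w,\,b} \; C \sum_{i=1}^{n} \log\!\big(1 + \exp\!\big(-y_i (w^{\top} x_i + b)\big)\big) \;+\; \|w\|_2^2

% L1-penalized logistic regression
\min_{w,\,b} \; C \sum_{i=1}^{n} \log\!\big(1 + \exp\!\big(-y_i (w^{\top} x_i + b)\big)\big) \;+\; \|w\|_1
```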

Effect of Regularization

Decision boundaries of a linear SVM for different values of C

● Small C (a lot of regularization) limits the influence of individual points!
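A minimal sketch of this comparison (the dataset and the printed summary are assumptions; the slides show decision-boundary plots instead):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Toy two-class data, chosen only to illustrate the effect of C
X, y = make_blobs(centers=2, cluster_std=2.0, random_state=0)

for C in [0.01, 1, 100]:
    svm = LinearSVC(C=C).fit(X, y)
    # Larger C -> less regularization -> individual points pull the boundary harder
    print(f"C={C:>6}: coef={svm.coef_.ravel()}, train acc={svm.score(X, y):.2f}")
```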
(soft margin) linear SVM

● Both versions (L1- and L2-penalized) are strongly convex; neither is smooth, because the hinge loss is not differentiable at its kink.
● Only some points contribute to w: the support vectors (a sparse solution to the dual).
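For reference, a sketch of the C-parameterized soft-margin objectives (hinge loss; labels y_i ∈ {-1, +1} assumed, constant factors omitted):

```latex
% L2-penalized (standard) soft-margin linear SVM
\min_{w,\,b} \; C \sum_{i=1}^{n} \max\!\big(0,\; 1 - y_i (w^{\top} x_i + b)\big) \;+\; \|w\|_2^2

% L1-penalized variant
\min_{w,\,b} \; C \sum_{i=1}^{n} \max\!\big(0,\; 1 - y_i (w^{\top} x_i + b)\big) \;+\; \|w\|_1
```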

SVM or LogReg?

● Do you need probability estimates?
  - Yes: use Logistic Regression.
  - No: it doesn’t matter; try either / both.

● Need a compact model, or believe the solution is sparse? Use L1.
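A minimal sketch of the L1 option (the solver choice and dataset are assumptions):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# An L1 penalty tends to drive many coefficients exactly to zero -> compact model
lr_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1).fit(X, y)
print("non-zero coefficients:", np.count_nonzero(lr_l1.coef_), "of", lr_l1.coef_.size)
```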

LR Example C=1

● The default value of C=1 provides good performance

● However, the training and test set scores are very close, so we are likely underfitting
LR Example C=100

● Using C=100 results in higher training and test set accuracies


● A more complex model performs better in this case

LR Example C=0.01

● Training and test set accuracy decrease


● What does this tell us?
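A minimal sketch reproducing this comparison (assuming the book's breast cancer example; the split and solver settings are assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Compare the three regularization settings from the slides
for C in [0.01, 1, 100]:
    lr = LogisticRegression(C=C, max_iter=10000).fit(X_train, y_train)
    print(f"C={C:>6}: train={lr.score(X_train, y_train):.3f}, "
          f"test={lr.score(X_test, y_test):.3f}")
```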

Overfitting and Underfitting
Multiclass Classification

Reduction to Binary Classification
For 4 classes:

● One vs Rest (the standard approach)
  1v{2, 3, 4}, 2v{1, 3, 4}, 3v{1, 2, 4}, 4v{1, 2, 3}
  n binary classifiers, each trained on all of the data

● One vs One
  1v2, 1v3, 1v4, 2v3, 2v4, 3v4
  n * (n - 1) / 2 binary classifiers, each trained on a fraction of the data
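A minimal sketch of both reductions using scikit-learn's meta-estimators (the base estimator and dataset are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # n = 3 binary classifiers
print(len(ovo.estimators_))  # n * (n - 1) / 2 = 3 binary classifiers
```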

Prediction with One Vs Rest

● Predict the “class with the highest score”, i.e. the class whose classifier gives the highest value of the classification confidence formula (the decision function)

● It is just a heuristic: it is not obvious why it should work, but it works well in practice
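A sketch of the rule on the iris data (the estimator choice is an assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
clf = LinearSVC(max_iter=10000).fit(X, y)  # uses one-vs-rest internally

scores = clf.decision_function(X)          # shape (n_samples, n_classes)
predictions = np.argmax(scores, axis=1)    # highest-scoring class wins
print((predictions == clf.predict(X)).all())   # predict() applies the same rule
```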
One vs Rest
Prediction with One Vs One
● “Vote for highest positives”
● Classify by all classifiers.
● Count how often each class was predicted.
● Return most commonly predicted class.

● Again – just a heuristic.
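A from-scratch sketch of this voting scheme (dataset and base classifier are assumptions; scikit-learn's OneVsOneClassifier does this for you):

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Train one classifier per pair of classes, on only those two classes
pairwise = {}
for a, b in combinations(classes, 2):
    mask = np.isin(y, [a, b])
    pairwise[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

# Classify with all pairwise classifiers and count the votes per class
votes = np.zeros((len(X), len(classes)), dtype=int)
for (a, b), clf in pairwise.items():
    pred = clf.predict(X)
    for cls in (a, b):
        votes[:, cls] += (pred == cls)

prediction = votes.argmax(axis=1)  # most commonly predicted class wins
print((prediction == y).mean())
```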

One vs One
Iris Dataset

What about Coefficients?

● Each row of coef_ contains the coefficient vector for one of the
three classes
● Each column holds the coefficient value for a specific feature
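Illustrated on the iris data (a sketch; the estimator settings are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
logreg = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

print(logreg.coef_.shape)       # (3, 4): one row per class, one column per feature
print(logreg.intercept_.shape)  # (3,):  one intercept per class
```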

Coefficient and Intercept

Multinomial Logistic Regression
Probabilistic multi-class model:
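A sketch of the model the slide refers to, in its usual softmax form (one weight vector w_c and intercept b_c per class assumed):

```latex
p(y = c \mid x) = \frac{\exp\!\big(w_c^{\top} x + b_c\big)}{\sum_{c'} \exp\!\big(w_{c'}^{\top} x + b_{c'}\big)}
\qquad
\hat{y} = \arg\max_{c} \; w_c^{\top} x + b_c
```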

Same prediction rule as OvR!
In scikit-learn
● OvO: only SVC
● OvR: the default for all linear models, even LogisticRegression
● Multinomial: LogisticRegression(multi_class="multinomial")
● clf.decision_function(X) returns the per-class scores w^T x
● logreg.predict_proba(X) returns class probabilities
● SVC(probability=True) is not great (it needs an expensive internal cross-validation, and the resulting probabilities can be inconsistent with the decision function)
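A sketch of these options on iris (note: multi_class was an explicit parameter in the scikit-learn releases current at the time of this lecture; newer versions deprecate it):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

ovr = LogisticRegression(multi_class="ovr", max_iter=1000).fit(X, y)
softmax = LogisticRegression(multi_class="multinomial", max_iter=1000).fit(X, y)

print(ovr.decision_function(X[:2]))      # per-class scores w^T x + b
print(softmax.predict_proba(X[:2]))      # multinomial class probabilities

svc = SVC(probability=True).fit(X, y)    # OvO internally; probabilities need extra CV
print(svc.predict_proba(X[:2]))
```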

Computational Considerations
(for all linear models)

Solver Choices
● Don’t use SVC(kernel="linear"); use LinearSVC
● For n_features >> n_samples: use Lars (or LassoLars) instead of Lasso
● For small n_samples (< 10,000?), don’t worry
● LinearSVC, LogisticRegression: set dual=False if n_samples >> n_features
● LogisticRegression(solver="sag") when n_samples is large
● Stochastic Gradient Descent when n_samples is “really large”
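A sketch of what these recommendations look like in code (the dataset shapes are assumptions, chosen only to match each regime):

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LassoLars, LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC

# Wide data (n_features >> n_samples): LassoLars instead of Lasso
X_wide, y_wide = make_regression(n_samples=100, n_features=5000, random_state=0)
lars = LassoLars(alpha=0.1).fit(X_wide, y_wide)

# Tall data (n_samples >> n_features): dual=False, or the "sag" solver
X_tall, y_tall = make_classification(n_samples=50000, n_features=20, random_state=0)
svm = LinearSVC(dual=False).fit(X_tall, y_tall)
logreg = LogisticRegression(solver="sag", max_iter=1000).fit(X_tall, y_tall)

# Very large n_samples: stochastic gradient descent
sgd = SGDClassifier().fit(X_tall, y_tall)  # default loss="hinge" -> linear SVM
```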

Stochastic Gradient Descent

Leon Bottou and Olivier Bousquet. 2007. The tradeoffs of large scale
learning. In Proceedings of the 20th International Conference on Neural
Information Processing Systems, 161–168.

https://scikit-learn.org/stable/modules/sgd.html
SGD Pros and Cons
PROS
● Efficiency

● Ease of implementation (lots of opportunities for code tuning)

CONS
● SGD requires a number of hyperparameters, such as the regularization parameter and the number of iterations


● SGD is sensitive to feature scaling
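Because SGD is sensitive to feature scaling, a common pattern (a sketch, not from the slides) is to standardize the features in a pipeline before the SGD-trained model:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features to zero mean / unit variance, then fit the linear model with SGD
pipe = make_pipeline(StandardScaler(),
                     SGDClassifier(alpha=1e-4, max_iter=1000, random_state=0))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```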

