DataCamp Linear Classifiers in Python
LINEAR CLASSIFIERS IN PYTHON
Support Vectors
Michael (Mike) Gelbart
Instructor
The University of British Columbia
What is an SVM?
Linear classifiers (so far)
Trained using the hinge loss and L2 regularization
What are support vectors?
Support vector: a training example not in the flat part of the loss
Support vector: an example that is incorrectly classified or close to
the boundary
If an example is not a support vector, removing it has no effect on
the model
Having a small number of support vectors makes kernel SVMs really
fast
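The claim that removing a non-support vector has no effect on the model can be checked on a small example. The sketch below is illustrative only: the six-point dataset is made up, and a linear-kernel SVC is assumed. It fits once on all points, then refits on the support vectors alone and prints both sets of coefficients for comparison.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two classes separated along the first feature.
X = np.array([[-3.0, 0.0], [-2.0, 1.0], [-1.0, 0.0],
              [1.0, 0.0], [2.0, -1.0], [3.0, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit on the full training set.
svm = SVC(kernel="linear", C=1.0).fit(X, y)
sv_idx = svm.support_  # indices (into X) of the support vectors

# Refit using only the support vectors; all other examples are dropped.
svm_small = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

print("support vector indices:", sv_idx)
print("full fit: ", svm.coef_, svm.intercept_)
print("small fit:", svm_small.coef_, svm_small.intercept_)
```

Up to solver tolerance, the two fits should agree, since the dropped examples sit in the flat part of the hinge loss.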
Max-margin viewpoint
The SVM maximizes the "margin" for linearly separable datasets
Margin: distance from the boundary to the closest points
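As a rough illustration of the margin, the sketch below fits a linear SVM to a made-up, linearly separable dataset and compares two quantities: 1/||w||, which is the margin under the standard SVM scaling, and the smallest distance from any training point to the boundary. The data and the large value of C (used here to approximate a hard-margin fit) are assumptions for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable data: class 0 on the left, class 1 on the right.
X = np.array([[0.0, 0.0], [0.0, 1.0], [-1.0, 0.5],
              [2.0, 0.0], [2.0, 1.0], [3.0, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no room for margin violations on separable data.
svm = SVC(kernel="linear", C=1000.0).fit(X, y)

w_norm = np.linalg.norm(svm.coef_)
margin = 1.0 / w_norm  # margin under the usual SVM scaling

# Geometric distance of each training point to the decision boundary.
distances = np.abs(svm.decision_function(X)) / w_norm

print("margin (1 / ||w||):         ", margin)
print("closest point to boundary:  ", distances.min())
```

The two printed values should agree closely, which matches the definition of the margin as the distance from the boundary to the closest points.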
Let's practice!
Kernel SVMs
Transforming your features
transformed feature = (original feature)²
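A minimal sketch of why squaring a feature can help, using a made-up one-dimensional dataset (an assumption for illustration, not the data shown on the slides). The positive class lies at both extremes, so no single threshold on the original feature separates the classes, but a threshold on the squared feature does.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D data: class 1 far from zero, class 0 near zero.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = (np.abs(x) > 1).astype(int)

X_raw = x.reshape(-1, 1)  # original feature
X_sq = X_raw ** 2         # transformed feature = (original feature)^2

# Same linear classifier, fit once on each version of the feature.
acc_raw = SVC(kernel="linear").fit(X_raw, y).score(X_raw, y)
acc_sq = SVC(kernel="linear").fit(X_sq, y).score(X_sq, y)

print("training accuracy, original feature:", acc_raw)
print("training accuracy, squared feature: ", acc_sq)
```

A linear boundary in the transformed space corresponds to a nonlinear boundary in the original space, which is the idea kernel SVMs build on.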
Kernel SVMs
In [1]: from sklearn.svm import SVC
In [2]: svm = SVC(gamma=1) # default is kernel="rbf"
In [3]: svm = SVC(gamma=0.01) # smaller gamma leads to smoother boundaries
In [4]: svm = SVC(gamma=2) # larger gamma leads to more complex boundaries
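To get a feel for the effect of gamma, the sketch below fits RBF SVMs with the three gamma values from the slides on a synthetic dataset. The dataset (make_moons) and its noise level are assumptions for illustration. Training accuracy and the number of support vectors are used as rough proxies for boundary complexity.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical nonlinear dataset: two interleaving half-circles.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

train_acc = {}
n_support = {}
for gamma in [0.01, 1, 2]:
    svm = SVC(gamma=gamma).fit(X, y)  # default kernel is "rbf"
    train_acc[gamma] = svm.score(X, y)
    n_support[gamma] = int(svm.n_support_.sum())
    print(f"gamma={gamma}: training accuracy={train_acc[gamma]:.3f}, "
          f"support vectors={n_support[gamma]}")
```

A more complex boundary typically fits the training set better, but higher training accuracy does not imply better accuracy on new data, so gamma is usually chosen by validation.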
Let's practice!
Comparing logistic regression and SVM
Pros and Cons
Logistic regression:
Is a linear classifier
Can use with kernels, but slow
Outputs meaningful probabilities
Can be extended to multi-class
All data points affect fit
L2 or L1 regularization
Support vector machine (SVM):
Is a linear classifier
Can use with kernels, and fast
Does not naturally output probabilities
Can be extended to multi-class
Only "support vectors" affect fit
Conventionally just L2 regularization
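The difference in outputs can be seen directly in scikit-learn. In this sketch, which uses a synthetic dataset from make_blobs as an assumption for illustration, LogisticRegression exposes predict_proba, while LinearSVC only provides raw scores through decision_function.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Hypothetical two-class dataset.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

logreg = LogisticRegression().fit(X, y)
linsvm = LinearSVC().fit(X, y)

# Logistic regression: one probability per class, each row sums to 1.
proba = logreg.predict_proba(X[:3])
print("logistic regression probabilities:\n", proba)

# Linear SVM: signed scores relative to the boundary, not probabilities.
scores = linsvm.decision_function(X[:3])
print("linear SVM decision scores:", scores)

has_proba = hasattr(linsvm, "predict_proba")
print("LinearSVC has predict_proba:", has_proba)
```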
Use in scikit-learn
Logistic regression in sklearn:
linear_model.LogisticRegression
Key hyperparameters in sklearn:
C (inverse regularization strength)
penalty (type of regularization)
multi_class (type of multi-class)
SVM in sklearn:
svm.LinearSVC and svm.SVC
Key hyperparameters in sklearn:
C (inverse regularization strength)
kernel (type of kernel)
gamma (inverse RBF smoothness)
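One common way to choose these hyperparameters is a cross-validated grid search. The sketch below tunes C and gamma for an RBF SVC. The dataset and the grid values are assumptions picked for illustration, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical dataset for the search.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Candidate values for the two key SVC hyperparameters.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 1, 2]}

# 5-fold cross-validation over every (C, gamma) combination.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validation accuracy:", search.best_score_)
```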
SGDClassifier
SGDClassifier: scales well to large datasets
In [1]: from sklearn.linear_model import SGDClassifier
In [2]: logreg = SGDClassifier(loss='log_loss') # was loss='log' before scikit-learn 1.1
In [3]: linsvm = SGDClassifier(loss='hinge')
SGDClassifier hyperparameter alpha is like 1/C
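Since alpha acts like 1/C, a larger alpha means stronger regularization and therefore smaller coefficients. The sketch below compares the coefficient norm for a small and a large alpha with the hinge loss. The dataset, the two alpha values, and the fixed random_state are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

# Hypothetical two-class dataset with centers at (-2, 0) and (2, 0).
X, y = make_blobs(n_samples=200, centers=[[-2, 0], [2, 0]], random_state=0)

# Small alpha: weak regularization (comparable to a large C).
weak_reg = SGDClassifier(loss="hinge", alpha=1e-4, random_state=0).fit(X, y)
# Large alpha: strong regularization (comparable to a small C).
strong_reg = SGDClassifier(loss="hinge", alpha=1.0, random_state=0).fit(X, y)

norm_weak = np.linalg.norm(weak_reg.coef_)
norm_strong = np.linalg.norm(strong_reg.coef_)

print("coefficient norm, alpha=1e-4:", norm_weak)
print("coefficient norm, alpha=1.0: ", norm_strong)
```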
Let's practice!
Conclusion
How does this course fit into Data Science?
Data science
--> Machine learning
--> --> Supervised learning
--> --> --> Classification
--> --> --> --> Linear classifiers (this course)
Congratulations & Thanks!