How good is your model?
Supervised Learning with scikit-learn
George Boorman
Core Curriculum Manager, DataCamp
Classification metrics
Measuring model performance with accuracy:
Fraction of correctly classified samples
Not always a useful metric
Class imbalance
Classification for predicting fraudulent bank transactions
99% of transactions are legitimate; 1% are fraudulent
Could build a classifier that predicts NONE of the transactions are fraudulent (see the sketch below)
99% accurate!
But terrible at actually predicting fraudulent transactions
Fails at its original purpose
Class imbalance: Uneven frequency of classes
Need a different way to assess performance
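To make this concrete, here is a minimal sketch of the majority-class trick using scikit-learn's DummyClassifier (X and y are assumed to hold the imbalanced transaction features and labels):

from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Always predict the most frequent class ("legitimate")
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)

# Accuracy is ~0.99 on 99% legitimate data, yet no fraud is ever flagged
print(dummy.score(X_test, y_test))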
Confusion matrix for assessing classification performance

                     Predicted: Legitimate   Predicted: Fraudulent
Actual: Legitimate   True negative           False positive
Actual: Fraudulent   False negative          True positive
Assessing classification performance

Each cell of the confusion matrix contributes to the performance metrics: true negatives and true positives are the correct predictions, while false positives and false negatives are the errors.

Accuracy: (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)
Precision

Precision: true positives / (true positives + false positives)

High precision = lower false positive rate
High precision: Not many legitimate transactions are predicted to be fraudulent
Recall

Recall: true positives / (true positives + false negatives)

High recall = lower false negative rate
High recall: Predicted most fraudulent transactions correctly
F1 score

F1 score: 2 * (precision * recall) / (precision + recall)

The harmonic mean of precision and recall
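As a quick check on these formulas, here is a minimal sketch (assuming y_test and y_pred arrays from a fitted binary classifier) comparing the manual calculations with scikit-learn's built-in metric functions:

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Counts taken directly from the confusion matrix cells
tp = np.sum((y_pred == 1) & (y_test == 1))
fp = np.sum((y_pred == 1) & (y_test == 0))
fn = np.sum((y_pred == 0) & (y_test == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

# scikit-learn's functions give the same results
print(precision, precision_score(y_test, y_pred))
print(recall, recall_score(y_test, y_pred))
print(f1, f1_score(y_test, y_pred))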
Confusion matrix in scikit-learn
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                    random_state=42)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(confusion_matrix(y_test, y_pred))

[[1106   11]
 [ 183   34]]

Rows are actual classes and columns are predicted classes: 1106 true negatives, 11 false positives, 183 false negatives, and 34 true positives.
Classification report in scikit-learn
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.86      0.99      0.92      1117
           1       0.76      0.16      0.26       217

    accuracy                           0.85      1334
   macro avg       0.81      0.57      0.59      1334
weighted avg       0.84      0.85      0.81      1334

support is the number of samples in each true class.
Let's practice!
Logistic regression and the ROC curve
Supervised Learning with scikit-learn
George Boorman
Core Curriculum Manager, DataCamp
Logistic regression for binary classification
Logistic regression is used for classification problems
Logistic regression outputs probabilities
If the probability p > 0.5: the data is labeled 1
If the probability p < 0.5: the data is labeled 0
Linear decision boundary

[Figure: logistic regression produces a linear decision boundary separating the two classes]
Logistic regression in scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

logreg = LogisticRegression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
Predicting probabilities
# Probabilities of the positive class are in the second column
y_pred_probs = logreg.predict_proba(X_test)[:, 1]
print(y_pred_probs[:1])

[0.08961376]
Probability thresholds
By default, logistic regression threshold = 0.5
Not specific to logistic regression
KNN classifiers also have thresholds
What happens if we vary the threshold?
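A minimal sketch of varying the threshold (using y_pred_probs from above; the 0.3 cutoff is an arbitrary illustration):

import numpy as np

# Flag as fraudulent any transaction with predicted probability >= 0.3
y_pred_lower = (y_pred_probs >= 0.3).astype(int)

# A lower threshold flags more positives: recall rises,
# while precision typically falls
print(np.sum(y_pred_lower), np.sum(y_pred_probs >= 0.5))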
The ROC curve

The receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate for every possible threshold:

A threshold of 1 predicts 0 for every observation: true positive rate and false positive rate are both 0
A threshold of 0 predicts 1 for every observation: true positive rate and false positive rate are both 1
Varying the threshold between these extremes traces out the full curve
Plotting the ROC curve
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_probs)

# Dashed diagonal shows a model that guesses at random
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Logistic Regression ROC Curve')
plt.show()
[Figure: the resulting ROC curve sits above the dashed random-guess diagonal]
ROC AUC

The area under the ROC curve (AUC) summarizes performance across all thresholds: a perfect model scores 1, while random guessing scores 0.5.
ROC AUC in scikit-learn
from sklearn.metrics import roc_auc_score
print(roc_auc_score(y_test, y_pred_probs))
0.6700964152663693
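Since fpr and tpr from roc_curve are already available, the same number can be obtained by integrating the curve with scikit-learn's auc function; this is just a consistency check:

from sklearn.metrics import auc

# Area under the plotted curve; matches roc_auc_score
# computed from the same predictions
print(auc(fpr, tpr))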
Let's practice!
Hyperparameter tuning
Supervised Learning with scikit-learn
George Boorman
Core Curriculum Manager, DataCamp
Hyperparameter tuning
Ridge/lasso regression: Choosing alpha
KNN: Choosing n_neighbors
Hyperparameters: Parameters we specify before fitting the model
Like alpha and n_neighbors
Choosing the correct hyperparameters
1. Try lots of different hyperparameter values
2. Fit all of them separately
3. See how well they perform
4. Choose the best performing values
This is called hyperparameter tuning
It is essential to use cross-validation to avoid overfitting to the test set
We can still split the data and perform cross-validation on the training set
We withhold the test set for final evaluation
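Before automating this with GridSearchCV, the workflow above can be sketched by hand; a minimal example, assuming X_train and y_train and an illustrative range of alpha values for Ridge:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

scores = {}
for alpha in np.linspace(0.0001, 1, 10):
    # Fit and score each candidate value with 5-fold cross-validation
    ridge = Ridge(alpha=alpha)
    scores[alpha] = np.mean(cross_val_score(ridge, X_train, y_train, cv=5))

# Choose the best-performing value
best_alpha = max(scores, key=scores.get)
print(best_alpha, scores[best_alpha])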
Grid search cross-validation

[Figure: a grid with one hyperparameter per axis, such as solver versus alpha; the cross-validation score is computed for each cell, and the best-performing combination is chosen]
GridSearchCV in scikit-learn
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# 10 evenly spaced alpha values between 0.0001 and 1
param_grid = {"alpha": np.linspace(0.0001, 1, 10),
              "solver": ["sag", "lsqr"]}

ridge = Ridge()
ridge_cv = GridSearchCV(ridge, param_grid, cv=kf)
ridge_cv.fit(X_train, y_train)
print(ridge_cv.best_params_, ridge_cv.best_score_)

{'alpha': 0.0001, 'solver': 'sag'} 0.7529912278705785
Limitations and an alternative approach

The number of fits is the number of folds multiplied by the number of hyperparameter combinations, so grid search scales poorly:

3-fold cross-validation, 1 hyperparameter, 10 values = 30 fits
10-fold cross-validation, 3 hyperparameters, 10 values each (1,000 combinations) = 10,000 fits
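The count follows from the cross product of the grid; a quick back-of-the-envelope calculation (the grid sizes here are illustrative):

from math import prod

n_folds = 10
grid_sizes = [10, 10, 10]  # candidate values per hyperparameter

# One fit per fold for every combination in the grid
print(n_folds * prod(grid_sizes))  # 10000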
RandomizedSearchCV
from sklearn.model_selection import RandomizedSearchCV

kf = KFold(n_splits=5, shuffle=True, random_state=42)
param_grid = {"alpha": np.linspace(0.0001, 1, 10),
              "solver": ["sag", "lsqr"]}
ridge = Ridge()

# n_iter limits how many hyperparameter combinations are sampled
ridge_cv = RandomizedSearchCV(ridge, param_grid, cv=kf, n_iter=2)
ridge_cv.fit(X_train, y_train)
print(ridge_cv.best_params_, ridge_cv.best_score_)

{'solver': 'sag', 'alpha': 0.0001} 0.7529912278705785
Evaluating on the test set
test_score = ridge_cv.score(X_test, y_test)
print(test_score)
0.7564731534089224
Let's practice!