Lesson 2.4.1 What Is Scikit Learn Keynote

Scikit-Learn (sklearn) is a Python library built on NumPy and SciPy that provides numerous machine learning models and evaluation methods behind a well-designed API. The document outlines an end-to-end workflow for using Scikit-Learn, including data preparation, model selection, fitting, predictions, evaluation, and improvement. It also covers classification and regression metrics, hyperparameter tuning, and best practices for model training and testing.


What is Scikit-Learn (sklearn)?

Why Scikit-Learn?
• Built on NumPy and SciPy (and Python), and works hand in hand with Matplotlib for plotting
• Has many in-built machine learning models
• Methods to evaluate your machine learning models
• Very well-designed API
What are we going to cover?
• An end-to-end Scikit-Learn workflow

• Getting data ready (to be used with machine learning models)

• Choosing a machine learning model

• Fitting a model to the data (learning patterns)

• Making predictions with a model (using patterns)


• Evaluating model predictions

• Improving model predictions

• Saving and loading models
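The workflow above can be sketched end to end in a few lines. The synthetic data and the choice of `RandomForestClassifier` are illustrative assumptions, not the only way to do it:

```python
# A minimal end-to-end Scikit-Learn workflow sketch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Get the data ready (synthetic stand-in for a real dataset)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))             # 200 samples, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # binary labels

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3. Choose a machine learning model
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 4. Fit the model to the training data (learn patterns)
clf.fit(X_train, y_train)

# 5. Make predictions with the model (use patterns)
y_preds = clf.predict(X_test)

# 6. Evaluate the predictions (.score() returns mean accuracy for classifiers)
print(clf.score(X_test, y_test))
```

Saving and loading would typically be done afterwards with `joblib.dump()` and `joblib.load()`.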


Where can you get help?

• Follow along with the code


• Try it for yourself
• Press SHIFT + TAB (in a Jupyter notebook) to read the docstring
• Search for it
• Try again
• Ask
Let’s model!
Supervised learning (e.g. predicting "Does this patient have heart disease?")
Classification
• "Is this example one thing or another?"
• Binary classification = two options
• Multi-class classification = more than two options

Regression
• "How much will this house sell for?"
• "How many people will buy this app?"
One Hot Encoding
A process used to turn categories into numbers.

   Car Colour    Red  Green  Blue
0  Red            1     0     0
1  Green          0     1     0
2  Blue           0     0     1
3  Red            1     0     0
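The table above can be reproduced with `pandas.get_dummies()` (Scikit-Learn's `OneHotEncoder` does the same job; the column name here is just the slide's example):

```python
# One-hot encoding sketch: turn the "Car Colour" category into numbers.
import pandas as pd

df = pd.DataFrame({"Car Colour": ["Red", "Green", "Blue", "Red"]})
encoded = pd.get_dummies(df, columns=["Car Colour"], dtype=int)

# Columns come out alphabetically: Car Colour_Blue, Car Colour_Green, Car Colour_Red
# Row 0 (Red) -> 0, 0, 1
print(encoded)
```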
Classification and Regression metrics

Classification    Regression
Accuracy*         R² (r-squared)*
Precision         Mean absolute error (MAE)
Recall            Mean squared error (MSE)
F1                Root mean squared error (RMSE)

* = default evaluation metric in Scikit-Learn (what .score() returns)


Confusion matrix anatomy

• True positive = model predicts 1 when truth is 1


• False positive = model predicts 1 when truth is 0
• True negative = model predicts 0 when truth is 0
• False negative = model predicts 0 when truth is 1
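The four cases above are exactly what `sklearn.metrics.confusion_matrix` counts. A small sketch with made-up labels:

```python
# Confusion matrix anatomy in code.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# Rows = truth, columns = prediction: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2 1]
           #  [1 2]]
```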
Classification report anatomy
• Precision - Indicates the proportion of positive identifications (model
predicted class 1) which were actually correct. A model which produces
no false positives has a precision of 1.0.
• Recall - Indicates the proportion of actual positives which were correctly
classified. A model which produces no false negatives has a recall of 1.0.
• F1 score - A combination of precision and recall. A perfect model
achieves an F1 score of 1.0.
• Support - The number of samples each metric was calculated on.
• Accuracy - The accuracy of the model in decimal form. Perfect accuracy
is equal to 1.0.
• Macro avg - Short for macro average, the average precision, recall and
F1 score between classes. Macro avg doesn't take class imbalance into
account, so if you do have class imbalances, pay attention to this metric.
• Weighted avg - Short for weighted average, the weighted average
precision, recall and F1 score between classes. Weighted means each
metric is calculated with respect to how many samples there are in each
class. This metric will favour the majority class (e.g. it will give a high value
when one class outperforms another due to having more samples).
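All of the fields described above come from one call to `classification_report`; the labels here are made up for illustration:

```python
# Classification report anatomy in code.
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 1, 0, 1]

# Returns a text table with precision, recall, F1, support,
# accuracy, macro avg and weighted avg rows.
report = classification_report(y_true, y_pred)
print(report)
```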
Which classification metric should
you use?
• Accuracy is a good measure to start with if all classes are balanced (e.g. same amount
of samples which are labelled with 0 or 1).
• Precision and recall become more important when classes are imbalanced.
• If false positive predictions are worse than false negatives, aim for higher precision.
• If false negative predictions are worse than false positives, aim for higher recall.
• F1-score is a combination of precision and recall.
Which regression metric should you
use?
• R2 is similar to accuracy. It gives you a quick indication of how well your model might be doing.
Generally, the closer your R2 value is to 1.0, the better the model. But it doesn't really tell exactly
how wrong your model is in terms of how far off each prediction is.
• MAE gives a better indication of how far off each of your model's predictions are on average.
• As for MAE vs MSE: because MSE squares the differences between
predicted and actual values, it amplifies larger errors. Let's say we're predicting the
value of houses (which we are).
• Pay more attention to MAE: When being $10,000 off is twice as bad as being $5,000 off.
• Pay more attention to MSE: When being $10,000 off is more than twice as bad as being
$5,000 off.
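Sticking with the house-price example, the three metrics can be computed side by side (the prices below are made up):

```python
# Regression metric sketch with made-up house prices (in $).
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [300_000, 200_000, 250_000]
y_pred = [310_000, 195_000, 240_000]

mae = mean_absolute_error(y_true, y_pred)  # average $ off per prediction
mse = mean_squared_error(y_true, y_pred)   # squaring amplifies big misses
rmse = mse ** 0.5                          # back in original units ($)
r2 = r2_score(y_true, y_pred)              # closer to 1.0 = better
```

Note how the two $10,000 misses dominate MSE far more than the $5,000 miss, which is exactly the "amplifies larger differences" behaviour described above.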
Improving a model
(via hyperparameter tuning)

Like baking: the same dish can be cooked for 1 hour at 180ºC or for 1 hour at
200ºC. The ingredients (data) and the recipe (model) stay the same; only the
settings (hyperparameters) change.
Tuning Hyperparameters by Hand
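Tuning by hand just means trying a few settings and comparing validation scores. A sketch on synthetic data, using `n_estimators` as the "oven temperature" (the values tried are arbitrary):

```python
# Tuning one hyperparameter by hand and comparing validation scores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

scores = {}
for n in (10, 50, 100):                      # same recipe, different settings
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    clf.fit(X_train, y_train)
    scores[n] = clf.score(X_val, y_val)      # evaluate on the validation set
```

Scikit-Learn can automate this loop with `GridSearchCV` or `RandomizedSearchCV`.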
The most important concept in
machine learning
(the 3 sets)

Course materials (training set) | Practice exam (validation set) | Final exam (test set)

Generalization: the ability for a machine learning model to perform well on data it
hasn't seen before.
Cross-validation

Normal train & test split: 100 patient records are split into a training split
(80 records, 80%) and a test split (20 records, 20%). The model is trained on the
training data and evaluated on the test data.

5-fold cross-validation: the same 100 records are split five ways, so the model is
trained on 5 different versions of the training data and evaluated on 5 different
versions of the test data.
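The 5-fold procedure is one call in Scikit-Learn; the 100 synthetic "patient records" here stand in for real data:

```python
# 5-fold cross-validation sketch: every record lands in a test split exactly once.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 "patient records"
y = (X[:, 0] > 0).astype(int)

clf = RandomForestClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # 5 train/test versions -> 5 scores
print(scores.mean())                       # average score across the 5 folds
```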
Things to remember
• All data should be numerical

• There should be no missing values

• Manipulate the test set the same as the training set

• Never test on data you’ve trained on


• Tune hyperparameters on validation set OR use cross-validation

• The best score on a single metric doesn't necessarily mean the best model
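The "manipulate the test set the same as the training set" rule in practice means fitting any transformer on the training split only, then reusing it on the test split. A sketch with `StandardScaler` (the data is synthetic):

```python
# Fit preprocessing on the training split only, then apply to both splits.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=10, scale=3, size=(100, 2))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Calling `fit` on the test set would leak information from data the model is supposed to have never seen.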
