
Introduction to Machine Learning

Theory and Practice


David R. Pugh
Instructional Assistant Professor, KAUST
Director, SDAIA-KAUST AI

• 5+ years teaching applied machine learning and deep learning at KAUST.
• 2+ years as the director of SDAIA-KAUST AI, where I work to match applied AI problems of interest to SDAIA with AI solutions developed at KAUST.
• 15+ years of experience with the core data science Python stack: NumPy, SciPy, Pandas, Matplotlib, NetworkX, Jupyter, Scikit-Learn, PyTorch, etc.

Agenda
Introduction to Machine Learning: Theory and Practice

09:00 - 09:05 Welcome and Opening Remarks Prof. David Pugh

09:05 - 10:30 The Machine Learning Landscape Prof. David Pugh

10:30 - 10:45 Break

10:45 - 12:00 Classification and Regression Prof. David Pugh

12:00 - 13:00 Lunch

13:00 - 14:30 Linear Regression with NumPy Prof. David Pugh + TAs

14:30 - 14:45 Break

14:45 - 16:00 Introduction to Scikit-Learn Prof. David Pugh + TAs

References

• Slides closely follow Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.
• Another great reference is Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka.
• The official Scikit-Learn documentation is also fantastic.



The ML Landscape

What is the difference between AI and ML?



What is ML?

• ML is the science (and art) of programming computers so they can learn from data (Géron, 2019).
• [ML is the] field of study that gives computers the ability to learn without being explicitly programmed (Samuel, 1959).
• A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E (Mitchell, 1997).



Why is ML so popular right now?

Stanford's Coursera machine learning course had more than 100,000 students express interest in its first year.

1. The field has matured both in terms of identity and in terms of methods and tools.
2. There is an abundance of data available.
3. There is an abundance of computation to run methods.
4. There have been impressive results, increasing acceptance, respect, and competition.

Resources + Ingredients + Tools + Desire = Popularity

Based on: http://machinelearningmastery.com/machine-learning-is-popular/?__s=yq1qzcnf67sfiuzmnvjf


The traditional approach is model/rules-based...



...the ML approach is data-driven!



ML adapts to change!



ML can help humans learn!



Types of ML systems

• Supervised vs unsupervised
• Semi-supervised vs self-supervised
• Batch (offline) vs incremental (online)
• Instance-based vs model-based



Supervised learning

Classification | Regression



Other forms of supervised learning
Semi-supervised learning | Self-supervised learning



Unsupervised learning
Clustering | Data visualization



Reinforcement Learning



Batch (offline) vs incremental (online) learning

Batch (offline) learning | Incremental (online) learning



Out-of-core learning



Instance-based vs model-based learning

Instance-based learning | Model-based learning
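
For intuition, here is a minimal sketch in scikit-learn (the toy data and model choices are illustrative assumptions, not from the slides): a k-nearest neighbors regressor predicts by comparing new points to stored training instances, while linear regression generalizes by learning parameters from the data.

```python
# Minimal sketch: instance-based vs model-based learning (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X.ravel() + rng.normal(scale=1.0, size=50)

# Instance-based: "memorizes" the training set, predicts from nearest neighbors.
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)

# Model-based: learns a parametric model (here, a slope and intercept).
lin = LinearRegression().fit(X, y)

print(knn.predict([[5.0]]), lin.predict([[5.0]]))  # both near 10.0
print("learned slope:", lin.coef_)  # only the model-based learner has parameters
```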



Main Challenges of Applying ML


• Insufficient quantity of training data
• Non-representative training data
• Poor quality data
• Irrelevant features
• Overfitting the training data
• Underfitting the training data



Insufficient quantity of training data

• The more data for training, the better!
• It can take a lot of data for most ML algorithms to work.
• "Simple" problems often require O(10k) samples.
• "Complex" problems often require O(1m) samples.
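
To make this concrete, here is a minimal sketch (the dataset and model choices are illustrative assumptions) that uses scikit-learn's learning_curve to show how validation accuracy typically improves as more training samples are used:

```python
# Minimal sketch: accuracy as a function of training set size (illustrative).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)  # ~1800 samples of a "simple" problem
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=5000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of the training folds
    cv=5,
)
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} training samples -> mean CV accuracy {score:.3f}")
```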



Non-representative training data

• Need training data to be representative of new data for generalization.
• Sampling noise: not enough data => training data not representative by chance.
• Sampling bias: poor sampling technique => training data not representative (biased).



Poor quality training data

• Data can be full of errors, outliers, and noise (e.g., due to poor-quality measurements).
• Dirty data => hard for any algorithm to detect patterns.
• Significant amount of your time will be spent cleaning data.
• Data types? Do you have numeric features? Ordinal features? Categorical features?
• Look for outliers in your data: Remove? Fix manually?
• Look for missing data: Remove? Impute values?
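
A minimal sketch of these cleaning steps with Pandas (the toy DataFrame below is an assumption, just to make the steps concrete):

```python
# Minimal sketch: inspecting dtypes, imputing missing values, flagging outliers.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 350],  # 350 looks like a data-entry error
    "income": [40_000, 52_000, 61_000, np.nan, 45_000, 48_000],
    "city": ["Jeddah", "Riyadh", None, "Thuwal", "Riyadh", "Jeddah"],
})

print(df.dtypes)  # numeric vs categorical features?

# Missing data: impute values (alternatively, drop rows with df.dropna()).
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna("unknown")

# Outliers: flag values outside 1.5 * IQR, then decide (remove? fix manually?).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
print(df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)])
```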



Irrelevant features

Garbage in => garbage out!

• Learning requires sufficient relevant features (and not too many irrelevant ones!).
• Developing a good set of features for training is a critical part of an ML project.
• Significant amount of your time will be spent doing feature engineering.

Feature engineering is often critical to success.

• Feature selection: selecting the "best" subset of features for training.
• Feature extraction: combining existing features to produce new ones.
• Creating new features from new data.
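
A minimal sketch of both ideas in scikit-learn (the dataset and parameter choices are illustrative assumptions):

```python
# Minimal sketch: feature selection vs feature extraction (illustrative dataset).
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

# Feature selection: keep the 10 features most associated with the target.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Feature extraction: combine all 30 features into 10 new components.
X_extracted = PCA(n_components=10).fit_transform(X)

print(X.shape, X_selected.shape, X_extracted.shape)  # (569, 30) (569, 10) (569, 10)
```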
Overfitting the training data

What is overfitting?

• Overfitting is when a model performs well on training data but poorly on new data.
• If the model is complex or the training data is limited, the model will detect spurious patterns.
• Constraining a complex model to make it simpler is called regularization.
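
A minimal sketch of both effects (the synthetic data and model settings are illustrative assumptions): a high-degree polynomial fit to a small noisy sample scores well on its own training data but poorly on fresh data, while a Ridge penalty (regularization) constrains the model and narrows the gap.

```python
# Minimal sketch: overfitting a small dataset, then regularizing (illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)


def make_data(n):
    X = rng.uniform(-3, 3, size=(n, 1))
    y = 0.5 * X.ravel() ** 2 + rng.normal(scale=1.0, size=n)  # noisy quadratic
    return X, y


X_train, y_train = make_data(30)  # limited training data
X_new, y_new = make_data(200)     # "new" data the model has never seen

for name, reg in [("unregularized", LinearRegression()),
                  ("ridge (regularized)", Ridge(alpha=1.0))]:
    # Degree-20 polynomial: far more flexible than the data justifies.
    model = make_pipeline(PolynomialFeatures(degree=20), StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.3f}, "
          f"new-data R^2 = {model.score(X_new, y_new):.3f}")
```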
Underfitting the training data

What is underfitting?

• Underfitting is when a model is too simple to learn the underlying structure of the data.
• Linear models will often underfit (but are often a good place to start).

How to reduce underfitting?

• Select a more complex (more parameters) model.
• Feed better features to the model (feature engineering).
• Reduce the constraints on the model (reduce regularization).
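
A complementary minimal sketch (the synthetic data is an illustrative assumption): a straight line badly underfits quadratic data, and simply feeding the model a better feature (x squared) fixes it.

```python
# Minimal sketch: an underfit linear model, fixed by feature engineering.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.5, size=200)  # quadratic data

line = LinearRegression().fit(X, y)
print("straight line, train R^2:", round(line.score(X, y), 3))  # low: underfit

X_better = np.hstack([X, X ** 2])  # feature engineering: add an x^2 column
quad = LinearRegression().fit(X_better, y)
print("with x^2 feature, train R^2:", round(quad.score(X_better, y), 3))  # high
```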



Validation and Testing

Why measure generalization error?

• The only way to know if your model is good is to measure performance on new data!
• Split your data into train and test sets: error on the test set is an estimate of generalization error.
• Low training error, high generalization error => overfitting!

Some train-test split heuristics:

• For datasets smaller than O(100k) samples, take 80% for train and hold out 20% for test.
• For larger datasets, O(1m) samples, hold out 1-10% of the dataset for test.
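
A minimal sketch of the 80/20 heuristic with scikit-learn's train_test_split (the dataset and model are illustrative assumptions):

```python
# Minimal sketch: estimating generalization error with a held-out test set.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80% train, 20% held-out test

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("train accuracy:", round(model.score(X_train, y_train), 3))
print("test accuracy:", round(model.score(X_test, y_test), 3))  # generalization estimate
```

A test score far below the train score is the overfitting signature described above.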



Model Selection

• Often need to tune hyperparameters to find a good model within a particular class of models.
• How? Split the training data into a training set and a validation set.
• Validation set too small => might select a "bad" model by mistake.
• Validation set too large => training set too small!
• Cross validation: create lots of small validation sets, evaluate the model on each validation set, and measure average performance across validation sets.
• Always compare tuned models using the test set!
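
A minimal sketch tying these steps together (the model and hyperparameter grid are illustrative assumptions): GridSearchCV performs the cross validation described above, and the held-out test set is used only for the final comparison.

```python
# Minimal sketch: hyperparameter tuning with cross validation (illustrative).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# cv=5: each hyperparameter candidate is scored on 5 small validation sets,
# and the average performance across them decides the winner.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
).fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))  # final check
```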
Model selection process



Thanks!

