
1

Chapter 19: Machine Learning
Eng. Abdulrazak A. Dirie
Learning from Examples: Machine Learning 2

Machine Learning
 Learning: Improve performance after making observations about the world. That is, learn what works
and what doesn't to get closer to optimal decisions.
 How do we learn a model that makes better decisions from data/experience?
 Supervised Learning: Learn a function (model) that maps inputs to outputs from a
training set. Examples:
 Use a naïve Bayes classifier to distinguish between spam/no spam
 Learn a playout policy to simulate games (current board -> good move)
 Unsupervised Learning: Organize data (e.g., clustering, embedding)
 Deep Learning: Learn complex functions with multi-layer neural networks (covered later in this chapter)
 Reinforcement Learning: Learn from rewards/punishments (e.g., winning a game)
obtained via interaction with the environment over time.
Supervised Learning 3
 Examples
 We assume there exists a target function f that produces iid (independent and identically distributed) examples,
possibly with noise and errors.
 Examples are observed input-output pairs (x, y) with y = f(x), where x is a vector called the feature vector.

 Learning problem
 Given a hypothesis space H of representable models,
 find a hypothesis h ∈ H such that h(x) ≈ y for the examples.
 That is, we want to approximate f by h.

 Supervised learning includes
 Classification (outputs = class labels). E.g., x is an email and y is spam / ham.
 Regression (outputs = real numbers). E.g., x is a house and y is its selling price.
Consistency vs. Simplicity 4

 Example: Univariate curve fitting (regression, function approximation)

[Figure: training examples (points) and learned models (lines) of increasing complexity; a straight line is very simple but not very consistent with the data, while a high-degree curve fits the examples closely.]

 Consistency: the hypothesis agrees with the observed examples, i.e., h(x) ≈ y.
 Simplicity: small number of model parameters.
Measuring Consistency using Loss 5

 Goal of learning: Find a hypothesis h that makes predictions ŷ = h(x) that are consistent with the examples (x, y).
That is, h(x) ≈ y.

 Measure mistakes: loss function l(y, ŷ)
 Absolute-value loss: l₁(y, ŷ) = |y − ŷ|   (for regression)
 Squared-error loss: l₂(y, ŷ) = (y − ŷ)²   (for regression)
 0/1 loss: l₀/₁(y, ŷ) = 0 if y = ŷ, otherwise 1   (for classification)
 Log loss, cross-entropy loss, and many others…

 Empirical loss: average loss over the N examples in the dataset:
L(h) = (1/N) Σᵢ l(yᵢ, h(xᵢ))   (a small code sketch of these losses follows this slide)
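The loss definitions above map directly onto code. Below is a minimal NumPy sketch (my own illustration, not part of the original slides); all function and variable names are assumptions.

```python
# Minimal sketch of the loss functions above (illustrative; names are my own).
import numpy as np

def absolute_loss(y_true, y_pred):
    return np.abs(y_true - y_pred)              # l1(y, y_hat) = |y - y_hat|

def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2               # l2(y, y_hat) = (y - y_hat)^2

def zero_one_loss(y_true, y_pred):
    return (y_true != y_pred).astype(float)     # 0 if correct, 1 if wrong

def empirical_loss(loss_fn, y_true, y_pred):
    # Average loss over the N examples in the dataset.
    return float(np.mean(loss_fn(y_true, y_pred)))

y_obs = np.array([3.0, 5.0, 7.0])               # observed outputs y_i
y_hat = np.array([2.5, 5.5, 8.0])               # predictions h(x_i)
print(empirical_loss(squared_loss, y_obs, y_hat))   # 0.5
```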
Learning a Consistent Model by Minimizing the Loss 6

 Empirical loss: L(h) = (1/N) Σᵢ l(yᵢ, h(xᵢ))

 Find the best hypothesis h* that minimizes the loss: h* = argmin_{h ∈ H} L(h)

 Reasons why h* can still differ from the true target function f:
 a) Realizability: f may not be in the hypothesis space H.
 b) f is nondeterministic or examples are noisy.
 c) It is computationally intractable to search all of H,
so we use a non-optimal heuristic.
The Bayes Classifier 7

 For 0/1 loss, the empirical loss is minimized by the model that predicts for each x the most likely class using MAP
(maximum a posteriori) estimates: ŷ = argmax_y P(y | x). This is called the Bayes classifier.

 Optimality: The Bayes classifier is optimal for 0/1 loss. It is the most consistent classifier possible, with the
lowest possible error, called the Bayes error rate. No better classifier is possible!

 Issue: The classifier requires P(y | x) to be learned from the examples.
 It needs the complete joint probability distribution, which in the general case requires a probability table with one entry
for each possible value of the feature vector x (a toy illustration follows this slide).
 This is impractical (unless a simple Bayes network exists), so most classifiers try to approximate the Bayes
classifier using a simpler model with fewer parameters.
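As a toy illustration (my own, not from the slides), the sketch below stores P(x, y) explicitly for a single binary feature and predicts argmax_y P(y | x). With many features the table needs one entry per possible feature vector, which is exactly what makes the exact Bayes classifier impractical. The probabilities are made up.

```python
# Toy Bayes classifier from an explicit joint probability table (made-up numbers).
from collections import defaultdict

# P(x, y) for one binary feature ("is the email in all caps?") and classes spam/ham.
joint = {
    (True,  "spam"): 0.25, (True,  "ham"): 0.05,
    (False, "spam"): 0.10, (False, "ham"): 0.60,
}

def bayes_classify(x):
    # P(y | x) is proportional to P(x, y), so pick the class with the largest joint probability.
    scores = defaultdict(float)
    for (feature_value, y), p in joint.items():
        if feature_value == x:
            scores[y] += p
    return max(scores, key=scores.get)

print(bayes_classify(True))    # -> 'spam'
print(bayes_classify(False))   # -> 'ham'
```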
Simplicity 8

 Ease of use
 Simpler hypotheses have fewer model parameters to estimate and store.

 Generalization: How well does the hypothesis perform on new data?
 We do not want the model to be too specific to the training examples (an issue called overfitting).
 Simpler models typically generalize better to new examples.

 How to achieve simplicity? (A regularized-loss sketch follows this list.)
 a) Model bias: Restrict H to simpler models (e.g., assumptions like independence, only consider linear
models).
 b) Feature selection: use fewer variables from the feature vector.
 c) Regularization: penalize the model for its complexity (e.g., number of parameters) by minimizing
the loss plus a penalty term: L(h) + λ · Complexity(h). Too little penalty risks overfitting.
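A minimal sketch (my own illustration) of option (c): a squared-error loss plus an L2 penalty on the weights of a linear model; lambda_ controls how strongly complexity is penalized and is an assumption.

```python
# Regularized empirical loss: data loss + penalty term (illustrative sketch).
import numpy as np

def regularized_loss(w, X, y, lambda_=0.1):
    data_loss = np.mean((y - X @ w) ** 2)    # consistency with the data
    penalty = lambda_ * np.sum(w ** 2)       # penalty term for model complexity
    return data_loss + penalty
```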
Model Selection: Bias vs. Variance 9

[Figure: learned models h (lines) fitted to two different samples (points) drawn from the same function, arranged from simpler models (left) to models that are more consistent with the data (right).]

 Bias: error caused by restrictions imposed by the model class. Simpler models have high bias; flexible models have low bias.
 Variance: difference in the learned model due to slightly different data. Simpler models have low variance; flexible models have high variance.
 This is a tradeoff (the bias-variance tradeoff).
The Dataset 10

 Rows: the examples (also called instances or observations).
 Columns: the feature vector (features, variables, attributes) plus the class label.

[Table: an example dataset with one row per example; the feature columns include attributes such as Alternate, Hungry, Patrons, Reservation, and WaitEstimate, followed by a class label column.]
Find a hypothesis (called “model”) to predict the class given the features.
Feature Engineering 11

 Add information sources as new variables to the model.
 Add derived features that help the classifier.
 Embedding: E.g., convert words to vectors, where similarity
between vectors reflects semantic similarity.

 Example for spam detection: In addition to the words themselves, consider features like the following (a small feature-extraction sketch follows this slide):


 Have you emailed the sender before?
 Have 1000+ other people just gotten the same email?
 Is the header information consistent?
 Is the email in ALL CAPS?
 Do inline URLs point where they say they point?
 Does the email address you by (your) name?

 Feature Selection: Which features should be used in the model is a
model selection problem (choose between models with different features).
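A small sketch (my own; the function and feature names are made up) of turning some of the spam questions above into derived features:

```python
# Derive extra spam features from a raw email (illustrative; names are my own).
def spam_features(email_text: str, sender: str, known_senders: set) -> dict:
    words = email_text.split()
    return {
        "num_words": len(words),
        "all_caps_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
        "known_sender": sender in known_senders,   # "Have you emailed the sender before?"
        "contains_url": "http" in email_text.lower(),
    }

print(spam_features("WIN a FREE prize NOW", "stranger@example.com", {"friend@example.com"}))
```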
12

Training and Testing
Model Evaluation (Testing) 13

The model was trained on the training examples. We want to test how well the model
will perform on new examples (i.e., how well it generalizes to new data).

 Testing loss: Calculate the empirical loss for predictions on a testing data set that
is different from the data used for training.

 For classification we often use the accuracy measure, the proportion of correctly
classified test examples (a small sketch follows this slide):

accuracy = (1/N) Σᵢ 1(ŷᵢ = yᵢ)

where 1(·) is an indicator function returning 1 if ŷᵢ = yᵢ and otherwise 0.
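A one-function sketch (my own) of the accuracy measure defined above:

```python
# Accuracy = mean of the indicator 1(y_hat_i == y_i) over the test examples.
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(accuracy(["spam", "ham", "ham"], ["spam", "spam", "ham"]))   # ~0.67
```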


Training a Model 14

 Models are "trained" (learned) on the training data. This involves estimating:

 1. Model parameters (the model): E.g., probabilities, weights, factors.

 2. Hyperparameters: Many learning algorithms have choices for learning rate,
regularization strength, maximal decision tree depth, selected features, ... The algorithm tries
to optimize the model parameters given user-specified hyperparameters.

 We need to tune the hyperparameters!

[Figure: the dataset is split into training data and test data.]
Hyperparameter Tuning/Model Selection 15

1. Hold a validation data set back from the training data.

2. Learn models using the training set with different hyperparameters. Often a
grid of possible hyperparameter combinations or some greedy search is used.

3. Evaluate the models using the validation data and choose the model with the
best accuracy. Selecting the right type of model, hyperparameters, and features is
called model selection.

4. Learn the final model with the chosen hyperparameters using all training data
(including the validation data). (A small tuning sketch follows this slide.)

 Notes:
 The validation set was not used for training, so we get generalization accuracy for the
different hyperparameter settings.
 If no model selection is necessary, then no validation set is used.

[Figure: the training data is further split into training data and validation data; the test data stays separate.]
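A minimal sketch (my own, with toy data) of steps 1-4 using scikit-learn. The model choice (k-nearest neighbors) and the candidate hyperparameter values are assumptions, and X, y play the role of the training + validation data; the test set is assumed to be held out elsewhere.

```python
# Validation-set hyperparameter tuning (illustrative sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = np.random.rand(200, 4), np.random.randint(0, 2, 200)   # toy training+validation data

# 1. Hold a validation set back from the training data.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. + 3. Train with different hyperparameters and pick the best on the validation set.
best_k, best_acc = None, -1.0
for k in [1, 3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_k, best_acc = k, acc

# 4. Retrain the final model with the chosen hyperparameter on all training data.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)
```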
Testing a Model 16

 After the model is selected, the final model is evaluated against the
test set to estimate the final model accuracy.

 Very important: never "peek" at the test set during training!

[Figure: training data and test data are kept strictly separate.]
How to Split the Dataset 17

 Random splits: Split the data randomly into, e.g.,
60% training, 20% validation, and 20% testing.

 Stratified splits: Like random splits, but balance classes and other properties of the
examples.

 k-fold cross validation: Uses the training & validation data better (a sketch follows this slide).
 Split the training & validation data randomly into k folds.
 For k rounds, hold one fold back for evaluation and use the remaining folds for training.
 Use the average error/accuracy as a better estimate.
 Some algorithms/tools do this internally.

 LOOCV (leave-one-out cross validation): used if very little data is available.
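A minimal k-fold cross-validation sketch (my own, with toy data) using scikit-learn's KFold; the model choice is an assumption.

```python
# 5-fold cross validation: average validation accuracy over the folds (illustrative).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = np.random.rand(100, 3), np.random.randint(0, 2, 100)   # toy data

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = GaussianNB().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))

print(np.mean(scores))   # average accuracy over the 5 folds
```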
Learning Curve: The Effect of Training Data Size 18

[Figure: accuracy of a classifier as the amount of available training data increases.]

 More data is better!
 At some point the learning curve flattens out and more data does not contribute much.
Comparing to Baselines 19

 First step: get a baseline
 Baselines are very simple straw-man models.
 They help to determine how hard the task is.
 They help to find out what a good accuracy is.

 Weak baseline: The most-frequent-label classifier (a small sketch follows this slide).
 Gives all test instances whatever label was most common in the training set.
 Example: For spam filtering, give every message the label "ham."
 Accuracy might be very high if the problem is skewed (called class imbalance).
 Example: If calling everything "ham" already gets 66% right, then a classifier that gets 70% isn't very
good…

 Strong baseline: For research, we typically compare to the previously published
state-of-the-art as a baseline.
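A tiny sketch (my own) of the weak most-frequent-label baseline; scikit-learn offers the same idea as DummyClassifier(strategy="most_frequent").

```python
# Most-frequent-label baseline: predict the majority training label for everything.
from collections import Counter

def most_frequent_label_baseline(train_labels, n_test):
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test

print(most_frequent_label_baseline(["ham", "ham", "spam"], n_test=4))
# -> ['ham', 'ham', 'ham', 'ham']
```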
20

Types of Models
 Regression: predict a number
 Classification: predict a label
Regression: Linear Regression 21

 Model: ŷ = h_w(x) = wᵀx
 Empirical loss: squared-error loss over the whole data matrix X: L(w) = ‖y − Xw‖²
 Gradient: ∇L(w), a vector of partial derivatives ∂L/∂wⱼ
 Find: w* such that ∇L(w*) = 0

 Gradient descent: repeatedly update w ← w − α ∇L(w), where α is the learning rate.

 Analytical solution: w* = (XᵀX)⁻¹Xᵀy, using the pseudoinverse of X.
(A small sketch of both approaches follows this slide.)
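A NumPy sketch (my own, with synthetic data) of both solution strategies above: the analytical pseudoinverse solution and gradient descent on the squared-error loss. The learning rate and iteration count are arbitrary choices.

```python
# Linear regression: analytical solution vs. gradient descent (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.random(50)]                     # design matrix with a bias column
y = X @ np.array([2.0, 3.0]) + 0.1 * rng.standard_normal(50)

# Analytical solution using the pseudoinverse of X.
w_closed = np.linalg.pinv(X) @ y

# Gradient descent on L(w) = (1/N)||y - Xw||^2; gradient is -(2/N) X^T (y - Xw).
w, alpha = np.zeros(2), 0.01                               # alpha = learning rate
for _ in range(5000):
    w -= alpha * (-2 / len(y)) * (X.T @ (y - X @ w))

print(w_closed, w)   # both should be close to the true weights [2.0, 3.0]
```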
Naïve Bayes Classifier 22

 Approximates a Bayes classifier with the naïve independence assumption that all features are
conditionally independent given the class:
ŷ = argmax_y P(y) ∏ⱼ P(xⱼ | y)

 The priors P(y) and the conditionals P(xⱼ | y) are estimated from the data by counting
(a small sketch follows this slide).

 Gaussian Naïve Bayes classifiers extend the approach to continuous features by assuming each
P(xⱼ | y) is a normal (Gaussian) distribution. The parameters of the normal distribution are estimated from the
data.
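A minimal sketch (my own) of a naïve Bayes classifier for binary features, estimating P(y) and P(xⱼ | y) by counting; no smoothing is applied, to keep the sketch short.

```python
# Naive Bayes for binary features: estimate probabilities by counting (illustrative).
import numpy as np

def train_nb(X, y):
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}               # P(y)
    likelihoods = {c: X[y == c].mean(axis=0) for c in classes}   # P(x_j = 1 | y)
    return priors, likelihoods

def predict_nb(x, priors, likelihoods):
    def score(c):
        p = likelihoods[c]
        return priors[c] * np.prod(np.where(x == 1, p, 1 - p))   # P(y) * prod_j P(x_j | y)
    return max(priors, key=score)

X = np.array([[1, 1], [1, 0], [0, 0], [0, 1]])
y = np.array(["spam", "spam", "ham", "ham"])
priors, likelihoods = train_nb(X, y)
print(predict_nb(np.array([1, 1]), priors, likelihoods))   # -> 'spam'
```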
Decision Trees 23

 A sequence of decisions represented as a tree.

 Many implementations that differ by
 How are features selected to split?
 When to stop splitting?
 Is the tree pruned?

 Approximates a Bayes classifier by predicting the most common class among the training
examples that fall into the same leaf (i.e., the same region of the feature space) as x.
(A small sketch follows this slide.)
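A small sketch (my own, with made-up data and feature names) of fitting and inspecting a decision tree with scikit-learn; max_depth is one way to control when splitting stops.

```python
# Decision tree classifier on a tiny toy dataset (illustrative sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
y = np.array(["wait", "wait", "leave", "leave"])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Hungry", "Patrons"]))   # the learned splits
print(tree.predict([[1, 1]]))                                   # -> ['wait']
```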


K-Nearest Neighbors Classifier 24

 The class is predicted as the majority class in the set of the k nearest neighbors; k is a hyperparameter.
Larger k smooths the decision boundary.
 Neighbors are found using a distance measure (e.g., Euclidean distance between points).
 Approximates a Bayes classifier by estimating P(y | x) as the fraction of the k nearest neighbors
of x that have class y. (A small sketch follows this slide.)
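A minimal sketch (my own) of k-nearest-neighbor prediction with Euclidean distance; k and the toy data are placeholders.

```python
# k-NN: predict the majority class among the k nearest training points (illustrative).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    distances = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances to x
    nearest = np.argsort(distances)[:k]                     # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority class

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["ham", "ham", "spam", "spam"])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))   # -> 'spam'
```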
Support Vector Machine (SVM) 25

[Figure: a linear decision boundary with the maximum margin; the margin is defined by the support vectors.]

 A linear classifier that finds the maximum-margin separator using only the points that are "support
vectors" and quadratic optimization.
 The kernel trick can be used to learn non-linear decision boundaries (a small sketch follows this slide).
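A short sketch (my own) using scikit-learn's SVC: the RBF kernel illustrates the kernel trick on a toy XOR-like dataset that no linear boundary can separate. The C and gamma values are arbitrary assumptions.

```python
# SVM with an RBF kernel on a non-linearly-separable toy problem (illustrative).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR-like pattern
y = np.array([0, 1, 1, 0])

model = SVC(kernel="rbf", C=1.0, gamma=2.0).fit(X, y)
print(model.predict([[0.9, 0.1]]))   # -> [1]
print(model.support_vectors_)        # the points that define the margin
```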
Artificial Neural Networks/Deep Learning 26
[Figure: a computational graph with an input layer, a hidden layer, and an output layer; for classification,
the output layer typically uses a softmax activation function returning P(y | x). A single perceptron computes
a weighted sum with a bias term followed by a non-linear activation function.]

 Represent h as a network of weighted sums with non-linear activation functions g (e.g., logistic, ReLU).
 Learn the weights from examples using backpropagation of prediction errors (gradient descent).
 ANNs are universal approximators: large networks can approximate any function (no bias).
Regularization is typically used to avoid overfitting.
 Deep learning adds more hidden layers and layer types (e.g., convolution layers) for better learning.
(A small forward-pass sketch follows this slide.)
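A minimal NumPy sketch (my own) of the forward pass of a one-hidden-layer network: weighted sums plus a bias, a non-linear activation g (ReLU here), and a softmax output. The weights are random for illustration; in practice they are learned with backpropagation as described above.

```python
# Forward pass of a tiny one-hidden-layer network (illustrative sketch).
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)   # 4 input features -> 3 hidden units
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)   # 3 hidden units -> 2 classes

x = np.array([0.5, -1.0, 2.0, 0.1])                 # one input example
hidden = relu(W1 @ x + b1)                          # hidden layer activations
probs = softmax(W2 @ hidden + b2)                   # class probabilities P(y | x)
print(probs)
```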
Other Models and Methods 27

 Many other models and often-used methods exist:

• Generalized linear model (GLM): This important model family includes linear regression and the
popular classification method logistic regression.
• Regularization: enforce simplicity by using a penalty for complexity.
• Kernel trick: Let a linear classifier learn non-linear decision boundaries (= a linear boundary in a
high-dimensional space).
• Ensemble Learning: Use many models and combine the results (e.g., random forest, boosting).
• Embedding and Dimensionality Reduction: Learn how to represent data in a simpler way.
Some Use Cases of ML for Intelligent Agents 28

 Learn Actions
• Directly learn the best action from examples.
• Such a model can also be used as a playout policy for Monte Carlo tree search, with data from self-play.

 Learn Heuristics
• Learn evaluation functions for states.
• Can learn a heuristic for minimax search from examples.

 Perception
• Natural language processing: Use deep learning / word embeddings / language models to understand
concepts, translate between languages, or generate text.
• Speech recognition: Identify the most likely sequence of words.
• Vision: Object recognition in images/videos. Generate images/video.

 Compressing Tables
• Neural networks can be used as a compact representation of tables that do not fit in memory, e.g., a
joint probability table or a state utility table.
• The tables can be learned from data.

 Bottom line: Learning a function is often more effective than hard-coding it.
 However, we do not always know how it performs in very rare cases!
Conclusion 29

 Machine learning
 Supervised learning (k-nearest neighbor, linear regression, ANN)
 Unsupervised learning (clustering, …)
 Deep learning (ANN, CNN, RNN, …)
 Reinforcement learning (Q-table, …)
 Practice of supervised learning:
 Linear regression
 Support vector machine
 Ensemble learning
 Decision tree
 Naïve Bayes classifier
30

END
