
Spring 2025

ECE 490: Introduction to Machine Learning
Chapter 4: Supervised Machine Learning Algorithms

Machine Learning Lifecycle

ECE 490: Introduction to ML 2

Where are we in the lifecycle now?
[Figure: machine learning lifecycle diagram, with the current stage marked "We are here"]


Supervised Learning


Recap: Supervised Learning

Supervised learning is a category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns.


Algorithm vs Model

A model is the outcome of training an algorithm on data; it represents the learned patterns, relationships, or predictions based on the training process.


Algorithm vs Model


Types of Supervised Machine Learning (SML) Applications


Classification vs Regression


Classification

Classification is a type of supervised learning where the goal is to predict the categorical label of new observations based on past observations.


Regression

Regression is another key type of supervised learning that focuses on predicting continuous numerical values rather than categorical labels.


Examples of SML Applications


Optical Character Recognition (OCR)

The model identifies handwritten characters and classifies each image as a character. In this case, we are classifying handwritten digits.


Email Prioritization

The model detects which incoming emails should go to spam and which should go to the primary inbox.


Language Translation

The model takes in a sequence in one language and outputs a sequence carrying the same information in a different language.

Would this be classification or regression?


Language Translation

Language translation is a classification problem because it involves predicting the next word (or token) from a predefined vocabulary, which can be seen as a list of possible "classes." Each word in the vocabulary is effectively a "class" that the model selects based on the input context.
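To make the "word as class" view concrete, here is a minimal sketch, assuming a hypothetical five-word vocabulary and made-up model scores. Choosing the next token reduces to picking the class with the highest probability. The function name `predict_next_token` is illustrative only; real translation models use much larger vocabularies and often search over several candidates rather than taking a single argmax.

```python
import numpy as np

# Hypothetical toy vocabulary: each word is one "class".
vocab = ["the", "cat", "sat", "on", "mat"]


def predict_next_token(scores: np.ndarray, vocab: list) -> str:
    """Classify the next token by picking the most probable vocabulary entry."""
    # Softmax turns raw scores (logits) into one probability per class.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return vocab[int(np.argmax(probs))]


# Made-up scores a model might produce for the next position.
scores = np.array([0.1, 2.3, 0.4, -1.0, 0.7])
print(predict_next_token(scores, vocab))  # cat
```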


Linear Regression


Linear Regression
Linear regression is a statistical method used to model the relationship between a target variable and a feature variable with a line.

Based on the training data points, we try to create a line that best models the relationship between the input feature and the output feature.


Applications of Linear Regression


Linear Regression

Given that our dataset has one input feature and an output feature, let’s plot our data.


Linear Regression

We want to create a line that models the training data points with the lowest possible error. We call this modeled relationship “the best fit line”.

[Figure legend: best fit line; predicted line]


Linear Regression

[Figure: each data point's label compared with its predicted value, plotted against the input feature value]

Linear Regression

We want to create a line that will not only model the current data points, but will also allow us to predict future outputs with high accuracy.

[Figure: predicted value for a new input (a value not seen in the dataset)]

Linear Regression

To get to the best fit line, we will go through multiple iterations of updates.

What is the model updating (or “learning”)?

In the case of creating a line, we update the slope and the y-intercept.


Linear Regression

Since the model is trying to create a best fit line, it is optimizing the equation of a line:

$\hat{y} = w x + b$

where $\hat{y}$ is the predicted value, $x$ is the input feature value, and $w$ and $b$ are the model's trainable parameters.
“Learning” of machine learning algorithms

We call the variables updated during the training process “trainable parameters” or “weights”. In some cases, we also have a “bias” term as a trainable parameter.

- Trainable parameters, because they are “trained” or updated during the learning process.
- Weights, because they carry the importance and contribution of each input value.
- Bias, because it shifts the model's predictions by a constant amount regardless of the input (for a line, the y-intercept), which helps the model fit the data more accurately.


Linear Regression

In the case of linear regression, we have both a weight and a bias in our model:

$\hat{y} = w x + b$

where $b$ is the bias term and $w$ is the weight; together they are the model's trainable parameters.
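As a minimal sketch of the prediction step, assuming NumPy arrays for the inputs, the line equation maps directly to one vectorized expression. The function name `predict` is illustrative; `w` is the weight (slope) and `b` is the bias (y-intercept).

```python
import numpy as np


def predict(x: np.ndarray, w: float, b: float) -> np.ndarray:
    """Linear regression prediction: y_hat = w * x + b (w = weight, b = bias)."""
    return w * x + b


x = np.array([0.0, 1.0, 2.0])
print(predict(x, w=2.0, b=1.0))  # [1. 3. 5.]
```

Because the same formula applies element-wise, one call predicts a value for every data point at once.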
How SML algorithms learn


Learning flow of SML models


SML training process

The training (or learning) process of supervised machine learning algorithms consists of 4 parts:

1. Preparing the training dataset
2. Initializing the algorithm
3. Looping over data points in the training set:
   a. Make a prediction (a class or a number)
   b. Calculate the error of the prediction
4. Updating the algorithm parameters
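The four parts can be sketched for linear regression as follows. This is a minimal sketch under stated assumptions: the data is already prepared, the error is measured with the mean squared error, and step 4 uses a plain gradient descent update (both are covered later in this chapter). The function name and hyperparameter values are hypothetical choices for illustration.

```python
import numpy as np


def train_linear_regression(x, y, lr=0.1, epochs=200, seed=0):
    """Sketch of the 4-part SML training process for y_hat = w * x + b."""
    # 1. Preparing the dataset is assumed done: x and y are numeric 1-D arrays.
    rng = np.random.default_rng(seed)
    # 2. Initialize the trainable parameters randomly.
    w, b = rng.normal(), rng.normal()
    n = len(x)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        # 3. Loop over data points: make a prediction, then compute its error.
        for xi, yi in zip(x, y):
            y_hat = w * xi + b
            error = y_hat - yi
            # Accumulate gradients of the MSE loss (1/n) * sum(error^2).
            grad_w += 2 * error * xi / n
            grad_b += 2 * error / n
        # 4. Update the parameters with one gradient descent step.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


x = np.linspace(0, 1, 20)
y = 3 * x + 0.5  # noise-free points on a known line
w, b = train_linear_regression(x, y, lr=0.5, epochs=2000)
print(round(w, 2), round(b, 2))
```

Here the update happens once per pass over the training set (batch gradient descent), matching the order of the steps above; updating after every single point is a common alternative.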


Preparing the training dataset

The data preparation steps, as covered in the previous chapters, are:

1. EDA and feature tuning
2. Transformation to numerical representation
3. Splitting the dataset for training and testing
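The splitting step can be sketched with NumPy alone: shuffle the indices, then hold out a fraction of the points for testing. This is a minimal sketch; the function name `train_test_split_simple` and the 20% test fraction are assumptions for illustration (scikit-learn's `train_test_split` offers the same idea with more options).

```python
import numpy as np


def train_test_split_simple(x, y, test_fraction=0.2, seed=0):
    """Shuffle indices, then hold out a fraction of points for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


x = np.arange(10, dtype=float)
y = 2 * x
x_train, x_test, y_train, y_test = train_test_split_simple(x, y)
print(len(x_train), len(x_test))  # 8 2
```

Shuffling before splitting avoids a test set that only contains points from one end of the data, for example if the file was sorted.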


Training Linear Regression

We mentioned that linear regression has one bias term and one weight that need to be updated using the training loop. Let’s see how that happens.

First, we will use a dataset of randomly created points.


Training Linear Regression

Since the dataset is random and does not actually hold any information, we don’t need to perform EDA.

We will only pre-process the data by normalizing it.
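The slides do not say which normalization is used, so this minimal sketch assumes z-score normalization (standardization), which shifts a feature to mean 0 and scales it to standard deviation 1. Min-max scaling to [0, 1] is a common alternative. The function name `standardize` is illustrative.

```python
import numpy as np


def standardize(values: np.ndarray) -> np.ndarray:
    """Z-score normalization: shift to mean 0 and scale to standard deviation 1."""
    return (values - values.mean()) / values.std()


x = np.array([2.0, 4.0, 6.0, 8.0])
x_norm = standardize(x)
print(x_norm.mean(), x_norm.std())
```

In practice, the mean and standard deviation are computed on the training split only and then reused to transform the test split, so no information from the test data leaks into training.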


Initializing the algorithm

Every machine learning algorithm holds its learned patterns and connections within its parameters, often referred to as “trainable parameters”.

At the beginning of the training process, these trainable parameters are randomly initialized; they are then updated (“learned”) during training.


Linear Regression Example

So, we will choose random initializations for our trainable parameters.
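A minimal sketch of random initialization, assuming the weight and bias are drawn from a standard normal distribution (the slides do not state the distribution). A fixed seed makes the run reproducible, and the initialized line is generally far from the best fit line.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the run is reproducible
# Draw the weight and the bias from a standard normal distribution (an assumed choice).
w = rng.normal(loc=0.0, scale=1.0)
b = rng.normal(loc=0.0, scale=1.0)


def predict(x, w, b):
    """Prediction of the randomly initialized line y_hat = w * x + b."""
    return w * x + b


x = np.linspace(-1.0, 1.0, 5)
print(f"w={w:.3f}, b={b:.3f}")
print(predict(x, w, b))  # points on the randomly initialized line
```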


Linear Regression Example

Let’s visualize the initialized line.

Looping over data points

For every data point, we feed the data point to the model so it can make a prediction.

This prediction will not be accurate, so it contains some error.

[Figure: for an input data point, the error is the gap between the real value and the predicted value]
Looping over data points

This error, which is the difference between the real and predicted values, should guide how we update the trainable parameters.

The goal is to update the trainable parameters so that the update results in a lower error.


Error calculation

For regression tasks, we have multiple options for calculating the error between the real and predicted values:
1. Mean Bias Error (MBE)
2. Mean Squared Error (MSE)
3. Mean Absolute Error (MAE)
4. Root Mean Squared Error (RMSE)
5. Huber Loss
6. Mean Logarithmic Error (MLE)
7. …


Error calculation - Loss function

When we calculate the error over more than one data point at a time, we use a loss function.

The loss function aggregates the individual errors across the selected data
points, typically by computing the average or sum of the errors using a specified
error function (e.g., MSE, MAE).
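The relationship between an error function and a loss function can be sketched as "apply a per-point error, then aggregate". This minimal sketch assumes NumPy; the names `loss`, `squared_error`, and `absolute_error` are hypothetical helpers, with the mean as the default aggregation and the sum as an alternative.

```python
import numpy as np


def squared_error(y, y_hat):
    """Per-point error function: (y - y_hat)^2."""
    return (y - y_hat) ** 2


def absolute_error(y, y_hat):
    """Per-point error function: |y - y_hat|."""
    return np.abs(y - y_hat)


def loss(y_true, y_pred, error_fn, reduce=np.mean):
    """Loss function: apply error_fn to every point, then aggregate the errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(reduce(error_fn(y_true, y_pred)))


y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(loss(y_true, y_pred, squared_error))           # mean of [1, 0, 4]
print(loss(y_true, y_pred, absolute_error, np.sum))  # sum of [1, 0, 2]
```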


Error Calculation - Mean Bias Error

Measures the average difference between predicted and actual values. It indicates
the direction of the error (positive or negative bias).

- Positive MBE: Overestimation.

- Negative MBE: Underestimation.

Useful for understanding bias in predictions but not ideal as a standalone metric
because it doesn't capture the magnitude of errors.

Error function: $e_i = \hat{y}_i - y_i$
Loss function: $\text{MBE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)$
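A minimal MBE sketch, assuming the sign convention predicted minus actual so that a positive MBE means overestimation, as stated above. The second example shows why MBE alone is not enough: errors of +2 and -2 cancel out to an MBE of zero.

```python
import numpy as np


def mean_bias_error(y_true, y_pred):
    """MBE = mean(y_pred - y_true); positive means overestimation on average."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(y_pred - y_true))


print(mean_bias_error([1.0, 2.0], [2.0, 3.0]))  # 1.0 (overestimates)
print(mean_bias_error([1.0, 5.0], [3.0, 3.0]))  # 0.0 (+2 and -2 cancel out)
```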


Linear Regression Example - MBE

In the linear regression model we initialized earlier, let’s try to get a prediction from
the first data point in the training set and calculate its MBE.


Error Calculation - Mean Squared Error

Measures the average squared differences between predicted and actual values.
Squaring emphasizes larger errors, making it sensitive to outliers.

Use this error when you want to penalize large errors more heavily or when
outliers are meaningful.

Error function: $e_i = (y_i - \hat{y}_i)^2$
Loss function: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
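A minimal MSE sketch in NumPy (the function name is illustrative). Both example predictions below have the same total absolute error of 2.0, but concentrating it in one large miss quadruples the MSE, which is the "penalize large errors" behavior described above.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """MSE = mean((y_true - y_pred)^2); squaring amplifies large errors."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


y_true = [1.0, 2.0, 3.0, 4.0]
print(mean_squared_error(y_true, [1.5, 2.5, 3.5, 4.5]))  # 0.25 (four small errors)
print(mean_squared_error(y_true, [1.0, 2.0, 3.0, 6.0]))  # 1.0 (one large error)
```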


Linear Regression Example - MSE

Using the MSE, we see a significant difference in the error compared to the MBE.

MSE and MBE serve different purposes. MSE is commonly used as the training loss to minimize overall prediction error, while MBE provides insight into whether the model consistently overestimates or underestimates. MBE is a poor loss to minimize on its own, because positive and negative errors cancel out.


Error Calculation - Mean Absolute Error

Measures the average of absolute differences between predicted and actual values. It weights all errors linearly, which makes it more robust to outliers than MSE.

Useful when you want a simple, interpretable metric that is less sensitive to outliers than MSE.

Error function: $e_i = |y_i - \hat{y}_i|$
Loss function: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$
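A minimal MAE sketch using the same two example predictions one would use to illustrate MSE: four small errors versus one large error with the same total. Unlike MSE, MAE gives both cases the same value, since every unit of error counts equally. The function name is illustrative.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE = mean(|y_true - y_pred|); every unit of error counts the same."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


y_true = [1.0, 2.0, 3.0, 4.0]
print(mean_absolute_error(y_true, [1.5, 2.5, 3.5, 4.5]))  # 0.5
print(mean_absolute_error(y_true, [1.0, 2.0, 3.0, 6.0]))  # 0.5
```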


Linear Regression Example - MAE

Unlike MSE, the MAE provides a more robust measure of error by treating all deviations equally, without disproportionately penalizing large errors. This characteristic makes MAE less sensitive to outliers and provides a more balanced reflection of typical errors in the model.


Error Calculation - Root Mean Squared Error

The square root of MSE. It expresses the error in the same units as the target variable, making it more interpretable than MSE.
Useful when you want a metric on the same scale as the target variable while still penalizing large errors more heavily.
Suppose you're predicting house prices in dollars. RMSE provides an error value (e.g., $5,000) that is also in dollars. This tells you that a typical prediction is roughly $5,000 off from the actual value, with large misses weighted more heavily than in a plain average.

Error function: $e_i = \sqrt{(y_i - \hat{y}_i)^2}$
Loss function: $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
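A minimal RMSE sketch, using hypothetical house prices in dollars so the result is also in dollars. The last line shows that for a single data point, RMSE reduces to the absolute error, which is why RMSE and MAE coincide in the one-point example.

```python
import numpy as np


def root_mean_squared_error(y_true, y_pred):
    """RMSE = sqrt(MSE); reported in the same units as the target."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical house prices in dollars.
prices = [200_000.0, 310_000.0, 150_000.0]
predicted = [204_000.0, 307_000.0, 150_000.0]
print(root_mean_squared_error(prices, predicted))  # error in dollars
# With a single data point, RMSE equals the absolute error:
print(root_mean_squared_error([10.0], [7.0]))  # 3.0
```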


Linear Regression Example - RMSE

In this case, we got the same value for RMSE and MAE. This makes sense because we are calculating the error for a single data point.

Both are expressed in the same units as the target variable, but RMSE can
sometimes overemphasize large errors, which might distort the perception of the
model's performance.


Error Calculation - Huber Loss

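Combines the properties of MSE and MAE. It behaves like MSE for small errors and switches to MAE for large errors, making it robust to outliers while maintaining sensitivity to small errors.

Error function: $e_i = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \le \delta \\ \delta\left(|y_i - \hat{y}_i| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$
Loss function: $L_\delta = \frac{1}{n}\sum_{i=1}^{n} e_i$

You set the threshold δ according to the scale of the errors in your data: errors larger than δ are treated as outliers and penalized linearly.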
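A minimal sketch of the mean Huber loss with the piecewise form above, assuming NumPy and a default δ of 1.0 (a hypothetical choice). `np.where` selects the quadratic branch for small errors and the linear branch for large ones.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2                # MSE-like branch for small errors
    linear = delta * (abs_err - 0.5 * delta)  # MAE-like branch for large errors
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))


print(huber_loss([0.0], [0.5], delta=1.0))  # 0.125 (small error: 0.5 * 0.5^2)
print(huber_loss([0.0], [3.0], delta=1.0))  # 2.5 (large error: 1 * (3 - 0.5))
```

The two branches meet at |error| = δ with the same value and slope, so the loss stays smooth where it switches from MSE-like to MAE-like behavior.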


Linear Regression Example - Huber Loss

In our case, since the absolute error is greater than δ, we applied the linear (modified MAE) branch of the Huber loss.


Error Calculation - MLE

Measures the error logarithmically, which reduces the impact of large errors. This metric is useful when the target values vary over several orders of magnitude.

Useful when handling data with widely varying scales, or when large errors are undesirable but should not dominate the metric.

[Equations: MLE error function and loss function]
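The exact MLE formula is not recoverable from the slide, so this minimal sketch assumes one common form: the squared difference of log(1 + y) values, averaged over points (often called mean squared logarithmic error, MSLE). The function name is hypothetical. The example shows the "relative error" behavior: doubling (1 + y) gives the same error at small and large scales, while MSE grows 100 times.

```python
import numpy as np


def mean_log_error(y_true, y_pred):
    """Assumed form: mean((log(1 + y_true) - log(1 + y_pred))^2), i.e. MSLE.

    Requires all values to be greater than -1 (typically non-negative targets).
    """
    log_true = np.log1p(np.asarray(y_true, dtype=float))
    log_pred = np.log1p(np.asarray(y_pred, dtype=float))
    return float(np.mean((log_true - log_pred) ** 2))


small = mean_log_error([9.0], [19.0])    # (1 + y) goes from 10 to 20
large = mean_log_error([99.0], [199.0])  # (1 + y) goes from 100 to 200
print(small, large)  # both equal (ln 2)^2, about 0.48
```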


Linear Regression Example - MLE

As you can see, MLE tends to be much smaller than other error metrics like MSE, MAE, or RMSE. This is because it focuses on relative error rather than on the absolute size of outliers.


Regression Error Functions Recap


Updating Algorithm Trainable Parameters

Now, the algorithm is initialized, and we have made a prediction (or a set of predictions) and calculated its error using the chosen loss function.

We want to use this error to update the trainable parameters.

Intuitively, we want to update the trainable parameters in a way that minimizes the error.

This process is performed using a family of algorithms called optimization algorithms (optimizers).


Optimization Algorithms - Gradient Descent
For now, we will focus on the most popular optimization algorithm, Gradient Descent, which can be used to train any machine learning model whose loss is differentiable with respect to its trainable parameters.
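The core of gradient descent is one repeated rule: move each parameter a small step against the gradient of the loss, θ ← θ − α·∇L(θ), where α is the learning rate. This minimal sketch applies that rule to a one-parameter example objective, f(θ) = (θ − 3)², whose minimum is at θ = 3. The function name, learning rate, and step count are illustrative assumptions.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: theta <- theta - lr * grad(theta)."""
    theta = start
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta


# Example objective: f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
minimum = gradient_descent(lambda t: 2.0 * (t - 3.0), start=0.0)
print(round(minimum, 4))  # 3.0
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8; a learning rate that is too large for the objective can instead overshoot and diverge.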