Lecture 1.2 Basics and Prerequisites

The document outlines a Machine Learning course led by Dr. Mohamed-Rafik Bouguelia at Halmstad University, focusing on the basics and prerequisites of machine learning, including terminology, dataset notations, and model parameters. It covers concepts such as feature vectors, training datasets, cost functions, and linear algebra fundamentals necessary for understanding machine learning algorithms. Additionally, it emphasizes the importance of learning and optimizing model parameters to minimize prediction errors.


Machine Learning Course

Basics and Prerequisites


Terminology, definitions and a review of some math notions

Dr. Mohamed-Rafik Bouguelia


[email protected]

Halmstad University, Sweden.


You can also watch the video corresponding to this lecture at: https://youtu.be/91siCik-b7o
Notations related to datasets
Notations related to datasets

[Figure: two example datasets. Left: a scatter plot of house price vs. house size (one-dimensional data, 1 feature). Right: a scatter plot of uniformity of cells vs. age (two-dimensional data, 2 features), where each point has a health state among 3 classes: malignant cancer, benign cancer, or healthy (no cancer).]

• The points are the data-points (also called feature-vectors, examples, instances, or observations).
• The quantities on the axes are the features (also called attributes / variables).
• The health state is the output variable (also called target variable).
Notations related to datasets
• Assume we have a set of n houses.

• Each house x^(i) is characterized by:

1. its size
2. its number of rooms
3. its location (distance from the city center)

• This is 3-dimensional data (we have d = 3 features). So, each data-point x^(i) ∈ ℝ³ is represented by a feature-vector.

• Let x_j^(i) be the j-th feature value of the i-th house.
 First house: x_1^(1) = 80, x_2^(1) = 3, x_3^(1) = 4
 Second house: x_1^(2) = 20, x_2^(2) = 2, x_3^(2) = 3
 …
 The whole data is represented as a matrix of n rows and d columns (here d = 3 features).
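As a sketch of this notation with NumPy (using the feature values from the example above), the dataset is an n × d array, where row i is the feature-vector x^(i) and entry (i, j) is x_j^(i):

```python
import numpy as np

# Each row is one house x^(i); columns are the d = 3 features:
# size, number of rooms, distance from the city center.
X = np.array([
    [80.0, 3.0, 4.0],   # first house:  x_1^(1)=80, x_2^(1)=3, x_3^(1)=4
    [20.0, 2.0, 3.0],   # second house: x_1^(2)=20, x_2^(2)=2, x_3^(2)=3
])

n, d = X.shape          # n rows (houses), d columns (features)
x_1 = X[0]              # the feature-vector x^(1) of the first house
x_3_1 = X[0, 2]         # x_3^(1): the 3rd feature of the 1st house
```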
Notations related to datasets
• We want to train a supervised ML algorithm to predict the price of new houses.

• We first need to prepare a training dataset, which consists of:

– The input data
– The real price (output) y^(i) associated with each training data-point x^(i)

• NOTE: These real prices are given to teach (or supervise) the algorithm, so that it learns (or models) the relation between “the features that characterize the input data” and the “desired output” (price).

• The i-th house has a price y^(i) (a scalar value) and is characterized by a feature-vector x^(i). So, our training dataset is: (x^(1), y^(1)), …, (x^(n), y^(n))

• It can also be represented as a matrix X and a vector of prices y.
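A minimal sketch of this pairing in NumPy (the prices below are hypothetical values, chosen only for illustration):

```python
import numpy as np

# Input data: one row per house (size, rooms, distance), as before.
X = np.array([
    [80.0, 3.0, 4.0],
    [20.0, 2.0, 3.0],
])

# Real prices y^(i) associated with each row of X (hypothetical values).
y = np.array([300.0, 150.0])

# The training dataset pairs each data-point x^(i) with its output y^(i).
training_set = list(zip(X, y))
```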


Notations related to models
Notations related to models
• The model (to be learned) is a function h_θ (called hypothesis).
• The model has a parameters vector θ.

• Learning (or training) means finding the optimal parameters on a given dataset.

• In this example, as we have one feature (house size), the input x is a scalar value.
• h_θ(x) is the predicted price for the input x using the model h_θ.

[Figure: house price vs. house size, with the hypothesis (model) drawn as a line through the data-points.]
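A minimal sketch of this one-feature hypothesis in Python (the parameter values are arbitrary, for illustration only):

```python
# Hypothesis h_theta(x) = theta_0 + theta_1 * x, for a single scalar feature x.
def h(x, theta0, theta1):
    return theta0 + theta1 * x

# Predicted price for a house of size 50, with arbitrary parameters:
price = h(50.0, 10.0, 2.0)   # 10 + 2*50 = 110.0
```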
Notations related to models

[Figure: the line of the model, with intercept θ_0 and slope θ_1 = a/b, where b is a horizontal step along the line and a is the corresponding vertical step.]

How to choose θ_0 and θ_1?  We will see this in the next lecture.
Notations related to models

• In this example, the input x is a two-dimensional vector (i.e., x ∈ ℝ²), as we have two features:

1. house size
2. number of rooms

• h_θ(x) is the predicted price for the input x using the model h_θ.
Notations related to models

• How would you write the equation of h_θ(x) in a more compact format (using vectors)?

Help:
• The dot product between two vectors u and v of the same dimension is a scalar value:

u^T v = Σ_i u_i v_i = u_0 v_0 + u_1 v_1 + ⋯
Notations related to models

• We redefine the input x by adding 1 as the first element. Then we can just use the dot product between θ and x:

h_θ(x) = θ^T x = θ_0 + θ_1 x_1 + θ_2 x_2
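This compact form can be sketched with NumPy: prepend 1 to the input, then h_θ(x) is a single dot product (parameter values arbitrary):

```python
import numpy as np

theta = np.array([10.0, 2.0, 5.0])      # theta_0, theta_1, theta_2

def h(x, theta):
    x = np.concatenate(([1.0], x))      # redefine x by adding 1 as the first element
    return theta @ x                    # h_theta(x) = theta^T x

# For x = (size=50, rooms=3): 10 + 2*50 + 5*3 = 125
pred = h(np.array([50.0, 3.0]), theta)
```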
Learning is estimating the parameters of the model
• Learning (training) means finding the parameters that minimize the cost (error).
The cost function (or error)
• Given a dataset: (x^(1), y^(1)), …, (x^(n), y^(n))

• The cost E(θ) of a model h_θ on this dataset is:

E(θ) = (1 / 2n) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²

where h_θ(x^(i)) is the predicted output for the data-point x^(i) (e.g. the predicted price of the i-th house), and y^(i) is the true output for x^(i) (e.g. the true price of the i-th house).

• NOTE: The error function is also sometimes called “cost function” or “loss function”.
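A sketch of this cost in NumPy, assuming the squared-error form with a 1/(2n) factor; the data values below are arbitrary, for illustration only:

```python
import numpy as np

def cost(X, y, theta):
    """E(theta) = 1/(2n) * sum_i (h_theta(x^(i)) - y^(i))^2,
    where X already has a column of ones prepended."""
    n = len(y)
    predictions = X @ theta              # h_theta(x^(i)) for all data-points
    errors = predictions - y             # predicted output minus true output
    return (errors ** 2).sum() / (2 * n)

# Tiny example: column of ones + one feature (house size).
X = np.array([[1.0, 50.0],
              [1.0, 20.0]])
y = np.array([110.0, 50.0])
theta = np.array([10.0, 2.0])
E = cost(X, y, theta)                    # perfect fit here: E = 0.0
```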
Notations to remember
• x^(i) ∈ ℝ^d : the i-th data-point (or feature-vector). It is a d-dimensional vector.

• x_j^(i) ∈ ℝ : the value of the j-th feature (or attribute, or variable) in the data-point x^(i).

• y^(i) : the value of the output variable (or target variable) for the i-th data-point.
y^(i) ∈ ℝ in regression, and y^(i) ∈ {…} (a finite set of classes) in classification.

• X ∈ ℝ^{n×d} : a dataset represented as a matrix of n rows and d columns.

• θ ∈ ℝ^p : a vector representing the model parameters. It has p parameters.
Sometimes also called the weights vector.

• h_θ : a model (hypothesis function) with parameters vector θ.

• h_θ(x) : the output predicted by the model h_θ for the data-point x.

• E(θ) : the cost (or loss, or error) of a model h_θ on some dataset.

Some basics of linear algebra
Matrices and Vectors

• Example: the matrix A has a dimension of 4 × 2.

• A vector is simply an n × 1 matrix.

• u_i is the i-th element of the vector u.

Matrix addition

Scalar multiplication

Combination of operations

Matrix-vector multiplication
• Multiplying a 3 × 2 matrix by a 2 × 1 vector gives a 3 × 1 vector.
• Each element of the result is just a dot product between two vectors: a row of the matrix and the vector.
Matrix-vector multiplication
• Example: to predict the outputs of all data-points in a dataset using a linear model h_θ, just multiply the dataset matrix by the vector of parameters θ:

(n × 3 matrix) × (3 × 1 vector) = (n × 1 vector of predictions)
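A sketch with NumPy (values arbitrary): the dataset matrix times the parameter vector gives all n predictions at once:

```python
import numpy as np

# n = 2 data-points, each with a leading 1 and d = 2 features (n x 3 matrix).
X = np.array([
    [1.0, 80.0, 3.0],
    [1.0, 20.0, 2.0],
])
theta = np.array([10.0, 2.0, 5.0])   # 3 x 1 parameter vector

predictions = X @ theta              # n x 1 vector: h_theta(x^(i)) for every i
```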
Matrix-matrix multiplication
• Example: to predict the outputs of all data-points in a dataset using several linear models (h_θ, g_θ, f_θ), just multiply the dataset matrix by a matrix that contains on each column the parameters of one model:

(n × 3 dataset matrix) × (3 × 3 matrix, each column holding the parameters of one model) = (n × 3 matrix whose columns are the predictions of h, g, and f)
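The same idea for several models, sketched with arbitrary parameter values: stack each model's parameter vector as a column, and each column of the product is then the predictions of one model:

```python
import numpy as np

X = np.array([
    [1.0, 80.0, 3.0],
    [1.0, 20.0, 2.0],
])                                   # dataset matrix, n x 3

# Columns: the parameters of models h, g, f (arbitrary values).
Theta = np.array([
    [10.0, 0.0, 1.0],
    [ 2.0, 1.0, 0.0],
    [ 5.0, 0.0, 1.0],
])                                   # 3 x 3

P = X @ Theta                        # n x 3: column k = predictions of model k
pred_h = P[:, 0]                     # same result as X @ Theta[:, 0]
```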
Matrix multiplication properties
• Matrix multiplication is not commutative: in general, A B ≠ B A.

• Matrix multiplication is associative: (A B) C = A (B C) (same result).
Identity matrix, inverse, and transpose
• Identity matrix: I, with ones on the diagonal and zeros elsewhere, satisfying A I = I A = A.

• Transpose: A^T, obtained by swapping the rows and columns of A.

• Inverse of a matrix:
If A is an n × n matrix, and if it has an inverse, then A A⁻¹ = A⁻¹ A = I.
Norm of a vector
Example:
The 2-norm (or ℓ_2 norm, or Euclidean norm) of a vector x is:

‖x‖_2 = √( Σ_i x_i² )

More generally, the p-norm is:

‖x‖_p = ( Σ_i |x_i|^p )^(1/p)

Euclidean distance:
The Euclidean distance between two vectors x and z is the Euclidean norm of their difference: ‖x − z‖_2.
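A sketch of these definitions with NumPy (example vectors arbitrary):

```python
import numpy as np

x = np.array([3.0, 4.0])
z = np.array([0.0, 0.0])

norm2 = np.sqrt((x ** 2).sum())           # 2-norm: sqrt(3^2 + 4^2) = 5.0
p = 3
norm_p = (np.abs(x) ** p).sum() ** (1/p)  # general p-norm
dist = np.linalg.norm(x - z)              # Euclidean distance between x and z
```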
Derivatives
Definition of a derivative

Derivatives – Time-saving rules

Question:
Compute the derivative of the error function E with respect to each parameter of the linear model h_θ(x) = θ_0 + θ_1 x.
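As a reminder of the definition the slide title refers to, the derivative of a function f at a point x is the standard limit:

```latex
f'(x) \;=\; \frac{df}{dx} \;=\; \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
```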
Example:
Compute the derivative of the function E with respect to θ_0 and θ_1, where E is the error function of the linear model above.

• Derivative of E(θ_0, θ_1) with respect to θ_0

• Derivative of E(θ_0, θ_1) with respect to θ_1
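These two derivatives can be worked out as follows, assuming the squared-error cost E(θ_0, θ_1) = (1/2n) Σ_i (θ_0 + θ_1 x^(i) − y^(i))², a common choice for this model. Applying the chain rule:

```latex
\frac{\partial E}{\partial \theta_0}
  = \frac{1}{n}\sum_{i=1}^{n}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right),
\qquad
\frac{\partial E}{\partial \theta_1}
  = \frac{1}{n}\sum_{i=1}^{n}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)x^{(i)}
```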
