Lecture 1.2 Basics and Prerequisites
[Figures: scatter plots of house price against house age, illustrating one-dimensional data (1 feature) and two-dimensional data (2 features). The features are also called attributes / variables. One plot shows data belonging to 3 classes.]
Notations related to datasets
• Assume we have a set of $n$ houses.
• Let $x_j^{(i)}$ be the $j^{\text{th}}$ feature value of the $i^{\text{th}}$ house.
First house: $x_1^{(1)} = 88$, $x_2^{(1)} = 3$, $x_3^{(1)} = 4$
Second house: $x_1^{(2)} = 20$, $x_2^{(2)} = 2$, $x_3^{(2)} = 3$
…
• The whole data is represented as a matrix of $n$ rows and $d$ columns (here $d = 3$ features).
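A minimal NumPy sketch of this matrix for the two example houses (the meaning of each feature is not named on the slide; the comments below only echo the numbers):

```python
import numpy as np

# Each row is one data-point x^(i); each column is one feature x_j.
X = np.array([
    [88, 3, 4],   # first house:  x_1^(1) = 88, x_2^(1) = 3, x_3^(1) = 4
    [20, 2, 3],   # second house: x_1^(2) = 20, x_2^(2) = 2, x_3^(2) = 3
])

n, d = X.shape    # n data-points (rows), d features (columns)
print(n, d)       # 2 3
```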
• We want to train a supervised ML algorithm to predict the price of new houses.
• The $i^{\text{th}}$ house has a price $y^{(i)}$ (a scalar value) and is characterized by a feature-vector $x^{(i)}$. So, our training dataset is: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})\}$.
• Learning (or training) means finding the optimal parameters on a given dataset.
• In this example, as we have one feature (house size), the input $x$ is a scalar value.
• $h_\theta(x)$ is the predicted price for the input $x$ using the model $h_\theta$.
[Figure: house price plotted against house size, with the hypothesis (model) $h_\theta$ drawn as a line through the data.]
Notations related to models
[Figure: the line $h_\theta(x) = \theta_0 + \theta_1 x$; $\theta_0$ is the intercept, and the slope is $\theta_1 = \frac{a}{b}$ (rise $a$ over run $b$).]
• How to choose $\theta_0$ and $\theta_1$? We will see this in the next lecture.
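To make the notation concrete, here is a minimal sketch of evaluating this model (the parameter values are invented for illustration):

```python
import numpy as np

def h(x, theta0, theta1):
    """Predicted price h_theta(x) = theta0 + theta1 * x for a house of size x."""
    return theta0 + theta1 * x

theta0, theta1 = 50.0, 1.2          # made-up intercept and slope
sizes = np.array([88.0, 20.0])      # house sizes
print(h(sizes, theta0, theta1))     # predicted prices: [155.6  74. ]
```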
• How would you write the equation of $h_\theta(x)$ in a more compact format (using vectors)?

Help: the dot product between two vectors $u$ and $v$ of the same dimension is a scalar value:

$u^T v = \sum_i u_i v_i = u_0 v_0 + u_1 v_1 + \cdots$
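The standard compact form (stated here as the usual convention, since the slide leaves it as a question) is $h_\theta(x) = \theta^T x$, after prepending a constant feature $x_0 = 1$ so that $\theta_0$ acts as the intercept. A sketch:

```python
import numpy as np

theta = np.array([50.0, 1.2])   # [theta_0, theta_1]
x = np.array([1.0, 88.0])       # [x_0 = 1, house size]

prediction = theta @ x          # dot product theta^T x
print(prediction)               # 155.6, same as theta_0 + theta_1 * 88
```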
The cost function (or error)
• Given a dataset $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, the error compares $h_\theta(x^{(i)})$, the predicted output for the data-point $x^{(i)}$ (e.g. the predicted price of the $i^{\text{th}}$ house), with $y^{(i)}$, the true output for $x^{(i)}$ (e.g. the true price of the $i^{\text{th}}$ house).

NOTE: The error function is also sometimes called "cost function" or "loss function".
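The slide's error formula itself was an image; a common choice, assumed here (the lecture's exact formula may differ, e.g. by a factor of $\frac{1}{2}$), is the mean squared error:

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error: average of (h_theta(x^(i)) - y^(i))^2 over all data-points.
    NOTE: this exact form is an assumption; the slide's formula was not recoverable."""
    return np.mean((predictions - targets) ** 2)

y_pred = np.array([150.0, 74.0])   # predicted prices h_theta(x^(i))
y_true = np.array([140.0, 80.0])   # true prices y^(i)
print(mse(y_pred, y_true))         # (10^2 + (-6)^2) / 2 = 68.0
```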
Notations to remember
• $x^{(i)} \in \mathbb{R}^d$: the $i^{\text{th}}$ data-point (or feature-vector). It is a $d$-dimensional vector.
• $x_j^{(i)} \in \mathbb{R}$: the value of the $j^{\text{th}}$ feature (or attribute, or variable) in the data-point $x^{(i)}$.
• $y^{(i)}$: the value of the output variable (or target variable) for the $i^{\text{th}}$ data-point. $y^{(i)} \in \mathbb{R}$ in regression, and $y^{(i)} \in \{\ldots\}$ (a discrete set of labels) in classification.
• $h_\theta(x)$: the output predicted by the model $h_\theta$ for the data-point $x$.
Matrix multiplication

[Figure: multiplying the $n \times 3$ dataset matrix by a $3 \times 1$ parameter vector gives an $n \times 1$ vector of predictions.]
• Example: to predict the outputs of all data-points in a dataset using several linear models ($h_\theta$, $g_\theta$, $f_\theta$), just multiply the dataset matrix by a matrix that contains on each column the parameters of one model.

[Figure: the $n \times 3$ dataset matrix times a $3 \times 3$ matrix, where each column is the parameters of one model, gives an $n \times 3$ matrix whose columns are the predictions of $h$, $g$, and $f$.]
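A minimal NumPy sketch of this idea (all values invented for illustration, with an assumed constant feature $x_0 = 1$ in the first column):

```python
import numpy as np

# Dataset matrix: n = 2 data-points, 3 features each.
X = np.array([
    [1.0, 88.0, 3.0],
    [1.0, 20.0, 2.0],
])                      # shape (n, 3)

# Each column holds the parameters of one model: h, g, f.
Theta = np.array([
    [50.0, 40.0, 0.0],
    [ 1.2,  1.5, 2.0],
    [ 3.0,  0.0, 5.0],
])                      # shape (3, 3)

P = X @ Theta           # shape (n, 3): column k = predictions of model k
print(P)
```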
Matrix multiplication properties
• Matrix multiplication is not commutative: in general, $AB \neq BA$.
• It is associative: $(AB)C$ and $A(BC)$ give the same result.
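A quick NumPy check of both properties (matrices chosen arbitrarily):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])

print(np.array_equal(A @ B, B @ A))              # False: not commutative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True: associative
```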
Identity matrix, inverse, and transpose
• Identity matrix $I$: ones on the diagonal, zeros elsewhere, so that $AI = IA = A$.
• Transpose $A^T$: rows and columns swapped, $(A^T)_{ij} = A_{ji}$.
• Inverse of a matrix: if $A$ is an $n \times n$ matrix, and if it has an inverse $A^{-1}$, then: $AA^{-1} = A^{-1}A = I$.
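In NumPy (a small sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

I = np.eye(2)                     # identity matrix
print(np.array_equal(A @ I, A))   # True: A I = A

A_inv = np.linalg.inv(A)          # inverse (exists here since det(A) = 5 != 0)
print(np.allclose(A @ A_inv, I))  # True: A A^{-1} = I

print(A.T)                        # transpose: rows and columns swapped
```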
Norm of a vector
• The 2-norm (or $\ell_2$ norm, or Euclidean norm) of a vector $x$ is: $\|x\|_2 = \sqrt{\sum_j x_j^2}$
• More generally, the $p$-norm is: $\|x\|_p = \left( \sum_j |x_j|^p \right)^{1/p}$
• Euclidean distance: the Euclidean distance between two vectors $x$ and $z$ is the Euclidean norm of their difference: $\|x - z\|_2 = \sqrt{\sum_j (x_j - z_j)^2}$
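With NumPy these are one-liners:

```python
import numpy as np

x = np.array([3.0, 4.0])
z = np.array([0.0, 0.0])

print(np.linalg.norm(x))           # 2-norm: sqrt(3^2 + 4^2) = 5.0
print(np.linalg.norm(x, ord=1))    # 1-norm: |3| + |4| = 7.0
print(np.linalg.norm(x - z))       # Euclidean distance between x and z: 5.0
```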
Derivatives
• Question: compute the derivative of the error function $E$ with respect to each parameter of the linear model $h_\theta(x) = \theta_0 + \theta_1 x$.
• Example: compute the derivative of the function $E$, where: [equation shown on the slide].
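As a worked sketch, assuming $E$ is the usual sum of squared errors (an assumption; the slide's exact definition was not recoverable), the chain rule gives:

```latex
% Assumed error function (sum of squared errors over the n data-points):
E(\theta_0, \theta_1) = \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 ,
\qquad h_\theta(x) = \theta_0 + \theta_1 x

% Chain rule: derivative of the square times derivative of the inner term.
\frac{\partial E}{\partial \theta_0}
  = \sum_{i=1}^{n} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot 1

\frac{\partial E}{\partial \theta_1}
  = \sum_{i=1}^{n} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}
```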