Supervised Learning - 1
Artificial Intelligence
(ME3181)
Supervised Learning
o Supervised Learning: learns from labeled training data to make predictions or decisions.
o Regression: finding the relationship between a dependent variable (label, target, output, outcome variable) and one or more independent variables (also known as predictors or features).
o Classification: assigning input data points to one of several predefined categories or classes.
o Unsupervised Learning: finds patterns, relationships, or structures in a dataset without labeled output or target variables.
[Figure: supervised learning pipeline — the training set feeds a learning algorithm, which produces a hypothesis/model $h$ mapping input data $x$ to an estimated value $y$. Adapted from the lecture notes of Andrew Ng; image: https://www.amybergquist.com/]
Linear Regression
$y = \hat{y} + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$
$y \approx \hat{y} = wx + b$
Simple Linear Regression
Assumption:
$y = f(x) \approx \hat{y} = ax + b$
or
$y = \hat{y} + \epsilon$, where $\epsilon$ is the residual and $\epsilon \sim N(0, \sigma^2)$.
Dataset: $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, i.e. the pairs $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$.
o Substituting the least-squares intercept $b = \bar{y} - a\bar{x}$, the loss becomes
$L = \sum_{i=1}^{m} \left( y_i - \bar{y} - a(x_i - \bar{x}) \right)^2$
o To minimize $L$:
$\frac{\partial L}{\partial a} = -2\left[ \sum_{i=1}^{m} (y_i - \bar{y})(x_i - \bar{x}) - a \sum_{i=1}^{m} (x_i - \bar{x})^2 \right] = 0$
which gives
$a = \dfrac{\sum_{i=1}^{m} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{m} (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\bar{x}$
[Figure: data points with the fitted line — real values $y$ vs. estimated values $\hat{y}$ plotted against $x$]
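A minimal NumPy sketch of these closed-form estimates; the data and function names are illustrative, not from the lecture:

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares fit y ≈ a*x + b using the closed-form estimates above."""
    x_bar, y_bar = x.mean(), y.mean()
    a = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - a * x_bar
    return a, b

# Toy data: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=x.shape)
a, b = fit_simple_linear(x, y)
print(a, b)  # close to 2 and 1
```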
Residual Analysis
Total sum of squares: $TSS = \sum_{i=1}^{m} (y_i - \bar{y})^2$
Residual sum of squares: $RSS = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$
$R^2 = 1 - \dfrac{RSS}{TSS}$
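A short sketch computing $R^2$ directly from its definition; the toy arrays are illustrative:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - rss / tss

y = np.array([1.0, 2.1, 2.9, 4.2])
y_hat = np.array([1.1, 2.0, 3.0, 4.0])   # predictions from some fitted model
print(r_squared(y, y_hat))               # close to 1 for a good fit
```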
General Linear Regression
- Generalization to multiple variables (multiple features):
estimate: $\hat{y} = f(x) = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
Note: $y = wx + b$ is not linear, it is affine.
- To make the equation linear, let $b = w_0$ and $x_0 = 1$:
$\hat{y} = w_0 x_0 + w_1 x_1 + \dots + w_n x_n = w^T x$
where $x = (x_0, \dots, x_n)$ and $w = (w_0, \dots, w_n)$ are $(n+1) \times 1$ vectors.
- The problem is to find $w$; a small sketch of the augmentation trick follows below.
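A small sketch of the augmentation trick, prepending $x_0 = 1$ to the features so the affine map becomes a single dot product (all names and values are illustrative):

```python
import numpy as np

x = np.array([3.0, 5.0])            # original features x1, x2
w = np.array([0.5, 2.0, -1.0])      # [w0 (bias), w1, w2]

x_aug = np.concatenate(([1.0], x))  # prepend x0 = 1
y_hat = w @ x_aug                   # w^T x, equal to b + w1*x1 + w2*x2
print(y_hat)                        # 0.5 + 2.0*3.0 - 1.0*5.0 = 1.5
```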
General Linear Regression
Solving the model
Data: $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$
Hypothesis: $y^{(i)} \approx w^T x^{(i)}$
Loss function:
$\min \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 \;\rightarrow\; \min \sum_{i=1}^{m} \left( y^{(i)} - w^T x^{(i)} \right)^2$
Analytical solution:
$L(w) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - w^T x^{(i)} \right)^2 = \frac{1}{m} \left\| y - X^T w \right\|_2^2$
where the samples are stacked as
$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}, \quad X = \begin{bmatrix} x^{(1)} & \cdots & x^{(m)} \end{bmatrix} = \begin{bmatrix} x_0^{(1)} & \cdots & x_0^{(m)} \\ x_1^{(1)} & \cdots & x_1^{(m)} \\ \vdots & & \vdots \\ x_n^{(1)} & \cdots & x_n^{(m)} \end{bmatrix}, \quad w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}, \quad x_0^{(i)} = 1$
General Linear Regression
Analytical solution
$L(w) = \frac{1}{m} \left\| y - X^T w \right\|_2^2$
To minimize $L(w)$, set $\frac{\partial L(w)}{\partial w} = 0$:
$\nabla_w L(w) = -\frac{2}{m} X \left( y - X^T w \right) = 0 \;\Rightarrow\; X X^T w = X y$
$w = \left( X X^T \right)^{-1} X y$
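A NumPy sketch of the normal-equation solution above, following the slide's convention that samples are the columns of $X$; `np.linalg.solve` replaces the explicit inverse for numerical stability, and the data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 2                               # m samples, n features
features = rng.normal(size=(n, m))          # raw features, one column per sample
X = np.vstack([np.ones((1, m)), features])  # add row x0 = 1 -> shape (n+1, m)
w_true = np.array([1.0, 2.0, -3.0])         # [b, w1, w2] used to generate data
y = X.T @ w_true + rng.normal(0, 0.1, m)    # y = X^T w + noise

# w = (X X^T)^{-1} X y, solved as a linear system
w = np.linalg.solve(X @ X.T, X @ y)
print(w)                                    # close to w_true
```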
Introduction to Gradient Descent
Numerical solution of the loss function
- Minimizing a cost function amounts to solving $\frac{\partial L(w)}{\partial w} = 0$
- When $w$ is a vector: $\nabla_w L(w) = 0$
- Analytical solutions may be difficult to find and solve
Gradient descent (with learning rate $\alpha$):
$w^{next} = w^{current} - \alpha \nabla_w L(w)$
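A minimal batch gradient-descent sketch for this least-squares loss; the learning rate and iteration count are illustrative choices:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Minimize L(w) = (1/m)||y - X^T w||^2 by batch gradient descent."""
    n_plus_1, m = X.shape
    w = np.zeros(n_plus_1)
    for _ in range(n_iters):
        grad = -(2.0 / m) * X @ (y - X.T @ w)  # ∇_w L(w)
        w = w - alpha * grad                    # w_next = w_current - α ∇L
    return w

rng = np.random.default_rng(0)
m = 50
X = np.vstack([np.ones(m), rng.normal(size=m)])  # row x0 = 1 plus one feature
y = X.T @ np.array([1.0, 2.0])                   # noiseless y for clarity
print(gradient_descent(X, y))                    # approaches [1, 2]
```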
Introduction to Gradient Descent
Batch Gradient Descent (or Gradient Descent, GD)
$w^{next} = w^{current} - \alpha \nabla_w L(w)$
- The whole training set is used for each update.
Stochastic Gradient Descent (SGD)
- One randomly chosen sample is used per update.
Mini-batch Gradient Descent
- The dataset is split into small batches; one batch is used per update (see the sketch below).
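A sketch contrasting the three variants in a single update loop, where the batch size selects between them; all names and hyperparameters are illustrative:

```python
import numpy as np

def sgd(X, y, alpha=0.01, epochs=50, batch_size=1, rng=None):
    """Gradient descent on L(w) = (1/m)||y - X^T w||^2.

    batch_size = 1       -> SGD (one random sample per update)
    batch_size = m       -> batch gradient descent
    1 < batch_size < m   -> mini-batch gradient descent
    """
    rng = rng or np.random.default_rng(0)
    n_plus_1, m = X.shape
    w = np.zeros(n_plus_1)
    for _ in range(epochs):
        order = rng.permutation(m)                # shuffle samples each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[:, idx], y[idx]
            grad = -(2.0 / len(idx)) * Xb @ (yb - Xb.T @ w)
            w -= alpha * grad
    return w
```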
Related forms
Log-linear regression: let $\ln x_i \to x_i'$, so the model becomes linear in the transformed feature $x_i'$.
https://analystprep.com/study-notes/cfa-level-2/linear-or-log-linear-model/
Related forms
Polynomial regression
$y \approx \hat{y} = w_n x^n + \dots + w_1 x + w_0$; let $x^i \to x_i$, so the model is linear in the features $x_1, \dots, x_n$ (a sketch follows below).
[Figure: polynomial fits of increasing degree, e.g. degree = 2]
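A sketch of the substitution $x^i \to x_i$: build the powers of $x$ as feature columns and reuse ordinary least squares (the degree and data are illustrative):

```python
import numpy as np

def poly_features(x, degree):
    """Map scalar x to [1, x, x^2, ..., x^degree] per sample (rows = samples)."""
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = 0.5 * x**2 - x + 1 + rng.normal(0, 0.1, x.shape)  # quadratic + noise

A = poly_features(x, degree=2)             # columns: x0 = 1, x, x^2
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit, w = [w0, w1, w2]
print(w)                                   # close to [1, -1, 0.5]
```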
Model evaluation
Overfitting and Underfitting
- Underfitting: the model is too simple to capture the underlying pattern, so it performs poorly even on the training data.
- Overfitting: the model is too complex and fits noise in the training data, so it generalizes poorly to new data.
[Figure: example fits illustrating underfitting and overfitting]
Model evaluation
Learning curves
[Figure: learning curves for an underfitting model vs. an overfitting model]
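A sketch of how a learning curve can be computed: fit on growing subsets of the training data and record training and validation error (the simple linear model and names here are illustrative):

```python
import numpy as np

def learning_curve(x_train, y_train, x_val, y_val):
    """Train/validation MSE as a function of training-set size (linear model)."""
    sizes, train_err, val_err = [], [], []
    for m in range(2, len(x_train) + 1):
        A = np.column_stack([np.ones(m), x_train[:m]])      # fit on first m samples
        w, *_ = np.linalg.lstsq(A, y_train[:m], rcond=None)
        predict = lambda x: w[0] + w[1] * x
        sizes.append(m)
        train_err.append(np.mean((predict(x_train[:m]) - y_train[:m]) ** 2))
        val_err.append(np.mean((predict(x_val) - y_val) ** 2))
    return sizes, train_err, val_err
```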
Model evaluation
Bias/Variance trade-off
Error = Bias² + Variance + Irreducible noise
Trade-off:
- Increasing a model's complexity increases variance and reduces bias.
- Reducing a model's complexity reduces variance and increases bias.
Regularization
Ridge Regression
$L_{Ridge}(w) = L(w) + \frac{\alpha}{2} \sum_{i=1}^{n} w_i^2$
Lasso Regression
$L_{Lasso}(w) = L(w) + \alpha \sum_{i=1}^{n} |w_i|$
Elastic Net
$L_{ElasticNet}(w) = L(w) + r\alpha \sum_{i=1}^{n} |w_i| + \frac{1-r}{2} \alpha \sum_{i=1}^{n} w_i^2$
(The bias term $w_0$ is conventionally left unregularized; $r$ mixes the L1 and L2 penalties.)
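A sketch of these three penalties using scikit-learn (assuming it is installed); its `alpha` and `l1_ratio` parameters play the roles of $\alpha$ and $r$ above, though the library's exact scaling conventions differ slightly from the formulas:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 samples, 5 features
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(0, 0.1, 100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives some weights to 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # l1_ratio plays the role of r

print(ridge.coef_)
print(lasso.coef_)  # note the exact zeros on the uninformative features
print(enet.coef_)
```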
Regularization
Early Stopping
Stop training as soon as the validation error stops decreasing, and keep the weights from the epoch with the lowest validation error; this regularizes the model by preventing it from continuing to fit noise in the training data.
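A minimal early-stopping sketch wrapped around a gradient-descent loop: keep the weights with the best validation error and stop after `patience` epochs without improvement (all names, thresholds, and the columns-as-samples convention are illustrative):

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val,
                              alpha=0.05, max_epochs=5000, patience=20):
    """Batch GD on MSE; stop when validation error stops improving."""
    w = np.zeros(X_tr.shape[0])
    best_w, best_val, stale = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        grad = -(2.0 / X_tr.shape[1]) * X_tr @ (y_tr - X_tr.T @ w)
        w -= alpha * grad
        val_mse = np.mean((y_val - X_val.T @ w) ** 2)
        if val_mse < best_val:        # new best: remember these weights
            best_w, best_val, stale = w.copy(), val_mse, 0
        else:
            stale += 1
            if stale >= patience:     # no improvement for `patience` epochs
                break
    return best_w
```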