
Supervised Learning - 1

The document discusses supervised learning in artificial intelligence, focusing on its two main types: regression and classification, which utilize labeled training data to make predictions. It explains linear regression, including its assumptions, loss functions, and methods for minimizing errors, such as the least squares method and gradient descent. Additionally, it addresses model evaluation concepts like overfitting, underfitting, and the bias-variance trade-off.

Applications of Artificial Intelligence (ME3181)

Supervised Learning

Phung Thanh Huy


Department of Mechatronics
Ho Chi Minh City University of Technology (HCMUT)
[email protected]
14/09/2023
Supervised Learning

             Supervised Learning    Unsupervised Learning
Discrete     Classification         Clustering
Continuous   Regression             Dimensionality Reduction

o Supervised Learning: learns from labeled training data to make predictions or decisions.
  o Regression: finds the relationship between a dependent variable (label, target, output, outcome variable) and one or more independent variables (also known as predictors or features).
  o Classification: assigns input data points to one of several predefined categories or classes.
o Unsupervised Learning: finds patterns, relationships, or structures in a dataset without labeled output or target variables.

Lecture notes of Andrew Ng


Applications of AI (ME3181) 3
Supervised Learning

[Diagram] Training set → Learning Algorithm → Hypothesis / Model;
a new input x is fed to the model to produce the estimated value ŷ.

https://www.amybergquist.com/
Lecture notes of Andrew Ng
Linear Regression

UST Class: Machine Learning (by Junseong Bang)
Linear Correlation

[Figure: scatter plots illustrating linear correlation between two variables]
Simple Linear Regression
- Equation of a line: y = w·x + b
  w: coefficient / weight
  b: intercept / bias
- Linear regression:
  Predict the relation between x and y, assuming the relation is linear;
  i.e. given x and y, estimate a function ŷ ≈ w·x + b.
  The problem becomes finding w and b.

  y = ŷ + ε,  ε ~ N(0, σ²)
  y ≈ ŷ = w·x + b
Simple Linear Regression
Assumption:
  y = f(x) ≈ ŷ = a·x + b
or
  y = ŷ + ε,  ε: residual,  where ε ~ N(0, σ²)

Dataset: m samples (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))

o For each data sample: ε_i = y_i − ŷ_i
o Sum of squared ε_i:
  L = Σ_{i=1..m} ε_i² = Σ_{i=1..m} (y_i − a·x_i − b)²
o Least Squares Method: a and b are selected so that L is minimized:
  (a, b) = argmin L
o Consider the mean residual:
  ε̄ = (1/m) Σ_{i=1..m} ε_i = (1/m) Σ_{i=1..m} (y_i − a·x_i − b) = ȳ − a·x̄ − b
o Since ε ~ N(0, σ²), E[ε] = ε̄ = 0, hence:
  b = ȳ − a·x̄

(x̄ and ȳ denote the sample means of x and y.)
Simple Linear Regression
Assumption:
  y = f(x) ≈ ŷ = a·x + b
or
  y = ŷ + ε,  ε: residual,  where ε ~ N(0, σ²)

o Substituting b = ȳ − a·x̄:
  L = Σ_{i=1..m} ( y_i − ȳ − a(x_i − x̄) )²
o To minimize L:
  ∂L/∂a = −2 [ Σ_{i=1..m} (y_i − ȳ)(x_i − x̄) − a Σ_{i=1..m} (x_i − x̄)² ] = 0
o Hence:
  a = Σ_{i=1..m} (y_i − ȳ)(x_i − x̄) / Σ_{i=1..m} (x_i − x̄)²
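The two closed-form expressions above (the slope a from the centered sums, then b = ȳ − a·x̄) can be sketched in NumPy. NumPy and the helper name fit_simple are assumptions for illustration; the slides do not prescribe a library:

```python
import numpy as np

def fit_simple(x, y):
    """Least-squares fit of y ≈ a*x + b via the closed-form solution."""
    x_bar, y_bar = x.mean(), y.mean()
    # a = sum((y_i - ybar)(x_i - xbar)) / sum((x_i - xbar)^2)
    a = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    # b = ybar - a * xbar
    b = y_bar - a * x_bar
    return a, b

# Noise-free data on the line y = 2x + 1 recovers the true coefficients.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
a, b = fit_simple(x, y)   # a ≈ 2.0, b ≈ 1.0
```

With noisy data the same formulas return the least-squares estimates rather than the exact coefficients.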
Residual Analysis
Total sum of squares:
  TSS = Σ_{i=1..m} ( y^(i) − ȳ )²,  where ȳ is the mean of all y
Explained sum of squares:
  ESS = Σ_{i=1..m} ( ŷ^(i) − ȳ )²
Residual sum of squares:
  RSS = Σ_{i=1..m} ( y^(i) − ŷ^(i) )²
Coefficient of determination:
  R² = 1 − RSS / TSS
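The definition R² = 1 − RSS/TSS translates directly to code; a minimal sketch (the function name r_squared is a hypothetical choice):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)          # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
    return 1.0 - rss / tss

y = np.array([1.0, 3.0, 5.0, 7.0])
r2_perfect = r_squared(y, y)                      # perfect predictions -> 1.0
r2_mean = r_squared(y, np.full(4, y.mean()))      # predicting the mean -> 0.0
```

R² = 1 means the model explains all variance in y; predicting the mean everywhere gives R² = 0, and a model worse than the mean gives R² < 0.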
General Linear Regression
- Generalization to multiple variables (multiple features):
  estimate: ŷ = f(x) = b + w_1 x_1 + w_2 x_2 + … + w_n x_n
  Note: y = w·x + b is not linear, it is affine.
- Making the equation linear:
  Let b = w_0 and x_0 = 1:
  ŷ = w_0 x_0 + w_1 x_1 + … + w_n x_n
  ŷ = wᵀx
  x = (x_0, …, x_n) and w = (w_0, …, w_n): vectors of size (n+1) × 1
- The problem is to find w.
General Linear Regression
Loss Function – Cost Function
- Error of each data point: loss. Error of all data points: cost.
- The total error of the estimated function should be minimized.
- There are several ways to evaluate the errors over a set of m data samples x^(1) … x^(m) with targets y^(1) … y^(m):
  MSE = (1/m) Σ_{i=1..m} ( y^(i) − ŷ^(i) )²
  MAE = (1/m) Σ_{i=1..m} | y^(i) − ŷ^(i) |
- Usually, MSE is used.
- The problem becomes finding w so that the MSE is minimum.
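The two error measures above can be sketched in a few lines (mse/mae are hypothetical helper names):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: (1/m) * sum of squared residuals."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean absolute error: (1/m) * sum of absolute residuals."""
    return np.mean(np.abs(y - y_hat))

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])   # a single error of 2 on the last point
err_mse = mse(y, y_hat)             # 4/3: squaring amplifies the outlier
err_mae = mae(y, y_hat)             # 2/3
```

The comparison hints at why the choice matters: MSE penalizes large residuals quadratically, so it is more sensitive to outliers than MAE.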
General Linear Regression
Solving the model
Data:
  { (x^(i), y^(i)) }_{i=1..m}
Hypothesis:
  y^(i) ≈ wᵀ x^(i)
Loss function:
  min (1/m) Σ_{i=1..m} ( y^(i) − ŷ^(i) )²  →  min Σ_{i=1..m} ( y^(i) − wᵀx^(i) )²

Analytical solution:
  L(w) = (1/m) Σ_{i=1..m} ( y^(i) − wᵀx^(i) )² = (1/m) ‖ y − Xᵀw ‖²

where
  y = [ y^(1), …, y^(m) ]ᵀ,
  w = [ w_0, w_1, …, w_n ]ᵀ,
  x^(i) = [ x_0^(i), x_1^(i), …, x_n^(i) ]ᵀ with x_0^(i) = 1,
  X = [ x^(1) … x^(m) ]: the (n+1) × m matrix whose i-th column is x^(i).
General Linear Regression
Analytical solution
  L(w) = (1/m) ‖ y − Xᵀw ‖²
To minimize L(w), set:
  ∂L(w)/∂w = 0
Hence, w can be calculated by:
  w = (XXᵀ)⁻¹ X y
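Under the slides' convention (X is (n+1) × m with x_0 = 1 in every column), the normal equation w = (XXᵀ)⁻¹Xy can be verified numerically; a sketch on synthetic noise-free data, solving the linear system with np.linalg.solve rather than forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 2
features = rng.normal(size=(m, n))
w_true = np.array([1.0, 2.0, -3.0])            # [b, w1, w2]

# Build X with x_0 = 1 prepended: shape (n+1, m), one sample per column.
X = np.vstack([np.ones(m), features.T])
y = X.T @ w_true                               # noise-free targets

# Normal equation: (X X^T) w = X y
w = np.linalg.solve(X @ X.T, X @ y)            # recovers w_true exactly
```

Solving the system is both cheaper and numerically safer than computing (XXᵀ)⁻¹ and multiplying; with noisy targets the same formula returns the least-squares estimate instead of the exact coefficients.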
Introduction to Gradient Descent
Numerical solution of the loss function
- Minimizing a cost function means solving ∂L(w)/∂w = 0.
- In the case where w is a vector: ∇_w L(w) = 0.
- It may be difficult to find and solve an analytical solution.

Gradient descent:
  w_next = w_current − α ∇_w L(w)

Hyperparameter α: the learning rate.
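The update rule above can be sketched for the MSE loss, whose gradient with respect to w is (2/m)·X(Xᵀw − y). The learning rate, iteration count, and data here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
X = np.vstack([np.ones(m), rng.uniform(-1, 1, size=m)])   # (2, m), x0 = 1
w_true = np.array([0.5, -1.5])
y = X.T @ w_true                                          # noise-free targets

w = np.zeros(2)
alpha = 0.1                                # learning rate (hyperparameter)
for _ in range(2000):
    grad = (2.0 / m) * X @ (X.T @ w - y)   # gradient of the MSE loss
    w = w - alpha * grad                   # w_next = w_current - alpha * grad
```

With a suitable α the iterates converge to the same w as the normal equation; too large an α makes the updates diverge, too small an α makes convergence slow.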
Introduction to Gradient Descent
Batch Gradient Descent (or Gradient Descent – GD):
  w_next = w_current − α ∇_w L(w)
- The whole training data set is used for each update.
Stochastic Gradient Descent (SGD):
- One randomly chosen data sample is used per update.
Mini-batch Gradient Descent:
- The whole dataset is split into small batches; one batch is used per update.
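The mini-batch variant can be sketched by reshuffling the data each epoch and taking one gradient step per batch (the batch size, epoch count, and data are illustrative assumptions; setting the batch size to 1 would give SGD, to m would give batch GD):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 200
X = np.vstack([np.ones(m), rng.uniform(-1, 1, size=m)])   # (2, m), x0 = 1
w_true = np.array([1.0, 2.0])
y = X.T @ w_true

w = np.zeros(2)
alpha, batch_size = 0.1, 20
for epoch in range(500):
    order = rng.permutation(m)                 # reshuffle each epoch
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[:, idx], y[idx]             # one mini-batch
        grad = (2.0 / batch_size) * Xb @ (Xb.T @ w - yb)
        w = w - alpha * grad                   # one update per batch
```

Each epoch now performs m / batch_size updates instead of one, trading gradient accuracy for update frequency, which is why mini-batching usually converges faster in wall-clock time on large datasets.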
Related forms
Log-linear:
  ln( x_1^{w_1} · x_2^{w_2} · … · x_n^{w_n} ) = w_1 ln x_1 + w_2 ln x_2 + … + w_n ln x_n
Let ln x_i → x'_i to obtain a linear model in the x'_i.

https://analystprep.com/study-notes/cfa-level-2/linear-or-log-linear-model/
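The substitution ln x_i → x'_i can be exercised on a synthetic multiplicative model. This sketch adds an intercept term ln c for the constant factor, which is an assumption for illustration (the slide's formula has no intercept):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 50
x1 = rng.uniform(1.0, 3.0, size=m)
x2 = rng.uniform(1.0, 3.0, size=m)
y = 2.0 * x1**1.5 * x2**-0.5          # multiplicative model: y = c * x1^w1 * x2^w2

# Taking logs makes the model linear: ln y = ln c + 1.5 ln x1 - 0.5 ln x2
X = np.vstack([np.ones(m), np.log(x1), np.log(x2)])
w = np.linalg.solve(X @ X.T, X @ np.log(y))    # ordinary least squares on logs
c, w1, w2 = np.exp(w[0]), w[1], w[2]           # back-transform the intercept
```

Because the targets here are noise-free, the fit recovers c = 2, w1 = 1.5, w2 = −0.5; with real data the log transform also changes the noise model, which is worth keeping in mind.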
Related forms
Polynomial regression:
  y ≈ ŷ = w_n xⁿ + … + w_1 x + w_0
Let x^i → x_i to obtain a linear model in the x_i.
The degree n of the polynomial is a modeling choice (e.g. degree = 2).
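The substitution x^i → x_i turns polynomial fitting into ordinary linear regression on the power features; a minimal sketch for degree 2 (the data and degree are illustrative assumptions):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20)
y = 1.0 - 2.0 * x + 3.0 * x**2        # true degree-2 polynomial

# Build the feature matrix whose rows are x^0, x^1, x^2: shape (3, 20).
degree = 2
X = np.vstack([x**i for i in range(degree + 1)])

# Solve the normal equation for w = [w0, w1, w2].
w = np.linalg.solve(X @ X.T, X @ y)   # recovers [1, -2, 3]
```

Raising the degree adds columns to the feature matrix and increases model capacity, which connects directly to the overfitting discussion that follows.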
Model Evaluation
Overfitting and Underfitting

[Figure: fits of increasing complexity, from an underfitting model to an overfitting model]
Model Evaluation
Learning curves

[Figure: learning curves for an underfitting model and an overfitting model]
Model Evaluation
Bias/Variance trade-off
  Error = Bias² + Variance + Irreducible Noise

Bias: wrong assumptions, e.g. selecting the wrong model.
Variance: the model's sensitivity to small variations in the training data.
Noise: from the data source itself.

Trade-off:
- Increasing a model's complexity → increases variance and reduces bias.
- Reducing a model's complexity → reduces variance and increases bias.

Methods to alleviate overfitting:
‣ Reduce the number of features (i.e., keep only key features).
‣ Perform regularization.
Regularization
Ridge Regression:
  L_Ridge(w) = L(w) + (α/2) Σ_{i=0..n} w_i²
Lasso Regression:
  L_Lasso(w) = L(w) + α Σ_{i=0..n} |w_i|
Elastic Net:
  L_ElasticNet(w) = L(w) + r·α Σ_{i=0..n} |w_i| + ((1−r)/2)·α Σ_{i=0..n} w_i²
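The ridge penalty still admits a closed form (Lasso and Elastic Net do not, because of the absolute value, and need iterative solvers). A sketch assuming the variant loss ‖y − Xᵀw‖² + α‖w‖², which differs from the slide's (α/2) scaling only by a constant factor absorbed into α:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form minimizer of ||y - X^T w||^2 + alpha * ||w||^2."""
    d = X.shape[0]
    # Normal equation with the penalty: (X X^T + alpha I) w = X y
    return np.linalg.solve(X @ X.T + alpha * np.eye(d), X @ y)

rng = np.random.default_rng(4)
m = 100
X = np.vstack([np.ones(m), rng.normal(size=m)])
y = X.T @ np.array([1.0, 4.0]) + 0.1 * rng.normal(size=m)

w_small = ridge_fit(X, y, alpha=1e-8)   # almost ordinary least squares
w_large = ridge_fit(X, y, alpha=1e4)    # heavy shrinkage toward zero
```

Increasing α shrinks the weights toward zero, reducing variance at the cost of added bias, which is exactly the trade-off from the previous slide. Note this sketch penalizes w_0 as on the slide; in practice the bias term is often left unpenalized.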
Regularization
Early Stopping: stop training when the validation error stops improving, even if the training error is still decreasing.
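Early stopping can be sketched around the gradient-descent loop from slide 17, monitoring the MSE on a held-out validation set. The split sizes, patience value, and variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 60
x = rng.uniform(-1.0, 1.0, size=m)
y = 1.0 + 2.0 * x + 0.3 * rng.normal(size=m)   # noisy line y = 1 + 2x

# Hold out the last 20 samples for validation.
X = np.vstack([np.ones(m), x])
X_tr, y_tr = X[:, :40], y[:40]
X_va, y_va = X[:, 40:], y[40:]

w = np.zeros(2)
alpha, patience = 0.05, 20
best_val, best_w, bad_steps = np.inf, w.copy(), 0
for step in range(5000):
    # One gradient-descent step on the training MSE.
    w = w - alpha * (2.0 / 40) * X_tr @ (X_tr.T @ w - y_tr)
    val = np.mean((y_va - X_va.T @ w) ** 2)
    if val < best_val:                # validation error improved: remember w
        best_val, best_w, bad_steps = val, w.copy(), 0
    else:                             # no improvement: count toward patience
        bad_steps += 1
        if bad_steps >= patience:
            break
w = best_w                            # roll back to the best validation point
```

Keeping a copy of the best-so-far weights and rolling back to them is what makes early stopping act as a regularizer: training is cut off before the model starts fitting the training noise.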
