Lesson 4 - Supervised Learning
ARTIFICIAL INTELLIGENCE
BUI NGOC DUNG
GRAB PROBLEM
An AI engineer took a Grab ride to visit his girlfriend. Unfortunately, the Grab application crashed and could not calculate the fare for the trip.
Km   Price
 2    13
 7    35
 9    41
 3    19
10    45
 6    28
 1    10
 8    55
Luckily, since the passenger is an AI engineer and the application has saved his travel history, he can build a model that predicts the price from the number of kilometers traveled today.
TRAINING PROGRESS
[Training pipeline: the features $x^{(i)}$ of the 8 data points go into the Model, which outputs predictions $\hat{y}^{(i)}$; the Cost Function compares each prediction with its label $y^{(i)}$.]
The parameters are initialized randomly at the first step, e.g. $\hat{y}^{(i)} = h_\theta(x^{(i)}) = 2x^{(i)} + 3$ (so for $x^{(1)} = 2$: $2 \cdot 2 + 3 = 7$):

Km (x)   Prediction ŷ = 2x + 3   Label y
 2        7                       13
 7       17                       35
 9       21                       41
 3        9                       19
10       23                       45
 6       15                       28
 1        5                       10
 8       19                       55
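In code, this first forward pass might look like the following sketch (assuming NumPy; the "random" initialization is fixed here to the slide's example values $\theta_1 = 2$, $\theta_0 = 3$):

```python
import numpy as np

# Toy Grab data: kilometers traveled and observed prices
x = np.array([2, 7, 9, 3, 10, 6, 1, 8], dtype=float)
y = np.array([13, 35, 41, 19, 45, 28, 10, 55], dtype=float)

# Parameters "initialized randomly" at the first step
theta0, theta1 = 3.0, 2.0

y_hat = theta1 * x + theta0   # model predictions, e.g. 2*2 + 3 = 7
print(y_hat)                  # [ 7. 17. 21.  9. 23. 15.  5. 19.]
print(y_hat - y)              # errors fed to the cost function
```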
DATA VISUALIZATION
Let me visualize the data first.
[Scatter plot of the (Km, Price) pairs from the table above.]
OUTLIER
Looking at the plot, one point stands apart from the linear trend of the others: (8, 55) is an outlier.
[Same scatter plot with the outlier (8, 55) highlighted.]
HYPOTHESIS
Set up a hypothetical model. For a single input variable, the hypothesis is a linear function:
$h_\theta(x): \mathbb{R} \to \mathbb{R}$
$h_\theta(x) = \theta_0 + \theta_1 x, \qquad \theta_0, \theta_1 \in \mathbb{R}$
Start with the simplest case, $\theta_1 = 0$: the model $h_\theta(x) = \theta_0$ is a horizontal line. Drawing it at the mean of y (30.75) puts it through the middle of the data.
[Plot: the (Km, Price) scatter with the horizontal line $h_\theta(x) = 30.75$.]
Now measure how far this line is from the data. With 1 data point, the distance from the line to the first data point $(x^{(1)}, y^{(1)})$ is $(\theta_0 - y^{(1)})$.
Total distance = $(\theta_0 - y^{(1)})$
With 2 data points:
Total distance = $(\theta_0 - y^{(1)}) + (\theta_0 - y^{(2)})$
With 5 data points, note that $(\theta_0 - y^{(5)}) < 0$: the fifth point lies above the line, so its signed distance is negative.
Total distance = $(\theta_0 - y^{(1)}) + \cdots + (\theta_0 - y^{(5)})$
With all 8 data points:
Total distance = $(\theta_0 - y^{(1)}) + (\theta_0 - y^{(2)}) + \cdots + (\theta_0 - y^{(8)})$
This total distance represents the error between the model's predictions and the actual labels.
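These signed distances can cancel each other out: with $\theta_0 = 30.75$ (the mean of y) the sum is exactly zero even though the line fits poorly, which is why the cost function below squares each term instead. A minimal check (assuming NumPy):

```python
import numpy as np

y = np.array([13, 35, 41, 19, 45, 28, 10, 55], dtype=float)
theta0 = y.mean()        # 30.75

signed = theta0 - y      # (theta0 - y_i) for every data point
print(signed.sum())      # 0.0: positive and negative distances cancel
```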
HYPOTHESIS
Set $\theta_0 = 30.75$ (the mean of y) and $\theta_1 = 0$, so the model predicts $\hat{y}^{(i)} = h_\theta(x^{(i)}) = 30.75$ for every input.
[Pipeline diagram: the 8 data points go through the Model, which outputs 30.75 for each one; the Cost Function compares these constant predictions with the labels 13, 35, 41, 19, 45, 28, 10, 55.]
HYPOTHESIS
The Cost Function squares each error so that positive and negative deviations cannot cancel:

ŷ (Prediction)   y (Label)   ŷ − y     (ŷ − y)²
30.75            13           17.75    315.0625
30.75            35           −4.25     18.0625
30.75            41          −10.25    105.0625
30.75            19           11.75    138.0625
30.75            45          −14.25    203.0625
30.75            28            2.75      7.5625
30.75            10           20.75    430.5625
30.75            55          −24.25    588.0625

$\hat{y}^{(i)} = h_\theta(x^{(i)}) = 30.75$
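A quick check of these numbers (a sketch assuming NumPy):

```python
import numpy as np

y = np.array([13, 35, 41, 19, 45, 28, 10, 55], dtype=float)
y_hat = np.full(8, 30.75)          # constant prediction theta0 = 30.75

print(y_hat - y)                   # signed errors: 17.75, -4.25, ...
print((y_hat - y) ** 2)            # squared errors: 315.0625, 18.0625, ...
print(((y_hat - y) ** 2).mean())   # average squared error: 225.6875
```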
MODEL
❑ Input: $x_i \in \mathbb{R}^n$, $i = 1, \dots, m$
❑ Output: $y_i \in \mathbb{R}$ (regression task)
❑ Model parameters: $\theta \in \mathbb{R}^k$
❑ Predicted output: $\hat{y}_i \in \mathbb{R}$
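In code, this generic setup could look like the following sketch (the names predict, X, and theta are illustrative, not from the lesson):

```python
import numpy as np

def predict(X, theta):
    # Linear model: y_hat_i = theta[0] + theta[1:] . x_i for each row x_i of X
    return theta[0] + X @ theta[1:]

X = np.array([[2.0], [7.0], [9.0]])  # m = 3 samples, n = 1 feature
theta = np.array([3.0, 2.0])         # k = 2 parameters
print(predict(X, theta))             # [ 7. 17. 21.]
```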
LINEAR REGRESSION
❑ Pros: Easy to interpret results, computationally inexpensive
❑ Cons: Poorly models nonlinear data
❑ Works with: Numeric values, nominal values
GENERAL APPROACH TO REGRESSION
1. Collect: Any method.
2. Prepare: We’ll need numeric values for regression. Nominal values should be mapped to binary values.
3. Analyze: It's helpful to visualize 2D plots. We can also visualize the regression weights if we apply shrinkage methods.
4. Train: Find the regression weights.
5. Test: We can measure the R², or the correlation of the predicted values and the data, to measure the success of our models.
6. Use: With regression, we can forecast a numeric value for a number of inputs. This is an improvement over classification because we're predicting a continuous value rather than a discrete category.
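As an illustration of steps 4 and 5 on the Grab data (a sketch using NumPy's least-squares solver; the lesson may intend a different training method):

```python
import numpy as np

x = np.array([2, 7, 9, 3, 10, 6, 1, 8], dtype=float)
y = np.array([13, 35, 41, 19, 45, 28, 10, 55], dtype=float)

# Step 4 (Train): solve least squares for [theta0, theta1]
A = np.column_stack([np.ones_like(x), x])   # bias column + feature column
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)                                # fitted intercept and slope

# Step 5 (Test): correlation between predictions and labels
y_hat = A @ theta
print(np.corrcoef(y_hat, y)[0, 1])
```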
SINGLE VARIABLE REGRESSION
Regression problems pop up whenever we want to predict a numerical value.
NORMALIZE DATA
Before training, features are typically rescaled to a common scale, for example to zero mean and unit variance, so that no single variable dominates the fit.
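For instance, z-score normalization of the Km feature (a sketch assuming NumPy):

```python
import numpy as np

x = np.array([2, 7, 9, 3, 10, 6, 1, 8], dtype=float)

# Z-score normalization: subtract the mean, divide by the standard deviation
x_norm = (x - x.mean()) / x.std()
print(x_norm.mean(), x_norm.std())  # ~0.0 and 1.0 after normalization
```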
WEIGHT
In $h_\theta(x) = \theta_0 + \theta_1 x$, the weight $\theta_1$ scales the input: it is the slope of the regression line.
BIAS
The bias $\theta_0$ is the intercept: it shifts the whole line up or down, independently of the input.
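A tiny illustration of the two parameters (the values are hypothetical):

```python
def h(x, theta0, theta1):
    # theta1 (weight) sets the slope; theta0 (bias) shifts every prediction
    return theta0 + theta1 * x

print(h(5, theta0=0.0, theta1=4.0))  # 20.0: slope only
print(h(5, theta0=5.0, theta1=4.0))  # 25.0: the bias lifts the line by 5
```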
LOSS FUNCTION
The loss function measures how far the predictions $\hat{y}$ are from the labels $y$; for regression, the mean squared error $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2$ is the standard choice.
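A quick numerical check that the constant model's squared-error loss is smallest at the mean of y (a sketch assuming NumPy):

```python
import numpy as np

y = np.array([13, 35, 41, 19, 45, 28, 10, 55], dtype=float)

def loss(theta0):
    # Mean squared error of the constant model h(x) = theta0
    return np.mean((theta0 - y) ** 2)

for t in [20.0, 30.75, 40.0]:
    print(t, loss(t))  # smallest at theta0 = mean(y) = 30.75
```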
MULTIPLE VARIABLES REGRESSION
The same idea extends to several input features: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$.
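A sketch with a second, made-up feature (e.g. trip duration in minutes; the values are invented purely for illustration):

```python
import numpy as np

# Column 0: Km (from the lesson); column 1: hypothetical trip duration
X = np.array([[2, 10], [7, 25], [9, 30], [3, 12],
              [10, 33], [6, 22], [1, 6], [8, 40]], dtype=float)
y = np.array([13, 35, 41, 19, 45, 28, 10, 55], dtype=float)

A = np.column_stack([np.ones(len(X)), X])      # prepend a bias column
theta, *_ = np.linalg.lstsq(A, y, rcond=None)  # [theta0, theta1, theta2]
print(theta)
```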
K-NEAREST NEIGHBORS
❑ Pros: High accuracy, insensitive to outliers, no assumptions about data
❑ Cons: Computationally expensive, requires a lot of memory
❑ Works with: Numeric values, nominal values
GENERAL APPROACH TO K-NEAREST NEIGHBORS
1. Collect: Any method.
2. Prepare: Numeric values are needed for a distance calculation. A structured data format is best.
3. Analyze: Any method.
4. Train: Does not apply to the k-NN algorithm.
5. Test: Calculate the error rate.
6. Use: This application needs to get some input data and output structured numeric values. Next, the application runs the k-NN algorithm on this input data and determines which class the input data should belong to. The application then takes some action on the calculated class.
PSEUDOCODE
For every point in our dataset:
    calculate the distance between inX and the current point
sort the distances in increasing order
take k items with lowest distances to inX
find the majority class among these items
return the majority class as our prediction for the class of inX
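A runnable version of this pseudocode (a sketch assuming NumPy arrays; the names in_x, dataset, and labels mirror the pseudocode and are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(in_x, dataset, labels, k):
    # Euclidean distance between in_x and every point in the dataset
    distances = np.linalg.norm(dataset - in_x, axis=1)
    # Indices of the k nearest points
    nearest = np.argsort(distances)[:k]
    # Majority class among the k nearest neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

dataset = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]
print(knn_classify(np.array([0.1, 0.2]), dataset, labels, k=3))  # "B"
```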
DISTANCE METRIC
Distance metrics are used in supervised and unsupervised learning to calculate similarity in data points.
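For example, the two most common metrics (a minimal sketch):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # L2 distance: 5.0
manhattan = np.sum(np.abs(a - b))          # L1 distance: 7.0
print(euclidean, manhattan)
```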
ADVANCED ALGORITHMS
DECISION TREES
❑ Pros: Computationally cheap to use, easy for humans to understand learned results, missing values OK, can
deal with irrelevant features.
❑ Cons: Prone to overfitting
❑ Works with: Numeric values, nominal values
GENERAL APPROACH TO DECISION TREES
1. Collect: Any method.
2. Prepare: This tree-building algorithm works only on nominal values, so any continuous values will need to
be quantized.
3. Analyze: Any method. You should visually inspect the tree after it is built.
4. Train: Construct a tree data structure.
5. Test: Calculate the error rate with the learned tree.
6. Use: This can be used in any supervised learning task. Often, trees are used to better understand the data.
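As a taste of step 4: tree-building algorithms of this kind (e.g. ID3) choose which feature to split on via information gain, which relies on Shannon entropy. A minimal sketch of the entropy computation (the toy labels are made up):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(entropy(["yes", "yes", "no"]))   # ~0.918: mixed classes
print(entropy(["yes", "yes", "yes"]))  # zero entropy: a perfectly pure node
```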
THANK YOU