The document discusses various machine learning techniques, focusing on supervised learning methods such as regression and the K-nearest neighbors (KNN) algorithm. It outlines the steps involved in applying KNN to regression problems, including data preparation, model fitting, and performance evaluation. Additionally, it highlights the importance of understanding model performance metrics such as R² and the mean absolute error.


035.001, Spring 2024

Digital Computer Concept and Practice


Supervised Learning (2)

Soohyun Yang

College of Engineering
Department of Civil and Environmental Engineering
Types of ML techniques – All learning is learning!
Our scope: supervised learning.

• Supervised learning : “Presence of labels”
– Classification : spam classification, face recognition
– Regression : advertisement popularity
• Unsupervised learning : “Absence of labels”
– Clustering : recommender systems (YT), buying habits (group customers), grouping user logs
• Reinforcement learning : “Behavior-driven : feedback loop”
– Learning to play games (AlphaGo), industrial simulation, resource management
https://towardsdatascience.com/what-are-the-types-of-machine-learning-e2b9e5d1756f
Regression
 A statistical method for determining the relationship between
a dependent variable (target) and one or more independent
variables (features), used to predict a target value on a continuous scale
for new data.

 Algorithms in our scope

• K-nearest neighbors (KNN)
• Linear regression (LR) => Simple, Polynomial, Multiple
• Ridge regression, Lasso regression => Regularization
• Decision trees === // Ensemble // ===> Random forest
KNN algorithm – for Regression
 Principles:
1. Choose the (odd) number k and a distance metric (typically Euclidean).
2. Calculate the distance from the query point to all training data points.
3. Find the k nearest neighbors of the query point.
4. Determine the predicted value by averaging the target values of those neighbors.
[Figure: example predictions for k=1 and k=3]

Mueller & Guido (2017)
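The four principles above can be sketched directly in NumPy. This is a minimal illustration with made-up data, not the class implementation:

```python
import numpy as np

def knn_regress(x_query, X_train, y_train, k=3):
    """Predict a target for x_query by averaging the targets
    of its k nearest training points (Euclidean distance)."""
    # 2. Distance from the query point to every training point
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # 3. Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # 4. Average the neighbors' target values
    return y_train[nearest].mean()

X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0])
print(knn_regress(np.array([2.5]), X_train, y_train, k=3))  # 20.0
```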
KNN algorithm – Regression problem
 Let’s apply the KNN algorithm to solve a regression problem.
 1. Data preparation & import :
InClassData_Traffic.csv
[Table: rows are samples; columns are the input Feature 1 and the Target]
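The import step might look like the following. Since InClassData_Traffic.csv is not reproduced in the slides, a small synthetic stand-in file is written first, and the column names "Feature1" and "Target" are assumptions based on the slide's table:

```python
import pandas as pd

# Stand-in for the class file (real values and column names assumed).
pd.DataFrame({"Feature1": [10, 30, 50, 70],
              "Target":   [120, 450, 848, 900]}
             ).to_csv("InClassData_Traffic.csv", index=False)

# 1. Data preparation & import
df = pd.read_csv("InClassData_Traffic.csv")
X = df["Feature1"].values   # input feature
y = df["Target"].values     # target, e.g., traffic volume [vehicles/hr]
print(df.shape)  # (4, 2)
```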
KNN algorithm – Regression problem (con’t)
 2. Separate the data into
training and test sets.
• random_state [integer] : A seed for
the random number generator, making the split reproducible.
• The ‘stratification’ process is NOT needed
for regression problems.

 3. Reshape the 1-D training sets
into 2-D arrays.
>>Note : Feature arrays must be in ‘2-D array’
format to use the scikit-learn library.
>>Tip : Index ‘-1’ in the reshape() function
means that its length is inferred after
satisfying the already user-defined dimension.
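Steps 2 and 3 can be sketched as follows, using synthetic stand-in data in place of the class CSV:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in (the real values come from InClassData_Traffic.csv)
rng = np.random.RandomState(0)
X = np.arange(20, dtype=float)          # Feature 1
y = X * 10 + rng.normal(0, 5, 20)       # Target

# 2. Split; no stratification is needed for a continuous target.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 3. reshape(-1, 1): -1 lets NumPy infer the row count
#    after the single feature column is fixed.
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)
print(X_train.shape, X_test.shape)  # (15, 1) (5, 1)
```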
KNN algorithm – Regression problem (con’t)
 4. Import the ‘KNeighborsRegressor’
class and create an instance of it.
• n_neighbors [integer] : A parameter
that sets the number of neighbors, k.
KNN algorithm – Regression problem (con’t)

 4. Fit the regression model using the training set (fit method).
=> Fitting simply stores the training set, which is used to find neighbors at prediction time.
 5. Make predictions on the test data (predict method).
 6. Evaluate the model’s performance (score method
=> via the coefficient of determination, R², 결정계수).
• R² ≤ 1 : Higher value => better performance in predicting the test set’s outcomes (1 is perfect; it can even be negative for very poor models).
=> An R² of ~0.87 means the model explains about 87 % of the variance in the test set’s target values!
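Steps 4–6 can be sketched end to end with scikit-learn. The data here is again a synthetic stand-in, so the score will not match the slide's ~0.87:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the traffic data
rng = np.random.RandomState(0)
X = rng.uniform(0, 100, 80).reshape(-1, 1)
y = 5 * X.ravel() + rng.normal(0, 20, 80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knr = KNeighborsRegressor(n_neighbors=5)
knr.fit(X_train, y_train)           # 4. stores the training set
y_pred = knr.predict(X_test)        # 5. averages the 5 nearest neighbors
r2 = knr.score(X_test, y_test)      # 6. coefficient of determination
print(round(r2, 3))
```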
KNN algorithm – Regression problem (con’t)

 [Tip] Let’s estimate a complementary metric to understand the model’s
performance more intuitively.
=> Mean absolute error [mae] : the average of the absolute differences between
the predicted and the real target values for a given data set.
=> With mae ≈ 83, the predicted targets differ
from the real ones by about 83 vehicles/hr on average.
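The MAE computation can be sketched with scikit-learn's metrics module; the arrays here are made-up values, not the class results:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_test = np.array([500.0, 620.0, 840.0, 910.0])  # real targets [vehicles/hr]
y_pred = np.array([480.0, 700.0, 800.0, 930.0])  # model predictions

# MAE: mean of |predicted - real|, in the target's own units,
# which makes it easier to interpret than R².
mae = mean_absolute_error(y_test, y_pred)
print(mae)  # 40.0
```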
KNN algorithm – Regression problem (con’t)
 Let’s evaluate whether the model trained with k=5 is over-
or under-fitted.
• Greater k => simpler model (smoother decision boundary),
less sensitive to noise in the data.

 What do you get with [knr.n_neighbors = 21]?


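One way to probe over- vs. under-fitting is to compare training and test scores across several k values, again on synthetic stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 100, 80).reshape(-1, 1)
y = 5 * X.ravel() + rng.normal(0, 20, 80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Small k: training score near 1 but noisy test score (overfitting).
# Very large k: both scores drop as the model oversmooths (underfitting).
for k in (1, 5, 21):
    knr = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    print(k,
          round(knr.score(X_train, y_train), 3),
          round(knr.score(X_test, y_test), 3))
```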
Trained model application to a new data (1)
 Let’s predict a target
value for a new data point
with [feature1 = 50].
=> The predicted value is 847.8.
Does the result satisfy you?
If not, why?

Trained model application to a new data (2)
 Let’s predict a target value for another new data point with [feature1 = 100].
 The predicted outcome is again 847.8, identical to that of the first new data point.
=> Does it make sense? Why did it happen? How can we resolve it?
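This behavior can be reproduced with a small made-up data set: once a query lies beyond the training feature range, every farther query shares the same nearest neighbors, so KNN keeps returning the same average. KNN cannot extrapolate:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Training features span 0–40 only (made-up values).
X_train = np.array([[0.0], [10.0], [20.0], [30.0], [40.0]])
y_train = np.array([0.0, 100.0, 200.0, 300.0, 400.0])

knr = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

# Both queries lie beyond the training range, so both have the same
# 3 nearest neighbors (20, 30, 40) and get the same prediction.
p50, p100 = knr.predict([[50.0], [100.0]])
print(p50, p100)  # 300.0 300.0
```

A model with a parametric form, such as linear regression, would be one way to resolve this, since it can extend its fitted trend beyond the training range.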
Take-home points (THPs)
- KNN regression predicts a value by averaging the target values of the k nearest training points.
- Model performance can be evaluated with R² (score method) and, more intuitively, with the mean absolute error.
- Greater k gives a simpler model that is less sensitive to noise; too small a k overfits, too large a k underfits.
- KNN cannot extrapolate: queries outside the training feature range share the same neighbors and receive the same prediction.
