
035.001 Spring, 2024

Digital Computer Concept and Practice


Supervised Learning (1)

Soohyun Yang

College of Engineering
Department of Civil and Environmental Engineering
Types of ML techniques – All learning is learning!

• Supervised learning – "Presence of labels" (our scope)
  - Classification : spam classification, advertisement popularity, face recognition
  - Regression
• Unsupervised learning – "Absence of labels"
  - Clustering : recommender systems (YT), buying habits (group customers), grouping user logs
• Reinforcement learning – "Behavior-driven : feedback loop"
  - Learning to play games (AlphaGo), industrial simulation, resource management

https://towardsdatascience.com/what-are-the-types-of-machine-learning-e2b9e5d1756f
Supervised Learning (SL)
 A sub-category of machine learning to train algorithms for
predicting outcomes or classifying data via the use of labeled data.
=> Predicting outcomes : “Regression” / Classifying data : “Classification”
 Each sample should be a pair of an input object (입력, typically a
vector) and a target value (타깃, i.e., label, supervisory signal).
 The input object consists of multiple features (속성, generally # > 1).
          | Feature 1      | Feature 2       | Feature 3    | Target
          | Traffic volume | Number of lanes | Size of city | Congestion
Samples   | 2500           | 4               | Small        | No
          | 4000           | 6               | Big          | No
          | 20000          | 6               | Big          | Yes
          | …              | …               | …            | …
          | 50000          | 6               | Big          | Yes
(Features 1–3 form the input object; "Congestion" is the target label.)
Supervised Learning (con’t)
 Our goal is to get a generalized learning
model, which makes accurate predictions
on new, unseen data.
 Otherwise, the model gets under-fitted
or over-fitted (과소적합 or 과대적합).

https://labelyourdata.com/articles/machine-learning-and-training-data
 To achieve the goal, the sample dataset
should be randomly divided into two
parts (Training : Test = 75 : 25, default).
• Training set : to build a learning model.
• Test set (new dataset) : to evaluate the
trained model's performance.

Mueller & Guido (2017)
Under- and over-fitting in supervised learning
 Underfitted model is too simple, failing to capture the relation between
the input and the output
=> high bias (low accuracy; 정확도)
 Overfitted model is too complex, fitting the entire training data including
its noise fluctuations => high variance (low precision; 정밀도)

https://www.mathworks.com/discovery/overfitting.html
http://scott.fortmann-roe.com/docs/BiasVariance.html
Bias-Variance trade-off
 A supervised learning’s dilemma between
‘accurate capture of the regularities in the training data’ and
‘well generalization to unseen data’.
 Bias represents how far a SL model's average prediction deviates from
the true values, owing to overly simple assumptions about the data.
 Variance indicates how much a SL model's predictions change when it is
trained on a different training dataset (i.e., its sensitivity to the specific training set).

©Wikipedia
Supervised Learning Algorithms (SLAs)
 A wide range of SLAs are available and applicable to regression
and/or classification problems.
 But, there is no single SLA which works best on all SL problems.
=> Selection of SLA is dependent on the types of problem and data.
 Each SLA has its own strengths and weaknesses.
 Representative examples:
• K-nearest neighbor (KNN)
• Linear models
• Decision trees => (ensemble) => Random forest
• Naïve Bayes classifiers
• Support vector machines (SVM)
Advantages and disadvantages of SLAs (I)
(CL = classification; RG = regression)

K-nearest neighbor (CL & RG)
• Advantages
  - Very easy to understand
  - Very fast to build the model for a small number of samples
  - Good starting point before executing advanced techniques
• Disadvantages
  - Slow and poor prediction for a large number of features or samples (> 100)
  - Bad performance for sparse datasets (i.e., most features are 0 most of the time)

Linear models (CL & RG)
  [CL] Logistic regression / Linear support vector classifier
  [RG] Linear / Ridge / Lasso
• Advantages
  - Relatively easy to understand the prediction procedure
  - Very fast to train & predict
  - Work well with large and/or sparse datasets
• Disadvantages
  - Often unclear to interpret the values of model coefficients
Classification
 Categorize data into representative distinct classes or groups,
predicting the labels for new, unseen data.

 Algorithms of our scope
• K-nearest neighbor (KNN)
K-nearest neighbor (KNN) algorithm – for Classification
 The simplest, non-parametric SL algorithm
 A lazy learner – Memorizing the training set, instead of learning a
discriminative function from it.
 Principles:
1. Choose the number of neighbors k (an odd number in general) and a distance metric (Euclidean in general).
2. Calculate the distance from the target data point to all training data points.
3. Find the k nearest neighbors of the target data point.
4. Assign the class label by majority voting among the nearest neighbors.

https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn
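As a minimal illustration of these principles (a from-scratch sketch with hypothetical toy data, not the scikit-learn implementation used in the following slides):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one point by majority vote among its k nearest neighbors."""
    # Steps 1-2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical toy data: two features per sample, two classes
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))  # -> 0
```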
KNN algorithm – Classification problem
 Let’s apply for the KNN algorithm
to resolve a classification problem.
 1. Data preparation & import :
InClassData_Traffic.csv
[Table preview: samples with two input features (Feature 1, Feature 2) and a target class label]
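A minimal sketch of this step, assuming the CSV holds columns named feature1, feature2, and target (the actual column names in InClassData_Traffic.csv may differ):

```python
import pandas as pd

# Load the in-class dataset (column names below are assumptions)
df = pd.read_csv('InClassData_Traffic.csv')
X = df[['feature1', 'feature2']].values   # input features
y = df['target'].values                   # target class labels
print(df.head())
```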
KNN algorithm – Classification problem (con’t)
 2. Data separation into the
training and test sets
• random_state [integer] : A parameter
for the random number generator.
=> To ensure that we get the same split
every time we run the code.

In this situation, what problem can occur?


KNN algorithm – Classification problem (con’t)
 2. Data separation into the
training and test sets
• random_state [integer] : A parameter
for the random number generator.
⇒To ensure that we get the same split
every time we run the code.

• stratify [array] : A parameter that
splits the data in a stratified fashion,
using the given array as class labels.
=> To ensure classes in the target are
distributed in a similar way in both the
training and testing sets (as in the
original set)
=> To avoid biased model performance.
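A minimal sketch of this split, assuming X and y were prepared as above (the exact test_size and random_state values used in class are assumptions):

```python
from sklearn.model_selection import train_test_split

# 75 : 25 split by default; random_state fixes the shuffle so the split is
# reproducible, and stratify keeps the class proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
print(X_train.shape, X_test.shape)
```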
KNN algorithm – Classification problem (con’t)
 3. Import ‘KNeighborsClassifier’
class and create its instance.
• n_neighbors [integer] : A parameter
to set the number of neighbors.
=> Odd number is recommended.
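A corresponding sketch (k = 5 is an assumption consistent with the later slides):

```python
from sklearn.neighbors import KNeighborsClassifier

# Create a KNN classifier that votes among the 5 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5)
```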
KNN algorithm – Classification problem (con’t)
 4. Fit the classifier using the training set (fit method).
=> Storing the training set to compute neighbors during prediction.
 5. Make predictions on the test data (predict method).
 6. Evaluate the prediction's accuracy (score method
=> the mean accuracy of the predictions on the test data).
• 0 ≤ accuracy ≤ 1 : Higher value => Better performance in classifying the test set.
=> The model correctly predicted 100 % of the samples in the test set!
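These three steps, sketched with the objects defined above:

```python
# 4. Fit = memorize the training set
knn.fit(X_train, y_train)

# 5. Predict labels for the held-out test samples
y_pred = knn.predict(X_test)
print(y_pred)

# 6. Mean accuracy on the test set (1.0 means every sample was classified correctly)
print(knn.score(X_test, y_test))
```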
Trained model application to a new data
 Let's classify a new data point with
[feature1, feature2] = [25, 150].
=> Predicted class : [2]
=> Labels of its 5 nearest neighbors : [[2 2 2 1 2]]

Did the model produce a correct
classification (Type 2)?

[Figure: scatter plot of the training samples and the new point; Feature 1 on the x-axis, Feature 2 on the y-axis]
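A sketch of this step (the new point is taken from the slide; inspecting the neighbor labels via kneighbors is an assumption about how the second output was produced):

```python
import numpy as np

x_new = np.array([[25, 150]])      # [feature1, feature2]
print(knn.predict(x_new))          # predicted class for the new point

# Inspect which training labels the majority vote was based on
_, idx = knn.kneighbors(x_new)     # indices of the 5 nearest training points
print(y_train[idx])                # their class labels
```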
Feature scaling
 The majority of ML algorithms behave much better if features are on
the same scale.
 Methods : 1) Standardization, 2) Min-max scaling, 3) Robust scaling

Did the model produce a correct classification (Type 2)?
=> Not sure, because feature 1 is almost neglected when determining the nearest
neighbors (its numeric range is much smaller than that of feature 2).

[Figure: Feature 1 vs Feature 2 scatter plots of the data before and after scaling]
Feature scaling 1 : Standardization
 Center the feature columns at mean 0 with standard deviation 1
=> A standard normal distribution
 Easier to learn the weights of individual features.
 Maintains useful info about outliers.
 Makes the algorithm less sensitive to outliers.

$x_{\mathrm{strd}}^{(i)} = \dfrac{x^{(i)} - \mu_x}{\sigma_x}$

where μ_x and σ_x indicate the mean and the standard deviation of a feature x in the training set.
Feature scaling 1 : Standardization (con’t)
 Import ‘StandardScaler’
class and create its
instance.
 Standardize all samples
& new data based on
the mean & std. of the
training set.
 Newly run a KNN (k=5)
with the standardized
data.

 Check the result!
=> Predicted class : [1]
=> Labels of its 5 nearest neighbors : [1 1 1 1 1]
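A sketch of this workflow, reusing the split from above (the scaler is fit on the training set only, and the same transformation is then applied everywhere):

```python
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)   # learn mean & std from the training set
X_test_std = scaler.transform(X_test)         # reuse them for the test set
x_new_std = scaler.transform(x_new)           # ... and for the new data point

knn_std = KNeighborsClassifier(n_neighbors=5).fit(X_train_std, y_train)
print(knn_std.score(X_test_std, y_test))      # accuracy on the standardized test set
print(knn_std.predict(x_new_std))             # class of the standardized new point
```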
Feature scaling 1 : Standardization (con’t)
 Visualize the result
of k = 5 & save the
figure.
Feature scaling 1 : Standardization (con’t)
 Decision boundary (DB):
=>The set of points in the feature
space where the class assignment
changes.
=>Its shape depends on the k’s
value and the geometry of the
data.
Feature scaling 1 : Standardization (con’t)
 [Observation] Effects of varying k
on the performance accuracy,
the bias & variance, and
the decision boundary

Greater k
=> Simpler model (smoother DB),
less sensitive to
noise in the data
Feature scaling 1 : Standardization (con’t)
 [For loop application]
Create each DB as k varies
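A sketch of such a loop, assuming the two standardized features from above so the boundary can be drawn on a 2-D grid, and assuming numeric class labels (the k values and plotting details are illustrative, not the exact in-class figure):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Grid covering the (standardized) feature space
x1, x2 = np.meshgrid(np.linspace(-3, 3, 300), np.linspace(-3, 3, 300))
grid = np.c_[x1.ravel(), x2.ravel()]

for k in [1, 5, 15]:                              # assumed k values
    knn_k = KNeighborsClassifier(n_neighbors=k).fit(X_train_std, y_train)
    z = knn_k.predict(grid).reshape(x1.shape)     # predicted class at every grid point
    plt.figure()
    plt.contourf(x1, x2, z, alpha=0.3)            # shaded regions show the decision boundary
    plt.scatter(X_train_std[:, 0], X_train_std[:, 1], c=y_train, edgecolor='k')
    plt.title(f'KNN decision boundary (k = {k})')
    plt.savefig(f'knn_db_k{k}.png')
```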
Feature scaling 1 : Standardization (con’t)
 [For loop application]
Calculate the accuracy,
as k varies
 Define a KNN model in the
for-loop.
 Append the score values in
the list.
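One way to write that loop (the range of k values is an assumption):

```python
k_values = range(1, 16)
test_scores = []

for k in k_values:
    # Define a KNN model inside the for-loop and fit it on the standardized training set
    knn_k = KNeighborsClassifier(n_neighbors=k).fit(X_train_std, y_train)
    # Append the mean test accuracy (score) to the list
    test_scores.append(knn_k.score(X_test_std, y_test))

print(list(zip(k_values, test_scores)))
```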
Feature scaling 1 : Standardization (con’t)
 [For loop application]
Calculate the bias & variance,
as k varies
 Define a KNN model in the
for-loop.
 Append the ‘averaged’ bias
& variance values in the list.
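One possible sketch, using the third-party mlxtend library's bias_variance_decomp helper (an assumption about how the averaged values were obtained; any bootstrap-based decomposition would work similarly):

```python
from mlxtend.evaluate import bias_variance_decomp
from sklearn.neighbors import KNeighborsClassifier

bias_list, var_list = [], []
for k in k_values:
    knn_k = KNeighborsClassifier(n_neighbors=k)
    # Averaged 0-1 loss, bias, and variance over repeated bootstrap training sets
    _, avg_bias, avg_var = bias_variance_decomp(
        knn_k, X_train_std, y_train, X_test_std, y_test,
        loss='0-1_loss', num_rounds=50, random_seed=0)
    bias_list.append(avg_bias)
    var_list.append(avg_var)
```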
Feature scaling 2 : Min-max scaling
 Rescale the features to a range of [0, 1].
 Useful when we need values in a bounded interval.

$x_{\mathrm{norm}}^{(i)} = \dfrac{x^{(i)} - x_{\min}}{x_{\max} - x_{\min}}$

where x_min and x_max indicate the minimum and maximum of a feature x in the training set.
Feature scaling 2 : Min-max scaling (con’t)
 Import ‘MinMaxScaler’ class and
create its instance.
 Min-max scale all samples &
new data based on the minimum &
maximum of the training set.
 Newly run a KNN (k=5) with the
min-max scaled data.
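The workflow mirrors the standardization sketch above; only the scaler class changes:

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

mm = MinMaxScaler()                        # rescales each feature to [0, 1]
X_train_mm = mm.fit_transform(X_train)     # min & max learned from the training set
X_test_mm = mm.transform(X_test)
x_new_mm = mm.transform(x_new)

knn_mm = KNeighborsClassifier(n_neighbors=5).fit(X_train_mm, y_train)
print(knn_mm.score(X_test_mm, y_test), knn_mm.predict(x_new_mm))
```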
Feature scaling 3 : Robust scaling
 Extreme values and outliers get less pronounced.
 Useful when we work with small datasets that contain many outliers.

$x_{\mathrm{rbst}}^{(i)} = \dfrac{x^{(i)} - q_2}{q_3 - q_1}$

where q1, q2 and q3 indicate the 1st, 2nd, and 3rd quartiles of a feature x in the training set.
Feature scaling 3 : Robust scaling (con’t)
 Import ‘RobustScaler’ class and
create its instance.
 Robust scale all samples & new
data based on the quartiles of
the training set.
 Newly run a KNN (k=5) with the
robust scaled data.
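And the robust-scaling variant, again analogous to the sketches above:

```python
from sklearn.preprocessing import RobustScaler
from sklearn.neighbors import KNeighborsClassifier

rb = RobustScaler()                        # centers on the median, scales by the interquartile range
X_train_rb = rb.fit_transform(X_train)     # quartiles learned from the training set
X_test_rb = rb.transform(X_test)
x_new_rb = rb.transform(x_new)

knn_rb = KNeighborsClassifier(n_neighbors=5).fit(X_train_rb, y_train)
print(knn_rb.score(X_test_rb, y_test), knn_rb.predict(x_new_rb))
```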
Take-home points (THPs)
-
-
-
…
