Digital Computer Concept and Practice: Supervised Learning
Soohyun Yang
College of Engineering
Department of Civil and Environmental Engineering
Types of ML techniques – All learning is learning!
• Supervised learning – “Presence of labels” (our scope)
  Tasks: Classification, Regression
  Examples: Spam classification, Advertisement popularity, Face recognition
• Unsupervised learning – “Absence of labels”
  Task: Clustering
  Examples: Recommender systems (YouTube), Buying habits (grouping customers), Grouping user logs
• Reinforcement learning – “Behavior-driven: feedback loop”
  Examples: Learning to play games (AlphaGo), Industrial simulation, Resource management
https://towardsdatascience.com/what-are-the-types-of-machine-learning-e2b9e5d1756f
Supervised Learning (SL)
A sub-category of machine learning to train algorithms for
predicting outcomes or classifying data via the use of labeled data.
Predicting outcomes => “Regression”; classifying data => “Classification”.
Each sample should be a pair of an input object (typically a vector) and a target value (i.e., label, supervisory signal).
The input object consists of multiple features (generally more than one).
Input features and target (each row = one sample):
Feature 1 (Traffic volume) | Feature 2 (Number of lanes) | Feature 3 (Size of city) | Target (Congestion)
2500   | 4 | Small | No
4000   | 6 | Big   | No
20000  | 6 | Big   | Yes
…      | … | …     | …
50000  | 6 | Big   | Yes
Supervised Learning (con’t)
Our goal is to obtain a generalized learning
model, which makes accurate predictions on a new
dataset.
Otherwise, the model becomes under-fitted
or over-fitted.
https://labelyourdata.com/articles/machine-learning-and-training-data
To achieve this goal, the sample dataset
should be randomly divided into two
parts (Training : Test = 75 : 25 by default).
• Training set : to build a learning model.
• Test set (new dataset) : to evaluate the generalization performance of the built model.
https://www.mathworks.com/discovery/overfitting.html
http://scott.fortmann-roe.com/docs/BiasVariance.html
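As a rough illustration of this split (not from the original slides), the sketch below uses scikit-learn's train_test_split, whose default test_size of 0.25 gives the 75 : 25 division mentioned above; the synthetic data and the KNN model here are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: 200 labeled samples with 3 features (stands in for the traffic table above).
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# Default split is Training : Test = 75 : 25 (test_size=0.25).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)                # build the model on the training set only

print("Training accuracy:", model.score(X_train, y_train))
print("Test accuracy    :", model.score(X_test, y_test))
# A large gap (high training, low test accuracy) suggests over-fitting;
# low accuracy on both suggests under-fitting.
```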
Bias-Variance trade-off
A supervised learning dilemma between
‘accurately capturing the regularities in the training data’ and
‘generalizing well to unseen data’.
Bias represents the error caused by a SL model’s overly simple assumptions,
i.e., how far the model’s average prediction lies from the true values.
Variance indicates how sensitive a SL model is to the particular training data it sees,
i.e., how much its predictions change when it is trained on different training sets.
©Wikipedia
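As a compact summary (a standard textbook decomposition, not shown on the original slide), for squared-error loss the expected prediction error on unseen data splits into these two components plus irreducible noise:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible noise}}
```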
Supervised Learning Algorithms (SLAs)
A wide range of SLAs are available and applicable to regression
and/or classification problems.
However, there is no single SLA that works best on all SL problems.
=> Selection of SLA is dependent on the types of problem and data.
Each SLA has its own strengths and weaknesses.
Representative examples:
• K-nearest neighbor (KNN)
• Linear models
• Decision trees => (ensemble) => Random forest
• Naïve Bayes classifiers
• Support vector machines (SVM)
Advantages and disadvantages of SLAs (I)
(CL = classification; RG = regression)

K-nearest neighbor (CL & RG)
• Advantages
  - Very easy to understand
  - Very fast to build the model for a small number of samples
  - Good starting point before executing advanced techniques
• Disadvantages
  - Slow and poor prediction for a large number of features or samples (> 100)
  - Bad performance for sparse datasets (i.e., most features are 0 most of the time)

Linear models (CL & RG)
[CL] Logistic regression / Linear support vector classifier; [RG] Linear / Ridge / Lasso
• Advantages
  - Relatively easy to understand the prediction procedure
  - Very fast to train & predict
  - Work well with large and/or sparse datasets
• Disadvantages
  - Often unclear to interpret the values of model coefficients
Classification
Categorize data into distinct classes or groups,
predicting the labels for new, unseen data.
https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn
KNN algorithm – Classification problem
Let’s apply the KNN algorithm
to solve a classification problem.
1. Data preparation & import :
InClassData_Traffic.csv
(Table preview: each sample is a row with two input features, Feature 1 and Feature 2, and a Target column.)
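The original slide shows this step as a screenshot; a minimal sketch with pandas is given below. The file name comes from the slide, but the column names ('Feature1', 'Feature2', 'Target') are assumptions and should be adjusted to the actual CSV header.

```python
import pandas as pd

# Load the in-class dataset.
df = pd.read_csv('InClassData_Traffic.csv')
print(df.head())

# Column names below are assumed, not confirmed by the slides.
X = df[['Feature1', 'Feature2']].values   # input features
y = df['Target'].values                   # target labels
```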
KNN algorithm – Classification problem (con’t)
2. Data separation into the
training and test sets
• random_state [integer] : A parameter
for the random number generator.
=> To ensure that we get the same split
every time we run the code.
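A minimal sketch of this step, continuing from the loading sketch above (X, y as assumed there); the value 42 for random_state is only an illustrative choice.

```python
from sklearn.model_selection import train_test_split

# Split the samples into training and test sets (default 75 : 25).
# Fixing random_state makes the split reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, X_test.shape)
```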
Feature scaling
The majority of ML algorithms behave much better if features are on
the same scale.
Methods : 1) Standardization, 2) Min-max scaling, 3) Robust scaling
(Scatter plots: Feature 1 on the x-axis, Feature 2 on the y-axis, before and after scaling.)
Feature scaling 1 : Standardization
Center the feature columns at mean 0 with standard deviation 1
=> A standard normal distribution
Easier to learn the weights of individual features.
Maintains useful info about outliers.
Makes the algorithm less sensitive to outliers.
$x^{(i)}_{\text{strd}} = \dfrac{x^{(i)} - \mu_x}{\sigma_x}$
where µx and σx indicate the mean and the standard deviation of a feature x in the training set.
Feature scaling 1 : Standardization (con’t)
Import ‘StandardScaler’
class and create its
instance.
Standardize all samples
& new data based on
the mean & std. of the
training set.
Re-run a KNN (k = 5) with the
standardized data.
The greater the k, the simpler the model
(decision boundary, DB) and the less
sensitive it is to noise in the data.
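The slide shows this step as code screenshots; the sketch below reproduces the described workflow with scikit-learn, reusing the X_train/X_test/y_train/y_test names assumed earlier. The "new data" point is a placeholder, not a value from the slides.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Fit the scaler on the training set only, then apply its mean & std everywhere.
scaler = StandardScaler()
scaler.fit(X_train)
X_train_strd = scaler.transform(X_train)
X_test_strd = scaler.transform(X_test)

# Re-run a KNN classifier (k = 5) on the standardized data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_strd, y_train)
print("Test accuracy:", knn.score(X_test_strd, y_test))

# Classify a new sample (placeholder values for Feature 1 and Feature 2),
# standardized with the training-set statistics.
new_sample = [[30000, 4]]
print(knn.predict(scaler.transform(new_sample)))
```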
Feature scaling 1 : Standardization (con’t)
[For loop application]
Create each DB as k varies
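One way to reproduce this loop (a sketch, not the original slide code): draw the decision boundary (DB) for a few illustrative k values on a grid over the two standardized features. It assumes the target labels have already been encoded as integers so they can be plotted directly.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Grid covering the (standardized) feature space, used to draw each decision boundary.
x_min, x_max = X_train_strd[:, 0].min() - 1, X_train_strd[:, 0].max() + 1
y_min, y_max = X_train_strd[:, 1].min() - 1, X_train_strd[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))

k_values = [1, 5, 15]                          # illustrative choices of k
fig, axes = plt.subplots(1, len(k_values), figsize=(12, 4))
for ax, k in zip(axes, k_values):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_strd, y_train)
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)          # shaded regions = predicted classes
    ax.scatter(X_train_strd[:, 0], X_train_strd[:, 1], c=y_train, edgecolor='k')
    ax.set_title(f"k = {k}")
plt.show()
```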
Feature scaling 1 : Standardization (con’t)
[For loop application]
Calculate the accuracy,
as k varies
Define a KNN model in the
for-loop.
Append the score values to
the list.
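A sketch of this accuracy loop, continuing from the standardized arrays above (the range of k values is an assumption):

```python
# Evaluate training and test accuracy as k varies.
k_range = range(1, 31)
train_scores, test_scores = [], []

for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)      # define a KNN model inside the loop
    knn.fit(X_train_strd, y_train)
    train_scores.append(knn.score(X_train_strd, y_train))   # append the score values
    test_scores.append(knn.score(X_test_strd, y_test))

plt.plot(k_range, train_scores, label="Training accuracy")
plt.plot(k_range, test_scores, label="Test accuracy")
plt.xlabel("k (number of neighbors)")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```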
Feature scaling 1 : Standardization (con’t)
[For loop application]
Calculate the bias & variance,
as k varies
Define a KNN model in the
for-loop.
Append the ‘averaged’ bias
& variance values to the list.
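The slides do not show which utility computes the averaged bias and variance; one common option is mlxtend's bias_variance_decomp (pip install mlxtend), sketched here under that assumption and continuing from the arrays above.

```python
from mlxtend.evaluate import bias_variance_decomp

k_range = range(1, 31)
avg_biases, avg_variances = [], []

for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)      # define a KNN model inside the loop
    # Averaged loss, bias, and variance over repeated bootstrap training rounds.
    avg_loss, avg_bias, avg_var = bias_variance_decomp(
        knn, X_train_strd, y_train, X_test_strd, y_test,
        loss='0-1_loss', num_rounds=50, random_seed=1)
    avg_biases.append(avg_bias)                    # append the 'averaged' values
    avg_variances.append(avg_var)
```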
Feature scaling 2 : Min-max scaling
Rescale the features to a range of [0, 1].
Useful when we need values in a bounded interval.
$x^{(i)}_{\text{norm}} = \dfrac{x^{(i)} - x_{\min}}{x_{\max} - x_{\min}}$
where xmin and xmax indicate the minimum and maximum of a feature x in the training set.
Feature scaling 2 : Min-max scaling (con’t)
Import ‘MinMaxScaler’ class and
create its instance.
Min-max scale all samples &
new data based on the minimum &
maximum of the training set.
Re-run a KNN (k = 5) with the
min-max scaled data.
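A sketch of this step with scikit-learn's MinMaxScaler, reusing the split arrays assumed earlier:

```python
from sklearn.preprocessing import MinMaxScaler

# Fit on the training set only; scale everything with the training-set min & max.
mm_scaler = MinMaxScaler()
mm_scaler.fit(X_train)
X_train_norm = mm_scaler.transform(X_train)
X_test_norm = mm_scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_norm, y_train)
print("Test accuracy (min-max scaled):", knn.score(X_test_norm, y_test))
```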
Feature scaling 3 : Robust scaling
Extreme values and outliers get less pronounced.
Useful when we work with small datasets that contain many outliers.
$x^{(i)}_{\text{rbst}} = \dfrac{x^{(i)} - q_2}{q_3 - q_1}$
where q1, q2, and q3 indicate the 1st, 2nd, and 3rd quartiles of a feature x in the training set.
Feature scaling 3 : Robust scaling (con’t)
Import ‘RobustScaler’ class and
create its instance.
Robust scale all samples & new
data based on the quartiles of
the training set.
Re-run a KNN (k = 5) with the
robust-scaled data.
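A sketch of this step with scikit-learn's RobustScaler, again reusing the split arrays assumed earlier:

```python
from sklearn.preprocessing import RobustScaler

# Fit on the training set only; scale everything with the training-set quartiles.
rb_scaler = RobustScaler()
rb_scaler.fit(X_train)
X_train_rbst = rb_scaler.transform(X_train)
X_test_rbst = rb_scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_rbst, y_train)
print("Test accuracy (robust scaled):", knn.score(X_test_rbst, y_test))
```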
Take-home points (THPs)
-
-
-
…