Machine Learning Models
Dr. Rajavel R
Presentation Outline
▪ Introduction to ML Models
▪ Supervised ML Models
▪ Unsupervised ML Models
▪ ML Model selection
▪ Loss function
▪ Performance measure
Unsupervised ML Models
1. K-means clustering
2. KNN (k-nearest neighbors)
3. Hierarchical clustering
4. Neural Networks
5. Principal Component Analysis
6. Independent Component Analysis
7. Apriori algorithm
8. Singular value decomposition
ML Models Selection
What is ML Model Selection?
▪ Model selection is the process of selecting one final machine learning model from
among a collection of candidate machine learning models for a training dataset.
▪ Selecting a model depends on various factors such as the dataset, task, nature of the
model etc.
ML Models Selection
An ML model can be selected based on:
▪ Dataset:
1. Image & Video – CNN
2. Speech, Text & Time series – RNN
3. Numerical data – Regression, SVM, Decision Tree etc.
▪ Task (Application)
1. Classification – Logistic Regression, Decision Tree, KNN, SVM, etc.
2. Regression – Linear Regression, Random Forest, Polynomial regression etc.
3. Clustering - K-means clustering, KNN (k-nearest neighbors), Hierarchical clustering etc.
ML Models Selection – Cross Validation
[Figure: bias illustrated as the gap between the predicted values and the target value]
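To make model selection via cross-validation concrete, here is a minimal Python sketch using scikit-learn; the iris dataset and the two candidate models are illustrative placeholders.

```python
# Minimal sketch: pick between candidate models by k-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # placeholder dataset

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation (accuracy by default)
# and keep whichever generalizes best across the folds.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```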
Loss Function
▪ Loss functions are classified into two classes based on the type of learning task, i.e.,
Regression loss and Classification loss.
Loss Function for Regression
▪ Regression involves predicting a specific value that is continuous in nature.
▪ Estimating the price of a house or predicting stock prices are examples of regression
because they predict a real-valued quantity.
▪ Commonly used loss function for regression are:
▪ Mean Absolute Error (MAE)
▪ Mean Squared Error (MSE)
▪ Mean Bias Error (MBE)
▪ Mean Squared Logarithmic Error (MSLE)
Mean Absolute Error (MAE)
▪ Mean Absolute Error (also called L1 loss) is one of the simplest yet most robust loss
functions used for regression models.
▪ Mean Absolute Error is an ideal option when the data contain considerable outliers.
▪ MAE takes the average sum of the absolute differences between the actual and the
predicted values.
▪ For a data point with actual value y_i and predicted value ŷ_i, with n being the total
number of data points in the dataset, the mean absolute error is defined as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
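A minimal NumPy sketch of the MAE formula above, with illustrative values:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # L1 loss: average absolute difference between actual and predicted values.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

print(mean_absolute_error([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # 0.666...
```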
Mean Squared Error (MSE)
▪ Mean Squared Error (also called L2 loss) is almost every data scientist’s preference when
it comes to loss functions for regression.
▪ This is because most variables can be modeled into a Gaussian distribution.
▪ Mean Squared Error is the average of the squared differences between the actual and
the predicted values.
▪ For a data point with actual value y_i and predicted value ŷ_i, where n is the total
number of data points in the dataset, the mean squared error is defined as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
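And the corresponding NumPy sketch of MSE, with the same illustrative values:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # L2 loss: average squared difference between actual and predicted values.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mean_squared_error([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # 0.833...
```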
Huber Loss/Smooth Mean Absolute Error
▪ The Huber loss function is defined as a combination of the MSE and MAE loss functions:
it approaches MAE when 𝛿 ~ 0 and MSE when 𝛿 ~ ∞
▪ Huber Loss is characterized by the parameter delta (𝛿) and is formulated as:
$$L_{\delta}(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \le \delta \\ \delta\left(|y - \hat{y}| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$$
▪ For huge errors, it is linear and for small errors, it is quadratic in nature.
▪ The choice of the delta value determines what you’re willing to consider an outlier.
▪ Hence, the Huber loss function could be less sensitive to outliers than the MSE loss
function, depending on the hyperparameter value.
▪ Therefore, you can use the Huber loss function if the data is prone to outliers.
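A minimal NumPy sketch of the piecewise Huber formula above; delta = 1.0 is just an illustrative default:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it, so large
    # errors (outliers) are penalized less aggressively than in MSE.
    error = np.asarray(y_true) - np.asarray(y_pred)
    small = np.abs(error) <= delta
    squared = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(small, squared, linear))
```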
Mean Bias Error (MBE)
▪ Mean Bias Error is used to calculate the average bias in the model.
▪ Mean Bias Error takes the actual difference between the target and the predicted value,
and not the absolute difference.
▪ One has to be cautious as the positive and the negative errors could cancel each other
out, which is why it is one of the lesser-used loss functions.
▪ The formula of Mean Bias Error is:
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)$$
▪ Where y_i is the true value, ŷ_i is the predicted value and n is the total number of data
points in the dataset.
Mean Squared Logarithmic Error (MSLE)
▪ Relaxing the penalty on huge differences can be done with the help of Mean Squared
Logarithmic Error
▪ Calculating the Mean Squared Logarithmic Error is the same as calculating the Mean
Squared Error, except that the natural logarithm of the predicted and actual values is
used rather than the raw values:
$$\mathrm{MSLE} = \frac{1}{n}\sum_{i=1}^{n} \left(\log(y_i + 1) - \log(\hat{y}_i + 1)\right)^2$$
▪ Where y_i is the true value, ŷ_i is the predicted value and n is the total number of data
points in the dataset.
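A minimal NumPy sketch of MBE and MSLE, following the definitions above (np.log1p computes log(1 + x)):

```python
import numpy as np

def mean_bias_error(y_true, y_pred):
    # Signed errors can cancel out, so MBE measures bias, not accuracy.
    return np.mean(np.asarray(y_true) - np.asarray(y_pred))

def mean_squared_log_error(y_true, y_pred):
    # MSE computed on log(1 + y), which relaxes the penalty on huge differences.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
```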
Loss Function for Classification
▪ Classification problems involve predicting a discrete class output.
▪ It involves dividing the dataset into different and unique classes based on different
parameters so that a new and unseen record can be classified into one of the classes.
▪ Ex: Email - spam or not, Person’s dietary preferences - vegetarian or non-vegetarian
▪ Some of the loss functions for classification problems are:
▪ Binary Cross-Entropy Loss / Log Loss
▪ Hinge Loss
Binary Cross Entropy Loss
▪ This is the most common loss function used in classification problems.
▪ The cross-entropy loss decreases as the predicted probability converges to the actual
label.
▪ It measures the performance of a classification model whose predicted output is a
probability value between 0 and 1
▪ When the number of classes is 2, it is binary classification
Hinge Loss
▪ Hinge loss penalizes the wrong predictions as well as the right predictions that are not
confident.
▪ It is primarily used with SVM classifiers, with class labels as -1 and 1, so make sure you
change your malignant class labels from 0 to -1.
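Minimal NumPy sketches of both classification losses, assuming {0, 1} labels with predicted probabilities for log loss, and {-1, +1} labels with raw scores for hinge loss:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Log loss: labels in {0, 1}; probabilities clipped away from 0 and 1.
    y = np.asarray(y_true)
    p = np.clip(np.asarray(p_pred), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def hinge_loss(y_true, scores):
    # Hinge loss: labels in {-1, +1}; confident correct predictions
    # (y * score >= 1) incur zero loss.
    y, s = np.asarray(y_true), np.asarray(scores)
    return np.mean(np.maximum(0.0, 1.0 - y * s))
```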
Performance Metrics in ML
▪ Performance metrics are a part of every machine learning pipeline.
▪ Every machine learning task is broadly classified into either Regression or Classification
▪ Let us discuss some of the popular ones in each category
Regression metrics:
Regression models have continuous output. So, we need a metric based on calculating
some sort of distance between predicted and ground truth.
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• R² (R-Squared)
Classification metrics:
Classification models have discrete output, so we need a metric that compares discrete
classes in some form.
• Accuracy
• Confusion Matrix
• Precision and Recall
• F1-score
• AU-ROC
Performance Metrics in ML
Mean Squared Error (MSE)
▪ Mean squared error is perhaps the most popular metric used for regression problems.
▪ It essentially finds the average of the squared difference between the target value
and the value predicted by the regression model.
$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N} (y_j - \hat{y}_j)^2$$
Where:
• y_j: ground-truth value
• ŷ_j: predicted value from the regression model
• N: number of data points
Performance Metrics in ML
Mean Absolute Error (MAE)
▪ Mean Absolute Error is the average of the absolute difference between the ground truth
and the predicted values.
▪ Mathematically, it is represented as:
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N} |y_j - \hat{y}_j|$$
Where:
• y_j: ground-truth value
• ŷ_j: predicted value from the regression model
• N: number of data points
Performance Metrics in ML
Root Mean Squared Error (RMSE)
▪ Root Mean Squared Error corresponds to the square root of the average of the
squared difference between the target value and the value predicted by the
regression model.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N} (y_j - \hat{y}_j)^2}$$
Where:
• y_j: ground-truth value
• ŷ_j: predicted value from the regression model
• N: number of data points
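All three regression metrics are available in scikit-learn; a minimal sketch with illustrative values (RMSE is taken here as the square root of MSE):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # placeholder ground truth
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # placeholder predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is just the square root of MSE
print(mae, mse, rmse)
```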
Performance Metrics in ML
R² Coefficient of determination
▪ R² Coefficient of determination actually works as a post metric, meaning it’s a metric
that’s calculated using other metrics.
▪ The point of even calculating this coefficient is to answer the question “How much
(what %) of the total variation in Y(target) is explained by the variation in
X(regression line)”
▪ Total variation in Y (Variance of Y):
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
▪ R² is then the fraction of this total variation that the regression explains:
$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$$
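A minimal NumPy sketch of R² following the formulas above:

```python
import numpy as np

def r_squared(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation in Y
    return 1.0 - ss_res / ss_tot
```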
Performance Metrics in ML
Precision:
▪ Precision is the ratio of true positives to all instances predicted as positive: P = TP / (TP + FP).
▪ 0 < P < 1. The precision metric focuses on Type-I errors (FP), that is, incorrectly labeling
non-cancerous patients as cancerous.
▪ A precision score towards 1 signifies that almost everything your model labels as
cancerous is indeed cancerous, i.e., it produces very few false positives.
▪ A low precision score (<0.5) means your classifier has a high number of false positives
which can be an outcome of imbalanced class or untuned model hyperparameters.
Performance Metrics in ML
Recall/Sensitivity/Hit-Rate:
▪ Recall is essentially the ratio of true positives to all the positives in the ground truth:
R = TP / (TP + FN).
▪ 0 < R < 1. The recall metric focuses on Type-II errors (FN), that is, incorrectly labeling
cancerous patients as non-cancerous.
▪ A recall towards 1 signifies that your model didn't miss any true positives, i.e., it
correctly identifies almost all of the cancer patients.
▪ A low recall score (<0.5) means your classifier has a high number of false negatives
which can be an outcome of imbalanced class or untuned model hyperparameters.
Performance Metrics in ML
Specificity:
▪ It is defined as the percentage of correctly predicted negative instances out of the total
actual negative instances: Specificity = TN / (TN + FP).
▪ The denominator (TN + FP) here is the actual number of negative instances present in
the dataset.
▪ It is similar to recall but the shift is on the negative instances.
▪ For example: finding out how many of the healthy patients who did not have cancer
were correctly told that they don't have cancer.
Performance Metrics in ML
F1 Score (Precision-Recall tradeoff):
▪ To improve our model, we can usually improve either precision or recall, but not both:
pushing one up tends to pull the other down.
▪ This tradeoff highly impacts real-world scenarios, which is why people use a metric that
combines precision and recall.
▪ The F1-score metric is the harmonic mean of precision and recall:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
▪ A high F1 score symbolizes a high precision as well as high recall.
▪ It presents a good balance between precision and recall and gives good results on
imbalanced classification problems.
▪ A low F1 score on its own tells you (almost) nothing: it is unclear whether the problem
is low precision or low recall, i.e., whether the model suffers from Type-I or Type-II errors.
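A minimal sketch computing precision, recall, specificity, and F1 from the four confusion-matrix counts; the tp/fp/tn/fn values are illustrative placeholders:

```python
# Illustrative confusion-matrix counts for a binary (e.g., cancer) classifier.
tp, fp, tn, fn = 80, 10, 95, 15

precision = tp / (tp + fp)    # how many predicted positives are real positives
recall = tp / (tp + fn)       # how many real positives were found
specificity = tn / (tn + fp)  # how many real negatives were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(precision, recall, specificity, f1)
```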
Performance Metrics in ML
AUROC (Area under Receiver operating characteristics curve):
▪ AU-ROC makes use of the true positive rate, TPR = TP / (TP + FN), and the false positive
rate, FPR = FP / (FP + TN).
▪ Intuitively TPR/recall corresponds to the proportion of positive data points that are
correctly considered as positive, with respect to all positive data points.
▪ In other words, the higher the TPR, the fewer positive data points we will miss.
▪ Intuitively FPR/fallout corresponds to the proportion of negative data points that are
mistakenly considered as positive, with respect to all negative data points.
▪ In other words, the higher the FPR, the more negative data points we will misclassify.
Performance Metrics in ML
AUROC (Area under Receiver operating characteristics curve):
▪ Compute these two metrics at many different classification thresholds of the logistic
regression, then plot the resulting (FPR, TPR) pairs on a single graph.
▪ The resulting curve is called the ROC curve, and the metric we consider under this curve
is called AUROC.
Performance Metrics in ML
AUROC (Area under Receiver operating characteristics curve):
▪ A no-skill classifier is one that can’t discriminate between the classes, and would predict
a random class or a constant class in all cases.
▪ It’s a horizontal line with the value of the ratio of positive cases in the dataset. For a
balanced dataset, it’s 0.5.
▪ A high AUROC means that a randomly chosen positive example is very likely to be
ranked above a randomly chosen negative example.
▪ A high AUROC also means your algorithm does a good job of ranking test data, with
most negative cases at one end of the scale and positive cases at the other.
▪ ROC curves aren’t a good choice when your problem has a huge class imbalance.
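A minimal scikit-learn sketch of the ROC curve and AUROC; the labels and scores are illustrative placeholders:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                    # placeholder labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]  # placeholder probabilities

# roc_curve sweeps the decision threshold and returns (FPR, TPR) pairs.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUROC:", roc_auc_score(y_true, y_score))
```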
Model Parameters & Hyperparameters
▪ In a machine learning model, there are 2 types of parameters:
1. Model Parameters
2. Hyperparameters
Model Parameters:
Model parameters are configuration variables that are internal to the model, and a model
learns them on its own during training.
Example:
Weights in linear regression,
Weights and biases of a neural network,
Cluster centroids in clustering
Model Parameters & Hyperparameters
Hyper Parameters:
Hyperparameters are those parameters that are explicitly defined by the user to control the
learning process and obtain a model with optimal performance.
Example:
Learning Rate
Batch Size
Number of Epochs
Number of Hidden Units
Number of Layers
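As a sketch of how the two kinds of parameters differ in practice: below, C and gamma are hyperparameters chosen by the user (here searched over a grid), while the coefficients the SVM learns inside fit() are model parameters. The dataset and grid values are illustrative placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Hyperparameters (user-defined, searched over): C and gamma.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)  # model parameters are learned inside fit()
print(grid.best_params_, grid.best_score_)
```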
Session Summary
In this session we have learned,
▪ ML Models: Supervised ML Models & Unsupervised ML Models
▪ ML Model selection
▪ Loss function
▪ Performance measure