Performance Evaluation

The document discusses performance evaluation in machine learning, focusing on how models are assessed using various metrics for classification and regression tasks. It highlights key concepts such as accuracy, precision, recall, and different evaluation metrics like MAE and RMSE. Additionally, it explores the transition from data distributions to graph-based representations, emphasizing their importance in modeling complex relationships in real-world data.

Uploaded by

appubysani15


Machine Learning

Performance Evaluation
&
From Distributions to Graphs

Presented By
Dhanyatha
Performance Evaluation

Performance evaluation in machine learning refers to the process of assessing how
well a machine learning model performs on a given task. It helps determine whether the
model is accurate and reliable, and whether it generalizes well to unseen data.

How is the performance of a model evaluated?

Performance in machine learning is evaluated by testing the model on unseen
data (the test set) and scoring its predictions with evaluation metrics. These metrics help compare models, tune
parameters, and determine whether a model is suitable for real-world deployment.
Types Of Predictive Models

1. Classification Metrics:

i. Accuracy
ii. Logarithmic Loss
iii. Area Under Curve (AUC)
iv. Precision
v. Recall
vi. F1 Score
vii. Confusion Matrix

2. Regression Evaluation Metrics:

i. Mean Absolute Error (MAE)
ii. Mean Squared Error (MSE)
iii. Root Mean Square Error (RMSE)
iv. Root Mean Squared Logarithmic Error (RMSLE)
v. R2 – Score

Classification Metrics
Accuracy
Accuracy is a fundamental metric for evaluating the performance of a classification model, providing
a quick snapshot of how well the model is performing in terms of correct predictions. It is calculated as
the ratio of correct predictions to the total number of input samples.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of input samples}}$$

It works well only if there are roughly equal numbers of samples in each class. For example, suppose 90% of the
samples in our training set belong to class A and 10% belong to class B. A model that predicts class A for
every sample then reaches 90% training accuracy without learning anything about class B. If we test the same model
on a test set with 60% of samples from class A and 40% from class B, the accuracy falls to
60%.
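The imbalance example above can be reproduced with a minimal sketch. The helper names `accuracy` and `predict_always_a` are hypothetical and only serve the illustration; the "model" is a degenerate one that always answers class A.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def predict_always_a(samples):
    """Hypothetical degenerate model: ignores its input and always predicts 'A'."""
    return ["A"] * len(samples)


# Imbalanced training labels: 90% class A, 10% class B.
train_labels = ["A"] * 90 + ["B"] * 10
# Test labels with a different class mix: 60% class A, 40% class B.
test_labels = ["A"] * 60 + ["B"] * 40

train_acc = accuracy(train_labels, predict_always_a(train_labels))
test_acc = accuracy(test_labels, predict_always_a(test_labels))
print(train_acc, test_acc)
```

The accuracy simply mirrors the share of class A in whichever set is scored, which is why accuracy alone can mislead on imbalanced data.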
Logarithmic Loss

Log loss penalizes incorrect classifications, and penalizes them more heavily the more confident the wrong prediction is. It works well with multi-class classification.
To use log loss, the classifier must assign a probability to every class for every sample. If there
are N samples and M classes, then we calculate the log loss in this way:

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$$

where,
● N : no. of samples.
● M : no. of classes.
● y_{ij} : indicates whether the ith sample belongs to the jth class or not.
● p_{ij} : indicates the predicted probability of the ith sample belonging to the jth class.
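As a rough illustration of the log loss definition, the sketch below computes it with the standard library only. Because y_ij is 1 only for the true class, the inner sum reduces to the log of the probability assigned to the true class. The function name `log_loss`, the `eps` clipping value, and the example probabilities are assumptions made for this illustration.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true: list of true class indices (0 .. M-1), one per sample.
    probs:  list of per-sample probability lists, each of length M.
    eps:    clipping value (an assumption here) that avoids log(0).
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)
        total += math.log(p)
    return -total / len(y_true)


labels = [0, 1]
# Hypothetical predictions: confident and right vs. confident and wrong.
confident_right = log_loss(labels, [[0.9, 0.1], [0.2, 0.8]])
confident_wrong = log_loss(labels, [[0.1, 0.9], [0.8, 0.2]])
print(confident_right, confident_wrong)
```

The confidently wrong predictions receive a much larger loss than the confidently right ones, which is the behaviour the metric is designed to have.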
Area Under Curve (AUC)

It is one of the most widely used metrics and is mainly used for binary classification. The AUC of a classifier is
defined as the probability that the classifier will rank a randomly chosen positive example higher than a
randomly chosen negative example.
A few basic terms:
True Positive Rate
Also called sensitivity. The True Positive Rate is the proportion of positive data points
that are correctly classified as positive, with respect to all data points that are positive.

$$\text{TPR} = \frac{TP}{TP + FN}$$
True Negative Rate
Also called specificity. The True Negative Rate is the proportion of negative data points that
are correctly classified as negative, with respect to all data points that are negative.

$$\text{TNR} = \frac{TN}{TN + FP}$$
False Positive Rate
The False Positive Rate is the proportion of actual negatives that are incorrectly identified as positives.

$$\text{FPR} = \frac{FP}{FP + TN}$$
False Positive Rate and True Positive Rate both have values in the range [0, 1].

The ROC curve plots True Positive Rate against False Positive Rate at different classification thresholds. AUC is the area under that curve and lies in the range
[0, 1]. The greater the value of AUC, the better the performance of the model.
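The rates and the ranking interpretation of AUC can be sketched as follows. This is a minimal illustration, not a reference implementation: `rates` and `auc_by_ranking` are hypothetical helper names, ties between a positive and a negative score are counted as half a win (a common convention assumed here), and the labels, scores, and counts are made up.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, TNR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    fpr = fp / (fp + tn)
    return tpr, tnr, fpr


def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed tie convention
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_by_ranking(y_true, scores)

# Hypothetical counts.
tpr, tnr, fpr = rates(tp=40, fp=10, tn=40, fn=10)
print(auc, tpr, tnr, fpr)
```

In this example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. Note also that FPR equals 1 minus TNR, since both are computed over the actual negatives.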
Precision: Precision is a measure of a model’s performance
that tells you how many of the positive predictions made by the model are actually correct.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: Recall is the ratio of correctly predicted positive instances to the total actual positive instances.
It measures how well the model captures all relevant positive cases.

$$\text{Recall} = \frac{TP}{TP + FN}$$
F1 Score: The F1 score is the harmonic mean of precision and recall. Its range is [0, 1]. This metric
tells us how precise the classifier is (how many of the instances it flags are correct) and how robust it is (it does not miss a significant
number of instances).
High precision with low recall means the positive predictions are very reliable, but the classifier misses a large number of instances.
The higher the F1 score, the better the performance. It can be expressed mathematically in this way:

$$F1 = 2 \cdot \frac{1}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}}$$
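A small sketch of how precision, recall, and F1 relate, using made-up counts. The function name `precision_recall_f1` and the choice to return 0.0 when a denominator is zero are assumptions for this illustration. The harmonic mean is written in its equivalent form 2 · Precision · Recall / (Precision + Recall).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical classifier: every positive prediction is right,
# but it finds only 10 of the 100 actual positives.
precision, recall, f1 = precision_recall_f1(tp=10, fp=0, fn=90)
print(precision, recall, f1)
```

Precision is perfect here, yet the F1 score stays low because the harmonic mean is pulled toward the smaller of the two values.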
Confusion Matrix
A confusion matrix is an N × N matrix, where N is the number of classes or categories that are to be
predicted. Suppose we have a binary classification problem, so N = 2 and we get a 2 × 2 matrix.
Each sample belongs to either Yes or No. We build a
classifier that predicts the class of a new input sample. After that, we test the model with 165
samples, and we get the following result.

There are 4 terms to keep in mind:

1. True Positives: It is the case where we predicted Yes and the real output was also Yes.
2. True Negatives: It is the case where we predicted No and the real output was also No.
3. False Positives: It is the case where we predicted Yes but it was actually No.
4. False Negatives: It is the case where we predicted No but it was actually Yes.
The accuracy is calculated by summing the values on the main diagonal and dividing by the total number of samples, i.e.

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Samples}} = \frac{100 + 50}{165} \approx 0.91$$
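The four terms can be counted directly from labels, as in the sketch below. The helper `confusion_counts` and the six-sample toy data are hypothetical. The last lines only re-check the arithmetic of the 165-sample example using the two diagonal values given above (100 and 50), since the off-diagonal counts are not stated here.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count TP, TN, FP and FN for a binary Yes/No problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Hypothetical toy data, not the 165-sample example.
y_true = ["Yes", "Yes", "No", "No", "Yes", "No"]
y_pred = ["Yes", "No", "No", "Yes", "Yes", "No"]
counts = confusion_counts(y_true, y_pred)
toy_accuracy = (counts["TP"] + counts["TN"]) / len(y_true)

# Arithmetic check for the example above: diagonal sum over total samples.
slide_accuracy = (100 + 50) / 165
print(dict(counts), toy_accuracy, round(slide_accuracy, 2))
```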
Regression Evaluation Metrics
In the regression task, we predict the target variable which is in the form of continuous values.
Mean Absolute Error (MAE)
MAE is the average absolute difference between the predicted and original values. It tells us how far the
predictions are from the actual output. However, it has one limitation: it gives no idea about the
direction of the error, i.e. whether we are under-predicting or over-predicting. It can be
represented mathematically in this way:

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$

where y_i is the original value and \hat{y}_i is the predicted value for the ith of N samples.
Mean Squared Error (MSE)

MSE is similar to mean absolute error, but it takes the average of the squared differences between the
predicted and original values. The main advantage of this metric is that its gradient is easy to calculate,
whereas the absolute value in mean absolute error is not differentiable at zero, which makes gradient-based optimization less straightforward.
Because the errors are squared, larger errors are weighted more heavily than smaller ones, so the metric focuses on larger
errors. It can be expressed mathematically in this way:

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

where y_i is the original value and \hat{y}_i is the predicted value for the ith of N samples.
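To make the difference between MAE and MSE concrete, the sketch below scores two hypothetical sets of predictions that have the same total absolute error: one spreads the error evenly, the other concentrates it in a single large miss. The function names and data are assumptions for illustration.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 12.0, 14.0, 16.0]
small_errors = [11.0, 13.0, 13.0, 15.0]  # every prediction is off by 1
one_outlier = [10.0, 12.0, 14.0, 20.0]   # three exact, one off by 4

print(mae(y_true, small_errors), mse(y_true, small_errors))
print(mae(y_true, one_outlier), mse(y_true, one_outlier))
```

Both prediction sets have the same MAE, but the MSE of the second set is four times larger because the single large error is squared.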

Root Mean Square Error (RMSE)

RMSE is obtained by taking the square root of the MSE value. MSE is not robust to outliers, and neither is RMSE: both give higher weight to large errors in
predictions.
Root Mean Squared Logarithmic Error (RMSLE)

There are times when the target variable varies over a wide range of values. In such cases we may want to penalize
underestimation of the target values more heavily than overestimation. For such cases,
RMSLE is used as an evaluation metric, since it helps us achieve this objective.
A small change to the RMSE formula, taking the logarithm of the predicted and original values (each plus one), gives the RMSLE formula:

$$\text{RMSLE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\log(\hat{y}_i + 1) - \log(y_i + 1)\right)^2}$$

where y_i is the original value and \hat{y}_i is the predicted value for the ith of N samples.
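The asymmetry of RMSLE can be checked with a short sketch. It assumes non-negative targets and predictions (so the logarithm is defined) and uses a hypothetical single-sample example where the prediction misses the true value of 100 by the same amount in each direction.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rmsle(y_true, y_pred):
    """RMSE computed on log(1 + value); assumes all values are >= 0."""
    n = len(y_true)
    squared = ((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared) / n)


actual = [100.0]
under = [50.0]   # under-prediction by 50
over = [150.0]   # over-prediction by 50

print(rmse(actual, under), rmse(actual, over))
print(rmsle(actual, under), rmsle(actual, over))
```

RMSE treats both misses identically, while RMSLE assigns the larger error to the under-prediction.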

R2 – Score

The coefficient of determination, also called the R2 score, is used to evaluate the performance of a linear
regression model. It is the proportion of the variation in the dependent (output) variable that is predictable from the
independent (input) variable(s). It indicates how well the observed results are reproduced by the model,
based on the proportion of the total variation in the results that the model explains.
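The usual way to compute this score is R2 = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the observed values. The sketch below applies that formula to made-up data; the function name `r2_score` is chosen for the illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])
mean_only = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
print(perfect, mean_only)
```

A perfect model scores 1, and a model that always predicts the mean of the observed values scores 0, which gives a baseline for reading the metric.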
From Distributions to Graphs
Introduction

In machine learning, how we represent data significantly impacts the kind of models
we can build and the insights we can gain. Two of the most fundamental
representations of data are:

1. Distributions – These represent how values are spread across possible
outcomes in a dataset.
● Distributions are excellent for statistical reasoning, probabilistic
predictions, and handling uncertainty.
● Examples:
i. The height of individuals in a population might follow a normal
(bell-shaped) distribution.
ii. Predicting how likely a student is to pass an exam based on study hours.

2. Graphs – These are structures where entities (called nodes or vertices) are
connected by relationships (edges).
● Graphs are powerful when the relationships between entities are just as
important as the entities themselves.
● Examples:
i. A social network where users (nodes) are connected by
friendships (edges).
ii. Understanding how the influence of a user in a social media network
affects the spread of information.
Why Are Distributions Important in ML?

Most machine learning models rely on probability theory at their core. Distributions
help us:

● Model uncertainty – Instead of saying “This is the answer,” we say, “There’s an
80% chance this is the correct answer.”

● Make predictions – Based on prior data and likelihoods.

● Build flexible, interpretable models – Especially in real-world scenarios with
noise and incomplete data.
Key Concepts Involving Distributions

1. Bayesian Inference

○ Combines prior knowledge (prior distribution) with new data (likelihood) to update
beliefs (posterior distribution).

○ Formula:

$$P(H \mid D) = \frac{P(D \mid H) \cdot P(H)}{P(D)}$$

where H is the hypothesis and D is the data.

2. Maximum Likelihood Estimation (MLE)

○ A method to estimate model parameters by maximizing the likelihood of the
observed data.

○ Example: estimating the bias of a coin from a sequence of tosses.

3. Naive Bayes Classifier

○ A simple probabilistic classifier based on Bayes' Theorem with the assumption of
feature independence.

○ It uses distributions (usually Gaussian or Bernoulli) to estimate the likelihood of each
feature value given a class.
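The first two concepts above can be sketched on the biased-coin example. Everything numeric here is hypothetical: two candidate hypotheses about the coin with equal priors, a single observed head for the Bayesian update, and a made-up toss sequence for the maximum likelihood estimate. The grid search is only there to show that the closed-form estimate (fraction of heads) is where the likelihood peaks.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(H | D) = P(D | H) * P(H) / P(D)."""
    return likelihood * prior / evidence


# Bayesian inference: is the coin fair (P(heads)=0.5) or biased (P(heads)=0.8)?
prior_fair, prior_biased = 0.5, 0.5   # hypothetical priors
like_fair, like_biased = 0.5, 0.8     # P(one head | hypothesis)
evidence = like_fair * prior_fair + like_biased * prior_biased  # P(D)
post_biased = posterior(prior_biased, like_biased, evidence)


def coin_likelihood(p_heads, heads, tails):
    """Likelihood of a toss sequence with the given head and tail counts."""
    return p_heads ** heads * (1 - p_heads) ** tails


# Maximum likelihood estimation from a hypothetical toss sequence.
tosses = "HHTHHHTH"
heads, tails = tosses.count("H"), tosses.count("T")
mle_closed_form = heads / len(tosses)
grid = [i / 100 for i in range(1, 100)]
mle_grid = max(grid, key=lambda p: coin_likelihood(p, heads, tails))

print(post_biased, mle_closed_form, mle_grid)
```

After one observed head, belief in the biased hypothesis rises above its prior of 0.5, and both the closed-form and grid estimates of P(heads) agree on 0.75 for this sequence.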
Why do we shift from distributions to graphs?

When we deal with simple or independent data points, probability distributions are enough.
For example, if we want to model how likely someone is to get a job based only on their
marks, we can use a distribution such as the normal (Gaussian) distribution.

But real-world data is rarely that simple. As the data becomes more complex, different parts
of the data start to depend on each other. In such cases, distributions struggle to show these
connections clearly.

In these cases, using graphs is helpful because:

● They can connect variables or entities directly.
● We can model how one thing affects another using edges between nodes.
● They are more flexible and expressive when we want to model complex systems.
Graph-Based Representations in ML

1. Bayesian Networks
● A Bayesian Network is a directed acyclic graph (DAG) where each node represents a
random variable, and the edges represent conditional dependencies between the variables.
● It’s used to model probabilistic relationships between variables, where the direction of the
arrows (edges) shows which variable conditions which, and is often interpreted as the direction of causal influence.

How are they used in ML?

● They’re great for probabilistic reasoning and decision-making under uncertainty.

● For example, a Bayesian Network can model how the weather influences traffic and how
both influence a person’s mood.
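The weather, traffic, and mood example can be written as a tiny Bayesian network whose joint distribution factorizes as P(rain) · P(traffic | rain) · P(mood | rain, traffic). All probability values below are invented for illustration, and inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical conditional probability tables.
p_rain = {True: 0.3, False: 0.7}
p_heavy_given_rain = {True: 0.8, False: 0.4}   # P(heavy traffic | rain)
p_bad_given = {                                # P(bad mood | rain, heavy traffic)
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.6,
    (False, False): 0.1,
}


def joint(rain, heavy, bad):
    """P(rain, heavy, bad) via the network factorization."""
    p = p_rain[rain]
    p_heavy = p_heavy_given_rain[rain]
    p *= p_heavy if heavy else 1 - p_heavy
    p_bad = p_bad_given[(rain, heavy)]
    p *= p_bad if bad else 1 - p_bad
    return p


states = [True, False]
total = sum(joint(r, h, b) for r, h, b in product(states, repeat=3))
p_bad_mood = sum(joint(r, h, True) for r, h in product(states, repeat=2))
p_rain_given_bad = sum(joint(True, h, True) for h in states) / p_bad_mood
print(total, p_bad_mood, p_rain_given_bad)
```

With these made-up numbers, observing a bad mood raises the probability of rain from 0.3 to roughly 0.54, which is the kind of reasoning under uncertainty the network supports.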
2. Markov Random Fields (MRFs)

● Markov Random Fields are undirected graphs that represent spatial or relational
dependencies. Unlike Bayesian Networks, where the edges are directed, the edges of an MRF
have no direction, and the model focuses on the local dependencies between connected
nodes.

How are they used in ML?

● MRFs are often used in computer vision, where they model the relationships between
neighboring pixels, or in natural language processing, where they can model the
dependencies between words in a sentence.
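As a rough sketch of the neighboring-pixel idea, the code below defines a pairwise MRF over a chain of three binary "pixels". Each pair of neighbors contributes a potential that is larger when the two values agree. The coupling strength, the chain layout, and the function names are assumptions for this illustration, and the normalizing constant is found by enumerating all configurations, which only works for tiny models.

```python
import math
from itertools import product

COUPLING = 1.0  # hypothetical strength of the preference for equal neighbors


def unnormalized(config):
    """Product of pairwise potentials along the chain.

    Each neighboring pair contributes exp(+COUPLING) if the two values
    agree and exp(-COUPLING) if they differ.
    """
    score = 0.0
    for a, b in zip(config, config[1:]):
        score += COUPLING if a == b else -COUPLING
    return math.exp(score)


configs = list(product([0, 1], repeat=3))
partition = sum(unnormalized(c) for c in configs)  # normalizing constant Z
prob = {c: unnormalized(c) / partition for c in configs}

print(prob[(0, 0, 0)], prob[(0, 1, 0)])
```

Smooth configurations such as (0, 0, 0) come out more probable than noisy ones such as (0, 1, 0), which reflects the local, undirected dependencies described above.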
3. Graph Neural Networks (GNNs)

● Graph Neural Networks (GNNs) are a type of neural network that works directly on graph structures.

● GNNs aggregate information from neighboring nodes in a graph to update each node’s representation (or
feature vector). This makes them especially effective for tasks like node classification or link prediction.

How are they used in ML?

● GNNs are used for tasks like social network analysis, recommendation systems, and molecular structure
analysis.
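The neighbor-aggregation step can be sketched without any deep learning library. The code below performs one round of mean aggregation on a hypothetical three-node graph. It is a simplification: real GNN layers typically also apply learned weight matrices and a nonlinearity, which are omitted here, and the function name `message_passing_step` is chosen for the illustration.

```python
def message_passing_step(features, neighbors):
    """One aggregation round: each node's new feature vector is the mean
    of its own vector and the vectors of its neighbors."""
    updated = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in neighbors[node]]
        updated[node] = [
            sum(vec[d] for vec in group) / len(group) for d in range(len(feat))
        ]
    return updated


# Hypothetical undirected graph: node "a" is connected to "b" and "c".
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 1.0]}

updated = message_passing_step(features, neighbors)
print(updated)
```

After one step, node "a" has moved toward the features of its two neighbors, and "b" and "c" have each absorbed part of "a". Stacking several such steps lets information travel further across the graph.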
Thank You !
Any Questions?
