Performance Evaluation
&
From Distributions to Graphs
Presented By
Dhanyatha
Performance Evaluation
Accuracy works well only when there is a roughly equal number of samples for each class. For example,
suppose our training set is 90% class A and 10% class B. A model can then reach 90% training accuracy
simply by predicting class A for every sample. If we test the same model on a test set that is 60%
class A and 40% class B, the accuracy falls to 60%.
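The 90/10 versus 60/40 example can be reproduced with a minimal sketch. The label lists and the always-predict-A "model" are illustrative assumptions, not real data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical data: 90/10 class split for training, 60/40 for testing.
train_labels = ["A"] * 90 + ["B"] * 10
test_labels = ["A"] * 60 + ["B"] * 40

# A "model" that ignores its input and always predicts class A.
train_acc = accuracy(train_labels, ["A"] * len(train_labels))
test_acc = accuracy(test_labels, ["A"] * len(test_labels))

print(train_acc)  # 0.9
print(test_acc)   # 0.6
```

The model has learned nothing about class B, yet its training accuracy looks strong, which is why accuracy alone can mislead on imbalanced data.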
Logarithmic Loss
Log loss penalizes false classifications, and it penalizes confident wrong predictions most heavily. It usually
works well with multi-class classification. To use log loss, the classifier must assign a probability to every class
for each sample. If there are N samples and M classes, the log loss is calculated as:
LogLoss = -(1/N) Σ_{i=1..N} Σ_{j=1..M} y_ij · log(p_ij)
where,
● N : number of samples.
● M : number of classes.
● y_ij : indicates whether the ith sample belongs to the jth class or not (1 or 0).
● p_ij : indicates the probability of the ith sample belonging to the jth class.
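A minimal sketch of multi-class log loss, assuming labels are given as class indices and probabilities as one list per sample. The clipping constant `eps` and the sample data are assumptions added to keep the logarithm finite and the example small.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Multi-class log loss.

    y_true: list of true class indices (one per sample).
    probs:  list of per-class probability lists (one per sample).
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # y_ij is 1 only for the true class, so only that term survives.
        p_true = min(max(p[label], eps), 1 - eps)
        total += math.log(p_true)
    return -total / len(y_true)

# Hypothetical predictions for 3 samples and 3 classes.
y_true = [0, 2, 1]
confident_right = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
confident_wrong = [[0.05, 0.9, 0.05], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]

print(log_loss(y_true, confident_right))
print(log_loss(y_true, confident_wrong))
```

The second call returns a much larger value because high probability was placed on the wrong classes.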
Area Under Curve (AUC)
AUC is one of the most widely used metrics for binary classification. The AUC of a classifier is
defined as the probability that the classifier will rank a randomly chosen positive example higher than a
randomly chosen negative example.
A few basic terms:
True Positive Rate:
Also called sensitivity. The True Positive Rate is the proportion of positive data points that are
correctly identified as positive, out of all data points that are actually positive.
TPR = TP / (TP + FN)
True Negative Rate
Also called specificity. The True Negative Rate is the proportion of negative data points that are
correctly identified as negative, out of all data points that are actually negative.
TNR = TN / (TN + FP)
False Positive Rate
The False Positive Rate is the proportion of actual negatives that are incorrectly identified as positives.
FPR = FP / (FP + TN)
False Positive Rate and True Positive Rate both have values in the range [0, 1].
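The three rates can be computed directly from the counts of true positives, false negatives, true negatives, and false positives. This is a small sketch, and the counts below are made-up values for illustration.

```python
def rates(tp, fn, tn, fp):
    """Return (TPR, TNR, FPR) from confusion counts."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    fpr = fp / (fp + tn)  # equals 1 - TNR
    return tpr, tnr, fpr

# Hypothetical counts: 50 actual positives, 50 actual negatives.
tpr, tnr, fpr = rates(tp=40, fn=10, tn=45, fp=5)
print(tpr, tnr, fpr)  # 0.8 0.9 0.1
```

Because FPR and TNR share the same denominator, they always sum to 1.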
AUC is the area under the ROC curve, which plots the True Positive Rate against the False Positive Rate at
different classification thresholds. Its value lies in the range [0, 1]. The greater the AUC, the better the
performance of the model.
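The ranking definition of AUC can be checked by brute force: compare every positive example's score with every negative example's score and count how often the positive one is higher. This is a sketch of that pairwise definition (ties counted as half), not an efficient implementation; the labels and scores are illustrative.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(y_true, scores))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.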
Precision: Precision is a measure of a model’s performance that tells you how many of the positive
predictions made by the model are actually correct.
Precision = TP / (TP + FP)
Recall: Recall is the ratio of correctly predicted positive instances to the total actual positive instances.
It measures how well the model captures all relevant positive cases.
Recall = TP / (TP + FN)
F1 Score: The F1 score is the harmonic mean of precision and recall. Its range is [0, 1]. This metric
tells us how precise our classifier is (how many of the instances it flags are correct) and how robust it is
(whether it avoids missing a significant number of instances).
High precision with low recall means the positive predictions are very accurate, but the classifier misses a
large number of instances. The higher the F1 score, the better the performance. It can be expressed
mathematically in this way:
F1 = 2 * 1 / ((1 / Precision) + (1 / Recall))
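Precision, recall, and F1 can be computed together from the same counts. This is a minimal sketch with hypothetical counts, using the algebraically equivalent form 2 * Precision * Recall / (Precision + Recall) of the harmonic mean.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical classifier: precise, but it misses half of the positives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, round(f1, 3))  # 0.8 0.5 0.615
```

The F1 score (about 0.615) sits closer to the lower of the two values, which is how the harmonic mean penalizes an imbalance between precision and recall.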
Confusion Matrix
A confusion matrix is an N x N matrix, where N is the number of classes or categories to be predicted.
Suppose we have a binary classification problem, so N = 2 and we get a 2 x 2 matrix. Each sample belongs
to either Yes or No. We build a classifier that predicts the class for each new input sample. After that,
we test the model with 165 samples, and we get the following result.
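A confusion matrix can be built by counting (actual, predicted) label pairs. The sketch below uses six made-up Yes/No samples rather than the 165 samples mentioned above, and it assumes rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Hypothetical labels for six samples.
y_true = ["Yes", "Yes", "No", "No", "Yes", "No"]
y_pred = ["Yes", "No", "No", "Yes", "Yes", "No"]

matrix = confusion_matrix(y_true, y_pred, labels=["No", "Yes"])
print(matrix)  # [[2, 1], [1, 2]]
```

With the label order `["No", "Yes"]`, the entries read as TN = 2, FP = 1, FN = 1, TP = 2.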
MSE is similar to mean absolute error, but it takes the average of the squared differences between the
predicted and original values. The main advantage of this metric is that its gradient is easy to calculate,
whereas mean absolute error is not differentiable where the error is zero, so calculating its gradient takes more
complicated tools. Because the errors are squared, larger errors are pronounced more than smaller errors, so we
can focus more on larger errors. It can be expressed mathematically in this way:
MSE = (1/N) Σ_{j=1..N} (y_j - ŷ_j)²
where y_j is the original value and ŷ_j is the predicted value for the jth sample.
RMSE is obtained by taking the square root of the MSE value. MSE is not robust to outliers, and neither is
RMSE. Both give higher weight to large errors in predictions.
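A short sketch comparing mean absolute error, MSE, and RMSE on the same predictions. The values are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical targets and predictions: errors are 1, 0, -2, 0.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print(mae(y_true, y_pred))   # 0.75
print(mse(y_true, y_pred))   # 1.25
print(rmse(y_true, y_pred))
```

The single error of size 2 contributes 4 of the 5 units of squared error, which shows how squaring emphasizes larger errors.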
Root Mean Squared Logarithmic Error (RMSLE)
There are times when the target variable varies over a wide range of values. In such cases we may want to
penalize underestimation of the target values more heavily than overestimation. RMSLE is an evaluation metric
that helps us achieve this objective.
A small change to the RMSE formula gives the RMSLE formula:
RMSLE = sqrt( (1/N) Σ_{j=1..N} (log(ŷ_j + 1) - log(y_j + 1))² )
where y_j is the original value and ŷ_j is the predicted value for the jth sample.
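The asymmetry between underestimation and overestimation can be seen in a small sketch. It assumes non-negative targets and uses `math.log1p`, which computes log(1 + x); the numbers are illustrative.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2
               for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Same absolute error (50) in both cases, true value 100.
under = rmsle([100.0], [50.0])   # prediction too low
over = rmsle([100.0], [150.0])   # prediction too high

print(under > over)  # True
```

Both predictions are off by 50, but the underestimate receives the larger RMSLE because the logarithm compresses large values more than small ones.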
R2 – Score
The coefficient of determination, also called the R2 score, is used to evaluate the performance of a linear
regression model. It is the proportion of variation in the dependent (output) variable that is predictable from
the independent (input) variable(s). It is used to check how well observed results are reproduced by the model,
based on the proportion of the total variation in the results that the model explains.
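One common way to compute the R2 score is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition with made-up data; it is not taken from the slides.

```python
def r2_score(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed values.
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])
baseline = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean

print(perfect)   # 1.0
print(baseline)  # 0.0
```

A model that always predicts the mean explains none of the variation (R2 = 0), while a perfect model explains all of it (R2 = 1).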
From Distributions to Graphs
Introduction
In machine learning, how we represent data significantly impacts the kind of models
we can build and the insights we can gain. Two of the most fundamental
representations of data are:
ii. Predicting how likely a student is to pass an exam based on study hours.
2. Graphs – These are structures where entities (called nodes or vertices) are
connected by relationships (edges).
● Graphs are powerful when the relationships between entities are just as
important as the entities themselves.
● For example, a social network where users (nodes) are connected by
friendships (edges).
● Understanding how the influence of a user in a social media network
affects the spread of information.
Why Are Distributions Important in ML?
Most machine learning models rely on probability theory at their core. Distributions
help us:
1. Bayesian Inference
○ Combines prior knowledge (prior distribution) with new data (likelihood) to update
beliefs (posterior distribution).
○ Formula:
P(H ∣ D) = P(D ∣ H) ⋅ P(H) / P(D)
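A minimal sketch of one Bayesian update for a binary hypothesis H. The evidence P(D) is expanded with the law of total probability, and the numbers (a 1% prior, 95% likelihood if H is true, 5% if H is false) are hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """P(H | D) from P(H), P(D | H), and P(D | not H)."""
    # P(D) = P(D | H) P(H) + P(D | not H) P(not H)
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers for illustration only.
p = posterior(prior=0.01, likelihood=0.95, likelihood_if_false=0.05)
print(round(p, 3))  # 0.161
```

Even with a strong likelihood, the low prior keeps the posterior at about 16%, which shows how prior knowledge and new data combine.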
When we deal with simple or independent data points, probability distributions are enough.
For example, to model how likely someone is to get a job based only on their marks, we can
use a distribution such as the normal (Gaussian) distribution.
But real-world data is rarely that simple. As the data becomes more complex, different parts
of the data start to depend on each other. In such cases, distributions alone struggle to show
these connections clearly.
1. Bayesian Networks
● A Bayesian Network is a directed acyclic graph (DAG) where each node represents a
random variable, and the edges represent conditional dependencies between the variables.
● It is used to model probabilistic relationships between variables, where the direction of the
arrows (edges) shows which variable depends on which, and is often read as the direction of causality.
● For example, a Bayesian Network can model how the weather influences traffic and how
both influence a person’s mood.
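The weather, traffic, and mood example can be sketched as a three-node network whose joint probability factorizes as P(W) · P(T | W) · P(M | W, T). All probabilities below are invented for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables.
p_weather = {"sunny": 0.7, "rainy": 0.3}
p_traffic_heavy = {"sunny": 0.2, "rainy": 0.6}           # P(T = heavy | W)
p_mood_good = {("sunny", "light"): 0.9, ("sunny", "heavy"): 0.6,
               ("rainy", "light"): 0.7, ("rainy", "heavy"): 0.3}

def joint(weather, traffic, mood):
    """P(W, T, M) = P(W) * P(T | W) * P(M | W, T)."""
    p_w = p_weather[weather]
    p_heavy = p_traffic_heavy[weather]
    p_t = p_heavy if traffic == "heavy" else 1 - p_heavy
    p_good = p_mood_good[(weather, traffic)]
    p_m = p_good if mood == "good" else 1 - p_good
    return p_w * p_t * p_m

states = list(product(p_weather, ["light", "heavy"], ["good", "bad"]))
total = sum(joint(*s) for s in states)
# Marginalize over weather and traffic to get P(M = good).
p_good_mood = sum(joint(w, t, m) for w, t, m in states if m == "good")

print(round(total, 6))        # 1.0
print(round(p_good_mood, 3))  # 0.726
```

The graph structure means only three small tables are needed instead of one table over every combination of the three variables.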
Markov Random Fields (MRFs)
○ Markov Random Fields are undirected graphs that represent spatial or relational
dependencies. Unlike Bayesian Networks, where the edges are directed, MRFs don’t
have a clear direction and focus on the local dependencies between connected
nodes.
○ MRFs are often used in computer vision, where they model the relationships between
neighboring pixels, or in natural language processing, where they can model the
dependencies between words in a sentence.
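As a rough illustration of local dependencies between neighboring pixels, the sketch below scores a binary image by counting neighboring pixel pairs that disagree. This Ising-style pairwise energy is one simple choice assumed here for illustration; in such a model, lower energy corresponds to higher (unnormalized) probability.

```python
def pairwise_energy(grid):
    """Count horizontally and vertically adjacent pixels with different labels."""
    rows, cols = len(grid), len(grid[0])
    energy = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                energy += 1  # right neighbor disagrees
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                energy += 1  # lower neighbor disagrees
    return energy

# Hypothetical 3 x 3 binary images.
smooth = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1]]
noisy = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]

print(pairwise_energy(smooth))  # 3
print(pairwise_energy(noisy))   # 12
```

The smooth image has far fewer disagreeing neighbor pairs, so a model that favors agreement between neighbors would consider it more likely than the checkerboard.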
Graph Neural Networks (GNNs)
○ Graph Neural Networks (GNNs) are a type of neural network that works directly on graph structures.
○ GNNs aggregate information from neighboring nodes in a graph to update each node’s representation (or
feature vector). This makes them especially effective for tasks like node classification or link prediction.
○ GNNs are used for tasks like social network analysis, recommendation systems, and molecular structure
analysis.
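The neighbor-aggregation idea can be sketched as a single message-passing step in which each node's new feature vector is the mean of its own vector and its neighbors' vectors. This is a simplification: real GNN layers also apply learned weights and a non-linearity, which are omitted here, and the graph and features are made up.

```python
def aggregate_mean(features, neighbors):
    """One aggregation step: average each node's vector with its neighbors'."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors[node]]
        # Average each feature dimension across the node and its neighbors.
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated

# Hypothetical graph: a - b - c (a path with three nodes).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

updated = aggregate_mean(features, neighbors)
print(updated["a"])  # [0.5, 0.5]
print(updated["c"])  # [0.5, 1.0]
```

After one step each node's representation already reflects its immediate neighborhood; stacking several steps lets information travel further across the graph.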
Thank You !
Any Questions?