
Experiment No.

Aim:
To implement and explore performance evaluation metrics for data models

Theory:
In machine learning, once we build a model, we need to evaluate its performance. This helps in
understanding how well the model is performing and whether it is suitable for making
predictions. For classification problems, there are several performance evaluation metrics that
help quantify the effectiveness of a model. These metrics include Accuracy, Precision, Recall,
F1 Score, and the Confusion Matrix.

1. Confusion Matrix:
The Confusion Matrix is a table used to describe the performance of a classification algorithm.
It compares the predicted labels with the actual true labels of the dataset. It contains the
following four components:
● True Positive (TP): The number of correct predictions where the model predicted the
positive class, and it is actually positive.
● True Negative (TN): The number of correct predictions where the model predicted the
negative class, and it is actually negative.
● False Positive (FP): The number of incorrect predictions where the model predicted the
positive class, but it is actually negative (also known as a Type I error).
● False Negative (FN): The number of incorrect predictions where the model predicted the
negative class, but it is actually positive (also known as a Type II error).
The Confusion Matrix is represented as:
                    Predicted Positive       Predicted Negative
Actual Positive     True Positive (TP)       False Negative (FN)
Actual Negative     False Positive (FP)      True Negative (TN)

2. Accuracy:
Accuracy is the simplest and most commonly used evaluation metric. It measures the overall
performance of the model and is calculated as the ratio of correct predictions to the total number
of predictions.
● High accuracy indicates that the model is performing well overall.
● However, accuracy can be misleading, especially if the dataset is imbalanced (e.g., one
class is much larger than the other). In such cases, even a model that predicts the
majority class most of the time could have high accuracy but poor performance in
predicting the minority class.
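
Expressed with the confusion-matrix counts defined above, accuracy is computed as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)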

3. Precision:
Precision is a metric that measures how many of the predicted positive instances are actually
positive. It answers the question: Of all the instances that were predicted as positive, how many
were truly positive?

● High precision means that when the model predicts positive, it is very likely to be
correct.
● Precision is particularly important when the cost of a false positive is high (for example,
in medical diagnoses where misdiagnosing healthy patients as sick could be harmful).
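
In the same notation, precision is computed as:

Precision = TP / (TP + FP)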

4. Recall (Sensitivity or True Positive Rate):
Recall (also known as Sensitivity or True Positive Rate) measures how many actual positive
instances were correctly predicted by the model. It answers the question: Of all the actual
positives, how many did the model successfully identify?

● High recall indicates that the model is correctly identifying most of the positive cases.
● Recall is critical when the cost of missing a positive instance (false negative) is high. For
instance, in fraud detection or disease detection, we prefer to catch most of the positive
cases, even if it means having some false positives.
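
In the same notation, recall is computed as:

Recall = TP / (TP + FN)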

5. F1 Score:
The F1 Score is the harmonic mean of Precision and Recall. It provides a single score that
balances both precision and recall, especially when the classes are imbalanced.

● The F1 score is a good metric when we want to balance both precision and recall. It is
particularly useful when we have an imbalanced dataset, where one class is much more
frequent than the other.
● The F1 score gives a better idea of a model’s performance compared to accuracy, as it
considers both false positives and false negatives.
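
Combining the two previous metrics, the F1 score is computed as:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)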
Key Takeaways:
● Confusion Matrix: A table summarizing the true vs. predicted labels. It shows how
many instances were correctly/incorrectly classified.
● Accuracy: The overall percentage of correct predictions made by the model.
● Precision: The ratio of correctly predicted positive instances to the total predicted
positives. It tells you how reliable your positive predictions are.
● Recall: The ratio of correctly predicted positive instances to the actual total positives. It
tells you how many of the actual positive instances were caught by the model.
● F1 Score: The harmonic mean of precision and recall. It is a balanced metric for
imbalanced datasets.
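
As a quick illustration of the formulas above, the short sketch below computes the four metrics by hand from a small set of made-up binary labels (the labels and counts here are hypothetical and chosen only for illustration; the program that follows uses scikit-learn on the iris dataset instead):

# Hypothetical binary labels, used only to illustrate the formulas above
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Count the four confusion-matrix components
TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Apply the metric formulas from the theory section
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f'TP={TP}, TN={TN}, FP={FP}, FN={FN}')
print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1 Score: {f1:.4f}')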

Program and output:


# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             confusion_matrix, classification_report)
import matplotlib.pyplot as plt
import seaborn as sns
# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train a logistic regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')

# Precision score (weighted for multi-class classification)
precision = precision_score(y_test, y_pred, average='weighted')
print(f'Precision (Weighted): {precision:.4f}')

# Recall score (weighted for multi-class classification)
recall = recall_score(y_test, y_pred, average='weighted')
print(f'Recall (Weighted): {recall:.4f}')

# F1 score (weighted for multi-class classification)
f1 = f1_score(y_test, y_pred, average='weighted')
print(f'F1 Score (Weighted): {f1:.4f}')

# Confusion Matrix
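# Note: the iris dataset has three classes, so this confusion matrix is 3x3
# rather than the 2x2 binary layout shown in the theory section.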
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Plotting the confusion matrix
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names,
yticklabels=iris.target_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()

# Classification Report (Precision, Recall, F1 for each class)
class_report = classification_report(y_test, y_pred)
print("\nClassification Report:")
print(class_report)
