
Data Analytics III

The document outlines a laboratory exercise for a Data Science and Big Data Analytics course, focusing on implementing a Naïve Bayes classifier using the Iris dataset. It details the steps for importing libraries, loading data, splitting it into training and testing sets, training the model, making predictions, and evaluating performance metrics. The results, including the confusion matrix and various metrics such as accuracy, precision, and recall, are also displayed.

Uploaded by

Chirag Patekar

Third Year Engineering (2019 Pattern)

Course Code: 310256


Course Name: Data Science and Big Data Analytics Laboratory
Group A
6) Data Analytics III
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score)

# Load the Iris dataset


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width",
"species"]
df = pd.read_csv(url, names=columns)

# Split the dataset into features and target


X = df.iloc[:, :-1] # All columns except the last one as features
y = df.iloc[:, -1] # Last column as target

# Split into training and testing data (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Create and train the Naïve Bayes classifier


model = GaussianNB()
model.fit(X_train, y_train)

# Predict on test data


y_pred = model.predict(X_test)

# Generate confusion matrix


cm = confusion_matrix(y_test, y_pred)

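# Extract True Positive, False Positive, True Negative, False Negative
# for the first class (Iris-setosa), treating the other two classes as negative
TP = cm[0][0]
FP = cm.sum(axis=0)[0] - TP  # predicted as the first class, actually another class
FN = cm.sum(axis=1)[0] - TP  # actually the first class, predicted as another class
TN = cm.sum() - (TP + FP + FN)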

# Compute metrics
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')

# Display Results
print(f"Confusion Matrix:\n{cm}")
print(f"True Positive (TP): {TP}")
print(f"False Positive (FP): {FP}")
print(f"True Negative (TN): {TN}")
print(f"False Negative (FN): {FN}")
print(f"Accuracy: {accuracy:.2f}")
print(f"Error Rate: {error_rate:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
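
The listing above extracts TP, FP, FN and TN for the first class only. As a minimal sketch, the same one-vs-rest arithmetic can be repeated for every class. The confusion matrix below is hypothetical (made-up counts, not this lab's output), and the helper name one_vs_rest_counts is an assumption introduced for illustration. Plain Python lists are used so the sketch runs without extra libraries.

```python
# Hypothetical 3x3 confusion matrix (rows = actual class, columns = predicted class).
# The counts are invented for illustration; they are NOT the output of the lab code.
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
cm = [
    [19, 0, 0],
    [0, 12, 1],
    [0, 1, 12],
]


def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class index k, treating all other classes as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]                        # diagonal entry: correctly predicted as class k
    fp = sum(row[k] for row in cm) - tp  # column k without the diagonal
    fn = sum(cm[k]) - tp                 # row k without the diagonal
    tn = total - (tp + fp + fn)          # everything not involving class k
    return tp, fp, fn, tn


counts = {label: one_vs_rest_counts(cm, k) for k, label in enumerate(labels)}
for label, (tp, fp, fn, tn) in counts.items():
    print(f"{label}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

With a NumPy array such as the one returned by confusion_matrix(), the same column and row sums correspond to cm.sum(axis=0)[k] and cm.sum(axis=1)[k].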

Explanation of Each Step:


1. Import Libraries
o pandas – For handling data.
o train_test_split – For splitting data into training and testing sets.
o GaussianNB – For creating a Naïve Bayes classifier.
o confusion_matrix, accuracy_score, precision_score, recall_score – For
evaluating model performance.
2. Load Dataset
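o Load iris.data from the UCI repository URL using pd.read_csv(), passing the column names explicitly because the file has no header row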
3. Split Features and Target
o X = feature columns
o y = target column (species)
4. Split into Training and Testing Sets
o 70% for training, 30% for testing
5. Create and Train Naïve Bayes Classifier
o GaussianNB() assumes each feature is normally distributed within each class and that features are conditionally independent given the class
o fit() trains the model on training data
6. Make Predictions
o predict() predicts species on test data
7. Generate Confusion Matrix
o confusion_matrix() compares predicted vs actual values
8. Extract Confusion Matrix Values
o TP, FP, FN, TN calculated from the confusion matrix for the first class (Iris-setosa), treating the other two classes as negative
9. Compute Performance Metrics
o Accuracy = Correct predictions / Total predictions
o Error rate = 1 - Accuracy
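o Precision = TP / (TP + FP), computed per class and then averaged over the classes (average='macro')
o Recall = TP / (TP + FN), computed per class and then averaged over the classes (average='macro')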
10. Display Results
o Print confusion matrix and computed metrics
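
The formulas in step 9 can be checked by hand. The sketch below computes accuracy, error rate, and macro-averaged precision and recall directly from a confusion matrix, under stated assumptions: the matrix is hypothetical (made-up counts, not this lab's output), the helper name macro_precision_recall is introduced for illustration, and a class with no predicted or actual samples is scored 0.0.

```python
# Hypothetical 3x3 confusion matrix (rows = actual class, columns = predicted class).
# The counts are invented for illustration; they are NOT the output of the lab code.
cm = [
    [19, 0, 0],
    [0, 12, 1],
    [0, 1, 12],
]


def macro_precision_recall(cm):
    """Average the per-class precision and recall with equal weight per class."""
    n = len(cm)
    precisions, recalls = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(row[k] for row in cm)  # TP + FP (column sum)
        actual_k = sum(cm[k])                    # TP + FN (row sum)
        # Assumption: score an empty class as 0.0 instead of dividing by zero.
        precisions.append(tp / predicted_k if predicted_k else 0.0)
        recalls.append(tp / actual_k if actual_k else 0.0)
    return sum(precisions) / n, sum(recalls) / n


total = sum(sum(row) for row in cm)
correct = sum(cm[k][k] for k in range(len(cm)))

accuracy = correct / total    # correct predictions / total predictions
error_rate = 1 - accuracy
precision, recall = macro_precision_recall(cm)

print(f"Accuracy: {accuracy:.2f}")
print(f"Error Rate: {error_rate:.2f}")
print(f"Precision (macro): {precision:.2f}")
print(f"Recall (macro): {recall:.2f}")
```

Macro averaging gives every class the same weight regardless of how many test samples it has, which is why it can differ from accuracy when the classes are unevenly represented in the test split.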

OUTPUT-
