Exp 4 ML

This document outlines a machine learning experiment that uses the scikit-learn diabetes dataset to classify disease progression as a binary outcome (above or below the median). It covers preprocessing (a train/test split followed by min-max scaling), training a Gaussian Naive Bayes classifier, and evaluating it with accuracy, a classification report, and a confusion matrix, which is also plotted as a heatmap.

Uploaded by

anant chaudhari

EXP 4

import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load the diabetes dataset
diabetes = load_diabetes()
df = pd.DataFrame(data=diabetes.data, columns=diabetes.feature_names)
df['target'] = diabetes.target

# Statistical description of the dataset
print("Statistical Description of the Dataset:")
print(df.describe())

# Convert target to binary (1 if diabetes progression > median, else 0)
df['target'] = (df['target'] > np.median(df['target'])).astype(int)

# Separate features (X) and class label (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the data into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Rescale features to the [0, 1] range using MinMaxScaler.
# The scaler is fit on the training set only and then applied to the test set.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train the Gaussian Naive Bayes classifier
model = GaussianNB()
model.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred_test = model.predict(X_test_scaled)

# Make predictions on the training set
y_pred_train = model.predict(X_train_scaled)

# Evaluate the classifier
accuracy_train = accuracy_score(y_train, y_pred_train)
accuracy_test = accuracy_score(y_test, y_pred_test)

print(f'Accuracy on Training Data: {accuracy_train:.2f}')
print(f'Accuracy on Test Data: {accuracy_test:.2f}')

print('Classification Report for Test Data:')
print(classification_report(y_test, y_pred_test))

print('Confusion Matrix for Test Data:')
conf_matrix = confusion_matrix(y_test, y_pred_test)
print(conf_matrix)

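# Plot the confusion matrix.
# Every patient in this dataset has diabetes; the classes are
# 0 = progression at or below the median, 1 = progression above the median.
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Low progression', 'High progression'],
            yticklabels=['Low progression', 'High progression'])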
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix for Test Data')
plt.show()
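
The script above reports accuracy from a single 70/30 split, which can vary with the choice of random_state. As a rough cross-check, the same scaler and classifier can be scored with k-fold cross-validation. The sketch below is not part of the original experiment: the pipeline, the 5 stratified folds, and the seed of 42 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Rebuild the same binary target as the script above:
# 1 if progression is above the median, else 0.
X, y_raw = load_diabetes(return_X_y=True)
y = (y_raw > np.median(y_raw)).astype(int)

# Wrapping the scaler in a pipeline refits it on each training fold,
# so the held-out fold never influences the scaling.
pipeline = make_pipeline(MinMaxScaler(), GaussianNB())

# Assumption: 5 stratified folds with a fixed seed (not from the original).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='accuracy')

print(f'Fold accuracies: {np.round(scores, 2)}')
print(f'Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})')
```

If the fold accuracies differ noticeably from one another, the single-split test accuracy printed earlier should be read as one sample from that range rather than a fixed property of the model.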
