Dsbda 5

The document outlines a Python script for building a logistic regression model using a dataset of social network ads. It includes steps for data loading, preprocessing, model training, and evaluation, achieving an accuracy of 89%. Additional performance metrics such as precision and recall are also calculated and displayed.

Uploaded by

Manasi Deshmukh

# Import necessary libraries

import pandas as pd  # data manipulation
from sklearn.model_selection import train_test_split  # splitting the dataset
from sklearn.preprocessing import StandardScaler  # feature scaling
from sklearn.linear_model import LogisticRegression  # logistic regression modeling
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix  # model evaluation

# Load the dataset

url = ('C:\\Users\\rashi\\OneDrive\\Desktop\\DSBD PRACTICAL\\Practical 5\\Social_Network_Ads.csv')
dataset = pd.read_csv(url)

# Display the first few rows of the dataset to understand its structure
print(dataset.head())

    User ID  Gender  Age  EstimatedSalary  Purchased
0  15624510    Male   19            19000          0
1  15810944    Male   35            20000          0
2  15668575  Female   26            43000          0
3  15603246  Female   27            57000          0
4  15804002    Male   19            76000          0

# Define features and target variable

X = dataset.iloc[:, [2, 3]].values  # columns 2 and 3 (Age, EstimatedSalary) are the features
y = dataset.iloc[:, 4].values  # column 4 (Purchased) is the target variable
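Positional indexing with `iloc` silently picks the wrong columns if the file's column order ever changes. As a minimal sketch of an alternative, the features and target can be selected by name. The small frame below is rebuilt from the five rows printed by `dataset.head()` purely for illustration; the names `sample`, `X_named`, and `y_named` are hypothetical and not part of the original script.

```python
import pandas as pd

# Five rows copied from the dataset.head() output above, for illustration only.
sample = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Select features and target by column name instead of position.
X_named = sample[["Age", "EstimatedSalary"]].values
y_named = sample["Purchased"].values

# The positional selection used in the script picks the same columns.
X_positional = sample.iloc[:, [2, 3]].values
y_positional = sample.iloc[:, 4].values

print((X_named == X_positional).all(), (y_named == y_positional).all())
```

Both selections agree as long as the column order matches the printed header.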

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.25, random_state=0)

# Feature scaling (optional, but can improve convergence speed)

sc = StandardScaler()  # create an instance of the StandardScaler class
X_train = sc.fit_transform(X_train)  # fit the scaler on the training data (X_train) and scale it
X_test = sc.transform(X_test)  # scale the test data with the statistics learned from X_train
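The scaling step fits on the training split only and then reuses those statistics for the test split. A small sketch of what that means numerically is given here, using hypothetical toy values (`toy_train`, `toy_test`) rather than the real dataset split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values (age, salary); not taken from the real dataset split.
toy_train = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
toy_test = np.array([[30.0, 80000.0]])

scaler = StandardScaler()
toy_train_scaled = scaler.fit_transform(toy_train)  # learns mean_ and scale_ from the training rows
toy_test_scaled = scaler.transform(toy_test)        # reuses those training statistics

# transform() is equivalent to (x - mean_) / scale_ with the training statistics.
manual_test_scaled = (toy_test - scaler.mean_) / scaler.scale_

print(scaler.mean_)
print(toy_test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, while test values are expressed relative to the training distribution, so no information from the test set leaks into preprocessing.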

# Initialize the logistic regression model


classifier = LogisticRegression(random_state=0)

# Fit the model to the training data


classifier.fit(X_train, y_train)
LogisticRegression(random_state=0)
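To make the fitted model less of a black box, the sketch below reproduces a binary logistic regression's probabilities by hand: a linear score passed through the sigmoid function. It runs on hypothetical synthetic data (`X_toy`, `y_toy`), not on the Social_Network_Ads values, and assumes scikit-learn's default settings for a two-class problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, already-scaled toy data; not the Social_Network_Ads values.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

model = LogisticRegression(random_state=0).fit(X_toy, y_toy)

# Linear score z = w . x + b, then the sigmoid maps it to P(class 1).
z = X_toy @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# A positive score corresponds to a probability above 0.5, i.e. class 1.
pred_manual = (z > 0).astype(int)

p_sklearn = model.predict_proba(X_toy)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```

The same relationship holds for `classifier` in the script: `classifier.coef_` and `classifier.intercept_` hold the learned weights for the scaled Age and EstimatedSalary features.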

# Make predictions on the test set


y_pred = classifier.predict(X_test)

# Evaluate the performance of the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{conf_matrix}')
print(f'Classification Report:\n{classification_rep}')

Accuracy: 0.89
Confusion Matrix:
[[65  3]
 [ 8 24]]
Classification Report:
              precision    recall  f1-score   support

           0       0.89      0.96      0.92        68
           1       0.89      0.75      0.81        32

    accuracy                           0.89       100
   macro avg       0.89      0.85      0.87       100
weighted avg       0.89      0.89      0.89       100
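The per-class rows of the classification report follow directly from the confusion matrix. As a worked check, the sketch below recomputes them with NumPy from the matrix printed above; the variable names are illustrative and not part of the original script.

```python
import numpy as np

# Confusion matrix printed above: rows are actual classes, columns are predicted classes.
cm = np.array([[65, 3],
               [8, 24]])

correct = np.diag(cm)                 # correctly classified samples per class
precision = correct / cm.sum(axis=0)  # divide by column totals (predicted counts)
recall = correct / cm.sum(axis=1)     # divide by row totals (actual counts, i.e. support)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for label in range(2):
    print(label, round(precision[label], 2), round(recall[label], 2), round(f1[label], 2))
print("accuracy", accuracy)
```

Rounded to two decimals, these values match the report: class 0 has precision 0.89, recall 0.96, and F1 0.92, and class 1 has precision 0.89, recall 0.75, and F1 0.81.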

# Combine the actual labels and predicted labels into a DataFrame for comparison
results_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

# Print the DataFrame to see the actual and predicted labels side by side
print("\nActual vs Predicted Labels:")
print(results_df)

Actual vs Predicted Labels:
    Actual  Predicted
0        0          0
1        0          0
2        0          0
3        0          0
4        0          0
..     ...        ...
95       1          0
96       0          0
97       1          0
98       1          1
99       1          1

[100 rows x 2 columns]

correctly_classified_samples = results_df[results_df['Actual'] ==
results_df['Predicted']].head(10)
print("\nFirst 10 Samples with Correct Classification:")
print(correctly_classified_samples)

First 10 Samples with Correct Classification:
    Actual  Predicted
0        0          0
1        0          0
2        0          0
3        0          0
4        0          0
5        0          0
6        0          0
7        1          1
8        0          0
10       0          0

# Compute additional performance metrics


TP = conf_matrix[1, 1] # True Positives
TN = conf_matrix[0, 0] # True Negatives
FP = conf_matrix[0, 1] # False Positives
FN = conf_matrix[1, 0] # False Negatives

# Metrics calculations
accuracy = (TP + TN) / (TP + TN + FP + FN)
error_rate = (FP + FN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

# Print additional performance metrics


print(f'\nTrue Positives (TP): {TP}')
print(f'True Negatives (TN): {TN}')
print(f'False Positives (FP): {FP}')
print(f'False Negatives (FN): {FN}')
print(f'Accuracy: {accuracy}')
print(f'Error Rate: {error_rate}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')

True Positives (TP): 24
True Negatives (TN): 65
False Positives (FP): 3
False Negatives (FN): 8
Accuracy: 0.89
Error Rate: 0.11
Precision: 0.8888888888888888
Recall: 0.75
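The hand-computed metrics can be cross-checked against scikit-learn's own metric functions. The sketch below is one way to do that without the original CSV: it rebuilds label arrays that reproduce the reported counts (the sample order is arbitrary, not the real test order) and also reports the F1 score, which the script above does not compute by hand. The names `y_true` and `y_hat` are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Rebuild label arrays that reproduce the counts reported above
# (TN=65, FP=3, FN=8, TP=24); the sample order is arbitrary, not the real test order.
y_true = np.array([0] * 65 + [0] * 3 + [1] * 8 + [1] * 24)
y_hat = np.array([0] * 65 + [1] * 3 + [0] * 8 + [1] * 24)

# For binary labels, ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

print(tn, fp, fn, tp)
print(precision_score(y_true, y_hat), recall_score(y_true, y_hat), f1_score(y_true, y_hat))
```

In the original script, the same functions could be called directly on `y_test` and `y_pred`.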
