PCA - Colab

Uploaded by Dina Bardakji

12/14/24, 9:13 PM  Untitled3.ipynb - Colab

# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import os

# Define the file path
file_path = '/creditcard.csv'

# Check if the file exists
if os.path.isfile(file_path):
    print("File found, loading the dataset.")
    data = pd.read_csv(file_path)  # Load dataset
    print("First few rows of the dataset:")
    print(data.head())  # Display the first few rows
else:
    print("Error: File not found. Generating mock data.")
    # Create mock data if the CSV file is not found
    data = pd.DataFrame({
        'Feature1': np.random.rand(10),
        'Feature2': np.random.rand(10),
        'Feature3': np.random.rand(10),
        'Target': np.random.choice([0, 1], size=10)
    })
    print(data.head())  # Display mock data

# Check if the dataset has at least 2 columns
if data.shape[1] < 2:
    print("Error: The dataset does not have enough columns.")
    exit()

# Check if the target column is categorical with exactly two classes
if data.iloc[:, -1].nunique() != 2:
    print("Error: The target variable must have exactly two classes for binary classification.")
    exit()

# Features (all columns except the last) and Labels (last column)
X = data.iloc[:, :-1].values  # Features
y = data.iloc[:, -1].values   # Labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
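A note on this split: the Class column in creditcard.csv is highly imbalanced, so a plain random split can leave the test set with very few positive (fraud) rows; passing stratify=y to train_test_split preserves the class proportions in both splits. As a rough illustration of what stratification does, here is a NumPy-only sketch (stratified_split is a hypothetical helper written for this example, not part of the notebook):

```python
import numpy as np

def stratified_split(y, test_size=0.3, seed=42):
    # Return train/test index arrays that preserve per-class proportions,
    # mirroring train_test_split(..., stratify=y).
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

y_demo = np.array([0] * 90 + [1] * 10)   # imbalanced toy target
tr, te = stratified_split(y_demo)
print(len(te), y_demo[te].mean())  # prints: 30 0.1  (10% positives kept)
```

With the real notebook, the equivalent one-line change would be train_test_split(X, y, test_size=0.3, random_state=42, stratify=y).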

# Apply PCA to reduce the dimensionality of the data to 2 components
pca = PCA(n_components=2)                 # Set number of components to 2
X_train_pca = pca.fit_transform(X_train)  # Fit PCA on training data
X_test_pca = pca.transform(X_test)        # Transform test data
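PCA picks the directions of maximum raw variance, so features on large numeric scales (here, Time and Amount, versus the already-normalized V1-V28) dominate the components unless the data is standardized first. A minimal NumPy sketch of what PCA.fit_transform computes, shown on synthetic data where one feature's scale swamps the rest:

```python
import numpy as np

def pca_fit_transform(X, n_components=2):
    # Center the data and take the SVD; rows of Vt are the principal axes.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / (len(X) - 1)          # variance along each axis
    ratio = explained / explained.sum()      # explained variance ratio
    return Xc @ Vt[:n_components].T, ratio[:n_components]

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
X_demo[:, 0] *= 100  # one unscaled feature dominates the variance
scores, ratio = pca_fit_transform(X_demo)
print(ratio)  # first component captures nearly all the variance
```

Fitting sklearn.preprocessing.StandardScaler on the training split and transforming both splits before PCA would spread the explained variance more evenly; the heavily skewed ratio printed in the notebook's output further down is consistent with unscaled input.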

# Train a logistic regression model on the reduced data
model = LogisticRegression()     # Initialize logistic regression
model.fit(X_train_pca, y_train)  # Train the model

# Make predictions on the test data
y_pred = model.predict(X_test_pca)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("PCA Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Accuracy of Logistic Regression:", accuracy)

# Provide interpretation of the accuracy score
if accuracy < 0.5:
    interpretation = "The model is performing poorly. It may not be learning the patterns in the data."
elif accuracy < 0.7:
    interpretation = "The model has moderate accuracy. There may be room for improvement."
elif accuracy < 0.9:
    interpretation = "The model is performing well, but there might still be some overfitting."
else:
    interpretation = "The model has high accuracy and is likely performing well on the test data."

# Print the interpretation of the accuracy
print("Interpretation of Accuracy:", interpretation)

https://colab.research.google.com/drive/1_KBRgYJlwVjZQJe-P3HdUflPhQPGtAG4#scrollTo=T_ebqXyuWO1Z&printMode=true 1/2
File found, loading the dataset.
First few rows of the dataset:
Time V1 V2 V3 V4 V5 V6 V7 \
0 0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599
1 0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803
2 1 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461
3 1 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609
4 2 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941

V8 V9 ... V21 V22 V23 V24 V25 \
0 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539
1 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170
2 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642
3 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376
4 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010

V26 V27 V28 Amount Class
0 -0.189115 0.133558 -0.021053 149.62 0
1 0.125895 -0.008983 0.014724 2.69 0
2 -0.139097 -0.055353 -0.059752 378.66 0
3 -0.221929 0.062723 0.061458 123.50 0
4 0.502292 0.219422 0.215153 69.99 0

[5 rows x 31 columns]
PCA Explained Variance Ratio: [9.99761791e-01 2.38122500e-04]
Accuracy of Logistic Regression: 0.9971666666666666
Interpretation of Accuracy: The model has high accuracy and is likely performing well on the test data.
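One caveat worth adding to this interpretation: the credit card fraud dataset is heavily imbalanced (fraud is well under 1% of rows), so a high accuracy can be matched by a model that always predicts the majority class. A quick baseline check makes this concrete (majority_baseline_accuracy is an illustrative helper for this sketch, not part of the notebook):

```python
import numpy as np

def majority_baseline_accuracy(y):
    # Accuracy of a trivial classifier that always predicts
    # the most frequent class in y.
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()

y_demo = np.array([0] * 995 + [1] * 5)  # stand-in for an imbalanced target
print(majority_baseline_accuracy(y_demo))  # prints 0.995
```

If the logistic regression's 0.9972 only barely beats this baseline on the real test labels, metrics that focus on the minority class, such as precision, recall, or ROC AUC, are more informative than raw accuracy.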
