Ai Code

The document outlines a data analysis workflow using Python, including data loading, exploration, visualization, handling missing values, and encoding categorical variables. It utilizes logistic regression to train a model on the processed data and evaluates its performance through accuracy, confusion matrix, and classification report. Key libraries used include pandas, numpy, matplotlib, seaborn, and scikit-learn.

Uploaded by

thetit4ns

# Import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files  # Only available inside Google Colab (file upload helper)
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report,
)

# Step 1: Upload and load the CSV file

# Load the data into a DataFrame


df = pd.read_csv("Data")

# Step 2: Explore the data


print("Exploring data info")
print(df.info())

print("\nFirst 8 rows")
print(df.head(8))

print("\nGetting statistical summary")


print(df.describe())

print("\nChecking for missing values")


print(df.isnull().sum())

print("\nGetting shape of the DataFrame")


print(df.shape)

# Step 3: Visualize the 'Purchased' column


plt.figure(figsize=(6, 4))
sns.countplot(x='Purchased', data=df)
plt.title('Distribution of Purchased (Target Variable)')
plt.show()

# Step 4: Handle missing values


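# Handling missing values with mean imputation for numerical columns
imputer = SimpleImputer(strategy='mean')
# fit_transform returns a 2-D array of shape (n_rows, 1);
# ravel() flattens it to 1-D before assigning it back to the column
df['Age'] = imputer.fit_transform(df[['Age']]).ravel()
df['Income'] = imputer.fit_transform(df[['Income']]).ravel()

# Step 5: Encode categorical variables
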
# Apply label encoding to the 'Purchased' column
label_encoder = LabelEncoder()
df['Purchased'] = label_encoder.fit_transform(df['Purchased'])

# Apply one-hot encoding to the 'Gender' column


df = pd.get_dummies(df, columns=['Gender'], drop_first=True)

# Display the first few rows to verify the encoding


print("\nData after encoding:")
print(df.head())

# Step 6: Prepare features and target variable


# Assuming 'Gender_Male' is the new column after one-hot encoding
X = df[['Age', 'Income', 'Gender_Male']] # Features
y = df['Purchased'] # Target variable

# Step 7: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 8: Train the logistic regression model


model = LogisticRegression()
model.fit(X_train, y_train)

# Step 9: Make predictions on the test set


y_pred = model.predict(X_test)

# Step 10: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"\nModel Accuracy: {accuracy:.2f}")


print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)
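
The confusion matrix printed in Step 10 can be hard to read at a glance. The sketch below shows one way to unpack it into its four cells and recompute accuracy, precision, and recall by hand. It assumes a binary target encoded as 0 and 1, as `LabelEncoder` would produce for a two-class 'Purchased' column. The labels `y_true` and `y_hat` are hypothetical toy values, not results from the dataset above.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels for illustration only (not from the dataset above)
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1], rows are true classes and columns are predicted classes,
# so ravel() yields the cells in the order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f}")
```

The same unpacking should work on `conf_matrix` from the script, provided the target has exactly two classes.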
